Why Data Scientists Are Never Happy

How I find projects that are right for me

Johannes Petrat
Towards Data Science


In pursuit of the sexiest job of the 21st century, data scientists often start jobs with unrealistic expectations or end up on projects that are misaligned with their career aspirations. I have certainly fallen into the same trap a few times. Maybe you have experienced something similar, too.

Early in my career, I often got frustrated with projects because I wanted to do it all: deep scientific research on the one hand and customer-focussed work on the other. Having worked at a range of companies, I realised that data science (DS) projects can have very different characteristics depending on the company and the type of problem they are addressing.

Over time, I concluded that I can assess DS projects along three characteristics, which I call the “three Cs”:

  • Scientific challenge: there is a need for cutting-edge machine learning (ML) techniques
  • Creative problem-solving: there is a big open-ended problem with no obvious solution
  • Customer impact: there is a clear use case and production infrastructure. Delivering an analysis or model directly translates into value for customers

Where do your past projects fit in? (Image by Author)

In reality, you only get to choose two Cs for a data science project!

Let’s take a closer look at what the three Cs actually mean.

Scientific challenge

This project characteristic is useful if you want to specialise in a technique or application (for example, NLP or computer vision). I always find these projects rewarding because they push my theoretical understanding of ML and it’s easy for me to track my learning progress.

The typical activities you would find on a project like this are:

  • Reviewing academic literature to identify the state-of-the-art approach
  • Implementing a model from a paper and adapting it to your problem
  • Experimenting with new software libraries and open-source implementations of models

Creative problem-solving

You often find this characteristic in consulting-style projects that require quick and creative ways of solving a real-world problem using ML. The key challenges are getting up to speed in a new environment and shipping a ready-to-go ML solution within a fixed time and budget. These projects can be exciting because they often allow for rapid iteration on the solution.

The typical activities you would find on a project like this are:

  • Engaging with clients to gather requirements and understand the business problem
  • Exploring the provided data and the existing infrastructure to assess the feasibility of integrating an ML solution
  • Building an “80/20” proof-of-concept to test the use case for ML

Customer impact

Projects with customer impact have a clearly defined value proposition and often add ML features to existing products. As such, model improvements can be deployed directly into the hands of users. This makes these projects the place to be for ML engineers and anyone wanting to improve their engineering skills. I get a lot of satisfaction from positive user feedback (“this is saving me days of work every week”), and I like to measure my impact by the value my models create for their users.

The typical activities you would find on a project like this are:

  • Working closely with software engineers to productionise a model
  • Doing A/B testing to measure model improvements
  • Extracting data from existing users for training models

Choose two Cs

When making career choices as a data scientist, think about which of the three Cs you are looking for. Don’t expect to get all three on one project. In my experience, good data science projects tend to fall into the intersections of two Cs:

Look for projects in the intersections (Image by Author)

“Deep tech”

You often find these projects in tech startups and R&D divisions of larger companies. New technologies and modelling techniques are applied to open-ended problems with the potential to disrupt whole industries. Data scientists often need to develop state-of-the-art models to gain a competitive advantage. However, actually putting the models to use can require substantial business development and engineering effort.

“Embedding ML”

Many traditional companies are looking to embed ML into their processes to support their core business. This creates lots of opportunities for data scientists to find the low-hanging fruit of ML applications. Using existing IT infrastructure, you can ship simple models with a massive impact.

There is often more business value in moving on to a different project and developing another simple model than in further improving the accuracy of an existing one. This makes it difficult for a data scientist to gain practical experience with more cutting-edge techniques and models.

“Massive-scale ML”

These projects can be found at companies that have tech at the core of their products. They employ big teams of data scientists and engineers that build new features that can be scaled to millions of users. In these environments, data scientists can specialise in a small niche where model improvements can translate to massive savings and profits. However, with increased specialisation there can be many layers of product/project managers between you and the customer.

So what?

If you are frustrated with your current role or project, ask yourself these questions:

  • Which Cs do you want your projects to involve?
  • Which characteristic are you missing?
  • Are there other projects at your company that would offer it?
  • When considering a new job: does the company offer the types of projects you’re looking for?


Currently working as an Applied Scientist in the space of simulation and AI; based in London