Hiring the wrong data role first can cost you six months and $150,000 or more. That's not a worst case. It's what we see regularly when companies bring in a data scientist before they have a single reliable data pipeline in place. The data scientist spends her first quarter manually joining CSVs and debugging ETL scripts that were never designed for scale, and the actual analysis work gets pushed back indefinitely.
We've helped startups, scale-ups, and enterprise teams diagnose exactly this problem and rebuild their data hiring strategy from scratch.
This article gives you a straight answer: which role to hire first, why the order matters more than the titles, and how to know when you're ready for each one. No fluff, just what we've seen across dozens of real implementations.
| Dimension | Data Scientist | Data Engineer | ML Engineer |
|---|---|---|---|
| Core output | Insights, models, analysis | Pipelines, warehouses, data infrastructure | Production ML systems |
| Primary tools | Python, R, SQL, Jupyter, Tableau | Spark, Airflow, dbt, Kafka, Snowflake | MLflow, Kubeflow, Docker, TensorFlow Serving |
| When you need them | After clean data exists | Before any other data hire | After a model exists in a notebook |
| Depends on | Clean, reliable data from an engineer | Source systems and cloud infra | A trained model from a data scientist |
| Hire first if… | You have clean data and a business question | You have raw data and no pipelines | You have a notebook model going to production |
| Typical salary (UK, 2026) | £65,000 to £95,000 | £60,000 to £90,000 | £75,000 to £110,000 |
For most companies, a data engineer comes first. A data scientist comes second. An ML engineer comes third. The exceptions exist, but they're rarer than most hiring managers think.
What Each Role Actually Does
The job titles in data are genuinely confusing. They overlap in places, and different companies use them differently. Here's how we define them based on what each role produces day to day.
1. Data engineer (the plumber)
A data engineer builds and maintains the systems that move, store, and clean data. Think of them as the plumber of the data team. They write ETL pipelines (extract, transform, load) that pull data from source systems like your CRM, your product database, and your payment platform, and land it somewhere clean and queryable, usually a cloud data warehouse like Snowflake, BigQuery, or Redshift.
Their daily tools include Apache Airflow for scheduling pipelines, dbt for transforming raw data into usable tables, and Spark for processing large data volumes. They also own data quality, data freshness, and the underlying infrastructure that keeps everything running.
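Without Airflow, dbt, or a real warehouse, the extract, transform, load pattern looks roughly like the sketch below. It is deliberately tiny and rests on several assumptions: two hypothetical in-memory record lists stand in for a CRM and a payment platform, Python's built-in `sqlite3` stands in for Snowflake, BigQuery, or Redshift, and the table and function names are illustrative only. The transform step shows the unglamorous work the engineer owns, such as normalising join keys and removing duplicate customers.

```python
import sqlite3

# Hypothetical raw records standing in for CRM and payment-platform exports.
RAW_CRM = [
    {"email": "Ana@Example.com ", "name": "Ana"},
    {"email": "ana@example.com", "name": "Ana B."},  # duplicate once normalised
    {"email": "ben@example.com", "name": "Ben"},
]
RAW_PAYMENTS = [
    {"email": "ana@example.com", "amount_pence": 1200},
    {"email": "BEN@example.com", "amount_pence": 500},
    {"email": "ben@example.com", "amount_pence": 700},
]


def extract():
    """Extract: a real pipeline would query the CRM and payment APIs here."""
    return RAW_CRM, RAW_PAYMENTS


def normalise_email(email):
    """Normalise the join key so the two sources line up."""
    return email.strip().lower()


def transform(crm, payments):
    """Transform: de-dupe customers and total revenue per customer."""
    customers = {}
    for row in crm:
        # Keep the first record seen for each normalised email.
        customers.setdefault(normalise_email(row["email"]), row["name"])
    revenue = {}
    for row in payments:
        key = normalise_email(row["email"])
        revenue[key] = revenue.get(key, 0) + row["amount_pence"]
    return [(email, name, revenue.get(email, 0)) for email, name in customers.items()]


def load(rows, conn):
    """Load: write a clean, queryable table into the stand-in warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_revenue "
        "(email TEXT PRIMARY KEY, name TEXT, revenue_pence INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_revenue VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    crm, payments = extract()
    load(transform(crm, payments), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
    run_pipeline(conn)
    for row in conn.execute("SELECT * FROM customer_revenue ORDER BY email"):
        print(row)
```

In this toy run the three CRM rows collapse to two customers, each with 1,200 pence of revenue. A production version would split these steps into scheduled, monitored tasks rather than one script.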
For a data scientist or analyst, the data engineer is the person who makes their work possible. Without reliable pipelines, there's no clean data. Without clean data, every analysis is guesswork.
2. Data scientist (the analyst)
A data scientist takes clean, structured data and turns it into answers. They run statistical analyses, build predictive models, and translate business questions into hypotheses that can be tested against data.
A strong data scientist is fluent in Python and SQL, comfortable with machine learning libraries like scikit-learn and XGBoost, and able to communicate findings to non-technical stakeholders. They're the ones who can tell you which customers are about to churn, why a product feature isn't converting, or how much lifetime value differs across acquisition channels.
But here's the catch: a data scientist's output is only as good as the data they're given. Give them messy, unreliable data and you get messy, unreliable insights. This is why the data engineer has to come first.
3. ML engineer (the builder)
An ML engineer takes a model that a data scientist has trained in a notebook and makes it work in production. That means wrapping it in an API, containerizing it with Docker, setting up monitoring to catch when the model drifts, and connecting it to the live systems that need its predictions.
This role sits at the intersection of software engineering and machine learning. ML engineers are typically stronger in software development than data scientists are, and stronger in model architecture than most backend engineers. Their tools include MLflow for experiment tracking, Kubeflow or SageMaker for orchestration, and Kubernetes for deployment at scale.
If your data scientist has built a churn model in a Jupyter notebook that the sales team wants to use every day, an ML engineer is the person who turns that notebook into a live system.
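Drift monitoring can be implemented in many ways. One common technique is the population stability index (PSI), which compares how a model's scores were distributed at training time with how they are distributed in live traffic. The sketch below is a minimal pure-Python version written under stated assumptions: the bin edges, the function names, and the 0.25 alert threshold (a commonly quoted rule of thumb, not a standard) are illustrative, and most teams would rely on their monitoring stack rather than hand-rolling this.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges, eps=1e-6):
    """Fraction of values in each bin; `edges` are interior cut points (len(edges) + 1 bins)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    # eps stops empty bins from causing log(0) or division by zero.
    return [max(c / n, eps) for c in counts]


def population_stability_index(expected, actual, edges):
    """PSI = sum over bins of (actual_share - expected_share) * ln(actual_share / expected_share)."""
    e_shares = bin_shares(expected, edges)
    a_shares = bin_shares(actual, edges)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


def drift_alert(expected, actual, edges, threshold=0.25):
    """True if live scores have shifted past the (assumed) threshold."""
    return population_stability_index(expected, actual, edges) > threshold
```

For example, with training scores `[0.1] * 50 + [0.9] * 50`, live scores skewed to 90% high, and a single cut point at 0.5, the index comes out at about 0.88, well past the threshold. Identical distributions give exactly 0.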
Why the Hire Order Matters More Than the Job Titles
Most companies hire a data scientist first. It feels like the right call because "data science" is the visible, exciting part of a data team. But it's almost always the wrong call, and here's exactly why.
A data scientist needs three things to do their job: clean data, reliable data, and accessible data. In most companies that don't yet have a dedicated data team, none of these exist. Raw event logs sit in production databases. CRM data hasn't been de-duped in two years. Finance exports are still in Excel. There's no warehouse, no transformation layer, no data dictionary.
When you drop a data scientist into that environment, she's not doing data science. She's doing data engineering, badly, with tools that weren't built for it. We've seen data scientists spend 70% of their time on pipeline work that a data engineer could have handled in a fraction of the time with the right tooling.
Note: In our work with early-stage SaaS companies, the single most common hiring mistake is bringing on a data scientist before the data infrastructure exists. The scientist ends up doing the engineer's job at twice the cost and half the speed.
A fintech company with 3.5 million monthly transactions hired a senior data scientist as their first data role. Eight months later, her manager told us she had built one working model. The rest of her time went to cleaning raw webhook logs, writing ad hoc SQL queries for the ops team, and maintaining a reporting pipeline that had been cobbled together in Python. When we came in and placed a data engineer alongside her, she shipped four new models in the next ten weeks.
The data engineer is the foundation. Everything else is built on top of it.
The Three Hiring Scenarios
The right first hire depends on where your company is right now. Here are the three scenarios we see most often and what each one calls for.
Scenario 1: You have raw data but no infrastructure
You have data coming out of your product, your CRM, and your payment processor, but it lives in production databases or in S3 buckets that nobody has properly organised. There's no warehouse. No dashboards anyone trusts. No consistent definitions for basic metrics like "active user" or "monthly revenue." This calls for a data engineer first: until pipelines and a warehouse exist, nobody else on a data team can work effectively.
Scenario 2: You have clean data but no models or predictions
You have a Snowflake warehouse. Your dbt models are running cleanly. Your BI tool shows dashboards that the business trusts. But nobody is doing predictive work. You're always looking backwards. You want to start asking forward-looking questions: who will churn, what will they buy, which leads will convert. This is the point to hire a data scientist.
Scenario 3: You have models that need to go to production
Your data scientist has trained a model that works. The team has validated it. Now the product team wants it running live, serving predictions in real time for thousands of users. Your data scientist isn't set up to do that. Your backend engineers don't know how to maintain a model. This is the gap an ML engineer fills.
When you need all three and in what order
Most companies don't need all three roles at once. The tipping point for each hire is usually a specific operational constraint, not a headcount target.
Signs you're ready to hire a data engineer:
- You have more than one analyst spending 30% or more of their time cleaning data before they can use it.
- You're running more than 10 million events per month and your production database is starting to feel the reporting load.
- Your company makes decisions on data that nobody fully trusts because nobody knows where it came from.
Signs you're ready to hire a data scientist (after the engineer):
- Your data warehouse is clean and your core metrics are stable.
- You have a business question that historical reporting can't answer on its own.
- Your product or growth team is making decisions on intuition that you believe data could validate or disprove.
Signs you're ready to hire an ML engineer (after the scientist):
- A model is sitting in a notebook and the business wants it in production.
- Your data scientist is spending more than 20% of her time on deployment and infrastructure work.
- You're serving more than 50,000 predictions per day and latency is starting to matter.
Note: In our work with growth-stage e-commerce teams, the trigger for an ML engineer is almost always the same: a recommendation engine or personalisation model that needs to serve results in under 200 milliseconds. That's when the notebook stops being good enough.
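The checklists above can be read as a simple decision rule. The sketch below encodes them in Python under a few assumptions that the lists themselves don't spell out: any single data engineer signal is enough to trigger that hire, a data scientist needs both a clean warehouse and a question that reporting can't answer, and any single ML engineer signal counts. The `TeamState` fields and the `next_hire` function are hypothetical, and the thresholds are the heuristics listed above, not hard rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TeamState:
    """Hypothetical snapshot of a company's data team and the signals listed above."""
    has_data_engineer: bool = False
    has_data_scientist: bool = False
    has_ml_engineer: bool = False
    # Data engineer signals
    analysts_cleaning_30pct_plus: int = 0  # analysts spending 30%+ of their time cleaning data
    events_per_month: int = 0
    data_trusted: bool = True
    # Data scientist signals
    warehouse_clean: bool = False  # clean warehouse, stable core metrics
    has_predictive_question: bool = False  # a question historical reporting can't answer
    # ML engineer signals
    model_waiting_for_production: bool = False
    scientist_deployment_share: float = 0.0  # fraction of DS time on deployment/infra
    predictions_per_day: int = 0
    latency_matters: bool = False


def next_hire(s: TeamState) -> Optional[str]:
    """Return the next role to hire in engineer -> scientist -> ML engineer order, or None."""
    if not s.has_data_engineer:
        # The engineer comes first; hire once any engineer signal appears.
        engineer_signals = (
            s.analysts_cleaning_30pct_plus > 1
            or s.events_per_month > 10_000_000
            or not s.data_trusted
        )
        return "data engineer" if engineer_signals else None
    if not s.has_data_scientist:
        # Needs the foundation (clean warehouse) plus a forward-looking question.
        ready = s.warehouse_clean and s.has_predictive_question
        return "data scientist" if ready else None
    if not s.has_ml_engineer:
        ml_signals = (
            s.model_waiting_for_production
            or s.scientist_deployment_share > 0.20
            or (s.predictions_per_day > 50_000 and s.latency_matters)
        )
        return "ML engineer" if ml_signals else None
    return None


# A team with an engineer and a clean warehouse that now wants to predict churn:
state = TeamState(has_data_engineer=True, warehouse_clean=True, has_predictive_question=True)
print(next_hire(state))  # data scientist
```

Encoding the order as nested checks makes the dependency explicit: the function never suggests a data scientist or ML engineer until the role before it is in place.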
Skills breakdown: Data Scientist vs Data Engineer vs ML Engineer
| Skill area | Data scientist | Data engineer | ML engineer |
|---|---|---|---|
| SQL | Strong, analytical queries | Expert, builds the schema | Moderate, uses it for feature work |
| Python | Strong (pandas, scikit-learn) | Strong (pipeline scripting) | Expert (model serving, APIs) |
| Cloud infra | Light (uses what's built) | Core skill (builds infra) | Strong (containers, orchestration) |
| Statistics | Core skill | Minimal | Moderate |
| Model deployment | Minimal | None | Core skill |
| Stakeholder comms | Strong | Moderate | Light |
The clearest way to see the difference is what each role hands off. A data engineer hands clean data to a data scientist. A data scientist hands a trained model to an ML engineer. Each role depends on the one before it.
Not Sure Which Data Role to Hire? Let's Talk.
Building a data team for the first time is harder than it looks. The job titles don't tell you enough. Candidates who interview well often aren't the right fit for your stage. And getting the order wrong costs real money.
At Lucent Innovation, we place data engineers, data scientists, and ML engineers into teams at all stages, from pre-seed startups building their first pipeline to enterprise teams scaling a platform that processes billions of events. We screen for technical depth, communication skills, and the specific tooling your stack needs.
You can bring in one specialist for a focused build, or a blended squad if you need to move fast across all three disciplines at once. We match to your stage, your stack, and your timeline.
