
The Role of MLE

The role definition of Machine Learning Engineer varies across companies. Generally speaking, MLEs at small to medium size companies tend to work around MLOps, sometimes on a production model. They usually collaborate with Applied or Data Scientists at the beginning of a new ML service, understanding the data and business needs to create a model. Once the model is formulated, the rest of the work is mostly carried out by MLEs. That includes: Data Pipelines, Service Repo, CI/CD, Automated Model Training, Model Serving, Model Monitoring, and Service Monitoring.

On the other hand, MLEs at large tech companies tend to put more effort into the modeling side. That includes: translating business needs into ML models, fine-tuning large deep learning models, and conducting and analyzing experiments. They may also need specialized ML skills such as NLP, Causal Inference, or Operations Research, depending on the specific need. Large tech companies also have a research-only position, the Research Scientist, but that’s another topic.

There was a trend of becoming an MLE in early 2020, but probably not any more. Admittedly, tech companies tend to pay MLEs more (5~10%), but the margin has decreased as of late 2022. ML Generalist, ML Specialist, MLOps Engineer, and Data Scientist are distinct streams of the workforce. Interviews all depend on the job requirements, not necessarily the title.

Interview Format

If a company hires a large number of MLEs, the interview format tends to be general. But if a company, usually a start-up, hires an MLE for a specific position, the interview format and questions can be hard to foresee. Based on my empirical observation, I categorize the types of questions as follows:

1. Algorithm Coding (Leetcode)

  1. Typical Questions:
    Binary Search, Sliding Window, Line Sweep, Two Pointers, Prefix Sum
  2. Common Algorithms:
    BFS, DFS, Backtracking
  3. Sorting Algorithms:
    Merge Sort, Quick Sort, Bucket Sort
  4. Advanced Algorithms:
    Monotonic Stack, Monotonic Queue
  5. Tree Data Structure and Algorithms:
    Traversal, Serialization, LCA, Binary Search Tree, Iterator
  6. Graph Data Structure and Algorithms:
    Topological Sort, Dijkstra, Union Find, Backtracking, Trie
  7. Dynamic Programming

Other resource: Blind 75: https://duckpunch.org/blind-75/
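
For calibration, the bar is usually being able to write something like this without hesitation. A minimal binary search sketch in Python (the array and target are arbitrary examples):

    def binary_search(nums, target):
        """Return the index of target in sorted nums, or -1 if absent."""
        lo, hi = 0, len(nums) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if nums[mid] == target:
                return mid
            if nums[mid] < target:
                lo = mid + 1   # target can only be in the right half
            else:
                hi = mid - 1   # target can only be in the left half
        return -1

    print(binary_search([1, 3, 5, 7, 9], 7))  # -> 3

Expect follow-ups on edge cases (empty array, duplicates) and on stating the O(log n) complexity.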

2. ML Algorithm Coding

  • Common questions include coding KNN from scratch, the Gradient Descent algorithm, a Decision Tree split, etc. A quick sketch of the KNN case is shown below.
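
Here is a minimal KNN sketch with numpy, roughly the shape of a from-scratch answer (the toy data and k value are made up for illustration):

    import numpy as np

    def knn_predict(X_train, y_train, X_query, k=3):
        """Classify each query point by majority vote among its k nearest
        training points (Euclidean distance)."""
        preds = []
        for q in X_query:
            dists = np.linalg.norm(X_train - q, axis=1)   # distance to every training point
            nearest = np.argsort(dists)[:k]               # indices of the k closest
            labels, counts = np.unique(y_train[nearest], return_counts=True)
            preds.append(labels[np.argmax(counts)])       # majority vote
        return np.array(preds)

    X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([[0.2, 0.1], [5.5, 5.0]])))  # -> [0 1]

The double loop is fine at interview scale; be ready to mention how it could be vectorized or sped up with a KD-tree if asked.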

3. ML Knowledge and Statistics

  • StatQuest Video Index: Go over sections: Statistics Fundamentals, Statistical Tests, Machine Learning
  • Pros and Cons of ML algorithms, e.g., Logistic Regression vs Random Forest vs Decision Tree vs Boosting, etc. (a quick comparison sketch follows below)
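
To make the pros-and-cons discussion concrete rather than purely verbal, it helps to have run a comparison at least once. A minimal scikit-learn sketch, where the dataset and metric are arbitrary picks for illustration:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        # linear baseline: needs feature scaling, coefficients are easy to interpret
        "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        # tree ensemble: no scaling needed, captures interactions, less interpretable
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean AUC = {scores.mean():.3f}")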

4. ML Case Study

  • Data Pipeline and ML Infra
  • Product Sense: translate a business question into an ML strategy
  • Feature Engineering
  • A/B Test
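
For the A/B Test part, be ready to do the basic significance math on the spot. A minimal two-proportion z-test sketch in Python (the conversion counts are made up):

    import numpy as np
    from scipy import stats

    def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
        """Two-sided z-test for the difference between two conversion rates."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)                   # pooled rate under H0
        se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error of the difference
        z = (p_b - p_a) / se
        p_value = 2 * (1 - stats.norm.cdf(abs(z)))
        return z, p_value

    # Example: 1,200/10,000 control conversions vs 1,310/10,000 treatment conversions
    print(two_proportion_ztest(1200, 10_000, 1310, 10_000))

The more interesting discussion is usually around how sample size, minimum detectable effect, and significance level trade off against each other.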

5. Scalable Software System Design

  • Alex Xu on LinkedIn covers this section in detail.

6. Past Work Projects

  • Amazon asks Applied Scientist candidates to talk about their past projects, sometimes in a presentation format. This can be a spotlight moment to demonstrate your breadth and depth of ML expertise as well as other skills such as engineering, trade-off and comparative analysis, communication, etc. Even though Amazon only asks this of Applied Scientist candidates selectively, I can see a trend of this kind of interview being used for senior-level candidates.
  • Consider the S.T.A.R. template:
    • Situation: A quick brush-up on the business background and why the project was needed.
      • Example: Meta sets an ads spend cap due to fraud concerns; we needed to unblock users' ads spending.
    • Task: What tasks were created for the project, and what responsibilities did you carry?
      • Example: I needed to investigate whether there was an opportunity to unblock users and how much revenue the project could bring, then come up with a strategy.
    • Action: What specific actions did you take to achieve the task?
      • Example: Analyzed data, researched available solutions, built models, led group discussions, resolved conflicts, made trade-offs, and presented the work.
    • Result: What was the outcome of the task, and what benefits did you bring?
      • Example: The model increased iRev by $60MM.

7. Behavioral Questions

  • Generally speaking, there are four categories of Behavioral Questions:
    • Past experience - Wins and Fails
    • Company Culture Fit
    • Leadership and Ownership
    • Conflict Handling

8. SQL*1

  • Some start-up companies ask SQL coding questions, but it’s rare. Still, it doesn’t hurt to spend a day brushing up on some fundamental SQL queries; a quick refresher follows.
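
A minimal refresher, using Python’s built-in sqlite3 so it runs anywhere (the table and numbers are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 25.0), (2, 5.0), (3, 40.0);
    """)

    # Total spend per user, highest first -- the GROUP BY / ORDER BY pattern
    query = """
        SELECT user_id, SUM(amount) AS total_spend
        FROM orders
        GROUP BY user_id
        ORDER BY total_spend DESC;
    """
    for row in conn.execute(query):
        print(row)  # (3, 40.0), (1, 35.0), (2, 5.0)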

9. Peer Programming: Data and Model Analysis*1

  • This type of interview often appears at early-stage start-ups. It is hard to predict what they will ask, and thus hard to prepare for. Sometimes the interviewer/recruiter will email you the question in advance; sometimes you will just have to improvise. I personally really like this type of interview because it closely simulates daily work, but it requires a good amount of proficiency in numpy and pandas to speed up the data analysis. A small warm-up exercise is sketched below.
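
As a warm-up, a pandas sketch along these lines covers the usual first-pass questions (the synthetic event log below is just a stand-in for whatever dataset the interviewer shares):

    import numpy as np
    import pandas as pd

    # Toy event log standing in for the interviewer's dataset
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "user_id": rng.integers(1, 100, size=1_000),
        "country": rng.choice(["US", "CA", "DE"], size=1_000),
        "revenue": rng.exponential(20, size=1_000).round(2),
    })

    # First-pass questions: shape, missing values, summary stats, per-segment aggregates
    print(df.shape, df.isna().sum().sum())
    print(df.describe())
    summary = (df.groupby("country")["revenue"]
                 .agg(["count", "mean", "median"])
                 .sort_values("mean", ascending=False))
    print(summary)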

Optional Topics:

If a candidate is hired as an ML Specialist, one of the ML Case Study rounds will reflect the team’s needs.

  • Deep Learning
  • Computer Vision
  • NLP
  • Causal Inference
  • Experimentation
  • Bayesian Statistics
  • Operations Research

Preparation Timeline:

  • Week 1: Leetcode, ML Coding
  • Week 2: Leetcode, ML Knowledge and Statistics
  • Week 3: ML Case Study, Past Work Projects
  • Week 4: ML Case Study, Behavioral Questions
  • Week 5: System Design

Other Caveats:

  • The prospective company, unless they use the same infra tools, really couldn’t give a damn about the ML system architecture of your current company. One can use Airflow, the other can use AWS Glue; with enough man-hours, it’s tomayto and tomahto. The workflow, from data pipeline to training to serving, is also ubiquitous, so there is really no design skill to assess. Don’t spend too much time talking about your ML system with the interviewers.

  1. Rarely asked. ↩︎