The field of data science is ever evolving, spanning several industries and requiring an extensive skillset that includes mathematics, statistics, programming and marketing. As such, becoming a data scientist requires an impressive blend of technical skill, creativity and communication.
Job descriptions for data scientists can vary greatly, though all are seeking candidates with a long list of the most desirable job skills like critical thinking, problem-solving, data analytics, emotional intelligence, attention to detail and teamwork. This means that interview questions for data scientists can span several different topics and range from typical soft skills queries to extremely technical discussions.
Data science interviews require a lot of preparation. Whether you’re fresh out of a top computer science school or you’re looking to shift to a different company or industry, you should take time to go over the major concepts of your work. Just as you know how to drive but might have trouble reciting specific rules of the road, you might get stuck in an interview trying to articulate how a specific algorithm works.
To help you prepare, we’ve compiled 10 of the most common data scientist interview questions. From early screenings to second- and third-stage video and on-site interviews, you’ll encounter a wide variety of questions like these that examine your technical skills, communication abilities and work style.
1. ‘Tell us more about the most recent project in your portfolio.’
Data scientists are in demand in many different industries, but companies are often looking for someone with very specific skills as well as a good culture fit. A detailed online portfolio displaying the type of work you’re capable of, together with a strong social media presence and personal brand, helps you stand out from other candidates and connects you with hiring managers and recruiters for jobs you’re perfectly suited to.
Be prepared in any data science interview to talk extensively about all elements of your CV, portfolio or website. Tailor your response about a project to suit your audience. If it’s an initial screening or a panel with participants from a variety of departments, your focus should be on the ways your work created positive results for the client and their business.
When you get to the part of the interview process where you’re meeting with another data scientist, engineer, analyst or other technical person, a more detailed description of the data and processes involved in your work is required.
2. ‘Why do you want to work for this company?’
Even if you were contacted directly through your online portfolio or LinkedIn profile and invited to interview for an open position, the company will still want to know why you’ve accepted and why you think you’ll be a good fit for the job.
Aside from brushing up on your technical skills, your preparation for the interview should include research on the business you’re applying to. Information about their industry, mission, staff, exactly what they do and how well they’re doing it will help you craft a specifically tailored response to this question.
Address how your skillset will help them meet their goals. Find a way to express passion about one or more aspects of your job role, including the company’s mission, philosophy, innovation or product line. If this is your dream job, it can be worth the time to put together a data science project ahead of the interview that solves a problem for them – like appealing to a new demographic or scheduling deliveries more efficiently.
3. ‘Name the data scientists you most admire and explain why.’
While this is a very personal question that doesn’t technically have a right answer, the names you choose are very important. Your research on the company, as well as on the members of the interview panel, can help you make a good first impression with this question alone.
Knowing the people who are prominent in the field as well as those currently making waves will show the interviewers that you are both knowledgeable and passionate about the industry. It’s useful to discuss data scientists who are valued in the specific career arena you’re applying for, like finance, medicine or the stock market.
This question is more than just an impressive list of names. The ‘why’ part of the equation will also show your prospective employers what you value in your field and how you’ll approach your work. If your research has shown that the company values innovation, integrity or even a certain statistical method, this is a great opportunity to let them know you share those same values.
4. ‘How would you explain a recommendation engine to someone from the Marketing department?’
One of the important qualities that set data scientists apart from other technical geniuses is the ability to convert, display and explain data in a way that non-technical people can understand. That makes a query like this one of the most important data scientist interview questions you’ll encounter. Interviewers want to see how well you can communicate concepts like data modelling, decision trees and linear regression to any audience.
In this specific case, you’ll want to first explain in simple terms how a recommendation engine works, with examples of both content-based filtering and collaborative filtering. Then you’ll want to discuss how you can work with the marketing department to combine their skills of appealing to customers with the power of the algorithm that uses collected data to help pinpoint what consumers want.
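If the conversation turns more technical, a small example can help. The following is a minimal sketch of user-based collaborative filtering, assuming a tiny, made-up ratings table; the user names, products and the `recommend` helper are all hypothetical. It suggests items that similar customers rated highly. A content-based approach would instead compare the attributes of the products themselves.

```python
from math import sqrt

# Hypothetical ratings (1-5) that customers gave to products; all names are made up.
RATINGS = {
    "ana": {"tent": 5, "stove": 4, "boots": 1},
    "ben": {"tent": 4, "stove": 5, "lantern": 5},
    "chloe": {"boots": 5, "scarf": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two customers' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Ana's tastes overlap most with Ben's, so his lantern ranks first.
    print(recommend("ana", RATINGS))
```

For a marketing audience, the takeaway is simply that customers with similar tastes are used to predict what each of them might like next.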
5. ‘What are the differences between supervised and unsupervised learning?’
You can begin by summarising that the main difference between the two lies in the training data. In supervised learning, the algorithm learns from labelled examples, where each input is paired with a known answer, and then predicts answers for new inputs. In unsupervised learning, there are no labels, so the algorithm groups data by similarities, flags anomalies and looks for other patterns on its own.
The interviewer will want you to go into more detail, so it’s important to list the specific differences and be able to speak about the various algorithms used.
Supervised learning:
- uses known and labelled data as input
- has a feedback mechanism
- used for prediction
- its common algorithms include decision tree, logistic regression, linear regression, support vector machine and random forest
Unsupervised learning:
- uses unlabelled data as input
- has no feedback mechanism
- used for analysis
- its common algorithms include K-means clustering, hierarchical clustering, autoencoders and association rules
You’ll want to have some examples, either generic or from a specific project you’ve worked on, to illustrate the differences between these two types of machine learning and in what instances each might be used. For instance, unsupervised learning may be used when launching a new product for which the demographics of the customers it might appeal to are unknown.
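To make the contrast concrete, here is a minimal sketch using only the Python standard library. It assumes a made-up list of monthly customer spend figures. The supervised part is a simple nearest-centroid classifier (chosen for brevity; it is not one of the algorithms listed above), and the unsupervised part is a bare-bones one-dimensional K-means. The function names and the data are hypothetical.

```python
from statistics import mean


def fit_nearest_centroid(values, labels):
    """Supervised: use the known labels to compute one centroid per class."""
    return {
        label: mean(v for v, lab in zip(values, labels) if lab == label)
        for label in set(labels)
    }


def predict(centroids, value):
    """Predict the label whose centroid is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabelled values into k clusters (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    step = (len(ordered) - 1) / (k - 1)
    centroids = [float(ordered[round(i * step)]) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centroids[i] - v))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters


if __name__ == "__main__":
    spend = [12, 15, 14, 80, 85, 90]  # hypothetical monthly spend
    labels = ["casual", "casual", "casual", "loyal", "loyal", "loyal"]
    model = fit_nearest_centroid(spend, labels)  # needs the labels
    print(predict(model, 70))
    print(kmeans_1d(spend))  # finds the groups without any labels
```

The supervised model can name the group a new customer belongs to because it was told the names in advance. The clustering step only reveals that two groups exist, and interpreting them is left to you.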
6. ‘How do you avoid selection bias?’
This question can take many forms in a data science interview. You may be asked to define selection bias, how to avoid it or to give a specific example of how it played a role in a project you worked on.
The main issue with selection bias is that conclusions are drawn from a non-random sample. The most straightforward solution is to always draw a random sample from a clearly defined population. You’ll need to elaborate on why that isn’t always possible.
Be aware that since selection bias can be intentional – with subject selection or data elimination purposely done to prove a preconceived theory or projection – this could be an indirect way for the hiring panel to ask one of those tough interview questions about ethics and integrity at work.
You’ll ultimately want to stress how selection bias is more often a case of unintentional or unavoidably biased data. Be sure to elaborate on some of the areas where selection bias can occur, including sampling, time interval, data and attrition. Then give some examples of how leveraging techniques like resampling and boosting can help you work around non-random samples.
If you’re in the portion of an interview when you’re speaking with representatives from less technical departments, use an easily digestible example that clearly illustrates selection bias. Data scientist Eric Hollingsworth references a lesson learned from the avian flu outbreak of 2011, where ‘only very sick individuals were counted’ in a statistical sample of ‘confirmed cases’. The resulting 80% reported death rate, so dire due to selection bias, created considerable widespread fear.
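The effect of counting only the sickest individuals can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the population below is entirely invented and does not come from any real outbreak data, and the helper names are hypothetical. It compares the death rate in a sample of severe cases only with the rate in a simple random sample of the whole population.

```python
import random

# Hypothetical population: 200 severe cases (160 fatal) and 9,800 mild cases
# (40 fatal). These figures are invented purely for illustration.
POPULATION = (
    [{"severe": True, "died": True}] * 160
    + [{"severe": True, "died": False}] * 40
    + [{"severe": False, "died": True}] * 40
    + [{"severe": False, "died": False}] * 9760
)


def death_rate(cases):
    """Share of the given cases that were fatal."""
    return sum(case["died"] for case in cases) / len(cases)


def biased_sample(population):
    """Non-random sample: only the severe, easily counted cases."""
    return [case for case in population if case["severe"]]


def random_sample(population, size, seed=0):
    """Simple random sample drawn from the whole defined population."""
    return random.Random(seed).sample(population, size)


if __name__ == "__main__":
    print("whole population:", death_rate(POPULATION))
    print("severe cases only:", death_rate(biased_sample(POPULATION)))
    print("random sample:", death_rate(random_sample(POPULATION, 2000)))
```

In this made-up population the severe-only sample reports a rate forty times higher than the true one, while the random sample lands close to the real figure.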
7. ‘How can outlier values be treated?’
This is a common interview question for data scientists, as it reveals how you use the data you’re given, the methods you use to process that data and whether you’re willing to put in the time to evaluate each piece of that data.
You’ll first want to talk about what constitutes an outlier: for example, a value that sits far outside the main cluster of data on a graph, or one that lies more than 2–3 standard deviations from the mean. The next step in dealing with outliers is evaluating why they happened.
A small number of outliers that can be attributed to simple human or machine error are easily eliminated. Be sure to note, however, that even a single outlier can be a key data point rather than a problem, as it may indicate the success of a single marketing tactic, new drug ingredient or product line.
Next, you’ll want to explain how to deal with a large number of outliers, which requires more complex solutions. For example, you may need to change the model you’re using, normalise the data to the average or use a random forest algorithm. Once again, try to use a real-life case from your experience as a data scientist to explain the correct tactics.
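Before deciding how to treat outliers, you need a way to flag them. The sketch below shows two common rules using Python’s standard `statistics` module: the standard deviation rule mentioned above, and the interquartile range (IQR) rule, which is an addition not covered in this article. The sample values and function names are hypothetical.

```python
from statistics import mean, quantiles, stdev


def zscore_outliers(values, threshold=2.0):
    """Values more than `threshold` sample standard deviations from the mean.

    Assumes the values are not all identical (the spread must be non-zero).
    """
    centre, spread = mean(values), stdev(values)
    return [v for v in values if abs(v - centre) / spread > threshold]


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    daily_orders = [10, 12, 11, 13, 12, 11, 95]  # hypothetical data
    print(zscore_outliers(daily_orders))
    print(iqr_outliers(daily_orders))
```

Flagging is only the first step. A single extreme value inflates both the mean and the standard deviation, so the IQR rule is often the more robust of the two. Whether a flagged value is removed, corrected or kept still depends on why it occurred.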
8. ‘Why is data cleaning important?’
Data collection and cleaning are a dominant part of your job as a data scientist, taking up to 80% of your time. Whatever industry you’re applying to, the interview questions will always include one about why data cleaning is important. Interviewers will also ask about your preferred cleansing techniques and programs.
You should stress how clean data is necessary to draw the correct conclusions, but it’s not just about the numbers. Explain how starting with complete, accurate, valid and uniform data directly impacts their business. Key benefits to discuss include:
- improved decision-making on company objectives
- faster customer acquisition and re-targeting of past customers
- time and resource savings due to eliminating inaccurate or duplicate data
- improved productivity
- boosted team morale thanks to repeated efficient and accurate results
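If you’re asked about technique, a short example can show what complete, valid and uniform data means in practice. This is a minimal sketch assuming a hypothetical list of customer records with `email` and `age` fields. The field names, the validity rules and the `clean_records` helper are illustrative choices, not a standard.

```python
def clean_records(records):
    """Return records with normalised emails, valid ages and no duplicates."""
    cleaned, seen = [], set()
    for record in records:
        # Uniformity: trim whitespace and lower-case the email address.
        email = (record.get("email") or "").strip().lower()
        age = record.get("age")
        # Validity: require an email-like string and a plausible integer age.
        if "@" not in email or not isinstance(age, int) or not 0 < age < 120:
            continue
        # Uniqueness: keep only the first record for each email address.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"email": email, "age": age})
    return cleaned


# Hypothetical raw data with typical problems.
RAW = [
    {"email": " Ana@Example.com ", "age": 34},
    {"email": "ana@example.com", "age": 34},  # duplicate after normalising
    {"email": "ben@example.com", "age": -5},  # invalid age
    {"email": None, "age": 51},  # missing email
    {"email": "chloe@example.com", "age": 28},
]

if __name__ == "__main__":
    print(clean_records(RAW))
```

In a real project you would more likely flag or repair problem records than silently drop them, and you would log how many rows each rule removed.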
9. ‘What is the goal of A/B testing?’
Questions about A/B testing during your interview for a data scientist position may begin with a more generic reference to using experimental design to answer a single query about user behaviour or preferences. The goal of testing a website, app or newsletter design variable is quite simply to evaluate if a change will increase interest, engagement and conversion rates.
One way to set yourself apart in answering these types of interview questions is to discuss how other data scientists might draw the wrong conclusions from A/B testing. Possible pitfalls include:
- not collecting enough data over a long enough period of time
- testing too many variables at once
- not accounting for external factors that can affect traffic during the testing period
- ignoring small gains that can build over time and combine with other positive changes for increased revenue
- missing big picture interpretations like net financial gains or losses relative to conversion rates
Aside from pointing out these problems, you’ll need to express how you would solve them – or, better yet, how you already have avoided them in your previous data science projects.
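One way to show you can avoid the first pitfall is to check whether an observed difference is statistically significant. The sketch below uses a two-proportion z-test, one common choice among several, and relies only on the Python standard library. The visitor and conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical results: the same 4% vs 6% and 4% vs 5.2% style gaps
    # lead to very different conclusions depending on sample size.
    print(two_proportion_z_test(4, 100, 6, 100))  # small test
    print(two_proportion_z_test(200, 5000, 260, 5000))  # large test
```

With only 100 visitors per variant, the gap in the first call is well within what chance alone would produce. The larger test in the second call gives a p-value below the conventional 0.05 threshold.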
10. ‘You have 48 hours to solve this coding challenge.’
The coding challenge may be an initial way to screen potential data scientists, or it may be a second step in the interview process after you’ve cleared the first hurdle with a recruiter or hiring manager. This can be an on-site test that takes 30 minutes to 2 hours, where you’ll be coding on a whiteboard or at a keyboard within view of the interviewer. You’re often given a choice of language, but be prepared to code in SQL or Python.
Some companies assign longer tasks, with deadlines up to a week. Whiteboard challenges may require writing fairly simple SQL queries, but longer tests are, of course, more complex. Typically, you’ll be given data and asked to make specific predictions using that data, and you’ll have to show your work. For example, a recent data scientist interview subject was given Airbnb data and asked to predict house prices based on accommodation features.
The interviewers will want to discuss your choices with you, the assumptions you made, the features you chose, why you used certain algorithms, and more. Often, the answer you arrive at is less important than your process, creativity, code readability and design.
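As a warm-up for this kind of task, it can help to write the simplest possible baseline by hand. The sketch below fits a one-feature linear regression by ordinary least squares. The listing data is invented, the single feature (number of bedrooms) is an assumption, and a real take-home challenge would involve many more features and proper validation.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_price(bedrooms, slope, intercept):
    """Predicted nightly price for a listing with the given number of bedrooms."""
    return intercept + slope * bedrooms


# Hypothetical listings: number of bedrooms and nightly price.
BEDROOMS = [1, 2, 3, 4, 5]
PRICES = [55, 78, 112, 139, 171]

if __name__ == "__main__":
    slope, intercept = fit_line(BEDROOMS, PRICES)
    print(f"each extra bedroom adds about {slope:.1f} to the nightly price")
    print(f"predicted price for 6 bedrooms: {predict_price(6, slope, intercept):.1f}")
```

Being able to explain a baseline like this, and why you moved beyond it, speaks directly to the process and readability the interviewers are looking for.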
This can be a nerve-wracking interview experience, so prepare yourself by creating and completing practice coding challenges with friends or colleagues in the data science field. You can also visit sites like LeetCode and SQLZOO for coding exercises. Actual mock interviews involving algorithmic and systems design problems are available for free through interviewing.io.
As you can see, interview questions for data scientists can be difficult, and the overall process can be lengthy and gruelling. One of the most important interview tips is to stay positive, even if you feel a portion of the interview process went poorly. We’re often harder on ourselves than others are, and you could still land the job despite not getting every answer as perfect as you would have liked.
If you miss out on the opportunity, ask for feedback and use it to improve on your next interview experience. After all, many well-established data scientists were rejected from several positions and still went on to success in the jobs that ultimately were the better fit!
What questions and coding challenges did you encounter when trying to land a data science job? Join the discussion in the comments below and help your fellow data scientists prepare for their next interview!