How can you design a large scale distributed system during an interview?
Over the last 2 years, I’ve taken hundreds of system design interviews and helped engineers prepare for theirs. Based on that experience, I’ve devised a set of steps for approaching a system design interview problem. In this post, I’ll focus on the following topics:
EDIT: Also look at Top 10 System Design Interview Questions for Software Engineers.
I previously wrote a couple of blog posts listing the common mistakes in programming interviews: how not to design Netflix in your 45-minute system design interview and how not to succeed in your 45-minute coding interview.
I got a lot of feedback (and emails) on my earlier posts. The most common question was how an interviewee should approach system design interviews. There are so many concepts, directions, components, pros and cons that one cannot describe all of them in 4 hours, let alone in 45 minutes.
Unlike whiteboard coding interviews, there are very few “Aha” moments in system design interviews.
In other words, System Design interviews are less about getting lucky and more about actually doing the hard work of attaining knowledge. At the end, your performance in these interviews depends on the following 2 factors.
When companies ask design questions, they want to evaluate your design skills and experience in designing large scale distributed systems. How well you do in such interviews often dictates your hiring level (and in some cases even salary). Hence, it’s in your best interest to have a plan and prepare for these interviews.
If you are looking for resources to prepare for system design and programming interviews, take a look at:
As you are studying, here’s a 7-step framework that I recommend for approaching each problem. To keep the examples concrete, we will pick a common interview question, "Design a scalable service like Twitter", and see how each step can be applied to it.
Many candidates think that system design interviews are all about “scale”, forgetting to put required emphasis on the “system” part of the interview.
You need to have a working “system” before you can scale it.
As the first step in your interview, you should ask questions to find the exact scope of the problem. Design questions are mostly open-ended, and they don’t have ONE correct answer. That’s why clarifying ambiguities early in the interview becomes critical. Candidates who spend time clearly defining the end goals of the system always have a better chance of success.
Here are some questions for designing Twitter that should be answered before moving on to next steps:
You may notice that some of these answers don’t exactly match the real Twitter, and that’s ok. It’s a hypothetical problem geared towards evaluating your approach. You are asking these questions to scope the problem that you are going to solve today. For example, you now don’t have to worry about handling videos or generating a timeline using algorithms.
If you have gathered the requirements and can identify the APIs exposed by the system, you are 50% done.
Define what APIs are expected from the system. This not only establishes the exact contract expected from the system but also helps verify that you haven’t gotten any requirements wrong. Some examples for our Twitter-like service would be:
postTweet(user_id, tweet_text, image_url, user_location, timestamp, …)
generateTimeline(user_id, current_time)
recordUserTweetLike(user_id, tweet_id, timestamp, …)
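To make the contract concrete, here is a minimal in-memory sketch of those three calls in Python. It is an illustration only: the class name `TwitterLikeService`, the `follow` helper, the `limit` parameter, the return types, and the reverse-chronological timeline are all assumptions made for this example, not part of the API list above.

```python
from typing import Dict, List, Optional, Set


class TwitterLikeService:
    """Toy in-memory stand-in for the three API calls named in the text."""

    def __init__(self) -> None:
        self._tweets: Dict[int, dict] = {}       # tweet_id -> tweet record
        self._likes: Dict[int, Set[int]] = {}    # tweet_id -> ids of users who liked it
        self._follows: Dict[int, Set[int]] = {}  # user_id -> ids of users they follow
        self._next_tweet_id = 1

    def follow(self, follower_id: int, followee_id: int) -> None:
        """Hypothetical helper (not in the API list) so timelines have data."""
        self._follows.setdefault(follower_id, set()).add(followee_id)

    def post_tweet(self, user_id: int, tweet_text: str, timestamp: float,
                   image_url: Optional[str] = None,
                   user_location: Optional[str] = None) -> int:
        """Store a tweet and return its newly assigned id."""
        if not tweet_text:
            raise ValueError("tweet_text must not be empty")
        tweet_id = self._next_tweet_id
        self._next_tweet_id += 1
        self._tweets[tweet_id] = {
            "tweet_id": tweet_id,
            "user_id": user_id,
            "tweet_text": tweet_text,
            "image_url": image_url,
            "user_location": user_location,
            "timestamp": timestamp,
        }
        return tweet_id

    def generate_timeline(self, user_id: int, current_time: float,
                          limit: int = 20) -> List[dict]:
        """Return tweets from followed users, newest first (assumed ordering)."""
        followed = self._follows.get(user_id, set())
        visible = [
            tweet for tweet in self._tweets.values()
            if tweet["user_id"] in followed and tweet["timestamp"] <= current_time
        ]
        visible.sort(key=lambda tweet: tweet["timestamp"], reverse=True)
        return visible[:limit]

    def record_user_tweet_like(self, user_id: int, tweet_id: int,
                               timestamp: float) -> int:
        """Record a like (idempotent per user) and return the like count.

        The timestamp is accepted to match the contract but is not stored here.
        """
        if tweet_id not in self._tweets:
            raise KeyError(tweet_id)
        likers = self._likes.setdefault(tweet_id, set())
        likers.add(user_id)
        return len(likers)
```

Writing even a toy version like this tends to surface requirement questions early, for example whether a repeated like should count twice or whether the timeline includes the user’s own tweets.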
It’s always a good idea to estimate the scale of the system you’re going to design. This would also help later when you’ll be focusing on scaling, partitioning, load balancing and caching.
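A back-of-the-envelope estimate can be sketched in a few lines of Python. Every input below (user count, tweets per user, timeline reads per user, average tweet size) is a placeholder chosen purely for illustration; in an interview you would get these numbers from the interviewer or state your own assumptions out loud.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_scale(daily_active_users: int,
                   tweets_per_user_per_day: float,
                   timeline_reads_per_user_per_day: float,
                   avg_tweet_bytes: int) -> dict:
    """Derive rough traffic and storage figures from a handful of assumptions."""
    writes_per_day = daily_active_users * tweets_per_user_per_day
    reads_per_day = daily_active_users * timeline_reads_per_user_per_day
    return {
        "writes_per_day": writes_per_day,
        "reads_per_day": reads_per_day,
        "writes_per_second": writes_per_day / SECONDS_PER_DAY,
        "reads_per_second": reads_per_day / SECONDS_PER_DAY,
        "read_write_ratio": reads_per_day / writes_per_day,
        "storage_bytes_per_day": writes_per_day * avg_tweet_bytes,
    }


# Placeholder inputs, NOT real Twitter numbers.
estimate = estimate_scale(
    daily_active_users=1_000_000,
    tweets_per_user_per_day=2,
    timeline_reads_per_user_per_day=20,
    avg_tweet_bytes=300,
)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

The read/write ratio and the daily storage growth are usually the two figures that drive later decisions about caching and partitioning.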
Defining the data model early will clarify how data will flow among different components of the system. Later, it will guide you towards better data partitioning and management. Candidates should be able to identify the various entities of the system, how they will interact with each other, and the different aspects of data management such as storage, transfer, and encryption. Here are some entities for our Twitter-like service:
User: UserID, Name, Email, DoB, CreationDate, LastLogin, etc.
Tweet: TweetID, Content, TweetLocation, NumberOfLikes, TimeStamp, etc.
UserFollows: UserID1, UserID2
FavoriteTweets: UserID, TweetID, TimeStamp
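These entities could be sketched as Python dataclasses as follows. The field types, the `user_id` author field on `Tweet`, the reading of `UserFollows` as "user 1 follows user 2", and the `followee_ids` helper are assumptions added for this example; the entity list itself does not specify them.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class User:
    user_id: int
    name: str
    email: str
    dob: Optional[date] = None
    creation_date: Optional[datetime] = None
    last_login: Optional[datetime] = None


@dataclass
class Tweet:
    tweet_id: int
    user_id: int  # assumption: the author's id, needed to link a tweet to a user
    content: str
    timestamp: datetime
    tweet_location: Optional[str] = None
    number_of_likes: int = 0


@dataclass(frozen=True)
class UserFollows:
    user_id1: int  # assumption: the follower
    user_id2: int  # assumption: the user being followed


@dataclass(frozen=True)
class FavoriteTweet:
    user_id: int
    tweet_id: int
    timestamp: datetime


def followee_ids(follows: List[UserFollows], user_id: int) -> List[int]:
    """Hypothetical helper: ids of the users that `user_id` follows."""
    return [edge.user_id2 for edge in follows if edge.user_id1 == user_id]
```

Keeping the follow and favorite relationships as separate records, rather than lists embedded in `User` or `Tweet`, is one possible choice; it makes the later partitioning discussion easier because each relationship can be stored and sharded independently.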
Which database system should we use? Would a NoSQL store like Cassandra best fit our needs, or should we use a MySQL-like solution? What kind of blob storage should we use to store photos and videos?
Draw a block diagram with 5–6 boxes representing the core components of your system. You should identify enough components to solve the actual problem end to end.
For Twitter, at a high level, we would need multiple application servers to serve all the read/write requests, with load balancers in front of them to distribute traffic. If we assume that we’ll have a lot more read traffic than write traffic, we can decide to have separate servers for handling reads vs. writes. On the backend, we need an efficient database that can store all the tweets and support a huge number of reads. We would also need a distributed file storage system for storing photos (and videos), and a search index and infrastructure to enable searching of tweets.
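As a rough illustration of the read/write split behind a load balancer, here is a small Python sketch. The round-robin policy, the server names, and the rule that GET-style requests count as reads are assumptions chosen for this example; a real design might use a different balancing strategy.

```python
import itertools
from typing import Iterable

# Assumption: HTTP-style methods that do not modify data are treated as reads.
READ_METHODS = {"GET", "HEAD"}


class RoundRobinBalancer:
    """Routes reads and writes to separate server pools, round-robin in each."""

    def __init__(self, read_servers: Iterable[str],
                 write_servers: Iterable[str]) -> None:
        read_servers = list(read_servers)
        write_servers = list(write_servers)
        if not read_servers or not write_servers:
            raise ValueError("each pool needs at least one server")
        # itertools.cycle repeats each pool endlessly in order.
        self._read_cycle = itertools.cycle(read_servers)
        self._write_cycle = itertools.cycle(write_servers)

    def route(self, method: str) -> str:
        """Return the name of the server that should handle this request."""
        if method.upper() in READ_METHODS:
            return next(self._read_cycle)
        return next(self._write_cycle)


# Hypothetical pools: more read servers than write servers for a read-heavy load.
balancer = RoundRobinBalancer(
    read_servers=["read-1", "read-2", "read-3"],
    write_servers=["write-1"],
)
for method in ["GET", "GET", "POST", "GET", "GET"]:
    print(method, "->", balancer.route(method))
```

Sizing the two pools independently is the point of the split: with read-heavy traffic, you can add read servers without touching the write path.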
Dig deeper into 2–3 components; the interviewer’s feedback should always guide you towards the parts of the system she wants you to explain further. You should be able to present different approaches, their pros and cons, and why you would choose one over the others. Remember that there is no single answer; the only important thing is to consider the tradeoffs between different options while keeping system constraints in mind. e.g.
Try to discuss as many bottlenecks as possible and different approaches to mitigate them.
In short, due to the unstructured nature of software design interviews, candidates who are organized with a clear plan to attack the problem have better chances of success.
Once again, if you are looking for resources to prepare for system design and programming interviews, take a look at:
Happy interviewing!
Fahim is the co-founder of Educative. We are building the next generation interactive learning platform for software engineers and instructors. Learners learn by going through interactive courses. Instructors can quickly create and publish interactive courses using our course builder. If you are interested in publishing courses or knowing more, feel free to reach out.