Designing Twitter

Deepak Pant
Jan 26, 2020

Think of the 5 or so most important features the product has. Just list them:

1. Post a tweet

2. Generate Home timeline

3. Add Following functionality

4. Search functionality

5. User Authentication and creation service

6. Ads

You only have to focus on 1 to 3 of these in an interview setting. Which ones to focus on will come from the interviewer, so ask!

Design the API (REST)

twitter.com/tweet

twitter.com/home

The user ID and authentication token (session_id) go in the request headers.
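To make the two endpoints concrete, here is a minimal sketch of a request dispatcher. The HTTP methods (POST for /tweet, GET for /home), the header names X-User-Id and X-Session-Id, and the Request and handle names are all assumptions for illustration; the notes above only say that the user ID and session_id travel in the headers.

```python
from dataclasses import dataclass, field

# Hypothetical header names; the notes only say these values go in the header.
USER_HEADER = "X-User-Id"
SESSION_HEADER = "X-Session-Id"


@dataclass
class Request:
    method: str
    path: str
    headers: dict
    body: dict = field(default_factory=dict)


def handle(request, sessions, tweets):
    """Dispatch one request. `sessions` maps session_id -> user_id."""
    user_id = request.headers.get(USER_HEADER)
    session_id = request.headers.get(SESSION_HEADER)
    # Reject the request unless the session belongs to the claimed user.
    if user_id is None or sessions.get(session_id) != user_id:
        return 401, {"error": "unauthenticated"}

    if request.method == "POST" and request.path == "/tweet":
        tweets.append({"user_id": user_id, "text": request.body["text"]})
        return 201, {"tweet_index": len(tweets) - 1}

    if request.method == "GET" and request.path == "/home":
        # Placeholder: returns every stored tweet, newest first.
        # Real timeline generation is the topic of the rest of these notes.
        return 200, {"timeline": list(reversed(tweets))}

    return 404, {"error": "not found"}
```

Keeping the identity in headers means the request body carries only the tweet itself, and the same check runs for every endpoint.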

Start with a basic RDBMS-based schema. Most likely it won't scale to 1B+ users.
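A basic schema could look like the sketch below, written against an in-memory SQLite database so it can be run as-is. The table and column names (users, tweets, follows, created_at and so on) are assumptions, not taken from Twitter. The home timeline is a join between follows and tweets, which is the query that becomes expensive at scale.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id     INTEGER PRIMARY KEY,
    handle TEXT NOT NULL UNIQUE
);
CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    text       TEXT NOT NULL,
    created_at INTEGER NOT NULL  -- Unix seconds
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX tweets_by_user_time ON tweets (user_id, created_at);
"""

# Home timeline: latest tweets from everyone the user follows.
HOME_TIMELINE_SQL = """
SELECT t.id, t.user_id, t.text, t.created_at
FROM tweets t
JOIN follows f ON f.followee_id = t.user_id
WHERE f.follower_id = ?
ORDER BY t.created_at DESC, t.id DESC
LIMIT ?
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def home_timeline(conn, user_id, limit=30):
    return conn.execute(HOME_TIMELINE_SQL, (user_id, limit)).fetchall()
```

Every timeline read runs this join, so the cost grows with the number of accounts a user follows; that is the motivation for the precomputed approach.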

1. Precompute the timeline for all active users once a tweet arrives:

Fan out and build the timelines as soon as the tweet is posted. Also write the tweet to a DB for persistence.

Keep the home timelines in an in-memory store such as Redis, as Twitter does. The key is to precompute the home timeline for active users. Any tweet posted by user X updates the timeline (stored as a Redis list) of every user who follows X. To save space, you can keep only the last 3 days (or the last 30 tweets) of each timeline in the Redis cluster.
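The fanout-on-write idea can be sketched without a Redis server by using a bounded deque per user as a stand-in for a Redis list. The class and function names here (TimelineCache, fan_out) and the followers_of mapping are hypothetical. In real Redis the equivalent of push would be roughly an LPUSH followed by an LTRIM to cap the list length.

```python
from collections import defaultdict, deque

MAX_TIMELINE_LEN = 30  # "last 30 tweets" from the notes


class TimelineCache:
    """In-memory stand-in for per-user Redis lists, newest tweet first."""

    def __init__(self, max_len=MAX_TIMELINE_LEN):
        # A deque with maxlen drops the oldest entry when a new one is
        # pushed on the left, similar to LPUSH + LTRIM 0 max_len-1.
        self._timelines = defaultdict(lambda: deque(maxlen=max_len))

    def push(self, user_id, tweet_id):
        self._timelines[user_id].appendleft(tweet_id)

    def read(self, user_id):
        return list(self._timelines.get(user_id, ()))


def fan_out(cache, followers_of, author_id, tweet_id):
    """Push a new tweet onto the timeline of every follower of the author."""
    for follower_id in followers_of.get(author_id, ()):
        cache.push(follower_id, tweet_id)
```

Reading a home timeline is then a single list read per user, at the cost of one write per follower for every tweet.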

One Redis node will not be able to hold the timelines of all users, so you need to shard.
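One simple way to pick a shard for a user's timeline is to hash the user ID and take it modulo the number of shards. This is only a sketch under that assumption; the notes do not say which scheme Twitter uses, and the function name shard_for is hypothetical.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user ID to a shard index in [0, num_shards)."""
    # Use a stable hash (not Python's built-in hash(), which is salted
    # per process) so every server agrees on the mapping.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With plain modulo hashing, changing the number of shards remaps most keys; consistent hashing is a common alternative when nodes are added or removed often.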

What if the fanout is too high (think Justin Bieber)?

There are multiple ways to handle this. One is to process the fanout slowly, in batches (10K followers every 2 minutes).

Another is to update only the timelines of users who were active in the last day.
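Both mitigations for a high fanout can be sketched together: filter the follower list down to recently active users, then push in fixed-size batches with a pause between them. The function names, the last_active_at mapping, and the push and wait callbacks are hypothetical; the batch size and the one-day window come from the notes above.

```python
from itertools import islice

BATCH_SIZE = 10_000                   # "10K each" from the notes
ACTIVE_WINDOW_SECONDS = 24 * 60 * 60  # "active in the last day"


def active_followers(followers, last_active_at, now):
    """Keep only followers seen within the active window.

    Followers with no recorded activity are treated as inactive.
    """
    return [
        f for f in followers
        if now - last_active_at.get(f, float("-inf")) <= ACTIVE_WINDOW_SECONDS
    ]


def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def slow_fan_out(push, followers, tweet_id, size=BATCH_SIZE, wait=lambda: None):
    """Push the tweet batch by batch, calling `wait` after each batch."""
    for chunk in batches(followers, size):
        for follower_id in chunk:
            push(follower_id, tweet_id)
        wait()  # e.g. sleep for 2 minutes in a background worker
```

Inactive users can have their timelines rebuilt on demand the next time they log in, so skipping them only delays work that may never be needed.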

2. Build the timeline on the fly.

If we decide to generate the timeline on the fly, we need one piece of information very quickly: the last 30 tweets from a user X. If a user U (who follows X1 … Xn) views their timeline, we have to call LatestTweet(X1) … LatestTweet(Xn) and merge the results. Notice that this creates the requirement of querying the latest tweets by user ID. How do we get that? We need some sort of index on creation time.
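The merge step can be sketched as a k-way merge over per-user lists that are already sorted newest first. Here latest_tweets plays the role of LatestTweet(X), and the tweets_by_user dictionary is a hypothetical stand-in for whatever store serves that query.

```python
import heapq
from itertools import islice


def latest_tweets(tweets_by_user, user_id, limit=30):
    """Return up to `limit` (created_at, tweet_id) pairs, newest first.

    Assumes tweets_by_user[user_id] is already sorted newest first.
    """
    return tweets_by_user.get(user_id, [])[:limit]


def merged_timeline(tweets_by_user, followees, limit=30):
    """Merge the latest tweets of all followees into one timeline."""
    per_user = (latest_tweets(tweets_by_user, x, limit) for x in followees)
    # heapq.merge with reverse=True expects each input sorted from
    # largest to smallest and yields one merged descending stream.
    merged = heapq.merge(*per_user, reverse=True)
    return list(islice(merged, limit))
```

The merge itself is cheap; the expensive part is the n calls to fetch each followee's latest tweets, which is why that lookup needs to be fast.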

What to shard on:

User ID? Storing all of a user's tweets on the same shard may not be a good idea. Some users are very popular, and their shard becomes a hot spot because we have to fetch their tweets to update many followers' timelines.

tweetId?

Now the same person's tweets are spread across many different servers. This increases latency, because to generate your own timeline you may have to hit 20 different servers!

timestamp [32 bits] + tweetId [auto-increment counter]

This makes range queries for the latest tweets faster, since we only have to index the combined timestamp_tweetid key and run a range query on it. We still have to query many servers to generate the timeline, but we don't need a secondary index on the creation time, which speeds up writes.
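As a rough illustration of the combined key, the sketch below packs the timestamp into the high bits and the counter into the low bits, so that sorting by ID also sorts by time. The 32-bit counter width and the function names are assumptions; the notes only fix the timestamp at 32 bits. A sorted list with bisect stands in for the database index.

```python
import bisect

COUNTER_BITS = 32  # assumption: the notes only specify 32 bits for the timestamp
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_id(timestamp, counter):
    """Pack the timestamp into the high bits and the counter into the low bits."""
    return (timestamp << COUNTER_BITS) | (counter & COUNTER_MASK)


def timestamp_of(composite_id):
    return composite_id >> COUNTER_BITS


def ids_since(sorted_ids, since_timestamp):
    """Range query: all IDs created at or after `since_timestamp`.

    `sorted_ids` must be sorted ascending, as a primary-key index would be.
    """
    start = bisect.bisect_left(sorted_ids, make_id(since_timestamp, 0))
    return sorted_ids[start:]
```

Because the primary key is already time-ordered, "latest tweets" becomes a range scan on that key and no separate index on creation time has to be maintained on every write.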

You also need to replicate the timelines at each shard. This is for resiliency: if any Redis node fails, we still want to keep serving timelines at the same speed.

Precomputed timeline generation architecture using Redis
