In this post, we are going to design a news feed system.
What is a news feed? According to the Facebook help page, "News feed is the constantly updating list of stories in the middle of your home page. News Feed includes status updates, photos, videos, links, app activity, and likes from people, pages, and groups you follow on Facebook".
Building a scalable system that handles the feed for a large number of users is not an easy task. It requires a deep understanding of caching and databases.
Before jumping into the design, we must finalize the requirements and establish the design scope:
Question: Is this a mobile app? Or a web app? Or both?
Answer: Both
Question: What are the important features?
Answer: A user can publish a post and see her friends’ posts on the news feed page.
Question: Is the news feed sorted by reverse chronological order or any particular order such as topic scores? For instance, posts from your close friends have higher scores.
Answer: To keep things simple, let us assume the feed is sorted by reverse chronological order.
Question: How many friends can a user have?
Answer: 5000
Question: What is the traffic volume?
Answer: 10 million DAU
Question: Can the feed contain images, videos, or just text?
Answer: It can contain media files, including both images and videos.
Now that we have gathered the requirements, we can focus on designing the system.
The design is divided into two flows: feed publishing and news feed building.
Feed publishing: when a user publishes a post, the corresponding data is written to the cache and database, and the post is propagated to her friends’ news feeds.
News feed building: for simplicity, let us assume the news feed is built by aggregating friends’ posts in reverse chronological order.
The news feed APIs are the primary way for clients to communicate with servers. These APIs are HTTP-based and allow clients to perform actions such as posting a status, retrieving the news feed, and adding friends. We discuss the two most important APIs: the feed publishing API and the news feed retrieval API.
To publish a post, an HTTP POST request will be sent to the server. The API is shown below:
POST /v1/me/feed
Params:
- content: content is the text of the post.
- auth_token: it is used to authenticate API requests.
The API to retrieve the news feed is shown below:
GET /v1/me/feed
Params:
- auth_token: it is used to authenticate API requests.
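As a rough illustration, the two endpoints can be expressed as small request builders. This is a minimal Python sketch, not an actual client: the host `api.example.com` and the function names are hypothetical, and passing the parameters in the query string is an assumption.

```python
from typing import Tuple
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host, not from the post


def publish_request(content: str, auth_token: str) -> Tuple[str, str]:
    """Return the (method, URL) pair for the feed publishing API."""
    query = urlencode({"content": content, "auth_token": auth_token})
    return "POST", f"{BASE_URL}/v1/me/feed?{query}"


def retrieve_request(auth_token: str) -> Tuple[str, str]:
    """Return the (method, URL) pair for the news feed retrieval API."""
    query = urlencode({"auth_token": auth_token})
    return "GET", f"{BASE_URL}/v1/me/feed?{query}"
```

Both calls share the same path; only the HTTP method and the parameters differ.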
The following figure shows the high-level design of the feed publishing flow.
- User: a user can view news feeds on a browser or mobile app. A user makes a post with the content "Hello" through the feed publishing API:
/v1/me/feed?content=Hello&auth_token={auth_token}
- Load balancer: Distributes traffic to web servers.
- Web servers: Web servers redirect traffic to different internal services.
- Post service: Persist post in the database and cache.
- Fanout service: Push new content to friends’ news feeds. Newsfeed data is stored in the cache for fast retrieval.
- Notification service: Inform friends that new content is available and send out push notifications.
In this section, we discuss how news feed is built behind the scenes. The following figure shows the high-level design:
- User: a user sends a request to retrieve her news feed. The request looks like this: /v1/me/feed.
- Load balancer: load balancer redirects traffic to web servers.
- Web servers: web servers route requests to the newsfeed service.
- Newsfeed service: news feed service fetches news feed from the cache.
- Newsfeed cache: store news feed IDs needed to render the news feed.
The high-level design briefly covered two flows: feed publishing and news feed building. Here, we discuss those topics in more depth.
The following figure outlines the detailed design for feed publishing. We have discussed most of the components in the high-level design, and we will focus on two components: web servers and fanout service.
Besides communicating with clients, web servers enforce authentication and rate limiting. Only users signed in with a valid auth_token are allowed to make posts. The system also limits the number of posts a user can make within a certain period, which is vital to prevent spam and abusive content.
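One way to picture the post limit is a per-user sliding window. The sketch below is only an assumption about how such a limiter could work; the class name, the parameters, and the in-memory storage are hypothetical, and a production system would more likely keep this state in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PostRateLimiter:
    """Sliding window: at most `max_posts` per `window_seconds` per user."""

    def __init__(self, max_posts: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._timestamps: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record the attempt and return True if the post is allowed."""
        now = self.clock()
        stamps = self._timestamps[user_id]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_posts:
            return False
        stamps.append(now)
        return True
```

A web server would call `allow(user_id)` after validating the auth_token and reject the post when it returns False.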
Fanout is the process of delivering a post to all friends. There are two fanout models: fanout on write (also called the push model) and fanout on read (also called the pull model). Both models have pros and cons. We explain their workflows and explore the best approach to support our system.
Fanout on write. With this approach, the news feed is pre-computed during write time. A new post is delivered to friends’ cache immediately after it is published.
Pros:
- The news feed is generated in real-time and can be pushed to friends immediately.
- Fetching news feed is fast because the news feed is pre-computed during write time.
Cons:
- If a user has many friends, fetching the friend list and generating news feeds for all of them is slow and time-consuming. This is called the hotkey problem.
- For inactive users or those who rarely log in, pre-computing news feeds wastes computing resources.
Fanout on read: The news feed is generated during read time. This is an on-demand model. Recent posts are pulled when a user loads her home page.
Pros:
- For inactive users or those who rarely log in, fanout on read works better because it will not waste computing resources on them.
- Data is not pushed to friends so there is no hotkey problem.
Cons:
- Fetching the news feed is slow as the news feed is not pre-computed.
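To make the pull model concrete, here is a minimal sketch of building a feed at read time. It assumes each user's own posts are already stored newest first as `(timestamp, post_id)` pairs; the function name and data layout are hypothetical.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Post = Tuple[int, str]  # (timestamp, post_id)


def build_feed_on_read(friend_ids: List[str],
                       posts_by_user: Dict[str, List[Post]],
                       limit: int) -> List[str]:
    """Merge friends' timelines into one reverse chronological feed."""
    timelines = (posts_by_user.get(friend_id, []) for friend_id in friend_ids)
    # Each timeline is sorted newest first, so a k-way merge is enough.
    merged = heapq.merge(*timelines, key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

The work grows with the number of friends on every read, which is why this model is slow compared with a pre-computed feed.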
We adopt a hybrid approach to get the benefits of both approaches and avoid their pitfalls. Since fetching the news feed quickly is crucial, we use the push model for the majority of users. For celebrities or users who have many friends/followers, we let followers pull news content on demand to avoid system overload. Consistent hashing is a useful technique to mitigate the hotkey problem, as it helps distribute requests and data more evenly.
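For readers unfamiliar with consistent hashing, the following is a minimal sketch of a hash ring with virtual nodes. The class name, the node names, and the choice of SHA-256 are assumptions made for illustration; the post does not specify an implementation.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes: List[str], virtual_nodes: int = 100) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            # Several points per node spread the load more evenly.
            for i in range(virtual_nodes):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [owner for _, owner in ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key, wrapping around."""
        index = bisect.bisect_right(self._points, _hash(key))
        return self._owners[index % len(self._points)]
```

When a node is added, only the keys that now fall on the new node's points move; all other keys stay on the node they had before.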
Let us take a close look at the fanout service as shown in the following figure.
The fanout service works as follows:
1. Fetch friend IDs from the graph database. Graph databases are suited for managing friend relationships and friend recommendations.
2. Get friends info from the user cache. The system then filters out friends based on user settings. For example, if you mute someone, her posts will not show up on your news feed even though you are still friends. Another reason why posts may not show is that a user could selectively share information with specific friends or hide it from other people.
3. Send friends list and new post ID to the message queue.
4. Fanout workers fetch data from the message queue and store news feed data in the news feed cache. You can think of the news feed cache as a <post_id, user_id> mapping table. Whenever a new post is made, it will be appended to the news feed table as shown in the following figure. The memory consumption can become very large if we store the entire user and post objects in the cache. Thus, only IDs are stored.
5. To keep the memory size small, we set a configurable limit. The chance of a user scrolling through thousands of posts in news feed is slim. Most users are only interested in the latest content, so the cache miss rate is low.
6. Store <post_id, user_id> in news feed cache. The above figure shows an example of what the news feed looks like in cache.
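Steps 4 to 6 can be sketched as follows. This is an illustrative in-memory stand-in for the news feed cache, not the real one: the class and function names are hypothetical, and it assumes the configurable limit is a cap on the number of post IDs kept per user.

```python
from collections import defaultdict, deque
from itertools import islice
from typing import Deque, Dict, Iterable, List


class NewsFeedCache:
    """Keeps only post IDs per user, newest first, up to a fixed limit."""

    def __init__(self, max_items_per_user: int) -> None:
        self._feeds: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_items_per_user))

    def push(self, user_id: str, post_id: str) -> None:
        # appendleft on a full deque silently drops the oldest ID.
        self._feeds[user_id].appendleft(post_id)

    def latest(self, user_id: str, count: int) -> List[str]:
        feed = self._feeds.get(user_id)
        return list(islice(feed, count)) if feed else []


def fanout_worker(cache: NewsFeedCache, post_id: str,
                  friend_ids: Iterable[str]) -> None:
    """Handle one queue message: deliver a post ID to each friend's feed."""
    for friend_id in friend_ids:
        cache.push(friend_id, post_id)
```

Because only IDs are stored and older entries are dropped, memory per user stays bounded regardless of how many posts friends publish.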
The following figure illustrates the detailed design for news feed retrieval.
As shown in the above figure, media content (images, videos, etc.) is stored in a CDN for fast retrieval. Let us look at how a client retrieves the news feed:
1. A user sends a request to retrieve her news feed. The request looks like this: /v1/me/feed
2. The load balancer distributes requests to web servers.
3. Web servers call the news feed service to fetch news feeds.
4. The news feed service gets a list of post IDs from the news feed cache.
5. A user’s news feed is more than just a list of feed IDs. It contains username, profile picture, post content, post image, etc. Thus, the news feed service fetches the complete user and post objects from caches (user cache and post cache) to construct the fully hydrated news feed.
6. The fully hydrated news feed is returned in JSON format back to the client for rendering.
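Steps 4 to 6 amount to a lookup and join across caches. The sketch below uses plain dictionaries in place of the post cache and user cache; the field names such as `author_id` and `image_url` are hypothetical and chosen only to illustrate hydration.

```python
import json
from typing import Dict, List


def hydrate_feed(post_ids: List[str],
                 post_cache: Dict[str, dict],
                 user_cache: Dict[str, dict]) -> List[dict]:
    """Turn a list of post IDs into fully hydrated feed items."""
    feed = []
    for post_id in post_ids:
        post = post_cache.get(post_id)
        if post is None:
            # Cache miss: a real service would fall back to the database.
            continue
        author = user_cache.get(post["author_id"], {})
        feed.append({
            "post_id": post_id,
            "content": post["content"],
            "image_url": post.get("image_url"),
            "username": author.get("username"),
            "profile_picture": author.get("profile_picture"),
        })
    return feed


post_cache = {"p2": {"author_id": "u1", "content": "Hello", "image_url": None}}
user_cache = {"u1": {"username": "alice", "profile_picture": "alice.png"}}
response_body = json.dumps(hydrate_feed(["p2", "p9"], post_cache, user_cache))
```

Here `p9` is missing from the post cache, so it is skipped and the JSON response contains a single hydrated item.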
Caching is extremely important for a news feed system. We divide the cache tier into five layers, as shown in the following figure.
News Feed: It stores IDs of news feeds.
Content: It stores the data for every post. Popular content is stored in a hot cache.
Social Graph: It stores user relationship data.
Action: It stores info about whether a user liked a post, replied to a post, or took other actions on a post.
Counters: It stores counters for like, reply, follower, following, etc.
In this post, we designed a news feed system. Our design contains two flows: feed publishing and news feed retrieval.
As with any system design topic, there is no perfect way to design a system. Every company has its unique constraints, and you must design a system to fit those constraints. Understanding the tradeoffs of your design and technology choices is important.
If there are a few minutes left, you can talk about scalability issues. To avoid duplicated discussion, only high-level talking points are listed below.
Scaling the database:
- Vertical scaling vs Horizontal scaling
- SQL vs NoSQL
- Master-slave replication
- Read replicas
- Consistency models
- Database sharding
Other talking points:
- Keep web tier stateless
- Cache data as much as you can
- Support multiple data centers
- Loosely couple components with message queues
- Monitor key metrics. For instance, QPS during peak hours and latency while users refresh their news feed are interesting to monitor.
That's all for this post.
Thanks for reading. Hope you have a nice day!