Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
An event records that "something happened" in the world or your business. It is also called a record or message in the documentation.
When you read or write data to Kafka, you do this in the form of events. Conceptually, an event in Kafka has a key, a value, a timestamp, and optional metadata headers.
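To make the shape of an event concrete, here is a minimal sketch in plain Python. The `Event` class and `make_event` helper are hypothetical names used only for illustration and are not part of any Kafka client library; the sample key, value, and header are made up. The sketch assumes keys and values travel as bytes and the timestamp is stored as milliseconds since the Unix epoch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Illustrative model of an event; not a class from a Kafka client."""
    key: Optional[bytes]          # events may be published without a key
    value: bytes
    timestamp_ms: int             # milliseconds since the Unix epoch
    headers: Tuple[Tuple[str, bytes], ...] = ()  # optional metadata


def make_event(key: str, value: str, when: datetime, headers=()) -> Event:
    """Build an Event from text, encoding key and value as UTF-8 bytes."""
    return Event(
        key=key.encode("utf-8"),
        value=value.encode("utf-8"),
        timestamp_ms=int(when.timestamp() * 1000),
        headers=tuple(headers),
    )


# Hypothetical sample data, chosen only for illustration.
event = make_event(
    "Alice",
    "Made a payment of $200 to Bob",
    when=datetime(2020, 6, 25, 14, 6, tzinfo=timezone.utc),
    headers=[("source", b"web-checkout")],
)
```

In a real application the client library supplies its own record type; the point here is only that every event carries these four parts.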
Producers are those client applications that publish (write) events to Kafka, and consumers are those that subscribe to (read and process) these events.
In Kafka, producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieving the high scalability that Kafka is known for.
For example, producers never need to wait for consumers. Kafka provides various guarantees such as the ability to process events exactly once.
Events are organized and durably stored in topics. Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder.
An example topic name could be "payments". Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events.
Events in a topic can be read as often as needed—unlike traditional messaging systems, events are not deleted after consumption. Instead, you define for how long Kafka should retain your events through a per-topic configuration setting, after which old events will be discarded.
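The difference between "consumed" and "deleted" can be sketched with a toy in-memory log. This is a teaching model under simplifying assumptions (a single partition, event-by-event expiry, timestamps that only increase), not how a broker is implemented; `TopicLog` and its methods are hypothetical names. In Kafka itself, time-based retention is controlled by the per-topic `retention.ms` setting.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy single-partition log with time-based retention."""
    retention_ms: int
    base_offset: int = 0                        # offset of the oldest retained event
    events: list = field(default_factory=list)  # (timestamp_ms, value) pairs

    def append(self, timestamp_ms: int, value: str) -> int:
        """Append an event and return the offset it was stored at."""
        self.events.append((timestamp_ms, value))
        return self.base_offset + len(self.events) - 1

    def read(self, offset: int):
        """Return (next_offset, values) starting at `offset`.

        Reading never removes anything, so any number of consumers
        can read the same events, each tracking its own offset.
        """
        start = max(offset, self.base_offset) - self.base_offset
        values = [value for _, value in self.events[start:]]
        return self.base_offset + len(self.events), values

    def enforce_retention(self, now_ms: int) -> None:
        """Discard events older than retention_ms, consumed or not."""
        while self.events and now_ms - self.events[0][0] > self.retention_ms:
            self.events.pop(0)
            self.base_offset += 1


log = TopicLog(retention_ms=1000)
log.append(0, "a")
log.append(500, "b")
log.append(1200, "c")

# Two independent consumers both start at offset 0 and see everything.
next_a, seen_a = log.read(0)
next_b, seen_b = log.read(0)

# Only the retention policy removes data: "a" is now older than 1000 ms.
log.enforce_retention(now_ms=1300)
_, remaining = log.read(0)
```

After the retention pass, a consumer that asks for offset 0 simply starts from the oldest event that is still retained.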
Kafka's performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers. This distributed placement of your data is very important for scalability because it allows client applications to both read and write the data from/to many brokers at the same time.
When a new event is published to a topic, it is actually appended to one of the topic's partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition, and Kafka guarantees that any consumer of a given topic partition will always read that partition's events in exactly the same order as they were written.
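Key-based partitioning can be sketched as "hash the key, then take it modulo the number of partitions". The sketch below uses CRC32 from the Python standard library purely as a stand-in hash; Kafka's Java client uses a murmur2 hash for keyed events, so the partition numbers here will not match a real cluster. The function names and sample keys are hypothetical.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition; the same key always maps to the same one."""
    # CRC32 is a stand-in for the hash a real client would use.
    return zlib.crc32(key) % num_partitions


def publish(partitions: list, key: str, value: str) -> int:
    """Append (key, value) to the partition chosen for the key."""
    index = choose_partition(key.encode("utf-8"), len(partitions))
    partitions[index].append((key, value))
    return index


partitions = [[] for _ in range(4)]  # a topic with four partitions
incoming = [
    ("alice", "order-1"),
    ("bob", "order-2"),
    ("alice", "order-3"),
    ("alice", "order-4"),
]
for key, value in incoming:
    publish(partitions, key, value)

# All of alice's events sit in one partition, in the order they were written.
alice_partition = choose_partition(b"alice", len(partitions))
alice_events = [v for k, v in partitions[alice_partition] if k == "alice"]
```

Because ordering is guaranteed only within a partition, choosing the key is effectively choosing which events must stay in order relative to each other.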
Consider an example topic with four partitions, P1–P4. Two different producer clients publish new events to the topic independently of each other by writing events over the network to the topic's partitions. Events with the same key (shown in the same color in the figure) are written to the same partition.
To make your data fault-tolerant and highly available, every topic can be replicated, even across geo-regions or data centers. That way, there are always multiple brokers that hold a copy of the data in case something goes wrong, you need to do maintenance on the brokers, and so on.
A common production setting is a replication factor of 3, i.e., there will always be three copies of your data. This replication is performed at the level of topic partitions.
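A rough way to picture partition-level replication is to place each partition's copies on consecutive brokers. The sketch below is a simplified round-robin placement under that assumption; Kafka's actual assignment logic is more involved (for example, it can take racks into account), and `assign_replicas` and the broker names are hypothetical.

```python
def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Return {partition: [brokers holding a copy]} using round-robin placement."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Start at a different broker for each partition to spread the load.
        assignment[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return assignment


brokers = ["broker-1", "broker-2", "broker-3", "broker-4"]
assignment = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=3)

# Simulate losing one broker: every partition still has two copies left.
survivors = {
    partition: [b for b in replicas if b != "broker-2"]
    for partition, replicas in assignment.items()
}
```

With a replication factor of 3 and four brokers, taking any single broker offline still leaves at least two copies of every partition.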
This primer should be sufficient for an introduction.
An implementation guide for Kafka will soon be published on my other space at https://fifo.im/+glue_labs_engineering. There, we'll be posting more about how we are building https://fifo.im/ and which technologies we are using.
Thanks for reading! Hope you have a nice day.