19th September
CDC: Loved this video on change data capture
For CDC, you can connect Flink via connectors to the relevant DB; the connector reads the DB's change log (e.g. the write-ahead log / binlog) and picks up every committed change from it.
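A minimal sketch of the idea in Python (not the real Flink CDC connector API; the file name and event format here are made up): conceptually, a CDC connector just tails the database's append-only change log and emits each committed change as an event.

```python
import json
import time

def tail_change_log(path):
    """Follow an append-only change log, yielding one change event at a time.

    This simulates what a CDC connector does: instead of polling tables,
    it reads the database's own log of committed changes, in order.
    """
    with open(path, "r") as f:
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # nothing new yet; poll the log again
                continue
            yield json.loads(line)  # e.g. {"op": "INSERT", "table": "chat_members", ...}

# A downstream job (Kafka producer, Flink operator, ...) would react to each event:
# for event in tail_change_log("changes.jsonl"):
#     route(event)
```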
WhatsApp / FB Messenger message processing using Kafka and Flink
Good question! Jordan introduces Kafka and Flink into the architecture to address the potential performance limitations of HBase, specifically regarding its write throughput, and to improve message delivery reliability. Here's a breakdown of their roles:
Kafka:
- High Write Throughput: Kafka is a distributed message broker designed to handle high volumes of data in real-time. By buffering messages in Kafka first, we avoid the potential bottleneck of directly writing every message to HBase, which might be slow due to its single-leader replication model.
- Durability and Reliability: Kafka is highly durable and fault-tolerant. Messages are replicated across multiple brokers, ensuring that even if a broker fails, messages are not lost. This improves the reliability of message delivery, as even if HBase is temporarily unavailable, messages are safely stored in Kafka until Flink can process them.
Flink:
- Stream Processing: Flink is a powerful stream processing framework that enables complex data processing tasks in real-time. It consumes messages from Kafka and performs necessary operations before writing to HBase.
- Idempotency: Flink helps ensure idempotent writes to HBase by assigning a unique identifier (UUID) to each message. This way, if a message is accidentally processed multiple times (due to network issues or Flink restarts), HBase can use the UUID to identify duplicates and prevent duplicate rows (see the sketch after the summary below).
- Message Routing: Flink reads changes in the "Chat Members" table (Change Data Capture) and routes messages to the appropriate chat servers. This ensures that each user only receives messages for the chats they are a member of.
In summary:
- Kafka acts as a high-throughput, durable buffer for incoming messages, decoupling the message ingestion process from the slower write operations in HBase.
- Flink provides stream processing capabilities to handle message routing, ensure idempotency, and perform any additional processing before storing the messages in HBase.
This architecture effectively balances the strengths of each component, allowing for high message throughput and reliable delivery while maintaining the read optimization benefits of HBase.
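A toy sketch of the idempotency point above. The `IdempotentSink` class is made up for illustration; it stands in for an HBase table where the message's UUID is the row key, so a replayed message overwrites the same row instead of creating a duplicate.

```python
import uuid

class IdempotentSink:
    """Toy stand-in for an HBase table where the message UUID is the row key."""

    def __init__(self):
        self.rows = {}  # row key -> message body

    def write(self, message):
        # Writing the same UUID twice just overwrites the same row,
        # so retries and replays can't create duplicates.
        self.rows[message["id"]] = message["body"]

sink = IdempotentSink()
msg = {"id": str(uuid.uuid4()), "body": "hello"}
sink.write(msg)
sink.write(msg)  # replayed after a Flink restart: still only one row
assert len(sink.rows) == 1
```

Note the UUID has to be assigned once, upstream, and travel with the message; if it were regenerated on each retry, the dedup would break.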
18th September
Finally learnt consistent hashing; watched WhatsApp system design.
16th, 17th September -> leetcode
15th September
Studying chapter 3 (Storage and Retrieval) of DDIA.
hash index
Let's say you are appending rows to a log file. Instead of scanning the entire file on every read, you can use a hash index, which is simply a hashmap / key-value store in memory. You record each key and its byte offset (i.e. the byte in the file where its record starts). Then when reading, you can look up that offset and jump straight to that location.
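A minimal sketch in Python (the class name and "key,value" record format are made up for illustration): an append-only log with an in-memory hash index mapping each key to the byte offset of its latest record.

```python
class HashIndexedLog:
    """Append-only log file with an in-memory hash index of key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}           # key -> offset of the latest record for that key
        open(path, "ab").close()  # make sure the log file exists

    def put(self, key, value):
        with open(self.path, "ab") as f:
            offset = f.tell()     # byte where this record will start
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset  # the index only remembers the newest offset

    def get(self, key):
        with open(self.path, "rb") as f:
            f.seek(self.index[key])  # jump straight to the record, no full scan
            line = f.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]

db = HashIndexedLog("data.log")
db.put("user:1", "alice")
db.put("user:1", "alicia")  # newer value appended; index now points at it
print(db.get("user:1"))     # -> alicia
```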
Bitcask
The above is what Bitcask does. It offers high performance for reads and writes, subject to the requirement that all keys fit in the available RAM.
If you are only ever appending to one log file, you will eventually run out of disk space. Solution -> break the log into segments: when a segment reaches a certain size, start writing to a new one. Then perform compaction (i.e. throw away duplicate keys in the log, keeping only the most recent update for each key).
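A tiny sketch of compaction over one segment, representing each record as a "key,value" string for illustration:

```python
def compact(segment_lines):
    """Keep only the latest value per key; later lines win (the log is append-only)."""
    latest = {}
    for line in segment_lines:
        key, value = line.split(",", 1)
        latest[key] = value  # a later write for the same key overwrites the earlier one
    return [f"{k},{v}" for k, v in latest.items()]

segment = ["cat,1", "dog,2", "cat,3", "cat,4"]
print(compact(segment))  # ['cat,4', 'dog,2']: duplicate updates for "cat" thrown away
```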
Logs are a very efficient write strategy because appending to a file is the simplest write operation, and sequential writes are fast.
indexes are additional data structures derived from the data to help in optimized reads.
Con of indexes: they slow down writes.
Any kind of index usually slows down writes, because the index also needs to be updated every time data is written. Well-chosen indexes speed up read queries, but every index slows down writes. For this reason, databases don't usually index everything by default, but require you, the application developer, to choose indexes manually, using your knowledge of the application's typical query patterns.
14th September
Watched a bit on NoSQL vs SQL. Watched a video by Jordan on choosing DB design.
Watching a video on wide column storage
Main difference is how data is stored on disk. In column-oriented storage, all the data for a single column is stored together (contiguously, one file per column). Can be useful for analytics when you want all the data for a particular field: min, max, avg, etc. The data is closer / contiguous, so reads are faster.
You can also do column compression; the data within a column needs to be similar (e.g. all ints). He gave an example using bitmap encoding followed by run-length encoding (storing the running counts of alternating 0s and 1s).
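A small sketch of that bitmap -> run-length encoding idea (function names are mine):

```python
def bitmap(column, value):
    """Bitmap for one distinct value: 1 where the column equals it, else 0."""
    return [1 if v == value else 0 for v in column]

def run_length_encode(bits):
    """Store the lengths of alternating runs, starting with the count of leading 0s."""
    runs, current, count = [], 0, 0
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs

column = [3, 3, 7, 3, 7, 7, 7, 3]
bits = bitmap(column, 7)        # [0, 0, 1, 0, 1, 1, 1, 0]
print(run_length_encode(bits))  # [2, 1, 1, 3, 1]: two 0s, one 1, one 0, three 1s, one 0
```

With long runs (common when the column is sorted), those counts compress far better than the raw bitmap.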
Benefits of column compression
- less data to send over the network to a server
- can fit more of the data in the CPU cache
Predicate pushdown
Predicate pushdown involves moving the filtering operation (the predicate) from the query processor down to the storage layer or even to the data source itself.
In simple words, some metadata, like the min/max (or avg) of each column, is stored per file. So for a query like "select data from table where column > 60", all files whose max for that column is <= 60 can be skipped entirely.
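A toy sketch of that skipping logic (the file layout here is invented for illustration):

```python
# Per-file metadata: min/max of the column, plus the rows themselves.
files = [
    {"name": "part-0", "min": 10, "max": 55, "rows": [12, 40, 55]},
    {"name": "part-1", "min": 20, "max": 90, "rows": [20, 61, 90]},
    {"name": "part-2", "min": 5,  "max": 60, "rows": [5, 33, 60]},
]

def scan_where_greater_than(files, threshold):
    """Push the predicate down: skip any file whose max can't satisfy it."""
    results = []
    for f in files:
        if f["max"] <= threshold:
            continue  # no row in this file can match, so we never read it
        results.extend(v for v in f["rows"] if v > threshold)
    return results

print(scan_where_greater_than(files, 60))  # [61, 90]; part-0 and part-2 were skipped
```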
Downside of column oriented storage
- Every column must have the same sort order
- Each write needs to go to multiple different places (every column file)
Watching this
partition: the "queue". An ordered, immutable sequence of messages that we append to, like a log file. These are physical things, like separate files.
topic: a logical grouping of partitions. A topic can have multiple partitions.
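A toy sketch of how messages get routed to partitions by key (real Kafka's default partitioner hashes the serialized key with murmur2, but the idea is the same):

```python
import hashlib

NUM_PARTITIONS = 3
topic = [[] for _ in range(NUM_PARTITIONS)]  # a topic = a group of partition "logs"

def publish(key, message):
    """Hash the key to pick a partition, then append (like appending to a log file)."""
    p = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS
    topic[p].append(message)

publish("chat:42", "hi")
publish("chat:42", "how are you?")  # same key -> same partition -> order preserved
publish("chat:7", "hello")
print([len(p) for p in topic])
```

Ordering is only guaranteed within a partition, which is why all messages for one chat should share a key.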
Kafka - distributed streaming platform that serves three key functions
- Publish and subscribe to streams of records (similar to a message queue)
- Store streams of records in a fault-tolerant, durable way
- Process streams of records as they occur
Key Features
- partitioning: topics are divided into partitions for parallelism and scalability
- scalability: Kafka can handle large volumes of data with high throughput and low latency
- durability: data is written to disk and replicated across the cluster to prevent data loss
- retention: configurable data retention policies allow Kafka to store data for a specified period
What Are Exactly-Once Semantics?
Exactly-Once Semantics (EOS) is a guarantee provided by distributed data processing systems that ensures each record or message is processed exactly one time—no more, no less. This means that:
No Duplicates: A message won't be processed multiple times, preventing duplicate entries.
No Missed Processes: Every message sent to the system will be processed, ensuring no data loss.
In high-throughput systems like WhatsApp, handling billions of messages, maintaining data integrity is crucial:
- Avoiding duplicates => you don't want identical messages appearing twice
- Ensuring data consistency => Users expect consistent and reliable messaging experiences without data loss or corruption
- Analytics pov => duplicate or lost data can skew results
Apache Flink's checkpointing mechanism
Checkpointing is Flink's mechanism for achieving fault tolerance and exactly-once processing guarantees. It involves periodically taking snapshots of the application's state and of the positions in the data streams (like Kafka offsets), to allow recovery in case of failures.
How Does Checkpointing Work?
Snapshot Creation: Flink periodically captures the state of all operators (tasks) in the data processing pipeline. This includes any state stored in memory, such as windowed aggregations or keyed state.
Barrier Alignment: During checkpointing, Flink injects barriers into the data streams. These barriers ensure a consistent point across all input streams where the snapshot is taken.
State Persistence: The snapshots are stored in a durable storage system, such as HDFS, S3, or any other reliable distributed filesystem.
Operator State Management: Each operator (e.g., map, filter, sink) records its state as part of the snapshot. This includes information necessary to resume processing without duplication or loss.
Failure Recovery: If a failure occurs, Flink uses the latest successful checkpoint to restore the state of the application. Processing resumes from this checkpoint, ensuring that each message is processed exactly once.
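A toy model of the snapshot-and-replay idea in Python (not Flink's actual implementation; the class is made up). The key point: operator state and the input offset are snapshotted together, so on recovery the stream is replayed from exactly the point the restored state corresponds to.

```python
import copy

class ToyPipeline:
    def __init__(self, stream):
        self.stream = stream       # stands in for a Kafka partition
        self.offset = 0            # position in the input stream
        self.state = {"count": 0}  # operator state, e.g. a running aggregate
        self.checkpoint = None

    def take_checkpoint(self):
        # Snapshot state AND offset together; Flink would persist this to HDFS/S3.
        self.checkpoint = (self.offset, copy.deepcopy(self.state))

    def recover(self):
        # Roll back to the last consistent snapshot and replay from its offset.
        self.offset, self.state = self.checkpoint[0], copy.deepcopy(self.checkpoint[1])

    def process_next(self):
        self.state["count"] += self.stream[self.offset]
        self.offset += 1

p = ToyPipeline([1, 2, 3, 4])
p.process_next(); p.process_next()
p.take_checkpoint()  # saved: offset=2, count=3
p.process_next()     # count=6... and then the job crashes
p.recover()          # restore offset=2, count=3
p.process_next(); p.process_next()
print(p.state["count"])  # 10: every element counted exactly once despite the failure
```

Barrier alignment (above) is what makes the state/offset pair consistent across many parallel input streams; with a single stream, as here, it is trivially consistent.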