system design 2

24/1/25 Design Uber

I watched the entire video. it falls under the pattern of proximity based services.

novel things i learnt ->

quadtree - use when location based range queries (bounding box of coordinates) have to deal with uneven distributions.
- yelp, airbnb type services, which don't have many location based updates
- e.g. many people live in nyc but very few in antarctica
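
to make the quadtree idea concrete for myself, a minimal sketch (not from the video; the names `Box`, `QuadTree`, `capacity` and `max_depth` are my own assumptions). a node splits into four children only once it holds more than `capacity` points, so dense areas get deep subtrees and empty areas stay shallow:

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box, half-open: [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def intersects(self, other):
        return not (other.x0 >= self.x1 or other.x1 <= self.x0
                    or other.y0 >= self.y1 or other.y1 <= self.y0)


class QuadTree:
    def __init__(self, box, capacity=4, max_depth=16, level=0):
        self.box = box
        self.capacity = capacity    # points a leaf holds before splitting
        self.max_depth = max_depth  # guard against endless splits on duplicates
        self.level = level
        self.points = []
        self.children = None

    def insert(self, x, y):
        if not self.box.contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.level >= self.max_depth:
                self.points.append((x, y))
                return True
            self._split()
        return any(child.insert(x, y) for child in self.children)

    def _split(self):
        b = self.box
        mx, my = (b.x0 + b.x1) / 2, (b.y0 + b.y1) / 2
        quadrants = [Box(b.x0, b.y0, mx, my), Box(mx, b.y0, b.x1, my),
                     Box(b.x0, my, mx, b.y1), Box(mx, my, b.x1, b.y1)]
        self.children = [QuadTree(q, self.capacity, self.max_depth, self.level + 1)
                         for q in quadrants]
        # Push the points held by this leaf down into the new children.
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, box):
        """Return all points inside the given bounding box."""
        if not self.box.intersects(box):
            return []
        found = [p for p in self.points if box.contains(*p)]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(box))
        return found
```

a bounding box query only descends into nodes whose box intersects the query, which is why it stays cheap even when one region is far denser than the rest.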
geohash - if the distribution changes frequently, a quadtree needs to keep restructuring, which can be expensive. when write volume is very high and the distribution is not as uneven, we prefer geohash: it is a fixed grid encoding computed from the coordinates, so there is no tree to update.
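
a small sketch of the standard geohash encoding, to see why nothing needs updating: the hash is computed from the coordinates alone by repeatedly halving the longitude and latitude ranges. the function name `geohash_encode` and the `precision` parameter are my own choices:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

a longer shared prefix means a smaller shared cell, so a range query can become a prefix lookup on an ordinary index. the caveat is that two points on either side of a cell boundary can be close together and still share no prefix.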

Watched Jordan's videos on MapReduce. Thinking of reading the batch processing chapter in DDIA but maybe i shouldn't be spending so much time on this.

ad click aggregator

cool stuff

separate the redirect load and the counter load
initially uses cassandra for high write throughput (LSM tree + SSTable) and then an OLAP db reads from it. any normal db will do. this is for aggregation queries.
the above does not scale well enough at 10K clicks per second. instead: click processor emits events -> kafka (msg broker, buffers the events) -> flink (stream processing in real time) -> OLAP can read the aggregated output
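
what the stream processing step boils down to, as a plain python sketch. this is not the flink API, just the windowed count it would compute; `aggregate_clicks`, the `(ad_id, timestamp)` event shape and the 60 second window are my assumptions:

```python
from collections import defaultdict


def aggregate_clicks(events, window_seconds=60):
    """Count clicks per (ad_id, window_start) using tumbling windows.

    events: iterable of (ad_id, unix_timestamp_seconds) tuples.
    """
    counts = defaultdict(int)
    for ad_id, ts in events:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(ad_id, window_start)] += 1
    return dict(counts)


clicks = [("ad1", 0), ("ad1", 59), ("ad1", 60), ("ad2", 61)]
per_minute = aggregate_clicks(clicks)
# per_minute == {("ad1", 0): 2, ("ad1", 60): 1, ("ad2", 60): 1}
```

the OLAP db then stores one row per ad per window instead of one row per click, which is what makes the aggregation queries cheap.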
kafka is already highly distributed, fault tolerant. can connect to s3 to store streaming data
kafka also has 7 day retention (its default) so events can be replayed if flink goes down; flink has exactly-once semantics so each event is processed only once

Stream processors like Flink also have a feature called checkpointing. This is where the processor periodically writes its state to persistent storage like S3. If it goes down, it can read the last checkpoint and resume processing from where it left off. This is particularly useful when the aggregation windows are large, like a day or a week. You can imagine we have a week's worth of data in memory being aggregated and if the processor goes down, we don't want to lose all that work.
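
a toy version of checkpointing to check my understanding. it uses a local JSON file in place of S3 and a python list in place of the kafka log; this is not how flink implements it, and `CheckpointedCounter`, `checkpoint_every` and `crash_at` are made up names:

```python
import json
import os
import tempfile
from collections import Counter


class CheckpointedCounter:
    """Counts events from a replayable log, saving state periodically."""

    def __init__(self, path):
        self.path = path
        self.counts = Counter()
        self.next_offset = 0  # position in the log to read next
        self._restore()

    def _restore(self):
        # Resume from the last checkpoint if one exists.
        if os.path.exists(self.path):
            with open(self.path) as f:
                saved = json.load(f)
            self.counts = Counter(saved["counts"])
            self.next_offset = saved["next_offset"]

    def checkpoint(self):
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"counts": dict(self.counts),
                       "next_offset": self.next_offset}, f)
        os.replace(tmp, self.path)

    def run(self, log, checkpoint_every=2, crash_at=None):
        for offset in range(self.next_offset, len(log)):
            if crash_at is not None and offset == crash_at:
                raise RuntimeError("simulated crash")
            self.counts[log[offset]] += 1
            self.next_offset = offset + 1
            if self.next_offset % checkpoint_every == 0:
                self.checkpoint()


log = ["ad1", "ad2", "ad1", "ad1", "ad2"]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

first = CheckpointedCounter(path)
try:
    first.run(log, crash_at=3)   # dies before processing offset 3
except RuntimeError:
    pass

second = CheckpointedCounter(path)  # restores counts and offset from disk
second.run(log)                     # replays only what came after the checkpoint
```

the counts and the offset are saved together, so after a restart the events past the checkpoint get replayed from the log without being counted twice. that only works because the log (kafka here) still retains them.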

reconciliation strategy

22-23 january

currently trying to read all the free problems in hellointerview till the deep dive sections
read all of the tinyURL and ad click aggregator problems