Posts

Quick Notes on Go - Part 2

Image
In this post, I share some notes on Go. Check out part 1 .

Quick Notes on Go - Part 1

Image
In this post, I share some notes on Go.

Availability Modeling

Image
Availability is one of the most important aspects of every service. The failure of a critical service can result in the unavailability of the final application for end-users, thus promptly impacting the company's revenue. That is why businesses usually care about availability more than other aspects of a service such as consistency, and in the trade-off between these two, under the CAP theorem , they usually favor availability. Being of utmost importance, it is critical for distributed systems and site reliability engineers to understand how their design or operational decisions affect availability of their service.

Ray: A Framework for Scaling Python Applications

Image
Ray is a framework for scaling Python applications. Ray allows you to execute the same Python program that you would run on your laptop on a cluster of computers with minimum effort. All you need is to tell Ray what part of your program needs to be scalable by decorating your Python functions and classes, and Ray takes care of the rest for you. Ray has been used by several companies including OpenAI  for building ChatGPT.

ByteGraph: A Graph Database for TikTok

Image
ByteGraph is a distributed graph database developed by ByteDance, the company behind the popular social network TikTok. As a large social media platform, ByteDance handles an enormous amount of graph data, and using a specialized graph database like ByteGraph is essential for efficiently managing and processing this data. In 2022, ByteDance published a paper on ByteGraph at the VLDB conference. In this post, I will provide an overview of my understanding of ByteGraph and highlight some of the key points from the paper that I found interesting.

DynamoDB, Ten Years Later

Image
Ten years after the public availability of DynamoDB in 2012, and fifteen years after the publication of the original Dynamo paper in 2007, the DynamoDB team published a paper in Usenix ATC 2022 on how DynamoDB has evolved over its ten years of operation. The paper covers several aspects of DynamoDB such as adaptation to customer access patterns, smarter capacity allocation, durability, correctness, and availability. In this post, I share some of points that I found interesting in the paper. 

Dual Data Structures

Image
Dual data structures are concurrent data structures that not only hold data, but also keep track of read requests using reservations. By holding both data and anti-data, dual data structures achieve better performance and fairness compared with other blocking and non-blocking concurrent data structures.

Practical Tips on using TLA+ and P

Image
TLA+ and P are formal specification languages that let us formally define the behavior of our system and have a model checker reason about its correctness. In this post, I want to share some practical tips on using these awesome tools. I am not going to explain the basics and I assumes the reader is already familiar with these languages. If you are not, you can learn more about TLA+ at  Leslie Lamport's website . For P, a great place to start is the official website . I also wrote a blog post on P that explains P with examples. TLA+ vs. P While both TLA+ and P are formal specification languages, they are used for different purposes. Proving vs. Testing The main difference between TLA+ and P is that TLA+ is used to  prove , while P is used to  test ,   the correctness. While both use model checking, only TLA+ is used for exhaustive model-checking where we explore all possible executions of a given specification. Thus,  when the model checker does not find a bug, you are guaranteed

Eventual Consistency and Conflict Resolution - Part 2

Image
This is the second part of a two-part post on eventual consistency and the problem of conflict resolution. In the  first part , we focused on the definition of the eventual consistency, its sufficient conditions, and conflict resolution using physical and logical clocks. We saw each method has its problems. In this part, we will see how we can solve these issues and provide eventual consistency with better conflict resolution. Listen to the Audio Blog Conflict Resolution using Vector Clocks The good thing about physical clock timestamps is that they carry information related to the real-time order of events, e.g., if we write v2, one minute after writing v1, the physical timestamp of v2 will be most likely larger than that of v1, assuming we have reasonable clock synchronization in place. However, physical clocks are not 100% accurate. Thus, due to clock synchronization errors, the timestamps may not reflect t