Videos

Distributed Transactions

I this video, we review serializability and three systems with distributed ACID transactions, namely Spanner, Calvin, and FoundationDB.

Comments

In-memory vs. On-disk Databases

Figure 1 shows the main difference between in-memory and on-disk databases: An in-memory database stores the data in memory and uses disk for backup, while an on-disk database stores the data on disk and uses memory for caching. Figure 1. In-memory vs On-disk storage engines

P is the name of a domain-specific language for asynchronous event-driven programming. With P, we can specify a system as a set of interacting state machines which talk to each other by sending events. That makes P a suitable language for modeling distributed systems where nodes talk to each other asynchronously via messages. P originally developed by Microsoft and according to its Github repository, it is currently being used extensively inside Amazon AWS for model-checking complex distributed systems. In this post, we want to see how we can use P to catch bugs in our protocols.

Log-structured storage is a powerful approach to organizing data that prioritizes sequential writes for efficiency and durability. In this post, I delve into the inner workings of log-structured storage, focusing on LSM-trees. We'll explore their write and read paths, the mechanics of compaction, and compare leveled and tiered compaction strategies. Additionally, I'll discuss the differences between LSM-trees and B-trees, and talk about variations like Jungle and Bitcask, highlighting their unique optimizations and trade-offs. Basics Write Path Write are first appended to a Write-Ahead-Log (WAL) for durability, then they are written to a MemTable which is basically an in-memory sorted data structure such as a balanced tree (e.g. red-black or AVL tree) or a skip list (I have a detailed post about skip list on this blog). After writing a write to WAL and the MemTable, we can return success to the client. We can have multiple MemTables, but at any given time, ...

Ten years after the public availability of DynamoDB in 2012, and fifteen years after the publication of the original Dynamo paper in 2007, the DynamoDB team published a paper in Usenix ATC 2022 on how DynamoDB has evolved over its ten years of operation. The paper covers several aspects of DynamoDB such as adaptation to customer access patterns, smarter capacity allocation, durability, correctness, and availability. In this post, I share some of points that I found interesting in the paper.

ByteGraph is a distributed graph database developed by ByteDance, the company behind the popular social network TikTok. As a large social media platform, ByteDance handles an enormous amount of graph data, and using a specialized graph database like ByteGraph is essential for efficiently managing and processing this data. In 2022, ByteDance published a paper on ByteGraph at the VLDB conference. In this post, I will provide an overview of my understanding of ByteGraph and highlight some of the key points from the paper that I found interesting.

Search This Blog

MyDistributed.Systems

Videos

Distributed Transactions

Comments

Post a Comment

Popular posts from this blog

In-memory vs. On-disk Databases

Model-based Testing Distributed Systems with P Language

Log Structured Storage

DynamoDB, Ten Years Later

ByteGraph: A Graph Database for TikTok