Posts

Genefication: Generative AI + Formal Verification

Image
Generative AI is a powerful tool that can significantly enhance developer productivity by generating code quickly and efficiently. However, a major challenge when using generative AI is ensuring the correctness of the generated code, especially in critical systems where errors can have significant consequences. I have heard of companies dealing with critical systems have banned the use of any code generation technologies, including generative AI, to minimize the risk of introducing bugs. In addition, there are reports suggesting using AI coding assistants has led to a 40% increase in the bug rate. A promising way to address the correctness challenges of AI-generated code, particularly in the context of high-level distributed system design, is by combining generative AI with formal verification. This approach could be referred to a...

NuRaftOnTheRocks: A Replicated Key-value Store using RocksDB and NuRaft

Image
NuRaft is an open source C++ Raft implementation. NuRaft is highly pluggable; it allows you to provide your own persistent log store and state manager and integrate it with your state machine. That allows you to integrate NuRaft with any application whenever you need log replication. In this post, I show how you can integrate NuRaft with RocksDB to build a replicated key-value store.  Raft Raft is a consensus and log replication protocol. Raft allows multiple processes in a distributed system to agree on the entries of a shared log. Using such a guarantee, we can achieve state machine replication as follows: having a set of state machines starting at the same initial state, we record operations on the state machine on a shared log, and have each state machine perform operations specified in the shared log. This way all state machines remain consistent with each other. State machine replication is very useful. We can create replicated data stores. In this post, we ...

Log Structured Storage

Image
Log-structured storage is a powerful approach to organizing data that prioritizes sequential writes for efficiency and durability. In this post, I delve into the inner workings of log-structured storage, focusing on LSM-trees. We'll explore their write and read paths, the mechanics of compaction, and compare leveled and tiered compaction strategies. Additionally, I'll discuss the differences between LSM-trees and B-trees, and talk about variations like Jungle and Bitcask, highlighting their unique optimizations and trade-offs.  Basics Write Path Write are first appended to a Write-Ahead-Log (WAL) for durability, then they are written to a MemTable which is basically an in-memory sorted data structure such as a balanced tree (e.g. red-black or AVL tree) or a skip list (I have a detailed post about skip list on this blog). After writing a write to WAL and the MemTable, we can return success to the client. We can have multiple MemTables, but at any given time, ...

Quick Notes on Kubernetes with Go

Image
I this blog post, we talk about the various levels of abstraction available for working with Kubernetes (K8s) in Go. We start with fundamental building blocks for interacting with Kubernetes resources, starting with the REST API, and move through the Go API Library, API Machinery, the high-level client-go library,  controller-runtime, and kubebuilder. Each section provides insights into the utilities and patterns for resource creation, updates, and management, as well as details on working with custom resources and building operators.

Quick Notes on Go - Part 2

Image
In this post, I share some notes on Go. Check out part 1 .

Quick Notes on Go - Part 1

Image
In this post, I share some notes on Go.

Ray: A Framework for Scaling Python Applications

Image
Ray is a framework for scaling Python applications. Ray allows you to execute the same Python program that you would run on your laptop on a cluster of computers with minimum effort. All you need is to tell Ray what part of your program needs to be scalable by decorating your Python functions and classes, and Ray takes care of the rest for you. Ray has been used by several companies including OpenAI  for building ChatGPT.