Practical Tips on using TLA+ and P

TLA+ and P are formal specification languages that let us formally define the behavior of our system and have a model checker reason about its correctness. In this post, I want to share some practical tips on using these awesome tools. I am not going to explain the basics and I assumes the reader is already familiar with these languages. If you are not, you can learn more about TLA+ at Leslie Lamport's website. For P, a great place to start is the official website. I also wrote a blog post on P that explains P with examples.

TLA+ vs. P

While both TLA+ and P are formal specification languages, they are used for different purposes.

Proving vs. Testing

The main difference between TLA+ and P is that TLA+ is used to prove, while P is used to test, the correctness. While both use model checking, only TLA+ is used for exhaustive model-checking where we explore all possible executions of a given specification. Thus, when the model checker does not find a bug, you are guaranteed that your specification is correct. P, on the other hand, is used for bounded model-checking where we explore only part of the state space. Thus, like any other form of testing, you may miss a bug with P. 
 If P does not guarantee correctness, what is the point of using it instead of conventional testing?

First of all, you never use any formal technique instead of testing. Formal techniques never eliminate the need for testing the final piece of software you write using conventional testing methods. The reason is that even when your formal specifications are all good, you still may have bugs in your final code because you have probably abstracted some aspect of your code in your model, you may have errors in your formal specification, or you may have an wrong assumption about your system, e.g., you assume your system behave in a certain way while in practice it does not. In other words, your model may not be faithful to reality. Regarding the benefits of using P over conventional testing, we can at least mention two: 1) Although it does not explore all possible executions, it still intentionally tries to explore alternative paths of executions. It is still better than conventional testing where you test only a single path of execution out of many. 2) The second benefit is that when the P model checker finds a bug, it gives you a trace that you can replay to reproduce the situation that leads to the bug. Thus, after a fix, you can replay the trace and make sure the issue is resolved. In conventional testing, you cannot easily do that; any distributed systems engineer knows that reproducing bugs is one of the hardest things about developing distributed systems. 


  It seems TLA+ is better than P because it checks all states and proves correctness. Why should someone use P instead of TLA+ then?
Read the next section. 

Levels of Abstraction

TLA+ should be used for reasoning about the high-level design of a distributed system. Usually, we cannot practically model fine-grain code-level details of a real system in a TLA+ specification. There are two reasons for this limitation: 1) It is hard to specify code details in TLA+ language, and even if you do, it might be very hard to maintain the specification. For example, if some developer changes some Java code, it may be very difficult to figure out how your TLA+ specification should change to reflect that change. 2) The second issue is that for real systems, the state space is usually very big and exploring the entire state space is not practically possible. P, on the other hand, is very close to high-level programming languages. Thus, translation from C++ or Java for example to P is straightforward. Also, since P is not exhaustive, you can still use it even for systems with very big state spaces. 

Tip 0: Always start with TLA+

Always start with reasoning about you high-level design using TLA+, even when your high-level design looks simple and you feel confident about it. You might be surprised to find bugs in your simple high-level design, and catching bugs at that level is much easier than going to the code-level details and modeling in P. 

TLA+

In this section, I share some practical suggestions for using TLA+. Thanks to Markus Kuppe for teaching me these tips. 

Tip 1: Use the VSCode plugin

I used to use TLA+ Toolbox, and while it does the job, it does not provide the best user experience. Recently, I gave the VSCode TLA+ extension a try, and I loved it! It is much better than the Toolbox. Follow the instructions on the extension page to install and configure it. 

Figure 1. TLA+ VSCode Plugin

To define properties, following the Toolbox, I do this: I create another .tla file that extends my main specification and defines properties there. For example, suppose I define my protocol in MyProtocol.tla file. I create another .tla file named Main.tla that extends MyProtocol and define my desired conditions there. 

---- MODULE Main ----
EXTENDS MyProtocol
livenss == <>(desiredCondition)
----
=====
Then, I create a config file Main.cfg to define the constants and specify the specification and properties for the model checker. 

SPECIFICATION Spec CONSTANT defaultInitValue = defaultInitValue \* Add statements after this line. CONSTANT NUMBER_OF_SERVERS = 10 PROPERTY liveness

Tip 2: How to enforce fairness for PlusCal?

I like to write my specifications in PlusCal and let a translator translate them to TLA+. PlusCal is closer to high-level programming languages than TLA+, so it is more intuitive to write in PlusCal, especially when you have an existing code base. However, it is not straightforward to define fairness assumptions in PlusCal. For example, suppose we want to model a faulty network that may non-deterministically drop messages. To model this behavior, we can define a variable named isNetworkHealthy and non-deterministically set it to true or false in each round of network communication using either/or

either
    isNetworkHealthy := TRUE;
or
    isNetworkHealthy := FALSE;
end either;

Then, when we want to send a message, we only send it if isNetworkHealthy is True, and ignore the message otherwise. 

send_message:
    if isNetworkHealthy == TRUE then
        \* send
    else
        \* drop
    end if;

So far so good, except we typically don't want the model checker to consider the case where the network is never healthy. I couldn't find a way to enforce this assumption in PlusCal. However, enforcing it directly in TLA+ is easy. All we have to do is to enforce this simple temporal condition []<>isNetworkHealthy which says: always eventually network is healthy. To do that when writing primarily in PlusCal, you can do this: after the TLA+ translation, define your FairSpec as follows:

\* END TRANSLATION
FairSpec ==
    /\ Spec
    /\ []<>(isNetworkHealthy)
Then, in the config file, instead of Spec, specify FairSpec as the specification for the model checker. 

SPECIFICATION FairSpec CONSTANT defaultInitValue = defaultInitValue \* Add statements after this line. CONSTANT NUMBER_OF_SERVERS = 10 PROPERTY liveness

P

In this section, I share some tips on using P. 

Tip 3: How to model network behavior in presence of faults?

You can model network failure by simply not sending a message, but what about modeling network communication when we have process failure? Consider the common RESTful request-response network communication model. Suppose the requester process crashes and restarts after sending a request. In reality, the TCP network connection is gone after the process crashes. Thus, even if the responder sends the response, it will be dropped by the requester machine. However, if you are not careful, you may model your system in a way the requester still receives and process the response. 

To avoid this behavior, you can assign an integer as an id to your P machine and increment it after each restart. You should also include this id in each request and make the responder machine to include it in its response as well. Now, in the requester machine, you check the id of each response and drop the message if you see the id of the message does not match the current id of the machine. 

Tip 4: Use schedule files

Whenever the model checker finds a bug, it produces a .txt file that includes the log messages and a .schedule file that includes the schedule that lead to the bug. That schedule file is very valuable! and as said earlier, it is one of the great advantages of using model-based testing over conventional testing. The first thing that you should do after finding and fixing a bug is to re-run the model checking using the schedule file, as shown below. This way, you make sure you have fixed the bug. Note that if you simply repeat the model checking, without specifying the schedule, there is no guarantee that the model checker considers the path that lead to the bug, so you may mistakenly think that you have fixed the bug while you have not. 

pmc mydll.dll -m PImplementation.Test0.Execut --schedule my_schdeule.schdule

Tip 5: Be careful about choose statements

Note that unlike with or either/or statements in TLA+,  P model checker does NOT branch on choose statements. A choose statement randomly picks a number and the random selection changes every time. Thus, if you have a choose statement, then replaying a schedule is not guaranteed to be identical to the previous run. Be mindful about this and make sure that resolving your bug is not due to random nature of choose statements. 

Tip 6: max_steps vs. liveness_temperature_threshold

The current P model checker is a bounded model checker, i.e., it explores only part of the state space. Thus, it may not be able to tell you that you have a liveness bug. Instead, P turns liveness properties to safety properties using the notion of temperature. If you stay in a hot state that models an undesired state, for a longer than a threshold, the model checker notifies you of a potential liveness bug. 

The max_steps parameter tells the model-checker how deep it can go. The larger the max_steps, the deeper the model checker goes, and of course the longer model checking time. The liveness_temperature_thresholdparameter, on the other hand, specifies the threshold for the model checker to report a liveness bug. The larger the liveness_temperature_thresholdparameter, the more time for the system to satisfy its liveness. 

Unfortunately, there is no exact formula to specify these parameters.  In an ideal world, you would want infinity for max_steps, because you would want to go as deep as possible and gain more confidence, but larger max_steps means longer model checking time. The way P treats liveness properties is not far from reality in my view, as we almost never mean pure liveness when we talk about liveness in practice; we always have some timing expectation. For example, when we say this database is "eventually consistent", we still expect the replicas to converge in a reasonable time, so it is not really "eventual" in its temporal logic sense.

Tip 7: Model checking got stuck? Make sure you don't have an infinite loop

When model checking a single iteration seems to be taking forever when you have not set a large max_steps, chances are you have an infinite loop in your P code. Maybe you forgot to increase the counter of a while loop. To avoid that, I usually first write my while loop block with incrementing the counter, then start writing its body. This accidental infinite loop could have been avoided if we had for loops in P.

Tip 8: Organize your tests and run them with a script

Follow the structure used here for your tests, i.e., have TestDriver and TestScript. I also usually use a script like this to run my tests. Note that Test1, Test2, and Test3 are the names of tests I define in my test script using the test keyword. Using this script, tests will be executed automatically, and results will be stored in my bugs folder. 

Tip 9: Create views and use grep

An output txt file created by the P model checker includes log messages from all processes. Thus, it is not efficient to look at the output file the way you might check a log file in a real system. To solve this issue, you can create views from different perspectives and investigate bug traces more efficiently using them. Views for example can be used to show how different actors in the distributed system perceive the system. To create a view, simply add a signature text to your log messages that must be included in the view. For example, start your message with something like "leader_view:". Now, once you have a bug trace, check the output .txt file with grep as shown below. This way, you can have a nice output of how the leader's view evolves during the execution. 

> grep leader_view myProtocol_0_0.txt

<PrintLog> -------- leader_view : round=1 -------- <PrintLog> leader_view: server=0, state=A
<PrintLog> leader_view: server=1, state=B
<PrintLog> leader_view: server=2, state=B
<PrintLog> leader_view: server=3, state=A

<PrintLog> -------- leader_view : round=2 -------- <PrintLog> leader_view: server=0, state=A
<PrintLog> leader_view: server=1, state=B
<PrintLog> leader_view: server=2, state=B
<PrintLog> leader_view: server=3, state=B

<PrintLog> -------- leader_view : round=3 -------- <PrintLog> leader_view: server=0, state=B
<PrintLog> leader_view: server=1, state=C
<PrintLog> leader_view: server=2, state=B
<PrintLog> leader_view: server=3, state=C 

Tip 10: Use JSON log messages

Another useful technique for being more productive in reading output files is using JSON log messages. You can print JSON messages with the format function. Although the following example is very simple, I usually use JSON messages only for large messages with many fields. 

format("{{\"issue\":\"leader state.\", \"state\":\"{0}\"}}", state);
The nice thing about JSON is that you can prettify the log message and read it more easily. You can use IntelliJ for example: just paste your JSON log message to a .json file and press cmd + opt + L. Now, you can read the message easily, or close/open blocks to focus on part of the message you are interested in. You can also use jq command to prettify JSON messages in a terminal. 

Comments

Popular posts from this blog

In-memory vs. On-disk Databases

ByteGraph: A Graph Database for TikTok

Amazon DynamoDB: ACID Transactions using Timestamp Ordering

DynamoDB, Ten Years Later