Handling Concurrency in Online Services

Handling Concurrency in Online Services – Part 1 - (Transaction Isolation Levels)

Consider an online bookstore powered by a single database instance. The store is not very popular yet, so a single DB instance suffices for its needs.

Figure 1.0

Now, imagine a scenario where multiple customers intend to buy a certain book with only one copy left in the inventory. How would you implement this scenario? How do we ensure that there are no race conditions, request conflicts or deadlocks in the system?

Assuming the service is written in Java, one option is to mark the buyBook() method as synchronized.

public synchronized void buyBook() {
    // book purchase code
}

The synchronized keyword ensures that only one request thread (spawned by one specific buyer) can execute the buyBook() method at a time. Until that thread completes its transaction (i.e., the book purchase), threads spawned by other concurrent buyers have to wait. Once it is done, the next transaction can proceed.
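To make this concrete, here is a minimal sketch of many buyers racing for the last copy inside one JVM. The BookStore class, its copiesLeft field, and the SynchronizedDemo harness are hypothetical names introduced only for illustration; the inventory lives in memory rather than in the store's database.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory store, used only for illustration.
class BookStore {
    private int copiesLeft;

    BookStore(int copiesLeft) {
        this.copiesLeft = copiesLeft;
    }

    // Only one thread at a time can run this method on a given BookStore instance.
    public synchronized boolean buyBook() {
        if (copiesLeft > 0) {
            copiesLeft--;
            return true;   // purchase succeeded
        }
        return false;      // sold out
    }

    public synchronized int getCopiesLeft() {
        return copiesLeft;
    }
}

class SynchronizedDemo {
    // Runs `buyers` concurrent purchase attempts and returns how many succeeded.
    static int simulate(BookStore store, int buyers) {
        ExecutorService pool = Executors.newFixedThreadPool(buyers);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger successes = new AtomicInteger();
        for (int i = 0; i < buyers; i++) {
            pool.submit(() -> {
                try {
                    start.await();               // line the buyers up so they race
                    if (store.buyBook()) {
                        successes.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();                       // release all buyers at once
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for buyers", e);
        }
        return successes.get();
    }

    public static void main(String[] args) {
        BookStore store = new BookStore(1);      // one copy left
        int sold = simulate(store, 50);          // 50 concurrent buyers
        System.out.println("Successful purchases: " + sold);
        System.out.println("Copies left: " + store.getCopiesLeft());
    }
}
```

However the threads are scheduled, exactly one purchase succeeds and the remaining buyers are told the book is sold out, because the check and the decrement happen under the same lock.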

Figure 1.1

This should handle our concurrency scenario well. But there is a slight hitch.

The synchronized keyword acquires a lock (the monitor of the object the method is called on) that exists only inside a single JVM instance. If the application runs on multiple application servers behind a load balancer, each JVM has its own lock, so synchronized cannot stop two buyers on different servers from purchasing the same copy. In this scenario, we would need a distributed lock across JVMs.

A better solution is to rely on the database's transaction isolation levels instead of the synchronized keyword in our code. Let the database handle concurrent access to a resource by multiple threads.

Transaction Isolation Levels

We are aware of ACID (Atomicity, Consistency, Isolation, and Durability) in the context of transaction processing, primarily in relational databases.

An ACID transaction means that if a transaction, say a financial transaction, occurs in a system, it either completes fully or not at all, and it does not interfere with other transactions running at the same time.

After the transaction is complete, the system will have a new state that is durable and consistent. If anything goes wrong during the transaction, say a minor system failure, the entire operation is rolled back and the system returns to its former state.

Database transaction isolation levels are the I (Isolation) in ACID. The database keeps concurrent transactions isolated from each other to maintain the system's consistency.

ACID-compliant databases typically offer several transaction isolation levels (each with its own implementation) that developers can leverage to ensure their application behaves as expected.

Let's have a look at different transaction isolation levels:

Serializable
Repeatable Reads
Read Committed
Read Uncommitted
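In Java, these four levels correspond to constants on the standard java.sql.Connection interface. The sketch below is a minimal illustration, assuming the application talks to its database through JDBC; the IsolationLevels class and its two helper methods are hypothetical names, and whether a given level is actually supported depends on the database and driver in use.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper class, used only for illustration.
class IsolationLevels {

    // Maps a JDBC isolation constant to the name used in this lesson.
    static String name(int level) {
        switch (level) {
            case Connection.TRANSACTION_SERIALIZABLE:     return "Serializable";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "Repeatable Reads";
            case Connection.TRANSACTION_READ_COMMITTED:   return "Read Committed";
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "Read Uncommitted";
            default:                                      return "Unknown";
        }
    }

    // Sketch: prepare an existing connection to run its next transaction
    // at the strictest level. The caller still has to commit or roll back.
    static void beginSerializable(Connection conn) throws SQLException {
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        conn.setAutoCommit(false);  // group the following statements into one transaction
    }
}
```

The isolation level is set before the transaction's first statement runs, since changing it in the middle of a transaction has implementation-defined behavior in JDBC.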

Serializable

This is the strictest isolation level that databases offer. It ensures that the transactions happening on the system are serializable.

What does this mean?

This means that the database ensures that the concurrent transactions occur in the system as if they are occurring serially, one after the other.

It is important to remember that stricter isolation comes at a cost: degraded performance. Many databases enforce serial behavior by locking the resources a transaction touches, so conflicting transactions must wait for one another (or are aborted and retried), and throughput takes a hit.

Serializable isolation ensures there are no dirty reads, lost updates, non-repeatable reads, or phantoms.

What are these?

These are the scenarios that can leave our database in an inconsistent state. Let's understand these, one by one.

Dirty Reads

Imagine two transactions, T1 and T2, working on the same row of a database table. T1 updates the row but does not commit it (the transaction is still in process). At the same time, transaction T2 reads the updated data and sends it to the UI.

Now, T1 either further updates the row values or rolls back its changes. In either case, T2 has read and delivered inconsistent/incorrect row values to the UI.

This phenomenon is known as dirty reads.
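The sequence above can be replayed step by step with a small in-memory model. This is only a sketch: the OrderRow class, its fields, and the amounts are hypothetical, and a real database tracks uncommitted changes very differently.

```java
// Hypothetical in-memory model of a single row; not how a real database stores data.
class OrderRow {
    private int committedValue;
    private Integer pendingValue;   // uncommitted write by an open transaction, or null

    OrderRow(int initialValue) {
        this.committedValue = initialValue;
    }

    // T1 updates the row but does not commit yet.
    void write(int value) {
        pendingValue = value;
    }

    void commit() {
        if (pendingValue != null) {
            committedValue = pendingValue;
            pendingValue = null;
        }
    }

    void rollback() {
        pendingValue = null;        // discard the uncommitted write
    }

    // A reader that is allowed to see in-flight (uncommitted) writes.
    int readUncommitted() {
        return pendingValue != null ? pendingValue : committedValue;
    }

    // A reader that only ever sees committed data.
    int readCommitted() {
        return committedValue;
    }
}

class DirtyReadDemo {
    public static void main(String[] args) {
        OrderRow row = new OrderRow(100);

        row.write(250);                          // T1: update, not committed
        int seenByT2 = row.readUncommitted();    // T2: dirty read
        row.rollback();                          // T1: changes its mind

        System.out.println("T2 sent to the UI: " + seenByT2);            // 250
        System.out.println("Actual row value:  " + row.readCommitted()); // 100
    }
}
```

T2 ends up holding the value 250, which was never committed, while the row itself still contains 100.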

A real-world scenario of dirty reads would be a salesperson of our bookstore in conversation with a corporate client. The salesperson adds an order to the database but hasn't committed it yet, as the talks are ongoing.

At the same time, the clerk pulls the (uncommitted) corporate customer's order from the table, generates a bill, and sends an email. Two major issues could happen here. If the customer cancels the order altogether, they still get a bill in their inbox. If the salesperson updates the order, the customer receives an incorrect bill.

What impression does this give of us, as a business, to the customer?

These issues could have been avoided if the transactions ran at the SERIALIZABLE isolation level, which never exposes uncommitted data to other transactions.

Lost Update

Imagine two ongoing concurrent transactions, T1 and T2, that enable buyers to purchase a book from the website, subsequently reducing the book count in the inventory. T1 buys 2 books and reduces the book count from 10 to 8.

At the same time, T2 occurs and does not see the update made by T1. It reads the book count as 10, buys 4 books, and writes the count as 6, creating an inconsistency in the system.

The correct book count is 4 (10 - 2 - 4: T1 buys 2 books, T2 buys 4).

In this scenario, the update made by transaction T1 is lost. This phenomenon is called the lost update.
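The arithmetic can be replayed deterministically in a few lines. This is a minimal sketch with hypothetical names (Inventory, LostUpdateDemo); the two "transactions" are simulated by interleaving plain reads and writes by hand rather than by running real database transactions.

```java
// Hypothetical in-memory inventory counter, used only for illustration.
class Inventory {
    private int count;

    Inventory(int count) {
        this.count = count;
    }

    int read() {
        return count;
    }

    void write(int newCount) {
        count = newCount;
    }
}

class LostUpdateDemo {
    // Both transactions read before either one writes: T1's update is lost.
    static int interleaved() {
        Inventory inventory = new Inventory(10);
        int seenByT1 = inventory.read();   // T1 reads 10
        int seenByT2 = inventory.read();   // T2 reads 10 (T1 has not written yet)
        inventory.write(seenByT1 - 2);     // T1 buys 2 and writes 8
        inventory.write(seenByT2 - 4);     // T2 buys 4 and writes 6, overwriting T1
        return inventory.read();
    }

    // The same two purchases, run one after the other.
    static int serial() {
        Inventory inventory = new Inventory(10);
        inventory.write(inventory.read() - 2);   // T1: 10 -> 8
        inventory.write(inventory.read() - 4);   // T2: 8 -> 4
        return inventory.read();
    }

    public static void main(String[] args) {
        System.out.println("Interleaved result: " + interleaved()); // 6
        System.out.println("Serial result:      " + serial());      // 4
    }
}
```

The interleaved run ends with 6 books on record even though 6 were sold out of 10, while the serial run ends with the correct count of 4.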

Let's continue the discussion in the next lesson.
