Overview
This research paper aims to provide a deep understanding of transaction management in general, with specific reference to Reloadly microservices. It is open to correction and modification by any internal engineer who finds flaws or sees an opportunity to improve its content.
Goals
- Understanding the concept of transaction management
- Understanding how Reloadly makes use of the transaction management capabilities offered by Spring
- Creating a ground for other engineers to bring more ideas based on their experience and expertise
Special thanks
I owe special thanks to Arun for being such an amazing mentor and leader, and to Emmanuel for the endless coaching sessions and for investing in the skills of Reloadly’s engineers. I also thank the whole engineering team at Reloadly, especially the Apollo Team, which is doing an amazing job and making collaboration and communication look easy.
Transaction Management in General
In the world of databases, a transaction is a group of actions (normally a sequence) that is treated as a single unit of work.
This means that these actions should either all be completed and persisted to the DB, or none of them should take effect. In simpler terms, we should never find ourselves in a situation where some actions were saved to the DB while others failed and were lost. To better understand transaction management we should learn the ACID properties. ACID is an acronym for the four properties that a transaction must respect.
Detailed explanation of ACID properties
- Atomicity: As mentioned above, a transaction should be considered a single unit of work. This means that either the entire sequence of operations succeeds or it all fails. No partial success or partial failure.
One example: suppose we have an application whose service layer performs this sequence of operations during user registration:
- We create the user data in the users table
- Then we create a balance for the user in the balances table
- After that we create an initial record in the balance_history table
- Then we create some preferences in the preferences table
- Finally we create API credentials in the api_credentials table, and so on.
What atomicity means is that we should never allow a situation where we have inserted data in the balances table while the creation of the balance_history record failed, or vice versa. We should never find ourselves in a situation where we have created data in the api_credentials table while the creation of the user failed. All these operations have to succeed. If even one operation fails, then all of them have to fail and the DB state is rolled back.
Another example: in an application that transfers funds from one account to another, the atomicity property ensures that if a debit is made successfully from one account, the corresponding credit is made to the other account. We should never allow the debit operation to succeed while the credit operation fails.
Atomicity is also known as the ‘All or nothing rule’.
In the atomicity property there are two possible outcomes:
- Abort: If a transaction aborts, changes made to the database are not visible. We roll back the data to its previous state.
- Commit: If a transaction commits, changes made are visible.
- Consistency: In transaction management, we want the correctness of our data to be maintained before and after the transaction. We need to ensure our data is always in the state we expect, with no surprises. Consistency is about never allowing constraint violations. For example, in a banking system we should never allow the balance value to become negative. Another example: if a column should only contain numbers, we should never find it storing other kinds of values such as names.
- Isolation: In transaction management, each transaction should be independent of other transactions happening at the same time. Isolation is all about the concurrency of multiple transactions. Though different transactions can run concurrently, the database operations should still be executed in a way that avoids concurrency issues. One way to achieve this is to use locks or to enforce an isolation level on the DB. With locking, if a transaction is working on a given value in the DB and another transaction wants to work on the same value at the same time, the second transaction waits for the first one to finish before it proceeds. Some isolation violations are allowed in some cases, depending on the business requirements. We are going to talk about that too.
- Durability: Of all the ACID properties, durability is the simplest to understand. It just means that after a transaction successfully completes, its changes persist and are not lost, even in the event of a system failure. For example, in an application that transfers funds from one account to another, the durability property ensures that once the transfer has committed, the changes made to each account will not be lost.
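The "all or nothing" rule from the atomicity property can be sketched in plain Java. This is only an illustration under stated assumptions: the class name AtomicTransferSketch and the in-memory map standing in for the balances table are hypothetical, and a real database implements rollback with its own transaction log rather than by copying data.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the "all or nothing" rule. The in-memory map
// stands in for a database table (an assumption made for this sketch).
class AtomicTransferSketch {
    private Map<String, Long> balances = new HashMap<>();

    AtomicTransferSketch(Map<String, Long> initial) {
        balances.putAll(initial);
    }

    long balanceOf(String account) {
        return balances.get(account);
    }

    // Debits one account and credits another as a single unit of work.
    // Returns true on commit, false if the work was rolled back.
    boolean transfer(String from, String to, long amount) {
        Map<String, Long> snapshot = new HashMap<>(balances); // state before the transaction
        try {
            balances.put(from, balances.get(from) - amount);  // step 1: debit
            if (!balances.containsKey(to)) {
                throw new IllegalStateException("unknown account: " + to);
            }
            balances.put(to, balances.get(to) + amount);      // step 2: credit
            return true;                                      // commit: both steps are kept
        } catch (RuntimeException e) {
            balances = snapshot;                              // abort: restore the previous state
            return false;
        }
    }

    public static void main(String[] args) {
        AtomicTransferSketch bank =
                new AtomicTransferSketch(Map.of("alice", 100L, "bob", 20L));
        System.out.println(bank.transfer("alice", "bob", 30));   // both steps succeed
        System.out.println(bank.transfer("alice", "carol", 30)); // credit fails, debit is undone
        System.out.println(bank.balanceOf("alice"));
    }
}
```

In the second transfer the debit has already been applied when the credit fails, so restoring the snapshot is what prevents a partial result.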
Among all the properties of a transaction, isolation is the most troublesome, because where there is concurrency there is always a fire to fight.
As we mentioned above, isolation ensures that multiple transactions can run concurrently without affecting each other or causing each other to see incorrect data. We achieve that by specifying an isolation level. To understand isolation levels, we first need to learn the common anomalies that occur in databases when we deal with concurrent transactions:
- Dirty read: Let’s take an example: we have two transactions T1 and T2 dealing with the price of a product, which is initially $100. T1 reads the price as 100 and adds 50 to it, so the price becomes 150, but the transaction is not yet committed. T2 comes and reads the price as 150 (though it is not yet committed, it is visible to other transactions). Right after T2 has read the price, T1 tries to commit the change, but for some reason the commit fails and the price rolls back to 100. This means that T2 is now working with an amount that was never valid. We call that a ‘dirty read’.
- Non-repeatable read: Let’s take another example: T1 reads the balance from the table and gets 40. After that, T2 comes, updates the amount to 60 and commits. After doing some business logic, T1 reads the amount again and finds it is now 60, although it was 40 at the beginning of the transaction. T1 is confused because it doesn’t know what made the value change or who changed it. This scenario is called a non-repeatable read.
- Phantom read: Let’s take an example: T1 does a “select * from users where country = ‘US’” and receives 2 records as the result of the query. After that, T2 inserts another user with country = ‘US’ and commits. Later on, T1 runs the same query again and finds that the result now contains 3 records. There is one phantom record.
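The dirty read scenario from the list above can be replayed with a toy model. This is a minimal sketch, assuming a single stored value and a single writer; the class DirtyReadSketch and its method names are hypothetical and do not correspond to any database API.

```java
// Toy model of one stored value with a single writer (an assumption of this
// sketch). "committed" is the last committed price; "pending" holds the
// writer's uncommitted change, if any.
class DirtyReadSketch {
    private int committed;
    private Integer pending; // null when there is no uncommitted write

    DirtyReadSketch(int initial) {
        committed = initial;
    }

    // T1 changes the price but has not committed yet.
    void write(int value) {
        pending = value;
    }

    void commit() {
        if (pending != null) {
            committed = pending;
        }
        pending = null;
    }

    void rollback() {
        pending = null; // the uncommitted change is discarded
    }

    // A reader that is allowed to see uncommitted data: dirty reads are possible.
    int readUncommitted() {
        return pending != null ? pending : committed;
    }

    // A reader that only ever sees committed data.
    int readCommitted() {
        return committed;
    }

    public static void main(String[] args) {
        DirtyReadSketch price = new DirtyReadSketch(100);
        price.write(150);                            // T1: 100 + 50, not committed
        System.out.println(price.readUncommitted()); // T2 sees T1's uncommitted value
        price.rollback();                            // T1 fails and rolls back
        System.out.println(price.readUncommitted()); // the value T2 saw no longer exists
    }
}
```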
These anomalies cause data inconsistency and unreliability. To deal with them, databases offer isolation levels. There are four of them:
- Read Uncommitted:
This is the lowest isolation level. It allows one transaction to read the intermediate (uncommitted) data being processed by another transaction. This means we can still get dirty reads, non-repeatable reads and phantom reads. So why is it among the isolation levels? It exists so that we can state explicitly that we allow our concurrent transactions to affect each other: we know that the database supports stricter isolation levels, but we have decided to accept the possibility of concurrency problems for some reason (usually performance).
- Read Committed:
As the name suggests, this isolation level says we can only read data that is already committed. A transaction cannot read temporary data being processed by another transaction.
Let’s take an example: T1 fetches the price of product A, which is 100. It starts processing it and adds to it, so the value becomes 120. T2 comes and reads the price of the same product. As long as T1 has not yet committed its change, T2 will not see the temporary value. It will read the last committed value, 100.
This isolation level allows us to avoid dirty reads. But we still have the problem of non-repeatable reads: if T2 reads the price, T1 then commits its change, and T2 reads the price again, T2 will not get the same result as in its previous read, because the value has been modified in between. We also still have the problem of phantom reads. An example to illustrate this: if T1 executes “select * from users where age > 20”, it gets only committed data. Another transaction running concurrently can insert another user with age > 20 and commit. If T1 executes the same query again, there will be a new committed record, and thus a phantom read.
- Repeatable Read:
This isolation level ensures that within a transaction we can read the same rows as many times as we want and still get the same values. In a lock-based implementation this works because the rows returned by a query stay locked for the rest of the transaction: other transactions cannot modify or delete them until it finishes. (Some databases achieve the same effect by giving each transaction a consistent snapshot of the data instead of blocking other transactions.)
An example: if T1 does “select * from users where city = ‘paris’” and 15 rows meet that criterion, those 15 rows are protected. No other transaction can update or delete them while T1 is still running, so we avoid dirty reads and non-repeatable reads.
However, another transaction can still insert a new row that matches the query, so if T1 repeats the query it may get one more record than expected: a phantom read. This isolation level is the default in MySQL (InnoDB).
- Serializable:
This is the highest isolation level. It eliminates all the concurrency anomalies above by making concurrent transactions behave as if they had run one after another. In the simplest lock-based picture, the data a transaction touches (up to the whole table) is locked, and no other transaction can work on it until the first one finishes. Although this solves all our problems, it costs a lot in performance and throughput, so it should be used only when the business case really requires it.
Example: if T1 is fetching data from the users table, then in this picture no other transaction can do anything to that table until T1 finishes its operation. Clearly we get rid of all the concurrency problems here, but we can end up with a very slow DB.
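The four isolation levels and the anomalies each one still permits can be summarized as a small lookup table. The sketch below is an assumption-laden summary rather than any library's API: the enum name IsolationLevelSketch and the helper weakestPreventing are hypothetical, while the java.sql.Connection constants are the standard JDBC ones. The table reflects the minimum guarantees of each level; an individual database may be stricter than this.

```java
import java.sql.Connection;

// Which anomalies each isolation level still permits (minimum guarantees).
enum IsolationLevelSketch {
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, true, true, true),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED, false, true, true),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ, false, false, true),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, false, false, false);

    final int jdbcConstant;                  // value accepted by Connection.setTransactionIsolation
    final boolean dirtyReadPossible;
    final boolean nonRepeatableReadPossible;
    final boolean phantomReadPossible;

    IsolationLevelSketch(int jdbcConstant, boolean dirty, boolean nonRepeatable, boolean phantom) {
        this.jdbcConstant = jdbcConstant;
        this.dirtyReadPossible = dirty;
        this.nonRepeatableReadPossible = nonRepeatable;
        this.phantomReadPossible = phantom;
    }

    // Returns the weakest level (in declaration order) that rules out every
    // anomaly the caller wants to avoid.
    static IsolationLevelSketch weakestPreventing(boolean noDirty,
                                                  boolean noNonRepeatable,
                                                  boolean noPhantom) {
        for (IsolationLevelSketch level : values()) {
            boolean acceptable = (!noDirty || !level.dirtyReadPossible)
                    && (!noNonRepeatable || !level.nonRepeatableReadPossible)
                    && (!noPhantom || !level.phantomReadPossible);
            if (acceptable) {
                return level;
            }
        }
        return SERIALIZABLE; // not reached: SERIALIZABLE always qualifies
    }
}
```

For instance, a use case that only needs to avoid dirty reads can settle for READ_COMMITTED, whereas one that must avoid phantom reads needs SERIALIZABLE according to this table.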
Transaction management in Spring
Spring provides extensive support for transaction management and helps developers focus on business logic rather than worrying about the integrity of data in case of a system failure. Without Spring, developers have to write boilerplate code to begin a transaction, execute it, commit it or roll it back, and deal with isolation levels and transaction propagation levels. It gets worse when the DB has to scale and we have to decide when to route a transaction to the read replicas (or slave DB) and when to route it to the master DB.
Spring provides two ways to manage transactions:
- Programmatic Transaction Management
- Declarative Transaction Management
Programmatic Transaction management
With this approach, developers manage transactions explicitly in code, using the classes and methods provided for that purpose. We still write some code, but on top of an existing abstraction. This is better than doing everything on our own, but the developer still has to write and maintain that code. What if a single annotation on a method were enough for Spring to handle everything related to the transaction behind the scenes? That’s where declarative transaction management comes into the picture. The example below manages the transaction by hand through the JPA EntityManager:
// Obtain the JPA EntityTransaction from the EntityManager
EntityTransaction userNoteTransaction = entityManager.getTransaction();
try {
    // Begin transaction
    userNoteTransaction.begin();
    /* register user - query 1
       create note
       link note to user - query 2 */
    // Commit transaction
    userNoteTransaction.commit();
} catch (Exception exception) {
    // Roll back transaction
    userNoteTransaction.rollback();
    throw exception;
}
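The begin/commit/rollback boilerplate above is usually written once and reused through a callback. Spring offers this style through its TransactionTemplate class; the sketch below is not that class but a plain-Java stand-in showing the idea. SimpleTx, TxRunnerSketch and RecordingTx are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal transaction handle with the same three operations
// used in the example above.
interface SimpleTx {
    void begin();
    void commit();
    void rollback();
}

class TxRunnerSketch {
    // Runs the given work inside a transaction: commit on success,
    // rollback (and rethrow) on any runtime exception.
    static <T> T inTransaction(SimpleTx tx, Supplier<T> work) {
        tx.begin();
        try {
            T result = work.get();
            tx.commit();
            return result;
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}

// Stand-in transaction that only records the calls it receives.
class RecordingTx implements SimpleTx {
    final List<String> calls = new ArrayList<>();

    public void begin() { calls.add("begin"); }
    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
}
```

With such a helper, business code shrinks to something like TxRunnerSketch.inTransaction(tx, () -> registerUserAndNote()), where registerUserAndNote is a placeholder for the actual queries.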
Declarative Transaction Management
In declarative transaction management, we usually use the @Transactional annotation. Just by adding that annotation, Spring handles the transaction for us and we do not have to write any of the handling code ourselves. The @Transactional annotation is convenient because we as developers no longer need to think about how to start a transaction, how to execute it, and when to roll back or commit.
@Transactional
public void addNoteToSpecificUser() {
}
All of this is done automatically in a proxy that Spring creates to hold the transaction management code. So yes, all that boilerplate code is still there, we just don’t see it. When the application starts, Spring wraps each bean that has @Transactional methods in a proxy. When a caller invokes an annotated method, the call first reaches the proxy, which begins the transaction, calls the actual method, and then commits or rolls back. The logic related to transactions therefore lives only in the proxy.
Proxies are a hot topic in the AOP world. There is one important thing to know about how they work, so that we are not surprised when the @Transactional annotation appears to do nothing:
- A proxy only intercepts calls that come from outside the object (the class in our case). If a method annotated with @Transactional is called from another method within the same class, the call goes directly to the object and bypasses the proxy, so the annotation has no effect for that call.
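The self-invocation pitfall can be reproduced without Spring. The sketch below uses the JDK's java.lang.reflect.Proxy to wrap a service, and writes "begin" and "commit" markers to a log in place of a real transaction. NoteService, PlainNoteService and ProxySketch are hypothetical names, and Spring's own proxies are more elaborate than this, so treat it as an illustration of the mechanism only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

interface NoteService {
    void addNote();
    void addTwoNotes();
}

class PlainNoteService implements NoteService {
    private final List<String> log;

    PlainNoteService(List<String> log) {
        this.log = log;
    }

    public void addNote() {
        log.add("addNote");
    }

    public void addTwoNotes() {
        // Self-invocation: these calls go to "this", not through the proxy,
        // so no begin/commit markers are written around them.
        addNote();
        addNote();
    }
}

class ProxySketch {
    // Wraps every call made through the returned proxy in begin/commit markers.
    static NoteService transactional(NoteService target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("begin");
            try {
                Object result = method.invoke(target, args);
                log.add("commit");
                return result;
            } catch (InvocationTargetException e) {
                log.add("rollback");
                throw e.getCause(); // rethrow the exception raised by the target method
            }
        };
        return (NoteService) Proxy.newProxyInstance(
                NoteService.class.getClassLoader(),
                new Class<?>[] {NoteService.class},
                handler);
    }
}
```

Calling addTwoNotes() on the proxy produces a single begin/commit pair around both notes: the two inner addNote() calls never pass through the proxy, which is why an annotation on the inner method would be ignored in that situation.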
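The @Transactional annotation can be used at the class, interface, or method level. When placed on a class or interface, all methods within it become transactional. Note that an annotation placed on an interface is only guaranteed to be picked up when Spring uses interface-based proxies, so annotating the concrete class is usually the safer choice.
With @Transactional, important aspects such as isolation and propagation are handled automatically.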
As we have already discussed transaction isolation levels, let’s now discuss transaction propagation levels.
Spring defines seven propagation levels. This paper covers the following six (the seventh, NESTED, is mentioned in the comparison with JTA):
- REQUIRED
- SUPPORTS
- NOT_SUPPORTED
- REQUIRES_NEW
- NEVER
- MANDATORY
But before we define them, it is important to understand what transaction propagation means in the first place.
Transaction propagation describes how a transaction travels and behaves from one method to another. Suppose class A has a method that creates a transaction, and at some point that method calls another transactional method in class B. How should the method in class B behave with regard to the transaction started by class A? And if class A did not start a transaction, what should class B do about that?
- REQUIRED: Let’s use the example mentioned above. If the method in class B has the propagation level REQUIRED, it first checks whether the calling method (in class A) has an active transaction. If yes, it uses it. If not, it creates a new transaction. Either way a transaction must be there. This is the default: if you don’t specify a propagation level, this is the one the annotation applies.
- SUPPORTS: Here the method in class B says it can support a transaction but does not need one. If the calling method in class A has an active transaction, the method in class B uses it. Otherwise it continues without a transaction.
- NOT_SUPPORTED: Here the method in class B says it does not support transactions. If there is an active transaction, it is suspended and the method runs without a transaction. The suspended transaction resumes when the method returns.
- REQUIRES_NEW: When the method is called, a new transaction is always created. If there was already an existing one, it is suspended (not terminated) until the new transaction completes, and then it resumes.
- NEVER: The method being called says a transaction should never be present. If there is one, it throws an exception.
- MANDATORY: The method being called says there must be an existing transaction when it is called. If there is not, it throws an exception.
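The six propagation levels above boil down to a decision based on two inputs: the level declared on the called method, and whether the caller already has an active transaction. The sketch below writes that decision out as a table. It follows Spring's documented behaviour, in which an existing transaction is suspended (not ended) under REQUIRES_NEW and NOT_SUPPORTED. The enum and class names are hypothetical and this is not how Spring implements it internally.

```java
// The six propagation levels discussed in the text (names mirror Spring's Propagation enum).
enum PropagationSketch { REQUIRED, SUPPORTS, NOT_SUPPORTED, REQUIRES_NEW, NEVER, MANDATORY }

// Hypothetical labels for what happens when the transactional method is entered.
enum TxAction {
    JOIN_EXISTING,
    START_NEW,
    RUN_WITHOUT_TX,
    SUSPEND_AND_START_NEW,
    SUSPEND_AND_RUN_WITHOUT_TX,
    THROW_EXCEPTION
}

class PropagationRulesSketch {
    static TxAction decide(PropagationSketch propagation, boolean callerHasTransaction) {
        switch (propagation) {
            case REQUIRED:
                return callerHasTransaction ? TxAction.JOIN_EXISTING : TxAction.START_NEW;
            case SUPPORTS:
                return callerHasTransaction ? TxAction.JOIN_EXISTING : TxAction.RUN_WITHOUT_TX;
            case NOT_SUPPORTED:
                return callerHasTransaction
                        ? TxAction.SUSPEND_AND_RUN_WITHOUT_TX
                        : TxAction.RUN_WITHOUT_TX;
            case REQUIRES_NEW:
                return callerHasTransaction
                        ? TxAction.SUSPEND_AND_START_NEW
                        : TxAction.START_NEW;
            case NEVER:
                return callerHasTransaction ? TxAction.THROW_EXCEPTION : TxAction.RUN_WITHOUT_TX;
            case MANDATORY:
                return callerHasTransaction ? TxAction.JOIN_EXISTING : TxAction.THROW_EXCEPTION;
            default:
                throw new IllegalArgumentException("unknown propagation: " + propagation);
        }
    }
}
```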
The @Transactional annotation supports more than the isolation and propagation parameters. It also supports timeout and readOnly parameters (at least the annotation provided by Spring itself).
timeout: the number of seconds within which the transaction must complete. If it takes longer, the transaction is rolled back and an exception such as TransactionTimedOutException is thrown.
readOnly: false by default. When we set it to true, we tell Spring that the SQL statements in our transaction only read data. This is a hint that the transaction manager and the JDBC driver may use for optimizations. It is also very useful when the DB has scaled out to read replicas, because a routing data source can use the flag to send read-only transactions to the replicas. Spring does not do this routing on its own; it has to be configured.
So when using this annotation, Spring commits our operations, or rolls back if an exception is thrown. But not for just any type of exception. By default, Spring only rolls back on a RuntimeException (unchecked exception) or an Error. Checked exceptions do not make Spring roll back the changes. We can customize this behavior with two more parameters passed to the annotation:
rollbackFor: the exception classes that should additionally trigger a rollback. Here we can even add a checked exception.
noRollbackFor: the exception classes (typically RuntimeException subclasses) that the method should not roll back for.
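The rollback decision described above can be sketched as a small function. This is a simplified model, loosely based on how Spring's rule-based attributes are documented to behave: the most specific matching rule wins, and the default applies when no rule matches. RollbackRulesSketch and its methods are hypothetical names, not Spring APIs.

```java
import java.util.List;

class RollbackRulesSketch {
    // Number of superclass steps from the thrown type up to ruleType,
    // or -1 if the thrown type is not a subtype of ruleType.
    static int depth(Class<?> thrown, Class<?> ruleType) {
        int steps = 0;
        for (Class<?> c = thrown; c != null; c = c.getSuperclass(), steps++) {
            if (c == ruleType) {
                return steps;
            }
        }
        return -1;
    }

    static boolean shouldRollback(Throwable ex,
                                  List<Class<? extends Throwable>> rollbackFor,
                                  List<Class<? extends Throwable>> noRollbackFor) {
        int best = Integer.MAX_VALUE;
        Boolean decision = null;
        // The closest matching rule (smallest depth) decides the outcome.
        for (Class<? extends Throwable> rule : rollbackFor) {
            int d = depth(ex.getClass(), rule);
            if (d >= 0 && d < best) {
                best = d;
                decision = true;
            }
        }
        for (Class<? extends Throwable> rule : noRollbackFor) {
            int d = depth(ex.getClass(), rule);
            if (d >= 0 && d < best) {
                best = d;
                decision = false;
            }
        }
        if (decision != null) {
            return decision;
        }
        // Default when no rule matches: roll back on unchecked exceptions and errors only.
        return ex instanceof RuntimeException || ex instanceof Error;
    }
}
```

With empty rule lists, an IllegalStateException leads to a rollback while an IOException does not; adding IOException.class to rollbackFor flips the second case.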
Spring @Transactional annotation vs JTA @Transactional annotation
Note: In most Spring Boot applications you will find two @Transactional annotations. One is provided by Spring, the other is defined by JTA (the Java Transaction API, in the javax.transaction package). They are in different packages, so when using the annotation make sure you know which one you have imported. I recommend using the one provided by Spring, as it supports all the parameters discussed here. The one provided by javax does not support the timeout and readOnly parameters, and its parameters are named differently. I don’t mean Spring is better. In some cases JTA is better because it is not tied to a specific framework: if you are using the @Transactional annotation from JTA, you can later change the framework from Spring to, let’s say, Struts and your transactions will still work. But how likely are projects to move from Spring to another framework? Regardless, I will give a full overview of the differences between the two options.
Spring’s Transactional annotation comes with additional configuration compared to its JTA counterpart:
- Isolation – Spring offers transaction-scoped isolation through the isolation property; however, in JTA, this feature is available only at a connection level
- Propagation – available in both libraries, through the propagation property in Spring, and the value property in Java EE; Spring offers NESTED as an additional propagation type
- Read-Only – available only in Spring through the readOnly property
- Timeout – available only in Spring through the timeout property
- Rollback – both annotations offer rollback management; JTA provides the rollbackOn and dontRollbackOn properties, while Spring has rollbackFor and noRollbackFor, plus two additional properties: rollbackForClassName and noRollbackForClassName
Spring @Transactional Annotation Configuration
As an example, let’s use and configure the Spring Transactional annotation on a simple car service:
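import javax.persistence.EntityExistsException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
@Transactional(
    isolation = Isolation.READ_COMMITTED,
    propagation = Propagation.SUPPORTS,
    readOnly = false,
    timeout = 30)
public class CarService {
    @Autowired
    private CarRepository carRepository;

    // Method-level settings take precedence over the class-level annotation
    @Transactional(
        rollbackFor = IllegalArgumentException.class,
        noRollbackFor = EntityExistsException.class,
        rollbackForClassName = "IllegalArgumentException",
        noRollbackForClassName = "EntityExistsException")
    public Car save(Car car) {
        return carRepository.save(car);
    }
}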
JTA @Transactional Annotation Configuration
Let’s do the same for a simple rental service using the JTA Transactional annotation:
import javax.persistence.EntityExistsException;
import javax.transaction.Transactional;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
@Transactional(Transactional.TxType.SUPPORTS)
public class RentalService {
    @Autowired
    private CarRepository carRepository;

    @Transactional(
        rollbackOn = IllegalArgumentException.class,
        dontRollbackOn = EntityExistsException.class)
    public Car rent(Car car) {
        return carRepository.save(car);
    }
}
The JTA Transactional annotation applies to CDI-managed beans and classes defined as managed beans by the Java EE specification, whereas Spring’s Transactional annotation applies only to Spring beans.
It’s also worth noting that support for JTA 1.2 was introduced in Spring Framework 4.0. Thus, we can use the JTA Transactional annotation in Spring applications. However, the other way around is not possible since we can’t use Spring annotations outside the Spring context.