Wednesday, November 4, 2015

JPA in case of asynchronous processing

A few years ago in the Java world it was almost taken for granted that every "enterprise" class project needed JPA to communicate with the database. JPA is a perfect example of the "leaky abstraction" described by Joel Spolsky: great and easy at the beginning, but hard to tune and limiting in the end. Hacking around and working directly with caches, flushes and native queries is a daily routine for many backend developers working on the data access layer. There are enough problems and workarounds to fill a dedicated book, "JPA for hackers", but in this article I'll focus only on concurrent entity processing.

Let's assume the following situation: we have a Person entity which is updated by some service as part of a business process.

import java.util.UUID;

public class Person {

    private Long id;

    // business identifier, generated once when the object is created
    private String uuid = UUID.randomUUID().toString();

    private String firstName;

    private String lastName;

    // getters and setters
}

To sidestep any domain complexity, we'll only talk about updating the first and last name of a person. Of course it's a trivial use case, but it allows us to focus on the real issues instead of discussing domain modeling. We can imagine the code looks like this:

firstNameUpdater.update(personUuid, "Jerry");
lastNameUpdater.update(personUuid, "Newman");

After some time the business decided that updating both elements takes too long, so reducing the duration became a top priority task. Of course there are a lot of different ways of doing it, but let's assume that in this particular case going concurrent will solve our pain. This seems to be trivially easy - we just need to annotate our service methods with @Async from Spring and voilà - problem solved. Really? We have two possible issues here, depending on whether the optimistic locking mechanism is used.

  • With optimistic locking it's almost certain that we'll get an OptimisticLockException from one of the update methods - the one which finishes second. And that's the better situation compared to not using optimistic locking at all.
  • Without versioning all updates will finish without any exceptions, but after loading the updated entity from the database we'll discover only one change. Why did that happen? Both methods were updating different fields! Why has the second transaction overwritten the other update? Because of the leaky abstraction :)
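
Both outcomes can be reproduced without any database. The snippet below is only a plain-Java simulation, not JPA code: Row, StaleRowException and the initial values "Tom" and "Smith" are made up for illustration, and writeAllColumns mimics an UPDATE that writes every column back, optionally guarded by a version check.

```java
// Plain-Java simulation of the two cases above; no JPA or database involved.
final class LostUpdateDemo {

    /** One row of the person table: two columns plus a version counter. */
    static final class Row {
        String firstName;
        String lastName;
        long version;

        Row(String firstName, String lastName, long version) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.version = version;
        }

        /** The snapshot a transaction works on after loading the row. */
        Row copy() {
            return new Row(firstName, lastName, version);
        }
    }

    /** Hypothetical stand-in for OptimisticLockException. */
    static final class StaleRowException extends RuntimeException {
        StaleRowException(String message) {
            super(message);
        }
    }

    /** Writes EVERY column of the snapshot back, like a full-column UPDATE. */
    static void writeAllColumns(Row table, Row snapshot, boolean checkVersion) {
        if (checkVersion && table.version != snapshot.version) {
            throw new StaleRowException("expected version " + snapshot.version
                    + " but found " + table.version);
        }
        table.firstName = snapshot.firstName;
        table.lastName = snapshot.lastName;
        table.version = snapshot.version + 1;
    }

    /** Two interleaved "transactions": both load first, then both write. */
    static Row runInterleaved(boolean checkVersion) {
        Row table = new Row("Tom", "Smith", 0);
        Row tx1 = table.copy();
        Row tx2 = table.copy();
        tx1.firstName = "Jerry";   // first updater touches only firstName
        tx2.lastName = "Newman";   // second updater touches only lastName
        writeAllColumns(table, tx1, checkVersion);
        // tx2 still carries the stale firstName: it either overwrites or is rejected
        writeAllColumns(table, tx2, checkVersion);
        return table;
    }

    public static void main(String[] args) {
        Row lost = runInterleaved(false);
        System.out.println(lost.firstName + " " + lost.lastName); // Tom Newman
        try {
            runInterleaved(true);
        } catch (StaleRowException e) {
            System.out.println("second writer rejected: " + e.getMessage());
        }
    }
}
```

Without the version check the first name silently reverts to the stale value, because the second writer sends back everything it loaded. With the check the second writer fails loudly instead.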

We know that Hibernate tracks changes made to our entities (this is called dirty checking). But to reduce the time needed to prepare the query, by default it includes every field in the update statement instead of only the changed ones. Looks strange? Fortunately we can configure Hibernate to work differently and generate update queries based on the values that actually changed. This can be enabled with the @DynamicUpdate annotation. It can be considered a workaround for the partial-update problem, but you have to remember it's a trade-off: the statement now has to be generated for each update, so every update of this entity is more time-consuming than it was before.
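
To make the difference concrete, here is a rough sketch of the two kinds of statements. It is not Hibernate's implementation, just an illustration under my own assumptions: UpdateSqlSketch, its methods and the column names are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;

// Illustration only: shows the shape of a full-column vs. a "dynamic" UPDATE.
final class UpdateSqlSketch {

    /** Builds an UPDATE that sets exactly the given columns (must not be empty). */
    static String updateSql(Map<String, Object> columns) {
        StringJoiner set = new StringJoiner(", ");
        for (String column : columns.keySet()) {
            set.add(column + " = ?");
        }
        return "update person set " + set + " where id = ?";
    }

    /** Keeps only the columns whose value differs from the loaded snapshot. */
    static Map<String, Object> dirtyColumns(Map<String, Object> loaded,
                                            Map<String, Object> current) {
        Map<String, Object> dirty = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(loaded.get(entry.getKey()), entry.getValue())) {
                dirty.put(entry.getKey(), entry.getValue());
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        Map<String, Object> loaded = new LinkedHashMap<>();
        loaded.put("uuid", "abc");
        loaded.put("first_name", "Tom");
        loaded.put("last_name", "Smith");

        Map<String, Object> current = new LinkedHashMap<>(loaded);
        current.put("first_name", "Jerry");

        // Default behaviour: one fixed statement, reusable for every update.
        System.out.println(updateSql(current));
        // Dynamic behaviour: the statement depends on what changed this time.
        System.out.println(updateSql(dirtyColumns(loaded, current)));
    }
}
```

The first statement is identical for every update of the entity, which is why it can be prepared once. The second one has to be assembled per update, which is the cost mentioned above.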

Now let's get back to the situation with optimistic locking. To be honest, what we want to do is generally at odds with the idea of such locking, which assumes that there probably won't be any concurrent modification of the entity and raises an exception when one does occur. Now we definitely want concurrent modification! As a quick workaround we can exclude those two fields (firstName and lastName) from the locking mechanism. This can be achieved with @OptimisticLock(excluded = true) added on each field. Now updating the names won't trigger a version increment - the version stays unmodified, which of course can be a source of many nasty and hard-to-find consistency issues.
The last, but not least, solution is a spin-style retry. To use it we have to wrap the update logic in a loop which restarts the whole transaction whenever an OptimisticLockException occurs. The fewer threads are involved in the process, the better it works. Source code with all those solutions can be found on my GitHub in the jpa-async-examples repository. Just explore the commits.
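
The retry idea can be sketched in plain Java as well. This is not the code from the repository, only a simulation under my own assumptions: an AtomicReference plays the role of the versioned row, ConflictException stands in for OptimisticLockException, and all names and initial values are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Simulation of "retry the whole transaction on an optimistic lock failure".
final class RetryOnConflictDemo {

    /** Immutable, versioned snapshot of a person row. */
    static final class PersonRow {
        final String firstName;
        final String lastName;
        final long version;

        PersonRow(String firstName, String lastName, long version) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.version = version;
        }
    }

    /** Hypothetical stand-in for OptimisticLockException. */
    static final class ConflictException extends RuntimeException {
    }

    /** One "transaction": load, modify, write; fails if someone committed in between. */
    static void updateOnce(AtomicReference<PersonRow> table, UnaryOperator<PersonRow> change) {
        PersonRow loaded = table.get();
        PersonRow changed = change.apply(loaded);
        PersonRow toWrite = new PersonRow(changed.firstName, changed.lastName, loaded.version + 1);
        if (!table.compareAndSet(loaded, toWrite)) {
            throw new ConflictException();
        }
    }

    /** Repeats the whole transaction, including the load, until it commits. */
    static void updateWithRetry(AtomicReference<PersonRow> table,
                                UnaryOperator<PersonRow> change, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                updateOnce(table, change);
                return;
            } catch (ConflictException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up instead of spinning forever
                }
            }
        }
    }

    /** Runs both name updates concurrently and returns the final row. */
    static PersonRow runConcurrently() {
        AtomicReference<PersonRow> table =
                new AtomicReference<>(new PersonRow("Tom", "Smith", 0));
        CompletableFuture<Void> first = CompletableFuture.runAsync(() ->
                updateWithRetry(table, p -> new PersonRow("Jerry", p.lastName, p.version), 100));
        CompletableFuture<Void> last = CompletableFuture.runAsync(() ->
                updateWithRetry(table, p -> new PersonRow(p.firstName, "Newman", p.version), 100));
        CompletableFuture.allOf(first, last).join();
        return table.get();
    }

    public static void main(String[] args) {
        PersonRow result = runConcurrently();
        System.out.println(result.firstName + " " + result.lastName + " v" + result.version);
    }
}
```

The important detail is that the retry re-reads the row before applying the change again. Retrying only the write with the stale snapshot would fail forever or bring back the lost update.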

Wait - still no proper solution? In fact, no. Simply by using JPA we've cut ourselves off from the easy solutions to the concurrent modification problem. Of course we can remodel our application to introduce some event-based approach, but we still have JPA on top. If we use Domain-Driven Design, we try to lock the whole aggregate with OPTIMISTIC_FORCE_INCREMENT, just to be sure that changing a composite entity or adding an element to a collection updates the version of the whole aggregate, as the aggregate should protect its invariants. So why not use a direct access tool such as jOOQ or JdbcTemplate? The idea is great, but unfortunately it won't work side by side with JPA. Any modification done by jOOQ won't propagate to JPA automatically, which means the session or the caches can contain outdated values.

To solve this situation properly, we should extract this context into a separate element - for example a new table, which would be handled directly with jOOQ. As you have probably noticed, doing such a concurrent update in SQL is extremely easy:

update person set first_name = 'Jerry' where uuid = ?;

With the JPA abstraction it becomes a really complex task which requires a deep understanding of Hibernate's behavior as well as its implementation internals. To sum up, in my opinion JPA does not follow the "reactive" approach. It was built to solve certain problems, but today we face different ones, and in many applications persistence is not one of them.

1 comment:

Ronald said...

Treat the entity as a document rather than scalar fields, and always have a service wrapping the call to the JPA layer which is able to understand the context of the update.