Understanding the first-level JPA cache

I'd bet that every Java developer has at least heard of the L1 (aka EntityManager or Session) cache. But is your understanding of it good enough? If you're not sure, consider reading this post.

First, we need to know what a persistence context is. According to the EntityManager JavaDoc:
"A persistence context is a set of entity instances in which for any persistent entity identity there is a unique entity instance. Within the persistence context, the entity instances and their lifecycle are managed." 
In fact, the first-level cache is the same thing as the persistence context. That means operations such as persist(), merge() and remove() change only the internal collections of the context; they are not immediately written to the underlying database, which happens when the context is flushed. What matters most here is what happens when you invoke the clear() method. It clears the L1 cache, and since L1 == persistence context, clearing it detaches every managed entity. Does that mean pending changes to those entities are lost? Yes: everything that has not been flushed is dropped and never synchronized with the database. That's no secret, the documentation states it: "Unflushed changes made to the entity (...) will not be synchronized to the database." But who cares about the docs? :)
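
To make the "L1 == persistence context" idea concrete, here is a toy model in plain Java. It is a sketch under simplifying assumptions, not how Hibernate or any other provider is implemented: the classes Person and ToyPersistenceContext are hypothetical, a Map<Long, String> stands in for a database table, and dirty checking is reduced to comparing each managed instance with a snapshot taken when it was loaded or last flushed. It shows the two properties discussed above: one instance per identity, and clear() silently discarding anything that was not flushed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy model, NOT a real JPA provider: the persistence context is
// an identity map (id -> instance) plus snapshots used for dirty checking.
class Person {
    final long id;
    String name;

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ToyPersistenceContext {
    private final Map<Long, String> database;                    // stands in for a table: id -> name
    private final Map<Long, Person> managed = new HashMap<>();    // the first-level cache
    private final Map<Long, String> snapshots = new HashMap<>();  // state last read from / written to the "database"

    ToyPersistenceContext(Map<Long, String> database) {
        this.database = database;
    }

    // Returns the same instance for a given id for as long as it stays managed.
    Person find(long id) {
        Person cached = managed.get(id);
        if (cached == null && database.containsKey(id)) {
            cached = new Person(id, database.get(id));
            managed.put(id, cached);
            snapshots.put(id, cached.name);
        }
        return cached;
    }

    // Only registers the instance in the context; nothing is written yet.
    void persist(Person p) {
        managed.put(p.id, p);
    }

    // Dirty checking: compare every managed instance with its snapshot
    // (no snapshot means "not inserted yet") and write the differences.
    void flush() {
        for (Person p : managed.values()) {
            if (!snapshots.containsKey(p.id) || !Objects.equals(snapshots.get(p.id), p.name)) {
                database.put(p.id, p.name);
                snapshots.put(p.id, p.name);
            }
        }
    }

    // Detaches everything; unflushed inserts and changes are simply forgotten.
    void clear() {
        managed.clear();
        snapshots.clear();
    }
}
```

With the map initially holding 1 -> "Alice", calling find(1L) twice returns the same Person instance. Renaming it and persisting a new Person(2L, "Bob") followed by clear() leaves the map unchanged, whereas the same steps followed by flush() update row 1 and insert row 2.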

So how does it look in practice? Take a look at the code below:

em.persist(myEntity); // saves entity to the context
em.flush(); // triggers insert into database
em.clear(); // removes entity from the context == entity is no longer managed

If you omit flush(), the entity won't hit the database. It will exist only as a plain object in your code and will be lost once the method that created it returns. Let's take a look at the next sample:

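myEntity.setName("old name");
em.persist(myEntity); em.flush(); em.clear(); // INSERT with name = "old name", then detach
myEntity.setName("new name"); // myEntity is detached, so this change is not tracked
em.flush(); // nothing to synchronize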

What will the name column in the database contain after this code runs? Still "old name", of course. The object in memory does hold "new name", but when setName("new name") is invoked the entity is no longer managed. To make the change subject to dirty checking, it has to be merged back into the persistence context with em.merge(myEntity), which copies its state onto a managed instance and returns that instance.
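
The same kind of toy model can illustrate why a detached instance is invisible to dirty checking and what merging does. Again this is a hypothetical sketch (the Customer and MergeContext classes are made up, and a map replaces the database); the merge method only mimics the documented contract of EntityManager.merge(): the state is copied onto a managed instance, which is returned, while the argument itself stays detached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy model, NOT a real JPA provider.
class Customer {
    final long id;
    String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class MergeContext {
    private final Map<Long, String> database;                     // id -> name
    private final Map<Long, Customer> managed = new HashMap<>();   // first-level cache
    private final Map<Long, String> snapshots = new HashMap<>();   // state last synchronized

    MergeContext(Map<Long, String> database) {
        this.database = database;
    }

    Customer find(long id) {
        Customer cached = managed.get(id);
        if (cached == null && database.containsKey(id)) {
            cached = new Customer(id, database.get(id));
            managed.put(id, cached);
            snapshots.put(id, cached.name);
        }
        return cached;
    }

    // Mimics merge(): copy the detached state onto a managed instance and
    // return that managed instance; the argument itself stays detached.
    Customer merge(Customer detached) {
        Customer target = find(detached.id);
        if (target == null) { // unknown id: treat it as a new entity
            target = new Customer(detached.id, detached.name);
            managed.put(target.id, target);
        }
        target.name = detached.name;
        return target;
    }

    // Only managed instances take part in dirty checking.
    void flush() {
        for (Customer c : managed.values()) {
            if (!snapshots.containsKey(c.id) || !Objects.equals(snapshots.get(c.id), c.name)) {
                database.put(c.id, c.name);
                snapshots.put(c.id, c.name);
            }
        }
    }

    void clear() {
        managed.clear();
        snapshots.clear();
    }
}
```

Load customer 1, call clear(), rename the detached instance and flush(): the map still holds "old name". Only after merge(c) and another flush() does "new name" reach the map, and it is the returned instance, not c, that is managed.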

But I never call flush() in my code and everything works!? Sure, but do you call clear()? That's what I thought. So what is the default flush behavior? By default (FlushModeType.AUTO), JPA flushes changes on commit and also before executing a query whose results could be affected by pending changes. If you switch to COMMIT with em.setFlushMode(FlushModeType.COMMIT), flushing is (as the name suggests) deferred until commit.
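
A toy sketch of the two flush modes, under the same assumptions as a plain-Java model: Item, FlushModeContext and ToyFlushMode are hypothetical, the latter merely mirroring the AUTO and COMMIT values of FlushModeType, and a map replaces the database. As a simplification, this toy flushes before every query in AUTO mode; the specification only requires that, within a transaction, a query sees pending changes that could affect its result, so real providers may skip the flush or achieve this by other means.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy model, NOT a real JPA provider.
enum ToyFlushMode { AUTO, COMMIT }

class Item {
    final long id;
    String name;

    Item(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class FlushModeContext {
    private final Map<Long, String> database;                  // id -> name
    private final Map<Long, Item> managed = new HashMap<>();    // first-level cache
    private final Map<Long, String> snapshots = new HashMap<>();
    private final ToyFlushMode flushMode;

    FlushModeContext(Map<Long, String> database, ToyFlushMode flushMode) {
        this.database = database;
        this.flushMode = flushMode;
    }

    Item find(long id) {
        Item cached = managed.get(id);
        if (cached == null && database.containsKey(id)) {
            cached = new Item(id, database.get(id));
            managed.put(id, cached);
            snapshots.put(id, cached.name);
        }
        return cached;
    }

    void flush() {
        for (Item i : managed.values()) {
            if (!snapshots.containsKey(i.id) || !Objects.equals(snapshots.get(i.id), i.name)) {
                database.put(i.id, i.name);
                snapshots.put(i.id, i.name);
            }
        }
    }

    // A "query" reads the database directly; in AUTO mode this toy flushes first.
    long countByName(String name) {
        if (flushMode == ToyFlushMode.AUTO) {
            flush();
        }
        return database.values().stream().filter(name::equals).count();
    }

    // Committing always flushes, whatever the mode.
    void commit() {
        flush();
    }
}
```

With AUTO, a renamed but unflushed item is already counted by countByName("new name"); with COMMIT the query still returns 0 until commit() flushes the change.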

A deep understanding of L1 behavior is especially important when dealing with batch operations. Why? If you want such operations to be efficient, you must flush changes manually from time to time (let's say every 100 operations). Did you know that flush() doesn't clear the persistence context? So what? Flush is not cheap, because it has to inspect every entity in the context to find out whether anything needs to be synchronized with the database. If you don't clear the context manually right after flush(), every subsequent flush takes longer and longer: each one costs time proportional to the number of managed entities, so the total time of the batch grows quadratically rather than linearly (and memory usage keeps growing as well). That seems like sufficient reason to remember this technique.
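
The cost argument can be checked with one more toy model (hypothetical Row and BatchContext classes; no real database or provider involved). Each flush walks the whole context and counts how many entities it had to inspect, while insertAll mimics the usual loop of persisting rows and flushing every batchSize of them, optionally followed by clear().

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT a real JPA provider: it only counts how many
// managed entities each flush has to inspect.
class Row {
    final long id;

    Row(long id) {
        this.id = id;
    }
}

class BatchContext {
    final Set<Long> database = new HashSet<>();                  // ids of "inserted" rows
    private final Map<Long, Row> managed = new LinkedHashMap<>(); // first-level cache
    private final Set<Long> synced = new HashSet<>();             // managed ids already written
    long checks = 0;                                              // entities inspected by all flushes

    void persist(Row r) {
        managed.put(r.id, r);
    }

    // Every flush walks the WHOLE context, so its cost grows with the context size.
    void flush() {
        for (Row r : managed.values()) {
            checks++;
            if (synced.add(r.id)) {
                database.add(r.id);
            }
        }
    }

    void clear() {
        managed.clear();
        synced.clear();
    }

    // Inserts n rows, flushing every batchSize rows and optionally clearing afterwards.
    static BatchContext insertAll(int n, int batchSize, boolean clearAfterFlush) {
        BatchContext ctx = new BatchContext();
        for (long i = 1; i <= n; i++) {
            ctx.persist(new Row(i));
            if (i % batchSize == 0) {
                ctx.flush();
                if (clearAfterFlush) {
                    ctx.clear();
                }
            }
        }
        if (n % batchSize != 0) {
            ctx.flush(); // leftovers from the last, incomplete batch
        }
        return ctx;
    }
}
```

Inserting 1000 rows with a batch size of 100 costs 1000 inspections when the context is cleared after each flush, but 5500 (100 + 200 + ... + 1000) without clearing; for 2000 rows the numbers are 2000 and 21000. The first grows linearly, the second quadratically, and the uncleared context also keeps every entity in memory.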

If you'd like a deeper dive into the persistence context, feel free to clone and play with this repository, which contains all the cases and examples described here.

