Sunday, November 8, 2015

Smart package structure to improve testability

There are many ways of dividing a whole application into packages. Discussions about the pros and cons of packaging by feature versus packaging by layer can be found on many programming blogs and forums. I want to approach this topic starting from testability and see if it leads to any meaningful result.

To begin with, let's try to describe what we usually want to test in our applications across different layers. Let's assume a standard three-tier architecture. At the bottom we have the data layer.

Depending on our attitude to Domain-Driven Design we'll try to maximize (for rich, business-oriented entities) or minimize (for anemic entities built only from getters and setters) test coverage. In the second case it's hard to even talk about tests, unless you don't trust Java and want to verify that get retrieves the value assigned earlier by a set invocation. For rich entities we definitely want to verify the correctness of the business logic. But to be honest, that can almost always be done with simple unit tests and a proper mock setup. There are often thousands of tests in this layer, so we want them to be as fast as possible. That's a great field for unit testing frameworks! Wait, why don't you want to test entities against a database? I can ask the opposite question - why should I? To verify that JPA or any other persistence API still works? Of course there are always some really complex queries that should be verified against a real database. For those cases I'll use integration tests at the repository level: just the database + repository + entities. But remember about single responsibility - your integration test checks only the query; leave the entity logic to unit tests.
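To give an idea, a unit test for such a rich entity could look roughly like this (a minimal sketch - Order, OrderItem and the discount rule are hypothetical, not taken from any real project):

import static org.junit.Assert.assertEquals;

import java.math.BigDecimal;
import org.junit.Test;

public class OrderTest {

    @Test
    public void shouldApplyVolumeDiscount() {
        // plain object construction - no database, no Spring context involved
        Order order = new Order();
        order.addItem(new OrderItem("book", new BigDecimal("150.00")));
        order.addItem(new OrderItem("headphones", new BigDecimal("250.00")));

        order.applyVolumeDiscount();

        // assuming a hypothetical 10% discount for orders above 300
        assertEquals(new BigDecimal("360.00"), order.getTotalValue());
    }
}

Thousands of tests like this run in milliseconds, which is exactly what we want in this layer.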

The next layer is usually built from services. In DDD services just work with repositories to load entities and delegate the whole business logic processing to them. As you can predict, those tests will be pretty simple. Do you think we need a database here? Will it provide any added value? I don't think so. And what about the second scenario - anemic entities in our model? The whole logic is concentrated in services, so we have to accumulate our test coverage in this layer. But as we already discussed with the domain logic, it can be done without touching external resources. One more time - all we need is a unit test. So still no database. We can run all tests based on repository mocks. No problems with managing datasets leading to "expected 3 but found 2" test failures, just because some other test committed one more order with a value between 200$ and 300$. Even if we want to use an IoC framework here, it can simulate the repository layer with mocks. Without proper decoupling from the data layer, the framework would automatically load repositories via some scanning mechanism - and that's not something we want.
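For illustration, a service test built on repository mocks could look more or less like this (OrderRepository, OrderService and the confirm method are assumed names, not from a real project):

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.Test;

public class OrderServiceTest {

    // the data layer is replaced by a mock - no database, no datasets to manage
    private final OrderRepository orderRepository = mock(OrderRepository.class);
    private final OrderService orderService = new OrderService(orderRepository);

    @Test
    public void shouldConfirmOrder() {
        Order order = new Order();
        when(orderRepository.findByUuid("abc-123")).thenReturn(order);

        orderService.confirm("abc-123");

        // the service should pass the modified entity back to the repository
        verify(orderRepository).save(order);
    }
}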

On top of the services we usually place something that allows users to use our application. It can be a frontend, a RESTful API, SOAP services, etc. What is important to check here? To be fair with our customers, we should stick to the contract we have with them. This alone could be material for a separate blog post, but narrowing it down to REST services:
 "If you will send we POST request to /users URL I'll answer with list of all users. Every user will have id as an integer and string with username."
OK - that looks like a contract. So what should we check in this layer? Whether this contract is honored, of course. Send an HTTP request and verify that the response contains an array of users, where every entry is built from an integer ID and a string username. Can we do it on top of service mocks? Sure :)
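As a sketch (UserController, UserService and the User class are assumed names), such a contract check could be written with MockMvc on top of a mocked service:

import static java.util.Collections.singletonList;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.Test;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.setup.MockMvcBuilders;

public class UserControllerContractTest {

    // the controller gets a mocked service, so neither a database nor a full context is needed
    private final UserService userService = mock(UserService.class);
    private final MockMvc mockMvc =
            MockMvcBuilders.standaloneSetup(new UserController(userService)).build();

    @Test
    public void shouldReturnUsersAccordingToContract() throws Exception {
        when(userService.findAll()).thenReturn(singletonList(new User(1L, "jerry")));

        mockMvc.perform(post("/users"))
                .andExpect(status().isOk())
                .andExpect(jsonPath("$[0].id").isNumber())
                .andExpect(jsonPath("$[0].username").isString());
    }
}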

So to encapsulate everything:

  • data layer = unit tests for logic and integration tests with DB for complex query validation
  • service layer = unit tests for logic and light integration tests without DB for testing IoC framework dependent logic
  • front layer = integration tests without DB to verify customer contract

So far we've described in detail what is worth testing at different levels. Now let's move to feature-based packaging. It definitely helps to keep the code well organized when it's built around different business contexts. For large applications it's what allows you to cut them down into many modules or even many applications. Without such a feature layout, those actions would require huge refactoring first. But is it still needed after splitting our monolith into applications? Just think about starting a new application. What will be its base package? com.my.company.application? That's nothing else than feature packaging :) But would you stop at this base package, or would you still split it into layers? As you see, those two structures can live together.

For a layer-based structure our application will look like below:
com.company.application
                      \.data
                           \.config
                           \.model
                           \.repository
                      \.service
                           \.config
                      \.api
                           \.config
                           \.controller

For a feature-based structure we'll get something like:
com.company.application
                      \.order
                      \.client
                      \.invoice

But business logic always grows, and that usually leads to splitting the whole application into modules or services, so finally we get:
com.company.application.order
                            \.data
                            \.service
                            \.api

com.company.application.client
                             \.data
                             \.service
                             \.api

com.company.application.invoice
                              \.data
                              \.service
                              \.api

To sum up: in my opinion packaging by layer is a must. It allows us to test each layer separately and keep our tests well organized. Packaging by feature is really useful in bigger projects. For microservices, which are built around a single bounded context, a more detailed division can make navigation uncomfortable. However, code inside a feature package should still be broken down into layers, for the same reasons as mentioned above. Especially with the Spring Framework, a layer-based structure helps us set up a useful component-scan, and won't force us to set up a database just because we want to start a context with two services. In my GitHub repository https://github.com/jkubrynski/spring-package-structure you can find a sample project based on Spring.
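For example, a test configuration for the service layer could be sketched like this (package and repository names are assumptions matching the layout above):

import org.mockito.Mockito;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;

@Configuration
@ComponentScan("com.company.application.service")
public class ServiceLayerTestConfig {

    // only the service package is scanned; the data layer is simulated with a mock
    @Bean
    public OrderRepository orderRepository() {
        return Mockito.mock(OrderRepository.class);
    }
}

Starting such a context doesn't touch the data layer at all, so no datasource configuration is required.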

Wednesday, November 4, 2015

JPA in case of asynchronous processing

A few years ago in the Java world it was almost obvious that every "enterprise"-class project needed JPA to communicate with the database. JPA is a perfect example of the "leaky abstraction" described by Joel Spolsky. Great and easy at the beginning, but hard to tune and limiting at the end. Hacking and working directly with caches, flushes and native queries is a daily routine for many backend developers involved in the data access layer. There are enough problems and workarounds to write a dedicated book, "JPA for hackers", but in this article I'll focus only on concurrent entity processing.

Let's assume the following situation: we have a Person entity which, in some business process, is updated by some service.

@Entity
public class Person {

    @Id
    @GeneratedValue
    private Long id;

    private String uuid = UUID.randomUUID().toString();

    private String firstName;

    private String lastName;

    // getters and setters

}

To ignore any domain complexity, we're just talking about updating the first and last name of a person. Of course it's a trivial use case, but it allows us to focus on the real issues instead of discussing domain modeling. We can imagine the code looks like below:

firstNameUpdater.update(personUuid, "Jerry");
lastNameUpdater.update(personUuid, "Newman");

After some time the business decided it was taking too long to update both elements, so reducing the duration became the top priority task. Of course there are a lot of different ways of doing this, but let's assume that in this particular case going concurrent will solve our pain. This seems to be trivially easy - we just need to annotate our service methods with @Async from Spring and voilà - problem solved. Really? We have two possible issues here, depending on whether we use the optimistic locking mechanism:

  • With optimistic locking it's almost certain that we'll get an OptimisticLockException from one of the update methods - the one which finishes second. And that's actually the better situation compared to not using optimistic locking at all.
  • Without versioning both updates will finish without any exceptions, but after loading the updated entity from the database we'll discover only one change. Why did that happen? Both methods were updating different fields! Why did the second transaction overwrite the other update? Because of the leaky abstraction :)
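For reference, the asynchronous variant could be sketched roughly like this (class and repository names are assumptions; @EnableAsync on a configuration class is also required):

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class FirstNameUpdater {

    private final PersonRepository personRepository;

    public FirstNameUpdater(PersonRepository personRepository) {
        this.personRepository = personRepository;
    }

    @Async
    @Transactional
    public void update(String personUuid, String firstName) {
        // runs in a separate thread and a separate transaction;
        // dirty checking flushes the change on commit
        Person person = personRepository.findByUuid(personUuid);
        person.setFirstName(firstName);
    }
}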

We know that Hibernate tracks changes made to our entities (it's called dirty checking). But to reduce the time needed to compile the query, by default it includes all fields in the update query instead of only those that changed. Looks strange? Fortunately we can configure Hibernate to work differently and generate update queries based on the actually changed values. This can be enabled with the @DynamicUpdate annotation. It can be considered a workaround for the partial-update problem, but remember it's a trade-off: now every update of this entity is more time-consuming than it was before.
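Enabling it is just one annotation on the entity (a fragment - the rest of the class stays as before):

import org.hibernate.annotations.DynamicUpdate;

import javax.persistence.Entity;

@Entity
@DynamicUpdate   // Hibernate now generates UPDATE statements containing only the changed columns
public class Person {
    // fields as shown earlier
}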

Now let's get back to the situation with optimistic locking. To be honest, what we want to do generally goes against the idea of such locking, which assumes that there probably won't be any concurrent modification of the entity and raises an exception when such a situation occurs. But now we definitely want concurrent modification! As a quick workaround we can exclude those two fields (firstName and lastName) from the locking mechanism. It can be achieved with @OptimisticLock(excluded = true) added to each field. Now updating the names won't trigger a version increment - the version will stay unmodified, which of course can become a source of many nasty and hard to find consistency issues.
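The relevant fragment of the entity could then look like this (the @Version field is assumed to exist in the optimistic-locking variant of the entity):

import org.hibernate.annotations.OptimisticLock;

import javax.persistence.Version;

@Version
private Long version;

@OptimisticLock(excluded = true)   // changes to this field no longer bump the version
private String firstName;

@OptimisticLock(excluded = true)
private String lastName;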
Last but not least, there is the "spin" solution: we wrap the update logic in a loop which retries the whole transaction whenever an optimistic lock exception occurs. This works better the fewer threads are involved in the process. Source code with all those solutions can be found on my GitHub in the jpa-async-examples repository - just explore the commits.
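For illustration only (this is my own sketch, not code from that repository), the retry wrapper could look more or less like this; with Spring's exception translation you may need to catch ObjectOptimisticLockingFailureException instead:

import javax.persistence.OptimisticLockException;

public void updateFirstNameWithRetry(String personUuid, String firstName) {
    while (true) {
        try {
            firstNameUpdater.update(personUuid, firstName);
            return;
        } catch (OptimisticLockException e) {
            // another transaction won the race - simply try again with a fresh read
        }
    }
}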

Wait - still no proper solution? In fact, no. Just by using JPA we're cut off from easy solutions to the concurrent modification problem. Of course we can remodel our application to introduce some event-based approach, but we still have JPA underneath. If we use Domain-Driven Design, we try to close the whole aggregate by using OPTIMISTIC_FORCE_INCREMENT locking, just to be sure that changing a composite entity, or adding an element to a collection, will update the whole aggregate, as it should protect its invariants. So why not use a direct access tool like JOOQ or JdbcTemplate? The idea is great, but unfortunately it won't work well side by side with JPA. Any modification done by JOOQ won't propagate to JPA automatically, which means the session or caches can contain outdated values.

To solve this situation properly, we should extract this context into a separate element - for example a new table handled directly with JOOQ. As you've probably noticed, doing such a concurrent update in plain SQL is extremely easy:

update person set first_name = 'Jerry' where uuid = ?;
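With a tool like JdbcTemplate the same thing stays a one-liner (a sketch; bean wiring omitted), keeping in mind the caveat above that JPA caches won't see this change automatically:

jdbcTemplate.update("update person set first_name = ? where uuid = ?", "Jerry", personUuid);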

With the JPA abstraction it becomes a really complex task which requires a really deep understanding of Hibernate's behavior as well as its implementation internals. To sum up, in my opinion JPA does not follow the "reactive" approach. It was built to solve some problems, but currently we face different problems, and in many applications persistence is not one of them.