Saturday, December 3, 2016

Why should you care about equals and hashcode

equals and hashCode are fundamental elements of every Java object. Their correctness and performance are crucial for your applications. However, we often see even experienced programmers neglecting this part of class development. In this post, I will go through some common mistakes and issues related to these two very basic methods.

Contract

What is crucial about the mentioned methods is something called the "contract." There are three rules about hashCode and five about equals (you can find them in the Javadoc for the Object class), but we'll talk about the three essential ones. Let's start with hashCode():

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified."

That means the contract does not force the hash code of an object to be immutable - it only has to stay constant as long as the fields used in equals don't change. So let's take a look at the code of a really simple Java object:

import java.util.Objects;
import java.util.UUID;

public class Customer {

 private UUID id;
 private String email;

 public UUID getId() {
  return id;
 }

 public void setId(final UUID id) {
  this.id = id;
 }

 public String getEmail() {
  return email;
 }

 public void setEmail(final String email) {
  this.email = email;
 }

 @Override
 public boolean equals(final Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  final Customer customer = (Customer) o;
  return Objects.equals(id, customer.id) &&
    Objects.equals(email, customer.email);
 }

 @Override
 public int hashCode() {
  return Objects.hash(id, email);
 }
}

As you probably noticed, equals and hashCode were generated automatically by our IDE. They are clearly based on mutable fields, and such classes are definitely widely used. Maybe if such classes are so common, there is nothing wrong with this implementation? So let's take a look at a simple usage example:

def "should find cart for given customer after correcting email address"() {
 given:
  Cart sampleCart = new Cart()
  Customer sampleCustomer = new Customer()
  sampleCustomer.setId(UUID.randomUUID())
  sampleCustomer.setEmail("emaill@customer.com")

  HashMap customerToCart = new HashMap<>()

 when:
  customerToCart.put(sampleCustomer, sampleCart)

 then:
  customerToCart.get(sampleCustomer) == sampleCart
 and:
  sampleCustomer.setEmail("email@customer.com")
  customerToCart.get(sampleCustomer) == sampleCart
}

In the above test, we want to ensure that after changing the email of a sample customer we're still able to find its cart. Unfortunately, this test fails. Why? Because HashMap stores keys in "buckets." Every bucket holds a particular range of hashes; this idea is what makes hash maps so fast. But what happens if we store the key in the first bucket (responsible for hashes between 1 and 10), and then the hashCode method returns 11 instead of 5 (because it's based on mutable state)? The hash map tries to find the key, but it checks the second bucket (holding hashes 11 to 20), and that one is empty. So there is simply no cart for the given customer. That's why having immutable hash codes is so important! The simplest way to achieve it is to use immutable objects. If for some reason that's impossible in your implementation, then remember to limit the hashCode method to only the immutable parts of your objects.
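For the Customer above, a minimal sketch of that approach could base both methods only on the id field - assuming the id is assigned once at construction time and never changed afterwards:

 @Override
 public boolean equals(final Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  return Objects.equals(id, ((Customer) o).id);
 }

 @Override
 public int hashCode() {
  // only the immutable id contributes to the hash, so it stays stable
  // even when the mutable email field changes
  return Objects.hash(id);
 }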

The second hashCode rule tells us that if two objects are equal (according to the equals method), their hashes must be the same. That means those two methods must be related, which can be achieved by basing them on the same information (basically the same fields).

The last rule tells us about equals transitivity. It looks trivial, but it's not - at least when you think about inheritance. Imagine we have a date class extended by a date-time class. It's easy to implement the equals method for a date - when both dates are the same we return true. The same goes for date-times. But what happens when I want to compare a date to a date-time? Is it enough that they have the same day, month and year? We can't compare hours and minutes, as this information is not present on a date. If we decide to use such an approach, we're screwed. Please analyze the example below:
 2016-11-28 == 2016-11-28 12:20
 2016-11-28 == 2016-11-28 15:52
Due to the transitive nature of equals, we could conclude that 2016-11-28 12:20 is equal to 2016-11-28 15:52, which is of course nonsense - but it follows directly from the equals contract.
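To make the trap more concrete, here is a small illustrative sketch (hypothetical SimpleDate and SimpleDateTime classes, not real java.time types) of an equals that compares only the shared fields:

import java.util.Objects;

class SimpleDate {
 final int year, month, day;

 SimpleDate(int year, int month, int day) {
  this.year = year;
  this.month = month;
  this.day = day;
 }

 @Override
 public boolean equals(Object o) {
  if (!(o instanceof SimpleDate)) return false;
  final SimpleDate other = (SimpleDate) o;
  // compares only the fields both types share - this is the broken part
  return year == other.year && month == other.month && day == other.day;
 }

 @Override
 public int hashCode() {
  return Objects.hash(year, month, day);
 }
}

class SimpleDateTime extends SimpleDate {
 final int hour, minute;

 SimpleDateTime(int year, int month, int day, int hour, int minute) {
  super(year, month, day);
  this.hour = hour;
  this.minute = minute;
 }
 // equals is inherited, so the time of day is ignored in comparisons
}

Both new SimpleDateTime(2016, 11, 28, 12, 20) and new SimpleDateTime(2016, 11, 28, 15, 52) are "equal" to new SimpleDate(2016, 11, 28), so by transitivity they should be equal to each other - even though they represent different moments.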

JPA use-case

Now let's talk about JPA. It looks like implementing the equals and hashCode methods here is really simple. We have a unique primary key for each entity, so an implementation based on this information looks right. But when is this unique ID assigned? During object creation or only after flushing changes to the database? If you're assigning the ID manually it's OK, but if you rely on the underlying engine you can fall into a trap. Imagine such a situation:

public class Customer {

 @OneToMany(cascade = CascadeType.PERSIST)
 private Set<Address> addresses = new HashSet<>();

 public void addAddress(Address newAddress) {
  addresses.add(newAddress);
 }

 public boolean containsAddress(Address address) {
  return addresses.contains(address);
 }
}

If the hashCode of the Address is based on the ID, then before saving the Customer entity we can assume all hash codes are equal to zero (because there is simply no ID yet). After flushing the changes the ID is assigned, which also results in a new hash code value. Now you can invoke the containsAddress method, but unfortunately it will always return false, for the same reasons explained in the first section about HashMap. How can we protect against such a problem? As far as I know, there is one valid solution - a UUID.

import java.util.Objects;
import java.util.UUID;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

class Address {

 @Id
 @GeneratedValue
 private Long id;
 
 private UUID uuid = UUID.randomUUID();

 // all other fields with getters and setters if you need

 @Override
 public boolean equals(final Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  final Address address = (Address) o;
  return Objects.equals(uuid, address.uuid);
 }

 @Override
 public int hashCode() {
  return Objects.hash(uuid);
 }
}

The uuid field (which can be a UUID or simply a String) is assigned during object creation and stays immutable during the whole entity lifecycle. It's stored in the database and loaded into the field right after querying for the object. It of course adds some overhead and footprint, but there is nothing for free. If you want to know more about the UUID approach, there are two brilliant posts covering it in depth.

Biased locking

For over ten years the default locking implementation in Java has used something called "biased locking." Brief information about this technique can be found in the flag description (source: Java Tuning White Paper):

-XX:+UseBiasedLocking 
Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.

What is interesting for us in the context of this post is how biased locking is implemented internally. Java uses the object header to store the ID of the thread holding the lock. The problem is that the object header layout is well defined (if you're interested, please refer to the OpenJDK sources, hotspot/src/share/vm/oops/markOop.hpp) and it cannot be "extended" just like that. In a 64-bit JVM the thread ID is 54 bits long, so we must decide whether we want to keep this ID or something else. Unfortunately, "something else" means the object hash code (in fact the identity hash code, which is stored in the object header). This value is used whenever you invoke the hashCode() method on any object which doesn't override it, or when you call System.identityHashCode() directly. That means that when you retrieve the default hash code for any object, you disable biased locking support for this object. It's pretty easy to prove. Take a look at this code:

class BiasedHashCode {

 public static void main(String[] args) {
  Locker locker = new Locker();
  locker.lockMe();
  locker.hashCode();
 }

 static class Locker {
  synchronized void lockMe() {
   // do nothing
  }

  @Override
  public int hashCode() {
   return 1;
  }
 }
}

When you run the main method with the following VM flags:
-XX:BiasedLockingStartupDelay=0 -XX:+TraceBiasedLocking
you can see that... there is nothing interesting :)

However, after removing the hashCode implementation from the Locker class, the situation changes. Now we can find such a line in the logs:
Revoking bias of object 0x000000076d2ca7e0 , mark 0x00007ff83800a805 , type BiasedHashCode$Locker , prototype header 0x0000000000000005 , allow rebias 0 , requesting thread 0x00007ff83800a800

Why did it happen? Because we asked for the identity hash code. To sum up this part: relying on the default hashCode in your classes means no biased locking for those objects.

Big thanks to Nicolai Parlog from https://www.sitepoint.com/java/ for reviewing this post and pointing out some mistakes.

Sunday, November 8, 2015

Smart package structure to improve testability

There are many ways of dividing a whole application into packages. Discussions about the pros and cons of packaging by feature or by layer can be found on many programming blogs and forums. I want to approach this topic starting from testability and see whether it leads to any meaningful result.

At the beginning let's try to describe what we usually want to test in our applications across different layers. Let's assume a standard three-tier architecture. At the bottom we have the data layer.

Depending on our attitude to Domain-Driven Design, we'll try to maximize (for rich, business-oriented entities) or minimize (for anemic entities built only from getters and setters) test coverage. With the second approach it's hard to speak about any tests at all, unless you don't trust Java and want to verify that get can retrieve the value assigned before by a set invocation. For rich entities we definitely want to verify business logic correctness. But to be honest, that can almost always be done with simple unit tests and proper mock setup. There are often thousands of tests in this layer, so we want them to be maximally fast. That's a great field for unit testing frameworks! Wait - why don't you want to test entities against a database? I can ask the opposite question - why should I? To verify that JPA or any other persistence API still works? Of course there are always some really complex queries that should be verified with a real database underneath. For those cases I'll use integration tests at the repository level: just database + repository + entities. But remember about single responsibility. Your integration test should check only the query - leave the entity logic to unit tests.

The next layer is usually built from services. In DDD, services just work with repositories to load entities and delegate all business logic processing to them. As you can predict, those tests will be pretty simple. Do you think we need a database here? Will it provide any added value? I don't think so. And what about the second scenario - anemic entities in our model? The whole logic is concentrated in services, so we have to accumulate our test coverage in this layer. But as we already discussed with domain logic, it can be done without using external resources. One more time - all we need is a unit test. So still no database. We can run all tests with mocked repositories. No problems with managing datasets leading to "expected 3 but found 2" test failures, just because some other test committed one more order with a value between 200$ and 300$. Even if we want to use an IoC framework here, it can simulate the repository layer with mocks. Without proper decoupling from the data layer, the framework would automatically load repositories via some scanning mechanism - and that's not something we want.

On top of the services we usually place something allowing users to use our application. It can be a frontend, a RESTful API, SOAP services, etc. What is important to check here? To be fair with our customers we should stick to the contract we have with them. This alone could be material for a separate blog post, but narrowing it down to REST services:
 "If you will send we POST request to /users URL I'll answer with list of all users. Every user will have id as an integer and string with username."
OK - that looks like a contract. So what should we check in this layer? Of course whether this contract is honored. Send an HTTP request and verify that the response contains an array of users, where every entry is built from an integer ID and a string username. Can we do it on top of service mocks? Sure :)
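A minimal sketch of such a contract check, assuming Spring MVC test support, Mockito and JUnit are on the classpath - UserController, UserService and User are hypothetical names:

import static java.util.Arrays.asList;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.Test;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.setup.MockMvcBuilders;

public class UserContractTest {

 @Test
 public void shouldReturnUsersMatchingTheContract() throws Exception {
  // the service layer is mocked - no database and no full Spring context
  UserService userService = mock(UserService.class);
  when(userService.findAll()).thenReturn(asList(new User(1, "john")));

  MockMvc mockMvc = MockMvcBuilders.standaloneSetup(new UserController(userService)).build();

  mockMvc.perform(post("/users"))
    .andExpect(status().isOk())
    .andExpect(jsonPath("$[0].id").exists())
    .andExpect(jsonPath("$[0].username").exists());
 }
}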

So to encapsulate everything:

  • data layer = unit tests for logic and integration tests with DB for complex query validation
  • service layer = unit tests for logic and light integration tests without DB for testing IoC framework dependent logic
  • front layer = integration tests without DB to verify customer contract

So far we've described in detail what is worth testing at different levels. Now let's move to feature-based packaging. It definitely helps to keep the code well organized when it's built around different business contexts. For large applications it's something that allows you to cut the code into many modules or even many applications. Without such a feature layout, those actions would require huge refactoring first. But is it still needed after splitting our monolith into applications? Just think about starting a new application. What will be its base package? com.my.company.application? That's nothing else than feature packaging :) But would you stop at this base package, or would you still split it into layers? As you see, those two structures can live together.

For layer based structure our application will look like below:
com.company.application
                      \.data
                           \.config
                           \.model
                           \.repository
                      \.service
                           \.config
                      \.api
                           \.config
                           \.controller

For feature based we'll get something like
com.company.application
                      \.order
                      \.client
                      \.invoice

But usually, as the business logic grows, it leads to splitting the whole application into modules or services, so finally we get:
com.company.application.order
                            \.data
                            \.service
                            \.api

com.company.application.client
                             \.data
                             \.service
                             \.api

com.company.application.invoice
                              \.data
                              \.service
                              \.api

To sum up: in my opinion packaging by layer is a must. It allows us to test each layer separately and keep our tests well organized. Packaging by feature is really useful in bigger projects. For microservices which are built around a single bounded context, a more detailed division can lead to uncomfortable navigation. However, the code inside a feature package should still be broken down into layers, for the same reasons as mentioned above. Especially with the Spring Framework, a layer-based structure helps us set up a useful component-scan and won't force us to set up a database just because we want to start a context with two services. In my GitHub repository https://github.com/jkubrynski/spring-package-structure you can find a sample project based on Spring.
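As an illustration of that last point, a test configuration could scan only the service layer of a single feature and provide the repositories as mocks (package and class names below follow the example layout and are otherwise hypothetical):

import org.mockito.Mockito;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;

@Configuration
@ComponentScan("com.company.application.order.service")
class OrderServiceTestConfig {

 // beans from com.company.application.order.data are not scanned,
 // so the repository can be mocked instead of requiring a real database
 @Bean
 OrderRepository orderRepository() {
  return Mockito.mock(OrderRepository.class);
 }
}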

Wednesday, November 4, 2015

JPA in case of asynchronous processing

A few years ago in the Java world it was almost obvious that every "enterprise"-class project needed JPA to communicate with the database. JPA is a perfect example of the "leaky abstraction" described by Joel Spolsky. Great and easy at the beginning, but hard to tune and limiting in the end. Hacking and working directly with caches, flushes and native queries is a daily routine for many backend developers involved in the data access layer. There are enough problems and workarounds to write a dedicated book, "JPA for hackers", but in this article I'll focus only on concurrent entity processing.

Let's assume the following situation: we have a Person entity which in some business process is updated by a service.

@Entity
public class Person {

    @Id
    @GeneratedValue
    private Long id;

    private String uuid = UUID.randomUUID().toString();

    private String firstName;

    private String lastName;

    // getters and setters

}

To ignore any domain complexity we're talking about updating the first and last name of a person. Of course it's just a trivial use case, but it allows us to focus on the real issues instead of discussing domain modeling. We can imagine the code looks like below:

firstNameUpdater.update(personUuid, "Jerry");
lastNameUpdater.update(personUuid, "Newman");

After some time the business decided it's taking too long to update both elements, so reducing the duration becomes the top priority task. Of course there are a lot of different ways of doing it, but let's assume that in this particular case going concurrent will solve our pain. This seems to be trivially easy - we just need to annotate our service methods with @Async from Spring and voilà - problem solved. Really? We have two possible issues here, depending on whether we use the optimistic locking mechanism.

  • With optimistic locking it's almost certain that we'll get an OptimisticLockException from one of the update methods - the one which finishes second. And that's the better situation compared to not using optimistic locking at all.
  • Without versioning, both updates will finish without any exceptions, but after loading the updated entity from the database we'll discover only one change. Why did that happen? Both methods were updating different fields! Why did the second transaction overwrite the other update? Because of the leaky abstraction :)

We know that Hibernate is tracking changes (it's called dirty checking) made to our entities. But to reduce the time needed to compile the query, by default it includes all fields in the update query instead of only those that changed. Looks strange? Fortunately, we can configure Hibernate to work in a different way and generate update queries based on the actually changed values. It can be enabled with the @DynamicUpdate annotation. This can be considered a workaround for the partial-update problem, but you have to remember it's a trade-off: now every update of this entity is more time-consuming than it was before.
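A minimal sketch of the entity with this setting enabled (a Hibernate-specific annotation, shown here under the assumption that the rest of the Person mapping stays as above):

import javax.persistence.Entity;

import org.hibernate.annotations.DynamicUpdate;

@Entity
@DynamicUpdate  // update statements are built per flush and contain only the changed columns
public class Person {
    // id, uuid, firstName, lastName as in the example above
}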

Now let's get back to the situation with optimistic locking. To be honest, what we want to do is generally at odds with the idea of such locking, which assumes that there probably won't be any concurrent modification of the entity, and raises an exception when such a situation occurs. But now we definitely want concurrent modification! As a quick workaround we can exclude those two fields (firstName and lastName) from the locking mechanism. It can be achieved with @OptimisticLock(excluded = true) added on each field. Now updating the names won't trigger a version increment - the version will stay unmodified, which of course can be a source of many nasty and hard-to-find consistency issues.
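A sketch of what the exclusion could look like on the entity, assuming a @Version field is present (again a Hibernate-specific annotation):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

import org.hibernate.annotations.OptimisticLock;

@Entity
public class Person {

    @Id
    @GeneratedValue
    private Long id;

    @Version
    private Long version;

    @OptimisticLock(excluded = true)  // changes to this field no longer bump the version
    private String firstName;

    @OptimisticLock(excluded = true)
    private String lastName;
}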
Last but not least, there is the spin approach. To use it, we have to wrap the update logic in a loop which retries the whole transaction when an optimistic lock exception occurs. It works better the fewer threads are involved in the process. Source code with all those solutions can be found on my GitHub in the jpa-async-examples repository. Just explore the commits.
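A minimal sketch of that spin, assuming each update call opens its own transaction and reloads the entity, and that the exception is not translated by Spring (MAX_ATTEMPTS and the method name are illustrative):

public void updateFirstNameWithRetry(String personUuid, String firstName) {
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        try {
            // each attempt runs in a fresh transaction
            firstNameUpdater.update(personUuid, firstName);
            return;
        } catch (javax.persistence.OptimisticLockException e) {
            // concurrent modification detected - spin and try again
            // (with Spring this may surface as ObjectOptimisticLockingFailureException)
        }
    }
    throw new IllegalStateException("Unable to update person " + personUuid);
}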

Wait - still no proper solution? In fact, no. By using JPA we've closed ourselves off from easy solutions to the concurrent modification problem. Of course we can remodel our application to introduce some event-based approach, but we still have JPA underneath. If we use Domain-Driven Design, we try to lock the whole aggregate by using OPTIMISTIC_FORCE_INCREMENT, just to be sure that changing a composite entity, or adding an element to a collection, updates the whole aggregate, as it should protect the invariants. So why not use a direct access tool like, for example, JOOQ or JdbcTemplate? The idea is great, but unfortunately it won't work side by side with JPA. Any modification done by JOOQ won't propagate to JPA automatically, which means the session or caches can contain outdated values.

To solve this situation properly, we should extract this context into a separate element - for example a new table handled directly with JOOQ. As you probably noticed, doing such a concurrent update in SQL is extremely easy:

update person set first_name = 'Jerry' where uuid = ?;

With the JPA abstraction it becomes a really complex task which requires a really deep understanding of Hibernate behavior as well as its implementation internals. To sum up, in my opinion JPA does not follow the "reactive" approach. It was built to solve some problems, but currently we face different problems, and in many applications persistence is not one of them.

Sunday, May 17, 2015

Do we really still need a 32-bit JVM?

Even today (and it's 2015) we have two versions of the Oracle HotSpot JDK - adjusted to the 32-bit or the 64-bit architecture. The question is: do we really want to use a 32-bit JVM on our servers, or even laptops? There is a pretty popular opinion that we should! If you need only a small heap then use 32 bits - it has a smaller memory footprint, so your application will use less memory and will trigger shorter GC pauses. But is it true? I'll explore three different areas:
  1. Memory footprint
  2. GC performance
  3. Overall performance
Let's begin with memory consumption.

Memory footprint

It's known that the major difference between the 32-bit and 64-bit JVM relates to memory addressing. That means all references in the 64-bit version take 8 bytes instead of 4. Fortunately, the JVM comes with compressed object pointers, which are enabled by default for all heaps smaller than 26GB. This limit is more than OK for us, as long as the 32-bit JVM can address only around 2GB (depending on the target OS it's still about 13 times less). So no worries about object references. The only thing that differs in the object layout is the mark header, which is 4 bytes bigger on 64 bits. We also know that all objects in Java are 8-byte aligned, so there are two possible cases:

  • worst - on 64 bits the object is 8 bytes bigger than on 32 bits. It's because adding 4 bytes to the header pushes the object into another memory slot, so we have to add 4 more bytes to fill the alignment gap.
  • best - objects on both architectures have the same size. It happens when on 32 bits we have a 4-byte alignment gap, which can simply be filled by the additional mark header bytes.
Let's now calculate both cases, assuming two different application sizes. IntelliJ IDEA with a pretty big project loaded contains about 7 million objects - that will be our smaller project. For the second option let's assume that we have a big project (I'll call it Huge) containing 50 million objects in the live set. Let's now calculate the worst case:
  • IDEA ->  7 million * 8 bytes =  53 MB
  • Huge -> 50 million * 8 bytes = 381 MB

The above calculations show us that in the worst case the real application footprint grows by around 50MB of heap for IntelliJ and by around 400MB for some huge, highly granulated project with really small objects. In the second case it can be around 25% of the total heap, but for the vast majority of projects it's around 2%, which is almost nothing.

GC Performance

The idea is to put 8 million String objects into a cache with a Long key. One test consists of 4 invocations, which means 24 million puts into the cache map. I used Parallel GC with the total heap size set to 2GB. The results were pretty surprising, because the whole test finished sooner on the 32-bit JDK: 3 minutes 40 seconds compared to 4 minutes 30 seconds on the 64-bit virtual machine. After comparing the GC logs we can see that the difference mostly comes from GC pauses: 114 seconds versus 157 seconds. That means the 32-bit JVM in practice brings much lower GC overhead - 554 pauses compared to 618 for 64 bits. Below you can see screenshots from GC Viewer (both with the same scale on both axes).

32bit JVM Parallel GC

64bit JVM Parallel GC
I was expecting a smaller overhead for the 64-bit JVM, but the benchmarks show that even though total heap usage is similar, on 32 bits we are freeing more memory on Full GC. Young generation pauses are also similar - around 0.55 seconds for both architectures. But the average major pause is higher on 64 bits - 3.2 seconds compared to 2.7 on 32 bits. That proves GC performance for a small heap is much better on the 32-bit JDK. The question is whether your applications are so demanding on the GC - in the test the average throughput was around 42-48%.

The second test was performed on a more "enterprise" scenario. We're loading entities from a database and invoking the size() method on the loaded list. For a total test time of around 6 minutes, we have 133.7s of total pause time for 64 bits and 130.0s for 32 bits. Heap usage is also pretty similar - 730MB for the 64-bit and 688MB for the 32-bit JVM. This shows us that for normal "enterprise" usage there are no big differences between GC performance on the two JVM architectures.


32bit JVM Parallel GC selects from DB

64bit JVM Parallel GC selects from DB
Even with similar GC performance, the 32-bit JVM finished the work 20 seconds earlier (which is around 5%).

Overall performance

It's of course almost impossible to verify JVM performance in a way that holds for all applications, but I'll try to provide some meaningful results. First let's check time performance.

Benchmark                    32bits [ns]   64bits [ns]   ratio

System.currentTimeMillis()       113.662        22.449    5.08
System.nanoTime()                128.986        20.161    6.40

findMaxIntegerInArray           2780.503      2790.969    1.00
findMaxLongInArray              8289.475      3227.029    2.57
countSinForArray                4966.194      3465.188    1.43

UUID.randomUUID()               3084.681      2867.699    1.08

As we can see, the biggest and definitely significant difference is for all operations related to long variables. Those operations are between 2.6 and 6.4 times faster on the 64-bit JVM. Working with integers is pretty similar, and generating a random UUID is faster by just around 7%. What is worth mentioning is that interpreted code (-Xint) has a similar speed on both - only the JIT for the 64-bit version is much more efficient. So are there any particular differences? Yes! The 64-bit architecture comes with additional processor registers which are used by the JVM. After checking the generated assembly, it looks like the performance boost mostly comes from the possibility of using 64-bit registers, which simplifies long operations. Other changes can be found, for example, on the relevant wiki page. If you want to run this on your machine, you can find all the benchmarks on my GitHub - https://github.com/jkubrynski/benchmarks_arch
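For reference, a minimal sketch of how such a long-focused benchmark could look with JMH - the real benchmarks live in the repository above, and the names here are purely illustrative:

import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LongArrayBenchmark {

    private long[] values;

    @Setup
    public void prepare() {
        values = new Random(42).longs(1024).toArray();
    }

    @Benchmark
    public long findMaxLongInArray() {
        // plain loop over 64-bit values - the JIT can keep them in 64-bit registers
        long max = Long.MIN_VALUE;
        for (long value : values) {
            if (value > max) {
                max = value;
            }
        }
        return max;
    }
}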

Conclusions

As in the whole IT world, we cannot simply answer "yes, you should always use the **-bit JVM". It strongly depends on your application characteristics. As we saw, there are many differences between the 32-bit and 64-bit architectures. Even though the JIT performance for long-related operations is a few hundred percent better on 64 bits, the tested batch processes finished earlier on the 32-bit JVM. To conclude - there is no simple answer. You should always check which architecture fits your requirements better.

Big thanks to Wojtek Kudla for reviewing this article and enforcing additional tests :)


UPDATE 24.05.2015

Build tools

After all, I decided to do one more performance comparison for the different JVM architectures. The results are quite surprising.

For Gradle I've cloned the Spring Framework project. Results for Gradle 2.4 on Linux 4.0.4-301.fc22.x86_64 and JVM 1.8.0_45 (./gradlew --parallel clean build) look as follows (average of 7 builds per architecture):
  • 32 bits => 5m38.6s
  • 64 bits => 6m55.3s
That shows us the 32-bit architecture gives us around a 23% boost compared to 64 bits. Considering that all we have to do is change JAVA_HOME, it's pretty nice.

I've also run tests for Apache Maven on the Spring Boot sources. Running builds on Maven 3.3.3, the same Linux kernel and the same JVM shows that there's no difference at all between the architectures.
  • 32 bits => 12m01.5s
  • 64 bits => 11m59.0s
That's not what I expected, but I've run those tests many times with very similar results. I assume it's something related more to the particular project than to the build tool.

Of course it's not certain that all projects will see (or miss) the same improvement, as it depends on the particular sources, tests, etc. To get some meaningful conclusions we should measure many different projects, but that's not something I'm going to do in this article. I just wanted to show that it's worth checking whether your project builds faster on the 32-bit architecture - because when it does, it can be a really cheap and significant improvement.

Wednesday, March 18, 2015

Using jstat to report custom JVM metric sets

I've always missed the possibility to configure custom headers in jstat. Of course there are a lot of predefined data sets, but it would be nicer if we could create our own. And as you probably already guessed, I'm writing this post because such functionality is indeed available :) Unfortunately I haven't found it in any documentation, so now I'll try to fill this gap.

The first thing we have to do is provide a custom descriptor with possible jstat options. This descriptor is just a text file containing something we'll call the "jstat specification language". To make this custom file available to jstat we should place it in the following path: $HOME/.jvmstat/jstat_options
If you want to view the bundled options, please refer to the file in the OpenJDK repository.

The specification language is pretty similar to JSON, and it contains a group of option elements. Each option should be treated as a set of columns that can be shown in a single jstat execution. Just to name some of the predefined options: gcnew, gccause or printcompilation.

Each option element consists of several column segments. I think it's quite obvious what a column means :) What's most important in this descriptor is the column specification.

Each column must contain at least two nodes: header and data. The header is used to describe the column and can be aligned using the special char ^, which I'll call "the grip". The grip sticks the header to a particular side of the column, so:
  • ^Name will be aligned to the left,
  • ^Name^ will be centered,
  • Name^ will be aligned to the right.
The next important node is data. It uses PerfCounter metrics and is able to perform some basic arithmetic operations - like add, subtract, divide and multiply - as well as use parentheses to group operations. If you want to see all metrics that are available via this mechanism, just invoke
$jcmd <PID> PerfCounter.print
and see the output values.

A minimal sample file can look like this:
option customgc {
  column {
    header "Tenuring"
    data sun.gc.policy.tenuringThreshold
  }
}
When we invoke it using
$jstat -customgc <PID> 1s 3
we'll see something like:
Tenuring
6
4
5

We can also use the operations to show, for example, the combined young generation usage:
option customgc {
  column {
    header "YoungC"
    data sun.gc.generation.0.space.0.used + sun.gc.generation.0.space.1.used + sun.gc.generation.0.space.2.used
  }
}

There are also four additional elements that are used to set up the layout of our column.
  1. The first is the alignment setting. We can choose whether we want to align our data to the left, center or right by setting the align element to one of those values.
  2. In the case of number metrics we can specify the string used as DecimalFormat input by entering it in the format node.
  3. We're also able to specify the size of the column by adding a width element with a particular length.
  4. Last but not least is the scaling functionality. Because most of the metrics contain just the raw output from the JVM, we need to transform it a little to make it useful for the human eye. This can be done with the scale attribute set to one of the values below (token column).
    token    factor          description
    raw      1               no scaling
    percent  1/100           convert to percentage
    K        1024            kilo
    M        1024*1024       mega
    G        1024*1024*1024  giga
    n        10^-9           nano
    u        10^-6           micro
    m        10^-3           milli
    us       10^-6           microseconds
    ms       10^-3           milliseconds
    s        1               seconds
    min      1/60            minutes
    h        1/3600          hour

Now let's see a polished example that shows how we can use the additional properties:
option customgc {
  column {
    header "YoungC^"
    data sun.gc.generation.0.space.0.used + sun.gc.generation.0.space.1.used + sun.gc.generation.0.space.2.used
    align right
    scale M
    width 7
    format "0.0"
  }
  column {
    header "OldC^"
    data sun.gc.generation.1.space.0.used
    align right
    scale M
    width 7
    format "0.0"
  }
}
Which produces
 YoungC    OldC
   67.7   161.0
   37.8   165.4
   92.2   182.8

End of topic :) Good luck!

Tuesday, February 3, 2015

DataTransferObject myth-busting

DataTransferObject is one of the most popular design patterns. It has been widely used in archaic Java Enterprise applications, and some time ago it got a second life due to the growth of RESTful APIs. I don't want to elaborate on the pros of DTOs (like hiding domain logic, etc.) because there are whole books on that topic. I just want to cover the technological aspects of DTO classes.

Usually we create just a simple POJO with getters, setters and a default constructor. The whole mapping configuration is usually limited to a @JsonIgnore annotation added on a getter we want to skip.
It's the simplest way, but can we do better? I'll focus on Jackson, the most popular mapper in the Java world, and try to answer 3 basic questions which are usually skipped during development, and which in my opinion can improve the design of the transport layer.

1. Can we skip getters?

Yes! Jackson is able to serialize our objects using different strategies. Those strategies are fully customizable on two levels: for the ObjectMapper and for a particular object. If we, for example, want to serialize all fields of all POJOs in our context, regardless of the presence of getters, we can do something like this:

mapper.setVisibility(PropertyAccessor.FIELD, JsonAutoDetect.Visibility.ANY);

To achieve the same result for a single DTO you can use the JsonAutoDetect annotation on the class level:

@JsonAutoDetect(fieldVisibility = JsonAutoDetect.Visibility.ANY)

Please notice that there are a lot of different accessor settings - like including only public getters (and skipping package-protected ones), etc.

2. Are setters necessary?

No! Jackson can deserialize our objects without setters, and there is nothing special we have to do... except removing the needless setters.
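A minimal sketch of that, assuming Jackson's default mutator inference stays enabled so the private field can be populated based on the getter (the Product variant and JSON below are illustrative):

import com.fasterxml.jackson.databind.ObjectMapper;

public class NoSettersExample {

    public static class Product {
        private String name;           // private field, no setter

        public String getName() {
            return name;
        }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // the getter makes "name" a known property; the field is used to set the value
        Product product = mapper.readValue("{\"name\":\"toaster\"}", Product.class);
        System.out.println(product.getName());   // prints: toaster
    }
}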

3. Is the default constructor required? I always add it because of the "No suitable constructor found for type" exception.

No! Jackson comes with the @JsonCreator annotation, which can be used to mark a given constructor to be used for deserialization.

public class Product {

    private final String name;

    @JsonCreator
    public Product(@JsonProperty("name") String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

This approach allows us to create immutable objects and use them for round-trip communication (serialize/deserialize). As you've probably noticed, the @JsonProperty("name") parameter annotation looks redundant in this case. The good news is that since Java 8 it's possible to store parameter names at runtime. To do that we have to pass the -parameters option to the javac compiler. Jackson has a module activating such support. After including it in the project

<dependency>
    <groupId>com.fasterxml.jackson.module</groupId>
    <artifactId>jackson-module-parameter-names</artifactId>
    <version>2.5.0</version>
</dependency>

and registering it with the ObjectMapper

objectMapper.registerModule(new ParameterNamesModule());

our brand new DTO looks as follows:

public class Product {

    private final String name;

    @JsonCreator
    public Product(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}
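A quick round-trip sketch of how this DTO could be exercised, assuming the sources were compiled with the -parameters flag and the module is registered as shown above (the sample value is illustrative):

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.module.paramnames.ParameterNamesModule;

public class ProductRoundTrip {

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        mapper.registerModule(new ParameterNamesModule());

        // serialize and deserialize the immutable DTO without getter/setter boilerplate
        String json = mapper.writeValueAsString(new Product("toaster"));   // {"name":"toaster"}
        Product copy = mapper.readValue(json, Product.class);
        System.out.println(copy.getName());   // prints: toaster
    }
}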

Note that this approach works only with jackson-databind 2.5.0 or newer. As you can see, there are many options to make our DTOs lighter and prettier. The only thing needed is to understand how it's made :)

Sunday, January 4, 2015

Dependency injection pitfalls in Spring

There are three injection variants in the Spring framework:
  • Setter-based injection
  • Constructor-based injection
  • Field-based injection
Each of those mechanisms has advantages and disadvantages, and there is no single right approach. For example, field injection:
@Autowired
private FooBean fooBean;
It's generally not the best idea to use it in production code, mostly because it makes our beans impossible to test without starting a Spring context or using reflection hacks. On the other hand, it requires almost no additional code and can be used in integration tests - which definitely won't be instantiated independently. And in my opinion this is the only case for field-based injection.
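For production code, a constructor-based variant keeps the bean testable with plain mocks and no Spring context at all - ReportService below is a hypothetical bean using the FooBean from the snippet above:

@Service
public class ReportService {

  private final FooBean fooBean;

  @Autowired
  public ReportService(FooBean fooBean) {
    this.fooBean = fooBean;
  }
}

// in a plain unit test - no Spring context needed:
ReportService service = new ReportService(Mockito.mock(FooBean.class));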

Now let's focus on the two major variants. In the Spring documentation we can read that
...it is a good rule of thumb to use constructor arguments for mandatory dependencies and setters for optional dependencies.
Also, in the documentation for Spring up to 3.1 we could find the sentence
The Spring team generally advocates setter injection, because large numbers of constructor arguments can get unwieldy, especially when properties are optional.
This changed in the documentation for the fourth version, which says:
The Spring team generally advocates constructor injection as it enables one to implement application components as immutable objects and to ensure that required dependencies are not null. 
Pretty cool, especially considering that prior to version 4.0 people using constructor-based injection were just "some purists" (this can also be found in the older documentation) :) Please note that before the fourth framework release there used to be a big problem with this injection variant - aspects demanded a default constructor. Now there is still one "drawback" of constructor-based injection: it doesn't allow circular dependencies. I intentionally put drawback in quotation marks because for me it's a huge advantage of this mechanism :) One more sentence from the documentation:
It is generally recommended to not rely on circular references between your beans. 
But why? What can happen if we have circular references in our applications? I don't want to write about application design, because almost always it's possible to refactor our code and delegate the problematic logic to a third bean. There are two significant and unfortunately "silent" problems.

First pitfall

When you invoke the ListableBeanFactory.getBeansOfType() method, you can't be sure which beans will be returned. Let's see the code of the DefaultListableBeanFactory class:
if (isCurrentlyInCreation(bce.getBeanName())) {
  if (this.logger.isDebugEnabled()) {
    this.logger.debug("Ignoring match to currently created bean '" 
        + beanName + "': " + ex.getMessage());
  }
  // ...
  continue;
}

As you can see, if you don't use the DEBUG logging level there will be zero information that Spring skipped a particular bean in the resolution process. If you wanted to get all event handlers, you're screwed :)

Second pitfall

The second problem refers to AOP. If you want to have an aspect on your bean, please ensure it's not involved in a circular reference - otherwise Spring will create two instances of your bean: one without the aspect and one with the proper aspect. Of course, again without any information. Surprised?

For me it's enough to stop using circular dependencies in our applications (especially as there are probably more interesting behaviors related to this).

DO NOT USE CIRCULAR DEPENDENCIES!

But what can we do to get out of the problematic situation? Of course, you can use constructor-based injection :) But if you have a huge application, it's not the best idea to spend many days rewriting all classes to use constructors instead of setters. Fortunately, I have good news - the allowCircularReferences field in the AbstractRefreshableApplicationContext class. Just add a single line to the application context creation (by the way, described in this post):
AnnotationConfigWebApplicationContext applicationContext =
    new AnnotationConfigWebApplicationContext();
applicationContext.setAllowCircularReferences(false);
// rest of context initialization

Finally, to keep you in a good mood, I'm pasting one more code snippet from DefaultListableBeanFactory:
catch (NoSuchBeanDefinitionException ex) {
  // Shouldn't happen - probably a result of circular reference resolution...
  if (logger.isDebugEnabled()) {
    logger.debug("Failed to check manually registered singleton with name '" 
        + beanName + "'", ex);
  }
}

Have a nice day! :)