Sunday, May 17, 2015

Do we really still need a 32-bit JVM?

Even today (and it's 2015) we have two versions or Oracle HotSpot JDK - adjusted to 32 or 64 bits architecture. The question is do we really would like to use 32bit JVM on our servers or even laptops? There is pretty popular opinion that we should! If you need only small heap then use 32bits - it has smaller memory footprint, so your application will use less memory and will trigger shorter GC pauses. But is it true? I'll explore three different areas:
  1. Memory footprint
  2. GC performance
  3. Overall performance
Let's begin with memory consumption.

Memory footprint

It's known that major difference between 32 and 64 bits JVM relates to memory addressing. That means all references on 64bit version takes 8 bytes instead of 4. Fortunately JVM comes with compressed object pointers which is enabled by default for all heaps less than 26GB. This limit is more than OK for us, as long as 32 bit JVM can address around 2GB (depending on target OS it's still about 13 times less).  So no worries about object references. The only thing that differs object layout are mark headers which are 4 bytes bigger on 64 bits. We also know that all objects in Java are 8 bytes aligned, so there are two possible cases:

  • worst - on 64 bits object is 8 bytes bigger than on 32 bits. It's because adding 4 bytes to header causes object is dropped into another memory slot, so we have to add 4 more bytes to fill alignment gap.
  • best - objects on both architectures have the same size. It happens when on 32 bits we have 4 bytes alignment gap, which can be simply filled by additional mark header bytes.
Let's calculate now both cases assuming two different application sizes. IntelliJ IDEA with pretty big project loaded contains about 7 million objects - that will be our smaller project. For the second option lets assume that we have big project (I'll call it Huge) containing 50 million objects in the live set. Let's now calculate the worst case:
  • IDEA ->  7 millions * 8 bytes =  53 MB
  • Huge -> 50 millions * 8 bytes = 381 MB

Above calculations shows us that real application footprint is in the worst case raised for around 50MB heap for IntelliJ and around 400MB for some huge, highly granulated project with really small objects. In the second case it can be around 25% of the total heap, but for vast majority of projects it's around 2%, which is almost nothing.

GC Performance

The idea is to put 8 million String objects into Cache with Long key. One test consists of 4 invocations, which means 24 million puts into cache map. I used Parallel GC with total heap size set to 2GB. Results were pretty surprising, because whole test finished sooner on 32bit JDK. 3 minutes 40 seconds compared to 4 minutes 30 seconds on 64bit Virtual Machine. After comparing GC logs we can see, that the difference mostly comes from GC pauses: 114 seconds to 157 seconds. That means 32 bit JVM in practice brings much lower GC overhead - 554 pauses to 618 for 64bits. Below you can see screenshots from GC Viewer (both with the same scale on both axis)

32bit JVM Parallel GC

64bit JVM Parallel GC
I was expecting smaller overhead of 64bits JVM but benchmarks shows that even total heap usage is similar on 32bits we are freeing more memory on Full GC. Young generation pauses are also similar - around 0.55 seconds for both architectures. But average major pause is higher on 64bits - 3.2 compared to 2.7 on 32bits. That proves GC performance for small heap is much better on 32bits JDK. The question is if your applications are so demanding to GC - in the test average throughput was around 42-48%.

Second test was performed on more "enterprise" scenario. We're loading entities from database and invoking size() method on loaded list. For total test time around 6 minutes we have 133.7s total pause time for 64bit and 130.0s for 32bit. Heap usage is also pretty similar - 730MB for 64bit and 688MB for 32bit JVM. This shows us that for normal "enterprise" usage there are no big differences between GC performance on various JVM architectures.


32bit JVM Parallel GC selects from DB

64bit JVM Parallel GC selects from DB
Even with similar GC performance 32bit JVM finished the work 20 seconds earlier (which is around 5%).

Overall performance

It's of course almost impossible to verify JVM performance that will be true for all applications, but I'll try to provide some meaningful results. At first let's check time performance.

Benchmark                    32bits [ns]   64bits [ns]   ratio

System.currentTimeMillis()       113.662        22.449    5.08
System.nanoTime()                128.986        20.161    6.40

findMaxIntegerInArray           2780.503      2790.969    1.00
findMaxLongInArray              8289.475      3227.029    2.57
countSinForArray                4966.194      3465.188    1.43

UUID.randomUUID()               3084.681      2867.699    1.08

As we can see the biggest and definitely significant difference is for all operations related to long variables. Those operations are between 2.6 up to 6.3 times faster on 64bits JVM. Working with integers is pretty similar, and generating random UUID is faster just around 7%. What is worth to mention is that interpreted code (-Xint) has similar speed - just JIT for the 64bits version is much more efficient. So are there any particular differences? Yes! 64bit architecture comes with additional processor registers which are used by JVM. After checking generated assembly it looks that performance boost mostly comes from possibility to use 64bit registers, which can simplify long operations. Any other changes can be found for example under wiki pageIf you want to run this on your machine you can find all benchmarks on my GitHub - https://github.com/jkubrynski/benchmarks_arch

Conclusions

As in the whole IT world we cannot answer simply - "yes, you should always use **bits JVM". It strongly depends on your application characteristics. As we saw there are many differences between 32 and 64 bits architecture. Even if JIT performance for long related operations is few hundred percents better we can see that tested batch processes finished earlier on 32bits JVM. To conclude - there is no simple answer. You should always check which architecture fits to your requirements better.

Big thanks to Wojtek Kudla for reviewing this article and enforcing additional tests :)


UPDATE 24.05.2015

Build tools

After all I decided to do one more performance comparison for different JVM architectures. Results are quite surprising.

For Gradle I've cloned Spring Framework project. Results for Gradle 2.4 on Linux 4.0.4-301.fc22.x86_64 and JVM 1.8.0_45 (./gradlew --parallel clean build) looks as follows (average from 7 builds per arch):
  • 32 bits => 5m38.6s
  • 64 bits => 6m55.3s
That shows us the 32 bits architecture gives us around 23% boots compared to 64 bits. Considering that all we've to do is to change the JAVA_HOME it's pretty nice. 

I've also run tests for Apache Maven on Spring Boot sources. Running builds on Maven 3.3.3, same Linux kernel and same JVM shows that there's totally no difference between different architectures.
  • 32 bits => 12m01.5s
  • 64 bits => 11m59.0s
That's not what I've expected but I've run those tests many times with very similar results. I assume it's something more related to particular project than to build tool.

Of course it's not sure that for all projects we will get (or not) same improvement, as it depends on the particular sources, tests, etc. To get some meaningful conclusions we should measure many different projects, but it's not something I'm gonna do in this article. I just wanted to show it's worth to check if your projects won't compile faster on 32 bits architecture. Because when it does it could be really cheap and significant improvement.