Java Performance: there is always a next bottleneck

At LodgON, we created an application for the upcoming regional and European elections.
The customer, VTM, is one of the two major broadcast companies in Belgium, and a high volume of users was expected on the day of launch. Therefore, performance was one of the key requirements.
The article below is not a Bible, but it describes how we managed to overcome our performance issues. It might help others in solving performance issues in their own Java Enterprise applications as well.

The application is mainly an online survey, where people can compare their answers to a number of questions with the answers given by other people, famous people, and their Facebook friends (the Facebook integration is cool and popular, but outside the scope of this blog entry).

Without knowing exactly how many visitors were expected, we had to come up with an architecture and infrastructure. Measurements are extremely important in such a project. We developed a non-optimized application, installed it on a development server, and ran a number of JMeter tests. A number of load testing tools exist, and we found JMeter very convenient.
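
As a side note, JMeter can be run without its GUI, which is convenient on a test machine. A typical invocation looks like this (the test plan and result file names are just placeholders for your own files):

jmeter -n -t survey-loadtest.jmx -l results.jtl

The -n flag runs JMeter in non-GUI mode, -t points to the test plan, and -l writes the raw samples to a file that can be analyzed afterwards.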

The first thing to decide was whether to go for a single, high-performance server, or to choose a cluster of cheaper servers where servers can be added on-demand.

Most of our servers are at cari.net, and we decided to go for a single high-end 4-CPU Xeon server. If required, we can add 4 more CPUs to this server. Based on the first results with the first version of the application, that seemed to be enough for the expected peak load -- provided we could scale easily.

There are many articles about Java Performance in general and Java Enterprise Performance in particular. While many of these articles have great general tips, those tips are not always relevant to every project. In my opinion, it doesn't make sense to spend 80% of your time saving less than 1% of CPU time, especially if the CPU is not the bottleneck.


My approach to performance is to move from one bottleneck to the next. There are no general rules for this, since every application is different. Most important is that you can identify the bottlenecks. If you don't know the current bottleneck, you are moving blindly in a direction that hopefully increases your performance. If your application has a deadlock on a Lucene index, it won't help to replace all String additions with StringBuilder concatenations or something similar.
Therefore, measuring bottlenecks is extremely important.

First of all, a number of acceptance rules should be defined. These can be the number of concurrent users, concurrent transactions, throughput, wait time, and so on. One of the best resources I have encountered so far on the Internet is the GlassFish Performance Tuning Guide.

Using JMeter allows one to gather numbers on the acceptance criteria. When we ran the JMeter tests against the first version of our application, the throughput was not even 1% of what we required. We didn't panic; after all, we had almost a week to fix this.

Here are three important things to monitor:

  • Monitor the server load (e.g. top or vmstat 5)
  • Monitor the network traffic
  • Monitor the number of busy threads

Monitoring the number of busy threads is extremely important. One of my favorite commands at the moment is this:

asadmin get -m --iterations 2000 --interval 5 server.http-service.server.http-listener-1.currentthreadcount-count server.http-service.server.http-listener-1.currentthreadsbusy-count server.http-service.connection-queue.countqueued-count server.http-service.connection-queue.countoverflows-count

(this is a GlassFish-specific command, but I guess every application server has similar commands of its own)

This command prints four numbers every 5 seconds:

  • countoverflows-count shows how often you had a problem. This should be 0.
  • countqueued-count shows the number of requests that are queued and thus waiting to be processed. Your users become impatient if this is too high.
  • currentthreadcount-count shows the number of processing threads that are currently in the pool and that can be used. The maximum number of these processing threads is extremely important.
  • currentthreadsbusy-count shows how many threads are currently processing requests.

Apart from these commands, the garbage collector log is useful as well. The following JVM options are always set in my configuration:

-verbose:gc

-XX:+PrintGCDetails

-Xloggc:/tmp/gc.log
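
A related option that is not in the list above but combines well with it is the timestamp flag, which makes it easier to line up the GC log with the timeline of a load test:

-XX:+PrintGCTimeStamps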

When a performance problem occurs, the combination of all these numbers should give you an idea of the bottleneck. If the CPU is heavily used, if the number of busy threads is high (reaching the currentthreadcount-count), and if the CPU time is mainly spent in the java process, you most likely have an application bottleneck. That's good: application bottlenecks are often easier to solve, because it is your own code.
This is what we first had in our survey application, and we solved it by adding some caching in the web tier.
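
To give an idea of what such web-tier caching can look like, here is a minimal sketch of a small time-based cache. This is not the actual code of the survey application; the class and names are made up for the example.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Minimal sketch of a time-based cache for expensive lookups in the web tier.
public class SimpleCache<K, V> {

    private static class Entry<V> {
        final V value;
        final long created;

        Entry(V value, long created) {
            this.value = value;
            this.created = created;
        }
    }

    private final ConcurrentMap<K, Entry<V>> entries = new ConcurrentHashMap<K, Entry<V>>();
    private final long ttlMillis;

    public SimpleCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns the cached value if it is still fresh, or null if it is missing or expired.
    public V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null || System.currentTimeMillis() - entry.created > ttlMillis) {
            return null;
        }
        return entry.value;
    }

    public void put(K key, V value) {
        entries.put(key, new Entry<V>(value, System.currentTimeMillis()));
    }
}

A servlet or backing bean asks the cache first and only performs the expensive lookup (followed by a put) on a miss; entries simply become stale after the time-to-live and are overwritten on the next put.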

When the number of busy threads equals the number of available threads in the pool, and the CPU is not fully used at the same time, your system is underused. The maximum number of processing threads available to the HTTP service acts as a sort of fuse for your CPUs. It is often recommended to make this maximum equal to the number of CPUs in the server. Indeed, in an optimal system, a single processing thread keeps a single CPU busy while it is active.
However, in many cases the processing thread releases the CPU, e.g. when it waits for a lock, or when it reads data over a relatively slow network. In those cases, the maximum number of processing threads can be increased. This should be done very carefully, though. Suppose there are two types of requests: one that consumes the CPU for about 100% of the time, and another one that leaves the CPU idle for 90% of the time. The server can probably handle 10 concurrent requests of the second type on a single CPU. But if we set the maximum number of processing threads to 10 times the number of CPUs, we can overload the CPUs if all incoming requests are of the first type.
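
For completeness: on GlassFish the maximum number of processing threads can be changed with asadmin set. The exact dotted name depends on the server version; for the GlassFish v2 configuration it should be something along these lines (asadmin get with a wildcard such as "server.http-service.*" lists the exact names on your installation):

asadmin set server.http-service.request-processing.thread-count=4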

What we typically do is set the maximum number of processing threads equal to the number of CPUs, put the system under load, and measure the CPU. Then we increase the number of threads until the system reaches full CPU usage.
At this point, the bottleneck is most likely again the application. The CPUs are almost completely dedicated to the Java process or to a dependent database process (database performance is out of scope for this entry). We now need to know which part of the application is the current bottleneck. I switch between two commands for this task:

asadmin generate-jvm-report --type=thread

and

jstack -m -l <pid>

Both commands will give you a thread dump, and in most cases this thread dump immediately tells you where the bottleneck is. If you have 10 processing threads and all of them are in the same method at the moment of the snapshot, that method is a candidate for optimization.
Optimize the method, and repeat the whole process.
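
As an illustration of the kind of pattern such a dump reveals (a made-up example, not code from the survey application): if every request funnels through one coarse-grained lock, all processing threads pile up in the same method.

// Made-up example: one coarse-grained lock serializes all requests,
// so every processing thread ends up waiting in computeScore().
public class ScoreService {

    private final Object lock = new Object();

    public int computeScore(String userId) {
        synchronized (lock) {
            return expensiveComputation(userId);
        }
    }

    private int expensiveComputation(String userId) {
        // placeholder for the real work (database lookups, statistics, ...)
        return userId.hashCode();
    }
}

In the thread dump, the processing threads all show up as BLOCKED, waiting for monitor entry inside computeScore, which points straight at the lock as the thing to optimize.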

In a number of cases, the problems are memory-related. This is easily traceable, since the gc.log contains all the information. Too many full GCs occurring right after each other indicate a memory problem. I often find these more difficult to solve, although great tools exist, e.g. jhat and jmap.
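
For those memory cases, the usual drill is to check the GC log for back-to-back full collections and, if they are there, take a heap dump and browse it. With the logging options above, something along these lines works (the process id and file names are placeholders):

grep "Full GC" /tmp/gc.log

jmap -dump:format=b,file=/tmp/heap.hprof <pid>

jhat /tmp/heap.hprof

jhat then serves the heap dump on a local web port, so you can browse which objects are holding on to the memory.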

In summary, if your Java Enterprise application suffers from performance problems, try to measure what is going wrong and where the current bottleneck is. And stop optimizing once your targets are reached.

written on 28 May 2009 12:04.
