Glassfish 3.0.1 on Amazon EC2: performance and scaling

Last week, our partner IvoxTools launched 3 voting tests based on LodgON's DaliCompare software. Performance and scalability were two of the main requirements, and we met them using a promoted build of Glassfish 3.0.1 running on Amazon Elastic Compute Cloud (EC2).

With elections for both the Dutch and the Belgian parliament approaching, many people want to know which candidate best matches their own profile. The DaliCompare software allows users to complete a survey and compare their answers with those of famous people, selected parts of the population (e.g. men, women, people younger than 24 years) and their friends on social networks (currently Facebook, NetLog and Hyves are supported). It is a fun way to help people find out who to vote for.

Our partner IvoxTools commercializes this software, and a number of Dutch and Belgian media partners created projects based on DaliCompare:

I'm probably not allowed to give the exact number of visitors, but we are talking more than 6 digits (and counting).
In order to handle millions of users in a short timeframe, the application needs to be performant (responsive) and scalable. For previous DaliCompare projects, we used a high-end dedicated server. The DaliCompare software leverages LodgON's DaliCore framework, which already performs very well. Expectations for the current projects were high, and they proved to be right. A single dedicated server per project would have been insufficient during peak moments: the project URLs were announced several times during prime time on the major TV stations.

The DaliCompare application is written using Java EE 6. Due to a cookie bug in Glassfish 3, we decided to use a promoted build of Glassfish 3.0.1. Glassfish, together with a well-written application, delivers the performance. Scalability is important as well, since the load during peak moments is much higher than during off-peak moments. We use Amazon Elastic Compute Cloud (EC2) to achieve this, which turned out to be very easy, both in configuration and in monitoring.
For each project, we use an Amazon Load Balancer instance with session stickiness, dispatching all requests to a number of instances running DaliCompare and Glassfish. These instances communicate with an instance running MySQL for persistence. The architecture of the projects is shown below:

During non-peak moments (e.g. fewer than 20,000 completed surveys in an hour), the Load Balancer is not needed and a single instance is sufficient. But it is very easy to create a new instance on the fly (based on an AMI) and add it to the LB, which is what we do shortly before a major announcement on TV is scheduled.
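
The "how many instances before a TV announcement" decision can be sketched as simple arithmetic. This is only an illustration: the class and method names are hypothetical, and treating 20,000 completed surveys per hour as the capacity of one instance is an assumption made for the example, not a measured limit.

```java
final class CapacitySketch {

    /**
     * Returns how many application instances to attach to the load balancer
     * for an expected load, assuming (hypothetically) that every instance can
     * handle surveysPerHourPerInstance completed surveys per hour.
     * At least one instance is always kept running.
     */
    static int instancesNeeded(long expectedSurveysPerHour, long surveysPerHourPerInstance) {
        if (surveysPerHourPerInstance <= 0) {
            throw new IllegalArgumentException("per-instance capacity must be positive");
        }
        if (expectedSurveysPerHour <= 0) {
            return 1;
        }
        // Ceiling division: a partially used instance still has to exist.
        long needed = (expectedSurveysPerHour + surveysPerHourPerInstance - 1)
                / surveysPerHourPerInstance;
        return Math.toIntExact(Math.max(1L, needed));
    }
}
```

With the assumed capacity, `CapacitySketch.instancesNeeded(100_000, 20_000)` asks for five instances, while anything up to 20,000 surveys per hour stays on a single one.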

About the application

DaliCore is the framework created by LodgON that provides granular functionality for social software in general. Basically, it extends the core application server functionality with the notion of users, roles, permissions, groups, data, activities, preferences and so on. This functionality is needed in most of the projects we do, and we have created a number of tools on top of DaliCore. DaliCore is very well tested, both functionally and for performance. We use JMeter for performance testing and regression testing of DaliCore, and we are very confident about its behavior.
The DaliCore functionality is made available through REST APIs (implemented using Jersey). Direct EJB calls are possible as well, but the REST interface is the one that is extensively tested, and we know its performance characteristics very well. DaliCore contains stateless session beans and entities (using JPA).
The DaliCompare application is a WAR containing the graphical setup and the application-specific logic needed for the projects. The real business logic is delegated to the DaliCore application through REST calls (using the Jersey Client). It may sound strange that DaliCompare talks to DaliCore through REST calls instead of, for example, EJB calls. We did lots of load testing, though, and Jersey on top of Grizzly behaves extremely well.
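
To make the delegation pattern concrete, here is a minimal sketch of a front application fetching data from a core service over HTTP. It is not the DaliCore API: the `/core/users/{id}` path, the response format and all class and method names are invented for the illustration, and it deliberately uses the JDK's built-in `HttpServer` and `HttpClient` (Java 11 or later) as stand-ins for Jersey and the Jersey Client so that it runs without extra libraries.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class RestDelegationSketch {

    /** Starts a stand-in "core" service on an ephemeral loopback port. */
    static HttpServer startCore() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // Hypothetical resource: GET /core/users/{id} answers "user:{id}".
        server.createContext("/core/users", exchange -> {
            String path = exchange.getRequestURI().getPath();
            String userId = path.substring(path.lastIndexOf('/') + 1);
            byte[] body = ("user:" + userId).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** What the front application does: delegate the lookup to the core via REST. */
    static String fetchUser(int corePort, String userId) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + corePort + "/core/users/" + userId))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("core service answered " + response.statusCode());
        }
        return response.body();
    }

    /** Starts the core, performs one delegated call, and shuts the core down again. */
    static String roundTrip(String userId) {
        HttpServer core = null;
        try {
            core = startCore();
            return fetchUser(core.getAddress().getPort(), userId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (core != null) {
                core.stop(0);
            }
        }
    }
}
```

The point of the sketch is the shape of the call: the front application only knows a URL and a response format, which is what makes the REST interface easy to load-test in isolation.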

The Amazon Load Balancer dispatches requests to the instances that are attached to it. In practice, a single instance is sufficient most of the time, but failover is a good thing for people's night rest. While a user completes the survey, some information is kept in the HttpSession. Therefore, it is important that the load balancer selects the same instance for all requests in the same session. This is easy to configure with the Amazon LB.
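
The idea behind session stickiness can be shown with a toy router. This is a conceptual model only, not how the Amazon load balancer is implemented or configured: the class, its methods and the round-robin choice for new sessions are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StickyRouterSketch {
    private final List<String> instances = new ArrayList<>();
    private final Map<String, String> assignment = new HashMap<>();
    private int next = 0;

    /** Attaches an instance; existing sessions keep their current instance. */
    void addInstance(String instanceId) {
        instances.add(instanceId);
    }

    /** Detaches an instance; its sessions are re-routed on their next request. */
    void removeInstance(String instanceId) {
        instances.remove(instanceId);
        // Sessions that were pinned to this instance lose their HttpSession state.
        assignment.values().removeIf(instanceId::equals);
    }

    /** Returns the instance for a session: the same one every time once assigned. */
    String route(String sessionId) {
        if (instances.isEmpty()) {
            throw new IllegalStateException("no instances attached");
        }
        return assignment.computeIfAbsent(sessionId, id -> {
            // New sessions are spread round-robin over the attached instances.
            String chosen = instances.get(Math.floorMod(next, instances.size()));
            next++;
            return chosen;
        });
    }
}
```

Adding an instance shortly before a peak therefore only affects new sessions, while surveys already in progress stay where their session data lives.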

The DaliCompare and DaliCore applications each have their own http-listener with a separate thread pool. This allows us to fine-tune the size of the pools, both for external calls (to DaliCompare) and internal calls (to DaliCore). The two thread pools are also needed to avoid a deadlock. A request to DaliCompare often needs a request to DaliCore. If there were only a single thread pool, and all its threads were occupied at the same moment by requests to DaliCompare, no request to DaliCore could be served, hence no request to DaliCompare could complete and no thread would be released.
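
The deadlock described above can be reproduced outside Glassfish with plain `java.util.concurrent` executors. The sketch below is an analogy, not Glassfish configuration: the class and method names are hypothetical, fixed-size executors stand in for the http-listener thread pools, and a latch forces the worst case in which every front thread is busy before any inner call is made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PoolStarvationSketch {

    /**
     * Runs `threads` front requests that each wait for one core request.
     * With sharedPool = true both kinds of request compete for the same threads.
     * Returns true if every front request finished within the timeout.
     */
    static boolean completes(boolean sharedPool, int threads, long timeoutMillis) {
        ExecutorService frontPool = Executors.newFixedThreadPool(threads);
        ExecutorService corePool = sharedPool ? frontPool : Executors.newFixedThreadPool(threads);
        CountDownLatch allBusy = new CountDownLatch(threads);
        List<Future<String>> results = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                final int id = i;
                Callable<String> frontRequest = () -> {
                    // Wait until every front thread is occupied (the peak-load case).
                    allBusy.countDown();
                    allBusy.await();
                    // The front request now needs the core and blocks on the answer.
                    Callable<String> coreRequest = () -> "core-" + id;
                    return corePool.submit(coreRequest).get();
                };
                results.add(frontPool.submit(frontRequest));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            for (Future<String> result : results) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                result.get(remaining, TimeUnit.NANOSECONDS);
            }
            return true;
        } catch (TimeoutException e) {
            return false; // nothing finished: the pool starved itself
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts any blocked workers so the JVM can exit.
            frontPool.shutdownNow();
            corePool.shutdownNow();
        }
    }
}
```

Calling `PoolStarvationSketch.completes(true, 4, 500)` times out because all four threads wait for core work that is queued behind them, while `completes(false, 4, 2000)` finishes because the core requests have threads of their own.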

Monitoring is an important part of the DaliCompare operations. The Amazon EC2 console is very useful: it shows the CPU and network load at per-minute granularity. Apart from that, we monitor the number of threads in use in every thread pool, in order to optimize CPU usage.

If you want to run an application that needs to handle high volumes or loads, it is very important that you understand how its constituent parts work. Tuning the application server can save lots of money. Fortunately, Glassfish allows very flexible and transparent tuning.

written on 28 May 2010 10:38.
