Simple way to scale your Web App – Part 1

Every time i made an J2EE application, my main concern was about how many requests it could handle in the end. In my early dawns of my career I have been in the situation when my app was his own victim of success, to much load for a single server. We always had a reactive attitude and tried to deal with the problem when it happened, but some times it was too damn late. To be able to scale you application it must be made to be scaled.

But what would be a simple scalable architecture?

Simple Web Application Architecture

Let’s consider the diagram from the left.  Our application will be splited in three logical clusters.  First one is the application cluster, second database cluster and finally logging and statistic cluster.

The Load Balancer

A load balancer distributes the traffic among you application servers.  It can be  software or a hardware device.  A handy solution is to use a software balancer, such an Apache, but the software solution is not so robust and performant as a hardware balancer.

Hardware

A dedicated device for load balancing is more suitable and gives you more performance. Thus, this comes with an additional costs but there many devices on the market and you should choose the best one on cost/ features. When evaluating a load balancer some things must be kept in mind

Software alternatives

To balance SSL connections your balancer should provide SSL termination capability. Otherwise being a connection level protocol the SSL connection should persists between server and client by allocating the same host to the same client.

More about Load Balancing in a future post.

The Application Cluster

To be able to scale a web application, it has to be designed to scale. The main issue is session replication. Depending on the load balancing algorithm you will need to replicate the session or to use sticky session.

Stateless

This mode doesn’t require much from a load balancer. This is very common to REST applications.  For a web 2.0 application this could be the common aproach. The application stores everything it needs on the client side.  The first user request goes on the first machine while de second will hit a different machine. No data has to be shared between web servers. To handle more requiest new servers can be added in the web pool and the system will scale out.

Having a stateless design allows us a seamless failover. This can be achieved no matter what language we use to develop our application

Sticky session

Some times our application needs to store user specific data at session level. This means that every time a user hits the server we need some data to be able to process user request, we need local data and state. This data must be available on all server where the user request come. To cope with this problem we can use “sticky session” which means we need to ensure that the user will hit the same server as the initial request.

The most common aproach with sticky sessions  is:

Session replication

This technique is very common in J2EE where the Servlet Containers provide a way to replicate the session between the servers. There are several condition for a session to be replicated but it can be a viable alternative to sticky session.

The Database cluster

Obviously, my immediate choice will be to use MySQL as database server. It’s free, has a lot of community, it serves a lot of well known web 2.0 sites.

Using MySql you can scale horizontal by using MySql’s replication mechanism. When the database grows is time to partition our data horizontally into shards.

Replication

Master-Slave

This type of configuration will help you to scale the reads from the write.  The reads will always go to the Slave while the transactions that alter the data will go to the Master.  You can have one Master and multiple slaves.

Master – Master

Such an approach will distribute the load evenly and it also provides High Availability. MySQL 5.0 provides replication at statement level which often can crash the replication because of conflicts. The most common conflicts i ever met are for unique indexes but you can coupe with this problem by using “replace” command instead of insert. Anyway the MySQL 5.1 will have some semnificative changes in the replication module.

Tree Replication

This gives you a lot of posibilities, you can conbine master-master with master-slave in a tree structure and conbined with sharding it can result into a fine tuned MySQL cluster. More about Tree Replication and data partitioning in further posts.

Logging & Batch processing

In the end we reached the final component of our application.: the logging server and the batch processing machine. Why do we need them? Simply because somebody has to do them. Every application has some batch processing to be done. This is done usually by using cron and scripts. I recommend to use a scripting language for batch processing such as: bash, ruby, python etc.

For logging I would suggest you to use syslog or syslog-ng which has more advanced features than syslog, better performance in terms of cpu and supports UTF-8.

What’s next?

Next I would like to walk step by step through designing a J2EE based on the architecture discussed above by using Apache, Tomcat, Struts, Hibernate and MySQL.

Comments

5 Responses to “Simple way to scale your Web App – Part 1”

  1. Alex on February 9th, 2009 5:21 pm

    Take a look at Varnish:
    http://varnish.projects.linpro.no/

    Load balancing, smart caching … it rocks 😉

    Intro tutorial:
    http://www.catalystframework.org/calendar/2008/14

    Page fragments caching:
    http://www.catalystframework.org/calendar/2008/17
    (unfortunately the second tutorial is heavy on Perl, but you can figure it out)

  2. bserban on February 10th, 2009 4:01 am

    Looks nice, I included it into the post. I’ll definitely give it a try.
    Tx for mention it.

  3. Cris Radu on June 9th, 2009 9:37 am

    When checking for SSL termination capability of the load balancer, make sure the appliance embeds cryptographic accelerators, asymmetric encryption required to establish a SSL connection can be quite heavy in terms of processing power.

    If you plan on running an advanced security platform, check also for more subtle features such as OCSP support, a protocol used for real-time validation of digital certificates rather than through CLR published on LDAP.

  4. Cris Radu on June 9th, 2009 9:39 am

    For real heavy processing you need a serious balancer, check out F5 Big-IP, everything in a box plus unparalleled scalability.
    Of course the price tag is accordingly.

  5. Frank on July 3rd, 2010 2:38 pm

    Nice post. Evaluate nginx for load balancing and varnish for caching reverse proxy.