Simple way to scale your Web App – Part 1
Every time i made an J2EE application, my main concern was about how many requests it could handle in the end. In my early dawns of my career I have been in the situation when my app was his own victim of success, to much load for a single server. We always had a reactive attitude and tried to deal with the problem when it happened, but some times it was too damn late. To be able to scale you application it must be made to be scaled.
But what would be a simple scalable architecture?
Let’s consider the diagram from the left. Our application will be splited in three logical clusters. First one is the application cluster, second database cluster and finally logging and statistic cluster.
The Load Balancer
A load balancer distributes the traffic among you application servers. It can be software or a hardware device. A handy solution is to use a software balancer, such an Apache, but the software solution is not so robust and performant as a hardware balancer.
Hardware
A dedicated device for load balancing is more suitable and gives you more performance. Thus, this comes with an additional costs but there many devices on the market and you should choose the best one on cost/ features. When evaluating a load balancer some things must be kept in mind
- Maximum connection supported
- Throughput
- RIPS (Real IP Address of a Real Server)
- SSL
- Gb-NICs
- Price
Software alternatives
- Apache, mode_rewrite, it can be used to provide simple url based balancing capabilities.
- Apache, mod_proxy_balancer, It provides load balancing support for
HTTP
,FTP
andAJP13
protocols. The best option for a medium web application. - Apache, mod backhand, http and log spread. It is considered more performant than mod_proxy_balancer
- Perball from Danga, already used for LiveJournal, Vox and TypePad
- Varnish, open source http server, a new commer. Has also load balancing capabilities , see here http://varnish.projects.linpro.no/wiki/FAQ
- Pure Load Balancer [PLB], for http and smtp.
- Linux Virtual Server, an advanced load balancing solution can be used to build highly scalable and highly available network services, such as scalable web, cache, mail, ftp, media and VoIP services. Though this solution is for bigger purposes than scale a simple application.
To balance SSL connections your balancer should provide SSL termination capability. Otherwise being a connection level protocol the SSL connection should persists between server and client by allocating the same host to the same client.
More about Load Balancing in a future post.
The Application Cluster
To be able to scale a web application, it has to be designed to scale. The main issue is session replication. Depending on the load balancing algorithm you will need to replicate the session or to use sticky session.
Stateless
This mode doesn’t require much from a load balancer. This is very common to REST applications. For a web 2.0 application this could be the common aproach. The application stores everything it needs on the client side. The first user request goes on the first machine while de second will hit a different machine. No data has to be shared between web servers. To handle more requiest new servers can be added in the web pool and the system will scale out.
Having a stateless design allows us a seamless failover. This can be achieved no matter what language we use to develop our application
Sticky session
Some times our application needs to store user specific data at session level. This means that every time a user hits the server we need some data to be able to process user request, we need local data and state. This data must be available on all server where the user request come. To cope with this problem we can use “sticky session” which means we need to ensure that the user will hit the same server as the initial request.
The most common aproach with sticky sessions is:
- a cookie that holds the routing information
- configuration that defines this cookie id for the balancer
- configuration that defines routing for each back-end
Session replication
This technique is very common in J2EE where the Servlet Containers provide a way to replicate the session between the servers. There are several condition for a session to be replicated but it can be a viable alternative to sticky session.
The Database cluster
Obviously, my immediate choice will be to use MySQL as database server. It’s free, has a lot of community, it serves a lot of well known web 2.0 sites.
Using MySql you can scale horizontal by using MySql’s replication mechanism. When the database grows is time to partition our data horizontally into shards.
Replication
Master-Slave
This type of configuration will help you to scale the reads from the write. The reads will always go to the Slave while the transactions that alter the data will go to the Master. You can have one Master and multiple slaves.
Master – Master
Such an approach will distribute the load evenly and it also provides High Availability. MySQL 5.0 provides replication at statement level which often can crash the replication because of conflicts. The most common conflicts i ever met are for unique indexes but you can coupe with this problem by using “replace” command instead of insert. Anyway the MySQL 5.1 will have some semnificative changes in the replication module.
Tree Replication
This gives you a lot of posibilities, you can conbine master-master with master-slave in a tree structure and conbined with sharding it can result into a fine tuned MySQL cluster. More about Tree Replication and data partitioning in further posts.
Logging & Batch processing
In the end we reached the final component of our application.: the logging server and the batch processing machine. Why do we need them? Simply because somebody has to do them. Every application has some batch processing to be done. This is done usually by using cron and scripts. I recommend to use a scripting language for batch processing such as: bash, ruby, python etc.
For logging I would suggest you to use syslog or syslog-ng which has more advanced features than syslog, better performance in terms of cpu and supports UTF-8.
What’s next?
Next I would like to walk step by step through designing a J2EE based on the architecture discussed above by using Apache, Tomcat, Struts, Hibernate and MySQL.