Recently I had the chance to participate in a Test Driven Development session. Until then I had never tried TDD; I had only used TFP (Test First Programming). So let’s look at the differences between the Waterfall model, TDD and TFP, and after that I’ll list some key points about doing TDD.
Programmers’ headaches started in 1970, when Winston Royce published the Waterfall model. The model is a set of sequential steps: Analysis, Design, Code, Test, Maintain. Its big assumption is that design precedes coding and that testing comes after coding. A very structured approach. It describes a well-defined process, which works great in other industries, but practice has proved that software development is an empirical process rather than a well-defined one, and such a process needs a short feedback loop. This insight is the foundation of the agile methodologies, which surface project problems from the very beginning.
Test Driven Development and Test First Programming both reduce the gap between testing and design, allowing you to adapt your design as you go. Test First Programming assumes that you have a design upfront, and you start coding by writing tests that validate that design. The feedback loop runs only between code and tests, not through the design. This makes the development cycle longer, because design problems are corrected only at a later stage.
TDD assumes that you don’t have a design upfront: the design emerges with the code, and the loop covers all three key elements: Test, Design, Code. You can now feed what the tests tell you back into both the design and the code as you go. Such a cycle should take around 30 minutes.
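To make the loop concrete, here is a minimal sketch of one TDD iteration in Java. LeapYear and LeapYearTest are hypothetical names, and a plain check helper stands in for a test framework such as JUnit so the example runs on its own. The tests come first and fail (red), the simplest code that passes them is written (green), and then the code is cleaned up while the tests stay green (refactor).

```java
// Step 1 (red): the tests are written first and describe the behaviour we want.
class LeapYearTest {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void run() {
        check(LeapYear.isLeap(2024), "years divisible by 4 are leap years");
        check(!LeapYear.isLeap(2023), "other years are not");
        check(!LeapYear.isLeap(1900), "century years are not leap years...");
        check(LeapYear.isLeap(2000), "...unless they are divisible by 400");
    }
}

// Steps 2 and 3 (green, refactor): the implementation grew one test at a time;
// this is the cleaned-up version that passes all four checks.
class LeapYear {
    static boolean isLeap(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }
}
```

Each new requirement (here, the century rule) starts as a failing check, which is what lets the design emerge from the tests instead of being fixed upfront.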
The four golden rules of TDD:
Last week I was woken up by my company’s monitoring team. There was a problem with my system: a DNS outage was under way and, as a side effect, my application was down. Since the application gets a lot of traffic, it had to be fixed immediately.
I jumped on the computer and quickly diagnosed the system. Everything was fine except the MySQL connection pool, which was exhausted. My first thought was that it was just a coincidence, so I quickly ran SHOW PROCESSLIST to list the MySQL processes. The output was an endless list of connections from the load balancer’s IP address, all with “login” as their state. For high availability, I reach MySQL through a load-balanced IP address in front of two MySQL servers. The balancer runs a quick health check every 5 seconds by connecting to MySQL and running a simple SELECT on a table.
For some reason the load balancer was not able to finish its login attempts, and it was overloading my MySQL servers. While I was in the middle of the investigation the problem suddenly stopped. I was happy but somewhat scared: I had no idea what the hell had happened.
A quick search of the MySQL documentation revealed the cause: MySQL performs a reverse DNS lookup on the IP address of connecting clients. Since the DNS server was having problems, each reverse lookup took far more than 5 seconds to time out, so the balancer’s checks, arriving every 5 seconds, piled up in the “login” state and overloaded the database servers. See the explanation in the official documentation, “How MySQL Uses DNS”.
After reading that page, my understanding is that MySQL needs the reverse DNS lookup only for its privilege system, so if you don’t use host names in your GRANT statements it is safe to disable it. The option that does this is skip-name-resolve (--skip-name-resolve on the command line); the manual describes it as follows:
Do not resolve host names when checking client connections. Use only IP numbers. If you use this option, all Host column values in the grant tables must be IP numbers or localhost. See Section 7.5.11, “How MySQL Uses DNS”.
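Before turning the option on, it is worth checking that no grant still relies on a host name, since the quoted text says those entries must become IP numbers or localhost. A minimal Java sketch, assuming you have already fetched the Host values yourself, for example with SELECT DISTINCT Host FROM mysql.user (other grant tables such as mysql.db have a Host column too). The SkipNameResolveAudit class is hypothetical, and its rule is deliberately simplified: localhost, plus any value made only of digits, dots, '/' (netmask form) and the % and _ wildcards, counts as IP-based; everything else is flagged. IPv6 entries would be flagged too and need checking by hand.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SkipNameResolveAudit {
    // Simplified test for an IPv4-style Host value such as 10.0.0.%, % or
    // 192.168.1.0/255.255.255.0. This is an assumption, not MySQL's own parser.
    private static final Pattern IP_LIKE = Pattern.compile("[0-9.%_/]+");

    // Returns the Host values that look like host names, i.e. the grants
    // that would stop matching once host names are no longer resolved.
    static List<String> hostNameEntries(List<String> hosts) {
        return hosts.stream()
                .filter(h -> !h.equalsIgnoreCase("localhost"))
                .filter(h -> !IP_LIKE.matcher(h).matches())
                .collect(Collectors.toList());
    }
}
```

An empty result suggests the grants are already IP-based; anything returned has to be re-granted with an IP address or pattern first.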
Could I have avoided this? Perhaps, but since it was the first time I used MySQL in production, it is unlikely.
Long live the reverse DNS, cheers!
What is JMX?
Java Management Extensions (JMX) is an open technology for management and monitoring that can be deployed wherever management and monitoring are needed. In a web application its most common use is application management, which is very often an afterthought, resulting in many unmanaged application deployments.
You can monitor your application for availability and performance, but at the same time you can use JMX to manage and monitor it from a business perspective. The application’s runtime metrics can be exposed through JMX, and in a service-oriented architecture you could use JMX to control your services.
All good, but when you start working with JMX on JDK 1.5 you will soon discover one big limitation, which was fixed in JDK 1.6 update 16 if I recall correctly:
The default RMI JMX agent for remote access opens two ports: one set by -Dcom.sun.management.jmxremote.port=XXXX and a second, randomly assigned one. What about firewalls?
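As an illustration of exposing runtime metrics, here is a minimal sketch of a JMX Standard MBean using only the JDK’s javax.management API. The names RequestStats, RequestStatsMBean and the object name com.example.app:type=RequestStats are hypothetical. They are nested in a holder class only so the example fits in one file; the Standard MBean contract requires the interface to be public and named after the implementation class plus “MBean”.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

class JmxDemo {
    // Management interface: getRequestCount() becomes the read-only attribute
    // "RequestCount", and reset() becomes an operation.
    public interface RequestStatsMBean {
        long getRequestCount();

        void reset();
    }

    public static class RequestStats implements RequestStatsMBean {
        private final AtomicLong requests = new AtomicLong();

        // Called by application code on every request (hypothetical hook);
        // not part of the MBean interface, so it is not exposed through JMX.
        public void recordRequest() {
            requests.incrementAndGet();
        }

        @Override
        public long getRequestCount() {
            return requests.get();
        }

        @Override
        public void reset() {
            requests.set(0);
        }
    }

    // Registers the MBean with the platform MBean server, where tools such as
    // JConsole can read the attribute and invoke the operation.
    static ObjectName register(RequestStats stats) {
        try {
            ObjectName name = new ObjectName("com.example.app:type=RequestStats");
            ManagementFactory.getPlatformMBeanServer().registerMBean(stats, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("could not register MBean", e);
        }
    }
}
```

The same pattern works for business metrics (orders placed, failed payments): anything the application can count can be published as an attribute.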
JMX service URL: service:jmx:rmi://host:port1/jndi/rmi://host:port2/jmxrmi
- port1 is the port number on which the RMIServer and RMIConnection remote objects are exported
- port2 is the port number of the RMI Registry
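A minimal sketch of a custom agent that pins both ports, so both can be opened in the firewall, using only standard java.rmi and javax.management.remote calls. It assumes the JVM is started without -Dcom.sun.management.jmxremote.port (otherwise the built-in agent opens its own registry); the class name FixedPortJmxAgent and the port values are hypothetical. Passing null as the environment map means no authentication or SSL, which is only acceptable on a trusted network.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.net.MalformedURLException;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

class FixedPortJmxAgent {
    // rmiServerPort is port1 (RMIServer/RMIConnection objects),
    // registryPort is port2 (the RMI registry clients contact first).
    static JMXServiceURL serviceUrl(String host, int rmiServerPort, int registryPort)
            throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi://" + host + ":" + rmiServerPort
                + "/jndi/rmi://" + host + ":" + registryPort + "/jmxrmi");
    }

    // Creates the registry on a fixed port, then exports the connector server
    // on a second fixed port instead of a random one.
    static JMXConnectorServer start(String host, int rmiServerPort, int registryPort)
            throws IOException {
        LocateRegistry.createRegistry(registryPort);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                serviceUrl(host, rmiServerPort, registryPort), null, mbs);
        server.start();
        return server;
    }
}
```

Usage: call FixedPortJmxAgent.start("myhost", 9999, 9998) at application startup and point JConsole at service:jmx:rmi://myhost:9999/jndi/rmi://myhost:9998/jmxrmi. Depending on the network setup you may also need -Djava.rmi.server.hostname so the exported stub advertises a reachable address.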
The most common way to deploy your application in a production environment is to hide Tomcat behind Apache. This has good and bad sides, but it gives you a lot of flexibility and all of Apache’s features. There are a few ways to put these two servers together:
- mod_jk is the older connector, developed under the Tomcat project. It uses Tomcat’s AJP protocol, which is binary and is expected to be faster than the text-based HTTP protocol.
- mod_proxy is the module for proxying over HTTP. It runs over TCP and uses plain-text HTTP: when a web client makes a request to Apache, Apache makes the same request to Tomcat and passes Tomcat’s response back to the client. This module has been part of Apache for a very long time, so it is also available in older versions of Apache. It is the simplest way to put Apache in front of Tomcat, but also the slowest.
- mod_proxy_ajp is newer and ships with Apache 2.2. It works like mod_proxy, but as the name says it uses the AJP protocol to send data to and receive data from Tomcat. It also runs over TCP and is expected to be faster than plain mod_proxy.
After reading more about Tomcat clustering, I realized that its main purpose is to offer fault tolerance, failover and high availability. I read a lot about load balancing, but when it comes to Java Servlets I found out that the only balancing choice you have is sticky sessions. This limitation comes from the Java Servlet Specification rather than from Tomcat, but it makes sense.
For an application to be “distributed”, you have to mark it as “distributable” by adding the <distributable/> element to web.xml.
There are many ways to balance client requests across your server pool, but when it comes to the Java Servlet Specification you have only one choice. As the spec says:
“Within an application that is marked as distributable, all requests that are part of a session can only be handled on a single JVM at any one time.”
“You may have multiple JVMs, each handling requests from different clients concurrently for any given distributable web application”
So I guess you can kiss round robin and the other load-balancing options goodbye, but at least Tomcat gives you failover, scalability and high availability.
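With Tomcat, sticky sessions usually rely on jvmRoute: when jvmRoute is set on the <Engine> element, Tomcat appends “.” plus that value to each session ID (for example A1B2C3D4.node2), and the Apache connector uses the suffix to send the session’s requests back to the same JVM. This sketch shows only that routing decision; the StickyRouter class is hypothetical and not part of mod_jk or Tomcat.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class StickyRouter {
    // The jvmRoute values configured on each Tomcat node, e.g. "node1", "node2".
    private final List<String> jvmRoutes;

    StickyRouter(List<String> jvmRoutes) {
        this.jvmRoutes = Collections.unmodifiableList(new ArrayList<>(jvmRoutes));
    }

    // Returns the node that owns the session, taken from the ".<jvmRoute>"
    // suffix of the session ID, or empty if there is no usable suffix.
    Optional<String> routeFor(String sessionId) {
        if (sessionId == null) {
            return Optional.empty();
        }
        int dot = sessionId.lastIndexOf('.');
        if (dot < 0 || dot == sessionId.length() - 1) {
            return Optional.empty();
        }
        String route = sessionId.substring(dot + 1);
        return jvmRoutes.contains(route) ? Optional.of(route) : Optional.empty();
    }
}
```

An empty result means the request has no session yet, or its node has left the pool; only then may the balancer pick another node, and in the second case it is Tomcat’s session replication that provides the failover.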