Monday, October 20, 2014

Data structures are not tabular

Recently one of my colleagues was designing a data structure that was supposed to store some data about something. It isn't really important why or what data. The point is that for the most part there were 1:1 or 1:n relations. I can't tell why or how it came to me, but the itch was just too strong to let go and I really had to ask what the actual thing he was trying to achieve was.

The data model consisted of 8 tables, 2 additional artificial concepts (just for the sake of storing data) and a bunch of names that didn't really make any sense.

It turned out that when we tried to create example records we wrote them down very naturally in JSON, because storing data in tables on paper was just too cumbersome.

A rule of thumb: if you can't picture something, don't do it!

I started to look for an alternative storage for those documents and found out that there are not too many options available for free. There's obviously the (almost infamous) MongoDB, but with its recent bad press and the lack of an embedded mode I felt it wasn't the right way to go. Luckily we stumbled upon OrientDB - a multi-paradigm database implemented in Java. Since the application was already on the JVM, having the option of an embedded document database seemed like the perfect match.

Now the whole thing is just a document with some embedded documents since all the data comes as one. And using OrientDB's pseudo-SQL dialect is super simple!

Go ahead and try it out yourself!


Happy coding!

Friday, October 10, 2014

Continuous deployment - Java style

Recently continuous deployment has been a hot topic around the office. CD this, CD that, we need to CD because [increase productivity, faster time to market, agile deployment]... You name it, we have heard it. I'm sure you don't really need to be told what continuous deployment / delivery is. It's all about making deployments an easy, boring and natural part of your work day.

When faced with the task in any of the scripting languages (take PHP for instance) it is extremely easy: just rsync the files and you're done. The only catch is that some session data may be null in certain circumstances, so you need to take that into account, and that's pretty much it. It's like having a zillion nano applications, each file you hit from the browser being one of them, plus some services (the includes). Boom! You're done!

In Java the case is a little bit different. First of all, one needs to understand that the usual deployment of a Java application involves deploying 3rd party components in binary (a.k.a. compiled) form. A library takes the form of a zip file with a ".jar" extension and contains lots of folders with ".class" files (the product of compiling ".java" sources). For that very reason doing CD on Java applications isn't all that easy. The second thing that makes it even more attractive is that the actual application is packaged in yet another zip file, this time with a ".war" extension (as in "Web Application aRchive", I presume). The third insanity level is the packaging of multiple web applications alongside some business layer services into yet another zip archive, this time with the ".ear" extension (no, it's not your hearing organ, it's the "Enterprise Application aRchive"). Historically this has had only one reason: to provide a packaging mechanism and to minimize the overhead of data transfer over the wire (I mean, there must have been something else, but I haven't found anything on that topic so far, so I take it I'm right on this one).

To be completely fair, there is a way to deploy both .war's and .ear's unpackaged (however strange that sounds :D) to an application server. It doesn't really help, though: updating a single ".jsp" file (as in JavaServer Pages, similar to ASP - Active Server Pages in the Redmond world) achieves little, because that page most likely uses some binary ".class" file that will not get updated along with it. There are paid solutions to this problem, but I think it's going to be a fairly rare case where you'd want to pay lots of money to get CD done (unless you have money to spare, in which case off you go!).

For the purpose of this discussion we're going to focus only on .war archives, only on the reference implementation of the servlet container, Apache Tomcat, and only on version 7 and later.

What do you need to get continuous deployment done? The answer couldn't be simpler: proper naming!

Here's an example: we're working with an application called "example" (for lack of a better name) and we want the following:

1. Users using the system will not experience any undesired effects, including:
- sudden unavailability of the system
- change in behaviour
2. Users using the system will make a semi-conscious decision to start using the new version
3. The old version will be automatically uninstalled once every user has made the decision from point 2

So here we go. The first version can be named anything, so let's keep it simple and call it example.war. Since the application will most likely keep some server-side state in the form of a session, the client will get a cookie called JSESSIONID holding a unique ID. This is what binds the user to a deployed version of the application on the server. Now if a user logs out, a new JSESSIONID is generated. This is very important. Read on.

Tomcat has the capability to run multiple versions of the same application out of the box since version 7. How is it done? By naming the next versions properly:

example##001.war
example##002.war
example##003.war

Please note that the versions are compared as strings, not as numbers, so I pad the version number with leading zeros to make sure every new version sorts after the previous one. The main point here is that deciding which user hits which version is done by the JSESSIONID!

- if no JSESSIONID is sent from the client - newest version
- if a JSESSIONID is sent from the client but cannot be bound to any existing version - newest version
- otherwise the JSESSIONID matches a session in a running version - that version
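The ordering and resolution rules above can be sketched in a few lines of shell. This is only an illustration, not Tomcat's code: the helper names newest and pick_version are made up, LC_ALL=C sort stands in for the string comparison of versions, and the session lookup is reduced to "which version owns this session".

```shell
#!/bin/sh
# newest VERSION...
# Prints the version a plain string comparison considers the latest.
# LC_ALL=C sort is only a stand-in for Tomcat's string ordering.
newest() {
    printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

# pick_version SESSION_VERSION VERSION...
# SESSION_VERSION is the version that owns the client's JSESSIONID; it is
# empty when no cookie was sent, and may name a version that is gone.
pick_version() {
    wanted=$1
    shift
    for v in "$@"; do
        if [ -n "$wanted" ] && [ "$v" = "$wanted" ]; then
            echo "$v"          # session is bound to a running version
            return 0
        fi
    done
    newest "$@"                # no cookie or no match: newest version
}

newest 9 10                    # prints 9: unpadded numbers sort wrongly
newest 009 010                 # prints 010
pick_version ""  001 002 003   # prints 003
pick_version 002 001 002 003   # prints 002
pick_version 000 001 002 003   # prints 003
```

The first two calls show why the padding matters: as strings, "9" sorts after "10", while "009" sorts before "010".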

An automated shell script to get the next version number from a remote server is as follows:

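#!/bin/bash
# Prints the next zero-padded version number for a Tomcat parallel deployment.

user='your-user-on-remote-machine'
host='name-or-address-of-remote-machine'
location='location-of-webapps-folder'
apppattern='base-name-of-your-application'

# List the deployed archives remotely and keep only the highest version suffix.
number=$(ssh "${user}@${host}" "ls -1 ${location}/${apppattern}*.war \
        | sed 's,.*${apppattern},,;s,##,,;s,\.war,,' | sort -n | tail -n1")

if [ -z "${number}" ]; then
    # No versioned archive yet: start counting from 0 with a three-digit pad.
    number=0
    numpad=3
else
    # Keep the padding width already in use on the server.
    numpad=${#number}
fi

# Force base 10 so that a padded value such as 009 is not read as octal.
number=$((10#${number} + 1))
nextnumber=$(printf "%0${numpad}d" "${number}")

echo "${nextnumber}"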
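Once you have the number, the remaining step is to build the file name for the new archive. Here is a minimal sketch of that step; next_war_name is a hypothetical helper of mine, and the scp line in the comment is only one possible way to upload, reusing the placeholder variables from the script above.

```shell
#!/bin/sh
# next_war_name APP CURRENT_VERSION
# Hypothetical helper: prints the file name for the next parallel deployment,
# keeping the zero-padding width of CURRENT_VERSION (three digits when empty).
next_war_name() {
    app=$1
    current=$2
    width=${#current}
    if [ "$width" -eq 0 ]; then
        width=3
    fi
    # expr reads 009 as decimal 9, so padded values are safe here.
    next=$(expr "${current:-0}" + 1)
    printf "%s##%0${width}d.war\n" "$app" "$next"
}

next_war_name example 009   # prints example##010.war
next_war_name example ""    # prints example##001.war

# Uploading under that name could then look like this:
# scp example.war "${user}@${host}:${location}/$(next_war_name example 009)"
```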

To make sure the old version gets automatically undeployed once everybody's session has either timed out or been otherwise invalidated, include the

undeployOldVersions="true"

attribute in the "Host" element of your server.xml configuration file. Done.
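For reference, a Host element with that setting might look roughly like the fragment below. The name, appBase and unpackWARs values are the ones found in a stock server.xml and are assumptions about your setup. Note that, according to the Tomcat documentation, undeployOldVersions only takes effect when autoDeploy is true.

```xml
<Host name="localhost" appBase="webapps"
      unpackWARs="true" autoDeploy="true"
      undeployOldVersions="true">
  <!-- existing child elements (Valve, Context, ...) stay unchanged -->
</Host>
```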

So to bottom line this for you:

1. Use the naming convention appname##version.war, remembering that the version is alphanumeric and _not_ numeric, so padding is crucial
2. Add the undeployOldVersions="true" attribute to Host in server.xml
3. Start rolling updates

Of course the entire process in real life is a lot more complex. It involves automated testing of the application before it gets released, automated copying of files to the server - stuff like that. But the essential piece is there and you get it absolutely for free. Please note that since it is the entire version of the application being updated it is OK to have your dependencies updated as well with such an update. This will just work.

Here's the link to the relevant configuration options in Tomcat:

http://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Parallel_deployment

Happy CDing!