Tuesday, November 24, 2015

Building configurable applications

As it usually is the case recently I've been reviewing options to build web applications that can be easily transferred between environments. I found some very interesting examples on what can be done to achieve some quite interesting results. But let's start at the beginning..

The project has been released to the public as com.github.testdriven:cfgagent:1.0.0

The problem

The problem usually is that the application we develop tend to have too many configuration options. Database connection string, SMTP server settings, file locations (more than one), configuration of connection pool... just to name a few. As the number of options grows (the project I'm working on now has about 200 of those) passing them all in the command line using JAVA_OPTS is just not possible. And I don't mean like "not practical" but plain and simple not possible due to the limits of what a command-line can take. There has to be a better way to do it.

There actually is a mechanism that one can use with Tomcat called catalina.properties (and I'm sure one would find similar ones in other containers) but that is not going to fly if we want to select which set of options we want to use for this particular run or if the values should come from environmental variables (for example from Docker or Heroku).

JVM Agent to the rescue

I've been looking for a while for a nice, pluggable solution to this problem. My idea was roaming about something that could be fixed but parameterized at the same time, could be specified at execution time (just like JAVA_OPTS are) and in general to be just easy to use regardless of the execution environment.

Looking at it from different angles I turned my attention to what is executed before the main method. As it turns out there is such a thing and it is the concepts of Java Agents.

Without further due let's see the solution:

The code is pretty much self explanatory. First we load the properties from a file name given to the agent (default: system.properties), then we replace all the {placeholders} with environment variable values.

Example usage

At first we create a Docker (instantly cool) container with Oracle. Then we start Tomcat with system properties configured based on the system.properties.

Conclusion

This small utility can easily be used with any type of process where the _OPTS part is getting way too long to be maintained. I've tried it with JBoss, Tomcat, command-line and it worked great every single time :) It also has this nice property that you compose behaviors instead of hard coding them in every application and your products depend only on the concept of system properties (System.getProperty()) that is already present in Java

Happy coding!

It feels great to be blogging again :)

Monday, November 23, 2015

Running SQL queries - the groovy way

Recently I've been tasked with the creation of a utility that executes predefined queries against a database via a command-line interface. A maintenance-sort of utility. I thought this is a a perfect opportunity to freshen up my Groovy DSL skills and see what I can put together to make the code readable.

The idea

The generic idea was to create an object that would describe what the query is all about, with placeholders, then fill in those placeholders and finally execute the statement. It's a one-off operation so we won't be concerning ourselves with connection pooling and the like. Let's KISS.

The implementation

What I'm about to show you here is an over simplified version of a implementation that might be useful. Actually it is quite a bit of overhead for the simplicity Groovy already has in its toolset (the groovy.sql.Sql class) but it opens up a way of thinking of interoperability with the database.

So here we go. First we create a class to house the main property (the query to be executed) and allows for switching context to one where only execution-time properties play a role. In that switched context we execute the query and complete the execution.

Imagine now that instead of just using the Object[] array one would allow for a map to be passed on, like so:

It really doesn't get much more complex than this in regard to predefined queries :) If you'd extract the engine that fills in templates from placeholders and a map of keys (placeholders) and values you could apply the same to pretty much everything that takes a string and parameters and executes it, for example the execution of external commands :)

Happy coding

Sunday, November 22, 2015

Algorithms, data structures, accidental and essential complexity, inheritance, composition and functional programming

Today we're going to go through the tools a programmer has in order to see how and when we can use them to solve programming problems.

Let's start with some of the good old slogans that we have been fed for years:


Every loop can be replaced with functional programming
Every decision can be replaced with inheritance
Every inheritance can be replaced with composition

Does it mean that all our inheritance structures should be 2 level (Object -> MyClass) and that inheritance for itself is bad? Does it mean that if we use a switch/case or an if we're committing a crime against purity? Are those just relics of an era that's come an gone? What do you think?

I strongly believe if there is a tool (even like the goto instruction) and even if the fashion for it has been mentally superseded with a new construct (or even an older one!) then it is still valid to use it in certain contexts. For example goto in the context of text parsers is still very much valid and in use for years to come.

So how about loops and conditions? When is the right place to use them?

First things first - inheritance

Let's tackle the inheritance first because it is the simplest one. You inherited genes from your parents because you're a man. If you'd be a dog you'd not inherit a thing from your human owners as they are not your parents (although your mrs. loves you very much!). An employee is most probably a human being (although there are examples where that would not be true) and a car is a vehicle (when it works otherwise it's a problem).

To put it bluntly: whenever what you inherit is a thing you inherit from then you're doing the right thing. When you're inheriting because you'd like to have that same functionality but extended or twister then you're committing a felony and you should burn in hell.

Complexity

Let's take a look at the types of complexity we're working with on a daily basis. As we've been already told there is essential complexity (driven by the complexity of the domain we're working with) and accidental complexity (driven by the technical details of framework/language you're working in). A good example everyone can relate to is sorting so let's take a look at 2 quicksort implementations:

It is pretty straightforward, right? At first we're slicing the input in half, figuring out the elements that are lesser than the middle point and swapping them out with those that are to the right. Basically we're grouping elements smaller, equal and greater than the pivot and sorting those groups further until there is anything to sort. Efficiency aside the algorithm presented here is pretty clear. Can it get any clearer?

At first when I described the algorithm to you in the above paragraph I told you not how we're grouping things but that we do group them. I even gave you the recipe on assigning elements their destination group. This in turn describes the essential complexity of the algorithm. Can we remove all the accidental complexity? Is that even possible? Let's try it out with Groovy:

Try to read the above code and explain to yourself what it does and how it does it. Immediately you'll notice that it lacks something. I mean it must lack something, right? The Java implementation is 56 lines long and this is just 6 so there surely is a huge gap in those two implementations. But is there really a gap? First we define the exit criteria (list size less than 2), then we figure out the middle point, then we group the elements by their comparative ratio (if there are no elements then return an empty list), then we take all the smaller, sort them, add the equal to the pivot and then sort the bigger ones and add them to. I mean it takes more words to describe what is taking place than really reading the algorithm!

By the way... Take note of the if statement. It's there because the algorithm dictates it.

I think we should go over one more example, maybe less algorithmic but definitely very useful: running external applications from Java.

It is again, very straight forward: get the hold of the runtime, execute the command, read line-by-line the output from a buffered reader. Done. Can it be any simpler?

Wooow! Now that is really the salt itself: just execute the command I gave you and print out the text that came out of it. It's almost like writing a bash script but with the indescribable potential you get when using a full-fledged, mainstream JVM-based programming language.

Personal preference aside I think those two examples demonstrate pretty clearly that software can be created much faster and in a readable fashion. Obviously I'm far from suggesting you write your own quicksort implementation. Those types of lego pieces are already in place and you don't really need to worry about them. But you do need to worry about your algorithms.

Algorithms and data

Remember when you have been introduced to the SOLID principles? My God is that a bizarre harness for a developer when he hears about this the first time! Hands get soaking wet, head shakes in disbelief, and immediately questions like "why" and "how" start to raise but what it really is are means to implement extendable, easy to understand programs.

Single responsibility principle tells you that a piece of code shall only have one reason to be changed. That reason would be for example change in the algorithm but a you probably already know those change very infrequently.

Open/Closed principle tells you to write your software in such way that you can extend it without recompiling.

Liskov substitution principle tells you that no matter which implementation you're going to use the outcome of the algorithm shall stay the same and there should be no difference in the outcome when that algorithm ends. That don't mean it there can only be a single implementation because the sole purpose of doing a different implementation is to inflict change. But the fact is that if your algorithm is generic enough it will always work the same and the changing parts will be properly externalized. A fantastic example here is the template method pattern and composition over inheritance is JDBCTemplate in Spring JDBC. It allows you to tell "in place" what the behavior you'd like to inflict on the result of your query is and takes aside all that nagging boiler-plate away. No matter what you do the algorithm there will always do the same which is iterate over the ResultSet and call your method when appropriate.

Interface segregation principle tells you to only expose as much of functionality as the user of your code would. So for example the JDBCTemplate could theoretically give you the option to close the ResultSet but it doesn't do it because you'll never need it (and if you do you're doing something wrong)

Dependency injection principle modeled after one of the the Hollywood's most famous quotes, Don't call us - we'll call you, is also very well observed in the JDBCTemplate where the algorithm in that implementation takes complete control over the flow of your program and calls you back when your intervention is required.

In addition to this Niklaus Wirth came out with this book of his, Algorithms + Data Structures = Programs. So what's that all about? Has it not been superseded by SOLID principles?

The thruth is that the SOLID principles back up that book's messgage which is to take a point in dividing the essential and the accidental complexity. To make sure you don't include data in your algorithms. What do I mean by this?

Suppose you're writing a very simple program to tell the driver that the current environmental conditions are somehow dangerous. My car has that kind of feature and when it is too cold (below 4°C a subtle alarm will go off and the temperature display will blink a few times. What my car lacks however is that if it is too hot (and that'd be for me > 30°C) then I should be notified by this as well. Otherwise I could fall into the same danger as a boiled frog. I suspect that the program they have has some fixed boundaries, maybe something like this:

What is wrong with this type of programming is not that the temperature is not extracted to some constant or even taken from some function that would combine it with different sensor readings. The bad part of it is that if we would extend it we definitely need to change it and that is not kosher.

That's a whole different story! You can even read the settings from a database or mock the input in test! Cool, ain't it?

Conclusion

Every time you write a program think if what you are doing comes from the essential complexity of the domain you're working with or if it is just because the tools you have are not powerful enough to allow you to express it more elegantly. If you find yourself in the latter position extend the tools (use meta-programming, use reflections, use monkey-patching, use whatever you can) to make sure that if you read the algorithm a year from now it will still read nicely and clearly.

Happy coding!