Thursday, May 19, 2016

Baibulo is born!

Howdy, folks!

As promised (although it took forever and then some) my first Node module has been released! Baibulo (version in Chewa) is a very simple versioned content server backed by Redis. Check it out - the README should be enough to get you started, and if that is not good enough there is a working example that you can basically take and play with :)

I know there's probably a ton of things that should be done differently. Please share your thoughts, whether you think it makes sense or not - feedback is very much welcome!

Happy coding!

Wednesday, May 11, 2016

The world around us is changing

A few years back, while working on a redesign of a small internal application for market managers at Travelocity, I decided to go with the new and shiny single page application approach. Back in 2010 it was quite unheard of and spawned questions like "is the browser going to be fast enough?" or "do we have good enough developers to do it?". To tell you the truth, both questions should have been answered in the negative back then, but as it turns out the competition I was going up against was of such low quality that anything that actually worked would have been better. To sum it up, the app was created in Grails, of which I mostly used the GSP engine to create the index file and a few controllers to get the API in place. In all seriousness it was quite a simple app, but the effect it had on users exceeded my wildest dreams.

Fast forward a few years and not only does everyone know what a single page application is, but the world has been divided into the client side and the server side. It seems that everybody who forgets this is going to be left behind soon-ish :)

One of the benefits of thinking about the system as 2 entities (the client app backed by an API) is that intuitively one arrives at a place where those two get developed independently. This means 2 version control repositories, 2 separate teams, 2 different build systems and 2 different sets of skills required to do the job right.

As some of you already know, I am working on an intelligent home system for myself. Recently I decided that I'd like it to be a testing field for ideas I have stumbled upon elsewhere - a place to exercise, in a scenario more realistic than a hello-world-type app, the things others came up with. That being said, I did come across a presentation from RailsConf 2014 about splitting the deployment of static assets (a.k.a. the frontend application) from the backend and using Redis to store the content of the index file so that it can be, sort of, versioned. This is one of those ideas that are really worth exploring, and since I had already made the decision to split the backend and frontend into two separate repos it was the perfect scenario to play with.

Due to my current professional occupation I've gotten a lot more interested in JavaScript development, on both the backend and the frontend. And so I decided to implement the first version of the versioned content server in Node, using Redis as advertised by Luke Melia but with a slight twist. I'm not really interested in the S3 portion of it, since I will be serving everything from a server I control, which means I can do all the file serving myself. That being said, I decided to just version everything and have Redis act as a versioned file system for my static assets.

Since I am a fanatic enthusiast of the Sinatra framework, it should come as no surprise that I selected ExpressJS as my weapon of choice. Both Sinatra and Express have the notion of middleware that can sit between the bare metal and your app, doing god knows what. In my case I wanted to completely take over a portion of the server and serve things from Redis instead of the file system, like the express.static middleware does. The schema for naming keys I came up with is quite simple:

prefix:/context/path:[version]
prefix:/context/path:content-type:[version]
prefix:/context/path:etag:[version]

and to store the current version that will be served when no version is specified I'd use

prefix:/context:[version]
One of the benefits of taking the "store it all in Redis" approach is that one can query Redis for a full list of available versions, which makes a sort of version selector page an easy task. Specifying the version happens via a version=[specific-version] query parameter, as that allows for easy creation of links to particular versions. It just so happens that all the assets being retrieved carry a Referer header containing that information from the original URL, so other requests (including XMLHttpRequests) can take advantage of the specified version. And since this is mostly used in scenarios I have full control over (testing, preview and the like) there is no problem with any proxies stripping that info out of the request.

Because the version can be any string, I am able to deploy a feature branch, a test branch, a new proper version - whatever I want - and it just transparently works. The middleware is meant to be mounted per context, like /hello, and will use that mount point as part of the Redis key to differentiate between frontend apps in case there is more than one.
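
To make this a bit more concrete, below is a minimal sketch of how such a middleware could look. This is not the actual Baibulo code - the "baibulo" key prefix, the callback-style node_redis client and the assumption that the current version is stored under the bare context key are mine, purely for illustration:

// Hypothetical sketch of a Redis-backed versioned content middleware for Express
var express = require('express');
var redis = require('redis');
var url = require('url');

var client = redis.createClient();
var PREFIX = 'baibulo'; // assumed key prefix

function versionedContent(context) {
  return function (req, res, next) {
    // the version comes from ?version=... or from the Referer of the original page
    var version = req.query.version;
    if (!version && req.headers.referer) {
      version = url.parse(req.headers.referer, true).query.version;
    }

    var resolveVersion = version ?
      function (cb) { cb(null, version); } :
      function (cb) { client.get(PREFIX + ':' + context, cb); }; // current version (assumption)

    resolveVersion(function (err, ver) {
      if (err || !ver) return next(err);
      var base = PREFIX + ':' + context + req.path;
      client.mget([base + ':' + ver, base + ':content-type:' + ver, base + ':etag:' + ver],
        function (err, values) {
          if (err || !values || !values[0]) return next(err);
          res.set('Content-Type', values[1] || 'application/octet-stream');
          if (values[2]) res.set('ETag', values[2]);
          res.send(values[0]);
        });
    });
  };
}

var app = express();
app.use('/hello', versionedContent('/hello')); // mounted per context
app.listen(3000);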

The one problem I needed to solve was how to upload stuff to Redis in an efficient manner. For now I have not solved the "efficient" part of it and I just store each file, along with some metadata, under each version in Redis. This has the added benefit of letting me completely remove a single version from Redis whenever I want, in an easy fashion.
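
For the record, here is roughly what that naive uploader boils down to - again just a sketch under the same assumptions as above (hypothetical "baibulo" prefix, node_redis client, a tiny extension-to-content-type map), not the real thing:

// Hypothetical sketch: push every file of a build directory into Redis as one version
var fs = require('fs');
var path = require('path');
var crypto = require('crypto');
var redis = require('redis');

var client = redis.createClient();
var PREFIX = 'baibulo'; // assumed key prefix
var TYPES = { '.html': 'text/html', '.js': 'application/javascript', '.css': 'text/css' };

function uploadDir(dir, context, version, done) {
  var files = [];
  (function walk(d) {
    fs.readdirSync(d).forEach(function (name) {
      var full = path.join(d, name);
      if (fs.statSync(full).isDirectory()) walk(full); else files.push(full);
    });
  })(dir);

  var pending = files.length;
  if (!pending) return done();
  files.forEach(function (file) {
    var content = fs.readFileSync(file);
    var rel = '/' + path.relative(dir, file).replace(/\\/g, '/');
    var base = PREFIX + ':' + context + rel;
    var etag = crypto.createHash('md5').update(content).digest('hex');
    client.multi()
      .set(base + ':' + version, content)
      .set(base + ':content-type:' + version, TYPES[path.extname(file)] || 'application/octet-stream')
      .set(base + ':etag:' + version, etag)
      .exec(function () { if (--pending === 0) done(); });
  });
}

// usage: upload ./dist as version "1.2.0" of the /hello context and make it the current one
uploadDir('./dist', '/hello', '1.2.0', function () {
  client.set(PREFIX + ':/hello', '1.2.0', function () { client.quit(); });
});

Because every version lives under its own set of keys, dropping a version is just a matter of deleting the keys that end with it.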

The next challenge I am working on is being able to serve a single version of the frontend against 2 or more versions of the backend. This can be quite useful when updating the backend and validating it against the current or upcoming release of the frontend app.

I will soon publish that as part of the Aplaster project, and maybe release the server and content uploader as a gem for everyone to use? We'll see :)

Happy coding!

Monday, March 14, 2016

The anatomy of a unit

The article in a pill: Click here.

Back in the winter of 2005 I started working on a flight planning system for commercial airlines. There I was presented with a shit-load of legacy code written to perform functions, not to be understood. Suffice to say that it took 9 months to get new developers on board - even brilliant ones like my good friend Adrian. Sure, the domain is just hard. All the physics, advanced maths, algorithms, spherical calculations and God knows what else, combined with 10+ years of code originally converted from Fortran, produced a mixture that was very hard to work with. Around that same time (late 2005/early 2006) I was introduced to the idea of unit testing through a library called DUnit.

A funny (judging from the perspective of 2016) thing happened in the flight planning product. I was introduced to an idea that would make the Pervasive-based database backend (which was actually just pure BTrieve with no SQL over it to sugar-coat it) interchangeable with a "real" SQL database like Oracle or Microsoft's SQL Server. Looking at it from today's perspective, switching away from a no-sql database that performed really well for the sake of "we need to support X because our clients require it of us" sounds like pure nonsense, but it was what it was. And the most disturbing thing about that project wasn't the what, as you might have guessed, but the how of it. Basically what happened was that BTrieve API calls were literally translated into an extensive SQL builder. It was a disaster for a number of reasons. First, the idea that the SQL calls would perform anywhere near the speed of native BTrieve was just wrong. Nowadays we know that, for example, fine-tuned Redis handles fast data inserts from multiple sources better than, let's say, MySQL - it is a simple fact. But back in the day the desire to run the flight planner off of SQL Server was more important than speed. And even though the project ultimately failed, with the Japanese government not approving it due to performance reasons (surprise!), that wasn't the biggest sin of the project in my opinion.

Another sub-project of that solution happened in the meantime, having to do with parsing weather messages coming off a satellite dish. Nowadays that would be a service built mainly on regular expressions (as is usually the case with text parsing), with a clear separation of pretty much everything and unit tested to death; back then it was a piece of work created by just one programmer, the lead programmer of that project, consisting of just one procedure in just one Delphi unit with a cyclomatic complexity of around 6000 (that's six thousand!). It proved to be a fantastic testing ground for my tool that calculated that metric using McCabe's simplified method and gave me a ton of fun to work with. There was just one problem with the entire thing: it just didn't work as it was supposed to.

What both of those pieces had in common was code that was hard to understand, hard to read and fucking hard to fix. What you don't know is that the first one actually had unit tests! The coverage wasn't great (around 60%) but its readability was no better than that of the 6000-complexity walpha unit for parsing weather data. Why was that the case? Why were both solutions so bad, and how could they have been made better?

To answer that, one first needs to understand what a unit in Pascal-like languages is. Let's take a look at its anatomy below:

unit MyUnit;

interface

uses
  Classes, SysUtils;

const
  SOME_CONSTANT = 123;

type
  TMyClass = class (TObject)
  private
  protected
  public
    constructor Create;
  published
  end;

function GetMyObject: TMyClass;

implementation

uses
  SomeOtherUnit;

{ TMyClass }

{ Private declarations }

{ Protected declarations }

{ Public declarations }

constructor TMyClass.Create;
begin
  { ... }
end;

{ Published declarations }

{ Global declarations }

var
  MyObject: TMyClass;

function GetMyObject: TMyClass;
begin
  Result := MyObject;
end;

initialization
  MyObject := TMyClass.Create;

finalization
  FreeAndNil(MyObject);

end.

Without going into too much detail, one can clearly see the separation of the interface and implementation sections and the list of units the code depends on, explicitly and implicitly; but what is most important is that a unit in this form describes a piece that is, in point of fact, self-sufficient. Let's take a look at the sins of the BTrieve API rewrite and the WAlpha madness.

The SQL-isation of the BTrieve API was initiated by this guy Nathan. Nathan was an architect back at the company and was high on Java, which back in the day was the best thing since sliced bread. Nathan was also a very buzzword-oriented person, so naturally when TDD became a thing in the industry he quickly realized that all pre-TDD code was shit and that all his future inventions would finally be good. Nathan led the project with another colleague of mine who got strung out on TDD the same way. Never mind the fact that the project was actually carried out in Delphi and not Java - since DUnit was already around, they decided to take it to the next level. And that they did. Each and every class got an interface, each interface and class was put into a separate file, each had a unit test - all according to the best practices. What it meant for a developer using their code was a screen-long uses statement, anemic tests and a system so complex they had no idea how it worked. Debugging took weeks, and even though the system had such great code coverage (of which they were so proud!) it failed when it came to real-world usage.

The WAlpha case is on the other end of the spectrum. It was carried out by an experienced programmer, Irene, who had been with the project for years. I think she might have had the longest participation in the project besides the original author. She was used to the codebase, never paid any attention to suggestions from younger teammates and, what is even more frightening, she was in a position of power, holding an axe that could expel you from the project in a heartbeat. So, as I said before, she did the coding on WAlpha all by herself. She wasn't very big on the whole TDD buzz that was going around, so she did what she did best - she tested all the code inside the Delphi integrated debugger (a phenomenal piece of software compared to anything else I knew back in 2006!). And when it finally worked she called it a day and collected the awards coming her way for a job, obviously, well done.

For a very short time I took part in the BTrieve API thingy but couldn't stand the stink of Java in Delphi. It was just too much. I told myself that I could write something better over a weekend, something that would work faster and have less code than what all those geniuses had produced. And I was right! A weekend and a six-pack later I had a fully working read-only solution to the problem, with the write portion 80% done and only left unfinished because the weekend ran out. Leaning on the shoulders of the ADO drivers for SQL Server and Oracle I was able to navigate the tables, search through them and do all of that blazing fast. The original project used the same drivers but was dead set on the SQL aspect of it, which turned out to be a disaster. Soon after I presented my solution to the team I was told that it was very nice but (and here comes the best part!) "we have invested so much time already that we won't back out now". Funny enough, my little side project turned out to be a fully working solution that I was able to offer to other companies, while their solution didn't make it to production at all.

Those are just 2 examples of projects that failed to stand the test of time. Both were very different in their design, in the concepts used to create them, in the developers involved and their prior experience. What they have in common is that in both cases the developers focused their attention on the wrong thing, not on what was actually needed for them to succeed. The thing I am referring to is clarity. Back in 1994 I read an article about different developers on the demoscene (Amiga and C64 were my thing back then) and what they viewed as the most important thing in software development. One of them stated that the code doesn't need to work and be bug-free right away, but needs to be written so that it is easy to navigate and fix, whereas the other stated that he didn't care at all about those qualities because all that counts is that it looks cool when shown at a copy party on the big screen. In my opinion both of those guys were right in their own areas. When you write code once, make money on it and throw it away (not even pass it on for further development - just throw it away), concentrating on clarity, test coverage, readability and whatever else comes to mind when we talk about properly engineered code makes absolutely no sense. It is pure waste and everyone should understand that. On the other hand, if the code will be maintained for months and years to come, forgetting about readability and concentrating only on how big the coverage is and how fast the tests run will bring all kinds of curses from your fellow programmers.

There's one universal truth in software development that has never changed: code is read much more often than it is written. It's that simple. If you write code that is tested like Fort Knox but nobody can understand what the hell you meant, everyone will be in trouble (most importantly you, if you're still around!).

There's another truth that I think is the mother of all statements: in software development there is no substitute for thinking. No discipline is going to make you a professional programmer, no design pattern is going to let you create readable code, even though we keep telling ourselves that design patterns are the vocabulary of modern software development. My friends, you can use 10 design patterns and still make everybody hate you with a passion if you don't pay attention to readability and clarity.

There's another thing I find irritating about the unit testing paradigm - especially the test-first approach. When I code I usually have no idea what will come out of what I am doing. I explore ideas and options and usually figure stuff out as I go. I might give a library a go if I think it might help me out, or I might put together some code from stackoverflow.com to see if it actually does the thing I want it to do. At the time of writing I have no idea whether it will be production-quality-top-notch-super-duper or whether I'm going to flush it down the drain in a few minutes/hours/days. And as such I follow my heart and I don't write tests (much less test-first). I do YAGNI, because I assume that what I created is shit and nobody will want to see it. Later on, when it turns out to be valuable, I tend to write system tests to make sure I lock the end result in place. I test the whole thing in as much isolation as possible - but not an inch more. I seldom write real unit tests as such (except when the architect of a solution is still strung out on code coverage - then I do it for his pleasure). I think that testing code in isolation makes absolutely no sense whatsoever. Single methods are useless pieces of a whole system that, if exercised in separation, give one no clue whether they work as part of the whole. In Delphi the concept of a unit allows a developer to put together an implementation of a fully functioning unit of work that can be nicely tested through the provided interface. No other language that I know of goes about this the way Delphi does. And the funny thing is that Pascal units weren't even created with that in mind - they were a remedy for switching between header and source files in C and C++! But the definition of a unit is, in my personal opinion, the best there is among all the languages I have worked with. Those are the units that make sense to test.

Remember: think before you write, then read what you wrote - right after you wrote it and again in two weeks' time. If it makes no sense, re-write it until it is readable. Refactor, extract, rename, test, unit-test, re-test - do whatever you need to make sure it's not going to be an ordeal for whoever works with that piece next. For all you know, it might be you!

Saturday, March 12, 2016

Unit test coverage means nothing

If you're like me or any other person that got hooked on TDD, you might want to take a look at this presentation by the creator of Ruby on Rails from RailsConf 2014:

RailsConf 2014 - Keynote: Writing Software by David Heinemeier Hansson

I think this is the missing piece of revelation that I have been looking for for years on end. It always felt like there was something wrong with the world of TDD, but I just couldn't figure it out myself. I did have a project with 100% code coverage and tests running bloody fast that broke down in its first week online, and I did write software with no tests whatsoever that has been working for 10+ years and still performs its duties today without a hiccup! I also wrote a ton of software that I'm not proud of, but some pieces from 17 years back, when I read through them now, look as though I had a genius by my side in terms of readability and clarity. It is a stunning experience.

Go write some code that you'll be proud to read 10+ years from now!

Windows and Git - config --global not working

When you use Git and you're behind a corporate firewall that blocks access to remote repositories over the git:// protocol, many sites advise you to swap the git:// protocol for https://

git config --global url.https://.insteadOf git://

That's all nice and dandy until you're on Windows, where this simply doesn't work when invoked from npm. I have only experienced the problem on Windows 10 but I'm pretty sure it's going to be equally fucked up on any other version. The reason seems to be that npm uses some kind of different user to clone the repos. Let's face it: whatever the reason is, Windows just sucks big time anyway.

The only way I found to make it work and allow installing packages with npm is to apply the same configuration system-wide, like so:

git config --system url.https://.insteadOf git://

I'm far, far away from saying that life's good again, but that piece works now.

Saturday, February 27, 2016

Samba public share

Samba is overly complex. The sheer number of configuration options makes it very configurable and therefore cool, but some of those options are just completely crazy.

One such example is the creation of a publicly available folder - something that I have no doubt is very popular when you build a NAS server at home and just want a single network share to exchange files between computers. Doing that on Microsoft Windows is quite simple: you just specify that everyone shall have read/write permissions and that is it. With Samba on Linux the case is not quite so easy. Here's an example configuration that achieves just that:

[public]
  path = /storage-location-of-public-drive
  guest ok = yes
  read only = no
  public = yes
  browseable = yes
  writeable = yes
  create mask = 0666
  force create mode = 0666
  security mask = 0666
  force security mode = 0666
  directory mask = 0777
  force directory mode = 0777
  directory security mask = 0777
  force directory security mode = 0777

I dare someone to logically explain why the hell one needs four entries to set the same thing (create mask, force create mode, security mask and finally force security mode) and then defend that as a sane thing to do.

Anyways... creating a public Samba share: demystified.

Thursday, February 4, 2016

Top 10 Most Common Mistakes That Java Developers Make

I recently came across a very interesting article by a gentleman called Mikhail Selivanov describing a number of problems young developers struggle with:

Top 10 Most Common Mistakes That Java Developers Make

Even if you're an experienced developer you might find it interesting. We pros tend to forget what mistakes can be made, and going through them helps us understand our younger colleagues better.

Have a nice day!