Wednesday, January 20, 2016

Web platform performance comparison

I have recently developed a deeper interest in nodejs, especially in the area of server applications. Node's HTTP module makes it extremely easy to create small and very fast applications. As a polyglot software developer I decided to test how node compares to other frameworks. For this I have chosen Rack, Sinatra, node/http and, due to my current occupation, Java servlets with Tomcat as the servlet container.

I started the benchmark with Siege and ab but soon realized that there was something wrong with the world. As it turned out, the problem was performance on the side of the request initiator. I have since switched to wrk, which does a much better job all round. Every run below uses 2 threads and 10 connections for 10 seconds.

Ruby

I started the test with Rack on Thin. Rack is the interface that most Ruby web frameworks build on, so the idea was that it would be a good reference point for the others.

Running 10s test @ http://localhost:9292
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.38ms  619.35us  16.05ms   92.73%
    Req/Sec     1.42k   200.93     1.59k    95.56%
  25527 requests in 10.02s, 3.12MB read
  Socket errors: connect 10, read 0, write 0, timeout 0
Requests/sec:   2546.72
Transfer/sec:    318.34KB

I ended up running this test about 10 times with different threading models on Thin and couldn't get it past the 3k rps mark.

Sinatra

This is the most elegant solution of all. Sinatra is just hands down the cleanest one ever.

Performance-wise, under Thin it does perfectly fine:

Running 10s test @ http://localhost:4567/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.90ms    0.91ms  13.55ms   84.49%
    Req/Sec     1.75k   120.16     1.99k    68.00%
  34765 requests in 10.01s, 7.43MB read
Requests/sec:   3474.70
Transfer/sec:    760.09KB

I mean almost 3.5k requests per second, nice API to program against... What more can we expect?

Node + http

Having established the baseline, it was time to start poking around nodejs.
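The exact listing used for the measurement isn't shown here, so the following is only a minimal sketch of the kind of node/http server being tested. The response body and the helper names (handler, createServer) are assumptions made for illustration.

```javascript
// Minimal sketch of a plain node/http server.
// Assumption: the response body and function names are illustrative,
// not the code that produced the numbers in this post.
const http = require('http');

// Answers every request with a short plain-text body.
function handler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello, world!\n');
}

// Builds the server without binding a port, so the caller decides when to listen.
function createServer() {
  return http.createServer(handler);
}
```

Calling createServer().listen(8080) would serve it on the port used in the run below.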

The code for this is also very succinct, even though the asynchronous API might feel a bit weird at the beginning. Performance-wise it was exceptionally good!

Running 10s test @ http://localhost:8080
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   627.53us    1.19ms  37.32ms   99.26%
    Req/Sec     8.95k     1.06k   18.06k    97.01%
  179056 requests in 10.10s, 22.03MB read
Requests/sec:  17729.12
Transfer/sec:      2.18MB

17.7k requests per second! Compared to Sinatra, the faster of the two Ruby setups, that is 5.1 times the throughput!

The platform itself is single threaded, so in order to make use of all the CPU power one would simply spin up a few instances on different ports and put a load balancer in front of them. Luckily there is a node module called loadbalancer which makes the whole experience quite approachable for mere mortals:
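To make the idea concrete without guessing at the loadbalancer module's own configuration, here is a hand-rolled sketch of a round-robin proxy built only on node's http module. The backend ports (8081 to 8084) and the names BACKENDS, roundRobin and createBalancer are hypothetical, and this is not how the module itself is implemented.

```javascript
// Sketch of a round-robin HTTP proxy using only node's http module.
// Assumption: backend ports and all names here are made up for illustration.
const http = require('http');

// Hypothetical backend instances, one per port.
const BACKENDS = [8081, 8082, 8083, 8084].map(port => ({ host: 'localhost', port }));

// Returns a function that cycles through the targets in order.
function roundRobin(targets) {
  let next = 0;
  return () => {
    const target = targets[next];
    next = (next + 1) % targets.length;
    return target;
  };
}

// Builds a proxy server that forwards each request to the next backend.
function createBalancer(targets) {
  const pick = roundRobin(targets);
  return http.createServer((clientReq, clientRes) => {
    const target = pick();
    const proxyReq = http.request({
      host: target.host,
      port: target.port,
      method: clientReq.method,
      path: clientReq.url,
      headers: clientReq.headers,
    }, proxyRes => {
      // Relay the backend's status, headers and body to the client.
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
      proxyRes.pipe(clientRes);
    });
    // If the backend is unreachable, answer with 502 Bad Gateway.
    proxyReq.on('error', () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end();
    });
    clientReq.pipe(proxyReq);
  });
}
```

With the instances running, createBalancer(BACKENDS).listen(8080) would put the proxy in front of them. The numbers that follow come from the loadbalancer module setup, not from this sketch.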

Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   658.55us    0.94ms  17.34ms   95.67%
    Req/Sec     9.55k     1.40k   19.58k    83.08%
  191007 requests in 10.10s, 23.50MB read
Requests/sec:  18912.15
Transfer/sec:      2.33MB

The setup is way more complex, since there are now 2 types of applications to run, but the gain (about 7%) isn't what I would expect.

Since JavaScript is supposedly slow, I decided to give HAProxy a go. After all, it is a specialized application for balancing high-load traffic. I would expect way better performance than from the simplistic JavaScript-based load balancer. And so I downloaded the latest sources, built it, configured it and ran the tests.

And here are the test results:

Running 10s test @ http://localhost:8090/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.91ms    0.92ms  12.88ms   93.20%
    Req/Sec     6.53k   729.78    11.21k    80.60%
  130444 requests in 10.10s, 13.06MB read
Requests/sec:  12915.95
Transfer/sec:      1.29MB

What what what??? HAProxy is about 30% slower than a proxy written in JavaScript? Can this be true? I'm sure the configuration I came up with can be tuned, so I'm going to call this one a draw and move on.

Node has one more option to choose from: the cluster module. The idea is quite simple: fork the current process, let the children share the parent's listening port and let the parent distribute the load over the spawned children. It's brilliant in that it doesn't add the overhead of making an additional proxy request. So it should be really fast!
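A minimal sketch of that fork-and-share-the-port pattern, assuming a trivial hello-world handler. The function names (handler, defaultWorkers, start) are made up for illustration and this is not necessarily the code that was measured.

```javascript
// Sketch of the cluster pattern: the parent forks workers, and every
// worker listens on the same port.
// Assumption: handler body and function names are illustrative only.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Older node versions expose cluster.isMaster; newer ones call it cluster.isPrimary.
const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Answers every request with a short plain-text body.
function handler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello, world!\n');
}

// One worker per CPU core is a common starting point.
function defaultWorkers() {
  return os.cpus().length;
}

// In the parent: fork `workers` children.
// In a child: serve HTTP on `port`, which all children share.
function start(port, workers) {
  if (isPrimary) {
    for (let i = 0; i < workers; i++) cluster.fork();
  } else {
    http.createServer(handler).listen(port);
  }
}
```

Because each forked worker re-runs the same file, start(8080, defaultWorkers()) has to be called at the top level of the script so that the parent and the children both reach it.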

The code for this is very simple and also very expressive. Add to that the fact that it is the only file in the entire solution and it starts to take a really nice shape. My computer has 4 cores, so I'll be spawning 4 processes and letting them handle the requests in a round-robin way. Now let's take a look at the results:

Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   592.01us    2.58ms  80.36ms   97.85%
    Req/Sec    15.96k     2.56k   19.74k    87.62%
  320693 requests in 10.10s, 38.54MB read
Requests/sec:  31751.61
Transfer/sec:      3.82MB

Wow!! 31.7k requests per second! Fricken amazing performance! JavaScript's V8 engine rocks! Let's leave it at that.

Java

Now, to get a sense of where Node with those 31.7k rps places itself on the landscape, I decided to test Java servlets. I didn't go with Spark or anything else of that sort since I wanted to compare only the most prominent solutions (Sinatra being the exception, since you just can't ignore that extremely beautiful framework).

The setup is a Maven project with just one servlet and a web.xml. Let's see the performance on that baby:

Running 10s test @ http://localhost:8080/example/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.90ms    5.34ms  85.90ms   92.74%
    Req/Sec    21.62k    12.25k   43.35k    50.50%
  430345 requests in 10.00s, 47.69MB read
Requests/sec:  43028.84
Transfer/sec:      4.77MB

Hold your horses! Yes, it is faster, but one needs to remember that Java needs time to warm up before it gets to top performance. So I ran the test once again:

Running 10s test @ http://localhost:8080/example/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   804.67us    1.92ms  32.48ms   91.16%
    Req/Sec    31.80k     2.68k   37.12k    77.00%
  632656 requests in 10.00s, 70.10MB read
Requests/sec:  63258.42
Transfer/sec:      7.01MB

Now that is just unbelievable! 63.2k requests a second is twice the throughput of the fastest node solution! Twice!
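The ratios quoted along the way can be rechecked from the Requests/sec lines of the wrk runs. The small sketch below does just that; the labels and the helper names (ratio, compare) are mine, while the figures are copied from the output above.

```javascript
// Requests/sec figures copied from the wrk runs in this post.
// Assumption: the labels and helper names are illustrative.
const results = [
  ['Rack on Thin', 2546.72],
  ['Sinatra on Thin', 3474.70],
  ['node/http, single process', 17729.12],
  ['node + loadbalancer module', 18912.15],
  ['node behind HAProxy', 12915.95],
  ['node cluster, 4 workers', 31751.61],
  ['Tomcat servlet, first run', 43028.84],
  ['Tomcat servlet, second run', 63258.42],
];

// Ratio of two throughputs, rounded to one decimal place.
function ratio(a, b) {
  return Math.round((a / b) * 10) / 10;
}

const byName = new Map(results);

// How many times the throughput of `slow` the setup `fast` achieved.
function compare(fast, slow) {
  return ratio(byName.get(fast), byName.get(slow));
}

// Print every setup relative to the slowest one.
const baseline = byName.get('Rack on Thin');
for (const [name, rps] of results) {
  console.log(`${name}: ${rps} req/s (${ratio(rps, baseline)}x Rack on Thin)`);
}
```

For instance, compare('node/http, single process', 'Sinatra on Thin') reproduces the 5.1 figure quoted earlier.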

Post scriptum

In reality the performance of the platform doesn't really matter all that much. The average response times are all below 4 ms, which means that as soon as you make one call to the database you have already blown the budget, since that is going to cost you way more than just a couple of milliseconds. But it is really nice to know the characteristics, and to know that performance-wise it really doesn't matter what you choose these days: the framework is going to perform at an acceptable level. Even Sinatra with its 3.5k requests per second is more than fast enough for most corporate solutions out there.

For a much more complete comparison of many frameworks and platforms, check out the TechEmpower Web Framework Benchmarks.

Happy coding!
