review


27
Jul 11

What I saw at Devops Talks Back event in London

Just had a great evening at the Devops Talks Back event in London. My smartphone was dumb enough not to update its timezone information on arrival in the UK, so I was 2 hours early. Luckily the event took place at the Forward guys' mini Google-like office: besides coffee and drinks, I was asked if I needed a laptop to kill time :)

I chatted with the organizer and speakers who had also arrived early about Ruby, Java deployments and British comedy sketches (Big Train, Little Britain and Alan Partridge).


3
Jun 11

MailChimp – No More Bananas for You

I have not ranted for some time, but I have just heard and seen so much crap about MailChimp that I had to open up a draft here and let off some steam. MailChimp was my choice of newsletter software years ago, and it worked fine if you leave out some quirks here and there. As the usage has grown, it has been brought to my attention that this software does not scale. Throughout the conversations I’ve also discovered a way to block any account on MailChimp. So let’s start.


4
Aug 08

Case study: Is PHP embarrassingly slower than Java?

IP2C is a small library that provides IP to country resolution. It uses the free ip-to-country database. IP2C takes the database CSV file, which is about 4 MB, converts it into a ~600 KB binary format, and provides PHP and Java frontends to query the database.

The library is great: it is easy to convert an IP to a country, and with the country flags from its side project you can spice up your statistics with country information. This is a lot faster than using reverse DNS lookups.

The problem: the PHP implementation is a lot slower. Embarrassingly slower. Without any caching, the Java version handles ~6000 queries per second, while the PHP counterpart pushes through ~850. The implementations are the same. The stats provided by the author of the library are 8000 vs 1200, so about the same ratio as my measurements.

I like PHP. I don’t use it that much anymore, but I still care when I see such embarrassing numbers. I took the implementation and started profiling it, and spent the night running different tests and trying to optimize.

The general outline of the algorithm is as follows. We take the dotted string IP and convert it to a 32-bit integer IPv4 address (e.g. 69.55.232.153 becomes 1161291929). The DB holds sorted ranges of these addresses. A binary search over these ranges gives us the country for the IP. Take a look at the implementation.
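
The outline above can be sketched in a few lines of PHP. This is not the IP2C code: the `$ranges` array, the `findCountry` name and the two-letter codes are made up for illustration, and the real library searches a binary file rather than an array. It assumes 64-bit PHP 7.1 or later, where `ip2long` returns a non-negative integer.

```php
<?php
// Hypothetical sample data, sorted by range start: [start, end, country code].
// The codes are placeholders, not real ip-to-country entries.
$ranges = [
    [ip2long('10.0.0.0'),    ip2long('10.255.255.255'),  'AA'],
    [ip2long('69.55.224.0'), ip2long('69.55.239.255'),   'BB'],
    [ip2long('192.168.0.0'), ip2long('192.168.255.255'), 'CC'],
];

// Iterative binary search over the sorted ranges (no recursion).
function findCountry(array $ranges, string $dottedIp): ?string
{
    $ip = ip2long($dottedIp);      // dotted string -> integer
    if ($ip === false) {
        return null;               // not a valid IPv4 address
    }
    $lo = 0;
    $hi = count($ranges) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        [$start, $end, $country] = $ranges[$mid];
        if ($ip < $start) {
            $hi = $mid - 1;        // look in the lower half
        } elseif ($ip > $end) {
            $lo = $mid + 1;        // look in the upper half
        } else {
            return $country;       // $start <= $ip <= $end
        }
    }
    return null;                   // address is in no known range
}

$country = findCountry($ranges, '69.55.232.153');
```

With the sample data, `69.55.232.153` falls into the second range, and an address outside every range yields `null`.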


Let’s see where the vanilla version of IP2C spends its time. The results are based on 1000 iterations with Xdebug enabled and visualized by KCacheGrind. Under the profiler it processed about 210 IP addresses per second.

The IO part is surprisingly low. The internal fseek and fread account for 2% of the execution time. On the other hand, the user-level seek, which is just a wrapper, uses 5% on its own. readShort and readInt take 20% of the execution time.

function readShort() {
	$a = unpack('n', fread($this->m_file, 2));
	return $a[1];
}

function readInt() {
	$a = unpack('N', fread($this->m_file, 4));
	return $a[1];
}

function seek($offset) {
	fseek($this->m_file, $offset);
}

Function calls are expensive. Let’s eliminate them. readInt, readShort and seek are now inlined, and the recursion is changed to iteration (e.g. 14 000 fewer function calls). The result: 400 queries per second, compared to the previous 210.

The latest profiling results show twice as many freads and unpacks as fseeks. It seems that fseek is used to seek to the right position, and then two numbers are read and unpacked separately. The implementation confirms that. Luckily we can read just once (2 bytes more) and unpack once (two values with one invocation).

$a = unpack('N', fread($this->m_file, 4));
$np['ip'] = $a[1];

$a = unpack('n', fread($this->m_file, 2));
$np['key'] = $a[1];

// this can be changed to
$np = unpack('Nip/nkey', fread($this->m_file, 6));
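
To convince yourself that the combined format string gives the same values, here is a small standalone check. It does not touch the IP2C database: the 6-byte record is built by hand with `pack`, and the values in it are arbitrary. It assumes 64-bit PHP, where `N` unpacks to a non-negative integer.

```php
<?php
// A hand-made 6-byte record: 32-bit big-endian IP followed by a
// 16-bit big-endian key. Both values are arbitrary sample data.
$record = pack('Nn', 1161291929, 42);

// Two slices and two unpack calls, as in the original code.
$a = unpack('N', substr($record, 0, 4));
$b = unpack('n', substr($record, 4, 2));
$separate = ['ip' => $a[1], 'key' => $b[1]];

// One unpack call with named fields, as in the optimized code.
$combined = unpack('Nip/nkey', $record);

// Compare the two results.
var_dump($separate === $combined);
```

The `/` in the format string separates the fields, and the text after each format code (`ip`, `key`) becomes the array key.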

How does this version stack up against the Java version? Let’s disable profiling and run 100 000 iterations. The vanilla version processes ~850 IPs per second; with the functions inlined the number is around 1400. The Java version can still do 6000.

Let’s try caching. Peeking at the Java implementation shows that the caching Java version (a whopping 141 242 IPs per second, yup, 141k) just uses a byte[] array and makes lookups from there instead of seeking and reading from the file. Easy, let’s do the same in PHP.

We read everything into a string, and instead of calling fread we access the string elements at the offset. Instead of calling fseek we just set the offset. We are using 600 KB more memory but can increase the throughput to ~2800 IPs per second.

As it seems, I’ve just wasted a night; I should have just checked the Computer Language Benchmarks. In terms of execution speed, PHP is not comparable to Java.

The upside: we can still take the library, eliminate the recursion and the double unpacks, and add caching. A small gain is still a gain.
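
A minimal sketch of that idea, assuming PHP 7.4 or later. The `MemoryReader` class and its method names are hypothetical, not part of IP2C. Given the earlier point about function call overhead, a tuned version would inline these reads; they are kept as methods here only for readability, and there is no bounds checking.

```php
<?php
// Hypothetical in-memory reader: the whole database sits in one string,
// and "seeking" is just assigning an integer offset.
final class MemoryReader
{
    private string $data;
    private int $offset = 0;

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    // Load the binary database once, e.g. MemoryReader::fromFile($path).
    public static function fromFile(string $path): self
    {
        $data = file_get_contents($path);
        if ($data === false) {
            throw new RuntimeException("Cannot read $path");
        }
        return new self($data);
    }

    // Replaces fseek: no IO, just remember the position.
    public function seek(int $offset): void
    {
        $this->offset = $offset;
    }

    // Replaces fread + unpack for a 32-bit big-endian integer.
    public function readInt(): int
    {
        $a = unpack('N', substr($this->data, $this->offset, 4));
        $this->offset += 4;
        return $a[1];
    }

    // Replaces fread + unpack for a 16-bit big-endian integer.
    public function readShort(): int
    {
        $a = unpack('n', substr($this->data, $this->offset, 2));
        $this->offset += 2;
        return $a[1];
    }
}

// Usage with a hand-made 6-byte record instead of the real database.
$reader = new MemoryReader(pack('Nn', 1161291929, 42));
$reader->seek(0);
$ip  = $reader->readInt();
$key = $reader->readShort();
```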


1
Apr 08

COBOL blog platform

Several weeks ago, while working on the JavaRebel AI Module, we accidentally gave it access to our web server. Before we found out, it had rewritten our entire blog platform in COBOL. We are not sure where it learned to program that, but when we tried the new platform, it was excellent. Not only is it a SOA-based RIA, but it’s fully written using REST, JSON and CAPS.

In fact, it is now our firm belief that with technologies like that COBOL will make a return and become the language of choice for web development. I mean, who needs local variables, recursion, dynamic memory allocation, or structured programming constructs when we have a language that reads like plain English? All real programmers know how important it is to have code that reads well, and our new blog platform provides twice the scalability of Java on half the hardware to boot.


22
Mar 08

Mozilla Prism gets an overhaul

Although two weeks late, I finally noticed that Mozilla Prism has been updated. Mozilla Prism is a “One Site Browser”, which is to say a browser started from your desktop tied to one particular web site. I have been using it since the first release, mainly to separate the Google Mail, Reader and Calendar windows from the rest of my browsing experience.

The new version is a significant reworking of Prism. First of all, you no longer have to install a 6.6 MB application in addition to Firefox. Now you can just download a 500 KB Firefox extension, which will start Prism as a particular Firefox profile. And you can create the desktop shortcuts to your web site in one click using “Tools -> Convert Website to Application”.

Secondly, Prism will finally pick up the favicon that the website is using and use it as both the shortcut icon and (drumroll!) the application window icon! Before, you had to download the icons manually, and it would still use the Prism icon in the taskbar, which made it much harder to distinguish the windows. Having the GMail icon in the taskbar is just what I’ve been waiting for. Now, if only it would change on new mail…

However, no matter the changes, I’m still stuck with a major annoyance: no Firefox shortcuts work. And since half the sites on the Internet do not optimize for 1680×1050, my first reaction in Firefox is often Ctrl+, to increase the font size. Well, hopefully they fix it in the next release. You hear that, Mozilla?

P.S. I’m making this post from my nifty dow.ngra.de admin browser app, now where do I find an icon for that?