August, 2008


29
Aug 08

Hug a developer today

Found via tOMPSON’s blog.


18
Aug 08

A Java geek’s (mis)adventures in China: South

This summer I took a three-week backpacking trip to China with my wife, which was loads of fun and opened my eyes to a couple of things. This is an account of just some of our (mis)adventures.

Note that all the experiences and reports are purely my own and can be absurdly wrong in the grand scale of things. All the photos in the post were taken by my lovely wife (Ragne Kabanova) with more available at norvidia.com.

Guangzhou

We touched ground in Guangzhou (ex-Canton), the rumored food capital of China. The proverb goes that the Chinese eat everything that runs that isn’t a car and everything that flies that isn’t a plane. My first impression was that they eat everything that swims:

Another impression you get on arriving in Guangzhou is “Wow, now I know what smog looks like!” I never imagined that, besides causing health troubles, smog would actually block out the sun! Guangzhou is permanently engulfed in a cloudy, foggy mantle and the locals see the sun no more than once or twice a month. I guess this is what London looked like at the turn of the previous century.

Guangzhou is also a commercial city, with endless streets selling everything you can imagine:

Some of those things are edible as well, though you might wonder why anyone would bother:

Not entirely sure why snake penis wine is better than plain old snake wine, as I didn’t get to try either of them there. However, what I did get was a Chinese haircut :)

Another impression you get from China is how hard it is to communicate when you share neither a language (almost no one spoke English) nor an alphabet. Although Chinese names and addresses can be rendered in English letters, most Chinese will not be able to read them. They are also likely to not understand (or to misunderstand) your pronunciation of place names. Therefore the only way to navigate there is to have things written in Chinese (guides and maps are some help here) or to call a Chinese friend and ask him to explain. It’s also common to call the hostel and have them explain to the taxi driver where to go.

In fact, after returning to Estonia we met a Chinese guy who said that the best way to pronounce the names is to scream as aggressively as you can. Thinking back, this might actually have worked, and perhaps the main reason for the problems with pronunciation was European politeness :)

Yangshuo

After two days in Guangzhou we moved on to Yangshuo, a place famous for its hill scenery:

Yangshuo is a nice place to chill, but hosts the most foreigners we saw in China. This meant that to eat genuine Chinese food you had to walk off the main streets, where every other place offered pizza and pasta:

Of course, if you walk far enough from the main streets, you might end up at the Chinese market and see just how fresh they like their food:

One of the most interesting things we got to try in Yangshuo was the “Thousand Year Old Egg”. Basically a fresh egg is wrapped in herbs and buried in the ground for one to three months. When you take it out it’s rotten, or rather fermented, and you can eat it uncooked. That is, if you have the guts to do it:

Contrary to what you might expect, the communist stuff was almost invisible and didn’t bother us in the slightest. If anything it was fun:

The end to the slow days of chilling came on the last day in Yangshuo, when we tried to order the plane tickets to get to Shanghai. On that very day the only website you could reasonably order plane tickets from (ctrip.com) went offline. We had to be in Shanghai the next day, so we tried everything we could. We ordered the tickets from another site, but in a kind of catch-22 it informed us on the order tracking page that we needed to fax in a copy of the credit card and allow one to three days for processing (at 8 pm in pretty much the middle of nowhere).

With nothing much to lose we decided to get up at 5 am and take a taxi to the airport (a two-hour ride) to try and get tickets in time for the 11 am flight. When we got there the airline booth was closed and the ticket office offered us only full-priced tickets, which cost much more than we could reasonably afford. In a last attempt to solve things I called the airline hotline from my mobile and let them speak with the ticket clerk. Like magic the discounted tickets appeared, leaving us to wonder if we had persisted through some kind of scam or just a weird type of bureaucracy.

Luckily we could ponder that while flying east, where Shanghai and Hangzhou were preparing a different kind of welcome for us (to be continued).


12
Aug 08

An agile way to track time

A genius way to track time: instead of making yourself fill in tedious spreadsheets, just build a LEGO tower! This is what agile is all about for me: replacing discipline with social contracts and games that we’re really good at and enjoy doing.


11
Aug 08

Are static typing and refactoring really connected?

One of the main problems brought up when comparing dynamic languages to static ones is the lack of proper refactoring support. It is usually implied that dynamic languages are not conceptually refactorable, which speeds up code rot.

Although there is plenty of evidence that dynamic languages do support refactoring, I’d like to concentrate on the other claim: that statically typed languages are refactorable. Challenging this claim may seem laughable, as there is no lack of refactoring tools for Java or C#. But let’s examine a more advanced language that is touted as a Java successor: Scala.

Scala supports structural types, which allow treating classes as records of methods that can be subtyped by the presence of appropriate methods. This example was given in the Scala 2.6 release notes:
[java]
class File(name: String) {
  def getName(): String = name
  def open() { /*..*/ }
  def close() { println("close file") }
}

def test(f: { def getName(): String }) {
  println(f.getName)
}

test(new File("test.txt"))
test(new java.io.File("test.txt"))
[/java]

In this code the type { def getName(): String } refers to any class with the method getName(): String in it. Now what happens if we try to rename the method in the structural type?

  1. We can rename all the instances of the getName() methods found in all classes anywhere.
  2. We can just rename the method in the structural type and update everything else manually.

Both of these approaches are useless. The first one is basically a search and replace done on all code and may rename methods that we never intended to rename (e.g. getName() in the Person class). The second one doesn’t really do anything for us.

The truth of the matter is that structural types lack the inherent scope that nominative (i.e. usual) types have. Since every method signature in a nominative type originates from a single type, refactoring gets a natural scope: all the subtypes of the originating type. Without that scope many refactoring techniques are essentially useless.

What is worse is that the presence of structural types also breaks refactoring in the usual classes. E.g. if we try renaming getName() in the File type, we are also faced with a decision whether or not to rename the method in the structural type. And if we do rename it, we break the code that accesses java.io.File through the same structural type. Therefore, if we want to refactor working code into working code, we can again only rename everything or nothing at all.
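The missing scope is not specific to Scala. As a rough illustration only, here is the same situation sketched in Python, whose typing.Protocol is also a structural type. The names HasName, File, Person, RenamedFile and describe are invented for this sketch and do not come from the Scala release notes. Neither class declares any link to HasName, so a rename tool has only the method name to go on.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasName(Protocol):
    """Structural type: anything that has a get_name() method."""

    def get_name(self) -> str: ...


class File:
    def __init__(self, name: str) -> None:
        self._name = name

    def get_name(self) -> str:
        return self._name


class Person:
    # Unrelated to File, yet it matches HasName just by having get_name().
    def __init__(self, name: str) -> None:
        self._name = name

    def get_name(self) -> str:
        return self._name


class RenamedFile:
    # What File would look like after renaming get_name() in File alone.
    def __init__(self, name: str) -> None:
        self._name = name

    def get_file_name(self) -> str:
        return self._name


def describe(f: HasName) -> str:
    # Accepts any object that structurally matches HasName.
    return f.get_name()
```

File and Person both satisfy HasName without saying so, while RenamedFile silently stops matching, which is the "rename everything or nothing" dilemma described above.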

Luckily it looks like the main refactorings broken by structural types are renaming methods and changing their signatures. Unluckily these are the most common refactorings, and having a same-named method in any structural type also breaks refactoring in the usual classes. At the moment this mainly affects Scala and some other functional languages, but if structural types become more widespread it may come to a language near you :)

Interestingly, Cedric Beust pointed out that you can refactor structural types, as opposed to duck types. Since I obviously think differently, it would be interesting to hear his (and your) comments on the matter. Perhaps I’m missing something obvious?


4
Aug 08

Case study: Is PHP embarrassingly slower than Java?

IP2C is a small library that provides IP-to-country resolution. It uses the free ip-to-country database. IP2C takes the database CSV file, which is about 4 MB, converts it into a ~600 KB binary format, and provides PHP and Java frontends to query the database.

The library is great: converting an IP to a country is easy, and with the country flags from its side project you can spice up your statistics with country information. This is a lot faster than using reverse DNS lookups.

The problem: the PHP implementation is a lot slower. Embarrassingly slower. Without any caching the Java version manages ~6000 queries per second, while the PHP counterpart pushes through only ~850. The implementations are the same. The stats provided by the author of the library are 8000 vs 1200, so roughly the same ratio as in my measurements.

I like PHP. I don’t use it that much anymore, but I still care when I see such embarrassing numbers. I took the implementation and started profiling it, and spent the night running different tests and trying to optimize.

The general outline of the algorithm is as follows. We take the dotted string IP and convert it to an IPv4 network address as an integer (e.g. 69.55.232.153 becomes 1161291929). The DB holds sorted ranges of these addresses. A binary search over the ranges finds the one containing the address, which gives us the country for the IP. Take a look at the implementation.
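To make the outline concrete, here is a minimal Python sketch of the two steps. It is not the IP2C code: the in-memory table, the placeholder country codes "AA" and "BB", and the function names ip_to_int and lookup are all assumptions made for illustration, and the real binary file format is not reproduced here.

```python
import bisect
import socket
import struct


def ip_to_int(dotted: str) -> int:
    """Convert a dotted IPv4 string to its 32-bit big-endian integer value."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]


# Made-up table: each entry is (first address of a range, country code).
# A range runs until the next entry starts; None means "no country known".
RANGES = [
    (ip_to_int("0.0.0.0"), None),
    (ip_to_int("192.0.2.0"), "AA"),
    (ip_to_int("192.0.3.0"), None),
    (ip_to_int("198.51.100.0"), "BB"),
    (ip_to_int("198.51.101.0"), None),
]
STARTS = [start for start, _ in RANGES]


def lookup(dotted: str):
    """Binary-search for the last range that starts at or below the address."""
    index = bisect.bisect_right(STARTS, ip_to_int(dotted)) - 1
    return RANGES[index][1]
```

With this table, ip_to_int("69.55.232.153") gives 1161291929 as in the text, and lookup("192.0.2.77") lands in the made-up "AA" range.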


Let’s see where the vanilla version of IP2C spends its time. The results are based on 1000 iterations with Xdebug enabled and visualized by KCacheGrind. With profiling on it processed about 210 IP addresses per second.

The IO part is surprisingly low. The internal fseek and fread account for 2% of the execution time. On the other hand, the user-level seek, which is just a wrapper around fseek, uses 5% on its own. readShort and readInt take 20% of the execution time.

function readShort() {
	// 'n': unsigned 16-bit integer, big endian
	$a = unpack('n', fread($this->m_file, 2));
	return $a[1];
}

function readInt() {
	// 'N': unsigned 32-bit integer, big endian
	$a = unpack('N', fread($this->m_file, 4));
	return $a[1];
}

function seek($offset) {
	fseek($this->m_file, $offset);
}

Function calls are expensive. Let’s eliminate them. readInt, readShort and the seek wrapper are now inlined, and recursion is changed to iteration (about 14 000 fewer function calls). The result is 400 queries per second compared to the previous 210.

The latest profiling results show twice as many fread and unpack calls as fseek calls. It seems that fseek is used to find the right position, after which two numbers are read and unpacked separately. The implementation confirms that. Luckily we can read just once (2 bytes more) and unpack once (both values in a single invocation).
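The recursion-to-iteration step can be sketched like this. This is a Python illustration of the transformation, not the library’s PHP code; the function names and the assumption that the first entry is never greater than the target are mine.

```python
def find_recursive(starts, target, low, high):
    """Index of the last entry <= target; one function call per halving step."""
    if low >= high:
        return low
    mid = (low + high + 1) // 2  # upper middle, so the range always shrinks
    if starts[mid] <= target:
        return find_recursive(starts, target, mid, high)
    return find_recursive(starts, target, low, mid - 1)


def find_iterative(starts, target):
    """Same search as a loop: no extra call per halving step."""
    low, high = 0, len(starts) - 1
    while low < high:
        mid = (low + high + 1) // 2
        if starts[mid] <= target:
            low = mid
        else:
            high = mid - 1
    return low
```

Both variants return the same index; the loop simply avoids paying the call overhead on every step of the search.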

$a = unpack('N', fread($this->m_file, 4));
$np['ip'] = $a[1];

$a = unpack('n', fread($this->m_file, 2));
$np['key'] = $a[1];

// this can be changed to
$np = unpack('Nip/nkey', fread($this->m_file, 6));
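For comparison, the same idea in a small Python sketch using the struct module, where ">I" and ">H" correspond to PHP’s 'N' and 'n' (big-endian unsigned 32-bit and 16-bit integers). The record contents are made-up values; only the 4-byte plus 2-byte layout is taken from the snippet above.

```python
import struct

# Six bytes as they might sit in the file: a 32-bit big-endian integer
# followed by a 16-bit big-endian integer (made-up values).
record = struct.pack(">IH", 1161291929, 42)

# Two slices, two unpack calls (like PHP's 'N' followed by 'n').
ip_separate = struct.unpack(">I", record[:4])[0]
key_separate = struct.unpack(">H", record[4:6])[0]

# One 6-byte chunk, one unpack call (like PHP's 'Nip/nkey').
ip_combined, key_combined = struct.unpack(">IH", record)
```

Both routes yield the same pair of values; the combined format just halves the number of calls.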

How does this version stack up against the Java version? Let’s disable profiling and run 100 000 iterations. The vanilla version processes ~850 IPs per second; with the functions inlined the number is around 1400. The Java version can still do 6000.

Let’s try caching. Peeking at the Java implementation shows that the caching Java version (a whopping 141 242 IPs per second, yup, 141k) just uses a byte[] array and does its lookups there instead of seeking and reading from the file. Easy, let’s do the same in PHP.

We read everything into a string, and instead of calling fread we access the string at the current offset. Instead of calling fseek we just set the offset. We use 600 KB more memory but increase the throughput to ~2800 queries per second.

It seems I’ve just wasted a night; I should have checked the Computer Language Benchmarks first. In terms of execution speed PHP is simply not comparable to Java.

The upside: we can still take the library, eliminate the recursion and the double unpacks, and add caching. A small gain is still a gain.
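A minimal sketch of that caching idea, written in Python rather than PHP. The class name InMemoryTable and the 4-byte ip plus 2-byte key record layout are assumptions carried over from the unpack snippet earlier, not the actual IP2C format.

```python
import struct


class InMemoryTable:
    """Holds the whole file in memory and reads records at an offset."""

    RECORD = struct.Struct(">IH")  # assumed layout: 4-byte ip, 2-byte key

    def __init__(self, data: bytes) -> None:
        self.data = data   # replaces the open file handle
        self.offset = 0    # replaces the file position

    def seek(self, offset: int) -> None:
        # "fseek" becomes a plain assignment.
        self.offset = offset

    def read_record(self):
        # "fread" + "unpack" become one unpack at the current offset.
        ip, key = self.RECORD.unpack_from(self.data, self.offset)
        self.offset += self.RECORD.size
        return ip, key
```

In use, the data would come from reading the binary file once, e.g. open(path, "rb").read() with path pointing at the converted database, after which every lookup touches memory only.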