4
Aug 08

Case study: Is PHP embarrassingly slower than Java?

IP2C is a small library that provides IP-to-country resolution. It uses the free ip-to-country database. IP2C takes the database CSV file, which is about 4 MB, converts it into a ~600 KB binary format, and provides PHP and Java frontends to query the database.

The library is great: it makes converting an IP to a country easy, and with the country flags from its side project you can spice up your statistics with country information. This is a lot faster than using a reverse DNS lookup.

The problem: the PHP implementation is a lot slower. Embarrassingly slower. Without any caching the Java version is able to do ~6000 queries per second. The PHP counterpart can push through ~850 queries per second. The implementations are the same. The stats provided by the author of the library are 8000 vs 1200, so about the same ratio as my measurements.

I like PHP, I don’t use it that much anymore but I still care when I see such embarrassing numbers. I took the implementation and started profiling it. Spent the night running different tests and trying to optimize.

The general outline of the algorithm is as follows. We take the dotted string IP and convert it to an IPv4 Internet network address (e.g. 69.55.232.153 becomes 1161291929). The DB holds sorted ranges of these addresses. A binary search over these ranges finds the one containing the address, and that gives us the country for the IP. Take a look at the implementation.
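To make the outline concrete, here is a minimal sketch of the two steps. It is not the IP2C code: the range table is a made-up in-memory array with placeholder country codes (AA, BB, CC), the function names ipToInt and findCountry are my own, and it assumes 64-bit PHP 7.1 or newer. The real library searches a binary file instead.

```php
<?php
// Illustrative sketch only: the range table below is made up, and the real
// IP2C database is a binary file with its own layout.

// Convert a dotted IPv4 string to its integer form (assumes 64-bit PHP).
function ipToInt(string $ip): int
{
    $n = ip2long($ip);
    if ($n === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ip");
    }
    return $n;
}

// Iterative binary search over ranges sorted by 'start'.
// Returns the country code, or null when no range contains the address.
function findCountry(array $ranges, string $ip): ?string
{
    $needle = ipToInt($ip);
    $lo = 0;
    $hi = count($ranges) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        $row = $ranges[$mid];
        if ($needle < $row['start']) {
            $hi = $mid - 1;          // look in the lower half
        } elseif ($needle > $row['end']) {
            $lo = $mid + 1;          // look in the upper half
        } else {
            return $row['country'];  // start <= needle <= end
        }
    }
    return null;
}

// Hypothetical sample data: sorted by start address, ranges do not overlap.
$ranges = [
    ['start' => ipToInt('10.0.0.0'),    'end' => ipToInt('10.255.255.255'),  'country' => 'AA'],
    ['start' => ipToInt('69.55.0.0'),   'end' => ipToInt('69.55.255.255'),   'country' => 'BB'],
    ['start' => ipToInt('192.168.0.0'), 'end' => ipToInt('192.168.255.255'), 'country' => 'CC'],
];

echo ipToInt('69.55.232.153'), "\n";              // 1161291929, the example from the text
echo findCountry($ranges, '69.55.232.153'), "\n"; // BB
var_dump(findCountry($ranges, '8.8.8.8'));        // NULL
```

The loop does the same work as a recursive search, just without a function call per step.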


Let’s see where the vanilla version of IP2C spends its time. The results are based on 1000 iterations with Xdebug enabled and visualized by KCacheGrind. It processed about 210 IP addresses per second during this time.

The IO part is surprisingly low. The internal fseek and fread account for 2% of the execution time. On the other hand the user-level seek, which is just a wrapper around fseek, alone uses 5%. readShort and readInt take 20% of the execution time.

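// Read a 16-bit big-endian unsigned integer from the current file position.
function readShort() {
	$a = unpack('n', fread($this->m_file, 2));
	return $a[1];
}

// Read a 32-bit big-endian unsigned integer from the current file position.
function readInt() {
	$a = unpack('N', fread($this->m_file, 4));
	return $a[1];
}

// Thin wrapper around fseek.
function seek($offset) {
	fseek($this->m_file, $offset);
}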

Function calls are expensive. Let’s eliminate them. readInt, readShort and seek are now inlined. Recursion is changed to iteration (about 14 000 fewer function calls). The code is now able to process 400 queries per second compared to the previous 210.

The latest profiling results show twice as many freads and unpacks as fseeks. It seems that fseek is used to seek to the right position, and then two numbers are read and unpacked separately. The implementation confirms that. Luckily we can read just once (6 bytes instead of 4 and then 2) and unpack once (both values with a single invocation).

$a = unpack('N', fread($this->m_file, 4));
$np['ip'] = $a[1];

$a = unpack('n', fread($this->m_file, 2));
$np['key'] = $a[1];

// this can be changed to
$np = unpack('Nip/nkey', fread($this->m_file, 6));
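A quick way to convince yourself that the two variants are equivalent is to build a 6-byte record by hand and decode it both ways. This is a throwaway sketch, not part of the library: the record values are made up, and it assumes 64-bit PHP so that the 32-bit value comes back as a plain integer.

```php
<?php
// Build a 6-byte record: a 32-bit big-endian address followed by a
// 16-bit big-endian key. Both values are made up for illustration.
$record = pack('Nn', 1161291929, 840);

// Original approach: two reads, two unpack calls.
$a = unpack('N', substr($record, 0, 4));
$b = unpack('n', substr($record, 4, 2));
$separate = ['ip' => $a[1], 'key' => $b[1]];

// Combined approach: one read, one unpack call with named fields.
$combined = unpack('Nip/nkey', $record);

var_dump($separate === $combined); // both decode to the same array
```

The names after the format codes (ip, key) become the array keys, which is why the result can be assigned to $np directly.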

How does this version stack up to the Java version? Let’s disable profiling and run 100 000 iterations. The vanilla version processes ~850 IPs per second; with the functions inlined the number is around 1400. The Java version can still do 6000.

Let’s try caching. Peeking at the Java implementation shows that the Java caching version (a whopping 141 242 IPs per second, yup, 141k) uses just a byte[] array and makes lookups from there instead of seeking and reading from the file. Easy, let’s do the same in PHP.

We read everything into a string and, instead of calling fread, we access the string elements at the offset. Instead of calling fseek we just set the offset. We are using 600 KB more memory but can increase the throughput to ~2800 queries per second.

As it seems, I’ve just wasted a night; I should have just checked the Computer Language Benchmarks. In terms of execution speed PHP is not comparable to Java.

The upside: we can still take the library, eliminate the recursion and the double unpacks, and add caching. A small gain is still a gain.
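Roughly, the string-backed version looks like the sketch below. The class name StringRecordReader and its methods are hypothetical (this is not the IP2C API), the two records are made up, and the offset argument to unpack needs PHP 7.1 or newer. A real run would load the database file once, for example with file_get_contents().

```php
<?php
// Hypothetical helper, not the IP2C API: serves fixed-size records from an
// in-memory string instead of a file handle.
class StringRecordReader
{
    private $data;       // whole database held as one binary string
    private $offset = 0; // replaces the file position

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    // Stand-in for fseek: just remember where to read next.
    public function seek(int $offset): void
    {
        $this->offset = $offset;
    }

    // Stand-in for fread + unpack: decode 6 bytes at the current offset.
    public function readRecord(): array
    {
        if ($this->offset + 6 > strlen($this->data)) {
            throw new OutOfBoundsException('Read past end of data');
        }
        // The third argument (start offset) needs PHP 7.1 or newer.
        $rec = unpack('Nip/nkey', $this->data, $this->offset);
        $this->offset += 6;
        return $rec;
    }
}

// Made-up data: two 6-byte records packed back to back.
$reader = new StringRecordReader(pack('NnNn', 100, 1, 200, 2));
$reader->seek(6);                        // jump to the second record
$rec = $reader->readRecord();
echo $rec['ip'], ' ', $rec['key'], "\n"; // 200 2
```

Passing the offset to unpack avoids copying a substring for every read, which matters when the lookup runs thousands of times per second.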


21
Jun 08

TSSJS Prague Afterthoughts

The event is over and I’m heading back home for another five days. This was my second time at TSSJS and it definitely felt different. Last time it was in Barcelona and we weren’t exhibiting (in fact ZeroTurnaround didn’t even exist yet). Also this year had a lot of changes for TheServerSide, with a bunch of new people on the team and a bunch of old people leaving.

The conference content this year was great. Unfortunately I didn’t get to sit in on many talks, but I heard other participants discuss them and I know that the speakers who were there always deliver. There was quite a range of topics and none of the sponsored bullshit (there were sponsored keynotes, but apparently they were decently technical instead of unrefined marketing).

From a speaker perspective it was quite nice as well. I got to hang out with really cool dudes and had loads of fun in the evenings. Also my talks were surprisingly well attended this year, so I couldn’t be happier! The only hiccup was the “Special Appreciation Dinner” to which only half the speakers were invited. That wasn’t very well communicated and the choice of invitees was also very weird. Kudos to Brian for handling the situation and inviting everyone in the end.

However from a vendor perspective I was rather disappointed. The vendor space was hard to access and very dark, and the breaks between sessions were extremely short. The dedicated “vendor networking” time was put in the early morning before sessions and no one bothered coming. Perhaps for vendors who only need to make a few sales it was better, but for us numbers count. Luckily one of the (well attended) talks was (among others) about JavaRebel so in the end it played out OK for us. But I do think that next year the TSS guys should either improve the situation considerably or do the conference without a trade show.

Finally I want to thank Geert, who was kind enough to record both my talks on his spanking new HD camcorder. I’m not sure at the moment if and when I could publish them (one recording takes more than 3 GB at the moment), but at the very least I can make a podcast from the Fireside Chat we did with Geert and Guillaume.


18
Jun 08

Live: Stephan Janssen TSSJS Keynote on RIA

What is RIA? “Old body with a new face”: just a new client slammed on top of the same old application server. The obligatory Wikipedia quote is “too vague” :) The spectrum of RIA apparently includes “Internet-enabled client”, “Smart client” and “Web 2.0”.

Turns out the “Smart client” refers to the JavaFX ability to be dragged from the applet in the browser to become a desktop app. AFAIK this wasn’t even demoed for JavaFX specifically, but for 1.6u10 applets, so this one misses the target for me. And as for the JavaFX “unified deployment model”, we’ll see that when it actually gets there. Until then I’ll remain skeptical.

The keynote will present the same case study for AIR, JavaFX, Flex and GWT. A Silverlight demo exists, but isn’t included in the talk.

Going through the data exchange formats: HTML, XML, JSON and binary. Caucho Hessian gets a mention as the binary protocol that works over HTTP. Also Adobe AMF. Apparently Flamingo exposes existing services via binary protocol in the Adobe space. Doesn’t binary have 4/3 overhead on top of HTTP? Is it really worth it?

Now going through communication strategies: RPC, WS-*, REST and JMS. In Servlets 2.0 you can use JAX-RS to expose your POJOs via REST (wow! doesn’t RoR do that since, like, forever?). Spring folks have their own solution, seems like they now ignore the Sun folks altogether. JMS isn’t really a comm protocol, so I don’t get how it fits here. Apparently just to mention that Adobe BlazeDS supports JMS. Everyone else would use Comet.

The case study is (of course!) Parleys.com.

The first version was done in AJAX. Problems with that included:

  • Back-button support. People do get used to its quirks nowadays.
  • Cross platform/browser support. JavaScript on Internet Explorer sucks, especially on Macs.
  • Securing AJAX. Someone voted 100 stars on one of the talks by calling the server-side directly. This isn’t really a problem with AJAX, man :) That will come with any server-side application.

The next variant is still AJAX, but using GWT. Lots of sliding in and sliding out, prettier interface. The GWT experience:

  • It’s Java! Woohoo!
  • Back button works. Might have mentioned that you have to define the back actions pretty much manually. Also problems on Internet Explorer.
  • Works on different browsers out-of-the-box. As long as you don’t need something arcane :)
  • Sucks: GWT sites not indexed by Google. Supposedly fixed in GWT 1.5.

Flex is next. Demo looks identical to GWT. The big difference is that Flash can go full screen (erm, you can go full screen with a browser as well, what’s up with that?). As soon as the AIR application is launched the browser app will change to accommodate the extra desktop settings. You can download the talks for playing offline and so on. Pretty fancy.

  • Flex/AIR just works
  • Lots of different animations available
  • Lots of UI components
  • Bookmark and history support
  • Good Eclipse support
  • Bad: Hard to do unit tests
  • Bad: Different deployment strategies for desktop and web.
  • Bad: Not Google friendly. Hacks possible, but not too easy.
  • Bad: No socket listeners so couldn’t implement P2P.

Stephan obviously likes Adobe products a lot and it shows in the talk. E.g. one could likely develop quite similar offline support for downloading and playing talks on top of GWT with Google Gears, but that’s not mentioned in the talk. However I have yet to see the JavaFX demo.

The JavaFX demo looks slightly cooler. JavaFX will have native support for the Flash video codec, good stuff. Strongly typed and Java-like. Swing, Java2D and Java3D APIs available. Animation and effect library on top of that. Would be much cooler if all this stuff weren’t just on paper… No mention of the offline functionality either. Obviously possible, but it would be interesting to know how the transition from applet to desktop and from online to offline works. Don’t think that anyone outside Sun knows that yet :)

In the end a very cool demo of the actual talk publishing using Parleys. You can add both the video and slides and likely audio and arrange them on a usual timeline. Can’t do that with a web client :)


20
May 08

First Days at the Great Indian Developer Summit

This is a guest post by Rein Raudjärv, who at the moment is with Toomas Römer at the Great Indian Developer Summit. Edited by Jevgeni Kabanov for brevity and to fit with other blog content.

Our trip started after a big birthday party. I slept in (my phone battery ran out of power) and got no time for breakfast or anything. We had almost 25 extra kilos of luggage, and since we had to check in again with a local Indian airline in Delhi, we decided to leave our back wall and some T-shirts in the Tallinn office. We got to Finland quickly and easily.

The flight to Delhi took about 7 hours. The landing at about midnight was delayed because of a thunderstorm above the airport. The first impression of India was the high temperature (above 30 degrees Celsius). There were already guys waiting at the airport to carry your luggage to the nearest (their cousin’s) taxi. We had to use a taxi to get to the next airport. People were sleeping outside the building; we didn’t understand why they were not allowed to enter. Inside, most people, including the personnel, were sleeping as well.

We got on the flight to Bangalore (officially named Bengaluru) on the second morning. The taxi trip to the hotel illustrated Indian traffic in the daylight. The number of lanes is very dynamic. They have a lot of scooters and three-wheelers. I also saw a cow sleeping on the sidewalk. Nobody was disturbing the holy animal. In general India is not so clean. At the same time people are quite friendly and happy to help us.

The first day at the conference started with several hours of setting up the booth. We had only a table, a chair and a title “ZEROTUMAROUND” waiting for us and no WiFi in sight. We asked for extra tables and a plasma TV. After trying out different things we finally came up with the final booth layout. The title and WiFi got fixed during the day. Except for the slow network connection, the heat (it felt like 30 degrees inside) and the lack of toilet paper, everything went really well. During the day there were several folks enjoying our cartoon and handing out fliers in front of our booth.

Great Indian Developer Summit

Our initial booth

Our booth that is ready

Indian girls with fliers in front of us


8
May 08

Scala to get binary compatibility

I sat down with Martin Odersky yesterday and talked about the issues that bothered me in Scala. Turns out that the main reason for the lack of binary compatibility between releases is the concrete methods in traits. At the moment the Scala compiler needs to generate stubs in every class mixing in those traits, which means that adding a fresh concrete method requires a full recompile against the trait.

However this can be solved by adding a runtime postprocessor that inserts the stub into the class bytecode during classloading. Hopefully this functionality will be enabled in a future version of Scala and we can look forward to distributing Scala libraries in JARs. I am glad to say that Martin is a great guy and is really committed to making the Scala language stable so that the community can build on it without being afraid of breaking changes.

Martin also said that the latest version of Scala implements the stable naming of closure classes that David requested some time ago. This means that JavaRebel can now reload even more changes to Scala programs. Kudos to the Scala team.