Featured


4
Nov 08

Script kiddies have awesome tools

About 10 years ago a friend of mine showed me an exploit. It was written in C and it tried to spawn a shell at a remote host. It seemed pretty cool. I did not understand the code but the mere idea that almost anybody equipped with a script like that could deface a webpage seemed scary.

You did have to compile the c file, have the right devel packages installed and use the correct flags. And then you had to figure out how to use it. A 14yr old could do it.

Today I spent many hours grepping logs, checking the filesystem for new/changed files to figure out how an old WordPress instance was hacked and what had the hacker done there.

Going through the changed files I stumbled upon a php file which had some code prepended. The script had a very long line that started like this:
[PHP]
eval(gzinflate(base64_decode(‘FJ3HcqPsFkUf……..
[/PHP]

Ok, lets check what does the script do. Lets assign the long string to a variable and base64 decode it and inflate the compression.
[PHP]
$script = base64_decode($script);
$script = gzinflate($script);
echo $script;
[/PHP]

The output was not what I expected.
[PHP]
eval(gzinflate(base64_decode(‘FJ3HjqPcGkUf57……..
[/PHP]

The strings looked similar and I was already looking for an error in my code. Nope, code is correct. There is a slight change in the string. It seems it was compressed and encoded couple of times. Wow, it means I can have many evals inside evals. Fun!
[PHP]
do {
// extract the first 28 characters
// the eval(gzinflate(base64_decode part
$start = substr($string, 0, 28);
// remove the first 30 chars, the eval(gzinflate(base64_decode(‘ part
$string = substr($string, 30);
// remove the last )));
$string = substr($string, 0, strlen($string)-4);

$string = base64_decode($string);
$string = gzinflate($string);
echo “Iteration:”.$i++.”\n”;
// iterate as long as we get a eval(gzinflat start
} while ($start == “eval(gzinflate(base64_decode”);
[/PHP]

After 11 iterations I got the code. Kind of reminded me a challenge that was posted to a mailing list and the question was what was the output of the program. That time it was more difficult: base64 encoded perl, that outputted base64 encoded bytecode, that outputted Java source file with a byte array that was byte code for the class file of the solution.

Anyways the 11 iterations gave me this (shot is made from my home computer).
Wow!

Lets see the functionality that it has to offer:

  • Full blown file manager
  • Quick menu for
    • Finding all suid files
    • Finding all sgid files
    • Finding all htaccess files
    • Finding all writeable folders
  • Interface for the UNIX tool find
  • Input field for executing commands as webserver user
  • Tools for installing a backdoor
    • Perl/C flavoured programs that are downloaded from a Singapore server
    • Compiled/Interpreted – depending what is available
  • Processes viewer
  • FTP brute force cracker using users from /etc/passwd
  • System info (CPU, Memory, installed binaries, passwd file, configuration files)
  • SQL dump utility
  • Interface for executing PHP code
  • Self removal
  • Adding a password for the script
  • Fancy design!

I’m just amazed. This is way too eazy. So this is how it works:

  • Lets scan the internet for WordPress installation (automated)
  • Look for vulnerable versions (automated)
  • Exploit (in this case themes were filled with hidden links – semi automated)
  • PROFIT! (automated)

How to avoid being hacked:

  • Keep an eye on your WordPress installations
  • Subscribe to WordPress release emails/RSS and upgrade when needed
  • Monitor for changed files (for example fcheck)
  • Run Apache in chroot to minimize the available software for the Apache user
  • Any other ideas?

PS. The script is 2500 lines of code, supports Windows and Linux and looks great :)


3
Nov 08

When protected isn’t protected

For all those spec believers out there: what would you think if I told you that all of the JVM implementations consistently violate the spec for at least one particular instance?

Our story begins on a cold and lonely evening when I discover a bug in JavaRebel in conjunction with Xerces XMLEntityManager. The problem was with the following method:
[java]
public String[] getRecognizedProperties() {
return (String[])(RPROPERTIES.clone());
}
[/java]
However when I looked at the class bytecode I saw a different picture:
[java]
{
GETSTATIC XMLEntityManager.RPROPERTIES : String[]
INVOKEVIRTUAL Object.clone() : Object[]
CHECKCAST String[]
ARETURN
}
[/java]
For those of you who do not spend hours staring at the Java bytecode, it does the following:

  1. Retrieve the RPROPERTIES static field value (which is of type String[]) and put it on the top of the stack.
  2. Call Object.clone() using the top of the stack as target object (in our case the array). The target is popped off the stack and the result is placed on top of the stack instead of it.
  3. Cast the top of the stack to a String[]
  4. Return the top of the stack as method result.

Whooph. Now that is all fine except for one tiny thing. This is not legal. Object.clone() is a protected method and the JVM spec (see 2.7.4, INVOKEVIRTUAL) allows calls to protected method only if the following is true:

  1. The accessing class is a subclass of the target class or is the same class (check, XMLEntityManager is a subclass of Object, so’s everything else).
  2. The class of the target object (the actual runtime class of the object, not the target class we use in the signature) is either a subclass of the accessing class or is the same class.

The second check may sound crazy, but it restricts the calls to protected methods only to the classes inheriting from the target class, as you would expect. It can also only be enforced during runtime and is the cause of the IllegalAccessError you may see once in a while.

And obviously in our case String[] that is the actual type of the target object is in no way a subclass of the XMLEntityManager. This call should have cause an error in the JVM, but for some inexplicable reason it doesn’t. Further tests show that this is only limited (AFAIK) to the Object.clone() call, which is public as far as JVM cares.

A little googling turned up two JVM bugs that were discussing the same issue (bug 1, bug 2). Turns out the Sun javac compiler has been emitting wrong code at least until 1.4.2 and the JVMs have accommodated accordingly right until today.

Does this have any practical importance? Not for most of you, but if you handle Java bytecode in any way be prepared to handle this exception as well. To me it proves that there is more to Java than just the spec.


27
Oct 08

When System.currentTimeMillis() is too slow…

At the moment I am working on reducing the performance overhead of JavaRebel. Once I got rid of all the obvious bottlenecks and optimizations I fired up the profiler and started searching and destroying ad-hoc hotspots. After some time I got to the point, when the bottleneck was in System.currentTimeMillis().

This deserves some explanation. Since JavaRebel reloads code after the class changes, it needs to poll the actual class file for changes. The way we do it is quite lazy and the class itself causes the resource to be polled. However often there will be a lot of polling involved (more or less any method call on the class will cause a check), so we need to filter those calls. What we did is to check if some amount of time (e.g. 500 ms) has passed since the last check:

if (lastCheck + 500  > System.currentTimeMillis())
  return;
lastCheck = System.currentTimeMillis();
// Resource check code

After some optimization everything else got fast enough that this call started to weigh the performance down. System.currentTimeMillis() causes a system call and often a hardware poll, which is comparatively expensive. I needed some way to eliminate it.

After some thought I decided to cache the call. It may sound crazy to cache the current time, but I only needed the resolution of about half a second, so it was OK for me if the time was lagging behind about that much. What I did is put up a separate thread that would cache the current time and update it every half a sec.

public class TimeCacheThread extends Thread {
  public static volatile long currentTimeMillis = 
    System.currentTimeMillis();
 
  public TimeCacheThread() { 
    setDaemon(true);
  }
 
  static {
    new TimeCacheThread().start();
  }
 
  public void run() {    
    while (true) {
      currentTimeMillis = System.currentTimeMillis();
 
      try {
        Thread.sleep(500);
      } catch (InterruptedException e) {
        throw new RuntimeException(e);
      }
    }
  }
}

Then the check became:

if (lastCheck + 500  > TimeCacheThread.currentTimeMillis)
  return;
lastCheck = TimeCacheThread.currentTimeMillis;
// Resource check code

This check is amazingly fast, because instead of doing a system call we are just making a memory read.

[EDIT] The original code did not have the field declared volatile with the following explanation:

You may also notice the complete lack of any synchronization (or volatile keyword) on the currentTimeMillis static field. Since the type is primitive we can only deal with stale data here and I cared about performance more than I cared about missing a couple of checks.

There is a follow-up post (What do we really know about non-blocking concurrency in Java?) that explains some more of the reasoning behind initially omitting the volatile and then still putting it in. [/EDIT]

However once I got that far I decided I could do even better. The currentTimeMillis is of type long and we are still doing some arithmetic on it. In fact the only thing we want to know is if a quantum of time has passed since the last check. And we can do that by using a very simple heartbeat counter:

public class HeartBeatThread extends Thread {  
  public static volatile int counter = 0;  
 
  public HeartBeatThread() {
    setDaemon(true);
  }
 
  static {
    new HeartBeatThread().start();
  }
 
  public void run() {    
    while (true) {      
      try {
        Thread.sleep(500);
      } catch (InterruptedException e) {
        throw new RuntimeException(e);
      }
 
      counter++;
    }
  }
}

Now the checking code becomes just:

if (counter == HeartBeatThread.counter)
  return;
counter = HeartBeatThread.counter;

The counter field is now of int type and we only do an equals check on a DWORD, which should translate directly to 3 very fast hardware instructions (two memory reads and one conditional jump). Hopefully if you ever find yourself polling something a lot this post will come of help. As for me, it improved JavaRebel performance by 70% on some benchmarks.


4
Aug 08

Case study: Is PHP embarrasingly slower than Java?

IP2C is a small library that provides IP to country resolution. It uses the free ip-to-country database. IP2C takes the database CSV file that is about 4mb and converts it into a ~600kb binary format and provides PHP and Java frontend to query the database.

The library is great, easy to convert an ip to a country and when using the country flags from it’s side project you could spice up your statistics with the country information. This a lot faster than using reverse DNS lookup.

The problem. The PHP implementation is a lot slower. Embarrassingly slower. Without any caching the Java version is able to do ~6000 queries per second. The PHP counterpart can push through ~850 queries. The implementations are the same. The stats provided by the author of the library are 8000 vs 1200. So about the same as my measurements.

I like PHP, I don’t use it that much anymore but I still care when I see such embarrassing numbers. I took the implementation and started profiling it. Spent the night running different tests and trying to optimize.

General outline of the algorithm is as follows. We take the dotted string IP and convert it to an IPv4 Internet network address (e.g. 69.55.232.153 becomes 1161291929). The DB holds sorted ranges of these addresses. A binary search will happen on these addresses and we have a country for the ip. Take a look at the implementation.


Lets see where the vanilla version of IP2C spends its time at. The results are based on 1000 iterations with Xdebug enabled and visualized by KCacheGrind. It processed about 210 IP addresses during this time.

IO part is surprisingly low. The internal fseek, fread constitute to 2% of the execution time. On the other hand the user level fseek which is just a wrapper alone uses 5%. readShort and readInt take 20% of the execution time.

function readShort() {
	$a = unpack('n', fread($this->m_file, 2));
	return $a[1];}
 
function readInt() {
	$a =unpack('N', fread($this->m_file, 4));
	return $a[1];}
 
function seek($offset){
	fseek($this->m_file, $offset);}

Functions calls are expensive. Lets eliminate them. readInt, readShort, fseek are now inlined. Recursion changed to iteration (e.g. 14 000 less function calls). Able to process 400 queries per second compared to the previous 210.

We see that the latest profiling results have twice the number of freads and unpacks than fseeks. It seems that fseek is used to seek out the right position, read two numbers with unpacking them. The implementation confirms that. Luckily we could just read once (2 bytes more) and unpack once (2 unpackings with one invocation).

$a =unpack('N', fread($this->m_file, 4));
$np['ip'] = $a[1];
 
$a =unpack('n', fread($this->m_file, 2));
$np['key'] = $a[1];
 
// this can be changed to
$np =unpack('Nip/nkey', fread($this->m_file, 6));

How does this version stack up to the Java version? Lets disable profiling and run 100 000 iterations. Vanilla version processes ~850 IPs, when functions are inlined the number is around 1400. Java version can still do 6000.

Lets try caching. Peeking at the Java implementation shows that Java caching version (whopping 141 242 IPs per second – yup 141k) uses just a byte[] array and makes lookups from there instead of seeking and reading from file. Easy, lets do the same in PHP.

We read everything into a string and instead of fread with access the string elements with the offset. For fseek with just set the offset. We are using 600kb more memory but can increase the throughput to ~2800.

As it seems I’ve just wasted a night, I just should have checked the Computer Language Benchmarks. PHP in the sense of execution speed is uncomparable to Java.

The upside, we can still take the library, eliminate recursions, double unpacks and add caching. A small gain is still a gain.


24
Mar 08

Typesafe DSLs in Java: Part 1 — Typesafe Bytecode

Domain Specific Languages (DSLs) have been brought to Java under the name of Fluent Interface. However most of them utilize a lot of strings and untyped behavior to make the interface fluent enough. It turns out that using Java 5 and a bag of tricks we can have the compiler to check a lot more. In this post we'll check out how to write Java bytecode using ASM in a typesafe way.

Continue reading →