What do we really know about non-blocking concurrency in Java?
Written by Jevgeni Kabanov on October 28, 2008 – 3:22 pm
If yesterday’s post taught me anything, it’s this: not that much. This post is the result of a lengthy chat with Heinz Kabutz and Kirk Pepperdine (who really do know something about non-blocking concurrency) as well as comments on the post and the Reddit entry.
When I put together yesterday’s post (When System.currentTimeMillis() is too slow…) I deliberately left the code unsynchronized. Not because I didn’t know that volatile would fix everything, but because I wasn’t clear on how and why it would fix things, or whether anything was even broken. I got a lot of replies to the post, and although everyone was sure that the answer was to use volatile (or atomics), not many could explain what would really happen without them and what impact, if any, they have on performance.
The first thing most people don’t realize is that reads and writes are atomic for all primitive types except long and double, and for references as well, no matter whether they are 32 or 64 bit. This is a requirement of the Java Language Specification and is independent of the volatile flag. Volatile additionally imposes atomicity on reads and writes of long and double values. See section 17.7 of the Java Language Specification for details.
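To make "not atomic" concrete, here is a small illustration of what a torn 64-bit value would look like. It is only a simulation under an assumption of mine (the two 32-bit halves are stored separately and a reader catches the new high half with the old low half); the class and method names are hypothetical and it does not try to provoke real tearing on a JVM.

```java
// Illustration only: simulates the value a reader could observe if a
// non-volatile long were written as two separate 32-bit halves.
// TornWriteDemo and tornValue are hypothetical names for this sketch.
final class TornWriteDemo {

    // Combine the high 32 bits of the new value with the low 32 bits of the old one.
    static long tornValue(long newValue, long oldValue) {
        long highOfNew = newValue & 0xFFFFFFFF00000000L;
        long lowOfOld = oldValue & 0x00000000FFFFFFFFL;
        return highOfNew | lowOfOld;
    }

    public static void main(String[] args) {
        long oldValue = 0x00000000FFFFFFFFL;
        long newValue = 0x0000000100000000L; // oldValue + 1
        // Neither the old nor the new value: a value that was never written.
        System.out.println(Long.toHexString(tornValue(newValue, oldValue)));
    }
}
```

Declaring the field as volatile long rules this out, and an int field never has the problem in the first place.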
If the reading/writing of the value is atomic (which it is for the ints that I used in the end), then no thread can ever see corrupted data. The only thing we have to deal with is inconsistency. The reason why I even introduced this “error” to begin with is that consistency is not that important for the particular problem, as long as the data is not permanently inconsistent. By CPU standards half a second is a lot of time, so if I can assume that the data is synced within that horizon, I can safely leave this code in.
The Java memory model specification stipulates that the JVM/JIT is free to keep a thread-local copy of any values that are not declared as volatile and is only obliged to sync them when entering/exiting a monitor (a monitor is taken when entering a synchronized block and released at the end of it). This is needed above all to allow for the aggressive memory caching in multicore processors, which is more likely to be present on high-end server hardware (like Sun SPARC) than on desktop Intel/AMD chips.
Hardware keeps the memory values in the L2 cache, which is bound to be flushed relatively regularly, so I’m still not entirely convinced that this would be a real problem even on high-end servers. However Heinz Kabutz has demonstrated that in some cases JVM will cache the values even on a single core processor in the Java Specialists newsletter issue 159. Although the caching seems to be quite limited and we couldn’t reproduce it on the problem at hand it does demonstrate that permanent inconsistency is indeed a possibility one has to consider.
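The classic case where a cached value turns into permanent inconsistency is a loop that polls a flag. The sketch below is my own minimal example, not the one from the newsletter; StopFlagDemo, startWorker and stopAndWait are hypothetical names. With the flag declared volatile the worker is guaranteed to see the update; remove the volatile keyword and the JIT is allowed to read the flag once and spin forever.

```java
// Minimal sketch of a stop flag. All names here are hypothetical.
final class StopFlagDemo {

    // volatile forces every loop iteration to observe the latest write.
    // Without it, the read may legally be hoisted out of the loop.
    static volatile boolean running = true;

    static Thread startWorker() {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                // busy loop, polls the flag on every iteration
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        return worker;
    }

    // Clears the flag and waits up to timeoutMillis; returns true if the worker stopped.
    static boolean stopAndWait(Thread worker, long timeoutMillis) {
        running = false;
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        Thread worker = startWorker();
        System.out.println("worker stopped: " + stopAndWait(worker, 5000));
    }
}
```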
Interestingly enough, if we only have to worry about hardware caching, we can use a different trick to ensure consistency among threads. This was proposed by Kirk Pepperdine, and neither he nor I am entirely sure that it will work on all JVM implementations/hardware platforms. However, the same trick seems to be used in Cliff Click’s high-scale-lib, and he is one of the few people who really do understand non-blocking concurrency. Until he confirms it, though, I have to consider it just another crazy idea.
The trick is that if reading a volatile will flush the whole cache, who said we have to read the same volatile? We just declare a different variable volatile just for the purpose of flushing the cache and leave the original one still non-volatile:
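    public class HeartBeatThread extends Thread {
      // Plain field: reads and writes are atomic, but visibility is not guaranteed.
      public static int counter = 0;
      // Volatile field that exists only to force the flush.
      public static volatile int cacheFlush = 0;

      public HeartBeatThread() {
        setDaemon(true);
      }

      static {
        new HeartBeatThread().start();
      }

      public void run() {
        while (true) {
          try {
            // Assumed interval, based on the half second mentioned above.
            Thread.sleep(500);
          } catch (InterruptedException e) {
            throw new RuntimeException(e);
          }

          counter++;
          cacheFlush++; // volatile write right after the plain write
        }
      }
    }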
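As far as the memory model itself goes, the visibility guarantee only holds for a reader that also reads the volatile field before it reads the plain one. Here is a hedged sketch of what such a reader could look like; GuardedCounter, tick, read and runTicks are hypothetical names, and this is not the code from the post, just the same two fields with the reading side added.

```java
// Sketch of the writer/reader pairing. All names are hypothetical.
final class GuardedCounter {
    static int counter = 0;             // plain field
    static volatile int cacheFlush = 0; // guard field

    // Writer side: plain write first, volatile write last.
    // Assumes a single writer thread, so ++ on the volatile is safe here.
    static void tick() {
        counter++;
        cacheFlush++;
    }

    // Reader side: volatile read first, plain read second.
    static int read() {
        int guard = cacheFlush; // volatile read, pairs with the volatile write in tick()
        return counter;         // at least as fresh as the tick that wrote the observed guard value
    }

    // Helper for trying it out: runs n ticks on a separate thread and waits for it.
    static void runTicks(int n) {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                tick();
            }
        });
        writer.start();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        runTicks(3);
        System.out.println("counter seen by reader: " + read());
    }
}
```

A reader that skips the volatile read and touches only counter gets no such guarantee from the specification, whatever the hardware happens to do.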
To be fair this is not the code that I’m going to put in JavaRebel. Now that I understand the issues behind not using a volatile well enough, I will just slap one on and be content. My tests have shown that making a field volatile does not impose any noticeable overhead at least on the microbenchmarks I was using. I can just hope that this post has perhaps helped you to understand some of the issues better as well.
P.S. To everyone who suggested using AtomicLong/AtomicInteger: if you check the implementation of those classes, the field they wrap is declared as volatile, so there is no possible benefit in using atomics instead of volatile values unless you need an atomic read-modify-write operation such as compareAndSet() (considering that reads/writes to a volatile long are atomic anyway).
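Where the two do differ is when more than one thread writes. The sketch below is my own example with hypothetical names (IncrementDemo, runIncrements): several threads increment both a volatile int and an AtomicInteger. The ++ on the volatile is a separate read and write, so updates can be lost; incrementAndGet() cannot lose any.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch comparing volatile ++ with AtomicInteger under several writers.
// IncrementDemo and runIncrements are hypothetical names.
final class IncrementDemo {
    static volatile int volatileCount = 0;
    static final AtomicInteger atomicCount = new AtomicInteger();

    static void runIncrements(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;               // read, add, write: not atomic as a whole
                    atomicCount.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        runIncrements(4, 100_000);
        // atomicCount is always 400000 here; volatileCount may be lower.
        System.out.println("volatile: " + volatileCount + ", atomic: " + atomicCount.get());
    }
}
```

With a single writer, as in the heartbeat thread, none of this matters and a plain volatile is enough.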
Tags: concurrency, java, performance
9 Comments to “What do we really know about non-blocking concurrency in Java?”
October 28th, 2008 at 9:16 pm
If you want another reference for that behavior, check out Bill Pugh’s JSR-133 FAQ:
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile
“Under the new memory model, it is still true that volatile variables cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them. Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.”
The source code in there is basically the same that you have.
October 28th, 2008 at 9:23 pm
@Mathias
Wow, thanks! That is exactly what I was looking for!
October 28th, 2008 at 9:58 pm
You say that the Java memory model is mostly useful for caches, but it’s also important for concurrent garbage collection. I read this paper about the Sapphire GC a while ago, and the correctness of their implementation is based on synchronization points and the assumption that threads should not have any race conditions, meaning they can’t grab references to objects in other threads without entering a monitor.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.1701
October 28th, 2008 at 11:38 pm
@Reid
I don’t really see where I imply that. Race conditions and synchronization just weren’t relevant to my very specific problem.
October 28th, 2008 at 11:46 pm
@Reid: In the new memory model, a write to a volatile variable from one thread and a read from another thread serve as barrier in the order of operations. There is no race condition.
October 28th, 2008 at 11:53 pm
@Mathias
Sorry, that isn’t right. If you have several operations in two threads, they will still be competing unless there’s a lock.
October 29th, 2008 at 5:20 am
@Jevgeni:
What I was trying to say was that writes to non-volatile variables by thread 1 cannot be reordered past a write to a volatile variable by thread 1, so once thread 2 has read that volatile variable, it will see all the earlier writes by thread 1.
If you have this structure, there is no race condition for the non-volatile data.
thread1:
1a) write to non-volatile variable A
1b) write to non-volatile variable B
1c) write to volatile variable C
thread2:
2a) read from volatile variable C
2b) read from non-volatile variable A
2c) read from non-volatile variable B
1a and 1b can be permuted, and 2b and 2c can be permuted, but 1a and 1b can’t be moved past 1c, and 2b and 2c can’t be moved ahead of 2a.
That means that you can use volatiles to implement locks:
“The third of these rules [Volatile rule. A write to a volatile field happens before every subsequent read of the same volatile.], the one governing volatile fields, is a stronger guarantee than that made by the original memory model. This is useful because now volatile variables can be used as ‘guard’ variables — you can now use a volatile field to indicate across threads that some set of actions has been performed, and be confident that those actions will be visible to all other threads.”
(Brian Goetz in “JSR 133 in Public Review”, http://today.java.net/lpt/a/84 )
I’m sorry I didn’t make this clear.
October 29th, 2008 at 12:03 pm
This is not guaranteed to work, since the reader threads do not access the cacheFlush volatile written by the writer thread. If it works, it is merely an implementation artifact; there is no happens-before between the write and the read, unless there is some other piggybacking going on that makes it correct…
See chapter 16 of Java Concurrency in Practice.
(Why not just use an AtomicInteger#getAndIncrement()?)
November 13th, 2008 at 1:04 pm
@Pesco
Seriously? ++ is done by one thread *only*. concurrency is not important here. Generally only one thread writes.