Is Volatile Expensive

Is volatile expensive?

On Intel an un-contended volatile read is quite cheap. If we consider the following simple case:

public static long l;

public static void run() {        
    if (l == -1)
        System.exit(-1);

    if (l == -2)
        System.exit(-1);
}

Using Java 7's ability to print assembly code the run method looks something like:

# {method} 'run2' '()V' in 'Test2'
#           [sp+0x10]  (sp of caller)
0xb396ce80: mov    %eax,-0x3000(%esp)
0xb396ce87: push   %ebp
0xb396ce88: sub    $0x8,%esp          ;*synchronization entry
                                    ; - Test2::run2@-1 (line 33)
0xb396ce8e: mov    $0xffffffff,%ecx
0xb396ce93: mov    $0xffffffff,%ebx
0xb396ce98: mov    $0x6fa2b2f0,%esi   ;   {oop('Test2')}
0xb396ce9d: mov    0x150(%esi),%ebp
0xb396cea3: mov    0x154(%esi),%edi   ;*getstatic l
                                    ; - Test2::run@0 (line 33)
0xb396cea9: cmp    %ecx,%ebp
0xb396ceab: jne    0xb396ceaf
0xb396cead: cmp    %ebx,%edi
0xb396ceaf: je     0xb396cece         ;*getstatic l
                                    ; - Test2::run@14 (line 37)
0xb396ceb1: mov    $0xfffffffe,%ecx
0xb396ceb6: mov    $0xffffffff,%ebx
0xb396cebb: cmp    %ecx,%ebp
0xb396cebd: jne    0xb396cec1
0xb396cebf: cmp    %ebx,%edi
0xb396cec1: je     0xb396ceeb         ;*return
                                    ; - Test2::run@28 (line 40)
0xb396cec3: add    $0x8,%esp
0xb396cec6: pop    %ebp
0xb396cec7: test   %eax,0xb7732000    ;   {poll_return}
;... lines removed

If you look at the 2 references to getstatic, the first involves a load from memory, the second skips the load as the value is reused from the register(s) it is already loaded into (long is 64 bit and on my 32 bit laptop it uses 2 registers).

If we make the l variable volatile the resulting assembly is different.

# {method} 'run2' '()V' in 'Test2'
#           [sp+0x10]  (sp of caller)
0xb3ab9340: mov    %eax,-0x3000(%esp)
0xb3ab9347: push   %ebp
0xb3ab9348: sub    $0x8,%esp          ;*synchronization entry
                                    ; - Test2::run2@-1 (line 32)
0xb3ab934e: mov    $0xffffffff,%ecx
0xb3ab9353: mov    $0xffffffff,%ebx
0xb3ab9358: mov    $0x150,%ebp
0xb3ab935d: movsd  0x6fb7b2f0(%ebp),%xmm0  ;   {oop('Test2')}
0xb3ab9365: movd   %xmm0,%eax
0xb3ab9369: psrlq  $0x20,%xmm0
0xb3ab936e: movd   %xmm0,%edx         ;*getstatic l
                                    ; - Test2::run@0 (line 32)
0xb3ab9372: cmp    %ecx,%eax
0xb3ab9374: jne    0xb3ab9378
0xb3ab9376: cmp    %ebx,%edx
0xb3ab9378: je     0xb3ab93ac
0xb3ab937a: mov    $0xfffffffe,%ecx
0xb3ab937f: mov    $0xffffffff,%ebx
0xb3ab9384: movsd  0x6fb7b2f0(%ebp),%xmm0  ;   {oop('Test2')}
0xb3ab938c: movd   %xmm0,%ebp
0xb3ab9390: psrlq  $0x20,%xmm0
0xb3ab9395: movd   %xmm0,%edi         ;*getstatic l
                                    ; - Test2::run@14 (line 36)
0xb3ab9399: cmp    %ecx,%ebp
0xb3ab939b: jne    0xb3ab939f
0xb3ab939d: cmp    %ebx,%edi
0xb3ab939f: je     0xb3ab93ba         ;*return
;... lines removed

In this case both of the getstatic references to the variable l involves a load from memory, i.e. the value can not be kept in a register across multiple volatile reads. To ensure that there is an atomic read the value is read from main memory into an MMX register movsd 0x6fb7b2f0(%ebp),%xmm0 making the read operation a single instruction (from the previous example we saw that 64bit value would normally require two 32bit reads on a 32bit system).

So the overall cost of a volatile read will roughly equivalent of a memory load and can be as cheap as a L1 cache access. However if another core is writing to the volatile variable, the cache-line will be invalidated requiring a main memory or perhaps an L3 cache access. The actual cost will depend heavily on the CPU architecture. Even between Intel and AMD the cache coherency protocols are different.

Increased cost of a volatile write over a nonvolatile write

Here's why, according to the article you have indicated:

Volatile writes are considerably more expensive than nonvolatile writes because of the memory fencing required to guarantee visibility but still generally cheaper than lock acquisition.
[...] volatile reads are cheap -- nearly as cheap as nonvolatile reads

And that is, of course, true: memory fence operations are always bound to writing and reads execute the same way regardless of whether the underlying variable is volatile or not.

However, volatile in Java is about much more than just volatile vs. nonvolatile memory read. In fact, in its essence it has nothing to do with that distinction: the difference is in the concurrent semantics.

Consider this notorious example:

volatile boolean runningFlag = true;

void run() {
  while (runningFlag) { do work; }
}

If runningFlag wasn't volatile, the JIT compiler could essentially rewrite that code to

void run() {
   if (runningFlag) while (true) { do work; }
}

The ratio of overhead introduced by reading the runningFlag on each iteration against not reading it at all is, needless to say, enormous.

What is the volatile keyword useful for?

volatile has semantics for memory visibility. Basically, the value of a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it. Without volatile, readers could see some non-updated value.

To answer your question: Yes, I use a volatile variable to control whether some code continues a loop. The loop tests the volatile value and continues if it is true. The condition can be set to false by calling a "stop" method. The loop sees false and terminates when it tests the value after the stop method completes execution.

The book "Java Concurrency in Practice," which I highly recommend, gives a good explanation of volatile. This book is written by the same person who wrote the IBM article that is referenced in the question (in fact, he cites his book at the bottom of that article). My use of volatile is what his article calls the "pattern 1 status flag."

If you want to learn more about how volatile works under the hood, read up on the Java memory model. If you want to go beyond that level, check out a good computer architecture book like Hennessy & Patterson and read about cache coherence and cache consistency.

Unnecessarily using volatile keyword -- is that dangerous?

It can have performance implications, but that's it.

There may be another danger: that you lull yourself into thinking that since everything is volatile, your code is thread-safe. That's not the case, and there's no substitute for actually understanding the threading implications of your code. If you do that, you won't need to mark things volatile "just in case."

But to your immediate question: no, you won't ever take functionally correct code and break it by making a variable violate.

How does volatile actually work?

From what I understand it always appears as if the cache has been flushed after write, and always appears as if reads are conducted straight from memory on read. The effect is that a Thread will always see the results of writes from another Thread and (according to the Java Memory Model) never a cached value. The actual implementation and CPU instructions will vary from one architecture to another however.

It doesn't guarantee correctness if you increment the variable in more than one thread, or check its value and take some action since obviously there is no actual synchronization. You can generally only guarantee correct execution if there is only just Thread writing to the variable and others are all reading.

Also note that a 64 bit NON-volatile variable can be read/written as two 32 bit variables, so the 32 bit variables are atomic on write but the 64 bit ones aren't. One half can be written before another - so the value read could be nether the old or the new value.

This is quite a helpful page from my bookmarks:

http://www.cs.umd.edu/~pugh/java/memoryModel/

Volatile for int?

Volatile is not used for atomic operations, but for visibility. If any changes has been made and it should be visible for other threads, volatile should be used. Here is a short description by java tutorial:

Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible. Using volatile variables reduces the risk of memory consistency errors, because any write to a volatile variable establishes a happens-before relationship with subsequent reads of that same variable. This means that changes to a volatile variable are always visible to other threads. What's more, it also means that when a thread reads a volatile variable, it sees not just the latest change to the volatile, but also the side effects of the code that led up the change.

Why access volatile variable is about 100 slower than member?

Acess to volatile prevents some JIT optimisaton. This is especially important if you have a loop which doesn't really do anything as the JIT can optimise such loops away (unless you have a volatile field) If you run the loops "long" the descrepancy should increase more.

In more realistic test, you might expect volatile to take between 30% and 10x slower for cirtical code. In most real programs it makes very little difference because the CPU is smart enough to "realise" that only one core is using the volatile field and cache it rather than using main memory.

Is Volatile Expensive