Synchronizing on String Objects in Java

Synchronizing on String objects in Java

Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings:

final String firstkey = "Data-" + email;
final String key = firstkey.intern();

Two Strings with the same value are otherwise not necessarily the same object.

Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.

I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.

Response to question update:

I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.

Edit

Others have pointed out that synchronizing on intern strings is actually a really bad idea - partly because creating intern strings is permitted to cause them to exist in perpetuity, and partly because if more than one bit of code anywhere in your program synchronizes on intern strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.

Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.

Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:

  • If the CPU is busy then this approach may not be sufficient and you need another approach.
  • If the CPU is not busy, and access to server is not a bottleneck, then this approach is overkill, and you might as well forget both this and per-key locking, put a big synchronized(StaticCache) around the whole operation, and do it the easy way.

Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.

This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.

IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.

// do not attempt double-check locking here. I mean it.
synchronized(StaticObject) {
data = StaticCache.get(key);
while (data == IN_PROGRESS) {
// another thread is getting the data
StaticObject.wait();
data = StaticCache.get(key);
}
if (data == null) {
// we must get the data
StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);
}
}
if (data == null) {
// we must get the data
try {
data = server.DoSlowThing(key);
} finally {
synchronized(StaticObject) {
// WARNING: failure here is fatal, and must be allowed to terminate
// the app or else waiters will be left forever. Choose a suitable
// collection type in which replacing the value for a key is guaranteed.
StaticCache.put(key, data, CURRENT_TIME);
StaticObject.notifyAll();
}
}
}

Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.

This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.

Synchronize On Same String Value

There is no guarantee that two strings with the same value would point to the same instance in Java, especially if they are created from user input.

However, you can easily force them into the string pool using the intern() method, which would guarantee it's the same instance being used:

public void createUserInDb(String userName){
String interned = userName.intern();
synchronized (interned) {
SQLHelper.insertUser(interned);
}
}

Creating an object from string and using as monitor in synchronized block

I know that I can't use string directly and ints don't have monitor so
I can't use hashcode directly.

Yes, nonetheless, you could (but you should not) take advantage of the method intern(). From source one can read:

The java string intern() method returns the interned string. It
returns the canonical representation of string.

It can be used to return string from memory, if it is created by new
keyword. It creates exact copy of heap string object in string
constant pool.

String annotationString = // ...;
synchronized (annotationString.intern())

Notwithstanding, the problem of locking on strings is that other objects could inadvertently be locking on those same strings for entirely different purposes. In other words, strings have a global semantic scope, that you cannot control, i.e., ensure that those same strings are not being used elsewhere as a locking mechanism. From this SO Thread (among other points) one can read:

Locking on strings is discouraged, the main reason is that (because of
string-interning) some other code could lock on the same string
instance without you knowing this. Creating a potential for deadlock
situations.

Now this is probably a far fetched scenario in most concrete
situations. It's more a general rule for libraries.

An alternative solution would be to have a Map of <String, Object> where the key is the string read from the annotation, and the value would be an object which threads can use to synchronize.

The Map would be shared among threads, therefore one would have to ensure mutual exclusion when accessing such a Map. Every time a thread reads a string from an annotation adds that string to the Map if not yet added, otherwise synchronizes on the value associated with that string key.

Java: synchronize on String object

String literals are immutable and shared via the VM's String pool. This means that every time you write, for example, "foo", a new String representing foo is not placed on the heap. As a result, this String pool is visible to all threads. Synchronizing on a String literal then exposes you to unstructured synchronization, which is the first stop on the deadlock hell train.

Efficient sharing of Strings is why you shouldn't use the String constructor with the signature String(String) unless you have a really, really good reason for doing so.

Also, there's no point synchronizing on a local variable, since there's no visibility of it outside of the method, let alone in other threads.

Finally, do you really need to synchronize? Since you're not using it effectively in the code above, even excluding the string issue, it's possible you don't have to use it.

Synchronize on value, not object

If the set of user ids is limited, you can synchronize on an interned version of the String.

Either use String.intern() (which has a few drawbacks) or something like Guava Interners if you need a bit more control over the interning.

Synchronizing a Block of Code - Exercise 13-2

StringBuffer is synchronized, but only on each individual method call. This means that when you are not using synchronization, the part that prints the value of the string buffer is not done within a synchronized block. One of the other threads can update the buffer during that time. When this happens, the print will show the new value. You can't really know which thread will update the buffer first.

Also, a thread may change the value between the call to sb.charAt() and the call to sb.setCharAt(). So the result of the increment itself may be unpredictable.

Note: synchronizing on this means each thread is synchronizing on a different lock, so it's like not synchronizing at all.

String literal synchronization

The point of synchronization is access to shared resources. Your thread is supposed to acquire the monitor on an object that other threads are trying to access. In your case, each thread acquires a different object's monitor, so none of them blocks.

If instead you passed the literal "One" to your constructors

first = new RunThread("One",orderedthread);
second = new RunThread("One",orderedthread);
third = new RunThread("One",orderedthread);

Then you would see execution in order. Each thread would have to finish its for-loop (inside the synchronized block) before another one can start. This is because each Thread is synchronized on the same String object.

Better yet, use one of them many java.util.concurrent classes that act as locks. For example, Lock.

Java: Synchronization based on object value

You need a collection of locks, which you can keep in a map and allocate as required. Here I assume that your id1 and id2 are Strings; adjust as appropriate.

 Map<String,Object> lockMap = new HashMap<>();
:

void someMethod(String id1, String id2) {
Object lock;
synchronized (lockMap) {
lock = lockMap.get(id1+id2);
if (lock == null) lockMap.put(id1+id2, (lock = new Object()));
}
synchronized (lock) {
:
}
}

You need a little bit of 'global' synchronization for the map operations, or you could use one of the concurrent implementations. I used the base HashMap for simplicity of implementation.

After you've selected a lock, sync on it.

Synchronizing on an object and changing the reference

From JLS 17.1:

The synchronized statement (§14.19) computes a reference to an object;
it then attempts to perform a lock action on that object's monitor and
does not proceed further until the lock action has successfully
completed. After the lock action has been performed, the body of the
synchronized statement is executed. If execution of the body is ever
completed, either normally or abruptly, an unlock action is
automatically performed on that same monitor.

Now the questions.

What happens to the lock on m?

Nothing. This is a little bit confusing. Actually, the thread is holding the lock on the object referenced by m at the time it was trying to acquire the lock. The assignment to m in the synchronized block does not automatically "switch" the lock that is being held by the executing thread.

Is it still safe to update the new object represented by m?

It's not safe. The write to m is not synchronized on the same lock.

Or is the lock essentially on the old object?

Yes



Related Topics



Leave a reply



Submit