Java multithreading - Write only if no other threads are reading

I have a map which holds some data, similar to an in-memory cache.
Map<String,Object> map;
There are multiple threads reading data from the map. Each thread may read the data more than once.
public void processData() {
    ...
    map.get("something");
    ...
    map.get("someOther");
    ...
}
The map needs to be updated with the latest values from the database. I use a web service for that.
public void refreshService() {
    // this code should wait until no one is reading the data
    map.clear();
    map.put("latestFromDb", readData());
}
The requirement is that my web service should wait until all the reading threads are finished, so that the readers get either the old data in full or the new data in full.
Edit: Ideally the reading threads should not wait for anything. A reading thread may wait for the web service (since refreshes are infrequent), but the reading threads should not wait for other reading threads.
Which Java 8 lock/mechanism/design pattern can I use to implement this?

Since the reading operation queries the map multiple times, the only way to get a consistent result (without locking) is to never modify the map at all.
In other words, instead of calling clear(), instantiate a new map, populate it, and change the map reference to point to the new map, as an atomic update. Don’t forget that the read operation must not read map multiple times then, but copy the current reference into a local variable right at the beginning.
volatile Map<String, Object> map;

public void processData() {
    Map<String, Object> map = this.map; // capture the current snapshot once, up front
    ...
    map.get("something");
    ...
    map.get("someOther");
    ...
}

public void refreshService() {
    Map<String, Object> map = new HashMap<>(); // build the new snapshot privately
    map.put("latestFromDb", readData());
    ...
    this.map = Collections.unmodifiableMap(map); // one atomic publish via the volatile write
}
Wrapping the map with unmodifiableMap is not strictly necessary, but it helps spot errors: any modification after publication would lead to nondeterministic behavior, possibly one you never notice during testing. Replacing that nondeterminism with an exception thrown deterministically on any subsequent modification attempt is therefore preferable.
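For the "which Java 8 mechanism" part of the question, the same publish-a-new-snapshot pattern can also be expressed with an AtomicReference instead of a volatile field. This is only a sketch of that equivalent alternative; the class name, field names and the readData() placeholder are illustrative, not taken from the original code.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class InMemoryCache {

    private final AtomicReference<Map<String, Object>> mapRef =
            new AtomicReference<>(Collections.<String, Object>emptyMap());

    public void processData() {
        Map<String, Object> map = mapRef.get(); // capture one snapshot for the whole read
        map.get("something");
        map.get("someOther");
    }

    public void refreshService() {
        Map<String, Object> fresh = new HashMap<>();
        fresh.put("latestFromDb", readData());          // placeholder for the DB/web-service call
        mapRef.set(Collections.unmodifiableMap(fresh)); // publish the new snapshot atomically
    }

    private Object readData() {
        return new Object(); // placeholder
    }
}

The semantics are the same as with the volatile field: readers never block each other, and the refresh swaps in a fully built, never-again-modified map in one step.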

Related

Hazelcast IMap: why is an EntryProcessor required when updating map values, since it's a distributed map (single-instance scenario)?

I am using Hazelcast as a distributed cache (only one instance). In one scenario I am trying to update a value in a map.
Based on https://stackoverflow.com/a/33351291/1212903 it seems I have to use an EntryProcessor when doing update operations, since it is atomic.
Why do I have to use an EntryProcessor even though IMap is distributed?
In the entry processor code I don't quite understand how the backup processor is supposed to be used, based on the documentation, since the map is distributed.
Why does the process method return an Object when the return value has no effect (we have to call entry.setValue() to actually update the value)?
public class AnalysisResponseProcessor implements EntryProcessor<String, AnalysisResponseMapper> {

    @Override
    public Object process(Map.Entry<String, AnalysisResponseMapper> entry) {
        AnalysisResponseMapper analysisResponseMapper = entry.getValue();
        analysisResponseMapper.increaseCount();
        entry.setValue(analysisResponseMapper);
        return analysisResponseMapper;
    }

    @Override
    public EntryBackupProcessor<String, AnalysisResponseMapper> getBackupProcessor() {
        return null;
    }
}
How to deal with this scenario?
Answers to your questions:
Whether the map is distributed or not, it can be accessed concurrently. If you do a series of get and put calls, someone else can modify the value in the meantime and you will overwrite their update. If you use an EntryProcessor, you can read and update the value in one atomic operation. If only one client updates the map, you can use get and put. The entry processor also needs one network round-trip instead of two.
You can return null for the backup processor if you have 0 backups. But if you ever decide to add a backup, the backup will not be updated. It might be easier to extend AbstractEntryProcessor (see the sketch below), where you don't have to deal with the backup processor: it will execute the same logic on the main and backup replicas. Writing the backup processor manually is only worthwhile if the computation inside the process method is heavy.
The return value from the process() method isn't the updated entry value, but a value that will be returned from the map.executeOnKey() method. If you don't need it, just return null.
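A minimal sketch of that AbstractEntryProcessor suggestion, assuming Hazelcast 3.x (where AbstractEntryProcessor applies the same logic on backup replicas by default) and reusing the AnalysisResponseMapper type from the question, which is assumed to be Serializable:

import java.util.Map;

import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;

public class IncreaseCountProcessor extends AbstractEntryProcessor<String, AnalysisResponseMapper> {

    @Override
    public Object process(Map.Entry<String, AnalysisResponseMapper> entry) {
        AnalysisResponseMapper value = entry.getValue();
        value.increaseCount();
        entry.setValue(value); // write back so the change is stored (and replicated to backups)
        return null;           // this becomes the return value of map.executeOnKey(), not the stored value
    }
}

// Usage (map obtained from the HazelcastInstance; key name is illustrative):
// IMap<String, AnalysisResponseMapper> map = hazelcastInstance.getMap("analysis");
// map.executeOnKey("some-key", new IncreaseCountProcessor());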

Confused about safe publishing and visibility in Java, especially with Immutable Objects

When I read the Java Concurrency in Practice by Brian Goetz, I recall him saying "Immutable objects, on the other hand, can be safely accessed even when synchronization is not used to publish the object reference" in the chapter about visibility.
I thought that this implies that if you publish an immutable object, all of its fields (final references to mutable objects included) are visible to other threads that might make use of them, and are at least up to date as of when the object finished construction.
Now, I read in https://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html that
"Now, having said all of this, if, after a thread constructs an immutable object (that is, an object that only contains final fields), you want to ensure that it is seen correctly by all of the other thread, you still typically need to use synchronization. There is no other way to ensure, for example, that the reference to the immutable object will be seen by the second thread. The guarantees the program gets from final fields should be carefully tempered with a deep and careful understanding of how concurrency is managed in your code."
They seem to contradict each other and I am not sure which to believe.
I have also read that if all fields are final then we can ensure safe publication even if the object is not per se immutable.
For example, I always thought that this code from Brian Goetz's Java Concurrency in Practice was fine for publishing an object of this class, due to this guarantee.
@ThreadSafe
public class MonitorVehicleTracker {
    @GuardedBy("this")
    private final Map<String, MutablePoint> locations;

    public MonitorVehicleTracker(Map<String, MutablePoint> locations) {
        this.locations = deepCopy(locations);
    }

    public synchronized Map<String, MutablePoint> getLocations() {
        return deepCopy(locations);
    }

    public synchronized MutablePoint getLocation(String id) {
        MutablePoint loc = locations.get(id);
        return loc == null ? null : new MutablePoint(loc);
    }

    public synchronized void setLocation(String id, int x, int y) {
        MutablePoint loc = locations.get(id);
        if (loc == null)
            throw new IllegalArgumentException("No such ID: " + id);
        loc.x = x;
        loc.y = y;
    }

    private static Map<String, MutablePoint> deepCopy(Map<String, MutablePoint> m) {
        Map<String, MutablePoint> result = new HashMap<String, MutablePoint>();
        for (String id : m.keySet())
            result.put(id, new MutablePoint(m.get(id)));
        return Collections.unmodifiableMap(result);
    }
}

public class MutablePoint { /* Listing 4.5 */ }
For example, in this code, what if that final-field guarantee were false: a thread creates an instance of this class, another thread sees a non-null reference to that object, but the locations field is null at the time the other thread uses it?
Once again, I don't know which is correct, or whether I happened to misinterpret both the article and Goetz.
This question has been answered a few times before but I feel that many of those answers are inadequate. See:
https://stackoverflow.com/a/14617582
https://stackoverflow.com/a/35169705
https://stackoverflow.com/a/7887675
Effectively Immutable Object
etc...
In short, Goetz's statement in the linked JSR 133 FAQ page is more "correct", although not in the way that you are thinking.
When Goetz says that immutable objects are safe to use even when published without synchronization, he means to say that immutable objects that are visible to different threads are guaranteed to retain their original state/invariants, all else remaining the same. In other words, properly synchronized publication is not necessary to maintain state consistency.
In the JSR-133 FAQ, when he says that:
you want to ensure that it is seen correctly by all of the other thread (sic)
He is not referring to the state of the immutable object. He means that you must synchronize publication in order for another thread to see the reference to the immutable object. There's a subtle difference to what the two statements are talking about: while JCIP is referring to state consistency, the FAQ page is referring to access to a reference of an immutable object.
The code sample you provided has little to do with anything Goetz says here, but to answer your question: a correctly initialized final field will hold its expected value if the object is properly initialized (beware the difference between initialization and publication). The code sample also synchronizes access to the locations field, so that updates to the map it refers to are thread-safe.
In fact, to elaborate further, I suggest that you look at JCIP listing 3.13 (VolatileCachedFactorizer). Notice that even though OneValueCache is immutable, it is stored in a volatile field. To illustrate the FAQ statement, VolatileCachedFactorizer will not work correctly without volatile. "Synchronization" here refers to using a volatile field in order to ensure that updates made to it are visible to other threads.
A good way to illustrate the first JCIP statement is to remove volatile. In this case, the CachedFactorizer won't work. Consider this: what if one thread set a new cache value, but another thread tried to read the value and the field was not volatile? The reader might not see the updated OneValueCache. BUT, recalling that Goetz refers to the state of the immutable object, IF the reader thread happened to see an up-to-date instance of OneValueCache stored at cache, then the state of that instance would be visible and correctly constructed.
So although it is possible to lose updates to cache, it is impossible to lose the state of the OneValueCache if it is read, because it is immutable. I suggest reading the accompanying text stating that "volatile reference used to ensure timely visibility."
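For reference, here is a condensed paraphrase of the OneValueCache/VolatileCachedFactorizer idea from JCIP listings 3.12 and 3.13; it is a sketch, not the book's exact code (the factor() method in particular is a placeholder):

import java.math.BigInteger;
import java.util.Arrays;

class OneValueCache {
    // Immutable holder: all fields final, with defensive copies in and out.
    private final BigInteger lastNumber;
    private final BigInteger[] lastFactors;

    OneValueCache(BigInteger number, BigInteger[] factors) {
        this.lastNumber = number;
        this.lastFactors = Arrays.copyOf(factors, factors.length);
    }

    BigInteger[] getFactors(BigInteger number) {
        if (lastNumber == null || !lastNumber.equals(number)) {
            return null; // cache miss
        }
        return Arrays.copyOf(lastFactors, lastFactors.length);
    }
}

class VolatileCachedFactorizer {
    // The volatile write/read makes each newly published OneValueCache visible to reader threads;
    // the immutability of OneValueCache guarantees its state is consistent once the reference is seen.
    private volatile OneValueCache cache = new OneValueCache(null, new BigInteger[0]);

    BigInteger[] service(BigInteger number) {
        BigInteger[] factors = cache.getFactors(number);
        if (factors == null) {
            factors = factor(number);                   // recompute on a miss
            cache = new OneValueCache(number, factors); // publish a new immutable snapshot
        }
        return factors;
    }

    private BigInteger[] factor(BigInteger n) {
        return new BigInteger[] { n }; // placeholder "factorization" to keep the sketch short
    }
}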
As a final example, consider a singleton that uses FinalWrapper for thread safety. Note that FinalWrapper is effectively immutable (depending on whether the singleton is mutable), and that the helperWrapper field is in fact non-volatile. Recalling the second FAQ statement, that synchronization is required for accessing the reference, how can this "correct" implementation possibly be correct!?
In fact, it is possible to do this here because it is not necessary for threads to immediately see the up-to-date value for helperWrapper. If the value that is held by helperWrapper is non-null, then great! Our first JCIP statement guarantees that the state of FinalWrapper is consistent, and that we have a fully initialized Foo singleton that can be readily returned. If the value is actually null, there are 2 possibilities: firstly, it is possible that it is the first call and it has not been initialized; secondly, it could just be a stale value.
In the case that it is the first call, the field itself is checked again in a synchronized context, as suggested by the second FAQ statement. It will find that this value is still null, and will initialize a new FinalWrapper and publish with synchronization.
In the case that it is just a stale value, entering the synchronized block lets the thread set up a happens-before order with a preceding write to the field. By definition, if a value is stale, then some writer has already written to the helperWrapper field and the current thread just has not seen it yet. By entering the synchronized block, a happens-before relationship is established with that previous write, since, according to our first scenario, a truly uninitialized helperWrapper will be initialized under the same lock. Therefore, the thread can recover by rereading the field once it has entered the synchronized context, obtaining the most up-to-date, non-null value.
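For concreteness, here is a sketch of that final-wrapper idiom, with names chosen to match the discussion above (Foo for the singleton, helperWrapper for the non-volatile field); the exact code in the linked answer may differ:

public class FooFactory {
    // Deliberately non-volatile: stale null reads are tolerated and repaired under the lock.
    private FinalWrapper helperWrapper;

    public Foo getInstance() {
        FinalWrapper w = helperWrapper;       // single racy read
        if (w == null) {                      // either truly uninitialized, or a stale null
            synchronized (this) {
                w = helperWrapper;            // re-read under the lock: happens-before with any prior write
                if (w == null) {
                    w = new FinalWrapper(new Foo());
                    helperWrapper = w;        // publish under the same lock
                }
            }
        }
        return w.instance;                    // final-field guarantee: instance is fully constructed
    }

    private static final class FinalWrapper {
        final Foo instance;                   // final field: safe publication of the wrapped singleton
        FinalWrapper(Foo instance) {
            this.instance = instance;
        }
    }
}

class Foo { /* the singleton being published */ }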
I hope that my explanations and the accompanying examples that I have given will clear things up for you.

Concurrency in Message Driven Bean - Thread safe Java EE5 vs. EE6

I have a situation where I need a set of operations to be enclosed in a single transaction and to be thread safe when invoked from an MDB.
If thread A executes instruction 1, I do not want other threads to be able to read the same data that thread A is processing.
In the code below, since the IMAGE table contains duplicated data coming from different sources, this will lead to a duplicated INFRANCTION, a situation that needs to be avoided.
The current solution I found is declaring a new transaction for each new message and synchronizing the entire transaction.
Simplifying the code:
@Stateless
public class InfranctionBean {

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void checkInfranction(String plate) {
        1. imageBean.getImage(plate);                // read from table IMAGE
        2. infranctionBean.insertInfranction(plate); // insert into table INFRANCTION
        3. imageBean.deleteImage(plate);             // delete from table IMAGE
    }
}

@MessageDriven
public class ImageReceiver {

    private static Object lock = new Object();

    public void onMessage(Message msg) {
        String plate = msg.plate;
        synchronized (lock) {
            infranctionBean.checkInfranction(plate);
        }
    }
}
I am aware that using synchronized blocks inside an EJB is not recommended by the EJB specification. This can even lead to problems if the application server runs in a two-node cluster.
It seems EE6 introduced a solution for this scenario: the EJB Singleton.
In this case, my solution would be something like this:
@ConcurrencyManagement(ConcurrencyManagementType.CONTAINER)
@Singleton
public class InfranctionBean {

    @Lock(LockType.WRITE)
    public void checkInfranction(String plate) {
        1...
        2...
        3...
    }
}
And in the MDB the synchronized block would no longer be necessary, since the container handles the concurrency.
With @Lock(WRITE) the container guarantees single-threaded access to checkInfranction().
My question is: how can I handle this situation in EE5? Is there a cleaner solution that does not use a synchronized block?
Environment: Java 5, jboss-4.2.3.GA, Oracle 10.
ACTUAL SOLUTION
@Stateless
public class InfranctionBean {

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void checkInfranction(String plate) {
        1. imageBean.lockImageTable();               // lock table IMAGE in exclusive mode
        2. imageBean.getImage(plate);                // read from table IMAGE
        3. infranctionBean.insertInfranction(plate); // insert into table INFRANCTION
        4. imageBean.deleteImage(plate);             // delete from table IMAGE
    }
}

@MessageDriven
public class ImageReceiver {

    public void onMessage(Message msg) {
        infranctionBean.checkInfranction(msg.plate);
    }
}
With 20,000 incoming messages (half of them arriving simultaneously), the application seems to work OK.
@Lock(WRITE) is only a lock within a single application/JVM, so unless you can guarantee that only one application/JVM is accessing the data, you're not getting much protection anyway. If you're only looking for single application/JVM protection, the best solution in EE 5 would be a ReadWriteLock or perhaps a synchronized block. (The EJB specification has language to dissuade applications from doing this to avoid compromising the thread management of the server, so take care that you don't block indefinitely, that you don't ignore interrupts, etc.)
If you're looking for a more robust cross-application/JVM solution, I would use database locks or isolation levels rather than trying to rely on JVM synchronized primitives. That is probably the best solution regardless of the EJB version being used.
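As an illustration of that single-application/JVM option, here is a sketch of replacing the synchronized block in the question's ImageReceiver with an explicit lock that bounds the wait and does not swallow interrupts. The 30-second timeout and the extractPlate() helper are assumptions, and infranctionBean stands for the injected bean from the question (imports: java.util.concurrent.TimeUnit, java.util.concurrent.locks.ReentrantLock).

private static final ReentrantLock LOCK = new ReentrantLock();

public void onMessage(Message msg) {
    try {
        // Bounded wait instead of an unbounded synchronized block.
        if (!LOCK.tryLock(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("Timed out waiting for the infraction lock");
        }
        try {
            infranctionBean.checkInfranction(extractPlate(msg)); // extractPlate(): assumed helper
        } finally {
            LOCK.unlock();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt status instead of ignoring it
    }
}

Like the synchronized variant, this only protects a single JVM; for cross-JVM safety the database lock or isolation-level approach above remains the more robust choice.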

<Spring Batch> Why does making the ItemReader thread-safe lead to losing restartability?

I have a multi-threaded batch job reading from a DB and I am concerned about different threads re-reading records, as ItemReader is not thread safe in Spring Batch. I went through the Spring Batch FAQ section, which states that
You can synchronize the read() method (e.g. by wrapping it in a delegator that does the synchronization). Remember that you will lose restartability, so best practice is to mark the step as not restartable and to be safe (and efficient) you can also set saveState=false on the reader.
I want to know why I will lose restartability in this case. What has restartability got to do with synchronizing my read operations? It can always try again, right?
Also, will this piece of code be enough for synchronizing the reader?
public class SynchronizedItemReader<T> implements ItemReader<T> {

    private final ItemReader<T> delegate;

    public SynchronizedItemReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    public synchronized T read() throws Exception {
        return delegate.read();
    }
}
When using an ItemReader with multiple threads, the lack of restartability is not about the read itself. It's about saving the state of the reader, which happens in the update method. The issue is that there needs to be coordination between the calls to read() - the method providing the data - and update() - the method persisting the state. When you use multiple threads, the internal state of the reader (and therefore the update() call) may or may not reflect the work that has been done. Take for example a FlatFileItemReader using a chunk size of 5 and running on multiple threads. Thread 1 could have read 5 items (time to update), yet thread 2 could have read an additional 3. This means that the call to update would save that 8 items have been read. If the chunk on thread 2 fails, the saved state would be incorrect and the restart would miss the three items that were already read.
This is not to say that it is impossible to write a thread safe ItemReader. However, as your example above illustrates, if delegate is a stateful ItemReader (one that implements ItemStream as well), the state will not be persisted correctly by calls to update (in fact, your example above doesn't even take the ItemStream aspect of stateful readers into account; a sketch that does is shown below).
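For illustration, here is a sketch of a synchronizing delegate that also forwards the ItemStream callbacks. It assumes Spring Batch's ItemReader/ItemStream interfaces and does not remove the restartability caveat described above:

import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;

public class SynchronizedItemStreamReader<T> implements ItemReader<T>, ItemStream {

    private final ItemReader<T> delegate;

    public SynchronizedItemStreamReader(ItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() throws Exception {
        return delegate.read();
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        if (delegate instanceof ItemStream) {
            ((ItemStream) delegate).open(executionContext);
        }
    }

    @Override
    public synchronized void update(ExecutionContext executionContext) throws ItemStreamException {
        // Even with this synchronization, the saved state can lag behind what other threads
        // have read, which is why the FAQ advises marking the step as not restartable.
        if (delegate instanceof ItemStream) {
            ((ItemStream) delegate).update(executionContext);
        }
    }

    @Override
    public void close() throws ItemStreamException {
        if (delegate instanceof ItemStream) {
            ((ItemStream) delegate).close();
        }
    }
}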
If you want to make your job restartable with parallel processing of items, you can save each item the reader has read, plus the state of that item, yourself.

Different threads updating the same object in hibernate

In my application two threads try to update the same entity in code like the following:
public static <T> T updateEntity(T entity, long id) {
    long start = System.currentTimeMillis();
    EntityManager em = null;
    EntityTransaction tx = null;
    try {
        em = GenericPersistenceManager.emf.createEntityManager();
        tx = em.getTransaction();
        tx.begin();
        entity = em.merge(entity);
        tx.commit();
        LoggerMultiplexer.logDBAccess(start, System.currentTimeMillis(),
                String.format(OPERATION_UPDATE_ENTITY, entity.getClass().getName(), id));
        return entity;
    }
    ...
Sometimes I get a duplicate key error on the commit line. I guess this occurs when the threads try to update the entity at the same time. Is that possible? I think so, because if I make the function above synchronized, I don't get the duplicate key exception. So, do I have to consider this kind of concurrency issue? If so, what would be the proper way to handle multiple threads trying to update the same object?
In a single-node application you could try to lock objects in the Session (pessimistic locking) when retrieving them from the DB.
More on locking, and a bit of advice on Hibernate concurrency.
But maybe you should rethink your units of work. Adding locking or synchronized blocks will add high contention to your application. It is best to keep transaction basics in mind as you develop: shorten the life span of objects, use the Detached Object pattern, or use optimistic versioning (by adding a version field, for example) and then handle errors on concurrent modifications. A sketch of the versioning option follows.
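A minimal sketch of that optimistic-versioning option, assuming standard JPA annotations; the entity and field names are illustrative and not taken from the question:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class TrackedEntity {

    @Id
    private Long id;

    @Version
    private long version;   // incremented by the provider on every successful update;
                            // a stale version makes the merge/commit fail instead of silently overwriting

    private String payload; // illustrative data field

    // getters and setters omitted for brevity
}

With such a version field, a concurrent update typically surfaces as an OptimisticLockException at merge or commit time, which the caller of updateEntity() can catch in order to re-read the fresh state and retry, rather than overwriting or duplicating data.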
