Manually synchronize hashmap without using concurrent and synchronize hashmap - multithreading

I have a hashmap. In which multiple threads will add entires to it. A new thread is created to see all current values in hashmap. If i try this , hashmap throws concurrent Modification exception. It can be overcomed by using concurrentHashMap and SynchronizedHashMap . but i need only my simple hashmap to be get synchronized .
Here is my code
when i am working with it. It throws me concurrent modification exception.

Related

How can I share the data without locking whole part of it?

Consider the following scenarios.
let mut map = HashMap::new();
map.insert(2,5);
thread::scope(|s|{
s.spawn(|_|{
map.insert(1,5);
});
s.spawn(|_|{
let d = map.get(&2).unwrap();
});
}).unwrap();
This code cannot be compiled because we borrow the variable map mutably in h1 and borrow again in h2. The classical solution is wrapping map by Arc<Mutex<...>>. But in the above code, we don't need to lock whole hashmap. Because, although two threads concurrently access to same hashmap, they access completely different region of it.
So I want to share map through thread without using lock, but how can I acquire it? I'm also open to use unsafe rust...
in the above code, we don't need to lock whole hashmap
Actually, we do.
Every insert into the HashMap may possibly trigger its reallocation, if the map is at that point on its capacity. Now, imagine the following sequence of events:
Second thread calls get and retrieves reference to the value (at runtime it'll be just an address).
First thread calls insert.
Map gets reallocated, the old chunk of memory is now invalid.
Second thread dereferences the previously-retrieved reference - boom, we get UB!
So, if you need to insert something in the map concurrently, you have to synchronize that somehow.
For the standard HashMap, the only way to do this is to lock the whole map, since the reallocation invalidates every element. If you used something like DashMap, which synchronizes access internally and therefore allows inserting through shared reference, this would require no locking from your side - but can be more cumbersome in other parts of API (e.g. you can't return a reference to the value inside the map - get method returns RAII wrapper, which is used for synchronization), and you can run into unexpected deadlocks.

Java multi threading - Write only if no other threads are reading

I have a map which holds some data similar to an in memory cache.
Map<String,Object> map;
There are multiple theads which are reading the data from the map. Each thread may read the data more than one time.
public void processData(){
...
map.get("something");
...
map.get("someOther");
...
}
The map needs to be updated with latest values from database. I use a webservice for that.
public void refreshService(){
//this code should wait until no one is reading the data
map.clear();
map.put("latestFromDb",readData());
}
The requirement is that my web service should wait until all the reading threads are finished. So that the readers should get either the old data fully or the new data fully.
Edit: The reading threads should not wait for anything. The reading thread can wait for the web service(Since this is infrequent). but the reading threads should not wait for other reading threads.
Which java 8 lock/mechanism/design pattern I can use to implement this.
Since the reading operation queries the map multiple times, the only way to get a consistent result (without locking), is to never modify the map at all.
In other words, instead of calling clear(), instantiate a new map, populate it, and change the map reference to point to the new map, as an atomic update. Don’t forget that the read operation must not read map multiple times then, but copy the current reference into a local variable right at the beginning.
volatile Map<String, Object> map;
public void processData(){
Map<String, Object> map = this.map;
...
map.get("something");
...
map.get("someOther");
...
}
public void refreshService() {
Map<String, Object> map = new HashMap<>();
map.put("latestFromDb", readData());
...
this.map = Collections.unmodifiableMap(map);
}
Wrapping the map with unmodifiableMap is not strictly necessary, but will help spotting errors, as any modification beyond this point would lead to nondeterministic behavior, including the possibility to never spot the problem during testing. Therefore, replacing the nondeterministic behavior with deterministically throwing an exception on subsequent modification attempts, is preferable.

HashMap vs ConcurrentHashMap for a threadsafe value type

HashMap vs ConcurrentHashMap, when the value is AtomicInteger or LongAdder, is there any harm in using HashMap in a multithreaded environment ?
Yes, there is.
An object being of type AtomicInteger or LongAdder just means that the object itself is safe in a concurrent modification operation (i.e. if two threads try to modify it, they will do so one after the other). However, if the map containing the objects itself is of type 'HashMap', then concurrent modification operations of the map are not safe. For instance, if you want to add a key-value pair only if the key doesn't already exist in the map, you cannot safely use the putIfAbset() operation anymore because it's not synchronized/thread-safe in HashMap. And if you do use it, then it is possible that two threads will execute call this method at the same time, both of them reaching the conclusion that the map doesn't have they key, and then both of them adding a key-value pair, resulting in one of them overwriting the other other's value.
You cannot use a HashMap in a multithreaded environment. The reason is as follows:
If multiple threads operate on a simple HashMap they can damage the internal structure of the HashMap which is an array of linked lists. The links can go missing or go in circles. The result will be that the HashMap will be totally unusable and corrupt. This is the reason you should always use a concurrentHashMap in a multithreaded environment regardless of what value you want to store in the map itself.
Now, in a concurrentHashMap of a type say map< String val, 'any number'> 'any number' could be a LongAdder or an AtomicLong etc. Remember that not all operations on a concurrentHashMap are threadsafe by default. Therefore, if you use say a LongAdder then you could write the following atomic operation without any need to synchronize:
map.putIfAbsent(a string key, new LongAdder());
map.get("abc").increment();

are class level property or variables thread safe

I always had this specific scenario worry me for eons. Let's say my class looks like this
public class Person {
public Address Address{get;set;}
public string someMethod()
{}
}
My question is, I was told by my fellow developers that the Address propery of type Address, is not thread safe.
From a web request perspective, every request is run on a separate thread and every time
the thread processes the following line in my business object or code behind, example
var p = new Person();
it creates a new instance of Person object on heap and so the instance is accessed by the requesting thread, unless and otherwise I spawn multiple threads in my application.
If I am wrong, please explain to me why I am wrong and why the public property (Address) is not thread safe?
Any help will be much appreciated.
Thanks.
If the reference to your Person instance is shared among multiple threads then multiple threads could potentially change Address causing a race condition. However unless you are holding that reference in a static field or in Session (some sort of globally accessible place) then you don't have anything to be worried about.
If you are creating references to objects in your code like you have show above (var p = new Person();) then you are perfectly thread safe as other threads will not be able to access the reference to these objects without resorting to nasty and malicious tricks.
Your property is not thread safe, because you have no locking to prevent multiple writes to the property stepping on each others toes.
However, in your scenario where you are not sharing an instance of your class between multiple threads, the property doesn't need to be thread safe.
Objects that are shared between multiple threads, where each thread can change the state of the object, then all state changes need to be protected so that only one thread at a time can modify the object.
You should be fine with this, however there are a few things I'd worry about...
If your Person object was to be modified or held some disposable resources, you could potentially find that one of the threads will be unable to read this variable. To prevent this, you will need to lock the object before read/writing it to ensure it won't be trampled on by other threads. The easiest way is by using the lock{} construct.

examples of garbage collection bottlenecks

I remembered someone telling me one good one. But i cannot remember it. I spent the last 20mins with google trying to learn more.
What are examples of bad/not great code that causes a performance hit due to garbage collection ?
from an old sun tech tip -- sometimes it helps to explicitly nullify references in order to make them eligible for garbage collection earlier:
public class Stack {
private static final int MAXLEN = 10;
private Object stk[] = new Object[MAXLEN];
private int stkp = -1;
public void push(Object p) {stk[++stkp] = p;}
public Object pop() {return stk[stkp--];}
}
rewriting the pop method in this way helps ensure that garbage collection gets done in a timely fashion:
public Object pop() {
Object p = stk[stkp];
stk[stkp--] = null;
return p;
}
What are examples of bad/not great code that causes a performance hit due to garbage collection ?
The following will be inefficient when using a generational garbage collector:
Mutating references in the heap because write barriers are significantly more expensive than pointer writes. Consider replacing heap allocation and references with an array of value types and an integer index into the array, respectively.
Creating long-lived temporaries. When they survive the nursery generation they must be marked, copied and all pointers to them updated. If it is possible to coalesce updates in order to reuse of an old version of a collection, do so.
Complicated heap topologies. Again, consider replacing many references with indices.
Deep thread stacks. Try to keep stacks shallow to make it easier for the GC to collate the global roots.
However, I would not call these "bad" because there is nothing objectively wrong with them. They are only inefficient when used with this kind of garbage collector. With manual memory management, none of the issues arise (although many are replaced with equivalent issues, e.g. performance of malloc vs pool allocators). With other kinds of GC some of these issues disappear, e.g. some GCs don't have a write barrier, mark-region GCs should handle long-lived temporaries better, not all VMs need thread stacks.
When you have some loop involving the creation of new object's instances: if the number of cycles is very high you procuce a lot of trash causing the Garbage Collector to run more frequently and so decreasing performance.
One example would be object references that are kept in member variables oder static variables. Here is an example:
class Something {
static HugeInstance instance = new HugeInstance();
}
The problem is the garbage collector has no way of knowing, when this instance is not needed anymore. So its usually better to keep things in local variables and have small functions.
String foo = new String("a" + "b" + "c");
I understand Java is better about this now, but in the early days that would involve the creation and destruction of 3 or 4 string objects.
I can give you an example that will work with the .Net CLR GC:
If you override a finalize method from a class and do not call the super class Finalize method such as
protected override void Finalize(){
Console.WriteLine("Im done");
//base.Finalize(); => you should call him!!!!!
}
When you resurrect an object by accident
protected override void Finalize(){
Application.ObjJolder = this;
}
class Application{
static public object ObjHolder;
}
When you use an object that uses Finalize it takes two GC collections to get rid of the data, and in any of the above codes you won't delete it.
frequent memory allocations
lack of memory reusing (when dealing with large memory chunks)
keeping objects longer than needed (keeping references on obsolete objects)
In most modern collectors, any use of finalization will slow the collector down. And not just for the objects that have finalizers.
Your custom service does not have a load limiter on it, so:
A lot requests come in for some reason at the same time (everyone logs on in the morning say)
The service takes longer to process each requests as it now has 100s of threads (1 per request)
Yet more part processed requests builds up due to the longer processing time.
Each part processed request has created lots of objects that live until the end of processing that request.
The garbage collector spends lots of time trying to free memory it, however it can’t due to the above.
Yet more part processed requests builds up due to the longer processing time…. (including time in GC)
I have encountered a nice example while doing some parallel cell based simulation in Python. Cells are initialized and sent to worker processes after pickling for running. If you have too many cells at any one time the master node runs out of ram. The trick is to make a limited number of cells pack them and send them off to cluster nodes before making some more, remember to set the objects already sent off to "None". This allows you to perform large simulations using the total RAM of the cluster in addition to the computing power.
The application here was cell based fire simulation, only the cells actively burning were kept as objects at any one time.

Resources