How to implement thread-safe map of maps in golang?

I am working on a multi-threaded module and need to implement a map of maps in Go: map[outer]map[inner]*some_struct. The outer map will be accessed by multiple goroutines, each of which adds a key to an inner map. I am unsure whether multiple goroutines can concurrently add keys to the inner map for a common outer key. Is that thread-safe, and is sync.Map a better option?
Also, the outer keys and the total number of outer keys are known only at runtime, so I can't define locks beforehand.
To make the problem concrete, take the example of adding information about different cities, grouped by state. Each goroutine represents a city. To add info about a city, a goroutine first needs to check the outer key, the state (map[state]), and then it simply adds the info: map[state][city] = &some_struct{x:y,y:z}.
I have read a few articles and found that sync.Map is suitable for concurrent map operations and that its operations are performed atomically. However, one of the use cases mentioned in the documentation is when multiple goroutines read, write, and overwrite entries for disjoint sets of keys.
It would be helpful if someone could suggest a thread-safe approach for this problem.

You must think in OO terms.
What do you want to represent as a map of maps?
A map of state to city makes some sense. However, what kind of operations do you want to perform?
Concurrent writes and reads? Why?
Do you want to iterate over all cities? Do you need to delete cities or states?
Imagine the following interface:
type DB interface {
    Exists(state, city string) bool
    Get(state, city string) *some_struct
    Set(state, city string, data *some_struct)
    Delete(state, city string)
    DeleteState(state string)
    // The callback parameter is named (fn) because Go does not allow
    // mixing named and unnamed parameters in one signature.
    ForeachCitiesInState(state string, fn func(city string, data *some_struct) bool)
    Foreach(func(state, city…))
}
With this interface we can consider:
1. Use a struct with a Mutex and a map of maps to control access on each read/write/delete.
2. Same as 1, but with a read-write mutex (sync.RWMutex) if you have more reads than writes.
3. If you don't need to loop over the cities of a particular state, perhaps you can use a single map with a composite key such as state:city to simplify.
4. If you load the data from another place at a constant time interval, perhaps you should use atomic.Value to store the big map. An update is then just a substitution with a more recent map.
5. Perhaps you can combine several locks, for instance one for states and another for cities. You can split it like this:
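// Field names are added so that the structs compile; the values are
// pointers because a struct containing a sync.Mutex must not be copied.
type states struct {
    sync.Mutex
    byName map[stateName]*state
}
type state struct {
    sync.Mutex
    byFirstLetter map[cityFirstLetter]*cities
}
type cities struct {
    sync.Mutex
    byName map[cityName]*some_struct
}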
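The atomic.Value suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: CityInfo, Snapshot, Store, NewStore, Replace and Get are hypothetical names that do not appear in the question, and the sketch assumes the whole map is rebuilt by a single loader and never mutated once published.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CityInfo is a hypothetical stand-in for the question's some_struct.
type CityInfo struct {
	Population int
}

// Snapshot maps state -> city -> info. Once published through Store.Replace
// it must be treated as read-only.
type Snapshot map[string]map[string]CityInfo

// Store keeps the most recent snapshot in an atomic.Value.
type Store struct {
	current atomic.Value // always holds a Snapshot
}

// NewStore returns a Store that starts with an empty snapshot.
func NewStore() *Store {
	s := &Store{}
	s.current.Store(Snapshot{})
	return s
}

// Replace swaps in a freshly built snapshot; readers never see a partial update.
func (s *Store) Replace(next Snapshot) {
	s.current.Store(next)
}

// Get reads from whichever snapshot is current, without taking a lock.
func (s *Store) Get(state, city string) (CityInfo, bool) {
	snap := s.current.Load().(Snapshot)
	info, ok := snap[state][city] // indexing a nil inner map is safe for reads
	return info, ok
}

func main() {
	s := NewStore()
	s.Replace(Snapshot{"StateA": {"CityX": {Population: 42}}})
	info, ok := s.Get("StateA", "CityX")
	fmt.Println(info.Population, ok)
}
```

This fits the periodic-reload case only; if many goroutines need to add individual cities, a lock-based design is probably the better match.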
Ideas:
Define the interface.
Define (or measure) the real usage scenario.
Write benchmarks.
Be careful about returning a pointer to the data: the caller can then change the internal state without holding the lock. Consider returning a copy or an interface instead.
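As a minimal sketch of the first two options (one lock around the whole map of maps) combined with the advice to return a copy, something like the following could work. The names CityData, DB, NewDB, Set and Get are assumptions made for illustration, not part of the question. Note that plain Go maps are not safe for concurrent writes, so the inner maps need this protection even when goroutines write different city keys.

```go
package main

import (
	"fmt"
	"sync"
)

// CityData is a hypothetical stand-in for the question's some_struct.
type CityData struct {
	Population int
}

// DB guards the whole map of maps with one read-write mutex.
type DB struct {
	mu   sync.RWMutex
	data map[string]map[string]*CityData
}

// NewDB returns an empty, ready-to-use DB.
func NewDB() *DB {
	return &DB{data: make(map[string]map[string]*CityData)}
}

// Set stores info under state/city, creating the inner map on first use.
// The outer keys do not need to be known beforehand.
func (db *DB) Set(state, city string, info CityData) {
	db.mu.Lock()
	defer db.mu.Unlock()
	inner, ok := db.data[state]
	if !ok {
		inner = make(map[string]*CityData)
		db.data[state] = inner
	}
	inner[city] = &info
}

// Get returns a copy of the stored value, so callers cannot modify
// the internal state without going through Set.
func (db *DB) Get(state, city string) (CityData, bool) {
	db.mu.RLock()
	defer db.mu.RUnlock()
	p, ok := db.data[state][city]
	if !ok {
		return CityData{}, false
	}
	return *p, true
}

func main() {
	db := NewDB()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) { // one goroutine per city, as in the question
			defer wg.Done()
			db.Set("StateA", fmt.Sprintf("city-%d", i), CityData{Population: i})
		}(i)
	}
	wg.Wait()
	info, ok := db.Get("StateA", "city-7")
	fmt.Println(info.Population, ok)
}
```

A single lock is the simplest design to get right; splitting it into per-state locks is worth considering only if benchmarks show contention.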

Related

How to use Qt multithreading for parallel list processing?

I'm using Qt to build software that analyses a large amount of data. The data consists of individual "Uber" orders with information such as order time, start location, and end location, and I need to be able to evaluate the data, for example by plotting a graph of demand over time.
To do this, I have to check every record and add it to a new data table according to its timestamp. This takes a long time, so my initial solution is to use QtConcurrent::filterReduced to compute the sum.
However, the filter function cannot take extra arguments to filter the data based on the time interval I want.
My question is, is there another quick and easy solution for this kind of problem? Or do I need to use QThread's low-level API for this, if so, any examples/tutorials on how I can achieve that?

Instead of passing a function, you can pass a function object which holds the "parameter".
Something like this (T is your datatype here):
struct FilterWithTime
{
    FilterWithTime(const QString &filterPredicate)
        : m_filterPredicate(filterPredicate) { }
    typedef bool result_type;
    bool operator()(const T &value)
    {
        // ... test value against m_filterPredicate
    }
    QString m_filterPredicate;
};
// The filter object and the reduce object are two separate arguments.
QtConcurrent::filterReduced<ResultType>(your-list-of-T, FilterWithTime(QString("10-12")), YourTransformationObject());
Note the explicit instantiation with ResultType!!

How can the memory address of a struct be the same inside a function and after the struct has been returned?

The phrase "when scope exits the values get automatically popped from stack" is repeated many times, but the example I provide here disproves the statement:
fn main() {
    let foo = foobar();
    println!("The address in main {:p}", &foo);
}
fn foobar() -> Employee {
    let emp = Employee {
        company: String::from("xyz"),
        name: String::from("somename"),
        age: 50,
    };
    println!("The address inside func {:p}", &emp);
    emp
}
#[derive(Debug)]
struct Employee {
    name: String,
    company: String,
    age: u32,
}
The output is:
The address inside func 0x7fffc34011e8
The address in main 0x7fffc34011e8
This makes sense. When I use Box to create the struct the address differs as I expected.
If the function returns ownership (a move) of the return value to the caller, then after the function has executed, the stack memory belonging to that function is popped, so using it would not be safe. How, then, is the struct created inside the function still accessible after the function exits?
The same thing happens when returning an array. Where are these elements stored in memory: on the stack or on the heap?
Will the compiler do escape analysis at compile time and move the values to the heap like Go does?
I'm sure that Employee doesn't implement the Copy trait.

In many languages, variables are just a convenient means for
humans to name some values.
Even if, from a logical point of view, we can assume that there is one
specific storage for each specific variable, and we can reason about
this in terms of copy, move... it does not imply that these copies
and moves physically happen (notably because of the optimizer).
Moreover, when reading various documents about Rust, we often find
the term binding instead of variable; this reinforces the idea
that we just refer to a value that exists somewhere.
It is exactly the same as writing let a=something(); then let b=a;,
then again let c=b;... we simply change our mind about the name
but no data is actually moved.
In debug builds, the generated code is generally sub-optimal:
each variable is given its own storage
so that these variables can be inspected in memory.
This can be misleading about the true nature of the optimised code.
Back to your example, you detected that Rust decided to perform
a kind of return-value-optimization (common C++ term nowadays)
because it knows that a temporary value must appear in the calling
context to provide the result, and this result comes from a local
variable inside the function.
So, instead of creating two different storages and copying or moving from
one to another, it is better to use the same storage: the local
variable is stored outside the function (where the result is
expected).
From a logical point of view it does not change anything, but it
is much more efficient.
And when code inlining comes into play, no one can predict where
our variables/values/bindings are actually stored.
Some comments below state that this return-value-optimisation
can be counted on since it takes place in the Rust ABI.
(I was not aware of that, still a beginner ;^)

HashMap vs ConcurrentHashMap for a threadsafe value type

HashMap vs ConcurrentHashMap, when the value is AtomicInteger or LongAdder, is there any harm in using HashMap in a multithreaded environment ?

Yes, there is.
An object being of type AtomicInteger or LongAdder just means that the object itself is safe under concurrent modification (i.e. if two threads try to modify it, they will do so one after the other). However, if the map containing the objects is a HashMap, then concurrent modifications of the map itself are not safe. For instance, if you want to add a key-value pair only if the key doesn't already exist in the map, you cannot safely use the putIfAbsent() operation, because it is not synchronized/thread-safe in HashMap. If you do use it, two threads may call this method at the same time, both conclude that the map doesn't have the key, and both add a key-value pair, with one of them overwriting the other's value.

You cannot use a HashMap in a multithreaded environment. The reason is as follows:
If multiple threads operate on a plain HashMap, they can damage its internal structure, which is an array of linked lists. The links can go missing or form cycles. As a result, the HashMap becomes corrupt and totally unusable. This is the reason you should always use a ConcurrentHashMap in a multithreaded environment, regardless of what value you want to store in the map.
Now, in a ConcurrentHashMap<String, V>, V could be a LongAdder, an AtomicLong, etc. Remember that compound operations on a ConcurrentHashMap (for example, a get followed by a put) are not atomic as a whole, even though each individual call is thread-safe. Therefore, if you use, say, a LongAdder, you can write the following thread-safe sequence without any need to synchronize:
map.putIfAbsent("abc", new LongAdder());
map.get("abc").increment();

HashMap in OpenCL?

Is it possible to create a simple HashMap in OpenCL? E.g. one where all keys have type long and all values type int, and that never has to be modified (i.e. is passed read-only to the kernel).
Construction of the HashMap can take time (it is done once on the CPU and never has to be modified again), but read access will be frequent, so get(long key, *hashmap H) should be cheap.
Are there any known implementations for this in OpenCL? I failed to find them. In case I'd have to write one from scratch, which HashMap implementation would be most suitable for this use?

I think that a simple hash table implementation using open addressing could fulfill your requirements here:
By its nature it is stored on a single buffer, and thus trivial to transfer to the kernels.
It's then easy to write the getter logic in the kernel, especially when you don't need any synchronization (read-only).
So, pass a buffer of long2 or a buffer of struct { long key; int val; }, where the first item is the key and the second is the value, and also pass the buffer size; then write a regular open-addressing getter.
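The probing logic of such a getter is sketched below in Go rather than OpenCL C, purely to illustrate the idea; it is not an OpenCL implementation. Everything in it is an assumption: the names slot, build and get, the emptyKey sentinel (real keys must never equal it), the power-of-two table size and the mixing function. The flat []slot plays the role of the single buffer that would be passed to the kernel together with its size.

```go
package main

import "fmt"

// emptyKey marks an unused slot (assumption: no real key has this value).
const emptyKey = int64(-1)

// slot mirrors struct { long key; int val; } from the answer.
type slot struct {
	key int64
	val int32
}

// hash mixes the key bits; any reasonable mixing function would do here.
func hash(k int64) uint64 {
	h := uint64(k)
	h ^= h >> 33
	h *= 0xff51afd7ed558ccd
	h ^= h >> 33
	return h
}

// build fills a flat table once (the "CPU side"). The capacity is a power of
// two and larger than the number of entries, so at least one slot stays empty.
func build(entries map[int64]int32) []slot {
	capacity := 1
	for capacity < 2*len(entries)+1 {
		capacity *= 2
	}
	mask := uint64(capacity - 1)
	table := make([]slot, capacity)
	for i := range table {
		table[i].key = emptyKey
	}
	for k, v := range entries {
		i := hash(k) & mask
		for table[i].key != emptyKey { // linear probing on collision
			i = (i + 1) & mask
		}
		table[i] = slot{key: k, val: v}
	}
	return table
}

// get is the read-only lookup: probe from the hashed position until the key
// or an empty slot is found. No synchronization is needed for pure reads.
func get(table []slot, key int64) (int32, bool) {
	if key == emptyKey {
		return 0, false
	}
	mask := uint64(len(table) - 1)
	for i := hash(key) & mask; ; i = (i + 1) & mask {
		if table[i].key == key {
			return table[i].val, true
		}
		if table[i].key == emptyKey {
			return 0, false
		}
	}
}

func main() {
	table := build(map[int64]int32{10: 1, 20: 2, 1 << 40: 3})
	v, ok := get(table, 20)
	fmt.Println(v, ok)
}
```

Keeping the table at most half full is a common way to keep probe sequences short; the right load factor for a GPU kernel would need to be measured.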

How to make atomic exchange -- Scala way?

Problem
I have such code
var ls = src.iter.toList
src.iter = ls.iterator
(this is part of the copy constructor of my iterator wrapper), which reads the source iterator and, on the next line, sets it back. The problem is that those two lines have to be atomic (especially if you consider that I change the source of the copy constructor -- I don't like it, but well...).
I've read about Actors but I don't see how they fit here -- they look more like a mechanism for asynchronous execution. I've read about Java solutions and using them in Scala, for example: http://naedyr.blogspot.com/2011/03/atomic-scala.html
My question is: what is the most Scala way to make some operations atomic? I don't want to use heavy artillery for this, and I would also prefer not to use external resources. In other words -- something that looks and feels "right".
I kind of like the solution presented in the above link, because this is exactly what I do -- exchange references. And if I understand correctly, I would guard only those 2 lines, and other code would not have to be altered! But I will wait for a definitive answer.
Background
Because for every Nth question I read "but why do you use..." instead of an answer, here is the background:
How to copy iterator in Scala? :-)
I need to copy an iterator (make a fork), and that solution is the most "right" one I have read about. The problem is that it destroys the original iterator.
Solutions
Locks
For example here:
http://www.ibm.com/developerworks/java/library/j-scala02049/index.html
The only problem I see here is that I have to put a lock on those two lines and on every other use of iter. It is a minor thing now, but when I add more code, it is easy to forget to add the additional lock.
I am not saying "no", but I have no experience, so I would like an answer from someone who is familiar with Scala to point me in a direction: which solution is best for such a task, also in the long run?
Immutable iterator
While I appreciate the explanation by Paradigmatic, I don't see how such an approach fits my problem. The thing is, the IteratorWrapper class has to wrap an iterator -- i.e. the raw iterator should be hidden within the class (usually this is done by making it private). Methods such as hasNext() and next() should be wrapped as well. Normally next() alters the state of the object (the iterator), so in the case of an immutable IteratorWrapper it should return both a new IteratorWrapper and the status of next() (successful or not). Another solution would be returning NULL if the raw next() fails; either way, this makes such an IteratorWrapper not very handy to use.
Worse, there is still no easy way to copy such an IteratorWrapper.
So either I am missing something, or the classic approach of making a piece of code atomic is actually cleaner, because all the burden is contained inside the class and the user does not have to pay the price for the way IteratorWrapper handles its data (the raw iterator in this case).

The Scala approach is to favor immutability whenever it is possible (and it is very often possible). Then you no longer need copy constructors, locks, mutexes, etc.
For example, you can convert the iterator to a List at object construction. Since lists are immutable, you can safely share them without having to lock:
class IteratorWrapper[A]( iter: Iterator[A] ) {
    val list = iter.toList
    def iteratorCopy = list.iterator
}
Here, the IteratorWrapper is also immutable. You can safely pass it around. But if you really need to change the wrapped iterator, you will need more demanding approaches. For instance you could:
Use locks
Transform the wrapper into an Actor
Use STM (akka or other implementations).
Clarifications: I lack information on your problem constraints. But here is how I understand it.
Several threads must traverse an Iterator simultaneously. A possible approach is to copy it before passing the reference to the threads. However, Scala practice aims at sharing immutable objects that do not need to be copied.
With the copy strategy, you would write something like:
//A single iterator producer
class Producer {
    val iterator: Iterator[Foo] = produceIterator(...)
}
//Several consumers, living on different threads
class Consumer( p: Producer ) {
    def consumeIterator = {
        val iteratorCopy = copy( p.iterator ) //BROKEN !!!
        while( iteratorCopy.hasNext ) {
            doSomething( iteratorCopy.next )
        }
    }
}
However, it is difficult (or slow) to implement a copy method which is thread-safe. A possible solution using immutability will be:
class Producer {
    val list: List[Foo] = produceIterator(...).toList
    def iteratorCopy = list.iterator
}
class Consumer( p: Producer ) {
    def consumeIterator = {
        val iteratorCopy = p.iteratorCopy
        while( iteratorCopy.hasNext ) {
            doSomething( iteratorCopy.next )
        }
    }
}
The producer calls produceIterator once, at construction. It is immutable because its only state is a list, which is also immutable. The iteratorCopy method is also thread-safe, because the list is not modified when creating the copy (so several threads can traverse it simultaneously without having to lock).
Note that calling list.iterator does not traverse the list, so it does not decrease performance in any way (as opposed to really copying the iterator each time).
