How to safely add a new key-value pair to a native Go map - multithreading

I want to add a new key-value pair to a Go map from concurrent goroutines. If the key is already present in the map, no new pair should be created. From a multithreaded perspective, how do I check whether the key is present and, if it isn't, insert the key-value pair?
Is there any way to organize code to add key safely when first encountered?
The main problem is safely initializing the mutex.

Is there any way to organize code to add key safely when first encountered?
No. You need proper synchronisation.

I would recommend the combination of sync.Map to store the key-values and sync.Once inside of the value to perform the one-time initialization.
Here is an example:
package main

import "sync"

type Value struct {
	init      sync.Once
	someValue string
}

// Init performs the one-time initialization of the value.
func (v *Value) Init() {
	v.init.Do(func() {
		// This function will only be executed one time
		v.someValue = "initialized"
	})
}

func main() {
	var m sync.Map

	v1, _ := m.LoadOrStore("key", &Value{})
	v1.(*Value).Init() // init function is called

	v2, _ := m.LoadOrStore("key", &Value{})
	v2.(*Value).Init() // init function is not called
}
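If a native map is required rather than sync.Map, the check and the insert can be done under a single lock. The following is a minimal sketch, assuming a hypothetical SafeMap wrapper with a GetOrCreate method (neither name comes from the standard library). The zero value of sync.Mutex is ready to use, so the mutex itself needs no separate initialization step.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// SafeMap is a hypothetical wrapper: a native map guarded by a mutex.
type SafeMap struct {
	mu sync.Mutex // the zero value is an unlocked mutex
	m  map[string]string
}

// NewSafeMap allocates the underlying map before any goroutine uses it.
func NewSafeMap() *SafeMap {
	return &SafeMap{m: make(map[string]string)}
}

// GetOrCreate returns the value stored under key. If the key is absent,
// it calls create, stores the result, and reports created == true.
// The check and the insert happen under the same lock, so create runs
// at most once per key.
func (s *SafeMap) GetOrCreate(key string, create func() string) (value string, created bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.m[key]; ok {
		return v, false
	}
	v := create()
	s.m[key] = v
	return v, true
}

func main() {
	sm := NewSafeMap()
	var wg sync.WaitGroup
	var createdCount int32

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, created := sm.GetOrCreate("key", func() string { return "initialized" })
			if created {
				atomic.AddInt32(&createdCount, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(createdCount) // prints 1: only one goroutine inserted the key
}
```

Holding the lock while create runs keeps the sketch simple, but it blocks every other key for that duration. If initialization is expensive, storing a value with its own sync.Once, as in the answer above, keeps the critical section short.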

Related

sync.Map or channels when using goroutines

I'm writing a program that parses a lot of files looking for "interesting" lines. It then checks whether these lines have been seen before. Each file is parsed in a separate goroutine.
I'm wondering which approach is better:
Use sync.Map or something similar
Use channels and a separate goroutine that is responsible only for the uniqueness check (probably using a standard map). It would receive a request and respond with something simple like "Not unique" or "Unique (and added)"
Is either of these solutions more popular, or are both wrong?

If you would like to have workers which can access a global map for unique checking, you can use a sync.RWMutex to be sure that the map is protected, like:
var (
	mutex       sync.RWMutex
	alreadySeen = make(map[string]struct{})
)

func Work(lines []string) {
	for _, line := range lines {
		// Process the line here...

		// Checking
		mutex.RLock() // Lock for reading only
		_, found := alreadySeen[line]
		mutex.RUnlock()
		if !found {
			mutex.Lock()
			// Check again: another goroutine may have added the line
			// between RUnlock and Lock.
			if _, found := alreadySeen[line]; !found {
				alreadySeen[line] = struct{}{}
			}
			mutex.Unlock()
		}
	}
}
Another approach is to use a concurrency-safe map and skip the explicit locking altogether, for example this package: https://github.com/cornelk/hashmap
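The channel-based design described in the question can be sketched roughly as follows. This is only an illustration: the names request, uniqueChecker, and isUnique are hypothetical. A single goroutine owns a standard map, so the map itself needs no lock, and each worker sends a request and waits for a boolean reply.

```go
package main

import (
	"fmt"
	"sync"
)

// request asks the owner goroutine whether line has been seen before.
type request struct {
	line  string
	reply chan bool // receives true if the line was unique (and added)
}

// uniqueChecker owns the map; no other goroutine touches it.
func uniqueChecker(requests <-chan request) {
	seen := make(map[string]struct{})
	for req := range requests {
		_, found := seen[req.line]
		if !found {
			seen[req.line] = struct{}{}
		}
		req.reply <- !found
	}
}

// isUnique sends one request and waits for the answer.
func isUnique(requests chan<- request, line string) bool {
	reply := make(chan bool, 1)
	requests <- request{line: line, reply: reply}
	return <-reply
}

func main() {
	requests := make(chan request)
	go uniqueChecker(requests)

	lines := []string{"a", "b", "a", "c", "b"}
	var wg sync.WaitGroup
	var mu sync.Mutex
	unique := 0

	for _, line := range lines {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			if isUnique(requests, l) {
				mu.Lock()
				unique++
				mu.Unlock()
			}
		}(line)
	}
	wg.Wait()
	close(requests)
	fmt.Println(unique) // prints 3: "a", "b" and "c" are each counted once
}
```

Because every check passes through one goroutine, the check and the insert can never interleave with another worker's request. The cost is a channel round trip per line, so whether this beats a mutex or sync.Map depends on the workload.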

How to allow only certain cyclic dependencies in ArchUnit?

In ArchUnit, I can check that packages .should().beFreeOfCycles(). How can I specify exceptions to this rule for certain cycles?
E.g., given these packages and their dependencies:
A <-> B <-> C
How can I allow A <-> B, but still forbid A and B being part of any other cycle, e.g. B <-> C?

Freezing Arch Rules is always an option for allowing certain violations while still catching others.
Would that be feasible in your case?

Based on Manfred's answer, here is a solution that seems to work fine (implemented in Kotlin):
fun test() {
    SlicesRuleDefinition.slices().matching("(com.mypackage.*..)")
        .should()
        // Using an extension method which allows us to specify allowed
        // cycles succinctly:
        .beFreeOfCyclesExcept("com.mypackage.a" to "com.mypackage.b")
        .check(someClasses)
}

fun SlicesShould.beFreeOfCyclesExcept(
    vararg allowed: Pair<String, String>
): ArchRule =
    FreezingArchRule
        // In case you are not familiar with Kotlin:
        // We are in an extension method. 'this' will be substituted with
        // 'SlicesRuleDefinition.slices().matching("(com.mypackage.*..)")
        // .should()';
        .freeze(this.beFreeOfCycles())
        .persistIn(
            // Using a custom ViolationStore instead of the default
            // TextFileBasedViolationStore so we can configure the
            // allowed violations in code instead of a text file:
            object : ViolationStore {
                override fun initialize(properties: Properties) {}
                override fun contains(rule: ArchRule): Boolean = true
                override fun save(rule: ArchRule, violations: List<String>) {
                    // Doing nothing here because we do not want ArchUnit
                    // to add any additional allowed violations.
                }
                override fun getViolations(rule: ArchRule): List<String> =
                    allowed
                        // ArchUnit records cycles in the form
                        // A -> B -> A. I.e., A -> B -> A and
                        // B -> A -> B are different violations.
                        // We add the reverse cycle to make sure
                        // both directions are allowed:
                        .flatMap { pair ->
                            listOf(pair, Pair(pair.second, pair.first))
                        }
                        // .distinct() is not necessary, but using it is
                        // cleaner because by adding the reverse cycles
                        // we may possibly have added duplicates:
                        .distinct()
                        .map { (sliceA, sliceB) ->
                            // This is a prefix of the format that
                            // ArchUnit uses:
                            "Cycle detected: Slice $sliceA -> \n" +
                                " Slice $sliceB -> \n" +
                                " Slice $sliceA\n"
                        }
            }
        )
        // The lines that ArchUnit uses are very specific, including
        // info about which methods etc. create the cycle. That is
        // exactly what is desirable when establishing a baseline for
        // legacy code. But we want to permanently allow certain
        // cycles, regardless of which current or future code creates
        // the cycle. Thus, we only compare the prefixes of violation
        // lines:
        .associateViolationLinesVia {
                lineFromFirstViolation,
                lineFromSecondViolation ->
            lineFromFirstViolation.startsWith(lineFromSecondViolation)
        }
Note that I have tested this only on a small project.

iterating over dashmap stored in an arc

let dashMap: DashMap<String, String> = DashMap::new();
// Then I put it into app_data of Actix to use it as a global variable,
// which is an Arc.
// Then when I want to use it in iteration:
for item in dashMap.into_iter() { // item is a RefMulti<String, String>
    ...
}
I need to iterate over the dashMap.
I'm not sure what to do with item, as it is of type RefMulti. I was expecting (k, v).
How would I access the key during iteration?
Is there a nice way to access the (key, value) pairs even from a RefMulti?
[UPDATED WITH MRE]
//cargo.toml
actix-web = { version = "4.0.0-beta.6", features=["rustls"] }
dashmap = "4.0.2"
[code] is a gist, couldn't post it here. was getting formatting errors

golang threading model comparison

I have a piece of data:
type data struct {
	// all good data here
	...
}
This data is owned by a manager and used by other threads for reading only. The manager needs to update the data periodically. How do I design the threading model for this? I can think of two options:
1.
type manager struct {
	// acquire read lock when other threads read the data.
	// acquire write lock when manager wants to update.
	lock sync.RWMutex
	// a pointer to the data
	p *data
}
2.
type manager struct {
	// copy the pointer when other threads want to use the data.
	// When manager updates, just change p to point to the new data.
	p *data
}
Does the second approach work? It seems I don't need any lock. If other threads hold a pointer to the old data, it should be fine for the manager to update the original pointer. Since Go is garbage-collected, the old data will be released automatically once all other threads have finished reading it. Am I correct?

Your first option is fine and perhaps the simplest to implement. However, it could lead to poor performance with many readers, as the manager could struggle to obtain the write lock.
As the comments on your question have stated, your second option (as-is) can cause a race condition and lead to unpredictable behaviour.
You could implement your second option by using atomic.Value. This would allow you to store the pointer to some data struct and atomically update this for the next readers to use. For example:
import "sync/atomic"

// Data shared with readers
type data struct {
	// all the fields
}

// Manager
type manager struct {
	v atomic.Value
}

// Method used by readers to obtain a fresh copy of data to
// work with, e.g. inside loop.
// Note: the type assertion panics if update has not stored a value yet.
func (m *manager) Data() *data {
	return m.v.Load().(*data)
}

// Internal method called to set new data for readers
func (m *manager) update() {
	d := &data{
		// ... set values here
	}
	m.v.Store(d)
}
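On Go 1.19 or later, the same idea can also be written with the generic atomic.Pointer type, which removes the type assertion. The sketch below is an assumption-laden illustration: the version field, the newManager constructor, and the update signature taking a *data argument are hypothetical additions, not part of the answer above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// data is shared with readers; readers must treat it as read-only.
type data struct {
	version int // hypothetical field, for illustration only
}

type manager struct {
	p atomic.Pointer[data] // requires Go 1.19 or later
}

// newManager stores an initial snapshot so Data never returns nil.
func newManager(initial *data) *manager {
	m := &manager{}
	m.p.Store(initial)
	return m
}

// Data returns the current snapshot. A reader keeps using the pointer
// it obtained even if the manager publishes a newer one afterwards.
func (m *manager) Data() *data {
	return m.p.Load()
}

// update publishes a new snapshot. The old one is garbage-collected
// once no reader holds a pointer to it.
func (m *manager) update(d *data) {
	m.p.Store(d)
}

func main() {
	m := newManager(&data{version: 1})
	old := m.Data()
	m.update(&data{version: 2})
	fmt.Println(old.version, m.Data().version) // prints "1 2"
}
```

The manager must build each new data value completely before calling update and must never modify a value after publishing it. Otherwise readers holding the old pointer would race with the writer.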

ConcurrentModificationException with WeakHashMap

I have the code below but I'm getting ConcurrentModificationException, how should I avoid this issue? (I have to use WeakHashMap for some reason)
WeakHashMap<String, Object> data = new WeakHashMap<String, Object>();
// some initialization code for data
for (String key : data.keySet()) {
    if (data.get(key) != null && data.get(key).equals(value)) {
        //do something to modify the key
    }
}

The Javadoc for WeakHashMap class explains why this would happen:
Map invariants do not hold for this class. Because the garbage
collector may discard keys at any time, a WeakHashMap may behave as
though an unknown thread is silently removing entries
Furthermore, the iterator generated under the hood by the enhanced for-loop you're using is fail-fast, as the same Javadoc explains:
The iterators returned by the iterator method of the collections
returned by all of this class's "collection view methods" are
fail-fast: if the map is structurally modified at any time after the
iterator is created, in any way except through the iterator's own
remove method, the iterator will throw a
ConcurrentModificationException. Thus, in the face of concurrent
modification, the iterator fails quickly and cleanly, rather than
risking arbitrary, non-deterministic behavior at an undetermined time
in the future.
Therefore your loop can throw this exception for these reasons:
Garbage collector has removed an object in the keyset.
Something outside the code added an object to that map.
A modification occurred inside the loop.
As your intent appears to be processing the objects that are not GC'd yet, I would suggest using an iterator as follows:
Iterator<String> it = data.keySet().iterator();
int count = 0;
int maxTries = 3;
while (true) {
    try {
        while (it.hasNext()) {
            String str = it.next();
            // do something
        }
        break;
    } catch (ConcurrentModificationException e) {
        it = data.keySet().iterator(); // get a new iterator
        if (++count == maxTries) throw e;
    }
}

You can clone the key set first, but note that you hold the strong reference after that:
Set<KeyType> keys;
while (true) {
    try {
        keys = new HashSet<>(weakHashMap.keySet());
        break;
    } catch (ConcurrentModificationException ignore) {
    }
}
for (KeyType key : keys) {
    // ...
}

WeakHashMap's entries are removed automatically once the key is no longer in ordinary use, and this may happen in a different thread. While the keySet() is being cloned into a different Set, a concurrent thread may remove entries in the meantime, in which case a ConcurrentModificationException will be thrown. You must synchronize the cloning.
Example:
Map<String, Object> syncData = Collections.synchronizedMap(data);
Please understand that
Collections.synchronizedSet(data.keySet());
cannot be used, because data.keySet() relies on the data instance, which is not synchronized here. In more detail: synchronizing on the keySet only guards methods called on the keySet, but entries are removed through WeakHashMap's remove method, not the keySet's, so you have to synchronize on the WeakHashMap itself.

Probably because your // do something in the iteration is actually modifying the underlying collection.
From ConcurrentModificationException:
For example, if a thread modifies a collection directly while it is iterating over the collection with a fail-fast iterator, the iterator will throw this exception.
And from (Weak)HashMap's keySet():
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
