golang threading model comparison - multithreading

I have a piece of data:
type data struct {
// all good data here
...
}
This data is owned by a manager and used by other threads for reading only. The manager needs to periodically update the data. How do I design the threading model for this? I can think of two options:
1.
type manager struct {
// acquire read lock when other threads read the data.
// acquire write lock when manager wants to update.
lock sync.RWMutex
// a pointer to the data
p *data
}
2.
type manager struct {
// copy the pointer when other threads want to use the data.
// When manager updates, just change p to point to the new data.
p *data
}
Does the second approach work? It seems I don't need any lock. If other threads get a pointer to the old data, it should be fine for the manager to update the original pointer. Since Go is garbage-collected, the old data will be released automatically once all other threads have finished reading it. Am I correct?

Your first option is fine and perhaps the simplest to do. However, with many readers, contention on the lock can hurt performance. (Go's sync.RWMutex stops admitting new readers once a writer is waiting, so the writer itself is not starved.)
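A minimal sketch of option 1, assuming a hypothetical version field on data: readers hold the read lock for as long as they use the data, and update swaps the pointer under the write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// data stands in for the question's struct; version is a hypothetical field.
type data struct {
	version int
}

// manager follows option 1: an RWMutex guarding the pointer p.
type manager struct {
	lock sync.RWMutex
	p    *data
}

// Version reads under the read lock, so many readers can run in parallel.
func (m *manager) Version() int {
	m.lock.RLock()
	defer m.lock.RUnlock()
	return m.p.version
}

// update swaps in new data under the write lock.
func (m *manager) update(d *data) {
	m.lock.Lock()
	defer m.lock.Unlock()
	m.p = d
}

func main() {
	m := &manager{p: &data{version: 1}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = m.Version() // concurrent readers
		}()
	}
	m.update(&data{version: 2})
	wg.Wait()
	fmt.Println(m.Version()) // 2
}
```

If readers instead return p and keep using it after RUnlock, that is only safe when a published data value is never modified in place, which is the same property option 2 relies on.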
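As the comments on your question have stated, your second option (as-is) is a data race: the manager writes p while other goroutines read it without any synchronization, which the Go memory model does not allow and which can lead to unpredictable behaviour.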
You could implement your second option by using atomic.Value. This would allow you to store the pointer to some data struct and atomically update this for the next readers to use. For example:
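import "sync/atomic"

// Data shared with readers; treat it as read-only once published.
type data struct {
	// all the fields
}

// Manager
type manager struct {
	v atomic.Value // always holds a *data
}

// Data is used by readers to obtain the current snapshot of data to
// work with, e.g. inside a loop. update must have run at least once:
// before that, Load returns nil and the type assertion panics.
func (m *manager) Data() *data {
	return m.v.Load().(*data)
}

// update is called internally to publish new data for readers.
func (m *manager) update() {
	d := &data{
		// ... set values here
	}
	m.v.Store(d)
}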
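Since Go 1.19, sync/atomic also provides the generic atomic.Pointer[T], which does the same job without the interface type assertion; its Load simply returns nil before the first Store. A minimal sketch, again assuming a hypothetical version field on data:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// data stands in for the question's struct; version is a hypothetical field.
// A published *data must never be modified afterwards.
type data struct {
	version int
}

// manager follows option 2, but publishes the pointer atomically.
type manager struct {
	p atomic.Pointer[data] // requires Go 1.19+
}

// Data returns the current snapshot, or nil if update has never run.
func (m *manager) Data() *data { return m.p.Load() }

// update builds a fresh value and publishes it with one atomic store.
func (m *manager) update(version int) {
	m.p.Store(&data{version: version})
}

func main() {
	var m manager
	fmt.Println(m.Data() == nil) // true: nothing published yet
	m.update(1)
	old := m.Data()
	m.update(2)
	// A reader still holding the old snapshot keeps a valid pointer;
	// the GC reclaims it once nothing references it.
	fmt.Println(old.version, m.Data().version) // 1 2
}
```

This also confirms the intuition in the question: readers never see a half-written pointer, and old snapshots stay valid until the garbage collector finds them unreferenced.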


In Kotlin Native, how to keep an object around in a separate thread, and mutate its state from any other thread without using C pointers?

I'm exploring Kotlin Native and have a program with a bunch of Workers doing concurrent stuff (running on Windows, but this is a general question).
Now, I wanted to add simple logging. A component that simply logs strings by appending them as new lines to a file that is kept open in 'append' mode.
(Ideally, I'd just have a "global" function
fun log(text:String) {...}
...that I would be able to call from anywhere, including from "inside" other workers, and that would just work. The catch is that this isn't trivial because of Kotlin Native's rules about passing objects between threads (TLDR: you shouldn't pass mutable objects around; see https://github.com/JetBrains/kotlin-native/blob/master/CONCURRENCY.md#object-transfer-and-freezing).
Also, my log function would ideally accept any frozen object.)
What I've come up with are solutions using DetachedObjectGraph:
First, I create a detached logger object
val loggerGraph = DetachedObjectGraph { FileLogger("/foo/mylogfile.txt")}
and then use loggerGraph.asCPointer() to get a COpaquePointer to the detached graph:
val myPointer = loggerGraph.asCPointer()
Now I can pass this pointer into the workers (via the producer lambda of the Worker's execute function) and use it there. Or I can store the pointer in a @ThreadLocal global var.
For the code that writes to the file, whenever I want to log a line, I have to create a DetachedObjectGraph object from the pointer again,
and attach() it in order to get a reference to my fileLogger object:
val fileLogger = DetachedObjectGraph(myPointer).attach()
Now I can call a log function on the logger:
fileLogger.log("My log message")
This is what I've come up with by looking at the APIs available for concurrency in Kotlin Native (as of Kotlin 1.3.61), but I'm left wondering what a better approach would be (using Kotlin, not resorting to C). Clearly it's bad to create a DetachedObjectGraph object for every line written.
One could pose this question more generally: how to keep a mutable resource open in a separate thread (or worker), and send messages to it.
Side comment: having coroutines that truly use threads would solve this problem, but the question is about how to solve this task with the APIs currently available (Kotlin 1.3.61).
You definitely shouldn't use DetachedObjectGraph in the way presented in the question. There's nothing to prevent you from trying to attach on multiple threads or, if you pass the same pointer around, from trying to attach to an invalid one after another thread has already attached to it.
As Dominic mentioned, you can keep the DetachedObjectGraph in an AtomicReference. However, if you're going to keep DetachedObjectGraph in an AtomicReference, make sure the type is AtomicRef<DetachedObjectGraph?> and busy-loop while the DetachedObjectGraph is null. That will prevent the same DetachedObjectGraph from being used by multiple threads. Make sure to set it to null, and repopulate it, in an atomic way.
However, does FileLogger need to be mutable at all? If you're writing to a file, it doesn't seem so. Even if so, I'd isolate the mutable object to a separate worker and send log messages to it rather than doing a DetachedObjectGraph inside an AtomicRef.
In my experience, DetachedObjectGraph is super uncommon in production code. We don't use it anywhere at the moment.
To isolate mutable state to a Worker, something like this:
class MutableThing<T:Any>(private val worker:Worker = Worker.start(), producer:()->T){
private val arStable = AtomicReference<StableRef<T>?>(null)
init {
worker.execute(TransferMode.SAFE, {Pair(arStable, producer).freeze()}){
it.first.value = StableRef.create(it.second()).freeze()
}
}
fun <R> access(block:(T)->R):R{
return worker.execute(TransferMode.SAFE, {Pair(arStable, block).freeze()}){
it.second(it.first.value!!.get())
}.result
}
}
object Log{
private val fileLogger = MutableThing { FileLogger() }
fun log(s:String){
fileLogger.access { fl -> fl.log(s) }
}
}
class FileLogger{
fun log(s:String){}
}
The MutableThing uses StableRef internally. producer makes the mutable state you want to isolate. To log something, call Log.log, which will wind up calling the mutable FileLogger.
To see a basic example of MutableThing, run the following test:
@Test
fun goIso(){
val mt = MutableThing { mutableListOf("a", "b")}
val workers = Array(4){Worker.start()}
val futures = mutableListOf<Future<*>>()
repeat(1000) { rcount ->
val future = workers[rcount % workers.size].execute(
TransferMode.SAFE,
{ Pair(mt, rcount).freeze() }
) { pair ->
pair.first.access {
val element = "ttt ${pair.second}"
println(element)
it.add(element)
}
}
futures.add(future)
}
futures.forEach { it.result }
workers.forEach { it.requestTermination() }
mt.access {
println("size: ${it.size}")
}
}
The approach you've taken is pretty much correct and the way it's supposed to be done.
The thing I would add is that, instead of passing a pointer around, you should pass around a frozen FileLogger that internally holds a reference to an AtomicRef<DetachedObjectGraph>, and the attaching and detaching should be done internally. This matters especially because a DetachedObjectGraph is invalid once attached.

Hyperledger Fabric Go SDK: How to parse blocks

I'm using the Hyperledger Fabric Go SDK to implement a client that works with the ledger. My application relies on events being sent; however, instead of chaincode events I want to use BlockEvents, so that I can be sure the given data has already been written to the ledger. Unfortunately, the documentation on this type of event is very limited. I registered for BlockEvents using func (c *Client) RegisterBlockEvent()... and get BlockEvent responses, each referencing a Block struct. The block struct looks like this:
type Block struct {
Header *BlockHeader `protobuf:"bytes,1,opt,name=header,proto3" json:"header,omitempty"`
Data *BlockData `protobuf:"bytes,2,opt,name=data,proto3" json:"data,omitempty"`
Metadata *BlockMetadata `protobuf:"bytes,3,opt,name=metadata,proto3" json:"metadata,omitempty"`
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_unrecognized []byte `json:"-"`
XXX_sizecache int32 `json:"-"`
}
I can navigate to BlockData:
type BlockData struct {
Data [][]byte `protobuf:"bytes,1,rep,name=data,proto3" json:"data,omitempty"`
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_unrecognized []byte `json:"-"`
XXX_sizecache int32 `json:"-"`
}
However, at this point I am lost, having only a raw array of byte arrays as data. I want to act upon a specific asset creation event and need to parse the block data to search for it. What struct or structure is used for this data? I assume every array entry represents a transaction, but without a struct to map it onto, parsing is extremely difficult.
Write a ParseBlock function using protolator, which expands the nested protobuf messages and prints the whole block as JSON:
// import "github.com/hyperledger/fabric-sdk-go/pkg/util/protolator"
func ParseBlock(block *common.Block) {
if err := protolator.DeepMarshalJSON(os.Stdout, block); err != nil {
log.Fatalln("DeepMarshalJSON err:", err)
}
}

Dynamically-Allocated Implementation-Class std::async-ing its Member

Consider an operation with a standard asynchronous interface:
std::future<void> op();
Internally, op needs to perform a (variable) number of asynchronous operations to complete; the number of these operations is finite but unbounded, and depends on the results of the previous asynchronous operations.
Here's a (bad) attempt:
/* An object of this class will store the shared execution state in the members;
* the asynchronous op is its member. */
class shared
{
private:
// shared state
private:
// Actually does some operation (asynchronously).
void do_op()
{
...
// Might need to launch more ops.
if(...)
launch_next_ops();
}
public:
// Launches next ops
void launch_next_ops()
{
...
std::async(&shared::do_op, this);
}
};
std::future<void> op()
{
shared s;
s.launch_next_ops();
// Return some future of s used for the entire operation.
...
// s destructed - delayed BOOM!
};
The problem, of course, is that s is destroyed when op returns, while the asynchronous do_op calls may still be running on it through this.
To amend this, here are the changes:
class shared : public std::enable_shared_from_this<shared>
{
private:
/* The member now takes a shared pointer to itself; hopefully
* this will keep it alive. */
void do_op(std::shared_ptr<shared> p); // [*]
void launch_next_ops()
{
...
std::async(&shared::do_op, this, shared_from_this());
}
};
std::future<void> op()
{
std::shared_ptr<shared> s{new shared{}};
s->launch_next_ops();
...
};
(Aside from the weirdness of an object calling its own method with a shared pointer to itself,) the problem is with the line marked [*]: the compiler (correctly) warns that the parameter is unused.
Of course, it's possible to fool it somehow, but is this an indication of a fundamental problem? Is there any chance the compiler will optimize away the argument and leave the method with a dead object? Is there a better alternative to this entire scheme? I don't find the resulting code the most intuitive.
No, the compiler will not optimize away the argument. Indeed, that's irrelevant as the lifetime extension comes from shared_from_this() being bound by decay-copy ([thread.decaycopy]) into the result of the call to std::async ([futures.async]/3).
If you want to avoid the warning of an unused argument, just leave it unnamed; compilers that warn on unused arguments will not warn on unused unnamed arguments.
An alternative is to make do_op static, meaning that you have to use its shared_ptr argument; this also addresses the duplication between this and shared_from_this. Since this is fairly cumbersome, you might want to use a lambda to convert shared_from_this to a this pointer:
std::async([](std::shared_ptr<shared> const& self){ self->do_op(); }, shared_from_this());
If you can use C++14 init-captures this becomes even simpler:
std::async([self = shared_from_this()]{ self->do_op(); });
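For comparison with the Go question at the top of this page: in Go the garbage collector provides the lifetime extension that shared_from_this() provides here, because every goroutine closure that captures the state pointer keeps it reachable. A rough sketch of op() with a variable number of follow-up operations; the splitting rule is a hypothetical stand-in for "depends on the results of previous operations", and a result channel plays the part of the future.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shared holds the state of one op() call. Every goroutine captures the
// *shared pointer, and the GC keeps it alive while any of them still runs.
type shared struct {
	wg    sync.WaitGroup
	steps atomic.Int64 // requires Go 1.19+
}

// launch starts one asynchronous step; a step may launch further steps.
func (s *shared) launch(n int) {
	// Add runs either before Wait is called (from op) or while the
	// calling step still keeps the counter above zero.
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		s.steps.Add(1)
		// Hypothetical rule: keep splitting until a piece has size 1.
		if n > 1 {
			s.launch(n / 2)
			s.launch(n - n/2)
		}
	}()
}

// op returns a channel that delivers the number of steps once every
// launched step has finished.
func op(n int) <-chan int64 {
	s := &shared{}
	s.launch(n)
	result := make(chan int64, 1)
	go func() {
		s.wg.Wait()
		result <- s.steps.Load()
	}()
	return result
}

func main() {
	fmt.Println(<-op(4)) // 7: splitting 4 down to size-1 pieces takes 7 steps
}
```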

Implementing "move" thread semantics

I want to write a function to be called like this:
send("message","address");
Where some other thread that is doing
let k = recv("address");
println!("{}",k);
sees message.
In particular, the message may be large, and so I'd like "move" or "zero-copy" semantics for sending the message.
In C, the solution is something like:
1. Allocate messages on the heap.
2. Have a global, threadsafe hashmap that maps "address" to some memory location.
3. Write pointers into the memory location on send, and wake up the receiver using a semaphore.
4. Read pointers out of the memory location on receive, and wait on a semaphore to process new messages.
But according to another SO question, step #2 "sounds like a bad idea". So I'd like to see a more Rust-idiomatic way to approach this problem.
You get this sort of move semantics automatically, and you can make moves lightweight by placing large values in a Box (i.e. allocating them on the heap); a String already keeps its contents on the heap, so moving one copies only its pointer, length and capacity. Using type ConcurrentHashMap<K, V> = Mutex<HashMap<K, V>>; as the threadsafe hashmap (there are various ways this could be improved), one might have:
use lazy_static::lazy_static;
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;

type ConcurrentHashMap<K, V> = Mutex<HashMap<K, V>>;

lazy_static! {
    pub static ref MAP: ConcurrentHashMap<String, VecDeque<String>> = {
        Mutex::new(HashMap::new())
    };
}

fn send(message: String, address: String) {
    MAP.lock()
        .unwrap()
        // find the place this message goes
        .entry(address)
        // create a new VecDeque if this address was empty
        .or_insert_with(VecDeque::new)
        // add the message on the back (moves the String, its bytes are not copied)
        .push_back(message)
}

fn recv(address: &str) -> Option<String> {
    MAP.lock()
        .unwrap()
        .get_mut(address)
        // pull the message off the front
        .and_then(|buf| buf.pop_front())
}
That code uses the lazy_static! macro (from the lazy_static crate) to get a global hashmap (it may be better to use a local object that wraps an Arc<ConcurrentHashMap<...>>, fwiw, since global state can make reasoning about program behaviour hard). It uses VecDeque (called RingBuf in pre-1.0 Rust) as a queue, so that messages bank up for a given address. If you only wish to support one message at a time, the type could be ConcurrentHashMap<String, String>, send could become MAP.lock().unwrap().insert(address, message); and recv just MAP.lock().unwrap().remove(address).
(NB. I haven't compiled this, so the types may not match up precisely.)
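For comparison with the Go question at the top of this page, a rough Go analogue keeps a mutex-guarded map from address to a buffered channel. Sending a string through a channel copies only its header (pointer and length), not the bytes, and Go strings are immutable, so sharing them is safe; for a []byte the sender must simply stop using it after sending, by convention. Unlike the Rust recv above, this recv blocks until a message arrives, closer to the semaphore design in the question. The mailboxes type and the buffer size of 16 are hypothetical choices.

```go
package main

import (
	"fmt"
	"sync"
)

// mailboxes maps an address to a buffered channel of messages.
type mailboxes struct {
	mu    sync.Mutex
	boxes map[string]chan string
}

func newMailboxes() *mailboxes {
	return &mailboxes{boxes: make(map[string]chan string)}
}

// box returns the channel for addr, creating it on first use.
func (m *mailboxes) box(addr string) chan string {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch, ok := m.boxes[addr]
	if !ok {
		ch = make(chan string, 16) // arbitrary buffer size
		m.boxes[addr] = ch
	}
	return ch
}

// send hands the message to whoever receives on addr.
func (m *mailboxes) send(message, addr string) { m.box(addr) <- message }

// recv blocks until a message arrives at addr.
func (m *mailboxes) recv(addr string) string { return <-m.box(addr) }

func main() {
	mb := newMailboxes()
	done := make(chan struct{})
	go func() {
		k := mb.recv("address")
		fmt.Println(k) // message
		close(done)
	}()
	mb.send("message", "address")
	<-done
}
```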

ConcurrentModificationException with WeakHashMap

I have the code below, but I'm getting a ConcurrentModificationException. How should I avoid this? (I have to use WeakHashMap for some reason.)
WeakHashMap<String, Object> data = new WeakHashMap<String, Object>();
// some initialization code for data
for (String key : data.keySet()) {
if (data.get(key) != null && data.get(key).equals(value)) {
//do something to modify the key
}
}
The Javadoc for WeakHashMap class explains why this would happen:
Map invariants do not hold for this class. Because the garbage
collector may discard keys at any time, a WeakHashMap may behave as
though an unknown thread is silently removing entries
Furthermore, the iterator created under the hood by the enhanced for-loop you're using is fail-fast, as the same Javadoc explains:
The iterators returned by the iterator method of the collections
returned by all of this class's "collection view methods" are
fail-fast: if the map is structurally modified at any time after the
iterator is created, in any way except through the iterator's own
remove method, the iterator will throw a
ConcurrentModificationException. Thus, in the face of concurrent
modification, the iterator fails quickly and cleanly, rather than
risking arbitrary, non-deterministic behavior at an undetermined time
in the future.
Therefore your loop can throw this exception for these reasons:
Garbage collector has removed an object in the keyset.
Something outside the code added an object to that map.
A modification occurred inside the loop.
As your intent appears to be processing the objects that are not GC'd yet, I would suggest using an iterator as follows:
Iterator<String> it = data.keySet().iterator();
int count = 0;
int maxTries = 3;
while(true) {
try {
while (it.hasNext()) {
String str = it.next();
// do something
}
break;
} catch (ConcurrentModificationException e) {
it = data.keySet().iterator(); // get a new iterator
if (++count == maxTries) throw e;
}
}
You can clone the key set first, but note that you then hold strong references to the keys:
Set<KeyType> keys;
while(true) {
try {
keys = new HashSet<>(weakHashMap.keySet());
break;
} catch (ConcurrentModificationException ignore) {
}
}
for (KeyType key : keys) {
// ...
}
WeakHashMap's entries are removed automatically once their keys are no longer strongly reachable: the garbage collector can clear a key at any moment, and the map drops the stale entry during a later operation, possibly one made by another thread. If the map is modified while you are cloning keySet() into a different Set, a ConcurrentModificationException can be thrown, so you must synchronize the cloning.
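Example (all access must then go through the returned wrapper, and copying the keys must hold its lock):
Map<String, Object> syncData = Collections.synchronizedMap(data);
synchronized (syncData) { keys = new HashSet<>(syncData.keySet()); }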
Please understand that
Collections.synchronizedSet(data.keySet());
cannot be used, because data.keySet() relies on the data instance, which is not synchronized here. In more detail: synchronizing on the key set only serializes methods called through the key set, but entries are never removed through the key set's remove method; they are removed by WeakHashMap's own methods, so you have to synchronize on the WeakHashMap itself.
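For comparison with the Go question at the top of this page, the same "copy the keys while holding the lock, then iterate the copy" technique looks like this in Go, where a map shared between goroutines likewise needs a lock. The registry type and its methods are hypothetical names, and this does not model weak references.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// registry is a hypothetical map guarded by a mutex.
type registry struct {
	mu   sync.Mutex
	data map[string]int
}

// keys copies the key set while holding the lock, like cloning keySet()
// under synchronization; callers then iterate without holding the lock.
func (r *registry) keys() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	ks := make([]string, 0, len(r.data))
	for k := range r.data {
		ks = append(ks, k)
	}
	return ks
}

// remove deletes a key under the lock.
func (r *registry) remove(k string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.data, k)
}

func main() {
	r := &registry{data: map[string]int{"a": 1, "b": 2, "c": 3}}
	ks := r.keys()
	sort.Strings(ks)
	for _, k := range ks {
		r.remove(k) // modifying the map while iterating the snapshot is safe
	}
	fmt.Println(ks, len(r.data)) // [a b c] 0
}
```

The trade-off is the same one noted for Java above: the snapshot holds its own references to the keys until it is discarded.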
Probably because your // do something in the iteration is actually modifying the underlying collection.
From ConcurrentModificationException:
For example, if a thread modifies a collection directly while it is iterating over the collection with a fail-fast iterator, the iterator will throw this exception.
And from (Weak)HashMap's keySet():
Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
