I'm successfully using a mpsc::channel() to send messages from a producer thread to a consumer.
The consumer is only ever interested in the latest message. (It uses the message from the previous check if there is no new message.)
In consequence, I'm running the consumer's try_recv() in a loop until it fails to get a new message, and then using the last received message, or the old one if no new messages were found.
Memory is being wasted storing old messages which the consumer will throw away.
How would I build a one-element variant of mpsc::channel()?
(I've considered using sync::Mutex<Option<MyMessage>> but it is critical that the consuming thread blocks for as little time as possible. Also, I want ownership to pass from the producer to the consumer.)
You can do it with an AtomicPtr, whose compare_exchange method should compile to a single atomic instruction (cmpxchg on x86), allowing you to store either a null pointer (std::ptr::null_mut()) or a pointer to an actual message.
There are quite a few possibilities, with various trade-offs.
I'd recommend the arc-swap crate (see below) for a safe and fast interface, and the DIY Double Buffering approach if performance is that critical.
std::sync::mpsc
There's a second option in std::sync::mpsc: the sync_channel function creates a bounded channel, where the sender blocks when the channel is full, until the receiver picks off a message.
I do not think that it is ideal for your use case: a full channel holds on to the oldest message, not the latest one.
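To make the limitation concrete, here is a minimal sketch using a one-slot sync_channel together with the non-blocking try_send. The helper name offer_two is made up for this illustration. When the slot is occupied, the newer message is handed back to the producer, so the consumer ends up with the stale message instead of the latest one.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical helper: offers `first`, then `second`, to a one-slot channel
/// without blocking. Returns (what the consumer receives, what was rejected).
fn offer_two(first: u32, second: u32) -> (Option<u32>, Option<u32>) {
    let (tx, rx) = sync_channel::<u32>(1);

    // The slot is empty, so the first message is accepted.
    tx.try_send(first).expect("an empty one-slot channel accepts a message");

    // The slot is now full: the newer message is returned to the producer.
    let rejected = match tx.try_send(second) {
        Err(TrySendError::Full(msg)) => Some(msg),
        _ => None,
    };

    (rx.try_recv().ok(), rejected)
}

fn main() {
    let (seen_by_consumer, kept_by_producer) = offer_two(1, 2);
    println!("consumer got {:?}, producer kept {:?}", seen_by_consumer, kept_by_producer);
}
```

With the blocking send instead of try_send, the producer would simply stall until the consumer drained the slot, which is not what a "latest value wins" design wants either.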
Tokio Watch channel
The Tokio ecosystem has the watch channel designed for the purpose of propagating configuration changes.
Unfortunately it is designed for multiple consumers, so the consumers borrow the messages: there is no transfer of ownership.
Arc Swap
I believe the arc-swap crate may be closer to what you need. As the name implies, it provides the moral equivalent of an Atomic<Arc<T>>.
You can use the ArcSwapOption<T> to have the equivalent of an Atomic<Option<Arc<T>>>, and the consumer can simply perform a let new = atomic.swap(None); then check if new is None (nothing new) or Some(Arc<T>) in which case it received an updated configuration.
Do be mindful of the cost of dropping the previous Arc<T> when swapping a new one in: free is typically more expensive than malloc.
Back to std
You could use an AtomicPtr<T>. It'll require you to use unsafe, and would be a smidgen faster than ArcSwap by virtue of avoiding the reference counting.
It would suffer from the same drop issue, though.
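As a rough sketch of the AtomicPtr idea (not a vetted implementation), the slot below holds either a null pointer or a pointer obtained from Box::into_raw. The type name LatestSlot and its methods are invented for this example. It uses swap instead of compare_exchange, since both sides only ever want to exchange the whole slot. Note that publish returns the displaced message, so the cost of freeing it lands on the producer, not on the consumer.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical one-element "latest value" slot. `T: Send` is required
/// because values cross threads by ownership.
pub struct LatestSlot<T: Send> {
    ptr: AtomicPtr<T>,
}

impl<T: Send> LatestSlot<T> {
    pub fn new() -> Self {
        Self { ptr: AtomicPtr::new(ptr::null_mut()) }
    }

    /// Stores `value`; returns the unread message it displaced, if any.
    pub fn publish(&self, value: T) -> Option<T> {
        let new = Box::into_raw(Box::new(value));
        Self::reclaim(self.ptr.swap(new, Ordering::AcqRel))
    }

    /// Takes the latest message, leaving the slot empty. Never blocks.
    pub fn take(&self) -> Option<T> {
        Self::reclaim(self.ptr.swap(ptr::null_mut(), Ordering::AcqRel))
    }

    fn reclaim(raw: *mut T) -> Option<T> {
        if raw.is_null() {
            return None;
        }
        // SAFETY: every non-null pointer in the slot came from Box::into_raw,
        // and swap hands each pointer to exactly one caller.
        let boxed = unsafe { Box::from_raw(raw) };
        Some(*boxed)
    }
}

impl<T: Send> Drop for LatestSlot<T> {
    fn drop(&mut self) {
        // Free a message that was published but never taken.
        drop(Self::reclaim(*self.ptr.get_mut()));
    }
}

fn main() {
    let slot = Arc::new(LatestSlot::new());
    let producer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            for i in 1..=100u32 {
                slot.publish(i); // the displaced message is dropped here
            }
        })
    };
    producer.join().unwrap();
    assert_eq!(slot.take(), Some(100)); // only the latest message survives
    assert_eq!(slot.take(), None);
}
```

The consumer would keep its previous message around and replace it only when take returns Some.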
DIY Double Buffering
You could also simply Do It Yourself. A simple double-buffering storage would work.
By storing a plain Option<T>, you avoid the extra allocation (and thus the extra de-allocation), at the cost of making the check itself slower -- as you may now need to check both buffers. It may be possible to get away with checking a single buffer, but that is not clear to me.
Related
I am creating a web server using tokio. Whenever a client connection comes in, a task (green thread) is created via tokio::spawn.
The main function of my web server is to act as a proxy. Information about the proxy's target servers is stored in a global variable, and every task must access that data in order to proxy. Since there are multiple target servers, one must be selected by round robin, so the global variable (a struct) must also record the index of the most recently selected server.
Concurrency problems occur because shared information can be read/written by multiple tasks at the same time.
According to the docs, there seems to be a way to use Mutex and Arc or a way to use channel to solve this.
I'm curious which one you usually prefer, or if there is another way to solve the problem.
If it's shared data, you generally do want Arc, or you can leak a box to get a 'static reference (assuming that the data is going to exist until the program exits), or you can use a global variable (though global variables tend to impede testability and should generally be considered an anti-pattern).
As far as what goes in the Arc/Box/global, that depends on what your data's access pattern will be. If you will often read but rarely write, then Tokio's RwLock is probably what you want; if you're going to be updating the data every time you read it, then use Tokio's Mutex instead.
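For the round-robin case, where every read also updates the index, a mutex around the whole struct is enough. The sketch below uses std::sync::Mutex and OS threads as stand-ins so that it runs without external crates; in a Tokio server the same shape would apply with Tokio's Mutex and tasks. The names Targets and pick_target are invented for the example, and the server list is assumed to be non-empty.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared proxy state: the target list plus the round-robin cursor.
struct Targets {
    servers: Vec<String>, // assumed non-empty
    next: usize,
}

impl Targets {
    fn new(servers: Vec<String>) -> Self {
        Self { servers, next: 0 }
    }

    /// Returns the next target and advances the cursor.
    fn pick(&mut self) -> String {
        let chosen = self.servers[self.next].clone();
        self.next = (self.next + 1) % self.servers.len();
        chosen
    }
}

/// Locks only long enough to read and advance the cursor.
fn pick_target(shared: &Arc<Mutex<Targets>>) -> String {
    shared.lock().unwrap().pick()
}

fn main() {
    let shared = Arc::new(Mutex::new(Targets::new(vec![
        "10.0.0.1:80".to_string(),
        "10.0.0.2:80".to_string(),
    ])));

    // Each worker stands in for one spawned connection handler.
    let workers: Vec<_> = (0..4)
        .map(|id| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || println!("worker {} -> {}", id, pick_target(&shared)))
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
}
```

Because the lock is held only for the index update and never across I/O, contention should stay low even with many connections.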
Channels make the most sense when you have separate parts of the program with separate responsibilities. It doesn't work as well to update multiple workers with the same changes to data, because then you get into message ordering problems that can result in each worker's state disagreeing about something. (You get many of the problems of a distributed system without any of the benefits.)
Channels can work if there is a single entity responsible for maintaining the data, but at that point there isn't much benefit over using some kind of mutual exclusion mechanism; it winds up being the same thing with extra steps.
Let's imagine I want to design a system similar to crowdfunding or to an auction. There is a fixed period of time for which such an event is running. Can I start a background thread that periodically checks whether the end time of the event has been reached and subsequently closes that event? I was looking into the futures crate (and some others), but is it usable within Substrate? Is there any best practice on how to handle such scenarios?
I believe the answer to futures is no. Here's more explanation:
I think it is better to think about what programming primitives are available inside a Substrate runtime, instead of trying to use a concept from general purpose programming (future) and try and re-purpose it for the Substrate runtime (top-down vs. bottom-up viewpoint).
So, let's think about the lifecycle of a runtime and see what makes sense there:
Inside a runtime, you are kinda stuck in a box. A (wasm) runtime code is spawned and executed by the (always native) client whenever a new block is there to be imported (or authored, but let's assume just importing for now), and killed and set aside afterwards (at least from the perspective of the runtime -- the client has runtime caching). My point being, anything that you don't commit to state (i.e. write in storage) at the end of the execution of each block is lost. This includes all the local variables, stack, heap, and anything else. So even if you were to use a future to spawn a task, that doesn't really fit into the programming model of Substrate runtimes, because even if that future lived in the runtime, as soon as the block is done, the wasm instance is dead and so is the future.
That is all ignoring the fact that you can only use crates that support no_std in the runtime, so not every async library will be available anyhow.
The main solution, as I hinted, is probably something that uses state storage to record the starting point of the auction, so that x blocks later you can still know when you started it, and if some threshold is passed, then you can finish your auction. You could use either a timestamp or a number of blocks for your duration of auction. Something along the lines of:
trait Config: frame_system::Config {
    // duration of the auction, in time or in blocks (a block count is used here)
    type AuctionDuration: Get<Self::BlockNumber>;
}
// inside your on_initialize
fn on_initialize(n: T::BlockNumber) {
    // note: ensure AuctionDuration is non-zero, else a panic in the runtime might happen.
    if (n % T::AuctionDuration::get()).is_zero() {
        // time to close the auction.
    }
}
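The variant described in the prose, recording the start of the auction in storage and comparing against it on every block, can be modelled in plain Rust without any Substrate types. Everything here is a stand-in: the Storage struct plays the role of on-chain state, AUCTION_DURATION plays the role of the configured duration, and on_initialize only mimics the runtime hook of the same name.

```rust
/// Stand-in for the auction's state as it would be kept in runtime storage.
#[derive(Debug, PartialEq)]
enum AuctionStatus {
    Open,
    Closed,
}

/// Stand-in for on-chain storage: the only data that survives between blocks.
struct Storage {
    auction_start: u64, // block number at which the auction was opened
    status: AuctionStatus,
}

/// Assumed duration of the auction, in blocks.
const AUCTION_DURATION: u64 = 10;

/// Mimics the per-block hook: nothing is remembered except what is in `storage`.
fn on_initialize(storage: &mut Storage, block: u64) {
    let deadline = storage.auction_start + AUCTION_DURATION;
    if storage.status == AuctionStatus::Open && block >= deadline {
        storage.status = AuctionStatus::Closed; // time to close the auction
    }
}

fn main() {
    let mut storage = Storage { auction_start: 5, status: AuctionStatus::Open };
    for block in 5..=20 {
        on_initialize(&mut storage, block);
        println!("block {}: {:?}", block, storage.status);
    }
}
```

Unlike the modulo check, this closes one specific auction exactly once, at the first block at or after its own deadline.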
I have an application with many producers and consumers.
From my understanding, the RingBuffer creates its objects when it is initialised; you then copy data into an object when you publish to the ring, and read it back from that object in the EventHandler.
My application's LogHandler buffers received events in a List and sends them onward in batch mode once the list has reached a certain size. So EventHandler#onEvent puts the received object in the list; once the list has reached that size, it sends it over RMI to a server and clears it.
My question is: do I need to clone the object before I put it in the list? As I understand it, once consumed, the objects can be reused.
Do I need to synchronize access to the list in my EventHandler#onEvent ?
Yes - your understanding is correct. You copy your values in and out of the ringbuffer slots.
I would suggest that yes, you clone the values as you extract them from the ring buffer and put them into your event handler's list; otherwise the slot can be reused.
You should not need to synchronise access to the list as long as it is a private member variable of your Event Handler and you only have one event handler instance per thread. If you have multiple event handlers adding to the same (eg static) List instance then you would need synchronisation.
Clarification:
Be sure to read the background in OzgurH's comments below. If you stick to using the endOfBatch flag on disruptor and use that to decide the size of your batch, you do not have to copy objects out of the list. If you are using your own accumulation strategy (such as size - as per the question), then you should clone objects out as the slot could be reused before you have had the chance to send.
It is also worth noting that if you need to synchronize on the list instance, then you have missed a big opportunity with the Disruptor and will destroy your performance anyway.
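The size-based accumulation from the question can be modelled in a few lines. This is a language-neutral illustration written in Rust, not Disruptor code: LogEvent, BatchingHandler and its sent field (a stand-in for the RMI call) are all invented names. The point it shows is that the handler copies each event out of the reusable slot, and that its list needs no locking because only one thread ever touches it.

```rust
/// Invented event type; in the real system this would live in a ring buffer slot.
#[derive(Clone, Debug, PartialEq)]
struct LogEvent {
    message: String,
}

/// Model of a single-threaded event handler with its own private batch.
struct BatchingHandler {
    batch_size: usize,
    pending: Vec<LogEvent>,   // private to this handler, so no synchronisation
    sent: Vec<Vec<LogEvent>>, // stand-in for "send the batch to the server"
}

impl BatchingHandler {
    fn new(batch_size: usize) -> Self {
        Self { batch_size, pending: Vec::new(), sent: Vec::new() }
    }

    /// Called once per event; `slot` may be overwritten after this returns.
    fn on_event(&mut self, slot: &LogEvent) {
        // Clone, because the slot will be reused before the batch is sent.
        self.pending.push(slot.clone());
        if self.pending.len() >= self.batch_size {
            let batch = std::mem::take(&mut self.pending);
            self.sent.push(batch);
        }
    }
}

fn main() {
    let mut handler = BatchingHandler::new(2);
    let mut slot = LogEvent { message: String::new() };
    for text in ["a", "b", "c"] {
        // The "ring buffer" reuses the same slot for every event.
        slot.message.clear();
        slot.message.push_str(text);
        handler.on_event(&slot);
    }
    println!("sent: {:?}, still pending: {:?}", handler.sent, handler.pending);
}
```

Had the handler stored references to the slot instead of clones, every entry in a batch would end up showing whatever was written to the slot last.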
It is possible to use slots in the Disruptor's RingBuffer (including ones containing a List) without cloning/copying values. This may be a preferable solution for you depending on whether you are worried about garbage creation, and whether you actually need to be concerned about concurrent updates to the objects being placed in the RingBuffer. If all the objects being placed in the slot's list are immutable, or if they are only being updated/read by a single thread at a time (a precondition which the Disruptor is often used to enforce), there will be nothing gained from cloning them as they are already immune to data races.
On the subject of batching, note that the Disruptor framework itself provides a mechanism for taking items from the RingBuffer in batches in your EventHandler threads. This approach is fully thread-safe and lock-free, and could yield better performance by making your memory access patterns more predictable to the CPU.
I am using shared variables in Perl with use threads::shared.
Those variables can be modified only from a single thread; all other threads only read them.
Is it required for the 'reading' threads to lock, as in
{
lock $shared_var;
if ($shared_var > 0) .... ;
}
?
Isn't it safe to do a simple check without locking (in the 'reading' thread!), like
if ($shared_var > 0) ....
?
Locking is not required to maintain internal integrity when setting or fetching a scalar.
Whether it's needed or not in your particular case depends on the needs of the reader, the other readers and the writers. It rarely makes sense not to lock, but you haven't provided enough details for us to determine what your needs are.
For example, it might not be acceptable to use an old value after the writer has updated the shared variable. For starters, this can lead to a situation where one thread is still using the old value while another thread is using the new value, a situation that can be undesirable if those two threads interact.
It depends on whether it's meaningful to test the condition at some arbitrary point in time. The problem, however, is that in the vast majority of cases the Boolean test stands for other state, which might already have changed by the time you have finished reading the condition, so the result only describes a previous state.
Think about it. If it's an insignificant test, then it means little--and you have to question why you are making it. If it's a significant test, then it is telltale of a coherent state that may or may not exist anymore--you won't know for sure, unless you lock it.
A lot of times, say in real-time reporting, you don't really care which snapshot the database hands you, you just want a relatively current one. But, as part of its transaction logic, it keeps a complete picture of how things are prior to a commit. I don't think you're likely to find this in code, where the current state is the current state--and even a state of being in a provisional state is a definite state.
I guess one of the times this can be different is a cyclical access of a queue. If one consumer doesn't get the head record this time around, then one of them will the next time around. You can probably save some processing time, asynchronously accessing the queue counter. But here's a case where it means little in context of just one iteration.
In the case above, you would just want to put some locked-level instructions afterward that expected that the queue might actually be empty even if your test suggested it had data. So, if it is just a preliminary test, you would have to have logic that treated the test as unreliable as it actually is.
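The queue case can be sketched as follows, here in Rust instead of Perl and with invented names (HintedQueue, len_hint). The unlocked read is treated purely as a hint: a zero lets the consumer skip the lock for this iteration, and a non-zero value is re-checked under the lock, where the queue may still turn out to be empty.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical queue with a lock-free length hint next to the locked data.
struct HintedQueue<T> {
    len_hint: AtomicUsize,
    items: Mutex<VecDeque<T>>,
}

impl<T> HintedQueue<T> {
    fn new() -> Self {
        Self { len_hint: AtomicUsize::new(0), items: Mutex::new(VecDeque::new()) }
    }

    fn push(&self, item: T) {
        let mut items = self.items.lock().unwrap();
        items.push_back(item);
        self.len_hint.store(items.len(), Ordering::Release);
    }

    fn try_pop(&self) -> Option<T> {
        // Preliminary, unlocked test: cheap, but possibly stale.
        if self.len_hint.load(Ordering::Acquire) == 0 {
            return None; // nothing this time around; try again next cycle
        }
        // Authoritative check under the lock: another consumer may have
        // emptied the queue since the hint was read, so None is still possible.
        let mut items = self.items.lock().unwrap();
        let item = items.pop_front();
        self.len_hint.store(items.len(), Ordering::Release);
        item
    }
}

fn main() {
    let queue = HintedQueue::new();
    assert_eq!(queue.try_pop(), None); // skipped without taking the lock
    queue.push("job");
    assert_eq!(queue.try_pop(), Some("job"));
    assert_eq!(queue.try_pop(), None);
}
```

The unlocked test only saves work; correctness rests entirely on the locked section.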
I was trying to find some resources on getting the best performance and scaling with message passing. I heard that passing messages by value instead of by reference can give better scalability, as it works well with NUMA-style setups and reduces contention for a given memory address.
I would assume value based message passing only works with "smaller" messages. What would "smaller" be defined as? At what point would references be better? Would one do stream processing this way?
I'm looking for some helpful tips or resources for these kinds of questions.
Thanks :-)
P.S. I work in C#, but I don't think that matters so much for these kind of design questions.
Some factors to add to the excellent advice of Jeremy:
1) Passing by value only works efficiently for small messages. If the data has a [cache-line-size] unused area at the start to avoid false sharing, you are already approaching the size where passing by reference is more efficient.
2) Wider queues mean more space taken up by the queues, impacting memory use.
3) Copying data into/out of wide queue structures takes time. Apart from the actual CPU use while moving data, the queue remains locked during the copying. This increases contention on the queue, leading to an overall performance hit that depends on the queue width. If there is any deadlock potential in your code, keeping locks for extended periods will not help matters.
4) Passing by value tends to lead to code that is specific to the data size, ie. is fixed at compile-time. Apart from a nasty infestation of templates, this makes it very difficult to tune buffer-sizes etc. at run-time.
5) If the messages are passed by reference and malloced/freed/newed/disposed/GC'd, this can lead to excessive contention on the memory-manager and frequent, wasteful GC. I usually use fixed pools of messages, allocated at startup, specifically to avoid this.
6) Handling byte-streams can be awkward when passing by reference. If a byte-stream is characterized by frequent delivery of single bytes, pass-by-reference is only sensible if the bytes are chunked-up. This can lead to the need for timeouts to ensure that partially-filled messages are dispatched to the next thread in a timely manner. This introduces complication and latency.
7) Pass-by-reference designs are inherently more likely to leak. This can lead to extended test times and overdosing on valgrind - a particularly painful addiction, (another reason I use fixed-size message object pools).
8) Complex messages, eg. those that contain references to other objects, can cause horrendous problems with ownership and lifetime-management if passed by value. Example - a server socket object has a reference to a buffer-list object that contains an array of buffer-instances of varying size, (real example from IOCP server). Try passing that by value..
9) Many OS calls cannot handle anything but a pointer. You cannot PostMessage, (that's a Windows API, for all you happy-feet), even a 256-byte structure by value with one call, (you have just the 2 wParam,lParam integers). Calls that set up asynchronous callbacks often allow 'context data' to be sent to the callback - almost always just one pointer. Any app that is going to use such OS functionality is almost forced to resort to pass by reference.
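The fixed pool from point 5 can be sketched like this, in Rust for consistency with the other examples here, although the idea carries over to C# directly. The names pump_through_pool, POOL_SIZE and BUF_CAPACITY are invented. All buffers are allocated once at startup; ownership of a buffer travels to the consumer through one queue and comes back through another, so steady-state operation should not touch the allocator at all.

```rust
use std::sync::mpsc::channel;
use std::thread;

const POOL_SIZE: usize = 4; // assumed number of pre-allocated messages
const BUF_CAPACITY: usize = 1024; // assumed maximum message size

/// Sends `messages` 16-byte messages through a recycled pool of buffers.
/// Returns (total bytes consumed, buffers back in the pool at the end).
fn pump_through_pool(messages: usize) -> (usize, usize) {
    let (free_tx, free_rx) = channel::<Vec<u8>>(); // empty buffers
    let (work_tx, work_rx) = channel::<Vec<u8>>(); // filled buffers

    // Allocate every buffer up front.
    for _ in 0..POOL_SIZE {
        free_tx.send(Vec::with_capacity(BUF_CAPACITY)).unwrap();
    }

    let consumer = thread::spawn(move || {
        let mut total = 0usize;
        for mut buf in work_rx {
            total += buf.len();
            buf.clear(); // keeps the capacity, drops the contents
            let _ = free_tx.send(buf); // hand the buffer back to the pool
        }
        total
    });

    for i in 0..messages {
        // Blocks when all buffers are in flight, which throttles the producer.
        let mut buf = free_rx.recv().expect("consumer runs while work_tx exists");
        buf.extend_from_slice(&[i as u8; 16]);
        work_tx.send(buf).expect("consumer is still receiving");
    }
    drop(work_tx); // lets the consumer loop finish

    let total = consumer.join().unwrap();
    (total, free_rx.try_iter().count())
}

fn main() {
    let (bytes, recycled) = pump_through_pool(100);
    println!("{} bytes consumed, {} buffers back in the pool", bytes, recycled);
}
```

A side effect of the design is built-in flow control: a producer that outruns the consumer runs out of free buffers and waits, instead of growing the queue without bound.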
Jeremy Friesner's comment seems to be the best as this is a new area, although Martin James's points are also good. I know Microsoft is looking into message passing for their future kernels as we gain more cores.
There seems to be a framework that deals with message passing and it claims to have much better performance than current .Net producer/consumer generics. I'm not sure how it will compare to .Net's Dataflow in 4.5
https://github.com/odeheurles/Disruptor-net