Scala Iterator for multithreading

Scala Iterator for multithreading - multithreading

I am using scala Iterator for waiting loop in synchronized block:
anObject.synchronized {
if (Try(anObject.foo()).isFailure) {
Iterator.continually {
anObject.wait()
Try(anObject.foo())
}.dropWhile(_.isFailure).next()
}
anObject.notifyAll()
}
Is it acceptable to use Iterator with concurrency and multithreading? If not, why? And then what to use and how?
There are some details, if it matters. anObject is a mutable queue. And there are multiple producers and consumers to the queue. So the block above is a code of such producer or consumer. anObject.foo is a common simplified declaration of function that either enqueue (for producer) or dequeue (for consumer) data to/from the queue.

Iterator is mutable internally, so you have to take that into consideration if you use it in multi-threaded environment. If you guaranteed that you won't end up in situation when e.g.
2 threads check hasNext()
one of them calls next() - it happens to be the last element
the other calls next() - NPE
(or similar) then you should be ok. In your example Iterator doesn't even leave the scope, so the errors shouldn't come from Iterator.
However, in your code I see the issue with having aObject.wait() and aObject.notifyAll() next to each other - if you call .wait then you won't reach .notifyAll which would unblock it. You can check in REPL that this hangs:
# val anObject = new Object { def foo() = throw new Exception }
anObject: {def foo(): Nothing} = ammonite.$sess.cmd21$$anon$1#126ae0ca
# anObject.synchronized {
if (Try(anObject.foo()).isFailure) {
Iterator.continually {
anObject.wait()
Try(anObject.foo())
}.dropWhile(_.isFailure).next()
}
anObject.notifyAll()
}
// wait indefinitelly
I would suggest changing the design to NOT rely on wait and notifyAll. However, from your code it is hard to say what you want to achieve so I cannot tell if this is more like Promise-Future case, monix.Observable, monix.Task or something else.
If your use case is a queue, produces and consumers, then it sound like a use case for reactive streams - e.g. FS2 + Monix, but it could be FS2+IO or something from Akka Streams
val queue: Queue[Task, Item] // depending on use case queue might need to be bounded
// in one part of the application
queue.enqueu1(item) // Task[Unit]
// in other part of the application
queue
.dequeue
.evalMap { item =>
// ...
result: Task[Result]
}
.compile
.drain
This approach would require some change in thinking about designing an application, because you would no longer work on thread directly, but rather designed a flow data and declaring what is sequential and what can be done in parallel, where threads become just an implementation detail.

Related

How to leverage concurrency for processing large array of data in rust?

I am a javascript developer and I am building a desktop application using Tauri. I am used to single-threaded languages and trying to get comfortable with the concept of concurrency.
My problem can be summarized as follows. I receive a JSON array from the backend of length 500. I loop through this array and perform some asynchronous operations like making a network call. In the end, I aggregate, structure, and return the data. This entire process takes around 25-35 seconds on my machine.
I wanted to leverage the concept of concurrency to reduce the time required for this operation. One possible solution that I thought of, was to create n number of threads, let's say 8, and parallelly process the data.
async fun main(){
// creating variables which will hold final structured data
let app_data;
let app_to_bundle_map;
// create 8 threads
let thread_1 = thread::spawn(|| {})
let thread_2 = thread::spawn(|| {})
//.. so on
for i in 1..500{
let thread_assigned = i % 8
match thread_assigned {
1 => {
// process_data() on thread 1 and insert into app_data & app_to_bundle_map.
// But how do I assign process_data() to the thread's closure?
// How do I make sure thread 1 is available for use?
}
2 => {
// process_data() on thread 2 and insert into app_data & app_to_bundle_map
}
_ => {
// process_data() on thread _ and insert into app_data & app_to_bundle_map
}
}
}
}
fun process_data(item: String, data: &Data){
// perform some heavy operations like
make_network_call(item.url)
// perform more operations and modify the function argument
more_processing()
}
async fun make_a_network_call(url: String) -> String{
let client = reqwest::Client::builder().build().unwrap();
let mut _res: Result<Response, Error> = client.get(url).send().await;
match _res{
OK(_res) => {
// return response
}
Err(_res) => {
format!(r#"{{"error": "{placeholder}"}}"#, placeholder = _res.to_string())
}
}
}
Another option I thought of was dividing my data of size 500 into 8 parts and then processing them parallelly. Is this a better approach? Or are both the approaches wrong, if so, what do you suggest is the correct method to solve such problems in rust? Overall, my final goal is to reduce the time from 25-35 seconds to less than 10 seconds. Looking forward to everybody's insights. Thank you in advance.

Concurrency is hard [citation needed]. What you are trying to do here is manually handling it. That of course could work, but it will be a pain, especially if you are a beginner. Luckily there are fantastic libraries already out there that provide handling concurrency part for you. You should looking if they already provide what you need.From your description I am not quite sure if you are CPU or IO bound.
If you are CPU bound you should look at rayon crate. It allows to easily iterate in parallel over some iterator.
If you are IO bound you should look at async rust. There are many libraries, that allow to do many things, but I would recommend tokio to begin with. It is production-ready and has great emphasis put on networking. You would need however to learn a bit about async rust, as it requires different thinking model than normal synchronous code.
And regardless of which one you choose you should familiarize yourself with channels. They are a great and easy tool for passing data around, including from one thread to another.

Shared future setting different values with get

I wanted to know if I could do something like this with shared_futures.
Essentially I have two threads that receive a reference to a promise.
Incase any of the thread returns an output by setting a value in the promise I would like to process that output and return back to listening for another assignment to a promise from the remaining thread. Can I do something like this.
void tA(std::promise<string>& p )
{
....
std::string r = "Hello from thread A";
p.set_value(std::move(r));
}
void tB(std::promise<string>& p )
{
...
std::string r = "Hello from thread A";
p.set_value(std::move(r));
}
int main() {
std::promise<std::string> inputpromise;
std::shared_future<std::string> inputfuture(inputpromise.get_future());
//start the thread A
std::thread t(std::bind(&tA,std::ref(inputpromise));
//start the thread B
std::thread t(std::bind(&tA,std::ref(inputpromise));
std::future<std::string> f(p.get_future());
std::string response = f.get(); ------> Will this unblock when one thread sets a value to the promise and can i go back listening for more assignments on the promise ?
if(response=="b")
response = f.get(); -->listen for the assignment from the remaining thread
}

You cannot call promise::set_value (or any equivalent function like set_exception) more than once. Promises are not intended to be used in this way, shared across threads. You have one thread which owns the promise, and one or more locations that can tell if the promise has been satisfied, and if so retrieve the value.
A promise is not the right tool for doing what you want. A future/promise is really a special case of a more general tool: a concurrent queue. In a true concurrent queue, generating threads push values into the queue. Receiving threads can extract values from the queue. A future/promise is essentially a single-element queue.
You need a general concurrent queue, not a single-element queue. Unfortunately, the standard library doesn't have one.

How to freeze a thread and notify it from another?

I need to pause the current thread in Rust and notify it from another thread. In Java I would write:
synchronized(myThread) {
myThread.wait();
}
and from the second thread (to resume main thread):
synchronized(myThread){
myThread.notify();
}
Is is possible to do the same in Rust?

Using a channel that sends type () is probably easiest:
use std::sync::mpsc::channel;
use std::thread;
let (tx,rx) = channel();
// Spawn your worker thread, giving it `send` and whatever else it needs
thread::spawn(move|| {
// Do whatever
tx.send(()).expect("Could not send signal on channel.");
// Continue
});
// Do whatever
rx.recv().expect("Could not receive from channel.");
// Continue working
The () type is because it's effectively zero-information, which means it's pretty clear you're only using it as a signal. The fact that it's size zero means it's also potentially faster in some scenarios (but realistically probably not any faster than a normal machine word write).
If you just need to notify the program that a thread is done, you can grab its join guard and wait for it to join.
let guard = thread::spawn( ... ); // This will automatically join when finished computing
guard.join().expect("Could not join thread");

You can use std::thread::park() and std::thread::Thread::unpark() to achieve this.
In the thread you want to wait,
fn worker_thread() {
std::thread::park();
}
in the controlling thread, which has a thread handle already,
fn main_thread(worker_thread: std::thread::Thread) {
worker_thread.unpark();
}
Note that the parking thread can wake up spuriously, which means the thread can sometimes wake up without the any other threads calling unpark on it. You should prepare for this situation in your code, or use something like std::sync::mpsc::channel that is suggested in the accepted answer.

There are multiple ways to achieve this in Rust.
The underlying model in Java is that each object contains both a mutex and a condition variable, if I remember correctly. So using a mutex and condition variable would work...
... however, I would personally switch to using a channel instead:
the "waiting" thread has the receiving end of the channel, and waits for it
the "notifying" thread has the sending end of the channel, and sends a message
It is easier to manipulate than a condition variable, notably because there is no risk to accidentally use a different mutex when locking the variable.
The std::sync::mpsc has two channels (asynchronous and synchronous) depending on your needs. Here, the asynchronous one matches more closely: std::sync::mpsc::channel.

There is a monitor crate that provides this functionality by combining Mutex with Condvar in a convenience structure.
(Full disclosure: I am the author.)
Briefly, it can be used like this:
let mon = Arc::new(Monitor::new(false));
{
let mon = mon.clone();
let _ = thread::spawn(move || {
thread::sleep(Duration::from_millis(1000));
mon.with_lock(|mut done| { // done is a monitor::MonitorGuard<bool>
*done = true;
done.notify_one();
});
});
}
mon.with_lock(|mut done| {
while !*done {
done.wait();
}
println!("finished waiting");
});
Here, mon.with_lock(...) is semantically equivalent to Java's synchronized(mon) {...}.

thread synchronization: making sure function gets called in order

I'm writing a program in which I need to make sure a particular function is called is not being executed in more than one thread at a time.
Here I've written some simplified pseudocode that does exactly what is done in my real program.
mutex _enqueue_mutex;
mutex _action_mutex;
queue _queue;
bool _executing_queue;
// called in multiple threads, possibly simultaneously
do_action() {
_enqueue_mutex.lock()
object o;
_queue.enqueue(o);
_enqueue_mutex.unlock();
execute_queue();
}
execute_queue() {
if (!executing_queue) {
_executing_queue = true;
enqueue_mutex.lock();
bool is_empty = _queue.isEmpty();
_enqueue_mutex.lock();
while (!is_empty) {
_action_mutex.lock();
_enqueue_mutex.lock();
object o = _queue.dequeue();
is_empty = _queue.isEmpty();
_enqueue_mutex.unlock();
// callback is called when "o" is done being used by "do_stuff_to_object_with_callback" also, this function doesn't block, it is executed on its own thread (hence the need for the callback to know when it's done)
do_stuff_to_object_with_callback(o, &some_callback);
}
_executing_queue = false;
}
}
some_callback() {
_action_mutex.unlock();
}
Essentially, the idea is that _action_mutex is locked in the while loop (I should say that lock is assumed to be blocking until it can be locked again), and expected to be unlocked when the completion callback is called (some_callback in the above code).
This, does not seem to be working though. What happens is if the do_action is called more than once at the same time, the program locks up. I think it might be related to the while loop executing more than once simultaneously, but I just cant see how that could be the case. Is there something wrong with my approach? Is there a better approach?
Thanks

A queue that is not specifically designed to be multithreaded (multi-producer multi-consumer) will need to serialize both eneueue and dequeue operations using the same mutex.
(If your queue implementation has a different assumption, please state it in your question.)
The check for _queue.isEmpty() will also need to be protected, if the dequeue operation is prone to the Time of check to time of use problem.
That is, the line
object o = _queue.dequeue();
needs to be surrounded by _enqueue_mutex.lock(); and _enqueue_mutex.unlock(); as well.

You probably only need a single mutex for the queue. Also once you've dequeued the object, you can probably process it outside of the lock. This will prevent calls to do_action() from hanging too long.
mutex moo;
queue qoo;
bool keepRunning = true;
do_action():
{
moo.lock();
qoo.enqueue(something);
moo.unlock(); // really need try-finally to make sure,
// but don't know which language we are using
}
process_queue():
{
while(keepRunning)
{
moo.lock()
if(!qoo.isEmpty)
object o = qoo.dequeue();
moo.unlock(); // again, try finally needed
haveFunWith(o);
sleep(50);
}
}
Then Call process_queue() on it's own thread.

Question about thread synchronisation

i have a question about thread situation.
Suppose i have 3 threads :producer,helper and consumer.
the producer thread is in running state(and other two are in waiting state)and when its done it calls invoke,but the problem it has to invoke only helper thread not consumer,then how it can make sure that after it releases resources are to be fetched by helper thread only and then by consumer thread.
thanks in advance

Or have you considered, sometimes having separate threads is more of a problem than a solution?
If you really want the operations in one thread to be strictly serialized with the operations in another thread, perhaps the simpler solution is to discard the second thread and structure the code so the first thread does the operations in the order desired.
This may not always be possible, but it's something to bear in mind.

You could have, for instance, two mutexes (or whatever you are using): one for producer and helper, and other for producer and consumer
Producer:
//lock helper
while true
{
//lock consumer
//do stuff
//release and invoke helper
//wait for helper to release
//lock helper again
//unlock consumer
//wait consumer
}
The others just lock and unlock normally.
Another possible approach (maybe better) is using a mutex for producer / helper, and other helper / consumer; or maybe distribute this helper thread tasks between the other two threads. Could you give more details?

The helper thread is really just a consumer/producer thread itself. Write some code for the helper like you would for any other consumer to take the result of the producer. Once that's complete write some code for the helper like you would for any other producer and hook it up to your consumer thread.

You might be able to use queues to help you with this with locks around them.
Producer works on something, produces it, and puts it on the helper queue.
Helper takes it, does something with it, and then puts it on the consumer queue.
Consumer take its, consumes it, and goes on.
Something like this:
Queue<MyDataType> helperQ, consumerQ;
object hqLock = new object();
object cqLock = new object();
// producer thread
private void ProducerThreadFunc()
{
while(true)
{
MyDataType data = ProduceNewData();
lock(hqLock)
{
helperQ.Enqueue(data);
}
}
}
// helper thread
private void HelperThreadFunc()
{
while(true)
{
MyDataType data;
lock(hqLock)
{
data = helperQ.Dequeue();
}
data = HelpData(data);
lock(cqLock)
{
consumerQ.Enqueue(data);
}
}
}
// consumer thread
private void ConsumerThreadFunc()
{
while(true)
{
MyDataType data;
lock(cqLock)
{
data = consumerQ.Dequeue();
}
Consume(data);
}
}
NOTE: You will need to add more logic to this example to make sure usable. Don't expect it to work as-is. Mainly, use signals for one thread to let the other know that data is available in its queue (or as a worst case poll the size of the queue to make sure it is greater than 0 , if it is 0, then sleep -- but the signals are cleaner and more efficient).
This approach would let you process data at different rates (which can lead to memory issues).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Scala Iterator for multithreading - multithreading

Related

How to leverage concurrency for processing large array of data in rust?

Shared future setting different values with get

How to freeze a thread and notify it from another?

thread synchronization: making sure function gets called in order

Question about thread synchronisation

Categories

Resources