scala actors vs threads and blocking IO

scala actors vs threads and blocking IO - multithreading

As I understand it, actors are basically lightweight threads implemented on top of threads, running many actors on a small pool of shared threads.
Given that's the case, using blocking operations in an actor blocks the underlying thread. This is not a correctness problem because the actor library will spawn more threads as necessary (is that right?) but then you end up with lots and lots of threads, negating the benefit of using actors in the first place.
Given that, how do actors work when you need to do such IO operations? Are there operations which "actor-block", suspending the actor while letting the thread go on to other operations (much as blocking operations suspend the thread while letting the CPU go on to other operations), or is everything written in CPS, with chained actors? Or are actors simply not a good fit for this sort of long-running operation?
Background: I have experience writing multithreaded stuff the classic way, and understand prettywell how CPS/event loops work, but have absolutely no experience working with actors, and just want to understand, on a high level, how they fit in, before I dive into the code.

This is not a correctness problem because the actor library will spawn more threads as necessary (is that right?)
So far as I understand, that is not right. The actor is blocked, and sending another message to it causes that message to sit in the actors mailbox until that actor can receive it or react to the message.
In Programming in Scala (1), it explicitly states that actors should not block. If an actor needs to do something long running it should pass the work to a second actor, so that the main actor can free itself up and go read more messages from its mailbox. Once the worker has completed the work, it can signal that fact back to the main actor, which can finish doing whatever it has to do.
Since workers too have mailboxes, you will end up with several workers busily working their way through the work. If you don't have enough CPU to handle that, their queues will just get bigger and bigger. Eventually you can scale out by using remote actors. Akka might be more useful in such cases.
(1) Chapter 32.5 of Programming in Scala (Odersky, Second edition, 2010)
EDIT: I found this:
The scheduler method of the Actor trait can be overridden to return a ResizableThreadPoolScheduler, which resizes its thread pool to avoid starvation caused by actors that invoke arbitrary blocking methods.
Found it at: http://www.scala-lang.org/api/current/scala/actors/Actor.html
So, that means depending on the scheduler impl you set, perhaps the pool used to run the actors will be increased. I was wrong when I said you were wrong :-) The rest of the answer still holds true.

How thing work in erlang is that all blocking operation should be don by send message because when your actor is blocked waiting for a message it's yielding the thread to other actor.
So if you want to do some blocking operation like reading from a file you should do a FileReader actor that use Non-bloking api to read and write from file. And have your other actor use this actor (send and receive message to it) as an api to read and write to file.

Related

What really is asynchronous computing?

I've been reading (and working) quite a bit with massively multi-threaded applications, and with IO, and I've found that the term asynchronous has become some sort of catch-all for multiple vague ideas. I'm wondering if I understand it correctly. The way I see it is that there are two main branches of "asynchronicity".
Asynchronous I/O. Such as network read/write. What this really boils down to is efficient parallel processing between multiple CPUs, such as your main CPU and your NIC CPU. The idea is to have multiple processors running in parallel, exchanging data, without blocking waiting for the other to finish and return the results of it's job.
Minimizing context-switching penalties by minimizing use of threads. This seems to be what the .NET framework is focusing on with it's async/await features. Instead of spawning/closing/blocking threads, break parallel jobs into tasks, and use a software task scheduler to keep a pool of threads as busy as possible without resorting to spawning new threads.
These seem like two entirely separate concepts with no similarities that could tie them together, but are both referred to by the same "asynchronous computing" vocabulary.
Am I understanding all of this correctly?

Asynchronous basically means not blocking, i.e. not having to wait for an operation to complete.
Threads are just one way of accomplishing that. There are many ways of doing this, from hardware level, SO level, software level.
Someone with more experience than me can give examples of asyncronicity not related to threads.

What this really boils down to is efficient parallel processing between multiple CPUs, such as your main CPU and your NIC CPU. The idea is to have multiple processors running in parallel...
Asynchronous programming is not all about multi-core CPU's and parallelism: consider a single core CPU, with just one thread creating email messages and sends them. In a synchronous fashion, it would spend a few micro seconds to create the message, and a lot more time to send it through network, and only then create the next message. But in asynchronous program, the thread could create a new message while the previous one is being sent through the network. One implementation for that kind of program can be using .NET async/await feature, where you can have just one thread. But even a blocking IO program could be considered asynchronous: If the main thread creates the messages and queues them in a buffer, which another thread pulls them from and sends them in a blocking IO way. From the main thread's point of view - it's completely async.
.NET async/await just uses the OS api's which are already async - reading /writing a file, send /receive data through network, they are all async anyway - the OS doesn't block on them (the drivers themselves are async).

Asynchronous is a general term, which does not have widely accepted meaning. Different domains have different meanings to it.
For instance, async IO means that instead of blocking on IO call, something else happens. Something else can be really different things, but it usually involves some sort of notification of call completion. Details might differ. For instance, a notification might be built into the call itself - like in MS Completeion Ports (if memory serves). Or, it can be something verify do before you make a call so that the call can not block - this is what poll() and friends do.
Async might also well mean simply parallel execution. For instance, one might say that 'database is updated asynchronously' meaning that there is a dedicated thread which handles database connectivity, and that thread does not slow down the main processing thread.

multithreading and user interface

Ok, here we go.
recently I got fond of HCI topics on interface design.
I found out that there could be some way to implement a multithread interface in case for reducing the delay of system respons.
Morover. this may also be possible to say that designing a user interface has tight relationship with STD.
therefore, I wonder if there is any method or techniques to find independant part of ,say,a given STD of a UI that can be seen as threads?

A multi threaded interface in most cases is not fundamentally different to it's single threaded counterpart. There is still a single thread listening on interface events and it will still run handlers as events happen. However the difference comes down to what is contained in these handlers. A simple single threaded event loop would look as below:
A multi-threaded UI is a little different but the principal is the same:
Effectively long processes which are initiated in worker threads which can then report back to them main UI thread so it can report completion.
In relation to a State Transition Diagram, multi-threading complicates things somewhat however there a number of ways to still accomplish this. The first is to simply map each (potential) thread's path separately, this requires decisions for if any threads are finished at the points the main thread checks. It is also possible to use a thread state transition diagram which can demonstrate many threads in a single diagram but is sometimes harder to parse.
Now regarding using a state transition diagram to help implement threading in a user interface program you simply have to locate tasks between the event handler and returning to listening which are time consuming and likely to block. You then need to dispatch these tasks as a thread, optionally adding a completion callback in the main thread.
If I have missed anything please comment below, otherwise I hope this is helpful.

How does process blocking apply to a multi-threaded process?

I've learned that a process has running, ready, blocked, and suspended states. Threads also have these states except for suspended because it lives in the process's address space.
A process blocks most of the time when it is doing a blocking i/o or waiting for an event.
I can easily picture out a process getting blocked if its single-threaded or if it follows a one-to-many model, but how does it work if the process is multi-threaded?
For example:
I have a process with two threads in a system that follows a one-to-one model. One handles the gui and the other handles the blocking i/o. I know the process remains responsive because the other thread handles the i/o.
So is there by any chance the process gets blocked or should I just rule it out in this case?
I'm just getting into these stuff so forgive me If I haven't understand some of the important details yet.

Let's say you have a work queue where the UI thread schedules work to be done and the I\O thread looks there for work to do. The work queue itself is data that is read and modified from both threads, therefor you must synchronize access somehow or race conditions result.
The naive approach is to synchronize access to the queue using a lock (aka critical section). If the I\O thread acquires the lock and then blocks, the UI thread will only remain responsive until it decides it needs to schedule work and tries to acquire the lock. A better approach is to use a lock-free queue about which much has been written and you can easily search for more info.
But to answer your question, yes, it is still much easier than you might think to cause UI to stutter / hang even when using multiple threads. There are various libraries that make it easier or harder to solve this problem, so depending on your OS and language of choice, there may be something better than just OS primitives. Win32 (from what I remember) doesn't it make it very easy at all despite having all sorts of synchronization primitives. Pthreads and Boost never seemed very straightforward to me either. Apple's GCD makes it semantically much easier to express what you want (in my opinion), though there are still pitfalls one must be aware of (such as scheduling too many blocking operations on a single work queue to be done in parallel and causing the processor to thrash when they all wake up at the same time).
My advice is to just dive in and write lots of multithreaded code. It can be tough to debug but you will learn a lot and eventually it becomes second nature.

How can akka actor interact between threads

I've read akka documentation and can't produce clean understanding of thread interaction while using akka. Docs may omit this thing as obvious but it is not so obvious for me.
All akka actors seemed to be run in same thread they are called. I see actors as co-procedures that just had own stack reset each time receive called.
You may perform a huge chain of actors switching in straight line. Each receive perform small non-blocking operation and force another receive to work further. There is no event loop, that can handle messages outside of the actor system.
I'd like to catch a request from other thread, perform control operations, and wait for another message.
There are some use cases that outline my needs.
There is thread that constantly polling data from some sources. Once data matches pattern it invokes event-driven handler based on actors. Logical controller makes a decision and passes it workers. There should be two persistent thread. One threads works constantly on polling and another works asynchronously to control it work. You should not let akka actors to first thread since they broke polling periods and first thread should not block actors so they need another thread.
There is some kind of two-side board game. One side has a controller thread that schedules calculation time works interacts with board server and etcetera. Other thread is a heavy calculating thread that loops over different variants and could not be written in akka since it has blocking nature
I aware of existing akka futures, but they represent a working task that run once fired and shutting down after performing their goal. The futures are well combined with akka actors, but can not express looped working threads.
Akka actor system incorporates different kinds of network event loops. You may use its built-in remote actor system or well known 0mq protocol. But using network for thread interactions seems like overdoing for me.
What is the supposed way to glue non-akka thread with akka one? Should I wrote a couple of special procedures to perform message passing in thread-safe way?

If you need polling, then the polling thread should just turn whatever is polled into a message and fire it off to an actor.
I find it more useful to use an Actor with a receiveTimeout to do non-blocking polling at an interval, and when there's something that gets polled, it will publish it to some other actor, or perhaps even its ActorSystems' EventStream, for true pub-sub action.

How does Asynchronous programming work in a single threaded programming model?

I was going through the details of node.jsand came to know that, It supports asynchronous programming though essentially it provides a single threaded model.
How is asynchronous programming handled in such cases? Is it like runtime itself creates and manages threads, but the programmer cannot create threads explicitly? It would be great if someone could point me to some resources to learn about this.

Say it with me now: async programming does not necessarily mean multi-threaded.
Javascript is a single-threaded runtime - you simply aren't able to create new threads in JS because the language/runtime doesn't support it.
Frank says it correctly (although obtusely) In English: there's a main event loop that handles when things come into your app. So, "handle this HTTP request" will get added to the event queue, then handled by the event loop when appropriate.
When you call an async operation (a mysql db query, for example), node.js sends "hey, execute this query" to mysql. Since this query will take some time (milliseconds), node.js performs the query using the MySQL async library - getting back to the event loop and doing something else there while waiting for mysql to get back to us. Like handling that HTTP request.
Edit: By contrast, node.js could simply wait around (doing nothing) for mysql to get back to it. This is called a synchronous call. Imagine a restaurant, where your waiter submits your order to the cook, then sits down and twiddles his/her thumbs while the chef cooks. In a restaurant, like in a node.js program, such behavior is foolish - you have other customers who are hungry and need to be served. Thus you want to be as asynchronous as possible to make sure one waiter (or node.js process) is serving as many people as they can.
Edit done
Node.js communicates with mysql using C libraries, so technically those C libraries could spawn off threads, but inside Javascript you can't do anything with threads.

Ryan said it best: sync/async is orthogonal to single/multi-threaded. For single and multi-threaded cases there is a main event loop that calls registered callbacks using the Reactor Pattern. For the single-threaded case the callbacks are invoked sequentially on main thread. For the multi-threaded case they are invoked on separate threads (typically using a thread pool). It is really a question of how much contention there will be: if all requests require synchronized access to a single data structure (say a list of subscribers) then the benefits of having multiple threaded may be diminished. It's problem dependent.
As far as implementation, if a framework is single threaded then it is likely using poll/select system call i.e. the OS is triggering the asynchronous event.

To restate the waiter/chef analogy:
Your program is a waiter ("you") and the JavaScript runtime is a kitchen full of chefs doing the things you ask.
The interface between the waiter and the kitchen is mediated by queues so requests are not lost in instances of overcapacity.
So your program is assigned one thread of execution. You can only wait one table at a time. Each time you want to offload some work (like making the food/making a network request), you run to the kitchen and pin the order to a board (queue) for the chefs (runtime) to pick-up when they have spare capacity. The chefs will let you know when the order is ready (they will call you back). In the meantime, you go wait another table (you are not blocked by the kitchen).
So the accepted answer is misleading. The JavaScript runtime is definitionally multithreaded because I/O does not block your JavaScript program. As a waiter you can continue serving customers, while the kitchen cooks. That involves at least two threads of execution. The reality is that the runtime will maintain several threads of execution behind the scenes, in order to efficiently serve the single thread directly corresponding to your script.
By design, only one thread of execution is assigned to the synchronous running of your JavaScript program. This is a good thing because it makes your program easier to reason about than having to handle multiple threads of execution yourself. Don't worry: your JavaScript program can still get plenty complicated though!

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string