Single thread and asynchronous confusion - multithreading

I've read various articles and similar questions, and I know that the two concepts are different, but I don't seem to know the correct answer.
My understanding is that threads are about the number of workers, while sync/async is about the order in which tasks are handled.
I would like to ask if my understanding is correct, with the following example.
Have to make a sandwich.
Bake bread.
Fry the eggs.
Combine.
A thread == a frying pan.
Single-thread & sync:
Put the bread on the frying pan.
just watch until the bread is all baked.
When the bread is baked, remove the bread from the pan and put the eggs on that pan.
Multi-thread & async:
Multiple pans.
Put the bread and eggs on different pans, respectively.
No matter what you put first, take out the completed one.
Single-thread & async:
One pan.
Put the bread on the pan.
The bread is not fully baked yet, but set it aside for a while and put the eggs on the pan.
The eggs aren't fully fried yet, but set them aside and put the bread back on the pan.
Repeat...
Multi-thread & sync:
There are several pans, but we will bake the bread on pan1 first.
When the bread on pan1 is finished, fry the eggs on pan2.
Is my understanding correct?
+) If so, then in the single-thread / async case like JavaScript, a task in the event loop is just waiting in the queue and not progressing, right?

The example is great & funny. For "Multi-thread & async", do not forget to take the other one out later, otherwise it will be burnt ;). Other than that, it seems correct overall to me.
The example is not very good for the "Single-thread & async" case though, and it might be the source of confusion. In practice, the pre-emption (the switch between the eggs and the bread) is not guaranteed in async code; it is generally done cooperatively. The problem is that in your example the bread cannot be cooked while the eggs are, since there is only 1 pan. In practice, a task like an I/O operation can progress without using any core (i.e. pan).
Async is mainly useful to avoid waiting for a task to complete while something else could be done. It is not really useful for computing tasks. For example, if you cook 2 sandwiches for 2 friends and you do not know whether they like eggs or bacon, you need to ask them. This task can be async: you can ask the first, then the second, then cook the bread using 1 or 2 pans, then check the answers before cooking the eggs/bacon. Without async, you would have to wait for the answer (possibly in a thread) before cooking the bread (which is not efficient).
An async operation can be split into multiple parts:
1. starting the request (e.g. sending a message to a friend)
2. checking the state or triggering events (e.g. checking your friend's messages or reacting to them when received)
3. completing the request (e.g. start cooking what your friends want), which can include starting new requests (back to part 1)
Part 2 depends on the language/framework. Moreover, depending on the language/framework, there is sometimes an additional part consisting of waiting for the task to complete (a blocking operation). It can be done by looping on part 2 until the state is completed, but sometimes it can be done a bit more efficiently.
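To make parts 1-3 concrete, here is a minimal TypeScript sketch of the sandwich example, purely as an illustration; askFriend(), cookBread() and cookTopping() are made-up placeholders, and the reply delay is simulated with setTimeout:

// Hypothetical helpers: askFriend() simulates sending a question and receiving
// the reply later; cookBread()/cookTopping() stand in for the actual cooking.
function askFriend(name: string): Promise<"eggs" | "bacon"> {
    console.log(`asking ${name}...`);        // part 1: start the request
    return new Promise(resolve =>
        setTimeout(() => resolve(Math.random() < 0.5 ? "eggs" : "bacon"), 1000)
    );
}

function cookBread(): void { console.log("cooking bread..."); }
function cookTopping(topping: string): void { console.log(`cooking ${topping}...`); }

async function makeSandwiches(): Promise<void> {
    // Part 1: kick off both requests; nothing blocks here.
    const answers = Promise.all([askFriend("Alice"), askFriend("Bob")]);

    // Useful work can be done while the requests are pending (one pan is enough).
    cookBread();

    // Parts 2+3: the runtime notices the replies and resumes here with the results.
    for (const topping of await answers) {
        cookTopping(topping);
    }
}

void makeSandwiches();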

Related

Handling large amounts of arbitrarily scheduled tasks in node

Premise: I have a calendar-like system that allows the creation/deletion of 'events' at a scheduled time in the future. The end goal is to perform an action (send a message/reminder) prior to & at the start of the event. I've done a bit of searching & have narrowed it down to what seem to be my two most viable choices:
Unix Cron Jobs
Bree
I'm not quite sure which will best suit my end goal though, and additionally, it feels like there must be some additional established ways to do things like this that I just don't have proper knowledge of, or that I'm entirely skipping over.
My questions:
If, theoretically, the system were to be handling an arbitrarily large amount of 'events', all for arbitrary times in the future, which of these options is more practical system-resource-wise? Is my concern in this regard even valid?
Is there any foreseeable problem with filling up a crontab with a large volume of jobs - or, in Bree's case, scheduling a large number of jobs?
Is there a better idea I've just completely missed so far?
This mainly stems from Bree's use of node 'worker threads'. I'm very unfamiliar with this concept and concerned that since a 'worker thread' is spawned for every job, I could very quickly tie up all of my available threads and grind... something, to a halt. This, however, sounds somewhat silly & possibly wrong (possibly indicative of my complete lack of knowledge here), & thus, my question.
Thanks, Stark.
For a calendar-like system, it seems you could query your database to find all events occurring in the next hour, then create a setTimeout() for each one of those. Then, an hour later, do the same thing again. Then, upon any server restart, do the same thing again. You don't really need to worry about events that aren't imminent. They can just sit in the database until shortly before their time. You will just need an efficient way to query the database to find events that are imminent and use a timer for them.
WorkerThreads are fairly heavyweight items in nodejs, as they create a whole separate heap and a whole new instance of a V8 interpreter. You would definitely not want a separate WorkerThread for each event.
I should add that timers in nodejs are very lightweight items and it is no problem to have lots of them. They are just stored in a sorted linked list, and only the insertion of a new timer takes a little more time (to do an insertion sort as it is added to the list) as the list gets longer. There is no continuous run-time overhead from having lots of timers. The event loop then just checks the first item in the linked list to see if it's time for the next timer to fire. If so, it removes it from the head of the list and calls its callback. If not, it goes about the rest of the event loop work items and will check the first item in the list again the next time through the event loop.
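A minimal sketch of that approach in Node/TypeScript; getEventsInNextHour() and sendReminder() are hypothetical placeholders for your real database query and reminder action:

interface CalendarEvent { id: string; startsAt: Date; }

// Placeholder for the real database query described above.
async function getEventsInNextHour(): Promise<CalendarEvent[]> {
    return []; // e.g. SELECT ... WHERE starts_at < now() + interval '1 hour'
}

// Placeholder for the reminder/message action.
async function sendReminder(event: CalendarEvent): Promise<void> {
    console.log(`reminder for ${event.id}`);
}

async function scheduleImminentEvents(): Promise<void> {
    for (const event of await getEventsInNextHour()) {
        const delay = Math.max(0, event.startsAt.getTime() - Date.now());
        // Node timers are lightweight; one per imminent event is fine.
        setTimeout(() => { void sendReminder(event); }, delay);
    }
}

// Run once at startup (covers server restarts), then repeat every hour.
void scheduleImminentEvents();
setInterval(() => void scheduleImminentEvents(), 60 * 60 * 1000);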

processing two of many concurrent results already, optimization - architecture? algorithm?

I am downloading data from about two dozen sources via threading.Thread(). The special property of my case is that data processing can already start when certain two of those sources are ready (not any two, but two predefined ones, pairwise).
The first approach is to download all sources, then
for t in threads: t.join()
to wait until all downloads have finished, and only then start the data processing, because then I can be sure to have everything I need.
That is already fast, but the data processing only starts after all sources have finished downloading. How do I get the last bit of optimization now?
I am wondering if there is a canonical CS way to solve this problem of starting tasks when the input collection is only partly ready.
Thanks!
In general CS terms what you are looking for is language features that enable composition of tasks.
For example, in C# if you use Task instead of Thread you can use Task.WhenAll() to wait for each pair of subtasks to complete (as a Task itself) and then ContinueWith whatever you need to do with that pair of results.
See https://msdn.microsoft.com/en-us/library/hh194874(v=vs.110).aspx
Task a = ...
Task b = ...
Task ab = Task.WhenAll(a,b).ContinueWith(...);
...
var result = Task.WhenAll(ab, bc, ...).Result;
You could also look into Reactive Extensions, which are becoming available in many programming languages. These enable composition of observable sequences of results in a push model. For example, zip in RxJS would allow you to trigger the next step whenever both of its input sequences have produced a result.
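In TypeScript/JavaScript, the same composition idea maps onto Promise.all(); in the sketch below, download() and processPair() are hypothetical stand-ins for the actual fetch and per-pair processing:

// Start all downloads, but process each predefined pair as soon as
// *that* pair is complete - no waiting on the other pairs.
async function download(source: string): Promise<string> {
    const res = await fetch(source);
    return res.text();
}

async function processPair(a: string, b: string): Promise<void> {
    console.log(`processing ${a.length + b.length} bytes`);
}

async function run(pairs: [string, string][]): Promise<void> {
    const pairTasks = pairs.map(([srcA, srcB]) =>
        Promise.all([download(srcA), download(srcB)])
            .then(([a, b]) => processPair(a, b))
    );
    await Promise.all(pairTasks); // optionally wait for everything at the end
}

// Example call with made-up URLs:
void run([["https://example.com/a", "https://example.com/b"]]);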

What are the benefits of coroutines?

I've been learning some Lua for game development. I heard about coroutines in other languages but really only came across them in Lua. I just don't really understand how useful they are; I've heard a lot of talk about how they can be a way to do multi-threaded things, but aren't they run in order? So what benefit would there be over normal functions that also run in order? I'm just not getting how different they are from functions, except that they can pause and let another run for a second. The use case scenarios don't seem that huge to me.
Anyone care to shed some light as to why someone would benefit from them?
Especially insight from a game programming perspective would be nice^^
OK, think in terms of game development.
Let's say you're doing a cutscene or perhaps a tutorial. Either way, what you have is an ordered sequence of commands sent to some number of entities. An entity moves to a location, talks to a guy, then walks elsewhere. And so forth. Some commands cannot start until others have finished.
Now look back at how your game works. Every frame, it must process AI, collision tests, animation, rendering, and sound, among possibly other things. You can only think once per frame. So how do you put this kind of code in, where you have to wait for some action to complete before doing the next one?
If you built a system in C++, what you would have is something that ran before the AI. It would have a sequence of commands to process. Some of those commands would be instantaneous, like "tell entity X to go here" or "spawn entity Y here." Others would have to wait, such as "tell entity Z to go here and don't process any more commands until it has gone here." The command processor would have to be called every frame, and it would have to understand complex conditions like "entity is at location" and so forth.
In Lua, it would look like this:
local entityX = game:GetEntity("entityX");
entityX:GoToLocation(locX);
local entityY = game:SpawnEntity("entityY", locY);
local entityZ = game:GetEntity("entityZ");
entityZ:GoToLocation(locZ);
repeat
    coroutine.yield();  -- give control back to the engine until the next frame
until (entityZ:isAtLocation(locZ));
return;
On the C++ side, you would resume this script once per frame until it is done. Once it returns, you know that the cutscene is over, so you can return control to the user.
Look at how simple that Lua logic is. It does exactly what it says it does. It's clear, obvious, and therefore very difficult to get wrong.
The power of coroutines is in being able to partially accomplish some task, wait for a condition to become true, then move on to the next task.
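For readers coming from JavaScript, the resume-once-per-frame driver described above can be sketched with a TypeScript generator; everything here (Entity, the fake entity, the 16 ms tick) is purely illustrative:

// A generator plays the role of the Lua coroutine; the game loop resumes it
// once per "frame" until it reports that it is done.
interface Entity {
    goToLocation(loc: string): void;
    isAtLocation(loc: string): boolean;
}

function* cutscene(entityZ: Entity): Generator<void> {
    entityZ.goToLocation("locZ");
    // Yield once per frame until the entity has arrived.
    while (!entityZ.isAtLocation("locZ")) {
        yield;
    }
}

// The driver: setTimeout stands in for the engine's per-frame tick.
function runPerFrame(script: Generator<void>, onDone: () => void): void {
    if (script.next().done) {
        onDone(); // cutscene finished: hand control back to the player
    } else {
        setTimeout(() => runPerFrame(script, onDone), 16);
    }
}

// Tiny fake entity so the sketch actually runs.
let stepsLeft = 0;
const fakeEntity: Entity = {
    goToLocation: () => { stepsLeft = 3; },
    isAtLocation: () => --stepsLeft <= 0,
};
runPerFrame(cutscene(fakeEntity), () => console.log("cutscene over"));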
Coroutines in a game:
Easy to use, easy to screw up when used in many places.
Just be careful not to use them in many places.
Don't make your entire AI code dependent on coroutines.
Coroutines are good for making a quick fix when a state is introduced which did not exist before.
This is exactly what Java does with Sleep() and Wait().
Both functions are the best ways to make it impossible to debug your game.
If I were you I would completely avoid any code which has to use a Wait() function like a Coroutine does.
The OpenGL API is something you should take note of. It never uses a wait() function; instead it uses a clean state machine which knows exactly what state each object is in.
If you use coroutines you end up with so many stateless pieces of code that it will surely be overwhelming to debug.
Coroutines are good when you are making an application like a text editor, a bank application, a server, a database, etc. (not a game).
Bad when you are making a game where anything can happen at any point of time, you need to have states.
So, in my view coroutines are a bad way of programming and an excuse to write small stateless code.
But that's just me.
It's more like a religion. Some people believe in coroutines, some don't. The usecase, the implementation and the environment all together will result into a benefit or not.
Don't trust benchmarks which try to prove that coroutines on a multicore CPU are faster than a loop in a single thread: it would be a shame if they were slower!
If this later runs on some hardware where all cores are always under load, it will turn out to be slower - oops...
So there is no benefit per se.
Sometimes it's convenient to use. But if you end up with tons of coroutines yielding and states that went out of scope you'll curse coroutines. But at least it isn't the coroutines framework, it's still you.
We use them on a project I am working on. The main benefit for us is that sometimes with asynchronous code, there are points where it is important that certain parts are run in order because of some dependencies. If you use coroutines, you can force one process to wait for another process to complete. They aren't the only way to do this, but they can be a lot simpler than some other methods.
I'm just not getting how different they are from functions except that
they can pause and let another run for a second.
That's a pretty important property. I worked on a game engine which used them for timing. For example, we had an engine that ran at 10 ticks a second, and you could call WaitTicks(x) to wait x ticks, and in the user layer you could call WaitFrames(x) to wait x frames.
Even professional native concurrency libraries use the same kind of yielding behaviour.
Lots of good examples for game developers. I'll give another in the application extension space. Consider the scenario where the application has an engine that can run a user's routines in Lua while doing the core functionality in C. If the user needs to wait for the engine to get to a specific state (e.g. waiting for data to be received), you either have to:
1. multi-thread the C program to run Lua in a separate thread and add in locking and synchronization methods,
2. abend the Lua routine and retry from the beginning, passing a state to the function so it can skip ahead, lest you rerun some code that should only be run once, or
3. yield the Lua routine and resume it once the state has been reached in C.
The third option is the easiest for me to implement, avoiding the need to handle multi-threading on multiple platforms. It also allows the user's code to run unmodified, appearing as if the function they called took a long time.
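The same yield/resume idea can be sketched with a TypeScript generator, where the host resumes the user routine once the awaited state is reached; all names here are made up for illustration:

// Option 3 in miniature: the user routine yields, and the host resumes it
// when the data it was waiting for has arrived.
type UserRoutine = Generator<"wait-for-data", void, string>;

function* userRoutine(): UserRoutine {
    console.log("doing some setup work");
    const data = yield "wait-for-data"; // suspend until the host has the data
    console.log(`continuing with ${data}`);
}

class Host {
    private suspended: UserRoutine | null = null;

    run(routine: UserRoutine): void {
        const step = routine.next();
        if (!step.done && step.value === "wait-for-data") {
            this.suspended = routine; // park it until the data arrives
        }
    }

    // Called by the host when the awaited state is reached.
    onDataReceived(data: string): void {
        this.suspended?.next(data);
        this.suspended = null;
    }
}

const host = new Host();
host.run(userRoutine());
host.onDataReceived("42"); // resumes the routine exactly where it yielded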

Progress bar and multiple threads, decoupling GUI and logic - which design pattern would be the best?

I'm looking for a design pattern that would fit my application design.
My application processes large amounts of data and produces some graphs.
Data processing (fetching from files, CPU-intensive calculations) and graph operations (drawing, updating) are done in separate threads.
Graph can be scrolled - in this case new data portions need to be processed.
Because there can be several series on a graph, multiple threads can be spawned (two threads per series: one for the dataset update and one for the graph update).
I don't want to create multiple progress bars. Instead, I'd like to have a single progress bar that informs about the global progress. At the moment I can think of MVC and Observer/Observable, but it's a little bit blurry :) Maybe somebody could point me in the right direction, thanks.
I once spent the best part of a week trying to make a smooth, non-hiccupy progress bar over a very complex algorithm.
The algorithm had 6 different steps. Each step had timing characteristics that were seriously dependent on A) the underlying data being processed, not just the "amount" of data but also the "type" of data and B) 2 of the steps scaled extremely well with increasing number of cpus, 2 steps ran in 2 threads and 2 steps were effectively single-threaded.
The mix of data effectively had a much larger impact on execution time of each step than number of cores.
The solution that finally cracked it was really quite simple. I made 6 functions that analyzed the data set and tried to predict the actual run-time of each analysis step. The heuristic in each function analyzed both the data sets under analysis and the number of cpus. Based on run-time data from my own 4 core machine, each function basically returned the number of milliseconds it was expected to take, on my machine.
f1(..) + f2(..) + f3(..) + f4(..) + f5(..) + f6(..) = total runtime in milliseconds
Given this information, you effectively know what percentage of the total execution time each step is supposed to take. Now, if step1 is supposed to take 40% of the execution time, you basically need to find out how to emit 40 1% events from that algorithm. Say the for-loop is processing 100,000 items; you could probably do:
for (int i = 0; i < numItems; i++) {
    // emit one 1% event every (numItems / percentageOfTotalForThisStep) items,
    // i.e. 40 events in total if this step accounts for 40% of the run time
    if (i % (numItems / percentageOfTotalForThisStep) == 0) emitProgressEvent();
    // .. do the actual processing ..
}
This algorithm gave us a silky smooth progress bar that performed flawlessly. Your implementation technology can have different forms of scaling and features available in the progress bar, but the basic way of thinking about the problem is the same.
And yes, it did not really matter that the heuristic reference numbers were worked out on my machine - the only real problem is if you want to change the numbers when running on a different machine. But you still know the ratio (which is the only really important thing here), so you can see how your local hardware runs differently from the one I had.
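To make the mapping from per-step estimates to a single bar concrete, here is a minimal TypeScript sketch; the estimate values are invented stand-ins for what f1()..f6() would return:

// One estimate per step, in milliseconds (made-up numbers).
const estimatedMs = [1200, 800, 3000, 500, 2500, 2000];
const totalMs = estimatedMs.reduce((a, b) => a + b, 0);

// stepProgress[i] is the fraction (0..1) of step i already completed,
// as reported by that step's 1% events.
function globalProgress(stepProgress: number[]): number {
    const doneMs = stepProgress.reduce((sum, p, i) => sum + p * estimatedMs[i], 0);
    return (doneMs / totalMs) * 100; // 0..100 for the progress bar
}

// e.g. steps 1 and 2 finished, step 3 half way, the rest not started:
console.log(globalProgress([1, 1, 0.5, 0, 0, 0]).toFixed(1) + "%"); // "35.0%"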
Now the average SO reader may wonder why on earth someone would spend a week making a smooth progress bar. The feature was requested by the head salesman, and I believe he used it in sales meetings to get contracts. Money talks ;)
In situations with threads or asynchronous processes/tasks like this, I find it helpful to have an abstract type or object in the main thread that represents (and ideally encapsulates) each process. So, for each worker thread, there will presumably be an object (let's call it Operation) in the main thread to manage that worker, and obviously there will be some kind of list-like data structure to hold these Operations.
Where applicable, each Operation provides the start/stop methods for its worker, and in some cases - such as yours - numeric properties representing the progress and expected total time or work of that particular Operation's task. The units don't necessarily need to be time-based; if you know you'll be performing 6,230 calculations, you can just think of these properties as calculation counts. Furthermore, each task will need some way of notifying its owning Operation of its current progress, using whatever mechanism is appropriate (callbacks, closures, event dispatching, or whatever your programming language/threading framework provides).
So while your actual work is being performed off in separate threads, a corresponding Operation object in the "main" thread is continually being updated/notified of its worker's progress. The progress bar can update itself accordingly, mapping the total of the Operations' "expected" times to its total, and the total of the Operations' "progress" times to its current progress, in whatever way makes sense for your progress bar framework.
Obviously there's a ton of other considerations/work that needs be done in actually implementing this, but I hope this gives you the gist of it.
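A rough TypeScript sketch of that structure; Operation, ProgressAggregator and the numbers below are purely illustrative, not any particular framework's API:

class Operation {
    constructor(public expected: number, public progress: number = 0) {}
    // Called (e.g. via a callback from the worker) as the work advances.
    update(progress: number): void {
        this.progress = Math.min(progress, this.expected);
    }
}

class ProgressAggregator {
    private operations: Operation[] = [];
    add(op: Operation): void { this.operations.push(op); }

    // Map total progress over total expected work to 0..100.
    overallPercent(): number {
        const expected = this.operations.reduce((s, op) => s + op.expected, 0);
        const done = this.operations.reduce((s, op) => s + op.progress, 0);
        return expected === 0 ? 0 : (done / expected) * 100;
    }
}

// Usage: one Operation per worker thread; the GUI observes or polls
// overallPercent() to redraw its single progress bar.
const aggregator = new ProgressAggregator();
const dataOp = new Operation(6230);  // e.g. a calculation count
const graphOp = new Operation(1000); // e.g. draw units
aggregator.add(dataOp);
aggregator.add(graphOp);
dataOp.update(3115);
console.log(aggregator.overallPercent().toFixed(1) + "%"); // ~43.1%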
Multiple progress bars aren't such a bad idea, mind you. Or maybe a complex progress bar that shows several threads running (like download manager programs sometimes have). As long as the UI is intuitive, your users will appreciate the extra data.
When I try to answer such design questions I first try to look at similar or analogous problems in other applications, and how they're solved. So I would suggest you do some research by considering other applications that display complex progress (like the download manager example) and try to adapt an existing solution to your application.
Sorry I can't offer more specific design, this is just general advice. :)
Stick with Observer/Observable for this kind of thing. Some object observes the various series processing threads and reports status by updating the summary bar.

Multithreading or green threading in actionscript?

I was wondering if there are any code or class libraries out there on how to implement multithreading or "green threading" in ActionScript.
As you might have seen, Scott Peterson is developing some kind of toolset, but I haven't found any more info on this other than his presentation at the Adobe MAX Chicago event.
Regards Niclas
Here's a Green Threading lib from Drew Cummins:
http://blog.generalrelativity.org/?p=29
There's no built-in way to do green threading in ActionScript. You have to write code to handle it.
Make a function that performs one iteration of whatever operation you want to do. It should return true or false depending on whether its job is done. Now you have to compute the time interval left until the next screen update on the ENTER_FRAME event. This can be done using flash.utils.getTimer.
var start:int = getTimer();
// "thread" is a UI component added to the system manager that is redrawn each frame
var fr:Number = Math.floor(1000 / thread.systemManager.stage.frameRate);
var due:int = start + fr;
Keep on executing your function while checking the function's return value each time and checking if due time has been crossed by comparing getTimer() with due.
This has been implemented into a usable class by Alex Harui in the blog entry - Threads in ActionScript
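The same time-slicing idea, sketched in TypeScript for comparison; doOneIteration() is a stand-in for the "one iteration" function described above, and the 30 fps budget is arbitrary:

// Cooperative time-slicing: run as many iterations as fit in this frame's
// time budget, then yield to the screen update and continue later.
let remaining = 1_000_000;
function doOneIteration(): boolean {
    remaining--;             // pretend to do one small chunk of work
    return remaining <= 0;   // true when the whole job is finished
}

const frameBudgetMs = Math.floor(1000 / 30); // roughly one frame at 30 fps

function runSlice(): void {
    const due = Date.now() + frameBudgetMs;
    let done = false;
    while (!done && Date.now() < due) {
        done = doOneIteration();
    }
    if (!done) {
        setTimeout(runSlice, 0); // come back after the next screen update
    }
}

runSlice();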
It's an old article, but quasimondo's method of launching multiple swfs and then sharing the data over a LocalConnection may also be of interest. They were saying that the back and forth of using the LocalConnection may eat up a few cycles, but if the iterations being processed are complex enough it shouldn't be too much of a problem.
I'm a graphics guy, not a programmer, so I'm not sure this will help you. BUT!
I make all my GUIs multi-frame "movies" and write each gui thread on a different frame. Make sure that you only have 1-3 threads, and set your FPS to 30 or 60.
This is useful for little projects because it's bug-resistant and the implementation is done for you.
