I’m trying to create an Observable from an iterator that is not thread safe. I’m creating the iterator using Observable.resource, and then creating an Observable around this iterator using Observable.fromIterator. I then use .executeWithModel(ExecutionModel.SynchronousExecution) which I thought would force the resulting observable to run on the same thread even with a Scheduler that manages multiple threads.
It turns out that this is not the case, as my observable fails sometimes if it is zipped with another observable for example, and I suspect that I may get other issues like this if I use a global, multi-threaded scheduler.
One thing that seems to work is to run this observable on a single-threaded scheduler, but this makes me either share one scheduler across every execution of this observable or create a new one every time. Is there a better way to make sure that my observable always runs on the thread it was started on?
In a Rust application that is:
Synchronous in the sense of not using "async"
multi-threaded using std::thread
threads are communicating via channels
the "anyhow" crate is being used to annotate and propagate Results
I am propagating all errors up to the main thread, but I only see the Error that is hit by the main thread. This usually happens before I join the child threads, so I don't see the actual root cause.
What minimum-boilerplate modification can I make to see the Errors from multiple threads?
(I'll put some ideas I have in answers, but I'm hoping there is something better.)
I could use my main thread only for supervising child threads, aggregate all of their Results in some kind of Vec as I join them, filter it, and write custom code to print it.
This still feels like more work than should be necessary; I'm not the first person to write a threaded Rust application that handles errors.
I could let threads panic and "propagate the panics":
https://doc.rust-lang.org/beta/std/thread/type.Result.html
But this is ugly:
It introduces a difference between how errors are handled in child threads vs. the main thread.
At every fallible call site in my original code I have to add .unwrap().
If I unwrap() every error in the child threads, I might as well not even be using Result error handling, because everything will be returning Ok() or panicking. If I make that transformation, then I change the signature of all my existing functions, which is also gross.
One option would be to upgrade the entire application to use a logging framework, and then scrape through the logs.
This will require modifications everywhere I have added an anyhow .context(), ensure!, or bail! annotation, to conditionally also emit a logger error.
Eventually I will want both logging/telemetry and clean teardown from propagating Results, but even then I will not want to have several lines of boilerplate for every single fallible function call.
I need to design a thread pool system, in Python in this case, but I'm more interested in the general methodology.
It has to be something along the lines of https://www.metachris.com/2016/04/python-threadpool/, where threads wait idling until some jobs are pushed into the pool. How that works, using condition variables, is clear to me.
I have one additional requirement though: the jobs I'm pushing into the pool cannot all run in parallel. Each of them has a class (I don't mean the object class here, just a simple integer that somehow classifies the job), and only one job per class can be running at a time. If a job is pushed with the same class as a job that is currently running, it has to wait in the queue until the latter is done.
I have already modified the mentioned class to do this, but what I achieved is pretty messy and I'm not sure it's reliable, so I would ask what modifications would be suggested or whether I should use a totally different approach. Again: I don't need the code, but rather a description.
Thanks.
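One possible shape for this, as a minimal sketch rather than a definitive design: keep the pending jobs in arrival order, keep a set of the classes that are currently running, and let each idle worker take the oldest pending job whose class is not in that set. A single condition variable guards both structures, and a worker notifies the others when it finishes so that any job waiting on that class can be picked up. The names ClassExclusivePool, submit and shutdown are hypothetical and do not come from the linked article.

```python
import threading
from collections import deque


class ClassExclusivePool:
    """Thread pool in which at most one job per job class runs at a time."""

    def __init__(self, num_workers):
        self._cond = threading.Condition()
        self._pending = deque()  # (job_class, func, args) in arrival order
        self._running = set()    # job classes currently being executed
        self._closed = False
        self.errors = []         # (job_class, exception) for failed jobs
        self._workers = [threading.Thread(target=self._worker)
                         for _ in range(num_workers)]
        for t in self._workers:
            t.start()

    def submit(self, job_class, func, *args):
        with self._cond:
            self._pending.append((job_class, func, args))
            self._cond.notify_all()

    def _take_runnable(self):
        # Caller holds the lock. Pick the oldest job whose class is idle.
        for i, job in enumerate(self._pending):
            if job[0] not in self._running:
                del self._pending[i]
                self._running.add(job[0])
                return job
        return None

    def _worker(self):
        while True:
            with self._cond:
                job = self._take_runnable()
                while job is None:
                    if self._closed and not self._pending:
                        return
                    self._cond.wait()
                    job = self._take_runnable()
            job_class, func, args = job
            error = None
            try:
                func(*args)  # run outside the lock
            except Exception as exc:  # keep the worker alive
                error = exc
            with self._cond:
                if error is not None:
                    self.errors.append((job_class, error))
                # Free the class and wake workers that were waiting on it.
                self._running.discard(job_class)
                self._cond.notify_all()

    def shutdown(self):
        # Let already submitted jobs finish, then stop the workers.
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        for t in self._workers:
            t.join()
```

Scanning the pending queue is linear in the number of waiting jobs. If that becomes a problem, one queue per class plus a queue of "ready" classes gives the same behaviour with constant-time selection.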
Let's have a worker thread which is accessed from a wide variety of objects. This worker object has some public slots, so anyone who connects its signals to the worker's slots can use emit to trigger the worker thread's useful tasks.
This worker thread needs to be almost global, in the sense that several different classes use it, some of them are deep in the hierarchy (child of a child of a child of the main application).
I guess there are two major ways of doing this:
All the methods of the child classes pass their messages up the hierarchy via their return values, and let the main (e.g. the GUI) object handle all the emitting.
All those classes which require the services of the worker thread have a pointer to the Worker object (which is a member of the main class), and they all connect() to it in their constructors. Every such class then does the emitting by itself. Basically, dependency injection.
Option 2 seems much cleaner and more flexible to me; I'm only worried that it will create a huge number of connections. For example, if I have an array of objects that need the thread, I will have a separate connection for each element of the array.
Is there an "official" way of doing this, as the creators of Qt intended it?
There is no magic silver bullet for this. You'll need to consider many factors, such as:
Why do those objects emit the data in the first place? Is it because they need to do something, that is, emission is a “command”? Then maybe they could call some sort of service to do the job without even worrying about whether it's going to happen in another thread or not. Or is it because they inform about an event? In that case they should probably just emit signals but not connect them. It's up to the using code to decide what to do with events.
How many objects are we talking about? Some performance tests are needed. Maybe it's not even an issue.
If there is an array of objects, what purpose does it serve? Perhaps instead of using a plain array some sort of “container” class is needed? Then the container could handle the emission and connection and objects could just do something like container()->handle(data). Then you'd only have one connection per container.
I'm building a library that others--i.e. those uninterested in internals--could use to pull data from our DBs. In the internals, I want a couple I/O calls to be performed in parallel for performance purposes. The trade-off here is that the client (who, again, might not care much about this whole threading thing) would need to provide an appropriate execution context. Therefore, I provide a suggestion to use a helpful execution context in a helper object:
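import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

object ThreadPoolHelper {
  // Backed by a cached thread pool, which creates threads on demand and reuses idle ones.
  val cachedThreadPoolContext: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newCachedThreadPool())
}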
The question is (assuming that someday I also provide other options, like, say, a fixed thread pool for the clients to optionally use) am I fine just leaving this (these) as a val? Or am I better off making it lazy? Or a def?
One way or another, lazy is the way to go.
Making them lazy vals would be a good all-purpose choice, as each could be initialized as needed (as they are accessed). Then, you would never initialize more thread pools than are needed. scala.concurrent.ExecutionContext.Implicits.global is an implicit lazy val.
Technically, singleton objects (like ThreadPoolHelper) are lazy by default, so they will not be initialized until they are first accessed. A val would be fine if you only had one ExecutionContext in an object. However, multiple ExecutionContexts as vals in the same object wouldn't make as much sense, because accessing one would initialize them all--which would use more resources than needed.
A def would not make sense, because then you would be creating a new ExecutionContext on each call, and throwing it away when done. That could cause a lot of unwanted overhead, and defeat the purpose of having a thread pool in the first place.
Some ExecutionContexts that are a little more custom than yours are singleton objects that extend ExecutionContext and implement their own custom behavior. These would also be lazy.
I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile(foo.txt)) and just pass in a callback that will be executed once an event indicating completion of the file read is fired. However, what I do not understand is where the function that is doing the work of actually reading the file is being executed. JavaScript is single-threaded, but there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to node?
The Node event loop is truly single threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However, for some standard library function calls, the Node C++ side and libuv decide to do the expensive work outside of the event loop entirely, so that it does not block the event loop. Instead, they make use of something called a thread pool: a set of (by default) four threads that can be used for running computationally intensive or blocking tasks. Only four things use this thread pool: DNS lookups (dns.lookup), fs, crypto and zlib. Everything else executes on the main thread.
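The same division of labour can be sketched outside Node. The following is only an analogy in Python, not Node or libuv code: asyncio plays the role of the single-threaded event loop, and a ThreadPoolExecutor with four workers stands in for the libuv thread pool. The helper names blocking_read, main and demo are made up for the illustration. The blocking file read runs on a pool thread, and execution resumes on the loop thread once the result is ready.

```python
import asyncio
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_read(path):
    # Runs on a pool thread, so the event loop thread stays free meanwhile.
    with open(path, "r", encoding="utf-8") as f:
        return f.read(), threading.current_thread().name


async def main(path):
    loop = asyncio.get_running_loop()
    loop_thread = threading.current_thread().name
    with ThreadPoolExecutor(max_workers=4, thread_name_prefix="pool") as pool:
        # Hand the blocking call to the pool; the loop can run other tasks.
        text, worker_thread = await loop.run_in_executor(
            pool, blocking_read, path)
    # Execution resumes here, back on the event loop thread.
    resumed_on = threading.current_thread().name
    return text, worker_thread, loop_thread, resumed_on


def demo():
    # Create a small temporary file so the example is self-contained.
    with tempfile.NamedTemporaryFile(
            "w", suffix=".txt", delete=False, encoding="utf-8") as f:
        f.write("hello")
        path = f.name
    try:
        return asyncio.run(main(path))
    finally:
        os.remove(path)
```

Calling demo() returns the file contents together with the thread names involved, which shows that the read happened on a pool thread while the code before and after the await ran on the loop thread.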
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
It's like using setTimeout(function(){/*file reading code here*/},1000);. JavaScript can keep several things pending at once, for example three setInterval(function(){/*code to execute*/},1000); timers, but their callbacks take turns on a single thread, so this is concurrency rather than true multi-threading. For actually reading from or writing to the hard drive in Node.js, you can use:
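var fs = require("fs");
var child = require("child_process");

// Write by spawning a shell; text and file are not escaped, so pass trusted values only.
function put_text(file, text, callback) {
    child.exec("echo " + text + " > " + file, callback);
}

// Read with the fs module (jQuery is a browser library and cannot read files in Node).
function get_text(file, callback) {
    fs.readFile(file, "utf8", callback);
}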
These can also be used for reading and writing to/from the hard drive using NodeJS.