Is it safe to instantiate a TThread-descendent from inside a thread? - multithreading

I have a task that involves some heavy CPU/RAM work. With the outcome of that, I have to make some database requests. All of this I have to do several thousand times, so I'm doing it inside a background thread. Now I'm considering dividing each task into two parts and splitting them between two separate threads, so that the first thread doesn't have to wait for the database requests to finish. It could then do the CPU/RAM work for the second round while the second thread is waiting for the database requests of the first round, and everything would speed up.
Now, is it safe to instantiate the second TThread descendent from within the first one? Or do I need to instantiate TThread descendents from within the main thread? I could do either, but instantiating the second from within the first would be easier in my case, and it would follow the OOP paradigm, as the second thread would be transparent to the rest of the program.

I have done this many times in production code and it has never been an issue. As far as I can tell, it is perfectly safe.

Related

Simplest way to switch context back and forth (dispatching)

I have two classes that perform independent computations. For the sake of simplicity here, I will represent them with the functions calc1 and calc2. At some random places in each function, I need to sleep for a while. Rather than running calc1 and then calc2, I'd like to switch back and forth between them.
I could use threads, but at first sight that seems over-engineered. In addition, these two functions need to be on the main thread, because they deal with the user interface. I don't need different threads; I just need to switch from one context to another and back to where we were.
In Python, there is the concept of greenlets (gevent), which allow simple context switching without real threads. That would be perfect for my needs. Is there such a mechanism in Swift?
func calc1() {
...
sleep(300) // go to calc2
...
}
func calc2() {
...
sleep(200) // resume calc1
...
}
This notion of alternately switching between two computationally expensive calculations on the main thread is not going to work.
First, we never do anything computationally expensive on the main thread. We should never block the main thread for any reason. It results in a horrible UX (your app may appear to be frozen), and you risk having the app killed by the OS watchdog process (which looks for apps that appear frozen and are blocking the main thread).
Second, if calculating two truly independent calculations, we wouldn't add the overhead and complexity of trying to switch between them. We would just use GCD to dispatch them independently to background queue(s).
So, the solution for this sort of process would be to dispatch these two tasks to run concurrently on background queue(s), either to one concurrent queue or two dedicated queues. But the key is to perform the complicated calculations off of the main thread, in parallel, and then dispatch the periodic UI updates back to the main thread. But always keep the main thread free to respond to user input, system events, etc.
FWIW, this process of dispatching these two independent tasks separately is simpler and a far more efficient use of the device resources. Just synchronize updates to model objects. And dispatch groups are a great way to keep track of when two independent concurrent tasks finish.
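The answer above is about Swift and GCD, but the shape of the solution is language-agnostic. As a minimal sketch, here is the same idea in Python's concurrent.futures, where the executor plays the role of GCD's background queues and wait() plays the role of a dispatch group's completion notification (the two calc functions are stand-ins for the expensive computations):

```python
from concurrent.futures import ThreadPoolExecutor, wait

def calc1():
    # stand-in for the first expensive computation, run off the main thread
    return sum(i * i for i in range(100_000))

def calc2():
    # stand-in for the second, fully independent computation
    return sum(i * 3 for i in range(100_000))

with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(calc1)   # dispatch independently, like a background queue
    f2 = pool.submit(calc2)
    wait([f1, f2])            # like a dispatch group: both tasks have finished
    results = (f1.result(), f2.result())
```

The point is the one made above: no switching machinery at all, just two independent submissions and a single completion point, with any UI update happening back on the main thread afterwards.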

Difference between thread-isolated and semaphore-isolated calls

I was going through the Netflix open-source library Hystrix...
I saw a statement
"Today tens of billions of thread-isolated, and hundreds of billions of semaphore-isolated calls are executed via Hystrix every day at Netflix"
I would like to know the difference between these different types of calls.
First we need to see the difference between threads and semaphores. In general, a thread-isolated call is more expensive than a semaphore-isolated one because of the overhead of a separate thread. So for a large number of requests per second, a semaphore is something you can consider.
Secondly, with a semaphore, the command is executed on the caller's thread. This means that concurrent calls are not fully isolated from each other (unlike when you use a thread).
Lastly, with a semaphore, a call that times out cannot be terminated (unless you specifically set that up). If you don't know how the client will behave, that is not a nice property to have.
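Hystrix is a Java library, but the two isolation styles can be sketched in any language. In this hedged Python sketch (the function names are illustrative, not Hystrix API), semaphore isolation only bounds concurrency while the work still runs on the caller's thread, whereas thread isolation hands the work to a dedicated pool, which costs more but lets the caller time out and walk away:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

sem = threading.BoundedSemaphore(10)       # semaphore isolation: only a concurrency cap

def semaphore_isolated(call):
    if not sem.acquire(blocking=False):
        raise RuntimeError("rejected: concurrency limit reached")
    try:
        return call()                      # runs on the CALLER's thread; no timeout possible
    finally:
        sem.release()

pool = ThreadPoolExecutor(max_workers=10)  # thread isolation: a dedicated pool

def thread_isolated(call, timeout):
    future = pool.submit(call)             # runs on a pool thread, fully isolated
    return future.result(timeout=timeout)  # caller can give up; higher per-call overhead
```

For example, `semaphore_isolated(lambda: 2 + 2)` and `thread_isolated(lambda: 2 + 2, timeout=1.0)` both return 4, but only the second can be abandoned on timeout.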

boost::thread and starting or not starting large numbers of threads at the same time

In researching my question on this forum, I have found my same basic question about boost::thread asked a number of times before, but none of the answers seem to address my specific needs. The thrust of the question, the way I'm thinking about it, is how to instantiate a number of threads without actually launching any of them until you are ready. The answer always seems to be in the vein of: why not just wait until you are ready to launch each thread before instantiating it? Indeed, it appears to me that's the way boost::thread is designed to work. That is, instantiating a boost::thread object seems to also launch the thread, and it's not clear to me whether these two functions can really be separated.
I'll try to explain a little more about my application than some of the other questioners have explained. I need to launch at least dozens if not hundreds or even thousands of threads. Each thread will run the exact same function, and the threads will be differentiated from each other by a single and unique argument that is passed to the function as a part of each thread. Each thread will use large amounts of shared memory that will be read only and that does not need to be synchronized. Each thread also will use a substantial amount of local memory that will be allocated dynamically and which will be released prior to the completion of the thread. The local memory also does not need to be synchronized. Indeed, I'm thinking I may be able to get by without any synchronization between the threads at all except joins to await the completion of each thread.
The program will run for dozens or even hundreds or possibly thousands of hours, so I want to use all the processor cores on my machine, but I don't want to initiate more threads at the same time than the number of processor cores that I have available. The problem is that if I were to initiate too many threads at the same time, they would quickly exhaust my machine's memory because of the local memory used by each thread. But if I can manage to initiate only as many threads as I have processor cores, then the memory situation will be fine.
So if I have n cores, I want to launch n threads and wait until one of them is done. When one of them is done, I want to launch one more, and so on until all the work is done. I have looked at thread_group and join_all. With thread_group itself, I can't figure out how to wait until one of the first n threads has completed before instantiating the (n+1)st one, etc. And join_all is not what I really need, because I just want to wait until fewer than n threads are still running, and it shouldn't matter which one finishes first. Whenever the number of active threads becomes less than n, I want to launch a new one. I'm quite sure that add_thread() is going to be involved in this somehow, and also that the thread_group size() function is going to be involved somehow. But I can't really see how to put all the pieces together. And the problem always seems to come down to not quite understanding how to instantiate a thread only when I'm ready. For example, all the boost::thread examples I have been able to find seem to have "statically named" thread objects, viz. "boost::thread MyWorkerFunction" etc., that get instantiated as they are encountered in the code. Well, I could have thousands of threads for the same MyWorkerFunction, and I only want to dispatch n of the threads at the same time, and I don't understand how to do that.
Thanks,
Jerry
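The behavior described in the question, never more than n threads in flight and a new one started as soon as any finishes, is exactly what a fixed-size thread pool provides: work items are instantiated up front, but threads only pick them up as capacity frees. A minimal sketch in Python's concurrent.futures (the same shape applies to a pool built on boost::thread; worker and the task count here are placeholders):

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def worker(task_id):
    # the same function for every task, differentiated by one unique argument;
    # it would read shared read-only data and free its own local memory on exit
    return task_id * task_id

n_cores = os.cpu_count() or 4
results = {}
# a pool of n workers: at most n tasks run at once, so at most n tasks'
# worth of local memory is live; as soon as one finishes, the pool
# starts the next queued task automatically
with ThreadPoolExecutor(max_workers=n_cores) as pool:
    futures = {pool.submit(worker, i): i for i in range(1000)}
    for fut in as_completed(futures):          # yields tasks as they complete
        results[futures[fut]] = fut.result()
```

This sidesteps the instantiate-without-launching problem entirely: you enqueue descriptions of work (function plus argument) rather than constructed thread objects.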

threading synchronization at high speed

I have a threading question and what I'd qualify as a modest threading background.
Suppose I have the following (oversimplified) design and behavior:
Object ObjectA - has a reference to object ObjectB and a method MethodA().
Object ObjectB - has a reference to ObjectA, an array of elements ArrayB and a method MethodB().
ObjectA is responsible for instantiating ObjectB. ObjectB.ObjectA will point to ObjectB's instantiator.
Now, whenever some conditions are met, a new element is added in ObjectB.ArrayB and a new thread is started for this element, say ThreadB_x, where x goes from 1 to ObjectB.ArrayB.Length. Each such thread calls ObjectB.MethodB() to pass some data in, which in turn calls ObjectB.ObjectA.MethodA() for data processing.
So multiple threads call the same method ObjectB.MethodB(), and it's very likely that they do so at the very same time. There's a lot of code in MethodB that creates and initializes new objects, so I don't think there are problems there. But then this method calls ObjectB.ObjectA.MethodA(), and I don't have the slightest idea of what's going on in there. Based on the results I get, nothing wrong, apparently, but I'd like to be sure of that.
For now, I enclosed the call to ObjectB.ObjectA.MethodA() in a lock statement inside ObjectB.MethodB(), so I'm thinking this will ensure there are no clashes to the call of MethodA(), though I'm not 100% sure of that. But what happens if each ThreadB_x calls ObjectB.MethodB() a lot of times and very very fast? Will I have a queue of calls waiting for ObjectB.ObjectA.MethodA() to finish?
Thanks.
Your question is very difficult to answer because of the lack of information. It depends on the average time spent in methodA, how many times this method is called per thread, how many cores are allocated to the process, the OS scheduling policy, to name a few parameters.
All things being equal, as the number of threads grows toward infinity, you can easily imagine that the probability of two threads requesting access to a shared resource simultaneously tends to one. This probability grows faster in proportion to the amount of time spent on the shared resource. That intuition is probably the reason for your question.
The main idea of multithreading is to parallelize code that can effectively be computed concurrently, and to avoid contention as much as possible. In your setup, if methodA is not pure, i.e. if it may change the state of the process (or, in C++ parlance, if it cannot be made const), then it is a source of contention (recall that a function can only be pure if it uses only pure functions or constants in its body).
One way of dealing with a shared resource is to protect it with a mutex, as you've done in your code. Another way is to turn its use into an async service, with one thread handling it and the others sending that thread requests for computation. In effect, you end up with an explicit queue of requests, but the threads making those requests are free to work on something else in the meantime. The goal is always to maximize computation time, as opposed to thread-management time, which is incurred each time a thread gets rescheduled.
Of course, this is not always possible, e.g. when the result of methodA belongs to a strongly ordered chain of computation.
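The async-service alternative described above can be sketched briefly. In this hedged Python sketch (names like method_a_async are illustrative), one thread owns the shared state and drains a request queue, so the state itself needs no mutex, and requesters only block when they actually need the result:

```python
import threading
import queue

requests = queue.Queue()
shared_state = []                    # owned exclusively by the service thread

def service():
    # the single thread that owns the shared resource; all "methodA" work
    # is serialized here, so no lock is needed on shared_state
    while True:
        item = requests.get()
        if item is None:             # sentinel: shut down
            break
        value, done = item
        shared_state.append(value)   # the stand-in for methodA's work
        done.set()                   # notify the requester

t = threading.Thread(target=service)
t.start()

def method_a_async(value):
    done = threading.Event()
    requests.put((value, done))      # enqueue and return immediately
    return done                      # wait on this only when the result is needed

events = [method_a_async(i) for i in range(5)]  # callers stay free meanwhile
for e in events:
    e.wait()
requests.put(None)
t.join()
```

Compared to a lock around methodA, the queue makes the waiting explicit and lets the calling threads do other work between submitting a request and needing its outcome.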

DB-connection in separate thread - what's the best way?

I am creating an app that accesses a database. On every database access, the app waits for the job to be finished.
To keep the UI responsive, I want to put all the database stuff in a separate thread.
Here is my idea:
The db-thread creates all database components it needs when it is created
Now the thread just sits there and waits for a command
If it receives a command, it performs the action and goes back to idle. During that time the main thread waits.
the db-thread lives as long as the app is running
Does this sound ok?
What's the best way to get the database results from the db-thread into the main thread?
I haven't done much with threads so far, therefore I'm wondering if the db-thread can create a query component out of which the main thread reads the results. Main thread and db thread will never access the query at the same time. Will this still cause problems?
What you are looking for is a standard data-access technique called asynchronous query execution. Some data access components implement this feature in an easy-to-use manner; at least dbGo (ADO) and AnyDAC do. Let's consider dbGo.
The idea is simple: you call the usual dataset methods, like Open. The method launches the required task in a background thread and returns immediately. When the task is completed, an appropriate event is fired, notifying the application that the task is finished.
The standard approach for DB GUI applications using the Open method is the following (draft):
include eoAsyncExecute, eoAsyncFetch, eoAsyncFetchNonBlock into dataset ExecuteOptions;
disconnect TDataSource.DataSet from dataset;
set dataset OnFetchComplete to a proc P;
show "Hello ! We do the hard work to process your requests. Please wait ..." dialog;
call the dataset Open method;
when the query execution finishes, OnFetchComplete will be called, and so will P. P hides the "Wait" dialog and reconnects TDataSource.DataSet to the dataset.
Also your "Wait" dialog may have a Cancel button, which an user may use to cancel a too long running query.
First of all: if you don't have much experience with multi-threading, don't start with the VCL classes. Use the OmniThreadLibrary, for (among others) these reasons:
Your level of abstraction is the task, not the thread, a much better way of dealing with concurrency.
You can easily switch between executing tasks in their own thread and scheduling them with a thread pool.
All the low-level details like thread shutdown, bidirectional communication and much more are taken care of for you. You can concentrate on the database stuff.
The db-thread creates all database components it needs when it is created
This may not be the best way. I have generally created components only when needed, but not destroyed them immediately afterwards. You should definitely keep the connection open in a thread pool thread, and close it only once the thread has been inactive for some time and the pool disposes of it. It is also often a good idea to keep a cache of transaction and statement objects.
If it receives a command, it performs the action and goes back to idle. During that time the main thread waits.
The first part is handled fine when OTL is used. However, don't have the main thread wait; that would bring little advantage over performing the database access directly in the VCL thread in the first place. You need an asynchronous design to make the best use of multiple threads. Consider a standard database browser form that has controls for filtering records. I handle this by (re-)starting a timer every time one of the controls changes. Once the user finishes editing, the timer event fires (say after 500 ms), and a task is started that executes the statement fetching data according to the filter criteria. The grid contents are cleared, and the grid is repopulated only when the task has finished. This may take some time, so the VCL thread doesn't wait for the task to complete. Instead, the user can even change the filter criteria again, in which case the current task is cancelled and a new one started. OTL gives you an event for task completion, so the asynchronous design is easy to achieve.
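The restart-a-timer-on-every-change idea above is a debounce, and it is easy to sketch outside of Delphi. In this hedged Python sketch (run_query and the 0.5 s delay are placeholders for the real fetch task and the ~500 ms mentioned above), each control change cancels any pending timer and starts a new one, so the query fires only after the user pauses:

```python
import threading

DEBOUNCE_SECONDS = 0.5       # assumed value, mirroring the ~500 ms in the text
_timer = None
fired = threading.Event()

def run_query():
    # in the real app: start a background task that fetches records
    # for the current filter criteria and repopulates the grid on completion
    fired.set()

def filter_changed():
    # called on every UI control change; restarts the countdown
    global _timer
    if _timer is not None:
        _timer.cancel()      # user is still editing: drop the pending query
    _timer = threading.Timer(DEBOUNCE_SECONDS, run_query)
    _timer.start()

# simulate three rapid edits followed by a pause: run_query fires once
for _ in range(3):
    filter_changed()
fired.wait(timeout=2)
```

Cancelling the still-pending timer is the cheap half of the design; cancelling an already-running fetch task, as described above, additionally needs a cancellation flag that the task checks.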
What's the best way to get the database results from the db-thread into the main thread?
I generally don't use data aware components for multi-threaded db apps, but use standard controls that are views for business objects. In the database tasks I create these objects, put them in lists, and the task completion event transfers the list to the VCL thread.
Main thread and db thread will never access the query at the same time.
With all components that load data on-demand you can't be sure of that. Often only the first records are fetched from the db, and fetching continues after they have been consumed. Such components obviously must not be shared by threads.
I have implemented both strategies: Thread pool and adhoc thread creation.
I suggest beginning with ad hoc thread creation; it is simpler to implement and simpler to scale.
Only move to a thread pool if (after careful evaluation) (1) a lot of resources (and time) are invested in the creation of each thread and (2) you have a lot of creation requests.
In both cases you must deal with passing parameters in and collecting results. I suggest extending the thread class with properties that allow this data passing.
Refer to the documentation of the classes, components and functions the thread uses to make sure they are thread-safe, that is, that they can be used simultaneously from different threads. If not, you will need to synchronize access to them. In some cases you may find subtle differences regarding thread safety; as an example, see DateTimeToStr.
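The suggestion to extend the thread class with data-passing properties can be sketched briefly. This hedged Python sketch mirrors the TThread idea (QueryThread and its attributes are illustrative names): parameters are set before start, and the result is read after join, so the two threads never touch the fields at the same time:

```python
import threading

class QueryThread(threading.Thread):
    """Thread subclass carrying its input and result as attributes,
    mirroring the suggestion to extend TThread with data-passing properties."""

    def __init__(self, sql):
        super().__init__()
        self.sql = sql         # parameter, set before start()
        self.result = None     # filled in by run(), read after join()
        self.error = None

    def run(self):
        try:
            # stand-in for the real database call
            self.result = f"rows for: {self.sql}"
        except Exception as exc:
            self.error = exc   # never let an exception escape the thread

t = QueryThread("SELECT * FROM customers")
t.start()
t.join()                       # main thread collects the result afterwards
```

Capturing exceptions into a property, as run() does here, matters in any language: an error raised inside the thread is otherwise lost to the code that launched it.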
If you create your thread at startup and reuse it later whenever you need it, you have to make sure that you disconnect the db components (grid, etc.) from the underlying datasource (DisableControls) each time you're "processing" data.
For the sake of simplicity, I would inherit from TThread and implement all the business logic in my own class. The result dataset would be a member of this class, and I would connect it to the db-aware components with Synchronize.
Anyway, it is also very important to delegate as much work as possible to the db server and keep the UI as lightweight as possible. Firebird is my favourite db server: triggers, FOR SELECT, custom UDF DLLs developed in Delphi, and many thread-safe db components with lots of examples and good support (forum): jvUIB...
Good Luck
