Delphi threads and variables - multithreading

So this is my question , threads are so confusing for me , let's say I have 5 threads , and 50 or 100 or more sites. So as far as I've learned about threads , I can make constructor create (link:string) and start new threads with different links , but than I wold need to make as much threads as the number of links I need to parse.So how can I make variable link shared between threads , so when thread one downloads link listbox1.items[0] it tells others that number 0 is downloaded and next thread should ask what link should I download and get answer listbox1.items[1] and so on until they download all links when they should terminate.
Can anyone provide me with simple example of how can this be done. Threads are killing me :(

You could have a thread-safe list of URLs to process, and a static-sized pool of worker threads each taking an unprocessed URL from the list at a time, processing it (downloading and parsing) and adding any found new URLs to the list, in a loop, as long as there are any unprocessed items in the list. Keep finished URLs in the list, only mark them as done, to avoid recursion.

Sounds like you simply need to set up a critical section.
This needs to be set up around the code segment which reads the next URL. To do this you would typically place a semaphore at the start of the code so that only one thread can enter it at any time. The semaphore is reset at the end of the code. As each new thread sees the URL list has expired, then it terminates.
Typically semaphores are boolean, but they can be integers for example if you want to allow a specific number of threads to enter the region at any time.
In your case you can simply set up a global boolean variable (visible to all threads), say "fSemaphore".
At the start of the region, the thread checks the flag. If it is false it sets it to true and enters the region (to get the next URL).
If it is true, then it loops - e.g. repeat sleep(0) until (not fSemaphore).
When it exits the region it set fSemaphore := False;
Obviously you need to make sure you guard against a possible infinite loop scenario...

Define a 'TURI' class for the request URI, result, error message and anything else needed for the web query except for the component to be used for the URI access. Descending from TObject shoud be fine. Create, initialize 100 of these and push them on a producer-consumer queue, (TObjectQueue, TCriticalSection and a semaphore should do fine). Hang a few TThreads off the queue that loop around and process the TURI instances until the queue is empty, whereupon they block.
You do not say what action you need taking with the processed TURI's - they will need freeing somewhere. If you wish to notify the main thread, PostMessage the completed URI's and free them in the message-handler.
Terminate the threads? Sure, if you really have to, then queue up some object that signals them to commit suicide, (a NIL maybe - the threads can check for 'assigned' just after popping the queue). When doing something like this, I oftem just leave the threads lying around even if I don't need to process any more URI during the app run - it's not worth the typing of terminating them.
Sadly, the Delphi examples and, I'm afraid, many textbooks, dont' get much further than suspend/resume control of threads, (don't do it), and 'TThread.Synchronize', 'TThread.WaitFor' and 'TThread.OnTerminate'. If you get a textbook like that, take it outside and burn it - you will learn next-to-nothing good.

Related

Why pass parameters through thread function?

When I create a new thread in a program... in it's thread handle function, why do I pass variables that I want that thread to use through the thread function prototype as parameters (as a void pointer)? Since threads share the same memory segments (except for stack) as the main program, shouldn't I be able to just use the variables directly instead of passing parameters from main program to new thread?
Well, yes, you could use the variables directly. Maybe. Assuming that they aren't changed by some other thread before your thread starts running.
Also, a big part of passing parameters to functions (including thread functions) is to limit the amount of information the called function has to know about the outside world. If you pass the thread function everything it needs in order to do its work, then you can change the rest of the program with relative impunity and the thread will still continue to work. If, however, you force the thread to know that there is a global list of strings called MyStringList, then you can't change that global list without also affecting the thread.
Information hiding. Encapsulation. Separation of concerns. Etc.
You cannot pass parameters to a thread function in any kind of normal register/stack manner because thread functions are not called by the creating thread - they are given execution directly by the underlying OS and the API's that do this copy a fixed number of parameters, (usually only one void pointer), to the new and different stack of the new thread.
As Jim says, failure to understand this mechanism often results in disaster. There are numnerous questions on SO where the vars that devs. hope would be used by a new thread are RAII'd away before the new thread even starts.

The difference between wait_queue_head and wait_queue in linux kernel

I can find many examples regarding wait_queue_head.
It works as a signal, create a wait_queue_head, someone
can sleep using it until someother kicks it up.
But I can not find a good example of using wait_queue itself, supposedly very related to it.
Could someone gives example, or under the hood of them?
From Linux Device Drivers:
The wait_queue_head_t type is a fairly simple structure, defined in
<linux/wait.h>. It contains only a lock variable and a linked list
of sleeping processes. The individual data items in the list are of
type wait_queue_t, and the list is the generic list defined in
<linux/list.h>.
Normally the wait_queue_t structures are allocated on the stack by
functions like interruptible_sleep_on; the structures end up in the
stack because they are simply declared as automatic variables in the
relevant functions. In general, the programmer need not deal with
them.
Take a look at A Deeper Look at Wait Queues part.
Some advanced applications, however, can require dealing with
wait_queue_t variables directly. For these, it's worth a quick look at
what actually goes on inside a function like interruptible_sleep_on.
The following is a simplified version of the implementation of
interruptible_sleep_on to put a process to sleep:
void simplified_sleep_on(wait_queue_head_t *queue)
{
wait_queue_t wait;
init_waitqueue_entry(&wait, current);
current->state = TASK_INTERRUPTIBLE;
add_wait_queue(queue, &wait);
schedule();
remove_wait_queue (queue, &wait);
}
The code here creates a new wait_queue_t variable (wait, which gets
allocated on the stack) and initializes it. The state of the task is
set to TASK_INTERRUPTIBLE, meaning that it is in an interruptible
sleep. The wait queue entry is then added to the queue (the
wait_queue_head_t * argument). Then schedule is called, which
relinquishes the processor to somebody else. schedule returns only
when somebody else has woken up the process and set its state to
TASK_RUNNING. At that point, the wait queue entry is removed from the
queue, and the sleep is done
The internals of the data structures involved in wait queues:
Update:
for the users who think the image is my own - here is one more time the link to the Linux Device Drivers where the image is taken from
Wait queue is simply a list of processes and a lock.
wait_queue_head_t represents the queue as a whole. It is the head of the waiting queue.
wait_queue_t represents the item of the list - a single process waiting in the queue.

What multithreading package for Lua "just works" as shipped?

Coding in Lua, I have a triply nested loop that goes through 6000 iterations. All 6000 iterations are independent and can easily be parallelized. What threads package for Lua compiles out of the box and gets decent parallel speedups on four or more cores?
Here's what I know so far:
luaproc comes from the core Lua team, but the software bundle on luaforge is old, and the mailing list has reports of it segfaulting. Also, it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread.
Lua Lanes makes interesting claims but seems to be a heavyweight, complex solution. Many messages on the mailing list report trouble getting Lua Lanes to build or work for them. I myself have had trouble getting the underlying "Lua rocks" distribution mechanism to work for me.
LuaThread requires explicit locking and requires that communication between threads be mediated by global variables that are protected by locks. I could imagine worse, but I'd be happier with a higher level of abstraction.
Concurrent Lua provides an attractive message-passing model similar to Erlang, but it says that processes do not share memory. It is not clear whether spawn actually works with any Lua function or whether there are restrictions.
Russ Cox proposed an occasional threading model that works only for C threads. Not useful for me.
I will upvote all answers that report on actual experience with these or any other multithreading package, or any answer that provides new information.
For reference, here is the loop I would like to parallelize:
for tid, tests in pairs(tests) do
local results = { }
matrix[tid] = results
for i, test in pairs(tests) do
if test.valid then
results[i] = { }
local results = results[i]
for sid, bin in pairs(binaries) do
local outcome, witness = run_test(test, bin)
results[sid] = { outcome = outcome, witness = witness }
end
end
end
end
The run_test function is passed in as an argument, so a package can be useful to me only if it can run arbitrary functions in parallel. My goal is enough parallelism to get 100% CPU utilization on 6 to 8 cores.
Norman wrote concerning luaproc:
"it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread"
I had the same problem with a use case I was dealing with. I liked lua proc due to its simple and light implementation, but my use case had C code that was calling lua, which was triggering a co-routine that needed to send/receive messages to interact with other luaproc threads.
To achieve my desired functionality I had to add features to luaproc to allow sending and receiving messages from the parent thread or any other thread not running from the luaproc scheduler. Additionally, my changes allow using luaproc send/receive from coroutines created from luaproc.newproc() created lua states.
I added an additional luaproc.addproc() function to the api which is to be called from any lua state running from a context not controlled by the luaproc scheduler in order to set itself up with luaproc for sending/receiving messages.
I am considering posting the source as a new github project or contacting the developers and seeing if they would like to pull my additions. Suggestions as to how I should make it available to others are welcome.
Check the threads library in torch family. It implements a thread pool model: a few true threads (pthread in linux and windows thread in win32) are created first. Each thread has a lua_State object and a blocking job queue that admits jobs added from the main thread.
Lua objects are copied over from main thread to the job thread. However C objects such as Torch tensors or tds data structures can be passed to job threads via pointers -- this is how limited shared memory is achieved.
This is a perfect example of MapReduce
You can use LuaRings to accomplish your parallelization needs.
Concurrent Lua might seem like the way to go, but as I note in my updates below, it doesn't run things in parallel. The approach I tried was to spawn several processes that execute pickled closures received through the message queue.
Update
Concurrent Lua seems to handle first-class functions and closures without a hitch. See the following example program.
require 'concurrent'
local NUM_WORKERS = 4 -- number of worker threads to use
local NUM_WORKITEMS = 100 -- number of work items for processing
-- calls the received function in the local thread context
function worker(pid)
while true do
-- request new work
concurrent.send(pid, { pid = concurrent.self() })
local msg = concurrent.receive()
-- exit when instructed
if msg.exit then return end
-- otherwise, run the provided function
msg.work()
end
end
-- creates workers, produces all the work and performs shutdown
function tasker()
local pid = concurrent.self()
-- create the worker threads
for i = 1, NUM_WORKERS do concurrent.spawn(worker, pid) end
-- provide work to threads as requests are received
for i = 1, NUM_WORKITEMS do
local msg = concurrent.receive()
-- send the work as a closure
concurrent.send(msg.pid, { work = function() print(i) end, pid = pid })
end
-- shutdown the threads as they complete
for i = 1, NUM_WORKERS do
local msg = concurrent.receive()
concurrent.send(msg.pid, { exit = true })
end
end
-- create the task process
local pid = concurrent.spawn(tasker)
-- run the event loop until all threads terminate
concurrent.loop()
Update 2
Scratch all of that stuff above. Something didn't look right when I was testing this. It turns out that Concurrent Lua isn't concurrent at all. The "processes" are implemented with coroutines and all run cooperatively in the same thread context. That's what we get for not reading carefully!
So, at least I eliminated one of the options I guess. :(
I realize that this is not a works-out-of-the-box solution, but, maybe go old-school and play with forks? (Assuming you're on a POSIX system.)
What I would have done:
Right before your loop, put all tests in a queue, accessible between processes. (A file, a Redis LIST or anything else you like most.)
Also before the loop, spawn several forks with lua-posix (same as the number of cores or even more depending on the nature of tests). In parent fork wait until all children will quit.
In each fork in a loop, get a test from the queue, execute it, put results somewhere. (To a file, to a Redis LIST, anywhere else you like.) If there are no more tests in queue, quit.
In the parent fetch and process all test results as you do now.
This assumes that test parameters and results are serializable. But even if they are not, I think that it should be rather easy to cheat around that.
I've now built a parallel application using luaproc. Here are some misconceptions that kept me from adopting it sooner, and how to work around them.
Once the parallel threads are launched, as far as I can tell there is no way for them to communicate back to the parent. This property was the big block for me. Eventually I realized the way forward: when it's done forking threads, the parent stops and waits. The job that would have been done by the parent should instead be done by a child thread, which should be dedicated to that job. Not a great model, but it works.
Communication between parent and children is very limited. The parent can communicate only scalar values: strings, Booleans, and numbers. If the parent wants to communicate more complex values, like tables and functions, it must code them as strings. Such coding can take place inline in the program, or (especially) functions can be parked into the filesystem and loaded into the child using require.
The children inherit nothing of the parent's environment. In particular, they don't inherit package.path or package.cpath. I had to work around this by the way I wrote the code for the children.
The most convenient way to communicate from parent to child is to define the child as a function, and to have the child capture parental information in its free variables, known in Lua parlances as "upvalues." These free variables may not be global variables, and they must be scalars. Still, it's a decent model. Here's an example:
local function spawner(N, workers)
return function()
local luaproc = require 'luaproc'
for i = 1, N do
luaproc.send('source', i)
end
for i = 1, workers do
luaproc.send('source', nil)
end
end
end
This code is used as, e.g.,
assert(luaproc.newproc(spawner(randoms, workers)))
This call is how values randoms and workers are communicated from parent to child.
The assertion is essential here, as if you forget the rules and accidentally capture a table or a local function, luaproc.newproc will fail.
Once I understood these properties, luaproc did indeed work "out of the box", when downloaded from askyrme on github.
ETA: There is an annoying limitation: in some circumstances, calling fread() in one thread can prevent other threads from being scheduled. In particular, if I run the sequence
local file = io.popen(command, 'r')
local result = file:read '*a'
file:close()
return result
the read operation blocks all other threads. I don't know why this is---I assume it is some nonsense going on within glibc. The workaround I used was to call directly to read(2), which required a little glue code, but this works properly with io.popen and file:close().
There's one other limitation worth noting:
Unlike Tony Hoare's original conception of communicating sequential processing, and unlike most mature, serious implementations of synchronous message passing, luaproc does not allow a receiver to block on multiple channels simultaneously. This limitation is serious, and it rules out many of the design patterns that synchronous message-passing is good at, but it's still find for many simple models of parallelism, especially the "parbegin" sort that I needed to solve for my original problem.

multithreading: how to process data in a vector, while the vector is being populated?

I have a single-threaded linux app which I would like to make parallel. It reads a data file, creates objects, and places them in a vector. Then it calls a compute-intensive method (.5 second+) on each object. I want to call the method in parallel with object creation. While I've looked at qt and tbb, I am open to other options.
I planned to start the thread(s) while the vector was empty. Each one would call makeSolids (below), which has a while loop that would run until interpDone==true and all objects in the vector have been processed. However, I'm a n00b when it comes to threading, and I've been looking for a ready-made solution.
QtConcurrent::map(Iter begin,Iter end,function()) looks very easy, but I can't use it on a vector that's changing in size, can I? And how would I tell it to wait for more data?
I also looked at intel's tbb, but it looked like my main thread would halt if I used parallel_for or parallel_while. That stinks, since their memory manager was recommended (open cascade's mmgt has poor performance when multithreaded).
/**intended to be called by a thread
\param start the first item to get from the vector
\param skip how many to skip over (4 for 4 threads)
*/
void g2m::makeSolids(uint start, uint incr) {
uint curr = start;
while ((!interpDone) || (lineVector.size() > curr)) {
if (lineVector.size() > curr) {
if (lineVector[curr]->isMotion()) {
((canonMotion*)lineVector[curr])->setSolidMode(SWEPT);
((canonMotion*)lineVector[curr])->computeSolid();
}
lineVector[curr]->setDispMode(BEST);
lineVector[curr]->display();
curr += incr;
} else {
uio::sleep(); //wait a little bit for interp
}
}
}
EDIT: To summarize, what's the simplest way to process a vector at the same time that the main thread is populating the vector?
Firstly, to benefit from threading you need to find similarly slow tasks for each thread to do. You said your per-object processing takes .5s+, how long does your file reading / object creation take? It could easily be a tenth or a thousandth of that time, in which case your multithreading approach is going to produce neglegible benefit. If that's the case, (yes, I'll answer your original question soon incase it's not) then think about simultaneously processing multiple objects. Given your processing takes quite a while, the thread creation overhead isn't terribly significant, so you could simply have your main file reading/object creation thread spawn a new thread and direct it at the newly created object. The main thread then continues reading/creating subsequent objects. Once all objects are read/created, and all the processing threads launched, the main thread "joins" (waits for) the worker threads. If this will create too many threads (thousands), then put a limit on how far ahead the main thread is allowed to get: it might read/create 10 objects then join 5, then read/create 10, join 10, read/create 10, join 10 etc. until finished.
Now, if you really want the read/create to be in parallel with the processing, but the processing to be serialised, then you can still use the above approach but join after each object. That's kind of weird if you're designing this with only this approach in mind, but good because you can easily experiment with the object processing parallelism above as well.
Alternatively, you can use a more complex approach that just involves the main thread (that the OS creates when your program starts), and a single worker thread that the main thread must start. They should be coordinated using a mutex (a variable ensuring mutually-exclusive, which means not-concurrent, access to data), and a condition variable which allows the worker thread to efficiently block until the main thread has provided more work. The terms - mutex and condition variable - are the standard terms in the POSIX threading that Linux uses, so should be used in the explanation of the particular libraries you're interested in. Summarily, the worker thread waits until the main read/create thread broadcasts it a wake-up signal indicating another object is ready for processing. You may want to have a counter with index of the last fully created, ready-for-processing object, so the worker thread can maintain it's count of processed objects and move along the ready ones before once again checking the condition variable.
It's hard to tell if you have been thinking about this problem deeply and there is more than you are letting on, or if you are just over thinking it, or if you are just wary of threading.
Reading the file and creating the objects is fast; the one method is slow. The dependency is each consecutive ctor depends on the outcome of the previous ctor - a little odd - but otherwise there are no data integrity issues so there doesn't seem to be anything that needs to be protected by mutexes and such.
Why is this more complicated than something like this (in crude pseudo-code):
while (! eof)
{
readfile;
object O(data);
push_back(O);
pthread_create(...., O, makeSolid);
}
while(x < vector.size())
{
pthread_join();
x++;
}
If you don't want to loop on the joins in your main then spawn off a thread to wait on them by passing a vector of TIDs.
If the number of created objects/threads is insane, use a thread pool. Or put a counter is the creation loop to limit the number of threads that can be created before running ones are joined.
#Caleb: quite -- perhaps I should have emphasized active threads. The GUI thread should always be considered one.

Simple POSIX threads question

I have this POSIX thread:
void subthread(void)
{
while(!quit_thread) {
// do something
...
// don't waste cpu cycles
if(!quit_thread) usleep(500);
}
// free resources
...
// tell main thread we're done
quit_thread = FALSE;
}
Now I want to terminate subthread() from my main thread. I've tried the following:
quit_thread = TRUE;
// wait until subthread() has cleaned its resources
while(quit_thread);
But it does not work! The while() clause does never exit although my subthread clearly sets quit_thread to FALSE after having freed its resources!
If I modify my shutdown code like this:
quit_thread = TRUE;
// wait until subthread() has cleaned its resources
while(quit_thread) usleep(10);
Then everything is working fine! Could someone explain to me why the first solution does not work and why the version with usleep(10) suddenly works? I know that this is not a pretty solution. I could use semaphores/signals for this but I'd like to learn something about multithreading, so I'd like to know why my first solution doesn't work.
Thanks!
Without a memory fence, there is no guarantee that values written in one thread will appear in another. Most of the pthread primitives introduce a barrier, as do several system calls such as usleep. Using a mutex around both the read and write introduces a barrier, and more generally prevents multi-byte values being visible in partially written state.
You also need to separate the idea of asking a thread to stop executing, and reporting that it has stopped, and appear to be using the same variable for both.
What's most likely to be happening is that your compiler is not aware that quit_thread can be changed by another thread (because C doesn't know about threads, at least at the time this question was asked). Because of that, it's optimising the while loop to an infinite loop.
In other words, it looks at this code:
quit_thread = TRUE;
while(quit_thread);
and thinks to itself, "Hah, nothing in that loop can ever change quit_thread to FALSE, so the coder obviously just meant to write while (TRUE);".
When you add the call to usleep, the compiler has another think about it and assumes that the function call may change the global, so it plays it safe and doesn't optimise it.
Normally you would mark the variable as volatile to stop the compiler from optimising it but, in this case, you should use the facilities provided by pthreads and join to the thread after setting the flag to true (and don't have the sub-thread reset it, do that in the main thread after the join if it's necessary). The reason for that is that a join is likely to be more efficient than a continuous loop waiting for a variable change since the thread doing the join will most likely not be executed until the join needs to be done.
In your spinning solution, the joining thread will most likely continue to run and suck up CPU grunt.
In other words, do something like:
Main thread Child thread
------------------- -------------------
fStop = false
start Child Initialise
Do some other stuff while not fStop:
fStop = true Do what you have to do
Finish up and exit
join to Child
Do yet more stuff
And, as an aside, you should technically protect shared variables with mutexes but this is one of the few cases where it's okay, one-way communication where half-changed values of a variable don't matter (false/not-false).
The reason you normally mutex-protect a variable is to stop one thread seeing it in a half-changed state. Let's say you have a two-byte integer for a count of some objects, and it's set to 0x00ff (255).
Let's further say that thread A tries to increment that count but it's not an atomic operation. It changes the top byte to 0x01 but, before it gets a chance to change the bottom byte to 0x00, thread B swoops in and reads it as 0x01ff.
Now that's not going to be very good if thread B want to do something with the last element counted by that value. It should be looking at 0x0100 but will instead try to look at 0x01ff, the effect of which will be wrong, if not catastrophic.
If the count variable were protected by a mutex, thread B wouldn't be looking at it until thread A had finished updating it, hence no problem would occur.
The reason that doesn't matter with one-way booleans is because any half state will also be considered as true or false so, if thread A was halfway between turning 0x0000 into 0x0001 (just the top byte), thread B would still see that as 0x0000 (false) and keep going (until thread A finishes its update next time around).
And if thread A was turning the boolean into 0xffff, the half state of 0xff00 would still be considered true by thread B so it would do its thing before thread A had finished updating the boolean.
Neither of those two possibilities is bad simply because, in both, thread A is in the process of changing the boolean and it will finish eventually. Whether thread B detects it a tiny bit earlier or a tiny bit later doesn't really matter.
The while(quite_thread); is using the value quit_thread was set to on the line before it. Calling a function (usleep) induces the compiler to reload the value on each test.
In any case, this is the wrong way to wait for a thread to complete. Use pthread_join instead.
You're "learning" multhithreading the wrong way. The right way is to learn to use mutexes and condition variables; any other solution will fail under some circumstances.

Resources