Perl ithreads: Do some math instead of sleeping - multithreading

I am using Perl ithreads and things work fine, unless I decide to have threads sleep.
Let's say my routine thread_job is passed as the entry point for several threads that start running concurrently.
sub thread_job
{
...
sleep 2;
#do other stuff here
}
If I don't have a sleep, I have no issues with the threads running and they do their tasks fine. If I add a sleep, my script hangs. I am running this from a Windows command prompt, if that helps.
Since I do need to sleep, and I'm guessing there's an issue with using sleep on my current setup, I intend to have the thread do something for a while instead of sleeping. Is there any mathematical operation I could perform for that purpose?

Try using Win32::Sleep instead. (Note that it takes milliseconds as an argument, not seconds.)

On some platforms, calling sleep() can block the entire process (that is, all the threads).
You can instead block a single thread by calling select(). Do something like this:
sub thread_job {
...
my $delay = 2; # seconds; fractional values such as 0.25 also work
select(undef, undef, undef, $delay);
...
}
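A minimal, self-contained sketch of the select() approach above, assuming a short 0.2-second delay and three threads purely for illustration. The helper thread_job here is a hypothetical stand-in that only measures how long it waited; Time::HiRes is used just to get sub-second timestamps.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

# Hypothetical worker: pauses for $delay seconds using four-argument
# select(), then returns how long it actually waited.
sub thread_job {
    my ($delay) = @_;
    my $start = time();
    select(undef, undef, undef, $delay);    # fractional seconds are allowed
    return time() - $start;
}

# Start three threads that each pause independently, then collect results.
my @workers = map { threads->create(\&thread_job, 0.2) } 1 .. 3;
my @elapsed = map { $_->join() } @workers;

printf "thread waited %.2f s\n", $_ for @elapsed;
```

Because each thread waits in its own select() call, the three delays overlap rather than add up.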

Related

Threads in Perl: order of detach()

I am trying to get into Perl's use of threads. Reading the documentation I came across the following code:
use threads;
my $thr = threads->create(\&sub1); # Spawn the thread
$thr->detach(); # Now we officially don't care any more
sleep(15); # Let thread run for awhile
sub sub1 {
my $count = 0;
while (1) {
$count++;
print("\$count is $count\n");
sleep(1);
}
}
The goal, it seems, would be to create one thread running sub1 for 15 seconds, and in the meantime print some strings. However, I don't think I understand what's going on at the end of the programme.
First of all, detach() is defined as follows:
Once a thread is detached, it'll run until it's finished; then Perl
will clean up after it automatically.
However, when does the subroutine finish? while(1) never finishes. Nor can I find anything in the documentation for sleep() saying that it would break out of a loop. On top of that, from the point we detach we are 'waiting for the script to finish and then clean it up' for 15 seconds, so if we are waiting for the subroutine to finish, why do we need sleep() in the main script? The position is awkward to me; it suggests that the main programme sleeps for 15 seconds. But what is the point of that? The main programme (thread?) sleeps while the sub-thread keeps running, but how is the subroutine then terminated?
I guess the idea is that after the sleep is done, the subroutine ends, after which we can detach/clean up. But how is this syntactically clear? Where in the definition of sleep is it said that sleep terminates a subroutine (and why), and how does it know which one to terminate in case there is more than one thread?
All threads end when the program ends. The program ends when the main thread ends. The sleep in the main thread merely keeps the program running for a short time, after which the main thread (therefore the program, therefore all created threads) also ends.
So what's up with detach? It just says "I'm never going to bother joining to this thread, and I don't care what it returns". If you don't either detach a thread or join to it, you'd get a warning when the program ends.
Detaching a thread means "I don't care any more", and that does actually mean that when your process exits, the thread will error and terminate.
Practically speaking, I don't think you ever want to detach a thread in Perl. Just add a join at the end of your code, so it can exit cleanly, and signal the thread via a semaphore or Thread::Queue in order to terminate it.
$_ -> join for threads -> list;
Will do the trick.
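As a rough sketch of the "signal via Thread::Queue, then join" idea, the following assumes a single worker and uses undef as an agreed-upon "stop" sentinel. The worker sub and the item count are hypothetical; only Thread::Queue's new, enqueue and dequeue and the threads->list/join calls are taken from the modules themselves.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();

# Hypothetical worker: handles items until it dequeues undef,
# which this sketch uses as the "please stop" signal.
sub worker {
    my $handled = 0;
    while (defined(my $item = $queue->dequeue())) {
        $handled++;    # real work on $item would go here
    }
    return $handled;
}

my $thr = threads->create(\&worker);

$queue->enqueue($_) for 1 .. 5;
$queue->enqueue(undef);    # tell the worker to finish

# Join every remaining thread so the program exits cleanly.
my @handled = map { $_->join() } threads->list();
print "worker handled $handled[0] items\n";
```

The worker blocks in dequeue() while idle, so no sleep is needed in either thread.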
That code example is, in my opinion, a bad one. It's just plain messy to sleep so a detached thread has a chance to complete, when you could just join and know that it's finished. This is especially true of Perl threads: it's deceptive to assume they're lightweight and can therefore be trivially started (and detached). If you're ever spawning so many that the overhead of joining them is too high, then you're using Perl threads wrong, and should probably fork instead.
You're quite right: the thread will never terminate, and so your code will always have a 'dirty' exit.
So instead I'd rewrite:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
my $run : shared;
$run = 1;
sub sub1 {
my $count = 0;
while ($run) {
$count++;
print("\$count is $count\n");
sleep(1);
}
print "Terminating\n";
}
my $thr = threads->create( \&sub1 ); # Spawn the thread
sleep(15); # Let thread run for awhile
$run = 0;
$thr->join;
That way your main thread signals the worker to say "I'm done" and waits for it to finish its current loop.

Do we need a sleep() while running a forever process in Linux?

I have read that a forever process such as a daemon should call sleep() inside its while(1) or for(;;) loop. The reasoning given is that otherwise the process will always be in the run queue and the kernel will always run it, which will block other processes. I don't agree that it will block other processes completely: if there is time slicing, the kernel will still execute them. It will certainly steal time from them, though, and delay them, since this process is always in the runnable state. By default, Linux schedules round-robin. The first task is swapd, then the other tasks: a circular linked list with swapd (process ID 0) as the first task, followed by the others. I believe this is still time sliced, with a particular amount of time for each process. These tasks are nothing but process descriptors. I believe this linked list is maintained by the init process. Please do correct me if I am wrong. My other question is: if we do need a sleep(), what should its value be? How can we determine the sleep value that gives the best results?
If your program has useful things to do, don't throttle it. A program can move out of the run queue by doing blocking stuff like IO and waiting.
If you are writing a polling loop that can spin an arbitrary number of times you probably want to throttle it a bit with sleep because spinning too often has little value.
That said, polling loops are a means of last resort. Normally, programs perform useful work with every instruction, so they don't sleep at all.
Sleep is almost certainly the wrong solution.
Usually what you do is call a blocking function which wakes you up when there's something for you to do.
For example, if you're a network service you'd want to remain inactive until a request arrives.
In other words, the core of your daemon should not look like this:
while(1)
{
if (checkIfSomethingToDo())
doSomething();
else
sleep(1);
}
but rather a little like this:
while(1)
{
int ret = poll(fds, nfds, -1);
if (ret > 0)
doSomething();
}
Have the kernel put you to sleep until there's actual work to do. It's not hard to implement, you'd be a lot more efficient (not stealing CPU time from others, only to waste it doing no actual work) and your response latency will go down too.
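The same "block until there is work" pattern can be sketched in Perl with the core IO::Select module. This is only an illustration under stated assumptions: a pipe stands in for the real event source (a socket, for instance), one message is written up front so the example terminates, and the names $reader, $writer and @handled are made up for the sketch. It targets Unix-like systems, where select works on pipes.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# A pipe stands in for a real event source such as a listening socket.
pipe(my $reader, my $writer) or die "pipe failed: $!";
$writer->autoflush(1);

# Simulate one incoming request so that the loop below has work to find.
print {$writer} "job-1\n";

my $select = IO::Select->new($reader);
my @handled;

while (@handled < 1) {
    # No timeout argument: the process blocks here until a handle is readable.
    my @ready = $select->can_read();
    for my $fh (@ready) {
        # sysread avoids mixing buffered reads with select.
        my $bytes = sysread($fh, my $buffer, 1024);
        die "read failed: $!" unless defined $bytes;
        chomp $buffer;
        push @handled, $buffer;
    }
}

print "handled: @handled\n";
```

While blocked in can_read() the process is off the run queue, which is the effect the sleep-based loop only approximates.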
A sleep forces the OS to pass execution to another thread and is therefore helpful, or at least fair. Start with a sleep of one second; that should be OK.

perl : Threads do not die when main process dies

I have a Perl program that spawns several threads. Each thread processes some task (by firing off other system commands etc.) and then, when it's all done, waits.
Once all threads are done, they fire a signal to Parent process. The parent then loads up new jobs, and signals the threads to go work on these new tasks.
So ideally, this program, would run forever.
Now, if I kill it on the command line with kill -9 MainProgram.pl, it's not killed! I see the output of the jobs the threads are running, and then I also see that after they are done, they get new jobs and just go on and on...
I am absolutely confounded. If I do a kill -9 MainProgram.pl, it is supposed to kill all threads it owns, right?
Regardless of what the threads are out doing....
And even if the threads are doing I/O and so have to wait for the I/O to finish, I would expect each thread to die after its current task is done. But clearly the main thread is reloading jobs too, as the threads just keep going...
Is this kind of behaviour seen in Perl?
EDIT: Some of the code in mainProgram.pl
use threads;
use threads::shared;
for (my $count = 0; $count <= $threadNum-1; $count++) {
my $t = threads->new(\&handleEvent, $count);
push(@threads, $t);
}
#Parent thread:
while(1) {
lock($parentSignal);
cond_wait($parentSignal);
getEvents();
while (@eventCount == 0) {
sleep($parent_sleep_time);
getEvents(); #Try to get events again until you get some new stuff to process
}
cond_broadcast($threadsDone); # threads go work on this
}
Thanks
From what I understand, you're supposed to either join() or detach() on all threads prior to exiting.
From the POD:
If the program exits without all threads having either been joined or
detached, then a warning will be issued.
Source: http://metacpan.org/pod/threads

Perl Threads - Capture Exit

I have code that spawns two threads. The first runs a system command which launches an application. The second monitors the program. I'm new to Perl threads, so I have a few questions...
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
my $thr2 = threads->new(\&check);
my $rth1 = $thr1->join();
my $rth2 = $thr2->join();
1) Do I need a second thread to monitor the program? You can think of my subroutine call to &check as an infinite while loop which checks a text file for stuff the application produces. Could I just do this:
my $thr1 = threads->new(system($cmd));
sleep(FIVEMINUTES);
&check;
2) I'm trying to figure out what my parent is doing after I run this code. So after I launch line 1 it will spawn that new thread, sleep, then spawn that second thread and then sit at that first join and wait. It will not execute the rest of my code until it joins at that first join. Is this correct or am I wrong? If I am wrong, then how does it work?
3) My first thread, the one that launches the application, can be killed unexpectedly. When this happens, I have nothing to catch that and kill the threads. It just says:
"Thread 1 terminated abnormally: Undefined subroutine &main::65280 called at myScript.pl line 109." and then hangs there.
What could I do to get it to end the other threads? I need it to send an email before the program ends as well which I can do by just calling &email (another subroutine I made).
Thanks
First of all,
my $thr1 = threads->new(system($cmd));
should be
my $thr1 = threads->new(sub { system($cmd) });
or simply
my $thr1 = async { system($cmd) };
You don't need to start a third thread. As you suspected, the main thread and the one executing system are sufficient.
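To see why the block form matters: threads->new(system($cmd)) runs system in the calling thread first and then passes its return value as the thread's entry point, which is where the "&main::65280" in the error message comes from (65280 is an exit status of 255 shifted left by 8 bits). The sketch below wraps the call in async instead and reads the status back through join. The command is a stand-in chosen for portability: it simply re-runs the current perl ($^X) and exits with status 3.

```perl
use strict;
use warnings;
use threads;

# Stand-in command for this sketch: the running perl, exiting with status 3.
my @cmd = ($^X, '-e', 'exit 3');

# The block delays the system() call until the new thread is running.
my $thr = async { system(@cmd) };

# join() hands back the block's return value, here the wait status.
my $wait_status = $thr->join();
my $exit_code   = $wait_status >> 8;    # high byte holds the exit code

print "command exited with code $exit_code\n";
```

Checking the value returned by join is one way for the main thread to notice that the application ended abnormally and to call something like the email routine before exiting.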
What if the command finishes executing in less than five minutes? The following replaces sleep with a signal.
use threads;
use threads::shared;
my $done :shared = 0;
my $thr1 = async {
system($cmd);
lock($done);
$done = 1;
cond_signal($done);
};
{ # Wait up to $timeout for the thread to end.
lock($done);
my $timeout = time() + 5*60;
1 while !$done && cond_timedwait($done, $timeout);
if (!$done) {
# ... there was a timeout ...
}
}
$thr1->join();
In 2004-2006 I faced the same challenges with a Perl application running 24/7 on Windows. The only approach that worked was to use XML state files on disk to communicate the status of each component of the system and, where threads were used, to make sure every state file was handled inside a closure code block {} (a big gotcha). The app ran for at least 3 years on 100 machines, 24/7, without errors.
If you are on a Unix-like OS I would suggest to use forks and interprocess communication.
Use CPAN modules; do not reinvent the wheel.
Multithreading in Perl is a little hard to deal with; I would suggest using fork() instead. I will attempt to answer your questions to the best of my ability.
1) It seems to me that two threads/processes are the way to go here, as you need to check your data asynchronously.
2) Your parent works exactly as you describe.
3) The reason for your thread hanging could be that you never terminate your second thread. You said it was an infinite loop, is there any exit condition?

Perl parallel HTTP requests - out of memory

First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine for the first time I ran the program, the second time I ran 'out of memory' after the 40th request.
Here's the code: (@urls contains the 160 URLs for the requests)
while(@urls) {
my @threads;
for (my $j = 0; $j < 10 and @urls; $j++) {
my $url = shift(@urls);
push @threads, async { $ua->get($url) };
}
for my $thread (@threads) {
my $response = $thread->join;
print "$response\n";
}
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get an OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it releases all of its memory back to the OS. Nothing is kept between executions. Is exactly the same data being returned by the REST service each time? Maybe there's more data the second time you run, and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new();
my @thr = map {
threads->create(sub {
my @responses = ();
while (defined (my $url = $q->dequeue())) {
push @responses, $ua->get($url);
}
return @responses;
});
} 1..10;
$q->enqueue($_) for @urls;
$q->enqueue(undef) for 1..10;
foreach (@thr) {
my @responses_of_this_thread = $_->join();
print for @responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
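A self-contained variant of the worker-pool idea, runnable without a network. It makes two assumptions that are not in the answer above: squaring a number stands in for $ua->get($url), and the queue is closed with end() (available in Thread::Queue 3.01 and later) instead of enqueuing one undef per worker. The pool size of 4 and the 20 work items are arbitrary.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new();

# Start a fixed pool of workers before any work is queued.
my @workers = map {
    threads->create(sub {
        my @results;
        # dequeue() blocks while the queue is empty and returns undef
        # once the queue has been ended and drained.
        while (defined(my $n = $q->dequeue())) {
            push @results, $n * $n;    # stand-in for $ua->get($url)
        }
        return @results;
    });
} 1 .. 4;

$q->enqueue(1 .. 20);    # stand-in for the list of URLs
$q->end();               # no more items will be added

# Gather every worker's results; the sort just makes the order stable.
my @squares = sort { $a <=> $b } map { $_->join() } @workers;
print "collected ", scalar(@squares), " results\n";
```

Only four threads ever exist, however many items are queued, which keeps memory use bounded compared with starting a fresh batch of threads per group of requests.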
Whenever I need an asynchronous solution in Perl, I first look at the POE framework. In this particular case I used the POE HTTP request module, which lets you send multiple requests simultaneously and provides a callback mechanism where you can process your HTTP responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single-threaded POE solution would work beautifully.
Sometimes, though, we have to rely on threading because the application gets blocked by long-running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then, with Thread::Queue, I pass the data from the main thread to these workers AND never join/detach them; I always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now, and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (for which POE provides nice non-blocking interfaces).
