The memory consumption of the following code increases in the course of its execution.
What could be going wrong? Is there something else I need to do to exit cleanly from the threads?
#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;

my ($i, $URL);
my @Thread;
my $NUM_THREADS = 4;
my @response :shared = ();

while (1) {
    for ($i = 0; $i < $NUM_THREADS; $i++) {
        if ($response[$i] is processed) {    # pseudocode: slot $i has been handled
            $URL = FindNextURL();
            $Thread[$i] = new threads \&Get, $i, $URL;
            $Thread[$i]->detach();
        }
    }
    # wait for at least one $response[$i]
    # if ready, process it
}

sub Get {
    my $i   = $_[0];
    my $URL = $_[1];
    $response[$i] = FetchURL($URL);
    return;
}
From http://perldoc.perl.org/threads.html:
"On most systems, frequent and continual creation and destruction of threads can lead to ever-increasing growth in the memory footprint of the Perl interpreter. While it is simple to just launch threads and then ->join() or ->detach() them, for long-lived applications, it is better to maintain a pool of threads, and to reuse them for the work needed, using queues to notify threads of pending work. The CPAN distribution of this module contains a simple example (examples/pool_reuse.pl) illustrating the creation, use and monitoring of a pool of reusable threads."
Please try to use a pool of threads.
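A minimal sketch of such a pool, reusing the question's FindNextURL() and FetchURL() placeholders, might look like this (the Thread::Queue hand-off and the processing step are my assumptions, not the original code):
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $NUM_THREADS = 4;
my $work = Thread::Queue->new();    # URLs waiting to be fetched
my $done = Thread::Queue->new();    # fetched responses waiting to be processed

# Spawn the workers once; they are reused for the whole run
my @workers = map {
    threads->create(sub {
        while (defined(my $url = $work->dequeue())) {
            $done->enqueue(FetchURL($url));
        }
    });
} 1 .. $NUM_THREADS;

# Keep the pipeline full instead of creating a new thread per URL
$work->enqueue(FindNextURL()) for 1 .. $NUM_THREADS;
while (1) {
    my $response = $done->dequeue();    # wait for at least one response
    # ... process $response here ...
    $work->enqueue(FindNextURL());      # hand the now-idle worker its next URL
}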
Related
I've got an array with the ASCII codes for [A-Z, a-z], like so: my @alphabet = (65..90, 97..122);
So the main functionality of each thread is to check a character from the alphabet and return a string if a condition is true.
Simple example:
my @output = ();
for my $ascii (@alphabet) {
    threads->create(sub { return chr($ascii); });
}
I want to run a thread for every ASCII number, then put the letter from the thread function into the array in the correct order.
So in our case the array @output should be dynamic and contain [A-Z, a-z] after all threads finish their job.
How do I check that all threads are done, and keep the order?
You're looking for $thread->join, which waits for a thread to finish. It's documented in the threads module's documentation, and existing Stack Overflow questions on joining threads may also help.
Since in your case it looks like the work being done in the threads is roughly equal in cost (no thread will take much longer than any other), you can just join each thread in order, like so, to wait for them all to finish:
# Store all the threads for each letter in an array.
my @threads = map { my $ascii = $_; threads->create(sub { return chr($ascii) }) } @alphabet;
my @results = map { $_->join } @threads;
Since, when the first thread returns from join, the others are likely already done (or about to be) and just waiting for join to collect their return values, this gets you pretty close to "as fast as possible" parallelism-wise. And since the threads were created in order, @results is already ordered for free.
Now, if your threads can take variable amounts of time to finish, or if you need to do some time-consuming processing in the "main"/spawning thread before plugging child threads' results into the output data structure, joining them in order might not be so good. In that case, you'll need to somehow either: a) detect thread "exit" events as they happen, or b) poll to see which threads have exited.
You can detect thread "exit" events using signals/notifications sent from the child threads to the main/spawning thread. The easiest/most common way to do that is to use the cond_wait and cond_signal functions from threads::shared. Your main thread would wait for signals from child threads, process their output, and store it into the result array. If you take this approach, you should preallocate your result array to the right size, and provide the output index to your threads (e.g. use a C-style for loop when you create your threads and have them return ($result, $index_to_store) or similar) so you can store results in the right place even if they are out of order.
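A rough sketch of that signalling approach, with a preallocated shared result array and a shared counter (the variable names and bookkeeping are my own, not from the question):
use strict;
use warnings;
use threads;
use threads::shared;

my @alphabet = (65 .. 90, 97 .. 122);
my @results :shared = (undef) x @alphabet;   # preallocate one slot per letter
my $done    :shared = 0;                     # how many workers have finished

for my $index (0 .. $#alphabet) {
    my $ascii = $alphabet[$index];
    threads->create(sub {
        my $letter = chr($ascii);            # the "work"
        lock($done);
        $results[$index] = $letter;          # store in the right slot, even out of order
        $done++;
        cond_signal($done);                  # tell the main thread something finished
    })->detach();
}

# Main thread sleeps until every worker has reported in
{
    lock($done);
    cond_wait($done) until $done == @alphabet;
}
print @results, "\n";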
You can poll which threads are done using the is_joinable thread instance method, or using the threads->list(threads::joinable) and threads->list(threads::running) methods in a loop (hopefully not a busy-waiting one; adding a sleep call--even a subsecond one from Time::HiRes--will save a lot of performance/battery in this case) to detect when things are done and grab their results.
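And a rough sketch of the polling approach with threads->list(threads::joinable) (the tid-to-index hash is my own invention to keep the output ordered):
use strict;
use warnings;
use threads;
use Time::HiRes qw(sleep);    # fractional sleep so we don't busy-wait

my @alphabet = (65 .. 90, 97 .. 122);
my @results;
my %index_of;                 # thread id => slot in @results

for my $i (0 .. $#alphabet) {
    my $ascii = $alphabet[$i];
    my $thr = threads->create(sub { return chr($ascii) });
    $index_of{ $thr->tid } = $i;
}

while (threads->list()) {     # any threads not yet joined?
    for my $thr (threads->list(threads::joinable)) {
        my $i = $index_of{ $thr->tid };
        $results[$i] = $thr->join;    # store the result in the right slot
    }
    sleep 0.05;               # brief nap between polls
}
print @results, "\n";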
Important Caveat: spawning a huge number of threads to perform a lot of work in parallel, especially if that work is small/quick to complete, can cause performance problems, and it might be better to use a smaller number of threads that each do more than one "piece" of work (e.g. spawn a small number of threads, and each thread uses the threads::shared functions to lock and pop the first item off of a shared array of "work to do" and do it rather than map work to threads as 1:1). There are two main performance problems that arise from a 1:1 mapping:
the overhead (in memory and time) of spawning and joining each thread is much higher than you'd think (benchmark it on threads that don't do anything, just return, to see). If the work you need to do is fast, the overhead of thread management for tons of threads can make it much slower than just managing a few re-usable threads.
If you end up with a lot more threads than there are logical CPU cores and each thread is doing CPU-intensive work, or if each thread is accessing the same resource (e.g. reading from the same disks or the same rows in a database), you hit a performance cliff pretty quickly. Tuning the number of threads to the "resources" underneath (whether those are CPUs or hard drives or whatnot) tends to yield much better throughput than trusting the thread scheduler to switch between many more threads than there are available resources to run them on. The reasons this is slow are, very broadly:
The thread scheduler (part of the OS, not the language) can't know enough about what each thread is trying to do, so preemptive scheduling cannot optimize for performance past a certain point given that limited knowledge.
The OS usually tries to give most threads a reasonably fair shot, so it can't reliably say "let one run to completion and then run the next one" unless you explicitly bake that into the code (since the alternative would be unpredictably starving certain threads for opportunities to run). Basically, switching between "run a slice of thread 1 on resource X" and "run a slice of thread 2 on resource X" doesn't get you anything once you have more threads than resources, and adds some overhead as well.
TL;DR threads don't give you performance increases past a certain point, and after that point they can make performance worse. When you can, reuse a number of threads corresponding to available resources; don't create/destroy individual threads corresponding to tasks that need to be done.
Building on Zac B's answer, you can use the following if you want to reuse threads:
use strict;
use warnings;
use Thread::Pool::Simple qw( );
$| = 1;
my $pool = Thread::Pool::Simple->new(
    do => [ sub {
        select(undef, undef, undef, (200 + int(rand(8)) * 100) / 1000);  # random sub-second delay
        return chr($_[0]);
    } ],
);

my @alphabet = ( 65..90, 97..122 );
print $pool->remove($_) for map { $pool->add($_) } @alphabet;
print "\n";
The results are returned in order, as soon as they become available.
I'm the author of Parallel::WorkUnit so I'm partial to it. And I thought adding ordered responses was actually a great idea. It does it with forks, not threads, because forks are more widely supported and they often perform better in Perl.
my $wu = Parallel::WorkUnit->new();
for my $ascii (@alphabet) {
    $wu->async(sub { return chr($ascii); });
}
@output = $wu->waitall();
If you want to limit the number of simultaneous processes:
my $wu = Parallel::WorkUnit->new(max_children => 5);
for my $ascii (@alphabet) {
    $wu->queue(sub { return chr($ascii); });
}
@output = $wu->waitall();
I've got a Perl program that tries to convert a bunch of files from one format to another (via a command-line tool). It works fine, but it's too slow, as it converts the files one at a time.
I researched and used the fork() mechanism, trying to spawn off all the conversions as child forks, hoping to utilize the CPUs/cores.
The coding is done and tested; it does improve performance, but not to the degree I expected.
When looking at /proc/cpuinfo, I have this:
> egrep -e "core id" -e ^physical /proc/cpuinfo|xargs -l2 echo|sort -u
physical id : 0 core id : 0
physical id : 0 core id : 1
physical id : 0 core id : 2
physical id : 0 core id : 3
physical id : 1 core id : 0
physical id : 1 core id : 1
physical id : 1 core id : 2
physical id : 1 core id : 3
Does that mean I have 2 CPUs with four cores each? If so, I should be able to fork out 8 forks, and supposedly an 8-minute job (1 min per file, 8 files) should finish in 1 minute (8 forks, 1 file per fork).
However, when I test-run this, it still takes 4 minutes to finish. It appears it only utilized the 2 CPUs, but not the cores?
Hence, my question is:
Is it true that Perl's fork() only parallelizes across CPUs, but not cores? Or maybe I didn't do it right? I'm simply using fork() and wait(), nothing special.
I'd assume Perl's fork() should use the cores; is there a simple bash/Perl test I can write to prove that neither my OS (i.e. RedHat 4) nor Perl is the culprit for this symptom?
To Add:
I even tried running the following command multiple times to simulate multiple processes and monitored htop.
while true; do echo abc >>devnull; done &
Somehow htop is telling me I've got 16 cores? And when I spawn 4 of the above while-loops, I see 4 of them utilizing ~100% CPU each. When I spawn more, all of them start reducing their CPU utilization evenly (e.g. with 8 processes, I see 8 bash entries in htop, each using ~50%). Does this mean something?
Thanks in advance. I tried googling around but wasn't able to find an obvious answer.
Edit: 2016-11-09
Here is an extract of the Perl code. I'm interested to see what I did wrong here.
my $maxForks = 50;
my $forks    = 0;

while (<CIFLIST>) {
    extractPDFByCIF($cifNumFromIndex, $acctTypeFromIndex, $startDate, $endDate);
}

for (1 .. $forks) {
    my $pid = wait();
    print "Child fork exited. PID=$pid\n";
}

sub extractPDFByCIF {
    # doing SQL constructing for the $stmt to do a DB query
    $stmt->execute();

    while ($stmt->fetch()) {
        # fork the copy/afp2web process into a child process
        if ($forks >= $maxForks) {
            my $pid = wait();
            print "PARENTFORK: Child fork exited. PID=$pid\n";
            $forks--;
        }

        my $pid = fork;
        if (not defined $pid) {
            warn "PARENTFORK: Could not fork. Do it sequentially with parent thread\n";
        }
        if ($pid) {
            $forks++;
            print "PARENTFORK: Spawned child fork number $forks. PID=$pid\n";
        } else {
            print "CHILDFORK: Processing child fork. PID=$$\n";

            # prevent child fork from destroying dbh from parent thread
            $dbh->{InactiveDestroy} = 1;
            undef $dbh;

            # perform the conversion as usual
            if ($fileName =~ m/.afp/) {
                system("file-conversion -parameter-list");
            } elsif ($fileName =~ m/.pdf/) {
                system("cp $from-file $to-file");
            } else {
                print ERRORLOG "Problem happened here\r\n";
            }
            exit;
        }
        # end forking
    }

    $stmt->finish();
    close(INDEX);
}
fork() spawns a new process - identical to, and with the same state as the existing one. No more, no less. The kernel schedules it and runs it wherever.
If you do not get the results you're expecting, I would suggest that a far more likely limiting factor is that you are reading files from your disk subsystem - disks are slow, and contending for IO doesn't actually make them any faster - if anything the opposite, because it forces additional drive seeks and makes caching less effective.
So specifically:
1/ No, fork() does nothing more than clone your process.
2/ Largely meaningless unless you want to rewrite most of your algorithm as a shell script. There's no real reason to think that it'll be any different though.
To follow on from your edit:
system('file-conversion') looks an awful lot like an IO based process, which will be limited by your disk IO. As will your cp.
Have you considered Parallel::ForkManager which greatly simplifies the forking bit?
As a lesser style point, you should probably use the 3-argument form of open (see the sketch after the code below).
#!/usr/bin/env perl
use strict;
use warnings;
use Parallel::ForkManager;
my $maxForks = 50;
my $manager = Parallel::ForkManager->new($maxForks);
while ($ciflist) {
    ## do something with $_ to parse.
    ## instead of: extractPDFByCIF($cifNumFromIndex, $acctTypeFromIndex, $startDate, $endDate);

    # doing SQL constructing for the $stmt to do a DB query
    $stmt->execute();

    while ( $stmt->fetch() ) {
        # fork the copy/afp2web process into a child process
        $manager->start and next;

        print "CHILDFORK: Processing child fork. PID=$$\n";

        # prevent child fork from destroying dbh from parent thread
        $dbh->{InactiveDestroy} = 1;
        undef $dbh;

        # perform the conversion as usual
        if ( $fileName =~ m/.afp/ ) {
            system("file-conversion -parameter-list");
        } elsif ( $fileName =~ m/.pdf/ ) {
            system("cp $from-file $to-file");
        } else {
            print ERRORLOG "Problem happened here\r\n";
        }

        # end forking
        $manager->finish;
    }
    $stmt->finish();
}
$manager->wait_all_children;
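For the 3-argument open mentioned above, a minimal sketch (the filename and handle name are placeholders, not from the question's code):
# 3-argument open with a lexical filehandle and error checking
open my $ciflist_fh, '<', '/path/to/cif_index.txt'
    or die "Cannot open CIF index: $!";

while (my $line = <$ciflist_fh>) {
    chomp $line;
    # ... parse $line here ...
}
close $ciflist_fh;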
Your goal is to parallelize your application in a way that uses multiple cores as independent resources. What you want to achieve is multi-threading, in particular Perl's ithreads, which copy the interpreter's data for each new thread much as a fork() of the underlying system would (and are heavy-weight for that reason). You can teach yourself the Perl way of multi-threading from perlthrtut. Quote from perlthrtut:
When a new Perl thread is created, all the data associated with the current thread is copied to the new thread, and is subsequently private to that new thread! This is similar in feel to what happens when a Unix process forks, except that in this case, the data is just copied to a different part of memory within the same process rather than a real fork taking place.
That being said, regarding your questions:
You're not doing it right (sorry). [see my comment...] With multi-threading you don't need to call fork() yourself; Perl handles the low-level work for you.
You can check whether your Perl interpreter has been built with thread support, e.g. by running perl -V (note the capital V) and looking at the output. If there is nothing about threads, then your Perl interpreter is not capable of Perl multi-threading.
The reason your application is already faster using fork(), even with only one CPU core, is likely that while one process waits for slow resources such as the file system, another process can use the same core for computation in the meantime.
I have 100+ tasks to do. I could do them in a loop, but that would be slow.
I want to do these jobs with threading, let's say 10 threads.
There is no dependency between the jobs; each can run independently, and stop if it fails.
I want these threads to pick up my jobs and do them; there should be no more than 10 threads in total, otherwise it may harm the server.
These threads keep doing the jobs until all are finished.
Stop the job in the thread when it times out.
I was searching for information about this on the Internet: Thread::Pool, Thread::Queue...
But I can't be sure which one is better for my case. Could anyone give me some advice?
You could use Thread::Queue and threads.
The IPC (communication between threads) is much easier than between processes.
To fork or not to fork?
use strict;
use warnings;
use threads;
use Thread::Queue;
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
# Worker threads: create 10 of them, each pulls items off the queue
for (1 .. 10) {
    threads->create(sub {
        while (defined(my $item = $q->dequeue())) {
            # Do work on $item
        }
    })->detach();
}

my $dbh = ...;

while (1) {
    # get items from db
    my @items = get_items_from_db($dbh);

    # Send work to the threads
    $q->enqueue(@items);
    print "Pending items: " . $q->pending() . "\n";

    sleep 15;    # check DB every 15 secs
}
I'd never use Perl threads. The reason is that they aren't, conceptually speaking, threads: you have to specify what data is to be shared between them. Each thread runs its own Perl interpreter; that's why they are called interpreter threads, or ithreads. Needless to say, this consumes a lot of memory just to run things in parallel. fork() shares all the memory up until the fork point. So if the tasks are independent, always use fork. It's also the most Unix-like way of doing things.
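As a rough illustration of that fork-per-independent-task advice (a minimal sketch; the task list and do_task() are placeholders):
use strict;
use warnings;

my @tasks = (1 .. 10);              # placeholder for the independent tasks
my @pids;

for my $task (@tasks) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # child: memory is shared copy-on-write up to this point
        do_task($task);
        exit 0;
    }
    push @pids, $pid;               # parent: remember the child
}

waitpid($_, 0) for @pids;           # wait for every child to finish

sub do_task { print "task $_[0] done in pid $$\n" }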
Using module_init I have created and woken up a kthread. In order to keep it alive and also do my function's task, I used the following approach. That was the only approach I could get running, since I am changing the flag in an interrupt. Now I am facing an unbelievable drop in the performance of the code. I narrowed the problem down to the following piece of code:
while (1) {
    // Do my tasks here after changing flag
    while (get_flag()) {   // Waiting for a flag, to basically do my Func in the previous line.
        schedule();
    }
}   // to keep a kthread alive after initial create.
Details about the performance drop: without the inner while loop that calls schedule(), the rate of data transmission in my code is 35 MB/s, but with this little loop it drops to 5 MB/s.
Is there any other way that I can make a kthread sleep and wait for a flag change?
Ideally, this is not the way you should do this in the kernel. But if you have to do it this way:
Check whether you are doing a blocking check for the flag. If that is the case, change it to a non-blocking wait: just check the flag and call schedule(); that should be enough in most cases. The scheduling algorithm will make sure all processes get their fair share of CPU. Also, if you are doing a blocking check for the flag, you are unnecessarily wasting CPU cycles, since you only do the processing on the next scheduler slice. By the same logic, if you want better performance, you should wake up your waiting process from your producer thread with wake_up_process().
-or-
If you just want to achieve the functionality, I feel the right way to do it is the following method: using a wait queue with wait_event_interruptible() and wake_up_interruptible().
From your kernel thread described above, you just need to call wait_event_interruptible;
see the pseudo code below:
while (1) {
    wait_event_interruptible(wq, your_flag);
    /* do your task */
}
And from the place where you set the flag:
{
    /* some event */
    /* set flag */
    wake_up_interruptible(&wq);
}
You don't have to call schedule() explicitly.
First of all, I'm new to Perl.
I want to make multiple (e.g. 160) HTTP GET requests on a REST API in Perl. Executing them one after another takes much time, so I was thinking of running the requests in parallel. Therefore I used threads to execute more requests at the same time and limited the number of parallel requests to 10.
This worked just fine the first time I ran the program; the second time I got an 'out of memory' error after the 40th request.
Here's the code: (@urls contains the 160 URLs for the requests)
while (@urls) {
    my @threads;
    for (my $j = 0; $j < 10 and @urls; $j++) {
        my $url = shift(@urls);
        push @threads, async { $ua->get($url) };
    }
    for my $thread (@threads) {
        my $response = $thread->join;
        print "$response\n";
    }
}
So my question is, why am I NOT running out of memory the first time but the second time (am I missing something crucial in my code)? And what can I do to prevent it?
Or is there a better way of executing parallel GET requests?
I'm not sure why you would get an OOM error on a second run when you don't get one on the first run; when you run a Perl script and the perl binary exits, it'll release all of its memory back to the OS. Nothing is kept between executions. Is the exact same data being returned by the REST service each time? Maybe there's more data the second time you run and it's pushing you over the edge.
One problem I notice is that you're launching 10 threads and running them to completion, then spawning 10 more threads. A better solution may be a worker-thread model. Spawn 10 threads (or however many you want) at the start of the program, put the URLs into a queue, and allow the threads to process the queue themselves. Here's a quick example that may help:
use strict;
use warnings;
use threads;
use Thread::Queue;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;    # as in the question
my @urls;                        # fill with the 160 request URLs from the question

my $q = Thread::Queue->new();

my @thr = map {
    threads->create(sub {
        my @responses = ();
        while (defined (my $url = $q->dequeue())) {
            push @responses, $ua->get($url);
        }
        return @responses;
    });
} 1..10;

$q->enqueue($_) for @urls;
$q->enqueue(undef) for 1..10;

foreach (@thr) {
    my @responses_of_this_thread = $_->join();
    print for @responses_of_this_thread;
}
Note, I haven't tested this to make sure it works. In this example, you create a new thread queue and spawn up 10 worker threads. Each thread will block on the dequeue method until there is something to be read. Next, you queue up all the URLs that you have, and an undef for each thread. The undef will allow the threads to exit when there is no more work to perform. At this point, the threads will go through and process the work, and you will gather the responses via the join at the end.
Whenever I need an asynchronous solution in Perl, I first look at the POE framework. In this particular case I used a POE HTTP request component, which allows us to send multiple requests simultaneously and provides a callback mechanism where you can process your HTTP responses.
Perl threads are scary and can crash your application, especially when you join or detach them. If responses do not take a long time to process, a single-threaded POE solution would work beautifully.
Sometimes, though, we have to rely on threading because the application gets blocked by long-running tasks. In those cases, I create a certain number of threads BEFORE initiating anything in the application. Then with Thread::Queue I pass the data from the main thread to these workers AND never join/detach them; I always keep them around for stability purposes.
(Not an ideal solution for every case.)
POE supports threads now, and each thread can run a POE::Kernel. The kernels can communicate with each other through TCP sockets (for which POE provides nice non-blocking interfaces).
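For reference, a single-threaded POE sketch along those lines might look like the following (I'm assuming POE::Component::Client::HTTP is the component meant above; the URLs are placeholders):
use strict;
use warnings;
use POE qw(Component::Client::HTTP);
use HTTP::Request::Common qw(GET);

my @urls = ('http://example.com/a', 'http://example.com/b');   # placeholders
my $done = 0;

POE::Component::Client::HTTP->spawn(Alias => 'ua');

POE::Session->create(
    inline_states => {
        _start => sub {
            # Fire off every request at once; responses arrive via the callback below
            $_[KERNEL]->post('ua', 'request', 'got_response', GET $_) for @urls;
        },
        got_response => sub {
            my ($request_packet, $response_packet) = @_[ARG0, ARG1];
            my $request  = $request_packet->[0];     # the HTTP::Request we sent
            my $response = $response_packet->[0];    # the matching HTTP::Response
            print $request->uri, " => ", $response->code, "\n";

            # Shut the client component down once everything has answered,
            # so that run() can return
            $_[KERNEL]->post('ua', 'shutdown') if ++$done == @urls;
        },
    },
);

POE::Kernel->run();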