I'm very curious about what multithreading is. I have heard the name batted around here and there in answers I've received on Stack Overflow, but I have no idea what it is. So my two main questions are: what is it, and how can I benefit from it?
EDIT:
OK, since the first question didn't really get the response I was looking for, I'll go with this:
I have never heard of 'threading', even in other languages. This is an example I found on the internet:
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

print "Starting main program\n";
my @threads;
for ( my $count = 1; $count <= 10; $count++ ) {
    my $t = threads->new( \&sub1, $count );
    push( @threads, $t );
}
foreach (@threads) {
    my $num = $_->join;
    print "done with $num\n";
}
print "End of main program\n";

sub sub1 {
    my $num = shift;
    print "started thread $num\n";
    sleep $num;
    print "done with thread $num\n";
    return $num;
}
I can't seem to understand what it is doing. Could anyone shed some light?
Regards,
Phil
Threading is a way of having more than one thing happen at the same time, at least conceptually speaking (on a single-core, single-CPU computer, perhaps one with an ARM or Atom chip, only one thread of execution actually runs at any given moment).
The Perl example launches ten different sections of code simultaneously. Each chunk of code only has to take care of what it is doing, and doesn't have to worry about anything else. This means you can get the output of the Perl program with one simple subroutine called ten times in a reasonably simple fashion.
One use is in programs that interact with the user. It's typically necessary to have a responsive interface while things happen behind the scenes. It's hard to do this in one lump of code, so often the interface will be in one thread and the behind-the-scenes stuff in other threads, so that the interface stays snappy while the background tasks keep running.
If you're familiar with running multiple processes, threads are very similar, although they're more closely connected. This means it's easier to set up and communicate between threads, but it's also easy to mess up by not allowing for all possible orders of execution.
It is like forking, but lighter weight, and it is easier to share data between threads than between processes. The downside is that, because of the sharing of data, it is easier to write buggy code: thread A modifies something thread B assumed would stay the same; thread A takes a lock on resource C and tries to lock resource D while thread B holds D and is waiting for C; and so on. A sketch of that last situation, a deadlock, follows.
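To make the deadlock concrete, here is a minimal sketch; the resource names and the one-second sleeps are purely illustrative, and running it hangs, which is the point:

use threads;
use threads::shared;

my $resource_c :shared;
my $resource_d :shared;

# Thread A: locks C first, then tries for D.
my $thr_a = threads->create(sub {
    lock($resource_c);
    sleep 1;              # give thread B time to take D
    lock($resource_d);    # blocks forever once B holds D
});

# Thread B: locks D first, then tries for C -- the opposite order.
my $thr_b = threads->create(sub {
    lock($resource_d);
    sleep 1;              # give thread A time to take C
    lock($resource_c);    # blocks forever once A holds C
});

$_->join for ($thr_a, $thr_b);    # never returns: a classic deadlock

The standard fix is to make every thread acquire locks in the same global order.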
I am using an Amazon EC2 server to perform data processing of 63 files. The server I am using has 16 cores, but when I use Perl's Parallel::ForkManager with the number of processes equal to the number of cores, it seems like half the cores are sleeping and the working cores are not at 100%; they fluctuate around 25%-50%.
I also checked I/O and it is mostly idle.
use Sys::Info;
use Sys::Info::Constants qw( :device_cpu );
use Parallel::ForkManager;

my %options;    # per the Sys::Info synopsis
my $info = Sys::Info->new;
my $cpu  = $info->device( CPU => %options );

my $manager = new Parallel::ForkManager($cpu->count);
for ( my $i = 0; $i <= $#files_l; $i++ ) {
    $manager->start and next;
    do_stuff( $files_l[$i] );
    $manager->finish;
}
$manager->wait_all_children;
The short answer is - we can't tell you, because it depends entirely on what 'do_stuff' is doing.
The major reasons why parallel code doesn't create linear speed increases are:
Process creation overhead - some 'work' is done to spawn a process, so if the children are trivially small, that 'wastes' effort.
Contended resources - the most common is disk I/O, but things like file locks, database handles, sockets, or interprocess communication can also play a part.
Something else causing a 'back off' that stalls a process.
And without knowing what 'do_stuff' does, we can't second guess what it might be.
However, I'll suggest a couple of steps (a short sketch follows this list):
Double the number of processes to twice the CPU count. That's often a 'sweet spot', because it means that any non-CPU delay in one process just means one of the others gets to go full speed.
Try strace -fTt <yourprogram> (if you're on Linux; the commands are slightly different on other Unix variants). Then do it again with strace -fTtc, because the c will summarise syscall run times. Look at which ones take the most time.
Profile your code to see where the hot spots are. Devel::NYTProf is one library you can use for this.
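As a small sketch of the first and third suggestions, building on the question's own code ($cpu is the Sys::Info CPU object from the question, and your_script.pl is a placeholder name):

use Parallel::ForkManager;

# Oversubscribe: twice as many workers as cores, so a child that
# stalls on I/O simply yields its core to another worker.
my $manager = Parallel::ForkManager->new( 2 * $cpu->count );

# For the profiling step, run from the shell:
#   perl -d:NYTProf your_script.pl
#   nytprofhtml --open    # build and open the HTML report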
And on a couple of minor points:
my $manager = new Parallel::ForkManager($cpu->count);
would be better off written as:
my $manager = Parallel::ForkManager->new($cpu->count);
rather than using indirect object notation.
If you are just iterating @files, then it might be better to skip the loop-count variable and instead write:
foreach my $file (@files) {
    $manager->start and next;
    do_stuff($file);
    $manager->finish;
}
I have a small sample program that hangs on Perl 5.16.3. I am attempting to use an alarm to trigger if two threads don't finish working in time in a much more complicated program, but this boils it down to the gist. I know there are plenty of other ways to do this, but for the sake of argument, let's say I'm stuck with the code the way it is. I'm not sure if this is a bug in Perl, or something that legitimately shouldn't work.
I have researched this on the Internet, and it seems like mixing alarms and threads is generally discouraged, but I've seen plenty of examples where people claim that this is a perfectly reasonable thing to do, such as this other SO question, Perl threads with alarm. The code provided in the accepted answer on that question also hangs on my system, which is why I'm wondering if maybe this is something that's now broken, at least as of 5.16.3.
It appears that in the code below, if I call join before the alarm goes off, the alarm never triggers. If I replace the join with while(1){} and go into a busy-wait loop, then the alarm goes off just fine, so it appears that join is blocking the SIGALRM for some reason.
My expectation is that the join happens, and then a few seconds later I see "Alarm!" printed on the screen, but this never happens, so long as that join gets called before the alarm goes off.
#!/usr/bin/env perl
use strict;
use warnings;
use threads;

sub worker {
    print "Worker thread started.\n";
    while (1) { }
}

my $thread = threads->create(\&worker);
print "Setting alarm.\n";
$SIG{ALRM} = sub { print "Alarm!\n" };
alarm 2;
print "Joining.\n";
$thread->join();
The problem has nothing to do with threads. Signals are only processed between Perl ops, and join is written in C, so the signal will only be handled when join returns. The following demonstrates this:
#!/usr/bin/env perl
use strict;
use warnings;
use threads;

sub worker {
    print "Worker thread started.\n";
    for (1 .. 5) {
        sleep(1);
        print(".\n");
    }
}

my $thread = threads->create(\&worker);
print "Setting alarm.\n";
$SIG{ALRM} = sub { print "Alarm!\n" };
alarm 2;
print "Joining.\n";
$thread->join();
Output:
Setting alarm.
Joining.
Worker thread started.
.
.
.
.
.
Alarm!
join is essentially a call to pthread_join. Unlike other blocking system calls, pthread_join does not get interrupted by signals.
By the way, I renamed $tid to $thread since threads->create returns a thread object, not a thread id.
I'm going to post an answer to my own question to add some detail to ikegami's response above, and summarize our conversation, which should save future visitors from having to read through the huge comment trail it collected.
After discussing things with ikegami, I went and did some more reading on Perl signals, consulted some other Perl experts, and discovered the exact reason why join isn't being "interrupted" by the interpreter. As ikegami said, signals only get delivered in between Perl operations. In Perl, this is called deferred signals, or safe signals.
Deferred signals were introduced in 5.8.0, back in 2002, which could be one of the reasons the older posts I was seeing on the net don't appear to work. They presumably worked with "unsafe signals", which act more like the signal delivery we're used to in C. In fact, as of 5.8.1, you can turn off deferred signal delivery by setting the environment variable PERL_SIGNALS=unsafe before executing your script. When I do this, the threads::join call is indeed interrupted as I was expecting, just as pthread_join is interrupted in C in this same scenario.
Unlike other I/O operations, such as read, which fail with EINTR when a signal interrupts them, threads::join doesn't do this. Under the hood it's a call to the C library function pthread_join, which the man page confirms does not return EINTR. Under deferred signals, when the interpreter gets the SIGALRM, it schedules delivery of the signal, deferring it until the pthread_join call underneath threads::join returns. Since pthread_join is not interrupted and never returns EINTR, my SIGALRM is effectively swallowed by the threads::join. Other I/O operations would be interrupted and return EINTR, giving the Perl interpreter a chance to deliver the signal and then restart the system call via SA_RESTART.
Obviously, running in unsafe-signals mode is probably a Bad Thing, so as an alternative, according to perlipc, you can use the POSIX module to install a signal handler directly via sigaction. This then makes that one particular signal "unsafe", as in the sketch below.
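A minimal sketch of that approach, adapted to the SIGALRM case from my example above:

use POSIX ();

# Install the handler via sigaction() so that this one signal is
# delivered immediately ("unsafe"), bypassing Perl's deferral.
POSIX::sigaction(
    POSIX::SIGALRM(),
    POSIX::SigAction->new( sub { print "Alarm!\n" } ),
) or die "Error setting SIGALRM handler: $!\n";

alarm 2;
$thread->join();    # the handler can now run while join is still blocking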
In my Perl script I'm collecting a large data set that I later need to post to a server. Up to this point I'm good, but my problem is that the post to the server takes a long time, so I need a threading/forking approach: one part posts while, in parallel, I dig out my second data set while the post to the server is taking place.
Code snippet:
if ( system("curl -sS $post_url --data-binary \@$filename -H 'Content-type:text/xml;charset=utf-8' 1>/dev/null") != 0 ) {
    exit_script( " xml: Error ", "Unable to update $filename xml on $post_url" );
}
Can anyone tell me whether this is achievable with threading or forking?
It's difficult to give an answer to your question, because it depends.
Yes, Perl supports both forking and threading.
In general, I would suggest looking at threading for data-oriented tasks, and forking for almost anything else.
And so what you want to do is eminently achievable.
First you need to:
Encapsulate your tasks into subroutines, and get those working first. (This is very important: parallel stuff causes worlds of pain and is difficult to troubleshoot if you're not careful. Get it working single-threaded first.)
Run your subroutines as threads, and capture their results.
Something like this:
use threads;

sub curl_update {
    # placeholder: run your curl command and return its exit status
    my $result = system("your_curl_command");
    return $result;
}

# start the async curl
my $thr = threads->create(\&curl_update);

# do your other stuff....
sleep(60);

my $result = $thr->join();
if ($result) {
    # do whatever you would if the curl update failed
}
In this, the join is a blocking call: your main code will stop and wait for your thread to complete. If you want to do something more complicated, you can use is_running or is_joinable, which are non-blocking.
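For example, a rough sketch of a non-blocking variant (do_other_stuff is a placeholder for the work done between checks):

# poll instead of blocking: keep working until the thread finishes
while ( $thr->is_running ) {
    do_other_stuff();    # placeholder for your data-digging work
}
my $result = $thr->join();    # the thread has finished, so this returns at once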
I'd suggest neither.
You're just talking lots of HTTP. You can talk concurrent HTTP a lot nicer, because it's just network IO, by using any of the asynchronous IO systems. Perl has many of them.
Principally I'd suggest IO::Async, but then I wrote it. You can use Net::Async::HTTP to make an HTTP hit. This will fully support doing many of them at once - many hundreds or thousands if need be.
Otherwise, you can also try either POE or AnyEvent, which will both support the same thing in their own way.
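For illustration, here is a rough sketch using Net::Async::HTTP. This assumes a reasonably recent version whose do_request returns a Future; $post_url comes from the question, and $xml_payload is an assumed variable holding the file contents:

use IO::Async::Loop;
use Net::Async::HTTP;
use URI;

my $loop = IO::Async::Loop->new;
my $http = Net::Async::HTTP->new;
$loop->add($http);

# start the POST; this returns a Future immediately, without blocking
my $future = $http->do_request(
    method       => "POST",
    uri          => URI->new($post_url),
    content      => $xml_payload,
    content_type => "text/xml; charset=utf-8",
);

# ... dig out the second data set here, concurrently ...

my $response = $loop->await($future)->get;    # block only when the result is needed
die "POST failed: " . $response->status_line unless $response->is_success;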
I have several very large tables in MySQL (millions of rows) that I need to load into my Perl script.
We then do some custom processing of the data, and aggregate it into a hash. Unfortunately, that custom processing can't be implemented in MySQL.
Here's some quick pseudocode.
my @data;
for my $table_num (@table_numbers) {
    my $sth = $dbh->prepare(...);
    $sth->execute();
    $sth->bind_columns( \my ( $a, $b, $c, ... ) );
    while ( $sth->fetch() ) {
        $data[$table_num]{ black_box($a) }{ secret_func($b) } += $c;
    }
}
my $x = $#data + 1;
for my $num (@table_numbers) {
    for my $a ( keys %{ $data[$num] } ) {
        for my $b ( keys %{ $data[$num]{$a} } ) {
            $data[$x]{$a}{$b} += $data[$num]{$a}{$b};
        }
    }
}
Now, the first loop can take several minutes per iteration to run, so I am thinking of ways I can run them in parallel. I have looked at using Perl threads before, but they seem to just run several Perl interpreters at once; my script already uses a lot of memory, and merging the data would seem to be problematic. Also, at this stage, the script is not using a lot of CPU.
I have been looking at possibly using Coro threads, but it seems like there would be a learning curve, plus a fairly complex integration with my current code. What I would like to know is whether I am likely to see any gains by going this route. Are there better ways of multithreading code like this? I cannot afford to use any more memory than my code already uses. Is there something else I can do here?
Unfortunately, doing the aggregation in MySQL is not an option, and rewriting the code in a different language would be too time-consuming. I am aware that using arrays instead of hashes is likely to make my code faster and use less memory, but again, that would require a major rewrite of a large script.
Edit: The above is pseudocode; the actual logic is a lot more complex. The bucketing is based on several DB tables, and there are many more inputs than just $a and $b. Precomputing them is not practical, as there are trillions of possible combinations. The main goal is how to make the Perl script run faster, not how to fix the SQL part of things. That would require changes to how the data is stored and indexed on the actual server, which would affect a lot of other code. There are other people working on those optimizations. My current goal is to make the code faster without changing any SQL.
You could do it in mysql simply by making black_box and secret_func tables (temporary tables, if necessary) prepopulated with the results for every existing value of the relevant columns.
Short of that, measure how much time is spent in the calls to black_box and secret_func vs. execute/fetch. If a lot is in the former, you could memoize the results:
my %black_box;
my %secret_func;

for my $table_num ...
    ...
    $data[$table_num]{ $black_box{$a} //= black_box($a) }{ $secret_func{$b} //= secret_func($b) } += $c;
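If black_box and secret_func are pure functions (the same input always produces the same output), the core Memoize module can do this caching transparently; a minimal sketch:

use Memoize;

# cache each function's results keyed by its arguments;
# only valid if the functions have no side effects
memoize('black_box');
memoize('secret_func');

# the original loop body can then stay exactly as it was:
# $data[$table_num]{ black_box($a) }{ secret_func($b) } += $c;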
If you have memory concerns, using forks instead of threads may help. They use much less memory than the standard perl threads. There is going to be somewhat of a memory penalty for multi-threading, and YMMV as far as performance goes, but you might want to try something like:
use forks;
use Thread::Queue;

my $inQueue  = Thread::Queue->new;
my $outQueue = Thread::Queue->new;

$inQueue->enqueue(@table_numbers);

# create the worker threads
my $numThreads = 4;
for (1 .. $numThreads) {
    threads->create(\&doMagic);
}

# wait for the threads to finish
$_->join for threads->list;

# collect the data
my @data;
while (my $result = $outQueue->dequeue_nb) {
    # merge $result into @data
}

sub doMagic {
    while (my $table_num = $inQueue->dequeue_nb) {
        my @data;
        # your first loop goes here
        $outQueue->enqueue(\@data);
    }
    return;
}
I am using Perl on a Linux box and my memory usage is going up and up, I believe because of previously run threads that have not been killed/joined.
I think I need to somehow signal the thread(s) that are done to terminate, and then detach them so that they get cleaned up automatically, giving me back memory.
I have tried return(); with $thr_List->join(); and $thr_List->detach();, but my GUI doesn't show for ages with the join, and the memory problem seems to still be there with the detach. Any help?
$mw->repeat(100, sub {    # shared var handler/pivoting-point !?
    while (defined(my $command = $q->dequeue_nb())) {    #???
        # update a status bar's text
        $text->delete('0.0', "end");
        $text->insert('0.0', $command);
        $indicatorbar->value($val);    # update a ProgressBar's value to $val
        $mw->update();
        for (@threadids) {    # @threadids is a shared array containing thread ids
            # that is to say, I have got the info I wanted from a thread and
            # pushed its id into the above @threadids array
            print "I want to now kill or join the thread with id: $_\n";
            #$thrWithId->detach();
            #$thrWithId->join();
            # then delete that id from the array
            # delete $threadids[elWithThatIdInIt];
            # as this seems to be in a repeat(100, sub... too, there are problems??!
            # locks maybe?!?
            # for ( @threadids ) if it's not empty?!?
        }
    }    # end of while
});    # end of sub

# Some worker... that works with the above handler/pivot, me thinks #???
async {
    for (;;) {
        sleep(0.1);    # note: Perl's built-in sleep truncates to whole seconds
        $q->enqueue($StatusLabel);
    }
}->detach();
I have uploaded my full code here (http://cid-99cdb89630050fff.office.live.com/browse.aspx/.Public) if needed; it's in Boxy.zip.
First, sorry for replying here, but I've lost the cookie that allows editing etc.
Thanks very much gangabass, that looks like great info. I will have to spend some time on it, but at least it looks like others are asking the same questions. I was worried I was making a total mess of things.
Thanks guys...
So it sounds like you got the join working but it was very slow?
Threading in Perl is not lightweight. Creating and joining threads takes significant memory and time.
If your task allows, it is much better to keep threads running and give them additional work rather than ending them and starting new threads later. Thread::Queue can help with this. That said, unless you are on Windows, there is not a lot of point to doing that instead of forking and using Parallel::ForkManager.
You need only one worker thread, and you update the GUI from it. The other threads just process data from the queue, and there is no need to terminate them.
See this example for more info.
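As a rough sketch of that structure, adapted from the question's own code (process() and the queue names are illustrative placeholders):

use threads;
use Thread::Queue;

my $work_q   = Thread::Queue->new;    # jobs for the workers
my $status_q = Thread::Queue->new;    # status messages for the GUI

# long-lived workers: never terminated, just fed more work
for (1 .. 4) {
    threads->create(sub {
        while (defined(my $job = $work_q->dequeue)) {
            my $result = process($job);    # process() is a placeholder
            $status_q->enqueue("done: $result");
        }
    })->detach;
}

# the GUI thread polls the status queue, as in the question's repeat():
$mw->repeat(100, sub {
    while (defined(my $msg = $status_q->dequeue_nb())) {
        $text->delete('0.0', 'end');
        $text->insert('0.0', $msg);
    }
});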