I'm starting to learn parallel programming, and I want to compare a single-threaded program with a multi-threaded one.
I need to write a very simple algorithm that finds as many prime numbers as possible within one minute, and shows me the last prime calculated and its position in the sequence of primes.
For example, the prime number 23 would appear as the number 23 with position 9, because it is the 9th prime number.
Without using threads, the number of primes found was 233,596 and the last prime 3,246,107. But with threads, 229,972 primes were found and the last prime was 3,192,463.
I think that is wrong, because multi-threading was supposed to produce a better result than the single thread. I believe it is a pretty basic error, but I cannot solve it as I still do not understand much about Perl's parallelism.
This is the code. It calculates prime numbers for one minute single-threaded, and then performs the same calculation with four threads using shared variables.
use threads;
use threads::shared;

my $seconds = 60;

# WITHOUT THREAD #
print "\n\n Calc without Threads:\n";
my $endTime = time() + $seconds;
calcWithoutThread();

print "\n\n ----------------===========================---------------- \n";

# WITH THREAD #
print "\n\n Calc with Threads:\n";
my $prime :shared      = 5; # Starts from the 5th prime
my $totalPrime :shared = 2; # Starts with prime 2 and prime 3
my $lastPrime :shared  = 0;
my $endTime1 = time() + $seconds;

my $thread1 = threads->create(\&calcWithThread);
my $thread2 = threads->create(\&calcWithThread);
my $thread3 = threads->create(\&calcWithThread);
my $thread4 = threads->create(\&calcWithThread);
$thread1->join();
$thread2->join();
$thread3->join();
$thread4->join();

print " Was found $totalPrime prime numbers. Last prime: $lastPrime.";

# SUB's #
sub calcWithoutThread {
    $prime      = 5; # Starts from the 5th prime
    $totalPrime = 2; # Starts with prime 2 and prime 3
    $lastPrime  = 0;
    while ( time() < $endTime ) {
        if ( calcPrime($prime) ) {
            $totalPrime++;
            $lastPrime = $prime;
        }
        $prime++;
    }
    print " Was found $totalPrime prime numbers. Last prime: $lastPrime.";
}

sub calcWithThread {
    while ( time() < $endTime1 ) {
        lock($prime);
        if ( calcPrime($prime) ) {
            $totalPrime++;
            $lastPrime = $prime;
        }
        $prime++;
    }
}

sub calcPrime {
    for ( $i = 2 ; $i < sqrt($prime) ; $i++ ) {
        if ( $prime % $i == 0 ) {
            return 0;
        }
    }
    return 1;
}
The intended logic is that the threads share this calculation of whether each number is prime, and do not overlap values while calculating.
The problem is that your threads are synchronised with one another by the locked variable $prime. That means they never get a chance to run simultaneously, and they also suffer the overhead of switching threads and synchronising access to the variable.
Primes aren't a great test for parallel processing, as the calculation of each prime depends on the previous result. However, you could do it by keeping a separate $prime variable per thread, starting at 5, 6, 7, and 8 for the four threads and adding 4 between tests. That way they don't duplicate each other's work but together cover every integer.
There is an immediate problem with this in that none of the even numbers will ever be prime, so two of the four threads will never produce a result. It is still a valid test of parallelism, but clearly very inefficient. You could fix that by starting the threads at 5, 7, 9, and 11, and incrementing by 8 before each test. Then every thread will be profitable.
Don't forget that you will have to code the same algorithm for the single-threaded case, otherwise the parallel section gets an unfair advantage.
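The interleaving scheme is easy to sanity-check. A quick sketch (Python for brevity; the partitioning idea is language-independent) confirming that starts of 5, 7, 9, 11 with a step of 8 cover every odd candidate exactly once:

```python
# Verify that four workers starting at 5, 7, 9, 11 and stepping by 8
# cover every odd integer >= 5 exactly once, with no duplicated work.
def worker_numbers(start, step, limit):
    return set(range(start, limit, step))

limit = 1000
covered = set()
for start in (5, 7, 9, 11):
    chunk = worker_numbers(start, 8, limit)
    assert not (covered & chunk)   # no overlap between workers
    covered |= chunk

odds = set(range(5, limit, 2))
assert covered == odds             # together they cover all odd candidates
```

The four starts fall in distinct residue classes mod 8, which is why the chunks can never collide.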
Update
Perhaps a better way would be to lock $prime only to fetch the next number to be tested and increment it. That way all the threads can do their calculations in parallel, and only queue up to get another job.
You would then have to lock $total_prime to prevent two threads from incrementing it simultaneously, and also update $last_prime while that lock is in effect. Because parallel threads may well find prime numbers out of sequence, you will also have to check whether the prime just found is greater than the latest one discovered by any thread.
Your subroutine would look like this. I apologise for changing the identifiers, but Perl traditionally uses snake_case, and I find camelCase unpleasant to read. Snake case is also much easier for many people whose first language isn't English and who cannot pick out the capital letters so easily.
sub calc_with_thread {
    while ( time() < $end_time_1 ) {
        my $test = do {
            lock $prime;
            $prime++;
        };
        if ( calc_prime($test) ) {
            lock $total_prime;
            ++$total_prime;
            $last_prime = $test if $test > $last_prime;
        }
    }
}
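The same fetch-under-lock, test-outside-lock pattern can be sketched in Python (illustration only: CPython's GIL means this demonstrates the locking structure rather than a real speedup, and names like worker and is_prime are mine):

```python
# Hold the lock only long enough to fetch the next candidate;
# do the primality test outside the lock.
import threading
from math import isqrt

counter_lock = threading.Lock()
result_lock = threading.Lock()
next_candidate = 5
total_primes = 2        # counting 2 and 3, as in the original
last_prime = 3

def is_prime(n):
    return all(n % i for i in range(2, isqrt(n) + 1))

def worker(limit):
    global next_candidate, total_primes, last_prime
    while True:
        with counter_lock:           # fetch-and-increment, then release
            test = next_candidate
            next_candidate += 1
        if test > limit:
            return
        if is_prime(test):
            with result_lock:        # results may arrive out of order
                total_primes += 1
                if test > last_prime:
                    last_prime = test

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(total_primes, last_prime)   # 168 primes up to 1000; the last is 997
```

Note the out-of-order check on last_prime: whichever thread finishes a test last may be holding a smaller candidate than one already recorded.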
Multi-threading is not an automatic speed boost.
Shared variables are tied, and tied variables are slow.
Here's a simpler, improved version that counts up to a maximum number rather than running for a fixed time. It avoids locking. The threads are still significantly slower, because each of them has to do a get and a set on that sluggish shared variable for every number it checks.
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(time);
use v5.10;

use threads;
use threads::shared;

my $Max_Number = 50_000;

{
    print "\n\nCalc without Threads:\n";
    my $start_time = time;
    calcWithoutThread();
    my $end_time = time;
    say "$Max_Number took @{[ $end_time - $start_time ]} seconds.";
}

my $Shared_Prime :shared = 5;

{
    print "\n\nCalc with Threads:\n";
    my $start_time = time;
    my $thread1 = threads->create(\&calcWithThread);
    my $thread2 = threads->create(\&calcWithThread);
    my $thread3 = threads->create(\&calcWithThread);
    my $thread4 = threads->create(\&calcWithThread);
    $thread1->join();
    $thread2->join();
    $thread3->join();
    $thread4->join();
    my $end_time = time;
    say "$Max_Number took @{[ $end_time - $start_time ]} seconds.";
}

sub calcWithoutThread {
    my $prime = 5;    # Starts from the 5th prime
    while ( $prime <= $Max_Number ) {
        calcPrime($prime++);
    }
}

sub calcWithThread {
    while ( $Shared_Prime <= $Max_Number ) {
        calcPrime($Shared_Prime++);
    }
}

sub calcPrime {
    my $prime = shift;
    for ( my $i = 2 ; $i < sqrt($prime) ; $i++ ) {
        if ( $prime % $i == 0 ) {
            return 0;
        }
    }
    return 1;
}
Instead, avoid shared state or coordinating workers as much as possible. For example, you can slice up the work into pieces. If you have 4 workers, give them each a starting number and they each increment by 4. Then they can cover the whole space without shared state.
Worker 1: 5, 9, 13, ...
Worker 2: 6, 10, 14, ...
Worker 3: 7, 11, 15, ...
Worker 4: 8, 12, 16, ...
#!/usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw(time);
use v5.10;

use threads;

my $Max_Number = 7_000;

{
    print "\n\nCalc without Threads:\n";
    my $start_time = time;
    calcWithoutThread(5, 1);
    my $end_time = time;
    say "$Max_Number took @{[ $end_time - $start_time ]} seconds.";
}

{
    print "\n\nCalc with Threads:\n";
    my $start_time = time;
    my $thread1 = threads->create(\&calcWithThread, 5, 4);
    my $thread2 = threads->create(\&calcWithThread, 6, 4);
    my $thread3 = threads->create(\&calcWithThread, 7, 4);
    my $thread4 = threads->create(\&calcWithThread, 8, 4);
    $thread1->join();
    $thread2->join();
    $thread3->join();
    $thread4->join();
    my $end_time = time;
    say "$Max_Number took @{[ $end_time - $start_time ]} seconds.";
}

sub calcWithoutThread {
    my ($start, $inc) = @_;
    my @primes;
    for ( my $prime = $start; $prime <= $Max_Number; $prime += $inc ) {
        push @primes, $prime if calcPrime($prime);
    }
    return \@primes;
}

sub calcWithThread {
    my ($start, $inc) = @_;
    my @primes;
    for ( my $prime = $start; $prime <= $Max_Number; $prime += $inc ) {
        push @primes, $prime if calcPrime($prime);
    }
    return \@primes;
}

sub calcPrime {
    my $prime = shift;
    for ( my $i = 2 ; $i < sqrt($prime) ; $i++ ) {
        if ( $prime % $i == 0 ) {
            return 0;
        }
    }
    return 1;
}
For me on my i7 MacBook (2 real cores, 4 virtual), threads break even at about 6000. To get the results, $thread->join will return the list of primes from calcWithThread.
I recently started studying for algorithm coding tests, and I find it difficult to see any regularity in memoization.
Here are two problems.
Min Cost Climbing Stairs
You are given an integer array cost where cost[i] is the cost of ith step on a staircase. Once you pay the cost, you can either climb one or two steps.
You can either start from the step with index 0, or the step with index 1.
Return the minimum cost to reach the top of the floor.
Recurrence Relation Formula:
minimumCost(i) = min(cost[i - 1] + minimumCost(i - 1), cost[i - 2] + minimumCost(i - 2))
House Robber
You are a professional robber planning to rob houses along a street. Each house has a certain amount of money stashed, the only constraint stopping you from robbing each of them is that adjacent houses have security systems connected and it will automatically contact the police if two adjacent houses were broken into on the same night.
Given an integer array nums representing the amount of money of each house, return the maximum amount of money you can rob tonight without alerting the police.
Recurrence Relation Formula:
robFrom(i) = max(robFrom(i + 1), robFrom(i + 2) + nums(i))
So as you can see, the first problem's recurrence depends on the previous indices, and the second problem's depends on the following ones.
Because of this, when I try to write the recursive function, the starting indices are different.
Start from n
int rec(int n, vector<int>& cost)
{
    if (memo[n] == -1)
    {
        if (n <= 1)
        {
            memo[n] = 0;
        }
        else
        {
            memo[n] = min(rec(n - 1, cost) + cost[n - 1], rec(n - 2, cost) + cost[n - 2]);
        }
    }
    return memo[n];
}

int minCostClimbingStairs(vector<int>& cost) {
    const int n = cost.size();
    memo.assign(n + 1, -1);
    return rec(n, cost); // Start from n
}
Start from 0
int getrob(int n, vector<int>& nums)
{
    if (how_much[n] == -1)
    {
        if (n >= nums.size())
        {
            return 0;
        }
        else
        {
            how_much[n] = max(getrob(n + 1, nums), getrob(n + 2, nums) + nums[n]);
        }
    }
    return how_much[n];
}

int rob(vector<int>& nums) {
    how_much.assign(nums.size() + 2, -1);
    return getrob(0, nums); // Start from 0
}
How can I easily know which one need to be started from 0 or n? Is there some regularity?
Or should I just solve a lot of problems and increase my sense?
Your question is valid, but the examples don't quite show the issue: both of the problems you shared can be solved either way, 1. starting from the top or 2. starting from the bottom.
For example: Min Cost Climbing Stairs : solution that starts from 0.
int[] dp;

public int minCostClimbingStairs(int[] cost) {
    int n = cost.length;
    dp = new int[n];
    for (int i = 0; i < n; i++) {
        dp[i] = -1;
    }
    rec(0, cost);
    return Math.min(dp[0], dp[1]);
}

int rec(int in, int[] cost) {
    if (in >= cost.length) {
        return 0;
    } else {
        if (dp[in] == -1) {
            dp[in] = cost[in] + Math.min(rec(in + 1, cost), rec(in + 2, cost));
        }
        return dp[in];
    }
}
However, there is a certain set of problems where this is not easy. Their structure is such that if you iterate in the wrong direction, the computation gets complicated or corrupts future results.
Example: reaching a target sum from numbers in an array, using each index at most once. Reaching 10 in {3, 4, 6, 5, 2}: {4, 6} is one answer, but not {6, 2, 2}, as that uses index 4 (the value 2) twice.
This can be done easily in top down way:
int m[M + 10];
for (i = 0; i < M + 10; i++) m[i] = 0;
m[0] = 1;
for (i = 0; i < n; i++)
    for (j = M; j >= a[i]; j--)
        m[j] |= m[j - a[i]];
If you try to implement it the bottom-up way, you will end up using a[i] multiple times. You can definitely do it bottom-up if you figure out a way to stop the states from messing each other up. For example, use a queue that only stores states reached in previous iterations, and not numbers reached in the current iteration. Or keep a count in m[j] instead of just a 1, and only use numbers whose count is less than the current iteration count. I think the same reasoning applies to all DP.
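To make the direction issue concrete, here is a small Python sketch (my own illustration, not code from the answer) contrasting the downward inner loop with an upward one on the subset-sum example:

```python
# In 0/1 subset-sum, iterating j downward prevents an element from being
# reused within its own pass; iterating upward allows unbounded reuse.
def reachable(nums, target, downward=True):
    m = [False] * (target + 1)
    m[0] = True
    for a in nums:
        js = range(target, a - 1, -1) if downward else range(a, target + 1)
        for j in js:
            m[j] = m[j] or m[j - a]
    return m[target]

nums = [3, 4, 6, 5, 2]
print(reachable(nums, 10))                   # True: e.g. 4 + 6
print(reachable([6, 2], 10, downward=True))  # False: would need 2 twice
print(reachable([6, 2], 10, downward=False)) # True: upward loop reuses 2
```

In the upward direction, m[8] is set via the 2 added at m[6] and then immediately feeds m[10] in the same pass, which is exactly the "using a[i] multiple times" effect described above.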
I am translating a fasta nucleotide file into protein sequences with this code:
use Bio::SeqIO;
use Getopt::Long;

my ($format, $outfile) = 'fasta';
GetOptions(
    'f|format:s'      => \$format,
    'o|out|outfile:s' => \$outfile,
);
my $oformat = 'fasta';

$file = $ARGV[0];
chomp $file;

# this implicitly uses the <> file stream
my $seqin = Bio::SeqIO->new( -format => $format, -fh => \*ARGV );
my $seqout;
if ($outfile) {
    $seqout = Bio::SeqIO->new( -format => $oformat, -file => ">$outfile" );
}
else {
    # defaults to writing to STDOUT
    $seqout = Bio::SeqIO->new( -format => $oformat );
}

while ( my $seq = $seqin->next_seq() ) {
    my $pseq = $seq->translate();
    $seqout->write_seq($pseq);
}
I have used the threads and threads::shared Perl modules for this in other cases, but now I want to apply the following code to the task above:
use threads;
use threads::shared;
use List::Util qw( sum );
use YAML;

use constant NUM_THREADS => 100;

my @output :shared;
my $chunk_size = @data / NUM_THREADS;

my @threads;
for my $chunk ( 1 .. NUM_THREADS ) {
    my $start = ($chunk - 1) * $chunk_size;
    push @threads, threads->create(
        \&doOperation,
        \@data,
        $start,
        ($start + $chunk_size - 1),
        \@output,
    );
}
$_->join for @threads;

sub doOperation {
    my ($data, $start, $end, $output) = @_;
    my $id = threads->tid;
    print "$id ";
    for my $i ($start .. $end) {
        print "Thread [$id] processing row $i\n";

        #THIS WHILE SHOULD BE MULTITHREADED
        while ( my $seq = $seqin->next_seq() ) {
            my $pseq = $seq->translate();
            $seqout->write_seq($pseq);
        }
        #THIS WHILE SHOULD BE MULTITHREADED

        sleep 1 if 0.2 > rand;
    }
    print "Thread done.\n";
    return;
}

my $time = localtime;
print "\n$time\n";
Threads are being created, but somehow the fasta file is not processed. The first code works fine without multi-threading.
I'm afraid I'm not going to rewrite your code for you, but I can give you some pointers on how to accomplish threading.
The thing you need to understand about Perl threading is that it is not lightweight. You should spawn a number of threads equal to the desired parallelism, run them off a Thread::Queue, and go from there.
You also need to avoid any non-thread-safe modules - you can use them if you're careful but that usually means instantiating them within the thread with require and import instead of use at the start of the program.
I would also suggest avoiding trying to do your output IO in parallel - return the thread results and coalesce them (sorting if necessary) in the 'main' thread (or spin off a single writer).
So I'd go with something like this:
#!/usr/bin/env perl
use strict;
use warnings;

use threads;
use Thread::Queue;
use Storable qw ( freeze thaw );

my $NUM_THREADS = 16;    # approx number of cores.

my $translate_q         = Thread::Queue->new;
my $translate_results_q = Thread::Queue->new;

sub translate_thread {
    while ( my $item = $translate_q->dequeue ) {
        my $seq  = thaw $item;
        my $pseq = $seq->translate();
        $translate_results_q->enqueue( freeze $pseq );
    }
}

threads->create( \&translate_thread ) for 1 .. $NUM_THREADS;

while ( my $seq = $seqin->next_seq ) {
    $translate_q->enqueue( freeze($seq) );
}
$translate_q->end;
$_->join for threads->list;
$translate_results_q->end;

while ( my $result = $translate_results_q->dequeue ) {
    my $pseq = thaw($result);
}
Note - this won't work as-is, because it's missing the merge with the rest of your code. But hopefully it illustrates how the queue and threading can work together to get parallelism.
You pass your objects around using freeze and thaw from Storable, and use the parallelism to unpack them.
Don't go too mad on the number of threads - for primarily compute workloads (e.g. no IO) then a number of threads equal to the number of cores is about right. If they'll be blocking on IO, you can increase this number, but going past about double isn't going to do very much.
You can't really parallelise disk IO efficiently - it just doesn't work like that. So do that in the 'main' thread.
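The queue layout described above can be sketched with Python's standard library (illustrative only; translate() here is a stand-in for the BioPerl call, and the sequences are made up):

```python
# N workers pull jobs from an input queue, push results to an output
# queue, and the main thread coalesces the output after joining.
import queue
import threading

NUM_THREADS = 4
jobs = queue.Queue()
results = queue.Queue()

def translate(seq):               # stand-in for the real translation step
    return seq.lower()

def worker():
    while True:
        item = jobs.get()
        if item is None:          # sentinel: input drained, shut down
            return
        results.put(translate(item))

threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for seq in ["ATGGCC", "TTGACA"]:
    jobs.put(seq)
for _ in threads:
    jobs.put(None)                # one sentinel per worker
for t in threads:
    t.join()

out = []
while not results.empty():
    out.append(results.get())
print(sorted(out))                # ['atggcc', 'ttgaca']
```

As with Thread::Queue's end(), the sentinels are what let the workers exit cleanly; the single-threaded drain at the end mirrors the advice to keep output IO in the main thread.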
So I have two bubble sorts: one from the lecture slides, another I wrote on my own:
def lecture_bubble(L):
    while True:
        swapped = False
        for i in range(len(L) - 1):
            if L[i] > L[i+1]:
                L[i+1], L[i] = L[i], L[i+1]
                swapped = True
        if not swapped:
            # No swaps this pass; therefore sorted
            return L

def bubble_sort(array):
    for i in range(len(array) - 1):
        swapped = False
        for j in range(len(array) - 1, i, -1):
            if array[j] < array[j-1]:
                array[j], array[j-1] = array[j-1], array[j]
                swapped = True
        if not swapped:
            return array
    return array
Comparing both of them:
Time taken for lecture_bubble is 4.485383749008179
Time taken for bubble_sort is 0.00061798095703125
[Finished in 4.6s]
Can someone explain why my bubble_sort takes significantly less time to sort an array?
Also, can my bubble sort be further improved?
The professor's code executes until "if not swapped" is true. Yours will execute until either the end of the for loop or "if not swapped", so your code may not work for some cases.
The professor's algorithm stops sorting once it iterates through all elements without making any swap, which means the array is sorted. I have written the same algorithm in JavaScript below.
Comparing each with the neighbor and swapping if first is greater than the next
function bubbleSort(arr) {
    console.log("Input Array");
    console.log(arr);
    let i = 0;
    let temp;
    let notSorted;
    do {
        notSorted = false;
        for (let j = 0; j < arr.length - i; j++) {
            if (arr[j] > arr[j+1]) {
                notSorted = true;
                temp = arr[j];
                arr[j] = arr[j+1];
                arr[j+1] = temp;
                console.log(arr[j], "swapped with", arr[j+1]);
                console.log(arr);
            } else {
                console.log("SKIP");
            }
            console.log(j, arr.length - i);
        }
        i++;
    } while (notSorted)
    console.log("Sorted using Bubble Sort");
    return arr;
}
// console.log(bubbleSort([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])); // uncomment to run and see how efficient this algorithm is when array is sorted
console.log(bubbleSort([5,8,18,4,19,13,1,3,2,20,17,15,16,9,10,11,14,12,6,7]));
Task
You are given the number of pages in a book. The book printer is lazy and will only print the page number on every other page, with 1 as the first printed page number. The task is to count how many times each digit is used across the printed page numbers. The goal is to do this for page counts in the range 10^9 < pages < 10^12 in less than 5 seconds.
For example
The number of book pages is 20. The page numbers to be printed are then 1, 3, 5, 7, 9, 11, 13, 15, 17 and 19. The page number 1 contains only the digit 1 and should therefore increment the tally for 1 by one. The number 13 contains the digits 1 and 3, so the tallies for both 1 and 3 are incremented, and so forth.
Question
How do I make the program execute faster at larger numbers? I've been thinking about using threads but I'm unsure if it's beneficial or not.
#include <iostream>
#include <string>

int main(int argc, char *argv[]) {
    long long sideNumber;
    long long numbers[10];
    if (argv[1]) {
        sideNumber = std::stoll(argv[1]);
    } else {
        std::printf("Please enter amount of pages.\n");
        return -1;
    }
    for (int i = 0; i < 10; i++) numbers[i] = 0;

    long long index = 1;
    while (index <= sideNumber) { // <=, so the last odd page is included
        long long current = index;
        while (current > 0) {
            numbers[current % 10]++;
            current /= 10;
        }
        index += 2;
    }
    for (int i = 0; i < 10; i++) {
        std::printf("%i : %lld\n", i, numbers[i]); // %lld for long long
    }
    return 0;
}
This is trivially a maths problem, not a computing problem.
However, if this has really been set as a computing problem, then the answer is probably recursive.
Consider the page numbers 1-9. How many do they tally for digits 0-9?
Now consider the pages 11-19: can you reuse the tally from the previous step to make a new tally? And again for 2x, 3x, 4x, and so on.
Then, can you reuse the tally from 1-99 for 101-199? Et cetera.
Note that you need to think some more about how you deal with zeros in the middle of a number.
Alternatively, you can use a pencil to get the same result in half the time it takes you to write the program.
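As a sketch of the "maths problem" approach, here is the standard positional digit-count formula in Python (my illustration; it counts over all pages 1..n, so restricting to odd pages still needs an adjustment, but it shows why no loop over pages is necessary):

```python
# Count occurrences of digit d in the decimal representations of 1..n,
# one digit position at a time, in O(log n) instead of O(n log n).
def count_digit(d, n):
    total, p = 0, 1
    while p <= n:
        higher = n // (p * 10)      # digits above position p
        cur = (n // p) % 10         # digit at position p
        lower = n % p               # digits below position p
        if d == 0:
            # zero never appears as a leading digit
            if cur > 0:
                total += higher * p
            else:
                total += (higher - 1) * p + lower + 1
        else:
            if cur > d:
                total += (higher + 1) * p
            elif cur == d:
                total += higher * p + lower + 1
            else:
                total += higher * p
        p *= 10
    return total

# brute-force cross-check on a small range
brute = [sum(str(i).count(str(d)) for i in range(1, 1001)) for d in range(10)]
assert brute == [count_digit(d, 1000) for d in range(10)]
print(count_digit(1, 13))   # 6: digit 1 appears in 1, 10, 11 (twice), 12, 13
```

This runs in a handful of arithmetic operations per digit position, so even n near 10^12 takes microseconds.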
I modified the code to work with two files. to_search.txt has the strings to be searched for. big_file.fastq has the lines to be searched; if a string is found (2 mismatches allowed at its exact length, which ranges from 8-10; no insertions or deletions), the record is placed in the corresponding output file. So each string is searched for in every second line of big_file.fastq.
# to_search.txt: (length can be from 8-20 characters)
1 TCCCTTGT
2 ACGAGACT
3 GCTGTACG
4 ATCACCAG
5 TGGTCAAC
6 ATCGCACA
7 GTCGTGTA
8 AGCGGAGG
9 ATCCTTTG
10 TACAGCGC
#2000 search needed
# big_file.fastq: 2 billions lines (each 4 lines are associated: string search is in second line of each 4 lines).
# Second line can have 100-200 characters
#M04398:19:000000000-APDK3:1:1101:21860:1000 1:N:0:1
TCttTTGTGATCGATCGATCGATCGATCGGTCGTGTAGCCTCCAACCAAGCACCCCATCTGTTCCAAATCTTCTCCCACTGCTACTTGAAGACGCTGAAGTTGAAGGGCCACCTTCATCATTCTGG
+
#8ACCDGGGGGGGGGGGGGEGGGGGGGGDFFEGGG#FFGGGGGGGGGGGGGGGGGCF#<FFGGGGGFE9FGGGFEACE8EFFFGGGGGF9F?CECEFEG,CFGF,CF#CCC?BFFG<,9<9AEFG,,
#M04398:19:000000000-APDK3:1:1101:13382:1000 1:N:0:1
NTCGATCGATCGATCGATCGATCGATCGTTCTGAGAGGTACCAACCAAGCACACCACGGGCGACACAGACAGCTCCGTGTTGAACGGGTTGTTCTTCTTCTTGCCTTCATCATCCCCATCCTCAGTGGACGCAGCTTGCTCATCCTTCCTC
+
#8BCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
#M04398:19:000000000-APDK3:1:1101:18888:1000 1:N:0:1
NCAGAATGAGGAAGGATGAGCCCCGTCGTGTCGAAGCTATTGACACAGCGCTATTCCGTCTTTATGTTCACTTTAAGCGGTACAAGGAGCTGCTTGTTCTGATTCAGGAACCGAACCCTGGTGGTGTGCTTGGTTGGCAAGTTTACGGCTC
+
#8BCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGE
Here is the code for two mismatches. I tried with exact matching and the speed is not bad: it takes around a day. But with 2 mismatches it shows 115 days to finish (I used the Time::Progress module to estimate). How can the speed be improved here?
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;

open( IN_P1, "big_file.fastq" ) or die "File not found";

my ( @sample_file_names, @barcode1 );

open( BC_FILE, "to_search.txt" ) or die "No barcode file";
my @barcode_file_content = <BC_FILE>;
foreach (@barcode_file_content) {
    chomp $_;
    $_ =~ s/\r//;
    $_ =~ s/\n//;
    #print $_;
    my @elements = split( "(\t|,| )", $_ );
    push @sample_file_names, $elements[0];
    push @barcode1, $elements[2];
}

# open FH
my @fh_array_R1;
foreach (@sample_file_names) {
    chomp $_;
    local *OUT_R1;
    open( OUT_R1, ">", "$_\.fq" ) or die "cannot write file";
    push @fh_array_R1, *OUT_R1;
}

# unknown barcode file
open( UNKNOWN_R1, ">unknown-barcode_SE.fq" ) or die "cannot create unknown-r1 file";

while ( defined( my $firstp1 = <IN_P1> ) ) {
    my $p1_first_line  = $firstp1;
    my $p1_second_line = <IN_P1>;
    my $p1_third_line  = <IN_P1>;
    my $p1_fourth_line = <IN_P1>;
    chomp( $p1_first_line, $p1_second_line, $p1_third_line, $p1_fourth_line, );
    my $matched_R1 = "$p1_first_line\n$p1_second_line\n$p1_third_line\n$p1_fourth_line\n";

    for ( my $j = 0 ; $j < scalar @barcode1 ; $j++ ) {
        chomp $barcode1[$j];
        my $barcode1_regex = make_barcode_fragments( $barcode1[$j] );
        if ( $p1_second_line =~ /$barcode1_regex/i ) {
            # keep if matched
            print { $fh_array_R1[$j] } $matched_R1;
            last;
        }
        else {
            #print to unknown;
            print UNKNOWN_R1 $matched_R1;
        }
    }
}

# make two-mismatch pattern of barcode
sub make_barcode_fragments {
    my ($in1) = @_;
    my @subpats;
    for my $i ( 0 .. length($in1) - 1 ) {
        for my $j ( $i + 1 .. length($in1) - 1 ) {
            my $subpat = join( '',
                substr( $in1, 0, $i ),
                '\\w', substr( $in1, $i + 1, $j - $i - 1 ),
                '\\w', substr( $in1, $j + 1 ),
            );
            push @subpats, $subpat;
        }
    }
    my $pat = join( '|', @subpats );
    #print $pat;
    return "$pat";
}
exit;
If your algorithm cannot be changed/improved in Perl itself, you can still get a speedup by writing the time-consuming parts in C. Here is an example using Inline C:
use strict;
use warnings;

use Benchmark qw(timethese);
use Inline C => './check_line_c.c';

my $find   = "MATCH1";
my $search = "saasdadadadadasd";
my %sub_info = (
    c    => sub { check_line_c( $find, $search ) },
    perl => sub { check_line_perl( $find, $search ) },
);
timethese( 4_000_000, \%sub_info );

sub check_line_perl {
    my ( $find, $search ) = @_;
    my $max_distance = 2;
    for my $offset ( 0 .. length($search) - length($find) ) {
        my $substr = substr( $search, $offset, length($find) );
        my $hd = hd( $find, $substr );
        if ( $hd <= $max_distance ) {
            return ( $hd, $substr );
        }
    }
    return ( undef, undef );
}

sub hd {
    return ( $_[0] ^ $_[1] ) =~ tr/\001-\377//;
}
where check_line_c.c is:
void check_line_c( char* find, char * search ) {
    int max_distance = 2;
    int flen = strlen(find);
    int last_ind = strlen(search) - flen;
    SV *dis = &PL_sv_undef;
    SV *match = &PL_sv_undef;
    for ( int ind = 0; ind <= last_ind; ind++ )
    {
        int count = 0;
        for ( int j = 0; j < flen; j++ )
        {
            if ( find[j] ^ search[ind+j] ) count++;
        }
        if ( count <= max_distance )   /* <=, to match the Perl version */
        {
            match = newSV(flen);
            sv_catpvn(match, search+ind, flen );
            dis = newSViv(count);
            break;
        }
    }
    Inline_Stack_Vars;
    Inline_Stack_Reset;
    Inline_Stack_Push(sv_2mortal(dis));
    Inline_Stack_Push(sv_2mortal(match));
    Inline_Stack_Done;
}
The output is (Ubuntu laptop using an Intel Core i7-4702MQ CPU @ 2.20GHz):
Benchmark: timing 4000000 iterations of c, perl...
         c:  2 wallclock secs ( 0.76 usr +  0.00 sys =  0.76 CPU) @ 5263157.89/s (n=4000000)
      perl: 19 wallclock secs (18.30 usr +  0.00 sys = 18.30 CPU) @ 218579.23/s (n=4000000)
So this gives a 24-fold speedup for this case.
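For reference, the scan that both the Perl and C versions implement can be sketched in Python (illustration only, using the same inputs as the benchmark above):

```python
# Slide a fixed-length window over the text and return the first window
# within the allowed Hamming distance of the pattern, else (None, None).
def hd(a, b):
    # Hamming distance of two equal-length strings
    return sum(x != y for x, y in zip(a, b))

def check_line(find, search, max_distance=2):
    k = len(find)
    for offset in range(len(search) - k + 1):
        window = search[offset:offset + k]
        if hd(find, window) <= max_distance:
            return hd(find, window), window
    return None, None

print(check_line("MATCH1", "saasdadadadadasd"))  # (None, None): no close window
print(check_line("ACGT", "TTACGTTT"))            # (0, 'ACGT')
```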
I'd suggest creating a really bad hashing algorithm. Something nice and reversible and inefficient, like the sum of the characters. Or maybe the sum of unique values (1-4) represented by the characters.
Compute the target sums, and also compute the maximum allowed variance. That is, if the objective is a match with two substitutions, then what is the maximum possible difference? (4-1 + 4-1 = 6).
Then, for each "window" of text of the appropriate length in the target data file, compute a running score. (Add a character to the end, drop a character from the start, update the hash score.) If the score for a window is within the allowable range, you can do further investigation.
You might want to implement this as different passes. Possibly even as different stages in a shell pipeline or script. The idea being that you might be able to parallelize parts of the search. (For instance, all the match strings with the same length could be searched by one process, since the hash windows are the same.)
Also, of course, it is beneficial that you can keep your early work if your program crashes in the later stages. And you can even have the early parts of the process running while you are still developing the end stages.
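A minimal Python sketch of this rolling-score prefilter (my illustration: the A/C/G/T values and the slack bound follow the 1-4 scheme suggested above, and the input strings are made up):

```python
# Slide a window over the text, update its character-sum "hash" in O(1),
# and run the expensive mismatch count only on windows whose sum is close
# enough to the pattern's sum to possibly be a match.
VAL = {"A": 1, "C": 2, "G": 3, "T": 4}

def candidates(pattern, text, max_sub=2):
    k = len(pattern)
    target = sum(VAL[c] for c in pattern)
    slack = max_sub * (4 - 1)        # each substitution changes the sum by at most 3
    window = sum(VAL[c] for c in text[:k])
    hits = []
    for i in range(len(text) - k + 1):
        if i:                        # O(1) update: drop head char, add tail char
            window += VAL[text[i + k - 1]] - VAL[text[i - 1]]
        if abs(window - target) <= slack:
            # full check only for windows that survive the cheap filter
            if sum(a != b for a, b in zip(pattern, text[i:i + k])) <= max_sub:
                hits.append(i)
    return hits

print(candidates("TCCCTTGT", "AATCTCTTGTCC"))   # [2]: one-mismatch hit at offset 2
```

The filter is deliberately weak (different windows can share a sum), but it never rejects a true match, so it is safe as a first pass before the exact check.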