Substring comparison in Perl

I'm working on comparing two substrings, sub1 and sub2, taken from two initial strings, seq1 and seq2, allowing exactly one mismatch. $k is the length of the substrings.
foreach (my $i = 0; $i < length($seq1) - $k; $i += 1) {
my $sub1 = substr($seq1, $i, $k);
foreach (my $j = 0; $j < length($seq2) - $k; $i++) {
my $sub2 = substr($seq2, $j, $k);
my $diff = $sub1 ^ $sub2;
my $num_mismatch = $diff =~ tr/\0//c;
if ($num_mismatch == 1) {
$d{$sub1}++;
}
}
}
foreach (keys %d) {
print "$_\n";
}
When I run the code, it hangs until I kill the process and it doesn't give any result. Any help with this?

The inner loop increments $i instead of $j, so $j never changes and the loop never terminates:
foreach (my $j=0;$j<length($seq2)-$k;$i++)
should be
foreach (my $j=0;$j<length($seq2)-$k;$j++)
#                                    ^^
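For reference, here is a minimal sketch of the whole comparison with the loop variable fixed. The input strings and the value of $k are made up for illustration. It also assumes the last window of each string should be compared, so the ranges run up to length - $k inclusive, whereas the original condition `< length($seq) - $k` stops one window earlier.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only for illustration.
my ($seq1, $seq2, $k) = ('ACGTAC', 'ACCTAG', 3);
my %d;

# Range-based loops avoid the "wrong variable incremented" mistake entirely.
for my $i (0 .. length($seq1) - $k) {
    my $sub1 = substr($seq1, $i, $k);
    for my $j (0 .. length($seq2) - $k) {
        my $sub2 = substr($seq2, $j, $k);
        # String XOR yields "\0" at every position where the two agree.
        my $diff = $sub1 ^ $sub2;
        # Count the bytes that are NOT "\0", i.e. the mismatches.
        my $num_mismatch = $diff =~ tr/\0//c;
        $d{$sub1}++ if $num_mismatch == 1;
    }
}

print "$_\n" for sort keys %d;
```

Iterating over a range instead of a C-style loop means there is no increment expression to get wrong.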

Related

Split string to fixed length chunks and write in separate line in Raku

I have a file test.txt:
Stringsplittingskills
I want to read this file and write to another file out.txt with three characters in each line like
Str
ing
spl
itt
ing
ski
lls
What I did
my $string = "test.txt".IO.slurp;
my $start = 0;
my $elements = $string.chars;
# open file in writing mode
my $file_handle = "out.txt".IO.open: :w;
while $start < $elements {
my $line = $string.substr($start,3);
if $line.chars == 3 {
$file_handle.print("$line\n")
} elsif $line.chars < 3 {
$file_handle.print("$line")
}
$start = $start + 3;
}
# close file handle
$file_handle.close
This runs fine when the length of the string is not a multiple of 3. When the string length is a multiple of 3, it inserts an extra newline at the end of the output file. How can I avoid that trailing newline?
I tried another, shorter approach:
my $string = "test.txt".IO.slurp;
my $file_handle = "out.txt".IO.open: :w;
for $string.comb(3) -> $line {
$file_handle.print("$line\n")
}
It still suffers from the same issue.
I looked here and here but was still unable to solve it.
spurt "out.txt", "test.txt".IO.comb(3).join("\n")
Another approach using substr-rw.
subset PositiveInt of Int where * > 0;
sub break( Str $str is copy, PositiveInt $length )
{
my $i = $length;
while $i < $str.chars
{
$str.substr-rw( $i, 0 ) = "\n";
$i += $length + 1;
}
$str;
}
say break("12345678", 3);
Output
123
456
78
The correct answer is of course to use .comb and .join.
That said, this is how you might fix your code.
You could change the if line to check if it is at the end, and use else.
if $start+3 < $elements {
$file_handle.print("$line\n")
} else {
$file_handle.print($line)
}
Personally I would change it so that only the addition of \n is conditional.
while $start < $elements {
my $line = $string.substr($start,3);
$file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
$start += 3;
}
This works because < returns either True or False.
Since True == 1 and False == 0, the x operator repeats the \n at most once.
'abc' x 1; # 'abc'
'abc' x True; # 'abc'
'abc' x 0; # ''
'abc' x False; # ''
If you were very cautious you could use x+?.
(Which is actually 3 separate operators.)
'abc' x 3; # 'abcabcabc'
'abc' x+? 3; # 'abc'
infix:« x »( 'abc', prefix:« + »( prefix:« ? »( 3 ) ) );
I would probably use loop if I were going to structure it like this.
loop ( my $start = 0; $start < $elements ; $start += 3 ) {
my $line = $string.substr($start,3);
$file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
}
Or instead of adding a newline to the end of each line, you could add it to the beginning of every line except the first.
while $start < $elements {
my $line = $string.substr($start,3);
my $nl = "\n";
# clear $nl the first time through
once $nl = "";
$file_handle.print($nl ~ $line);
$start = $start + 3;
}
Three one-liner solutions for the command line are shown below.
Using comb and batch (retains incomplete set of 3 letters at end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.batch(3);'
Str
ing
spl
itt
ing
ski
lls
X
Simplifying (no batch, only comb):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.put for .comb(3);'
Str
ing
spl
itt
ing
ski
lls
X
Alternatively, using comb and rotor (discards incomplete set of 3 letters at end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.rotor(3);'
Str
ing
spl
itt
ing
ski
lls

What is the fastest way to increment a string in perl?

I would like to append a string in perl within a loop in a fast way, without having to copy the string for each iteration. I'm looking for something like StringBuilder from Java or C#.
I currently know of the following alternatives for doing 'a += b':
a .= b # concat
a = join('', a, b); # join
push @a, b # array push
I am not interested in copying the whole string into another one. I need to copy one character at a time, or append small strings on each iteration. I am trying to solve the following problem: compress the input string 'aaabbccc' to '3a2b3c'. The idea is to iterate over the input string, count how many times each character repeats, and append the compressed form to the output. What is the most efficient way to do this in Perl?
Here is a link to the problem I was trying to solve. It's slightly different, though.
For comparison, I tested different versions that solve your actual problem of compressing the string. Here is my test script, test.pl:
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Inline C => './compress_c.c';
my $str_len = 10000;
my @chars = qw(a b c d);
my $str;
$str .= [@chars]->[rand 4] for 1 .. $str_len;
cmpthese(
-1,
{
compress_array => sub { compress_array( $str ) },
compress_regex => sub { compress_regex( $str ) },
compress_str => sub { compress_str( $str ) },
compress_c => sub { compress_c( $str ) },
}
);
# Suggested by @melpomene in the comments
sub compress_regex {
return $_[0] =~ s/([a-z])\1+/($+[0] - $-[0]) . $1/egr;
}
sub compress_array {
my $result = '';
my @chrs = split //, $_[0];
my $prev = $chrs[0];
my $count = 1;
my @result;
for my $i ( 1..$#chrs ) {
my $char = $chrs[$i];
if ( $prev eq $char ) {
$count++;
next if $i < $#chrs;
}
if ( $count > 1) {
push @result, $count, $prev;
}
else {
push @result, $prev;
}
if ( ( $i == $#chrs ) and ( $prev ne $char ) ) {
push @result, $char;
last;
}
$count = 1;
$prev = $char;
}
return join '', @result;
}
sub compress_str {
my $result = '';
my $prev = substr $_[0], 0, 1;
my $count = 1;
my $lastind = (length $_[0]) - 1;
for my $i (1 .. $lastind) {
my $char = substr $_[0], $i, 1;
if ( $prev eq $char ) {
$count++;
next if $i < $lastind;
}
if ( $count > 1) {
$result .= $count;
}
$result .= $prev;
if ( ( $i == $lastind ) and ( $prev ne $char ) ) {
$result .= $char;
last;
}
$count = 1;
$prev = $char;
}
return $result;
}
where compress_c.c is:
SV *compress_c(SV* str_sv) {
STRLEN len;
char* str = SvPVbyte(str_sv, len);
SV* result = newSV(len);
char *buf = SvPVX(result);
char prev = str[0];
int count = 1;
int j = 0;
int i;
for (i = 1; i < len; i++ )
{
char cur = str[i];
if ( prev == cur ) {
count++;
if ( i < (len - 1) )
continue;
}
if ( count > 1) {
buf[j++] = count + '0'; // assume count is less than 10
}
buf[j++] = prev;
if ( (i == (len - 1)) && (prev != cur) ) buf[j++] = cur;
count = 1;
prev = cur;
}
buf[j] = '\0';
SvPOK_on(result);
SvCUR_set(result, j);
return result;
}
The result of running perl test.pl:
                  Rate compress_array compress_str compress_regex compress_c
compress_array   311/s             --         -42%           -45%       -99%
compress_str     533/s            71%           --            -6%       -98%
compress_regex   570/s            83%           7%             --       -98%
compress_c     30632/s          9746%        5644%          5273%         --
This shows that the regex version is slightly faster than the string version. However, the C version is the fastest by far: about 50 times as fast as the regex version.
Note: I tested this on my Ubuntu 16.10 laptop (Intel Core i7-7500U CPU @ 2.70GHz)
I benchmarked several ways of doing this:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
my $dna;
$dna .= [qw(G A T C)]->[rand 4] for 1 .. 10000;
sub frequency_concat {
my $result = '';
for my $idx (0 .. length($dna) - 1) {
$result .= substr($dna, $idx, 1);
}
return $result;
}
sub frequency_join {
my $result = '';
for my $idx (0 .. length($dna) - 1) {
$result = join '', $result, substr($dna,$idx,1);
}
return $result;
}
sub frequency_list_push {
my @result = ();
for my $idx (0 .. length($dna) - 1) {
push @result, substr($dna,$idx,1);
}
return join '', @result;
}
sub frequency_list_prealloc {
my @result = (' ' x length($dna));
for my $idx (0 .. length($dna) - 1) {
$result[$idx] = substr($dna,$idx,1);
}
return join '', @result;
}
cmpthese(-1, # Run each for at least 1 second(s)
{
concat => \&frequency_concat,
join => \&frequency_join,
list_push => \&frequency_list_push,
list_list_prealloc => \&frequency_list_prealloc
}
);
The results below show that concat (a .= b) is the fastest operation. I don't understand why, since I would expect it to make several copies of the string.
                    Rate   join list_push list_list_prealloc concat
join               213/s     --      -38%               -41%   -74%
list_push          342/s    60%        --                -5%   -58%
list_list_prealloc 359/s    68%        5%                 --   -56%
concat             822/s   285%      140%               129%     --
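Since `.=` came out fastest, here is a minimal sketch of the compression itself built on it. As far as I know, Perl's `.=` normally appends into the scalar's existing buffer and only reallocates when it runs out of room, which would be consistent with the numbers above. The name compress_runs is hypothetical, and the sketch follows the benchmark code in omitting a count of 1.

```perl
use strict;
use warnings;

# Hypothetical helper: run-length compress a string, e.g. 'aaabbccc' -> '3a2b3c'.
sub compress_runs {
    my ($in) = @_;
    my $out = '';
    # Each match is one maximal run: $2 is the character, $1 the whole run.
    while ($in =~ /((.)\2*)/gs) {
        my $n = length $1;
        # Append in place; a run of length 1 is written without a count.
        $out .= ($n > 1 ? $n : '') . $2;
    }
    return $out;
}

print compress_runs('aaabbccc'), "\n";
```

Unlike the C version, this does not assume that counts stay below 10.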

Keeping count with threads in perl

I'm trying to count each time a thread finishes in Perl and print the count, but this is not working: I keep getting either "0" or "1". I'm trying to increment the count and then print it right after the GET request is made.
use strict;
use threads;
use LWP::UserAgent;
our $MAX //= $ARGV[1];
my $list = $ARGV[0];
open my $handle, '<', $list;
chomp(my @array = <$handle>);
close $handle;
my $lines = `cat $list | wc -l`;
my $count = 0;
my @threads;
foreach $_ (@array) {
push @threads, async{
my @chars = ("a".."z");
my $random = join '', map { $chars[rand @chars] } 1 .. 6;
my $ua = LWP::UserAgent->new;
my $url = $_ . '?session=' . $random;
my $response = $ua->get($url);
$count++;
print $count;
};
sleep 1 while threads->list( threads::running ) > $MAX;
}
$_->join for @threads;
Just to summarise the points made in the comments by @choroba and myself, so the question is not left without an answer.
You would need to include:
use threads::shared;
in your code, along with all the other use statements.
And to indicate that variable $count is shared:
my $count :shared = 0;
EDIT As per Ikegami's comment, you would have to lock the variable if you want to modify it, to avoid problems of concurrency.
{
lock($count);
$count++;
print $count;
}
And that should be enough for the variable $count to be shared.
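Putting those pieces together, a minimal self-contained sketch might look like this. It assumes a perl built with thread support, and the HTTP request is replaced by a placeholder comment so the example runs without a network; the thread count of 5 is arbitrary.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared across all threads; without :shared each thread gets its own copy.
my $count :shared = 0;

my @threads;
for my $n (1 .. 5) {
    push @threads, async {
        # Placeholder for the real work (e.g. the LWP::UserAgent request).
        lock($count);    # held until the enclosing block exits
        $count++;
    };
}

$_->join for @threads;
print "$count\n";
```

Without `:shared`, every thread increments a private copy of $count, which is why the original script only ever printed 0 or 1.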

Splitting a numerical string in Perl

I have a numerical string:
"13245988"
I want to split before and after consecutive numbers.
Expected output is:
1
32
45
988
Here is what I've tried:
#!/usr/bin/perl
use strict;
use warnings;
my $a="132459";
my @b=split("",$a);
my $k=0;
my @c=();
for(my $i=0; $i<=@b; $i++) {
my $j=$b[$i]+1;
if($b[$i] == $j) {
$c[$k].=$b[$i];
} else {
$k++;
$c[$k]=$b[$i];
$k++;
}
}
foreach my $z (@c) {
print "$z\n";
}
Editing based on clarified question. Something like this should work:
#!/usr/bin/perl
use strict;
use warnings;
my $a = "13245988";
my @b = split("",$a);
my @c = ();
push @c, shift @b; # Put first number into result.
for my $num (@b) { # Loop through remaining numbers.
my $last = $c[$#c] % 10; # Get the last digit of the last entry.
if(( $num <= $last+1) && ($num >= $last-1)) {
# This number is within 1 of the last one
$c[$#c] .= $num; # Append this one to it
} else {
push @c, $num; # Non-consecutive, add a new entry;
}
}
foreach my $z (@c) {
print "$z\n";
}
Output:
1
32
45
988

Calculating the Mean from a Perl Script

I'm still here. ;)
I got this code from a real expert, and I'm too shy to ask him such basic questions. Anyway, here is my question: this Perl script prints the median of a column of space-delimited numbers. I added some code to get the size of the column, and now I'm trying to get the sum of the same column. I tried but got no results. Did I pick the wrong column? I run it as: ./stats.pl 1 columns.txt
#!/usr/bin/perl
use strict;
use warnings;
my $index = shift;
my $filename = shift;
my $columns = [];
open (my $fh, "<", $filename) or die "Unable to open $filename for reading\n";
for my $row (<$fh>) {
my @vals = split/\s+/, $row;
push @{$columns->[$_]}, $vals[$_] for 0 .. $#vals;
}
close $fh;
my @column = sort {$a <=> $b} @{$columns->[$index]};
my $offset = int($#column / 2);
my $length = 2 - @column % 2;
my @medians = splice(@column, $offset, $length);
my $median;
$median += $_ for @medians;
$median /= @medians;
print "MEDIAN = $median\n";
################################################
my @elements = @{$columns->[$index]};
my $size = @elements;
print "SIZE = $size\n";
exit 0;
#################################################
my $sum = @{$columns->[$index]};
for (my $size=0; $size < $sum; $size++) {
my $mean = $sum/$size;
};
print "$mean\n";
Thanks in advance.
OK, some pointers to get you going:
You can put all the numbers on a line into an array:
my @result = split(m/\s+/, $line);
# sum (divide by the number of elements to get the average)
use List::Util qw(sum);
my $sum = sum(@result);
You can then access an individual column with $result[$index], where $index is the number of the column you want.
Also note that :
$total = $line + $total;
$count = $count + 1;
Can be rewritten as :
$total += $line;
$count += 1;
Finally make sure that you are reading the file :
put a "debugging" print into the while loop :
print $line, "\n";
This should get you going :)
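Two things in the question's script also stand in the way: `exit 0;` runs before the sum code is ever reached, and `$mean` is declared inside the loop, so it is not visible to the final print. A minimal sketch of a column mean follows; the helper name column_mean and the sample rows are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: mean of one whitespace-delimited column.
sub column_mean {
    my ($index, @rows) = @_;
    # Pick field $index from every row; rows that are too short contribute nothing.
    my @column = grep { defined } map { (split ' ', $_)[$index] } @rows;
    return @column ? sum(@column) / @column : undef;
}

# Sample data standing in for the lines of columns.txt.
my @rows = ("1 10", "2 20", "3 30");
printf "MEAN = %s\n", column_mean(1, @rows);
```

In the original script the same calculation could be applied to @{$columns->[$index]} directly, as long as it happens before `exit 0;`.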
