Splitting a numerical string in Perl

I have a numerical string:
"13245988"
I want to split before and after consecutive numbers.
Expected output is:
1
32
45
988
Here is what I've tried:
#!/usr/bin/perl
use strict;
use warnings;
my $a="132459";
my @b=split("",$a);
my $k=0;
my @c=();
for(my $i=0; $i<=@b; $i++) {
my $j=$b[$i]+1;
if($b[$i] == $j) {
$c[$k].=$b[$i];
} else {
$k++;
$c[$k]=$b[$i];
$k++;
}
}
foreach my $z (@c) {
print "$z\n";
}

Editing based on clarified question. Something like this should work:
#!/usr/bin/perl
use strict;
use warnings;
my $a = "13245988";
my @b = split("",$a);
my @c = ();
push @c, shift @b; # Put first number into result.
for my $num (@b) { # Loop through remaining numbers.
my $last = $c[$#c] % 10; # Get the last digit of the last entry.
if(( $num <= $last+1) && ($num >= $last-1)) {
# This number is within 1 of the last one
$c[$#c] .= $num; # Append this one to it
} else {
push @c, $num; # Non-consecutive, add a new entry;
}
}
foreach my $z (@c) {
print "$z\n";
}
Output:
1
32
45
988
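The same grouping rule can be packaged as a reusable function. This is only a sketch: split_runs is a hypothetical name, and it assumes the input consists solely of the digits 0-9. It reads the last character of the current group with substr instead of using % 10, so it never treats a long group as a number.

```perl
use strict;
use warnings;

# Hypothetical helper: start a new group whenever a digit differs from the
# previous digit by more than 1; otherwise append it to the current group.
sub split_runs {
    my ($digits) = @_;
    my @groups;
    for my $d (split //, $digits) {
        if (@groups && abs($d - substr($groups[-1], -1)) <= 1) {
            $groups[-1] .= $d;      # within 1 of the previous digit
        } else {
            push @groups, $d;       # non-consecutive: open a new group
        }
    }
    return @groups;
}

print "$_\n" for split_runs("13245988");
```

For "13245988" this prints the same four lines as above: 1, 32, 45 and 988.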

Related

Split string to fixed length chunks and write in separate line in Raku

I have a file test.txt:
Stringsplittingskills
I want to read this file and write to another file out.txt with three characters in each line like
Str
ing
spl
itt
ing
ski
lls
What I did
my $string = "test.txt".IO.slurp;
my $start = 0;
my $elements = $string.chars;
# open file in writing mode
my $file_handle = "out.txt".IO.open: :w;
while $start < $elements {
my $line = $string.substr($start,3);
if $line.chars == 3 {
$file_handle.print("$line\n")
} elsif $line.chars < 3 {
$file_handle.print("$line")
}
$start = $start + 3;
}
# close file handle
$file_handle.close
This runs fine when the length of the string is not a multiple of 3. When the string length is a multiple of 3, it inserts an extra newline at the end of the output file. How can I avoid writing that trailing newline when the string length is a multiple of 3?
I tried another shorter approach,
my $string = "test.txt".IO.slurp;
my $file_handle = "out.txt".IO.open: :w;
for $string.comb(3) -> $line {
$file_handle.print("$line\n")
}
It still suffers from the same issue.
I looked at related questions but was still unable to solve it.
spurt "out.txt", "test.txt".IO.comb(3).join("\n")
Another approach using substr-rw.
subset PositiveInt of Int where * > 0;
sub break( Str $str is copy, PositiveInt $length )
{
my $i = $length;
while $i < $str.chars
{
$str.substr-rw( $i, 0 ) = "\n";
$i += $length + 1;
}
$str;
}
say break("12345678", 3);
Output
123
456
78
The correct answer is of course to use .comb and .join.
That said, this is how you might fix your code.
You could change the if line to check if it is at the end, and use else.
if $start+3 < $elements {
$file_handle.print("$line\n")
} else {
$file_handle.print($line)
}
Personally I would change it so that only the addition of \n is conditional.
while $start < $elements {
my $line = $string.substr($start,3);
$file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
$start += 3;
}
This works because < returns either True or False.
Since True == 1 and False == 0, the x operator repeats the \n at most once.
'abc' x 1; # 'abc'
'abc' x True; # 'abc'
'abc' x 0; # ''
'abc' x False; # ''
If you were very cautious you could use x+?.
(Which is actually 3 separate operators.)
'abc' x 3; # 'abcabcabc'
'abc' x+? 3; # 'abc'
infix:« x »( 'abc', prefix:« + »( prefix:« ? »( 3 ) ) );
I would probably use loop if I were going to structure it like this.
loop ( my $start = 0; $start < $elements ; $start += 3 ) {
my $line = $string.substr($start,3);
$file_handle.print( $line ~ ( "\n" x ($start+3 < $elements) ));
}
Or instead of adding a newline to the end of each line, you could add it to the beginning of every line except the first.
while $start < $elements {
my $line = $string.substr($start,3);
my $nl = "\n";
# clear $nl the first time through
once $nl = "";
$file_handle.print($nl ~ $line);
$start = $start + 3;
}
At the command line prompt, three one-liner solutions below.
Using comb and batch (retains incomplete set of 3 letters at end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.batch(3);'
Str
ing
spl
itt
ing
ski
lls
X
Simplifying (no batch, only comb):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.put for .comb(3);'
Str
ing
spl
itt
ing
ski
lls
X
Alternatively, using comb and rotor (discards incomplete set of 3 letters at end):
~$ echo 'StringsplittingskillsX' | perl6 -ne '.join.put for .comb.rotor(3);'
Str
ing
spl
itt
ing
ski
lls

Find the index of the Nth occurrence of a character in a string

I've found index and rindex for finding the first or last occurrence of a character (or substring) in a string. I'm also aware that they have an offset that can be used to start at a certain index.
What I want to know is if there is a simple way to find the index of the Nth occurrence of a character or substring in a string. I'd prefer not to have to do it with a regex and would rather not to have to write a loop that just repeatedly calls index with an offset.
EDIT: I didn't state the restriction well. The reason I said "no loop" is I am looking for a built-in way to do it, which exists in many languages.
One possible implementation:
use strict;
use warnings;
use feature qw(say);
my $string = 'the quick brown fox jumped over the lazy dog';
my $substring = 'o';
my $n = 4;
sub nth_index {
my ($string, $substring, $n) = @_;
my ($times, $index) = (0, 0);
while ( $times < $n && $index != -1 ) {
$index = index(
$string,
$substring,
$times == 0
? 0
: $index + length($substring),
);
$times++;
}
return $index;
}
say nth_index($string, $substring, $n); # 42
Here are two examples of how I would solve the problem.
Subroutine nth_index1 uses index, while nth_index2 uses a regex. Both involve loops, as any solution must.
I believe the regex solution is much more readable, while the index solution is probably a little faster. But they are both so quick that they are highly unlikely to cause a bottleneck, and readability is always paramount.
use strict;
use warnings 'all';
my $s = 'the quick brown fox jumps over the lazy dog';
my $ss = 'o';
for my $n ( 1 .. 4 ) {
printf "%d %d\n",
nth_index1($s, $ss, $n),
nth_index2($s, $ss, $n);
}
sub nth_index1 {
my ($s, $ss, $n) = @_;
my $i;
my $len = length $ss;
while ( $n-- ) {
# Test defined($i), not truth: a previous match at index 0 is false but valid
$i = index($s, $ss, defined($i) ? $i + $len : 0 );
return if $i < 0;
}
$i;
}
sub nth_index2 {
my ($s, $ss, $n) = @_;
while ( $s =~ /\Q$ss\E/g ) { # \Q...\E matches $ss as literal text, not as a pattern
return $-[0] unless --$n;
}
return;
}
output
12 12
17 17
26 26
41 41
As stated, there is no built-in for this. Here are a few ways, using split, index, and regex.
use warnings;
use strict;
use feature qw(say);
my $str = "Xab_ab_ab_ab_"; # 'Xab_ab'; # test failed (3) matches
my $N = 3;
foreach my $patt (qw(a ab c)) {
say "Find index of occurrence $N of |$patt| in: |$str|";
say "index: ", ( ind_Nth_match_1($str, $patt, $N) // "no $N matches" ); #/
say "split: ", ( ind_Nth_match_2($str, $patt, $N) // "no $N matches" ); #/
say "regex: ", ( ind_Nth_match_3($str, $patt, $N) // "no $N matches" ); #/
}
sub ind_Nth_match_1 {
my ($str, $patt, $N) = @_;
my ($pos, $cnt) = (0, 0);
while ($pos = index($str, $patt, $pos) + 1) { # != 0
return $pos-1 if ++$cnt == $N;
}
return;
}
sub ind_Nth_match_2 {
my ($str, $patt, $N) = @_;
my @toks = split /($patt)/, $str;
return if @toks < 2*$N;
return length( join '', @toks[0..2*$N-1] ) - length($patt);
}
sub ind_Nth_match_3 {
my ($str, $patt, $N) = @_;
my $cnt = 0;
while ($str =~ m/$patt/g) {
return $-[0] if ++$cnt == $N;
}
return; # explicit: the value of a sub that falls off the end of a loop is unspecified
}
This prints
Find index of occurrence 3 of |a| in: |Xab_ab_ab_ab_|
index: 7
split: 7
regex: 7
Find index of occurrence 3 of |ab| in: |Xab_ab_ab_ab_|
index: 7
split: 7
regex: 7
Find index of occurrence 3 of |c| in: |Xab_ab_ab_ab_|
index: no 3 matches
split: no 3 matches
regex: no 3 matches
Notes
In split, the capturing pattern /($patt)/ makes each delimiter appear in the output list as well, which simplifies the length computation. Thus we join the first 2*$N tokens (indices 0 .. 2*$N-1) and then subtract the length of the pattern.
In regex the @- array is used, @LAST_MATCH_START, for the position of the last successful match. Here the /g in scalar context in while makes it jump from a match to the next in repeated executions, and $-[0] gives the starting position of the last (previous) such match.
The subs return undef if there are not the required $N matches, including no matches at all.
Thanks to Borodin for comments on return from subs and on using @- instead of @+.
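To make the note about @- concrete, here is a small sketch that collects the offsets of every match. The variable names are made up for illustration. After each successful match, $-[0] holds the offset where the match starts and $+[0] holds the offset just past its end.

```perl
use strict;
use warnings;

my $str = "Xab_ab_ab_";
my (@starts, @ends);

# /g in scalar context resumes after the previous match on each iteration.
while ($str =~ /ab/g) {
    push @starts, $-[0];   # start offset of the match just found
    push @ends,   $+[0];   # offset one past the end of that match
}

print "starts: @starts\n";
print "ends:   @ends\n";
```

The matches start at offsets 1, 4 and 7 and end at 3, 6 and 9, which is why the subs above return $-[0] rather than $+[0].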
(This answer does not answer your question, but is here to help you down the road towards accepting a regex solution.)
The way you're asking for a numeric index of the position, it sounds like you're thinking of extracting data from the string once you have that number, in a way that C programmers might do it.
For example, say you had the string
my $str = "My daddy left home when I was three and he didn't leave much for ma and me";
and you wanted to extract all the data up to the first instance of the word "and". Here's the way you could do it, which is sort of a C way to do it using Perl.
my $pos = find_index_of_first_occurrence( $str, 'and' );
# Note that find_index_of_first_occurrence() is a hypothetical function.
print substr( $str, 0, $pos );
# Prints "My daddy left home when I was three "
The way you'd do that in Perl with regexes is much simpler.
$str =~ /^(.*?)and/;
print $1;
With regexes, you're combining the searching for the string and the extraction of the data in one operation. (Note that both code snippets ignore the case of not finding "and" at all, for the sake of simplicity)
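Handling the "not found" case takes only a little more code. The following is a sketch, with text_before as a hypothetical helper name; it returns undef when the word is absent and uses \Q...\E so the word is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text before the first occurrence of $word,
# or undef when $word does not occur in $str.
sub text_before {
    my ($str, $word) = @_;
    return $str =~ /^(.*?)\Q$word\E/s ? $1 : undef;
}

my $str  = "My daddy left home when I was three and he didn't leave much for ma and me";
my $head = text_before($str, 'and');
print defined $head ? "[$head]\n" : "no match\n";
```

Checking the result with defined, rather than for truth, keeps an empty prefix (a string that starts with the word) distinct from a failed match.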
I understand that you don't know regexes very well yet, and that regexes can be daunting at first, but you'll need to understand them as part of learning Perl if you're going to succeed with the language.

How to move the decimal point N places to the left efficiently?

I have a bunch of decimal numbers (as strings) which I receive from an API. I need to 'unscale' them, i.e. divide them by some power of 10. This seems a simple task for integers, but I have decimals with no guaranteed range. So, basically I need a function that works like this:
move_point "12.34" 1; # "1.234"
move_point "12.34" 5; # "0.0001234"
I'd rather not use floats to avoid any rounding errors.
This is a bit verbose, but should do the trick:
sub move_point {
my ($n, $places) = @_;
die 'negative number of places' if $places < 0;
return $n if $places == 0;
my ($i, $f) = split /\./, $n; # split to integer/fractional parts
$f //= ''; # integer input has no fractional part
$places += length($f);
$n = sprintf "%0*s", $places+1, $i.$f; # left pad with enough zeroes
substr($n, -$places, 0, '.'); # insert the decimal point
return $n;
}
Demo:
my $n = "12.34";
for my $p (0..5) {
printf "%d %s\n", $p, move_point($n, $p);
}
0 12.34
1 1.234
2 0.1234
3 0.01234
4 0.001234
5 0.0001234
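The string-only idea can be stretched to cover a leading minus sign and inputs without a decimal point. The sketch below is a variant, not the code above: shift_point is a hypothetical name, and it assumes the input looks like -?digits, optionally followed by a point and more digits (no exponents, no leading plus sign).

```perl
use strict;
use warnings;

# Hypothetical variant: move the decimal point $places to the left using
# string operations only, so no floating-point rounding is involved.
sub shift_point {
    my ($n, $places) = @_;
    die "negative number of places" if $places < 0;
    my ($sign, $int, $frac) = $n =~ /^(-?)(\d+)(?:\.(\d*))?$/
        or die "not a plain decimal: $n";
    $frac //= '';                                  # integer input
    return $n if $places == 0;
    my $digits = ('0' x $places) . $int . $frac;   # pad generously on the left
    my $scale  = length($frac) + $places;          # digits after the new point
    substr($digits, length($digits) - $scale, 0, '.');
    $digits =~ s/^0+(?=\d)//;                      # keep one zero before the point
    return $sign . $digits;
}

print shift_point($_, 2), "\n" for "12.34", "-1234", "5";
```

Padding with $places zeros up front is more than is strictly needed; the final substitution trims the surplus, which keeps the arithmetic on string lengths simple.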
Unless your data contains values with significantly more digits than you have shown, a floating-point value has more than enough accuracy for your purpose. Perl can reliably reproduce values of up to about 15 significant digits.
use strict;
use warnings 'all';
use feature 'say';
say move_point("12.34", 1); # "1.234"
say move_point("12.34", 5); # "0.0001234"
say move_point("1234", 12);
say move_point("123400", -9);
sub move_point {
my ($v, $n) = @_;
my $dp = $v =~ /\.([^.]*)\z/ ? length $1 : 0;
$dp += $n;
$v /= 10**$n;
sprintf '%.*f', $dp < 0 ? 0 : $dp, $v;
}
output
1.234
0.0001234
0.000000001234
123400000000000
Update
If the limits of standard floating-point numbers are actually insufficient for you, then the core Math::BigFloat module will do what you need.
This program takes a twenty-digit number and divides it by every power of ten from 1E-20 to 1E20.
use strict;
use warnings 'all';
use feature 'say';
use Math::BigFloat;
for ( -20 .. 20 ) {
say move_point('1234567890.1234567890', $_);
}
sub move_point {
my ($v, $n) = @_;
$v = Math::BigFloat->new($v);
# Build 10**$n
my $mul = Math::BigFloat->new(10)->bpow($n);
# Count new decimal places
my $dp = $v =~ /\.([^.]*)\z/ ? length $1 : 0;
$dp += $n;
$v->bdiv($mul);
$v->bfround(-$dp) if $dp >= 0;
$v->bstr;
}
output
123456789012345678900000000000
12345678901234567890000000000
1234567890123456789000000000
123456789012345678900000000
12345678901234567890000000
1234567890123456789000000
123456789012345678900000
12345678901234567890000
1234567890123456789000
123456789012345678900
12345678901234567890
1234567890123456789
123456789012345678.9
12345678901234567.89
1234567890123456.789
123456789012345.6789
12345678901234.56789
1234567890123.456789
123456789012.3456789
12345678901.23456789
1234567890.123456789
123456789.0123456789
12345678.90123456789
1234567.890123456789
123456.7890123456789
12345.67890123456789
1234.567890123456789
123.4567890123456789
12.34567890123456789
1.234567890123456789
0.1234567890123456789
0.01234567890123456789
0.001234567890123456789
0.0001234567890123456789
0.00001234567890123456789
0.000001234567890123456789
0.0000001234567890123456789
0.00000001234567890123456789
0.000000001234567890123456789
0.0000000001234567890123456789
0.00000000001234567890123456789

What's the mistake in my recursive subroutine?

This subroutine generates string combinations of the letters using the letters from A to the Mth letter of the Alphabet with length N.
sub genString
{
my($m,$n,$str,$letter,$temp,$i) = @_;
if($n == 0){
$letter = chr(ord("A")+($i+=1));
if($temp == 1){ print "$str\n"; }
else{
for($j = 0 ; $j < temp-1 ; $j++){
if(ord(substr($str,$j,1)) < ord(substr($str,$j+1,1))){$do_print = 1;}
else{
$do_print = 0;
break;
}
}
if($do_print == 1){ print "$str\n"; }
}
}
else{
for($j = ord($letter) ; $j < ord($letter)+$m ; $j++){
genString($m,$n-1,$str.chr($j),$letter,$temp,$i);
}
}
}
&genString($m,$n,$str,"A",$n,0);
Example:
Input: M=4; N=3;
Output: ABC ABD ACD BCD
I tried similar to this in Ruby and it works, but in Perl, it's an infinite loop, and I don't know why. I'm new here in Perl. What should I do? (Sorry if my code is kinda lengthy)
Please always use use strict; and use warnings; in your code, especially when posting code and asking for help. Also always declare local variables with my.
In this case even without having tried it I'm pretty sure something like $j referring to a global variable is causing you a lot of headache -- something use strict would have caught.
By default, variables are globals in perl (though undeclared and unqualified use of them will be prevented by use strict). For your recursion to work, you'll need to make some of them lexical, for instance, changing:
for($j = 0 ; $j < temp-1 ; $j++){
to
for (my $j = 0; $j < $temp-1; $j++) {
or better yet, just
for my $j (0..$temp-2) {
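Why a shared loop counter breaks recursion can be seen in a small experiment. This is an illustrative sketch, not the asker's algorithm: both subs are hypothetical and simply count how many calls are made when each call loops twice and recurses. The first uses one package variable for the counter, as the undeclared $j does; the second uses a lexical.

```perl
use strict;
use warnings;

our $j;   # one package variable shared by every call, like the undeclared $j

sub visits_shared {
    my ($depth) = @_;
    my $visits = 1;
    for ($j = 0; $j < 2; $j++) {          # inner calls overwrite this counter
        $visits += visits_shared($depth - 1) if $depth > 0;
    }
    return $visits;
}

sub visits_lexical {
    my ($depth) = @_;
    my $visits = 1;
    for my $k (0 .. 1) {                  # each call gets its own counter
        $visits += visits_lexical($depth - 1) if $depth > 0;
    }
    return $visits;
}

printf "shared: %d, lexical: %d\n", visits_shared(2), visits_lexical(2);
```

The lexical version makes all 7 calls of a depth-2 binary call tree, while the shared version makes only 3, because each inner loop leaves $j past the limit of the outer one. Depending on the loop bounds, the same interference can just as easily make a loop restart forever.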
Your code is very hard to read. I can't understand the algorithm, and I don't see the purpose of so many parameters to the subroutine, especially $temp which doesn't appear to change, and you don't say what its initial value is set to in the outermost call.
This code appears to do what you want, with a similar algorithm
use strict;
use warnings;
genString(4, 3);
sub genString {
my ($m, $n, $str, $i) = @_;
if ($n == 0) {
print $str, "\n";
}
else {
for my $off ($i // 0 .. $m - $n) {
$str //= '';
genString($m, $n-1, $str.chr(ord('A') + $off), $off+1);
}
}
}
output
ABC
ABD
ACD
BCD

Why is my word frequency counter example written in Perl failing to produce useful output?

I am very new to Perl, and I am trying to write a word frequency counter as a learning exercise.
However, I am not able to figure out the error in my code below, after working on it. This is my code:
$wa = "A word frequency counter.";
@wordArray = split("",$wa);
$num = length($wa);
$word = "";
$flag = 1; # 0 if previous character was an alphabet and 1 if it was a blank.
%wordCount = ("null" => 0);
if ($num == -1) {
print "There are no words.\n";
} else {
print "$length";
for $i (0 .. $num) {
if(($wordArray[$i]!=' ') && ($flag==1)) { # start of a new word.
print "here";
$word = $wordArray[$i];
$flag = 0;
} elsif ($wordArray[$i]!=' ' && $flag==0) { # continuation of a word.
$word = $word . $wordArray[$i];
} elsif ($wordArray[$i]==' '&& $flag==0) { # end of a word.
$word = $word . $wordArray[$i];
$flag = 1;
$wordCount{$word}++;
print "\nword: $word";
} elsif ($wordArray[$i]==" " && $flag==1) { # series of blanks.
# do nothing.
}
}
for $i (keys %wordCount) {
print " \nword: $i - count: $wordCount{$i} ";
}
}
It's neither printing "here", nor the words. I am not worried about optimization at this point, though any input in that direction would also be much appreciated.
This is a good example of a problem where Perl will help you work out what's wrong if you just ask it for help. Get used to always adding the lines:
use strict;
use warnings;
to the top of your Perl programs.
First off,
$wordArray[$i]!=' '
should be
$wordArray[$i] ne ' '
according to the Perl documentation for comparing strings and characters. Basically use numeric operators (==, >=, …) for numbers, and string operators for text (eq, ne, lt, …).
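A short experiment shows why the numeric test never fires. In this sketch the variable names are illustrative, and the numeric warning is silenced only so the demonstration runs quietly. Both 'A' and ' ' convert to the number 0, so != considers them equal.

```perl
use strict;
use warnings;

my $ch = 'A';
my ($numeric, $stringy);
{
    no warnings 'numeric';                 # hide "isn't numeric" for the demo
    $numeric = ($ch != ' ') ? 1 : 0;       # both sides numify to 0
}
$stringy = ($ch ne ' ') ? 1 : 0;           # compares the characters themselves

print "!= says different: $numeric\n";
print "ne says different: $stringy\n";
```

With warnings enabled and not silenced, Perl would report the non-numeric arguments, which is exactly the hint that use warnings gives for the original program.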
Also, you could do
@wordArray = split(" ",$wa);
instead of
@wordArray = split("",$wa);
Then you wouldn't need the wonky character-by-character checking, and you never would have had the problem: @wordArray will already hold the individual words, and you'll just have to count the occurrences.
You seem to be writing C in Perl. The difference is not just one of style. By exploding a string into an array of individual characters, you cause the memory footprint of your script to explode as well.
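Counting then takes one hash and one loop. This is a minimal sketch of that suggestion, with %word_count as an illustrative name. Note that splitting on whitespace leaves punctuation attached, so "counter." is counted with its full stop.

```perl
use strict;
use warnings;

my $wa = "A word frequency counter.";

# split ' ' splits on runs of whitespace and ignores leading whitespace.
my %word_count;
$word_count{$_}++ for split ' ', $wa;

for my $word (sort keys %word_count) {
    print "word: $word - count: $word_count{$word}\n";
}
```

Whether trailing punctuation belongs to a word is a separate decision; stripping it first, or matching words with a regex, would change the keys.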
Also, you need to think about what constitutes a word. Below, I am not suggesting that any \w+ is a word, rather pointing out the difference between \S+ and \w+.
#!/usr/bin/env perl
use strict; use warnings;
use YAML;
my $src = '$wa = "A word frequency counter.";';
print Dump count_words(\$src, 'w');
print Dump count_words(\$src, 'S');
sub count_words {
my $src = shift;
my $class = sprintf '\%s+', shift;
my %counts;
while ($$src =~ /(?<sequence> $class)/gx) {
$counts{ $+{sequence} } += 1;
}
return \%counts;
}
Output:
---
A: 1
counter: 1
frequency: 1
wa: 1
word: 1
---
'"A': 1
$wa: 1
=: 1
counter.";: 1
frequency: 1
word: 1

Resources