Substring with offset going reverse - string

I looked at documentation of substr in Perl and see the 'offset' which is the position to start and 'length' is how long. If 'length' is negative, that's how many characters to leave off the right end of the string. How do I get a substring before the 'offset'? Example
my $string = "HelloWorld";
my $sub = someFunction(string=$string, offset=5, lengthBefore=2); # I know this is not Perl syntax for sub-routine but wanted to show the inputs I need
print "$sub\n";
I want to get
oW
Is there a function to do this in Perl? Any help would be greatly appreciated! Thank you.

sub someFunction {
my ($s, $i, $j) = #_;
my $start = $i-$j+1;
$start = 0 if $start < 0;
my $length = $i-$start+1;
return substr($s, $start, $length);
}
or
sub someFunction {
my ($s, $i, $j) = #_;
return substr(substr($s, 0, $i+1), -$j);
}
That assumes that someFunction("012345", 2, 4) should return 012.

Related

Find the index of the Nth occurrence of a character in a string

I've found index and rindex for finding the first or last occurrence of a character (or substring) in a string. I'm also aware that they have an offset that can be used to start at a certain index.
What I want to know is if there is a simple way to find the index of the Nth occurrence of a character or substring in a string. I'd prefer not to have to do it with a regex and would rather not to have to write a loop that just repeatedly calls index with an offset.
EDIT: I didn't state the restriction well. The reason I said "no loop" is I am looking for a built-in way to do it, which exists in many languages.
One possible implementation:
use strict;
use warnings;
use feature qw(say);
my $string = 'the quick brown fox jumped over the lazy dog';
my $substring = 'o';
my $n = 4;
sub nth_index {
my ($string, $substring, $n) = #_;
my ($times, $index) = (0, 0);
while ( $times < $n && $index != -1 ) {
$index = index(
$string,
$substring,
$times == 0
? 0
: $index + length($substring),
);
$times++;
}
return $index;
}
say nth_index($string, $substring, $n); # 42
Here are two examples of how I would solve the problem
Subroutine nth_index1 uses index, while nth_index2 uses a regex. Both involve loops, as any solution must
I believe the regex solution is much more readable, while the index solution is probably a little faster. But they are both so quick that they are highly unlikely to cause a bottleneck, and readability is always paramount
use strict;
use warnings 'all';
my $s = 'the quick brown fox jumps over the lazy dog';
my $ss = 'o';
for my $n ( 1 .. 4 ) {
printf "%d %d\n",
nth_index1($s, $ss, $n),
nth_index2($s, $ss, $n);
}
sub nth_index1 {
my ($s, $ss, $n) = #_;
my $i;
my $len = length $ss;
while ( $n-- ) {
$i = index($s, $ss, $i ? $i + $len : 0 );
return if $i < 0;
}
$i;
}
sub nth_index2 {
my ($s, $ss, $n) = #_;
while ( $s =~ /$ss/g ) {
return $-[0] unless --$n;
}
return;
}
output
12 12
17 17
26 26
41 41
As stated, there is no built-in for this. Here are a few ways, using split, index, and regex.
use warnings;
use strict;
use feature qw(say);
my $str = "Xab_ab_ab_ab_"; # 'Xab_ab'; # test failed (3) matches
my $N = 3;
foreach my $patt qw(a ab c) {
say "Find index of occurrence $N of |$patt| in: |$str|";
say "index: ", ( ind_Nth_match_1($str, $patt, $N) // "no $N matches" ); #/
say "split: ", ( ind_Nth_match_2($str, $patt, $N) // "no $N matches" ); #/
say "regex: ", ( ind_Nth_match_3($str, $patt, $N) // "no $N matches" ); #/
}
sub ind_Nth_match_1 {
my ($str, $patt, $N) = #_;
my ($pos, $cnt) = (0, 0);
while ($pos = index($str, $patt, $pos) + 1) { # != 0
return $pos-1 if ++$cnt == $N;
}
return;
}
sub ind_Nth_match_2 {
my ($str, $patt, $N) = #_;
my #toks = split /($patt)/, $str;
return if #toks < 2*$N;
return length( join '', #toks[0..2*$N-1] ) - length($patt);
}
sub ind_Nth_match_3 {
my ($str, $patt, $N) = #_;
my $cnt = 0;
while ($str =~ m/$patt/g) {
return $-[0] if ++$cnt == $N;
}
}
This prints
Find index of occurrence 3 of |a| in: |Xab_ab_ab_ab_|
index: 7
split: 7
regex: 7
Find index of occurrence 3 of |ab| in: |Xab_ab_ab_ab_|
index: 7
split: 7
regex: 7
Find index of occurrence 3 of |c| in: |Xab_ab_ab_ab_|
index: no 3 matches
split: no 3 matches
regex: no 3 matches
Notes
In split each delimiter is also returned in the output list, with capturing /($patt)/, for simpler length estimation. Thus we count 2*$N (and then take -1).
In regex the #- array is used, #LAST_MATCH_START, for the position of the last successful match. Here the /g in scalar context in while makes it jump from a match to the next in repeated executions, and $-[0] gives the starting position of the last (previous) such match.
The subs return undef if there are not required $N matches, including no matches at all.
Thanks to Borodin for comments on return from subs and on using #- instead of #+.
(This answer does not answer your question, but is here to help you down the road towards accepting a regex solution.)
The way you're asking for a numeric index of the position, it sounds like you're thinking of extracting data from the string once you have that number, in a way that C programmers might do it.
For example, say you had the string
my $str = "My daddy left home when I was three and he didn't leave much for ma and me";
and you wanted to extract all the data up to the first instance of the word "and". Here's the way you could do it, which is sort of a C way to do it using Perl.
my $pos = find_index_of_first_occurrence( $str, 'and' );
# Note that find_index_of_first_occurrence() is a hypothetical function.
print substr( $str, 0, $pos );
# Prints "My daddy left home when I was three "
The way you'd do that in Perl with regexes is much simpler.
$str =~ /^(.*?)and/;
print $1;
With regexes, you're combining the searching for the string and the extraction of the data in one operation. (Note that both code snippets ignore the case of not finding "and" at all, for the sake of simplicity)
I understand that you don't know regexes very well yet, and that regexes can be daunting at first, but you'll need to understand them as part of learning Perl if you're going to succeed with the language.

Can a var concatenated to a string be manipulated by reference

I need to initialize a string with a fixed text concatenated to a variable like this:
my $id = 0;
my $text ="This is an example, id: ".$id."\n";
Now, in a imaginay loop for 0->9, I want to modify only the $id value without changing the fixed text.
I guessed that using references should work like this way
for($i = 0; $i < 9; $i++) {
my $rid = \$id;
${$rid}++;
print $text;
}
Wanted output is
This is an example, id: 0
This is an example, id: 1
This is an example, id: 2
and so on...but it's not working.
Am I misunderstanding referencing system?
You are missunderstanding the reference system.
with
my $id = 0;
my $text ="This is an example, id: ".$id."\n";
The text is concatinated with the value of id at that point, in this case 0. This text loses all connection with the varable $id. Then in the loop
for($i = 0; $i < 9; $i++) {
my $rid = \$id;
${$rid}++;
print $text;
}
You are incrementing the $id variable using $rid( which in becomes another name for $id at my $rid = \$id; but this will have no affect on the text as it has no reference to the variable $id.
The cleanest way of doing what your trying to do is to use a closure
my $id = 0;
my $textfunc = sub { return "This is an example, id: ".$id."\n" };
then in your loop do
for($i = 0; $i < 9; $i++) {
$id++;
print $textfunc->();
}
As Sinan pointed out there is an easier way to do this. If you want to keep the $text string separate for maintainability and/or reuse, you may also consider using sprintf, e.g.:
my $id = 0;
my $max_id = 9;
my $text = "This is an example, id: %d\n";
for (my $i = $id; $i < $max_id; $i++) {
print sprintf($text, $i+1);
}
You seem to be confused about references. Maybe you are thinking thinking of the following C pointer scenario:
char text[] = "This is a test xx\n";
char *cursor = text + 15;
*cursor = ' 1';
I don't know what thought process can bring about the impression that once you interpolate the contents of $id into my $x = "Test string $id", you can change the value of the interpolated string by changing the value of $id.
As I said, you really are confused.
Now, if you want a subroutine someplace to be able to format some output without embedding in the subroutine the output format, you can pass as one of the arguments to the subroutine a message formatter as in:
my $formatter = sub { sprintf 'The error code is %d', $_[0] };
forbnicate([qw(this that and the other)], $formatter);
sub frobnicate {
my $args = shift;
my $formatter = shift;
# ...
for my $i (0 .. 9) {
print $formatter->($i), "\n";
}
return;
}
This is bound to get tedious, so you can basically have a package of formatters, and let subs use whatever formatters they need:
package My::Formatters;
sub error_code {
my $class = shift;
return sprintf 'The error code is %d', $_[0];
}
In the main script:
use My::Formatters;
for my $i (0 .. 9) {
My::Formatters->error_code($i);
}

Why is my word frequency counter example written in Perl failing to produce useful output?

I am very new to Perl, and I am trying to write a word frequency counter as a learning exercise.
However, I am not able to figure out the error in my code below, after working on it. This is my code:
$wa = "A word frequency counter.";
#wordArray = split("",$wa);
$num = length($wa);
$word = "";
$flag = 1; # 0 if previous character was an alphabet and 1 if it was a blank.
%wordCount = ("null" => 0);
if ($num == -1) {
print "There are no words.\n";
} else {
print "$length";
for $i (0 .. $num) {
if(($wordArray[$i]!=' ') && ($flag==1)) { # start of a new word.
print "here";
$word = $wordArray[$i];
$flag = 0;
} elsif ($wordArray[$i]!=' ' && $flag==0) { # continuation of a word.
$word = $word . $wordArray[$i];
} elsif ($wordArray[$i]==' '&& $flag==0) { # end of a word.
$word = $word . $wordArray[$i];
$flag = 1;
$wordCount{$word}++;
print "\nword: $word";
} elsif ($wordArray[$i]==" " && $flag==1) { # series of blanks.
# do nothing.
}
}
for $i (keys %wordCount) {
print " \nword: $i - count: $wordCount{$i} ";
}
}
It's neither printing "here", nor the words. I am not worried about optimization at this point, though any input in that direction would also be much appreciated.
This is a good example of a problem where Perl will help you work out what's wrong if you just ask it for help. Get used to always adding the lines:
use strict;
use warnings;
to the top of your Perl programs.
Fist off,
$wordArray[$i]!=' '
should be
$wordArray[$i] ne ' '
according to the Perl documentation for comparing strings and characters. Basically use numeric operators (==, >=, …) for numbers, and string operators for text (eq, ne, lt, …).
Also, you could do
#wordArray = split(" ",$wa);
instead of
#wordArray = split("",$wa);
and then #wordArray wouldn't need to do the wonky character checking and you never would have had the problem. #wordArray will be split into the words already and you'll just have to count the occurrences.
You seem to be writing C in Perl. The difference is not just one of style. By exploding a string into a an array of individual characters, you cause the memory footprint of your script to explode as well.
Also, you need to think about what constitutes a word. Below, I am not suggesting that any \w+ is a word, rather pointing out the difference between \S+ and \w+.
#!/usr/bin/env perl
use strict; use warnings;
use YAML;
my $src = '$wa = "A word frequency counter.";';
print Dump count_words(\$src, 'w');
print Dump count_words(\$src, 'S');
sub count_words {
my $src = shift;
my $class = sprintf '\%s+', shift;
my %counts;
while ($$src =~ /(?<sequence> $class)/gx) {
$counts{ $+{sequence} } += 1;
}
return \%counts;
}
Output:
---
A: 1
counter: 1
frequency: 1
wa: 1
word: 1
---
'"A': 1
$wa: 1
=: 1
counter.";: 1
frequency: 1
word: 1

Calculating the Mean from aPerl Script

I m still in here. ;)
I've got this code from a very expert guy, and I'm shy to ask him this basic questions...anyway this is my question now; this Perl Script prints the median of a column of numbers delimited space, and, I added some stuff to get the size of it, now I'm trying to get the sum of the same column. I did and got not results, did I not take the right column? ./stats.pl 1 columns.txt
#!/usr/bin/perl
use strict;
use warnings;
my $index = shift;
my $filename = shift;
my $columns = [];
open (my $fh, "<", $filename) or die "Unable to open $filename for reading\n";
for my $row (<$fh>) {
my #vals = split/\s+/, $row;
push #{$columns->[$_]}, $vals[$_] for 0 .. $#vals;
}
close $fh;
my #column = sort {$a <=> $b} #{$columns->[$index]};
my $offset = int($#column / 2);
my $length = 2 - #column % 2;
my #medians = splice(#column, $offset, $length);
my $median;
$median += $_ for #medians;
$median /= #medians;
print "MEDIAN = $median\n";
################################################
my #elements = #{$columns->[$index]};
my $size = #elements;
print "SIZE = $size\n";
exit 0;
#################################################
my $sum = #{$columns->[$index]};
for (my $size=0; $size < $sum; $size++) {
my $mean = $sum/$size;
};
print "$mean\n";
thanks in advance.
OK some pointers to get you going :
You can put all the numbers into an array :
my #result = split(m/\d+/, $line);
#average
use List::Util qw(sum);
my $sum = sum(#result);
You can then access individual columns with $result[$index] where index is the number of column you want to access.
Also note that :
$total = $line + $total;
$count = $count + 1;
Can be rewritten as :
$total += $line;
$count += 1;
Finally make sure that you are reading the file :
put a "debugging" print into the while loop :
print $line, "\n";
This should get you going :)

How to turn an array of words into an array containing the characters of the words in order?

I have an array of unknown number of words, with an unknown max length.
I need to convert it to another array, basically turning it into a column word array.
so with this original array:
#original_array = ("first", "base", "Thelongest12", "justAWORD4");
The resluting array would be:
#result_array = ("fbTj","iahu","rses","selt","t oA"," nW"," gO"," eR"," sD"," t4"," 1 "," 2 ");
Actually I will have this:
fbTj
iahu
rses
selt
t oA
nW
gO
eR
sD
t4
1
2
I need to do it in order to make a table, and these words are the table's headers.
I hope I have made myself clear, and appreciate the help you are willing to give.
I tried it with the split function, but I keep messing it up...
EDIT:
Hi all, thanks for all the tips and suggestions! I learned quite much from all of you hence the upvote. However I found tchrist's answer more convenient, maybe because I come from a c,c# background... :)
use strict;
use warnings;
use 5.010;
use Algorithm::Loops 'MapCarU';
my #original_array = ("first", "base", "Thelongest12", "justAWORD4");
my #result_array = MapCarU { join '', map $_//' ', #_ } map [split //], #original_array;
I have an old program that does this. Maybe you can adapt it:
$ cat /tmp/a
first
base
Thelongest12
justAWORD4
$ rot90 /tmp/a
fbTj
iahu
rses
selt
t oA
nW
gO
eR
sD
t4
1
2
Here’s the source:
$ cat ~/scripts/rot90
#!/usr/bin/perl
# rot90 - tchrist#perl.com
$/ = '';
# uncomment for easier to read, but not reversible:
### #ARGV = map { "fmt -20 $_ |" } #ARGV;
while ( <> ) {
chomp;
#lines = split /\n/;
$MAXCOLS = -1;
for (#lines) { $MAXCOLS = length if $MAXCOLS < length; }
#vlines = ( " " x #lines ) x $MAXCOLS;
for ( $row = 0; $row < #lines; $row++ ) {
for ( $col = 0; $col < $MAXCOLS; $col++ ) {
$char = ( length($lines[$row]) > $col )
? substr($lines[$row], $col, 1)
: ' ';
substr($vlines[$col], $row, 1) = $char;
}
}
for (#vlines) {
# uncomment for easier to read, but again not reversible
### s/(.)/$1 /g;
print $_, "\n";
}
print "\n";
}
use strict;
use warnings;
use List::Util qw(max);
my #original_array = ("first", "base", "Thelongest12", "justAWORD4");
my #new_array = transpose(#original_array);
sub transpose {
my #a = map { [ split // ] } #_;
my $max = max(map $#$_, #a);
my #out;
for my $n (0 .. $max) {
$out[$n] .= defined $a[$_][$n] ? $a[$_][$n] : ' ' for 0 .. $#a;
}
return #out;
}
It can be done easily by this simple one-liner:
perl -le'#l=map{chomp;$m=length if$m<length;$_}<>;for$i(0..$m-1){print map substr($_,$i,1)||" ",#l}'

Resources