How to compare two strings character by character in Perl

Basically I want to compare
$a = "ABCDE";
$b = "--(-)-";
and get the output CE,
i.e. wherever parentheses occur in $b, the character of $a at that position should be taken.

One of the rare uses of the bitwise or-operator.
# magic happens here ↓
perl -E'say (("ABCDE" | "--(-)-" =~ tr/-()/\377\000/r) =~ tr/\377//dr)'
prints CE.
Use this for golfing purposes only, AHA’s solution is much more maintainable.
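To see why the one-liner works, here is a minimal step-by-step sketch of the same idea. The variable names ($str, $pat, $mask, $ored, $picked) are illustrative and not from the answer, and the copy-then-tr idiom replaces the /r flag so that it also runs on older perls. It assumes the "bitwise" feature is not enabled, so that | on two strings acts byte by byte.

```perl
use strict;
use warnings;

my $str = "ABCDE";
my $pat = "--(-)-";

# Build a byte mask from the pattern: '-' becomes \377 (all bits set),
# '(' and ')' become \000 (no bits set). The replacement list is shorter
# than the search list, so its last character (\000) is reused for ')'.
(my $mask = $pat) =~ tr/-()/\377\000/;

# String bitwise OR: x | \000 leaves x alone, x | \377 gives \377.
# The shorter operand behaves as if padded with \000 bytes.
my $ored = $str | $mask;

# Delete the \377 filler bytes; only the selected characters remain.
(my $picked = $ored) =~ tr/\377//d;

print "$picked\n";
```

Because \377 is used as the filler, the input must not itself contain that byte, which is the restriction on characters 255 and above mentioned elsewhere in this thread.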

Simple regex and pos solution:
my $str = "ABCDE";
my $pat = "--(-)-";
my @list;
while ($pat =~ /(?=[()])/g) {
    last if pos($pat) > length($str); # Required to prevent matching outside $str
    my $char = substr($str, pos($pat), 1);
    push @list, $char;
}
print @list;
Note the use of lookahead to get the position before the matching character.
Combined with Axeman's use of the @- variable we can get an alternative loop:
while ($pat =~ /[()]/g) {
    last if $-[0] > length($str);
    my $char = substr($str, $-[0], 1);
    push @list, $char;
}
This is pretty much mentioned in the documentation for @-:
After a match against some variable $var :
....
$& is the same as substr($var, $-[0], $+[0] - $-[0])
In other words, the matched string $& equals that substring expression. If you replace $var with another string, you would get the characters matching the same positions.
In my example, the expression $+[0] - $-[0] (offset of end of match minus offset of start of match) would be 1, since that is the max length of the matching regex.
QED.
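The identity quoted from the documentation can be checked directly. The following is a small sketch, with illustrative variable names ($var, $other, $start, $end) that are not from the answer: it takes the offsets of a match in one string and applies them to a second string.

```perl
use strict;
use warnings;

my $var   = "--(-)-";   # string that is matched against
my $other = "ABCDE";    # string we read the same positions from

my ($start, $end, $matched, $parallel);
if ($var =~ /[()]/) {
    # $-[0] is the offset of the start of the match, $+[0] the offset of its end.
    ($start, $end) = ($-[0], $+[0]);

    # Same text as $& would give for this match.
    $matched  = substr($var, $start, $end - $start);

    # Same offsets, different string.
    $parallel = substr($other, $start, $end - $start);

    print "matched '$matched' at offset $start; other string has '$parallel' there\n";
}
```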

This uses the idea that you can scan one string for positions and just take the values of the other strings. @s is a reusable product.
use strict;
use warnings;
sub chars {
    my $source = shift;
    return unless @_;
    my @chars = map { substr( $source, $_, 1 ) } @_;
    return wantarray ? @chars : join( '', @chars );
}
my $a = "ABCDE";
my $b = "--(-)-";
my @s;
push @s, @- while $b =~ m/[()]/g;
my $res = chars( $a, @s );

Way faster than all the solutions except daxim's, and almost as fast as daxim's without preventing the use of characters 255 and above:
my $pat = $b =~ s/[^()]/.?/gr =~ s/[()]/(.?)/gr;
my $c = join '', $a =~ /^$pat/s;
It changes
---(-)-
to
.?.?.?(.?).?(.?).?
Then uses the result as regex pattern to extract the desired characters.
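As a rough sketch of the two rewriting steps applied to the pattern from the question, with the intermediate result kept in its own variable ($step1, $mask, $str and $picked are names chosen here for illustration). It assumes Perl 5.14 or later for the /r modifier.

```perl
use 5.014;      # the /r modifier needs 5.14+
use warnings;

my $str  = "ABCDE";
my $mask = "--(-)-";

# Step 1: every character that is not a parenthesis becomes ".?".
# The parentheses left over now happen to wrap a ".?" each.
my $step1 = $mask =~ s/[^()]/.?/gr;

# Step 2: each original parenthesis becomes a capturing "(.?)".
my $pat = $step1 =~ s/[()]/(.?)/gr;

# In list context the match returns the captures, i.e. the wanted characters.
my $picked = join '', $str =~ /^$pat/s;

print "$pat\n";
print "$picked\n";
```

The ".?" rather than "." lets the match succeed even when the pattern is longer than the string, as it is in the question.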

This is easy to accomplish using each_array, each_arrayref or pairwise from List::MoreUtils:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw( min );
use List::MoreUtils qw( each_array );
my $string = 'ABCDE';
my $pattern = '--(-)-';
my @string_chars = split //, $string;
my @pattern_chars = split //, $pattern;
# Equalise length
my $min_length = min $#string_chars, $#pattern_chars;
$#string_chars = $#pattern_chars = $min_length;
my $ea = each_array @string_chars, @pattern_chars;
while ( my ( $string_char, $pattern_char ) = $ea->() ) {
print $string_char if $pattern_char =~ /[()]/;
}
Using pairwise:
{
    no warnings qw( once );
    print pairwise {
        $a if $b =~ /[()]/;
    } @string_chars, @pattern_chars;
}
Without using List::MoreUtils:
for ( 0 .. $#string_chars ) {
print $string_chars[$_] if $pattern_chars[$_] =~ /[()]/;
}
Thanks to TLP for discovering the set $# technique, without which this solution would have been longer and more complicated. :-)

#!/usr/bin/perl
use strict;
use warnings;
my $a = "ABCDE";
my $b = "--(-)-";
my ($i, $c, $x, $y) = 0;
$c .= $y =~ /\(|\)/ ? $x : "" while ($x = substr $a, $i, 1) && ($y = substr $b, $i++, 1);
print "$c\n";

Related

Find the index of the Nth occurrence of a character in a string

I've found index and rindex for finding the first or last occurrence of a character (or substring) in a string. I'm also aware that they have an offset that can be used to start at a certain index.
What I want to know is if there is a simple way to find the index of the Nth occurrence of a character or substring in a string. I'd prefer not to have to do it with a regex and would rather not to have to write a loop that just repeatedly calls index with an offset.
EDIT: I didn't state the restriction well. The reason I said "no loop" is I am looking for a built-in way to do it, which exists in many languages.
One possible implementation:
use strict;
use warnings;
use feature qw(say);
my $string = 'the quick brown fox jumped over the lazy dog';
my $substring = 'o';
my $n = 4;
sub nth_index {
    my ($string, $substring, $n) = @_;
    my ($times, $index) = (0, 0);
    while ( $times < $n && $index != -1 ) {
        $index = index(
            $string,
            $substring,
            $times == 0
                ? 0
                : $index + length($substring),
        );
        $times++;
    }
    return $index;
}
say nth_index($string, $substring, $n); # 42
Here are two examples of how I would solve the problem.
Subroutine nth_index1 uses index, while nth_index2 uses a regex. Both involve loops, as any solution must.
I believe the regex solution is much more readable, while the index solution is probably a little faster. But they are both so quick that they are highly unlikely to cause a bottleneck, and readability is always paramount.
use strict;
use warnings 'all';
my $s = 'the quick brown fox jumps over the lazy dog';
my $ss = 'o';
for my $n ( 1 .. 4 ) {
printf "%d %d\n",
nth_index1($s, $ss, $n),
nth_index2($s, $ss, $n);
}
sub nth_index1 {
    my ($s, $ss, $n) = @_;
    my $i;
    my $len = length $ss;
    while ( $n-- ) {
        # Test definedness, not truth, so a first match at offset 0 still advances
        $i = index($s, $ss, defined $i ? $i + $len : 0 );
        return if $i < 0;
    }
    $i;
}
sub nth_index2 {
    my ($s, $ss, $n) = @_;
    while ( $s =~ /$ss/g ) {
        return $-[0] unless --$n;
    }
    return;
}
output
12 12
17 17
26 26
41 41
As stated, there is no built-in for this. Here are a few ways, using split, index, and regex.
use warnings;
use strict;
use feature qw(say);
my $str = "Xab_ab_ab_ab_"; # 'Xab_ab'; # test failed (3) matches
my $N = 3;
foreach my $patt (qw(a ab c)) {
say "Find index of occurrence $N of |$patt| in: |$str|";
say "index: ", ( ind_Nth_match_1($str, $patt, $N) // "no $N matches" ); #/
say "split: ", ( ind_Nth_match_2($str, $patt, $N) // "no $N matches" ); #/
say "regex: ", ( ind_Nth_match_3($str, $patt, $N) // "no $N matches" ); #/
}
sub ind_Nth_match_1 {
    my ($str, $patt, $N) = @_;
    my ($pos, $cnt) = (0, 0);
    while ($pos = index($str, $patt, $pos) + 1) { # != 0
        return $pos-1 if ++$cnt == $N;
    }
    return;
}
sub ind_Nth_match_2 {
    my ($str, $patt, $N) = @_;
    my @toks = split /($patt)/, $str;
    return if @toks < 2*$N;
    return length( join '', @toks[0..2*$N-1] ) - length($patt);
}
sub ind_Nth_match_3 {
    my ($str, $patt, $N) = @_;
    my $cnt = 0;
    while ($str =~ m/$patt/g) {
        return $-[0] if ++$cnt == $N;
    }
    return; # explicit, so fewer than $N matches gives undef
}
This prints
Find index of occurrence 3 of |a| in: |Xab_ab_ab_ab_|
index: 7
split: 7
regex: 7
Find index of occurrence 3 of |ab| in: |Xab_ab_ab_ab_|
index: 7
split: 7
regex: 7
Find index of occurrence 3 of |c| in: |Xab_ab_ab_ab_|
index: no 3 matches
split: no 3 matches
regex: no 3 matches
Notes
In split each delimiter is also returned in the output list, with capturing /($patt)/, for simpler length estimation. Thus we count 2*$N (and then take -1).
In regex the @- array is used, @LAST_MATCH_START, for the position of the last successful match. Here the /g in scalar context in while makes it jump from a match to the next in repeated executions, and $-[0] gives the starting position of the last (previous) such match.
The subs return undef if there are not required $N matches, including no matches at all.
Thanks to Borodin for comments on return from subs and on using @- instead of @+.
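To make the split note concrete, here is a small sketch of what split returns when the pattern is captured, and how the index falls out of it. The variable names (@toks, $prefix, $index) are illustrative, and the pattern is written literally rather than interpolated.

```perl
use strict;
use warnings;

my $str = "Xab_ab_ab_ab_";
my $N   = 3;

# With a capturing group, split returns the delimiters too:
# "X", "ab", "_", "ab", "_", "ab", "_", "ab", "_"
my @toks = split /(ab)/, $str;

# Elements 0 .. 2*$N-1 are everything up to and including the Nth match.
my $prefix = join '', @toks[0 .. 2*$N - 1];

# The Nth match starts length('ab') characters before the end of that prefix.
my $index = length($prefix) - length('ab');

print scalar(@toks), " tokens, prefix '$prefix', index $index\n";
```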
(This answer does not answer your question, but is here to help you down the road towards accepting a regex solution.)
The way you're asking for a numeric index of the position, it sounds like you're thinking of extracting data from the string once you have that number, in a way that C programmers might do it.
For example, say you had the string
my $str = "My daddy left home when I was three and he didn't leave much for ma and me";
and you wanted to extract all the data up to the first instance of the word "and". Here's the way you could do it, which is sort of a C way to do it using Perl.
my $pos = find_index_of_first_occurrence( $str, 'and' );
# Note that find_index_of_first_occurrence() is a hypothetical function.
print substr( $str, 0, $pos );
# Prints "My daddy left home when I was three "
The way you'd do that in Perl with regexes is much simpler.
$str =~ /^(.*?)and/;
print $1;
With regexes, you're combining the searching for the string and the extraction of the data in one operation. (Note that both code snippets ignore the case of not finding "and" at all, for the sake of simplicity.)
I understand that you don't know regexes very well yet, and that regexes can be daunting at first, but you'll need to understand them as part of learning Perl if you're going to succeed with the language.
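For completeness, a minimal sketch of the same regex extraction that also handles the case where "and" is not found. The variable $before is a name introduced here, and \Q...\E is added only so the search word is treated literally.

```perl
use strict;
use warnings;

my $str  = "My daddy left home when I was three and he didn't leave much for ma and me";
my $word = 'and';

# The non-greedy (.*?) stops at the first occurrence of the word.
# Only use $1 when the match succeeded; otherwise it may hold stale data.
my $before = $str =~ /^(.*?)\Q$word\E/s ? $1 : undef;

if (defined $before) {
    print "Text before first '$word': '$before'\n";
}
else {
    print "'$word' not found\n";
}
```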

What is the best way to convert a string to a string with separators in perl

How do I convert $var = "000000000" to $var = "0_0000_0000" in Perl ?
If the string is always 9 characters long, you can just use substr:
my $var = '000000000';
substr($var, 5, 0) = '_';
substr($var, 1, 0) = '_';
For formatting strings of arbitrary length you could use a function like this:
sub format_str {
my $str = reverse $_[0];
$str =~ s/(.{4})(?=.)/$1_/g;
return scalar reverse $str;
}
my $var = "000000000";
print format_str $var; # "0_0000_0000"
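The reverse trick is easier to follow with distinct digits. The sketch below walks through the three steps of the function above on an illustrative input ('123456789' and the variable names are assumptions made here for the demonstration).

```perl
use strict;
use warnings;

my $in = '123456789';

# Step 1: reverse, so grouping from the right becomes grouping from the left.
my $rev = reverse $in;

# Step 2: add "_" after every 4 characters that are followed by at least one more.
(my $grouped = $rev) =~ s/(.{4})(?=.)/$1_/g;

# Step 3: reverse back to restore the original order.
my $out = reverse $grouped;

print "$rev\n$grouped\n$out\n";
```

The lookahead (?=.) is what prevents a trailing separator when the length is an exact multiple of four.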
$var = "000000000";
$var2 = substr($var,0,1)."_".substr($var,1,4)."_".substr($var,5);
print $var2;
Assuming you're asking how to insert a _ after the first and fifth characters of a string, the following are a variety of straightforward solutions:
my $in = '000000000';
my $out = substr($in,0,1) . '_' . substr($in,1,4) . '_' . substr($in,5);
my $in = '000000000';
my $out = join('_', substr($in,0,1), substr($in,1,4), substr($in,5));
my $in = '000000000';
my $out = join('_', unpack('a1 a4 a*', $in));
my $in = '000000000';
my $out = $in =~ s/^(.)(.{4})/${1}_${2}_/sr; # 5.14+
my $in = '000000000';
( my $out = $in ) =~ s/^(.)(.{4})/${1}_${2}_/s;
In-place:
my $var = '000000000';
$var =~ s/^(.)(.{4})/${1}_${2}_/s;
my $var = '000000000';
substr($var, 5, 0) = '_';
substr($var, 1, 0) = '_';
For a solution for any-length string and considering efficiency issues that arise for very-long strings, please see my previous question&answer: How to chunk text "from the back" in perl.
Per suggestion in comment, here is code using the idea in the linked question/answer which answers the OP question:
use integer;
my $la = length($var);
my $r = $la % 4;
my $q = $la / 4;
my $tr = $r ? "a$r" : "";
$var = join "_", unpack "$tr(a4)$q", $var;
Note: change all three 4s for a different grouping size.
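As a sketch of what the unpack template looks like for a concrete input, here is the same calculation with the group size pulled into a variable ($size, $template and the sample string are assumptions for illustration; int() stands in for "use integer").

```perl
use strict;
use warnings;

my $var  = '123456789';
my $size = 4;

my $len = length $var;
my $r   = $len % $size;         # characters left over at the front
my $q   = int($len / $size);    # number of full groups

# For a 9-character string and groups of 4 this builds "a1(a4)2":
# one leading field of 1 character, then the group "a4" repeated twice.
my $template = ($r ? "a$r" : '') . "(a$size)$q";

my $out = join '_', unpack $template, $var;

print "$template\n$out\n";
```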
If this is a commify problem of the kind solved in "How can I output my numbers with commas added?", available as perldoc -q 'commas added', then a similar solution will suffice, with extra parameters to define the separator and the size of the interval.
You will want to read the perlfaq entry for other alternatives.
use strict;
use warnings 'all';
print group_characters(1234567), "\n";
print group_characters('000000000', '_', 4), "\n";
print group_characters('0123456789ABCDEF', ' ', 4), "\n";
sub group_characters {
    my ($s, $sep, $n) = @_;
    $sep //= ',';
    $n //= 3;
    1 while $s =~ s/[^$sep]+\K(?=[^$sep]{$n})/$sep/;
    $s;
}
output
1,234,567
0_0000_0000
0123 4567 89AB CDEF

How to divide string in perl for 2 parts

I need to split my string in Perl into two parts. For example, I have:
$string = "../dira/dirb/*.txt";
and I want to split it into:
$stringA = "../dira/dirb";
$stringB = "*.txt";
But if I have:
$string = "dira/dirb/dirc/.../dirn/test.pl";
I want to split it into:
$stringA = "dira/dirb/dirc/.../dirn";
$stringB = "test.pl";
Does anybody have an idea how I can do this? I tried something like:
$howmany++ while $string =~ m/\//g;
So I know how many slashes I have, but I have no idea what more I can do with this.
Use Path::Tiny:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Path::Tiny;
for my $path (qw( ../dira/dirb/*.txt
dira/dirb/dirc/.../dirn/test.pl
)) {
my $path_o = 'Path::Tiny'->new($path);
my $basename = $path_o->basename;
my $dirname = $path_o->dirname;
$dirname =~ s=/$==; # Remove the trailing slash.
say $basename, ' ', $dirname;
}
You can try something like this:
$string =~ m|^(.*)/(.*)$|;
($stringA,$stringB) = ($1,$2);
print "stringA = $stringA\n";
print "stringB = $stringB\n";
Edit: to restrict the match to certain values of stringB:
if($string =~ m|^(.*)/(.*\.pl)$|) {
($stringA,$stringB) = ($1,$2);
print "stringA = $stringA\n";
print "stringB = $stringB\n";
}
You can use File::Basename functions to parse file paths:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Basename;
my $string = "../dira/dirb/*.txt";
my $stringA = dirname($string);
my $stringB = basename($string);
printf "String A: %-25sString B: %s\n", $stringA, $stringB;
$string = "dira/dirb/dirc/.../dirn/test.pl";
$stringA = dirname($string);
$stringB = basename($string);
printf "String A: %-25sString B: %s\n", $stringA, $stringB;
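File::Basename also exports fileparse, which returns both parts in one call and can optionally split off a suffix. The following is a small sketch rather than part of the answer above; the suffix pattern qr/\.[^.]*/ is an assumption chosen for the example. Note that the directory part comes back with its trailing slash, unlike dirname.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

my $string = "dira/dirb/dirc/.../dirn/test.pl";

# Without suffix patterns: (filename, directory path, empty suffix)
my ($file, $dir) = fileparse($string);

# With a suffix pattern the extension is split off as a third value.
my ($name, $dir2, $suffix) = fileparse($string, qr/\.[^.]*/);

# Drop the trailing slash to get the form asked for in the question.
(my $stringA = $dir) =~ s{/\z}{};
my $stringB = $file;

print "stringA = $stringA\n";
print "stringB = $stringB\n";
print "name = $name, suffix = $suffix\n";
```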

compare string variables in perl

I have an if clause in Perl whose condition needs to compare two variables to see if they match as strings. But my code doesn't work and the strings never match:
if(trim($file) eq trim($fields[0])) {
print "OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO";
}
For the definition of trim I have used:
sub trim($)
{
my $string = shift;
$string =~ s/^\s*(.*?)\s*$/$1/;
return $string;
}
Moreover I have used this before for the variables to compare.
my @fields = split(/\;/,$_);
Any help? Thanks!
Your code is correct, so your strings are different.
To find the differences, I recommend the following code, since it reveals differences that might not be noticeable by just printing the strings:
use Data::Dumper;
{
local $Data::Dumper::Useqq=1;
print Dumper($file, $fields[0]);
}
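As an illustration of the kind of difference this uncovers, here is a sketch with made-up values: two strings that can look alike when printed, one containing a tab where the other has a space. Trimming does not help here because the difference is in the middle of the string. The sample values and the $dump variable are assumptions for the demonstration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical values: they may look the same on a terminal, but differ.
my $file  = "my file.txt";
my $field = "my\tfile.txt";

# Useqq makes Data::Dumper use double-quoted strings with escapes,
# so control characters such as tabs become visible.
local $Data::Dumper::Useqq = 1;
my $dump = Dumper($file, $field);

print $dump;
print "equal\n"     if $file eq $field;
print "different\n" if $file ne $field;
```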
By the way, the following is more elegant and possibly faster:
sub trim {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+\z//;
return $string;
}
And IIRC, the following is even faster (for a drop in readability):
sub trim {
my $string = shift;
$string =~ s/^\s+|\s++\z//g;
return $string;
}

What's a good character to separate strings with leading white-spaces?

I'm using the null character (\0) as a separator to preserve the strings' leading whitespace after the sprintf. But strings containing the null character don't work (in this case) with the Curses addstr function.
Is there some suitable character to replace the \0 for this purpose?
#!/usr/bin/env perl
use warnings;
use 5.12.0;
sub routine {
    my @list = @_;
    @list = map{ "\0".$_."\0"; } @list;
    # ...
    # ...
    @list = map{ sprintf "%35.35s", $_ } @list;
    # ...
    # ...
    my $result = $list[5];
    $result =~ s/\A\s+\0//;
    $result =~ s/\0\s+\z//;
    return $result;
}
What about using some pretty print module from CPAN?
http://metacpan.org/pod/Data::Format::Pretty::Console
http://metacpan.org/pod/Text::Tabulate
