Perl string matching

I am having trouble with Perl string matching/searching, using both index and the =~ operator. I need to search a text file for the string "RT @zaynmalik: Big cover for @cosmopolitanuk ! Boys looking slick http://example.com/FcWA80HI".
if($splitlines[1] =~ /RT @zaynmalik: Big cover for @cosmopolitanuk ! Boys looking slick http://example.com/FcWA80HI/){
## Do something ##
}
However, because '@' is a special character in Perl, I am getting compile errors. Can you suggest a way to do this? I tried saving the string to a variable like $str, but it did not work (which is understandable).
So, this is what I am doing now,
$max_freq_tweet = 'RT @zaynmalik: Big cover for @cosmopolitanuk ! Boys looking slick http://example.com/FcWA80HI';
if($splitlines[1] =~ /\Q$max_freq_tweet\E/){
print FILE5 "$splitlines2[1] \n";
}
But it still doesn't seem to be working.

Either escape the @ with a backslash, or use single quotes.
my $search_string = 'RT @zaynmalik: Big cover for @cosmopolitanuk ! Boys looking slick http://example.com/FcWA80HI';
# or: "RT \@zaynmalik: Big cover for \@cosmopolitanuk ! Boys looking slick http://example.com/FcWA80HI"
if (-1 != index $str, $search_string) { ... } # do something
If you have a string and want to use it in a regex, you should protect its literal meaning with \Q...\E:
if ($str =~ /\Q$search_string\E/) { ... } # do something
\Q...\E does not prevent array interpolation, but no character in the interpolated string is treated as special; without it, the . in the string would match any character!

You need to escape the @ in your regexp, as in $str =~ /RT \@.*:/.
Edit: you also need to escape slashes (/) with a backslash (\): $str =~ /RT \@.*: .* http:\/\/.*/.

You need to escape special characters with a preceding \ (backslash).
This applies not only to @, but to other characters too.
To be on the safe side, you can escape any non-word character (anything other than letters, digits, and underscore), which is what \Q...\E does for you.
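To make the difference between the approaches above concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration; only index, =~ and \Q...\E come from the answers.

```perl
use strict;
use warnings;

# Hypothetical sample data. Single quotes keep @, # and / literal.
my $needle   = 'RT @someone: big cover for #sometag ! http://example.com/abc';
my $haystack = "42\t$needle (retweeted)";

# 1. Plain substring search: no regex, so nothing needs escaping.
my $found_by_index = index($haystack, $needle) != -1 ? 1 : 0;

# 2. Regex search: \Q...\E quotes every metacharacter in the interpolated value.
my $found_by_regex = $haystack =~ /\Q$needle\E/ ? 1 : 0;

# 3. Why \Q matters: unquoted, the '.' in the URL matches any character.
my $url       = 'http://example.com/abc';
my $near_miss = 'http://exampleXcom/abc';
my $loose     = $near_miss =~ /$url/     ? 1 : 0;   # 1: '.' matched 'X'
my $strict    = $near_miss =~ /\Q$url\E/ ? 1 : 0;   # 0: '.' must be a literal dot

print "index=$found_by_index regex=$found_by_regex loose=$loose strict=$strict\n";
```

If all you need is "does this exact text occur", index is probably the simpler choice, since it avoids regex quoting altogether.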

Related

How can I remove all vowels in a sentence except for the vowel at the first letter?

What is wrong specifically with this code? How can I correct it?
$x = "without any vowels after the first letter\n";
foreach $i (@x[1..]) {
if ($i =~ /[AEIOUaeiou]/) {
$x =~ tr/A E I O U a e i o u//d;
}
}
print "$x\n";
I tried [1..] to exclude the first letter. If it does not work, how else can I remove the first letter?
EDIT I edited code to make it syntactically (mostly) correct to convey their obvious original idea, except for the attempt to index into a string which isn't correct in Perl. (Clarifying that is a part of what I consider useful in this question.)

First, most of that is not Perl, or any programming language for that matter. I'd suggest working through a Perl tutorial of your choice before trying to get solutions for specific problems. However, here's an answer, since the problem itself is of general interest.
Next, in Perl you can't directly index into a string, so you can't skip the first character(s) like that.
But you can, of course, separate the first character from the rest of the string and process the rest (removing vowels). One way, with a regex:†
use warnings;
use strict;
use feature 'say';
my $str = shift // 'out with ALL vowels after first';
$str =~ s/.\K(.*)/ $1 =~ tr{aeiouAEIOU}{}dr /e;
say $str; #--> ot wth LL vwls ftr frst
This relies on the /e modifier, which makes it so that the replacement side is evaluated as code, and so it runs an independent transliteration (tr) there, processing the captured substring.
Then we need the /r modifier on the embedded tr, so that it returns the new string instead of changing the old one in place -- which wouldn't be possible anyway, as one can't modify $1.
One can also use a regex instead of tr; it is less efficient but brings its many conveniences:
$str =~ s/.\K(.*)/ $1 =~ s{[aeiou]}{}igr /e;
Now we can use far more sophisticated tools in that regex than in tr; in this case it's only the i flag, for case-insensitive.
To keep more than just the first character, change . to .{N}.
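As a rough sketch of the .{N} variant, the following keeps the first three characters and prunes the rest. The variable $n and the non-destructive /r on the outer substitution are my additions for illustration; tr with /r needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $n   = 3;   # assumed: how many leading characters to keep untouched
my $str = 'out with ALL vowels after first';

# .{$n}\K keeps the first $n characters; the rest is captured and passed
# through tr///dr, which returns a vowel-free copy of $1.
# The outer /r returns the result and leaves $str itself unchanged.
my $pruned = $str =~ s/.{$n}\K(.*)/ $1 =~ tr{aeiouAEIOU}{}dr /ser;

print "$pruned\n";   # out wth LL vwls ftr frst
```

If the string is shorter than $n characters the pattern does not match, and /r simply returns the string as it was.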
† Regex is not compulsory, of course. A more elementary take: split the string into its first character and the rest, then use tr on the rest
use warnings;
use strict;
use feature 'say';
my $str = shift // q(out with ALL vowels after first);
my ($result, $rest) = split //, $str, 2; # first char, rest of string
$result .= $rest =~ tr/aeiouAEIOU//dr; # prune $rest of vowels, append to $result
say $result;
This can then be wrapped in a small subroutine. To change the original string in place rather than build a new string ($result), use $str everywhere in place of $result.
I am not sure how it compares efficiency-wise, but it may well do fine.
For curiosity's sake, here it is in a single statement:
$str = join '', map { length > 1 ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
This relies on the fact that only the first (one) character needs to be skipped; that can easily be made dynamic, as long as the criterion involves the length of the substrings.
More importantly, it assumes that the rest of the string is longer than one character, since otherwise the length test cannot tell the two pieces apart. To drop that assumption, change the criterion:
use feature 'state';
$str = join '', map {
state $chr_cnt = 0;
++$chr_cnt > 1 ? tr/aeiouAEIOU//dr : $_
}
split //, $str, 2;
This also relies on setting aside just one character. It uses the state feature to keep a lexical value across executions of the block.
A more general solution uses the fact that the result of substr can be written to:
substr($str, 1) =~ tr/aeiouAEIOU//d;
Here it is much cleaner and simpler to relax the limitation to the first character: just change that 1 to skip more characters. The tricky (and perhaps unexpected) part is that builtins normally can't be written to like that, since they aren't lvalue subroutines; substr is an exception.
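The substr approach could be wrapped in a small subroutine along these lines. This is only a sketch: the name strip_vowels_after and its $keep parameter are hypothetical, and the length guard is my addition, because writing to a substr that starts beyond the end of the string is a fatal error.

```perl
use strict;
use warnings;

# Hypothetical helper: remove vowels everywhere except in the first $keep characters.
sub strip_vowels_after {
    my ($str, $keep) = @_;
    $keep //= 1;                             # default: protect only the first character
    return $str if length($str) <= $keep;    # nothing beyond the protected prefix

    # substr() used as an lvalue: tr///d edits that part of $str in place.
    substr($str, $keep) =~ tr/aeiouAEIOU//d;
    return $str;                             # $str is a copy, so the caller's string is untouched
}

my $keep_one = strip_vowels_after('out with ALL vowels after first');
my $keep_two = strip_vowels_after('out with ALL vowels after first', 2);
print "$keep_one\n";   # ot wth LL vwls ftr frst
print "$keep_two\n";   # out wth LL vwls ftr frst
```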

The algorithm for solving the problem is already in your question:
add a letter to the output string if it isn't a vowel
add a letter to the output string if it is the first vowel in the input string
use strict;
use warnings;
my $x = "without any vowels after the first letter\n";
my($o,$count) = ('',0);
print 'IN: ' . $x;
for ( split('',$x) ) {
$o .= $_ unless $count != 0 and /[aeiou]/i;
$count++ if /[aeiou]/i;
}
print 'OUT: ' . $o;
Output
IN: without any vowels after the first letter
OUT: witht ny vwls ftr th frst lttr
Addendum: the OP's clarification of the problem
look at each word in the sentence
if a word starts with a vowel, delete all vowels except the first one
if a word does not start with a vowel, delete all vowels
use strict;
use warnings;
use feature 'say';
my $x = 'I like apples more than oranges';
my @o;
say 'IN: ' . $x;
for ( split(' ', $x) ) {
if ( /^[aeiou]/i ) {
s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e;
} else {
tr|aeiouAEIOU||d;
}
push @o, $_;
}
say 'OUT: ' . join(' ', @o);
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
Or in perlish style
use strict;
use warnings;
use feature 'say';
my $x = "I like apples more than oranges";
say 'IN: ' . $x;
say 'OUT: ' . join(' ', map { s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e && $_ } split('[ ]+', $x));
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs

get perl to process backslash escapes in string

Consider this code:
my $str = '"line 1\n\t line 2"'; # from some JSON, or something
say $str; # print literal backslashes, not what I want
say eval $str; # processes backslashes, but overkill
Is there a reasonably easy way to get the effect of the last line, but without using full-blown eval? Even leaving aside the security implications (I mostly trust this string), this interpolates variables and stuff which I don't want. This can be worked around by an extra preprocessing step where I manually escape dollar signs and such, but this still feels a bit too hacky, even for my tastes.

@mob has the right recommendation. For the general problem:
#!/usr/bin/env perl
use strict;
use warnings;
my %unescape = map +($_ => eval "qq{\\$_}"), qw(f n r t); # etc
my $special = join '|', keys %unescape;
my $str = '"line 1\n\t line 2"';
$str =~ s{ \\ ($special) }{$unescape{$1}}xg;
print "'$str\n'";
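For comparison, here is a sketch of the same idea with no eval at all, using a hand-written lookup table. The table contents and the name unescape_simple are my own assumptions; it only handles the single-character escapes listed, not things like \x41 or \u0041.

```perl
use strict;
use warnings;

# Hypothetical table of supported escapes; extend as needed.
my %unescape = (
    n    => "\n",
    t    => "\t",
    r    => "\r",
    f    => "\f",
    '\\' => '\\',
    '"'  => '"',
);

sub unescape_simple {
    my ($text) = @_;
    # Replace backslash + known character; leave unknown sequences untouched.
    $text =~ s{\\(.)}{ exists $unescape{$1} ? $unescape{$1} : "\\$1" }ge;
    return $text;
}

# Single quotes: the backslashes, $ and @ below are all literal characters.
my $raw    = 'line 1\n\t line 2 costs $5 and mentions @home';
my $cooked = unescape_simple($raw);
print $cooked, "\n";
```

Because the substitution consumes the backslash together with the character after it, an escaped backslash followed by n comes out as a backslash and a literal n, not as a newline. Nothing here interpolates variables, so $5 and @home pass through unchanged.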

If it's JSON, then decode it with JSON.
use JSON;
my $str = '"line 1\n\t line 2"'; # from some JSON, or something
my $decoded = JSON::decode_json("[$str]");
say $decoded->[0];

search a specific sub string pattern in a string using perl

I'm new to Perl. I read "Check whether a string contains a substring" to learn how to check whether a substring is present in a string, but my scenario is a little different.
I have a string like
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf
The ending might instead be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether the string contains a substring of the form systemfile*.elf.
How can I achieve this in Perl?
I'm planning to do something like this:
if(index($mainstring, _serach_for_pattern_systemfile*.elf_ ) ~= -1) {
say" Found the string";
}

You can use pattern matching:
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string

Well
if( $mainstring =~ m'/systemfile(16|32)\.elf$' ) {
say" Found the string";
}
does the job.
For your information,
$string =~ m' ... '
is the same as
$string =~ / ... /
except that with single-quote delimiters no variable interpolation takes place. It checks the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because the regexp contains a '/' character. I could also have written /\/systemfile\d+\.elf$/.)

if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That matches the strings you are looking for, assuming each candidate is stored in $string. Put your logic inside the curly brackets.
The . stands for "any character" and the * means "zero or more of the preceding item", so .* means "any sequence of characters, possibly empty". If you know that the string will end with this pattern, it is safer to add $ at the end of the pattern to require that:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string so that a trailing line break doesn't get in the way.
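Putting the suggestions so far together, a sketch like the following checks several candidate paths and also captures the number. The list of paths is invented for illustration, and \z (end of string, with no allowance for a trailing newline) is my substitution for $.

```perl
use strict;
use warnings;

# Hypothetical candidate paths; only the first two should match.
my @paths = (
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile16.elf',
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile.elf.bak',
    '/home/me/Desktop/MyWork/systemfile/directory/othersystemfile32.elf',
);

my @hits;
for my $path (@paths) {
    # m{...} avoids having to escape '/'; (\d+) captures the numeric part.
    if ($path =~ m{/systemfile(\d+)\.elf\z}) {
        push @hits, $1;
        print "Found variant $1 in $path\n";
    }
}
```

Requiring the leading / keeps names such as othersystemfile32.elf from matching, and \d+ is stricter than .* about what may appear between systemfile and .elf.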

use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
This matches systemfile<anything>.elf when the string holds just the file name, without a directory.
If you want to search the entire string, including the directory, then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
If you only want to match systemfile followed by two digits and then .elf, use the methods from the other answers. If you want systemfile<anything>.elf, use one of these.

How do I include new lines in a string in Perl?

I have a string that looks like this
Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5
I am trying to add newlines so that it becomes a list, like this
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
I have a perl script
use strict;
use warnings;
open my $new_tree_fh, '>', 'test_match.txt'
or die qq{Failed to open "update_color.txt" for output: $!\n};
open my $file, '<', $ARGV[0]
or die qq{Failed to open "$ARGV[0]" for input: $!\n};
while ( my $string = <$file> ) {
my $splitmessage = join ("\n", ($string =~ m/(.+)+\,+\#+\w{6}/gs));
print $new_tree_fh $splitmessage, "\n";
}
close $file;
close $new_tree_fh;
The pattern match works, but it won't print the newlines I need to make the list. Can anyone suggest a fix?

I'd do:
my $str = 'Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5';
$str =~ s/(?<=,#\w{6})/\n/g;
say $str;
Output:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5

OK, I think your problem here is that your regular expression doesn't match properly.
(.+)+
for example - probably doesn't do what you think it does. It's a greedy capture of 1 or more of "anything" which will grab your whole string.
Check it out on regex101.
Try:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $string = <DATA> ) {
my $splitmessage = join( "\n", ( $string =~ m/(\w+,\#+\w{6})/g ) );
print $splitmessage, "\n";
}
__DATA__
Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5
Which will print:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5

Rather than a quickfix solution, let's find the problem in your existing code and hence learn from it. Your problem is in the regular expression, so we'll dissect and fix it.
($string =~ m/(.+)+\,+\#+\w{6}/gs)
First, the two significant mistakes that lead to the bug:
At the beginning you have .+, followed by matches for , and # and so on. The problem is that .+ is greedy, which means it will match up to the last , in the input rather than the first one. So when you run this, almost the entire line (everything except the last plant's color) gets swallowed by this single .+.
There are a few different ways to fix this, but the easiest is to restrict what you're matching. Instead of .+ ("match anything"), use [\w\s]+ at the beginning, which matches "word characters" (letters, digits, and underscore) or whitespace characters (in case a plant name contains a space).
($string =~ m/([\w\s]+)+\,+\#+\w{6}/gs)
That changes the output, but it is still not fully correct, because:
m/some regex/g returns a list of its matches here, and what we want is the whole match, including both plant name and color. But when the pattern contains capturing parentheses, m//g returns only the parts captured by the parentheses (the plant name here), not the whole match. So remove the parentheses, and it becomes:
($string =~ m/[\w\s]++\,+\#+\w{6}/gs)
This works, but it is clumsy and bug-prone, so here are some suggested improvements:
Since the pattern no longer contains a ., the /s modifier at the end (which only changes what . matches) is unnecessary.
($string =~ m/[\w\s]++\,+\#+\w{6}/g)
, and # are not special characters in Perl regular expressions (# only starts a comment under the /x modifier), so they don't need a \ before them.
($string =~ m/[\w\s]++,+#+\w{6}/g)
+ is for when a character may be repeated and you don't know how many times. Since we're matching exactly one , and one # here, the + after each of them is unnecessary.
($string =~ m/[\w\s]++,#\w{6}/g)
The ++ after [\w\s] means something different from + (it is a possessive quantifier: greedy, and it never gives back what it has matched), so let's make it a single +.
($string =~ m/[\w\s]+,#\w{6}/g)
Optionally, you can change the last \w to match only the hexadecimal characters which will appear in the colour code:
($string =~ m/[\w\s]+,#[0-9A-F]{6}/g)
That's a pretty solid, working regular expression that does what you want.
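As a quick check, the final pattern can be run against the sample line from the question. This is a small sketch: I have added a-f to the hex class on the assumption that lower-case colour codes might also occur, which the original data does not show.

```perl
use strict;
use warnings;

my $line = 'Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5';

# No capture groups, so m//g in list context returns each whole match.
# {6} is exact, so the letter that starts the next name is never consumed.
my @records = $line =~ /[\w\s]+,#[0-9A-Fa-f]{6}/g;

print "$_\n" for @records;
```

With the question's file-handling loop, print $new_tree_fh join("\n", @records), "\n"; would write the same four lines to the output file.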

How to extract upper case words within a long string using perl

I am trying to find a way to extract only upper case words (at least three consecutive upper characters, plus numbers) from quite a long string using perl.
Example:
"Hello world, thank GOD it's Friday, I can watch EPISODE4"
Output:
"GOD EPISODE4"
For some reason I cannot come up with a sensible way to do this, any ideas? Thanks!

Use character classes:
my @matches = ( $string =~ /\b[[:upper:][:digit:]]{3,}\b/g );
say join " - ", @matches;
(You stated uppercase characters and numbers. You didn't specify where the numbers would be, nor whether anything needs to be done with them.
Edit your question to include any other requirements.)

This will get you any upper case words of at least 3 letters, which may or may not have numbers at the end:
my $str = "Hello world, thank GOD its Friday, I can watch EPISODE4";
my @matches = ($str =~ /\b([A-Z]{3,}+[0-9]*)\b/g);
You can modify it to also allow upper case characters after the numbers:
my @matches = ($str =~ /\b([A-Z]{3,}+[0-9]*[A-Z]*)\b/g);
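Here is a runnable sketch against the example from the question. It assumes a "word" must begin with at least three upper case letters and may then continue with any mix of upper case letters and digits; that reading of the requirement is mine.

```perl
use strict;
use warnings;

my $text = "Hello world, thank GOD it's Friday, I can watch EPISODE4";

# At least three capitals, then any further capitals or digits, as a whole word.
my @words = $text =~ /\b[A-Z]{3,}[A-Z0-9]*\b/g;

print "@words\n";   # GOD EPISODE4
```

Capitalised words such as Hello and the single letter I are skipped, because the pattern needs three consecutive upper case letters after a word boundary.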
