Extract the required substring from another string -Perl - string

I want to extract a substring from a line in Perl. Let me explain with an example:
fhjgfghjk3456mm 735373653736
icasd 666666666666
111111111111
In the above lines, I only want to extract the 12-digit number. I tried using the split function:
my @cc = split(/[0-9]{12}/,$line);
print @cc;
But that removes the matched part of the string and stores the residue in @cc. I want the part matching the pattern to be printed. How do I do that?

You can do it with regular expressions:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';   # say is not enabled by default
my $string = 'fhjgfghjk3456mm 735373653736 icasd 666666666666 111111111111';
while ($string =~ m/\b(\d{12})\b/g) {
    say $1;
}
Test the regex here: http://rubular.com/r/Puupx0zR9w
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/\b(\d+)\b/)->explain();
The regular expression:
(?-imsx:\b(\d+)\b)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

The built-in variable $1 holds the text captured by the first pair of parentheses in the last successful match. A match in scalar context only tells you whether the pattern matched; it does not return the matched text. The simplest fix here is to put parentheses around the part you want and then print $1.
my $strn = "fhjgfghjk3456mm 735373653736\nicasd\n666666666666 111111111111";
$strn =~ m/([0-9]{12})/;
print $1;
The parentheses capture just the twelve-digit number, and $1 returns it. Without the /g modifier, only the first such number in the string is captured.

#!/bin/perl
my $var = 'fhjgfghjk3456mm 735373653736 icasd 666666666666 111111111111';
if($var =~ m/(\d{12})/) {
print "Twelve digits: $1.";
}

#!/usr/bin/env perl
undef $/;                             # slurp mode: read all of DATA at once
my $text = <DATA>;
my @res = $text =~ /\b\d{12}\b/g;     # list context returns every match
print "@res\n";
__DATA__
fhjgfghjk3456mm 735373653736
icasd 666666666666
111111111111
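
To see why split returned only the leftovers, the two operations can be compared side by side. This is a minimal sketch, not taken from any of the answers; the sub names twelve_digit_numbers and split_residue are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every standalone 12-digit number in a string.
# In list context, m//g returns all matches of the whole pattern.
sub twelve_digit_numbers {
    my ($line) = @_;
    return $line =~ /\b\d{12}\b/g;
}

# Hypothetical helper: what split does with the same pattern.
# The pattern is treated as the separator, so the matches are thrown away.
sub split_residue {
    my ($line) = @_;
    return split /\b\d{12}\b/, $line;
}

my $line = 'fhjgfghjk3456mm 735373653736';
print join(',', twelve_digit_numbers($line)), "\n";   # the number itself
print join(',', split_residue($line)), "\n";          # everything but the number
```

split is the right tool when the pattern describes what separates the fields; a match with /g is the right tool when the pattern describes the fields themselves.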

Related

How to edit a multi-line scalar and print the edits

I need to edit a multi-line scalar and print the results, however I am not able to do it neatly.
my $text = "$omething\n nothing\n Everything\n";
What I need to do is check each line and, if it contains a capital letter or special character, print that line and remove it from the original scalar ($text).
In this example it would print two times, first time:
$omething
Second time:
Everything
And remove both of those strings from the $text scalar.
To include a dollar sign in a double-quoted string, you need to escape it with a backslash.
You can remove the matching lines in a while loop:
#!/usr/bin/perl
use warnings;
use strict;
my $text = "\$omething\nnothing\nEverything\n";
while ($text =~ s/(.*[[:upper:]\$].*\n)//) {
print $1;
}
print "Remaining: $text";
A period never matches a newline (unless you specify the /s modifier).
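
The effect of /s on the period can be checked with a small experiment. This is only a sketch; the helper name dot_star is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return what .* grabs from the start of the text,
# with or without the /s modifier.
sub dot_star {
    my ($text, $dot_matches_newline) = @_;
    my ($got) = $dot_matches_newline ? $text =~ /\A(.*)/s
                                     : $text =~ /\A(.*)/;
    return $got;
}

my $text = "first\nsecond\n";
print length(dot_star($text, 0)), "\n";   # stops before the first newline
print length(dot_star($text, 1)), "\n";   # runs to the end of the string
```

This is why the substitution above removes one line per iteration: without /s, each .* stays on a single line.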

Recursively extract strings between quotes in a given string

Given a string with substrings within quotes, extract all such substrings
I have written the following piece of code but something tells me that it is ugly (although it does seem to do the trick)
my $str = 'printf ("hellp;world", and "this is ; also" and )';
loop:
if ($str =~ /"(.*?)"/) {
my $substr = $1;
$str =~ s/"$substr"//;
print "$substr\n";
}
if ($str =~ /"/) {
goto loop;
}
perl quotes.pl
hellp;world
this is ; also
So it does work as expected.
You can do that directly by using the /g regex flag in either scalar context:
while ($str =~ /"([^"]*)"/g) {
print "$1\n";
}
... or list context:
for my $match ($str =~ /"([^"]*)"/g) {
print "$match\n";
}
I've also changed .*? to [^"]* because it's better to be specific about what you want to match.
/g is documented in perldoc perlop:
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.
(Emphasis mine.)
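
The search position described in the quoted documentation can be observed directly. The following is a minimal sketch; the helper name quoted_with_pos is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect each quoted substring together with the
# offset at which the next search would resume (as reported by pos()).
sub quoted_with_pos {
    my ($str) = @_;
    my @found;
    while ($str =~ /"([^"]*)"/g) {
        push @found, [$1, pos($str)];
    }
    return @found;
}

my $str = 'printf ("hellp;world", and "this is ; also" and )';
for my $hit (quoted_with_pos($str)) {
    my ($text, $pos) = @$hit;
    print "$text (search resumes at offset $pos)\n";
}
```

Because each iteration resumes from pos(), there is no need to delete the matched text from the string, which is what the original goto loop did.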

search a specific sub string pattern in a string using perl

I'm new to Perl. I went through "Check whether a string contains a substring" to learn how to check whether a substring is present in a string, but my scenario is a little different.
I have a string like
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf
The end of it might instead be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether this string contains a substring of the form systemfile*.elf.
How can I achieve this in Perl?
I'm planning to do something like this:
if(index($mainstring, _serach_for_pattern_systemfile*.elf_ ) ~= -1) {
say" Found the string";
}
You can use pattern matching:
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string
Well
if( $mainstring =~ m'/systemfile(16|32)\.elf$' ) {
say" Found the string";
}
does the job.
For your information:
$string =~ m' ... '
is the same as
$string =~ / ... /
except that with single-quote delimiters no variable interpolation is performed on the pattern. Both check the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because the regexp contains another '/' character. I could also have written /\/systemfile\d+\.elf$/.)
if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That matches the strings you seek, as well as any other string that contains systemfile followed later by .elf (assuming the string to test is stored in $string). Put your logic inside the curly brackets.
The . stands for "any character except a newline" and the * means "zero or more of the preceding item". So .* matches any run of characters, including an empty one. If you know that the string ends with this pattern, it is safer to add $ at the end of the pattern to require that:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string to avoid any line-breaks that might mess with your desired output.
use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
This matches systemfile<anything>.elf when $string holds just the file name, without a directory part.
If you want to search the entire string, including the directory, then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
If you only want to match systemfile followed by two digits and .elf, use the methods from the other answers. If you want systemfile followed by anything and then .elf, use one of these.
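
One way to avoid accidentally matching the systemfile directory in the path is to test only the file-name part. The sketch below uses basename from the core File::Basename module and assumes Unix-style paths; the sub name is_systemfile_elf and the digits-only rule are assumptions based on the examples in the question.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: true if the file-name part of $path looks like
# systemfile<digits>.elf. The digits-only rule is an assumption taken from
# the examples in the question (systemfile16/32/64.elf).
sub is_systemfile_elf {
    my ($path) = @_;
    return basename($path) =~ /\Asystemfile\d+\.elf\z/ ? 1 : 0;
}

for my $path ('/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
              '/home/me/Desktop/MyWork/systemfile/directory/readme.txt') {
    printf "%s: %s\n", $path, is_systemfile_elf($path) ? 'found' : 'not found';
}
```

Anchoring with \A and \z on the base name means a directory called systemfile cannot produce a false positive.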

Perl: Count number of times a word appears in text and print out surrounding words

I want to do two things:
1) count the number of times a given word appears in a text file
2) print out the context of that word
This is the code I am currently using:
my $word_delimiter = qr{
[^[:alnum:][:space:]]*
(?: [[:space:]]+ | -- | , | \. | \t | ^ )
[^[:alnum:]]*
}x;
my $word = "hello";
my $count = 0;
#
# here, a file's contents are loaded into $lines, code not shown
#
$lines =~ s/\R/ /g; # replace all line breaks with blanks (cannot just erase them, because this might connect words that should not be connected)
$lines =~ s/\s+/ /g; # replace all multiple whitespaces (incl. blanks, tabs, newlines) with single blanks
$lines = " ".$lines." "; # add a blank at beginning and end to ensure that first and last word can be found by regex pattern below
while ($lines =~ m/$word_delimiter$word$word_delimiter/g ) {
++$count;
# here, I would like to print the word with some context around it (i.e. a few words before and after it)
}
Three problems:
1) Is my $word_delimiter pattern catching all reasonable characters I can expect to separate words? Of course, I would not want to separate hyphenated words, etc. [Note: I am using UTF-8 throughout but only English and German text; and I understand what reasonably separates a word might be a matter of judgment]
2) When the file to be analyzed contains text like "goodbye hello hello goodbye", the counter is incremented only once, because the regex only matches the first occurrence of " hello ". After all, the second time it could find "hello", it is not preceded by another whitespace. Any ideas on how to catch the second occurrence, too? Should I maybe somehow reset pos()?
3) How to (reasonably efficiently) print out a few words before and after any matched word?
Thanks!
1. Is my $word_delimiter pattern catching all reasonable characters I can expect to separate words?
Word characters are denoted by the character class \w. It also matches digits and characters from non-roman scripts.
\W represents the negated sense (non-word characters).
\b represents a word boundary and has zero-length.
Using these already available character classes should suffice.
2. Any ideas on how to catch the second occurrence, too?
Use zero-length word boundaries.
while ( $lines =~ /\b$word\b/g ) {
++$count;
}
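
For the third point (printing a few words around each hit), one possible approach is sketched here. It swaps the \b regex for plain token splitting, which is a different technique from the loop above, and the sub name word_contexts and its parameters are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return one context string per occurrence of $word,
# made of up to $n whitespace-separated tokens on each side.
# Tokens are compared after stripping leading/trailing non-word characters,
# which is a simplifying assumption about what counts as "the same word".
sub word_contexts {
    my ($text, $word, $n) = @_;
    my @tokens = split ' ', $text;
    my @contexts;
    for my $i (0 .. $#tokens) {
        (my $bare = $tokens[$i]) =~ s/^\W+|\W+$//g;
        next unless $bare eq $word;
        my $from = $i - $n < 0        ? 0        : $i - $n;
        my $to   = $i + $n > $#tokens ? $#tokens : $i + $n;
        push @contexts, join ' ', @tokens[$from .. $to];
    }
    return @contexts;
}

my @hits = word_contexts('goodbye hello hello goodbye', 'hello', 1);
print scalar(@hits), " occurrence(s)\n";
print "$_\n" for @hits;
```

The number of returned strings doubles as the count, so both tasks are handled in one pass over the tokens. Hyphenated words stay intact because only whitespace separates tokens.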

Patterns in Lua with space

How could I use string.gmatch(text, pattern) to do this:
text = "Hello.%23 Awesome7^.."
pattern = --what to put here?
for word in string.gmatch(text, pattern) do
print(word)
end
--Result
>test
Hello.%23
Awesome7^..
>
I have been using "%w+%p", but this results in:
>test
Hello.
%
23
Awesome7^
.
.
Which is not the desired result.
Note: I have not tested this exact string, it could vary... but still, does not create the desired result
From your example, the words contain no spaces and are separated by spaces, so the simplest pattern is "%S+":
text = "Hello.%23 Awesome7^.."
pattern = "%S+"
for word in string.gmatch(text, pattern) do
print(word)
end
"%s" matches a space character, "%S" matches a non-space character.
