How to use Regex in Perl

I need some help. I have the output of a command and need to extract only the time, i.e. "10:57:09", from it.
The command is: tail -f /var/log/sms
Command output:
Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages
How could I do this in Perl and put the result into a variable?
Thank you

Normally, we'd expect you to show some evidence of trying to solve the problem yourself before giving an answer.
You use the match operator (m/.../) to check if a string matches a regular expression. The m is often omitted, so you'll see it written as /.../. By default, it matches against the variable $_, but you can change that by using the binding operator, =~. If a regex includes parentheses ((...)), then whatever is matched by that section of the regex is stored in $1 (and $2, $3, etc. for subsequent sets of parentheses). Those "captured" values are also returned by the match operator when it is evaluated in list context.
It's always a good idea to check the return value from the match operator, as you'll almost certainly want to take different actions if the match was unsuccessful.
See perldoc perlop for more details of the match operator and perldoc perlre for more details of Perl's regex support.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
$_ = 'Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages';
if (my ($time) = /(\d\d:\d\d:\d\d)/) {
    say "Time is '$time'";
} else {
    say 'No time found in string';
}
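The binding operator and the list-context return described above work just as well on a named variable and with more than one capture group. Here is a minimal sketch. It assumes every log line starts with a weekday, a YYYY/MM/DD date and an HH:MM:SS time. parse_log_line is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (date, time) from a gammu-smsd log line,
# or an empty list if the line does not have the assumed layout.
sub parse_log_line {
    my ($line) = @_;
    # =~ binds the match to $line instead of $_; in list context the
    # match returns the captured groups ($1, $2) as a list.
    my ($date, $time) = $line =~ m{^\w+ (\d{4}/\d\d/\d\d) (\d\d:\d\d:\d\d)};
    return defined $time ? ($date, $time) : ();
}

my ($date, $time) = parse_log_line(
    'Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages');
print "Date is '$date', time is '$time'\n" if defined $time;
```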
And to get the data from your external process...
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
open my $tail_fh, '-|', 'tail', '-f', '/var/log/sms'
    or die "Cannot run tail: $!";
while (<$tail_fh>) {
    if (my ($time) = /(\d\d:\d\d:\d\d)/) {
        say "Time is '$time'";
    } else {
        say 'No time found in string';
    }
}

Perl code:
$txt = "Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages";
$txt =~ /(\d{2}:\d{2}:\d{2})/;
print $1; # result of regex
print "\n"; # new line
And it prints:
10:57:09
The result goes into the variable $1 because of the capturing parentheses. Had there been more capturing parentheses, their captured text would have been put into $2, $3, etc.
EDIT
To read the line from the console, use this in the above script:
$txt = <STDIN>;
Now, supposing the script is called myscript.pl, run tail like so:
tail -f /var/log/sms | perl myscript.pl
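tail -f keeps producing lines, so the match usually goes inside a loop over the input rather than after a single read. Here is a minimal sketch. It assumes at most one time per line. print_times is a hypothetical helper that takes an input and an output filehandle. To avoid needing a real pipe, the demonstration feeds it an in-memory string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads $in line by line until EOF and prints the
# HH:MM:SS time found on each line to $out as soon as that line arrives.
sub print_times {
    my ($in, $out) = @_;
    while (my $txt = <$in>) {
        print {$out} "$1\n" if $txt =~ /(\d{2}:\d{2}:\d{2})/;
    }
}

# Demonstration with in-memory filehandles instead of a real pipe.
my $log = "Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages\n"
        . "Thu 2016/08/04 10:58:12 gammu-smsd[48014]: Read 1 messages\n";
open my $in, '<', \$log or die "Cannot open in-memory log: $!";
my $found = '';
open my $out, '>', \$found or die "Cannot open in-memory output: $!";
print_times($in, $out);
close $out;
print $found;
```

In myscript.pl you would call print_times(\*STDIN, \*STDOUT) and start it with tail -f /var/log/sms | perl myscript.pl. If that output is itself piped somewhere else, setting $| = 1 stops STDOUT from holding the times in its buffer.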

Related

Perl Unable to truncate string

I am trying to extract AAA and BBB from the output of the command "dspmq".
Running dspmq gives this output:
QMNAME(AAA) STATUS(Running)
QMNAME(BBB) STATUS(Running)
But it doesn't work with the code below:
perl -e 'use Data::Dumper qw(Dumper);my @qmgrlist = `dspmq`;$size = @qmgrlist;foreach my $i (@qmgrlist){my @temp1 = split /QMNAME\(/, $i;print @temp1;}'
It prints:
AAA) STATUS(Running)
BBB) STATUS(Running)
I am able to remove "QMNAME(" but not the text to the right of AAA and BBB. Basically, I want to get the string between "QMNAME(" and the immediately following ")". Please assist.

I think a regex approach is better than split() here, but you could use split() by splitting on parentheses and taking the second item (index 1) in the returned list.
for (@qmgrlist) {
    say +(split /[()]/)[1];
}
And a brief note on your use of command-line options to run this code. You can make it simpler if you a) pipe the output of dspmq into your code and b) use -n to process a record at a time.
$ dspmq | perl -nE 'say +(split /[()]/)[1]'
There's also -M to load modules (e.g. -MData::Dumper), but you don't seem to be using Data::Dumper any more.
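To see why the second item is the one you want, this sketch prints what split /[()]/ returns for one line of the dspmq output from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'QMNAME(AAA) STATUS(Running)';

# Splitting on either parenthesis yields
#   [0] 'QMNAME'  [1] 'AAA'  [2] ' STATUS'  [3] 'Running'
# (the empty field after the final ')' is dropped by split).
my @parts = split /[()]/, $line;
print join('|', @parts), "\n";

# So the queue manager name is the element at index 1.
my $qmgr = (split /[()]/, $line)[1];
print "$qmgr\n";
```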

split isn't going to do what you need. I would just use a regular expression to match the substring you need.
So change the loop from this
foreach my $i (@qmgrlist)
{
    my @temp1 = split /QMNAME\(/, $i;
    print @temp1;
}
to this
foreach my $i (@qmgrlist)
{
    print "$1\n"
        if $i =~ /QMNAME\((.+?)\)/;
}

Try this perl one-liner:
dspmq | perl -lne 'print for m{ QMNAME [(] ( [^)]* ) [)] }x'
Here, the STDOUT of dspmq is fed through a pipe (|) into the STDIN of the Perl code, which uses these flags:
-e tells the Perl interpreter to look for the code inline rather than in a separate script file.
-n feeds the input to the inline code line by line (so you do not need to store the output in an array; this matters for large outputs, not in your case).
-l strips the input record separator (newline on *NIX) before feeding each line to the code, and appends it automatically when you print.
The print ... for m{... (...) ...} code prints every string captured by the parentheses.
The captured pattern is [^)]*, which matches the maximum number (0 or more) of characters that are not (^) listed in the character class, that is, characters that are not closing parens.
[(] ... [)] are literal parentheses escaped as character classes for readability. I prefer this to escaping like so: \( ... \).
QMNAME is used to make the programmer's intentions clear: you want the string that follows QMNAME in parens. I prefer this to using a field index such as 1, because it protects you against minor variations in the output of your command when it is used with different options, on different systems, etc.
Finally, the x regex modifier in m{...}x makes Perl ignore unescaped whitespace in the pattern (outside character classes) and allows comments, which helps readability.
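Because of the x modifier, the same pattern can also be spread over several lines with a comment on each part. Below is a sketch of it inside a script, run on the sample output from the question. qmgr_names is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every name that appears inside QMNAME(...)
sub qmgr_names {
    my ($text) = @_;
    # With /g in list context the match returns all captures at once.
    return $text =~ m{
        QMNAME      # the literal keyword
        [(]         # an opening parenthesis, as a character class
        ( [^)]* )   # capture: any run of characters that are not ')'
        [)]         # the closing parenthesis
    }xg;
}

my @names = qmgr_names("QMNAME(AAA) STATUS(Running)\nQMNAME(BBB) STATUS(Running)\n");
print "$_\n" for @names;
```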
RELATED:
Cutting the output of a dspmq command

The desired output can be achieved with the following code:
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
    say $1 if /QMNAME\((.+?)\)/;
}
__DATA__
QMNAME(AAA) STATUS(Running)
QMNAME(BBB) STATUS(Running)
output
AAA
BBB
And as a one-liner (not tested; I am on a Windows computer):
dspmq | perl -lne 'print $1 if /QMNAME\((.+?)\)/'

remove lines from text file that contain specific text

I'm trying to remove lines that contain 0/0 or ./. in column 71 ("FORMAT.1.GT") of a tab-delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";

You can call a one-liner as Borodin and zdim suggested. Which one is right for you is still not clear, because you don't say whether "column 71" means the 71st tab-separated field of a line or the 71st character of that line. Consider
12345\t6789
Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it's 6789, while zdim assumes it's 2. Each of them showed a solution for their own interpretation, but these are stand-alone solutions: programs of their own, to be run from the command line.
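The two interpretations really do give different answers on that sample line. A minimal sketch: column numbers are 1-based as in the question, while Perl offsets and indices are 0-based.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "12345\t6789";

# Character-based: the 2nd column is the character at offset 1.
my $char_col2 = substr($line, 1, 1);

# Field-based: the 2nd column is the 2nd tab-separated field, index 1.
my $field_col2 = (split /\t/, $line)[1];

print "character column 2: $char_col2\n";
print "field column 2: $field_col2\n";
```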
If you want to integrate that into your Perl script you could do it like this:
Replace this line:
my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6";
with this snippet:
open( my $fh_in,  '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile )          or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {
    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});
    # tab/field-based (the file is tab-delimited, so split on tabs):
    my @fields = split(/\t/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);
Use either the character-based line or the tab/field-based lines. Not both!
Borodin and zdim condensed this snippet to a one-liner, but you must not call that from a Perl script.

Since you need an exact position and know the string lengths, substr can find it:
perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename
This prints lines only when a three-character long string starting at 71st column does not match either of 0/0 and ./.
The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the () are used only for grouping, not capturing. It also works without ?:, which is there only for efficiency's sake.

perl -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile

The problem with your command is that you are attempting to capture the output of a command which produces no output - all the matches are redirected to a file, so that's where all the output is going.
Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.
If you do want a single shell command,
grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file
would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
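Here is a sketch of using that regex directly from Perl. keep_line is a hypothetical helper. The test rows are made up for illustration and are not taken from the question's file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the 71st tab-separated field is neither
# "./." nor "0/0". Mirrors the grep -Ev pattern: skip 70 fields and their
# tabs, then test the next field, which must be followed by a tab.
sub keep_line {
    my ($line) = @_;
    return $line !~ m{^(?:[^\t]*\t){70}(?:\./\.|0/0)\t};
}

# Made-up rows with 72 fields where only field 71 varies.
my @rows = map { join("\t", ('x') x 70, $_, 'end') . "\n" } ('0/0', './.', '0/1');
my @kept = grep { keep_line($_) } @rows;
print scalar(@kept), " of ", scalar(@rows), " rows kept\n";
```

Like the grep version, this requires a tab after field 71. If that field can be the last one on the line, (?:\t|$) in place of the final \t would cover it.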

Try this:
awk -F'\t' '{ if ($71 != "./." && $71 != "0/0") print; }' old_file.txt > new_file.txt

how to split the data in the unix file

I have a file on a Unix (Solaris) system with data like this:
[TYPEA]:/home/typeb/file1.dat
[TYPEB]:/home/typeb/file2.dat
[TYPEB]:/home/typeb/file3.dat
[TYPE_C]:/home/type_d/file4.dat
[TYPE_C]:/home/type_d/file5.dat
[TYPE_C]:/home/type_d/file6.dat
I want to group the lines under their headings, like this:
[TYPEA]
/home/typeb/file1.dat
[TYPEB]
/home/typeb/file2.dat
/home/typeb/file3.dat
[TYPE_C]
/home/type_d/file4.dat
/home/type_d/file5.dat
/home/type_d/file6.dat
Files of the same type have to come under one heading.
Please help me with any logic to achieve this without hardcoding.

Assuming the input is sorted by type like in your example,
awk -F : '$1 != prev { print $1 } { print $2; prev=$1 }' file
If there are more than 2 fields you will need to adjust the second clause.

sed 'H;$!d
x
s/\(\(\n\)\(\[[^]]\{1,\}]\):\)/\2\3\2/g
:cycle
s/\(\n\[[^]]\{1,\}]\)\(.*\)\1/\1\2/g
t cycle
s/^\n//' YourFile
A POSIX sed version, a bit unreadable due to the presence of [ in the pattern. It:
- allows : in the label or the file path
- fails if two lines with the same label have a line with another label between them (the sample seems to be ordered).

If you can use perl you will be able to make use of hashes to create a simple data structure:
#! /usr/bin/perl
use warnings;
use strict;
my %h;
while (<>) {
    chomp;
    my ($key, $value) = split /:/;
    $h{$key} = [] unless exists $h{$key};
    push @{ $h{$key} }, $value;
}
foreach my $key (sort keys %h) {
    print "$key\n";
    foreach my $value (@{ $h{$key} }) {
        print "$value\n";
    }
}
In action:
perl script.pl file
[TYPEA]
/home/typeb/file1.dat
[TYPEB]
/home/typeb/file2.dat
/home/typeb/file3.dat
[TYPE_C]
/home/type_d/file4.dat
/home/type_d/file5.dat
/home/type_d/file6.dat
If you like this approach, there is a whole tutorial on solving this simple problem. It's worth reading.
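sort keys %h prints the types in sorted order, and for this sample that is also the file order. You might instead want to keep the order in which each type first appears, and also allow ':' inside a path. Here is a sketch along the same hash-of-arrays lines. group_by_type is a hypothetical helper that takes a list of lines and returns the text to print:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: groups "[TYPE]:/path" lines under their type,
# keeping the types in first-seen order, and returns the text to print.
sub group_by_type {
    my @lines = @_;
    my (%paths, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);
        # Limit of 2: split only on the first ':' so a path may contain ':'
        my ($key, $value) = split /:/, $copy, 2;
        next unless defined $value;            # skip lines without a ':'
        push @order, $key unless exists $paths{$key};
        push @{ $paths{$key} }, $value;        # autovivifies the array ref
    }
    my $out = '';
    for my $key (@order) {
        $out .= "$key\n";
        $out .= "$_\n" for @{ $paths{$key} };
    }
    return $out;
}

print group_by_type(
    "[TYPEA]:/home/typeb/file1.dat\n",
    "[TYPEB]:/home/typeb/file2.dat\n",
    "[TYPEB]:/home/typeb/file3.dat\n",
);
```

To read the file named on the command line, as the script above does with <>, call print group_by_type(<>);.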

User input as number or string in perl?

I am new to perl. When I enter a number from command prompt, the variable in my script takes it as a number.
How do I make Perl treat the number the user enters as a string?
I just want to check whether the user entered the string "1" and not the number 1.
I wrote the code as follows:
#!/usr/bin/perl
use strict;
use warnings;
print 'enter';
$a=<>;
if($a==1)
{
    print 'Number entered';
}
elsif($a eq "1")
{
    print 'Text entered';
}

Input from a file or from the console is always a string. You could check to see whether that string contains anything other than numeric characters, but you have to think about what you mean by entering a number.
Perl scalar variables behave as strings and numbers interchangeably and simultaneously. But with strings you have to get the comparison exactly right. An extra space at the beginning or end of the string will stop it from matching as you expect.
This program demonstrates it:
use strict;
use warnings;
my $aa = '1';
my $bb = 2;
my $cc = '3 ';
print $aa == 1 ? 'match' : 'no match', "\n";
print $aa eq '1' ? 'match' : 'no match', "\n";
print $bb == 2 ? 'match' : 'no match', "\n";
print $bb eq '2' ? 'match' : 'no match', "\n";
print $cc == 3 ? 'match' : 'no match', "\n";
print $cc eq '3' ? 'match' : 'no match', "\n";
output
match
match
match
match
match
no match
So Perl is quite happy saying that '3 ' is numerically equal to 3, but it is different from the string '3' because of the trailing space.
That is what is happening in your case. The value you enter into $a is something like "42\n", which Perl will happily convert to the number 42 for you. But if you compare it to the string '42' then it is different because it has a trailing newline.
You will want to use chomp almost invariably when you read input from a file, and especially from a console.
You should also indent your code properly to make it more readable.
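Here is a sketch that combines chomp with the question's goal. It reads the input as a string and then checks what the string contains. classify is a hypothetical helper name, and counting "digits only" as a number is an assumption. Scalar::Util's looks_like_number is a core alternative if you also want to accept input such as 1.5 or 1e3:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: describes what a line of user input contains.
sub classify {
    my ($input) = @_;
    chomp $input;    # remove the trailing newline left by reading a line
    return 'digits only'       if $input =~ /\A\d+\z/;
    return 'looks like number' if looks_like_number($input);
    return 'text';
}

for my $sample ("1\n", "1.5\n", "abc\n", " 1\n") {
    (my $shown = $sample) =~ s/\n/\\n/;
    printf "%-6s => %s\n", $shown, classify($sample);
}
```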
Update
Data::Dumper is a very useful tool to see exactly what is in a string and why a string comparison isn't working. (Data::Dump is even better, but it isn't a core module and you may have to install it.)
If I run this program
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
my $input = <>;
print Dumper $input;
chomp $input;
print Dumper $input;
and type abc followed by Enter, then the output is
$VAR1 = "abc\n";
$VAR1 = "abc";
which makes the trailing newline obvious.
Note that setting $Data::Dumper::Useqq to a true value is essential. Otherwise the output is little better than a simple print.

Using Perl to remove n characters from the end of multiple lines

I want to remove n characters from the end of each line using Perl.
For example, I have the following input:
catbathatxx (length 11; 11%3=2 characters) (Remove 2 characters from this line)
mansunsonx (length 10; 10%3=1 character) (Remove 1 character from this line)
#!/usr/bin/perl -w
open FH, "input.txt";
@array=<FH>;
foreach $tmp(@array)
{
    $b=length($tmp)%3;
    my $c=substr($tmp, 0, length($tmp)-$b);
    print "$c\n";
}
I want to output the final string (after the characters have been removed).
However, this program is not giving the correct result. Can you please guide me on what the mistake is?
Thanks a lot. Please let me know if there are any doubts/clarifications.

I am assuming trailing whitespace is not significant.
#!/usr/bin/env perl
use strict; use warnings;
use constant MULTIPLE_OF => 3;
while (my $line = <DATA>) {
    $line =~ s/\s+\z//;
    next unless my $length = length $line;
    my $chars_to_remove = $length % MULTIPLE_OF;
    $line =~ s/.{$chars_to_remove}\z//;
    print $line, "\n";
}
__DATA__
catbathatxx
mansunsonx
0123456789
012345678

The \K regex sequence makes this a lot clearer; it was introduced in Perl v5.10.0.
The code looks like this
use 5.10.0;
use warnings;
for (qw/ catbathatxx mansunsonx /) {
    (my $s = $_) =~ s/^ (?:...)* \K .* //x;
    say $s;
}
output
catbathat
mansunson

In general you would want to post the result you are getting. That being said...
Each line in the file has a \n (or \r\n on Windows) at the end that you're not accounting for. You need to chomp() the line.
Edit to add: My Perl is getting rusty from non-use, but if memory serves me correctly you can actually chomp() the entire array after reading the file: chomp(@array)

You should use chomp() on your array, like this:
@array=<FH>;
chomp(@array);
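To see why the original program removes the wrong number of characters, compare a line's length before and after chomp. The sample string is the first line from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "catbathatxx\n";          # a line as read from the file

my $before = length($line) % 3;     # 12 % 3 == 0: nothing would be removed
chomp(my $clean = $line);
my $after = length($clean) % 3;     # 11 % 3 == 2: remove 2 characters

my $result = substr($clean, 0, length($clean) - $after);
print "before chomp: $before, after chomp: $after, result: $result\n";
```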

perl -plwe 'chomp; $c = length($_) % 3; chop while $c--' < /tmp/zock.txt
Look up the options in perlrun. Note that line endings are characters, too. Get them out of the way using chomp; re-add them on output using the -l option. Use chop to efficiently remove characters from the end of a string.
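Here is the same chomp-then-chop idea written as a script rather than a one-liner. trim_to_multiple is a hypothetical name, and the multiple (3 in the question) is passed as a parameter:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drops characters from the end of $str until its
# length is a multiple of $n. chop removes exactly one trailing character.
sub trim_to_multiple {
    my ($str, $n) = @_;
    chomp $str;                      # the line ending must not be counted
    my $extra = length($str) % $n;
    chop $str while $extra--;
    return $str;
}

print trim_to_multiple($_, 3), "\n" for "catbathatxx\n", "mansunsonx\n";
```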

Reading your code, you are trying to print just the first 3n characters of each line, for the largest possible value of n.
The following code does this using a simple regular expression.
For each line, it first removes the line ending, then greedily matches as many .{3} as it can (. matches any character, {3} asks for exactly 3 of them).
The memory requirement of this approach (compared with using an array the size of your file) is fixed. Not too important if your file is small compared with your free memory, but sometimes files are gigabytes, and sometimes memory is very small.
It's always worth using variable names that reflect the purpose of the variable, rather than things like $a or @array. In this case I used only one variable, which I called $line.
It's also good practice to close files as soon as you have finished with them.
#!/usr/bin/perl
use strict;
use warnings; # Lexically scoped warnings; preferred over the -w switch
open FH, '<', 'input.txt' or die "Cannot open input.txt: $!"; # Use three-part file open; single quotes where no interpolation is required.
while (my $line = <FH>) { # Read one line at a time rather than the whole file
    chomp($line);
    $line =~ s/((.{3})*).*/$1\n/;
    print $line;
}
close FH;
