Perl script to compress strings replacing repeated chars - string

I'm writing a Perl script that, given a list of strings as input, similar to this:
AADDDDKPP PrRRRR
wwwwwwwwwwYY SSSSS SSSSGGGGGGGGGGGGGG
ZZZZZFZZQZZZZZZZZZZZZZ
should substitute every character that is repeated more than 2 times consecutively with the number of repetitions followed by the character. So with the input shown above, the script should output these strings:
AA4DKPP Pr4R
10wYY 5S 4S14G
5ZFZZQ13Z
Here's the perl script I've written so far:
foreach my $line(@lines){
    @letters=split("",$line);
    @alreadyChecked=();
    foreach my $letter(@letters){
        $count=0;
        if (grep {$letter} @alreadyChecked) {
            next;
        }
        push(@alreadyChecked,$letter);
        foreach my $index(@letters){
            if($letter eq $index){
                $count=$count+1;
            } else {
                @alreadyChecked=0;
                last;
            }
        }
        if($count>2){
            @chops=split(/$letter+/,$line);
            $line=$chops[0].$count.$letter.$chops[1];
        }
    }
}
I think there's more than one flaw in this code, but can't find where.

my $str ="
AADDDDKPP PrRRRR
wwwwwwwwwwYY SSSSS SSSSGGGGGGGGGGGGGG
ZZZZZFZZQZZZZZZZZZZZZZ
";
$str =~ s/((.)\2{2,})/ length($1) . $2 /ge;
print $str;
output
AA4DKPP Pr4R
10wYY 5S 4S14G
5ZFZZQ13Z

You can solve this with a regex substitution:
perl -plwe's/((.)\2\2+)/length($1) . $2/eg'
This will check for repetition of a character \2, captured by the inner parenthesis, and replace it with a number representing the length of the entire match $1 followed by the character itself $2. The script version could be something like:
use strict;
use warnings;
while (<>) {
s/((.)\2\2+)/length($1) . $2/eg;
print;
}
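For reuse inside a larger program, the same substitution can be wrapped in a function. This is only a sketch: the name compress_runs is made up for illustration, and it assumes, like the regexes above, that runs never need to span a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every run of 3 or more identical
# characters with "<run length><character>".
sub compress_runs {
    my ($text) = @_;
    # (.) captures one character as $2; \2{2,} requires at least two
    # more copies of it, so $1 holds the whole run (length >= 3).
    $text =~ s/((.)\2{2,})/length($1) . $2/ge;
    return $text;
}

print compress_runs($_), "\n" for (
    'AADDDDKPP PrRRRR',
    'wwwwwwwwwwYY SSSSS SSSSGGGGGGGGGGGGGG',
    'ZZZZZFZZQZZZZZZZZZZZZZ',
);
```

One thing to keep in mind with this encoding: if the input itself can contain digits, the output is ambiguous (for example, 1111 becomes 41), so it only round-trips cleanly for digit-free text.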

Related

How can I remove all vowels in a sentence except for the vowel at the first letter?

What is wrong specifically with this code? How can I correct it?
$x = "without any vowels after the first letter\n";
foreach $i (@x[1..]) {
    if ($i =~ /[AEIOUaeiou]/) {
        $x =~ tr/A E I O U a e i o u//d;
    }
}
print "$x\n";
I tried [1..] to exclude the first letter. If it does not work, how else can I remove the first letter?
EDIT I edited code to make it syntactically (mostly) correct to convey their obvious original idea, except for the attempt to index into a string which isn't correct in Perl. (Clarifying that is a part of what I consider useful in this question.)
First, most of that is not Perl, or any programming language for that matter. I'd suggest to work through a Perl tutorial of your choice first, before trying to get solutions for specific problems. However, here's an answer since the problem itself is of enough interest in general.
Next, in Perl you can't directly index into a string, so you can't skip the first character(s) like that.
But you can separate that first character in the string and process the rest (removing vowels), of course. One way with regex†
use warnings;
use strict;
use feature 'say';
my $str = shift // 'out with ALL vowels after first';
$str =~ s/.\K(.*)/ $1 =~ tr{aeiouAEIOU}{}dr /e;
say $str; #--> ot wth LL vwls ftr frst
This relies on the /e modifier, which makes it so that the replacement side is evaluated as code, and so it runs an independent transliteration (tr) there, processing the captured substring.
Then we need the /r modifier on that embedded tr, to return the new string instead of changing the old one in place, which wouldn't be possible anyway as one can't change $1.
One can also use a regex instead of tr, which is less efficient but comes with many conveniences
$str =~ s/.\K(.*)/ $1 =~ s{[aeiou]}{}igr /e;
Now we can use far more sophisticated tools in that regex than in tr; in this case it's only the i flag, for case-insensitive.
If more than the first character is to be kept, change . to .{N}.
† Regex is not compulsory, of course. A more elementary take: split the string into its first character and the rest, then use tr on the rest
use warnings;
use strict;
use feature 'say';
my $str = shift // q(out with ALL vowels after first);
my ($result, $rest) = split //, $str, 2; # first char, rest of string
$result .= $rest =~ tr/aeiouAEIOU//dr; # prune $rest of vowels, append to $result
say $result;
Then put this in a little mini subroutine. To change the original string in place, instead of getting a new ($result) string, use it ($str) everywhere instead of $result.
I am not sure how it compares efficiency-wise, but it may well fare well.
For curiosity's sake, here it is in a single statement
$str = join '', map { length > 1 ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
This specifically relies on the fact that only the first (one) character needs to be skipped; that is easily generalized, as long as the criterion involves the length of the substrings.
More importantly, this assumes that the rest of the string is longer than 1 character. To drop that assumption change the criterion
use feature 'state';
$str = join '', map {
state $chr_cnt = 0;
++$chr_cnt > 1 ? tr/aeiouAEIOU//dr : $_
}
split //, $str, 2;
This also relies on leaving aside just one character. It uses a feature to keep a lexical value across executions, state.
A more generic solution, which uses the fact that substr can be written to (it can be used as an lvalue)
substr($str, 1) =~ tr/aeiouAEIOU//d;
Here it's much cleaner and simpler to relax the limitation to the first character: just change that 1 in order to skip more characters. The tricky, perhaps unexpected, part here is that builtins normally can't be written to like that; they aren't lvalue subroutines.
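As a rough sketch of that generalization, the number of leading characters to keep can be made a parameter. The function name strip_vowels_after and its default of 1 are assumptions made for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: delete vowels from everything after the first
# $keep characters; $keep defaults to 1.
sub strip_vowels_after {
    my ($text, $keep) = @_;
    $keep //= 1;
    # Nothing to strip if the string is not longer than the kept prefix;
    # this also avoids calling substr outside of the string.
    return $text if length($text) <= $keep;
    # substr() used as an lvalue: tr/// edits that part of $text in place.
    substr($text, $keep) =~ tr/aeiouAEIOU//d;
    return $text;
}

print strip_vowels_after('out with ALL vowels after first'), "\n";
print strip_vowels_after('out with ALL vowels after first', 3), "\n";
```

Because the function works on its own copy of the string, the caller's variable is left untouched; to change a string in place, apply the substr line to that variable directly.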
The algorithm for solving the problem is in your question:
add a letter to the string if it isn't a vowel
add a letter to the string if it is the first vowel in the input string
use strict;
use warnings;
my $x = "without any vowels after the first letter\n";
my($o,$count) = ('',0);
print 'IN: ' . $x;
for ( split('',$x) ) {
$o .= $_ unless $count != 0 and /[aeiou]/i;
$count++ if /[aeiou]/i;
}
print 'OUT: ' . $o;
Output
IN: without any vowels after the first letter
OUT: witht ny vwls ftr th frst lttr
Addendum: OP's clarification of the problem
look at each word in the sentence
if a word starts with a vowel then delete all vowels but the first one
if a word starts with a non-vowel then delete all vowels
use strict;
use warnings;
use feature 'say';
my $x = 'I like apples more than oranges';
my @o;
say 'IN: ' . $x;
for ( split(' ', $x) ) {
    if ( /^[aeiou]/i ) {
        s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e;
    } else {
        tr|aeiouAEIOU||d;
    }
    @o = (@o,$_);
}
say 'OUT: ' . join(' ', @o);
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
Or in perlish style
use strict;
use warnings;
use feature 'say';
my $x = "I like apples more than oranges";
say 'IN: ' . $x;
say 'OUT: ' . join(' ', map { s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e && $_ } split('[ ]+', $x));
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs

search a specific sub string pattern in a string using perl

I'm a newbie to Perl. I went through "Check whether a string contains a substring" to learn how to check whether a substring is present in a string. Now my scenario is a little different.
I have a string like
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf ,
At the end this might be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether this string contains a substring in the format systemfile*.elf.
How can I achieve this in Perl?
I'm planning to do it like this
if(index($mainstring, _serach_for_pattern_systemfile*.elf_ ) ~= -1) {
say" Found the string";
}
You can use pattern matching
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string
Well
if( $mainstring =~ m'/systemfile(16|32|64)\.elf$' ) {
say" Found the string";
}
does the job.
For your information:
$string =~ m' ... '
is the same as
$string =~ / ... /
which checks the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because of the presence of another '/' character in the regexp. I could also have written /\/systemfile\d+\.elf$/.)
if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That should match only the strings you seek (given that every time, a given string is stored in $string). Inside the curly brackets you should write your logic.
The . stands for "any character" and the * stands for "zero or more repetitions of the preceding element". So, .* means "any sequence of characters, possibly empty". If you know that the string will end in this pattern, then it will be safer to add $ at the end of the pattern to mark that the string should end with this:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string to avoid any line-breaks that might mess with your desired output.
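To see how the anchor and the wildcard choice play out, here is a small sketch that tries a stricter variant on a few paths. It assumes the * in systemfile*.elf stands for one or more digits and that the directory separator is /; the function name is_systemfile_elf and the second and third sample paths are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the last path component is
# "systemfile<digits>.elf".
sub is_systemfile_elf {
    my ($path) = @_;
    chomp $path;    # a trailing newline is not part of the name
    # (?:^|/) makes sure "systemfile" starts a path component,
    # \d+ requires at least one digit, $ anchors at the end.
    return $path =~ m{(?:^|/)systemfile\d+\.elf$} ? 1 : 0;
}

for my $path (
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
    '/home/me/Desktop/MyWork/systemfile16.elf.bak',
    '/home/me/Desktop/MyWork/systemfileXY.elf',
) {
    printf "%-62s %s\n", $path, is_systemfile_elf($path) ? 'match' : 'no match';
}
```

Only the first path matches: the second fails because of the $ anchor, the third because \d+ does not accept letters, whereas .* would.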
use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
    print "Found string $string";
} else {
    print "String not found";
}
This will match systemfile'anythinghere'.elf if you have a set directory, that is, if $string holds only the file name.
If you want to search the entire string, including the directory, then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
    print "Found string $string";
} else {
    print "String not found";
}
If you only want to match systemfile followed by 2 numeric characters and then .elf, use the methods mentioned in the other answers. But if you want systemfile'anything'.elf, then use one of these.

How to use Regex in Perl

I need some help. I have the output of a command and need to extract only the time, i.e. "10:57:09", from it.
The command is: tail -f /var/log/sms
command output:
Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages
How could I do this in Perl and put the result into a variable?
Thank you
Normally, we'd expect you to show some evidence of trying to solve the problem yourself before giving an answer.
You use the match operator (m/.../) to check if a string matches a regular expression. The m is often omitted so you'll see it written as /.../. By default, it matches against the variable $_ but you can change that by using the binding operator, =~. If a regex includes parentheses ((...)) then whatever is matched by that section of the regex is stored in $1 (and $2, $3, etc for subsequent sets of parentheses). Those "captured" values are also returned by the match operator when it is evaluated in list context.
It's always a good idea to check the return value from the match operator, as you'll almost certainly want to take different actions if the match was unsuccessful.
See perldoc perlop for more details of the match operator and perldoc perlre for more details of Perl's regex support.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
$_ = 'Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages';
if (my ($time) = /(\d\d:\d\d:\d\d)/) {
say "Time is '$time'";
} else {
say 'No time found in string';
}
And to get the data from your external process...
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
open my $tail_fh, 'tail -f /var/log/sms |' or die $!;
while (<$tail_fh>) {
if (my ($time) = /(\d\d:\d\d:\d\d)/) {
say "Time is '$time'";
} else {
say 'No time found in string';
}
}
Perl code:
$txt = "Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages";
$txt =~ /(\d{2}:\d{2}:\d{2})/;
print $1; # result of regex
print "\n"; # new line
And it prints:
10:57:09
The result goes to a variable called $1, due to the capturing parentheses. Had there been more capturing parentheses, their captured text would have been put into $2, $3, etc.
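As an illustration of more than one capture group, the sketch below pulls both the date and the time out of the same log line. The function name parse_timestamp is hypothetical, and the pattern assumes the date always has the YYYY/MM/DD shape seen in the sample line.

```perl
use strict;
use warnings;

# Hypothetical helper: return (date, time) from a log line, or an
# empty list if the line does not have the expected shape.
sub parse_timestamp {
    my ($line) = @_;
    # In list context a successful match returns its captures ($1, $2);
    # a failed match returns an empty list, so we return nothing.
    my ($date, $time) = $line =~ m{(\d{4}/\d\d/\d\d)\s+(\d\d:\d\d:\d\d)}
        or return;
    return ($date, $time);
}

my $sample = 'Thu 2016/08/04 10:57:09 gammu-smsd[48014]: Read 0 messages';
if (my ($date, $time) = parse_timestamp($sample)) {
    print "Date is '$date', time is '$time'\n";
} else {
    print "No timestamp found\n";
}
```

Assigning the captures to named variables straight away also avoids relying on $1 and $2, which keep their old values when a later match fails.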
EDIT
To read the line from console, use in the above script:
$txt = <STDIN>;
Now, suppose the script is called myscript.pl, execute tail like so:
tail -f /var/log/sms | myscript.pl

Perl String parsing multiple patterns

I'm trying to parse a text file which has multiple patterns.
The goal is to keep everything in between * * and only the integers in between ^ ^; any special characters or other strings found there should be removed.
data.txt
*ABC-13077* ^817266, 55555^
*BCD-13092* ^CL: 816933^
*CDE-13127* ^ ===> Change 767666 submitted^
output.txt
ABC-13077 817266 55555
BCD-13092 816933
CDE-13127 767666
my script
#!/usr/bin/perl
use strict;
use Cwd;
my $var;
open(FH,"changelists.txt")or die("can't open file:$!");
while($var=<FH>)
{
    my @vareach=split(/[* \s\^]+/,$var);
    for my $each(@vareach)
    {
        print "$each\n";
    }
}
Replace the while loop with the following:
while (<FH>) {
s/\*(.*)\*/$1/;
s/\^(.*)\^/ join ' ', $1 =~ m([0-9]+)g /e;
print;
}
The first substitution removes the asterisks.
The second substitution takes the ^...^ part, and replaces it with the result of the code in the replacement part because of the /e modifier. The code matches all the integers, and as join forces list context on the match, it returns all the matches.
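The two substitutions can be tried on the sample data without a file. The sketch below wraps them in a function; the name clean_line is made up for illustration, and the input lines are the ones from data.txt in the question.

```perl
use strict;
use warnings;

# Hypothetical helper applying the two substitutions to one line.
sub clean_line {
    my ($line) = @_;
    # Drop the asterisks, keep what is between them.
    $line =~ s/\*(.*)\*/$1/;
    # /e runs the replacement as code: collect every run of digits
    # found between the carets and join them with single spaces.
    $line =~ s/\^(.*)\^/ join ' ', $1 =~ m([0-9]+)g /e;
    return $line;
}

print clean_line($_), "\n" for (
    '*ABC-13077* ^817266, 55555^',
    '*BCD-13092* ^CL: 816933^',
    '*CDE-13127* ^ ===> Change 767666 submitted^',
);
```

Note that any digits inside ^...^ are kept, so a line such as ^CL2: 816933^ would also yield the 2; if that can happen in real data, the inner pattern would need to be tightened.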

Using Perl to remove n characters from the end of multiple lines

I want to remove n characters from each line using Perl.
For example, I have the following input:
catbathatxx (length 11; 11%3=2 characters) (Remove 2 characters from this line)
mansunsonx (length 10; 10%3=1 character) (Remove 1 character from this line)
#!/usr/bin/perl -w
open FH, "input.txt";
@array=<FH>;
foreach $tmp(@array)
{
    $b=length($tmp)%3;
    my $c=substr($tmp, 0, length($tmp)-$b);
    print "$c\n";
}
I want to output the final string (after the characters have been removed).
However, this program is not giving the correct result. Can you please guide me on what the mistake is?
Thanks a lot. Please let me know if there are any doubts/clarifications.
I am assuming trailing whitespace is not significant.
#!/usr/bin/env perl
use strict; use warnings;
use constant MULTIPLE_OF => 3;
while (my $line = <DATA>) {
$line =~ s/\s+\z//;
next unless my $length = length $line;
my $chars_to_remove = $length % MULTIPLE_OF;
$line =~ s/.{$chars_to_remove}\z//;
print $line, "\n";
}
__DATA__
catbathatxx
mansunsonx
0123456789
012345678
The \K regex sequence makes this a lot clearer; it was introduced in Perl v5.10.0.
The code looks like this
use 5.10.0;
use warnings;
for (qw/ catbathatxx mansunsonx /) {
(my $s = $_) =~ s/^ (?:...)* \K .* //x;
say $s;
}
output
catbathat
mansunson
In general you would want to post the result you are getting. That being said...
Each line in the file has a \n (or \r\n on windows) on the end of it that you're not accounting for. You need to chomp() the line.
Edit to add: My Perl is getting rusty from non-use, but if memory serves me correctly you can actually chomp() the entire array after reading the file: chomp(@array)
You should use chomp() on your array, like this:
@array=<FH>;
chomp(@array);
perl -plwe 'chomp; $c = length($_) % 3; chop while $c--' < /tmp/zock.txt
Look up the options in perlrun. Note that line endings are characters, too. Get them out of the way using chomp; re-add them on output using the -l option. Use chop to efficiently remove characters from the end of a string.
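To make the effect of the line ending concrete: "catbathatxx\n" is 12 characters long, not 11, so 12 % 3 is 0 and the code in the question removes nothing from it. A minimal sketch of the corrected arithmetic follows, assuming "\n" line endings; the function name trim_to_multiple is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $line down to the largest multiple of
# $n characters, ignoring the trailing newline.
sub trim_to_multiple {
    my ($line, $n) = @_;
    chomp $line;                       # the newline must not be counted
    my $extra = length($line) % $n;    # characters beyond the last full group
    return substr($line, 0, length($line) - $extra);
}

print trim_to_multiple($_, 3), "\n" for ("catbathatxx\n", "mansunsonx\n");
```

This uses a single substr call instead of repeated chop; both approaches give the same result once the newline is out of the way.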
Reading your code, you are trying to print just the first 'nx3' characters for the largest value of n for each line.
The following code does this using a simple regular expression.
For each line, it first removes the line ending, then greedily matches as many .{3} as it can (. matches any character, {3} asks for exactly 3 of them).
The memory requirement of this approach (compared with using an array the size of your file) is fixed. Not too important if your file is small compared with your free memory, but sometimes files are gigabytes, and sometimes memory is very small.
It's always worth using variable names that reflect the purpose of the variable, rather than things like $a or @array. In this case I used only one variable, which I called $line.
It's also good practice to close files as soon as you have finished with them.
#!/usr/bin/perl
use strict;
use warnings; # This will apply warnings even if you use command perl to run it
open FH, '<', 'input.txt' or die "Cannot open input.txt: $!"; # Use three part file open - single quote where no interpolation required.
while (my $line = <FH>) { # read one line at a time, so memory use stays fixed
    chomp($line);
    $line =~ s/((.{3})*).*/$1\n/;
    print $line;
}
close FH;

Resources