Recursively extract strings between quotes in a given string

Given a string with substrings within quotes, extract all such substrings.
I have written the following piece of code, but something tells me that it is ugly (although it does seem to do the trick):
my $str = 'printf ("hellp;world", and "this is ; also" and )';
loop:
if ($str =~ /"(.*?)"/) {
my $substr = $1;
$str =~ s/"$substr"//;
print "$substr\n";
}
if ($str =~ /"/) {
goto loop;
}
perl quotes.pl
hellp;world
this is ; also
So it does work as expected.

You can do that directly by using the /g regex flag in either scalar context:
while ($str =~ /"([^"]*)"/g) {
print "$1\n";
}
... or list context:
for my $match ($str =~ /"([^"]*)"/g) {
print "$match\n";
}
I've also changed .*? to [^"]* because it's better to be specific about what you want to match.
/g is documented in perldoc perlop:
The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see "pos" in perlfunc. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.
(Emphasis mine.)
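As a small sketch of the list-context form described above, the matches can also be collected into an array in one step instead of being printed inside a loop. The array name @quoted is only an illustration and is not part of the original answer.

```perl
use strict;
use warnings;

my $str = 'printf ("hellp;world", and "this is ; also" and )';

# In list context, m//g returns everything captured by the parentheses,
# one element per match.
my @quoted = $str =~ /"([^"]*)"/g;

print scalar(@quoted), " quoted substrings found\n";
print "$_\n" for @quoted;
```

Because the target string is never modified, there is no need for a goto loop or for deleting each match after it has been found.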

Related

How to edit a multi-line scalar and print the edits

I need to edit a multi-line scalar and print the results, however I am not able to do it neatly.
my $text = "$omething\n nothing\n Everything\n";
What I need to do is check each line, and if there's a capital letter or special character, print this line and remove it from the original scalar ($text).
In this example it would print two times, first time:
$omething
Second time:
Everything
And remove both of those strings from the $text scalar.
To include a dollar sign in a double quoted string, you need to escape it by a backslash.
You can remove the matching lines in a while loop:
#!/usr/bin/perl
use warnings;
use strict;
my $text = "\$omething\nnothing\nEverything\n";
while ($text =~ s/(.*[[:upper:]\$].*\n)//) {
print $1;
}
print "Remaining: $text";
A period never matches a newline (unless you specify the /s modifier).
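To illustrate the remark about the period and newlines, here is a minimal sketch that runs the same substitution once without and once with the /s modifier. The variable names $line_wise and $dot_all are made up for this comparison.

```perl
use strict;
use warnings;

my $text = "\$omething\nnothing\nEverything\n";

# Without /s the dot stops at a newline, so only one line is removed.
(my $line_wise = $text) =~ s/.*[[:upper:]\$].*\n//;

# With /s the dot also matches newlines, so the greedy .* swallows
# everything up to the last line that qualifies.
(my $dot_all = $text) =~ s/.*[[:upper:]\$].*\n//s;

print "Without /s: $line_wise";
print "With /s: '$dot_all'\n";
```

This is why the loop in the answer relies on the default behaviour: each pass removes exactly one line.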

How to remove text within brackets in a string while still keeping the other text?

my $book = Spreadsheet::Read->new();
my $book = ReadData
('D:\Profiles\jmahroof\Desktop\Scheduled_Build_Overview.xls');
my $cell = "CD7";
my $n = "1";
my $send = $book->[$n]{$cell};
$send =~ s/\(/ /g;
$send =~ s/\)//g;
I have the above code that gets data from an excel file and then picks out text from a specified cell and removes brackets from the string. I need to be able to remove everything within the brackets including the brackets themselves while leaving the rest of the text. The format of the string is exactly like the following : text(text)
$send =~ s/\(.*?\)//;
Explained:
s/ starts the substitution; what follows is the search pattern.
\( matches a literal opening bracket. The backslash is needed because an unescaped ( would start a capture group.
.*? is a non-greedy match of any characters; it stops at the first closing bracket it can.
\) matches the literal closing bracket, again escaped by \
The second / ends the search pattern and begins the replacement, which is empty here.
The final / ends the search and replace.
So we search for ( anything ) and then replace it with nothing.
Explaining non-greedy vs greedy:
.* being greedy will match up until the last ) found.
So if we have string((substring)end), then s/\(.*\)// will go from the first ( up to the last ), leaving you with string
The non-greedy .*? will not; it goes from the first ( only up to the first ), leaving you with stringend). It is lazy and matches as little as it can, whereas the greedy version grabs everything between the first ( and the last ), even if you have this: (string)(())((strings))()()strings)strings)
If you don't have nested parentheses, this single substitution can do it:
$send =~ s/\(.*?\)//;
If the parentheses are always the last item in the text, it can be further simplified to:
$send =~ s/\(.*//;
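The difference between the greedy and non-greedy forms can be checked with a short sketch on the sample string used above. The third substitution uses a negated character class as an alternative for input without nested parentheses; the value assigned to $send is a stand-in for the real cell content.

```perl
use strict;
use warnings;

my $sample = 'string((substring)end)';

# Greedy: from the first ( to the last ).
(my $greedy = $sample) =~ s/\(.*\)//;

# Non-greedy: from the first ( to the first ).
(my $lazy = $sample) =~ s/\(.*?\)//;

# Stand-in for the spreadsheet cell, in the format text(text).
my $send = 'text(text)';
(my $clean = $send) =~ s/\([^()]*\)//g;

print "$greedy\n$lazy\n$clean\n";
```

The [^()]* version cannot run past a closing bracket at all, so greediness no longer matters for it.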

search a specific sub string pattern in a string using perl

I'm a newbie to Perl. I went through "Check whether a string contains a substring" to learn how to check whether a substring is present in a string, but my scenario is a little different.
I have a string like
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf ,
In the end this might be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether this string contains a substring in the format systemfile*.elf.
How can I achieve this in Perl?
I'm planning to do something like this:
if(index($mainstring, _serach_for_pattern_systemfile*.elf_ ) ~= -1) {
say" Found the string";
}
You can use pattern matching:
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string
Well
if( $mainstring =~ m'/systemfile(16|32|64)\.elf$' ) {
say" Found the string";
}
does the job.
For your information:
$string =~ m' ... '
is the same as
$string =~ / ... /
which checks the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because the regexp itself contains a '/' character. I could also write /\/systemfile\d+\.elf$/ instead.)
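As a quick sketch of the delimiter choices mentioned above, the same pattern can be written three ways. The m{...} form is an extra variant not used in the answer; note also that with single quotes as delimiters Perl does not interpolate variables inside the pattern.

```perl
use strict;
use warnings;

my $mainstring = '/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf';

# Default delimiters: the slash inside the pattern must be escaped.
my $with_slashes = $mainstring =~ /\/systemfile\d+\.elf$/ ? 1 : 0;

# Single-quote delimiters: no escaping of /, and no variable interpolation.
my $with_quotes = $mainstring =~ m'/systemfile\d+\.elf$' ? 1 : 0;

# Brace delimiters: another common way to avoid escaping slashes.
my $with_braces = $mainstring =~ m{/systemfile\d+\.elf$} ? 1 : 0;

print "slashes=$with_slashes quotes=$with_quotes braces=$with_braces\n";
```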
if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That should match only the strings you seek (given that every time, a given string is stored in $string). Inside the curly brackets you should write your logic.
The . stands for "any character except a newline" and the * stands for "zero or more of whatever precedes it". So, .* means "any sequence of characters, possibly empty". If you know that the string will end in this pattern, then it will be safer to add $ at the end of the pattern to anchor the match to the end of the string:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string to avoid any line-breaks that might mess with your desired output.
use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
This will match systemfile'anythinghere'.elf if you have a set directory.
If you want to search the entire string, including the directory, then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
If you only want to match systemfile followed by two numeric characters and .elf, use the methods in the other answers. But if you want systemfile followed by anything and then .elf, use one of these.
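The answers above can be combined into one sketch that strips the directory first and then checks only the file name. It assumes the core File::Basename module; the list of paths and the @found array are made up for illustration, and the digits are captured in case the number is needed later.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical inputs; the last one carries a trailing newline, as a
# line read from a file would.
my @paths = (
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile.txt',
    "/home/me/Desktop/MyWork/systemfile/directory/systemfile16.elf\n",
);

my @found;
for my $path (@paths) {
    chomp(my $clean = $path);        # drop a trailing line break, if any
    my $name = basename($clean);     # file name without the directory part
    if ($name =~ /^systemfile(\d+)\.elf\z/) {
        push @found, $1;
        print "Found $name (number $1)\n";
    }
}
```

Matching against the base name means the systemfile directory earlier in the path cannot cause a false match.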

perl extract numbers from string, edit, put back into string at their original position

I'm trying to edit the numbers in a string and put them back in the same place they were before.
Example:
$string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
I need to edit the numbers, but want to keep the rest of the string as it is. Additionally, the number of brackets can vary.
So far I have split the string at "," with
@array = split (',',$string);
and extracted the numbers for editing with
foreach (@array) {
$_ =~ s/\D//g;
$_ = $number - $_;
}
now I want to put the numbers back in their original place in the string, but I don't know how.
Somehow I hope there is a better way to edit the numbers in the string without splitting it and extracting the numbers. Hope you can help me
You could use a regular expression substitution with the /e flag, search for long numbers and run Perl code in the substitution part.
use strict;
use warnings;
use feature 'say';
my $number = 100_000_000;
my $string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
$string =~ s/(\d+)/{$number - $1}/eg;
say $string;
__END__
struct:{thin:[[24481897,24481783],[24481662,24481637],[24467190,24466090],],thick:[[24481637,24481576],[24478743,24478537],],}
If there are no other numbers in the string, that would work. In case there is more logic involved, you can also move it into a subroutine and just call that in the substitution.
sub replace {
my ($n) = @_;    # the captured number is passed in as an argument
return $n % 2 ? $n * 2 : $n / 4;
}
$string =~ s/(\d+)/{replace($1)}/eg;
You might also need to revise the search pattern to be a bit more precise.
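A more precise search pattern could look like the following sketch. It assumes that only eight-digit numbers should be changed, which is a guess based on the sample data; the sub name subtract_from and the version:2 field are invented to show a number that stays untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the base and the captured number.
sub subtract_from {
    my ($base, $value) = @_;
    return $base - $value;
}

my $number = 100_000_000;
my $string = 'struct:{thin:[[75518103,75518217],],version:2}';

# \b(\d{8})\b only matches runs of exactly eight digits, so the
# single digit after "version:" is left alone.
$string =~ s/\b(\d{8})\b/subtract_from($number, $1)/eg;

print "$string\n";
```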
I just found the evaluation modifier for regex! I now did it with
$string =~ s/([0-9]+)/$number-$1/eg;
and it worked!

How do I include new lines in a string in Perl?

I have a string that looks like this
Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5
I am trying to add new lines to get it into list format, like this:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
I have a perl script
use strict;
use warnings;
open my $new_tree_fh, '>', 'test_match.txt'
or die qq{Failed to open "test_match.txt" for output: $!\n};
open my $file, '<', $ARGV[0]
or die qq{Failed to open "$ARGV[0]" for input: $!\n};
while ( my $string = <$file> ) {
my $splitmessage = join ("\n", ($string =~ m/(.+)+\,+\#+\w{6}/gs));
print $new_tree_fh $splitmessage, "\n";
}
close $file;
close $new_tree_fh;
The pattern match works, but it won't print the new lines I want in order to make the list. Can anyone please suggest anything?
I'd do:
use feature 'say';
my $str = 'Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5';
$str =~ s/(?<=,#\w{6})/\n/g;
say $str;
Output:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
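The same lookbehind can also be handed to split, which yields the entries as a list rather than one string with embedded newlines. This is a variation on the answer above, not something it proposed; @entries is an illustrative name.

```perl
use strict;
use warnings;

my $str = 'Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5';

# The pattern is zero-width: it matches the position right after each
# colour code, so nothing is consumed by the split.
my @entries = split /(?<=,#\w{6})/, $str;

print "$_\n" for @entries;
```

Having a list is handy if each entry needs further processing before it is written out.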
OK, I think your problem here is that your regular expression doesn't match properly.
(.+)+
for example - probably doesn't do what you think it does. It's a greedy capture of 1 or more of "anything" which will grab your whole string.
Check it out on regex101.
Try:
#!/usr/bin/perl
use strict;
use warnings;
while ( my $string = <DATA> ) {
my $splitmessage = join( "\n", ( $string =~ m/(\w+,\#+\w{6})/g ) );
print $splitmessage, "\n";
}
__DATA__
Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5
Which will print:
Acanthocolla_cruciata,#8B5F65
Acanthocyrta_haeckeli,#8B5F65
Acanthometra_fusca,#8B5F65
Acanthopeltis_japonica,#FFB5C5
Rather than a quickfix solution, let's find the problem in your existing code and hence learn from it. Your problem is in the regular expression, so we'll dissect and fix it.
($string =~ m/(.+)+\,+\#+\w{6}/gs)
First, the two significant mistakes that lead to the bug:
At the beginning, you're doing a .+, followed by matching , and # and so on. The problem is that .+ is greedy, which means it will match up to the last , in the input, not the first one. So when you run this, almost the entire line (except for the last plant's color) gets matched by this single .+.
There are a few different ways you can fix this, but the easiest is to restrict what you're matching. Instead of saying .+ ("match anything"), make it [\w\s]+ at the beginning, which means match either "word characters" (letters, digits, and the underscore) or whitespace characters (in case there is a space in the middle of a plant name).
($string =~ m/([\w\s]+)+\,+\#+\w{6}/gs)
That changes the output, but still not to the fully correct version because:
m/some regex/g returns a list of its matches here, and what we want is for it to return the whole match, including both plant name and color. But when there are parentheses anywhere inside the pattern, m// returns only the parts matched by the parentheses (which is the plant name here), not the whole match. So, remove the parentheses, and it becomes:
($string =~ m/[\w\s]++\,+\#+\w{6}/gs)
This works, but is quite clumsy and bug-prone, so here are some suggested improvements:
Since your input has no newline characters, the /s at the end is unnecessary.
($string =~ m/[\w\s]++\,+\#+\w{6}/g)
, and # are not special characters in Perl regular expressions (unless the /x modifier is used, where # starts a comment), so they don't need a \ before them.
($string =~ m/[\w\s]++,+#+\w{6}/g)
+ is for when you know the character will be present but don't know how many times. Here, since we're only trying to match one , and one # character, the + after each of them is unnecessary.
($string =~ m/[\w\s]++,#\w{6}/g)
The ++ after [\w\s] means something quite different from + (it is a possessive quantifier, which matches greedily and never gives characters back by backtracking), so let's make it a single +
($string =~ m/[\w\s]+,#\w{6}/g)
Optionally, you can change the last \w to match only the hexadecimal characters which will appear in the colour code:
($string =~ m/[\w\s]+,#[0-9A-F]{6}/g)
That's a pretty solid, working regular expression that does what you want.
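If the name and the colour are needed separately, the final pattern can be given two capture groups and run in a while loop. This is only a sketch building on the regular expression above; the @pairs array and the printf layout are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = 'Acanthocolla_cruciata,#8B5F65Acanthocyrta_haeckeli,#8B5F65Acanthometra_fusca,#8B5F65Acanthopeltis_japonica,#FFB5C5';

# Each pass of the loop finds the next name/colour pair.
my @pairs;
while ($string =~ /([\w\s]+),#([0-9A-F]{6})/g) {
    push @pairs, [$1, $2];    # $1 is the plant name, $2 the hex colour
}

printf "%-25s #%s\n", @$_ for @pairs;
```

Keeping the pairs in an array preserves the input order, which a hash would not.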

Resources