I am trying to split a string in Perl, such as the one below:
String = "What are you doing these days?"
Split1 - What
Split2 - are
Split3 - you
Split4 - doing these days?
I want the first n words as separate values and the rest of the line together in a separate variable.
Is there any way to do this? There is no common delimiter I can use. Any help is appreciated! Thanks.
Perl's split has a limit parameter that seems to be just what you want. To split off the first $n words and leave the rest together, use $n+1 as the limit (the result will be at most $n+1 elements):
my $n = 3;
my $string = "What are you doing these days?";
my @words = split / /, $string, $n+1;
print "$_\n" for @words;
($string1, $string2, $string3, $rest) = split (/ /, $instring, 4);
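Here is a minimal sketch of the same idea wrapped in a function (the name split_first_n is hypothetical). It swaps the / / pattern for the special string ' ', which makes split behave like awk: leading whitespace is skipped and runs of whitespace count as one separator. The limit of $n + 1 still keeps the remainder of the line in one piece.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first $n words plus the untouched remainder.
# split ' ' (awk mode) splits on runs of whitespace and ignores leading blanks;
# the limit of $n + 1 leaves everything after the $n-th word in the last field.
sub split_first_n {
    my ($string, $n) = @_;
    return split ' ', $string, $n + 1;
}

my @parts = split_first_n("  What are  you doing these days?", 3);
print "$_\n" for @parts;
```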
You can use the following regex to split the string as required:
$ip_tring = "What are you doing these days?";
if($ip_tring =~ m/(\S+)\s(\S+)\s(\S+)\s(.*)/)
{
print("1=$1,2=$2,3=$3,4=$4\n");
}
else
{
print("no match...\n");
}
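The same match can also be done in list context, which returns the captured groups directly instead of going through $1..$4. The sketch below assumes a made-up helper name, first_three_and_rest. It also uses \s+ instead of \s, so several spaces between words are tolerated.

```perl
use strict;
use warnings;

# Hypothetical helper: in list context a successful match returns the
# captured groups; a failed match returns an empty list.
sub first_three_and_rest {
    my ($s) = @_;
    return $s =~ /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(.*)/;
}

my $ip_string = "What are you doing these days?";
if (my ($w1, $w2, $w3, $rest) = first_three_and_rest($ip_string)) {
    print "1=$w1,2=$w2,3=$w3,4=$rest\n";
} else {
    print "no match...\n";
}
```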
What is wrong specifically with this code? How can I correct it?
$x = "without any vowels after the first letter\n";
foreach $i (@x[1..]) {
if ($i =~ /[AEIOUaeiou]/) {
$x =~ tr/A E I O U a e i o u//d;
}
}
print "$x\n";
I tried [1..] to skip the first letter. If that does not work, how else can I exclude the first letter?
EDIT: I edited the code to make it syntactically (mostly) correct and to convey the obvious original idea, except for the attempt to index into a string, which isn't valid Perl. (Clarifying that is part of what I consider useful in this question.)
First, most of that is not Perl, or any programming language for that matter. I'd suggest working through a Perl tutorial of your choice before trying to get solutions for specific problems. However, here's an answer, since the problem itself is of general interest.
Next, in Perl you can't directly index into a string, so you can't skip the first character(s) like that.
But you can, of course, separate the first character of the string and process the rest (removing vowels). One way, with a regex†:
use warnings;
use strict;
use feature 'say';
my $str = shift // 'out with ALL vowels after first';
$str =~ s/.\K(.*)/ $1 =~ tr{aeiouAEIOU}{}dr /e;
say $str; #--> ot wth LL vwls ftr frst
This relies on the /e modifier, which makes it so that the replacement side is evaluated as code, and so it runs an independent transliteration (tr) there, processing the captured substring.
Then we need the /r modifier on that embedded tr, so that it returns the new string instead of changing the old one in place. Changing it in place wouldn't be possible anyway, since $1 is read-only.
One can also use a regex instead of tr. It is less efficient, but it comes with its many conveniences:
$str =~ s/.\K(.*)/ $1 =~ s{[aeiou]}{}igr /e;
Now we can use far more sophisticated tools in that regex than in tr; in this case it's only the i flag, for case-insensitive matching.
To keep more than just the first character, change . to .{N}.
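The .{N} variation could be wrapped in a small function like the sketch below (devowel_after is a made-up name). Adding the /s flag is my own assumption, so that . also matches newlines. A string shorter than N characters is returned unchanged, because the pattern simply doesn't match.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $keep characters, delete vowels from the rest.
sub devowel_after {
    my ($str, $keep) = @_;
    # ^.{$keep}\K skips the kept prefix; /e evaluates the replacement as code;
    # tr///r returns a vowel-free copy of $1 (which is read-only); /s lets . match "\n".
    $str =~ s/^.{$keep}\K(.*)/$1 =~ tr{aeiouAEIOU}{}dr/se;
    return $str;
}

print devowel_after('out with ALL vowels after first', 1), "\n";  # ot wth LL vwls ftr frst
print devowel_after('out with ALL vowels after first', 3), "\n";  # out wth LL vwls ftr frst
```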
† Regex is not compulsory, of course. A more elementary take: split the string into its first character and the rest, then use tr on the rest
use warnings;
use strict;
use feature 'say';
my $str = shift // q(out with ALL vowels after first);
my ($result, $rest) = split //, $str, 2; # first char, rest of string
$result .= $rest =~ tr/aeiouAEIOU//dr; # prune $rest of vowels, append to $result
say $result;
Then put this in a small subroutine. To change the original string in place, instead of building a new ($result) string, use $str everywhere instead of $result.
I am not sure how it compares efficiency-wise, but it may well fare fine.
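One way to write that subroutine is sketched below, using a hypothetical name (devowel_in_place). Elements of @_ are aliases to the caller's arguments, so assigning to $_[0] changes the caller's variable in place. The // '' defaults are an addition: they keep empty and one-character strings from triggering undef warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: strips vowels after the first character, in place.
sub devowel_in_place {
    my ($first, $rest) = split //, $_[0], 2;         # first char, rest of string
    $first //= '';                                   # empty input: split returns nothing
    $rest  //= '';                                   # one-char input: there is no rest
    $_[0] = $first . ($rest =~ tr/aeiouAEIOU//dr);   # $_[0] aliases the caller's variable
    return;
}

my $str = 'out with ALL vowels after first';
devowel_in_place($str);
print "$str\n";   # ot wth LL vwls ftr frst
```

The argument must be a variable. A string literal can't be modified, so assigning to $_[0] would die.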
For curiosity's sake, here it is in a single statement:
$str = join '', map { length > 1 ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
This specifically uses the fact that only the first (one) character needs to be skipped; that is easily made dynamic, as long as the criterion involves the length of substrings.
More importantly, this assumes that the rest of the string is longer than one character. To drop that assumption, change the criterion:
use feature 'state';
$str = join '', map {
state $chr_cnt = 0;
++$chr_cnt > 1 ? tr/aeiouAEIOU//dr : $_
}
split //, $str, 2;
This also relies on leaving aside just one character. It uses state, a feature that keeps a lexical variable's value across executions.
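One caveat, as far as I can tell: a state variable is initialized only once. If that statement runs again (for example, in a loop over several strings or inside a sub called repeatedly), the counter keeps going and the first character is no longer spared. Declaring an ordinary counter right before the expression avoids that, and it also handles the one-character remainder case. A sketch (devowel_tail is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: delete vowels from everything after the first character.
sub devowel_tail {
    my ($str) = @_;
    my $i = 0;    # fresh counter on every call, unlike a state variable
    return join '', map { $i++ ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
}

print devowel_tail($_), "\n" for 'out with ALL vowels', 'oa', 'aeiou';
```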
A more generic solution uses the fact that substr can be written to (used as an lvalue):
substr($str, 1) =~ tr/aeiouAEIOU//d;
Here it's much cleaner and simpler to relax the limitation to the first character: just change that 1 to skip more characters. The tricky (perhaps unexpected) part may be that builtins normally can't be written to like that; they aren't lvalue subroutines.
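Below is a sketch of the substr form with an adjustable number of kept characters (devowel_after_substr is a made-up name). It relies on two details. First, tr returns the number of characters it deleted. Second, using substr as an lvalue with an offset beyond the end of the string raises an error, which is why the function checks the length first.

```perl
use strict;
use warnings;

# Hypothetical helper: delete vowels after the first $keep characters (on a copy).
# Returns the new string and how many vowels were removed.
sub devowel_after_substr {
    my ($str, $keep) = @_;
    return ($str, 0) if length($str) < $keep;   # lvalue substr past the end would die
    my $removed = substr($str, $keep) =~ tr/aeiouAEIOU//d;
    return ($str, $removed);
}

my ($result, $count) = devowel_after_substr('out with ALL vowels after first', 1);
print "$result ($count removed)\n";   # ot wth LL vwls ftr frst (8 removed)
```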
The algorithm for solving the problem is in your question:
add a letter to the output string if it isn't a vowel
add a letter to the output string if it is the first vowel in the input string
use strict;
use warnings;
my $x = "without any vowels after the first letter\n";
my($o,$count) = ('',0);
print 'IN: ' . $x;
for ( split('',$x) ) {
$o .= $_ unless $count != 0 and /[aeiou]/i;
$count++ if /[aeiou]/i;
}
print 'OUT: ' . $o;
Output
IN: without any vowels after the first letter
OUT: witht ny vwls ftr th frst lttr
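The same rule (keep only the first vowel of the input) can also be written as a single substitution, using the /e modifier and a counter. A sketch with a hypothetical function name; this is not a claim that it's faster than the loop:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first vowel of the whole string, drop the others.
sub keep_first_vowel {
    my ($in) = @_;
    my $seen = 0;
    # /i: case-insensitive; /e: the replacement is code; /g: every vowel
    (my $out = $in) =~ s/([aeiou])/$seen++ ? '' : $1/gie;
    return $out;
}

print 'OUT: ', keep_first_vowel("without any vowels after the first letter\n");
```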
Addendum: OP's clarification of the problem
look at each word in the sentence
if a word starts with a vowel, delete all vowels but the first one
if a word starts with a non-vowel, delete all vowels
use strict;
use warnings;
use feature 'say';
my $x = 'I like apples more than oranges';
my @o;
say 'IN: ' . $x;
for ( split(' ', $x) ) {
if ( /^[aeiou]/i ) {
s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e;
} else {
tr|aeiouAEIOU||d;
}
push @o, $_;
}
say 'OUT: ' . join(' ', @o);
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
Or, in a more Perlish style:
use strict;
use warnings;
use feature 'say';
my $x = "I like apples more than oranges";
say 'IN: ' . $x;
say 'OUT: ' . join(' ', map { s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e && $_ } split('[ ]+', $x));
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
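Both versions split on spaces and rejoin with single spaces, so repeated spaces collapse. If the original spacing should be preserved, the sketch below rewrites each word where it stands instead (shorten_words is a made-up name). Like the Perlish version, it keeps each word's first character and drops the vowels from the rest. For a word starting with a consonant, that is the same as deleting all its vowels.

```perl
use strict;
use warnings;

# Hypothetical helper: for each word, keep its first character and drop the
# vowels from the rest; whitespace between words is left exactly as it was.
sub shorten_words {
    my ($text) = @_;
    (my $out = $text) =~ s/(\S)(\S*)/$1 . ($2 =~ tr|aeiouAEIOU||dr)/ge;
    return $out;
}

print 'OUT: ', shorten_words('I like  apples more than oranges'), "\n";
```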
PowerShell 5.x
There is a string $s, approximately 2.5 KB long.
I need to run a series of replacements (about 20) on it, in a loop.
There are some 800K strings like that in total so I need this to be fast.
For each replacement, I know position [int] $x and new value [string] $ns.
Example:
We start with $s == "abcdefghijklmn" and the $x (position) is 3, and a new value to put there is $ns == "XYZ"
We end up with $s == "abcXYZghijklmn"
(strings are indexed 0-based)
My solution so far:
$s = "abcdefghijklmn"
$ns = "XYZ"
$x = 3
$s = $s.Remove($x, $ns.Length).Insert($x, $ns)
This is at least three operations: removal of a substring, then insertion of a new string, and finally storing the final result (I'm not sure about the internals here, but I assume this is how things work). For 800K strings of 2.5 KB each, we're talking about ~2 GB of data being processed three times in memory. That's not the most efficient way of doing things.
In Python, with MutableString, I can do in-place replacement with minimal cost. Does a similar thing exist in PowerShell?
Here is my take using the StringBuilder class.
$s = "abcdefghijklmn" -as [system.text.stringbuilder]
$ns = "XYZ"
$x = 3
$s.Replace($s.tostring().substring($x,$ns.length),$ns,$x,$ns.length).tostring()
How about reconstructing the string instead?
$s.Substring(0,$x) + $ns + $s.Substring($x + $ns.Length)
I'm not sure whether it's faster; it might be worth checking on all the strings you have. You could also run things in parallel with foreach to speed up the process.
This should be faster. You should convert the input string to a char array once before the replacements and convert it back to a string when all replacements are done:
$s = ("abcdefghijklmn").ToCharArray()
$ns = ("XYZ").ToCharArray()
$x = 3
0..($ns.Length-1) | ForEach-Object { $s[$x + $_] = $ns[$_] }
$result = [String]::new($s)
Here's another alternative. It uses a regular expression.
$s = "abcdefghijklmn"
$ns = "XYZ"
$x = 3
$s -replace "(?m)^(.{$x}).{$($ns.Length)}(.+)", "`$1$ns`$2"
Regex details (with $x = 3 and $ns.Length = 3):
(?m) Multiline mode: ^ also matches right after a line break
^ Assert position at the beginning of a line (at the beginning of the string or after a line break character)
( Match the regular expression below and capture its match into backreference number 1
. Match any single character that is not a line break character
{3} Exactly 3 times ($x)
)
. Match any single character that is not a line break character
{3} Exactly 3 times ($ns.Length); these are the characters being replaced
( Match the regular expression below and capture its match into backreference number 2
. Match any single character that is not a line break character
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
You'd have to test yourself which of the offered solutions is the fastest in your situation.
I'm a newbie to Perl. I went through "Check whether a string contains a substring" to learn how to check whether a substring is present in a string. Now my scenario is a little different.
I have a string like:
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf
In the end this might be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether this string contains a substring in the format systemfile*.elf.
How can I achieve this in Perl?
I'm planning to do something like this:
if (index($mainstring, _search_for_pattern_systemfile*.elf_) != -1) {
say "Found the string";
}
You can use pattern matching:
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string
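The question's own example ends in 64, so a variant with \d+ (one or more digits) may be more forgiving than exactly two digits. The sketch below also captures the number; elf_width is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits after "systemfile" (e.g. 64), or undef if no match
sub elf_width {
    my ($path) = @_;
    # \d+ = one or more digits, captured into $1; $ anchors the match at the end
    return $path =~ /systemfile(\d+)\.elf$/ ? $1 : undef;
}

for my $path ('/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
              '/tmp/systemfile.elf') {
    my $w = elf_width($path);
    print(defined($w) ? "$path: width $w\n" : "$path: no match\n");
}
```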
Well
if( $mainstring =~ m'/systemfile(16|32|64)\.elf$' ) {
say "Found the string";
say" Found the string";
}
does the job.
For your information:
$string =~ m' ... '
is, apart from turning off variable interpolation, the same as
$string =~ / ... /
which checks the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because the regexp contains another '/' character. I could also have written /\/systemfile\d+\.elf$/ .)
if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That should match the strings you seek (assuming each string to test is stored in $string). Write your logic inside the curly brackets.
The . stands for "any character" and the * means "zero or more of the preceding item". So .* means "any character, any number of times (including none)". If you know that the string will end with this pattern, it is safer to add $ at the end of the pattern to require that the string ends there:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string to remove a trailing line break that might mess with your desired output.
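For what it's worth, $ also matches just before a single trailing newline, so the match itself usually succeeds even without chomp. chomp mostly matters for what you print or compare afterwards. If you want to anchor at the very end of the string, \z does that. A small check illustrating the difference:

```perl
use strict;
use warnings;

my $line = "systemfile16.elf\n";   # e.g. a line read from a file, not chomped

my $with_dollar = $line =~ /systemfile.*\.elf$/  ? 1 : 0;   # $ allows a trailing "\n"
my $with_z      = $line =~ /systemfile.*\.elf\z/ ? 1 : 0;   # \z does not
print "\$: $with_dollar, \\z: $with_z\n";   # $: 1, \z: 0

chomp(my $clean = $line);
print "after chomp: ", ($clean =~ /systemfile.*\.elf\z/ ? 1 : 0), "\n";   # after chomp: 1
```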
use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
This will match systemfile<anything>.elf when the string holds only the file name (for example, when the directory is fixed).
If you want to search the entire string, including the directory, then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
If you only want to match systemfile followed by two digits and .elf, use the methods from the other answers above. But if you want systemfile<anything>.elf, use one of these.
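Another option uses the core File::Basename module. Strip the directory first, then anchor the pattern at both ends of the bare file name, so that a directory which happens to be named systemfile... can't cause a match. In this sketch the helper name is hypothetical, the pattern requires digits (\d+) rather than anything, and basename follows the path conventions of the platform it runs on.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: true if the file name (without directory) is systemfile<digits>.elf
sub is_systemfile_elf {
    my ($path) = @_;
    my $name = basename($path);
    return $name =~ /^systemfile\d+\.elf\z/ ? 1 : 0;
}

for my $p ('/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
           '/home/me/systemfile32.elf.bak') {
    printf "%s => %s\n", $p, is_systemfile_elf($p) ? 'yes' : 'no';
}
```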
I'm trying to edit the numbers in a string and put them back in the same place they were before.
Example:
$string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
I need to edit the numbers, but want to keep the rest of the string as it is. Additionally, the number of brackets can vary.
Until now I split the string at "," with
@array = split(',', $string);
and extracted the numbers for editing with
foreach (@array) {
$_ =~ s/\D//g;
$_ = $number - $_;
}
Now I want to put the numbers back in their original place in the string, but I don't know how.
I hope there is a better way to edit the numbers in the string without splitting it and extracting the numbers. I hope you can help me.
You could use a regular expression substitution with the /e flag: search for the numbers and run Perl code in the substitution part.
use strict;
use warnings;
use feature 'say';
my $number = 100_000_000;
my $string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
$string =~ s/(\d+)/{$number - $1}/eg;
say $string;
__END__
struct:{thin:[[24481897,24481783],[24481662,24481637],[24467190,24466090],],thick:[[24481637,24481576],[24478743,24478537],],}
If there are no other numbers in the string, that would work. In case there is more logic involved, you can also move it into a subroutine and just call that in the substitution.
sub replace {
my ($n) = @_;    # the captured number, passed in as $1
return $n % 2 ? $n * 2 : $n / 4;
}
$string =~ s/(\d+)/{replace($1)}/eg;
You might also need to revise the search pattern to be a bit more precise.
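One possible way to make the pattern more precise: only touch numbers that sit directly between a bracket or comma on the left and a comma or bracket on the right, using a lookbehind and a lookahead. The sample string with key2 is made up to show a digit that is left alone.

```perl
use strict;
use warnings;

my $number = 100;
my $string = "key2:[[10,20],[30,40]]";

# (?<=[\[,]) : preceded by "[" or ","    (?=[,\]]) : followed by "," or "]"
$string =~ s/(?<=[\[,])(\d+)(?=[,\]])/$number - $1/ge;
print "$string\n";   # key2:[[90,80],[70,60]]
```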
I just found the evaluation modifier (/e) for regexes! I now did it with
$string =~ s/([0-9]+)/$number-$1/eg;
and it worked!
I want to count the number of uppercase letters in a string using Perl.
For example: I need to know how many uppercase characters the word "EeAEzzKUwUHZws" contains.
Beware of Unicode: the plain A-Z range isn't portable to other characters, such as accented uppercase letters. If you need to handle these too, try:
my $result = 0;
$result++ while($string =~ m/\p{Uppercase}/g);
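To see the difference, here is a small comparison on a made-up string ("\x{C9}" is É). tr/A-Z// only counts ASCII capitals, while \p{Uppercase} also counts the accented one.

```perl
use strict;
use warnings;

my $string = "\x{C9}coleAB";   # "ÉcoleAB"

my $ascii_only = ($string =~ tr/A-Z//);   # counts A and B only

my $unicode = 0;
$unicode++ while $string =~ m/\p{Uppercase}/g;   # counts É, A and B

print "tr/A-Z//: $ascii_only, \\p{Uppercase}: $unicode\n";   # tr/A-Z//: 2, \p{Uppercase}: 3
```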
Use the tr operator:
$upper_case_letters = $string =~ tr/A-Z//;
This is a common question and the tr operator usually outperforms other techniques.
sub count {
my $t = shift;
my $x = 0;
for ( split //, $t ) {
$x++ if m/[A-Z]/;
}
return $x;
}
The one-liner method is:
$count = () = $string =~ m/\p{Uppercase}/g;
This is based on Stuart Watt's answer, modified according to the tip ysth posted in the comments to make it a one-liner.
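Why the one-liner works, briefly: m//g in list context returns every match, and assigning that to the empty list () is a list assignment. A list assignment in scalar context evaluates to the number of elements on its right-hand side. A sketch comparing it with the other counts on the question's example:

```perl
use strict;
use warnings;

my $string = "EeAEzzKUwUHZws";

# () = ... forces list context; the outer scalar assignment then receives the element count
my $count_unicode = () = $string =~ m/\p{Uppercase}/g;
my $count_ascii   = () = $string =~ m/[A-Z]/g;
my $count_tr      = ($string =~ tr/A-Z//);   # tr returns its count and leaves $string unchanged

print "$count_unicode $count_ascii $count_tr\n";   # 8 8 8
```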