Powershell 5.x
There is a string $s, approximately 2.5 KB long.
I need to run a series of replacements (about 20) on it, in a loop.
There are some 800K strings like that in total so I need this to be fast.
For each replacement, I know position [int] $x and new value [string] $ns.
Example:
We start with $s == "abcdefghijklmn" and the $x (position) is 3, and a new value to put there is $ns == "XYZ"
We end up with $s == "abcXYZghijklmn"
(strings are indexed 0-based)
My solution so far:
$s = "abcdefghijklmn"
$ns = "XYZ"
$x = 3
$s = $s.Remove($x, $ns.Length).Insert($x, $ns)
This is at least three operations: removing a substring, inserting the new one, and storing the final result (I'm not sure about the internals, but I assume this is how it works). For 800K strings of 2.5 KB each, that is ~2 GB of data being processed three times in memory. That's not the most efficient way of doing things.
In Python, with MutableString, I can do in-place replacement with minimal cost. Does a similar thing exist in PowerShell?
Here is my take using the StringBuilder class.
$s = "abcdefghijklmn" -as [system.text.stringbuilder]
$ns = "XYZ"
$x = 3
$s.Replace($s.tostring().substring($x,$ns.length),$ns,$x,$ns.length).tostring()
How about reconstructing the string instead ?
$s.Substring(0,$x) + $ns + $s.Substring($x)
Not sure if it's faster, might be worth checking on all the strings you have. You could also run things in parallel with foreach to speed up the process.
This should be faster. You should convert the input string to a char array once before the replacements and convert it back to a string when all replacements are done:
$s = ("abcdefghijklmn").ToCharArray()
$ns = ("XYZ").ToCharArray()
$x = 3
0..($ns.Length-1) | ForEach-Object { $s[$x + $_] = $ns[$_] }
$result = [String]::new($s)
Here's another alternative. It uses Regular Expression.
$s = "abcdefghijklmn"
$ns = "XYZ"
$x = 3
$s -replace "(?m)^(.{$x}).{$($ns.Length)}(.*)", "`$1$ns`$2"
Regex details:
^ Assert position at the beginning of a line (at beginning of the string or after a line break character)
( Match the regular expression below and capture its match into backreference number 1
. Match any single character that is not a line break character
{3} Exactly 3 times (the value of $x)
)
. Match any single character that is not a line break character
{3} Exactly 3 times (the length of $ns), so the whole old value is consumed
( Match the regular expression below and capture its match into backreference number 2
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
You'd have to test yourself which of the offered solutions is the fastest in your situation.
What is wrong specifically with this code? How can I correct it?
$x = "without any vowels after the first letter\n";
foreach $i (@x[1..]) {
if ($i =~ /[AEIOUaeiou]/) {
$x =~ tr/A E I O U a e i o u//d;
}
}
print "$x\n";
I tried [1..] to exclude the first letter. If it does not work, how else can I remove the first letter?
EDIT I edited code to make it syntactically (mostly) correct to convey their obvious original idea, except for the attempt to index into a string which isn't correct in Perl. (Clarifying that is a part of what I consider useful in this question.)
First, most of that is not Perl, or any programming language for that matter. I'd suggest to work through a Perl tutorial of your choice first, before trying to get solutions for specific problems. However, here's an answer since the problem itself is of enough interest in general.
Next, in Perl you can't directly index into a string, so you can't skip the first character(s) like that.
But you can separate that first character in the string and process the rest (removing vowels), of course. One way with regex†
use warnings;
use strict;
use feature 'say';
my $str = shift // 'out with ALL vowels after first';
$str =~ s/.\K(.*)/ $1 =~ tr{aeiouAEIOU}{}dr /e; # no [ ] here: tr takes a plain character list
say $str; #--> ot wth LL vwls ftr frst
This relies on the /e modifier, which makes it so that the replacement side is evaluated as code, and so it runs an independent transliteration (tr) there, processing the captured substring.
Then we need the /r modifier in that embedded tr/regex, to return the new string instead of changing the old one in place -- which wouldn't be possible anyway, as one can't change $1.
One can also use a regex instead of tr, less efficient but with its many conveniences
$str =~ s/.\K(.*)/ $1 =~ s{[aeiou]}{}igr /e;
Now we can use far more sophisticated tools in that regex than in tr; in this case it's only the i flag, for case-insensitive.
If more than just the first character is to be kept, change . to .{N}.
† Regex is not compulsory, of course. A more elementary take: split the string into its first character and the rest, then use tr on the rest
use warnings;
use strict;
use feature 'say';
my $str = shift // q(out with ALL vowels after first);
my ($result, $rest) = split //, $str, 2; # first char, rest of string
$result .= $rest =~ tr/aeiouAEIOU//dr; # prune $rest of vowels, append to $result
say $result;
Then put this in a little mini subroutine. To change the original string in place, instead of getting a new ($result) string, use it ($str) everywhere instead of $result.
I am not sure about how it compares efficiency wise but it may well fare well.
For the curiosity's sake, here it is in a single statement
$str = join '', map { length > 1 ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
This specifically uses the fact that only the first (one) character need be skipped; that is easily made dynamical, as long as the criterion does involve the length of substrings.
More importantly, this assumes that the rest of the string is longer than 1 character. To drop that assumption change the criterion
use feature 'state';
$str = join '', map {
state $chr_cnt = 0;
++$chr_cnt > 1 ? tr/aeiouAEIOU//dr : $_
}
split //, $str, 2;
This also relies on leaving aside just one character. It uses a feature to keep a lexical value across executions, state.
A more generic solution, which uses the fact that the result of substr can be written to
substr($str, 1) =~ tr/aeiouAEIOU//d;
Here it's much cleaner and simpler to relax the limitation to the first character: just change that 1 in order to skip more characters. The tricky -- unexpected -- part here may be that normally builtins can't be written to like that, as they aren't lvalue subroutines.
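As a rough sketch of that generalization, the writable substr can be wrapped in a small subroutine that takes the number of leading characters to keep. The name strip_vowels_after and its $keep parameter are made up for this illustration; the guard is there because writing to a substr that starts past the end of the string is an error.

```perl
use strict;
use warnings;

# Hypothetical helper: delete vowels everywhere after the first $keep characters.
sub strip_vowels_after {
    my ($str, $keep) = @_;
    return $str if $keep >= length $str;       # nothing after the kept part
    substr($str, $keep) =~ tr/aeiouAEIOU//d;   # substr is used as an lvalue here
    return $str;                               # works on a copy; caller's string is untouched
}

print strip_vowels_after('out with ALL vowels after first', 1), "\n";
print strip_vowels_after('out with ALL vowels after first', 3), "\n";
```

Because the subroutine works on its own copy of the argument, it returns a new string; to change a variable in place, apply the substr line to that variable directly.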
The algorithm for solution of the problem is in your question
add letter to a string if it isn't vowel
add letter to the string if it is first vowel in the input string
use strict;
use warnings;
my $x = "without any vowels after the first letter\n";
my($o,$count) = ('',0);
print 'IN: ' . $x;
for ( split('',$x) ) {
$o .= $_ unless $count != 0 and /[aeiou]/i;
$count++ if /[aeiou]/i;
}
print 'OUT: ' . $o;
Output
IN: without any vowels after the first letter
OUT: witht ny vwls ftr th frst lttr
Addendum: OP's clarification of the problem
look at each word in the sentence
if a word starts with a vowel then delete all vowels but the first one
if a word starts with a non-vowel then delete all vowels
use strict;
use warnings;
use feature 'say';
my $x = 'I like apples more than oranges';
my @o;
say 'IN: ' . $x;
for ( split(' ', $x) ) {
if ( /^[aeiou]/i ) {
s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e;
} else {
tr|aeiouAEIOU||d;
}
@o = (@o,$_);
}
say 'OUT: ' . join(' ', @o);
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
Or in perlish style
use strict;
use warnings;
use feature 'say';
my $x = "I like apples more than oranges";
say 'IN: ' . $x;
say 'OUT: ' . join(' ', map { s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e && $_ } split('[ ]+', $x));
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
I'm trying to edit the numbers in a string and put it back in the same place as they have been before.
Example:
$string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
I need to edit the numbers, but want to keep the rest of the string as it is. Additionally the number of brackets can vary.
Until now I split the string at "," with
@array = split (',',$string);
and extracted the numbers for editing with
foreach (@array) {
$_ =~ s/\D//g;
$_ = $number - $_;
}
Now I want to put the numbers back in their original place in the string, but I don't know how.
Somehow I hope there is a better way to edit the numbers in the string without splitting it and extracting the numbers. Hope you can help me
You could use a regular expression substitution with the /e flag, search for long numbers and run Perl code in the substitution part.
use strict;
use warnings;
use feature 'say';
my $number = 100_000_000;
my $string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
$string =~ s/(\d+)/{$number - $1}/eg;
say $string;
__END__
struct:{thin:[[24481897,24481783],[24481662,24481637],[24467190,24466090],],thick:[[24481637,24481576],[24478743,24478537],],}
If there are no other numbers in the string, that would work. In case there is more logic involved, you can also move it into a subroutine and just call that in the substitution.
sub replace {
my ($n) = @_; # the matched number arrives as the first argument, not in $_
return $n % 2 ? $n * 2 : $n / 4;
}
$string =~ s/(\d+)/{replace($1)}/eg;
You might also need to revise the search pattern to be a bit more precise.
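A minimal sketch of what a more precise pattern might look like. It assumes, purely for illustration, that the numbers to edit are always exactly 8 digits long; the sample string with the "v2" key and the helper name mirror are made up for this example.

```perl
use strict;
use warnings;

my $number = 100_000_000;

# Hypothetical helper: receives the matched digits as its first argument.
sub mirror {
    my ($n) = @_;
    return $number - $n;
}

my $string = 'struct:{thin:[[75518103,75518217],],v2:[[75518363,75518424],],}';

# Assumption: coordinates are exactly 8 digits, so the "2" in "v2" is left alone.
(my $edited = $string) =~ s/(?<!\d)(\d{8})(?!\d)/mirror($1)/eg;

print "$edited\n";
```

The lookarounds keep the pattern from matching 8 digits in the middle of a longer number; a plain \d+ would also have rewritten the "2" in "v2".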
I just found the evaluation modifier for regex! I now did it with
$string =~ s/([0-9]+)/$number-$1/eg;
and it worked!
I just want to know the reason: if I use (.) instead I get the result, but + is doing arithmetic addition. But how is that ASCII addition?
my $string = "ZZ";
my $appendstring = $string+1;
print $appendstring;
output
1
Expecting
ZZ1
First of all, your question is very unclear, and your "example" (if you want to call it that) does not match reality, but in an effort to help whoever stumbles across this question in the future, I'm going to venture an answer anyway.
Let's clear up your example first:
$ perl -lwe '$x = "ZZ"; print $x + 1;'
Argument "ZZ" isn't numeric in addition (+) at -e line 1.
1
What I think you might have meant was:
$ perl -lwe '$x = "ZZ"; print ++$x;'
AAA
And the reason for that is explained in perlop:
The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used
in a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
/^[a-zA-Z]*[0-9]*\z/, the increment is done as a string, preserving
each character within its range, with carry.
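A small sketch of the carry behaviour described in that passage. The helper name next_label is made up for this illustration; it copies its argument so that the increment is applied to a value that has only ever been used as a string.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the string, then apply the magical ++ to the copy.
sub next_label {
    my ($label) = @_;
    return ++$label;
}

# Each character stays within its own range (A-Z, a-z, 0-9) and carries to the left.
print join(' ', map { next_label($_) } qw(ZZ Az zz a9)), "\n";   # AAA Ba aaa b0
```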
Edit: your updated question isn't any clearer than your original question, but now I think you're asking about string concatenation, which means you want the string concatenation operator: .
$ perl -lwe '$x = "ZZ"; print $x . 1;'
ZZ1
There is, however, a special case where you can use a string with the numeric addition operator and not generate a warning:
$ perl -lwe '$x = "0 but true"; print $x + 1;'
1
You also mentioned "ASCII addition", but I have no idea what that is or what you mean by that.
According to this
this is the way to concatenate
use strict;
use warnings;
my $x = "4T";
my $y = 3;
print $x . $y; # 4T3
but if you do this:
print $x + $y; # 7
# Argument "4T" isn't numeric in addition (+) at ...
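Whenever you use "+", Perl tries to convert both values to numbers. A string contributes its leading numeric part (so "4T" becomes 4), or 0 if it does not start with a number, and the results are summed; with warnings enabled you also get the "isn't numeric" warning shown above.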
http://ideone.com/0LyEij
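The conversion can be checked with a few more inputs. This is only a sketch: the helper name numify is made up, and the warning is silenced inside it so that the output stays readable.

```perl
use strict;
use warnings;

# Hypothetical helper: numify a string the way + does, with the warning switched off.
sub numify {
    my ($text) = @_;
    no warnings 'numeric';
    return $text + 0;
}

# A leading number is used; a string without one counts as 0.
printf "%-6s => %s\n", $_, numify($_) for '4T', 'T4', '12.5kg', 'ZZ';
```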
I am trying to split a string in Perl such as below :-
String = "What are you doing these days?"
Split1 - What
Split2 - are
Split3 - you
Split4 - doing these days?
I want the first n number of words separately and the rest of the line together in a separate variable.
Is there any way to do this ? There is no common delimiter I can use. Any help is appreciated ! Thanks.
Perl's split has a limit parameter that seems to be just what you want. To split off the first $n words and leave the rest together, use $n+1 as the limit (the result will be at most $n+1 elements):
my $n = 3;
my $string = "What are you doing these days?";
my @words = split / /, $string, $n+1;
print "$_\n" for @words;
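The same idea can be sketched as a reusable subroutine. The name first_words is made up for this illustration, and it uses the special pattern ' ' rather than / /, which makes split skip leading whitespace and treat a run of whitespace as one separator.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first $n words, followed by the untouched remainder.
sub first_words {
    my ($text, $n) = @_;
    return split ' ', $text, $n + 1;
}

my @parts = first_words('What are  you doing these days?', 3);
print "$_\n" for @parts;
```

With / / as the pattern, the double space in this input would have produced an empty element, which is why the sketch uses ' ' instead.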
($string1, $string2, $string3, $rest) = split (/ /, $instring, 4);
You can use the following regex to split the string according to your requirement
$ip_tring = "What are you doing these days?";
if($ip_tring =~ m/(\S+)\s(\S+)\s(\S+)\s(.*)/)
{
print("1=$1,2=$2,3=$3,4=$4\n");
}
else
{
print("no match...\n");
}
I have a string which holds a decimal value in it and I need to convert that string into a floating point variable. So an example of the string I have is "5.45" and I want a floating point equivalent so I can add .1 to it. I have searched around the internet, but I only see how to convert a string to an integer.
You don't need to convert it at all:
% perl -e 'print "5.45" + 0.1;'
5.55
This is a simple solution:
Example 1
my $var1 = "123abc";
print $var1 + 0;
Result
123
Example 2
my $var2 = "abc123";
print $var2 + 0;
Result
0
Perl is a context-based language. It doesn't do its work according to the data you give it. Instead, it figures out how to treat the data based on the operators you use and the context in which you use them. If you do numbers sorts of things, you get numbers:
# numeric addition with strings:
my $sum = '5.45' + '0.01'; # 5.46
If you do strings sorts of things, you get strings:
# string replication with numbers:
my $string = ( 45/2 ) x 4; # "22.522.522.522.5"
Perl mostly figures out what to do and it's mostly right. Another way of saying the same thing is that Perl cares more about the verbs than it does the nouns.
Are you trying to do something and it isn't working?
Google led me here while searching on the same question phill asked (sorting floats), so I figured it would be worth posting the answer despite the thread being kind of old. I'm new to Perl and am still getting my head wrapped around it, but brian d foy's statement "Perl cares more about the verbs than it does the nouns." above really hits the nail on the head. You don't need to convert the strings to floats before applying the sort. You need to tell the sort to sort the values as numbers and not strings.
i.e.
my @foo = ('1.2', '3.4', '2.1', '4.6');
my @foo_sort = sort {$a <=> $b} @foo;
See http://perldoc.perl.org/functions/sort.html for more details on sort
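To make the difference visible, here is a small sketch with values chosen (for illustration only) so that the two orderings disagree:

```perl
use strict;
use warnings;

my @values = ('10', '9', '2.5');

my @as_strings = sort @values;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @values;   # numeric comparison

print "strings: @as_strings\n";   # strings: 10 2.5 9
print "numbers: @as_numbers\n";   # numbers: 2.5 9 10
```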
As I understand it, int() is not intended as a 'cast' function for designating data type; it's simply being (ab)used here to define the context as an arithmetic one. I've (ab)used (0+$val) in the past to ensure that $val is treated as a number.
$var += 0
is probably what you want. Be warned, however: if $var is a string that cannot be converted to a number, you'll get a warning (when warnings are enabled), and $var will be reset to 0:
my $var = 'abc123';
print "var = $var\n";
$var += 0;
print "var = $var\n";
logs
var = abc123
Argument "abc123" isn't numeric in addition (+) at test.pl line 7.
var = 0
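One way to avoid silently ending up with 0 is to check the string first. This is only a sketch: it uses looks_like_number from the core Scalar::Util module, and the helper name to_number is made up for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: return the numeric value, or undef when the text is not a number.
sub to_number {
    my ($text) = @_;
    return looks_like_number($text) ? $text + 0 : undef;
}

for my $input ('5.45', '2e-77', 'abc123') {
    my $n = to_number($input);
    print "$input => ", (defined $n ? $n : 'not numeric'), "\n";
}
```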
Perl really only has three types: scalars, arrays, and hashes. And even that distinction is arguable. ;) The way each variable is treated depends on what you do with it:
% perl -e "print 5.4 . 3.4;"
5.43.4
% perl -e "print '5.4' + '3.4';"
8.8
In comparisons it makes a difference whether a scalar is a number or a string. And it is not always decidable. I can report a case where Perl retrieved a float in "scientific" notation and used that same value a few lines below in a comparison:
use strict;
....
next unless $line =~ /and your result is:\s*(.*)/;
my $val = $1;
if ($val < 0.001) {
print "this is small\n";
}
And here $val was not interpreted as numeric for values such as "2e-77" retrieved from $line. Adding 0 (or 0.0 for good ole C programmers) helped.
Perl is weakly typed and context based. Many scalars can be treated both as strings and numbers, depending on the operators you use.
$a = 7*6; $b = 7x6; print "$a $b\n";
You get 42 777777.
There is a subtle difference, however. When you read numeric data from a text file into a data structure, and then view it with Data::Dumper, you'll notice that your numbers are quoted. Perl treats them internally as strings.
Read: $my_hash{$1} = $2 if /(.+)=(.+)\n/;
Dump: 'foo' => '42'
If you want unquoted numbers in the dump:
Read: $my_hash{$1} = $2+0 if /(.+)=(.+)\n/;
Dump: 'foo' => 42
After $2+0 Perl notices that you've treated $2 as a number, because you used a numeric operator.
I noticed this whilst trying to compare two hashes with Data::Dumper.