perl interpolate code - string

I know in perl you can interpolate scalars by simply doing this:
"This is my $string"
However, I'm wondering if there is a way where I can interpolate actual perl code to be evaluated? An idea of what I want can be seen with ruby strings:
"5 + 4 = #{5 + 4}"
And it will evaluate whatever is in between the {}.
Does anyone know of a way to do this in perl? Thanks!

You can use the following trick:
"5 + 4 = @{[ 5 + 4 ]}"
Alternatively, you can use sprintf:
sprintf("5 + 4 = %d", 5 + 4);
Either of these yields the desired string. Arguably, sprintf is safer than the first one, as you can restrict the type of the interpolated value somewhat. The first one is closer in spirit to what you desire, however.
Further reading:
Why does Perl evaluate code in ${…} during string interpolation?

If you want to interpolate computations in a string, you can do 3 things:
Use @{[ ]}. Inside the square brackets, place whatever expression you want.
Use ${\ do { }}. Place your expression within the inner braces.
Use simple string concatenation: $targetstring = $string1 . func() . $string2, where func() returns the dynamic content.

print "5 + 4 = ", 5 + 4;
Simply move it outside the quotes. Is there a point in forcing it to be within the quotes?
You can do a bit more formatting with printf, e.g.:
printf "5 + 4 = %s\n", 5 + 4;

You can use eval:
eval('$a = 4 + 5');
Mind you, eval should be used sparingly; evaluating a string as code is pretty unsafe.


Should I use f-string when writing with eval()

So recently I learned that eval() is a surprising function that can turn a string into executable code. It could be quite useful when writing a function where one argument is a string holding a function's name.
But I wonder what is more pythonic way of using it.
Example:
a = [1,2,3,1,2,3,4,1,2,5,6,7]
b = 'count'
target = 2
# regular way
print(a.count(target)) # >>> 3
I tried writing it with an f-string, which works:
print(eval(f'{a}' + '.' + b + f'({target})')) # >>> 3
What amazed me is that it will work even if I don't use f-string:
print(eval('a' + '.' + b + '(target)')) # >>> 3
This seems a little questionable to me, because without the f-string, 'a' can be confusing: it isn't easy to tell whether it is just a string or a variable name hiding inside a string.
What do you think? Which one is more pythonic to you?
Thanks!
As people have mentioned in the comments, eval is dangerous (Alexander Huszagh) and rarely necessary.
However, you seem to have a problem with f-strings.
Before 3.6, there were two ways of constructing strings:
'a' + '.' + b + '(target)'
and
'{}.{}({})'.format(a, b, target)
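With your values for a, b, and target, the first evaluates to 'a.count(target)' (the names stay as literal text), while the second evaluates to '[1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 6, 7].count(2)' (the values are substituted in). Passed to eval, both strings give 3.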
When you have several variables, the first option is cluttered with pluses and quotes, while the second one has the readability problem; which set of {} matches which variable (or expression).
A variant of the second way is to put names in the {} and refer to those names in format:
'{a}.{b}({target})'.format(a=a, b=b, target=target)
This makes the .format version more readable, but very verbose.
With the introduction of f-strings, we now include expressions inside the {}, similar to the named version of .format, but without the .format(...) part:
f'{a}.{b}({target})'
F-strings are a clear improvement on both + and .format version in readability and maintainability.
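As a quick check, here is a minimal sketch using the values from the question. It builds the string with positional .format, named .format, and an f-string, and confirms all three produce the same text. The variable names positional, named, and fstring are only for illustration.

```python
a = [1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 6, 7]
b = 'count'
target = 2

# Three ways of building the same string
positional = '{}.{}({})'.format(a, b, target)
named = '{a}.{b}({target})'.format(a=a, b=b, target=target)
fstring = f'{a}.{b}({target})'

print(fstring)                          # [1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 6, 7].count(2)
print(positional == named == fstring)   # True
```

Note that the list is rendered through str(), so the resulting string contains a space after each comma.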
The way you are using f-strings, while technically correct, is not pythonic:
f'{a}' + '.' + b + f'({target})'
It seems you are basically using f'{a}' as an alternative to str(a).
To summarize, f'{a}' + '.' + b + f'({target})' can be simplified as f'{a}.{b}({target})'
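Since eval is rarely necessary, it may be worth noting that calling a method whose name is held in a string does not need eval at all. The sketch below uses the built-in getattr with the values from the question; the names method and result are only for illustration.

```python
a = [1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 6, 7]
b = 'count'
target = 2

# Look up the attribute named by the string b on the object a,
# then call the bound method that comes back.
method = getattr(a, b)
result = method(target)
print(result)  # 3

# For comparison: the eval-based version gives the same answer
print(result == eval(f'{a}.{b}({target})'))  # True
```

getattr only performs an attribute lookup, so a string such as 'count' cannot smuggle in arbitrary code the way a string passed to eval can.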

XQuery - Using sum() returns NaN for string values

Trying to sum the total earnings from the top NBA players in 2012-2013 from this wikipedia page: https://en.wikipedia.org/wiki/Highest-paid_NBA_players_by_season
Here is my code:
sum(
let $doc := doc("NBApaid.xml")//table
for $x in $doc
where $x/tr/td/h2/span/@id ="2012.E2.80.932013"
for $y in $x/tr/td
where $y/h2/span = "2012–2013"
for $z in $y//td
where starts-with($z,"$")
let $a := substring($z, 2,10)
return number($a)
)
And the output is:
NaN
The problem here is that number($a) returns a whole column of NaNs.
When I only return $a before converting it using number(), the output looks like this:
30,453,805
20,907,128
19,948,799
19,752,645
19,444,503
19,285,850
19,067,500
19,067,500
18,673,000
18,668,431
How come I can't convert the strings?
Use number(translate(xxx, ',', ''))
The problem is that your output strings are not valid numbers, because the thousands separator , is not part of an XQuery number. So you will have to remove the separator from the string. For this, you can use translate(), as @MichaelKay rightly suggests.
You could also use replace(). The difference between the two functions is that translate() only replaces single characters (which is all you need in this case), while replace() can use a regex. However, I personally feel that replace is a much more logical name and easier to read, so I tend not to use translate().
Also, if your processor supports XQuery 3.1 you can use the arrow notation and write it like this:
let $a := substring($z, 2, 10) => replace(",", "")

Getting precision of a float in Perl?

Let's say I had a Perl variable:
my $string = "40.23";
my $convert_to_num = $string * 1;
Is there a way I can find the precision of this float value? My solution so far was to simply just loop through the string, find the first instance of '.', and just start counting how many decimal places, returning 2 in this case. I'm just wondering if there was a more elegant or built-in function for this sort of thing. Thanks!
Here is an answer for "number of things after the period" in $nstring
length(($nstring =~ /\.(.*)/)[0]);
The matching part first finds . (\.), then matches everything else (.*). Since .* is in parentheses, it is captured and returned as the first element of the list ([0]). The length() function then counts how many characters were captured.
Anything you do in Perl with plain variables depends on the compiler and hardware you use. If you really care about the precision, use
use Math::BigFloat;
and set the desired properties. The number of digits is more properly termed accuracy in Math::BigFloat.
use Math::BigFloat;
Math::BigFloat->accuracy(12);
$n = Math::BigFloat->new("52.12");
print "Accuracy of $n is ", $n->accuracy(), " length ",scalar($n->length()),"\n";
Will return
Accuracy of 52.1200000000 is 12 length 4

String concatenation with spaces

I would like to concatenate strings. I tried using strcat:
x = 5;
m = strcat('is', num2str(x))
but this function removes trailing white-space characters from each string. Is there another MATLAB function to perform string concatenation which maintains trailing white-space?
You can use horzcat instead of strcat:
>> strcat('one ','two')
ans =
onetwo
>> horzcat('one ','two')
ans =
one two
Alternatively, if you're going to be substituting numbers into strings, it might be better to use sprintf:
>> x = 5;
>> sprintf('is %d',x)
ans =
is 5
How about
strcat({' is '},{num2str(5)})
that gives
' is 5'
Have a look at the final example in the strcat documentation. Try using horizontal array concatenation instead of strcat:
m = ['is ', num2str(x)]
Also, have a look at sprintf for more information on string formatting (leading/trailing spaces etc.).
How about using strjoin?
x = 5;
m ={'is', num2str(x)};
strjoin(m, ' ')
What spaces does this not take into account? Only the spaces you haven't mentioned! Did you mean:
m = strcat( ' is ',num2str(x) )
perhaps?
MATLAB isn't going to guess (a) that you want spaces or (b) where to put the spaces it guesses you want.

How to parse a string (by a "new" markup) with R?

I want to use R to do string parsing that (I think) is like a simplistic HTML parsing.
For example, let's say we have the following two variables:
Seq <- "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
Str <- ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."
Say that I want to parse "Seq" according to "Str", using the legend here:
Seq: GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA
Str: >>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.
     |     |  |              | |               |     |               ||     |
     +-----+  +--------------+ +---------------+     +---------------++-----+
        |          Stem 1           Stem 2                Stem 3         |
        |                                                                |
        +----------------------------------------------------------------+
                                      Stem 0
Assume that we always have 4 stems (0 to 3), but that the number of letters before and after each of them can vary.
The output should be something like the following list structure:
list(
  "Stem 0 opening" = "GCCTCGA",
  "before Stem 1" = "TA",
  "Stem 1" = list(opening = "GCTC",
                  inside = "AGTTGGGA",
                  closing = "GAGC"
  ),
  "between Stem 1 and 2" = "G",
  "Stem 2" = list(opening = "TACGA",
                  inside = "CTGAAGA",
                  closing = "TCGTA"
  ),
  "between Stem 2 and 3" = "AGGtC",
  "Stem 3" = list(opening = "ACCAG",
                  inside = "TTCGATC",
                  closing = "CTGGT"
  ),
  "After Stem 3" = "",
  "Stem 0 closing" = "TCGGGGC"
)
I don't have any experience with programming a parser, and would like advice on what strategy to use when programming something like this (and any recommended R commands to use).
What I was thinking of is to first get rid of the "Stem 0", then go through the inner string with a recursive function (let's call it "seperate.stem") that each time will split the string into:
1. before stem
2. opening stem
3. inside stem
4. closing stem
5. after stem
Where the "after stem" will then be recursively entered into the same function ("seperate.stem")
The thing is that I am not sure how to try and do this coding without using a loop.
Any advice will be most welcome.
Update: someone sent me a bunch of questions; here they are.
Q: Does each sequence have the same number of ">>>>" for the opening sequence as it does for "<<<<" on the ending sequence?
A: Yes
Q: Does the parsing always start with a partial stem 0 as your example shows?
A: No. Sometimes it will start with a few "."
Q: Is there a way of making sure you have the right sequences when you start?
A: I am not sure I understand what you mean.
Q: Is there a chance of error in the middle of the string that you have to restart from?
A: Sadly, yes. In which case, I'll need to ignore one of the inner stems...
Q: How long are these strings that you want to parse?
A: Each string has between 60 and 150 characters (and I have tens of thousands of them...)
Q: Is each one a self contained sequence like you show in your example, or do they go on for thousands of characters?
A: each sequence is self contained.
Q: Is there always at least one '.' between stems?
A: No.
Q: A full set of rules as to how the parsing should be done would be useful.
A: I agree. But since I don't have even a basic idea on how to start coding this, I thought first to have some help on the beginning and try to tweak with the other cases that will come up before turning back for help.
Q: Do you have the BNF syntax for parsing?
A: No. Your e-mail is the first time I came across it (http://en.wikipedia.org/wiki/Backus–Naur_Form).
You can simplify the task by using run length encoding.
First, convert Str to be a vector of individual characters, then call rle.
split_Str <- strsplit(Str, "")[[1]]
rle_Str <- rle(split_Str)
Run Length Encoding
lengths: int [1:14] 7 2 4 8 4 1 5 7 5 5 ...
values : chr [1:14] ">" "." ">" "." "<" "." ">" "." "<" "." ">" "." "<" "."
Now you just need to parse rle_Str$values, which is perhaps simpler. For instance, an inner stem will always look like ">" "." "<".
I think the main thing that you need to think about is the structure of the data. Does a "." always have to come between ">" and "<", or is it optional? Can you have a "." at the start? Do you need to be able to generalise to stems within stems within stems, or even more complex structures?
Once you have this solved, constructing your list output should be straightforward.
Also, don't worry about using loops, they are in the language because they are useful. Get the thing working first, then worry about speed optimisations (if you really have to) afterwards.
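To make the run-length idea concrete, here is a rough sketch of the same first step written in Python rather than R, with itertools.groupby standing in for rle. The function name run_length_segments and the tuple layout are assumptions made for illustration, not part of the original answer. It pairs each run in Str with the matching slice of Seq.

```python
from itertools import groupby

seq = "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
struct = ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."

def run_length_segments(seq, struct):
    """Return (symbol, run_length, subsequence) for each run of equal symbols in struct."""
    segments = []
    start = 0
    for symbol, run in groupby(struct):
        length = len(list(run))
        # Slice the letters of seq that sit under this run of symbols
        segments.append((symbol, length, seq[start:start + length]))
        start += length
    return segments

segments = run_length_segments(seq, struct)
for symbol, length, piece in segments:
    print(symbol, length, piece)
```

One thing this makes visible: the last "<" run has length 12, because the closing of Stem 3 (5 symbols) runs straight into the closing of Stem 0 (7 symbols) with no "." between them. A parser built on the runs would therefore need the lengths of the matching ">" runs to split such a merged run.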
