Perl compare operators and stringified "numbers" - string

I've been working a lot with Perl lately, but I still don't really know how <, >, >=, <=, ne, gt, etc. behave on stringified "numbers". By "number" I mean something like: '1.4.5.6.7.8.0'
Correct me if I'm wrong, but I think the following returns true:
if ('1.4.5' > '8.7.8');
because both will be coerced to true (not an empty string).
But how do ne, gt, and the other string operators work on such numbers?
Basically, I'm trying to compare version numbers of the following form:
1.3.4.0.2
I can make a numerical comparison of each number, but first I would rather know how the
string comparison operators behave on such strings.
Thanks,

First: Please use warnings all the time. You would have realized the following at once:
$ perl -wle 'print 1 unless "1.4.5" > "8.7.8"'
Argument "8.7.8" isn't numeric in numeric gt (>) at -e line 1.
Argument "1.4.5" isn't numeric in numeric gt (>) at -e line 1.
The version module has been distributed with Perl since v5.9.0, and it makes comparing version numbers very easy:
use warnings;
use version;
my ($small, $large) = (version->parse('1.4.5'), version->parse('8.7.8'));
print "larger\n" if $small > $large;
print "smaller\n" if $small < $large;

A string comparison will only work if every number between the dots has the same length. A string comparison has no knowledge of numbers and will simply compare dots and digits one by one (as they are all just characters in a string).
There is a CPAN module that does exactly what you are looking for: Sort::Versions

When you compare strings using numerical relation operators <, >, etc., Perl issues a warning if you use warnings. However, Perl will still attempt to convert the strings into numbers. If the string starts with digits, Perl will use these, otherwise the string equates to 0. In your example comparing '1.4.5' and '8.7.8' has the same effect as comparing numbers 1.4 and 8.7.
But for ne, gt, etc. it really doesn't matter if your strings consist of numbers or anything else (including dots). Therefore:
print "greater" if '2.3.4' gt '10.1.2' # prints 'greater' because '2' > '1' stringwise
print "greater" if '02.3.4' gt '10.1.2' # prints nothing because '0' < '1' stringwise
Therefore you can use neither >, <, etc. nor gt, lt, etc. for version comparison. You have to choose a different approach, for example one of those proposed in the other answers.
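To make the point above concrete, here is a minimal sketch (in Python rather than Perl, purely as an illustration) of comparing dotted versions component by component as integers. The helper names version_key and compare_versions are made up for this example, and it assumes every component is a plain non-negative integer.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Turn '1.3.4.0.2' into (1, 3, 4, 0, 2) so components compare as integers."""
    return tuple(int(part) for part in version.split("."))


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1, in the spirit of Perl's <=> and cmp."""
    key_a, key_b = version_key(a), version_key(b)
    return (key_a > key_b) - (key_a < key_b)


# Component-wise comparison sees 2 < 10 ...
print(compare_versions("2.3.4", "10.1.2"))
# ... while a plain string comparison looks only at '2' versus '1'.
print("2.3.4" > "10.1.2")
```

Note that with this sketch a shorter version such as 1.0 sorts before 1.0.0, because tuples of different lengths are compared element by element; whether that is the desired behaviour depends on the versioning scheme.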

I'm not sure about the overhead of this, but you might try Sort::Naturally, and in particular its ncmp function.

As @tent pointed out, @SebastianStumpf's solution is close, but not quite right because:
>perl -Mversion -e 'my @n = ( "1.10", "1.9" ); print "$n[0] is " . ( version->parse($n[0]) > version->parse($n[1]) ? "larger" : "smaller" ) . " than $n[1]\n";'
1.10 is smaller than 1.9
Luckily this is easily solved following the hint in version's documentation:
The leading 'v' is now strongly recommended for clarity, and will
throw a warning in a future release if omitted.
>perl -Mversion -e 'my @n = ( "1.10", "1.9" ); print "$n[0] is " . ( version->parse("v$n[0]") > version->parse("v$n[1]") ? "larger" : "smaller" ) . " than $n[1]\n";'
1.10 is larger than 1.9

Related

XQuery - Using sum() returns NaN for string values

Trying to sum the total earnings from the top NBA players in 2012-2013 from this wikipedia page: https://en.wikipedia.org/wiki/Highest-paid_NBA_players_by_season
Here is my code:
sum(
let $doc := doc("NBApaid.xml")//table
for $x in $doc
where $x/tr/td/h2/span/@id ="2012.E2.80.932013"
for $y in $x/tr/td
where $y/h2/span = "2012–2013"
for $z in $y//td
where starts-with($z,"$")
let $a := substring($z, 2,10)
return number($a)
)
And the output is:
NaN
The problem here is that number($a) returns a whole column of NaNs.
When I only return $a before converting it using number(), the output looks like this:
30,453,805
20,907,128
19,948,799
19,752,645
19,444,503
19,285,850
19,067,500
19,067,500
18,673,000
18,668,431
How come I can't convert the strings?
Use number(translate(xxx, ',', ''))
The problem is that your output strings are not real numbers, because the thousands separator , is not part of an XQuery number. So you will have to remove the separator from the string. For this, you can use translate(), as @MichaelKay rightfully suggests.
You could also use replace(). The difference between the two functions is that translate() only replaces single characters (which is all you need in this case), whereas replace() can use regular expressions. However, I personally find replace a much more logical name and easier to read, so I tend not to use translate().
Also, if your processor supports XQuery 3.1 you can use the arrow notation and write it like this:
let $a := substring($z, 2, 10) => replace(",", "")
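The same idea (strip the thousands separators before converting) can be sketched outside XQuery. The following Python snippet is only an illustration; parse_amount is a hypothetical helper, and the sample values are the first three amounts from the question with the leading $ restored.

```python
def parse_amount(text: str) -> int:
    """Strip a leading '$' and all thousands separators, then convert to int."""
    return int(text.lstrip("$").replace(",", ""))


# First three amounts from the question's output.
amounts = ["$30,453,805", "$20,907,128", "$19,948,799"]
total = sum(parse_amount(amount) for amount in amounts)
print(total)
```

Without the replace step, the conversion fails for the same reason number() returns NaN: a string containing commas is not a valid numeric literal.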

Perl Morgan and a String?

I am trying to solve this problem on hackerrank:
So the problem is:
Jack and Daniel are friends. Both of them like letters, especially upper-case ones.
They are cutting upper-case letters from newspapers, and each one of them has their collection of letters stored in separate stacks.
One beautiful day, Morgan visited Jack and Daniel. He saw their collections. Morgan wondered what is the lexicographically minimal string, made of that two collections. He can take a letter from a collection when it is on the top of the stack.
Also, Morgan wants to use all the letters in the boys' collections.
This is my attempt in Perl:
#!/usr/bin/perl
use strict;
use warnings;
chomp(my $n=<>);
while($n>0){
chomp(my $string1=<>);
chomp(my $string2=<>);
lexi($string1,$string2);
$n--;
}
sub lexi{
my($str1,$str2)=@_;
my @str1=split(//,$str1);
my @str2=split(//,$str2);
my $final_string="";
while(@str2 && @str1){
my $st2=$str2[0];
my $st1=$str1[0];
if($st1 le $st2){
$final_string.=$st1;
shift @str1;
}
else{
$final_string.=$st2;
shift @str2;
}
}
if(@str1){
$final_string=$final_string.join('',@str1);
}
else{
$final_string=$final_string.join('',@str2);
}
}
print $final_string,"\n";
}
Sample Input:
2
JACK
DANIEL
ABACABA
ABACABA
The first line contains the number of test cases, T.
Every next two lines have such format: the first line contains string A, and the second line contains string B.
Sample Output:
DAJACKNIEL
AABABACABACABA
It gives the right results for the sample test case, but wrong results for other test cases. One case for which it gives an incorrect result is
1
AABAC
AACAB
It outputs AAAABACCAB instead of AAAABACABC.
I don't know what is wrong with the algorithm and why it is failing with other test cases?
Update:
As per @squeamishossifrage's comments, if I add
($str1,$str2)=sort{$a cmp $b}($str1,$str2);
the results become the same irrespective of the order of the inputs, but the test case still fails.
The problem is in your handling of the equal characters. Take the following example:
ACBA
BCAB
When faced with two identical characters (C in my example), you naïvely chose the one from the first string, but that's not always correct. You need to look ahead to break ties. You may even need to look many characters ahead. In this case, next character after C of the second string is lower than the next character of the first string, so you should take the C from the second string first.
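By leaving the strings as strings, a simple string comparison will compare as many characters as needed to determine which character to consume. The one catch is a string that is a prefix of the other (e.g. C versus CAB): a plain le treats the shorter string as the smaller one, but in that case the longer string must be consumed first. Appending a sentinel that sorts after every upper-case letter (~ below) to both sides of the comparison handles that case.
sub lexi {
my ($str1, $str2) = @_;
utf8::downgrade($str1); # Makes sure length() will be fast
utf8::downgrade($str2); # since we only have ASCII letters.
my $final_string = "";
while (length($str2) && length($str1)) {
# "~" sorts after "A".."Z", so a string that runs out first compares as larger.
$final_string .= substr("$str1~" le "$str2~" ? $str1 : $str2, 0, 1, '');
}
$final_string .= $str1;
$final_string .= $str2;
print $final_string, "\n";
}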
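For readers who want to trace the look-ahead idea step by step, here is a small Python sketch of the same merge. It is an illustration under assumptions, not the original answer's code: merge_minimal and SENTINEL are made-up names, the input is assumed to be upper-case ASCII letters, and the sentinel "~" is chosen only because it sorts after "A".."Z", so that a string which is a prefix of the other compares as larger. Slicing on every step makes it quadratic, which is fine for a demonstration but not for large inputs.

```python
SENTINEL = "~"  # assumption: input is 'A'..'Z' only, and '~' sorts after all of them


def merge_minimal(a: str, b: str) -> str:
    """Merge two stacks of letters into the lexicographically smallest string."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        # Compare the whole remaining suffixes, not just the top letters,
        # so ties are broken by looking as far ahead as necessary.
        if a[i:] + SENTINEL <= b[j:] + SENTINEL:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    # At most one of these is non-empty.
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


print(merge_minimal("JACK", "DANIEL"))
print(merge_minimal("AABAC", "AACAB"))
```

The second call is the failing case from the question; comparing only the top letters yields AAAABACCAB there.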
Too little rep to comment, hence the answer:
What you need to do is look ahead when the two characters match. You currently do a simple le comparison, and in the case of
ZABB
ZAAA
you'll get ZABBZAAA, since the first Z is le the other Z. So what you need to do (a naive solution which most likely won't be very efficient) is to keep looking as long as the strings/chars match:
Z eq Z
ZA eq ZA
ZAB gt ZAA
At that point you know that the second string is the one you want to pop the first character from.
Edit
You updated the question to sort the strings, but as I wrote, you still need to look ahead. Sorting solves the two strings above but fails with these two:
ZABAZA
ZAAAZB
which give
ZAAAZBZABAZA
whereas the correct answer is ZAAAZABAZAZB, and you can't find that by simply comparing character by character.

as.numeric with comma decimal separators?

I have a large vector of strings of the form:
Input = c("1,223", "12,232", "23,0")
etc. That's to say, decimals separated by commas, instead of periods. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.
My first instinct would be to go to strsplit, but it seems to me that this will likely be very slow. Does anyone have any idea of a faster option?
There's an existing question that suggests read.csv2, but the strings in question are not directly read in that way.
as.numeric(sub(",", ".", Input, fixed = TRUE))
should work.
The readr package has a function to parse numbers from strings. You can set many options via the locale argument.
For comma as decimal separator you can write:
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
scan(text=Input, dec=",")
## [1] 1.223 12.232 23.000
But it depends on how long your vector is. I used rep(Input, 1e6) to make a long vector and my machine just hangs. 1e4 is fine, though. @adibender's solution is much faster; on 1e4 elements the difference is large:
Unit: milliseconds
         expr        min         lq     median         uq        max neval
  adibender()   6.777888   6.998243   7.119136   7.198374   8.149826   100
 sebastianc() 504.987879 507.464611 508.757161 510.732661 517.422254   100
Also, if you are reading in the raw data, the read.table and all the associated functions have a dec argument. eg:
read.table("file.txt", dec=",")
When all else fails, gsub and sub are your friends.
Building on @adibender's solution:
input = '23,67'
as.numeric(gsub(
# ONLY for strings containing numerics, comma, numerics
"^([0-9]+),([0-9]+)$",
# Substitute by the first part, dot, second part
"\\1.\\2",
input
))
I guess that is a safer match...
As stated by , it's way easier to do this while importing a file.
The recently released readr package has a very useful feature, locale, well explained here, that allows the user to import numbers with a comma decimal mark by passing locale = locale(decimal_mark = ",") as an argument.
The answer by adibender does not work when there are multiple commas.
In that case the suggestion from use554546 and answer from Deena can be used.
Input = c("1,223,765", "122,325,000", "23,054")
as.numeric(gsub("," ,"", Input))
output:
[1] 1223765 122325000 23054
The function gsub replaces all occurrences. The function sub replaces only the first.
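The difference between replacing only the first comma and replacing all of them is easy to see in a small sketch. This is Python rather than R, purely for illustration; parse_decimal_comma and parse_thousands are hypothetical helpers that mirror the sub and gsub approaches above.

```python
def parse_decimal_comma(text: str) -> float:
    """'12,232' -> 12.232: swap the first (and only expected) comma for a point."""
    return float(text.replace(",", ".", 1))


def parse_thousands(text: str) -> int:
    """'1,223,765' -> 1223765: drop every comma."""
    return int(text.replace(",", ""))


print([parse_decimal_comma(s) for s in ["1,223", "12,232", "23,0"]])
print([parse_thousands(s) for s in ["1,223,765", "122,325,000", "23,054"]])

# Replacing only the first comma breaks on input with several commas:
try:
    parse_decimal_comma("1,223,765")
except ValueError as error:
    print("cannot convert:", error)
```

Which of the two helpers is right depends entirely on what the comma means in the data, which is why the answers above disagree.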

Getting precision of a float in Perl?

Let's say I had a Perl variable:
my $string = "40.23";
my $convert_to_num = $string * 1;
Is there a way I can find the precision of this float value? My solution so far was to simply just loop through the string, find the first instance of '.', and just start counting how many decimal places, returning 2 in this case. I'm just wondering if there was a more elegant or built-in function for this sort of thing. Thanks!
Here is an answer for "number of things after the period" in $nstring
length(($nstring =~ /\.(.*)/)[0]);
The matching part first finds . (\.), then matches everything else (.*). Since .* is in parentheses, it is returned as the first array element ([0]). Then I count how many with the length() function.
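The same "count what follows the period" idea, sketched in Python for comparison. decimal_places is a made-up helper name; like the regex above, it works on the string form of the number, so it reports the digits as written rather than anything about the floating-point value.

```python
def decimal_places(number_text: str) -> int:
    """Count the characters after the first '.' in the textual form of a number."""
    _, _, fraction = number_text.partition(".")
    return len(fraction)


print(decimal_places("40.23"))
print(decimal_places("40"))
```

Working on the original string matters here: once the value has been converted to a float, trailing zeros and the exact digits typed are no longer available.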
Anything you do in Perl with plain variables will be dependent on the compiler and hardware you use. If you really care about the precision, use Math::BigFloat and set the desired properties. The number of digits is more properly termed accuracy in Math::BigFloat.
use Math::BigFloat;
Math::BigFloat->accuracy(12);
my $n = Math::BigFloat->new("52.12");
print "Accuracy of $n is ", $n->accuracy(), " length ",scalar($n->length()),"\n";
Will return
Accuracy of 52.1200000000 is 12 length 4

Any other ways to emulate `tr` in J?

I picked up J a few weeks ago, about the same time the CodeGolf.SE beta opened to the public.
A recurrent issue (of mine) when using J over there is reformatting input and output to fit the problem specifications. So I tend to use code like this:
( ] ` ('_'"0) ) @. (= & '-')
This one is untested for various reasons (edit me if wrong); the intended meaning is "convert - to _". Other conversions that come up frequently: newlines to spaces (and the converse), merging numbers with j, changing brackets.
This takes up quite a few characters, and is not that convenient to integrate to the rest of the program.
Is there any other way to proceed with this? Preferably shorter, but I'm happy to learn anything else if it's got other advantages. Also, a solution with an implied functional obverse would relieve a lot.
It sometimes goes against the nature of code golf to use library methods, but in the string library, the charsub method is pretty useful:
'_-' charsub '_123'
-123
('_-', LF, ' ') charsub '_123', LF, '_stuff'
-123 -stuff
rplc is generally short for simple replacements:
'Test123' rplc 'e';'3'
T3st123
Amend m} is very short for special cases:
'*' 0} 'aaaa'
*aaa
'*' 0 2} 'aaaa'
*a*a
'*&' 0 2} 'aaaa'
*a&a
but becomes messy when the list has to be a verb:
b =: 'abcbdebf'
'L' (]g) } b
aLcLdeLf
where g has to be something like g =: ('b' E. ]) # ('b' E. ]) * [: i. #.
There are a lot of other "tricks" that work on a case by case basis. Example from the manual:
To replace lowercase 'a' through 'f' with uppercase 'A'
through 'F' in a string that contains only 'a' through 'f':
('abcdef' i. y) { 'ABCDEF'
Extending the previous example: to replace lowercase 'a' through
'f' with uppercase 'A' through 'F' leaving other characters unchanged:
(('abcdef' , a.) i. y) { 'ABCDEF' , a.
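The index-then-select trick in the last example can be spelled out in a few lines of Python, which may help when reading the J. This is only a sketch: tr is a hypothetical helper, and the 256-character alphabet is an assumed stand-in for J's a., so characters outside that range are not handled.

```python
def tr(src: str, dst: str, text: str) -> str:
    """Map each character of src to the matching character of dst, leaving others alone."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    alphabet = "".join(chr(code) for code in range(256))  # stand-in for J's a.
    lookup_from = src + alphabet   # like ('abcdef' , a.)
    lookup_to = dst + alphabet     # like ('ABCDEF' , a.)
    # index() finds the first occurrence, so entries in src win over the alphabet.
    return "".join(lookup_to[lookup_from.index(ch)] for ch in text)


print(tr("abcdef", "ABCDEF", "deadbeef xyz"))
print(tr("_", "-", "_123"))
```

Prepending the replacement pairs to the full alphabet is what lets unmatched characters fall through unchanged: they are found at the same offset in both lookup strings.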
I've only dealt with the newlines and CSV, rather than the general case of replacement, but here's how I've handled those. I assume Unix line endings (or line endings fixed with toJ) with a final line feed.
Single lines of input: ".{:('1 2 3',LF) (Haven't gotten to use this yet)
Rectangular input: (".;._2) ('1 2 3',LF,'4 5 6',LF)
Ragged input: probably (,;._2) or (<;._2) (Haven't used this yet either.)
One line, comma separated: ".;._1}:',',('1,2,3',LF)
This doesn't replace tr at all, but does help with line endings and other garbage.
You might want to consider using the 8!:2 foreign:
8!:2]_1
-1

Resources