Python float() limitation on scientific notation - python-3.x

python 3.6.5
numpy 1.14.3
scipy 1.0.1
cerberus 1.2
I'm trying to convert a string '6.1e-7' to a float 0.00000061 so I can save it in a MongoDB field.
My problem here is that float('6.1e-7') doesn't work (it works for float('6.1e-4'), but not for float('6.1e-5') and smaller).
Python float
I can't seem to find any information about why this happens, or about float limitations, and every example I found shows a conversion at e-3, never beyond that.
Numpy
I installed Numpy to try float96()/float128()... float96() doesn't exist, and float128() returns a float '6.09999999999999983e-07'.
Format
I tried format(6.1E-07, '.8f'), which works, as it returns a string '0.00000061', but when I convert the string back to a float (so it can pass cerberus validation) it reverts back to 6.1e-07.
Any help on this subject would be greatly appreciated.
Thanks

'6.1e-7' is a string:
>>> type('6.1e-7')
<class 'str'>
While 6.1e-7 is a float:
>>> type(6.1e-7)
<class 'float'>
0.00000061 is the same as 6.1e-7
>>> 0.00000061 == 6.1e-7
True
And, internally, this float is represented by 0's and 1's. That's just yet another representation of the same float.
However, when converted into strings, they're no longer compared as numbers; they are just compared as characters:
>>> '0.00000061' == '6.1e-7'
False
And you can't compare strings with numbers either:
>>> 0.00000061 == '6.1e-7'
False
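For completeness, the conversion itself works; only the default textual representation uses scientific notation. A minimal sketch (the '.8f' format spec is just one illustrative choice for display):
value = float('6.1e-7')       # the conversion succeeds
print(value == 0.00000061)    # True -- the very same float
print(repr(value))            # '6.1e-07' -- the default repr happens to use scientific notation
print(format(value, '.8f'))   # '0.00000061' -- a fixed-point string, for display only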

Your problem description is hard to follow precisely, so I'll try to read between the lines a bit.
In their internal format, numbers don't keep any formatting information; neither integers nor floats do. For an integer 123, you can't recover whether it was entered as "123", " 123 " (with lots of spaces before and after it), 000000123 or +0123. For a floating-point number, 0.1, +0.0001e00003, 1.000000e-1 and myriads of other forms can be used. Internally, they all result in the same number.
(There are some specifics with it when you use IEEE754 "decimal floating", but I am sure it is not your case.)
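A one-line check of that claim (using float() on the string forms):
print(float('0.1') == float('+0.0001e00003') == float('1.000000e-1'))   # True -- all parse to the same float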
When saving to a database, the internal representation stops mattering much. Instead, the database's specifics start playing a role, and they can be quite different. For example, SQL suggests column types like numeric(10,4), and each value is converted to a decimal format corresponding to the column type (typically stored on disk as a text string, with or without a decimal point). In MongoDB, you can keep a floating-point value either as a JSON number (an IEEE 754 double) or as text. Each variant has its own specifics, but if you choose text, it is your own responsibility to provide proper formatting each time you produce that text. You want to see a fixed-point decimal number with 8 digits after the point? No problem: just format it with %.8f each time you prepare that representation.
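As a sketch of the two options (the field name "concentration" is made up for illustration; the pymongo insert call is omitted):
value = 6.1e-7
doc_as_number = {"concentration": value}                 # stored by MongoDB as an IEEE 754 double
doc_as_text = {"concentration": format(value, '.8f')}    # '0.00000061' -- formatting is your job on every write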
The issues with representation selection are:
Uniqueness: no different forms should exist for the same value. Otherwise you can, for example, store the same contents under multiple keys, and then mistake an older one for the latest one.
Ordering awareness: the DB should be able to provide the natural order of values, for requests like "the ceiling key-value pair".
If you always format values using %.8f, you get uniqueness, but not ordering. The same goes for %g, %e and really any other text format, except special (not human-readable) ones constructed specifically to preserve ordering. If you need ordering, just use numbers as numbers, and don't worry about how they look in text form.
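A quick illustration of why fixed-point text breaks ordering (values chosen arbitrarily):
a, b = 9.0, 10.0
sa, sb = format(a, '.8f'), format(b, '.8f')   # '9.00000000', '10.00000000'
print(a < b)     # True  -- numeric order
print(sa < sb)   # False -- lexicographically, '10...' sorts before '9...'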
(And your problem is not tied to numpy.)

Related

How is this error possible and what can be done about it? "ValueError: invalid literal for int() with base 10: '1.0'"

I'm using Python 3 with the pandas library and some other data science libraries. After running into a variety of subtle type errors while simply trying to compare values across two columns that should contain like integer values in a single pandas DataFrame (although the interpreter seemingly interprets the types as float, string, or Series almost at random), I'm now running into this inexplicable-seeming error while attempting to cast back to integer. The cast happens after converting the values to string to strip out blank spaces introduced much further upstream in the program flow (presumably by pandas' internal processing, because my code tries to keep the type int throughout).
ValueError: invalid literal for int() with base 10: '1.0'
The main problem I have with this error message is that there should be no reason a conversion to int should ever blow up on the value '1.0'. Taking the error message at face value, it makes no sense to me and seems like a deeper problem or bug in pandas.
But ignoring more fundamental problems or bugs in Python or pandas, any help resolving this in a generalizable way that will play nice consistently in every reasonable scenario (behaving more like strongly-typed, type-safe code, basically) would be appreciated.
Here's the bit of code where I'm trying to deal with all the various type conversion and blank value issues I've bumped into at once, because I've gone round and around on this a few times in subtly different scenarios, and every time I thought I'd finally bullet-proofed this bit of code and gotten it working as intended in every case, some new unexpected type conversion issue like this crops up.
df[getVariableLabel(outvar)] = df[getVariableLabel(outvar)].astype(str).str.strip()
df['prediction'] = df['prediction'].astype(str).str.strip()
actual = np.array(df[getVariableLabel(outvar)].fillna(-1).astype(int))
# this is the specific line that throws the error
predicted = np.array(df['prediction'].fillna(-1).astype(int))
For further context on the code above, the "df" object is a pandas DataFrame passed in as a parameter. "getVariableLabel" is a helper function used to format a dynamic field name. Both columns contain simple "1" and "0" values, except where there may be NaN/blanks (which I'm attempting to fill with dummy values).
It doesn't really have to be a conversion to int for my needs. String values would be fine too, if it were possible to keep pandas/Python from arbitrarily treating one series as ints and the other as floats before the conversion to string, which makes value comparisons between the two sets of values fail.
Here's the bit of the call stack dump where pandas is throwing the error, for further context:
File "C:\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py",
line 874, in astype_nansafe
return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
File "pandas_libs\lib.pyx", line 560, in
pandas._libs.lib.astype_intsafe
Solved it for myself with the following substitution, in case anyone else runs into this. It may also have helped that I updated pandas from 1.0.1 to 1.0.2, since that update did include some type-conversion bug fixes, but more likely it was this workaround (where pd is, of course, the alias for the pandas library):
df[getVariableLabel(outvar)] = pd.to_numeric(df[getVariableLabel(outvar)])
df['prediction'] = pd.to_numeric(df['prediction'])
The original value error message is still confusing and seems like a bug but this worked in my particular case.
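For anyone else hitting this, here is a minimal sketch of how the error arises and why going through a numeric dtype avoids it (the column contents are assumed from the description above):
import pandas as pd

s = pd.Series([1, 0, None])         # the missing value forces a float64 dtype
s_str = s.fillna(-1).astype(str)    # ['1.0', '0.0', '-1.0'] -- note the trailing '.0'
# s_str.astype(int)                 # would raise: invalid literal for int() with base 10: '1.0'
print(pd.to_numeric(s_str).astype(int).tolist())   # [1, 0, -1]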

Is there any difference between using/not using "astype(np.float)" for narray?

I'm importing a txt file which contains numbers only, for some coding practice.
I noticed that I can get the same result with either code_1 or code_2:
code_1 = np.array(pd.read_csv('e:/data.txt', sep='\t', header=None)).astype(np.float)
code_2 = np.array(pd.read_csv('e:/data.txt', sep='\t', header=None))
So I wonder if there is any difference between using or not using .astype(np.float)?
Please tell me if there is a similar question. Thanks a lot.
The DataFrame.astype() method is used to cast a pandas object to a specified dtype. astype() also provides the capability to convert any suitable existing column to categorical type.
DataFrame.astype() comes in very handy when we want to cast a particular column's data type to another data type.
In your case, the file is loaded as a DataFrame. The numbers are loaded as integers or floats depending on the values. The astype(np.float) call converts the numbers to floats. On the other hand, if the numbers are already of float type, then, as you saw, there is no difference between the two.
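A small sketch of the case where it does matter (the file contents here are made up, and np.float is simply an alias for the builtin float in that NumPy version):
import numpy as np
import pandas as pd
from io import StringIO

txt = "1\t2\n3\t4\n"                 # stand-in for 'e:/data.txt' containing whole numbers only
as_loaded = np.array(pd.read_csv(StringIO(txt), sep='\t', header=None))
as_float = as_loaded.astype(float)   # same effect as .astype(np.float)
print(as_loaded.dtype)               # int64 -- whole numbers load as integers
print(as_float.dtype)                # float64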

Discrepancies in Python hard coding string vs str() methods

Okay. Here is my minimal working example. When I type this into python 3.6.2:
foo = '0.670'
str(foo)
I get
>>>'0.670'
but when I type
foo = 0.670
str(foo)
I get
>>>'0.67'
What gives? It is stripping off the zero, which I believe has to do with how floats are represented on a computer in general. But when using the str() method, why does it retain the extra 0 in the first case?
You are mixing strings and floats. A string is a sequence of code points (one code point represents one character) representing some text, and the interpreter processes it as text. A string is always written inside single or double quotes (e.g. 'Hello'). A float is a number, and Python knows that, so it also knows that 1.0000 is the same as 1.0.
In the first case you saved a string into foo. Calling str() on a string just takes the string and returns it as is.
In the second case you saved 0.670 as a float (because it's not wrapped in quotes). When Python converts a float into a string, it always tries to create the shortest string that still round-trips back to the same float.
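A quick way to see both behaviours at once, as a minimal sketch:
x = 0.670
print(str(x))             # '0.67' -- the shortest string that round-trips to the same float
print(format(x, '.20f'))  # an expanded view of the stored binary approximation (not exactly 0.67)
print(float('0.67') == x) # True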
Why does Python automatically truncate the trailing zero?
When you save a real number into the computer's memory, it has to be converted into a binary representation. Usually (though there are some exceptions) it is saved in the format described by the IEEE 754 standard, and Python uses it for floats too.
Let's look at an example:
from struct import pack
x = -1.53
y = -1.53000
print("X:", pack(">d", x).hex())
print("Y:", pack(">d", y).hex())
The pack() function takes an input and, based on the given format (>d), converts it into bytes. In this case it takes a float and shows us how it is stored in memory. If you run the code you will see that x and y are stored in memory in exactly the same way. The memory doesn't contain any information about the format in which the number was written.
Of course you could store some information about the original formatting, but:
It would take extra memory, and it's good practice to use only as much memory as you actually need.
What would be the result of 0.10 + 0.1? Should it be 0.2 or 0.20?
For scientific purposes and significant figures, should it keep the value exactly as the user wrote it?
It doesn't matter how you wrote the input number. What matters is what format you want to use for presenting it. As I said, str() always tries to create the shortest string possible. str() is fine for simple scripts or tests. For scientific purposes (or wherever a specific representation is required) you can convert your numbers to strings however you want or need.
For example:
x = -1655484.4584631
y = 42.0
# always print the number with a sign and exactly 5 digits after the decimal point
print("{:+.5f}".format(x)) # -1655484.45846
print("{:+.5f}".format(y)) # +42.00000
# always print the number in scientific format with 2 decimal digits (the sign is shown only when the number is negative)
print("{:-.2e}".format(x)) # -1.66e+06
print("{:-.2e}".format(y)) # 4.20e+01
For more information about formatting numbers and other types, look at the Format Specification Mini-Language in Python's documentation.

JsonSlurper avoid trimming last zero in a string

I am using JsonSlurper in groovy to convert a json text to a map.
def slurper = new JsonSlurper();
def parsedInput = slurper.parseText("{amount=10.00}");
Result is
[amount:10.0]
I need result without trimming last zero. Like
[amount:10.00]
I have checked various solutions, but the value keeps getting converted with the last zero trimmed. Am I missing something here?
One of the ways I have found is to give input as:
{amount="10.00"}
In numbers and maths, 10.00 IS 10.0.
They are exactly the same number.
They just have different String representations.
If you need to display 10.0 to the user as 10.00, then that is a conversion concern, as you will need to convert it to a String with 2 decimal places.
Something like:
def stringRepresentation = String.format("%.02f", 10.0)
But for any calculations, 10.0 and 10.00 are the same thing
Edit -- Try again...
Right so when you have the json:
{"amount"=10.00}
The value on the right is a floating point number.
To keep the extra zero (which is normally dropped by every sane representation of numbers), you will need to convert it to a String.
To do this, you can use String.format as above (other methods are available).
You cannot keep it as a floating-point number with an extra zero.
Numbers don't work like that in any language I can think of... they might in COBOL, from the back of my memory, but that's way off track.
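For a parallel illustration in Python (used here only because the rest of this page is Python, not as a claim about Groovy itself): JSON parsing likewise turns 10.00 into the plain number 10.0, and a formatted string is what restores the two decimal places:
import json

parsed = json.loads('{"amount": 10.00}')
print(parsed)                             # {'amount': 10.0} -- the trailing zero is gone
print('{:.2f}'.format(parsed['amount']))  # '10.00' -- restored only as a display string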
The issue (GROOVY-6922) was fixed in Groovy version 2.4.6. With 2.4.6 the scale of the number should be retained.

In Python, how can I print a dictionary containing large numbers without an 'L' being inserted after the large numbers?

I have a dictionary as follows
d={'apples':1349532000000, 'pears':1349532000000}
Doing either str(d) or repr(d) results in the following output
{'apples': 1349532000000L, 'pears': 1349532000000L}
How can I get str, repr, or print to display the dictionary without it adding an L to the numbers?
I am using Python 2.7
You can't with str() or repr() of the dict, because printing a dict uses repr() of its contents, and in Python 2 repr() of a long integer includes the L suffix. Those numbers are too large for a plain int on your build, so they are stored as longs, and repr() aims to produce output that reads back as the same object, L suffix included.
Well, this is a little embarrassing; I just found a solution after quite a few attempts :-P
One way to do this (which suits my purpose) is to use json.dumps() to convert the dictionary to a string.
d={'apples':1349532000000, 'pears':1349532000000}
import json
json.dumps(d)
Outputs
'{"apples": 1349532000000, "pears": 1349532000000}'
