Python data precision loss when using pandas - python-3.x

I have a function in Python like the one below:
def format_quantities(x, prec=2):
    x = float(abs(x)) if x is not None else 0
    return '{:,.{prec}f}'.format(x, prec=prec)
I am using it to format some quantities, and I see weird behaviour. It works fine when used on a single value, like format_quantities(12345.67, prec=2).
But when I use the same function in pandas with apply and a lambda, I lose precision: 1234567.67 becomes 1234567.66. I see this only when the function is called through pandas apply. Can someone please advise why this occurs? I am not sure whether builtins like float and abs behave differently in pandas, causing this precision loss.
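float and abs are the same builtins inside apply, so any difference has to come from the value that actually reaches the function. The following is a minimal diagnostic sketch, not a confirmed explanation of the 1234567.66 result. The helpers exact_value and as_float32 are hypothetical names introduced here, and the float32 round trip is only an assumed example of how a narrower column dtype can change the digits.

```python
import struct
from decimal import Decimal

def format_quantities(x, prec=2):
    # Same logic as the function in the question, with None mapped to 0.0.
    x = float(abs(x)) if x is not None else 0.0
    return '{:,.{prec}f}'.format(x, prec=prec)

def exact_value(x):
    # Decimal(float) exposes the exact binary value that is stored,
    # without the rounding that print() or format() applies.
    return Decimal(float(x))

def as_float32(x):
    # Round-trip through 4 bytes to mimic what a float32 column would hold
    # (assumption: the real column dtype is unknown).
    return struct.unpack('f', struct.pack('f', x))[0]

print(format_quantities(1234567.67))   # 1,234,567.67
print(exact_value(1234567.67))         # the exact float64 value behind the literal
print(as_float32(1234567.67))          # 1234567.625
```

Calling exact_value on the values passed by apply, and checking the column's dtype, should show whether the input already differs from 1234567.67 before formatting.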

Related

pandas style changes ceil value unexpectedly

I'm using style in pandas to display a dataframe containing a timestamp in a Jupyter notebook.
The displayed value, 1623838447949609984, turned out to be different from the input, 1623838447949609899.
The pandas version is 1.4.2.
Can someone please explain the reason for the output of the following code?
Thanks.
import pandas as pd
pd.DataFrame([[1623838447949609899]]).style
Pandas Styler, within its render script, contains the line return f"{x:,.0f}" when x is an integer.
In python if you execute
>>> "{:.0f}".format(1623838447949609899)
'1623838447949609984'
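you obtain the result you cite. This is not an integer storage issue: the f format code converts the integer to a 64-bit float, which has only 53 bits of significand, so an integer this large is rounded to the nearest representable float. It is Python's float formatting at work, not anything specific to Styler.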
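The rounding can be reproduced without pandas. The sketch below relies only on the fact that the f format code converts an int to a 64-bit float before formatting, while the d format code keeps the integer exact. The variable names are illustrative.

```python
n = 1623838447949609899

# Between 2**60 and 2**61, adjacent 64-bit floats are 256 apart,
# so n is rounded to the nearest representable value.
as_float = float(n)
print(int(as_float))           # 1623838447949609984

# The 'f' format code goes through that float conversion.
print("{:.0f}".format(n))      # 1623838447949609984

# The 'd' format code formats the integer directly and stays exact.
print("{:d}".format(n))        # 1623838447949609899
print("{:,d}".format(n))       # 1,623,838,447,949,609,899
```

Formatting with d, or converting the value to str before display, avoids the float conversion.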

python hack converting floats to decimals

I've written a large program, with dependencies on libraries written in my lab. I'm getting wrong (and somewhat random) results, which are caused by floating-point errors.
I would like to do some python magic and change all floats to decimals, or some other more precise type.
I can't write the full code here, but following is the general flow -
def run(n):
    ...
    x = 0.5  # initializes as float
    for _ in range(n):
        x = calc(x)
    ...
    return x
What I'm trying to avoid is going over every initialization in the code and adding a manual cast to decimal.
Is there a trick to make Python initialize all floats in lines such as x = 0.5 as decimals? Or could I use a custom interpreter that has more exact floats?
Thanks,
I can't post the full code, hope my edit makes it clearer.
I think you can use this:
from decimal import Decimal
Decimal(variable)
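One caveat with Decimal(variable): if the variable is already a float, the Decimal inherits its binary rounding error. A small sketch of the difference, using only the standard decimal module; the variable names are illustrative:

```python
from decimal import Decimal

# Built from a float: carries the binary approximation of 0.1.
from_float = Decimal(0.1)

# Built from a string: exactly one tenth.
from_string = Decimal('0.1')

print(from_float == from_string)                           # False
print(0.1 + 0.2 == 0.3)                                    # False
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))   # True

# 0.5 is exactly representable in binary, so both constructions agree.
print(Decimal(0.5) == Decimal('0.5'))                      # True
```

So converting an existing float late in the computation does not recover precision that was already lost. Initial values would need to be created from strings or integers.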

float precision with Numba

I'm currently using Numba to speed up calculations, and so far it's going pretty well. I've run into a small issue: print('{:.4f}'.format(B)) fails because Numba does not support f-strings or str.format. Has anyone been able to work around this? Is there a NumPy function that changes the precision, would converting all float values to 32-bit floats help, or would it be easier to write a function that changes the precision of the float?
I am using @njit, and I understand that @jit defaults to object mode, but @njit significantly decreases run time in my case, which is my goal.
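One possible workaround, sketched here under assumptions, is to keep string formatting out of the compiled function: return the float from the @njit function and format it in ordinary Python. The function mean_of_squares is a hypothetical stand-in for the real calculation, and the import fallback only exists so the sketch also runs where Numba is not installed.

```python
try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: a no-op decorator.
    def njit(func):
        return func

@njit
def mean_of_squares(n):
    # Numeric work only; no string formatting inside the compiled code.
    total = 0.0
    for i in range(1, n + 1):
        total += i * i
    return total / n

# Formatting happens in regular Python, outside the jitted function.
B = mean_of_squares(4)          # (1 + 4 + 9 + 16) / 4 = 7.5
print('{:.4f}'.format(B))       # 7.5000
```

Formatting only changes how the value is displayed. It does not change the precision of the float used in the computation.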

How to convert a cupy.ndarray to a scalar?

I'm working with CuPy at the moment. I've noticed something that seems to be a bug, but I'm not sure.
I've noticed that certain math functions return a result with the wrong shape. For example,
if I do this using NumPy, it returns a scalar with no shape, as it should:
s8 = numpy.abs(2)
However if I use CuPy:
s8 = cupy.abs(2)
It results in cupy.ndarray. When evaluated with s8.shape, it returns ()
The frustrating part is trying to convert cupy.ndarray into a scalar with no shape. I've tried using:
cupy.squeeze(s8) and s8.item() to return just the scalar. In both cases it returns the value but retains the shape. I've also tried using float(s8) and int(s8). Both return the appropriate value type, but the array still remains.
Any thoughts on how to convert this from a cupy.ndarray to a scalar?
In 2022 a working link is https://docs.cupy.dev/en/stable/user_guide/difference.html#reduction-methods but the answer is to explicitly cast the value, e.g. use int(cupy.abs(2)) or float(cupy.abs(2.0)).
Also .item() is used in some code, e.g.:
s8 = cupy.abs(2)
s8 = s8.item()
This is a known behavior in CuPy and is not a bug:
https://docs.cupy.dev/en/stable/reference/difference.html#reduction-methods
Unlike in NumPy (which makes the distinction between scalars and 0-d arrays), all scalars in CuPy are 0-d arrays, otherwise it would unavoidably lead to a data transfer by converting them to Python scalars, which compromises the performance.

How to create random distributions?

I see similar posts, though none quite matches, and I can't seem to adapt any of them to Python. This post suggests an R solution using method="SANN" in optim(), but I can't seem to duplicate it in Python. I want to create a model/plot using 10 known parameters (n, min, max, mean, sd, 25%, 50%, 75%, skew, and kurtosis) WITHOUT using a distribution family. I can do it fine with a distribution family, for example:
x = np.linspace(39,401,100)
plt.plot(x, scipystats.???.pdf(x,p[0],p[1],p[2],p[3],p[4],p[5],p[6],p[7],p[8],p[9]))
Then use the model for simulation using a random generator. For example:
ModDist = ???.rvs(p0=n,p1=min,p2=max,p3=m,p4=sd,p5=25,p6=50,p7=75,p8=s,p9=k,rs=3)
It does not have to be only scipy if pandas, numpy, etc.., or combined can work.
Any ideas please?
