float precision with Numba


I'm currently using Numba to speed up calculations, and so far it's going well, but I've run into a small issue: print('{:.4f}'.format(B)) does not work, because Numba does not support f-strings or str.format. Has anyone found a way around this? Is there a NumPy function that changes the precision, should I convert all float values to 32-bit floats, or would it be easier to write a function that changes the precision of the float?
I am using @njit. I understand that @jit can fall back to object mode, but @njit significantly reduces the run time in my case, which is my goal.
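Since str.format and f-strings are not available in nopython mode, a common workaround is to keep the formatting outside the jitted code: the @njit function returns the number and regular Python formats it. If the value itself should carry fewer decimals, round() with an ndigits argument can be called inside the jitted function; this changes the value, not how it is printed, and converting to float32 would only make the value less precise. A minimal sketch, assuming compute_b is a hypothetical stand-in for the real calculation; the try/except fallback only lets the sketch run where Numba is not installed:

```python
try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba; the real speed-up needs Numba installed
    def njit(func):
        return func


@njit
def compute_b(n):
    # Hypothetical stand-in for the expensive calculation: sum of 1/i^2 for i = 1..n
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (i * i)
    return total


@njit
def compute_b_rounded(n):
    # round() with ndigits works in nopython mode; it rounds the value, it does not format it
    return round(compute_b(n), 4)


# Formatting happens in plain Python, outside the jitted functions
b = compute_b(1000)
print(f"{b:.4f}")  # 1.6439
```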

Related

Python data precision loss when using pandas

I have a function in Python like the one below:
def format_quantities(x, prec=2):
    # Treat None as 0 and format the absolute value with thousands separators
    x = float(abs(x)) if x is not None else 0.0
    return '{:,.{prec}f}'.format(x, prec=prec)
I use it to format quantities, and I see some weird behaviour. It works fine on a single value, such as format_quantities(12345.67, prec=2).
But when I use the same function in pandas via apply with a lambda, I lose precision: 1234567.67 becomes 1234567.66. I only see this when I use the function in a pandas apply lambda. Can someone explain why this happens? I am not sure whether builtins like float and abs behave differently in pandas and cause this precision loss.
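The question does not show the DataFrame, so the cause cannot be confirmed from it; one thing worth checking (an assumption, not a diagnosis) is the column's dtype, e.g. with df[col].dtype. A float32 column keeps only about 7 significant digits, so a value like 1234567.67 is already altered before the formatter sees it. A minimal sketch using NumPy scalars to stand in for the column values, with the None/variable-name fixes applied to the function:

```python
import numpy as np


def format_quantities(x, prec=2):
    # Treat None as 0 and format the absolute value with thousands separators
    x = float(abs(x)) if x is not None else 0.0
    return '{:,.{prec}f}'.format(x, prec=prec)


value = 1234567.67
as_float64 = np.float64(value)  # what a float64 column stores
as_float32 = np.float32(value)  # nearest float32 is 1234567.625 (spacing 0.125 at this magnitude)

print(format_quantities(as_float64))  # 1,234,567.67
print(format_quantities(as_float32))  # 1,234,567.62
```

Casting a float32 column back to float64 afterwards does not recover the lost digits; the values have to be stored as float64 from the start.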

Does fitting Weibull distribution to data using scipy.stats perform poor?

I am working on fitting a Weibull distribution to some integer data and estimating the relevant shape, scale and location parameters. However, I noticed that the scipy.stats library performs poorly while doing so.
So I took a different direction and checked the fit performance with the code below. I first generate 100 numbers from a Weibull distribution with parameters shape=3, scale=200, location=1. Then I estimate the best-fitting distribution using the fitter library.
from fitter import Fitter
import numpy as np
from scipy.stats import weibull_min
# generate numbers
x = weibull_min.rvs(3, scale=200, loc=1, size=100)
# make them integers
data = np.asarray(x, dtype=int)
# fit one of the four distributions
f = Fitter(data, distributions=["gamma", "rayleigh", "uniform", "weibull_min"])
f.fit()
f.summary()
I expect the best fit to be the Weibull distribution. I have tried re-running this test: sometimes the Weibull fit is a good estimate, but most of the time it is reported as the worst result. In this case, the estimated parameters are (0.13836651040093312, 66.99999999999999, 1.3200752378443505). I assumed these correspond to shape, scale and location, in that order (SciPy's fit actually returns them as shape, loc, scale). Below is the summary of the fit procedure.
$ f.summary()
            sumsquare_error          aic           bic  kl_div
gamma              0.001601  1182.739756  -1090.410631     inf
rayleigh           0.001819  1154.204133  -1082.276256     inf
uniform            0.002241  1113.815217  -1061.400668     inf
weibull_min        0.004992  1558.203041   -976.698452     inf
Additionally, the following plot is produced.
Also, the Rayleigh distribution is a special case of the Weibull distribution with shape parameter 2, so I expect the Weibull fit to be at least as good as the Rayleigh fit.
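The special-case relationship can be checked numerically. With SciPy's parameterisations, rayleigh with scale sigma has the same density as weibull_min with shape c=2 and scale sigma*sqrt(2). A minimal sketch; the value of sigma and the grid are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import rayleigh, weibull_min

sigma = 150.0  # arbitrary Rayleigh scale chosen for illustration
x = np.linspace(0.0, 1000.0, 501)

# Rayleigh pdf: x / sigma^2 * exp(-x^2 / (2 sigma^2))
# Weibull pdf with c=2 and scale s: 2x / s^2 * exp(-x^2 / s^2); identical when s = sigma * sqrt(2)
pdf_rayleigh = rayleigh.pdf(x, scale=sigma)
pdf_weibull = weibull_min.pdf(x, 2, scale=sigma * np.sqrt(2))

print(np.allclose(pdf_rayleigh, pdf_weibull))  # True
```

Because every Rayleigh density is also a weibull_min density, a weibull_min maximum-likelihood fit over (shape, loc, scale) should reach at least the Rayleigh fit's likelihood; when it does not, that suggests the optimizer stopped at a poor solution. Note that fitter ranks by sumsquare_error against a histogram, not by likelihood.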
Update
I ran the tests above on a Linux (Ubuntu 20.04) machine with NumPy 1.19.2 and SciPy 1.5.2. On a Mac, the same code runs as expected and returns proper results for the Weibull distribution.
I have also fitted a Weibull distribution to the data x generated above, on the Linux machine, using the R library fitdistrplus:
fit.weib <- fitdist(x, "weibull")
The estimated shape and scale values are very close to the values used to generate the data. My best guess so far is that the problem is caused by some Python/Ubuntu bug or incompatibility.
I am fairly new to this area, so I am wondering: am I doing something wrong here, or is this result somehow expected? Any help is greatly appreciated.
Thank you.

The fitter library doesn't let you specify distribution parameters such as a, loc, etc. Strangely, the Mac produces a better fit while on Linux the Weibull result comes out much worse, with the same versions of NumPy and SciPy. Possible underlying reasons include the different BLAS/LAPACK implementations used on Linux and Mac (https://stackoverflow.com/a/49274049/6806531), weibull_min not initializing the parameter a = 1 (which has been discussed online), or the default floating-point accuracy. However, the problem can be worked around inside the fitter library. Since weibull_min is exponweib with the parameter a fixed at 1, change the run function inside the _timed_run function in fitter.py as follows:
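def run(self):
    # distribution, func, args and kwargs come from the enclosing _timed_run scope
    try:
        if distribution == "exponweib":
            # exponweib with a fixed at 1 is weibull_min
            self.result = func(args, floc=0, fa=1, **kwargs)
        else:
            self.result = func(args, floc=0, **kwargs)
    except Exception as err:
        self.exc_info = sys.exc_info()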
Then using exponweib in place of weibull_min gives nearly the same results as R's fitdist.
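The identity this answer relies on can be checked directly in SciPy: exponweib with a=1 has the same density as weibull_min, and SciPy's fit accepts fa=1 (and floc) to hold parameters fixed. A minimal sketch that fits with SciPy directly rather than through fitter; the seed, the sample size and floc=1 (the location used to generate the question's data) are illustrative assumptions, not part of the answer:

```python
import numpy as np
from scipy.stats import exponweib, weibull_min

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x = weibull_min.rvs(3, scale=200, loc=1, size=1000, random_state=rng)

# exponweib(a=1, c) has the same density as weibull_min(c)
grid = np.linspace(2.0, 600.0, 200)
same_pdf = np.allclose(exponweib.pdf(grid, 1, 3, loc=1, scale=200),
                       weibull_min.pdf(grid, 3, loc=1, scale=200))

# Hold a=1 and loc=1 fixed; only the shape c and the scale are estimated
a, c, loc, scale = exponweib.fit(x, fa=1, floc=1)
print(same_pdf, a, loc, round(c, 2), round(scale, 1))
```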

I am not familiar with the Fitter library, but in order to draw some conclusions I would suggest:
Retry your code with size=10_000. With that many data points the fitting methods have enough data to work with, and in theory you would then expect the Weibull to deliver the best fit.
I noticed that the location parameter can sometimes be a pain. You could try running your fits with the location parameter fixed via floc=1 (i.e. equal to the location you sampled with). What do you get? Additionally, FYI, with MLE it can suffice to take loc=min(x), where x is your dataset. For the exponential distribution this is in fact the MLE of the location parameter. For other distributions I am not sure, but I wouldn't be surprised if it holds for them as well. This would reduce the fitting procedure by one parameter.
Lastly, I noticed that for small values of location/scale/shape, the logpdf and logcdf functions of some scipy.stats distributions return infinite values (np.inf). In that case you could use the Powell optimization algorithm and set bounds on the parameter values.
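The first two suggestions can be combined in a few lines of plain SciPy, without fitter: draw size=10_000 samples and fix the location with floc=1. A minimal sketch; the seed is arbitrary, and it fits the continuous samples rather than the question's integer-cast data, since casting to int shifts every value down by up to 1:

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(42)  # fixed seed for reproducibility
x = weibull_min.rvs(3, scale=200, loc=1, size=10_000, random_state=rng)

# Fix the location at the sampling value so only shape and scale are estimated.
# SciPy returns the parameters in the order (shape, loc, scale).
c, loc, scale = weibull_min.fit(x, floc=1)
print(f"shape={c:.3f}, loc={loc}, scale={scale:.2f}")
```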

cplex, Is there a way to increase the precision in objective function?

I implemented my mathematical model using ILOG CPLEX version 2.7. The decimal part of the objective function value is very small and CPLEX returns 0, so it ignores that part of the objective function (and the objective function is not really optimized). Is there a way to increase the precision so that CPLEX takes the decimal digits into account?
I created an .ops settings file to change the decimal precision from 4 to 10, but CPLEX still does not take the digits after the decimal point into account, as you can see in the image below.

In that part of the IDE you cannot change the display precision, but, as mentioned in this post, you can see more digits in the statistics tab.
Alternatively, you can use scripting to display the value with any precision you need.

Geometric Series - partial sum (processing efficiency)

So here is my situation: I have to solve a math problem on the server side and could expect tens of thousands of requests per second, so I'm trying to find the most efficient way to solve it.
The client will submit some number, call it A, and I need to determine the base x of the exponent in a geometric series (see below) such that the sum is as close to A as possible without exceeding it.
The problem is that in the real world each term of the geometric series is rounded, so standard math can't be applied:
round(x^1) + round(x^2) + round(x^3) + ... + round(x^n)
I can use the partial sum of a geometric series to find rough upper and lower limits:
(x^(n+1) - 1) / (x - 1)
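Why the closed form gives only rough limits can be stated precisely. The formula above sums from k = 0, so the sum starting at x^1 is one less; and since rounding moves each term by at most 1/2, the rounded sum stays within n/2 of the exact one (for x > 0, x ≠ 1):

```latex
\sum_{k=0}^{n} x^k = \frac{x^{n+1}-1}{x-1}, \qquad
\sum_{k=1}^{n} x^k = \frac{x^{n+1}-x}{x-1} \quad (x \neq 1)

\left|\operatorname{round}(y) - y\right| \le \tfrac{1}{2}
\;\Longrightarrow\;
\left| \sum_{k=1}^{n} \operatorname{round}(x^k) - \frac{x^{n+1}-x}{x-1} \right| \le \frac{n}{2}
```

Since the exact sum is increasing in x for x > 0, solving the closed form for A - n/2 and for A + n/2 gives a lower and an upper limit that bracket the rounded solution.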
For example, say x=2 is a lower limit and x=2.03 is an upper limit, and the value I'm solving for is x=2.02392372838123.
So far the only solution I have found is a recursive function that tests the decimal digits one at a time until it finds the number, but the load on the server is too high for the volume of requests I expect (I am using Node.js).
Does anyone have thoughts or suggestions on a more efficient way to solve this? Again, the only reason I can't solve this with math alone (to the best of my ability) is the real-world rounding of the terms in the sum.
Thanks.
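Because each rounded term never decreases as x grows (for x > 0), the whole rounded sum is non-decreasing in x, so the digit-by-digit search can be replaced by bisection, which halves the interval at every step: narrowing a bracket of width 0.03 down to 1e-12 takes about 35 evaluations of the sum. A minimal sketch in Python rather than Node.js; rounding halves up mirrors JavaScript's Math.round for positive values, and the function names and tolerance are illustrative assumptions:

```python
import math


def rounded_series_sum(x, n):
    """round(x^1) + round(x^2) + ... + round(x^n), rounding halves up like Math.round."""
    return sum(math.floor(x ** k + 0.5) for k in range(1, n + 1))


def solve_base(target, n, lo, hi, tol=1e-12):
    """Return (to within tol) the largest x in [lo, hi] with rounded_series_sum(x, n) <= target.

    Assumes 0 < lo < hi, so every term, and therefore the sum, is non-decreasing in x.
    """
    if rounded_series_sum(lo, n) > target:
        raise ValueError("the lower bound already exceeds the target")
    if rounded_series_sum(hi, n) <= target:
        return hi
    # Invariant: sum at lo <= target < sum at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rounded_series_sum(mid, n) <= target:
            lo = mid
        else:
            hi = mid
    return lo


# Example: base of a 10-term rounded series whose sum must not exceed 1000
base = solve_base(1000, 10, 1.0, 3.0)
print(base, rounded_series_sum(base, 10))
```

Each request then costs roughly 35 × n power evaluations, independent of how many decimals the answer has.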

Computation in fixed point or int

I am using fixed-point numbers in my network, which is based on the Keras framework. My concern is that when the network performs multiplication operations on Theano variables, the result is float32 (even if the inputs are fixed-point numbers). Is there any built-in way to get the result in fixed-point format, or even as an int?
If not, what are the alternative approaches?
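One framework-independent alternative is to keep the values as integers scaled by 2^f (a Q-format) and do the arithmetic on those integers yourself: the product of two such numbers has 2f fractional bits, so it is shifted right by f to return to the original format. A minimal NumPy sketch; FRAC_BITS = 8 and the helper names are illustrative assumptions, and this is not a Keras or Theano API:

```python
import numpy as np

FRAC_BITS = 8  # assumed format: 8 fractional bits, i.e. scale factor 2**8 = 256


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize floats to int64 values scaled by 2**frac_bits (round to nearest)."""
    return np.round(np.asarray(x, dtype=np.float64) * (1 << frac_bits)).astype(np.int64)


def to_float(q, frac_bits=FRAC_BITS):
    """Convert fixed-point integers back to floats (only needed for inspection)."""
    return np.asarray(q, dtype=np.float64) / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point integer arrays; the result stays int64 in the same format."""
    # int64 product has 2 * frac_bits fractional bits; safe while inputs fit in 32 bits
    prod = a.astype(np.int64) * b.astype(np.int64)
    # Add half an LSB before the arithmetic right shift: round to nearest (ties toward +inf)
    return (prod + (1 << (frac_bits - 1))) >> frac_bits


a = to_fixed([1.5, -2.25])
b = to_fixed([2.0, 0.5])
print(to_float(fixed_mul(a, b)).tolist())  # [3.0, -1.125]
```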
