I have a Python function like the one below:
def format_quantities(x, prec=2):
    x = float(abs(x)) if x is not None else 0
    return '{:,.{prec}f}'.format(x, prec=prec)
I am using it to format some quantities, and I see weird behaviour. It works fine on a single value, like format_quantities(12345.67, prec=2).
But when I use the same function in pandas via apply with a lambda, I lose precision: 1234567.67 ==> 1234567.66. I see this only when the function is called through pandas apply. Can someone please advise why this occurs? I am not sure whether builtins such as float and abs behave differently in pandas, causing this precision loss.
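One way to narrow this down is to check whether the value is already rounded before it reaches the formatter. The sketch below assumes, without confirmation from the post, that the column is stored as 32-bit floats; it emulates that storage with the standard-library struct module so it runs without pandas. The helper as_float32 is hypothetical and only there for illustration.

```python
import struct

def format_quantities(x, prec=2):
    # Formatter from the question.
    x = float(abs(x)) if x is not None else 0
    return '{:,.{prec}f}'.format(x, prec=prec)

def as_float32(value):
    # Hypothetical helper: round-trip through a 4-byte IEEE 754 float,
    # which is what a float32 column would store (an assumption here).
    return struct.unpack('f', struct.pack('f', value))[0]

value = 1234567.67
print(format_quantities(value))              # formatted from a 64-bit float
print(format_quantities(as_float32(value)))  # formatted after float32 rounding
print(as_float32(value) == value)            # False: the stored value changed
```

If the cents change only in the second line, the formatter is not at fault and the column's dtype is worth checking in pandas.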
I am working on fitting a Weibull distribution to some integer data and estimating the relevant shape, scale, and location parameters. However, I noticed poor performance of the scipy.stats library while doing so.
So I took a different direction and checked the fit performance using the code below. I first create 100 numbers from a Weibull distribution with parameters shape=3, scale=200, location=1. Then I estimate the best distribution fit using the fitter library.
from fitter import Fitter
import numpy as np
from scipy.stats import weibull_min
# generate numbers
x = weibull_min.rvs(3, scale=200, loc=1, size=100)
# make them integers
data = np.asarray(x, dtype=int)
# fit one of the four distributions
f = Fitter(data, distributions=["gamma", "rayleigh", "uniform", "weibull_min"])
f.fit()
f.summary()
I expect the best fit to be the Weibull distribution. I have tried re-running this test. Sometimes the Weibull fit is a good estimate. However, most of the time the Weibull fit is reported as the worst result. In this case, the estimated parameters are (0.13836651040093312, 66.99999999999999, 1.3200752378443505). I assume these parameters correspond to shape, scale, and location, in that order (note, though, that scipy.stats fit methods return them in the order shape, loc, scale). Below is the summary of the fit procedure.
$ f.summary()
             sumsquare_error          aic          bic  kl_div
gamma               0.001601  1182.739756 -1090.410631     inf
rayleigh            0.001819  1154.204133 -1082.276256     inf
uniform             0.002241  1113.815217 -1061.400668     inf
weibull_min         0.004992  1558.203041  -976.698452     inf
Additionally, the following plot is produced.
Also, Rayleigh distribution is a special case of Weibull with shape parameter = 2. So, I expect the resulting Weibull fit to be at least as good as Rayleigh.
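The special-case relationship can be checked numerically. This is a minimal sketch using only the standard library; the two density functions are written out by hand from the textbook formulas, and sigma = 150 is an arbitrary illustrative value. A Rayleigh density with scale sigma should match a Weibull density with shape 2 and scale sigma * sqrt(2).

```python
import math

def weibull_pdf(x, shape, scale):
    # Two-parameter Weibull density (location fixed at 0).
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def rayleigh_pdf(x, sigma):
    # Rayleigh density with scale sigma.
    return (x / sigma ** 2) * math.exp(-(x ** 2) / (2 * sigma ** 2))

sigma = 150.0  # arbitrary value chosen for illustration
points = (10.0, 100.0, 250.0, 400.0)
matches = [
    math.isclose(weibull_pdf(x, 2, sigma * math.sqrt(2)), rayleigh_pdf(x, sigma), rel_tol=1e-9)
    for x in points
]
print(matches)
```

Since every Rayleigh density is also a Weibull density, a maximum-likelihood Weibull fit that converges properly should have a likelihood at least as high as the Rayleigh fit.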
Update
I ran the tests above on a Linux (Ubuntu 20.04) machine with numpy version 1.19.2 and scipy version 1.5.2. On a Mac, the code above seems to run as expected and returns proper results for the Weibull distribution.
I have also tested fitting a Weibull distribution on data x generated above on the Linux machine by using an R library fitdistrplus as:
fit.weib <- fitdist(x, "weibull")
and observed that the estimated shape and scale values are found to be very close to the initially given values. The best guess so far is that the problem is due to some Python-Ubuntu bug/incompatibility.
I can be considered as a newbie in this area. So, I am wondering, am I doing something wrong here? Or is this result somehow expected? Any help is greatly appreciated.
Thank you.
The fitter library does not let you specify distribution parameters such as a, loc, etc. Strangely, the Mac produces a better fit while Linux gives a much worse best fit, for the same versions of NumPy and SciPy. Possible underlying reasons include different BLAS/LAPACK implementations on Linux and Mac (https://stackoverflow.com/a/49274049/6806531), weibull_min not initializing the parameter a = 1 (which is discussed online), or the default floating-point accuracy. However, the problem can be worked around inside the fitter library. Since weibull_min is exponweib with the parameter a fixed at 1, change the run function inside the _timed_run function in fitter.py to
def run(self):
    try:
        if distribution == "exponweib":
            self.result = func(args, floc=0, fa=1, **kwargs)
        else:
            self.result = func(args, floc=0, **kwargs)
    except Exception as err:
        self.exc_info = sys.exc_info()
and use exponweib in place of weibull_min; this gives nearly the same results as R's fitdist.
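The same idea can be tried directly with scipy, without patching fitter. This is only a sketch under assumptions: the data are synthetic, generated with the question's parameters and an arbitrary seed, and the fit fixes a = 1 and loc = 0 through scipy's fa and floc keywords, so only c (the Weibull shape) and scale are estimated.

```python
import numpy as np
from scipy.stats import exponweib, weibull_min

# With a = 1 the exponentiated Weibull density coincides with weibull_min.
grid = np.linspace(1.0, 500.0, 50)
same_pdf = np.allclose(
    exponweib.pdf(grid, 1, 3, scale=200),
    weibull_min.pdf(grid, 3, scale=200),
)
print(same_pdf)

# Synthetic data with the question's parameters (seed chosen arbitrarily).
data = weibull_min.rvs(3, loc=1, scale=200, size=1000, random_state=0)

# exponweib.fit returns (a, c, loc, scale); a and loc are held fixed here.
a, c, loc, scale = exponweib.fit(data, fa=1, floc=0)
print(a, c, loc, scale)
```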
I am not familiar with the Fitter library, but in order to draw some conclusions I would suggest:
Retry your code with size=10000. In that case there are enough data points for the fitting methods to work with, and in theory you would expect the Weibull to deliver the best fit.
I noticed that the location parameter can sometimes be a pain. You could try running your fits with the location parameter fixed via floc=1 (i.e. equal to the location you sampled with). What do you get? Additionally, FYI, with MLE it can suffice to take loc=min(x), where x is your dataset. For the exponential distribution this is in fact the MLE of the location parameter. For other distributions I am not sure, but I wouldn't be surprised if it holds for them as well. This would reduce the fitting procedure by one parameter.
Lastly, I noticed that if you take small values for location/scale/shape for some distributions, the functions logpdf and logcdf of scipy.stats distributions result in np.inf values. In this scenario, you could perhaps use the Powell optimization algorithm and set bounds on the values of your parameters.
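The first two suggestions above can be tried with scipy directly. This is a rough sketch, assuming the sampling parameters from the question and an arbitrary seed; with floc=1 the location is held fixed and only shape and scale are estimated. Note that scipy returns the parameters in the order (shape, loc, scale).

```python
from scipy.stats import weibull_min

# Larger synthetic sample with the question's parameters (seed is arbitrary).
sample = weibull_min.rvs(3, loc=1, scale=200, size=10000, random_state=42)

# Hold the location fixed at the known sampling value.
shape, loc, scale = weibull_min.fit(sample, floc=1)
print(shape, loc, scale)
```

If the estimates land near shape=3 and scale=200 here but not in the free-location fit, that points at the location parameter as the source of the trouble.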
I implemented my mathematical model using ILOG CPLEX ver 2.7. The decimal part of the objective function is very small and CPLEX returns 0, so CPLEX ignores part of the objective function (and the objective is not really optimized). Is there a way to increase the accuracy so that CPLEX takes as many decimal places as possible into account?
I have created an .ops file to change the decimal precision from 4 to 10, but CPLEX still does not take the figures after the decimal point into account; see the image below.
In that part you cannot change the display precision. But as said in this post you may see more in the statistics tab.
Or you may use scripting to get any precision you need.
So here is my situation. I have to solve a math problem on the server end and can expect tens of thousands of requests a second, so I'm trying to find the most efficient way to solve it.
The client will submit some number, let's call it A, and I need to determine the base of the exponent in a geometric series (see below) such that the result is as close to A as possible without exceeding it.
The problem is that, in the real world, each value of the geometric series is rounded, so the standard math doesn't apply.
round(x^1)+round(x^2)+round(x^3).
I can use the partial sum of geometric series equation to find some rough upper and lower limits using:
((x)^(n+1)-1)/((x)-1)
So say x=2 is a lower limit and x=2.03 is an upper limit, and the value I'm solving for is x=2.02392372838123.
So far the only solution I have found is a recursive function that goes through the decimals one at a time, testing until it finds the number, but the load on the server is too high at the volume of requests I expect. (I am using node.js.)
Does anyone have any thoughts or suggestions on a more efficient way to solve this? Again, the only reason I can't solve this with math alone (to the best of my skill) is the real-world rounding of the numbers in the sum.
Thanks.
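One possible direction, offered as a sketch and not a tested production solution: for x >= 0 the rounded sum never decreases as x grows, so a plain bisection can replace the digit-by-digit search. The code is in Python for illustration and should translate directly to node.js. The function names and the number of terms n are assumptions made for the example, and rounding uses floor(v + 0.5) to mimic JavaScript's Math.round.

```python
import math

def rounded_series_sum(x, n):
    # Sum of round(x**k) for k = 1..n, rounding halves up like Math.round.
    return sum(math.floor(x ** k + 0.5) for k in range(1, n + 1))

def largest_base(target, n, iterations=100):
    # Bisection for the largest x whose rounded sum does not exceed target.
    # Assumes target >= 0 and n >= 1.
    lo = 0.0                           # rounded sum is 0 here, always feasible
    hi = (target + 1.0) ** (1.0 / n)   # last term alone already exceeds target
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if rounded_series_sum(mid, n) <= target:
            lo = mid
        else:
            hi = mid
    return lo

x = largest_base(100, 3)
print(x, rounded_series_sum(x, 3))
```

Each request then costs a fixed number of evaluations of the sum (100 here, which is more than double precision needs), and the closed-form bounds from the partial-sum formula could be used as a tighter starting interval.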
I am using fixed-point numbers within my network, which is based on the Keras framework. My concern is that when the network performs multiplication operations on Theano variables, the result is float32 (even if the numbers supplied are in fixed point). Is there any intrinsic way to get the result in fixed-point format, or even as int?
If not, what can be alternative approaches?