py2 vs py3 addition output difference in float format

a = 310.97
b = 233.33
sum = 0.0
for i in [a, b]:
    sum += i
print(sum)
Python 2 output: 544.3
Python 3 output: 544.3000000000001
Is there any way to make the Python 3 output match the Python 2 output when futurizing, without rounding off?

You could convert the values to integers before performing the operation and afterwards divide by a constant, e.g. 100.0 in this case.
a = 310.97
b = 233.33
c = int(a * 100)
d = int(b * 100)
sum = 0
for i in [c, d]:
    sum += i
result = sum / 100.0
print(result)  # 544.3
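Alternatively (a sketch, not part of the original answer), the decimal module gives the exact sum directly, since both literals are exact in decimal:
from decimal import Decimal

# Decimal arithmetic is exact for these inputs, so no integer scaling
# or rounding is needed.
a = Decimal("310.97")
b = Decimal("233.33")
print(a + b)  # 544.30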
The reason for the difference is the precision in the conversion from float to string in the two versions of Python.
a = 310.97
b = 233.33
sum = 0.0
for i in [a, b]:
    sum += i
print("{:.12g}".format(sum))  # 544.3
See this answer for further details: Python format default rounding when formatting float number
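Concretely (a fact about the interpreters, not taken from the linked answer): Python 2's str() rounds floats to 12 significant digits, while Python 3 prints the shortest string that round-trips, so the extra digits become visible:
total = 310.97 + 233.33
print(repr(total))              # 544.3000000000001 (shortest round-trip repr, Python 3)
print("{:.12g}".format(total))  # 544.3 (mimics Python 2's 12-significant-digit str())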

Related

How Do I Calculate Rolling STD Without Pandas STD or Mean functions?

I would like to incorporate Rolling STD into my Rolling Mean function. I cannot use Pandas std due to the NaN values it produces. I am unsure how to compute the sum of squares. Any suggestions? Do I need a list comprehension?
import math
import numpy as np
import pandas as pd

def rolling_mean_std(x, wd_size=3):
    rolling_mean = []
    rolling_std = []
    i = 0
    while i < len(x) - wd_size + 1:
        if i <= wd_size:
            this_window = x[0 : 1 + i]
            window_average = sum(this_window) / (i + 1)
            ## Edit to include std calculation
            window_std = sum((v - window_average) ** 2 for v in this_window) / (i + 1)
            squirt = math.sqrt(window_std)
            rolling_mean.append(window_average)
            rolling_std.append(math.sqrt(squirt))
            i += 1
        else:
            this_window = x[i : i + wd_size]
            window_average = sum(this_window) / wd_size
            ## Edit to include std calculation
            window_std = sum((v - window_average) ** 2 for v in this_window) / wd_size
            squirt = math.sqrt(window_std)
            rolling_mean.append(window_average)
            rolling_std.append(math.sqrt(squirt))
            i += 1
    return np.array(rolling_mean)  # np.array(rolling_std)

alist = [2, 4, 5, 7, 9, 10, 21, 89, 43, 90, 13, 100, 1, 45]
x = pd.Series(alist)
rolling_mean_std(x, wd_size=3)
You can try to use pandas' rolling function: calculate the std of each rolling window and append each std to a list to get the rolling standard deviations; same for the rolling means.
For example:
import numpy as np
import pandas as pd

wd_size = 3
alist = [2, 4, 5, 7, 9, 10, 21, 89, 43, 90, 13, 100, 1, 45]
x = pd.Series(alist)
rolling_mean = []
rolling_std = []
for i in x.rolling(window=wd_size):
    rolling_mean.append(np.mean(i))
    rolling_std.append(np.std(i))
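If pandas must be avoided entirely, a plain-Python version over full windows is also possible. Here is a minimal sketch (the function name is mine; it assumes population std like np.std and skips the partial windows that pandas would pad with NaN):
import math

def rolling_mean_std_manual(data, wd_size=3):
    # Plain-Python rolling mean/std over full windows only.
    rolling_mean, rolling_std = [], []
    for i in range(len(data) - wd_size + 1):
        window = data[i:i + wd_size]
        mean = sum(window) / wd_size
        # Population variance: mean squared deviation from the window mean.
        var = sum((v - mean) ** 2 for v in window) / wd_size
        rolling_mean.append(mean)
        rolling_std.append(math.sqrt(var))
    return rolling_mean, rolling_std

means, stds = rolling_mean_std_manual([2, 4, 5, 7, 9, 10, 21, 89, 43, 90, 13, 100, 1, 45])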

Decimal module is not working with Numpy or Scipy

I want to use the Decimal module.
from decimal import Decimal, getcontext, Context

getcontext().prec = 3
d1 = Decimal("0.1")
a = float(0.20052)
b = str(a)
d2 = Decimal(b)
q = d1 + d2
print(q)  # 0.301
and
getcontext().prec = 1
d1 = Decimal("0.1")
a = float(0.20052)
b = str(a)
d2 = Decimal(b)
q = d1 + d2
print(q)  # 0.3
is working.
However, the code below is not working. I want "0.0."
import numpy as np
from scipy import stats

np.random.seed(12345678)  # fix random seed to get the same result
n1 = 200  # size of first sample
n2 = 300  # size of second sample
rvs1 = stats.norm.rvs(size=n1, loc=0., scale=1)
rvs2 = stats.norm.rvs(size=n2, loc=0.5, scale=1.5)
print(stats.mannwhitneyu(rvs1, rvs2))  # MannwhitneyuResult(statistic=25639.0, pvalue=0.0029339910470636116)
p_value = stats.mannwhitneyu(rvs1, rvs2).pvalue
print(p_value)  # 0.0029339910470636116
p_str = str(p_value)
getcontext().prec = 1
p_n = Decimal(p_str)
print(p_n)  # 0.0029339910470636116
I saw this question and used the item method, but the result has not changed. I want "0.0029".
getcontext().prec = 4
p2 = Decimal(p_value.item())
print(p2)  # 0.0029339910470636116311682339841127031832002103328704833984375
MacOS 10.14.5; python 3.7.2; jupyter notebook 4.4.0; numpy 1.17.2; scipy 1.2.1
In addition, I want "0.0029", but I get the results shown below.
getcontext().prec = 4
p_n = Decimal(p_str)
print(p_n)  # 0.0029339910470636116
p_n = Decimal(p_str) + 0
print(p_n)  # 0.002934
p_n = Context(prec=4).create_decimal(p_str) + 0
print(p_n)  # 0.002934
... the result has not changed.
There is a conceptual gap here. Changing prec of the current context changes how e.g. __add__(...) behaves. It does not change how the constructor behaves -- if you supply a high-precision input, the constructor will still give a high-precision output. Consider this demo:
>>> getcontext().prec = 1
>>>
>>> Decimal('.12345')
Decimal('0.12345')
>>>
>>> Decimal('.12345') + 0
Decimal('0.1')
And naturally, the prec attribute has no effect at all on unrelated math packages that use IEEE-754 floating-point operations, such as numpy. If p_value has many digits of precision, then it is unsurprising that Decimal(p_value) will report many digits of precision. Perhaps you'd like to add 0 to that?
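One caveat worth adding (not from the original answer): prec counts significant digits, so adding 0 under prec=4 yields 0.002934, not 0.0029. If the goal is literally four decimal places, Decimal.quantize is the standard tool; a minimal sketch:
from decimal import Decimal, getcontext

getcontext().prec = 4
p = Decimal("0.0029339910470636116")
print(+p)                             # 0.002934 -- unary plus applies the context precision
print(p.quantize(Decimal("0.0001")))  # 0.0029 -- fixed to four decimal places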

Manually implementing approximation functions

I have a dataset from Kaggle of 45,253 rows and a single column: temperature in Kelvin for the city of Detroit. Its mean = 282.97, std = 11, min = 243.48, max = 308.05.
This is the result when plotted as a histogram of 100 bins with density=True:
I am expected to write the following two functions and see which one approximates the histogram most closely:
Like this one here using scipy.stats.norm.pdf:
I generated the above image using:
x = np.linspace(dataset.Detroit.min(), dataset.Detroit.max(), 1001)
P_norm = norm.pdf(x, dataset.Detroit.mean(), dataset.Detroit.std())
plot_pdf_single(x, P_norm)
However, whenever I try to implement any of the two approximation functions all of my values for P_norm result in 0s or infs.
This is what I tried:
P_norm = [(1.0/(np.sqrt(2.0*pi*(std*std))))*np.exp(((-x_i-mu)*(-x_i-mu))/(2.0*(std*std))) for x_i in x]
I also broke it down into parts for a single x_i:
part1 = ((-x[0] - mu)*(-x[0] - mu)) / (2.0*(std * std))
part2 = np.exp(part1)
part3 = 1.0 / (np.sqrt(2.0 * pi * (std*std)))
total = part3*part2
I got the following values:
part1 = 1145.3913234604413
part2 = inf
part3 = 0.036267480036493875
total = inf
The culprit is the sign in the exponent: -x_i - mu is -(x_i + mu), not -(x_i - mu), so the exponent is a huge positive number and np.exp overflows to inf. Both approximations can use the same corrected formula:
import numpy as np
from numpy import pi

def pdf_approximation(x_i, mu, std):
    return (1.0 / (np.sqrt(2.0 * pi * (std * std)))) * np.exp((-(x_i - mu) * (x_i - mu)) / (2.0 * (std * std)))
The code for the first approximation is:
mu = 283
std = 11
P_norm = np.array([pdf_approximation(x_i, mu, std) for x_i in x])
plot_pdf_single(x, P_norm)
The code for the second approximation is:
mu1 = 276
std1 = 6
mu2 = 293
std2 = 6.5
P_norm = np.array([(pdf_approximation(x_i, mu1, std1) * 0.5) + (pdf_approximation(x_i, mu2, std2) * 0.5) for x_i in x])
plot_pdf_single(x, P_norm)
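As a side note (not in the original answer), pdf_approximation is built from NumPy ufuncs, so it also accepts the whole x array at once, removing the need for the list comprehensions; a minimal sketch with the same parameters:
import numpy as np
from numpy import pi

def pdf_approximation(x_i, mu, std):
    return (1.0 / np.sqrt(2.0 * pi * std * std)) * np.exp(-(x_i - mu) * (x_i - mu) / (2.0 * std * std))

x = np.linspace(243.48, 308.05, 1001)   # dataset min/max from the question
P_norm = pdf_approximation(x, 283, 11)  # first approximation, vectorized
P_mix = 0.5 * pdf_approximation(x, 276, 6) + 0.5 * pdf_approximation(x, 293, 6.5)  # second approximation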

None as an output from my code. I wanted a calculated output. Is it wrong to use a logical statement after if/elif?

I am working on a project and my code is returning None as output.
I have tried indenting and unindenting the return line, but it didn't help. Is it wrong if I put the formula just after the if/elif statement?
def cost_of_ground_shipping(weight):
    #weight = float(weight)
    if (weight <= 2.0):
        cost = (weight * 1.5) + 20.00
        return cost
    elif (weight > 2.0) and (weight <= 6.0):
        cost = (weight * 3.00) + 20.00
        return cost
    elif (weight > 6.0) and (weight >= 10.0):
        cost = (weight * 4.00) + 20.00
        return cost
    elif (weight > 10.0):
        cost = (weight * 4.75) + 20.00
        return cost

print(cost_of_ground_shipping(8.4))
I expected the result 53.60
You have a typo: weight >= 10.0 should be weight <= 10.0. Since the function never reached a return statement, it implicitly returned None.
As well, we can improve the code by factoring out the calculation and return statement (see DRY), removing unnecessary parens, using Python's clearer range check syntax (x < y <= z), and using descriptive variable names:
def cost_of_ground_shipping(weight):
    base = 20.0
    if weight <= 2.0:
        multiplier = 1.5
    elif 2.0 < weight <= 6.0:
        multiplier = 3.0
    elif 6.0 < weight <= 10.0:
        multiplier = 4.0
    elif 10.0 < weight:
        multiplier = 4.75
    return weight * multiplier + base

print(cost_of_ground_shipping(8.4))  # -> 53.6
If you had written the code like this but still had the typo, the return line would have thrown an error: NameError: name 'multiplier' is not defined, which would be your first hint for starting debugging.
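If the tiers ever change, a table-driven variant keeps them in one place; a sketch, not part of the original answer:
def cost_of_ground_shipping(weight):
    base = 20.0
    # (upper bound, multiplier) tiers checked in order; inf catches everything above 10.
    tiers = [(2.0, 1.5), (6.0, 3.0), (10.0, 4.0), (float("inf"), 4.75)]
    for upper, multiplier in tiers:
        if weight <= upper:
            return weight * multiplier + base

print(cost_of_ground_shipping(8.4))  # 53.6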

Looping through multiple dataframes does not calculate properly

I am attempting to perform calculations, then loop through the same pandas dataframe and perform the same calculation but with an altered variable (one that increases each time it loops). If the loop range is set to just 1, all rows calculate properly and the new dataframe is created. However, attempting to actually loop the program results in NaN values everywhere except the first row.
# Omega loop
for i in range(10):
    # Determine first and last Julian dates of data
    t1 = df.ix[:0, 'jd']
    t2 = df.ix[n-1:, 'jd']
    t2 = t2.reset_index(drop=True)
    tj = t2 - t1
    # Iterate over each observation within each star file
    jd = df['jd']
    dmag = df['dmag']
    sinw = np.sin(2*omega*jd)
    sum1 = sinw.sum()
    cosw = np.cos(2*omega*jd)
    sum2 = cosw.sum()
    # Calculate tau
    tau = ((np.arctan(sum1/sum2))/(2*omega))
    avgdmag = dmag.sum()/n
    # Calculate sample variance
    tot = (df['dmag']-avgdmag)**2
    tot2 = tot.sum()
    var = tot2/(n-1)
    # Calculate sums for power series
    sum3 = sum3 + ((dmag - avgdmag)*np.cos(omega*(jd-tau)))
    sum4 = sum4 + (np.cos(omega*(jd-tau)))**2
    sum5 = sum5 + ((dmag - avgdmag)*np.sin(omega*(jd-tau)))
    sum6 = sum6 + (np.sin(omega*(jd-tau)))**2
    # Calculate power series and normalized power series
    px = (((sum3**2)/sum4)+((sum5**2)/sum6))/2
    pn = px/var
    # Step through sequential frequencies
    omega = omega + (1/tj)
I also received a runtime warning from NumPy caused by the omega term at the end. I disabled "invalid" warnings, as they were not affecting the actual calculations. The first series that compute incorrectly are sinw and cosw, and all subsequently calculated series contain NaN values.
It is because your tj is a pd.Series of length 1, not a scalar as you would expect. After the first loop, omega = omega + 1/tj becomes a Series of length 1 (with 0 as its index). Then in the second loop, tau = ((np.arctan(sum1/sum2))/(2*omega)) also becomes such a Series. When updating sum3, jd - tau (a Series of length n minus a Series of length 1) gives you a Series that is all NaN except at index 0, the only index where the two align. After that, all subsequent Series have lots of NaNs.
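A quick demonstration of that alignment behavior (a minimal sketch, not from the original answer):
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
t = pd.Series([10.0])  # length-1 Series with index [0]
print(s - t)           # aligns on index: -9.0 at index 0, NaN at indices 1 and 2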
The solution is to calculate tj as a scalar, such as
tj = df.loc[n-1,'jd'] - df.loc[0,'jd'] (assuming n = len(df)).
Anyway, your piece of code can be rewritten for readability:
tj = df.loc[n-1, 'jd'] - df.loc[0, 'jd']  # tj is loop-invariant
sum3 = sum4 = sum5 = sum6 = 0.0           # running sums must start at zero
for _ in range(10):
    sum1 = np.sin(2*omega*df['jd']).sum()
    sum2 = np.cos(2*omega*df['jd']).sum()
    tau = np.arctan(sum1/sum2)/(2*omega)
    avgdmag = df['dmag'].mean()
    var = df['dmag'].var()  # unbiased sample variance
    sum3 += ((df['dmag'] - avgdmag)*np.cos(omega*(df['jd']-tau)))
    sum4 += (np.cos(omega*(df['jd']-tau)))**2
    sum5 += ((df['dmag'] - avgdmag)*np.sin(omega*(df['jd']-tau)))
    sum6 += (np.sin(omega*(df['jd']-tau)))**2
    px = (((sum3**2)/sum4)+((sum5**2)/sum6))/2
    pn = px/var
    omega += 1/tj
