I am working on the estimation module of a prototype. Its purpose is to send the proper seasonality variation parameters to the forecaster module.
Initially, in the booking curve estimation, we used a formula for day-of-year seasonality: a trigonometric function with 5 fixed orders. It goes like this:
doy_seasonality = np.exp(z[0]*np.sin(2*np.pi*doy/365.) + z[1]*np.cos(2*np.pi*doy/365.)
                         + z[2]*np.sin(4*np.pi*doy/365.) + z[3]*np.cos(4*np.pi*doy/365.)
                         + z[4]*np.sin(6*np.pi*doy/365.) + z[5]*np.cos(6*np.pi*doy/365.)
                         + z[6]*np.sin(8*np.pi*doy/365.) + z[7]*np.cos(8*np.pi*doy/365.)
                         + z[8]*np.sin(10*np.pi*doy/365.) + z[9]*np.cos(10*np.pi*doy/365.))
i.e. we had 5 fixed orders [2, 4, 6, 8, 10]
Now, we have found a better way to get the orders through a Fast Fourier Transform. Depending on the estimation key we use as input in the simulation, the order array could have a different number of values.
For instance, let's say the order array is as follows
orders = [2, 6, 10, 24]
Corresponding to every order value, there are two values of z (it's a trigonometric parameter: one value for the sin part and one for the cos part). For example, it could look like this:
z = [0.08, 0.11, 0.25, 0.01, 0.66, 0.19, 0.45, 0.07]
To achieve this, I would need to define a for-loop with two parallel iterations:
z[0] to z[2*len(orders)-1], i.e. `z[0] to z[7]`
and orders[0] to orders[len(orders)-1], i.e. orders[0] to orders[3]
ultimately, the formula should compute this:
doy_seasonality = np.exp(z[0]*np.sin(orders[0]*np.pi*doy/365.)+z[1]*np.cos(orders[0]*np.pi*doy/365.)
+z[2]*np.sin(orders[1]*np.pi*doy/365.)+ z[3]*np.cos(orders[1]*np.pi*doy/365.)
+z[4]*np.sin(orders[2]*np.pi*doy/365.)+ z[5]*np.cos(orders[2]*np.pi*doy/365.)
+z[6]*np.sin(orders[3]*np.pi*doy/365.)+ z[7]*np.cos(orders[3]*np.pi*doy/365.))
I am not able to design the appropriate syntax for this.
doy (day of year) is a vector of equally spaced values: 1, 2, 3, ..., 364, 365
orders = np.array([2, 6, 10, 24])
z = np.array([0.08, 0.11, 0.25, 0.01, 0.66, 0.19, 0.45, 0.07])
doy = np.arange(365) + 1
s = 0
s = 0
for k in range(len(orders)):
    s += z[2 * k] * np.sin(orders[k] * np.pi * doy / 365.)
    s += z[2 * k + 1] * np.cos(orders[k] * np.pi * doy / 365.)
s = np.exp(s)
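The same computation can also be written without an explicit Python loop, using broadcasting over the order axis (a sketch with the same `orders`, `z` and `doy` as above):

```python
import numpy as np

orders = np.array([2, 6, 10, 24])
z = np.array([0.08, 0.11, 0.25, 0.01, 0.66, 0.19, 0.45, 0.07])
doy = np.arange(365) + 1

# angle[k, d] = orders[k] * pi * doy[d] / 365, shape (len(orders), 365)
angle = orders[:, None] * np.pi * doy[None, :] / 365.
# even-indexed z values weight the sin terms, odd-indexed ones the cos terms
s = (z[0::2, None] * np.sin(angle) + z[1::2, None] * np.cos(angle)).sum(axis=0)
doy_seasonality = np.exp(s)
```

This produces the same vector as the loop, and adapts automatically to any number of orders.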
What conditions should I check to determine whether two geometries are the same?
I have the coordinates of two geometries. Both geometries are the same shape, but the first geometry is rotated and the second is not; moreover, the input order (node numbering) of the geometry has changed. How can I prove in Python code that both geometries are the same even when oriented differently in 2D space?
#Coordinates for first geometry,
X1 = [0.0, 0.87, 1.37, 1.87, 2.73, 3.6, 4.46, 4.96, 5.46, 4.6, 3.73, 2.87, 2.0, 1.5, 1.0, 0.5, 2.37, 3.23, 4.1]
Y1 = [0.0, 0.5, -0.37, -1.23, -0.73, -0.23, 0.27, -0.6, -1.46, -1.96, -2.46, -2.96, -3.46, -2.6, -1.73, -0.87, -2.1, -1.6, -1.1]
#Coordinates for second geometry,
X2 = [2, 4, 4, 2, 3, 2, 4, 3, 1, 3, 4, 3, 1, 2, 0, 3, 4, 2, 0]
Y2 = [3, 4, 2, 1, 3, 2, 1, 0, 0, 2, 3, 4, 1, 4, 0, 1, 0, 0, 1]
What I have tried so far: I verified using the conditions below.
(1) The number of nodes should be the same in both geometries.
#---------------------------CONDITION: 1------------------------------------------------#
# Condition: Number of nodes in both geometries should be same
node_number1 = df_1['node_number'].tolist() # converting 'node number' column from first dataframe to a list
node_number2 = df_2['node_number'].tolist() # converting 'node_number' column from second dataframe to a list
(2) The distance between two consecutive nodes (such as node 1 & node 2, node 2 & node 3) should be the same in both geometries.
#---------------------------CONDITION: 2------------------------------------------------#
# Condition: Distance between two successive nodes should be the same in both geometries.
x_shift1 = df_1['New_X_coordinate_reformed'].shift(-1) # in first dataframe, shift the X_coordinate values by -1
y_shift1 = df_1['New_Y_coordinate_reformed'].shift(-1) # in first dataframe, shift the Y_coordinate values by -1
# Distance between two coordinates (nodes):
# square root of [(x2 - x1)^2 + (y2 - y1)^2]
nodal_dist_df_1 = np.sqrt((x_shift1 - df_1['New_X_coordinate_reformed'])**2 + (y_shift1 - df_1['New_Y_coordinate_reformed'])**2).dropna().to_list()
nodal_dist_df_1 = [round(num, 2) for num in nodal_dist_df_1] # round the values in 'nodal_dist_df_1' to 2 decimals
x_shift2 = df_2['New_X_coordinate_reformed'].shift(-1) # in second dataframe, shift the X_coordinate values by -1
y_shift2 = df_2['New_Y_coordinate_reformed'].shift(-1) # in second dataframe, shift the Y_coordinate values by -1
# Distance between two coordinates (nodes):
# square root of [(x2 - x1)^2 + (y2 - y1)^2]
nodal_dist_df_2 = np.sqrt((x_shift2 - df_2['New_X_coordinate_reformed'])**2 + (y_shift2 - df_2['New_Y_coordinate_reformed'])**2).dropna().to_list()
nodal_dist_df_2 = [round(num, 2) for num in nodal_dist_df_2] # round the values in 'nodal_dist_df_2' to 2 decimals
(3) The distance from the first node to every other node should be the same in both geometries.
#---------------------------CONDITION: 3------------------------------------------------#
# Condition: Distances from the first node to each other node should be the same in both geometries.
dist_from_N1_1 = [] # initialize a blank list for the first dataframe
for i in range(len(df_1['New_X_coordinate_reformed']) - 1):
    # Distance from the first node to node i+1 in the first dataframe:
    # square root of [(x2 - x1)^2 + (y2 - y1)^2], where (x1, y1) is the first node
    d = np.sqrt((df_1['New_X_coordinate_reformed'][i+1] - df_1['New_X_coordinate_reformed'][0])**2 + (df_1['New_Y_coordinate_reformed'][i+1] - df_1['New_Y_coordinate_reformed'][0])**2)
    dist_from_N1_1.append(d) # append the distance to the list initialized before the for-loop
dist_from_N1_1 = [round(num, 2) for num in dist_from_N1_1] # round the values in 'dist_from_N1_1' to 2 decimals
dist_from_N1_2 = [] # initialize a blank list for the second dataframe
for i in range(len(df_2['New_X_coordinate_reformed']) - 1):
    # Distance from the first node to node i+1 in the second dataframe
    d_N2 = np.sqrt((df_2['New_X_coordinate_reformed'][i+1] - df_2['New_X_coordinate_reformed'][0])**2 + (df_2['New_Y_coordinate_reformed'][i+1] - df_2['New_Y_coordinate_reformed'][0])**2)
    dist_from_N1_2.append(d_N2)
dist_from_N1_2 = [round(num, 2) for num in dist_from_N1_2] # round the values in 'dist_from_N1_2' to 2 decimals
After computing these three conditions, I check whether they are all satisfied.
#---------------------------Checking of all three conditions----------------------------#
if (len(node_number2) == len(node_number1) and nodal_dist_df_1 == nodal_dist_df_2 and dist_from_N1_1 == dist_from_N1_2):
    print('All three conditions are satisfied.')
    print('Hence yes, the two geometries are the same.')
else:
    print('No, the two geometries are not the same.')
But this approach gives an incorrect result. Kindly help me with your suggestions.
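One direction that may help: compare quantities that are independent of node numbering and orientation. For instance, the sorted multiset of all pairwise node distances is unchanged by rotation, translation, and renumbering (a sketch with a hypothetical helper name; equal distance multisets are a necessary condition for the geometries to match, not a full congruence proof):

```python
import numpy as np

def sorted_pairwise_distances(x, y):
    """All pairwise node distances, sorted; invariant under
    rotation, translation and node renumbering."""
    pts = np.column_stack([x, y])
    diff = pts[:, None, :] - pts[None, :, :]        # (n, n, 2) differences
    d = np.sqrt((diff ** 2).sum(axis=-1))           # (n, n) distance matrix
    iu = np.triu_indices(len(pts), k=1)             # each pair counted once
    return np.sort(np.round(d[iu], 2))

# the geometries can match only if these arrays are equal element-wise:
# np.array_equal(sorted_pairwise_distances(X1, Y1),
#                sorted_pairwise_distances(X2, Y2))
```

Rounding before comparing guards against floating-point noise introduced by the rotation.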
My code generates a number of distributions (I only plotted one below to keep it legible). The Y axis represents a probability density function and the X axis is a simple array of values.
In more detail.
Y = [0.02046505 0.10756612 0.24319883 0.30336375 0.22071875 0.0890625 0.015625 0 0 0]
And X is generated using np.arange(0,10,1) = [0 1 2 3 4 5 6 7 8 9]
I want to find the mean of this distribution (i.e. where the curve peaks on the X-axis), not the mean of the Y values. I know how to use numpy's np.mean to find the mean of Y, but that's not what I need.
By eye, the mean here is about x = 3, but I would like to compute it in code to make it more accurate.
Any help would be great.
By definition, the mean (more precisely, the expected value of a random variable x; since you have the PDF, you can use the expected value) is sum(p(x[j]) * x[j]), where p(x[j]) is the value of the PDF at x[j]. You can implement this as code like this:
>>> import numpy as np
>>> Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
...               0.0890625, 0.015625, 0., 0., 0.])
>>> X = np.arange(0, 10)
>>> Y.sum()
1.0
>>> (X * Y).sum()
2.92599253
So the (approximate) answer is 2.92599253.
I have a numpy array. I want to find the number of points which lie within an epsilon distance of each point.
My current code is below (for an n*2 array, but in general I expect the array to be n * m):
epsilon = np.array([0.5, 0.5])
np.array([1 / float(np.sum(np.all(np.abs(X - x) <= epsilon, axis=1))) for x in X])
But this code might not be efficient when it comes to an array of let us say 1 million rows and 50 columns. Is there a better and more efficient method ?
For example data
X = np.random.rand(10, 2)
you can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
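Note that the fully broadcasted difference materializes an n x n x m intermediate array, which is infeasible for n around one million. A chunked variant (a sketch; `inv_neighbour_counts` is a hypothetical name) keeps memory bounded at roughly chunk x n x m while staying vectorized:

```python
import numpy as np

def inv_neighbour_counts(X, epsilon, chunk=1024):
    """For each row of X, 1 / (number of rows within epsilon,
    component-wise), computed in row chunks to bound memory."""
    n = len(X)
    counts = np.empty(n)
    for start in range(0, n, chunk):
        block = X[start:start + chunk]                  # (c, m) slice of rows
        within = np.all(np.abs(block[:, None, :] - X[None, :, :]) <= epsilon,
                        axis=-1)                        # (c, n) boolean matrix
        counts[start:start + chunk] = within.sum(axis=-1)
    return 1.0 / counts
```

For very large n, a spatial index such as scipy's cKDTree would avoid the quadratic pairwise comparison altogether, at the cost of an extra dependency.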
According to the answer to this post,
The most classic "correlation" measure between a nominal and an interval ("numeric") variable is Eta, also called correlation ratio, and equal to the root R-square of the one-way ANOVA (with p-value = that of the ANOVA). Eta can be seen as a symmetric association measure, like correlation, because Eta of ANOVA (with the nominal as independent, numeric as dependent) is equal to Pillai's trace of multivariate regression (with the numeric as independent, set of dummy variables corresponding to the nominal as dependent).
I would appreciate it if you could let me know how to compute Eta in Python.
In fact, I have a dataframe with some numeric and some nominal variables.
Besides, how do I plot a heatmap-like plot for it?
The answer above is missing the root extraction, so as a result you will receive eta-squared. However, in the main article (used by User777) that issue has been fixed.
There is an article on Wikipedia about what the correlation ratio is and how to calculate it. I've created a simpler version of the calculations and will use the example from the wiki:
import pandas as pd
import numpy as np
data = {'subjects': ['algebra'] * 5 + ['geometry'] * 4 + ['statistics'] * 6,
'scores': [45, 70, 29, 15, 21, 40, 20, 30, 42, 65, 95, 80, 70, 85, 73]}
df = pd.DataFrame(data=data)
print(df.head(10))
>>> subjects scores
0 algebra 45
1 algebra 70
2 algebra 29
3 algebra 15
4 algebra 21
5 geometry 40
6 geometry 20
7 geometry 30
8 geometry 42
9 statistics 65
def correlation_ratio(categories, values):
    categories = np.array(categories)
    values = np.array(values)
    ssw = 0  # sum of squares within groups
    ssb = 0  # sum of squares between groups
    for category in set(categories):
        subgroup = values[np.where(categories == category)[0]]
        ssw += sum((subgroup - np.mean(subgroup))**2)
        ssb += len(subgroup) * (np.mean(subgroup) - np.mean(values))**2
    return (ssb / (ssb + ssw))**.5
coef = correlation_ratio(df['subjects'], df['scores'])
print('Eta_squared: {:.4f}\nEta: {:.4f}'.format(coef**2, coef))
>>> Eta_squared: 0.7033
Eta: 0.8386
The answer is provided here:
def correlation_ratio(categories, measurements):
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat) + 1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(0, cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(np.multiply(y_avg_array, n_array)) / np.sum(n_array)
    numerator = np.sum(np.multiply(n_array, np.power(np.subtract(y_avg_array, y_total_avg), 2)))
    denominator = np.sum(np.power(np.subtract(measurements, y_total_avg), 2))
    if numerator == 0:
        eta = 0.0
    else:
        eta = numerator / denominator  # note: without a square root this is eta squared
    return eta
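To see the missing square root in action, here is a quick self-contained check (repeating that function body, condensed, and reusing the wiki example data from the other answer):

```python
import numpy as np
import pandas as pd

def correlation_ratio(categories, measurements):
    # same computation as in the answer above; returns eta squared
    fcat, _ = pd.factorize(categories)
    cat_num = np.max(fcat) + 1
    y_avg_array = np.zeros(cat_num)
    n_array = np.zeros(cat_num)
    for i in range(cat_num):
        cat_measures = measurements[np.argwhere(fcat == i).flatten()]
        n_array[i] = len(cat_measures)
        y_avg_array[i] = np.average(cat_measures)
    y_total_avg = np.sum(y_avg_array * n_array) / np.sum(n_array)
    numerator = np.sum(n_array * (y_avg_array - y_total_avg) ** 2)
    denominator = np.sum((measurements - y_total_avg) ** 2)
    return 0.0 if numerator == 0 else numerator / denominator

subjects = ['algebra'] * 5 + ['geometry'] * 4 + ['statistics'] * 6
scores = np.array([45, 70, 29, 15, 21, 40, 20, 30, 42, 65, 95, 80, 70, 85, 73])
eta_squared = correlation_ratio(subjects, scores)
print(eta_squared)         # ~0.7033, i.e. eta squared
print(eta_squared ** 0.5)  # ~0.8386, i.e. eta
```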
raster1 = {{0, 1}, {1, 1}}
raster2 = {{1, 1}, {0, 0}}
Hi, can you explain step-by-step how the Ordered Weighted Average (OWA) method works, given the above two rasters? Thanks.
The tricky part about the concept of OWA is the order of the input vector before the operation. Given a vector and a weighting vector:
v = (1, 3, 2, 7)
weights = (0.5, 0.3, 0.1, 0.1)
Notice that, as with all weight vectors, the components must sum to 1. Now construct v1 by sorting the components of v in descending order:
v1 = (7, 3, 2, 1)
OK. Now, let's look at the theory of the OWA:
OWA = sum_i v1_i * weights_i
so in our example we get:
OWA = 7 * 0.5 + 3 * 0.3 + 2 * 0.1 + 1 * 0.1 = 4.7