Sequential use of Bayes' theorem

What is the name of this process?
I have been working on generating ratings from reviews by repeatedly applying Bayes' theorem. I start with a prior distribution over the ratings:
P(rating = 5) = 0.2
P(rating = 4) = 0.2
P(rating = 3) = 0.2
P(rating = 2) = 0.2
P(rating = 1) = 0.2
I also have some knowledge about the reviews: I classify each review as positive or negative. Based on that, I have the following likelihood distribution:
P(review is +ve | rating = 5) = 0.9 (because a review is very likely to be positive if the rating is 5)
P(review is +ve | rating = 4) = 0.7
P(review is +ve | rating = 3) = 0.5
P(review is +ve | rating = 2) = 0.3
P(review is +ve | rating = 1) = 0.1 (because a review is very unlikely to be positive if the rating is 1)
I update my initial belief about the ratings using:
P(rating = 5 | review is +ve) = P(review is +ve | rating = 5) * P(rating = 5) / sum over all ratings r of [ P(review is +ve | rating = r) * P(rating = r) ]
I repeat this for each of the ratings.
Then I get the posterior distribution:
P(rating = 5 | review is +ve) = some value
P(rating = 4 | review is +ve) = some value
P(rating = 3 | review is +ve) = some value
P(rating = 2 | review is +ve) = some value
P(rating = 1 | review is +ve) = some value
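For example, with the uniform prior above, the normalizing constant for a positive review is 0.2*(0.9 + 0.7 + 0.5 + 0.3 + 0.1) = 0.5, so the posterior is 0.36, 0.28, 0.20, 0.12, 0.04 for ratings 5 down to 1.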
I repeatedly apply Bayes' theorem as I observe more reviews, assuming the reviews are conditionally independent given the rating. For example, if the second review is negative:
P(rating = i | review 2 is -ve, review 1 is +ve) = P(review 2 is -ve | rating = i) * P(rating = i | review 1 is +ve) / sum over all ratings r of [ P(review 2 is -ve | rating = r) * P(rating = r | review 1 is +ve) ]
where i = 1, 2, 3, 4, 5
Essentially, I sequentially apply Bayes' theorem, using the posterior distribution from the previous update as the prior for the next update.
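A minimal Python sketch of this update loop (my own illustration; it assumes the reviews are conditionally independent given the rating, with P(review is -ve | rating) = 1 - P(review is +ve | rating)):

import numpy as np

ratings = [5, 4, 3, 2, 1]
prior = np.full(5, 0.2)                        # uniform prior over the ratings
lik_pos = np.array([0.9, 0.7, 0.5, 0.3, 0.1])  # P(review is +ve | rating)

def update(belief, review_is_positive):
    # one application of Bayes' theorem; the posterior becomes the next prior
    lik = lik_pos if review_is_positive else 1 - lik_pos
    posterior = lik * belief
    return posterior / posterior.sum()         # normalize by the evidence

belief = prior
for review in [True, True, False]:             # e.g. observed +ve, +ve, -ve
    belief = update(belief, review)
print(dict(zip(ratings, np.round(belief, 3))))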
I want to know whether there is such a method of getting a posterior distribution by sequentially applying Bayes' rule, and if so, its name (I strongly believe such a method exists). Can you provide a link to a paper that describes it? I just want the name of this process and reference links that I can use to show that the method is valid. I would appreciate any help.


Negative degrees of freedom when using GEKKO python

I'm trying to solve the optimization problem above, and my code is below.
It runs, but I get a negative degrees of freedom warning. The objective value is also negative, which I did not expect; I expected it to be positive.
I can't understand why this happens or how this problem can be solved. Can somebody give me a suggestion?
Code
# Import packages
from gekko import GEKKO
import numpy as np

# Define parameters
P_CO = 600  # $/tonCO
beta_CO2 = 1  # no unit
P_CO2 = 60  # $/tonCO2eq
E_ref = 3.1022616  # tonCO2eq/tonCO
E_dir = -1.600570692  # tonCO2eq/tonCO
E_indir_others = 0.3339226804  # tonCO2eq/tonCO
E_indir_elec_cons = 18.46607256  # GJ/tonCO
C1_CAPEX = 285695  # no unit
C2_CAPEX = 188.42  # no unit
C1_FOX = 82282  # no unit
C2_FOX = 24.094  # no unit
C1_ROX = 4471.5  # no unit
C2_ROX = 96.034  # no unit
C1_UOX = 1983.7  # no unit
C2_UOX = 249.79  # no unit
r = 0.08  # discount rate
N = 10  # number of scenarios
T = 30  # total time period
GWP_init = 0.338723235  # 2020 Electricity GWP in EU 27 countries
theta_max = 1600000  # max capacity

# Function to make GWP_EU matrix (TxN matrix)
def Electricity_GWP(GWP_init, n_years, num_episodes):
    GWP_mean = 0.36258224*np.exp(-0.16395611*np.arange(1, n_years+2)) + 0.03091272
    GWP_mean = GWP_mean.reshape(-1, 1)
    GWP_Yearly = np.tile(GWP_mean, num_episodes)
    noise = np.zeros((n_years+1, num_episodes))
    stdev2050 = GWP_mean[-1] * 0.25
    stdev = np.arange(0, stdev2050 * (1 + 1/n_years), stdev2050/n_years)
    for i in range(n_years+1):
        noise[i, :] = np.random.normal(0, stdev[i], num_episodes)
    GWP_forecast = GWP_Yearly + noise
    return GWP_forecast

GWP_EU = Electricity_GWP(GWP_init, T, N)  # (T+1)*N matrix
GWP_EU = GWP_EU[1:, :]  # T*N matrix
print(np.shape(GWP_EU))

# Build Gekko model
m = GEKKO(remote=False)
theta = m.Array(m.Var, N, lb=0, ub=theta_max)

# Demand projection: 2.6%/yr for years 1-10, 1.6%/yr for 11-20, 1.1%/yr for 21-30
demand = np.ones((T, 1))
demand[0] = 8031887.589
for k in range(1, 11):
    demand[k] = demand[k-1] * 1.026
for k in range(11, 21):
    demand[k] = demand[k-1] * 1.016
for k in range(21, T):
    demand[k] = demand[k-1] * 1.011
demand = 0.12 * demand
demand = np.tile(demand, N)  # T*N matrix
print(np.shape(demand))

# Discounted profit summed over T years, later averaged over N scenarios
obj = m.sum([m.sum([((1/(1+r))**(t+1))
      * ((P_CO*m.min3(demand[t, s], theta[s]))
      + (beta_CO2*P_CO2*m.min3(demand[t, s], theta[s])
         * (E_ref - E_dir - E_indir_others - E_indir_elec_cons*GWP_EU[t, s]))
      - (C1_CAPEX + C2_CAPEX*theta[s] + C1_FOX + C2_FOX*theta[s])
      - (C1_ROX + C2_ROX*m.min3(demand[t, s], theta[s])
         + C1_UOX + C2_UOX*m.min3(demand[t, s], theta[s])))
      for t in range(T)]) for s in range(N)])
m.Maximize(obj/N)
m.solve()
Output message
(30, 10)
(30, 10)
----------------------------------------------------------------
APMonitor, Version 1.0.0
APMonitor Optimization Suite
----------------------------------------------------------------
--------- APM Model Size ------------
Each time step contains
Objects : 11
Constants : 0
Variables : 5121
Intermediates: 0
Connections : 321
Equations : 3901
Residuals : 3901
Number of state variables: 5121
Number of total equations: - 3911
Number of slack variables: - 2400
---------------------------------------
Degrees of freedom : -1190
* Warning: DOF <= 0
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 18.61 NLPi: 5 Dpth: 0 Lvs: 0 Obj: -1.87E+09 Gap: 0.00E+00
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 18.619200000000003 sec
Objective : -1.8677021320161405E+9
Successful solution
---------------------------------------------------
The negative DOF warning is because of the slack variables that are created when using the min3() function. It is only a warning that if all of the inequalities are active then this could lead to an over-specified system of equations (more equations than variables). If there is a successful solution then this warning can be ignored.
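For illustration, here is a tiny model (my own sketch, not part of the question) where a single m.min3 call introduces the binary and slack variables that push the reported DOF negative:

from gekko import GEKKO

m = GEKKO(remote=False)
x = m.Param(value=3)
y = m.min3(x, 2)   # min() is modeled with a binary switch plus slack variables
m.solve(disp=False)
print(y.value[0])  # 2.0, since min(3, 2) = 2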
The negative objective is because most solvers require a minimization of the objective. Gekko automatically converts m.Maximize(obj) to m.Minimize(-obj). This is an equivalent objective. If you'd like to report the maximization and the positive objective, use the following at the end:
print('Objective: ',-m.options.OBJFCNVAL)
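For example, in a toy maximization (again my own sketch) the solver reports the negated value internally, and negating OBJFCNVAL recovers the maximum:

from gekko import GEKKO

m = GEKKO(remote=False)
x = m.Var(lb=0, ub=10)
m.Maximize(2*x - x**2)   # true maximum is 1.0 at x = 1
m.solve(disp=False)
print(m.options.OBJFCNVAL)                  # -1.0: Gekko minimized -(2*x - x**2)
print('Objective: ', -m.options.OBJFCNVAL)  # 1.0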

FiPy for charged particle flow

Premise
I am trying to solve a set of coupled PDEs that describes the diffusion of charged particles with different diffusion coefficients using FiPy. The ultimate goal is to obtain the concentration profile for both species and the electric field.
The geometry is an infinitely long cylinder with radius R. I want to use a non-uniform grid with more points close to the domain walls.
Charged particles diffuse from the center of the domain (left boundary) to the walls of the domain (right boundary). This translates to a Dirichlet boundary condition (B.C.) at the right boundary, where both species' concentrations are 0, and a Neumann B.C. at the left boundary, where the species fluxes are 0 to describe radial symmetry. Because the charged species diffuse at different rates, an electric field arises from the space charge. The electric field accelerates the slower species and decelerates the faster species in proportion to the field magnitude.
P is the concentration of the positively charged species, N is the concentration of the negatively charged species, and E is the space-charge electric field.
Issue
I can't seem to get a sensible solution from my code, and I think it may be related to how I cast the gradient/divergence terms as ConvectionTerms:
from fipy import *
import scipy.constants as constant
from fipy.tools import numerix
import numpy as np

## Defining physical constants
pi = constant.pi
m_argon = 6.6335e-26  # kg
k_b = constant.k  # J/K
e_0 = constant.epsilon_0  # F/m
q_e = constant.elementary_charge  # C
m_e = constant.electron_mass  # kg
planck = constant.h

def char_diff_length(L, R):
    """Characteristic diffusion length in a cylinder.
    Used for determining the ambipolar diffusion coefficient.
    ref: https://doi.org/10.6028/jres.095.035"""
    a = (pi/L)**2
    b = (2.405/R)**2
    c = (a+b)**(1/2)
    return c

def L_Debye(ne, Te):
    """Electron Debye screening length given in m.
    ne is in #/m3, Te is in K."""
    if ne < 3.3e-5:
        ne = 3.3e-5
    return (((e_0*k_b*Te)/(ne*q_e**2)))**(1/2)

## Setting system parameters
# Operation parameters
Pressure = 1.e5  # ambient pressure Pa
T_g = 400.  # background gas temperature K
n_g = Pressure/k_b/T_g  # gas number density #/m3
Q_std = 300.  # standard volumetric flowrate in sccm
T_e_0 = 11.  # plasma temperature ratio T_e/T_g here assumed to be T_e = 0.5 eV and T_g = 500 K
n_e_0 = 1.e20  # electron density in bulk plasma #/m3
# Geometric parameters
R_b = 1.e-3  # radius cylinder m
L = 1.e-1  # length of cylinder m
# Transport parameters
D_ion = 4.16e-6  # m2/s ion diffusion, obtained from https://doi.org/10.1007/s12127-020-00258-z
mu_ion = D_ion*q_e/k_b/T_g  # ion electrical mobility using Einstein relation
D_e = 100.68122*D_ion  # m2/s electron diffusion
mu_e = D_e*q_e/k_b/T_g  # electron electrical mobility using Einstein relation
Lambda = char_diff_length(L, R_b)
debyelength_e = L_Debye(n_e_0, T_g)
gamma = (Lambda/debyelength_e)**2
delta = D_ion/D_e

def d_j(rb, n):  # sets the desired spatial steps for mesh
    dj = np.zeros(n)
    for j in range(n):
        dj[j] = 2*rb*(1 - j/n)/n
    return dj

# Initializing mesh
dj = d_j(1., 100)  # 100 points
mesh = CylindricalGrid1D(dr=dj)

# Declaring cell variables
N = CellVariable(mesh=mesh, value=1., hasOld=True, name="electron density")
P = CellVariable(mesh=mesh, value=1., hasOld=True, name="ion density")
H = CellVariable(mesh=mesh, value=0., hasOld=True, name="electric field")

# Setting boundary conditions
N.constrain(0., mesh.facesRight)  # electron density = 0 at walls
P.constrain(0., mesh.facesRight)  # ion density = 0 at walls
H.constrain(0., mesh.facesLeft)  # electric field = 0 in the center
N.faceGrad.constrain([0.], mesh.facesLeft)  # flux of electron = 0 in the center
P.faceGrad.constrain([0.], mesh.facesLeft)  # flux of ion = 0 in the center

if __name__ == '__main__':
    viewer = Viewer(vars=(P, N))
    viewer.plot()

eqn1 = (TransientTerm(var=P) == DiffusionTerm(coeff=delta, var=P)
        - ConvectionTerm(coeff=[H.cellVolumeAverage,], var=P)
        - ConvectionTerm(coeff=[P.cellVolumeAverage,], var=H))
eqn2 = (TransientTerm(var=N) == DiffusionTerm(var=N)
        + (1/delta)*(ConvectionTerm(coeff=[H.cellVolumeAverage,], var=N)
        + ConvectionTerm(coeff=[N.cellVolumeAverage,], var=H)))
eqn3 = (TransientTerm(var=H) == gamma*(ConvectionTerm(coeff=[delta**2,], var=P)
        - ConvectionTerm(coeff=[delta,], var=N)
        - H*(delta*P.cellVolumeAverage + N.cellVolumeAverage)))

P.setValue(1.)
N.setValue(1.)
H.setValue(0.)

eqn1d = eqn1 & eqn2 & eqn3

timesteps = 1e-5
steps = 100
for i in range(steps):
    P.updateOld()
    N.updateOld()
    H.updateOld()
    res = 1e10
    sweep = 0
    while res > 1e-3 and sweep < 20:
        res = eqn1d.sweep(dt=timesteps)
        sweep += 1
    if __name__ == '__main__':
        viewer.plot()
Electric field is a vector, not a scalar.
H = CellVariable(rank=1, mesh=mesh, value = 0., hasOld = True, name = "electric field")
Correcting that should make converting the terms to FiPy clearer:
There's no reason to run the chain rule on the last term of eq1 or eq2; they're already in the canonical form for a FiPy ConvectionTerm. After the chain rule, they become, e.g., H·∇P and P(∇·H), neither of which is a form that FiPy likes. You could write those last two terms as explicit sources, but you shouldn't.
eqn1 = (TransientTerm(var=P) == DiffusionTerm(coeff=delta, var=P)
- ConvectionTerm(coeff=H, var=P))
eqn2 = (TransientTerm(var=N) == DiffusionTerm(var=N)
+ (1/delta)*ConvectionTerm(coeff=H, var=N))
I don't really understand eq3. It looks sort of like an integration of the continuity equation? I don't see it on a quick scan of the Phelps paper you cite. Regardless, it's not in a form that FiPy is amenable to; you can write it, but it won't solve well. The terms on the right aren't ConvectionTerms, they're just gradients.
If you're going to be allowing charge separation and worrying about the Debye length, I think you should be solving Poisson's equation. Can you share where this equation comes from? We might be able to put it in a form that FiPy will be happier with.
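For reference, here is a rough sketch of how a plain Poisson equation could be posed in FiPy (my own construction with an assumed nondimensional scaling, not the equation from the question), with the field recovered from the potential gradient:

from fipy import CellVariable, DiffusionTerm, Grid1D

# assumed nondimensional form: div(grad(phi)) = -(P - N), with E = -grad(phi)
mesh = Grid1D(nx=100, dx=0.01)
P = CellVariable(mesh=mesh, value=1., name="ion density")
N = CellVariable(mesh=mesh, value=1., name="electron density")
phi = CellVariable(mesh=mesh, value=0., name="potential")
phi.constrain(0., mesh.facesRight)  # reference potential at the wall

poisson = (DiffusionTerm(coeff=1., var=phi) == -(P - N))
poisson.solve(var=phi)

E = -phi.grad  # rank-1 electric field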
eq3 is a modified Poisson's equation. I tried to follow the procedure outlined by Freeman, where a time derivative is taken of the Poisson equation so that the species continuity equations can be substituted in. Freeman solved these equations using the Gear package, which I can only assume is a Fortran package. I followed his steps out of naiveté, because I am out of my depth with numerical methods.
I will try solving again with the Poisson equation in its standard form.
Edit: I have changed the electric field H to a rank-1 variable, modified eq3, and slightly changed the definition of gamma. Everything else is unchanged.
H = CellVariable(rank=1, mesh=mesh, value=0., hasOld=True, name="electric field")

charlength = char_diff_length(L, R_b)
debyelength_e = L_Debye(n_e_0, T_g)
gamma = (debyelength_e/charlength)**2
delta = D_ion/D_e

eqn1 = (TransientTerm(var=P) == DiffusionTerm(coeff=delta, var=P)
        - ConvectionTerm(coeff=H, var=P))
eqn2 = (TransientTerm(var=N) == DiffusionTerm(var=N)
        + (1/delta)*ConvectionTerm(coeff=H, var=N))
eqn3 = (ConvectionTerm(coeff=gamma/delta, var=H) == ImplicitSourceTerm(var=P)
        - ImplicitSourceTerm(var=N))

P.setValue(1.)
N.setValue(1.)
H.setValue(0.)

eqn1d = eqn1 & eqn2 & eqn3

timesteps = 1e-8
steps = 100
for i in range(steps):
    P.updateOld()
    N.updateOld()
    H.updateOld()
    res = 1e10
    sweep = 0
    while res > 1e-3 and sweep < 20:
        res = eqn1d.sweep(dt=timesteps)
        sweep += 1
    if __name__ == '__main__':
        viewer.plot()
It does not give me the same errors as before, which is some indication of progress. However, it now raises a new error:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 2 has 2 dimension(s)

If min-max normalization is just a kind of rescaling, then why isn't the mean zero after this rescaling?

I have performed min-max normalization, after which the sample ranges in [-1, 1]. Since this normalization is just a kind of rescaling, why is the mean not zero in the new data? Is there anything wrong with my code or with my explanation?
import numpy as np

data = np.array([-3, 1, 2])
print("data mean:", data.mean())

# perform min-max normalization:
old_range = np.amax(data) - np.amin(data)
new_range = 2
new_min = -1
data_norm = ((data - np.amin(data)) / old_range) * new_range + new_min
print("data_norm:", data_norm)
print("mean after normalization:", data_norm.mean())

# Result:
# data mean: 0.0
# data_norm: [-1.   0.6  1. ]
# mean after normalization: 0.2
In general if x is a random variable and y = bx+c then (reference)
mean(y) = mean(x)*b + c
std(y) = std(x)*b
variance(y) = variance(x)*b**2
x = np.array([-3, 1,2])
new_min = -1
new_max = 1
new_range = new_max - new_min
new_x = ((x-np.min(x))/(np.max(x)-np.min(x)))*new_range + new_min
print ("Mean: {0:.3}, std: {1:.3}, Var: {2:.3}".format(np.mean(new_x), np.std(new_x), np.var(new_x)))
alpha = new_range/(np.max(x)-np.min(x))
beta = np.min(x)*alpha - new_min
new_mean = np.mean(x)*alpha - beta
new_std = np.std(x)*alpha
new_var = np.var(x)*alpha*alpha
print ("Mean: {0:.3}, std: {1:.3}, Var: {2:.3}".format(new_mean,new_std,new_var))
Output:
Mean: 0.2, std: 0.864, Var: 0.747
Mean: 0.2, std: 0.864, Var: 0.747
So the mean of y depends on the mean of x and on alpha and beta, as shown in the equations above.
I want to add why standardization of data produces data with mean zero.
Normalization usually means scaling a variable to values between new_min and new_max (in your case, between -1 and 1), while standardization transforms data to have a mean of zero and a standard deviation of 1.
For instance, suppose you want to scale your variable to (0, 1), i.e. new_min = 0 and new_max = 1. How could the mean be 0 in that case? There are no negative values to cancel out the positive ones.
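A short sketch (my own example) contrasting the two transforms on the data from the question:

import numpy as np

x = np.array([-3.0, 1.0, 2.0])

# min-max scaling to [0, 1]: every value is non-negative afterwards,
# so the mean cannot be zero
x_minmax = (x - x.min()) / (x.max() - x.min())
print(x_minmax, x_minmax.mean())  # [0.  0.8 1. ] 0.6

# standardization: subtracting the mean recenters the data at zero
x_standard = (x - x.mean()) / x.std()
print(x_standard.mean())  # ~0.0 (up to floating-point error)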

Correlation between independent explanatory variables in regression

sample_size = 100
x1 = rnorm(sample_size, mean = 0, sd = 0.5)
x2 = rnorm(sample_size, mean = 0, sd = 0.5)
x3 = 0.5*x2+rnorm(sample_size, mean = 0, sd = 0.5)
generate errors:
epsilon = rnorm(sample_size, mean = 0, sd = 0.1)
generate dependent variable:
y = 0+0.1*x1+0.2*x2+0.3*x3+epsilon
Case 1: x1 and x2 are independent variables, so the sample correlation cor(x1, x2) should go to 0 as we increase the sample size.
Case 2: But rewriting the equation in terms of x2 (ignoring the error term),
x2 = 5*y - 0.5*x1 - 1.5*x3
it says the beta on x1 should be -0.5. Using
correlation = beta*var(x)/var(y)
this gives
correlation = -0.5*var(x1)/var(x2)
which says the correlation should not be zero.
Where am I going wrong in this second case?
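A quick simulation of the setup (a Python/numpy translation of the R code above, as a sanity check) shows what the sample correlation of x1 and x2 actually does:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large sample to see the limiting behaviour

x1 = rng.normal(0, 0.5, n)
x2 = rng.normal(0, 0.5, n)
x3 = 0.5*x2 + rng.normal(0, 0.5, n)
epsilon = rng.normal(0, 0.1, n)
y = 0.1*x1 + 0.2*x2 + 0.3*x3 + epsilon

# x1 and x2 are generated independently, so this tends to 0 as n grows
print(np.corrcoef(x1, x2)[0, 1])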

Why isn’t NUTS sampling with tt.dot or pm.math.dot?

I am trying to implement parts of Facebook's Prophet with some help from this example:
https://github.com/luke14free/pm-prophet/blob/master/pmprophet/model.py
This goes well :), but I am having some problems with the dot product that I don't understand. Note that I am implementing the linear trend.
import numpy as np
import pandas as pd
import pymc3 as pm
import theano

# df is the (not shown) dataframe with columns 'dagindex' and 'gebruikers'
ds = pd.to_datetime(df['dagindex'], format='%d-%m-%y')

m = pm.Model()
changepoint_prior_scale = 0.05
n_changepoints = 25
changepoints = pd.date_range(
    start=pd.to_datetime(ds.min()),
    end=pd.to_datetime(ds.max()),
    periods=n_changepoints + 2
)[1:-1]

with m:
    # priors
    sigma = pm.HalfCauchy('sigma', 10, testval=1)
    # trend
    growth = pm.Normal('growth', 0, 10)
    prior_changepoints = pm.Laplace('changepoints', 0, changepoint_prior_scale,
                                    shape=len(changepoints))
    y = np.zeros(len(df))

    # indexes x_i for the changepoints
    s = [np.abs((ds - i).values).argmin() for i in changepoints]
    g = growth
    x = np.arange(len(ds))
    # delta
    d = prior_changepoints

    regression = x * g
    base_piecewise_regression = []
    for i in s:
        local_x = x.copy()[:-i]
        local_x = np.concatenate([np.zeros(i), local_x])
        base_piecewise_regression.append(local_x)
    piecewise_regression = np.array(base_piecewise_regression)

    # this dot product doesn't work?
    piecewise_regression = pm.math.dot(theano.shared(piecewise_regression).T, d)
    # If I comment out the line above and use this one as the dot product, it works fine:
    # piecewise_regression = (piecewise_regression.T * d[None, :]).sum(axis=-1)

    regression += piecewise_regression
    y += regression

    obs = pm.Normal('y',
                    mu=(y - df.gebruikers.mean()) / df.gebruikers.std(),
                    sd=sigma,
                    observed=(df.gebruikers - df.gebruikers.mean()) / df.gebruikers.std())
    start = pm.find_MAP(maxeval=10000)
    trace = pm.sample(500, step=pm.NUTS(), start=start)
If I run the snippet above with
piecewise_regression = (piecewise_regression.T * d[None, :]).sum(axis=-1)
the model works as expected. However, I cannot get it to work with a dot product; the NUTS sampler doesn't sample at all.
piecewise_regression = pm.math.dot(theano.shared(piecewise_regression).T, d)
EDIT
The problem still occurs with theano.shared. I've got a minimal working example:
import numpy as np
import pymc3 as pm
import theano.tensor as tt

np.random.seed(5)

n_changepoints = 10
t = np.arange(1000)
s = np.sort(np.random.choice(t, size=n_changepoints, replace=False))
a = (t[:, None] > s) * 1
real_delta = np.random.normal(size=n_changepoints)
y = np.dot(a, real_delta) * t

with pm.Model():
    sigma = pm.HalfCauchy('sigma', 10, testval=1)
    delta = pm.Laplace('delta', 0, 0.05, shape=n_changepoints)
    g = tt.dot(a, delta) * t
    obs = pm.Normal('obs',
                    mu=(g - y.mean()) / y.std(),
                    sd=sigma,
                    observed=(y - y.mean()) / y.std())
    trace = pm.sample(500)
It seems to have something to do with the size of matrix a. NUTS doesn't sample if I start with
t = np.arange(1000)
however the example above does sample when I reduce the size of t to:
t = np.arange(100)
