Why doesn't the AIC change with different values of alpha in the BigVAR package?

I currently use R v.4.2.0, the BigVAR package v.1.1.0, and Windows 11 Home version 21H2.
I'm using the constructModel function to generate a penalized VAR model with the 'HLAGELEM' structure; however, the dual argument is not being used to run cross-validation to choose the best alpha.
Instead, I assigned different values of alpha manually in order to compare, via information criteria, which alpha is associated with the best model. The problem is that no matter which alpha I choose, the AIC is the same for every model. Why is this happening, and how can I properly use the dual argument to choose the best alpha automatically?
require(quantmod)
require(zoo)
require(vars)
library(expm)
library(BigVAR)
# get GDP, Federal Funds Rate, CPI from FRED
#Gross Domestic Product (Relative to 2000)
getSymbols('GDP',src='FRED',type='xts')
#> [1] "GDP"
GDP<- aggregate(GDP,as.yearqtr,mean)
GDP <- GDP/mean(GDP["2000"])*100
# Transformation Code: First Difference of Logged Variables
GDP <- diff(log(GDP))
index(GDP) <- as.yearqtr(index(GDP))
# Federal Funds Rate
getSymbols('FEDFUNDS',src='FRED',type='xts')
#> [1] "FEDFUNDS"
FFR <- aggregate(FEDFUNDS,as.yearqtr,mean)
# Transformation Code: First Difference
FFR <- diff(FFR)
# CPI ALL URBAN CONSUMERS, relative to 1983
getSymbols('CPIAUCSL',src='FRED',type='xts')
#> [1] "CPIAUCSL"
CPI <- aggregate(CPIAUCSL,as.yearqtr,mean)
CPI <- CPI/mean(CPI['1983'])*100
# Transformation code: difference of logged variables
CPI <- diff(log(CPI))
# Seasonally Adjusted M1
getSymbols('M1SL',src='FRED',type='xts')
#> [1] "M1SL"
M1<- aggregate(M1SL,as.yearqtr,mean)
# Transformation code, difference of logged variables
M1 <- diff(log(M1))
# combine series
Y <- cbind(CPI,FFR,GDP,M1)
names(Y) <- c("CPI","FFR","GDP","M1")
Y <- na.omit(Y)
k=ncol(Y)
T <- nrow(Y)
# start/end of rolling validation
T1 <- which(index(Y)=="1985 Q1")
T2 <- which(index(Y)=="2005 Q1")
#Demean
Y <- Y - (c(rep(1, nrow(Y))))%*%t(c(apply(Y[1:T1,], 2, mean)))
#Standardize Variance
for (i in 1:k) {
Y[, i] <- Y[, i]/apply(Y[1:T1,], 2, sd)[i]
}
# Fit an Elementwise HLAG model
Model1=constructModel(as.matrix(Y),p=4,struct="HLAGELEM",
gran=c(25,10),verbose=FALSE,VARX=list(),T1=T1,T2=T2)
Model1Results=cv.BigVAR(Model1)
str(Model1Results)
Then I manually tried different values of alpha:
model_0 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0))
model_0.1 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.1))
model_0.2 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.2))
model_0.3 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.3))
model_0.4 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.4))
model_0.5 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.5))
model_0.6 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.6))
model_0.7 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.7))
model_0.8 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.8))
model_0.9 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 0.9))
model_1 = constructModel(as.matrix(Y),p=4,struct="HLAGELEM", gran=c(25,10),verbose=FALSE, IC = TRUE, model.controls=list(alpha = 1))
#Cross Validation
Results_0 =cv.BigVAR(model_0)
Results_0.1=cv.BigVAR(model_0.1)
Results_0.2=cv.BigVAR(model_0.2)
Results_0.3=cv.BigVAR(model_0.3)
Results_0.4=cv.BigVAR(model_0.4)
Results_0.5=cv.BigVAR(model_0.5)
Results_0.6=cv.BigVAR(model_0.6)
Results_0.7=cv.BigVAR(model_0.7)
Results_0.8=cv.BigVAR(model_0.8)
Results_0.9=cv.BigVAR(model_0.9)
Results_1 =cv.BigVAR(model_1)
#Best AIC
print(Results_0@AICSD, quote=TRUE)
print(Results_0.1@AICSD, quote=TRUE)
print(Results_0.2@AICSD, quote=TRUE)
print(Results_0.3@AICSD, quote=TRUE)
print(Results_0.4@AICSD, quote=TRUE)
print(Results_0.5@AICSD, quote=TRUE)
print(Results_0.6@AICSD, quote=TRUE)
print(Results_0.7@AICSD, quote=TRUE)
print(Results_0.8@AICSD, quote=TRUE)
print(Results_0.9@AICSD, quote=TRUE)
print(Results_1@AICSD, quote=TRUE)
These are the results I get.
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
[1] 284.9829
I tried this with the AICMSFE, AICSD, AICpvec, AICsvec, BICMSFE, BICSD, BICpvec, and BIC slots of the model results object, but still no change was shown.
I'd really appreciate your help solving this issue.

Related

Faster way to calculate the Hessian / Fisher Information Matrix of a nnet::multinom multinomial regression in R using Rcpp & Kronecker products

It appears that for larger nnet::multinom multinomial regression models (with a few thousand coefficients), calculating the Hessian (the matrix of second derivatives of the negative log-likelihood, also known as the observed Fisher information matrix) becomes very slow, which then prevents me from calculating the variance-covariance matrix and hence from putting confidence intervals on model predictions.
The culprit seems to be the following pure R function, which calculates the Fisher information matrix analytically using code contributed by David Firth:
https://github.com/cran/nnet/blob/master/R/vcovmultinom.R
multinomHess = function (object, Z = model.matrix(object))
{
probs <- object$fitted
coefs <- coef(object)
if (is.vector(coefs)) {
coefs <- t(as.matrix(coefs))
probs <- cbind(1 - probs, probs)
}
coefdim <- dim(coefs)
p <- coefdim[2L]
k <- coefdim[1L]
ncoefs <- k * p
kpees <- rep(p, k)
n <- dim(Z)[1L]
## Now compute the observed (= expected, in this case) information,
## e.g. as in T Amemiya "Advanced Econometrics" (1985) pp 295-6.
## Here i and j are as in Amemiya, and x, xbar are vectors
## specific to (i,j) and to i respectively.
info <- matrix(0, ncoefs, ncoefs)
Names <- dimnames(coefs)
if (is.null(Names[[1L]]))
Names <- Names[[2L]]
else Names <- as.vector(outer(Names[[2L]], Names[[1L]], function(name2,
name1) paste(name1, name2, sep = ":")))
dimnames(info) <- list(Names, Names)
x0 <- matrix(0, p, k + 1L)
row.totals <- object$weights
for (i in seq_len(n)) {
Zi <- Z[i, ]
xbar <- rep(Zi, times=k) * rep(probs[i, -1, drop=FALSE], times=kpees)
for (j in seq_len(k + 1)) {
x <- x0
x[, j] <- Zi
x <- x[, -1, drop = FALSE]
x <- x - xbar
dim(x) <- c(1, ncoefs)
info <- info + (row.totals[i] * probs[i, j] * crossprod(x))
}
}
info
}
The passage referenced in Amemiya's Advanced Econometrics (screenshot not reproduced here) shows that the Hessian is indeed just given by a sum of cross products. I also saw this and this in terms of a derivation of how to calculate the Hessian matrix of a multinomial regression model, which may be even more elegant and efficient, as the Hessian is there calculated as a sum of Kronecker products.
For a smallish nnet::multinom model (in which I am modelling the frequency of different SARS-CoV2 lineages through time) the provided function runs quickly :
library(nnet)
library(splines)
download.file("https://www.dropbox.com/s/gt0yennn2gkg3rd/smallmodel.RData?dl=1",
"smallmodel.RData",
method = "auto", mode="wb")
load("smallmodel.RData")
length(fit_multinom_small$lev) # k=12 outcome levels
dim(coef(fit_multinom_small)) # 11 x 3 = (k-1) x p = 33 coefs
system.time(hess <- nnet:::multinomHess(fit_multinom_small)) # 0.11s
dim(hess) # 33 33
but doing this for a large model (again modelling the frequency of different SARS-CoV2 lineages through time, but now across different continents / countries) takes more than 2 hours, even though the model itself fits in ca. 1 minute:
download.file("https://www.dropbox.com/s/mpz08jj7fmubd68/bigmodel.RData?dl=1",
"bigmodel.RData",
method = "auto", mode="wb")
load("bigmodel.RData")
length(fit_global_multi_last3m$lev) # k=20 outcome levels
dim(coef(fit_global_multi_last3m)) # 19 x 229 = (k-1) x p = 4351 coefficients
system.time(hess <- nnet:::multinomHess(fit_global_multi_last3m)) # takes forever
I was now looking for ways to speed up the above function.
The obvious attempt could be to port it to Rcpp, but unfortunately I am not so experienced in this. Anybody any thoughts?
EDIT: From the info here and here, it appears that calculating the Hessian for a multinomial fit should just come down to calculating a sum of Kronecker products, which we can just do from R using efficient matrix algebra, but right now I am unsure how to include my total row counts fit$weights. Anybody any idea?
download.file("https://www.dropbox.com/s/gt0yennn2gkg3rd/smallmodel.RData?dl=1",
"smallmodel.RData",
method = "auto", mode="wb")
load("smallmodel.RData")
library(nnet)
length(fit_multinom_small$lev) # k=12 outcome levels
dim(coef(fit_multinom_small)) # 11 x 3 = (k-1) x p = 33 coefs
fit = fit_multinom_small
Z = model.matrix(fit)
P = fitted(fit)[, -1, drop=F]
k = ncol(P) # nr of outcome categories-1
p = ncol(Z) # nr of parameters
n = nrow(Z) # nr of observations
ncoefs = k*p
library(fastmatrix)
# Fisher information matrix
info <- matrix(0, ncoefs, ncoefs)
for (i in 1:n) { # sum over observations
info = info + kronecker.prod(diag(P[i,]) - tcrossprod(P[i,]), tcrossprod(Z[i,]))
}
I figured it out in the end and was able to calculate the observed Fisher information matrix using Kronecker products, as well as port that bit to Rcpp using Armadillo classes. (Full disclosure: I made that Rcpp port just using OpenAI's code-davinci / Codex, https://openai.com/blog/openai-codex/, and surprisingly it worked straight out of the box - AI is getting better every day; parallelReduce could presumably still be used to parallelize the accumulation, and the function was faster than an equivalent RcppEigen implementation I tried.) The mistake I had made was that the formula above gives the observed Fisher information for a single observation, so I had to accumulate it over observations and also take my total row counts into account.
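In my notation, the accumulated formula that the code below implements is

$$\mathcal{I}(\beta) \;=\; \sum_{i=1}^{n} w_i \,\Bigl[\bigl(\operatorname{diag}(p_i) - p_i p_i^{\top}\bigr) \otimes \bigl(z_i z_i^{\top}\bigr)\Bigr],$$

where $p_i$ is the vector of fitted probabilities for observation $i$ with the first (reference) category dropped, $z_i$ is the $i$-th row of the model matrix, and $w_i$ is the total row count (object$weights).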
Rcpp function:
// RcppArmadillo utility function to calculate observed Fisher
// information matrix of multinomial fit, with
// probs=fitted probabilities (with 1st category/column dropped)
// Z = model matrix
// row_totals = row totals
// We do this using Kronecker products, as in
// https://ieeexplore.ieee.org/abstract/document/1424458
// B. Krishnapuram; L. Carin; M.A.T. Figueiredo; A.J. Hartemink
// Sparse multinomial logistic regression: fast algorithms and
// generalization bounds
// IEEE Transactions on Pattern Analysis and Machine
// Intelligence ( Volume: 27, Issue: 6, June 2005)
#include <RcppArmadillo.h>
using namespace arma;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat calc_infmatrix_RcppArma(arma::mat probs, arma::mat Z, arma::vec row_totals) {
int n = Z.n_rows;
int p = Z.n_cols;
int k = probs.n_cols;
int ncoefs = k * p;
arma::mat info = arma::zeros<arma::mat>(ncoefs, ncoefs);
arma::mat diag_probs;
arma::mat tcrossprod_probs;
arma::mat tcrossprod_Z;
arma::mat kronecker_prod;
for (int i = 0; i < n; i++) {
diag_probs = arma::diagmat(probs.row(i));
tcrossprod_probs = arma::trans(probs.row(i)) * probs.row(i);
tcrossprod_Z = (arma::trans(Z.row(i)) * Z.row(i)) * row_totals(i);
kronecker_prod = arma::kron(diag_probs - tcrossprod_probs, tcrossprod_Z);
info += kronecker_prod;
}
return info;
}
saved as "calc_infmatrix_arma.cpp".
library(Rcpp)
library(RcppArmadillo)
sourceCpp("calc_infmatrix_arma.cpp")
R wrapper function :
# Function to calculate Hessian / observed Fisher information
# matrix of nnet::multinom multinomial fit object
fastmultinomHess <- function(object, Z = model.matrix(object)) {
probs <- object$fitted # predicted probabilities, avoid napredict from fitted.default
coefs <- coef(object)
if (is.vector(coefs)){ # ie there are only 2 response categories
coefs <- t(as.matrix(coefs))
probs <- cbind(1 - probs, probs)
}
coefdim <- dim(coefs)
p <- coefdim[2L] # nr of parameters
k <- coefdim[1L] # nr out outcome categories-1
ncoefs <- k * p # nr of coefficients
n <- dim(Z)[1L] # nr of observations
# Now compute the Hessian = the observed
# (= expected, in this case)
# Fisher information matrix
info <- calc_infmatrix_RcppArma(probs = probs[, -1, drop=F],
Z = Z,
row_totals = object$weights)
Names <- dimnames(coefs)
if (is.null(Names[[1L]])) Names <- Names[[2L]] else Names <- as.vector(outer(Names[[2L]], Names[[1L]],
function(name2, name1)
paste(name1, name2, sep = ":")))
dimnames(info) <- list(Names, Names)
return(info)
}
For my larger model this now calculates in 100s instead of >2 hours, so almost 80 times faster :
download.file("https://www.dropbox.com/s/mpz08jj7fmubd68/bigmodel.RData?dl=1",
"bigmodel.RData",
method = "auto", mode="wb")
load("bigmodel.RData")
object = fit_global_multi_last3m # large nnet::multinom fit
system.time(info <- fastmultinomHess(object, Z = model.matrix(object))) # 103s
system.time(info <- nnet:::multinomHess(object, Z = model.matrix(object))) # 8127s = 2.25h
A pure R version of the calc_infmatrix function (ca. 5x slower than the Rcpp function above) would be
# Utility function to calculate observed Fisher information matrix
# of multinomial fit, with
# probs=fitted probabilities (with 1st category/column dropped)
# Z = model matrix
# row_totals = row totals
calc_infmatrix = function(probs, Z, row_totals) {
require(fastmatrix) # for kronecker.prod Kronecker product function
n <- nrow(Z)
p <- ncol(Z)
k <- ncol(probs)
ncoefs <- k * p
info <- matrix(0, ncoefs, ncoefs)
for (i in 1:n) {
info <- info + kronecker.prod((diag(probs[i,]) - tcrossprod(probs[i,])), tcrossprod(Z[i,])*row_totals[i] )
}
return(info)
}

FiPy for charged particle flow

Premise
I am trying to solve a set of coupled PDEs that describes the diffusion of charged particles with different diffusion coefficients using FiPy. The ultimate goal is to obtain the concentration profile for both species and the electric field.
The geometry is an infinitely long cylinder with radius R. I want to use a non-uniform grid with more points close to the domain walls.
Charged particles diffuse from the center of the domain (left boundary) to the wall of the domain (right boundary). This translates to a Dirichlet boundary condition (B.C.) at the right boundary, where both species' concentrations are 0, and a Neumann B.C. at the left boundary, where the species fluxes are 0 to reflect radial symmetry. Because the charged species diffuse at different rates, an electric field arises from the space charge. The electric field accelerates the slower species and decelerates the faster species in proportion to the field magnitude.
P is the positively charged species concentration, N is the negatively charged species concentration, and E is the space-charge electric field.
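In dimensionless form, the system I am trying to solve is roughly (reading it off the FiPy terms below)

$$\frac{\partial P}{\partial t} = \delta\,\nabla^{2}P - \nabla\cdot(\vec{E}\,P), \qquad \frac{\partial N}{\partial t} = \nabla^{2}N + \frac{1}{\delta}\,\nabla\cdot(\vec{E}\,N),$$

with the electric field driven by the space charge, $\nabla\cdot\vec{E} \propto (P - N)$, and $\delta = D_{\text{ion}}/D_{e}$.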
Issue
I can't seem to get a sensible solution from my code, and I think it may be related to how I cast the gradient/divergence terms as ConvectionTerms:
from fipy import *
import scipy.constants as constant
from fipy.tools import numerix
import numpy as np
## Defining physical constants
pi = constant.pi
m_argon = 6.6335e-26 # kg
k_b = constant.k # J/K
e_0 = constant.epsilon_0 # F/m
q_e = constant.elementary_charge # C
m_e = constant.electron_mass # kg
planck = constant.h
def char_diff_length(L,R):
"""Characteristic diffusion length in a cylinder.
Used for determining the ambipolar diffusion coefficient.
ref: https://doi.org/10.6028/jres.095.035"""
a = (pi/L)**2
b = (2.405/R)**2
c = (a+b)**(1/2)
return c
def L_Debye(ne,Te):
"""Electron Debye screening length given in m.
ne is in #/m3, Te is in K."""
if ne < 3.3e-5:
ne = 3.3e-5
return (((e_0*k_b*Te)/(ne*q_e**2)))**(1/2)
## Setting system parameters
# Operation parameters
Pressure = 1.e5 # ambient pressure Pa
T_g = 400. # background gas temperature K
n_g = Pressure/k_b/T_g # gas number density #/m3
Q_std = 300. # standard volumetric flowrate in sccm
T_e_0 = 11. # plasma temperature ratio T_e/T_g here assumed to be T_e = 0.5 eV and T_g = 500 K
n_e_0 = 1.e20 # electron density in bulk plasma #/m3
# Geometric parameters
R_b = 1.e-3 # radius cylinder m
L = 1.e-1 # length of cylinder m
# Transport parameters
D_ion = 4.16e-6 #m2/s ion diffusion, obtained from https://doi.org/10.1007/s12127-020-00258-z
mu_ion = D_ion*q_e/k_b/T_g # ion electrical mobility using Einstein relation
D_e = 100.68122*D_ion #m2/s electron diffusion
mu_e = D_e*q_e/k_b/T_g # electron electrical mobility using Einstein relation
Lambda = char_diff_length(L,R_b)
debyelength_e = L_Debye(n_e_0,T_g)
gamma = (Lambda/debyelength_e)**2
delta = D_ion/D_e
def d_j(rb,n): #sets the desired spatial steps for mesh
dj = np.zeros(n)
for j in range(n):
dj[j] = 2*rb*(1 - j/n)/n
return dj
#Initializing mesh
dj = d_j(1.,100) # 100 points
mesh = CylindricalGrid1D(dr = dj)
#Declaring cell variables
N = CellVariable(mesh=mesh, value = 1., hasOld = True, name = "electron density")
P = CellVariable(mesh=mesh, value = 1., hasOld = True, name = "ion density")
H = CellVariable(mesh=mesh, value = 0., hasOld = True, name = "electric field")
#Setting boundary conditions
N.constrain(0.,mesh.facesRight) # electron density = 0 at walls
P.constrain(0.,mesh.facesRight)# ion density = 0 at walls
H.constrain(0.,mesh.facesLeft) # electric field = 0 in the center
N.faceGrad.constrain([0.],mesh.facesLeft) # flux of electron = 0 in the center
P.faceGrad.constrain([0.],mesh.facesLeft) # flux of ion = 0 in the center
if __name__ == '__main__':
viewer = Viewer(vars=(P,N))
viewer.plot()
eqn1 = (TransientTerm(var=P) == DiffusionTerm(coeff=delta,var=P)
- ConvectionTerm(coeff=[H.cellVolumeAverage,],var=P)
- ConvectionTerm(coeff=[P.cellVolumeAverage,],var=H))
eqn2 = (TransientTerm(var=N) == DiffusionTerm(var=N)
+ (1/delta)*(ConvectionTerm(coeff=[H.cellVolumeAverage,],var=N)
+ConvectionTerm(coeff=[N.cellVolumeAverage,],var=H)))
eqn3 = (TransientTerm(var=H) == gamma*(ConvectionTerm(coeff=[delta**2,],var=P)
- ConvectionTerm(coeff=[delta,],var=N)
- H*(delta*P.cellVolumeAverage + N.cellVolumeAverage)))
P.setValue(1.)
N.setValue(1.)
H.setValue(0.)
eqn1d = eqn1 & eqn2 & eqn3
timesteps = 1e-5
steps = 100
for i in range(steps):
P.updateOld()
N.updateOld()
H.updateOld()
res = 1e10
sweep = 0
while res > 1e-3 and sweep < 20:
res = eqn1d.sweep(dt=timesteps)
sweep += 1
if __name__ == '__main__':
viewer.plot()
The electric field is a vector, not a scalar:
H = CellVariable(rank=1, mesh=mesh, value = 0., hasOld = True, name = "electric field")
Correcting that should make converting the terms to FiPy clearer:
There's no reason to apply the chain rule to the last term of eq1 or eq2; they're already in the canonical form for a FiPy ConvectionTerm. After the chain rule they become, e.g., H·∇P and P ∇·H, neither of which is a form that FiPy likes. You could write those last two terms as explicit sources, but you shouldn't.
eqn1 = (TransientTerm(var=P) == DiffusionTerm(coeff=delta, var=P)
- ConvectionTerm(coeff=H, var=P))
eqn2 = (TransientTerm(var=N) == DiffusionTerm(var=N)
+ (1/delta)*ConvectionTerm(coeff=H, var=N))
I don't really understand eq3. It looks sort of like an integration of the continuity equation? I don't see it on a quick scan of the Phelps paper you cite. Regardless, it's not in a form that FiPy is amenable to; you can write it, but it won't solve well. The terms on the right aren't ConvectionTerms, they're just gradients.
If you're going to be allowing charge separation and worrying about the Debye length, I think you should be solving Poisson's equation. Can you share where this equation comes from? We might be able to put it in a form that FiPy will be happier with.
eq3 is a modified Poisson's equation. I tried to follow the procedure outlined by Freeman, where a time derivative of the Poisson equation is taken so that the species continuity equations can be substituted in. Freeman solved these equations using the Gear package, which I can only assume is a Fortran package. I followed his steps out of naivety because I am out of my depth with numerical methods.
I will try solving again with the Poisson equation in its standard form.
Edit: I have changed the electric field H to a rank-1 CellVariable and modified eq3, as well as slightly changing the definition of gamma. Everything else is unchanged.
H = CellVariable(rank = 1, mesh=mesh, value = 0., hasOld = True, name = "electric field")
charlength = char_diff_length(L,R_b)
debyelength_e = L_Debye(n_e_0,T_g)
gamma = (debyelength_e/charlength)**2
delta = D_ion/D_e
eqn1 = (TransientTerm(var=P) == DiffusionTerm(coeff=delta,var=P)
- ConvectionTerm(coeff=H,var=P))
eqn2 = (TransientTerm(var=N) == DiffusionTerm(var=N)
+ (1/delta)*ConvectionTerm(coeff=H,var=N))
eqn3 = (ConvectionTerm(coeff = gamma/delta, var=H) == ImplicitSourceTerm(var=P)
- ImplicitSourceTerm(var=N))
P.setValue(1.)
N.setValue(1.)
H.setValue(0.)
eqn1d = eqn1 & eqn2 & eqn3
timesteps = 1e-8
steps = 100
for i in range(steps):
P.updateOld()
N.updateOld()
H.updateOld()
res = 1e10
sweep = 0
while res > 1e-3 and sweep < 20:
res = eqn1d.sweep(dt=timesteps)
sweep += 1
if __name__ == '__main__':
viewer.plot()
It does not give me the same errors as before, which is some indication of progress. However, it now produces a new error:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 2 has 2 dimension(s)

Plotting Average Length of Brownian Motion Realization

I have a function for a Brownian motion:
import numpy as np
import matplotlib.pyplot as plt

mu , sig = 0 , 1 # parameters of the normal distribution
mu_s = 0 # mu in the SDE
sig_s = 1 # sigma in the SDE
S0 = 10 # starting price of the stock
n , m = 1000, 20 # n = number of simulated paths, m = number of discretization steps
T = 1 # year
dt = 1 # each dt is one day
def ABM(n,m,S0,mu,sigma,dt):
np.random.seed(999)
mu_s = mu # mu in SDE
sig_s = sigma #sig in SDE
S0 = S0 # starting price of stock
n , m = n, m # n = number of paths (simulations), m = discretization steps
sig_db = sig_s*np.sqrt(dt)*np.random.normal(mu, sigma, (n,m+1))
mu_dt = mu_s*dt*np.ones([n,m+1])
sig_db[:,0] = 0 # set first column to zero
mu_dt[:,0] = 0
dS = mu_dt + sig_db
S = S0 + np.cumsum(dS,axis=1)
return n,m,S
n,m,S = ABM(1000,20,10,0,1,1)
Which works fine for plotting separate realizations on one plot:
index = np.arange(0,m+1)*np.ones([n,m+1]) # create indices as S_0, S_1, S_2
plt.plot(index.T,S.T)
but now I'd like to plot the average path of those realizations at each time step, and I'm not sure how to go about it. The expectation of arithmetic Brownian motion is E(S) = S_0 + \mu*t, which leads me to think I should be using np.mean() in some way, but I can't seem to get it.
TIA
The matrix S consists of n realizations; you get E(S(t)) by averaging over the realizations, i.e.
EE = np.mean(S, axis = 0)
Similarly, you can get the variance, also a function of time, via
np.mean((S - EE)**2, axis = 0)
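As a quick check (a minimal sketch building on the answer above; S, m, dt, S0 and mu_s are taken from the question's code), the empirical mean can be plotted against the theoretical line S_0 + mu*t:
import numpy as np
import matplotlib.pyplot as plt

EE = np.mean(S, axis=0)            # empirical E[S(t)] at each time step
VV = np.mean((S - EE)**2, axis=0)  # empirical Var[S(t)] at each time step
t = np.arange(m + 1) * dt          # time grid

plt.plot(t, EE, label="empirical mean")
plt.plot(t, S0 + mu_s * t, "--", label="theory: S0 + mu*t")
plt.fill_between(t, EE - np.sqrt(VV), EE + np.sqrt(VV), alpha=0.2, label="+/- 1 sd")
plt.legend()
plt.show()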

Deep Neural Network does not update weights upon training

I am currently getting into TensorFlow and have just now started to grasp its graph-like concept. I tried to implement a NN using gradient descent (the Adam optimizer) to solve the CartPole environment. I start by randomly initializing my weights and then take random actions (weighted by the current policy) during training. When testing, I always take the action with the maximum probability. However, I always get a score that hovers around 10, with a variance of around 0.8. Always. It doesn't change in a notable fashion at all, which makes it look as if the network takes purely random actions at every step and doesn't learn anything. As I said, it seems that the weights are never updated correctly. Where and how do I need to do that?
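For context, the loss I construct below (L_theta) is meant to be a REINFORCE-style policy-gradient objective accumulated over one episode, roughly

$$L(\theta) = -\Bigl(\sum_{t} \log \pi_\theta(a_t \mid s_t)\Bigr)\cdot\Bigl(\sum_{t} \gamma^{\,T-t-1}\, r_t\Bigr).$$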
Here's my code:
import tensorflow as tf
import numpy as np
from gym.envs.classic_control import CartPoleEnv
env = CartPoleEnv()
learning_rate = 10**(-3)
gamma = 0.9999
n_train_trials = 10**3
n_test_trials = 10**2
n_actions = env.action_space.n
n_obs = env.observation_space.high.__len__()
goal_steps = 200
should_render = False
print_per_episode = 100
state_holder = tf.placeholder(dtype=tf.float32, shape=(None, n_obs), name='symbolic_state')
actions_one_hot_holder = tf.placeholder(dtype=tf.float32, shape=(None, n_actions),
name='symbolic_actions_one_hot_holder')
discounted_rewards_holder = tf.placeholder(dtype=tf.float32, shape=None, name='symbolic_reward')
# initialize neurons list dynamically
def get_neurons_list():
i = n_obs
n_neurons_list = [i]
while i < (n_obs * n_actions) // (n_actions // 2):
i *= 2
n_neurons_list.append(i)
while i // 2 > n_actions:
i = i // 2
n_neurons_list.append(i)
n_neurons_list.append(n_actions)
# print(n_neurons_list)
return n_neurons_list
with tf.name_scope('nonlinear_policy'):
# create list of layers with sizes
n_neurons_list = get_neurons_list()
network = None
for i in range((len(n_neurons_list) - 1)):
theta = tf.Variable(tf.random_normal([n_neurons_list[i], n_neurons_list[i+1]]))
bias = tf.Variable(tf.random_normal([n_neurons_list[i+1]]))
if network is None:
network = tf.matmul(state_holder, theta) + bias
else:
network = tf.matmul(network, theta) + bias
if i < len(n_neurons_list) - 1:
network = tf.nn.relu(network)
action_probabilities = tf.nn.softmax(network)
testing_action_choice = tf.argmax(action_probabilities, dimension=1, name='testing_action_choice')
with tf.name_scope('loss'):
actually_chosen_probability = action_probabilities * actions_one_hot_holder
L_theta = -1 * (tf.reduce_sum(tf.log(actually_chosen_probability)) * tf.reduce_sum(discounted_rewards_holder))
with tf.name_scope('train'):
# We define the optimizer to use the ADAM optimizer, and ask it to minimize our loss
gd_opt = tf.train.AdamOptimizer(learning_rate).minimize(L_theta)
sess = tf.Session() # FOR NOW everything is symbolic, this object has to be called to compute each value of Q
# Start
sess.run(tf.global_variables_initializer())
observation = env.reset()
batch_rewards = []
states = []
action_one_hots = []
episode_rewards = []
episode_rewards_list = []
episode_steps_list = []
step = 0
episode_no = 0
while episode_no <= n_train_trials:
if should_render: env.render()
step += 1
action_probability_values = sess.run(action_probabilities,
feed_dict={state_holder: [observation]})
# Choose the action using the action probabilities output by the policy implemented in tensorflow.
action = np.random.choice(np.arange(n_actions), p=action_probability_values.ravel())
# Calculating the one-hot action array for use by tensorflow
action_arr = np.zeros(n_actions)
action_arr[action] = 1.
action_one_hots.append(action_arr)
# Record states
states.append(observation)
observation, reward, done, info = env.step(action)
# We don't want to go above 200 steps
if step >= goal_steps:
done = True
batch_rewards.append(reward)
episode_rewards.append(reward)
# If the episode is done, and it contained at least one step, do the gradient updates
if len(batch_rewards) > 0 and done:
# First calculate the discounted rewards for each step
batch_reward_length = len(batch_rewards)
discounted_batch_rewards = batch_rewards.copy()
for i in range(batch_reward_length):
discounted_batch_rewards[i] *= (gamma ** (batch_reward_length - i - 1))
# Next run the gradient descent step
# Note that each of action_one_hots, states, discounted_batch_rewards has the first dimension as the length
# of the current trajectory
gradients = sess.run(gd_opt, feed_dict={actions_one_hot_holder: action_one_hots, state_holder: states,
discounted_rewards_holder: discounted_batch_rewards})
action_one_hots = []
states = []
batch_rewards = []
if done:
# Done with episode. Reset stuff.
episode_no += 1
episode_rewards_list.append(np.sum(episode_rewards))
episode_steps_list.append(step)
episode_rewards = []
step = 0
observation = env.reset()
if episode_no % print_per_episode == 0:
print("Episode {}: Average steps in last {} episodes".format(episode_no, print_per_episode),
np.mean(episode_steps_list[(episode_no - print_per_episode):episode_no]), '+-',
np.std(episode_steps_list[(episode_no - print_per_episode):episode_no])
)
observation = env.reset()
episode_rewards_list = []
episode_rewards = []
episode_steps_list = []
step = 0
episode_no = 0
print("Testing")
while episode_no <= n_test_trials:
env.render()
step += 1
# For testing, we choose the action using an argmax.
test_action, = sess.run([testing_action_choice],
feed_dict={state_holder: [observation]})
observation, reward, done, info = env.step(test_action[0])
if step >= 200:
done = True
episode_rewards.append(reward)
if done:
episode_no += 1
episode_rewards_list.append(np.sum(episode_rewards))
episode_steps_list.append(step)
episode_rewards = []
step = 0
observation = env.reset()
if episode_no % print_per_episode == 0:
print("Episode {}: Average steps in last {} episodes".format(episode_no, print_per_episode),
np.mean(episode_steps_list[(episode_no - print_per_episode):episode_no]), '+-',
np.std(episode_steps_list[(episode_no - print_per_episode):episode_no])
)
Here is an example TensorFlow program that uses Q-learning to learn the CartPole OpenAI Gym environment.
It is able to quickly learn to stay upright for 80 steps.
Here is the code:
import math
import numpy as np
import sys
import random
sys.path.append("../gym")
from gym.envs.classic_control import CartPoleEnv
env = CartPoleEnv()
discount = 0.5
learning_rate = 0.5
gradient = .001
regularizaiton_factor = .1
import tensorflow as tf
tf_state = tf.placeholder( dtype=tf.float32 , shape=[4] )
tf_state_2d = tf.reshape( tf_state , [1,4] )
tf_action = tf.placeholder( dtype=tf.int32 )
tf_action_1hot = tf.reshape( tf.one_hot( tf_action , 2 ) , [1,2] )
tf_delta_reward = tf.placeholder( dtype=tf.float32 )
tf_value = tf.placeholder( dtype=tf.float32 )
tf_matrix1 = tf.Variable( tf.random_uniform([4,7], -.001, .001) )
tf_matrix2 = tf.Variable( tf.random_uniform([7,2], -.001, .001) )
tf_logits = tf.matmul( tf_state_2d , tf_matrix1 )
tf_logits = tf.matmul( tf_logits , tf_matrix2 )
tf_loss = -1 * learning_rate * ( tf_delta_reward + discount * tf_value - tf_logits ) * tf_action_1hot
tf_regularize = tf.reduce_mean( tf.square( tf_matrix1 )) + tf.reduce_mean( tf.square( tf_matrix2 ))
tf_train = tf.train.GradientDescentOptimizer(gradient).minimize( tf_loss + tf_regularize * regularizaiton_factor )
sess = tf.Session()
sess.run( tf.global_variables_initializer() )
def max_Q( state ) :
actions = sess.run( tf_logits, feed_dict={ tf_state:state } )
actions = actions[0]
value = actions.max()
action = 0 if actions[0] == value else 1
return action , value
avg_age = 0
for trial in range(1,101) :
# initialize state
previous_state = env.reset()
# initialize action and the value of the expected reward
action , value = max_Q(previous_state)
previous_reward = 0
for age in range(1,301) :
if trial % 100 == 0 :
env.render()
new_state, new_reward, done, info = env.step(action)
new_state = new_state
action, value = max_Q(new_state)
# The cart-pole gym doesn't return a reward of Zero when done.
if done :
new_reward = 0
delta_reward = new_reward - previous_reward
# learning phase
sess.run(tf_train, feed_dict={ tf_state:previous_state, tf_action:action, tf_delta_reward:delta_reward, tf_value:value })
previous_state = new_state
previous_reward = new_reward
if done :
break
avg_age = avg_age * 0.95 + age * .05
if trial % 50 == 0 :
print "Average age =",int(round(avg_age))," , trial",trial," , discount",discount," , learning_rate",learning_rate," , gradient",gradient
elif trial % 10 == 0 :
print(int(round(avg_age)), end=" ")
Here is the output:
6 18 23 30 Average age = 36 , trial 50 , discount 0.5 , learning_rate 0.5 , gradient 0.001
38 47 50 53 Average age = 55 , trial 100 , discount 0.5 , learning_rate 0.5 , gradient 0.001
Summary
I wasn't able to get Q-learning with a simple neural net to fully solve the CartPole problem, but have fun experimenting with different NN sizes and depths!
Hope you enjoy this code,
cheers

finding optimum lambda and features for polynomial regression

I am new to Data Mining/ML. I've been trying to solve a polynomial regression problem of predicting the price from given input parameters (already normalized to the range [0, 1]).
I'm quite close, as my output is proportional to the correct one, but it seems a bit suppressed. My algorithm is correct; I just don't know how to arrive at an appropriate lambda (the regularization parameter), or how to decide to what extent I should populate the features, as the problem says: "The prices per square foot are (approximately) a polynomial function of the features. This polynomial always has an order less than 4."
Is there a way to visualize the data to find the optimum values for these parameters, like we find the optimal alpha (step size) and number of iterations by visualizing the cost function in linear regression with gradient descent?
Here is my code : http://ideone.com/6ctDFh
from numpy import *
def mapFeature(X1, X2):
degree = 2
out = ones((shape(X1)[0], 1))
for i in range(1, degree+1):
for j in range(0, i+1):
term1 = X1**(i-j)
term2 = X2 ** (j)
term = (term1 * term2).reshape( shape(term1)[0], 1 )
"""note that here 'out[i]' represents mappedfeatures of X1[i], X2[i], .......... out is made to store features of one set in out[i] horizontally """
out = hstack(( out, term ))
return out
def solve():
n, m = input().split()
m = int(m)
n = int(n)
data = zeros((m, n+1))
for i in range(0, m):
ausi = input().split()
for k in range(0, n+1):
data[i, k] = float(ausi[k])
X = data[:, 0 : n]
y = data[:, n]
theta = zeros((6, 1))
X = mapFeature(X[:, 0], X[:, 1])
ausi = computeCostVect(X, y, theta)
# print(X)
print("Results usning BFGS : ")
lamda = 2
theta, cost = findMinTheta(theta, X, y, lamda)
test = [0.05, 0.54, 0.91, 0.91, 0.31, 0.76, 0.51, 0.31]
print("prediction for 0.31 , 0.76 (using BFGS) : ")
for i in range(0, 7, 2):
print(mapFeature(array([test[i]]), array([test[i+1]])).dot( theta ))
# pyplot.plot(X[:, 1], y, 'rx', markersize = 5)
# fig = pyplot.figure()
# ax = fig.add_subplot(1,1,1)
# ax.scatter(X[:, 1],X[:, 2], s=y) # Added third variable income as size of the bubble
# pyplot.show()
The current output is:
183.43478288
349.10716957
236.94627602
208.61071682
The correct output should be:
180.38
1312.07
440.13
343.72
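One way to visualize the effect of lambda (a minimal sketch, not from the original post, using a plain numpy ridge solver as a stand-in for the custom computeCostVect/findMinTheta routines; X is assumed to be the mapped feature matrix and y the prices) is to hold out part of the data and plot the validation error over a grid of lambda values:
import numpy as np

def ridge_fit(X, y, lam):
    # closed-form ridge solution; the intercept column (column 0) is not penalized
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def validation_curve(X, y, lambdas, frac=0.7, seed=0):
    # random train/validation split, then one fit per lambda
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(frac * len(y))
    train, val = idx[:cut], idx[cut:]
    errors = []
    for lam in lambdas:
        theta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ theta - y[val]) ** 2))
    return errors

# lambdas = np.logspace(-4, 2, 20)
# errors = validation_curve(X, y, lambdas)
# plt.semilogx(lambdas, errors)   # the minimum suggests a reasonable lambda
Repeating the same plot for mapFeature degrees 1, 2, and 3 also shows which polynomial order generalizes best.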
