Is there a workaround for not fusing the observed data into model definition in Pymc3? - theano

Problem definition: consider the "Simpletest" model (from the PyMC3 examples), which is similar to the following:
model = Model()
data = np.random.normal(size=(2, 20))
with model:
    x = Normal('x', mu=.5, tau=2. ** -2, shape=(2, 1))
    z = Beta('z', alpha=10, beta=5.5)
    d = Normal('data', mu=x, tau=.75 ** -2, observed=data)
    step = NUTS()
    trace = sample(1000, step)
I'd like to change it so that I have a fixed model structure but run the sampling for several iterations, each time adding a new data point to the previous (observed) dataset. Since the observed data is embedded in the model definition, the only way I know to do this is to put the whole model definition inside a loop:
model = Model()
# a set of initial data points
data = getInitPoints((2,5))
for i in xrange(m):
    with model:
        x = Normal('x', mu=.5, tau=2. ** -2, shape=(2, 1))
        z = Beta('z', alpha=10, beta=5.5)
        d = Normal('data', mu=x, tau=.75 ** -2, observed=data)
        step = NUTS()
        trace = sample(1000, step)
    data = numpy.vstack( (data,getnewPoint( (2,1) ) ) )
    #use the samples
This may produce unnecessary overhead, especially if the model is large. To avoid repeatedly defining the same model, I wonder if the same results could be achieved with something similar to the following idea:
with model:
    x = Normal('x', mu=.5, tau=2. ** -2, shape=(2, 1))
    z = Beta('z', alpha=10, beta=5.5)
data = getInitPoints()
for i in xrange(m):
    # only necessary parts are included in the loop
    with model:
        d = Normal('data', mu=x, tau=.75 ** -2, observed=data)
        step = NUTS()
        trace = sample(1000, step)
    data = numpy.vstack((data,getnewPoint()))
or even better:
data = getInitPoints()
dataHandle = magicHandle(data)
with model:
    x = Normal('x', mu=.5, tau=2. ** -2, shape=(2, 1))
    z = Beta('z', alpha=10, beta=5.5)
    d = Normal('data', mu=x, tau=.75 ** -2, observed=dataHandle)
    step = NUTS()
for i in xrange(m):
    with model:
        trace = sample(1000, step)
    dataHandle = numpy.vstack((data,getnewPoint()))

It seems that this is not possible right now. However, there is an open issue on this topic with possible solutions here:


Error: `data` and `reference` should be factors with the same levels for imbalanced class

I used the SMOTE and Tomek methods to handle the imbalanced classes in my data, and I'm trying to fit a boosted regression tree.
It runs smoothly until I create the confusion matrix, where I get this error:
Error: `data` and `reference` should be factors with the same levels.
### SMOTE and Tomek
NOAA_SMOTE= read.csv("NOAA_SMOTE.csv", TRUE, ",")
train.index <- createDataPartition(NOAA_SMOTE$japon, p = .7, list = FALSE)
train <- NOAA_SMOTE[ train.index,]
test <- NOAA_SMOTE[-train.index,]
tomek = ubTomek(train[,-1], train[,1])
model_train_tomek = cbind(tomek$X,tomek$Y)
names(model_train_tomek)[1] = "japon"
removed.index = tomek$id.rm
train$japon = as.factor(train$japon)
train_tomek = train[-removed.index,]
## SMOTE after tomek links
traintomeksmote <- SMOTE(japon ~ ., train_tomek, perc.over = 2000,perc.under = 100)
fitControlSmoteTomek <- trainControl(## 10-fold CV
  method = "repeatedcv",
  number = 10,
  repeats = 3,
  ## Estimate class probabilities
  classProbs = TRUE,
  ## Evaluate performance using
  ## the following function
  summaryFunction = twoClassSummary)
gbmGridSmoteTomek <- expand.grid(interaction.depth = c(3,4, 5, 6),
  n.trees = (1:30)*50,
  shrinkage = c(0.1,0.001,0.75,0.0001),
  n.minobsinnode = 10)
gbmFitNOAASMOTETomek <- caret::train (make.names(japon) ~ ., data = traintomeksmote,
  method = "gbm",
  trControl = fitControlSmoteTomek,
  distribution = "bernoulli",
  verbose = FALSE,
  tuneGrid = gbmGridSmoteTomek,
  ## Specify which metric to optimize
  metric = "ROC")
test$japon = as.factor(test$japon)
PredNOAASMOTETomek <- predict(gbmFitNOAASMOTETomek, newdata= test ,type='prob')
cmSMOTETomekNOAA = confusionMatrix(PredNOAASMOTETomek , as.factor(test$japon), mode="everything")

RuntimeError: Trying to backward through the graph a second time. Saved intermediate values of the graph are freed when you call .backward()

I am trying to train SRGAN from scratch. I have read solutions for this type of problem, but it would be great if someone could help me debug my code. The exact error is: "RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad()" Here is the snippet I am trying to train:
gen_model = Generator().to(device, non_blocking=True)
disc_model = Discriminator().to(device, non_blocking=True)
opt_gen = optim.Adam(gen_model.parameters(), lr=0.01)
opt_disc = optim.Adam(disc_model.parameters(), lr=0.01)
from torch.nn.modules.loss import BCELoss

def train_model(gen, disc):
    for epoch in range(20):
        run_loss_disc = 0
        run_loss_gen = 0
        for data in train:
            low_res = data[0].to(device, non_blocking=True, dtype=torch.float).permute(0, 3, 1, 2)
            high_res = data[1].to(device, non_blocking=True, dtype=torch.float).permute(0, 3, 1, 2)
            gen_image = gen(low_res)
            gen_image = gen_image.detach()
            disc_gen = disc(gen_image)
            disc_real = disc(high_res)
            loss_gen = p(disc_real, torch.ones_like(disc_real))
            loss_real = p(disc_gen, torch.zeros_like(disc_gen))
            loss_disc = loss_gen + loss_real
            cont_loss = vgg_loss(high_res, gen_image)
            adv_loss = 1e-3*p(disc_gen, torch.ones_like(disc_gen))
            gen_loss = cont_loss+(10^-3)*adv_loss
        print("Run Loss Discriminator: %d", run_loss_disc)
        print("Run Loss Generator: %d", run_loss_gen)

train_model(gen_model, disc_model)
Apparently your disc_gen value was discarded by the first backward() call, as it says.
It should work if you change the discriminator part a bit:
gen_image = gen(low_res)
disc_gen = disc(gen_image.detach())
and add this at the start of the generator part:
disc_gen = disc(gen_image)
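To make the suggestion above concrete, here is a minimal sketch of one training step that uses a detached forward pass for the discriminator update and a fresh forward pass for the generator update. The tiny `nn.Linear` networks, the random tensors, and the `train_step` helper are hypothetical stand-ins rather than the SRGAN models from the question, and the content (VGG) loss is left out to keep the example short.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-ins for the real Generator / Discriminator
gen = nn.Linear(4, 4)
disc = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
opt_gen = optim.Adam(gen.parameters(), lr=1e-3)
opt_disc = optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCELoss()


def train_step(low_res, high_res):
    # Discriminator step: detach the generated batch so that this backward
    # pass never touches (and never frees) the generator's graph.
    gen_image = gen(low_res)
    disc_fake = disc(gen_image.detach())
    disc_real = disc(high_res)
    loss_disc = (bce(disc_real, torch.ones_like(disc_real))
                 + bce(disc_fake, torch.zeros_like(disc_fake)))
    opt_disc.zero_grad()
    loss_disc.backward()
    opt_disc.step()

    # Generator step: run the discriminator again on the non-detached batch,
    # which builds a new graph instead of reusing the one freed above.
    disc_fake = disc(gen_image)
    loss_gen = bce(disc_fake, torch.ones_like(disc_fake))
    opt_gen.zero_grad()
    loss_gen.backward()
    opt_gen.step()
    return loss_disc.item(), loss_gen.item()


d_loss, g_loss = train_step(torch.randn(8, 4), torch.randn(8, 4))
print(d_loss, g_loss)
```

Each `backward()` call here walks a graph that was built by its own forward pass through the discriminator, so neither call needs `retain_graph=True`.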

How do I extract x co-ordinate of a point using Python

I'm trying to build an NMF model for topic extraction. For re-training of the model, I've to pass a parameter to the nmf function, for which I need to pass the x co-ordinate from a given point that the algorithm returns, here is the code for reference:
no_features = 1000
no_topics = 9
print ('Old number of topics: ', no_topics)
tfidf_vectorizer = TfidfVectorizer(max_df = 0.95, min_df = 2, max_features = no_features, stop_words = 'english')
tfidf = tfidf_vectorizer.fit_transform(documents)
tfidf_feature_names = tfidf_vectorizer.get_feature_names()
no_topics = tfidf.shape
print('New number of topics :', no_topics)
# nmf = NMF(n_components = no_topics, random_state = 1, alpha = .1, l1_ratio = .5, init = 'nndsvd').fit(tfidf)
On the third last line, the tfidf.shape returns a point (3,1000) to the variable 'no_topics', however I want that variable to be set to only the x co-ordinate, i.e (3).
How can I extract just the x co-ordinate from the point?
You can select the first value with no_topics[0]:
print('New number of topics : {}'.format(no_topics[0]))
You can also slice the tfidf matrix itself; note that this returns the first row of the matrix, not the number of rows:
topics = tfidf[0,:]
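As a small illustration of the first suggestion: `.shape` is a plain tuple, so it can be indexed or unpacked. The array below is a hypothetical stand-in with the same shape as the matrix in the question; a SciPy sparse matrix exposes `.shape` in the same way.

```python
import numpy as np

# Hypothetical stand-in for the TF-IDF matrix: 3 documents, 1000 features
tfidf = np.zeros((3, 1000))

# Index the shape tuple to get the number of rows ...
no_topics = tfidf.shape[0]

# ... or unpack both dimensions at once
n_docs, n_features = tfidf.shape

print('New number of topics : {}'.format(no_topics))
```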

Python3, scipy.optimize: Fit model to multiple data sets

I have a model which is defined as:
m(x,z) = C1*x^2*sin(z)+C2*x^3*cos(z)
I have multiple data sets for different z (z=1, z=2, z=3), in which they give me m(x,z) as a function of x.
The parameters C1 and C2 have to be the same for all z values.
So I have to fit my model to the three data sets simultaneously otherwise I will have different values of C1 and C2 for different values of z.
Is this possible to do with scipy.optimize?
I can do it for just one value of z, but can't figure out how to do it for all z's.
For one z I just write this:
def my_function(x,C1,C1):
    return C1*x**2*np.sin(z)+ C2*x**3*np.cos(z)
data = 'some/path/for/data/z=1'
x= data[:,0]
y= data[:,1]
from lmfit import Model
gmodel = Model(my_function)
result =, x=x, C1=1.1)
How can I do it for multiple data sets (i.e. different z values)?
So what you want to do is perform a multi-dimensional fit (2-D in your case) to your data; that way, for the entire data set you get a single set of C parameters that best describes your data. I think the best way to do this is using scipy.optimize.curve_fit().
So your code would look something like this:
import scipy.optimize as optimize
import numpy as np

def my_function(xz, *par):
    """ Here xz is a 2D array, so in the form [x, z] using your variables, and *par is an array of arguments (C1, C2) in your case """
    x = xz[:,0]
    z = xz[:,1]
    return par[0] * x**2 * np.sin(z) + par[1] * x**3 * np.cos(z)

# generate fake data. You will presumably have this already
x = np.linspace(0, 10, 100)
z = np.linspace(0, 3, 100)
xx, zz = np.meshgrid(x, z)
xz = np.array([xx.flatten(), zz.flatten()]).T
fakeDataCoefficients = [4, 6.5]
fakeData = my_function(xz, *fakeDataCoefficients) + np.random.uniform(-0.5, 0.5, xx.size)
# Fit the fake data and return the set of coefficients that jointly fit the x and z
# points (and will hopefully be the same as the fakeDataCoefficients)
popt, _ = optimize.curve_fit(my_function, xz, fakeData, p0=fakeDataCoefficients)
# Print the results
print(popt)
When I do this fit I get precisely the fakeDataCoefficients I used to generate the function, so the fit works well.
So the conclusion is that you don't do 3 fits independently, setting the value of z each time, but instead you do a 2D fit which takes the values of x and z simultaneously to find the best coefficients.
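Applied to the situation in the question (separate data sets at z=1, z=2 and z=3), the same idea can be sketched by concatenating the data sets and carrying z along as a second independent variable. The synthetic data, the noise level, and the names `model`, `xz_all` and `y_all` are assumptions made for illustration; real data would be loaded from the files instead.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(xz, c1, c2):
    # xz holds two rows: the x values and the matching z values
    x, z = xz
    return c1 * x**2 * np.sin(z) + c2 * x**3 * np.cos(z)


# Synthetic stand-ins for the three measured data sets (assumed values)
rng = np.random.default_rng(0)
true_c1, true_c2 = 4.0, 6.5
x = np.linspace(0, 10, 50)

x_parts, z_parts, y_parts = [], [], []
for z in (1.0, 2.0, 3.0):
    y = model((x, z), true_c1, true_c2) + rng.normal(scale=0.5, size=x.size)
    x_parts.append(x)
    z_parts.append(np.full_like(x, z))  # tag every point with its z
    y_parts.append(y)

# Stack everything into one (2, N) independent variable and one (N,) target
xz_all = np.vstack([np.concatenate(x_parts), np.concatenate(z_parts)])
y_all = np.concatenate(y_parts)

# A single fit over all data sets yields one shared (C1, C2) pair
popt, pcov = curve_fit(model, xz_all, y_all, p0=[1.0, 1.0])
print(popt)
```

Because every point carries its own z value, the data sets do not need to share the same x grid or the same number of points.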
Your code is incomplete and has a few syntax errors.
But I think that you want to build a model that concatenates the models for the different data sets, and then fit the concatenated data to that model. Within the context of lmfit (disclosure: author and maintainer), I often find it easier to use minimize() and an objective function for multiple data set fits rather than the Model class. Perhaps start with something like this:
import lmfit
import numpy as np

# define the model function for each dataset
def my_function(x, c1, c2, z=1):
    return c1*x**2*np.sin(z) + c2*x**3*np.cos(z)

# Then write an objective function like this
def f2min(params, x, data2d, zlist):
    ndata, npts = data2d.shape
    residual = 0.0*data2d[:]
    for i in range(ndata):
        c1 = params['c1_%d' % (i+1)].value
        c2 = params['c2_%d' % (i+1)].value
        residual[i,:] = data2d[i,:] - my_function(x, c1, c2, z=zlist[i])
    return residual.flatten()

# now build that `data2d`, `zlist` and build the `Parameters`
data2d = []
zlist = []
x = None
for fname in dataset_names:
    d = np.loadtxt(fname) # or however you read / generate data
    if x is None: x = d[:, 0]
    data2d.append(d[:, 1])
    zlist.append(z_for_dataset(fname)) # or however ...
data2d = np.array(data2d) # turn list into nd array
ndata, npts = data2d.shape
params = lmfit.Parameters()
for i in range(ndata):
    params.add('c1_%d' % (i+1), value=1.0) # give a better starting value!
    params.add('c2_%d' % (i+1), value=1.0) # give a better starting value!
# to force a single shared C1 and C2, tie the later parameters to the first,
# e.g. params['c1_2'].expr = 'c1_1' and params['c2_2'].expr = 'c2_1'
# now you're ready to do the fit and print out the results:
result = lmfit.minimize(f2min, params, args=(x, data2d, zlist))
print(lmfit.fit_report(result))
That code is really a sketch and is all untested, but hopefully it will give you a good starting foundation.

SparsePCA in sklearn not working properly?

First let me clarify that here "sparse PCA" means PCA with L1 penalty and sparse loadings, not PCA on sparse matrix.
I've read the paper on sparse PCA by Zou and Hastie, I've read the documentation on sklearn.decomposition.SparsePCA, and I know how to use PCA, but I can't seem to get the right result from SparsePCA.
Namely, when L1 penalty is 0, the result from SparsePCA is supposed to agree with PCA, but the loadings differ quite a lot. To make sure that I didn't mess up any hyperparameters, I used the same hyperparameters (convergence tolerance, maximum iterations, ridge penalty, lasso penalty...) in R with 'spca' from 'elasticnet', and R gave me the correct result. I'd rather not have to go through the source code of SparsePCA if anyone has experience using this function and could let me know if I made any mistakes.
Below is how I generated my dataset. It's a bit convoluted because I wanted a specific Markov Decision Process to test some reinforcement learning algorithms. Just treat it as some non-sparse dataset.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA
import numpy.random as nr
def transform(data, TranType=None):
    if TranType == 'quad':
        data = np.minimum(np.square(data), 3)
    if TranType == 'cubic':
        data = np.maximum(np.minimum(np.power(data, 3), 3), -3)
    if TranType == 'exp':
        data = np.minimum(np.exp(data), 3)
    if TranType == 'abslog':
        data = np.minimum(np.log(abs(data)), 3)
    return data

def NewStateGen(OldS, A, TranType, m=0, sd=0.5, nsd=0.1, dim=64):
    # dim needs to be a multiple of 4, and preferably a multiple of 16.
    assert (dim == len(OldS) and dim % 4 == 0)
    TrueDim = dim / 4
    NewS = np.zeros(dim)
    # Generate new state according to action
    if A == 0:
        NewS[range(0, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        NewS[range(1, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        NewS[range(2, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        NewS[range(3, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        R = 2 * np.sum(transform(OldS[0:int(np.ceil(dim / 32.0))], TranType)) - \
            np.sum(transform(OldS[int(np.ceil(dim / 32.0)):(dim / 16)], TranType)) + \
    if A == 1:
        NewS[range(0, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        NewS[range(1, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        NewS[range(2, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        NewS[range(3, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        R = 2 * np.sum(transform(OldS[int(np.floor(dim / 32.0)):(dim / 16)], TranType)) - \
            np.sum(transform(OldS[0:int(np.floor(dim / 32.0))], TranType)) + \
    return NewS, R

def MDPGen(dim=64, rep=1, n=30, T=100, m=0, sd=0.5, nsd=0.1, TranType=None):
    X_all = np.zeros(shape=(rep*n*T, dim))
    Y_all = np.zeros(shape=(rep*n*T, dim+1))
    A_all = np.zeros(rep*n*T)
    R_all = np.zeros(rep*n*T)
    for j in xrange(rep*n):
        # Data for a single subject
        X = np.zeros(shape=(T+1, dim))
        A = np.zeros(T)
        R = np.zeros(T)
        NewS = np.zeros(dim)
        X[0] = nr.normal(m, sd, size=dim)
        for i in xrange(T):
            OldS = X[i]
            # Pick a random action
            A[i] = nr.randint(2)
            # Generate new state according to action
            X[i+1], R[i] = NewStateGen(OldS, A[i], TranType, m, sd, nsd, dim)
        Y = np.concatenate((X[1:(T+1)], R.reshape(T, 1)), axis=1)
        X = X[0:T]
        X_all[(j*T):((j+1)*T)] = X
        Y_all[(j*T):((j+1)*T)] = Y
        A_all[(j*T):((j+1)*T)] = A
        R_all[(j*T):((j+1)*T)] = R
    return {'X': X_all, 'Y': Y_all, 'A': A_all, 'R': R_all, 'rep': rep, 'n': n, 'T': T}

MDP = MDPGen(dim=64, rep=1, n=30, T=90, sd=0.5, nsd=0.1, TranType=None)
X = MDP.get('X').astype(np.float32)
Now I run PCA and SparsePCA. When the lasso penalty, 'alpha', is 0, SparsePCA is supposed to give the same result as PCA, which is not the case. The other hyperparameters are set with the default values from elasticnet in R. If I use the default from SparsePCA the result will still be incorrect.
PCA_model = PCA(n_components=64)
PCA_model.fit(X)  # the model must be fitted before transform() or components_
Z = PCA_model.transform(X)
SPCA_model = SparsePCA(n_components=64, alpha=0, ridge_alpha=1e-6, max_iter=200, tol=1e-3)
SPCA_model.fit(X)
SZ = SPCA_model.transform(X)
# Check the first 2 loadings from PCA and SPCA. They are supposed to agree.
print PCA_model.components_[0:2]
print SPCA_model.components_[0:2]
# Check the first 2 observations of transformed data. They are supposed to agree.
print Z[0:2]
print SZ[0:2]
When the lasso penalty is greater than 0, the result from SparsePCA is still quite different from what R gives me, and the latter is correct based on manual inspection and what I learned from the original paper. So, is SparsePCA broken, or did I miss anything?
As often: there are many different formulations & implementations.
sklearn is using a different implementation with different characteristics.
Let's have a look how they differ:
sklearn: (reference within user-guide)
Elasticnet: (Zou et al. paper)
So it seems sklearn is at least doing something different in regards to the l2-norm based component (it's missing).
This is by design as this is the basic form within the area of dictionary-learning: (algorithm-paper linked by sklearn used for implementation).
It is quite possible that this alternative formulation does not guarantee (or does not care at all about) reproducing classic PCA when the sparsity parameter is zero. This is not really surprising, as these problems differ a lot in terms of optimization theory, and sparse PCA has to resort to some heuristic-based algorithm because the problem itself is NP-hard (ref). This idea is strengthened by the description of the equivalence theorem here:
The answers aren't different. First, I thought it may be the solvers, but checking for different solvers, I get almost identical loadings. See this:
MDP = MDPGen(dim=16, rep=1, n=30, T=90, sd=0.5, nsd=0.1, TranType=None)
X = MDP.get('X').astype(np.float32)
PCA_model = PCA(n_components=10,svd_solver='auto',tol=1e-6).fit(X)
SPCA_model = SparsePCA(n_components=10, alpha=0, ridge_alpha=0).fit(X)
PC1 = PCA_model.components_[0]/np.linalg.norm(PCA_model.components_[0])
SPC1 = SPCA_model.components_[0].T/np.linalg.norm(SPCA_model.components_[0])
import pylab
