I am pretty new to model predictive control (MPC), both with Gekko and in general.
I have created an ARX MPC in Gekko that works well overall. However, during the first 50-80 iterations the results are disappointing; only after that do I get good results (I guess the ARX algorithm is at play here, or possibly the bias?). My problem is that the application might crash after some time, and I then have to redo those 50-80 iterations to get good results again. Is there a way to "save" the last calculated model and use it when restarting the calculations?

The issue you are likely encountering is that the "prior" values have not yet been initialized. Try solving once with a steady-state initialization, as shown in the example MPC application with the TCLab (the final source block for TCLab F).
You can then switch to control or simulation mode:
# set up MPC
m.options.IMODE = 6 # MPC
Background information on using ARX models
Identification of the ARX model and prediction or control with the ARX model are two separate applications.
Identify ARX Model
The m.sysid() function to identify an ARX model does not save an archive but does return the model as output arguments:
yp,p,K = m.sysid(t,u,y,na,nb,pred='meas')
The model is returned as p.
# see
from gekko import GEKKO
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# load data and parse into columns
url = ''
data = pd.read_csv(url)
t = data['Time']
u = data['H1']
y = data['T1']
m = GEKKO(remote=False)
# system identification
na = 2 # output coefficients
nb = 2 # input coefficients
yp,p,K = m.sysid(t,u,y,na,nb,pred='meas')
plt.ylabel('Temperature (°C)')
plt.xlabel('Time (sec)')
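Since m.sysid() returns the model as p rather than archiving it, one option for reusing an identified model after a restart is to store that dictionary yourself. The following is a minimal sketch, assuming p is a plain dictionary of NumPy arrays with keys 'a', 'b' and 'c'; the helper names, the file name and the placeholder coefficients are hypothetical and not part of Gekko. It only preserves the model coefficients, not the "prior" values discussed above.

```python
import os
import pickle
import tempfile

import numpy as np


def save_arx_model(p, filename):
    """Store the ARX parameter dictionary (keys 'a', 'b', 'c') on disk."""
    with open(filename, 'wb') as f:
        pickle.dump(p, f)


def load_arx_model(filename):
    """Reload a previously saved ARX parameter dictionary."""
    with open(filename, 'rb') as f:
        return pickle.load(f)


# Stand-in for the dictionary returned by m.sysid(); the values are placeholders.
p = {'a': np.array([[0.5], [0.1]]),
     'b': np.array([[[0.2], [0.05]]]),
     'c': np.array([0.0])}

# Hypothetical location; in practice pick a path that survives a restart.
model_file = os.path.join(tempfile.mkdtemp(), 'arx_model.pkl')
save_arx_model(p, model_file)

# After a restart, reload and rebuild the controller model, e.g. y, u = m.arx(p_restored)
p_restored = load_arx_model(model_file)
print(sorted(p_restored.keys()))
```

The restored dictionary can then be passed to m.arx() in the same way as the freshly identified one.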
Predict with ARX Model
Below is an example of prediction with the ARX model.
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as plt
na = 2 # Number of A coefficients
nb = 1 # Number of B coefficients
ny = 2 # Number of outputs
nu = 2 # Number of inputs
# A (na x ny)
A = np.array([[0.36788,0.36788],\
# B (ny x (nb x nu))
B1 = np.array([0.63212,0.18964]).T
B2 = np.array([0.31606,1.26420]).T
B = np.array([[B1],[B2]])
C = np.array([0,0])
# create parameter dictionary
# parameter dictionary p['a'], p['b'], p['c']
# a (coefficients for a polynomial, na x ny)
# b (coefficients for b polynomial, ny x (nb x nu))
# c (coefficients for output bias, ny)
p = {'a':A,'b':B,'c':C}
# Create GEKKO model
m = GEKKO(remote=False)
# Build GEKKO ARX model
y,u = m.arx(p)
# load inputs
tf = 20 # final time
u1 = np.zeros(tf+1)
u2 = u1.copy()
u1[5:] = 3.0
u2[10:] = 5.0
u[0].value = u1
u[1].value = u2
# customize names
mv1 = u[0]
mv2 = u[1]
cv1 = y[0]
cv2 = y[1]
# options
m.time = np.linspace(0,tf,tf+1)
m.options.imode = 4
m.options.nodes = 2
# simulate
m.solve(disp=False)
plt.xlabel('Time (sec)')
The model is saved in the m.path folder that can be viewed with m.open_folder(). Set m = GEKKO(remote=False) to calculate locally and observe all of the files that are used to generate the model and the solution.


Predicting classes in MNIST dataset with a Gaussian - the same prediction errors with different parameters?

I am trying to find the best c parameter for a task with the following instructions: 'Define a function, fit_generative_model, that takes as input a training set (train_data, train_labels) and fits a Gaussian generative model to it. It should return the parameters of this generative model; for each label j = 0,1,...,9, we have:
pi[j]: the frequency of that label
mu[j]: the 784-dimensional mean vector
sigma[j]: the 784x784 covariance matrix
It is important to regularize these matrices. The standard way of doing this is to add cI to them, where c is some constant and I is the 784-dimensional identity matrix. c is now a parameter, and by setting it appropriately, we can improve the performance of the model.'
%matplotlib inline
import sys
import matplotlib.pyplot as plt
import gzip, os
import numpy as np
from scipy.stats import multivariate_normal
if sys.version_info[0] == 2:
    from urllib import urlretrieve
else:
    from urllib.request import urlretrieve
# Downloads the dataset
def download(filename, source=''):
    print("Downloading %s" % filename)
    urlretrieve(source + filename, filename)
# Invokes download() if necessary, then reads in images
def load_mnist_images(filename):
    if not os.path.exists(filename):
        download(filename)
    with gzip.open(filename, 'rb') as f:
        # skip the 16-byte header, then read the raw pixel bytes
        data = np.frombuffer(f.read(), np.uint8, offset=16)
    data = data.reshape(-1,784)
    return data
def load_mnist_labels(filename):
    if not os.path.exists(filename):
        download(filename)
    with gzip.open(filename, 'rb') as f:
        # skip the 8-byte header, then read one byte per label
        data = np.frombuffer(f.read(), np.uint8, offset=8)
    return data
## Load the training set
train_data = load_mnist_images('train-images-idx3-ubyte.gz')
train_labels = load_mnist_labels('train-labels-idx1-ubyte.gz')
## Load the testing set
test_data = load_mnist_images('t10k-images-idx3-ubyte.gz')
test_labels = load_mnist_labels('t10k-labels-idx1-ubyte.gz')
train_data.shape, train_labels.shape
So I have written this code to try three different values of c, but they each give me the same error:
def fit_generative_model(x,y):
    for c in [20,200, 4000]:
        k = 10 # labels 0,1,...,k-1
        d = (x.shape)[1] # number of features
        mu = np.zeros((k,d))
        sigma = np.zeros((k,d,d))
        pi = np.zeros(k)
        for label in range(0,k):
            indices = (y == label)
            mu[label] = np.mean(x[indices,:], axis=0)
            sigma[label] = np.cov(x[indices,:], rowvar=0, bias=1) + c*np.identity(784) # I define the identity matrix
        predictions = np.argmax(score, axis=1)
        errors = np.sum(predictions != y)
        print(c,"Model makes " + str(errors) + " errors out of 10000", lst)
Then I fit it to the training data and get these same errors:
mu, sigma, pi = fit_generative_model(train_data, train_labels)
20 Model makes 1 errors out of 10000 [1]
200 Model makes 1 errors out of 10000 [1, 1]
4000 Model makes 1 errors out of 10000 [1, 1, 1]
and to the test data:
mu, sigma, pi = fit_generative_model(test_data, test_labels)
20 Model makes 9020 errors out of 10000 [9020]
200 Model makes 9020 errors out of 10000 [9020, 9020]
4000 Model makes 9020 errors out of 10000 [9020, 9020, 9020]
What am I doing wrong? The correct answer is c=4000, which yields an error of ~4.3%.
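One observation that may help narrow this down: 9020 is exactly the error count obtained by predicting label 0 for every test image, since the MNIST test set contains 980 zeros. That suggests the scores do not depend on c at all, for example because every row of the score matrix is constant or non-finite, in which case np.argmax returns 0. In the code as posted, pi is never filled in, so any term like log(pi[label]) would be -inf. The sketch below separates fitting from scoring so that the model is fitted once per c on the training set and evaluated on a different set. It uses small synthetic data instead of MNIST, and the helper count_errors, the data generator and the c values are assumptions made for illustration, not the course's reference solution.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_generative_model(x, y, c, k):
    """Fit one regularized Gaussian per label and return (mu, sigma, pi)."""
    d = x.shape[1]
    mu = np.zeros((k, d))
    sigma = np.zeros((k, d, d))
    pi = np.zeros(k)
    for label in range(k):
        rows = x[y == label]
        pi[label] = len(rows) / len(y)          # class frequency
        mu[label] = rows.mean(axis=0)           # class mean
        sigma[label] = np.cov(rows, rowvar=False, bias=True) + c * np.identity(d)
    return mu, sigma, pi


def count_errors(x, y, mu, sigma, pi):
    """Score each point with log pi[j] + log N(x; mu[j], sigma[j]) and count mistakes."""
    k = len(pi)
    score = np.zeros((len(x), k))
    for label in range(k):
        rv = multivariate_normal(mean=mu[label], cov=sigma[label])
        score[:, label] = np.log(pi[label]) + rv.logpdf(x)
    predictions = np.argmax(score, axis=1)
    return int(np.sum(predictions != y))


# Synthetic stand-in for MNIST: 3 classes in 5 dimensions.
rng = np.random.default_rng(0)
k, d = 3, 5
centers = rng.normal(scale=3.0, size=(k, d))


def sample(n):
    labels = rng.integers(0, k, size=n)
    return centers[labels] + rng.normal(size=(n, d)), labels


train_x, train_y = sample(600)
test_x, test_y = sample(300)

errors_by_c = {}
for c in [0.01, 1.0, 100.0]:
    mu, sigma, pi = fit_generative_model(train_x, train_y, c, k)   # fit on training data
    errors_by_c[c] = count_errors(test_x, test_y, mu, sigma, pi)   # evaluate on held-out data
    print(c, "errors:", errors_by_c[c], "out of", len(test_y))
```

With this split, the function is called once per candidate c and the choice of c is made from the held-out error counts, not from errors on the data used for fitting.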

How to run a proper Bayesian Logistic Regression

I'm trying to run a Bayesian logistic regression on the wine dataset provided by the sklearn package. As variables, I decided to use alcohol, color_intensity, flavanoids, hue and magnesium, where alcohol is my response variable and the rest are the predictors. To do so, I'm using the pyro and torch packages:
import pyro
import torch
import pyro.distributions as dist
import pyro.optim as optim
from pyro.infer import SVI, Trace_ELBO
import pandas as pd
import numpy as np
from pyro.infer import Predictive
import torch.distributions.constraints as constraints
from sklearn import datasets
#loading data and preparing dataframe
wine = datasets.load_wine()
data = pd.DataFrame(columns = wine['feature_names'], data=wine['data'] )
#choosing variables: response and predictors
variables = data[['alcohol', 'color_intensity', 'flavanoids', 'hue', 'magnesium']]
variables = (variables-variables.min())/(variables.max()-variables.min())
alcohol = torch.tensor(variables['alcohol'].values, dtype=torch.float)
predictors = torch.stack([torch.tensor(variables[column].values, dtype=torch.float)
for column in ['alcohol', 'color_intensity', 'flavanoids', 'hue', 'magnesium']], 1)
#splitting data
k = int(0.8 * len(variables))
x_train, y_train = predictors[:k], alcohol[:k]
x_test, y_test = predictors[k:], alcohol[k:]
def model_alcohol(predictors, alcohol):
    n_observations, n_predictors = predictors.shape
    w = pyro.sample('w', dist.Normal(torch.zeros(n_predictors), torch.ones(n_predictors)))
    epsilon = pyro.sample('epsilon', dist.Normal(0.,1.))
    y_hat = torch.sigmoid((w*predictors).sum(dim=1) + epsilon)
    sigma = pyro.sample("sigma", dist.Uniform(0.,3.))
    with pyro.plate('alcohol', len(alcohol)):
        y=pyro.sample('y', dist.Normal(y_hat, sigma), obs=alcohol)
def guide_alcohol(predictors, alcohol=None):
    n_observations, n_predictors = predictors.shape
    w_loc = pyro.param('w_loc', torch.rand(n_predictors))
    w_scale = pyro.param('w_scale', torch.rand(n_predictors), constraint=constraints.positive)
    w = pyro.sample('w', dist.Normal(w_loc, w_scale))
    epsilon_loc = pyro.param('b_loc', torch.rand(1))
    epsilon_scale = pyro.param('b_scale', torch.rand(1), constraint=constraints.positive)
    epsilon = pyro.sample('epsilon', dist.Normal(epsilon_loc, epsilon_scale))
    sigma_loc = pyro.param('sigma_loc', torch.rand(n_predictors))
    sigma_scale = pyro.param('sigma_scale', torch.rand(n_predictors),
                             constraint=constraints.positive)
    sigma = pyro.sample('sigma', dist.Normal(sigma_loc, sigma_scale))
alcohol_svi = SVI(model=model_alcohol, guide=guide_alcohol, optim=optim.ClippedAdam({'lr' : 0.0002}),
                  loss=Trace_ELBO())
losses = []
for step in range(10000):
    loss = alcohol_svi.step(x_train, y_train)/len(x_train)
As I have to use Stochastic Variational Inference, I have defined both the model and the guide. My problem now is matching tensor sizes, as I get the error:
RuntimeError: The size of tensor a (142) must match the size of tensor b (5) at non-singleton dimension 0
Trace Shapes:
 Param Sites:
Sample Sites:
       w dist   5 |
        value   5 |
 epsilon dist     |
        value   1 |
   sigma dist     |
        value   5 |
 alcohol dist     |
        value 142 |
I'm kinda new to modelling on my own, so there are clearly mistakes in the code (hopefully not in the theory behind it). Still, I see that I should perhaps adjust the dimensions in the guide? I'm honestly not sure how to.
Your main problem is that w is not declared as a single event (.to_event(1)), and your scale (sigma) should have the same shape as a single observation, i.e. (). The model and guide below fix this; I also suggest you look at auto-generated guides in Pyro, and at a different prior on sigma.
def model_alcohol(predictors, alcohol):
    n_observations, n_predictors = predictors.shape
    # weights
    # w is a single event
    w = pyro.sample('w', dist.Normal(torch.zeros(n_predictors), torch.ones(n_predictors)).to_event(1))
    epsilon = pyro.sample('epsilon', dist.Normal(0., 1.))
    # non-linearity
    y_hat = torch.sigmoid(predictors @ w + epsilon)  # (predictors * w).sum(1) == predictors @ w
    sigma = pyro.sample("sigma", dist.Uniform(0., 3.))
    with pyro.plate('alcohol', len(alcohol)):
        pyro.sample('y', dist.Normal(y_hat, sigma), obs=alcohol)
def guide_alcohol(predictors, alcohol=None):
    n_observations, n_predictors = predictors.shape
    w_loc = pyro.param('w_loc', torch.rand(n_predictors))
    w_scale = pyro.param('w_scale', torch.rand(n_predictors), constraint=constraints.positive)
    pyro.sample('w', dist.Normal(w_loc, w_scale).to_event(1))
    epsilon_loc = pyro.param('b_loc', torch.rand(1))
    epsilon_scale = pyro.param('b_scale', torch.rand(1), constraint=constraints.positive)
    epsilon = pyro.sample('epsilon', dist.Normal(epsilon_loc, epsilon_scale))
    sigma_scale = pyro.param('sigma_scale', torch.rand(1),
                             constraint=constraints.positive)
    # HalfNormal takes a single scale argument; its samples are always positive
    pyro.sample('sigma', dist.HalfNormal(sigma_scale))  # MUST BE POSITIVE
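The shape mismatch in the error message can be reproduced without Pyro. The sketch below uses NumPy arrays as stand-ins for the torch tensors (both follow the same broadcasting rules); the array sizes mirror the 142 training rows and 5 predictors from the question, and the variable names are chosen for illustration only. It shows that the two ways of writing the linear predictor agree, that a per-predictor sigma of shape (5,) cannot be combined with 142 observations, and that a scalar sigma can.

```python
import numpy as np

rng = np.random.default_rng(0)
n_observations, n_predictors = 142, 5
predictors = rng.random((n_observations, n_predictors))
w = rng.normal(size=n_predictors)

# The two ways of writing the linear predictor give the same result.
y_hat = predictors @ w
assert np.allclose(y_hat, (w * predictors).sum(axis=1))

observations = rng.random(n_observations)

# sigma with one entry per predictor, as in the original guide:
# shapes (142,) and (5,) cannot be broadcast together.
sigma_bad = np.ones(n_predictors)
try:
    (observations - y_hat) / sigma_bad
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# A scalar sigma (shape ()) broadcasts against every observation.
sigma_ok = np.ones(())
residuals = (observations - y_hat) / sigma_ok

print(broadcast_failed, residuals.shape)
```

This is why the guide's sigma parameters should not be sized by n_predictors: sigma describes the noise on each observation, not on each weight.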

Model selection & Selecting the number of active components in Bayesian Gaussian Mixture Models

I have generated 2 groups of 1-D data points which are visually clearly separable and I want to use a Bayesian Gaussian Mixture Model (BGMM) to ideally recover 2 clusters.
Since BGMMs maximize a lower bound on the model evidence (ELBO) and given that the ELBO is supposed to combine notions of accuracy and complexity, I would expect more complex models to be penalized.
However, when running a grid search over the number of clusters, I often get a solution with more than 2 clusters. More specifically, I often get the maximal number of clusters in my grid. In the example below, I would expect the best model to define 2 clusters. Instead, the best model defines 4 but assigns minimal weights to 2 of the 4 clusters.
I am really surprised, since 2 out of 4 clusters are therefore adding little information and this more complex model still gets selected as the best model.
Why is the BGMM then picking 4 clusters for the best model?
If this is indeed the behavior a BGMM should show, how can I then assess how many active components I actually have in my model? Visually? By defining an arbitrary threshold on the weights?
I have added the code to reproduce my example below.
# Import statements
import itertools
import multiprocessing
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from joblib import Parallel, delayed
from sklearn.mixture import BayesianGaussianMixture
from sklearn.utils import shuffle
def fitmodel(x, params):
    '''
    Instantiates and fits Bayesian GMM
    Used in the parallel for loop
    '''
    # Gaussian mixture model
    clf = BayesianGaussianMixture(**params)
    # Fit
    clf = clf.fit(x, y=None)
    return clf
def plot_results(X, means, covariances, title):
    plt.plot(X, np.random.uniform(low=0, high=1, size=len(X)),'o', alpha=0.1, color='cornflowerblue', label='data points')
    for i, (mean, covar) in enumerate(zip(
            means, covariances)):
        # Get normal PDF
        n_sd = 2.5
        x = np.linspace(mean - n_sd*covar, mean + n_sd*covar, 300)
        x = x.ravel()
        y = stats.norm.pdf(x, mean, covar).ravel()
        if i == 0:
            label = 'Component PDF'
        else:
            label = None
        plt.plot(x, y, color='darkorange', label=label)
# Generate data
g1 = np.random.uniform(low=-1.5, high=-1, size=(1,100))
g2 = np.random.uniform(low=1, high=1.5, size=(1,100))
X = np.append(g1, g2)
# Shuffle data
X = shuffle(X)
X = X.reshape(-1, 1)
# Define parameters for grid search
parameters = {
    'n_components': [1, 2, 3, 4],
}
# Create permutations of parameter settings
keys, values = zip(*parameters.items())
param_grid = [dict(zip(keys, v)) for v in itertools.product(*values)]
# Run GridSearch using parallel for loop
list_clf = [None] * len(param_grid)
num_cores = multiprocessing.cpu_count()
list_clf = Parallel(n_jobs=num_cores)(delayed(fitmodel)(X, params) for params in param_grid)
# Print best model (based on lower bound on model evidence)
lower_bounds = [x.lower_bound_ for x in list_clf] # Extract lower bounds on model evidence
idx = int(np.argmax(lower_bounds)) # Find best model
best_estimator = list_clf[idx]
print(f'Parameter setting of best model: {param_grid[idx]}')
print(f'Components weights: {best_estimator.weights_}')
# Plot data points and gaussian components
ax = plt.subplot(2, 1, 1)
if best_estimator.weight_concentration_prior_type == 'dirichlet_process':
    prior_label = 'Dirichlet process'
elif best_estimator.weight_concentration_prior_type == 'dirichlet_distribution':
    prior_label = 'Dirichlet distribution'
plot_results(X, best_estimator.means_, best_estimator.covariances_,
             f'Best Bayesian GMM | {prior_label} prior')
# Plot histogram of weights
ax = plt.subplot(2, 1, 2)
for k, w in enumerate(best_estimator.weights_):
    plt.bar(k, w)
    plt.text(k, w + 0.01, "%.1f%%" % (w * 100.))
ax.yaxis.grid(True, alpha=0.7)
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.4)
plt.ylabel('Component weight')
plt.ylim(0, np.max(best_estimator.weights_)+0.25*np.max(best_estimator.weights_))
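On the question of how to count active components, two simple options can be sketched: thresholding the fitted weights, or counting the components that actually receive data points under predict(). The example below fits a single model with the largest n_components from the grid on data shaped like the question's. The threshold value is an arbitrary assumption, and the number of components reported will depend on the data, the seed and the prior settings, so this is an illustration and not a rule.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Two well-separated 1-D groups, similar to the data in the question.
rng = np.random.default_rng(0)
g1 = rng.uniform(low=-1.5, high=-1.0, size=100)
g2 = rng.uniform(low=1.0, high=1.5, size=100)
X = np.concatenate([g1, g2]).reshape(-1, 1)

# Fit once with an upper bound on the number of components.
clf = BayesianGaussianMixture(n_components=4, max_iter=500, random_state=0).fit(X)

# Option 1: components whose weight exceeds a cut-off (the cut-off is an assumption).
weight_threshold = 0.01
active_by_weight = np.flatnonzero(clf.weights_ > weight_threshold)

# Option 2: components that are the most likely assignment for at least one point.
active_by_assignment = np.unique(clf.predict(X))

print("weights:", np.round(clf.weights_, 3))
print("active by weight:", active_by_weight)
print("active by assignment:", active_by_assignment)
```

The second option avoids choosing a threshold, at the cost of ignoring components that carry some weight but never win an assignment.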

Tensorflow Extracting Classification Predictions

I've a tensorflow NN model for classification of one-hot-encoded group labels (groups are exclusive), which ends with (layerActivs[-1] are the activations of the final layer):
probs =[-1]),...)
classes =
preds =
The tf.round is included to force any low probabilities to 0. If all probabilities are below 50% for an observation, this means that no class will be predicted. I.e., if there are 4 classes, we could have probs[0,:] = [0.2,0,0,0.4], so classes[0,:] = [0,0,0,0]; preds[0] = 0 follows.
Obviously this is ambiguous, as it is the same result that would occur if we had probs[1,:] = [.9,0,.1,0] -> classes[1,:] = [1,0,0,0] -> preds[1] = 0. This is a problem when using the tensorflow builtin metrics class, as the functions can't distinguish between no prediction and a prediction of class 0. This is demonstrated by this code:
import numpy as np
import tensorflow as tf
import pandas as pd
''' prepare '''
classes = 6
n = 100
# simulate data
simY = np.random.randint(0,classes,n) # pretend actual data
simYhat = np.random.randint(0,classes,n) # pretend pred data
truth = np.sum(simY == simYhat)/n
tabulate = pd.Series(simY).value_counts()
# create placeholders
lab = tf.placeholder(shape=simY.shape, dtype=tf.int32)
prd = tf.placeholder(shape=simY.shape, dtype=tf.int32)
AM_lab = tf.placeholder(shape=simY.shape,dtype=tf.int32)
AM_prd = tf.placeholder(shape=simY.shape,dtype=tf.int32)
# create one-hot encoding objects
simYOH = tf.one_hot(lab,classes)
# create accuracy objects
acc = tf.metrics.accuracy(lab,prd) # real accuracy with tf.metrics
accOHAM = tf.metrics.accuracy(AM_lab,AM_prd) # OHE argmaxed to labels - expected to be correct
# now setup to pretend we ran a model & generated OHE predictions all unclassed
z = np.zeros(shape=(n,classes),dtype=float)
testPred = tf.constant(z)
''' run it all '''
# setup
sess = tf.Session()
sess.run([tf.global_variables_initializer(),tf.local_variables_initializer()])
# real accuracy with tf.metrics
ACC = sess.run(acc,feed_dict = {lab:simY,prd:simYhat})
# OHE argmaxed to labels - expected to be correct, but is it?
l,p = sess.run([simYOH,testPred],feed_dict={lab:simY})
p = np.argmax(p,axis=-1)
ACCOHAM = sess.run(accOHAM,feed_dict={AM_lab:simY,AM_prd:p})
''' print stuff '''
print('-known truth: %0.4f'%truth)
print('-on unprocessed data: %0.4f'%ACC[1])
print('-on faked unclassed labels data (s.b. 0%%): %0.4f'%ACCOHAM[1])
print('----------\nTrue Class Freqs:\n%r'%(tabulate.sort_index()/n))
which has the output:
-known truth: 0.1500
-on unprocessed data: 0.1500
-on faked unclassed labels data (s.b. 0%): 0.1100
True Class Freqs:
0 0.11
1 0.19
2 0.11
3 0.25
4 0.17
5 0.17
dtype: float64
Note freq for class 0 is same as faked accuracy...
I experimented with setting a value of preds to np.nan for observations with no predictions, but tf.metrics.accuracy throws ValueError: cannot convert float NaN to integer; also tried np.inf but got OverflowError: cannot convert float infinity to integer.
How can I convert the rounded probabilities to class predictions, but appropriately handle unpredicted observations?
This has gone long enough without an answer, so I'll post my own solution as the answer. I convert the class-membership probabilities to class predictions with a new function that has 3 main steps:
1. set any NaN probabilities to 0
2. set any probabilities at or below 1/num_classes to 0
3. use np.argmax() to extract predicted classes, then set any unclassed observations to a uniformly selected class
The resultant vector of integer class labels can be passed to the tf.metrics functions. My function below:
def predFromProb(classProbs):
    '''
    Take in as input an (m x p) matrix of m observations' class probabilities in
    p classes and return an m-length vector of integer class labels (0...p-1).
    Probabilities at or below 1/p are set to 0, as are NaNs; any unclassed
    observations are randomly assigned to a class.
    '''
    numClasses = classProbs.shape[1]
    # zero out class probs that are at or below chance, or NaN
    probs = classProbs.copy()
    probs[np.isnan(probs)] = 0
    probs = probs*(probs > 1/numClasses)
    # find any un-classed observations
    unpred = ~np.any(probs,axis=1)
    # get the predicted classes
    preds = np.argmax(probs,axis=1)
    # randomly classify un-classed observations
    rnds = np.random.randint(0,numClasses,np.sum(unpred))
    preds[unpred] = rnds
    return preds
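For comparison, here is a small NumPy sketch of the ambiguity from the question, together with one alternative to random assignment: giving unclassed rows a sentinel label equal to the number of classes. The sentinel convention is an assumption made here for illustration, not something tf.metrics defines; it works because the sentinel is a valid integer that never equals a true label, so such rows are always counted as wrong.

```python
import numpy as np

# Two observations, four classes (the values from the question).
probs = np.array([[0.2, 0.0, 0.0, 0.4],    # nothing reaches 0.5 -> no class predicted
                  [0.9, 0.0, 0.1, 0.0]])   # clear prediction of class 0
labels = np.array([0, 0])                  # pretend both true labels are class 0

classes = np.round(probs)                  # rows become [0,0,0,0] and [1,0,0,0]
preds = np.argmax(classes, axis=1)         # argmax of an all-zero row is 0

# Naive accuracy counts the unclassed row as a correct class-0 prediction.
naive_accuracy = np.mean(preds == labels)

# Sentinel approach: unclassed rows get label num_classes, which matches no true label.
num_classes = probs.shape[1]
unclassed = ~classes.any(axis=1)
preds_sentinel = np.where(unclassed, num_classes, preds)
sentinel_accuracy = np.mean(preds_sentinel == labels)

print(preds, naive_accuracy)               # [0 0] 1.0
print(preds_sentinel, sentinel_accuracy)   # [4 0] 0.5
```

Compared with random assignment, the sentinel never credits an unclassed observation by chance, but it does require that downstream code tolerate a label outside 0...p-1.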

Adding gaussian noise to a dataset of floating points and save it (python)

I'm working on a classification problem where I need to add different levels of Gaussian noise to my dataset and run classification experiments until my ML algorithms can no longer classify the dataset.
Unfortunately I have no idea how to do that. Any advice or coding tips on how to add the Gaussian noise?
You can follow these steps:
1. Load the data into a pandas dataframe: clean_signal = pd.read_csv("data_file_name")
2. Use numpy to generate Gaussian noise with the same dimension as the dataset.
3. Add the Gaussian noise to the clean signal with signal = clean_signal + noise
Here's a reproducible example:
import pandas as pd
# create a sample dataset with dimension (2,2)
# in your case you need to replace this with
# clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1,2],[3,4]], columns=list('AB'), dtype=float)
print output:
     A    B
0  1.0  2.0
1  3.0  4.0
import numpy as np
mu, sigma = 0, 0.1
# create noise with the same dimension as the dataset, here (2,2)
noise = np.random.normal(mu, sigma, clean_signal.shape)
print output:
array([[-0.11114313, 0.25927152],
[ 0.06701506, -0.09364186]])
signal = clean_signal + noise
print output:
          A         B
0  0.888857  2.259272
1  3.067015  3.906358
Overall code without the comments and print statements:
import pandas as pd
# clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1,2],[3,4]], columns=list('AB'), dtype=float)
import numpy as np
mu, sigma = 0, 0.1
noise = np.random.normal(mu, sigma, clean_signal.shape)
signal = clean_signal + noise
To save the result back to CSV:
signal.to_csv("output_filename.csv", index=False)
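Since the question asks for several noise levels, the same steps can be wrapped in a loop. This is a minimal sketch under a few assumptions: the helper name add_gaussian_noise, the list of sigma values, and the output file names are all hypothetical, and a temporary directory is used only to keep the example self-contained. If the dataset contains a label column, it would normally be excluded before adding noise.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def add_gaussian_noise(clean, sigma, rng):
    """Return a copy of `clean` with zero-mean Gaussian noise of std `sigma` added."""
    noise = rng.normal(loc=0.0, scale=sigma, size=clean.shape)
    return clean + noise


rng = np.random.default_rng(seed=0)

# In practice: clean_signal = pd.read_csv("your_data.csv")
clean_signal = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'), dtype=float)

noise_levels = [0.1, 0.5, 1.0]      # hypothetical standard deviations to try
output_dir = tempfile.mkdtemp()     # replace with a real folder for actual experiments

saved_files = []
for sigma in noise_levels:
    noisy = add_gaussian_noise(clean_signal, sigma, rng)
    path = os.path.join(output_dir, f"noisy_sigma_{sigma}.csv")
    noisy.to_csv(path, index=False)
    saved_files.append(path)

print(saved_files)
```

Each saved file can then be fed to the classifier in turn, increasing sigma until the accuracy drops to the level you consider a failure.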
