Which layers do I need to model a variational autoencoder? I want to predict X_test to detect anomalies in a dataset that consists of 2 float variables (money) and 40 binary columns (one-hot encoded).
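from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K
m = 50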
n_z = 2
n_epoch = 10
# Q(z|X) -- encoder
inputs = Input(shape=(42,))
h_q = Dense(21, activation='relu')(inputs)
mu = Dense(n_z, activation='linear')(h_q)
log_sigma = Dense(n_z, activation='linear')(h_q)
def sample_z(args):
    mu, log_sigma = args
    eps = K.random_normal(shape=(m, n_z), mean=0., std=1.)
    return mu + K.exp(log_sigma / 2) * eps
# Sample z ~ Q(z|X)
z = Lambda(sample_z)([mu, log_sigma])
# P(X|z) -- decoder
decoder_hidden = Dense(21, activation='relu')
decoder_out = Dense(42, activation='sigmoid')
h_p = decoder_hidden(z)
outputs = decoder_out(h_p)
# Overall VAE model, for reconstruction and training
vae = Model(inputs, outputs)
# Encoder model, to encode input into latent variable
# We use the mean as the output as it is the center point, the representative of the gaussian
encoder = Model(inputs, mu)
# Generator model, generate new data given latent variable z
d_in = Input(shape=(n_z,))
d_h = decoder_hidden(d_in)
d_out = decoder_out(d_h)
decoder = Model(d_in, d_out)
def vae_loss(y_true, y_pred):
    """ Calculate loss = reconstruction loss + KL loss for each data in minibatch """
    # E[log P(X|z)]
    recon = K.sum(K.binary_crossentropy(y_pred, y_true), axis=1)
    # D_KL(Q(z|X) || P(z)); calculate in closed form as both dist. are Gaussian
    kl = 0.5 * K.sum(K.exp(log_sigma) + K.square(mu) - 1. - log_sigma, axis=1)
    return recon + kl
vae.compile(optimizer='adam', loss=vae_loss)
vae.fit(X_train, X_train, batch_size=m, nb_epoch=n_epoch)
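For the anomaly-detection part, one common approach is to score each row by how badly the trained autoencoder reconstructs it and to flag rows whose error exceeds a threshold taken from the training data. The following is only a minimal NumPy sketch of that scoring step: the helper names, the 99th-percentile threshold and the random stand-in data are assumptions for illustration, and in practice the reconstructions would come from something like `vae.predict(X_test)`.

```python
import numpy as np

def reconstruction_error(X, X_recon):
    """Mean squared reconstruction error for every row."""
    return np.mean((X - X_recon) ** 2, axis=1)

def flag_anomalies(train_err, test_err, percentile=99.0):
    """Flag test rows whose error exceeds a percentile of the training errors."""
    threshold = np.percentile(train_err, percentile)
    return test_err > threshold, threshold

# Stand-in data (assumption): reconstructions are faked by adding small noise.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 42))
X_train_recon = X_train + rng.normal(scale=0.01, size=X_train.shape)
X_test = rng.random((5, 42))
X_test_recon = X_test + rng.normal(scale=0.01, size=X_test.shape)
X_test_recon[0] += 0.5  # pretend the first test row is reconstructed badly

train_err = reconstruction_error(X_train, X_train_recon)
test_err = reconstruction_error(X_test, X_test_recon)
is_anomaly, threshold = flag_anomalies(train_err, test_err)
print(is_anomaly[0], threshold)
```

The percentile is a tuning knob rather than a fixed rule; a lower value flags more rows.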


How to get logit matrix from a customized CNN?

This is my model
engine1 = tf.keras.applications.Xception(
    # Exclude the top (classification) layer of the pre-trained Xception model
    include_top = False,
    # Use ImageNet weights
    weights = 'imagenet',
    # Define input shape as 256x256x3
    input_shape = (256, 256, 3)
)
x1 = tf.keras.layers.GlobalAveragePooling2D(name = 'avg_pool')(engine1.output)
x1 =tf.keras.layers.Dropout(0.75)(x1)
x1 = tf.keras.layers.BatchNormalization(
out1 = tf.keras.layers.Dense(3, activation = 'softmax', name = 'dense_output')(x1)
# Build the Keras model
model1 = tf.keras.models.Model(inputs = engine1.input, outputs = out1)
# Compile the model
model1.compile(
    # Set optimizer to Adam(3e-4)
    optimizer = tf.keras.optimizers.Adam(learning_rate = 3e-4),
    #optimizer= SGD(lr=0.001, decay=1e-6, momentum=0.99, nesterov=True),
    # Set loss to categorical crossentropy
    #loss = tf.keras.losses.SparseCategoricalCrossentropy(),
    loss = 'categorical_crossentropy',
    # Set metrics to accuracy
    metrics = ['accuracy']
)
I want logits so I wrote this
logits = model1(X_test)
probs = tf.nn.softmax(logits)
Getting error as
ResourceExhaustedError: OOM when allocating tensor with shape[1288,64,125,125] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]
How can I fix this and get the logits? I want to apply the distillation method after getting the logits. My test set consists of 3 classes and 60 samples, so the logit matrix should be 60 * 3.
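An out-of-memory error like this usually means that all samples were pushed through the network in a single call. A possible workaround is to run the forward pass in chunks and stack the results. The sketch below shows the idea with NumPy only; `predict_in_batches` and `fake_model` are hypothetical names, and `fake_model` merely stands in for the real network (for example something like `lambda b: model1(b, training=False).numpy()`). Keras also does this chunking itself when `model1.predict(X_test, batch_size=...)` is used.

```python
import numpy as np

def predict_in_batches(predict_fn, X, batch_size=32):
    """Apply predict_fn to X chunk by chunk and concatenate the outputs."""
    outputs = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]
        outputs.append(np.asarray(predict_fn(batch)))
    return np.concatenate(outputs, axis=0)

# Hypothetical stand-in for the network: maps every sample to 3 scores.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))

def fake_model(batch):
    return batch @ W

X_demo = rng.normal(size=(60, 5))
logits_demo = predict_in_batches(fake_model, X_demo, batch_size=16)
print(logits_demo.shape)
```

Only one chunk of activations has to fit in GPU memory at a time, so a smaller `batch_size` trades speed for memory.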
To get the logits(1288 * 3) I made a change in the output layer of my model
out1 = tf.keras.layers.Dense(3, activation = 'linear', name = 'dense_output')(x1)
Now I am getting logits,
y_pred_logits = model1.predict(X_test)
I want to apply softmax to this. My softmax function looks like this:
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=1)
But when I am doing this
y_pred_logits_activated = softmax(y_pred_logits)
Getting errors as
How to fix this and is this method correct? Further, I want to apply this on logits
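The error text is not shown, but one likely culprit is broadcasting: for logits of shape (N, 3), `e_x.sum(axis=1)` has shape (N,), which cannot be divided into an (N, 3) array unless N happens to be 3. Keeping the summed axis with `keepdims=True` avoids that. Below is a minimal sketch of a row-wise softmax; the max subtraction for numerical stability and the `temperature` argument (often used for the soft targets in distillation) are additions for illustration, not part of the original function.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Row-wise softmax for a 2-D array of logits."""
    z = np.asarray(x, dtype=np.float64) / temperature
    z = z - z.max(axis=1, keepdims=True)         # avoid overflow in exp
    e_z = np.exp(z)
    return e_z / e_z.sum(axis=1, keepdims=True)  # shape (N, 1) broadcasts over (N, C)

logits_demo = np.array([[2.0, 1.0, 0.1],
                        [0.5, 0.5, 0.5]])
probs = softmax(logits_demo)
soft_targets = softmax(logits_demo, temperature=4.0)
print(probs.sum(axis=1))
```

A higher temperature flattens the distribution, which is why distillation setups typically apply it to both teacher and student logits.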

Bias grad in linear regression remains small compared to weight grad, and intercept is not properly learnt

I have thrown together a dummy model to showcase linear regression in pytorch, but I find that my model is not properly learning. It's doing well when it comes to learning the slope, but the intercept is not really budging. Printing out the grads at every epoch tells me that, indeed, the grad is a lot smaller for the bias. Why is that? How can I remedy it, so the intercept is properly learnt?
This is what happens (a set to 0 to illustrate):
import numpy as np
import torch
import matplotlib.pyplot as plt
# Create some dummy data: we establish a linear relationship between x and y
a = np.random.rand()
b = np.random.rand()
x = np.linspace(start=0, stop=100, num=100)
y = a * x + b
# Now let's create some noisy measurements
noise = np.random.normal(size=100)
y_noisy = a * x + b + noise
# What's the overall error?
mse_actual = np.sum(np.power(y-y_noisy,2))/len(y)
# Visualize
plt.scatter(x,y_noisy, label='Measurements', alpha=.7)
plt.plot(x,y,'r', label='Underlying')
# Let's learn something!
inputs = torch.from_numpy(x).type(torch.FloatTensor).unsqueeze(1)
targets = torch.from_numpy(y_noisy).type(torch.FloatTensor).unsqueeze(1)
# This is our model (one hidden node + bias)
model = torch.nn.Linear(1,1)
optimizer = torch.optim.SGD(model.parameters(),lr=1e-5)
loss_function = torch.nn.MSELoss()
# What does it predict right now?
shuffled_inputs, preds = [], []
for input, target in zip(inputs,targets):
    pred = model(input)
    shuffled_inputs.append(input.detach().item())
    preds.append(pred.detach().item())
# Visualize
plt.scatter(x,y_noisy, color='blue', label='Measurements', alpha=.7)
plt.plot(shuffled_inputs, preds, color='orange', label='Predictions', alpha=.7)
plt.plot(x,y,'r', label='Underlying')
# Let's train!
epochs = 100
a_s, b_s = [], []
for epoch in range(epochs):
    # Reset optimizer values
    optimizer.zero_grad()
    # Predict values using current model
    preds = model(inputs)
    # How far off are we?
    loss = loss_function(preds, targets)
    # Calculate the gradient
    loss.backward()
    # Update model
    optimizer.step()
    for p in model.parameters():
        print('Grads:', p.grad)
    # New parameters
    a_s.append(model.weight.item())
    b_s.append(model.bias.item())
    print(f"Epoch {epoch+1} -- loss = {loss}")
It's a bit of a non-answer, but just use more epochs or add more datapoints. With only 100 datapoints and noise as large as yours (plotting the initial data makes this obvious), the model will struggle with MSE as the loss.
I can't see your image (work blocked imgur...) but I found it looked bad if you didn't adjust the axes on your matplotlib plot because it was so zoomed in on the x axis (when a=0), so I zoomed out of that too:
# Create some dummy data: we establish a linear relationship between x and y
a = np.random.rand()
b = np.random.rand()
N = 10000
x = np.linspace(start=0, stop=100, num=N)
y = a * x + b
# Now let's create some noisy measurements
noise = np.random.normal(size=N)*0.1
y_noisy = a * x + b + noise
# What's the overall error?
mse_actual = np.sum(np.power(y-y_noisy,2))/len(y)
# Visualize
plt.scatter(x,y_noisy, label='Measurements', alpha=.7)
plt.plot(x,y,'r', label='Underlying')
# Let's learn something!
inputs = torch.from_numpy(x).type(torch.FloatTensor).unsqueeze(1)
targets = torch.from_numpy(y_noisy).type(torch.FloatTensor).unsqueeze(1)
# This is our model (one hidden node + bias)
model = torch.nn.Linear(1,1)
optimizer = torch.optim.SGD(model.parameters(),lr=1e-5)
loss_function = torch.nn.MSELoss()
# Let's train!
epochs = 50000
a_s, b_s = [], []
for epoch in range(epochs):
    # Reset optimizer values
    optimizer.zero_grad()
    # Predict values using current model
    preds = model(inputs)
    # How far off are we?
    loss = loss_function(preds, targets)
    # Calculate the gradient
    loss.backward()
    # Update model
    optimizer.step()
    #for p in model.parameters():
    #    print('Grads:', p.grad)
    # New parameters
    a_s.append(model.weight.item())
    b_s.append(model.bias.item())
    print(f"Epoch {epoch+1} -- loss = {loss}")
# What does it predict right now?
shuffled_inputs, preds = [], []
for input, target in zip(inputs,targets):
    pred = model(input)
    shuffled_inputs.append(input.detach().item())
    preds.append(pred.detach().item())
plt.scatter(x,y_noisy, color='blue', label='Measurements', alpha=.7)
plt.plot(shuffled_inputs, preds, color='orange', label='Predictions', alpha=.7)
plt.plot(x,y,'r', label='Underlying')
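A different angle from the "more epochs" suggestion, offered as a sketch rather than as the fix: for an MSE loss the weight gradient is mean(2 * err * x) while the bias gradient is mean(2 * err), so with x running from 0 to 100 the weight gradient is far larger and the learning rate has to be tiny to keep the slope stable, which leaves the intercept crawling. Standardising x removes that imbalance. The example uses plain NumPy gradient descent instead of PyTorch to stay short, and the ground-truth values, learning rate and step count are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 0.7, 0.3                    # assumed ground truth for the demo
x = np.linspace(0, 100, 100)
y = a_true * x + b_true + rng.normal(scale=0.1, size=x.size)

# MSE gradients at w = b = 0 on the raw inputs.
err = 0.0 * x + 0.0 - y
grad_w = 2 * np.mean(err * x)
grad_b = 2 * np.mean(err)
print("gradient ratio |dw| / |db|:", abs(grad_w) / abs(grad_b))

# Standardise x, run plain gradient descent, then map back to raw units.
x_mean, x_std = x.mean(), x.std()
xs = (x - x_mean) / x_std
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    err = w * xs + b - y
    w -= lr * 2 * np.mean(err * xs)
    b -= lr * 2 * np.mean(err)

slope = w / x_std                            # undo the scaling of x
intercept = b - w * x_mean / x_std           # undo the centring of x
print("slope:", slope, "intercept:", intercept)
```

With standardised inputs both parameters see gradients of similar size, so a much larger learning rate becomes usable and the intercept converges in a few hundred steps.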

Jointly optimizing autoencoder and fully connected network for classification

I have a large set of unlabeled data and a smaller set of labeled data. Thus, I would like to first train a variational autoencoder on the unlabeled data and then use the encoder for classification of three classes (with a fully connected layer attached) on the labeled data. For optimization of the hyperparameters I would like to use Optuna.
One possibility would be to first optimize the autoencoder and then optimize the fully connected network (classification) but then the autoencoder might learn an encoding which is meaningless for the classification.
Is there a possibility to jointly optimize the autoencoder and the fully connected network?
My autoencoder looks as follows (params is just a dictionary holding the params):
inputs = Input(shape=image_size, name='encoder_input')
x = inputs
for i in range(len(params["conv_filter_encoder"])):
    x, _ = convolutional_unit(x, params["conv_filter_encoder"][i], params["conv_kernel_size_encoder"][i], params["strides_encoder"][i],
                              batchnorm=params["batchnorm"][i], dropout=params["dropout"][i], maxpool=params["maxpool"][i], deconv=False)
shape = K.int_shape(x)
x = Flatten()(x)
x = Dense(params["inner_dim"], activation='relu')(x)
z_mean = Dense(params["latent_dim"], name='z_mean')(x)
z_log_var = Dense(params["latent_dim"], name='z_log_var')(x)
# use reparameterization trick to push the sampling out as input
# note that "output_shape" isn't necessary with the TensorFlow backend
z = Lambda(sampling, output_shape=(params["latent_dim"],), name='z')([z_mean, z_log_var])
# instantiate encoder model
encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')
# build decoder model
latent_inputs = Input(shape=(params["latent_dim"],), name='z_sampling')
x = Dense(params["inner_dim"], activation='relu')(latent_inputs)
x = Dense(shape[1] * shape[2] * shape[3], activation='relu')(x)
x = Reshape((shape[1], shape[2], shape[3]))(x)
len_batchnorm = len(params["batchnorm"])
len_dropout = len(params["dropout"])
for i in range(len(params["conv_filter_decoder"])):
    x, _ = convolutional_unit(x, params["conv_filter_decoder"][i], params["conv_kernel_size_decoder"][i], params["strides_decoder"][i],
                              batchnorm=params["batchnorm"][len_batchnorm-i-1], dropout=params["dropout"][len_dropout-i-1], maxpool=None, deconv=True, activity_regularizer=params["activity_regularizer"])
outputs = Conv2DTranspose(filters=1,
# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
# instantiate VAE model
outputs = decoder(encoder(inputs)[2])
vae = Model(inputs, outputs, name='vae')
vae.higgins_beta = K.variable(value=params["beta"])
loss = config["loss"].value
def vae_loss(x, x_decoded_mean):
    """VAE loss function"""
    # VAE loss = mse_loss or xent_loss + kl_loss
    if loss == Loss.mse.value:
        reconstruction_loss = mse(K.flatten(x), K.flatten(x_decoded_mean))
    elif loss == Loss.bce.value:
        reconstruction_loss = binary_crossentropy(K.flatten(x), K.flatten(x_decoded_mean))
    else:
        raise ValueError("Loss unknown")
    reconstruction_loss *= image_size[0] * image_size[1]
    kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
    kl_loss = K.sum(kl_loss, axis=-1)
    # kl_loss *= -0.5
    kl_loss *= -vae.higgins_beta
    vae_loss = K.mean(reconstruction_loss + kl_loss)
    return vae_loss
batch_size = params["batch_size"]
optimizer = keras.optimizers.Adam(lr=params["learning_rate"], beta_1=0.9, beta_2=0.999,
epsilon=1e-08, decay=params["learning_rate_decay"])
vae.compile(loss=vae_loss, optimizer=optimizer)
vae.fit(train_X, train_X,
        callbacks=get_callbacks(config.CONFIG, autoencoder_path, encoder, decoder, vae),
        validation_data=(valid_X, valid_X))
My fully connected network attached to the encoder looks as follows:
latent = encoder.predict(images)[0]
inputs = Input(shape=(input_shape,), name='fc_input')
den = inputs
for i in range(len(self.params["units"])):
    den = Dense(self.params["units"][i])(den)
    den = Activation('relu')(den)
out = Dense(self.num_classes, activation='softmax')(den)
model = Model(inputs, out, name='fcnn')
optimizer = keras.optimizers.Adam(["fcnn"]["learning_rate"], beta_1=0.9, beta_2=0.999,
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.fit(latent, y,
y_prob = model.predict(latent)
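One way to think about joint optimization is a single objective that adds a weighted classification term to the VAE loss, with the classification term counted only for samples that actually have a label. The NumPy sketch below just evaluates such a loss for one batch; the function name, the weight `alpha`, the `labeled_mask` convention and the use of a squared-error reconstruction term are all assumptions for illustration, and the KL term uses the conventional -0.5 factor rather than the exact scaling in the code above. In Keras the same idea would usually map to a model with two outputs (reconstruction and class probabilities) compiled with `loss_weights`, and `alpha` could then be one more hyperparameter for the search.

```python
import numpy as np

def joint_loss(x, x_recon, z_mean, z_log_var, y_onehot, y_prob,
               labeled_mask, beta=1.0, alpha=1.0, eps=1e-7):
    """Reconstruction + beta * KL + alpha * cross-entropy on labeled samples."""
    recon = np.sum((x - x_recon) ** 2, axis=1)
    kl = -0.5 * np.sum(1 + z_log_var - z_mean ** 2 - np.exp(z_log_var), axis=1)
    xent = -np.sum(y_onehot * np.log(np.clip(y_prob, eps, 1.0)), axis=1)
    n_labeled = max(labeled_mask.sum(), 1.0)    # avoid division by zero
    vae_term = np.mean(recon + beta * kl)
    clf_term = np.sum(labeled_mask * xent) / n_labeled
    return vae_term + alpha * clf_term

# Toy batch: 4 samples, only the first 2 carry a label.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
x_recon = x + 0.1
z_mean = np.zeros((4, 2))
z_log_var = np.zeros((4, 2))
y_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
y_prob = np.full((4, 3), 1.0 / 3.0)
labeled_mask = np.array([1.0, 1.0, 0.0, 0.0])

total = joint_loss(x, x_recon, z_mean, z_log_var, y_onehot, y_prob, labeled_mask)
print(total)
```

Setting `alpha` to 0 recovers the plain VAE objective, so the two-stage setup described above is a special case of this one.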

3-layer feedforward neural network not predicting regression values accurately

I'm pretty new to TensorFlow. Currently, I'm building a 3-layer network with 10 neurons in the hidden layer (ReLU), mini-batch gradient descent with a batch size of 8, and L2 regularisation with a weight decay parameter (beta) of 0.001. The TensorFlow version I'm using is 1.14 and I'm on Python 3.6.
The issue that boggles my mind is that my predicted values and testing errors are absolutely off the charts.
For example, I plotted out the test errors and the predicted vs target values for a sample size of 50, and this is what came out.
As you can see, both plots are way off, and I don't have the slightest clue as to why.
Here's roughly what the dataset looks like. The first column is discarded as it is just a counter value, and the last column is the target.
My code:
NUM_FEATURES = 7
num_neuron = 10
batch_size = 8
beta = 0.001
learning_rate = 0.001
epochs = 4000
seed = 10
# read and divide data into test and train sets
total_dataset= np.genfromtxt('dataset_excel.csv', delimiter=',')
X_data, Y_data = total_dataset[1:, 1:8], total_dataset[1:, -1]
Y_data = Y_data.reshape(Y_data.shape[0], 1)
# shuffle input, ensure both are shuffled with the same order
shufflestate = np.random.get_state()
np.random.shuffle(X_data)
np.random.set_state(shufflestate)
np.random.shuffle(Y_data)
# 70% used for training, 30% used for testing
trainX = X_data[:280]
trainY = Y_data[:280]
testX = X_data[280:]
testY = Y_data[280:]
trainX = (trainX - np.mean(trainX, axis=0)) / np.std(trainX, axis=0)
# Create the model
x = tf.placeholder(tf.float32, [None, NUM_FEATURES])
y_ = tf.placeholder(tf.float32, [None, 1])
# get 50 samples for plotting of predicted vs target values
limited50testX = testX[:50]
limited50testY = testY[:50]
# Hidden
with tf.name_scope('hidden'):
    weight1 = tf.Variable(tf.truncated_normal([NUM_FEATURES, num_neuron],stddev=1.0,name='weight1'))
    bias1 = tf.Variable(tf.zeros([num_neuron]),name='bias1')
    hidden = tf.nn.relu(tf.matmul(x, weight1) + bias1)
# output
with tf.name_scope('linear'):
    weight2 = tf.Variable(tf.truncated_normal([num_neuron, 1],stddev=1.0 / np.sqrt(float(num_neuron))),name='weight2')
    bias2 = tf.Variable(tf.zeros([1]),name='bias2')
    logits = tf.matmul(hidden, weight2) + bias2
ridgeLoss = tf.square(y_ - logits)
regularisation = tf.nn.l2_loss(weight1) + tf.nn.l2_loss(weight2)
loss = tf.reduce_mean(ridgeLoss + beta * regularisation)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss)
error = tf.reduce_mean(tf.square(y_ - logits))
N = len(trainX)
idx = np.arange(N)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_err = []
    test_err_ = []
    for i in range(epochs):
        for batchStart, batchEnd in zip(range(0, trainX.shape[0], batch_size), range(batch_size, trainX.shape[0], batch_size)):
            train_op.run(feed_dict={x: trainX[batchStart:batchEnd], y_: trainY[batchStart:batchEnd]})
        err = error.eval(feed_dict={x: trainX, y_: trainY})
        train_err.append(err)
        if i % 100 == 0:
            print('iter %d: train error %g' % (i, train_err[i]))
        test_err = error.eval(feed_dict={x: testX, y_: testY})
        test_err_.append(test_err)
    predicted = sess.run(logits, feed_dict={x: limited50testX})
    print("predicted values: ", predicted)
    print("size of predicted values is", len(predicted))
    print("targets: ", limited50testY)
    print("size of target values is", len(limited50testY))
#plot predictions vs targets
numberList=np.arange(0, 50, 1).tolist()
predplot = plt.figure(1)
plt.plot(numberList, predicted, label='Predictions')
plt.plot(numberList, limited50testY, label='Targets')
plt.xlabel('50 samples')
plt.legend(loc='lower right')
# plot training error
trainplot = plt.figure(2)
plt.plot(range(epochs), train_err)
plt.xlabel(str(epochs) + ' iterations')
plt.ylabel('Train Error')
#plot testing error
testplot = plt.figure(3)
plt.plot(range(epochs), test_err_)
plt.xlabel(str(epochs) + ' iterations')
plt.ylabel('Test Error')
Not sure if that's it, but trainX is normalized whereas testX is not. You might want to use the same normalization on testX before predicting.
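To make that concrete, here is a small sketch of computing the mean and standard deviation on the training split only and reusing them for the test split. The random array merely stands in for the real dataset, and the variable names are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(400, 7))  # stand-in for the real features
train_X, test_X = X[:280], X[280:]

# Statistics come from the training split only.
mean = train_X.mean(axis=0)
std = train_X.std(axis=0)

train_X_norm = (train_X - mean) / std
test_X_norm = (test_X - mean) / std                # same transform, no refitting
print(train_X_norm.mean(axis=0).round(3), test_X_norm.shape)
```

Recomputing the statistics on the test split would put the two splits on slightly different scales and leak test information into preprocessing, which is why the training values are reused.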

LSTM Time-Series produces shifted forecast?

I am doing a time-series forecast with an LSTM NN and Keras. As input features there are two variables (precipitation and temperature) and the one target to be predicted is groundwater-level.
It seems to be working quite all right, though there is a serious offset between the actual data and the output (see image).
Now I've read that this can be a classic sign of the network not working, as it seems to be mimicking the output and
what the model is actually doing is that when predicting the value at
time “t+1”, it simply uses the value at time “t” as its prediction
However, this is not actually possible in my case, as the target values are not used as an input variable. I am using a multivariate time series with two features, independent of the output feature.
Also, the predicted values are not offset into the future (t+1) but rather seem to lag behind (t-1).
Does anyone know what could cause this problem?
This is the complete code of my network:
# Split in Input and Output Data
x_1 = data[['MeanT']].values
x_2 = data[['Precip']].values
y = data[['Z_424A_6857']].values
# Scale Data
x = np.hstack([x_1, x_2])
scaler = MinMaxScaler(feature_range=(0, 1))
x = scaler.fit_transform(x)
scaler_out = MinMaxScaler(feature_range=(0, 1))
y = scaler_out.fit_transform(y)
# Reshape Data
x_1, x_2, y = H.create2feature_data(x_1, x_2, y, window)
train_size = int(len(x_1) * .8)
test_size = int(len(x_1)) # * .5
x_1 = np.expand_dims(x_1, 2) # 3D tensor with shape (batch_size, timesteps, input_dim) // (nr. of samples, nr. of timesteps, nr. of features)
x_2 = np.expand_dims(x_2, 2)
y = np.expand_dims(y, 1)
# Split Training Data
x_1_train = x_1[:train_size]
x_2_train = x_2[:train_size]
y_train = y[:train_size]
# Split Test Data
x_1_test = x_1[train_size:test_size]
x_2_test = x_2[train_size:test_size]
y_test = y[train_size:test_size]
# Define Model Input Sets
inputA = Input(shape=(window, 1))
inputB = Input(shape=(window, 1))
# Build Model Branch 1
branch_1 = layers.GRU(16, activation=act, dropout=0, return_sequences=False, stateful=False, batch_input_shape=(batch, 30, 1))(inputA)
branch_1 = layers.Dense(8, activation=act)(branch_1)
#branch_1 = layers.Dropout(0.2)(branch_1)
branch_1 = Model(inputs=inputA, outputs=branch_1)
# Build Model Branch 2
branch_2 = layers.GRU(16, activation=act, dropout=0, return_sequences=False, stateful=False, batch_input_shape=(batch, 30, 1))(inputB)
branch_2 = layers.Dense(8, activation=act)(branch_2)
#branch_2 = layers.Dropout(0.2)(branch_2)
branch_2 = Model(inputs=inputB, outputs=branch_2)
# Combine Model Branches
combined = layers.concatenate([branch_1.output, branch_2.output])
# apply a FC layer and then a regression prediction on the combined outputs
comb = layers.Dense(6, activation=act)(combined)
comb = layers.Dense(1, activation="linear")(comb)
# Accept the inputs of the two branches and then output a single value
model = Model(inputs=[branch_1.input, branch_2.input], outputs=comb)
model.compile(loss='mse', optimizer='adam', metrics=['mse', H.r2_score])
# Training
model.fit([x_1_train, x_2_train], y_train, epochs=epoch, batch_size=batch, validation_split=0.2, callbacks=[tensorboard])
# Evaluation
print('Train evaluation')
print(model.evaluate([x_1_train, x_2_train], y_train))
print('Test evaluation')
print(model.evaluate([x_1_test, x_2_test], y_test))
# Predictions
predictions_train = model.predict([x_1_train, x_2_train])
predictions_test = model.predict([x_1_test, x_2_test])
predictions_train = np.reshape(predictions_train, (-1,1))
predictions_test = np.reshape(predictions_test, (-1,1))
# Reverse Scaling
predictions_train = scaler_out.inverse_transform(predictions_train)
predictions_test = scaler_out.inverse_transform(predictions_test)
# Plot results
plt.figure(figsize=(15, 6))
plt.plot(orig_data, color='blue', label='True GWL')
plt.plot(range(train_size), predictions_train, color='red', label='Predicted GWL (Training)')
plt.plot(range(train_size, test_size), predictions_test, color='green', label='Predicted GWL (Test)')
plt.title('GWL Prediction')
I am using a batch size of 30 timesteps, a lookback of 90 timesteps, with a total data size of around 7500 time steps.
Any help would be greatly appreciated :-) Thank you!
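Before looking for causes, it can help to measure the offset instead of reading it off the plot. The sketch below is a diagnostic only, not an explanation of the shift: it searches for the shift k at which the predictions correlate best with the targets. The function name and the synthetic random-walk data are assumptions for the demo; with real data the two arrays would be the (equally long) target and prediction series.

```python
import numpy as np

def best_lag(y_true, y_pred, max_lag=30):
    """Return (k, r): the shift k in [-max_lag, max_lag] that maximises
    the correlation r between y_true[t] and y_pred[t + k]."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    n = len(y_true)
    best_k, best_r = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = y_true[:n - k], y_pred[k:]
        else:
            a, b = y_true[-k:], y_pred[:n + k]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r

# Synthetic check: the "prediction" repeats the target five steps late.
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=505))
y_true_demo = signal[5:]
y_pred_demo = signal[:-5]
lag, corr = best_lag(y_true_demo, y_pred_demo, max_lag=20)
print(lag, corr)
```

A positive k means the predictions trail the targets by k steps. If the measured shift happens to equal the window length, the plotting offset is worth checking too: `H.create2feature_data` is not shown, so this is only a guess, but if it drops the first `window` rows, the predictions would have to be plotted starting at that index rather than at 0.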
Probably my answer is not relevant two years later, but I had a similar issue when experimenting with an LSTM encoder-decoder model. I solved my problem by scaling the input data to the range -1 .. 1 instead of 0 .. 1 as in your example.
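For reference, a minimal sketch of that rescaling in plain NumPy, with an inverse transform to get back to the original units. The helper name and the random stand-in features are assumptions for this example; with scikit-learn the equivalent would be `MinMaxScaler(feature_range=(-1, 1))`.

```python
import numpy as np

def fit_range_scaler(data, low=-1.0, high=1.0):
    """Return (scale, inverse) functions mapping each column of data to [low, high]."""
    data_min = data.min(axis=0)
    data_max = data.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(data_max > data_min, data_max - data_min, 1.0)

    def scale(values):
        return low + (values - data_min) * (high - low) / span

    def inverse(scaled):
        return data_min + (scaled - low) * span / (high - low)

    return scale, inverse

rng = np.random.default_rng(0)
features = np.column_stack([rng.normal(10, 8, 200),     # stand-in for temperature
                            rng.gamma(2.0, 3.0, 200)])  # stand-in for precipitation
scale, inverse = fit_range_scaler(features)
scaled = scale(features)
restored = inverse(scaled)
print(scaled.min(axis=0), scaled.max(axis=0))
```

As with any scaler, the minimum and maximum should be taken from the training portion only and then reused for the test portion.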
