Why do I get the same prediction for all training samples? - keras

I have a neural network with num_labels separate outputs where each output consists of a softmax layer with two nodes (Yes/No).
I take the output of a convolution_layer and feed it as input to a simple softmax_layer, which I then feed into each of these outputs:
softmax_layer = Dense(num_labels, activation='softmax', name='softmax_layer')(convolution_layer)
outputs = list()
for i in range(num_labels):
    out_y = Dense(2, activation='softmax', name='out_{:d}'.format(i))(softmax_layer)
    outputs.append(out_y)
So far I was able to train the model by providing a list of training samples but now I noticed that I am getting the exact same output for completely different samples in a batch:
Please note: Here, each column consists of (2,1) arrays. Each column is the prediction for one sample.
I've checked the samples, they are different. I've also tried to e.g. feed the convolution_layer into the outputs. In that case the predictions are different. I can only see this outcome if I do it the way shown above.
I could live with the outputs being "similar". In that case I'd assume the network is simply not learning what I want it to learn. But since they are exactly the same, I am not sure what the problem is.
I've tried something similar with a simple feed forward network:
class FeedForward:
    def __init__(self, input_dim, nb_classes):
        in_x = Input(shape=(input_dim, ), name='in_x')
        h1 = Dense(14, name='h1', activation='relu')(in_x)
        h2 = Dense(8, name='h2', activation='relu')(h1)
        out = Dense(nb_classes, name='out', activation='softmax')(h2)
        self.model = Model(input=[in_x], output=[out])

    def compile_model(self, optimizer='adam', loss='binary_crossentropy'):
        self.model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
But it behaves similarly. I can't imagine it's due to imbalanced data. There are 13 classes. There is some imbalance, but it's not as if one class has 90% of the mass.
Am I doing this right?

Related

tensorflow, design of neural network for optimization of photonic structure

I just started learning about neural networks, and TensorFlow (Keras) seems to be a reasonable tool for this purpose. I'd like to know how to start configuring a neural network for optimizing photonic structures. Basically, I have a lot of numerical results that relate the geometry of a structure to its resulting spectrum: every geometry is defined by 3 numbers (radius, grating period and grating position), and every spectrum is an array of 200 numbers in the range 0 to 1. I have ~1000 such results. Now I'd like to define a neural network that uses all spectra as inputs and the geometry parameters as outputs, so the input should be a matrix of size (1000, 200) and the output a matrix of size (1000, 3). Then I hope the trained network can take a desired spectrum as input and predict the geometry of the corresponding photonic structure. Is this doable with TensorFlow/Keras?
My first guess would be something like this:
inputs = tf.keras.layers.Input(shape=(1000,200, ),batch_size=1)
x = tf.keras.layers.Dense(300, "relu")(inputs)
x = tf.keras.layers.Dense(300, "relu")(x)
outputs = tf.keras.layers.Dense(3, "relu")(x)
model = tf.keras.models.Model(inputs=inputs, outputs=outputs, name="my_model_name")
However, to be honest, I'm confused by all the answers I have found about using multiple inputs of many dimensions and multiple outputs in a network, so I'd appreciate any help.
#################
I think I found code that works, but I'm not sure whether it is the optimal version. Could anyone verify this? I train the network on input data with shape (600, 201) and output data with shape (600, 3).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(200, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(200, activation='relu'))
    # linear output layer for regression on the geometry parameters
    model.add(Dense(n_outputs, kernel_initializer='he_uniform'))
    model.compile(loss='mae', optimizer='adam')
    return model

model = get_model(n_inputs, n_outputs)
history = model.fit(yyy, xxx, verbose=0, epochs=100)
yhat = model.predict(yyy2)
print('Predicted: %s' % yhat)

how effective is transfer learning? keeping only two specific output features without resetting features

I want to keep only two specific output features without resetting features.
Resetting features would lose the pre-trained weights.
For example, I don't want to do...
# https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html?highlight=transfer%20learning%20ant%20bees
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)
That is the code from the PyTorch transfer learning tutorial.
I want to do this to see how effective transfer learning is.
Even without transfer learning, a model might be effective. Removing 998 of the 1000 categories and leaving only two, ant and bee, could already give a good classifier, since only two choices are left.
I do not want to re-train the model. I want to use the weights as they are; otherwise it would be the same as transfer learning.
You can certainly try this. You can reduce the model output to just the two logits you want to compare with:
chosen_cats = torch.Tensor([ant_index, bee_index]).long()
with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    # keep only the ant and bee columns of the full model output
    outputs = torch.index_select(outputs, 1, chosen_cats)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
In this scenario, the preds will be 0 or 1, with 0 predicting ant and 1 predicting bee, so you will need to also modify your labels to reflect this.
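To make the column selection and the label remapping concrete, here is a minimal dependency-free sketch in plain Python. It mirrors what torch.index_select does along dimension 1, but on nested lists. The helper names (select_logits, remap_labels, argmax), the toy 6-class logits and the class indices 4 and 1 are all made up for illustration; the real ant and bee indices depend on the label map of the pretrained model.

```python
def select_logits(logits, chosen):
    """Keep only the columns listed in `chosen`, in that order."""
    return [[row[c] for c in chosen] for row in logits]


def remap_labels(labels, chosen):
    """Map original class indices to their position in `chosen` (0, 1, ...)."""
    position = {c: i for i, c in enumerate(chosen)}
    return [position[label] for label in labels]


def argmax(row):
    """Index of the largest value in a list (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)


# Toy batch: 3 samples, 6 classes (a stand-in for the 1000 pretrained classes).
ant_index, bee_index = 4, 1  # hypothetical class indices
logits = [
    [0.1, 2.0, 0.3, 5.0, 0.5, 0.0],  # class 3 wins overall; bee wins among the two
    [0.0, 0.2, 0.1, 0.0, 1.5, 0.3],  # ant wins
    [0.9, 0.4, 0.0, 0.1, 0.2, 0.8],  # bee beats ant (0.4 > 0.2)
]
labels = [bee_index, ant_index, ant_index]  # original class indices

chosen = [ant_index, bee_index]
reduced = select_logits(logits, chosen)      # shape (3, 2)
preds = [argmax(row) for row in reduced]     # 0 = ant, 1 = bee
targets = remap_labels(labels, chosen)       # labels in the same 0/1 space
accuracy = sum(p == t for p, t in zip(preds, targets)) / len(targets)
print(preds, targets, accuracy)
```

Note that the first sample would have been assigned to class 3 by the full model; restricting to two columns forces a choice between ant and bee, which is exactly the effect described in the question.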

Tensorflow 1.15 / Keras 2.3.1 Model.train_on_batch() returns more values than there are outputs/loss functions

I am trying to train a model that has more than one output and as a result, also has more than one loss function attached to it when I compile it.
I haven't done something similar in the past (not from scratch at least).
Here's some code I am using to figure out how this works.
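import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
batch_size = 50
input_size = 10
output_size = 5  # not defined in the original snippet; arbitrary value assumed here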
i = Input(shape=(input_size,))
x = Dense(100)(i)
x_1 = Dense(output_size)(x)
x_2 = Dense(output_size)(x)
model = Model(i, [x_1, x_2])
model.compile(optimizer = 'adam', loss = ["mse", "mse"])
# Data creation
x = np.random.random_sample([batch_size, input_size]).astype('float32')
y = np.random.random_sample([batch_size, output_size]).astype('float32')
loss = model.train_on_batch(x, [y,y])
print(loss) # sample output [0.8311912, 0.3519104, 0.47928077]
I would expect the variable loss to have two entries (one for each loss function), however, I get back three. I thought maybe one of them is the weighted average but that does not look to be the case.
Could anyone explain how passing in multiple loss functions works, because obviously, I am misunderstanding something.
I believe the three outputs are the sum of all the losses, followed by the individual losses on each output.
For example, if you look at the sample output you've printed there:
0.3519104 + 0.47928077 = 0.83119117 ≈ 0.8311912
Your assumption that there should be two losses is incorrect. You have a model with two outputs, and you specified one loss for each output, but the model has to be trained on a single loss, so Keras trains the model on a new loss that is the sum of the per-output losses.
You can control how these losses are mixed using the loss_weights parameter in model.compile. I think by default every weight is 1.0.
So in the end, what train_on_batch returns is the total loss, the MSE of output one, and the MSE of output two. That is why you get three values.
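The arithmetic described above can be sketched in plain Python. This is only an illustration of how a total loss is formed from per-output losses and optional weights, not Keras's implementation; the helpers mse and combined_loss and the toy targets and predictions are made up, and any regularization terms a real model might add to the total are ignored.

```python
def mse(y_true, y_pred):
    """Mean squared error over all elements of two equally shaped 2-D lists."""
    errors = [
        (t - p) ** 2
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    ]
    return sum(errors) / len(errors)


def combined_loss(per_output_losses, loss_weights=None):
    """Return [total, loss_1, loss_2, ...], with total = sum(weight_i * loss_i)."""
    if loss_weights is None:
        loss_weights = [1.0] * len(per_output_losses)  # assumed default weighting
    total = sum(w * l for w, l in zip(loss_weights, per_output_losses))
    return [total] + list(per_output_losses)


# Toy targets and the predictions of two output heads.
y = [[1.0, 0.0], [0.5, 0.5]]
pred_1 = [[0.5, 0.0], [0.5, 1.0]]
pred_2 = [[1.0, 1.0], [0.5, 0.5]]

losses = [mse(y, pred_1), mse(y, pred_2)]
result = combined_loss(losses)                  # three values, like train_on_batch
weighted = combined_loss(losses, [1.0, 0.5])    # second output counts half
print(result, weighted)
```

With equal weights the first entry is simply the sum of the other two, which matches the check 0.3519104 + 0.47928077 ≈ 0.8311912 on the sample output.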

Is it possible to predict a certain numerical value given a DNA sequence using LSTM?

I have DNA sequences of 16 letters. Each 16-letter sequence has an output value, the so-called 'inhibition value', which ranges from 0 to 100. When I tried using an LSTM, the prediction was only ever a constant. Does the problem lie in the code, or is this simply not a suitable task for an LSTM, or RNNs in general?
I have tried increasing the batch size and the number of epochs, making the LSTM deeper, and changing the number of LSTM units, but none of this works.
I was also wondering whether the encoding method matters. I tried a one-hot encoder at first, but it didn't work. Then I changed to LabelEncoder, but that doesn't work either. The same constant output is produced.
Below is the code for my model structure:
def create_model():
    input1 = Input(shape=(16,1))
    classifier = LSTM(64, input_shape=(16,1), return_sequences=True)(input1)
    for i in range(2):
        classifier = LSTM(32, return_sequences=True)(classifier)
    classifier = LSTM(32)(classifier)
    classifier = Dense(1, activation='relu')(classifier)
    model = Model(inputs = [input1], outputs = classifier)
    adam = keras.optimizers.adam(lr=0.01)
    model.compile(loss='mean_squared_error', optimizer=adam)
    return model
If anyone is wondering why I use the functional API instead of Sequential: a possible later modification needs 2 input variables that are processed independently before being concatenated at the end.
Thank you in advance.
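For reference, the one-hot encoding mentioned in the question can be sketched as follows. This is a minimal plain-Python illustration, not the asker's preprocessing: the function name one_hot_encode, the base order "ACGT" and the example sequence are assumptions. With this encoding each sequence has shape (16, 4), so a model consuming it would need an input shape of (16, 4) rather than (16, 1).

```python
BASES = "ACGT"  # assumed alphabet and column order


def one_hot_encode(sequence):
    """Turn a DNA string into a list of 4-element one-hot rows, one per base."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(BASES)
        row[index[base]] = 1.0  # raises KeyError for letters outside BASES
        encoded.append(row)
    return encoded


sequence = "ACGTACGTTTGACCAG"  # made-up 16-letter example
encoded = one_hot_encode(sequence)
print(len(encoded), len(encoded[0]))  # rows x columns of the encoded sequence
```

Unlike an integer label encoding, this representation does not impose an artificial ordering on the four bases.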

Why do multilayer perceptrons outperform RNNs in CartPole?

Recently, I compared two models for a DQN on CartPole-v0 environment. One of them is a multilayer perceptron with 3 layers and the other is an RNN built up from an LSTM and 1 fully connected layer. I have an experience replay buffer of size 200000 and the training doesn't start until it is filled up.
Although the MLP solved the problem within a reasonable number of training steps (that is, it achieved a mean reward of 195 over the last 100 episodes), the RNN model could not converge as quickly, and its maximum mean reward never even reached 195!
I have already tried increasing the batch size, adding more neurons to the LSTM's hidden state, increasing the RNN's sequence length and making the fully connected layer more complex, but every attempt failed: I saw enormous fluctuations in mean reward, so the model hardly converged at all. Might these be signs of early overfitting?
class DQN(nn.Module):
    def __init__(self, n_input, output_size, n_hidden, n_layers, dropout=0.3):
        super(DQN, self).__init__()
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.lstm = nn.LSTM(input_size=n_input,
                            hidden_size=n_hidden,
                            num_layers=n_layers,
                            dropout=dropout,
                            batch_first=True)
        self.dropout= nn.Dropout(dropout)
        self.fully_connected = nn.Linear(n_hidden, output_size)

    def forward(self, x, hidden_parameters):
        batch_size = x.size(0)
        output, hidden_state = self.lstm(x.float(), hidden_parameters)
        seq_length = output.shape[1]
        output1 = output.contiguous().view(-1, self.n_hidden)
        output2 = self.dropout(output1)
        output3 = self.fully_connected(output2)
        new = output3.view(batch_size, seq_length, -1)
        new = new[:, -1]
        return new.float(), hidden_state

    def init_hidden(self, batch_size, device):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.n_layers, batch_size, self.n_hidden).zero_().to(device),
                  weight.new(self.n_layers, batch_size, self.n_hidden).zero_().to(device))
        return hidden
Contrary to what I expected, the simpler model gave a much better result than the other, even though an RNN is supposed to be better at processing time series data.
Can anybody tell me the reason for this?
Also, I should mention that I applied no feature engineering, and both DQNs worked with raw data. Could the RNN outperform the MLP with normalized features? (I mean feeding both models normalized data.)
Is there anything you can recommend to improve training efficiency for RNNs and achieve the best results?
"Contrary to what I expected, the simpler model gave a much better result than the other, even though an RNN is supposed to be better at processing time series data."
There is no time series in cart-pole: the state contains all the information needed for an optimal decision. It would be different if, for instance, you learned from images and had to estimate the pole velocity from a series of images.
Also, it is not true that the more complex model should perform better. On the contrary, it is more likely to overfit. For cart-pole you don't even need a NN; a simple linear approximator with RBFs or random Fourier features would suffice. An RNN with an LSTM is certainly overkill for such a simple problem.
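As a rough idea of what a linear approximator with RBF features looks like, here is a minimal plain-Python sketch. It only shows the forward computation of action values from a single 4-number state, not a training loop. The function names, the three centers, gamma and the hand-set weights are all made up for illustration; in practice the weights would be learned and many more centers would be used.

```python
import math


def rbf_features(state, centers, gamma=1.0):
    """One Gaussian feature per center: exp(-gamma * squared distance to it)."""
    features = []
    for center in centers:
        sq_dist = sum((s - c) ** 2 for s, c in zip(state, center))
        features.append(math.exp(-gamma * sq_dist))
    return features


def q_values(features, weights):
    """Linear Q-function: one weight vector per action, dotted with the features."""
    return [sum(w * f for w, f in zip(action_w, features)) for action_w in weights]


# State layout: cart position, cart velocity, pole angle, pole angular velocity.
centers = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],   # pole tilted one way
    [0.0, 0.0, -0.1, 0.0],  # pole tilted the other way
]
# Hand-set, hypothetical weights (not learned): each action favors one tilt.
weights = [
    [0.0, 0.0, 1.0],  # action 0
    [0.0, 1.0, 0.0],  # action 1
]

state = [0.0, 0.0, 0.1, 0.0]  # a single observation is enough, no history needed
features = rbf_features(state, centers)
q = q_values(features, weights)
greedy_action = max(range(len(q)), key=q.__getitem__)
print(features, q, greedy_action)
```

The point of the sketch is that the decision depends only on the current state, which is why a recurrent model adds capacity without adding useful information here.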
