Interpretation of predictions of a sparse_categorical_crossentropy Keras model

I am trying to train a model to classify news. I'm using the bbc-text dataset:
[screenshot of the data]
I have transformed both the input and output variables for use in a Keras model using Tokenizer().
Finally, I have set up the following model:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(20000, 200, input_length=250),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
              metrics=['sparse_categorical_accuracy'])
model.fit(X_train_text, y_train_text, batch_size=32, epochs=50,
          validation_data=(X_test_text, y_test_text))
While I get good accuracy with this model, I do not know how to interpret the predictions. I would have expected, for each input, a list of 5 probabilities (summing to 1), since my output has 5 possible categories. Instead, I get this:
[screenshot of the predictions]
Any help please?
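For what it's worth, a minimal sketch of how the predictions of the model above can be interpreted, using the arrays from the fit call: model.predict returns six probabilities per sample because the last layer is Dense(6, softmax); presumably, since Tokenizer() indexes from 1, label 0 simply goes unused, leaving one column more than the five categories.
import numpy as np

probs = model.predict(X_test_text)       # shape: (num_samples, 6)
print(probs[0], probs[0].sum())          # six probabilities summing to ~1.0
predicted = np.argmax(probs, axis=-1)    # most likely label index per sample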

Related

What to expect from model.predict in Keras?

I am new to Keras and trying to write my first code. I want to understand what 'model.predict' should return. Consider a simple model below.
model = keras.Sequential()
model.add(keras.layers.Dense(12, input_dim=232, activation='relu'))
model.add(keras.layers.Dense(232, activation='relu'))
model.add(keras.layers.Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(vSignal, vLabels, epochs=15, batch_size=100)
# evaluate the keras model
_, accuracy = model.evaluate(vSignal, vLabels)
print('Accuracy: %.2f' % (accuracy*100))
pred = model.predict(vSignalT)
Suppose we train the model with vSignal and vLabels as shown above, and that the accuracy reported by model.evaluate is 100%. If we now give the same data vSignal to model.predict, should we get vLabels back?
pred = model.predict(vSignalT) returns a NumPy array of predictions.
Each row holds the model's output for one sample; with the sigmoid output layer above, this is a probability in [0, 1] rather than a hard label, so you still need to threshold it (commonly at 0.5) to recover a 0/1 label.
To track accuracy during training, save the return value of the fit function:
hist = model.fit(vSignal, vLabels, epochs=15, batch_size=100)
then check
hist.history["accuracy"]
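A minimal sketch of what this looks like in practice, using the names from the question (the 0.5 threshold is the usual convention, not something the question fixes):
pred = model.predict(vSignalT)       # shape: (num_samples, 1), sigmoid outputs in [0, 1]
labels = (pred > 0.5).astype(int)    # threshold to recover hard 0/1 labels
print(labels[:5])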

Why is this simple Keras 3-class classifier predicting only one class?

I am trying to create a simple 3-class deep learning classifier using Keras as follows:
clf = Sequential()
clf.add(Dense(20, activation='relu', input_dim=NUM_OF_FEATURES))
clf.add(Dense(10, activation='relu'))
clf.add(Dense(3, activation='relu'))
clf.add(Dense(1, activation='softmax'))

# Model Compilation
clf.compile(optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'])

# Training the model
clf.fit(X_train,
        y_train,
        epochs=10,
        batch_size=16,
        validation_data=(X_val, y_val))
Why, after training, does it ALWAYS predict the same class (class 1) out of the 3 classes?
Is my network architecture incorrect?
I am new to deep learning and AI.
If you want a network to classify three classes, your last dense layer should have three output nodes. In the example, the last dense layer has one output node; a softmax over a single node always outputs 1.0, which is why the model always predicts the same class.
clf = Sequential()
clf.add(Dense(20, activation='relu', input_dim=NUM_OF_FEATURES))
clf.add(Dense(10, activation='relu'))
clf.add(Dense(3, activation='relu'))
clf.add(Dense(3, activation='softmax'))
For each input sample, the output will be three values, all of which sum to one. These represent the probabilities that the input belongs to each class.
Regarding the loss function, if you want to use cross entropy, you have a choice between sparse categorical cross entropy and categorical cross entropy. The latter expects ground-truth labels to be one-hot encoded (you can use tf.one_hot for this). In other words, the shape of the labels is the same as the shape of the network's output. Sparse categorical cross entropy, on the other hand, expects labels of rank N-1, where N is the rank of the neural network's output; in other words, these are the labels before one-hot encoding.
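A minimal sketch of the two label formats, assuming the three-class setup above (the example labels are made up for illustration):
import tensorflow as tf

y_sparse = tf.constant([0, 2, 1])          # integer class ids, shape (3,)
y_onehot = tf.one_hot(y_sparse, depth=3)   # one-hot rows, shape (3, 3)

# integer labels pair with sparse categorical cross entropy:
clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# one-hot labels pair with categorical cross entropy:
# clf.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])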
When the model is used for inference, the predicted class values can be retrieved with argmax of the last dimension of the predictions.
predictions = clf.predict(x)
classes = predictions.argmax(-1)

Keras: get the output of the last layer during training

The goal is to recover the output of the last layer of the variational autoencoder during the training phase, to use it as training data for another algorithm.
Attached is the variational autoencoder model code:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import TensorBoard

encoding_dim = 58
input_dim = xtrain.shape[1]

inputArray = Input(shape=(input_dim,))
encoded = Dense(units=encoding_dim, activation="tanh")(inputArray)
encoded = Dense(units=29, activation="tanh")(encoded)
encoded = Dense(units=15, activation="tanh")(encoded)
encoded = Dense(units=10, activation="tanh")(encoded)
encoded = Dense(units=3, activation="tanh")(encoded)
encoded = Dense(units=10, activation="tanh")(encoded)
decoded = Dense(units=15, activation="tanh")(encoded)
decoded = Dense(units=29, activation="tanh")(decoded)
decoded = Dense(units=encoding_dim, activation="tanh")(decoded)
decoded = Dense(units=input_dim, activation="sigmoid")(decoded)

autoecoder = Model(inputArray, decoded)
autoecoder.summary()
autoecoder.compile(optimizer=RMSprop(), loss="mean_squared_error", metrics=["mae"])

# hyperparameters:
batchsize = 100
epoch = 10

history = autoecoder.fit(xtrain_noise, xtrain,
                         batch_size=batchsize,
                         epochs=epoch,
                         verbose=1,
                         shuffle=True,
                         validation_data=(xtest_noise, xtest),
                         callbacks=[TensorBoard(log_dir="../logs/DenoiseautoencoderHoussem")])
I have found that I can reference the desired layer's output as follows:
autoecoder.layers[10].output
but how do I store its output during training in a list? Thanks.
Edit:
I can do this by using the predict method of the model on the xtrain data, but I think this is not the best way to do it.
You can train a new model on the predictions of a previously trained model by simply stacking new layers on the desired output and setting trainable = False on the old layers. Here is a dummy example:
# after autoencoder fitting
for i, l in enumerate(autoecoder.layers):
    autoecoder.layers[i].trainable = False
    print(l.name, l.trainable)

output_autoecoder = autoecoder.layers[10].output
x_new = Dense(32, activation='relu')(output_autoecoder)  # add a new layer, for example
new_model = Model(autoecoder.input, x_new)
new_model.compile('adam', 'mse')
new_model.summary()
I use the output of the last autoencoder layer as the input of the new block. We can merge it all by compiling a new model whose inputs are the same as autoecoder's; this way we can use the training data for another algorithm without calling the predict method.
In the end, the only solution that worked for this problem was the .predict method of the DL model. Thank you #marrco
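For completeness, a minimal sketch of that predict-based approach, assuming the tf.keras imports above: a sub-model that ends at the layer indexed in the question.
# sub-model mapping the autoencoder's input to the output of layer 10
feature_extractor = Model(autoecoder.input, autoecoder.layers[10].output)
features = feature_extractor.predict(xtrain)   # training data for the other algorithm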

Which LSTM architecture for my data, and what data processing should I do?

I'm trying to build an LSTM architecture to predict a sickness rate (0%-100%). My input is an array of dimension 4760x10 (number of sick persons per town per age, number of consultations, ...). My output, y, is the sickness rate.
I'm new to machine learning, and I have tried several tips like changing the optimizer, the layer node counts, and the dropout value, but my model didn't converge (the lowest MSE was 616.245). I also tried to scale my data with MinMaxScaler. Could you give me some advice on changing the architecture or the data processing to help the model converge?
Here is the LSTM model that gives me MSE = 616.245:
def build_modelz4():
    model = Sequential()
    model.add(LSTM(10, input_shape=(1, 10), return_sequences=True))
    model.add(LSTM(84, return_sequences=True))
    model.add(LSTM(84, return_sequences=False))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
    model.summary()
    return model

lstmz4 = build_modelz4()
checkpointer = ModelCheckpoint(filepath="weightslstmz4.hdf5", verbose=1, save_best_only=True)
newsclstmhis = lstmz4.fit(trainX, trainY, epochs=1000, batch_size=221,
                          validation_data=(testX, testY), verbose=2,
                          shuffle=False, callbacks=[checkpointer])
Note that when I used an ANN model, it converged with MSE = 0.8, so an LSTM should converge too.
Thank you in advance.
4760 samples is a very small dataset for an LSTM. Also, this seems like a fairly simple prediction problem; try a simpler algorithm such as SVM first. If you are set on using deep learning, use a Sequential model with Dense layers instead, with a few more layers than this one; that should give you better results.
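A minimal sketch of the suggested Dense alternative, assuming the 4760x10 input described in the question (the layer sizes here are illustrative, not prescribed by the answer):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_dense_model():
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(10,)))  # 10 features per sample
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1, activation='linear'))                    # single regression output
    model.compile(loss='mean_squared_error', optimizer='adam',
                  metrics=['mean_squared_error'])
    return model
Note that the Dense network takes each sample as a flat vector of 10 features, so the extra timestep dimension used for the LSTM input is no longer needed.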

Implementing Tensorflow Regression Model on Basketball data

I am following the guide to TensorFlow regression models at https://www.tensorflow.org/tutorials/keras/basic_regression, using basketball data. I want to predict NBA career length based on college stats. I currently have normalized data in this format:
[screenshot of the normalized data]
I then build the following model based on the code in the above link:
def build_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])
    optimizer = tf.train.RMSPropOptimizer(0.001)
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae'])
    return model
model = build_model()
model.summary()
This appears to work fine. However, when I then try to run the model and record the history using the following code:
EPOCHS = 200
labels = ['Age','G','FG','FGA','X3P','X3PA','FTA','TRB','AST','STL','BLK','Wt','final_ht','colyears','nbayears']
# Store training stats
history = model.fit(train, labels, epochs=EPOCHS, validation_split=0.2, verbose=0)
This gives me the error 'str' object has no attribute 'ndim', and I am having trouble understanding what it means. Am I doing something wrong?
When you call the model's .fit function, the second parameter should be your target variable (NBA career length). That must be a one-dimensional array of target values, not the list of column-name strings you passed.
This should solve the problem.
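A minimal sketch of the fix, assuming train is a pandas DataFrame and the target column is 'nbayears' (a guess based on the column list above; the question does not name the target):
# separate the target from the features before fitting (column name is a guess)
target = train.pop('nbayears')   # 1-D array of NBA career lengths
history = model.fit(train, target, epochs=EPOCHS,
                    validation_split=0.2, verbose=0)
Since pop removes the column, build_model's input_shape=(train.shape[1],) should be evaluated after this step so the feature count matches.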
