My keras neural network model is giving me accuracy 0.0000e+00 - python-3.x

I am trying to code a simple neural network to predict the total number of corona cases given a number of factors related to each country.
However, when using the dataset I created, the accuracy is 0.0000e+00, although when I tried this code on a different dataset I downloaded online concerning house pricing, the accuracy went up to 60%.
Both datasets are around 200 rows.
Here is my code below.
import pandas as pd
df = pd.read_excel(r'Dataset2.xlsx', sheet_name='class 2')
df.head()
dataset = df.values
X = dataset[:,1:7]
Y = dataset[:,7]
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()
X_scale = min_max_scaler.fit_transform(X)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_scale, Y, test_size=0.1, random_state=4)  # use the scaled features, not the raw X
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
    Dense(32, activation='relu', input_shape=(6,)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25, batch_size=32, verbose=1, validation_data=(X_test,y_test))
Also here is a screenshot of my dataset.

Accuracy is a metric for classification, while from your description your task is regression. Use metrics suitable for a regression task instead, such as MAE or MSE.
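For illustration, a minimal sketch (not the asker's code) of what the compile/fit step could look like with regression metrics in place of accuracy:
model.compile(optimizer='sgd', loss='mse', metrics=['mae', 'mse'])
history = model.fit(X_train, y_train, epochs=25, batch_size=32, verbose=1, validation_data=(X_test, y_test))
Keras will then report MAE and MSE per epoch, which are meaningful for this task, instead of an accuracy of zero.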

Related

GridsearchCV loss doesn't equal model.fit() loss values

I am confused as to which metric GridSearchCV is using in its parameter search. My understanding is that my model object feeds it a metric and this is what is used to determine the "best_params". But this doesn't appear to be the case. I thought that score=None is the default and, as a result, the first metric given in the metrics option of model.compile() was used. So in my case the scoring function used should be the mean_squared_error. My explanation for this issue is described next.
Here is what I am doing. I simulated some regression data using sklearn with 10 features on 10,000 observations. I am playing around with keras because I typically used pytorch in the past and never really dabbled with keras until now. I am noticing a discrepancy in the loss function output from my GridSearchCV call vs. the model.fit() call after I have my optimal set of parameters. Now I know I can just set refit=True and not re-fit the model again, but I am trying to get a feel for the output of the keras and sklearn GridSearchCV functions.
To be explicit about the discrepancy here is what I am seeing. I simulated some data using sklearn as follows:
# Setting some data basics
N = 10000
feats = 10
# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)
# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]
I have created a "create_model" function that is looking to tune which activation function I am using (again this is a simple example for a proof of concept).
def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                    kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error', 'mae'])
    return model
Performing the grid search I get the following output
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: -21.163454 using {'activation_fn': 'linear'}
OK, so the best metric is a mean squared error of 21.16 (I understand they flip the sign to create a maximization problem). But when I fit the model using activation_fn = 'linear', the MSE I get is totally different.
best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
.....
.....
Epoch 49/50
8000/8000 [==============================] - 0s 48us/step - loss: 344.1636 - mean_squared_error: 344.1636 - mean_absolute_error: 12.2109
Epoch 50/50
8000/8000 [==============================] - 0s 48us/step - loss: 326.4524 - mean_squared_error: 326.4524 - mean_absolute_error: 11.9250
history.history['mean_squared_error']
Out[723]:
[10053.778002929688,
9826.66806640625,
......
......
344.16363830566405,
326.45237121582034]
The difference is 326.45 vs. 21.16. Any insight as to what I am misunderstanding would be greatly appreciated. I would be more comfortable if they were within a reasonable neighborhood of each other, given that one is the error from one fold vs. the entire training data set. But 21 is nowhere near 326. Thanks!
The entire code is shown below.
import pandas as pd
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from keras.constraints import maxnorm
from sklearn import preprocessing
from sklearn.preprocessing import scale
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt
# Setting some data basics
N = 10000
feats = 10
# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)
# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]
def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                    kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error', 'mae'])
    return model
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
history.history.keys()
plt.plot(history.history['mean_absolute_error'])
# summarize results
grid_result.cv_results_
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
The large loss reported in your output (326.45237121582034) is the training loss. If you need a metric to be compared with the grid_result.best_score_ (in the GridSearchCV) and the MSE (in the best_model.fit), you have to request the validation loss (cf. code below).
Now to the question: why is the validation loss lower than the training loss? In your case it is essentially because of dropout (which is applied during training but not during validation/testing); that is why the difference between training and validation losses disappears when you remove dropout. You can find a detailed explanation of the possible reasons for a lower validation loss here.
In short, the performance (MSE) of your model is given by the grid_result.best_score_ (21.163454 in your example).
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.datasets import make_regression
import tensorflow as tf
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
tf.random.set_seed(42)
# Setting some data basics
N = 10000
feats = 10
# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)
# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]
def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                    kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error', 'mae'])
    return model
# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1, validation_data=(X_test, y_test))
best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1, validation_data=(X_test, y_test))
history.history.keys()
# plt.plot(history.history['mae'])
# summarize results
print(grid_result.cv_results_)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Keras Multiclass Classification (Dense model) - Confusion Matrix Incorrect

I have a labeled dataset. The last column (78) contains 4 types of attack. The following code's confusion matrix is correct for two types of attack. Can anyone help me modify the code for Keras multiclass attack detection so that it produces a correct confusion matrix, and show the correct code for precision, FPR, and TPR in the multiclass case? Thanks.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
from keras.utils.np_utils import to_categorical
dataset_original = pd.read_csv('./XYZ.csv')
# Drop NaN values from the DataFrame
dataset = dataset_original.dropna()
# data cleansing
X = dataset.iloc[:, 0:78]
print(X.info())
print(type(X))
y = dataset.iloc[:, 78]  # column 78 is the label column, containing 4 anomaly types
print(y)
# encode the labels as integers
print(y[100:110])
encoder = LabelEncoder()
y = encoder.fit_transform(y)
print([y[100:110]])
# Split the dataset now
XTrain, XTest, yTrain, yTest = train_test_split(X, y, test_size=0.2, random_state=0)
# feature scaling
scalar = StandardScaler()
XTrain = scalar.fit_transform(XTrain)
XTest = scalar.transform(XTest)
# modeling
model = Sequential()
model.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=78))
model.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(XTrain, yTrain, batch_size=1000, epochs=10)
history = model.fit(XTrain, yTrain, batch_size=1000, epochs=10, verbose=1, validation_data=(XTest, yTest))
yPred = model.predict(XTest)
yPred = [1 if y > 0.5 else 0 for y in yPred]
matrix = confusion_matrix(yTest, yPred)
print(matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / (matrix[0][0] + matrix[0][1] + matrix[1][0] + matrix[1][1])
print("Accuracy: " + str(accuracy * 100) + "%")
If I understand correctly, you are trying to solve a multiclass classification problem where your target label belongs to one of 4 different attacks. Therefore, you should use an output Dense layer with 4 units instead of 1, with a 'softmax' activation function (not 'sigmoid'). Additionally, you should use the 'categorical_crossentropy' loss in place of 'binary_crossentropy' when compiling your model.
Furthermore, with this setting, applying argmax on the prediction result (which has 4 class probability values for each test sample) will give you the final label/class.
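Putting these suggestions together, here is a hedged sketch of the modified model (variable names reused from the question; the to_categorical step is an assumption about how the integer labels from LabelEncoder would be one-hot encoded):
from tensorflow.keras.utils import to_categorical
import numpy as np
# one-hot encode the 4 integer labels produced by LabelEncoder
yTrainCat = to_categorical(yTrain, num_classes=4)
model = Sequential()
model.add(Dense(units=16, kernel_initializer='uniform', activation='relu', input_dim=78))
model.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))
model.add(Dense(units=4, kernel_initializer='uniform', activation='softmax'))  # 4 units, softmax
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(XTrain, yTrainCat, batch_size=1000, epochs=10)
# argmax over the 4 class probabilities recovers integer labels for the confusion matrix
yPred = np.argmax(model.predict(XTest), axis=1)
print(confusion_matrix(yTest, yPred))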
[Edit]
Your confusion matrix and high accuracy indicate that you are working with an imbalanced dataset. Possibly a very high number of samples are from class 0 and only a few are from the remaining 3 classes. To handle this, you may want to apply sample weighting or over-sampling/under-sampling approaches.
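As one concrete (assumed, not from the answer) way to weight the classes, sklearn can derive balanced class weights from the integer training labels, and Keras can consume them directly; this reuses the yTrainCat one-hot targets from the sketch above:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np
# weight each class inversely to its frequency in the training set
weights = compute_class_weight(class_weight='balanced', classes=np.unique(yTrain), y=yTrain)
model.fit(XTrain, yTrainCat, batch_size=1000, epochs=10, class_weight=dict(enumerate(weights)))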

Model trained using LSTM is predicting only same value for all

I have a dataset with 4000 rows and two columns. The first column contains some sentences and the second column contains some numbers for it.
There are some 4000 sentences and they are categorized by some 100 different numbers. For example:
Sentences                                          Codes
Google headquarters is in California               87390
Steve Jobs was a great man                         70214
Steve Jobs has done great technology innovations   70214
Google pixel is a very nice phone                  87390
Microsoft is another great giant in technology     67012
Bill Gates founded Microsoft                       67012
Similarly, there are a total of 4000 rows containing these sentences, and these rows are classified with 100 such codes.
I have tried the code below, but when I am predicting, it predicts one and the same value for everything. In other words, y_pred is an array of identical values.
May I know where the code is going wrong?
import pandas as pd
import numpy as np
xl = pd.ExcelFile("dataSet.xlsx")
df = xl.parse('Sheet1')
df = df.sample(frac=1).reset_index(drop=True)  # shuffling the dataframe
X = df.iloc[:, 0].values
Y = df.iloc[:, 1].values
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pickle
count_vect = CountVectorizer()
X = count_vect.fit_transform(X)
tfidf_transformer = TfidfTransformer()
X = tfidf_transformer.fit_transform(X)
X = X.toarray()
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)
y = Y.reshape(-1, 1) # Because Y has only one column
onehotencoder = OneHotEncoder(categories='auto')
Y = onehotencoder.fit_transform(y).toarray()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
inputDataLength = len(X_test[0])
outputDataLength = len(Y[0])
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.layers import Dropout
# fitting the model
embedding_vector_length = 100
model = Sequential()
model.add(Embedding(outputDataLength,embedding_vector_length, input_length=inputDataLength))
model.add(Dropout(0.2))
model.add(LSTM(outputDataLength))
model.add(Dense(outputDataLength, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=20)
y_pred = model.predict(X_test)
invorg = model.inverse_transform(y_test)
y_test = labelencoder_Y.inverse_transform(invorg)
inv = onehotencoder.inverse_transform(y_pred)
y_pred = labelencoder_Y.inverse_transform(inv)
You are using binary_crossentropy even though you have 100 classes, which is not the right thing to do. You have to use categorical_crossentropy for this task.
Compile your model like this:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Also, you are predicting with the model and converting to class labels like this,
y_pred = model.predict(X_test)
inv = onehotencoder.inverse_transform(y_pred)
y_pred = labelencoder_Y.inverse_transform(inv)
Since your model is activated with softmax, in order to get the class label you have to find the argmax of the predictions.
For example, if the prediction was [0.2, 0.3, 0.0005, 0.99], you have to take the argmax, which will give you output 3: the class with the highest probability.
So you have to modify the prediction code like this:
y_pred = model.predict(X_test)
y_pred = np.argmax(y_pred, axis=1)
y_pred = labelencoder_Y.inverse_transform(y_pred)
invorg = np.argmax(y_test, axis=1)
invorg = labelencoder_Y.inverse_transform(invorg)
Now you will have the actual class labels in invorg and the predicted class labels in y_pred.
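From there, a quick sanity check (an extra step, not part of the original answer) is to score the two label arrays directly:
from sklearn.metrics import accuracy_score
print(accuracy_score(invorg, y_pred))  # fraction of test samples predicted correctly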

How to compute accuracy and the confusion matrix using K-fold cross-validation?

I tried to do K-fold cross-validation with K=30 folds, with one confusion matrix for each fold. How do I compute the accuracy and the confusion matrix for the model, with a confidence interval?
Could someone help me?
My code is:
import numpy as np
from sklearn import model_selection
from sklearn import datasets
from sklearn import svm
import pandas as pd
from sklearn.linear_model import LogisticRegression
UNSW = pd.read_csv('/home/sec/Desktop/CEFET/tudao.csv')
previsores = UNSW.iloc[:, UNSW.columns.isin((
    'sload', 'dload', 'spkts', 'dpkts', 'swin', 'dwin', 'smean', 'dmean',
    'sjit', 'djit', 'sinpkt', 'dinpkt', 'tcprtt', 'synack', 'ackdat',
    'ct_srv_src', 'ct_srv_dst', 'ct_dst_ltm', 'ct_src_ltm',
    'ct_src_dport_ltm', 'ct_dst_sport_ltm', 'ct_dst_src_ltm'))].values
classe= UNSW.iloc[:, -1].values
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    previsores, classe, test_size=0.4, random_state=0)
print(X_train.shape, y_train.shape)
#((90, 4), (90,))
print(X_test.shape, y_test.shape)
#((60, 4), (60,))
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
print(previsores.shape)
########K FOLD
print('########K FOLD########K FOLD########K FOLD########K FOLD')
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix
kf = KFold(n_splits=30, random_state=None, shuffle=False)
kf.get_n_splits(previsores)
for train_index, test_index in kf.split(previsores):
    X_train, X_test = previsores[train_index], previsores[test_index]
    y_train, y_test = classe[train_index], classe[test_index]
    logmodel.fit(X_train, y_train)
    print(confusion_matrix(y_test, logmodel.predict(X_test)))
    print(10 * '#')
For accuracy, I would use the function cross_val_score, which does exactly what you are looking for. It outputs a list of 30 validation accuracies; you can then compute their mean, standard deviation, etc., and create some kind of confidence interval (mean +- 2*std).
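A minimal sketch of that approach, reusing the logmodel, previsores, and classe objects from the question (the interval below uses the rough mean +- 2*std rule mentioned above):
from sklearn.model_selection import cross_val_score
# 30 validation accuracies, one per fold
scores = cross_val_score(logmodel, previsores, classe, cv=30, scoring='accuracy')
print("Mean accuracy: %.3f" % scores.mean())
print("Approximate interval: %.3f +/- %.3f" % (scores.mean(), 2 * scores.std()))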
Since the confusion matrix cannot be seen as a single performance metric (it is a matrix, not a single number), I would recommend creating a list and iteratively appending each fold's validation confusion matrix to it (currently you just print them). At the end, you can use this list to extract a lot of interesting information.
UPDATE:
...
...
cm_holder = []
for train_index, test_index in kf.split(previsores):
    X_train, X_test = previsores[train_index], previsores[test_index]
    y_train, y_test = classe[train_index], classe[test_index]
    logmodel.fit(X_train, y_train)
    cm_holder.append(confusion_matrix(y_test, logmodel.predict(X_test)))
Note that len(cm_holder) = 30 and each of the elements is an array of shape (n_classes, n_classes).
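One way (an added illustration, not part of the original answer) to use that list is to pool the per-fold matrices into an overall confusion matrix and accuracy:
import numpy as np
total_cm = np.sum(cm_holder, axis=0)  # element-wise sum over the 30 folds
print(total_cm)
print("Pooled accuracy: %.3f" % (np.trace(total_cm) / total_cm.sum()))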

MLP classifier_for multi class

I am a newbie with Keras.
I am trying to follow the Keras tutorial for Multilayer Perceptron (MLP) for multi-class softmax classification, using my own data set.
My data has 3 classes and only one feature, but I don't understand why the result always shows an accuracy of just 0.3 and the model predicts all training data as the first class. The confusion matrix then looks like this.
Confusion matrix
Here the coding:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import pandas as pd
import numpy as np
# Importing the dataset
dataset = pd.read_csv('StatusAll.csv')
X = dataset.iloc[:, 1:].values
y = dataset.iloc[:, 0:1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from keras.utils import to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# in the first layer, you must specify the expected input data shape:
# here, 20-dimensional vectors.
model.add(Dense(64, activation='tanh', input_dim=1))
model.add(Dropout(0.5))
model.add(Dense(64, activation='tanh'))
model.add(Dropout(0.5))
model.add(Dense(4, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    epochs=100,
                    batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
print('Test score:', score[0])
print('Test accuracy:', score[1])
from sklearn import metrics
prediction = model.predict(x_test)
prediction = np.around(prediction)
y_test_non_category = [ np.argmax(t) for t in y_test ]
y_predict_non_category = [ np.argmax(t) for t in prediction ]
from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)
print (conf_mat)
I hope I can get some advice. Thanks!
(Screenshots of a sample of x_train, and of y_train before it was converted to categorical, were attached here.)
Your final Dense layer has 4 outputs; it seems like you are classifying 4 classes instead of 3.
model.add(Dense(3, activation='softmax')) # Number of classes is 3
It would be helpful to see sample data from x_train and y_train to make sure the pre-processing is correct. Because you have only 1 feature, an MLP might be overkill; a decision tree would be simpler (see the sketch below), unless you want to experiment with MLPs.
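Following that suggestion, a hypothetical baseline (my addition, not the answer's code) could reuse the question's split, recovering integer labels from the one-hot targets:
from sklearn.tree import DecisionTreeClassifier
import numpy as np
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(x_train, np.argmax(y_train, axis=1))  # integer labels back from one-hot
print("Decision tree test accuracy:", tree.score(x_test, np.argmax(y_test, axis=1)))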
