How is this simple Keras neural-network calculating result? - python-3.x

I'm trying to understand how a simple forward neural network works... starting off with the example here I have simplified it to make a trainer that generally gives a 100% accurate "AND" neuron:
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Activation
from keras.optimizers import SGD
from keras.layers import Dense
from keras.utils import np_utils
from imutils import paths
import numpy as np
import argparse
import cv2
import os
def image_to_feature_vector(image, size=(32, 32)):
# resize the image to a fixed size, then flatten the image into
# a list of raw pixel intensities
return cv2.resize(image, size).flatten()
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
help="path to output model file")
args = vars(ap.parse_args())
# grab the list of images that we'll be describing
#print("[INFO] describing images...")
#imagePaths = list(paths.list_images(args["dataset"]))
## initialize the data matrix and labels list
#data = []
#labels = []
## loop over the input images
#for (i, imagePath) in enumerate(imagePaths):
## load the image and extract the class label (assuming that our
## path as the format: /path/to/dataset/{class}.{image_num}.jpg
#image = cv2.imread(imagePath)
#label = imagePath.split(os.path.sep)[-1].split(".")[0]
## construct a feature vector raw pixel intensities, then update
## the data matrix and labels list
#features = image_to_feature_vector(image)
#data.append(features)
#labels.append(label)
## show an update every 1,000 images
#if i > 0 and i % 1000 == 0:
#print("[INFO] processed {}/{}".format(i, len(imagePaths)))
#print(labels)
#exit()
data = [[0,0],[0,1],[1,0],[1,1]]
labels = [0,0,0,1]
# encode the labels, converting them from strings to integers
le = LabelEncoder()
labels = le.fit_transform(labels)
# scale the input image pixels to the range [0, 1], then transform
# the labels into vectors in the range [0, num_classes] -- this
# generates a vector for each label where the index of the label
# is set to `1` and all other entries to `0`
data = np.array(data)# / 255.0
labels = np_utils.to_categorical(labels, 2)
# partition the data into training and testing splits, using 75%
# of the data for training and the remaining 25% for testing
print("[INFO] constructing training/testing split...")
(trainData, testData, trainLabels, testLabels) = train_test_split(
data, labels, test_size=0.0, random_state=42)
print('train')
print(trainData)
print(trainLabels)
print('test')
print(testData) #oopse, empty...
print(testLabels)
testData = trainData
testLabels = trainLabels
# define the architecture of the network
model = Sequential()
model.add(Dense(2, input_dim=2, kernel_initializer="uniform",
activation="relu"))
model.add(Activation("softmax"))
# train the model using SGD
print("[INFO] compiling model...")
sgd = SGD(lr=0.35)
model.compile(loss="binary_crossentropy", optimizer=sgd,
metrics=["accuracy"])
print(model.fit.__doc__)
model.fit(trainData, trainLabels, epochs=50, batch_size=128,
verbose=False)
print(model.predict(np.array([[0,1]])))
print('Should be [1,0]')#false
print(model.predict(np.array([[1,0]])))
print('Should be [1,0]')#false
print(model.predict(np.array([[0,0]])))
print('Should be [1,0]')#false
print(model.predict(np.array([[1,1]])))
print('Should be [0,1] true.')
# show the accuracy on the testing set
print("[INFO] evaluating on testing set...")
(loss, accuracy) = model.evaluate(testData, testLabels,
batch_size=128, verbose=1)
print("[INFO] loss={:.4f}, accuracy: {:.4f}%".format(loss,
accuracy * 100))
# dump the network architecture and weights to file
print("[INFO] dumping architecture and weights to file...")
model.save(args["model"])
Now when running it with python3 ./simple_andNN.py --model ./outputfile.hdf5 I can open the output model in the Hdfview app and this is what I see:
Now I would expect the value of [1 1] (the only one classified in the positive group, result=[smaller, larger number]) to be dot product of a matrix (the kernel matrix in this simple case?), plus some constant bias, but when I try that it doesn't seem to add up to anything output is saying. Am I misunderstanding what this "neuron" is supposed to be doing based on this data?

The result you see in your print statements are after the softmax operation.
[1 1] . [-.70 .77; -.64 .81] + [1.9 -0.62] = [.53 .96]
Then
exp(.53) = 1.69, exp(.96) = 2.62
so result is
[1.69/(1.69+2.62) 2.62/(1.69+2.62)] = [.39 .61]
(obviously with rounding errors)
Also note that before taking the softmax technically you have a relu activation, but since that is the identity for positive numbers, it doesn't have an effect for the [1 1] example. It would make a difference for the [1 0] example, as the second component of your Wx + b is negative, which would then be zeroed out.

Related

Questions about Multitask deep neural network modeling using Keras

I'm trying to develop a multitask deep neural network (MTDNN) to make prediction on small molecule bioactivity against kinase targets and something is definitely wrong with my model structure but I can't figure out what.
For my training data (highly imbalanced data with 0 as inactive and 1 as active), I have 423 unique kinase targets (tasks) and over 400k unique compounds. I first calculate the ECFP fingerprint using smiles, and then I randomly split the input data into train, test, and valid sets based on 8:1:1 ratio using RandomStratifiedSplitter from deepchem package. After training my model using the train set and I want to make prediction on the test set to check model performance.
Here's what my data looks like (screenshot example):
(https://i.stack.imgur.com/8Hp36.png)
Here's my code:
# Import Packages
import numpy as np
import pandas as pd
import deepchem as dc
from sklearn.metrics import roc_auc_score, roc_curve, auc, confusion_matrix
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import initializers, regularizers
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, Dropout, Reshape
from tensorflow.keras.optimizers import SGD
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
# Build Model
inputs = keras.Input(shape = (1024, ))
x = keras.layers.Dense(2000, activation='relu', name="dense2000",
kernel_initializer=initializers.RandomNormal(stddev=0.02),
bias_initializer=initializers.Ones(),
kernel_regularizer=regularizers.L2(l2=.0001))(inputs)
x = keras.layers.Dropout(rate=0.25)(x)
x = keras.layers.Dense(500, activation='relu', name='dense500')(x)
x = keras.layers.Dropout(rate=0.25)(x)
x = keras.layers.Dense(846, activation='relu', name='output1')(x)
logits = Reshape([423, 2])(x)
outputs = keras.layers.Softmax(axis=2)(logits)
Model1 = keras.Model(inputs=inputs, outputs=outputs, name='MTDNN')
Model1.summary()
opt = keras.optimizers.SGD(learning_rate=.0003, momentum=0.9)
def loss_function (output, labels):
loss = tf.nn.softmax_cross_entropy_with_logits(output,labels)
return loss
loss_fn = loss_function
Model1.compile(loss=loss_fn, optimizer=opt,
metrics=[keras.metrics.Accuracy(),
keras.metrics.AUC(),
keras.metrics.Precision(),
keras.metrics.Recall()])
for train, test, valid in split2:
trainX = pd.DataFrame(train.X)
trainy = pd.DataFrame(train.y)
trainy2 = tf.one_hot(trainy,2)
testX = pd.DataFrame(test.X)
testy = pd.DataFrame(test.y)
testy2 = tf.one_hot(testy,2)
validX = pd.DataFrame(valid.X)
validy = pd.DataFrame(valid.y)
validy2 = tf.one_hot(validy,2)
history = Model1.fit(x=trainX, y=trainy2,
shuffle=True,
epochs=10,
verbose=1,
batch_size=100,
validation_data=(validX, validy2))
y_pred = Model1.predict(testX)
y_pred2 = y_pred[:, :, 1]
y_pred3 = np.round(y_pred2)
# Check the # of nonzero in assay
(y_pred3!=0).sum () #all 0s
My questions are:
The roc and precision recall are all extremely high (>0.99), but the prediction result of test set contains all 0s, no actives at all. I also use the randomized dataset with same active:inactive ratio for each task to test if those values are too good to be true, and turns out all values are still above 0.99, including roc which is expected to be 0.5.
Can anyone help me to identify what is wrong with my model and how should I fix it please?
Can I use built-in functions in sklearn to calculate roc/accuracy/precision-recall? Or should I manually calculate the metrics based on confusion matrix on my own for multitasking purpose. Why and why not?

How can the coefficients of the multiclass logistic regression model be used to predict the probabilities of class membership of observations?

I am trying to solve one problem that resembles that of Fisher's irises classification. The problem is that I can train the model on my computer, but the given model has to predict class membership on a computer where it is impossible to install python and scikit learn. I want to understand how, having received the coefficients of the logistic regression model, I can predict the belonging to a certain class without using the predict method of the model.
Using the Fisher problem as an example, I do the following.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, f1_score
# data preparation
iris = load_iris()
data = pd.DataFrame(data=np.hstack([iris.data, iris.target[:, np.newaxis]]),
columns=iris.feature_names + ['target'])
names = data.columns
# split data
X_train, X_test, y_train, y_test = train_test_split(data[names[:-1]], data[names[-1]], random_state=42)
# train model
cls = make_pipeline(
StandardScaler(),
LogisticRegression(C=2, random_state=42)
)
cls = cls.fit(X_train.to_numpy(), y_train)
preds_train = cls.predict(X_train)
# prediction
preds_test = cls.predict(X_test)
# scores
train_score = accuracy_score(preds_train, y_train), f1_score(preds_train, y_train, average='macro') # on train data
# train_score = (0.9642857142857143, 0.9653621232568601)
test_score = accuracy_score(preds_test, y_test), f1_score(preds_test, y_test, average='macro') # on test data
# test_score = (1.0, 1.0)
# model coefficients
cls[1].coef_, cls[1].intercept_
>>> (array([[-1.13948079, 1.30623841, -2.21496793, -2.05617771],
[ 0.66515676, -0.2541143 , -0.55819748, -0.86441227],
[ 0.47432404, -1.05212411, 2.77316541, 2.92058998]]),
array([-0.35860337, 2.43929019, -2.08068682]))
Now I have the coefficients of the model. And I want to use them to make predictions.
First, I make a prediction using the predict method for the first five observations on the test sample.
preds_test = cls.predict_proba(X_test)
preds_test[0:5]
>>>array([[5.66019001e-03, 9.18455687e-01, 7.58841233e-02],
[9.75854479e-01, 2.41455095e-02, 1.10881450e-08],
[1.18780156e-09, 6.53295166e-04, 9.99346704e-01],
[6.71574900e-03, 8.14174200e-01, 1.79110051e-01],
[6.98756622e-04, 8.09096425e-01, 1.90204818e-01]])
Then I manually calculate the predictions of the class probabilities for the observations using the coefficients of the model.
# define two functions for making predictions
def logit(x, w):
return np.dot(x, w)
# from here: https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python
def softmax(z):
assert len(z.shape) == 2
s = np.max(z, axis=1)
s = s[:, np.newaxis] # necessary step to do broadcasting
e_x = np.exp(z - s)
div = np.sum(e_x, axis=1)
div = div[:, np.newaxis] # dito
return e_x / div
n, k = X_test.shape
X_ = np.hstack((np.ones((n, 1)), X_test)) # add column with 1 for intercept
weights = np.hstack((cls[1].intercept_[:, np.newaxis], cls[1].coef_)) # create weights matrix
results = softmax(logit(X_, weights.T)) # calculate probabilities
results[0:5]
>>>array([[3.67343725e-14, 4.63938438e-06, 9.99995361e-01],
[2.81976786e-05, 8.63083152e-01, 1.36888650e-01],
[1.24572182e-22, 5.47800683e-11, 1.00000000e+00],
[3.32990060e-14, 3.08352323e-06, 9.99996916e-01],
[2.66415118e-15, 1.78252465e-06, 9.99998217e-01]])
If you compare the two results obtained (preds_test[0:5] and results[0:5]), you can see that they do not coincide at all. Please explain me what I am doing wrong and how I can use the model's coefficients to calculate predictions without using the predict method.
I forgot that a scaler was applied. If you change the code a little, then the results are the same.
scaler = StandardScaler()
scaler.fit(X_train)
X_test_transf = scaler.transform(X_test)
def logit(x, w):
return np.dot(x, w)
def softmax(z):
assert len(z.shape) == 2
s = np.max(z, axis=1)
s = s[:, np.newaxis] # necessary step to do broadcasting
e_x = np.exp(z - s)
div = np.sum(e_x, axis=1)
div = div[:, np.newaxis] # dito
return e_x / div
n, k = X_test_transf.shape
X_ = np.hstack((np.ones((n, 1)), X_test_transf))
weights = np.hstack((cls[1].intercept_[:, np.newaxis], cls[1].coef_))
results = softmax(logit(X_, weights.T))
np.allclose(preds_test, results)
>>>True
There are two values for every predict_proba. The first value is the probability of the event not occurring and the probability of the event occurring. predict_proba(X)[:,1] to get the probability of the event occurring.

How to use the input gradients as variables within a custom loss function in Keras?

I am using the input gradient as feature important and want to compare the feature importance of a train datapoint with the human annotated feature importance. I would like to make this comparison differentiable such that it can be learned through backpropagation. For that, I am writing a custom loss function that in addition to the regular loss (e.g. m.s.e. on the prediction vs true labels) also checks whether the input gradient is correct (e.g. m.s.e. of the input gradient vs the human annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense
def normalize(x):
# utility function to normalize a tensor by its L2 norm
return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
# Amount of training samples
N = 1000
input_dim = 10
# Generate training set make the 1st and 2nd feature same as the target feature
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]
# Create simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(input=[inputs], output=output)
# Compile and fit model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)
# Get function to get input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])
# Get input gradient values of the training-set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the 1st and the 2nd features have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error
def custom_loss():
# human annotated feature importance
# Let's say that it says to only look at the second feature
human_feature_importance = []
for i in range(N):
human_feature_importance.append([0,0,1,0,0,0,0,0,0,0])
def loss(y_true, y_pred):
# Get regular loss
regular_loss_value = mean_squared_error(y_true, y_pred)
# Somehow get the input gradient of each training sample as a tensor
# It should be differential w.r.t. all of the weights
gradients = ??
feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
# Combine the both losses
return regular_loss_value + feature_importance_loss_value
return loss
I also found an implementation in tensorflow to make the input gradient differentialble: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18

Keras : KeyError: 'acc' , during plotting

i'm trying to visualize the Training plot of my Model , and i get this Error :
Traceback (most recent call last):
File "train.py", line 120, in <module>
plt.plot(H.history['acc'])
KeyError: 'acc'
& here's the Full Code :
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from pyimagesearch.resnet import ResNet
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from keras.utils import np_utils
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
help="path to input dataset")
ap.add_argument("-a", "--augment", type=int, default=-1,
help="whether or not 'on the fly' data augmentation should be used")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
help="path to output loss/accuracy plot")
args = vars(ap.parse_args())
# initialize the initial learning rate, batch size, and number of
# epochs to train for
INIT_LR = 1e-1
BS = 8
EPOCHS = 50
# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class images
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []
# loop over the image paths
for imagePath in imagePaths:
# extract the class label from the filename, load the image, and
# resize it to be a fixed 64x64 pixels, ignoring aspect ratio
label = imagePath.split(os.path.sep)[-2]
image = cv2.imread(imagePath)
image = cv2.resize(image, (64, 64))
# update the data and labels lists, respectively
data.append(image)
labels.append(label)
# convert the data into a NumPy array, then preprocess it by scaling
# all pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
# encode the labels (which are currently strings) as integers and then
# one-hot encode them
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = np_utils.to_categorical(labels, 3)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
test_size=0.25, random_state=42)
# initialize an our data augmenter as an "empty" image data generator
aug = ImageDataGenerator()
# check to see if we are applying "on the fly" data augmentation, and
# if so, re-instantiate the object
if args["augment"] > 0:
print("[INFO] performing 'on the fly' data augmentation")
aug = ImageDataGenerator(
rotation_range=20,
zoom_range=0.15,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.15,
horizontal_flip=True,
fill_mode="nearest")
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = SGD(lr=INIT_LR, momentum=0.9, decay=INIT_LR / EPOCHS)
model = ResNet.build(64, 64, 3, 2, (2, 3, 4),
(32, 64, 128, 256), reg=0.0001)
model.compile(loss="categorical_crossentropy", optimizer=opt,
metrics=["accuracy"])
# train the network
print("[INFO] training network for {} epochs...".format(EPOCHS))
H = model.fit_generator(
aug.flow(trainX, trainY, batch_size=BS),
validation_data=(testX, testY),
steps_per_epoch=len(trainX) // BS,
epochs=EPOCHS)
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=BS)
print(classification_report(testY.argmax(axis=1),
predictions.argmax(axis=1), target_names=le.classes_))
# plot the training loss and accuracy
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history['val_loss'], label="val_loss")
plt.plot(N, H.history['acc'], label="accuracy")
plt.plot(N, H.history['val_acc'], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])
So, this code , is originally from the blog Pyimagesearch , on data augmentation , so , i decided to give it a try , the original "pipeline" was meant for a binary classification , so , since i was intrested in a mutli-class "task" , i brought some changes.
but the plotting part of the code , seems correct to me , i cheked on multiple posts here on Stack , checked the keras documentation , and nothing , i do not understand , why it dose not work !!!!
any suggestion would be much appreciated .
thank you.
Have you tried H.history['accuracy']? Since you compiled using 'accuracy' it will probably have the same string.
Now you can always inspect what you've got:
for key in H.history.keys():
print(key)
You will see what is logged there

Keras - Text Classification - LSTM - How to input text?

Im trying to understand how to use LSTM to classify a certain dataset that i have.
I researched and found this example of keras and imdb :
https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
However, im confused about how the data set must be processed to input.
I know keras has pre-processing text methods, but im not sure which to use.
The x contain n lines with texts and the y classify the text by happiness/sadness. Basically, 1.0 means 100% happy and 0.0 means totally sad. the numbers may vary, for example 0.25~~ and so on.
So my question is, How i input x and y properly? Do i have to use bag of words?
Any tip is appreciated!
I coded this below but i keep getting the same error:
#('Bad input argument to theano function with name ... at index 1(0-based)',
'could not convert string to float: negative')
import keras.preprocessing.text
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
print('Loading data...')
import pandas
thedata = pandas.read_csv("dataset/text.csv", sep=', ', delimiter=',', header='infer', names=None)
x = thedata['text']
y = thedata['sentiment']
x = x.iloc[:].values
y = y.iloc[:].values
###################################
tk = keras.preprocessing.text.Tokenizer(nb_words=2000, filters=keras.preprocessing.text.base_filter(), lower=True, split=" ")
tk.fit_on_texts(x)
x = tk.texts_to_sequences(x)
###################################
max_len = 80
print "max_len ", max_len
print('Pad sequences (samples x time)')
x = sequence.pad_sequences(x, maxlen=max_len)
#########################
max_features = 20000
model = Sequential()
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, input_length=max_len, dropout=0.2))
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
model.fit(x, y=y, batch_size=200, nb_epoch=1, verbose=1, validation_split=0.2, show_accuracy=True, shuffle=True)
# at index 1(0-based)', 'could not convert string to float: negative')
Review how you are using your CSV parser to read the text in. Ensure that the fields are in the format Text, Sentiment if you want to to make use of the parser as you've written it in your code.

Resources