Keras batch normalization stops convergence - keras

I'm new to keras and have been experimenting with various things such as BatchNormalization but it is not working at all. When the BatchNormalization line is commented out it will converge to around 0.04 loss or better, but with it as it is it will converge to 0.71 and get stuck around there, I'm not sure what's wrong.
from sklearn import preprocessing
from sklearn.datasets import load_boston
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers.normalization import BatchNormalization
import keras.optimizers
boston = load_boston()
x = boston.data
y = boston.target
normx = preprocessing.scale(x)
normy = preprocessing.scale(y)
# doesnt construct output layer
def layer_looper(inputs, number_of_loops, neurons):
inputs_copy = inputs
for i in range(number_of_loops):
inputs_copy = Dense(neurons, activation='relu')(inputs_copy)
inputs_copy = BatchNormalization()(inputs_copy)
return inputs_copy
inputs = Input(shape = (13,))
x = layer_looper(inputs, 40, 20)
predictions = Dense(1, activation='linear')(x)
model = Model(inputs=inputs, outputs=predictions)
opti = keras.optimizers.Adam(lr=0.0001)
model.compile(loss='mean_absolute_error', optimizer=opti, metrics=['acc'])
print(model.summary())
model.fit(normx, normy, epochs=5000, verbose=2, batch_size=128)
I have tried experimenting with batch sizes and the optimizer but it doesn't seem very effective. Am I doing something wrong?

I've increased learning rate to 0.01 and it seems like the network is able to learn something (I get Epoch 1000/5000- 0s - loss: 0.2330) .
I think it's worth to note the following from the abstract of original Batch Normalization paper:
Batch Normalization allows us to use much higher learning rates and
be less careful about initialization. It also acts as a regularizer (...)
That hinted to increased learning rate (that's something you might want to experiment with).
Be aware that since it works like regularization, BatchNorm should make your training loss worse - it's supposed to prevent overfitting and thus close the gap between the train and test/valid errors.

Related

Questions about Multitask deep neural network modeling using Keras

I'm trying to develop a multitask deep neural network (MTDNN) to make prediction on small molecule bioactivity against kinase targets and something is definitely wrong with my model structure but I can't figure out what.
For my training data (highly imbalanced data with 0 as inactive and 1 as active), I have 423 unique kinase targets (tasks) and over 400k unique compounds. I first calculate the ECFP fingerprint using smiles, and then I randomly split the input data into train, test, and valid sets based on 8:1:1 ratio using RandomStratifiedSplitter from deepchem package. After training my model using the train set and I want to make prediction on the test set to check model performance.
Here's what my data looks like (screenshot example):
(https://i.stack.imgur.com/8Hp36.png)
Here's my code:
# Import Packages
import numpy as np
import pandas as pd
import deepchem as dc
from sklearn.metrics import roc_auc_score, roc_curve, auc, confusion_matrix
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import initializers, regularizers
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, Dropout, Reshape
from tensorflow.keras.optimizers import SGD
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
# Build Model
inputs = keras.Input(shape = (1024, ))
x = keras.layers.Dense(2000, activation='relu', name="dense2000",
kernel_initializer=initializers.RandomNormal(stddev=0.02),
bias_initializer=initializers.Ones(),
kernel_regularizer=regularizers.L2(l2=.0001))(inputs)
x = keras.layers.Dropout(rate=0.25)(x)
x = keras.layers.Dense(500, activation='relu', name='dense500')(x)
x = keras.layers.Dropout(rate=0.25)(x)
x = keras.layers.Dense(846, activation='relu', name='output1')(x)
logits = Reshape([423, 2])(x)
outputs = keras.layers.Softmax(axis=2)(logits)
Model1 = keras.Model(inputs=inputs, outputs=outputs, name='MTDNN')
Model1.summary()
opt = keras.optimizers.SGD(learning_rate=.0003, momentum=0.9)
def loss_function (output, labels):
loss = tf.nn.softmax_cross_entropy_with_logits(output,labels)
return loss
loss_fn = loss_function
Model1.compile(loss=loss_fn, optimizer=opt,
metrics=[keras.metrics.Accuracy(),
keras.metrics.AUC(),
keras.metrics.Precision(),
keras.metrics.Recall()])
for train, test, valid in split2:
trainX = pd.DataFrame(train.X)
trainy = pd.DataFrame(train.y)
trainy2 = tf.one_hot(trainy,2)
testX = pd.DataFrame(test.X)
testy = pd.DataFrame(test.y)
testy2 = tf.one_hot(testy,2)
validX = pd.DataFrame(valid.X)
validy = pd.DataFrame(valid.y)
validy2 = tf.one_hot(validy,2)
history = Model1.fit(x=trainX, y=trainy2,
shuffle=True,
epochs=10,
verbose=1,
batch_size=100,
validation_data=(validX, validy2))
y_pred = Model1.predict(testX)
y_pred2 = y_pred[:, :, 1]
y_pred3 = np.round(y_pred2)
# Check the # of nonzero in assay
(y_pred3!=0).sum () #all 0s
My questions are:
The roc and precision recall are all extremely high (>0.99), but the prediction result of test set contains all 0s, no actives at all. I also use the randomized dataset with same active:inactive ratio for each task to test if those values are too good to be true, and turns out all values are still above 0.99, including roc which is expected to be 0.5.
Can anyone help me to identify what is wrong with my model and how should I fix it please?
Can I use built-in functions in sklearn to calculate roc/accuracy/precision-recall? Or should I manually calculate the metrics based on confusion matrix on my own for multitasking purpose. Why and why not?

Polynomial Regression using keras

hi i am new to keras and i just wanted to know are ann's good for polynomial regression tasks or we shuold just
use sklearn for exmaple i write this script
import numpy as np
import keras
from keras.layers import Dense
from keras.models import Sequential
x=np.arange(1, 100)
y=x**2
model = Sequential()
model.add(Dense(units=200, activation = 'relu',input_dim=1))
model.add(Dense(units=200, activation= 'relu'))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error',optimizer=keras.optimizers.SGD(learning_rate=0.001))
model.fit(x, y,epochs=2000)
but after testing it on some of numbers i didn't get good result like :
model.predict([300])
array([[3360.9023]], dtype=float32)
is there any problem in my code or i just shouldn't use ann's for polynomial regressions.
thank you.
I'm not 100 percent sure, but I think that the reason you are getting such bad predictions is because you did not scale your data. Artificial neural networks are extremely computationally intensive, and thus, scaling is a must. Scale your data as shown below:
import numpy as np
import keras
from keras.layers import Dense
from keras.models import Sequential
x=np.arange(1, 100)
y=x**2
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x = sc_x.fit_transform(x)
sc_y = StandardScaler()
y = sc_y.fit_transform(y)
model = Sequential()
model.add(Dense(units=5, activation = 'relu',input_dim=1))
model.add(Dense(units=5, activation= 'relu'))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error',optimizer=keras.optimizers.SGD(learning_rate=0.001))
model.fit(x, y,epochs=75, batch_size=10)
prediction = sc_y.inverse_transform(model.predict(sc_x.transform([300])))
print(prediction)
Note that I changed the number of epochs from 2000 to 75. This is because 2000 epochs is way to high for a neural network, and it requires lots of time to train. Your X dataset contains only 100 values, so the maximum number of epochs I would suggest is 75.
Furthermore, I also changed the number of neurons in each hidden layer from 200 to 5. This is because 200 neurons is far to many for most datasets, let alone a small dataset of length 100.
These changes should ensure that your neural network produces more accurate predictions.
Hope that helped.

Question about enabling/disabling dropout with keras functional API

I am using Keras functional API to build a classifier and I am using the training flag in the dropout layer to enable dropout when predicting new instances (in order to get an estimate of the uncertainty). In order to get the expected response one needs to repeat this prediction several times, with keras randomly activating links in the dense layer, and of course it is computational expensive. Therefore, I would also like to have the option to not use dropout at the prediction phase, i.e., use all the network links. Does anyone know how I can do this? Following is a sample code of what I am doing. I tried to look if predict has any relevant parameter but does not seem like it does (?). I can technically train the same model without the training flag at the dropout layer, but I do not want to do this (or better I want to have a more clean solution, rather than having 2 different models).
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.utils import to_categorical
from keras.layers import Dense
from keras.layers import Dropout
import numpy as np
import keras
# generate a 2d classification sample dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
trainy = to_categorical(trainy)
testy = to_categorical(testy)
inputlayer = keras.layers.Input((2,))
d = keras.layers.Dense(500, activation = 'relu')(inputlayer)
d1 = keras.layers.Dropout(rate = .3)(d,training = True)
out = keras.layers.Dense(2, activation = 'softmax')(d1)
model = keras.Model(inputs = inputlayer, outputs = out)
model.compile(loss = 'categorical_crossentropy',metrics = ['accuracy'],optimizer='adam')
model.fit(x = trainX, y = trainy, validation_data=(testX, testy),epochs=1000, verbose=1)
# another prediction on a specific sample
print(model.predict(testX[0:1,:]))
# another prediction on the same sample
print(model.predict(testX[0:1,:]))
Running the above example I get the following output:
[[0.9230819 0.07691813]]
[[0.8222245 0.17777553]]
which is as expected, different class probabilities for the same input, since there is a random (de)activation of the links from the dropout layer.
Any suggestions on how I can enable/disable dropout at the prediction phase with the functional API?
Sure, you do not need to set the training flag when building the Dropout layer. After training your model you define this function:
mc_func = K.function([model.input, K.learning_phase()],
[model.output])
Then you call mc_func with your input and flag 1 to enable dropout, or 0 to disable it:
stochastic_pred = mc_func([some_input, 1])
deterministic_pred = mc_func([some_input, 0])

How to use LSTM to predict values from a different range than it was trained on

I try to train a simple LSTM to predict the next number in a sequence (1,2,3,4,5 --> 6).
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
xs = [[[(j+i)/100] for j in range(5)] for i in range(100)]
ys = [(i+5)/100 for i in range(100)]
x_train, x_test, y_train, y_test = train_test_split(xs, ys)
model = Sequential()
model.add(LSTM(1, input_shape=(5,1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam', metrics=['accuracy'])
training = model.fit(x_train, y_train, epochs=200)
new_xs = np.array(xs)*5
new_ys = np.array(ys)*5
pred = model.predict(new_xs)
plt.scatter(range(len(pred)), pred, c='r')
plt.scatter(range(len(new_ys)), new_ys, c='b')
In order for the net to learn anything I had to normalize the training data (divided it by 100). It did work indeed for the data from the range it was trained on.
I want it to be able to predict the numbers form outside the range it was trained on, but as soon as it leaves the range, it starts to diverge:
When I increased the number of units in both LSTM layers to 30 it looks a little better, but it's still diverging:
Is LSTM capable of learning that task without adding an infinite number of units?
I was looking for an answer to the same question, and I came across the following paper from 2019 and the corresponding Git repo. In particular, see section 5.3 in the paper. It seems like ABBA-LSTM is the solution, though it depends on the time series problem you're trying to solve.

Neural Network In Scikit-Learn not producing meaningful results

I'm currently trying to use the scikit learn package for its neural network functionality. I have a complex problem to solve with it, but to start out I am just trying a couple of basic tests to familiarize myself with it. I have gotten it to do something, but it isn't producing meaningful results. My code:
import sklearn.neural_network.multilayer_perceptron as nnet
import numpy
def generateTargetDataset(expression="%s", generateRange=(-100,100), s=1000):
expression = expression.replace("x", "%s")
x = numpy.random.rand(s,)
y = numpy.zeros((s,), dtype="float")
numpy.multiply(x, abs(generateRange[1]-generateRange[0]), x)
numpy.subtract(x, min(generateRange), x)
for z in range(0, numpy.size(x)):
y[z] = eval(expression % (x[z]))
x = x.reshape(-1, 1)
outTuple = (x, y)
return(outTuple)
print("New Net + Training")
QuadRegressor = nnet.MLPRegressor(hidden_layer_sizes=(10), warm_start=True, verbose=True, learning_rate_init=0.00001, max_iter=10000, algorithm="sgd", tol=0.000001)
data = generateTargetDataset(expression="x**2", s=10000, generateRange=(-1,1))
QuadRegressor.fit(data[0], data[1])
print("Net Trained")
xt = numpy.random.rand(10000, 1)
yr = QuadRegressor.predict(xt)
yr = yr.reshape(-1, 1)
xt = xt.reshape(-1, 1)
numpy.multiply(xt, 100, xt)
numpy.multiply(yr, 10000, yr)
numpy.around(yr, 2, out=yr)
numpy.around(xt, 2, out=xt)
out = numpy.concatenate((xt, yr), axis=1)
numpy.set_printoptions(precision=4)
numpy.savetxt(fname="C:\\SCRATCHDIR\\numpydump.csv", X=out, delimiter=",")
I don't understand how to post the data it gives me, but it spits out between 7000 and 10000 for all inputs between 0 and 100. It seems to be correctly mapped very close to the top of the range, but for inputs close to 0, it just returns something near 7000.
EDIT: I forgot to add this. The network has the same behavior if I remove the dummy training to y=x, but I read somewhere that sometimes you can help a network along by training it to a different but closer function and then using that already weighted network as a starting ground. It didn't work but I just hadn't taken that bit out yet.
My recommendation is to reduce the number of neurons per layer, and increase the training dataset size. Right now, you have a lot of parameters to train in your network, and a small training set (~10K). However, the main point of my answer is that sklearn probably isn't a great choice for your end application.
So you have a complex problem you want to solve with neural networks?
I have a complex problem to solve with it, but to start out I am just trying a couple of basic tests to familiarize myself with it.
According to the official user guide, sklearn's implementation of neural networks isn't designed for large applications and is a lot less flexible than other options for deep learning.
One Python deep learning library I've had good experiences with is keras, a modular, easy-to-use library with GPU support.
Here's a sample I coded up that trains a single perceptron to do quadratic regression.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD
import numpy as np
import matplotlib.pyplot as plt
model = Sequential()
model.add(Dense(1, init = 'uniform', input_dim=1))
model.add(Activation('sigmoid'))
model.compile(optimizer = SGD(lr=0.02, decay=1e-6, momentum=0.9, nesterov=True), loss = 'mse')
data = np.random.random(1000)
labels = data**2
model.fit(data.reshape((len(data),1)), labels, nb_epoch = 1000, batch_size = 128, verbose = 1)
tdata = np.sort(np.random.random(100))
tlabels = tdata**2
preds = model.predict(tdata.reshape((len(tdata), 1)))
plt.plot(tdata, tlabels)
plt.scatter(tdata, preds)
plt.show()
This outputs a scatter plot of the test data points, along with a plot of the true curve.
As you can see, the results are reasonable. In general, neural networks are hard to train, and I had to do some parameter tuning before I got this example working.
It looks like you're using Windows. This question may be helpful for installing Keras on Windows.

Resources