Am I doing this right on Keras? - keras

I am implementing sequence classification in Keras using 2 BLSTM layers and 2 FC layers. My data shape is (2655, 219, 835), where 219 is the number of time steps and 835 is the number of features. Training uses 1800 instances and testing uses 855 instances. There are 4 classes (0, 1, 2, 3), which are converted to one-hot vectors with the to_categorical function. The network implementation is below:
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(219,835)))
model.add(Bidirectional(LSTM(128,return_sequences=True)))
model.add(Bidirectional(LSTM(128)))
model.add(Dense(256,activation='relu'))
model.add(Dense(256,activation='relu'))
model.add(Dense(4, activation='softmax'))
adam=keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
history=model.fit(train_data, label_train, epochs=30, validation_split=0.33, batch_size=128, verbose=1)
pred=model.predict(test_data)
Now, the problem is that my accuracy is much lower than the accuracy reported in the paper. Could you please help me find out whether I am making a mistake here?
Thanks!
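One thing worth ruling out is the evaluation step itself. `model.predict` returns one row of class probabilities per sequence, so both the predictions and the one-hot labels have to be reduced with argmax before they are compared. Below is a minimal NumPy sketch of that comparison. The arrays are made-up stand-ins, and `label_test` is an assumed name for the one-hot test labels (it does not appear in the question).

```python
import numpy as np

def accuracy_from_probabilities(pred, one_hot_labels):
    """Fraction of rows whose most probable class matches the one-hot label."""
    predicted_classes = np.argmax(pred, axis=1)
    true_classes = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted_classes == true_classes))

# Made-up stand-ins for model.predict(test_data) and the one-hot test labels.
pred = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.5, 0.2, 0.1],
                 [0.1, 0.2, 0.3, 0.4],
                 [0.6, 0.2, 0.1, 0.1]])
label_test = np.array([[1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0],
                       [1, 0, 0, 0]])

acc = accuracy_from_probabilities(pred, label_test)
print(acc)
```

It may also be worth checking how the data is split: `validation_split=0.33` takes the last 33% of `train_data` before any shuffling, so only about two thirds of the 1800 training instances are used for fitting, and the validation part can be unrepresentative if the data is ordered by class.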

Related

Using pretrained gensim Word2vec embedding along with data set in keras

Dear all, I have trained word2vec in gensim on Wikipedia data and saved it using the following program.
model = Word2Vec(LineSentence(inp), size=300, window=5, min_count=5, max_final_vocab=500000,
                 workers=multiprocessing.cpu_count())
model.save("outp1")
I want to use this model in Keras for multi-class text classification. What changes do I need to make in the following code?
model = Sequential()
model.add(Embedding(MAX_NB_WORDS, EMBEDDING_DIM, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.2))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
epochs = 5
batch_size = 64
history = model.fit(X_train, Y_train, epochs=epochs,
                    batch_size=batch_size, validation_split=0.1,
                    callbacks=[EarlyStopping(monitor='val_loss', patience=3, min_delta=0.0001)])
accr = model.evaluate(X_test,Y_test)
Actually I am new and trying to learn.
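A common way to do this is to copy the pretrained vectors into a matrix whose row i holds the vector of the word with id i, and to initialize the Embedding layer from that matrix. The sketch below shows only the matrix-building step with toy data. `build_embedding_matrix`, `toy_vectors` and `toy_word_index` are hypothetical names, and it assumes the word ids come from the same tokenizer that produced `X`.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, max_nb_words, embedding_dim):
    """Return a (max_nb_words, embedding_dim) matrix; row i holds the vector of word id i.

    word_index: dict mapping word -> integer id (for example from a tokenizer).
    vectors: any mapping that supports `word in vectors` and `vectors[word]`.
    Words without a pretrained vector, and ids >= max_nb_words, keep an all-zero row.
    """
    matrix = np.zeros((max_nb_words, embedding_dim), dtype="float32")
    for word, idx in word_index.items():
        if idx < max_nb_words and word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# Toy stand-ins for the gensim vectors and the tokenizer's word index.
toy_vectors = {"cat": np.array([1.0, 2.0, 3.0]), "dog": np.array([4.0, 5.0, 6.0])}
toy_word_index = {"cat": 1, "dog": 2, "unicorn": 3}

embedding_matrix = build_embedding_matrix(toy_word_index, toy_vectors, 4, 3)
print(embedding_matrix.shape)
```

With the real model, `vectors` would be the `wv` attribute of `Word2Vec.load("outp1")`, and EMBEDDING_DIM would have to be 300 to match `size=300`. The matrix can then be handed to the Embedding layer, for example through `embeddings_initializer=keras.initializers.Constant(embedding_matrix)` together with `trainable=False`; the exact argument depends on the Keras version, so treat this as an assumption to check against your installation.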

How to get 90%+ test accuracy on IMDB data?

I am trying to train a model on the IMDB data. I get the expected training accuracy of about 96%+, but I am not satisfied with the test accuracy. I would like to reach 90%+ accuracy on the test data. I have tried several classifiers, but each time I get 84% to 89% test accuracy. Below are some of the classifiers I have already tried. In most cases I did some parameter tuning, such as increasing the number of epochs or changing the optimizer. How can I increase the test accuracy to 90%+?
Classifiers I tried so far:
First:
model = Sequential()
model.add(Embedding(vocab_size, 32, input_length = max_words))
model.add(Bidirectional(LSTM(32, return_sequences = True)))
model.add(GlobalMaxPool1D())
model.add(Dense(20, activation="relu"))
model.add(Dropout(0.05))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train,y_train,validation_data=(x_test, y_test),epochs=10,batch_size=100)
Second:
model = Sequential([
    Embedding(vocab_size, 32, input_length=max_words),
    Dropout(0.2),
    ZeroPadding1D(padding=1),
    Convolution1D(64, 5, activation='relu'),
    Dropout(0.2),
    MaxPooling1D(),
    Flatten(),
    Dense(100, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train,y_train,validation_data=(x_test, y_test),epochs=10,batch_size=100)
Judging by state-of-the-art results on the IMDB dataset, I don't think you can get above 90% with simple models like the ones you are using. However, you could try pretrained embeddings such as GloVe instead of training your own embedding. Also, I found that this repo has a BERT implementation in Keras with a demo of IMDB classification; it is reported to reach ~99% accuracy.
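To try the GloVe suggestion, the vectors first have to be read from GloVe's plain-text format, where each line is a word followed by its vector components separated by spaces. Here is a minimal sketch of that parsing step. `load_glove_vectors` is a hypothetical helper, and the in-memory sample stands in for a downloaded file (something like glove.6B.100d.txt, used here only as an example name).

```python
import io
import numpy as np

def load_glove_vectors(lines):
    """Parse GloVe's text format: one 'word v1 v2 ... vn' entry per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

# In-memory stand-in for an opened GloVe file; a real file would be
# passed as open(path, encoding="utf8") instead.
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n")
glove = load_glove_vectors(sample)
print(sorted(glove))
```

The resulting dictionary can be copied into an embedding matrix indexed by your tokenizer's word ids and used to initialize the Embedding layer, typically with the layer frozen for at least the first few epochs.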

How can I get intermediate parameters after a training batch in keras?

I have trained a Keras LSTM model, but after training all I get are the final parameters of the model, after 10 epochs with a batch size of 120. How can I get the intermediate parameters after each batch in Keras?
Example: after the 120 samples of each batch, I would like to get the intermediate parameters at that step.
I have tried the callback mechanism and the Keras backend, but I do not know how to get the intermediate parameters.
```python
model = Sequential()
model.add(Embedding(max_features, 32))
#model.add(LSTM(32, return_sequences=True, input_shape=(1,texts.shape[0])))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
history_ltsm = model.fit(texts_train, y_train, epochs=10, batch_size=120, validation_split=0.2)
```
I expected the model to run step by step, batch by batch, and show the intermediate parameters, rather than only the result after all epochs.
Thank you very much!
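One way to do this is a custom callback that reads `self.model.get_weights()` at the end of every training batch. The sketch below assumes TensorFlow's bundled Keras (`from tensorflow import keras`) and uses a tiny dense model on random data as a stand-in for the LSTM. `WeightSnapshots` is a hypothetical class name, not part of Keras.

```python
import numpy as np
from tensorflow import keras

class WeightSnapshots(keras.callbacks.Callback):
    """Stores a copy of all model weights after every training batch."""

    def __init__(self):
        super().__init__()
        self.snapshots = []

    def on_train_batch_end(self, batch, logs=None):
        # get_weights() returns a list of NumPy arrays, one per weight tensor.
        self.snapshots.append(self.model.get_weights())

# Tiny stand-in model and random data: 240 samples with batch_size=120
# give 2 batches per epoch.
rng = np.random.default_rng(0)
x = rng.random((240, 8)).astype("float32")
y = rng.integers(0, 2, size=(240, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

recorder = WeightSnapshots()
model.fit(x, y, epochs=1, batch_size=120, verbose=0, callbacks=[recorder])
print(len(recorder.snapshots))
```

The same callback can be passed to the `model.fit` call in the question through `callbacks=[recorder]`. Keeping every snapshot in memory gets expensive for larger models, so in practice you may prefer to store only selected layers or to save every n-th batch.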

Find Most Important Input from a Neural Network

I trained a neural network with 37 inputs. It has around 85% accuracy. Is it possible to find out which input has the most effect? I tried this code, but I cannot figure out how to find the most important input:
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]
One possible solution is to wrap your model with keras.wrappers.scikit_learn and then use Recursive Feature elimination in scikit-learn:
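import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.feature_selection import RFE

def create_model():
    # create model
    model = Sequential()
    model.add(Dense(512, activation='relu'))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(10, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# X, y and digits are assumed to be defined already; the reshape below only
# makes sense for image data such as scikit-learn's digits dataset.
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=128, verbose=0)
rfe = RFE(estimator=model, n_features_to_select=1, step=1)
rfe.fit(X, y)
ranking = rfe.ranking_.reshape(digits.images[0].shape)
# Plot pixel ranking
plt.matshow(ranking, cmap=plt.cm.Blues)
plt.colorbar()
plt.title("Ranking of pixels with RFE")
plt.show()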
If you need to visualize weights see here.
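A caveat on the RFE approach: as far as I know, scikit-learn's RFE expects the estimator to expose `coef_` or `feature_importances_`, which the Keras wrapper may not provide. A model-agnostic alternative (a different technique, not RFE) is permutation importance: shuffle one input column at a time and measure how much the accuracy drops. Below is a minimal NumPy sketch under that assumption. `permutation_importance` here is a hand-written hypothetical helper, and `toy_predict` stands in for a trained model.

```python
import numpy as np

def permutation_importance(predict_fn, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each input column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict_fn(X) == y)
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Break the link between this column and the target.
            X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
            drops.append(baseline - np.mean(predict_fn(X_shuffled) == y))
        importances[col] = np.mean(drops)
    return importances

# Toy data: the class depends only on input 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = (X[:, 2] > 0).astype(int)

def toy_predict(inputs):
    # Stand-in for a trained model that returns integer class labels.
    return (inputs[:, 2] > 0).astype(int)

scores = permutation_importance(toy_predict, X, y)
most_important = int(np.argmax(scores))
print(most_important)
```

For a Keras classifier with a softmax output, `predict_fn` could be something like `lambda a: model.predict(a).argmax(axis=1)`, with `y` given as integer class labels. Ideally the scores are computed on held-out data rather than on the training set.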

From SKLearn to Keras - What is the difference?

I'm trying to go from SKLearn to Keras in order to make specific improvements to my models.
However, I can't get the same performance I had with my SKLearn model:
mlp = MLPClassifier(
    solver='adam', activation='relu',
    beta_1=0.9, beta_2=0.999, learning_rate='constant',
    alpha=0, hidden_layer_sizes=(238,),
    max_iter=300
)
dev_score(mlp)
This gives a score of ~0.65 every time.
Here is my corresponding Keras code:
def build_model(alpha):
    level_moreargs = {'kernel_regularizer': l2(alpha), 'kernel_initializer': 'glorot_uniform'}
    model = Sequential()
    model.add(Dense(units=238, input_dim=X.shape[1], **level_moreargs))
    model.add(Activation('relu'))
    model.add(Dense(units=class_names.shape[0], **level_moreargs))  # output
    model.add(Activation('softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,  # like sklearn
                  optimizer=keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0),
                  metrics=['accuracy'])
    return model

k_dnn = KerasClassifier(build_fn=build_model, epochs=300, batch_size=200, validation_data=None, shuffle=True, alpha=0.5, verbose=0)
dev_score(k_dnn)
From looking at the documentation (and digging into the SKLearn code), the two should be exactly equivalent.
However, I get ~0.5 accuracy when I run this model, which is very bad.
And if I set alpha to 0, SKLearn's score barely changes (0.63), while Keras's score fluctuates randomly between 0.2 and 0.4.
What is the difference between these models? Why is Keras, which is supposed to be better than SKLearn, outperformed by such a margin here? What is my mistake?
Thanks,
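One difference I am aware of is the scaling of the L2 term, so the same `alpha` does not mean the same thing in both libraries. From my reading of scikit-learn's MLP implementation, its penalty is 0.5 * alpha * ||W||^2 divided by the number of samples in the batch, while `keras.regularizers.l2(alpha)` adds alpha * ||W||^2 with no such division. The sketch below only reproduces that arithmetic with toy weights. The helper names are hypothetical, and the scikit-learn formula is an assumption worth checking against your installed version.

```python
import math
import numpy as np

def sklearn_l2_penalty(weights, alpha, n_samples):
    """L2 term as (assumed to be) used by scikit-learn's MLP:
    0.5 * alpha * ||W||^2 / n_samples."""
    return 0.5 * alpha * sum(np.sum(w ** 2) for w in weights) / n_samples

def keras_l2_penalty(weights, l2_factor):
    """L2 term added by keras.regularizers.l2: l2_factor * ||W||^2."""
    return l2_factor * sum(np.sum(w ** 2) for w in weights)

def sklearn_alpha_to_keras_l2(alpha, n_samples):
    """Keras l2 factor that yields the same penalty as a given sklearn alpha."""
    return alpha / (2.0 * n_samples)

# Toy weight matrices with ||W||^2 = 6 + 2 = 8.
weights = [np.ones((3, 2)), np.ones((2, 1))]
alpha, batch_size = 0.5, 200

sk = sklearn_l2_penalty(weights, alpha, batch_size)
ke_same_alpha = keras_l2_penalty(weights, alpha)
ke_matched = keras_l2_penalty(weights, sklearn_alpha_to_keras_l2(alpha, batch_size))
print(sk, ke_same_alpha, ke_matched)
```

Under this assumption, `alpha=0.5` in Keras penalizes the weights several hundred times more strongly than `alpha=0.5` in scikit-learn with a batch size of 200, which could explain part of the gap. It does not explain the unstable results with alpha set to 0, so other differences (input scaling, stopping criteria, label encoding inside `dev_score`) are still worth checking.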
