Optimizing two estimators (dependent on each other) using Sklearn Grid Search - scikit-learn

The flow of my program is in two stages.
I am using scikit-learn's ExtraTreesClassifier together with the SelectFromModel method to select the most important features. Note that ExtraTreesClassifier takes many parameters (n_estimators, etc.), and different values of n_estimators lead to different sets of important features being selected via SelectFromModel. This means I can optimize n_estimators to get the best features.
In the second stage, I am training my Keras neural network model on the features selected in the first stage. I use AUROC as the score for the grid search, but this AUROC is calculated with the Keras-based neural network. I want to use grid search over n_estimators of my ExtraTreesClassifier to optimize the AUROC of the Keras neural network. I know I have to use a Pipeline, but I am confused about implementing both together and don't know where to put the Pipeline in my code. I am getting an error which says: TypeError: estimator should be an estimator implementing 'fit' method, <function fs at 0x0000023A12974598> was passed
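(For reference, GridSearchCV needs an estimator object that implements fit and exposes its parameters, not a plain function, which is what the TypeError above complains about. A rough sketch of the scikit-learn-native pattern, where SelectFromModel wraps the ExtraTreesClassifier as a pipeline step and make_keras_model is only a placeholder build function, not code from this post, would be:)
from sklearn.pipeline import Pipeline
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

# SelectFromModel used as a pipeline step (no prefit) re-fits the inner
# ExtraTreesClassifier for every grid-search candidate.
pipe = Pipeline([
    ('select', SelectFromModel(ExtraTreesClassifier())),
    ('nn', KerasClassifier(build_fn=make_keras_model, epochs=20, batch_size=400, verbose=0)),  # make_keras_model: placeholder
])

# Nested parameter path: <step>__<param of SelectFromModel>__<param of the inner estimator>
param_grid = {'select__estimator__n_estimators': [10, 20, 30]}
grid = GridSearchCV(pipe, param_grid=param_grid, scoring='roc_auc', cv=3, n_jobs=1)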
#################################################################################
# I concatenate the CV set and the train set so that I may select the most
# important features in both CV and train together.
#################################################################################
frames11 = [train_x_upsampled, cross_val_x_upsampled]
train_cv_x = pd.concat(frames11)
frames22 = [train_y_upsampled, cross_val_y_upsampled]
train_cv_y = pd.concat(frames22)

def fs(n_estimators):
    m = ExtraTreesClassifier(n_estimators=n_estimators)
    m.fit(train_cv_x, train_cv_y)
    sel = SelectFromModel(m, prefit=True)
    ##################################################
    # The code below gets the names of the selected important features.
    ##################################################
    feature_idx = sel.get_support()
    feature_name = train_cv_x.columns[feature_idx]
    feature_name = pd.DataFrame(feature_name)
    X_new = sel.transform(train_cv_x)
    X_new = pd.DataFrame(X_new)
    ######################################################################
    # Now the selected important features are in the data frame X_new. In the
    # code below, I am again dividing the data into train and CV, but this time
    # only with the selected important features.
    ######################################################################
    train_selected_x = X_new.iloc[0:train_x_upsampled.shape[0], :]
    cv_selected_x = X_new.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0] + cross_val_x_upsampled.shape[0], :]
    train_selected_y = train_cv_y.iloc[0:train_x_upsampled.shape[0], :]
    cv_selected_y = train_cv_y.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0] + cross_val_x_upsampled.shape[0], :]
    train_selected_x = train_selected_x.values
    cv_selected_x = cv_selected_x.values
    train_selected_y = train_selected_y.values
    cv_selected_y = cv_selected_y.values
##############################################################
# Now, with this new data which only contains the important features,
# I am training a neural network as below.
##############################################################
def create_model():
    n_x_new = train_selected_x.shape[1]
    model = Sequential()
    model.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='glorot_normal', activation='relu'))
    model.add(Dense(10, kernel_initializer='glorot_normal', activation='relu'))
    model.add(Dropout(0.8))
    model.add(Dense(1, kernel_initializer='glorot_normal', activation='sigmoid'))
    optimizer = keras.optimizers.Adam(lr=0.001)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

seed = 7
np.random.seed(seed)
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=400, verbose=0)
n_estimators=[10,20,30]
param_grid = dict(n_estimators=n_estimators)
grid = GridSearchCV(estimator=fs, param_grid=param_grid,scoring='roc_auc',cv = PredefinedSplit(test_fold=my_test_fold), n_jobs=1)
grid_result = grid.fit(np.concatenate((train_selected_x, cv_selected_x), axis=0), np.concatenate((train_selected_y, cv_selected_y), axis=0))

I created a pipeline using the Keras classifier and a function, but the function does not satisfy the requirements of a scikit-learn custom estimator. I am still not getting it right.
def feature_selection(n_estimators=10):
    m = ExtraTreesClassifier(n_estimators)
    m.fit(train_cv_x, train_cv_y)
    sel = SelectFromModel(m, prefit=True)
    print(" Getting feature names ")
    print(" ")
    feature_idx = sel.get_support()
    feature_name = train_cv_x.columns[feature_idx]
    feature_name = pd.DataFrame(feature_name)
    X_new = sel.transform(train_cv_x)
    X_new = pd.DataFrame(X_new)
    print(" adding names and important feature values ")
    print(" ")
    X_new.columns = feature_name
    print(" dividing the important features into train and test ")
    print(" ")
    # -----------Data splitting-------------
    train_selected_x = X_new.iloc[0:train_x_upsampled.shape[0], :]
    cv_selected_x = X_new.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0] + cross_val_x_upsampled.shape[0], :]
    train_selected_y = train_cv_y.iloc[0:train_x_upsampled.shape[0], :]
    cv_selected_y = train_cv_y.iloc[train_x_upsampled.shape[0]:train_x_upsampled.shape[0] + cross_val_x_upsampled.shape[0], :]
    ##################################################
    print(" Converting the selected important features on train and test into numpy arrays to be suitable for the NN model ")
    print(" ")
    train_selected_x = train_selected_x.values
    cv_selected_x = cv_selected_x.values
    train_selected_y = train_selected_y.values
    cv_selected_y = cv_selected_y.values
    print(" Now test fold ")
    my_test_fold = []
    for i in range(len(train_selected_x)):
        my_test_fold.append(-1)
    for i in range(len(cv_selected_x)):
        my_test_fold.append(0)
    print(" Now after test fold ")
    return my_test_fold, train_selected_x, cv_selected_x, train_selected_y, cv_selected_y
def create_model():
    n_x_new = X_new.shape[1]
    np.random.seed(6000)
    model_new = Sequential()
    model_new.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='he_normal', activation='sigmoid'))
    model_new.add(Dense(10, kernel_initializer='he_normal', activation='sigmoid'))
    model_new.add(Dropout(0.3))
    model_new.add(Dense(1, kernel_initializer='he_normal', activation='sigmoid'))
    model_new.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_crossentropy'])
    return model_new
pipeline = pipeline.Pipeline(steps=[('featureselection', custom_classifier()), ('nn', KerasClassifier(build_fn=model, nb_epoch=10, batch_size=1000, verbose=0))])
n_estimators=[10,20,30,40]
param_grid = dict(n_estimators=n_estimators)
grid = GridSearchCV(estimator=pipeline, param_grid=param_grid,scoring='roc_auc',cv = PredefinedSplit(test_fold=my_test_fold), n_jobs=1)
grid_result = grid.fit(np.concatenate((train_selected_x, cv_selected_x), axis=0), np.concatenate((train_selected_y, cv_selected_y), axis=0))

This is how I built my own custom transformer.
class fs(TransformerMixin, BaseEstimator):
    def __init__(self, n_estimators=10):
        self.ss = None
        self.n_estimators = n_estimators
        self.x_new = None

    def fit(self, X, y):
        # use the tunable hyperparameter instead of a hard-coded value
        m = ExtraTreesClassifier(n_estimators=self.n_estimators)
        m.fit(X, y)
        self.ss = SelectFromModel(m, prefit=True)
        return self

    def transform(self, X):
        self.x_new = self.ss.transform(X)
        print(np.shape(self.x_new))
        return self.x_new
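(With this transformer in place, a rough sketch of how the pieces could be wired together, reusing create_model, my_test_fold, and the concatenated arrays from above. Parameters of a pipeline step are addressed with the step name as a prefix, e.g. featureselection__n_estimators.)
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from keras.wrappers.scikit_learn import KerasClassifier

# 'fs' is the custom transformer above; 'create_model' builds the Keras network.
# create_model must be able to infer its input dimension from however many features fs keeps.
pipe = Pipeline(steps=[
    ('featureselection', fs(n_estimators=10)),
    ('nn', KerasClassifier(build_fn=create_model, epochs=10, batch_size=1000, verbose=0)),
])

# Step-prefixed parameter name: this is what lets GridSearchCV tune the first step.
param_grid = {'featureselection__n_estimators': [10, 20, 30, 40]}

grid = GridSearchCV(estimator=pipe, param_grid=param_grid, scoring='roc_auc',
                    cv=PredefinedSplit(test_fold=my_test_fold), n_jobs=1)
grid_result = grid.fit(np.concatenate((train_selected_x, cv_selected_x), axis=0),
                       np.concatenate((train_selected_y, cv_selected_y), axis=0))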

Related

Pytorch many-to-many time series LSTM always predicts the mean

I want to create an LSTM model using pytorch that takes multiple time series and creates predictions of all of them, a typical "many-to-many" LSTM network.
I am able to achieve what I want in Keras. I create a set of data with three variables which are simply linearly spaced with some Gaussian noise. Training the Keras model, I get a prediction 12 steps ahead that is reasonable.
When I try the same thing in PyTorch, the model always predicts the mean of the input data. This is confirmed by the loss during training: the model never seems to perform better than just predicting the mean.
TL;DR: How can I achieve the same thing in PyTorch as in the Keras example in the gist below?
Full working examples are available here: https://gist.github.com/jonlachmann/5cd68c9667a99e4f89edc0c307f94ddb
The Keras network is defined as
model = Sequential()
model.add(LSTM(100, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(100, activation='relu'))
model.add(Dense(n_features))
model.compile(optimizer='adam', loss='mse')
and the pytorch network is
# Define the pytorch model
class torchLSTM(torch.nn.Module):
    def __init__(self, n_features, seq_length):
        super(torchLSTM, self).__init__()
        self.n_features = n_features
        self.seq_len = seq_length
        self.n_hidden = 100  # number of hidden states
        self.n_layers = 1    # number of LSTM layers (stacked)
        self.l_lstm = torch.nn.LSTM(input_size=n_features,
                                    hidden_size=self.n_hidden,
                                    num_layers=self.n_layers,
                                    batch_first=True)
        # according to pytorch docs LSTM output is
        # (batch_size, seq_len, num_directions * hidden_size)
        # when considering batch_first = True
        self.l_linear = torch.nn.Linear(self.n_hidden * self.seq_len, 3)

    def init_hidden(self, batch_size):
        # even with batch_first = True this remains same as docs
        hidden_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        cell_state = torch.zeros(self.n_layers, batch_size, self.n_hidden)
        self.hidden = (hidden_state, cell_state)

    def forward(self, x):
        batch_size, seq_len, _ = x.size()
        lstm_out, self.hidden = self.l_lstm(x, self.hidden)
        # lstm_out (with batch_first = True) is
        # (batch_size, seq_len, num_directions * hidden_size)
        # for the following linear layer we want to keep the batch_size dimension and merge the rest
        # .contiguous() -> solves tensor compatibility error
        x = lstm_out.contiguous().view(batch_size, -1)
        return self.l_linear(x)
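(For context, a minimal training-loop sketch for this module, on dummy data rather than the gist's, which resets the hidden state before every batch:)
import torch

# Dummy data with assumed shapes: 3 series, windows of 12 steps, 3 targets per window.
X = torch.randn(256, 12, 3)
Y = torch.randn(256, 3)
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(X, Y), batch_size=32)

model = torchLSTM(n_features=3, seq_length=12)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(50):
    for xb, yb in loader:
        model.init_hidden(xb.size(0))   # reset hidden/cell state for each batch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()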

How could we use Bahdanau attention in a stacked LSTM model?

I aim to use attention in a stacked LSTM model, but I don't know how to add Keras's AdditiveAttention mechanism between the encoder and decoder layers. Say we have an input layer, an encoder, a decoder, and a dense classification layer, and we want the decoder to pay attention to all the hidden states of the encoder (h = [h1, ..., hT]) when deriving its outputs. Is there any high-level way to do this in Keras? For example,
input_layer = Input(shape=(T, f))
x = input_layer
x = LSTM(num_neurons1, return_sequences=True)(x)
# Adding attention here, but I don't know how?
x = LSTM(num_neurons2)(x)
output_layer = Dense(1, 'sigmoid')(x)
model = Model(input_layer, output_layer)
...
I think it is wrong to use x = AdditiveAttention(x, x). Am I right?
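(One possible wiring of the built-in layer, as a sketch with assumed dimensions and the same number of units in both LSTMs so that the attention query and key sizes match; the layer is called on a [query, value] list rather than as AdditiveAttention(x, x):)
from tensorflow.keras.layers import (Input, LSTM, AdditiveAttention,
                                     Concatenate, GlobalAveragePooling1D, Dense)
from tensorflow.keras.models import Model

T, f = 20, 8       # assumed sequence length and feature count
num_neurons = 64   # same size for encoder and decoder so query/key dims match

input_layer = Input(shape=(T, f))
encoder_seq = LSTM(num_neurons, return_sequences=True)(input_layer)   # h = [h1, ..., hT]
decoder_seq = LSTM(num_neurons, return_sequences=True)(encoder_seq)   # decoder states (queries)

# Bahdanau-style additive attention: the decoder states attend over the encoder states.
context = AdditiveAttention()([decoder_seq, encoder_seq])

x = Concatenate()([decoder_seq, context])   # combine decoder output with its context
x = GlobalAveragePooling1D()(x)             # collapse the time dimension for classification
output_layer = Dense(1, activation='sigmoid')(x)
model = Model(input_layer, output_layer)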
Maybe this is helpful for your issue?
This is a character-level classification model with LSTM and attention.
First, create a custom layer for attention:
class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self, x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x, self.W) + self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to TensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
LEN_CHA = 64        # number of characters
LEN_INPUT = 110     # depends on the longest sentence, padded with zeros
EMBEDDING_DIM = 32  # embedding size (not specified in the original; example value)

def LSTM_model_attention(Labels=3):
    model = Sequential()
    model.add(Embedding(LEN_CHA, EMBEDDING_DIM, input_length=LEN_INPUT))
    model.add(SpatialDropout1D(0.7))
    model.add(Bidirectional(LSTM(256, return_sequences=True)))
    model.add(attention())
    model.add(Dense(Labels, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
    return model

LSTM_attention = LSTM_model_attention()
LSTM_attention.summary()

with tensorflow and keras how to find the "category" of a given string

Hello ML/AI newbie here,
I'm asking this question because I have no idea about machine learning, AI, etc., and I don't know how to continue or what questions to ask. Even if I accidentally found the solution, I wouldn't know it.
OK, I followed this tutorial about "Text Classification" and it went pretty well, no problems up to here.
https://www.youtube.com/watch?v=6g4O5UOH304&list=WL&index=8&t=0s
It classifies IMDB comments and checks whether a review is "Positive" or "Negative", "0" or "1".
My question is: let's say I have my own dataset, similar to IMDB, but instead of "0" and "1" I have several categories as numbers like "1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ..." for each string. So I need it to return one of these numbers (or, since it's learning, say two of them if it can't decide).
What should I do?
A link to a tutorial related to what I need would be great too.
import tensorflow as tf
from tensorflow import keras
import numpy as np
data = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = data.load_data(num_words=3000)
word_index = data.get_word_index()
word_index = {k:(v+3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
train_data = keras.preprocessing.sequence.pad_sequences(train_data, value=word_index["<PAD>"], padding="post", maxlen=250)
test_data = keras.preprocessing.sequence.pad_sequences(test_data, value=word_index["<PAD>"], padding="post", maxlen=250)
def decode_review(text):
    return " ".join([reverse_word_index.get(i, "?") for i in text])
model = keras.Sequential()
model.add(keras.layers.Embedding(10000, 6))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation="relu"))
model.add(keras.layers.Dense(1, activation="sigmoid"))
#model.summary()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
x_val = train_data[:10000]
x_train = train_data[10000:]
y_val = train_labels[:10000]
y_train = train_labels[10000:]
fitModel = model.fit(x_train, y_train, epochs=40, batch_size=512, validation_data=(x_val, y_val), verbose=1)
results = model.evaluate(test_data, test_labels)
print(results)
for index in range(20):
    test_review = test_data[index]
    predict = model.predict([test_review])
    if predict[0] > 0.8:
        print(decode_review(test_data[index]))
        print(str(predict[0]))
        print(str(test_labels[index]))
Your task is a multiclass classification problem, and for this reason you have to modify your output layer. You have two possibilities.
If you have a 1D integer-encoded target, you can use sparse_categorical_crossentropy as the loss function, softmax as the last activation, and set the dimension of the last dense layer equal to the number of classes to predict:
X = np.random.randint(0, 10, (1000, 100))
y = np.random.randint(0, 3, 1000)

model = Sequential([
    Dense(128, input_dim=100),
    Dense(3, activation='softmax'),
])
model.summary()
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X, y, epochs=3)
Otherwise, if you have a one-hot encoded target, you can use categorical_crossentropy as the loss function, softmax as the last activation, and the dimension of the last dense layer equal to the number of classes to predict:
X = np.random.randint(0, 10, (1000, 100))
y = pd.get_dummies(np.random.randint(0, 3, 1000)).values

model = Sequential([
    Dense(128, input_dim=100),
    Dense(3, activation='softmax'),
])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X, y, epochs=3)
The usage of softmax lets you interpret the outputs as probability scores which sum to 1.
When you compute the final prediction, you can obtain the predicted class simply with np.argmax(model.predict(X), axis=1).
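For example, continuing the snippet above:
# Each row of probs sums to (approximately) 1; argmax over axis 1 gives the class id.
probs = model.predict(X[:5])             # shape (5, 3)
pred_classes = np.argmax(probs, axis=1)  # shape (5,), values in {0, 1, 2}
print(probs.sum(axis=1), pred_classes)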
these are some basic tutorials for multiclass text classification:
https://towardsdatascience.com/multi-class-text-classification-with-lstm-using-tensorflow-2-0-d88627c10a35
https://towardsdatascience.com/multi-class-text-classification-with-lstm-1590bee1bd17

TensorFlow 2.0 GradientTape with EarlyStopping

I am using Python 3.7.5 and TensorFlow 2.0's GradientTape API for classification of the MNIST dataset with a 300-100 dense, fully connected architecture. I would like to use TensorFlow's EarlyStopping with GradientTape() so that training stops according to the variable being watched/monitored and the patience parameter.
The code I have is below:
# Use tf.data to batch and shuffle the dataset
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).shuffle(100).batch(batch_size)
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(batch_size)
# Choose an optimizer and loss function for training-
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(lr = 0.001)
def create_nn_gradienttape():
    """
    Function to create neural network for use
    with GradientTape API following MNIST
    300 100 architecture
    """
    model = Sequential()
    model.add(
        Dense(
            units=300, activation='relu',
            kernel_initializer=tf.keras.initializers.GlorotNormal,
            input_shape=(784,)
        )
    )
    model.add(
        Dense(
            units=100, activation='relu',
            kernel_initializer=tf.keras.initializers.GlorotNormal
        )
    )
    model.add(
        Dense(
            units=10, activation='softmax'
        )
    )
    return model
# Instantiate the model to be trained using GradientTape-
model = create_nn_gradienttape()
# Select metrics to measure the error & accuracy of model.
# These metrics accumulate the values over epochs and then
# print the overall result-
train_loss = tf.keras.metrics.Mean(name = 'train_loss')
train_accuracy = tf.keras.metrics.BinaryAccuracy(name = 'train_accuracy')
test_loss = tf.keras.metrics.Mean(name = 'test_loss')
test_accuracy = tf.keras.metrics.BinaryAccuracy(name = 'test_accuracy')
# Use tf.GradientTape to train the model-
#tf.function
def train_step(data, labels):
    """
    Function to perform one step of Gradient
    Descent optimization
    """
    with tf.GradientTape() as tape:
        # 'training=True' is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        # predictions = model(data, training=True)
        predictions = model(data)
        loss = loss_fn(labels, predictions)

    # 'gradients' is a list variable!
    gradients = tape.gradient(loss, model.trainable_variables)

    # IMPORTANT:
    # Multiply mask with computed gradients-
    # List to hold element-wise multiplication between
    # computed gradient and masks-
    grad_mask_mul = []

    # Perform element-wise multiplication between computed gradients and masks-
    for grad_layer, mask in zip(gradients, mask_model_stripped.trainable_weights):
        grad_mask_mul.append(tf.math.multiply(grad_layer, mask))

    # optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    optimizer.apply_gradients(zip(grad_mask_mul, model.trainable_variables))

    train_loss(loss)
    train_accuracy(labels, predictions)
#tf.function
def test_step(data, labels):
    """
    Function to test model performance
    on testing dataset
    """
    # training=False is only needed if there are layers with different
    # behavior during training versus inference (e.g. Dropout).
    predictions = model(data)
    t_loss = loss_fn(labels, predictions)
    test_loss(t_loss)
    test_accuracy(labels, predictions)
EPOCHS = 15

for epoch in range(EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for x, y in train_ds:
        train_step(x, y)

    for x_t, y_t in test_ds:
        test_step(x_t, y_t)

    template = 'Epoch {0}, Loss: {1:.4f}, Accuracy: {2:.4f}, Test Loss: {3:.4f}, Test Accuracy: {4:4f}'
    print(template.format(epoch + 1,
                          train_loss.result(), train_accuracy.result() * 100,
                          test_loss.result(), test_accuracy.result() * 100))
# Count number of non-zero parameters in each layer and in total-
# print("layer-wise manner model, number of nonzero parameters in each layer are: \n")
model_sum_params = 0

for layer in model.trainable_weights:
    # print(tf.math.count_nonzero(layer, axis = None).numpy())
    model_sum_params += tf.math.count_nonzero(layer, axis=None).numpy()

print("Total number of trainable parameters = {0}\n".format(model_sum_params))
In the code above, how can I use tf.keras.callbacks.EarlyStopping with the GradientTape() API?
Thanks!
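(For reference, Keras callbacks such as EarlyStopping are driven by model.fit, so with a hand-written GradientTape loop the usual approach is to track the monitored value manually. A minimal sketch along those lines, reusing the train_step/test_step functions and metrics above, with a hypothetical patience of 3:)
patience = 3
best_loss = float('inf')
wait = 0

for epoch in range(EPOCHS):
    train_loss.reset_states()
    train_accuracy.reset_states()
    test_loss.reset_states()
    test_accuracy.reset_states()

    for x, y in train_ds:
        train_step(x, y)
    for x_t, y_t in test_ds:
        test_step(x_t, y_t)

    current = float(test_loss.result())
    if current < best_loss:
        best_loss = current
        wait = 0
        model.save_weights('best_weights.h5')      # keep the best model so far
    else:
        wait += 1
        if wait >= patience:
            print('Early stopping at epoch {0}'.format(epoch + 1))
            model.load_weights('best_weights.h5')  # restore the best weights
            break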

MNIST Data set up batch

I am at the step of training the model. However, when I apply this code from a tutorial: batch_x, batch_y = mnist.train.next_batch(50), it says there is no attribute 'train' in the TensorFlow module. I know it is outdated code, and I tried to convert it to the new version of TensorFlow; however, I couldn't find matching code that does the same thing as the line above. I bet there is a way, but I couldn't come up with a solution.
I found a method that asked me to use tf.data.Dataset.batch(batch_size).
I tried the following ways, but none of them works.
a. batch_x, batch_y = mnist.train.next_batch(50)
b. batch_x, batch_y = tf.data.Dataset.batch(batch_size)
c. batch_x, batch_y = tf.data.Dataset.batch(50)
d. batch_x, batch_y = mnist.batch(50)
with tf.Session() as sess:
    # First, run vars_initializer to initialize all variables
    sess.run(vars_initializer)

    for i in range(steps):
        # Each batch: 50 images
        batch_x, batch_y = mnist.train.next_batch(50)

        # Train the model
        # Dropout keep_prob (% to keep): 0.5 --> 50% will be dropped out
        sess.run(cnn_trainer, feed_dict={x: batch_x, y_true: batch_y, hold_prob: 0.5})

        # Test the model: at each 100th step
        # Run this block of code for each 100 times of training, each time run a batch
        if i % 100 == 0:
            print('ON STEP: {}'.format(i))
            print('ACCURACY: ')

            # Compare to find matches of y_pred and y_true
            matches = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))

            # Cast the matches from integers to tf.float32
            # Calculate the accuracy using the mean of matches
            acc = tf.reduce_mean(tf.cast(matches, tf.float32))

            # Test the model at each 100th step
            # Using test dataset
            # Dropout: NONE because of test, not training.
            test_accuracy = sess.run(acc, feed_dict={x: mnist.test.images, y_true: mnist.test.labels, hold_prob: 1.0})
            print(test_accuracy)
            print('\n')
You can use tf.keras.datasets.mnist.load_data. It returns a tuple of NumPy arrays: (x_train, y_train), (x_test, y_test).
After that, you need to create a dataset object using the Dataset API. This creates the training dataset; the test dataset can be created in the same fashion.
train, test = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices((train[0], train[1]))
Then, to create batches, you apply the batch function to it:
dataset = dataset.batch(1)
To output its contents or use it in training, you need to create an iterator. The code below creates the most common iterator and outputs an element of batch_size, in this case 1.
iterator = dataset.make_one_shot_iterator()
with tf.Session() as sess:
    print(sess.run(iterator.get_next()))
Please read https://www.tensorflow.org/guide/datasets
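(As a side note, on TensorFlow 2.x with eager execution the same batched dataset can be consumed without a session or iterator, roughly:)
# TF 2.x sketch: iterate the batched dataset directly in a Python loop.
for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)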
This uses TensorFlow 1.11.0 and Keras and is intended to show how to use batches. You will have to adapt it to your needs.
import tensorflow as tf
from tensorflow import keras as k
(x_train, y_train), (X_test, Y_test) = tf.keras.datasets.mnist.load_data()
X_train = x_train.reshape(x_train.shape[0], 28, 28,1)
y_train = tf.keras.utils.to_categorical(y_train,10)
X_test = X_test.reshape(X_test.shape[0], 28, 28,1)
Y_test = tf.keras.utils.to_categorical(Y_test,10)
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_dataset = train_dataset.batch(32)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, Y_test))
test_dataset = test_dataset.batch(32)
model = tf.keras.models.Sequential([
    tf.keras.layers.Convolution2D(32, (2, 2), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPool2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])
tbCallback = [
    k.callbacks.TensorBoard(
        log_dir="D:/TensorBoard", histogram_freq=1, write_graph=True, write_images=True
    )
]
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs = 10, steps_per_epoch = 30,validation_data=test_dataset,validation_steps=1, callbacks=tbCallback)
