Question about understanding the weights of a Keras LSTM model

I am implementing Federated Learning (FL) with a Keras LSTM. (The FL details are not necessary for this question.)
I am starting with a simple example in which multiple models are trained at different clients. Each client shares its model weights with the server, the server averages the weights, and the resulting global model is sent back to the clients. (Keeping a long story short.)
To keep things even simpler at this stage, I am using a single LSTM unit with input_shape = (1, 1).
Now, when I get the weights of the Keras LSTM, it is a list of 3 arrays.
Weights[0] and Weights[1] contain floating-point values, whereas Weights[2] contains binary 0/1 values. Is my understanding correct that Weights[2] is the on/off gate associated with the tanh gate?
Is there any documentation about these weights?
from keras.models import Sequential
from keras.layers import LSTM

n_steps = 1
n_features = 1  # number of past values used as input features

model1 = Sequential()
model1.add(LSTM(1, activation='relu', input_shape=(n_steps, n_features)))
model1.compile(loss='mae', optimizer='adamax')

Weights = model1.get_weights()
print(model1.summary())
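For reference, here is a minimal sketch (not from the original question) for inspecting the three arrays, assuming the standard Keras LSTM weight layout of kernel, recurrent kernel and bias with the four gates concatenated:

# Sketch: inspect the three arrays returned by get_weights() for the LSTM layer.
# In Keras these are, in order: kernel, recurrent_kernel and bias, each holding
# the four gates (input, forget, cell, output) concatenated along the last axis.
for name, w in zip(['kernel', 'recurrent_kernel', 'bias'], Weights):
    print(name, w.shape, w)
# Expected shapes for units=1 and n_features=1: (1, 4), (1, 4) and (4,).
# With the default unit_forget_bias=True, the bias starts as [0., 1., 0., 0.]
# (the forget-gate slice is initialized to 1), which is why it can look like
# binary 0/1 values before training.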

Related

Request for improvement suggestions on my CNN learning model?

I'm trying to build a classification model for a production line. If I understand correctly, it's possible to use a CNN to classify numerical data (and not only pictures).
My data is an array with 21 columns per line:
20 different measurements, and the last column is a type, which can be 0, 1 or 2.
Each line of the array uses a timestamp as its index.
Type 0 represents 80% of the production and does not need extra treatment,
but types 1 and 2 need extra treatment after production (so I need to clearly identify them).
To build something a CNN can use, I created a dataset where each label's learning data is an array made of the 20 lines preceding its position.
So each label has, as corresponding learning data, a square 20x20 array of measurements (like a picture).
(The data has already been normalized using scikit-learn's ColumnTransformer.)
After reading about unbalanced datasets, I decided to include one type-0 sample each time I found a type 1 or 2. In the end my dataset has about 18,000 lines, with data shape (18206, 20, 20).
My learning model is pretty basic and looks like this:
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras import optimizers

train, test, train_label, test_label = train_test_split(X, y, test_size=0.3, shuffle=True)

## Call CNN model
sizePic = 20
model = Sequential()
model.add(Dense(sizePic*3, input_shape=(sizePic, sizePic), activation='relu'))
model.add(Dense(sizePic, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

# Compile model
sgd = optimizers.SGD(lr=0.03)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
self.logger.info(model.summary())

# Fit the model
model.fit(train, train_label, epochs=750, batch_size=200, verbose=1)

# Evaluate the model
self.learning_scores = model.evaluate(test, test_label, verbose=2)
self.logger.info("scores %r" % self.learning_scores)
In the end the evaluation scores are:
scores [0.6088506683505354, 0.7341632843017578]
I have been changing parameters like batch_size and the learning rate, but with no big improvement. To my understanding, it's better to start this way rather than adding layers to the model, is this correct?
Any suggestions?
Thanks for your time.
You are not using any convolutional layers, only fully connected ones. Don't be afraid of adding some conv layers: they have far fewer parameters than dense layers.
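As a hedged illustration of that suggestion (not part of the original answer; it assumes the 20x20 windows are given a channel dimension via a Reshape layer and keeps the same loss and optimizer):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Reshape
from keras import optimizers

sizePic = 20
model = Sequential()
# Add a channel dimension so Conv2D can treat each 20x20 window like a 1-channel image
model.add(Reshape((sizePic, sizePic, 1), input_shape=(sizePic, sizePic)))
model.add(Conv2D(16, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizers.SGD(lr=0.03),
              metrics=['accuracy'])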

Keras functional API: Can training be prevented branch-wise on a branching model?

I'm training a somewhat complicated model in Keras that involves multiple branches. Because of the structure of the problem, it doesn't make sense to train these pieces separately, but I don't want to apply the loss function for the main branch of the network to another branch that merges with it (that branch has its own loss function and outputs). I was wondering if there's any way to accomplish this within the functional API.
Here's a toy model definition that highlights the basic problem I'm running into. The layers themselves don't matter; what's important is the structure of the model:
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Activation, Lambda, concatenate

# Utility for subtracting two tensors
def diff(two_tensors):
    x, y = two_tensors
    return x - y

# Two inputs
in_1 = Input((128,))
in_2 = Input((128,))

# Main branch: does something to the first input
branch_1 = Dense(128, activation='relu')(in_1)
branch_1 = Dense(128, activation='relu')(branch_1)

# Auxiliary classifier definition:
classifier = Sequential()
classifier.add(Dense(128, activation='relu'))
classifier.add(Dense(1, activation='linear'))

# This model asserts a confidence, and we'll use a sigmoid to actually classify.
# The classifier takes the second input and the result of the main branch.
pred_a = classifier(branch_1)
pred_b = classifier(in_2)
# Separate named sigmoid outputs so the loss dictionary below can refer to them
class_a = Activation('sigmoid', name='class_a')(pred_a)
class_b = Activation('sigmoid', name='class_b')(pred_b)

# We calculate the difference between these two confidences in a Lambda
pred_diff = Lambda(diff)([pred_a, pred_b])

# The main model uses this difference for another classification
branch_join = concatenate([pred_diff, branch_1], axis=-1)
main_output = Dense(20, activation='softmax', name='main_output')(branch_join)

# The model outputs the aux classifier's choices and the main prediction
model = Model(inputs=[in_1, in_2], outputs=[main_output, class_a, class_b])
losses = {'main_output': 'sparse_categorical_crossentropy',
          'class_a': 'binary_crossentropy',
          'class_b': 'binary_crossentropy'}
I want to train the classifier in the middle of this model only on its classification accuracy for its two inputs. However, since it's hooked up to the main branch, I think the loss from the main output will also propagate through those layers. In many similar cases (like simple GANs), the answer would be to train the classifier separately and freeze the classifier when training the end-to-end system. However, that won't work for my use case, and I'm wondering if I can stop the loss from main_output from backpropagating into the classifier model, while still training it on its own outputs.
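The thread records no answer, but one possible sketch (an assumption on my part, not the author's method) is to cut the gradient path where the classifier's output enters the main branch using keras.backend.stop_gradient, so the main loss cannot reach the classifier while class_a and class_b still train it:

import keras.backend as K
from keras.layers import Lambda, Dense, concatenate

# Hypothetical: detach the classifier's contribution before it enters the main
# branch, replacing the corresponding lines of the toy model above. Gradients
# from main_output stop here, while class_a / class_b still train the classifier.
pred_diff_detached = Lambda(lambda t: K.stop_gradient(t))(pred_diff)
branch_join = concatenate([pred_diff_detached, branch_1], axis=-1)
main_output = Dense(20, activation='softmax', name='main_output')(branch_join)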

Which LSTM architecture for my data, and what data processing should I do?

I'm trying to build an LSTM architecture to predict a sickness rate (0%-100%). My input is an array with dimensions 4760x10 (number of sick persons per town per age group, number of consultations, ...). My output, y, is the sickness rate.
I'm new to machine learning and I have tried several tips such as changing the optimizer, the number of nodes per layer, and the dropout value, but my model didn't converge (the lowest MSE was 616.245). I also tried to scale my data with MinMaxScaler. Could you give me some advice on changing the architecture or on data processing to help the model converge?
Here is the LSTM model which gives me MSE = 616.245:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.callbacks import ModelCheckpoint

def build_modelz4():
    model = Sequential()
    model.add(LSTM(10, input_shape=(1, 10), return_sequences=True))
    model.add(LSTM(84, return_sequences=True))
    model.add(LSTM(84, return_sequences=False))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
    model.summary()
    return model

lstmz4 = build_modelz4()
checkpointer = ModelCheckpoint(filepath="weightslstmz4.hdf5", verbose=1, save_best_only=True)
newsclstmhis = lstmz4.fit(trainX, trainY, epochs=1000, batch_size=221,
                          validation_data=(testX, testY), verbose=2,
                          shuffle=False, callbacks=[checkpointer])
Note that when I used a plain ANN model it converged with MSE = 0.8, so with an LSTM it should converge too.
Thank you in advance.
4,760 samples is very little data for an LSTM. This also seems like a fairly simple prediction problem, so try simpler algorithms such as an SVM first. If you are set on using deep learning, use a Sequential model with Dense layers instead, with a few more layers than this one; that should give you better results.
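A minimal sketch of the Dense-only baseline the answer suggests (an illustration only; the layer sizes are assumptions, and it feeds the 10 features directly without the extra time dimension):

from keras.models import Sequential
from keras.layers import Dense

# Hypothetical Dense-only baseline for the 4760x10 input described above
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])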

What exactly does tf.contrib.rnn.DropoutWrapper in TensorFlow do? (three critical questions)

As far as I know, DropoutWrapper is used as follows:
__init__(
    cell,
    input_keep_prob=1.0,
    output_keep_prob=1.0,
    state_keep_prob=1.0,
    variational_recurrent=False,
    input_size=None,
    dtype=None,
    seed=None
)
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)
The only thing I know is that it is used for dropout during training.
Here are my three questions:
1. What are input_keep_prob, output_keep_prob and state_keep_prob, respectively? (I guess they define the dropout probability of each part of the RNN, but where exactly?)
2. Is dropout in this context applied to the RNN not only during training but also during prediction? If so, is there any way to decide whether or not to use dropout at prediction time?
3. According to the API documentation on the TensorFlow web page, if variational_recurrent=True, dropout works according to the method in the paper "Y. Gal, Z. Ghahramani. 'A Theoretically Grounded Application of Dropout in Recurrent Neural Networks'. https://arxiv.org/abs/1512.05287". I understood this paper roughly. When I train an RNN, I use a batch rather than a single time series. In this case, does TensorFlow automatically assign a different dropout mask to each time series in the batch?
input_keep_prob is the keep (inclusion) probability for the dropout applied to the cell's inputs. output_keep_prob is the keep probability for the dropout applied to each RNN unit's output. state_keep_prob is for the hidden state that is fed to the next layer.
You can initialize each of the above-mentioned parameters as follows:
import tensorflow as tf

# Defaults to 1.0 (no dropout) unless a different keep probability is fed in
dropout_placeholder = tf.placeholder_with_default(tf.cast(1.0, tf.float32), shape=())

cell = tf.nn.rnn_cell.DropoutWrapper(tf.nn.rnn_cell.BasicRNNCell(n_hidden_rnn),
                                     input_keep_prob=dropout_placeholder,
                                     output_keep_prob=dropout_placeholder,
                                     state_keep_prob=dropout_placeholder)
This way the keep probability defaults to 1 during prediction, and you can feed any other value during training.
The masking is applied to the fitted weights rather than to the individual sequences in the batch; as far as I know, it is applied to the entire batch.
# Alternative: switch the keep probability with tf.cond on a boolean training flag
keep_prob = tf.cond(dropOut, lambda: tf.constant(0.9), lambda: tf.constant(1.0))
cells = rnn.DropoutWrapper(cells, output_keep_prob=keep_prob)
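For illustration (not part of the original answers), with the placeholder approach above you can feed a keep probability below 1.0 only at training time and rely on the default of 1.0 at prediction time; the names train_op, inputs, targets and outputs below stand in for whatever graph you have built:

# Hypothetical usage of the dropout_placeholder defined above
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training step: feed a keep probability below 1.0 to enable dropout
    sess.run(train_op, feed_dict={inputs: x_batch, targets: y_batch,
                                  dropout_placeholder: 0.5})
    # Prediction: leave the placeholder at its default of 1.0 (no dropout)
    preds = sess.run(outputs, feed_dict={inputs: x_test})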

Decimal accuracy of output layer in keras

I am working on a project that predicts drug synergy values from various input features. The synergy values are floating-point numbers, so I would like to set an accuracy range for my neural network.
For example, say the actual value is 1.342423 and my model predicts 1.30123; this should then be treated as a correct output.
In other words, I would like to limit the number of decimal places that are compared between the actual answer and the predicted answer.
Neural net:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

model = Sequential()
act = 'relu'
model.add(Dense(430, input_shape=(3,)))
model.add(Activation(act))
model.add(Dense(256))
model.add(Activation(act))
model.add(Dropout(0.42))
model.add(Dense(148))
model.add(Activation(act))
model.add(Dropout(0.3))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
Complete source code for training and the train/test data:
https://github.com/tanmay-edgelord/Drug-Synergy-Models/blob/master
Please ask for any additional details that are required.
(Using Keras with the TensorFlow backend.)
Create a custom metric:
import keras.backend as K

def myAccuracy(y_true, y_pred):
    diff = K.abs(y_true - y_pred)   # absolute difference between true and predicted values
    correct = K.less(diff, 0.05)    # boolean tensor: True where the difference is below the tolerance
    return K.mean(correct)          # proportion of "correct" predictions (K.mean casts the booleans)
Then use it in the model compilation:
model.compile(metrics=[myAccuracy], ...)
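Applied to the example from the question, |1.342423 - 1.30123| ≈ 0.0411, which is below the 0.05 tolerance, so that prediction counts as correct. A quick standalone sanity check of the metric (an illustration, not part of the original answer):

import numpy as np
import keras.backend as K

# Hypothetical sanity check of myAccuracy on the values from the question
y_true = K.constant(np.array([[1.342423]]))
y_pred = K.constant(np.array([[1.30123]]))
print(K.eval(myAccuracy(y_true, y_pred)))  # 1.0, since the difference is within 0.05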
