Request for improvement suggestions on my CNN learning model? - python-3.x

I'm trying to build a classification model for a production line. If I understand correctly, it's possible to use a CNN to classify numerical data (and not only pictures).
My data is an array of 21 columns per line:
20 different measurements, and the last column is a type, which can be 0, 1, or 2.
Each line of the array uses a timestamp as its index.
Type 0 represents 80% of the production and does not need extra treatment,
but types 1 and 2 need extra treatment after production (so I need to identify them clearly).
To recreate something a CNN can use, I created a dataset where each label's learning data is an array made of the 20 lines preceding its position.
So each label corresponds to a square 20x20 array of measurements (like a picture).
(The data have already been normalized using scikit-learn's ColumnTransformer.)
After reading about unbalanced datasets, I decided to include only one type-0 row each time I found a type 1 or 2. In the end my dataset is about 18,000 samples, with data shape (18206, 20, 20).
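For illustration, here is a sketch of the kind of windowing I mean (the function and variable names are just for illustration, not my actual code):

import numpy as np

def build_windows(values, labels, window=20):
    # values: (n_rows, 20) array of measurements ordered by timestamp
    # labels: (n_rows,) type column with values 0/1/2
    X, y = [], []
    for i in range(window, len(values)):
        X.append(values[i - window:i])   # the 20 previous rows -> one 20x20 "picture"
        y.append(labels[i])
    return np.asarray(X), np.asarray(y)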
My learning model is pretty basic and looks like this:
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras import optimizers

train, test, train_label, test_label = train_test_split(X, y, test_size=0.3, shuffle=True)

# Build the model
sizePic = 20
model = Sequential()
model.add(Dense(sizePic * 3, input_shape=(sizePic, sizePic), activation='relu'))
model.add(Dense(sizePic, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

# Compile the model
sgd = optimizers.SGD(lr=0.03)
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
self.logger.info(model.summary())

# Fit the model
model.fit(train, train_label, epochs=750, batch_size=200, verbose=1)

# Evaluate the model
self.learning_scores = model.evaluate(test, test_label, verbose=2)
self.logger.info("scores %r" % self.learning_scores)
At the end, the evaluation scores are:
scores [0.6088506683505354, 0.7341632843017578]
i.e. a loss of about 0.61 and an accuracy of about 0.73 on the test set.
I have been changing parameters like batch_size and the learning rate, but with no big improvement. To my understanding, it's better to start this way than to add layers to the model; is this correct?
Any suggestions?
Thanks for your time.

You are not using any convolutional layer, only fully connected (Dense) layers. Don't be afraid of adding some conv layers: they have far fewer parameters than dense layers.
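For illustration, a minimal sketch of what a small convolutional stack on the 20x20 "pictures" could look like (the filter counts and layer sizes are arbitrary, untuned assumptions; the input would also need an extra channel axis, i.e. shape (n_samples, 20, 20, 1)):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# treat each 20x20 window of measurements as a single-channel image
model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(20, 20, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])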

Related

How can we use 2D mean squared error (MSE) for a time series forecasting model to predict two variables

Hi!
I am working on predicting two variables using a time series forecasting autoencoder model.
The dataset contains bounding-box coordinates in a video, i.e. x and y.
So let's say we have 6 people in the video; that means we will have 12 input variables to the model for each frame.
The LSTM model does a good job predicting just one variable. However, I want to predict two variables (x and y) using an autoencoder model.
I am using Keras for this work. This is the model that I use:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(LSTM(64, activation='relu', input_shape=(trainX.shape[1], trainX.shape[2]), return_sequences=True))
model.add(LSTM(32, activation='relu', return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(trainY.shape[2]))
cp1 = ModelCheckpoint('model1/', save_best_only=True)
model.compile(loss='mse', optimizer=Adam(learning_rate=0.0001), metrics=['mae', 'mape'])
model.summary()
I want to predict two variables (x and y) with this autoencoder-style model.
One of the suggestions was to use a two-dimensional MSE that involves both the x and y values, but I am not sure how to do that.
I would really appreciate any suggestion or comment that could help.
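One hedged reading of the "two-dimensional MSE" suggestion is a custom loss that averages the squared error jointly over the x and y components (assuming they sit in the last dimension of the targets); a minimal sketch:

import tensorflow as tf

# hypothetical custom loss: squared error averaged jointly over the x and y components
def xy_mse(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

# usage with the model defined above
model.compile(loss=xy_mse, optimizer='adam', metrics=['mae'])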

Question about understanding Weights of Keras LSTM model

I am implementing Federated Learning (FL) using a Keras LSTM. (For this question, the FL details are not necessary.)
Starting with the simple example where multiple models are trained at different clients: each client shares its model weights with the server and (in this simple example) the server averages the weights and sends a global model back to the clients. (Keeping a long story short.)
To keep things simple at this stage, I am using a single LSTM unit with input_shape = (1, 1).
Now, when I get the weights of the Keras LSTM, it is a list of 3 arrays.
Weights[0] and Weights[1] contain floating point values, whereas Weights[2] contains binary 0/1 values. Is my understanding correct that Weights[2] is the on/off gate associated with the tanh gate?
Is there any information about these weights?
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

n_steps = 1
n_features = 1  # number of past values used as input
model1 = Sequential()
model1.add(LSTM(1, activation='relu', input_shape=(n_steps, n_features)))
model1.compile(loss='mae', optimizer='adamax')
Weights = model1.get_weights()
print(model1.summary())
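As a hedged aside: for a Keras LSTM layer, get_weights() returns [kernel, recurrent_kernel, bias], and with the default unit_forget_bias=True the bias is initialized to 1s for the forget-gate slice and 0s elsewhere, which would explain the 0/1 pattern in a freshly built model. A quick sketch to inspect the shapes:

kernel, recurrent_kernel, bias = model1.get_weights()
print(kernel.shape)            # (n_features, 4 * units) -> (1, 4)
print(recurrent_kernel.shape)  # (units, 4 * units)      -> (1, 4)
print(bias.shape)              # (4 * units,)             -> (4,)
# the 4 slices along the last axis correspond to the input, forget, cell and output gates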

Which LSTM architecture should I use for my data, and what data processing should I do?

I'm trying to build an LSTM architecture to predict a sickness rate (0%-100%). My input is an array with dimensions 4760x10 (number of sick persons per town per age, number of consultations, ...). My output, y, is the sickness rate.
I'm new to machine learning. I tried several things like changing the optimizer, the number of nodes per layer, and the dropout value, and my model didn't converge (the lowest MSE was 616.245). I also tried to scale my data with MinMaxScaler. Could you give me some advice on changing the architecture, or some data processing that would help the model converge?
Here is the LSTM model which gives me MSE = 616.245:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

def build_modelz4():
    model = Sequential()
    model.add(LSTM(10, input_shape=(1, 10), return_sequences=True))
    model.add(LSTM(84, return_sequences=True))
    model.add(LSTM(84, return_sequences=False))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
    model.summary()
    return model

lstmz4 = build_modelz4()
checkpointer = ModelCheckpoint(filepath="weightslstmz4.hdf5", verbose=1, save_best_only=True)
newsclstmhis = lstmz4.fit(trainX, trainY, epochs=1000, batch_size=221,
                          validation_data=(testX, testY), verbose=2, shuffle=False,
                          callbacks=[checkpointer])
Note that when I used an ANN model it converged with MSE = 0.8, so with an LSTM it should be able to converge too.
Thank you in advance.
4760 samples is a very small dataset for an LSTM. Plus, this seems like a fairly simple regression problem: try simpler algorithms like an SVM first. If you are set on deep learning, use a Sequential model with Dense layers instead, with a few more layers than this one; that should give you better results.
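As a rough sketch of such a Dense-only baseline (the layer widths are arbitrary assumptions; the input shape assumes the 10 features per row of your 4760x10 array):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(10,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))   # single output: the sickness rate
model.compile(loss='mean_squared_error', optimizer='adam')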

Predicting a Time Series data with LSTM in Keras

I'm trying to prepare a model that will predict the first two numbers from a given array of numbers. So, the input dataset is like this -
[1 2 3 5]
[4 8 5 9]
[10 2 3 15]
Output will be -
[1 2]
[4 8]
[10 2]
So, the RNN architectures are like the ones below (taken from here).
Then, the basic architecture I'm trying to achieve should be something close to this -
So, it should be a Many-to-Many network (resembling the fourth image).
Question - So, how can I create this type of model with Keras?
My findings -
I tried something like this -
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_samples = 10000
input = np.random.randint(5, 10, (n_samples, 5))
output = input[..., 0:2]
rinp = input.reshape(n_samples, 1, 5)

model = Sequential()
model.add(LSTM(10, input_shape=(1, 5)))
model.add(Dense(2))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(rinp, output, epochs=1000, batch_size=500, verbose=1)
But as you can see, this is not even close. It is basically an MLP and does not utilize any time steps, because the input shape is (n_samples, 1, 5), so there is only one time step.
So my implementation is wrong.
I've seen some One-to-One, Many-to-One and Many-to-Many examples from here.
In the Many-to-Many example, the author used the following code snippet.
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

length = 5
seq = array([i / float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0, :, 0]:
    print('%.1f' % value)
As you can see from the X and y values, the described model is like the one below,
which is not the one I'm trying to achieve.
Any example of the architecture I'm trying to implement would be greatly helpful.
It looks like you are trying to build a Sequence-to-Sequence (seq2seq) model, based on the drawing. There is a very nice tutorial online to get you started. Instead of predicting sentences, you can just predict fixed-length outputs of length 2. This architecture and its variants are often used for machine translation. Based on your data, I'm guessing you are trying to experiment with the problem of long-term dependencies in recurrent networks; otherwise it wouldn't make much sense to use seq2seq for any practical purpose.
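As a hedged sketch (the sizes are assumptions, and this is a simplified RepeatVector-style encoder-decoder rather than the full seq2seq tutorial code), something along these lines could work for predicting the first two values of each sequence:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

n_samples, seq_len, out_len = 10000, 5, 2
X = np.random.randint(5, 10, (n_samples, seq_len, 1)).astype('float32')
y = X[:, :out_len, :]  # target: the first two elements of each sequence

model = Sequential()
model.add(LSTM(32, input_shape=(seq_len, 1)))   # encoder: summarize the input sequence
model.add(RepeatVector(out_len))                # repeat the summary for each output step
model.add(LSTM(32, return_sequences=True))      # decoder
model.add(TimeDistributed(Dense(1)))            # one predicted value per output step
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=10, batch_size=500, verbose=1)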

Accuracy goes to 0.0000 when training RNN with Keras?

I'm trying to use custom word-embeddings from Spacy for training a sequence -> label RNN query classifier. Here's my code:
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

word_vector_length = 300
dictionary_size = v.num_tokens + 1
word_vectors = v.get_word_vector_dictionary()
embedding_weights = np.zeros((dictionary_size, word_vector_length))
max_length = 186
for word, index in dictionary._get_raw_id_to_token().items():
    if word in word_vectors:
        embedding_weights[index, :] = word_vectors[word]

model = Sequential()
model.add(Embedding(input_dim=dictionary_size, output_dim=word_vector_length,
                    input_length=max_length, mask_zero=True, weights=[embedding_weights]))
model.add(Bidirectional(LSTM(128, activation='relu', return_sequences=False)))
model.add(Dense(v.num_labels, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=200, nb_epoch=20)
Here the word_vectors are taken from spacy.vectors and have length 300; the input is a NumPy array that looks like [0,0,12,15,0...] of length 186, where the integers are the token ids of the input, and I've constructed the embedding weight matrix accordingly. The output label is [0,0,1,0,...0] of length 26 for each training sample, indicating which label goes with this piece of vectorized text.
This looks like it should work, but during the first epoch the training accuracy continually decreases... and by the end of the first epoch, and for the rest of training, it is exactly 0, and I'm not sure why. I've trained plenty of models with Keras/TF before and never encountered this issue.
Any idea what might be happening here?
Are the labels always one-hot? Meaning only one element of the label vector is one and the rest are zero.
If so, then maybe try using a softmax activation with a categorical cross-entropy loss, as in the following official example:
https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py#L202
This will help constrain the network to output a probability distribution in the last layer (i.e. the softmax layer's outputs sum to 1).
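A minimal sketch of that change applied to the last layers of your model (assuming the 26-way labels are indeed one-hot):

model.add(Dense(v.num_labels, activation='softmax'))    # instead of sigmoid
model.compile(loss='categorical_crossentropy',          # instead of binary_crossentropy
              optimizer='adam',
              metrics=['accuracy'])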
