I am currently working on my first LSTM model for time series forecasting. I managed to build, run, and tune the hyperparameters for the LSTM model in Keras, but now I want to give PyTorch a try too. I am really struggling to convert my model to the PyTorch framework. I also want to migrate from keras.tuner to Ray Tune for hyperparameter optimization.
This is my Keras model that I want to implement in PyTorch:
def build_model(hp):
    model = Sequential()
    model.add(LSTM(hp.Int('input_unit', min_value=32, max_value=512, step=32),
                   return_sequences=True,
                   input_shape=(train_X.shape[1], train_X.shape[2])))
    for i in range(hp.Int('n_layers', 2, 6)):
        model.add(LSTM(hp.Int(f'lstm_{i}_units', min_value=32, max_value=512, step=32),
                       return_sequences=True))
    model.add(LSTM(hp.Int('layer_2_neurons', min_value=32, max_value=512, step=32)))
    model.add(Dropout(hp.Float('Dropout_rate', min_value=0, max_value=0.5, step=0.1)))
    model.add(Dense(train_y.shape[1],
                    activation=hp.Choice('dense_activation', values=['relu', 'sigmoid'], default='relu')))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse'])
    return model
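This is the rough PyTorch translation I have been attempting (only a sketch, and one simplification: nn.LSTM with num_layers > 1 stacks layers that all share one hidden size, unlike my Keras model where each layer's width is tuned separately):
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_features, hidden_size, n_layers, dropout_rate, out_size):
        super().__init__()
        # batch_first=True matches the Keras (samples, timesteps, features) layout
        self.lstm = nn.LSTM(n_features, hidden_size,
                            num_layers=n_layers, batch_first=True)
        self.dropout = nn.Dropout(dropout_rate)
        self.head = nn.Linear(hidden_size, out_size)

    def forward(self, x):
        out, _ = self.lstm(x)   # out: (batch, seq_len, hidden_size)
        last = out[:, -1, :]    # keep only the last timestep, like return_sequences=False
        return self.head(self.dropout(last))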
And these are the hyperparameter configurations I've tuned with keras.tuner:
tuner = BayesianOptimization(
    build_model,
    objective='mse',
    max_trials=50,
    executions_per_trial=3,
    directory='BS-lag-features-3-8',
    seed=123
)
tuner.search(
    x=train_X,
    y=train_y,
    epochs=50,
    batch_size=128,
    validation_data=(test_X, test_y),
)
As I mentioned, I want to move the same model from Keras to PyTorch and run the same hyperparameter optimization with Ray Tune instead of keras.tuner.
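From what I understand of the Ray Tune docs, the search space would look something like this (a sketch only: the reporting API differs across Ray versions, the training loop is elided, and for Bayesian optimization I would presumably also pass a search algorithm such as BayesOptSearch):
import torch
from ray import tune

# Mirrors the keras-tuner ranges above (parameter names are my own)
search_space = {
    "hidden_size": tune.qrandint(32, 512, 32),     # like hp.Int(min 32, max 512, step 32)
    "n_layers": tune.randint(2, 7),                # 2-6 inclusive, like hp.Int('n_layers', 2, 6)
    "dropout_rate": tune.quniform(0.0, 0.5, 0.1),  # like hp.Float(min 0, max 0.5, step 0.1)
}

def train_fn(config):
    # LSTMForecaster is the sketch from above
    model = LSTMForecaster(n_features=8, hidden_size=config["hidden_size"],
                           n_layers=config["n_layers"],
                           dropout_rate=config["dropout_rate"], out_size=1)
    # ... standard PyTorch training loop over train_X / train_y would go here ...
    # placeholder evaluation on dummy tensors shaped like my test set (4, 3, 8)
    with torch.no_grad():
        pred = model(torch.randn(4, 3, 8))
        val_mse = torch.mean((pred - torch.zeros(4, 1)) ** 2).item()
    tune.report(mse=val_mse)

analysis = tune.run(train_fn, config=search_space, num_samples=50,
                    metric="mse", mode="min")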
My dataframe contains 229 rows of weekly data, with 8 features (including lagged features) per timestamp. I want to forecast the next week's value based on the features of the last n weeks.
My X and y shapes are as follows:
train_X.shape --> (222, 3, 8)
train_y.shape --> (222, 1)
test_X.shape --> (4, 3, 8)
test_y.shape --> (4, 1)
As you can see from the above, the dataset is divided into 222 weeks for training and 4 weeks for testing, with a lookback period of 3 weeks and 1 target feature to predict.
Any help will be much appreciated! Thanks!
Related
I am implementing Federated Learning (FL) using a Keras LSTM. (For this question, the FL details are not necessary.)
Starting with the simple example where multiple models are trained at different clients: each client shares its model weights with the server, the server averages the weights (in this simple example), and the resulting global model is sent back to the clients. (Long story short.)
To keep things simple at this stage, I am using a single LSTM unit with input_shape=(1, 1).
Now, when I tried to get the weights of the Keras LSTM, I got a list of 3 arrays.
Weights[0] and Weights[1] contain floating-point values, whereas Weights[2] contains binary 0/1 values. Is my understanding correct that Weights[2] is the on/off gate associated with the tanh gate?
Is there any information about these weights?
n_steps = 1
n_features = 1  # the number of past values used as input
model1 = Sequential()
model1.add(LSTM(1, activation='relu', input_shape=(n_steps, n_features)))
model1.compile(loss='mae', optimizer='adamax')
Weights = model1.get_weights()
print(model1.summary())
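For reference, this is how I am inspecting the three arrays (per the Keras source, an LSTM layer's weights are [kernel, recurrent_kernel, bias], with the four gates i, f, c, o concatenated along the last axis):
W, U, b = model1.get_weights()
print(W.shape)  # (1, 4): input kernel; columns correspond to the i, f, c, o gates
print(U.shape)  # (1, 4): recurrent kernel (hidden state -> gates)
print(b.shape)  # (4,):  bias; with the default unit_forget_bias=True the
                # forget-gate slice starts at 1 and the rest at 0, which is
                # why an untrained bias can look like binary 0/1 values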
I’m trying to build a classification model for a production line. If I understand correctly, it’s possible to use a CNN to classify numerical data (and not only pictures).
My data is an array of 21 columns per line:
20 different measurements, with the last column being a type, which can be 0, 1, or 2
each line of the array uses a timestamp as its index
type 0 represents 80% of the production and needs no extra treatment
but types 1 and 2 need extra treatment after production (so I need to clearly identify them)
To recreate something a CNN can use, I created a dataset where each label's training data is an array made of the 20 lines preceding its position.
So each label has, as corresponding training data, a square 20x20 array of measurements (like a picture).
(The data has already been normalized using scikit-learn's ColumnTransformer.)
After reading about unbalanced datasets, I decided to include one type 0 each time I found a type 1 or 2. In the end my dataset size is 18,000 lines, with data shape (18206, 20, 20).
My learning model is pretty basic and looks like this:
train, test, train_label, test_label = train_test_split(X, y, test_size=0.3, shuffle=True)

## Call CNN model
sizePic = 20
model = Sequential()
model.add(Dense(sizePic * 3, input_shape=(sizePic, sizePic,), activation='relu'))
model.add(Dense(sizePic, activation='relu'))
model.add(Flatten())
model.add(Dense(3, activation='softmax'))

# Compile model
sgd = optimizers.SGD(lr=0.03)  # 'lr' is called 'learning_rate' in recent Keras versions
model.compile(loss='sparse_categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
self.logger.info(model.summary())

# Fit the model
model.fit(train, train_label, epochs=750, batch_size=200, verbose=1)

# Evaluate the model
self.learning_scores = model.evaluate(test, test_label, verbose=2)
self.logger.info("scores %r" % self.learning_scores)
At the end, the evaluation scores (loss, then accuracy) are:
scores [0.6088506683505354, 0.7341632843017578]
I have been changing parameters like batch_size and the learning rate, but with no big improvement. To my understanding, it's better to start this way than to add layers to the model; is this correct?
Any suggestions?
Thanks for your time.
You are not using any conv layers, only fully connected layers (and don't be afraid of adding some conv layers, because they have far fewer parameters than dense layers).
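For illustration, a minimal sketch of that suggestion (filter counts are arbitrary, and the 20x20 windows need an explicit channel axis, i.e. the data reshaped to (n_samples, 20, 20, 1)):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras import optimizers

sizePic = 20
model = Sequential()
# Treat each 20x20 window as a one-channel "image"
model.add(Conv2D(16, (3, 3), activation='relu',
                 input_shape=(sizePic, sizePic, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=optimizers.SGD(learning_rate=0.03),
              metrics=['accuracy'])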
I'm trying to build an LSTM architecture to predict the sickness rate (0%-100%). My input is an array of dimension 4760x10 (number of sick persons per town per age, number of consultations, ...). My output, y, is the sickness rate.
I'm new to machine learning, and I have tried several things like changing the optimizer, the number of nodes per layer, and the dropout value, but my model didn't converge (the lowest MSE was 616.245). I also tried to scale my data with MinMaxScaler. So could you give me some advice on changing the architecture, or on some data processing, to help the model converge?
Here is the LSTM model which gives me MSE = 616.245:
def build_modelz4():
    model = Sequential()
    model.add(LSTM(10, input_shape=(1, 10), return_sequences=True))
    model.add(LSTM(84, return_sequences=True))
    model.add(LSTM(84, return_sequences=False))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])
    model.summary()
    return model

lstmz4 = build_modelz4()
checkpointer = ModelCheckpoint(filepath="weightslstmz4.hdf5", verbose=1, save_best_only=True)
newsclstmhis = lstmz4.fit(trainX, trainY, epochs=1000, batch_size=221,
                          validation_data=(testX, testY), verbose=2,
                          shuffle=False, callbacks=[checkpointer])
Note that when I used an ANN model, it converged with MSE = 0.8, so the LSTM should be able to converge too.
Thank you in advance!
4,760 samples is a very small dataset for an LSTM. Also, this seems like a fairly simple regression problem: try simpler algorithms such as an SVM first. If you are set on deep learning, use a Sequential model with Dense layers instead (with a few more layers than this one); that should definitely give you better results.
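For example, a minimal sketch of such a Dense-only baseline (layer sizes are arbitrary; the input is the flat 10-feature vector with no timestep axis):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_dense_baseline():
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(10,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1, activation='linear'))  # sickness rate as a single value
    model.compile(loss='mean_squared_error', optimizer='adam',
                  metrics=['mean_squared_error'])
    return model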
I want to create an LSTM model to classify signals.
Let's say I have 1000 files of signals. Each file contains a matrix of shape (500, 5), meaning that in each file I have 5 features (columns) and 500 rows.
         0    1    2    3    4
0        5  5.3  2.3  4.2  2.2
...    ...  ...  ...  ...  ...
499   2500  1.2  7.4  6.7  8.6
For each file, there is one output, which is a boolean (True or False); its shape is (1,).
I created a dataset, data, with shape (1000, 5, 500), and the target vector has shape (1000, 1).
Then I split data (X_train, X_test, y_train, y_test).
Is it okay to give the matrix to the LSTM model like this? I get very poor performance. From what I have seen, people give only 1D or 2D data and then reshape it to give a 3D input to the LSTM layer.
The code with the LSTM looks like this:
input_shape = (X_train.shape[1], X_train.shape[2])  # (5, 500), i.e. timesteps and features

model = Sequential()
model.add(LSTM(20, return_sequences=True, input_shape=input_shape))  # input_shape was defined but never passed
model.add(LSTM(20))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
I changed the number of cells in the LSTM layers and the number of layers, but the score is basically the same (0.19). Is it normal to have such a bad score in my case? Is there a better way to go?
Thanks
By transforming your data into (samples, 5, 500) you are giving the LSTM 5 timesteps and 500 features. From your description, it seems you want to process all 500 rows, each with 5 measurements, to make a prediction. The LSTM input is (samples, timesteps, features), so if your rows represent timesteps at which 5 measurements are taken, you need to permute the last two dimensions and set input_shape=(500, 5) in the first LSTM layer.
Also, since your output is boolean, you will get more stable training by using activation='sigmoid' in your final Dense layer and training with loss='binary_crossentropy' for binary classification.
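A minimal sketch of that fix (assuming X_train is a NumPy array and the targets are 0/1 labels):
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Swap the last two axes: (samples, 5, 500) -> (samples, 500, 5),
# i.e. 500 timesteps with 5 features each
X_train = np.transpose(X_train, (0, 2, 1))

model = Sequential()
model.add(LSTM(20, return_sequences=True, input_shape=(500, 5)))
model.add(LSTM(20))
model.add(Dense(1, activation='sigmoid'))  # probability of the positive class
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])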
I'm trying to prepare a model that will predict the first two numbers from a given array of numbers. So, the input dataset is like this -
[1 2 3 5]
[4 8 5 9]
[10 2 3 15]
Output will be -
[1 2]
[4 8]
[10 2]
So, the RNN architectures are like the ones below (taken from here).
Then, the basic architecture I'm trying to achieve should be something close to this -
So, it should be a Many-to-Many network (resembling the fourth image).
Question - So, how can I create this type of model with Keras?
My Findings -
I tried something like this -
n_samples = 10000
inputs = np.random.randint(5, 10, (n_samples, 5))  # renamed from 'input', which shadows a builtin
output = inputs[..., 0:2]
rinp = inputs.reshape(n_samples, 1, 5)

model = Sequential()
model.add(LSTM(10, input_shape=(1, 5)))
model.add(Dense(2))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(rinp, output, epochs=1000, batch_size=500, verbose=1)
But as you can see, this is not even close. It is like an MLP: it does not utilize any time steps, because the input shape is (n_samples, 1, 5), so there is only one time step.
So, my implementation is wrong.
I've seen some examples of One-to-One, Many-to-One, and Many-to-Many models from here.
In the Many-to-Many example, the author used the following code snippet:
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

length = 5
seq = array([i / float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0, :, 0]:
    print('%.1f' % value)
As you can see from the X and y values, the described model is like the one below -
Which is not the one I'm trying to achieve.
Any example of the architecture I'm trying to implement would be greatly helpful.
It looks like you are trying to build a sequence-to-sequence (seq2seq) model, based on the drawing. There is a very nice tutorial online to get you started. Instead of predicting sentences, you can just predict fixed-length outputs of length 2. This architecture and its variants are often used for machine translation. Based on your data, I'm guessing you are trying to experiment with the problem of long-term dependencies in recurrent networks; otherwise seq2seq wouldn't make sense for any practical purpose here.
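As a concrete starting point, a minimal sketch of that idea using an encoder-decoder built from RepeatVector and TimeDistributed (layer sizes and epoch counts are arbitrary): the encoder reads the 5 input numbers as 5 timesteps, and the decoder emits 2 output steps.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

n_samples = 10000
X = np.random.randint(5, 10, (n_samples, 5)).astype('float32')
y = X[:, 0:2]

model = Sequential()
model.add(LSTM(32, input_shape=(5, 1)))     # encoder: reads all 5 timesteps
model.add(RepeatVector(2))                  # repeat the context for 2 output steps
model.add(LSTM(32, return_sequences=True))  # decoder
model.add(TimeDistributed(Dense(1)))        # one predicted number per output step
model.compile(loss='mean_squared_error', optimizer='adam')

model.fit(X.reshape(n_samples, 5, 1), y.reshape(n_samples, 2, 1),
          epochs=10, batch_size=500, verbose=1)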