Need clarification of TimeDistributed(Dense()) with LSTM in a many-to-many scenario - Theano

I am new to RNNs and Keras.
I am trying to compare the performance of an LSTM against traditional machine learning algorithms (like RF or GBM) on sequential data (not necessarily a time series, but ordered). My data contains 276 predictors and one output (e.g. stock price, together with 276 pieces of information about the stock's firm) across 8564 retrospective observations. Since LSTMs are great at capturing sequential trends, I decided to use a time_step of 300. From the figure below, I believe my task is to build a many-to-many network (last figure from the left). (Pic: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
Each pink box has size 276 (number of predictors), and there are 300 (time_steps) such pink boxes in one batch. However, I am struggling to see how to design the blue boxes, since each blue box should be the output (stock price) of each instance. From other posts on the Keras GitHub forum (#2403 and #2654), I think I have to use TimeDistributed(Dense()), but I don't know how. This is my code to check whether it works (train_idv is the data to predict from and train_dv is the stock price):
train_idv.shape
#(8263, 300, 276)
train_dv.shape
#(8263, 300, 1)
batch_size = 1
time_Steps=300
model = Sequential()
model.add(LSTM(300,
               batch_input_shape=(batch_size, time_Steps, train_idv.shape[2]),
               stateful=True,
               return_sequences=True))
model.add(Dropout(0.3))
model.add(TimeDistributed(Dense(300)))
# Model Compilation
model.compile(loss='mean_squared_error',optimizer='adam',metrics=['accuracy'])
model.fit(train_idv, train_dv, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
Running the model.fit gives this error
Traceback (most recent call last):
File "", line 1, in
File "/home/user/.local/lib/python2.7/site-packages/keras/models.py", line 627, in fit
sample_weight=sample_weight)
File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1052, in fit
batch_size=batch_size)
File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 983, in _standardize_user_data
exception_prefix='model target')
File "/home/user/.local/lib/python2.7/site-packages/keras/engine/training.py", line 111, in standardize_input_data
str(array.shape))
Exception: Error when checking model target: expected timedistributed_4 to have shape (1, 300, 300) but got array with shape (8263, 300, 1)
Now, I have successfully run it with time_step=1 and just Dense(1) as the last layer. But I am not sure how I should shape my train_dv (the training output) or how to use TimeDistributed(Dense()) correctly. Finally, I want to use
trainPredict = model.predict(train_idv,batch_size=1)
to predict scores on any data.
I have posted this question on the Keras GitHub forum as well.

From your post I understand that you want each LSTM time step to predict a single scalar, correct? Then your TimeDistributed Dense layer should have output size 1, not 300 (i.e. TimeDistributed(Dense(1))).
Also, for your reference, there is an example in the Keras repo of using TimeDistributed(Dense).
In that example, you essentially train a multi-class classifier (with shared weights) for each timestep, where the possible classes are the possible digit characters:
# For each step of the output sequence, decide which character should be chosen
model.add(TimeDistributed(Dense(len(chars))))
The number of time steps is defined by the preceding recurrent layers.
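For the model in the question, a minimal sketch of the corrected setup could look like the following (assuming the same data, shapes, and the older Keras 1.x API used above, where nb_epoch is the epoch argument). The only substantive change is that the last layer now emits one value per timestep, so the output shape (batch_size, 300, 1) matches train_dv:
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, TimeDistributed

batch_size = 1
time_steps = 300
model = Sequential()
model.add(LSTM(300,
               batch_input_shape=(batch_size, time_steps, train_idv.shape[2]),
               stateful=True,
               return_sequences=True))
model.add(Dropout(0.3))
# one scalar (stock price) per timestep -> output shape (batch_size, 300, 1)
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_idv, train_dv, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
trainPredict = model.predict(train_idv, batch_size=batch_size)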

Related

Many-to-many RNN in Keras - predict output for every nth input

I'm trying to figure out how to build a model using LSTM/GRU that predicts many-to-many, but only for every nth (7 in my case) input. For example, my input data has one timestep per day for a whole year, but I'm only trying to predict the output at the end of each week, not each day.
The only information I was able to find is this answer:
Many to one and many to many LSTM examples in Keras
It says:
"Many-to-many when number of steps differ from input/output length: this is freaky hard in Keras. There are no easy code snippets to code that."
In PyTorch it seems like you can set ignore_index in the loss function, which I think should do the trick.
Is there a solution for keras?
I think I found the answer. Since I'm trying to predict every nth value, we can just keep the LSTM outputs at the timesteps we want to predict and discard the rest. I created a Lambda layer to do that; it simply takes every 7th value from the LSTM output.
This is the code:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Lambda, TimeDistributed, Dense

X = np.random.normal(0, 1, size=(100, 365, 5))  # 100 samples, 365 daily timesteps, 5 features
y = np.random.randint(2, size=(100, 52, 1))     # one binary target per week (52 weeks)
model = Sequential()
model.add(LSTM(1, input_shape=(365, 5), return_sequences=True))
# keep only every 7th timestep: indices 6, 13, ..., 363 -> 52 steps
model.add(Lambda(lambda x: x[:, 6::7, :]))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=3, verbose=1)

Keras Model - Functional API - adding layers to existing model

I am trying to learn how to use the Keras functional Model API to modify a trained model and fine-tune it on the go:
A very basic model:
from keras.models import Model
from keras.layers import Input, BatchNormalization, Flatten, Dense
from keras.optimizers import Adam

inputs = Input(x_train.shape[1:])
x = BatchNormalization(axis=1)(inputs)
x = Flatten()(x)
outputs = Dense(10, activation='softmax')(x)
model1 = Model(inputs, outputs)
model1.compile(optimizer=Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['categorical_accuracy'])
The architecture of it is
InputLayer -> BatchNormalization -> Flatten -> Dense
After I run some training batches on it, I want to add an extra Dense layer between the Flatten layer and the outputs:
x = Dense(32,activation='relu')(model1.layers[-2].output)
outputs = model1.layers[-1](x)
However, when I run it, I get this:
ValueError: Input 0 is incompatible with layer dense_1: expected axis -1 of input shape to have value 784 but got shape (None, 32)
Could someone please explain what is going on, and how (or whether) I can add layers to an already trained model?
Thank you
A Dense layer is made strictly for a certain input dimension. That dimension cannot be changed after you define it (it would need a different number of weights).
So, if you really want to add layers before a Dense layer that is already in use, you need to make sure that the output of the last new layer has the same shape as the Flatten layer's output. (The error says it needs 784, so your new last Dense layer needs 784 units.)
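A sketch of this first approach, reusing the names from the question (the 784 comes from the Flatten output that the error message mentions):
x = Dense(784, activation='relu')(model1.layers[-2].output)  # match the 784-dim Flatten output
outputs = model1.layers[-1](x)                                # reuse the already-trained Dense(10, softmax)
model2 = Model(model1.input, outputs)
model2.compile(optimizer=Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['categorical_accuracy'])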
Another approach
Since you're adding intermediate layers, it's pointless to keep the last layer: it was trained specifically for a certain input, and if you change that input, you need to train it again.
Well... since you need to train it again anyway, why keep it? Just create a new one suited to the shapes of your new preceding layers.
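And a sketch of this second approach: drop the old output layer and create a fresh one sized for the new 32-dimensional input (its weights start untrained, so it has to be trained again anyway):
x = Dense(32, activation='relu')(model1.layers[-2].output)   # new intermediate layer after Flatten
new_outputs = Dense(10, activation='softmax')(x)             # brand-new output layer
model2 = Model(model1.input, new_outputs)
model2.compile(optimizer=Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['categorical_accuracy'])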

Number of categories, Keras

I have a Keras model that classifies images into a number of categories. How do I print the number of categories my model classifies the images into?
You can always check the structure of the model using model.summary() in Keras, which prints information about every layer of your model in order.
In the last layer, which is the output layer, the output shape is (None, 10), which shows the classifier has 10 classes or categories. (Do not worry about the None; it just stands for the batch size.)
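If you prefer to read the number programmatically instead of from the printed summary, the last dimension of the model's output shape gives the same value (a small sketch, assuming a standard classifier whose final layer has one unit per class):
model.summary()                        # last layer shows an output shape like (None, 10)
num_classes = model.output_shape[-1]   # 10 for a 10-category classifier
print(num_classes)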

How to change batch size of an intermediate layer in Keras?

My problem is to take all of the hidden outputs from an LSTM and use them as training examples for a single Dense layer. Flattening the hidden outputs and feeding them to a Dense layer is not what I am looking to do. I have tried the following things:
I have considered the TimeDistributed wrapper for the Dense layer (https://keras.io/layers/wrappers/). But this seems to apply the same layer to every time slice, which is not what I want. In other words, the TimeDistributed wrapper takes a 3D input tensor (number of samples, number of timesteps, number of features) and produces another 3D tensor of the same kind: (number of samples, number of timesteps, number of features). Instead, what I want as output is a 2D tensor that looks like (number of samples * number of timesteps, number of features).
There was a pull request for an AdvancedReshapeLayer on GitHub: https://github.com/fchollet/keras/pull/36. This seems to be exactly what I am looking for. Unfortunately, it appears that the pull request was closed with no conclusive outcome.
I tried to build my own Lambda layer to accomplish what I want, as follows:
A). model.add(LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh')) #
B). model.add(Lambda(lambda x: x, output_shape=lambda x: (x[0]*x[1], x[2])))
C). model.add(Dense(NUM_CLASSES, input_dim=NUM_LSTM_UNITS))
model.output_shape after (A) prints (BATCH_SIZE, NUM_TIME_STEPS, NUM_LSTM_UNITS), and model.output_shape after (B) prints (BATCH_SIZE*NUM_OF_TIMESTEPS, NUM_LSTM_UNITS), which is exactly what I am trying to achieve.
Unfortunately, when I try to run step (C), I get the following error:
Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3
This is baffling since when I print model.output_shape after (B), I do indeed see (BATCH_SIZE*NUM_OF_TIMESTEPS, NUM_LSTM_UNITS), which is of ndim=2.
I'd really appreciate any help with this.
EDIT: When I try to use the functional API instead of a Sequential model, I still get the same error at step (C).
You can use the backend reshape, which includes the batch_size dimension:
from keras import backend

def backend_reshape(x):
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model.add(Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
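Putting it together with the layers from the question, a sketch might look like the following (NUM_TIME_STEPS and NUM_FEATURES are placeholder names for your own input dimensions; NUM_LSTM_UNITS and NUM_CLASSES are the constants from the question). Note that the targets must then also be reshaped to (batch_size * timesteps, NUM_CLASSES):
from keras import backend
from keras.models import Sequential
from keras.layers import LSTM, Lambda, Dense

def backend_reshape(x):
    # merge the batch and time dimensions: (batch, timesteps, units) -> (batch * timesteps, units)
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model = Sequential()
model.add(LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh',
               input_shape=(NUM_TIME_STEPS, NUM_FEATURES)))
model.add(Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
model.add(Dense(NUM_CLASSES, activation='softmax'))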

Out of Memory Error in Scikit-learn MultinomialNB

In order to run an NB classifier on about 400 MB of text data, I need to use a vectorizer.
vectorizer = TfidfVectorizer(min_df=2)
X_train = vectorizer.fit_transform(X_data)
But it is giving an out-of-memory error. I am using 64-bit Linux and a 64-bit Python version. How do people work through the vectorization process in scikit-learn for large text datasets?
Traceback (most recent call last):
File "ParseData.py", line 234, in <module>
main()
File "ParseData.py", line 211, in main
classifier = MultinomialNB().fit(X_train, y_train)
File "/home/pratibha/anaconda/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 313, in fit
Y = labelbin.fit_transform(y)
File "/home/pratibha/anaconda/lib/python2.7/site-packages/sklearn/base.py", line 408, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/home/pratibha/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 272, in transform
neg_label=self.neg_label)
File "/home/pratibha/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/label.py", line 394, in label_binarize
Y = np.zeros((len(y), len(classes)), dtype=np.int)
Edited (ogrisel): I changed the title from "Out of Memory Error in Scikit Vectorizer" to "Out of Memory Error in Scikit-learn MultinomialNB" to make it more descriptive of the actual problem.
Let me summarize the outcome of the discussion in the comments:
The label preprocessing machinery used internally in many scikit-learn classifiers does not scale well memory-wise with respect to the number of classes. This is a known issue and there is ongoing work to tackle it.
The MultinomialNB class itself will probably not be suitable for classification in a label space with cardinality 43K, even if the label-preprocessing limitation is fixed.
To address the large-cardinality classification problem, you could try to:
fit binary SGDClassifier(loss='log', penalty='elasticnet') instances independently on the columns of y_train (converted to numpy arrays), then call clf.sparsify(), and finally wrap those sparse models as a final one-vs-rest classifier (or rank the predictions of the binary classifiers by probability); see the sketch after these options. Depending on the value of the regularization parameter alpha, you might get sparse models that are small enough to fit in memory. You can also try to do the same with LogisticRegression, i.e. something like:
clf_label_i = LogisticRegression(penalty='l1').fit(X_train, y_train[:, label_i].toarray()).sparsify()
alternatively, do a PCA of the target labels y_train, then cast your classification problem as a multi-output regression problem in the reduced label-PCA space, and then decode the regressor's output by looking for the nearest class encoding in the label-PCA space.
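A rough sketch of the first option above (hypothetical: it assumes y_train is a sparse binary indicator matrix with one column per label, X_train is the TF-IDF matrix from the question, and alpha=1e-4 is just an example value):
from sklearn.linear_model import SGDClassifier

sparse_models = []
for label_i in range(y_train.shape[1]):
    clf = SGDClassifier(loss='log', penalty='elasticnet', alpha=1e-4)
    clf.fit(X_train, y_train[:, label_i].toarray().ravel())
    sparse_models.append(clf.sparsify())  # store coef_ as a sparse matrix to keep memory down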
You can also have a look at Block Coordinate Descent Algorithms for Large-scale Sparse Multiclass Classification, implemented in lightning, but I am not sure it is suitable for a label cardinality of 43K either.
