How to implement a multi-state LSTM RNN in Keras

I have 1000 distinct users, and the dataset consists of the activities of these users over the past year. There are over 300K records in total. The inputs to the LSTM RNN are the feature vectors corresponding to these users. The user is also included as an input, because behavior may vary from person to person. The network should learn the behavior of each user and be able to predict a user's next behavior based on that same user's past information.
How can I maintain separate hidden states for each user within an LSTM RNN?
The following blog post is similar to my problem:
https://towardsdatascience.com/multi-state-lstms-for-categorical-features-66cc974df1dc
Update
My dataset looks like this: [image: DATASET]
I transformed my dataset into a 3D NumPy array, reshaped as (no. of records, timesteps, n_features).
The questions are:
1) Is it necessary to encode the "user" attribute?
2) What is the correct batch size for this problem? Is it batch = 1000 (no. of distinct users)?
3) Do I need to include each user's data in each batch input to the model?
OR
Please suggest the correct implementation of this problem.
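One way to build the (no. of records, timesteps, n_features) array without letting a single window mix records from two users is to window each user's records separately. This is a minimal NumPy sketch, assuming the records are already sorted by time within each user and that `user_ids` and `features` are parallel arrays; the function name `make_user_windows` and the toy data are hypothetical.

```python
import numpy as np

def make_user_windows(user_ids, features, timesteps):
    """Build (n_windows, timesteps, n_features) inputs plus next-step targets,
    never letting a window span two different users.

    Assumes records are already sorted by time within each user.
    """
    X, y, owners = [], [], []
    for user in np.unique(user_ids):
        rows = features[user_ids == user]            # this user's records, in time order
        for start in range(len(rows) - timesteps):
            X.append(rows[start:start + timesteps])  # the past `timesteps` records
            y.append(rows[start + timesteps])        # the record that follows them
            owners.append(user)
    n_features = features.shape[1]
    return (np.asarray(X, dtype=float).reshape(-1, timesteps, n_features),
            np.asarray(y, dtype=float).reshape(-1, n_features),
            np.asarray(owners))

# Hypothetical toy data: 3 users, 10 records each, 4 features per record
rng = np.random.default_rng(0)
user_ids = np.repeat([7, 8, 9], 10)
features = rng.normal(size=(30, 4))
X, y, owners = make_user_windows(user_ids, features, timesteps=5)
print(X.shape, y.shape)  # (15, 5, 4) (15, 4)
```

The `owners` array keeps track of which user each window belongs to, which is handy if you later want to feed the user as an extra input.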

This is automatic; you don't need to do anything.
The LSTM layer keeps a separate state for every sample in the batch, so its state matrix has one row per user in your batch. (Otherwise it wouldn't be useful.)
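To see why the state is per sample, here is a from-scratch NumPy LSTM (not Keras code; the gate layout and the names `lstm_step` and `run_lstm` are assumptions for this sketch). The hidden and cell states have shape (batch, units), one row per sequence, and each row only ever mixes with its own inputs. Whether those states are carried over from one batch to the next is a separate question, controlled in Keras by `stateful=True`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step for a whole batch.

    x: (batch, n_features); h, c: (batch, units)
    W: (n_features, 4*units); U: (units, 4*units); b: (4*units,)
    Gate order i, f, g, o is an arbitrary choice for this sketch.
    """
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_lstm(x_seq, W, U, b):
    """Run over x_seq of shape (batch, timesteps, n_features); return final (h, c)."""
    batch, units = x_seq.shape[0], U.shape[0]
    h = np.zeros((batch, units))   # one hidden-state row per sequence in the batch
    c = np.zeros((batch, units))   # one cell-state row per sequence in the batch
    for t in range(x_seq.shape[1]):
        h, c = lstm_step(x_seq[:, t, :], h, c, W, U, b)
    return h, c

rng = np.random.default_rng(0)
batch, timesteps, n_features, units = 3, 6, 4, 8   # hypothetical sizes, e.g. 3 users
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

x = rng.normal(size=(batch, timesteps, n_features))
h, c = run_lstm(x, W, U, b)
print(h.shape)  # (3, 8): one state vector per user in the batch

# Changing user 2's inputs leaves the states of users 0 and 1 untouched
x2 = x.copy()
x2[2] += 1.0
h2, _ = run_lstm(x2, W, U, b)
print(np.allclose(h[:2], h2[:2]), np.allclose(h[2], h2[2]))  # True False
```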

Related

Embedding layer output is stochastic?

If I train two identical models on the same dataset, but the samples are presented in a different order, would the embedding layers output exactly the same embeddings?
I think you will not get exactly the same embeddings. The embedding weights are learned by gradient descent, so their final values depend on the order in which the sample batches are presented; with a different order you will probably get different values. Furthermore, the embedding layer's weights are randomly initialized, which can also contribute to a difference.
However, I would expect two words that are close in one embedding to also be close in the other.
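The order effect is easy to reproduce with a toy model in NumPy (this is not a Keras Embedding layer; `train_embeddings` and its tiny regression task are hypothetical). Because every per-sample update also changes the shared projection `W`, visiting the same samples in a different order, or starting from a different random initialization, ends at different embedding values, while an identical order and seed reproduce them exactly.

```python
import numpy as np

def train_embeddings(order, seed, n_items=5, dim=3, epochs=20, lr=0.05):
    """Toy model: item embedding E[i] dotted with a shared projection W
    predicts a scalar target y[i]. Plain per-sample SGD, visiting samples in `order`."""
    data_rng = np.random.default_rng(123)      # same targets for every run
    y = data_rng.normal(size=n_items)
    init_rng = np.random.default_rng(seed)     # weight initialization depends on `seed`
    E = init_rng.normal(scale=0.5, size=(n_items, dim))
    W = init_rng.normal(scale=0.5, size=dim)
    for _ in range(epochs):
        for i in order:
            err = E[i] @ W - y[i]              # prediction error for item i
            grad_E = err * W                   # both gradients use pre-update values
            grad_W = err * E[i]
            E[i] -= lr * grad_E
            W -= lr * grad_W
    return E

forward = list(range(5))
backward = forward[::-1]
E_a = train_embeddings(forward, seed=0)
E_b = train_embeddings(forward, seed=0)
E_c = train_embeddings(backward, seed=0)
E_d = train_embeddings(forward, seed=1)
print(np.array_equal(E_a, E_b))  # True: same order, same initialization
print(np.allclose(E_a, E_c))     # False: only the sample order changed
print(np.allclose(E_a, E_d))     # False: only the initialization changed
```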

Darknet Yolov3 - Custom training on pre-trained model

The Darknet YOLOv3 model comes with a coco.names label file that includes 80 classes.
Now I want to train a custom model with only two labels, where one label already exists in coco.names and the other does not.
For example, I want to train a model to detect cell phones and DSLR cameras: the cell phone class already exists in coco.names, whereas dslr camera is not in its labels file.
So can I train a custom model with the two classes cell phone and dslr camera, giving it training data for dslr camera only, and have it predict both dslr camera and cell phone? Or should I train with data for both cell phone and dslr images? Or is there another way?
I am a bit new to ML, so any help would be great.
Thanks
So you want to fine tune a pre-trained model.
Think of the classes as just a set of end nodes of the network; the labels (phone, camera) are only a naming convention for them, to give us visual guidance.
These nodes are fully connected (with associated weights) to the previous layer of the network, and the total number of these connections depends on the number of end nodes (classes) you have.
With a fully trained model, you can't just select the nodes you want, remove the rest and add a few more, because the previous layer (and the whole network) was trained to give estimates/predictions for a specific number of final nodes.
So you basically need to fully reset the last layer (the head) and re-create it with the desired number of classes. The idea is that you take advantage of the previous training effort on a broader dataset and fine-tune the network on your own data.
Short answer: you need data for both classes, and you need to change the model to accept only 2 classes.
To configure that specific model for the new number of classes and data, I believe you can find some guidance and instructions here
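For the Darknet YOLOv3 config specifically, the size of the output layers is tied to the number of classes, which is why the head has to be changed. As far as I know from the Darknet custom-training instructions, you set `classes=` in each of the 3 `[yolo]` sections and `filters=(classes + 5) * 3` in the `[convolutional]` layer right before each of them: each of the 3 anchors per scale predicts 4 box coordinates, 1 objectness score and one score per class. The helper below only does that arithmetic; its name is hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """`filters` value for the [convolutional] layer right before each [yolo]
    layer: every anchor predicts 4 box coordinates, 1 objectness score
    and one score per class."""
    return (num_classes + 5) * anchors_per_scale

print(yolo_head_filters(80))  # 255: the stock COCO model
print(yolo_head_filters(2))   # 21: e.g. "cell phone" and "dslr camera"
```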

How to handle shared data between samples and batches in Keras

I'm using Keras for time series prediction, and I want to create a model based on the self-attention mechanism that does not use any RNNs. For each sample, we look at the last x timesteps to predict the next sample.
In other words, I want to feed the network (num_batches, num_samples, timesteps, features) and get (num_batches, predictions).
There is one problem with this:
there is a lot of unnecessary duplication of data, since sample n has basically the same timesteps and features as sample n+1, only shifted one step to the left.
How would you handle this, assuming your dataset is very large?
I am not very familiar with this, but if your issue is "I have too much replicated data", I think you can solve it by writing a generator for your data and passing the generator as input to the Keras/TensorFlow fit function (the TensorFlow API documentation states that fit supports generators as input).
If your question is about the logic behind the model, I do not see the issue. It is like having a sliding window: for each window you predict one value, and then you move the window by a certain amount (in your case, one step). Could you elaborate a little more on your concern?
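A minimal sketch of the generator idea, assuming the data is a single time series stored once as a (n_steps, n_features) NumPy array; `window_batches` is a hypothetical helper. Each batch of overlapping windows is sliced out only when it is requested, so the full duplicated (n_windows, timesteps, n_features) array never exists in memory at once.

```python
import numpy as np

def window_batches(series, timesteps, batch_size):
    """Yield (X, y) batches of sliding windows over `series` on the fly.

    series: (n_steps, n_features). Only the current batch of windows is
    materialized; the series itself is stored once.
    """
    n_windows = len(series) - timesteps          # each window needs one target step after it
    for start in range(0, n_windows, batch_size):
        idx = np.arange(start, min(start + batch_size, n_windows))
        X = np.stack([series[i:i + timesteps] for i in idx])  # (batch, timesteps, n_features)
        y = series[idx + timesteps]                           # the step after each window
        yield X, y

series = np.arange(20, dtype=float).reshape(10, 2)   # hypothetical: 10 steps, 2 features
batches = list(window_batches(series, timesteps=3, batch_size=4))
print(len(batches), batches[0][0].shape, batches[0][1].shape)  # 2 (4, 3, 2) (4, 2)
```

Keras typically expects a training generator to keep yielding indefinitely, so in practice you would wrap this in a `while True:` loop (or a `keras.utils.Sequence`) and pass `steps_per_epoch` to `fit`.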

What should be the target in this deep learning image classification problem?

I am doing an image classification project using a CNN in Keras. I have a dataset of about 900 photos of about 70 people. Each person has multiple photos taken at different ages.
My goal is to predict the correct ID of the person when any one of his photos is given as input.
Here is a glimpse of the data:
My questions are:
1) What should my target column be, 'AGE' or 'ID'?
2) Do I need to one-hot encode the target column? For example, if I use ID as my target, do I have to one-hot encode the ID column? And if I use ID as my target, does one-hot encoding mean I will have 70 classes?
3) I need information about the output layer. My goal is to find whether the photo belongs to the same ID or not, so what should the output layer be? Should I use softmax with 70 outputs?
4) Another question about the output layer: can I use a softmax with 70 outputs and then feed it into a sigmoid layer with a single output?
You are going to identify the same person from images taken at different ages. For example, suppose the dataset has 100 different images of Khan and you train a model. When you then provide a 101st image of Khan, the model should recognize him. So your target column should be ID.
Yes, there are 70 classes, and one-hot encoding gives you a 900x70 target matrix.
It should be a softmax layer, because a sigmoid output is used for binary or multi-label problems. Since you have to distinguish 70 different people from each other, you need a softmax output.
I don't think so; that way your model would not be able to tell which person the image (the one provided as a test) belongs to.
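The ID target, its one-hot encoding and a softmax output can be sketched in NumPy. The IDs below are hypothetical, `one_hot_ids` and `softmax` are illustrative helpers, and the random logits only stand in for a trained CNN's last layer. `np.unique` maps arbitrary ID values to class indices 0..69, so the IDs don't need to be consecutive.

```python
import numpy as np

def one_hot_ids(ids):
    """Map arbitrary person IDs to class indices and one-hot encode them."""
    classes, idx = np.unique(ids, return_inverse=True)
    y = np.zeros((len(ids), len(classes)), dtype=np.float32)
    y[np.arange(len(ids)), idx] = 1.0
    return classes, y

def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() numerically stable."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical labels: 900 photos spread over 70 person IDs (1001..1070)
ids = np.repeat(np.arange(1001, 1071), 13)[:900]
classes, y = one_hot_ids(ids)
print(len(classes), y.shape)  # 70 (900, 70)

# Stand-in for the network's 70 output scores per photo
rng = np.random.default_rng(0)
probs = softmax(rng.normal(size=(900, 70)))
predicted_ids = classes[probs.argmax(axis=1)]   # most probable person for each photo
```

In Keras, the same target would go with a `Dense(70, activation="softmax")` output layer and the `categorical_crossentropy` loss.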

Batch size for panel data for LSTM in Keras

I have repeated measurements on subjects, which I have structured as input to an LSTM model in Keras as follows:
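```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

batch_size = 1
model = Sequential()
model.add(LSTM(50, batch_input_shape=(batch_size, time_steps, features), return_sequences=True))
```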
where time_steps is the number of measurements on each subject and features is the number of available features per measurement. Each row of the data is one subject.
My question is regarding the batch size with this type of data.
Should I only use a batch size of 1, or can a batch contain more than one subject?
Related to that, would I benefit from setting stateful to True? That is, would what is learned from one batch inform the other batches too? Correct me if my understanding of this is wrong.
Great question! Using a batch size greater than 1 is possible with this sort of data and setup, provided that your rows are individual experiments on subjects and that the observations for each subject are ordered sequentially through time (e.g. Monday comes before Tuesday). As long as train and test are not split randomly and the observations in each are ordered sequentially by subject, you can apply batch processing. For the same reason, set shuffle to False if you are using Keras, because Keras' fit shuffles the training samples before each epoch by default.
Regarding setting stateful to True: with a stateful model, all the states are propagated to the next batch. This means that the state of the sample located at index i, X[i], will be used in the computation of the sample X[i + bs] in the next batch, where bs is the batch size. For time series, this generally makes sense. If you believe that a subject measurement S[i] influences the state of the next subject measurement S[i + 1], then try setting stateful to True. It may also be worth trying stateful set to False, to better understand whether a previous observation in time influences the following observation for a particular subject.
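The X[i] to X[i + bs] rule means the data has to be laid out so that each subject keeps the same row position in every batch. A minimal NumPy sketch, assuming every subject has the same number of measurements and the batch size equals the number of subjects; `stateful_batches` is a hypothetical helper, and an incomplete trailing chunk is simply dropped.

```python
import numpy as np

def stateful_batches(data, time_steps):
    """Arrange (n_subjects, total_steps, features) data for a stateful LSTM
    whose batch size equals n_subjects.

    Sample k * n_subjects + i is chunk k of subject i, so with
    batch_size = n_subjects and no shuffling, sample X[i] in one batch
    is continued by X[i + batch_size] in the next batch.
    """
    n_subjects, total_steps, n_features = data.shape
    n_chunks = total_steps // time_steps                 # drop any incomplete tail
    trimmed = data[:, :n_chunks * time_steps, :]
    chunks = trimmed.reshape(n_subjects, n_chunks, time_steps, n_features)
    # Put the chunk index first so consecutive batches hold consecutive chunks
    return chunks.transpose(1, 0, 2, 3).reshape(n_chunks * n_subjects, time_steps, n_features)

# Hypothetical: 3 subjects, 8 measurements each, 1 feature = time index + 100 * subject
data = np.tile(np.arange(8, dtype=float), (3, 1))[:, :, None] + 100 * np.arange(3)[:, None, None]
X = stateful_batches(data, time_steps=4)
bs = 3
print(X.shape)                       # (6, 4, 1)
print(X[1, :, 0], X[1 + bs, :, 0])   # [100. 101. 102. 103.] [104. 105. 106. 107.]
```

With this layout you would train with a batch size equal to the number of subjects and shuffle set to False, so that Keras keeps the batches in this order.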
Hope this helps!
