Description
I have a dataset of 10 sequences, where each sequence corresponds to one day of stock value recordings and consists of 50 samples taken at 5-minute intervals starting at 9:05 am. There is one extra recording (the 51st sample), available only in the training set, which is taken 2 hours, not 5 minutes, after the last of the 50 samples. This 51st sample is what must be predicted for the testing set, where the first 50 samples are also given.
I am using a PyBrain recurrent neural network for this problem, which groups sequences together, and the label (commonly known as the target y) of each sample x_i is the sample at the next time step, x_(i+1): a typical formulation in time series prediction.
Example
A sequence for one day is something like:
Signal id - Time - Value
1 - 9:05 - 23
2 - 9:10 - 31
3 - 9:15 - 24
... - ... - ...
50 - 13:10 - 15
Below is the 2-hours-later label (target), given for the training set
and required to be predicted for the testing set:
51 - 15:10 - 11
Question
Now that my recurrent neural network (RNN) has been trained on these 10 sequences, when it is given a new sequence, how would I use it to predict the stock value 2 hours after the last sample in the sequence?
Please note that I do have the 2-hours-later stock value for each of the training sequences, but I am not sure how to incorporate it into training the RNN, since the RNN expects identical time intervals between samples. Thanks!
I hope this will help you out.
The recurrent network structure
A few tips
Choosing your recurrent network
The more mature Long Short-Term Memory (LSTM) network is a great fit for this kind of task. An LSTM is able to detect common "shapes" and "variations" in the stock value "graph", and there is a lot of research trying to show that such shapes actually occur in real life! See this link for an example.
Accuracy
If you want the network to achieve higher accuracy, I would also recommend feeding it the stock values from the previous year (at the exact same dates), so that the number of inputs doubles from 50 to 100. But however well the network is optimised on your dataset, it will never be able to predict the genuinely unpredictable behaviour of the future ;)
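Handling the 2-hour target
Since the 2-hour gap breaks the equal-interval assumption, one option is not to treat the 51st sample as "the next step" at all, and instead train sequence-to-one: the day's 50 samples are the input sequence and the 2-hours-later value is a single target. Below is a minimal sketch of that formulation, in Keras rather than PyBrain, with made-up data, shapes, and layer sizes:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

n_days, n_steps = 10, 50                  # 10 training sequences of 50 samples each
X = np.random.rand(n_days, n_steps, 1)    # stand-in for the scaled stock values
y = np.random.rand(n_days, 1)             # stand-in for the 51st (2h-later) value

model = Sequential([
    LSTM(32, input_shape=(n_steps, 1)),   # reads the whole 50-step sequence
    Dense(1),                             # emits one value: the 2h-ahead price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, verbose=0)

# For a new day, feed its 50 samples and read off the 2h-ahead prediction.
prediction = model.predict(X[:1])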
Related
I use machine learning algorithms in malware analysis. When I train with different numbers of features, I get strange training times. For example:
4 features (A, B, C, D): training time is 3 seconds.
3 features (A, B, C): training time is 5 seconds.
2 features (A, B): training time is 8 seconds.
1 feature (A): training time is 4 seconds.
This happens with both MLP and Random Forest. In my opinion, training should be faster with fewer features, but the results are completely different.
In KNN, the results look like this:
With 6, 5, 4, or 3 features (drawn from A, B, C, D, E, F), model testing time is about 1.1 seconds, almost the same.
With 2 features (A, B), testing time is 3 seconds.
With 1 feature (A), testing time is 5 seconds.
My dataset has 17K records and I am using 10-fold cross-validation. The features are sorted by entropy: feature A has the highest entropy and feature F the lowest. I am running the tests on Google Colab with sklearn. I have tried several times on different days, and the trend is the same.
The dataset has 79 features in total; the effect only shows up when I use a small number of them.
Thanks to anyone who replies; I have no idea what causes this.
At first glance it does seem that fewer features should mean lower training times. However, depending on which algorithm is being used, this may not be the case. During training, the algorithm minimizes an objective (loss) function. Taking the case of the MLP neural network, if you change the features (especially depending on whether they are informative or not), you change the error surface over which the optimization runs, and the minima may become harder to find, so more steps are needed to satisfy the convergence criteria and training takes longer.
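You can observe this directly by timing the fit and also recording how many optimizer iterations it took; the iteration count, not just the feature count, drives the wall-clock time. A quick sketch with synthetic data (sklearn's MLPClassifier exposes the iteration count as n_iter_; the exact numbers will differ from yours):

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the malware dataset: 17K records, 6 candidate features.
X, y = make_classification(n_samples=17000, n_features=6, n_informative=4,
                           random_state=0)

for n_feats in (4, 3, 2, 1):
    start = time.time()
    clf = MLPClassifier(max_iter=500, random_state=0)
    clf.fit(X[:, :n_feats], y)          # drop features from the right
    print(f"{n_feats} feature(s): {time.time() - start:.2f}s, "
          f"{clf.n_iter_} iterations")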
I have a dataset of shape (6497 samples, 13 features). Each sample consists of 35 time steps, and I have to predict a target variable at each of these steps, feeding previous time steps as input to the next. For example, the model should predict the 1st time step from only the 12 features, and the output of the 1st step is then used as an input for the 2nd step (along with the features). Similarly, for the 3rd step, the outputs of the 1st and 2nd steps are used along with the 12 features. Where can I start designing a network that solves this problem?
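To make the setup concrete, here is a rough sketch of the inference loop being described, with a plain feed-forward network standing in for whatever model is eventually chosen (all shapes and layer sizes are illustrative; during training one would typically feed the true previous target instead, i.e. teacher forcing):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# At each of the 35 steps the model sees the 12 features for that step plus
# the previous step's prediction, i.e. 13 inputs in total.
model = Sequential([Dense(32, activation="relu", input_shape=(13,)),
                    Dense(1)])
model.compile(optimizer="adam", loss="mse")
# ... train on (features_t, previous true target) -> target_t pairs ...

features = np.random.rand(35, 12)  # stand-in for one sample's 35 time steps
prev = 0.0                         # seed value for the very first step
predictions = []
for t in range(35):
    x = np.concatenate([features[t], [prev]])[None, :]  # shape (1, 13)
    prev = float(model.predict(x, verbose=0)[0, 0])
    predictions.append(prev)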
I recently learned about LSTMs for time series prediction from
https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/23_Time-Series-Prediction.ipynb
In his tutorial, he says: Instead of training the Recurrent Neural Network on the complete sequences of almost 300k observations, we will use the following function to create a batch of shorter sub-sequences picked at random from the training-data.
def batch_generator(batch_size, sequence_length):
    """
    Generator function for creating random batches of training-data.
    """
    # Infinite loop.
    while True:
        # Allocate a new array for the batch of input-signals.
        x_shape = (batch_size, sequence_length, num_x_signals)
        x_batch = np.zeros(shape=x_shape, dtype=np.float16)

        # Allocate a new array for the batch of output-signals.
        y_shape = (batch_size, sequence_length, num_y_signals)
        y_batch = np.zeros(shape=y_shape, dtype=np.float16)

        # Fill the batch with random sequences of data.
        for i in range(batch_size):
            # Get a random start-index.
            # This points somewhere into the training-data.
            idx = np.random.randint(num_train - sequence_length)

            # Copy the sequences of data starting at this index.
            x_batch[i] = x_train_scaled[idx:idx+sequence_length]
            y_batch[i] = y_train_scaled[idx:idx+sequence_length]

        yield (x_batch, y_batch)
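For reference, the generator is then used by drawing batches from it (the batch size and sequence length here are illustrative values):

generator = batch_generator(batch_size=256, sequence_length=128)
x_batch, y_batch = next(generator)  # x_batch.shape == (256, 128, num_x_signals)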
He creates batches of random sub-sequences for training this way.
My question is: can we first randomly shuffle x_train_scaled and y_train_scaled, and then sample batches using the batch_generator above?
My motivation for this question is that, in time series prediction, we want to train on the past and predict the future. So is it legitimate to shuffle the training samples?
In the tutorial, the author picks a contiguous slice of samples, such as
x_batch[i] = x_train_scaled[idx:idx+sequence_length]
y_batch[i] = y_train_scaled[idx:idx+sequence_length]
Can we pick x_batch and y_batch non-contiguously? For example, could x_batch[0] be picked at 10:00 am and x_batch[1] at 9:00 am on the same day?
In summary, the two questions are:
(1) Can we first randomly shuffle x_train_scaled and y_train_scaled, and then sample batches using the batch_generator above?
(2) When we train an LSTM, do we need to consider the influence of time order? What parameters does the LSTM learn?
Thanks
(1) We cannot. Imagine trying to predict tomorrow's weather: would you rather have the sequence of temperatures for the last 10 hours, or random temperature readings from the last 5 years?
Your dataset is a long sequence of values at a 1-hour interval. Your LSTM takes in a sequence of samples that are chronologically connected. For example, with sequence_length = 10 it can take the data from 2018-03-01 09:00:00 to 2018-03-01 18:00:00 as input. If you shuffle the dataset before generating batches of such sequences, you will train your LSTM to predict from a sequence of random samples drawn from your whole dataset.
(2) Yes, we need to consider temporal ordering for time series. You can find ways to test your time series LSTM in Python here: https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
The train/test data must be split in a way that respects the temporal ordering, so that the model is never trained on data from the future and is only tested on data from the future.
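For example, scikit-learn's TimeSeriesSplit produces folds that respect this ordering; a minimal sketch:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data = np.arange(20)
# Each fold trains on a prefix of the data and tests on the samples that
# immediately follow it, so the model never sees the future while training.
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(data):
    print("train:", train_idx, "test:", test_idx)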
It depends a lot on the dataset. For example, the weather on a random day in the dataset is highly related to the weather of the surrounding days. So, in that case, you should try a stateful LSTM (i.e., an LSTM that carries its state from one batch to the next, so earlier records inform later ones) and train in chronological order.
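For reference, a stateful LSTM in Keras might look like the following minimal sketch; note that stateful=True forces a fixed batch size, requires feeding batches in chronological order (shuffle=False), and the state must be reset manually between epochs:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size, timesteps, features = 32, 10, 1
model = Sequential([
    # stateful=True: the final state of each batch becomes the initial
    # state of the next batch instead of being reset to zero.
    LSTM(32, stateful=True,
         batch_input_shape=(batch_size, timesteps, features)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# ... model.fit(..., shuffle=False); call model.reset_states() between epochs ...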
However, if your records (or some transformation of them) are independent of each other but depend on some notion of time, such as the inter-arrival time of the items within a record or a subset of the records, there should be noticeable differences when shuffling. In some cases it will improve the robustness of the model; in others it will stop the model from generalizing. Noticing these differences is part of evaluating the model.
In the end, the question is: is the "time series" really a time series (i.e., do records really depend on their neighbors), or is there some transformation that can break this dependency while preserving the structure of the problem? And for that question there is only one way to get to the answer: explore the dataset.
As for authoritative references, I will have to let you down. I learned this from a seasoned researcher in the field; however, according to him, he learned it through a lot of experimentation and failures. As he told me: these aren't rules, they are guidelines; try all the solutions that fit your budget, improve on the best ones, and try again.
I have 5 time series that I want a neural network to predict. The time series are related to each other. Each consists of numbers between 0 and 100, and I want to predict the next number for each series. I already have a model that trains on one time series using a GRU, and it works reasonably well. I have tried two strategies:
I normalized the numbers and treated it as a regression problem. The best validation accuracy so far is 0.38.
I one-hot encoded the time series, and this works significantly better (accuracy improves by 0.15), but it costs 100 times as much memory.
For the 5 time series, I tried 5 independent models, but then the relationships between the series were lost. I am wondering what an efficient strategy to proceed might be. I can think of two myself, but I might be missing something:
I can stack the inputs so that I have one five-hot encoded input instead of 5 one-hot encoded inputs. Can this be done? (See the sketch after this question.)
I can create 5 models and merge them, but I am not sure what to do with the output. Should I split the model at the end, one head for each time series?
Is there a strategy I have overlooked? Memory is a problem: with thousands of time series and sample lengths of 100, the data uses a lot of memory and processing time. I googled around but could not find an efficient strategy. Could someone suggest how to implement this problem efficiently in Keras?
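For what it's worth, the stacking idea can be sketched like this, with regression on normalized values rather than one-hot encoding to keep memory down (a minimal sketch; all shapes, sizes, and data are illustrative):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

timesteps, n_series = 100, 5
X = np.random.rand(64, timesteps, n_series)  # 64 made-up training windows
y = np.random.rand(64, n_series)             # next value of each of the 5 series

# One GRU reads all five series at once, so the relationships between the
# series are visible to the model; one Dense head predicts all five next
# values jointly.
model = Sequential([
    GRU(64, input_shape=(timesteps, n_series)),
    Dense(n_series, activation="sigmoid"),   # values normalized to [0, 1]
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)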
There are 2 parts to this question. Suppose we are looking at sales $S$ of a product across more than 1000 stores where it sells. For each of these stores we have 24 months of recorded data.
We want to be able to predict $S_t = f(S_{t-1})$. We could build an RNN for each store's time series, compute the test RMSE, and take an average after normalizing values, etc. But the problem is that there are very few samples per time series. If we were to segment the stores into groups (say, by Dynamic Time Warping), could we then borrow an idea from text modeling: just as two sentences are separated by a full stop, we would concatenate the time series within a group, separated by a special symbol. In that case, we would train one RNN model on
Train_1 | Train_2 |...|Train_t
data and predict on
Test_1 | Test_2 |...|Test_t
After this, we would like to set it up as a panel data problem, where $S_t = f(x_{t1}, x_{t2}, \ldots, x_{tn})$. In that case, should I build a separate neural network for each $t$ and then connect the hidden layers from $t \to t+1 \to t+2$, and so on?
How should I implement these with packages like Keras/Theano/MXNet? Any help would be great!
For your first question, implementing this in MXNet Gluon is very straightforward. You can formulate the problem as autoregression, so that it has no dependency on the sequence length during training, or you can formulate it as a single prediction that requires a specific sequence length of S in order to predict S_t. Either way, this Gluon tutorial can help get you started.
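For instance, a minimal Gluon sketch of the single-prediction formulation might look like this (window length, layer sizes, and data are illustrative assumptions, not the tutorial's code):

import mxnet as mx
from mxnet import gluon, nd, autograd

# An LSTM reads a fixed-length window of past sales; a Dense layer
# predicts the next value S_t.
net = gluon.nn.Sequential()
net.add(gluon.rnn.LSTM(hidden_size=32, layout="NTC"),
        gluon.nn.Dense(1))
net.initialize()

trainer = gluon.Trainer(net.collect_params(), "adam")
loss_fn = gluon.loss.L2Loss()

x = nd.random.uniform(shape=(8, 24, 1))  # batch of 8 stores, 24 months, 1 feature
y = nd.random.uniform(shape=(8, 1))      # next-month sales S_t

with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()
trainer.step(batch_size=8)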