Can anyone please site an example on how linear regression can be implemented using RNN in pure tensorflow other than using keras API..
Eg :
x_train = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I ll divide them into 5 batches of 2 element each so x_train will be of shape [5, 2]
[1, 2]
[3, 4]
[5, 6]..
y_train = [3, 5, 7,...]
Where the challenge is that.. i should be giving the model to train upon 1 and 2 then from that the model should have and inference of 3, like wise 3 and 4 and it should prediction 5.. and so on.. I have tried a way around but had problems with shape and the losses where really high..
The same problem in a deep network works perfect with around 100 epochs, 100 data points and a loss of 0.01 (MSE).
Can anyone please help me..?
Related
Using broadcasting in normal tensors, the following works
torch.ones(3, 1)*torch.ones(3, 10)
I need to extend this behavior to sparse vectors, but I can't:
i = torch.tensor([[0, 1, 1],
[2, 0, 2]])
a = torch.sparse_coo_tensor(i, torch.ones(3, 1), [2, 4, 1])
b = torch.sparse_coo_tensor(i, torch.ones(3, 10), [2, 4, 10])
a*b
# gives RuntimeError: mul operands have incompatible sizes
Why shouldn't this work?
Is there an pytorch function to do this?
If not what is the best alternative algorithm?
So I am understanding lasso regression and I don't understand why it needs two input values to predict another value when it's just a 2 dimensional regression.
It says in the documentation that
clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
which I don't understand. Why is it [0,0] or [1,1] and not just [0] or [1]?
[[0,0], [1, 1], [2, 2]]
means that you have 3 samples/observations and each is characterised by 2 features/variables (2 dimensional).
Indeed, you could have these 3 samples with only 1 features/variables and still be able to fit a model.
Example using 1 feature.
from sklearn import datasets
from sklearn import linear_model
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :1] # we only take the feature
y = iris.target
clf = linear_model.Lasso(alpha=0.1)
clf.fit(X,y)
print(clf.coef_)
print(clf.intercept_)
I am quite new to the deep learning field especially Keras. Here I have a simple problem of classification and I don't know how to solve it. What I don't understand is how the general process of the classification, like converting the input data into tensors, the labels, etc.
Let's say we have three classes, 1, 2, 3.
There is a sequence of classes that need to be classified as one of those classes. The dataset is for example
Sequence 1, 1, 1, 2 is labeled 2
Sequence 2, 1, 3, 3 is labeled 1
Sequence 3, 1, 2, 1 is labeled 3
and so on.
This means the input dataset will be
[[1, 1, 1, 2],
[2, 1, 3, 3],
[3, 1, 2, 1]]
and the label will be
[[2],
[1],
[3]]
Now one thing that I do understand is to one-hot encode the class. Because we have three classes, every 1 will be converted into [1, 0, 0], 2 will be [0, 1, 0] and 3 will be [0, 0, 1]. Converting the example above will give a dataset of 3 x 4 x 3, and a label of 3 x 1 x 3.
Another thing that I understand is that the last layer should be a softmax layer. This way if a test data like (e.g. [1, 2, 3, 4]) comes out, it will be softmaxed and the probabilities of this sequence belonging to class 1 or 2 or 3 will be calculated.
Am I right? If so, can you give me an explanation/example of the process of classifying these sequences?
Thank you in advance.
Here are a few clarifications that you seem to be asking about.
This point was confusing so I deleted it.
If your input data has the shape (4), then your input tensor will have the shape (batch_size, 4).
Softmax is the correct activation for your prediction (last) layer
given your desired output, because you have a classification problem
with multiple classes. This will yield output of shape (batch_size,
3). These will be the probabilities of each potential classification, summing to one across all classes. For example, if the classification is class 0, then a single prediction might look something like [0.9714,0.01127,0.01733].
Batch size isn't hard-coded to the network, hence it is represented in model.summary() as None. E.g. the network's last-layer output shape can be written (None, 3).
Unless you have an applicable alternative, a softmax prediction layer requires a categorical_crossentropy loss function.
The architecture of a network remains up to you, but you'll at least need a way in and a way out. In Keras (as you've tagged), there are a few ways to do this. Here are some examples:
Example with Keras Sequential
model = Sequential()
model.add(InputLayer(input_shape=(4,))) # sequence of length four
model.add(Dense(3, activation='softmax')) # three possible classes
Example with Keras Functional
input_tensor = Input(shape=(4,))
x = Dense(3, activation='softmax')(input_tensor)
model = Model(input_tensor, x)
Example including input tensor shape in first functional layer (either Sequential or Functional):
model = Sequential()
model.add(Dense(666, activation='relu', input_shape=(4,)))
model.add(Dense(3, activation='softmax'))
Hope that helps!
I'm working on a smaller project to better understand RNN, in particualr LSTM and GRU. I'm not at all an expert, so please bear that in mind.
The problem I'm facing is given as data in the form of:
>>> import numpy as np
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3],[1, 2, 1], [1, 3, 2],[2, 3, 1],[3, 1, 1],[3, 3, 2],[4, 3, 3]], columns=['person', 'interaction', 'group'])
person interaction group
0 1 2 3
1 1 2 1
2 1 3 2
3 2 3 1
4 3 1 1
5 3 3 2
6 4 3 3
this is just for explanation. We have different person interacting with different groups in different ways. I've already encoded the various features. The last interaction of a user is always a 3, which means selecting a certain group. In the short example above person 1 chooses group 2, person 2 chooses group 1 and so on.
My whole data set is much bigger but I would like to understand first the conceptual part before throwing models at it. The task I would like to learn is given a sequence of interaction, which group is chosen by the person. A bit more concrete, I would like to have an output a list with all groups (there are 3 groups, 1, 2, 3) sorted by the most likely choice, followed by the second and third likest group. The loss function is therefore a mean reciprocal rank.
I know that in Keras Grus/LSTM can handle various length input. So my three questions are.
The input is of the format:
(samples, timesteps, features)
writing high level code:
import keras.layers as L
import keras.models as M
model_input = L.Input(shape=(?, None, 2))
timestep=None should imply the varying size and 2 is for the feature interaction and group. But what about the samples? How do I define the batches?
For the output I'm a bit puzzled how this should look like in this example? I think for each last interaction of a person I would like to have a list of length 3. Assuming I've set up the output
model_output = L.LSTM(3, return_sequences=False)
I then want to compile it. Is there a way of using the mean reciprocal rank?
model.compile('adam', '?')
I know the questions are fairly high level, but I would like to understand first the big picture and start to play around. Any help would therefore be appreciated.
The concept you've drawn in your question is a pretty good start already. I'll add a few things to make it work, as well as a code example below:
You can specify LSTM(n_hidden, input_shape=(None, 2)) directly, instead of inserting an extra Input layer; the batch dimension is to be omitted for the definition.
Since your model is going to perform some kind of classification (based on time series data) the final layer is what we'd expect from "normal" classification as well, a Dense(num_classes, action='softmax'). Chaining the LSTM and the Dense layer together will first pass the time series input through the LSTM layer and then feed its output (determined by the number of hidden units) into the Dense layer. activation='softmax' allows to compute a class score for each class (we're going to use one-hot-encoding in a data preprocessing step, see code example below). This means class scores are not ordered, but you can always do so via np.argsort or np.argmax.
Categorical crossentropy loss is suited for comparing the classification score, so we'll use that one: model.compile(loss='categorical_crossentropy', optimizer='adam').
Since the number of interactions. i.e. the length of model input, varies from sample to sample we'll use a batch size of 1 and feed in one sample at a time.
The following is a sample implementation w.r.t to the above considerations. Note that I modified your sample data a bit, in order to provide more "reasoning" behind group choices. Also each person needs to perform at least one interaction before choosing a group (i.e. the input sequence cannot be empty); if this is not the case for your data, then introducing an additional no-op interaction (e.g. 0) can help.
import pandas as pd
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(10, input_shape=(None, 2))) # LSTM for arbitrary length series.
model.add(tf.keras.layers.Dense(3, activation='softmax')) # Softmax for class probabilities.
model.compile(loss='categorical_crossentropy', optimizer='adam')
# Example interactions:
# * 1: Likes the group,
# * 2: Dislikes the group,
# * 3: Chooses the group.
df = pd.DataFrame([
[1, 1, 3],
[1, 1, 3],
[1, 2, 2],
[1, 3, 3],
[2, 2, 1],
[2, 2, 3],
[2, 1, 2],
[2, 3, 2],
[3, 1, 1],
[3, 1, 1],
[3, 1, 1],
[3, 2, 3],
[3, 2, 2],
[3, 3, 1]],
columns=['person', 'interaction', 'group']
)
data = [person[1][['interaction', 'group']].values for person in df.groupby('person')]
x_train = [x[:-1] for x in data]
y_train = tf.keras.utils.to_categorical([x[-1, 1]-1 for x in data]) # Expects class labels from 0 to n (-> subtract 1).
print(x_train)
print(y_train)
class TrainGenerator(tf.keras.utils.Sequence):
def __init__(self, x, y):
self.x = x
self.y = y
def __len__(self):
return len(self.x)
def __getitem__(self, index):
# Need to expand arrays to have batch size 1.
return self.x[index][None, :, :], self.y[index][None, :]
model.fit_generator(TrainGenerator(x_train, y_train), epochs=1000)
pred = [model.predict(x[None, :, :]).ravel() for x in x_train]
for p, y in zip(pred, y_train):
print(p, y)
And the corresponding sample output:
[...]
Epoch 1000/1000
3/3 [==============================] - 0s 40ms/step - loss: 0.0037
[0.00213619 0.00241093 0.9954529 ] [0. 0. 1.]
[0.00123938 0.99718493 0.00157572] [0. 1. 0.]
[9.9632275e-01 7.5039308e-04 2.9268670e-03] [1. 0. 0.]
Using custom generator expressions: According to the documentation we can use any generator to yield the data. The generator is expected to yield batches of the data and loop over the whole data set indefinitely. When using tf.keras.utils.Sequence we do not need to specify the parameter steps_per_epoch as this will default to len(train_generator). Hence, when using a custom generator, we shall provide this parameter as well:
import itertools as it
model.fit_generator(((x_train[i % len(x_train)][None, :, :],
y_train[i % len(y_train)][None, :]) for i in it.count()),
epochs=1000,
steps_per_epoch=len(x_train))
I have a training set of data. The python script for creating the model also calculates the attributes into a numpy array (It's a bit vector). I then want to use VarianceThreshold to eliminate all features that have 0 variance (eg. all 0 or 1). I then run get_support(indices=True) to get the indices of the select columns.
My issue now is how to get only the selected features for the data I want to predict. I first calculate all features and then use array indexing but it does not work:
x_predict_all = getAllFeatures(suppl_predict)
x_predict = x_predict_all[indices] #only selected features
indices is a numpy array.
The returned array x_predict has the correct length len(x_predict) but wrong shape x_predict.shape[1] which is still the original length. My classifier then throws an error due to wrong shape
prediction = gbc.predict(x_predict)
File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", li
ne 1032, in _init_decision_function
self.n_features, X.shape[1]))
ValueError: X.shape[1] should be 1855, not 2090.
How can I solve this issue?
You can do it like this:
Test data
from sklearn.feature_selection import VarianceThreshold
X = np.array([[0, 2, 0, 3],
[0, 1, 4, 3],
[0, 1, 1, 3]])
selector = VarianceThreshold()
Alternative 1
>>> selector.fit(X)
>>> idxs = selector.get_support(indices=True)
>>> X[:, idxs]
array([[2, 0],
[1, 4],
[1, 1]])
Alternative 2
>>> selector.fit_transform(X)
array([[2, 0],
[1, 4],
[1, 1]])