Related
I am a beginner in RNNs and would like to build a gated recurrent unit (GRU) model for predicting a user's action on an e-commerce website, the Google Merchandise Store, which sells Google-branded merchandise.
We have 5 different actions:
Add to cart
Quickview click
Product click
Remove from cart
Onsite click
My data_y, which is the target, looks like this, as we have different actions:
array([[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
...,
[0, 0, 0, 1, 0],
[0, 0, 1, 0, 0],
[1, 0, 0, 0, 0]], dtype=uint8)
By using only the URL, i.e. the page path the user has accessed, I have achieved 68% prediction accuracy, but I am still trying to improve it by adding other inputs to the model.
My data_X looks like this:
pagePath
[googleredesign, bags]
[googleredesign, bags]
[googleredesign, electronics]
...
...
[googleredesign, bags, backpacks, home]
[googleredesign, bags, backpacks, googlealpine...
53087 rows × 2 columns
After getting the vocabulary length and the maximum sequence length, I tokenized it:
tokenizer = Tokenizer(num_words=vocab_length)
tokenizer.fit_on_texts(data_X['pagePath'])
sequences = tokenizer.texts_to_sequences(data_X['pagePath'])
word_index = tokenizer.word_index
model_inputs = pad_sequences(sequences, maxlen=max_seq_length)
data_X=model_inputs
This is how it looks after tokenization:
array([[ 0, 0, 0, 1, 3],
[ 0, 0, 0, 1, 3],
[ 0, 0, 0, 1, 3],
...,
[ 0, 1, 3, 12, 9],
[ 0, 1, 3, 12, 9],
[ 0, 1, 3, 12, 81]], dtype=int32)
After that, I split the data and trained the model:
X_train, X_test, y_train, y_test = train_test_split(data_X, data_y, test_size=0.3,
                                                    random_state=2)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(37160, 5)
(15927, 5)
(37160, 5)
(15927, 5)
embedding_dim = 64
inputs = tf.keras.Input(shape=(max_seq_length,))
embedding = tf.keras.layers.Embedding(
    input_dim=vocab_length,
    output_dim=embedding_dim,
    input_length=max_seq_length
)(inputs)
gru = tf.keras.layers.GRU(units=embedding_dim)(embedding)
outputs = tf.keras.layers.Dense(5, activation='sigmoid')(gru)
model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=[
        'accuracy',
        tf.keras.metrics.AUC(name='auc')
    ]
)
batch_size = 32
epochs = 3
history = model.fit(
    X_train,
    y_train,
    validation_split=0.2,
    batch_size=batch_size,
    epochs=epochs,
    callbacks=[
        tf.keras.callbacks.ReduceLROnPlateau(),
        tf.keras.callbacks.ModelCheckpoint('model.h5', save_best_only=True)
    ]
)
So my question is: how do I add another input to the model? For example, if I want to add a column representing the total time the user spent on the website, how do I combine it with the embedding layer, given that it is not tokenized and is unrelated to the tokenized pagePath column?
You can tokenize the main row in the dataset, I guess, and after that feed the model with the updated dataset. You could also try to fine-tune the validation split. Increasing the number of epochs may also lead to better results.
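For the extra, non-tokenized input specifically, one option (a minimal sketch, not tested on your data; the names time_input, X_train_seq and X_train_time are only illustrative) is to give the model a second numeric input and concatenate it with the GRU output before the final Dense layer:

import tensorflow as tf

embedding_dim = 64

# First input: the tokenized pagePath sequence, exactly as before
seq_inputs = tf.keras.Input(shape=(max_seq_length,), name='pagePath')
embedding = tf.keras.layers.Embedding(
    input_dim=vocab_length,
    output_dim=embedding_dim
)(seq_inputs)
gru = tf.keras.layers.GRU(units=embedding_dim)(embedding)

# Second input: one scalar per sample, e.g. total time spent on the website
time_input = tf.keras.Input(shape=(1,), name='time_on_site')

# Concatenate the sequence representation with the numeric feature
combined = tf.keras.layers.Concatenate()([gru, time_input])
outputs = tf.keras.layers.Dense(5, activation='sigmoid')(combined)

model = tf.keras.Model([seq_inputs, time_input], outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training then takes a list of arrays, one per Input layer:
# model.fit([X_train_seq, X_train_time], y_train, ...)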
My dataset looks like the following:
on the left, my inputs, and on the right the outputs.
The inputs are tokenized and converted to a list of indices, for instance, the molecule input:
'CC1(C)Oc2ccc(cc2C#HN3CCCC3=O)C#N'
is converted to:
[28, 28, 53, 69, 28, 70, 40, 2, 54, 2, 2, 2, 69, 2, 2, 54, 67, 28, 73, 33, 68, 69, 67, 28, 73, 73, 33, 68, 53, 40, 70, 39, 55, 28, 28, 28, 28, 55, 62, 40, 70, 28, 63, 39, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I use the following list of characters as my map from strings to indices:
cs = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
      'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
      '0','1','2','3','4','5','6','7','8','9',
      '=','#',':','+','-','[',']','(',')','/','\\','#','.','%']
Thus, for every character in the input string there is an index, and if the length of the input string is less than the maximum length of all inputs (which is 100), I pad with zeros, as in the example shown above.
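In code, the encoding amounts to something like this (a minimal sketch of the description above; the helper name encode is only illustrative, and it assumes a char-to-index dict built from cs plus right-padding with zeros up to 100):

# Assumed implementation of the encoding described above
char_to_idx = {c: i for i, c in enumerate(cs)}
max_len = 100

def encode(s):
    idxs = [char_to_idx[c] for c in s]
    return idxs + [0] * (max_len - len(idxs))  # right-pad with zeros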
My model looks like this:
import torch
import torch.nn as nn

class LSTM_regr(torch.nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, 1)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x, l):
        x = self.embeddings(x)
        x = self.dropout(x)
        lstm_out, (ht, ct) = self.lstm(x)
        return self.linear(ht[-1])  # prediction from the last hidden state

vocab_size = 76
model = LSTM_regr(vocab_size, 20, 256)
My problem is that after training, every input I give to the model produces the same output (i.e., 3.3318). Why is that?
My training loop:
import torch.nn.functional as F

def train_model_regr(model, epochs=10, lr=0.001):
    parameters = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = torch.optim.Adam(parameters, lr=lr)
    for i in range(epochs):
        model.train()
        sum_loss = 0.0
        total = 0
        for x, y, l in train_dl:  # train_dl is my DataLoader, defined elsewhere
            x = x.long()
            y = y.float()
            y_pred = model(x, l)
            optimizer.zero_grad()
            loss = F.mse_loss(y_pred, y.unsqueeze(-1))
            loss.backward()
            optimizer.step()
            sum_loss += loss.item() * y.shape[0]
            total += y.shape[0]
EDIT:
I figured it out: I reduced the learning rate from 0.01 to 0.0005 and reduced the batch size from 100 to 10, and it worked fine.
I think this makes sense: the model was training with a large batch size, so it was learning to always output the mean of the targets, since the constant prediction that minimizes the MSE loss is exactly that mean.
Your LSTM_regr returns the last hidden state regardless of the true sequence length. That is, if your true sequence is of length 3 and x is of length 100, the output is the last hidden state after processing 97 padding elements.
You should compute the loss for the prediction that matches the true length of each sequence.
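One way to do that (a minimal sketch, assuming l is the batch of true sequence lengths that forward already receives, as a 1-D CPU tensor or list) is to pack the padded batch so that the final hidden state corresponds to each sequence's last real element:

from torch.nn.utils.rnn import pack_padded_sequence

# Drop-in replacement for LSTM_regr.forward (sketch, not tested on your data)
def forward(self, x, l):
    x = self.embeddings(x)
    x = self.dropout(x)
    # Pack so the LSTM ignores the zero padding beyond each true length
    packed = pack_padded_sequence(x, l, batch_first=True, enforce_sorted=False)
    packed_out, (ht, ct) = self.lstm(packed)
    # ht[-1] is now the hidden state at each sequence's last real token
    return self.linear(ht[-1])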
I am new to sequential models. I am working on an image caption generator with an attention model in Keras.
I keep getting an error about the expected target shape in Keras. I have worked on basic models before, and for this kind of error there is usually a mistake in the way I process my dataset.
However, in this case I have tried adjusting the shape of my 'y' array by unpacking it, and by trying to pack my 'outputs' list, but it does not change the error message.
def model(photo_shape, max_len, n_s, vocab_size):
    outputs = list()
    seq = Input(shape=(max_len,), name='inseq')  # max_len is 33
    x = Embedding(vocab_size, 300, mask_zero=True)(seq)
    p = Input(shape=(photo_shape[0], photo_shape[1]), name='picture')
    s, _, c = LSTM(n_s, return_state=True)(p)
    for t in range(max_len):
        context = attention(p, s)
        word = Lambda(lambda x: x[:, t, :])(x)
        context = concat([word, context])
        context = reshape(context)
        s, _, c = lstm(context)
        out = den2(s)  # a Dense layer with vocab_size units (None, 2791)
        outputs.append(out)
    # print(np.array(outputs).shape) => returns (33,)
    model = Model(inputs=[seq, p], outputs=outputs)
    return model
# The following method goes to a generator function.
def build_sequences(tokenizer, max_length, desc_list, photo):
    X1, X2, y = list(), list(), list()
    desc = desc_list[0]
    seq = tokenizer.texts_to_sequences([desc])[0]
    l = len(seq)
    in_seq, out_seq = seq[:l-1], seq[1:l]
    in_seq = pad_sequences([in_seq], padding='post', maxlen=max_length)[0]
    out_seq = [to_categorical([w], num_classes=vocab_size)[0] for w in out_seq]
    out_seq = pad_sequences([out_seq], padding='post', maxlen=max_length)[0]
    X1.append(in_seq)
    X2.append(photo)
    y.append(out_seq)
    return np.array(X1), np.array(X2), np.array(y)
The error I get is this:
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 33 array(s), but instead got the following list of 1 arrays: [array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0,...
The debug messages show the following shapes for my inputs and outputs.
in_seq: (1, 33)
photo: (1, 196, 512)
out_seq: (1, 33, 2971)
The first dimension is the batch size, which should not be taken into consideration. So I do not know why the 33 is not visible to Keras. I have tried modifying this shape, but logically, should this not work?
Please let me know if this is a data processing error or a problem with my model structure!
Let me know if more code is required
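For context on the "Expected to see 33 array(s)" message: a Keras Model whose outputs argument is a list of 33 tensors expects its target as a list of 33 arrays, one per output, rather than a single (batch, 33, vocab_size) array. A minimal sketch of splitting y into that layout (assuming y is the array returned by build_sequences):

# The model has 33 separate outputs, so Keras wants a list of 33 targets,
# each of shape (batch, vocab_size), instead of one (batch, 33, vocab_size) array.
y_list = [y[:, t, :] for t in range(max_len)]  # max_len is 33
# model.fit([in_seq, photo], y_list, ...)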
I have a label tensor of shape (1, 1, 128, 128, 128) in which the values range from 0 to 24. I want to convert this to a one-hot encoded tensor using the nn.functional.one_hot function:
n = 24
one_hot = torch.nn.functional.one_hot(indices, n)
but this expects a tensor of indices and, honestly, I am not sure how to get those. The only tensor I have is the label tensor of the shape described above, and it contains values ranging from 1 to 24, not indices.
How can I get a tensor of indices from my tensor? Thanks in advance.
If the error you are getting is this one:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: one_hot is only applicable to index tensor.
Maybe you just need to convert to int64:
import torch
# random Tensor with the shape you said
indices = torch.Tensor(1, 1, 128, 128, 128).random_(1, 24)
# indices.shape => torch.Size([1, 1, 128, 128, 128])
# indices.dtype => torch.float32
n = 24
one_hot = torch.nn.functional.one_hot(indices.to(torch.int64), n)
# one_hot.shape => torch.Size([1, 1, 128, 128, 128, 24])
# one_hot.dtype => torch.int64
You can use indices.long() too.
The torch.as_tensor function can also be helpful if your labels are stored in a list or numpy array:
import torch
import random
n_classes = 5
n_samples = 10
# Create list n_samples random labels (can also be numpy array)
labels = [random.randrange(n_classes) for _ in range(n_samples)]
# Convert to torch Tensor
labels_tensor = torch.as_tensor(labels)
# Create one-hot encodings of labels
one_hot = torch.nn.functional.one_hot(labels_tensor, num_classes=n_classes)
print(one_hot)
The output one_hot has shape (n_samples, n_classes) and should look something like:
tensor([[0, 0, 0, 1, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 1, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 0],
[1, 0, 0, 0, 0]])
Usually, this issue can be solved by adding .long().
For example:
import torch
import torch.nn.functional as F
labels=torch.Tensor([[0, 2, 1]])
n_classes=3
encoded=F.one_hot(labels, n_classes)
It gives an error:
RuntimeError: one_hot is only applicable to index tensor.
To solve this issue, use long().
import torch
import torch.nn.functional as F
labels=torch.Tensor([[0, 2, 1]]).long()
n_classes=3
encoded=F.one_hot(labels, n_classes)
Now it executes without errors.
For example, I have a CNN that tries to predict digits from the MNIST dataset (code written using Keras). It has 10 outputs, which form a softmax layer. Only one of the outputs can be true (one output for each digit from 0 to 9):
Real: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Predicted: [0.02, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
The sum of the predicted values is equal to 1.0, by the definition of softmax.
Let's say I have a task where I need to classify some objects that can fall in several categories:
Real: [0, 1, 0, 1, 0, 1, 0, 0, 0, 1]
So I need to normalize in some other way. I need a function that gives values in the range [0, 1] and whose sum can be larger than 1.
I need something like this:
Predicted: [0.1, 0.9, 0.05, 0.9, 0.01, 0.8, 0.1, 0.01, 0.2, 0.9]
Each number is the probability that the object falls into the given category. After that, I can use a threshold like 0.5 to decide which categories a given object falls into.
The following questions arise:
Which activation function can be used for this?
Maybe this function already exists in Keras?
Maybe you can propose some other way to make predictions in this case?
Your problem is one of multi-label classification, and in the context of Keras it is discussed, for example, here: https://github.com/fchollet/keras/issues/741
In short, the suggested solution in Keras is to replace the softmax layer with a sigmoid layer and use binary_crossentropy as your cost function.
An example from that thread:
# Build a classifier optimized for maximizing f1_score (uses class_weights)
clf = Sequential()
clf.add(Dropout(0.3))
clf.add(Dense(xt.shape[1], 1600, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1600, 1200, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(1200, 800, activation='relu'))
clf.add(Dropout(0.6))
clf.add(Dense(800, yt.shape[1], activation='sigmoid'))
clf.compile(optimizer=Adam(), loss='binary_crossentropy')
clf.fit(xt, yt, batch_size=64, nb_epoch=300, validation_data=(xs, ys), class_weight=W, verbose=0)
preds = clf.predict(xs)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
print f1_score(ys, preds, average='macro')
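That snippet uses an older Keras and Python 2 API; a minimal sketch of the same idea with the current tf.keras API (layer sizes, array names and epochs are illustrative, not tuned):

import tensorflow as tf

n_features = 100   # illustrative input dimension
n_labels = 10      # illustrative number of labels

# Sigmoid outputs + binary cross-entropy give independent per-label probabilities
inputs = tf.keras.Input(shape=(n_features,))
x = tf.keras.layers.Dense(256, activation='relu')(inputs)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(n_labels, activation='sigmoid')(x)
clf = tf.keras.Model(inputs, outputs)
clf.compile(optimizer='adam', loss='binary_crossentropy')

# xt, yt, xs, ys would be your train/validation arrays with multi-hot targets:
# clf.fit(xt, yt, batch_size=64, epochs=30, validation_data=(xs, ys))

# Threshold the per-label probabilities at 0.5 to get the predicted label sets:
# preds = (clf.predict(xs) >= 0.5).astype(int)
# print(f1_score(ys, preds, average='macro'))  # sklearn.metrics.f1_score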