How can we give a categorical variable as input to an embedding layer in Keras and train that embedding layer?

Let's say we have a data frame with a categorical column that has 7 categories: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. Let's say we have 100 data points and we want to give the categorical data as input to an embedding layer and train that layer using Keras. How do we actually achieve this? Can you share some intuition with code examples?
I have tried the code below, but it gives me an error: "ValueError: "input_length" is 1, but received input has shape (None, 26)". I have referred to this blog https://medium.com/@satnalikamayank12/on-learning-embeddings-for-categorical-data-using-keras-165ff2773fc9, but I couldn't work out how to apply it to my particular case.
import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.layers import Input, Embedding, Flatten

l_encoder = LabelEncoder()
l_encoder.fit(X_train["Weekdays"])
encoded_weekdays_train = l_encoder.transform(X_train["Weekdays"])
encoded_weekdays_test = l_encoder.transform(X_test["Weekdays"])

no_of_unique_cat = len(X_train.school_state.unique())
embedding_size = min(np.ceil(no_of_unique_cat / 2), 50)
embedding_size = int(embedding_size)
vocab = no_of_unique_cat + 1

# Get the flattened embedding output for the categorical column
input_layer2 = Input(shape=(embedding_size,))
embedding = Embedding(input_dim=vocab, output_dim=embedding_size,
                      input_length=1, trainable=True)(input_layer2)
flatten_school_state = Flatten()(embedding)
I want to know: in the case of 7 categories, what should the shape of input_layer2 be? What should the vocab size, output dim and input_length be? Can anyone explain, or correct my code? Your insights will be really helpful.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-e28d41acae85> in <module>
      1 #Get the flattened LSTM output for input text
      2 input_layer2 = Input(shape=(embedding_size,))
----> 3 embedding = Embedding(input_dim=vocab, output_dim=embedding_size, input_length=1, trainable=True)(input_layer2)
      4 flatten_school_state = Flatten()(embedding)

~/anaconda3/lib/python3.7/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    472             if all([s is not None
    473                     for s in to_list(input_shape)]):
--> 474                 output_shape = self.compute_output_shape(input_shape)
    475             else:
    476                 if isinstance(input_shape, list):

~/anaconda3/lib/python3.7/site-packages/keras/layers/embeddings.py in compute_output_shape(self, input_shape)
    131                     raise ValueError(
    132                         '"input_length" is %s, but received input has shape %s' %
--> 133                         (str(self.input_length), str(input_shape)))
    134                 elif s1 is None:
    135                     in_lens[i] = s2

ValueError: "input_length" is 1, but received input has shape (None, 26)

embedding_size can never be the input size.
A Keras Embedding layer takes integers as input, so your data should be numbers from 0 to 6.
If your 100 data points form a sequence of days, you cannot restrict the length of the sequences in the embedding to 1.
Your input shape should be (length_of_sequence,), which means your training data should have shape (any, length_of_sequence); that is probably (1, 100) by your description.
All the rest is automatic.
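To make that concrete, here is a minimal sketch of the 7-category case (the random data and the toy regression head are invented for illustration; only the Embedding setup mirrors the question). Each weekday is label-encoded to an integer 0 to 6, each sample carries a single category, so the input shape is (1,), vocab is 7, and output_dim is min(ceil(7/2), 50) = 4:

import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.models import Model
from keras.layers import Input, Embedding, Flatten, Dense

# Hypothetical data: 100 rows, one weekday per row.
days = np.random.choice(["Monday", "Tuesday", "Wednesday", "Thursday",
                         "Friday", "Saturday", "Sunday"], size=100)
codes = LabelEncoder().fit_transform(days).reshape(-1, 1)  # ints 0..6, shape (100, 1)

vocab = 7                                  # 7 distinct categories, indices 0..6
embedding_size = 4                         # min(ceil(7 / 2), 50) = 4

day_input = Input(shape=(1,))              # one integer category per sample
emb = Embedding(input_dim=vocab, output_dim=embedding_size,
                input_length=1, trainable=True)(day_input)
flat = Flatten()(emb)                      # (None, 1, 4) -> (None, 4)
out = Dense(1)(flat)                       # toy head so the embedding gets trained

model = Model(inputs=day_input, outputs=out)
model.compile(optimizer="adam", loss="mse")
model.fit(codes, np.random.randn(100, 1), epochs=2, verbose=0)  # dummy targets

Since LabelEncoder produces indices 0 to 6, input_dim=7 is sufficient (input_dim must be at least the maximum index + 1); the +1 in the question's code is only needed if you reserve an extra index for padding or unknown values. If instead the 100 days form one ordered sequence, you would use Input(shape=(100,)) and feed a (1, 100) array, as described above.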

Related

Pre-trained embedding layer: tf.constant with unsupported shape

I am going to use pre-trained word embeddings in a Keras model. My matrix weights are stored in matrix.w2v.wv.vectors.npy and have shape (150854, 100).
Now when I add the embedding layer in the Keras model with different parameters as follows:
model.add(Embedding(5000, 100,
                    embeddings_initializer=keras.initializers.Constant(emb_matrix),
                    input_length=875, trainable=False))
I get the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-61-8731e904e60a> in <module>()
      1 model = Sequential()
      2
----> 3 model.add(Embedding(5000, 100,
                            embeddings_initializer=keras.initializers.Constant(emb_matrix),
                            input_length=875, trainable=False))
      4 model.add(Conv1D(128, 10, padding='same', activation='relu'))
      5 model.add(MaxPooling1D(10))

22 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    323     raise TypeError("Eager execution of tf.constant with unsupported shape "
    324                     "(value has %d elements, shape is %s with %d elements)." %
--> 325                     (num_t, shape, shape.num_elements()))
    326
    327

TypeError: Eager execution of tf.constant with unsupported shape (value has 15085400 elements, shape is (5000, 100) with 500000 elements).
Kindly tell me where I am making a mistake.
Your embedding layer expects a vocabulary of 5,000 words and initializes an embedding matrix of shape 5000×100. However, the word2vec model that you are trying to load has a vocabulary of 150,854 words.
You either need to increase the capacity of the embedding layer or truncate the embedding matrix to keep only the most frequent words.
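As a sketch of both options (assuming emb_matrix is the array loaded from matrix.w2v.wv.vectors.npy, and, for the truncation route, that its rows are ordered from most to least frequent word):

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Embedding

emb_matrix = np.load('matrix.w2v.wv.vectors.npy')   # shape (150854, 100)

# Option 1: grow the layer to the full word2vec vocabulary.
model = Sequential()
model.add(Embedding(emb_matrix.shape[0], 100,
                    embeddings_initializer=keras.initializers.Constant(emb_matrix),
                    input_length=875, trainable=False))

# Option 2: keep only the first 5,000 rows (valid only if rows are sorted
# by word frequency; your token indices must then also stay below 5,000).
model = Sequential()
model.add(Embedding(5000, 100,
                    embeddings_initializer=keras.initializers.Constant(emb_matrix[:5000]),
                    input_length=875, trainable=False))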

How to solve "logits and labels must have the same first dimension" error

I'm trying out different neural network architectures for word-based NLP.
So far I've used bidirectional models, embedding layers and models with GRUs, guided by this tutorial: https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571, and it all worked out well.
When I tried using LSTMs, however, I get an error saying:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
How can I solve this?
My source and target dataset consists of 7200 sample sentences. They are integer tokenized and embedded. The source dataset is post padded to match the length of the target dataset.
Here is my model and the relevant code:
lstm_model = Sequential()
lstm_model.add(Embedding(src_vocab_size, 128, input_length=X.shape[1], input_shape=X.shape[1:]))
lstm_model.add(LSTM(128, return_sequences=False, dropout=0.1, recurrent_dropout=0.1))
lstm_model.add(Dense(128, activation='relu'))
lstm_model.add(Dropout(0.5))
lstm_model.add(Dense(target_vocab_size, activation='softmax'))
lstm_model.compile(optimizer=Adam(0.002), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

history = lstm_model.fit(X, Y, batch_size=32, callbacks=CALLBACK, epochs=100,
                         validation_split=0.25)  # At this line the error is raised!
With the shapes:
X.shape = (7200, 147)
Y.shape = (7200, 147, 1)
src_vocab_size = 188
target_vocab_size = 186
I've already looked at similar questions on here and tried adding a Reshape layer
simple_lstm_model.add(Reshape((-1,)))
but this only causes the following error:
"TypeError: __int__ returned non-int (type NoneType)"
It's really weird as I preprocess the dataset the same way for all models and it works just fine except for the above.
You should set return_sequences=True and return_state=False when calling the LSTM constructor.
In your snippet, the LSTM only returns its last state instead of the sequence of states for every input embedding. In theory, you could have spotted it from the error message:
logits and labels must have the same first dimension, got logits shape [32,186] and labels shape [4704]
The logits should be three-dimensional: batch size × sequence length × number of classes. The length of the sequences is 147, and indeed 32 × 147 = 4704 (the number of your labels). This could have told you that the sequence-length dimension had disappeared.
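A sketch of the corrected model (hyperparameters copied from the question; the vocabulary sizes and sequence length are defined here as stand-ins so the snippet runs on its own):

from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.optimizers import Adam

src_vocab_size, target_vocab_size, seq_len = 188, 186, 147  # from the question

lstm_model = Sequential()
lstm_model.add(Embedding(src_vocab_size, 128, input_length=seq_len))
lstm_model.add(LSTM(128, return_sequences=True,        # keep the time axis
                    dropout=0.1, recurrent_dropout=0.1))
lstm_model.add(Dense(128, activation='relu'))          # applied per timestep
lstm_model.add(Dropout(0.5))
lstm_model.add(Dense(target_vocab_size, activation='softmax'))
# Logits are now (batch, 147, 186); labels of shape (batch, 147, 1)
# match under sparse_categorical_crossentropy.
lstm_model.compile(optimizer=Adam(0.002),
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])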

How does one use 3D convolutions on standard 3 channel images?

I am trying to use 3D convolutions on the CIFAR-10 data set (just for fun). I see in the docs that the input is usually a 5D tensor (N, C, D, H, W). Am I really forced to pass 5-dimensional data?
The reason I am skeptical is that a 3D convolution simply means my kernel moves across 3 dimensions/directions. So technically I could have 3D, 4D, 5D or even 100D tensors and it should all work, as long as the input is at least a 3D tensor. Is that not right?
I tried it real quick and it did give an error:
import torch


def conv3d_example():
    N, C, H, W = 1, 3, 7, 7
    img = torch.randn(N, C, H, W)
    ##
    in_channels, out_channels = 1, 4
    kernel_size = (2, 3, 3)
    conv = torch.nn.Conv3d(in_channels, out_channels, kernel_size)
    ##
    out = conv(img)
    print(out)
    print(out.size())


##
conv3d_example()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-29c73923cc64> in <module>
     15
     16 ##
---> 17 conv3d_example()

<ipython-input-3-29c73923cc64> in conv3d_example()
     10     conv = torch.nn.Conv3d(in_channels, out_channels, kernel_size)
     11     ##
---> 12     out = conv(img)
     13     print(out)
     14     print(out.size())

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    491             result = self._slow_forward(*input, **kwargs)
    492         else:
--> 493             result = self.forward(*input, **kwargs)
    494         for hook in self._forward_hooks.values():
    495             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    474                         self.dilation, self.groups)
    475         return F.conv3d(input, self.weight, self.bias, self.stride,
--> 476                         self.padding, self.dilation, self.groups)
    477
    478

RuntimeError: Expected 5-dimensional input for 5-dimensional weight 4 1 2 3, but got 4-dimensional input of size [1, 3, 7, 7] instead
cross posted:
https://discuss.pytorch.org/t/how-does-one-use-3d-convolutions-on-standard-3-channel-images/53330
Consider the following scenario. You have a 3 channel NxN image. This image will have size of 3xNxN in pytorch (ignoring the batch dimension for now).
Say you pass this image to a 2D convolution layer with no bias, kernel size 5x5, padding of 2, and input/output channels of 3 and 10 respectively.
What's actually happening when we apply this layer to the input image?
You can think of it like this...
For each of the 10 output channels there is a kernel of size 3x5x5. A 3D convolution is applied to the 3xNxN input image using this kernel, which can be thought of as unpadded in the first dimension. The result of this convolution is a 1xNxN feature map.
Since there are 10 output channels, there are 10 of the 3x5x5 kernels. After all kernels have been applied, the outputs are stacked into a single 10xNxN tensor.
So really, in the classical sense, a 2D convolution layer is already performing a 3D convolution.
Similarly, a 3D convolution layer is really doing a 4D convolution, which is why you need 5-dimensional input.
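A quick numerical check of that claim (my own sketch, not part of the original answer): reuse a Conv2d's kernels inside a Conv3d, treating the 3-channel image as a 1-channel volume of depth 3, and the two outputs coincide:

import torch

torch.manual_seed(0)
img = torch.randn(1, 3, 8, 8)                      # N, C, H, W

conv2d = torch.nn.Conv2d(3, 10, kernel_size=5, padding=2, bias=False)
out2d = conv2d(img)                                # (1, 10, 8, 8)

# Same kernels viewed as 10 kernels of size 1x3x5x5, unpadded in depth.
conv3d = torch.nn.Conv3d(1, 10, kernel_size=(3, 5, 5),
                         padding=(0, 2, 2), bias=False)
conv3d.weight.data = conv2d.weight.data.unsqueeze(1)
out3d = conv3d(img.unsqueeze(1))                   # input (1, 1, 3, 8, 8)

print(torch.allclose(out2d, out3d.squeeze(2), atol=1e-6))  # True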
Let's review what we know. For a 3D convolution we will need to address these:
N: the mini-batch size (how many sequences we want to feed at one go)
Cin: the number of channels in our input (3 if our image is RGB)
D: the depth, in other words the number of images/frames in one input sequence (if we are dealing with videos, this is the number of frames)
H: the height of the image/frame
W: the width of the image/frame
So now that we know what's needed, it should be easy to get this going.
In your example, you are missing the depth in the input; since you have a single RGB image, the depth (or time) dimension of your input is 1.
You also have the wrong in_channels: it should be C (3 in your case, since you appear to have an RGB image).
You also need to fix your kernel dimensions, as the kernel has the wrong depth as well. Again, since we are dealing with a single image and not a sequence of images, the depth is 1. Were you to have a depth of k in your input, you could choose any kernel depth n with 1 <= n <= k.
Now you should be able to run your snippet successfully.
def conv3d_example():
    # for deterministic output only
    torch.random.manual_seed(0)
    N, C, D, H, W = 1, 3, 1, 7, 7
    img = torch.randn(N, C, D, H, W)
    ##
    in_channels = C
    out_channels = 4
    kernel_size = (1, 3, 3)
    conv = torch.nn.Conv3d(in_channels, out_channels, kernel_size)
    ##
    out = conv(img)
    print(out)
    print(out.size())
results in :
In [3]: conv3d_example()
tensor([[[[[ 0.9368, -0.6973, 0.1359, 0.2023, -0.3149],
[-0.4601, 0.2668, 0.3414, 0.6624, -0.6251],
[-1.0212, -0.0767, 0.2693, 0.9537, -0.4375],
[ 0.6981, -0.1586, -0.3076, 0.1973, -0.2972],
[-0.0747, -0.8704, 0.1757, -0.4161, -0.3464]]],
[[[-0.4710, -0.7841, -1.1406, -0.6413, 0.9183],
[-0.2473, 0.2532, -1.0443, -0.8634, -0.8797],
[ 0.5243, -0.4383, 0.1375, -0.7561, 0.7913],
[-1.1216, -0.4496, 0.5481, 0.1034, -1.0036],
[-0.0941, -0.1458, -0.1438, -1.0257, -0.4392]]],
[[[ 0.5196, 0.3102, 0.5299, -0.0126, 0.7945],
[ 0.3721, -1.3339, -0.5849, -0.2701, 0.4842],
[-0.2661, 0.9777, -0.3328, -0.1730, -0.6360],
[ 0.4960, 0.2348, 0.5183, -0.2935, 0.1777],
[-0.2672, 0.0233, -0.5573, 0.8366, 0.6082]]],
[[[-0.1565, -1.7331, -0.2015, -1.1708, 0.3099],
[-0.3667, 0.1985, -0.4940, 0.4044, -0.8000],
[ 0.2814, -0.6172, -0.4466, -0.6098, 0.0983],
[-0.5814, -0.2825, -0.1321, 0.5536, -0.4767],
[-0.3337, 0.3160, -0.4748, -0.7694, -0.0705]]]]],
grad_fn=<SlowConv3DBackward0>)
torch.Size([1, 4, 1, 5, 5])

Error on executing sess.run() ValueError: setting an array element with a sequence

This error is raised when I try to run the training step. The dataset is the MNIST dataset from Kaggle, and I'm using a neural network to predict the handwritten digits:
Input Data : [33600, 784] reshaped into [784, 33600]
Neural network architecture:
Layer 1: W1 is 1000 x 784, ReLU
Layer 2: W2 is 1000 x 1000, ReLU
Layer 3: W3 is 500 x 1000, ReLU
Layer 4: W4 is 200 x 500, ReLU
Layer 5: W5 is 10 x 200, softmax
No biases used.
Code:
print(X_train[:, 0].reshape(-1, 1).shape, " ", y_train[:, 0].reshape(-1, 1).shape)
Output: (784, 1) (10, 1)
Code:
X, Y = tf.placeholder(tf.float32, [784, None]), tf.placeholder(tf.float32, [10, None])
logits = forward_propagation(X, parameters)
cost = compute_cost(logits, Y)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, c = sess.run([optimizer, cost], feed_dict={X: X_train[:, 0].reshape(-1, 1),
                                                  Y: y_train[:, 0].reshape(-1, 1)})
    print(c)
Output:
ValueError                                Traceback (most recent call last)
<ipython-input-41-f78f499b0606> in <module>()
      8 with tf.Session() as sess:
      9     sess.run(tf.global_variables_initializer())
---> 10     _,c = sess.run([optimizer,cost], feed_dict={X:np.asarray(X_train), Y:np.asarray(y_train)})
     11     print(c)
.......
.......
ValueError: setting an array element with a sequence.
Please correct the code if you can.
I found the solution. As mentioned in the answers to many other similar questions, the problem generally lies in the shape and type of the arrays provided to the feed_dict.
My main focus was on X: X_train[:, 0].reshape(-1, 1), but it had the correct shape and type. The error was in Y: y_train[:, 0].reshape(-1, 1). I could not detect this error because I had applied one-hot encoding to y_train but forgot to call the .toarray() method after transforming. So the shape of y_train appeared to be correct, but it was actually wrong.
As a general suggestion after going through many similar questions: thoroughly check the shapes, types and content of the arrays being fed to the feed_dict.
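For anyone hitting the same thing, a small sketch of the pitfall (the label values here are made up): scikit-learn's OneHotEncoder returns a SciPy sparse matrix by default, whose shape looks right but which feed_dict cannot consume until you densify it:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

labels = np.array([[0], [3], [1]])                 # made-up digit labels
enc = OneHotEncoder(categories=[np.arange(10)])
y_sparse = enc.fit_transform(labels)               # scipy.sparse matrix
y_dense = y_sparse.toarray()                       # dense ndarray, safe to feed

print(type(y_sparse))                              # <class 'scipy.sparse...'>
print(y_dense.shape)                               # (3, 10)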

IndexError: List Index out of range Keras Tokenizer

I'm working with the sentiment140 dataset to try to learn sentiment analysis using RNNs. I found a tutorial online that uses the keras.imdb data source, but I want to use my own data source, so I have tried to adapt the code to my own data.
Tutorial: https://towardsdatascience.com/a-beginners-guide-on-sentiment-analysis-with-rnn-9e100627c02e
The data preprocessing involves extracting the series data, then tokenizing and padding it before sending it to the model for training. I performed these operations in my code below, but whenever I try to run the training I get "if isinstance(data[0], list): IndexError: list index out of range". I did not define data, which leads me to believe I did something that Keras or TensorFlow did not like. Any ideas as to what is causing this error?
My data is currently in a csv file format with the headers being SENTIMENT and TEXT. SENTIMENT is 0 for negative and 1 for positive. TEXT is the processed tweet that was collected. Here is a sample.
Dataset CSV (only a few lines to save space)
SENTIMENT,TEXT
0,about to file tax
0,ahh i hate dogs
1,My paycheck came in today
1,lot to do before chi this weekend
1,lol love food
Code
import pandas as pd
import keras
import keras.preprocessing.text as kpt
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import json
import numpy as np

# Load in DS
df = pd.read_csv('./train.csv')
print(df.head())

# Create sequence
vocabulary_size = 1000
tokenizer = Tokenizer(num_words=vocabulary_size, split=' ')
tokenizer.fit_on_texts(df['TEXT'].values)
X_train = tokenizer.texts_to_sequences(df['TEXT'].values)

# Pad sequence
X_train = pad_sequences(X_train)
print(X_train)

# Get sentiment
y_train = df['SENTIMENT'].tolist()

# Create model
max_words = 24
from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout

embedding_size = 32
model = Sequential()
model.add(Embedding(vocabulary_size, embedding_size, input_length=max_words))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

batch_size = 64
num_epochs = 3
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]
model.fit(X_train2, y_train2,
          validation_data=(X_valid, y_valid),
          batch_size=batch_size,
          epochs=num_epochs)
Output
Using TensorFlow backend.
SENTIMENT TEXT
0 0 aww that be bummer You shoulda get david carr ...
1 0 be upset that he can not update his facebook b...
2 0 I dive many time for the ball manage to save t...
3 0 my whole body feel itchy and like its on fire
4 0 no it be not behave at all be mad why be here ...
[[ 0 0 0 ... 3 10 5]
[ 0 0 0 ... 46 47 89]
[ 0 0 0 ... 29 9 96]
...
[ 0 0 0 ... 30 309 310]
[ 0 0 0 ... 0 0 72]
[ 0 0 0 ... 33 312 313]]
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 24, 32) 32000
_________________________________________________________________
lstm_1 (LSTM) (None, 100) 53200
_________________________________________________________________
dense_1 (Dense) (None, 1) 101
=================================================================
Total params: 85,301
Trainable params: 85,301
Non-trainable params: 0
_________________________________________________________________
None
Traceback (most recent call last):
  File "mcve.py", line 50, in <module>
    epochs=num_epochs)
  File "/home/dv/tensorflow/venv/lib/python3.6/site-packages/keras/engine/training.py", line 950, in fit
    batch_size=batch_size)
  File "/home/dv/tensorflow/venv/lib/python3.6/site-packages/keras/engine/training.py", line 787, in _standardize_user_data
    exception_prefix='target')
  File "/home/dv/tensorflow/venv/lib/python3.6/site-packages/keras/engine/training_utils.py", line 79, in standardize_input_data
    if isinstance(data[0], list):
IndexError: list index out of range
JUPYTER NOTEBOOK ERROR
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-25-184505b70981> in <module>()
     20 model.fit(X_train2, y_train2,
     21           batch_size=batch_size,
---> 22           epochs=num_epochs)
     23

~/tensorflow/venv/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
    948             sample_weight=sample_weight,
    949             class_weight=class_weight,
--> 950             batch_size=batch_size)
    951         # Prepare validation data.
    952         do_validation = False

~/tensorflow/venv/lib/python3.6/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    785                 feed_output_shapes,
    786                 check_batch_axis=False,  # Don't enforce the batch size.
--> 787                 exception_prefix='target')
    788
    789         # Generate sample-wise weight values given the `sample_weight` and

~/tensorflow/venv/lib/python3.6/site-packages/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
     77                              'for each key in: ' + str(names))
     78     elif isinstance(data, list):
---> 79         if isinstance(data[0], list):
     80             data = [np.asarray(d) for d in data]
     81         elif len(names) == 1 and isinstance(data[0], (float, int)):

IndexError: list index out of range
Edit
My former suggestion is wrong. I've checked your code and run it, and it works without errors for me.
Then I've looked at the source code, standardize_input_data function. There's a line which checks a data argument:
def standardize_input_data(data,
                           names,
                           shapes=None,
                           check_batch_axis=True,
                           exception_prefix=''):
    """Normalizes inputs and targets provided by users.

    Users may pass data as a list of arrays, dictionary of arrays,
    or as a single array. We normalize this to an ordered list of
    arrays (same order as `names`), while checking that the provided
    arrays have shapes that match the network's expectations.

    # Arguments
        data: User-provided input data (polymorphic).
        ...
At line 79:
    elif isinstance(data, list):
        if isinstance(data[0], list):
            ...
So it looks like, in the error case, the input data is a list, but a list of zero length.
The standardize_input_data function is called inside the Model.fit(...) method through a call to Model._standardize_user_data(...). Through this chain of functions, the passed data argument gets the value of the x argument of Model.fit(x, y, ...). So my guess is that the problem is with the type or content of X_train2 or X_valid. Could you provide the content of X_train2 and X_valid in addition to X_train?
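One thing worth trying in the meantime (an assumption on my part, not something I could verify without your data): hand Keras NumPy arrays rather than plain Python lists for the targets, so the list branch in standardize_input_data is never taken. Patching the question's snippet:

import numpy as np

# Convert targets to an ndarray before the train/validation split,
# so slices of y stay ndarrays instead of Python lists.
y_train = np.asarray(df['SENTIMENT'].values)
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]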
Old wrong suggestion
You should increase vocabulary size by one to deal with out-of-vocabulary tokens, I guess.
I.e, change initialization of the Embedding layer:
model.add(Embedding(vocabulary_size + 1, embedding_size, input_length=max_words))
According to the docs, "input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1".
You may check the maximum value with max(X_train).
Hope it helps!
