I am building my first sentiment analysis model on a small dataset of 1000 reviews, using a TF-IDF approach together with an LSTM. I prepare the training data by preprocessing it and feeding it to the vectorizer as below:
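from sklearn.feature_extraction.text import TfidfVectorizer
def tfidf_features(X_train, X_val, X_test):
    tfidf_vectorizer = TfidfVectorizer(analyzer='word', token_pattern=r'(\S+)', min_df=5, max_df=0.9, ngram_range=(1, 2))
    # Fit the vocabulary on the training set only, then reuse it for val/test
    X_train = tfidf_vectorizer.fit_transform(X_train)
    X_val = tfidf_vectorizer.transform(X_val)
    X_test = tfidf_vectorizer.transform(X_test)
    return X_train, X_val, X_test, tfidf_vectorizer.vocabulary_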
I convert the resulting csr_matrix to a PyTorch tensor using the code below:
def spy_sparse2torch_sparse(data):
    # (conversion body not shown in the post)
    return t
This gives me the following tensor for the training sentences:
tensor(indices=tensor([[ 0, 0, 1, ..., 599, 599, 599],
[ 97, 131, 49, ..., 109, 65, 49]]),
values=tensor([0.6759, 0.7370, 0.6076, ..., 0.3288, 0.3927, 0.3288]),
size=(600, 145), nnz=1607, layout=torch.sparse_coo)
I create a TensorDataset with the code below, also converting my label data from a NumPy array to a torch tensor:
train_data = TensorDataset(train_x, torch.from_numpy(train_y))
I have defined my LSTM network and instantiate it with the following parameters:
n_vocab = len(vocabulary)
n_embed = 100
n_hidden = 256
n_output = 1 # 1 ("positive") or 0 ("negative")
n_layers = 2
net = Sentiment_Lstm(n_vocab, n_embed, n_hidden, n_output, n_layers)
I have also defined the loss and optimizer. I train the model with the code below:
print_every = 100
step = 0
n_epochs = 4  # validation loss increases from ~ epoch 3 or 4
clip = 5  # for gradient clip to prevent exploding gradient problem in LSTM/RNN
for epoch in range(n_epochs):
    h = net.init_hidden(batch_size)
    for inputs, labels in train_loader:
        step += 1
        # making requires_grad = False for the latest set of h
        h = tuple([each.data for each in h])
        net.zero_grad()
        output, h = net(inputs)
        loss = criterion(output.squeeze(), labels.float())
        loss.backward()
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()
        if (step % print_every) == 0:
            net.eval()
            valid_losses = []
            v_h = net.init_hidden(batch_size)
            for v_inputs, v_labels in valid_loader:
                v_h = tuple([each.data for each in v_h])
                v_output, v_h = net(v_inputs)
                v_loss = criterion(v_output.squeeze(), v_labels.float())
                valid_losses.append(v_loss.item())
            net.train()
            print("Epoch: {}/{}".format((epoch+1), n_epochs),
                  "Step: {}".format(step),
                  "Training Loss: {:.4f}".format(loss.item()),
                  "Validation Loss: {:.4f}".format(np.mean(valid_losses)))
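The "making requires_grad = False" step detaches the hidden state between batches, so backpropagation stops at the start of each batch instead of reaching back through all earlier ones. A minimal sketch with a plain nn.LSTM (all sizes are arbitrary assumptions); each.detach() is used here as the usual modern equivalent of each.data for this purpose:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=4, num_layers=2, batch_first=True)
batch = torch.randn(5, 7, 3)                 # (batch, seq_len, features)
# (h_0, c_0), each of shape (num_layers, batch, hidden_size)
h = (torch.zeros(2, 5, 4), torch.zeros(2, 5, 4))

out, h = lstm(batch, h)                      # h now carries this batch's autograd history
h = tuple(each.detach() for each in h)       # cut the graph before the next batch
out2, h2 = lstm(batch, h)                    # gradients from out2 stop at the detached h
```

The detached tuple no longer requires gradients, while the new outputs still do, because they depend on the LSTM's own parameters.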
However, the line output, h = net(inputs) raises RuntimeError: sparse tensors do not have strides.
I could not follow the workarounds suggested on other sites. What exact code change do I need to make to fix this?

PyTorch (at least at the time of writing) does not support multiplying a sparse (S) matrix by another sparse matrix.
Consider a matrix product of c1 and c2 (e.g. torch.mm(c1, c2)), where each of c1 and c2 is either a sparse_coo_tensor or a dense tensor.
Case 1: if both c1 and c2 are S, it raises RuntimeError: sparse tensors do not have strides.
Case 2: if c1 is dense (D) and c2 is S, it raises the same error.
Case 3: only when c1 is S and c2 is D does it work.
My guess is that a matrix multiplication inside your Sentiment_Lstm falls under one of the first two cases, which is what throws this error.
Using a dense input format should make it work.
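A minimal sketch of the dense-input fix, assuming SciPy and a reasonably recent PyTorch. X_csr is a tiny hypothetical stand-in for the TF-IDF matrix; in the real code you would densify train_x (for example with torch.from_numpy(X_train.toarray()).float()) before building the TensorDataset. With 600 reviews and 145 TF-IDF features, as in the tensor shown in the question, the dense matrix is small.

```python
import numpy as np
import torch
from scipy.sparse import csr_matrix

# Hypothetical stand-in for the TF-IDF output (the real one is 600 x 145)
X_csr = csr_matrix(np.array([[0.0, 0.5, 0.0, 0.2],
                             [0.1, 0.0, 0.0, 0.0],
                             [0.0, 0.0, 0.7, 0.3]]))

# Option 1: densify the SciPy matrix, then build a regular (strided) tensor
X_dense = torch.from_numpy(X_csr.toarray()).float()

# Option 2: if you already have a sparse COO tensor, densify it in PyTorch
coo = X_csr.tocoo()
indices = torch.tensor(np.vstack([coo.row, coo.col]), dtype=torch.long)
values = torch.tensor(coo.data, dtype=torch.float32)
X_sparse = torch.sparse_coo_tensor(indices, values, size=coo.shape)
X_from_sparse = X_sparse.to_dense()

# Sparse @ dense (case 3 above) is supported, e.g. via torch.sparse.mm
W = torch.ones(4, 2)
out = torch.sparse.mm(X_sparse, W)
```

Both options give the same dense tensor; option 1 is simpler when you start from the vectorizer output.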


Pytorch VGG16 only returning True after training

I am trying to modify the VGG16 model in PyTorch to do a simple yes/no feature detection (to detect whether one particular feature is in an image). To do this I changed the last layer of the VGG network to output 2 values instead of 1000, which I believe is all that should be necessary. When I test the network with random weights/biases it is around 50% accurate, as you would expect, and when I print the output layer the values vary fairly randomly between -1 and 1. However, after a bit of training the output very quickly shifts so that the second value is positive and the first is negative, until max() returns index 1 (True) every time and the network thinks it has detected the feature in every image.
What am I doing wrong here? I'm very new to PyTorch and machine learning, so I'm not sure what the issue is.
Here's the simplest reproducible example I can manage. I did not include my training/test loaders because they load images from my local disk, but hopefully this is enough code to figure out what is going wrong:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        prediction = model(X)
        loss = loss_fn(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(f"Training...{batch * len(X) / size * 100:.01f}%")

def test_loop(dataloader, model):
    size = len(dataloader.dataset)
    correct = 0
    with torch.no_grad():
        for X, y in dataloader:
            outputs = model(X)
            predictions = outputs.max(1, keepdim=True)[1]
            correct += predictions.eq(y.view_as(predictions)).sum().item()
    print(f'Test set accuracy: {(correct / size) * 100:.01f}%')

network = torchvision.models.vgg16(pretrained=False)
network.classifier[6] = nn.Linear(4096, 2)
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(network.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)
Printing the outputs and their max over dim 1 in the test_loop function gives the following before/after training (I lowered the batch size to 8 here):
# before
# outputs
[4.0089e-02, -1.2475e-04],
[3.2431e-02, -2.2334e-03 ],
[3.7739e-02, 5.7708e-03],
[3.7453e-02, 1.9297e-03],
[4.3812e-02, 5.1457e-05],
[3.8975e-02, 6.3827e-03],
[4.3934e-02, 6.7114e-03],
[3.6315e-02, 8.8174e-03]
# max over dim 1 (values and indices)
values=tensor([0.0401, 0.0324, 0.0377, 0.0375, 0.0438, 0.0390, 0.0439, 0.0363]),
indices=tensor([0, 0, 0, 0, 0, 0, 0, 0])
# after
# outputs
[-0.4314, 0.4763],
[-0.4296, 0.4799],
[-0.3882, 0.4378],
[-0.4257, 0.4682],
[-0.4330, 0.4682],
[-0.3420, 0.3832],
[-0.4467, 0.5142],
[-0.3902, 0.4175]
# max over dim 1 (values and indices)
values=tensor([0.4763, 0.4799, 0.4378, 0.4682, 0.4635, 0.3832, 0.5142, 0.4175]),
indices=tensor([1, 1, 1, 1, 1, 1, 1, 1])
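For reference, a minimal sketch of one complete training step for a 2-class head. It uses a tiny hypothetical stand-in model instead of VGG16 so it runs quickly; the layer sizes, learning rate and fake batch are all assumptions. It also counts how often each class is predicted with torch.bincount, a quick way to see whether predictions have collapsed onto a single class, as in the "after" output above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical stand-in for VGG16 with classifier[6] replaced by nn.Linear(4096, 2)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()              # takes raw logits and int64 class indices
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

X = torch.randn(8, 3, 8, 8)                  # fake batch of 8 small "images"
y = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])   # fake labels: 0 = feature absent, 1 = present

def train_step(X, y):
    model.train()
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()                          # compute new gradients
    optimizer.step()                         # update the weights
    return loss.item()

losses = [train_step(X, y) for _ in range(5)]

model.eval()
with torch.no_grad():
    predictions = model(X).argmax(dim=1)
# Number of images assigned to each class; a result like [0, 8] means every
# image is being predicted as class 1
counts = torch.bincount(predictions, minlength=2)
```

Comparing counts with torch.bincount(y) on the real labels also shows quickly whether the training data itself is heavily imbalanced.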

(pytorch / mse) How can I change the shape of tensor?

Problem definition:
I have to use the MSELoss function as the loss for a classification problem, and it keeps raising an error about the shape of the tensors.
Entire error message:
torch.Size([32, 10]) torch.Size([32])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
in
     53     output = model.forward(images)
     54     print(output.shape, labels.shape)
---> 55     loss = criterion(output, labels)
     56     loss.backward()
     57     optimizer.step()

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/ in forward(self, input, target)
    430     def forward(self, input, target):
--> 431         return F.mse_loss(input, target, reduction=self.reduction)

/opt/conda/lib/python3.7/site-packages/torch/nn/ in mse_loss(input, target, size_average, reduce, reduction)
   2213         ret = torch.mean(ret) if reduction == 'mean' else torch.sum(ret)
   2214     else:
-> 2215         expanded_input, expanded_target = torch.broadcast_tensors(input, target)
   2216         ret = torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
   2217     return ret

/opt/conda/lib/python3.7/site-packages/torch/ in
     50                 [0, 1, 2]])
     51     """
---> 52     return torch._C._VariableFunctions.broadcast_tensors(tensors)

RuntimeError: The size of tensor a (10) must match the size of tensor b (32) at non-singleton dimension 1
How can I reshape the tensor, and which tensor (output or labels) should I change to calculate the loss?
Entire code is attached below.
import numpy as np
import torch
# Loading the Fashion-MNIST dataset
from torchvision import datasets, transforms
# Get GPU Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('MNIST_data/', download = True, train = True, transform = transform)
testset = datasets.FashionMNIST('MNIST_data/', download = True, train = False, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 32, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 32, shuffle = True, num_workers=4)
# Examine a sample
dataiter = iter(trainloader)
images, labels = next(dataiter)
# Define the network architecture
from torch import nn, optim
import torch.nn.functional as F
model = nn.Sequential(nn.Linear(784, 128),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim = 1))
model.to(device)
# Define the loss
criterion = nn.MSELoss()
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)
# Define the epochs
epochs = 5
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten Fashion-MNIST images into a 784 long vector
        images = images.to(device)
        labels = labels.to(device)
        images = images.view(images.shape[0], -1)
        # Training pass
        optimizer.zero_grad()
        output = model.forward(images)
        print(output.shape, labels.shape)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    test_loss = 0
    accuracy = 0
    # Turn off gradients for validation, saves memory and computation
    with torch.no_grad():
        # Set the model to evaluation mode
        model.eval()
        # Validation pass
        for images, labels in testloader:
            images = images.to(device)
            labels = labels.to(device)
            images = images.view(images.shape[0], -1)
            ps = model(images)
            test_loss += criterion(ps, labels)
            top_p, top_class = ps.topk(1, dim = 1)
            equals = top_class == labels.view(*top_class.shape)
            accuracy += torch.mean(equals.type(torch.FloatTensor))
    model.train()
    print("Epoch: {}/{}..".format(e+1, epochs),
          "Training loss: {:.3f}..".format(running_loss/len(trainloader)),
          "Test loss: {:.3f}..".format(test_loss/len(testloader)),
          "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
From the output you print just before the error: torch.Size([32, 10]) torch.Size([32]).
The first is what the model gives you and the second is the labels from trainloader; labels in that form are normally used with a loss such as nn.CrossEntropyLoss.
The full error log shows that the error comes from this line:
loss = criterion(output, labels)
One way to make this work is to one-hot encode the labels. For the sake of laziness, I'd write it like this:
ones = torch.eye(10).to(device)  # 10 = number of classes
labels = ones.index_select(0, labels)
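A minimal sketch of that one-hot trick with random stand-in tensors (the batch size of 32 and 10 classes match the shapes printed in the question). Row i of the identity matrix is the one-hot vector for class i, so index_select picks one row per label; torch.nn.functional.one_hot is shown as a built-in equivalent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.randn(32, 10)            # stand-in for the model output, shape [32, 10]
labels = torch.randint(0, 10, (32,))    # class indices, shape [32]

# Identity-matrix one-hot, as in the answer: shape [32, 10]
one_hot = torch.eye(10).index_select(0, labels)

# Built-in equivalent; F.one_hot returns int64, so cast to float for MSELoss
one_hot_builtin = F.one_hot(labels, num_classes=10).float()

loss = nn.MSELoss()(output, one_hot)    # shapes now match: [32, 10] vs [32, 10]
```

On the GPU, build the identity matrix on the same device as labels, as the .to(device) in the answer does.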
Alternatively, you can change your loss function from nn.MSELoss() to nn.CrossEntropyLoss(). Cross-entropy loss is generally preferable to MSE for categorical tasks like this, and PyTorch's implementation takes care of the shape conversion under the hood, so you can give it a vector of class scores (logits) and a single class label per sample.
Fundamentally, your model predicts which class the input belongs to by calculating a score (you might call it a 'confidence score') for each possible class. So with 10 classes, the model's output for one input is a 10-dimensional vector (in PyTorch, a tensor of shape [10]), and the prediction is the index of the highest score. Often one applies the softmax function to convert these scores into a probability distribution, so that all scores lie between 0 and 1 and sum to 1.
Cross-entropy is then a common choice of loss function for this task: it compares the predictions to the one-hot encoded label. E.g. with 3 classes, the label [1, 0, 0] represents the first class, while a prediction might look like [0.7, 0.1, 0.2]. In PyTorch, nn.CrossEntropyLoss() expects each label as a single integer class index, since there's no real need to move long, sparse one-hot vectors around in memory. So this loss function performs the comparison you want and, I'm guessing, is implemented more efficiently than actually creating one-hot encodings.
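A small check of that equivalence on made-up logits (4 samples, 3 classes, values chosen only for illustration): nn.CrossEntropyLoss with class indices gives the same number as explicitly averaging the negative log-softmax score at the one-hot position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # raw scores for 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 0])     # one class index per sample, not one-hot

ce = nn.CrossEntropyLoss()(logits, labels)

# The same quantity computed explicitly via one-hot encoding
one_hot = F.one_hot(labels, num_classes=3).float()
manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```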

Translating LSTM model from Keras to Pytorch

I am having a hard time translating a fairly simple LSTM model from Keras to PyTorch. X corresponds to 1152 samples of 90 timesteps, each timestep having only 1 dimension. y is a single prediction at t = 91 for each of the 1152 samples.
In Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
import numpy as np
import pandas as pd
X = pd.read_csv('X.csv', header = None).values
y = pd.read_csv('y.csv', header = None).values
# From Keras documentation:
# Input shape 3D tensor with shape (batch_size, timesteps, input_dim).
X = np.reshape(X, (1152, 90, 1))
regressor = Sequential()
regressor.add(LSTM(units = 100, return_sequences = True, input_shape = (90, 1)))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(LSTM(units = 50, return_sequences = True))
regressor.add(LSTM(units = 50))
regressor.add(Dense(units = 1, activation = 'linear'))
regressor.compile(optimizer = 'rmsprop', loss = 'mean_squared_error', metrics = ['mean_absolute_error'])
regressor.fit(X, y, epochs = 10, batch_size = 32)
... leads me to:
# Epoch 10/10
# 1152/1152 [==============================] - 33s 29ms/sample - loss: 0.0068 - mean_absolute_error: 0.0628
Then in PyTorch:
import pandas as pd
import torch
from torch import nn, optim
from sklearn.metrics import mean_absolute_error
X = pd.read_csv('X.csv', header = None).values
y = pd.read_csv('y.csv', header = None).values
X = torch.tensor(X, dtype = torch.float32)
y = torch.tensor(y, dtype = torch.float32)
dataset = torch.utils.data.TensorDataset(X, y)
loader = torch.utils.data.DataLoader(dataset, batch_size = 32, shuffle = True)

class regressor_LSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm1 = nn.LSTM(input_size = 1, hidden_size = 100)
        self.lstm2 = nn.LSTM(100, 50)
        self.lstm3 = nn.LSTM(50, 50, dropout = 0.3, num_layers = 2)
        self.dropout = nn.Dropout(p = 0.3)
        self.linear = nn.Linear(in_features = 50, out_features = 1)
    def forward(self, X):
        # From the PyTorch documentation:
        # **input** of shape `(seq_len, batch, input_size)`
        X = X.view(90, 32, 1)
        # I am discarding hidden/cell states since in Keras I am using a stateless approach
        X, _ = self.lstm1(X)
        X = self.dropout(X)
        X, _ = self.lstm2(X)
        X = self.dropout(X)
        X, _ = self.lstm3(X)
        X = self.dropout(X)
        X = self.linear(X)
        return X

regressor = regressor_LSTM()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(regressor.parameters())
for epoch in range(10):
    running_loss = 0.
    running_mae = 0.
    for i, data in enumerate(loader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = regressor(inputs)
        outputs = outputs[-1].view(*labels.shape)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        mae = mean_absolute_error(labels.detach().cpu().numpy().flatten(), outputs.detach().cpu().numpy().flatten())
        running_mae += mae
    print('EPOCH %3d: loss %.5f - MAE %.5f' % (epoch+1, running_loss/len(loader), running_mae/len(loader)))
... leads me to:
# EPOCH 10: loss 0.04220 - MAE 0.16762
You can see that both the loss and the MAE are quite different (PyTorch's are much higher). If I use the PyTorch model to predict values, they all come out as the same constant.
What am I doing wrong?
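One thing worth checking in the forward() above: Tensor.view() only reinterprets the underlying memory, so viewing a (32, 90) batch as (90, 32, 1) does not move the time axis to the front; it regroups consecutive values instead. A minimal sketch on a toy (2, 3) batch (2 samples, 3 timesteps; the sizes, including hidden_size=4, are chosen only for illustration), contrasting view() with a transpose and with nn.LSTM's batch_first=True option:

```python
import torch
import torch.nn as nn

batch = torch.arange(6.0).reshape(2, 3)   # 2 samples x 3 timesteps: [[0, 1, 2], [3, 4, 5]]

# view() keeps memory order: "timestep 0" becomes [0, 1], both taken from sample 0
wrong = batch.view(3, 2, 1)
# transpose, then add a feature axis: timestep 0 is [0, 3], one value per sample
right = batch.t().unsqueeze(-1)           # (seq_len, batch, input_size)

# Alternatively keep the batch axis first and tell the LSTM so
lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
out, _ = lstm(batch.unsqueeze(-1))        # input (batch, seq_len, 1) -> out (batch, seq_len, 4)
```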
Oh, I believe I've made considerable progress. It seems that y is represented differently in Keras and PyTorch. In Keras, we pass it as a single value representing one timestep in the future (at least for the problem I am trying to solve). But in PyTorch, y must be X shifted one timestep into the future, like this:
time_series = [0, 1, 2, 3, 4, 5]
X = [0, 1, 2, 3, 4]
# Keras:
y = [5]
# Pytorch:
y = [1, 2, 3, 4, 5]
This way, PyTorch compares all values in the time slice when calculating the loss. I believe Keras rearranges the data under the hood to conform to this approach, since the code works when fed the variables just like that. But in PyTorch I was computing the loss from only one value (the one I was trying to predict) rather than the whole series, so I believe the model could not correctly capture the time dependency.
Taking this into account, I got:
EPOCH 100: loss 0.00551 - MAE 0.058435
Most importantly, comparing true and predicted values on a separate dataset showed that the patterns were clearly captured by the model.
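A minimal sketch of the two target layouts described above, on the toy series [0, 1, 2, 3, 4, 5] with an untrained single-layer LSTM (hidden size 8 and the linear head are arbitrary assumptions). It only shows the shapes involved and which predictions enter the loss in each case; it does not verify the hypothesis about what Keras does internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
series = torch.arange(6.0)             # time_series = [0, 1, 2, 3, 4, 5]
X = series[:-1]                        # inputs:  [0, 1, 2, 3, 4]
y_last = series[-1:]                   # single-value target: [5]
y_shifted = series[1:]                 # shifted target: [1, 2, 3, 4, 5]

# One sequence shaped (seq_len, batch, input_size) for a default nn.LSTM
inputs = X.view(-1, 1, 1)              # (5, 1, 1); safe here because batch == 1
lstm = nn.LSTM(input_size=1, hidden_size=8)
head = nn.Linear(8, 1)
out, _ = lstm(inputs)                  # (5, 1, 8): one hidden state per timestep
preds = head(out).view(-1)             # (5,): one prediction per timestep

criterion = nn.MSELoss()
loss_last = criterion(preds[-1:], y_last)   # only the final timestep contributes
loss_all = criterion(preds, y_shifted)      # every timestep contributes
```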

Numpy and tensorflow RNN shape representation mismatch

I'm building my first RNN in TensorFlow. After understanding all the concepts regarding the 3D input shape, I came across this issue.
In my numpy version (1.15.4), the shape of a 3D array is represented as (panel, row, column). I will make each dimension different so that it is clearer:
In [1]: import numpy as np
In [2]: arr = np.arange(30).reshape((2,3,5))
In [3]: arr
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
In [4]: arr.shape
Out[4]: (2, 3, 5)
In [5]: np.__version__
Out[5]: '1.15.4'
Here my understanding is: I have two timesteps with each timestep having 3 observations with 5 features in each observation.
However, in TensorFlow "theory" (which I believe is strongly based on numpy), RNN cells expect tensors (i.e. just n-dimensional matrices) of shape [batch_size, timesteps, features], which could be translated to (row, panel, column) in numpy "jargon".
As can be seen, the representations don't match, which leads to errors when feeding numpy data into a placeholder, which in most examples and theory is defined like:
x = tf.placeholder(tf.float32, shape=[None, N_TIMESTEPS_X, N_FEATURES], name='XPlaceholder')
np.reshape() doesn't solve the issue: it changes the dimensions, but it scrambles the data.
I'm using the Dataset API for the first time, but the problems appear once inside the session, not in the Dataset API ops.
I'm using the static_rnn method, and everything works well until I have to feed the data into the placeholder, which obviously results in a shape error.
I have tried changing the placeholder shape to shape=[N_TIMESTEPS_X, None, N_FEATURES]. However, since I'm using the Dataset API, I get errors when making the initializer if I change the XPlaceholder to that shape.
So, to summarize:
First problem: Shape errors with different shape representations.
Second problem: Dataset error when equating the shape representations (I think that either static_rnn or dynamic_rnn would function if this is resolved).
My question is:
Is there anything I'm missing in regard to this different representation logic, which makes the practice confusing?
Could the solution be attained by switching to dynamic_rnn? (Although the shape problems I encounter are related to the Dataset API initializer being fed with shape [N_TIMESTEPS_X, None, N_FEATURES], not to the RNN cell itself.)
Thank you very much for your time.
Full code:
import datetime
import tensorflow as tf
'''The idea is to create xt, yt, xval and yval. My numpy arrays to
be fed are of the following shapes:
The 3D xt array has a shape of: (11, 69579, 74)
The 3D xval array has a shape of: (11, 7732, 74)
The yt array has a shape of: (69579, 3)
The yval array has a shape of: (7732, 3)'''
N_TIMESTEPS_X = xt.shape[0]  ## The stack number
#N_OBSERVATIONS = xt.shape[1]
N_FEATURES = xt.shape[2]
N_OUTPUTS = yt.shape[1]
N_NEURONS_LSTM = 128  ## Number of units in the LSTMCell
N_NEURONS_DENSE = 64  ## Number of units in the Dense layer
N_EPOCHS = 600
### Define the placeholders and gather the data.
train_data = (xt, yt)
validation_data = (xval, yval)
## We define the placeholders as a trick so that we do not run into memory problems associated with feeding the data directly.
'''As an alternative, you can define the Dataset in terms of tf.placeholder() tensors, and feed the NumPy arrays when you initialize an Iterator over the dataset.'''
batch_size = tf.placeholder(tf.int64)
x = tf.placeholder(tf.float32, shape=[None, N_TIMESTEPS_X, N_FEATURES], name='XPlaceholder')
y = tf.placeholder(tf.float32, shape=[None, N_OUTPUTS], name='YPlaceholder')
# Creating the two different dataset objects.
train_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(BATCH_SIZE).repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(BATCH_SIZE)
# Creating the Iterator type that permits to switch between datasets.
itr = tf.data.Iterator.from_structure(train_dataset.output_types, train_dataset.output_shapes)
train_init_op = itr.make_initializer(train_dataset)
validation_init_op = itr.make_initializer(val_dataset)
next_features, next_labels = itr.get_next()
### Create the graph
cellType = tf.nn.rnn_cell.LSTMCell(num_units=N_NEURONS_LSTM, name='LSTMCell')
inputs = tf.unstack(next_features, N_TIMESTEPS_X, axis=0)
'''inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size]'''
RNNOutputs, _ = tf.nn.static_rnn(cell=cellType, inputs=inputs, dtype=tf.float32)
predictionsLayer = tf.layers.dense(inputs=tf.layers.batch_normalization(RNNOutputs[-1]), units=N_NEURONS_DENSE, activation=None, name='Dense_Layer')
### Define the cost function, that will be optimized by the optimizer.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=predictionsLayer, labels=next_labels, name='Softmax_plus_Cross_Entropy'))
optimizer_type = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE, name='AdamOptimizer')
optimizer = optimizer_type.minimize(cost)
### Model evaluation
correctPrediction = tf.equal(tf.argmax(predictionsLayer,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correctPrediction,tf.float32))
#confusionMatrix = tf.confusion_matrix(next_labels, predictionsLayer, num_classes=3, name='ConfMatrix')
N_BATCHES = train_data[0].shape[0] // BATCH_SIZE
## Saving variables so that we can restore them afterwards.
saver = tf.train.Saver()
save_dir = '/home/zmlaptop/Desktop/tfModels/{}_{}'.format(cellType.__class__.__name__, datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
varDict = {'nTimeSteps':N_TIMESTEPS_X, 'BatchSize': BATCH_SIZE, 'nFeatures':N_FEATURES,
           'nNeuronsLSTM':N_NEURONS_LSTM, 'nNeuronsDense':N_NEURONS_DENSE, 'nEpochs':N_EPOCHS,
           'learningRate':LEARNING_RATE, 'optimizerType': optimizer_type.__class__.__name__}
varDicSavingTxt = save_dir + '/varDict.txt'
modelFilesDir = save_dir + '/modelFiles'
logDir = save_dir + '/TBoardLogs'
acc_summary = tf.summary.scalar('Accuracy', accuracy)
loss_summary = tf.summary.scalar('Cost_CrossEntropy', cost)
summary_merged = tf.summary.merge_all()
with open(varDicSavingTxt, 'w') as outfile:
    outfile.write(repr(varDict))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(logDir + '/train', sess.graph)
    validation_writer = tf.summary.FileWriter(logDir + '/validation')
    # initialise iterator with train data
    sess.run(train_init_op, feed_dict = {x : train_data[0], y: train_data[1], batch_size: BATCH_SIZE})
    print('¡Training starts!')
    for epoch in range(N_EPOCHS):
        batchAccList = []
        tot_loss = 0
        for batch in range(N_BATCHES):
            optimizer_output, loss_value, summary = sess.run([optimizer, cost, summary_merged])
            accBatch = sess.run(accuracy)
            batchAccList.append(accBatch)
            tot_loss += loss_value
            if batch % 10 == 0:
                train_writer.add_summary(summary, batch)
        epochAcc = tf.reduce_mean(batchAccList)
        if epoch%10 == 0:
            print("Epoch: {}, Loss: {:.4f}, Accuracy: {}".format(epoch, tot_loss / N_BATCHES, epochAcc))
    #confM =
    #confDic = {'confMatrix': confM}
    #confTxt = save_dir + '/confMDict.txt'
    #with open(confTxt, 'w') as outfile:
    #    outfile.write(repr(confDic))
    # initialise iterator with validation data
    sess.run(validation_init_op, feed_dict = {x : validation_data[0], y: validation_data[1], batch_size:len(validation_data[0])})
    print('Validation Loss: {:4f}, Validation Accuracy: {}'.format(sess.run(cost), sess.run(accuracy)))
    summary_val = sess.run(summary_merged)
    validation_writer.add_summary(summary_val)
    saver.save(sess, modelFilesDir)
Is there anything I'm missing in regard to this different representation logic which makes the practice confusing?
In fact, you were mistaken about the input shapes of static_rnn and dynamic_rnn. The input of static_rnn is [timesteps, batch_size, features] in the form of a list of length timesteps whose elements are 2D tensors of shape [batch_size, features]. The input of dynamic_rnn is either [timesteps, batch_size, features] or [batch_size, timesteps, features], depending on whether time_major is True or False.
Could the solution be attained to switching to dynamic_rnn?
The key is not whether you use static_rnn or dynamic_rnn, but that your data shape matches the required shape. The usual placeholder format is the one in your code, [None, N_TIMESTEPS_X, N_FEATURES], which is also convenient for the Dataset API.
You can use transpose() instead of reshape(): transpose() permutes the dimensions of an array without scrambling the data.
So your code needs to be modified as follows:
# permute the dimensions
xt = xt.transpose([1,0,2])
xval = xval.transpose([1,0,2])
# adjust shape, axis=1 represents timesteps
inputs = tf.unstack(next_features, axis=1)
Any other errors should have nothing to do with the RNN shape.
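A minimal NumPy check of the reshape/transpose difference, using the (2, 3, 5) array from the question and treating axis 0 as timesteps and axis 1 as samples, as in the xt layout:

```python
import numpy as np

arr = np.arange(30).reshape((2, 3, 5))   # (timesteps, samples, features)

transposed = arr.transpose([1, 0, 2])    # (samples, timesteps, features)
reshaped = arr.reshape((3, 2, 5))        # same shape, but values regrouped in memory order

# transposed[s, t] is arr[t, s]: each sample keeps its own sequence of timesteps.
# reshaped[0] is just the first two rows of arr[0], i.e. two different samples
# from timestep 0 now pretend to be one sample's two timesteps.
```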

How do I obtain predictions and probabilities from new data input to a CNN in Tensorflow

I'll preface this by saying this is my first posted question on SO. I've just recently started working with TensorFlow, and have been attempting to apply a convolutional neural network to classify .csv records in a file representing images from scans of microarray data. (FYI: microarrays are a grid of spotted DNA on a glass slide, representing specific DNA target sequences, used to determine whether those DNA targets are present in a sample. The individual pixels represent fluorescence intensity values from 0 to 1.) The file has ~200,000 records in total. Each record (image) has 10816 pixels that represent DNA sequences from known viruses, plus one index label that identifies the virus species. The pixels create a pattern that is unique to each virus. There are 2165 different viruses in total represented within the 200,000 records. I have trained the network on images of labeled microarray datasets, but when I try to pass a new dataset through to classify each record as one of the 2165 viruses and get predicted values and probabilities, I don't seem to be having much luck. This is the code I am currently using:
import tensorflow as tf
import numpy as np
import csv
def extract_data(filename):
    print("extracting data...")
    labels = []
    fvecs = []
    rowCount = 0
    #iterate over the rows, split the label from the features
    #convert the labels to integers and features to floats
    for line in open(filename):
        rowCount = rowCount + 1
        row = line.split(',')
        labels.append(row[3])#(int(row[7])) #<<<IT ALWAYS PREDICTS THIS VALUE!
        fvecs.append([float(x) for x in row [4:10820]])
    #convert the array of float arrays into a numpy float matrix
    fvecs_np = np.matrix(fvecs).astype(np.float32)
    #convert the array of int labels into a numpy array
    labels_np = np.array(labels).astype(dtype=np.uint8)
    #convert the int numpy array into a one-hot matrix
    labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
    print("arrays converted")
    return fvecs_np, labels_onehot
def TestModels():
    fvecs_np, labels_onehot = extract_data("MicroarrayTestData.csv")
    weights = {}
    biases = {}
    init = tf.global_variables_initializer()
    #Load meta graph and restore weights
    ModelID = "MicroarrayCNN_Data-1000.meta"
    print("RESTORING:::", ModelID)
    saver = tf.train.import_meta_graph(ModelID)
    graph = tf.get_default_graph()
    x = graph.get_tensor_by_name("x:0")
    y = graph.get_tensor_by_name("y:0")
    keep_prob = tf.placeholder(tf.float32)
    y_ = tf.placeholder("float", shape=[None, 2165])
    wc1 = graph.get_tensor_by_name("wc1:0")
    wc2 = graph.get_tensor_by_name("wc2:0")
    wd1 = graph.get_tensor_by_name("wd1:0")
    Wout = graph.get_tensor_by_name("Wout:0")
    bc1 = graph.get_tensor_by_name("bc1:0")
    bc2 = graph.get_tensor_by_name("bc2:0")
    bd1 = graph.get_tensor_by_name("bd1:0")
    Bout = graph.get_tensor_by_name("Bout:0")
    weights = {wc1, wc2, wd1, Wout}
    biases = {bc1, bc2, bd1, Bout}
    probabilities = y
    predY = prediction.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
    probY = probabilities.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
    accuracy = tf.reduce_mean(tf.cast(prediction, "float"))
    print(, feed_dict={x: fvecs_np, y: labels_onehot}))
    print("Predicted::: ", predY, accuracy)
    feed_dictTEST = {y: labels_onehot}
    print("probabilities", probabilities.eval(feed_dict={x: fvecs_np}, session=sess))
########## Run Analysis ###########
########## Run Analysis ###########
When I run this code I get the correct prediction for the test set, but I am not sure I believe it, because whatever value I append on line 14 (shown below) appears to be the output it predicts:
labels.append(row[3])#<<<IT ALWAYS PREDICTS THIS VALUE!
I don't understand this, and it makes me suspect that I've set up the CNN incorrectly, as I would have expected it to ignore my input label and determine a best match from the trained network based on the trained patterns. The only explanation I can think of is that when I pass the value through for the prediction, it is instead training the model on this data as well and then predicting itself. Is this a correct assumption, or am I misinterpreting how TensorFlow works?
The other issue is that when I try to use code (based on other tutorials) that is supposed to output the probabilities of all 2165 possible outputs, I get the error:
InvalidArgumentError (see above for traceback): Shape [-1,2165] has negative dimensions
[[Node: y = Placeholder[dtype=DT_FLOAT, shape=[?,2165], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
To me it looks like the correct layer, based on the 2165 in the tensor shape, but I don't understand the -1 value. So, to summarize, my questions are:
Given that I get whatever value is in the label of the input data, is this the correct way to make a classification with this model?
Am I missing a layer, or have I configured the model incorrectly for extracting the probabilities of all of the possible output classes, or am I using the wrong code to extract the information? I tried to print out the accuracy to see if that would work, but instead it outputs the description of a tensor, so clearly that is incorrect as well.
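For the probabilities question: in the training code below, y is the label placeholder ("Gateway for data (labels)"), while the class scores are the output of conv_net (out). Probabilities are conventionally obtained by applying softmax to those output logits (tf.nn.softmax in TensorFlow) and predictions by taking the argmax over the class axis (tf.argmax(..., 1)); evaluating probabilities = y can only return whatever labels are fed into y, which would be consistent with the "always predicts this value" behaviour. The -1 (shown as ? in [?,2165]) is the placeholder's unspecified batch dimension. The NumPy sketch below shows just the arithmetic, on hypothetical logits for 4 classes instead of 2165:

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical network outputs (logits) for 2 records and 4 classes
logits = np.array([[2.0, 0.5, -1.0, 0.0],
                   [0.1, 0.2, 3.0, 0.3]])
probabilities = softmax(logits)               # each row sums to 1
predictions = probabilities.argmax(axis=1)    # most likely class index per record
```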
As requested, I'm also including the original code that was used to train the model, below. You can see I do a sort of piecemeal training on a limited number of related records at a time, grouped by their taxonomic relationships, as I iterate through the file. This is mostly because the Mac I'm training on (Mac Pro w/ 64GB RAM) tends to give me the "Killed -9" error due to overuse of resources if I don't do it this way. There may be a better way to do it, but this seems to work.
Original Author: Aymeric Damien
from __future__ import print_function
import tensorflow as tf
import numpy as np
import csv
import random
# Parameters
num_epochs = 2
train_size = 1609
learning_rate = 0.001 #(larger >speed, lower >accuracy)
training_iters = 5000 # How much do you want to train (more = better trained)
batch_size = 32 #How many samples to train on, size of the training batch
display_step = 10 # How often to diplay what is going on during training
# Network Parameters
n_input = 10816 # MNIST data input (img shape: 28*28) my case 104x104 = 10816(rough array size)
n_classes = 2165 #3280 #2307 #787# Switched to 100 taxa/training set, dynamic was too wonky.
dropout = 0.75 # Dropout, probability to keep units. Jeffery Hinton's group developed it, that prevents overfitting to find new paths. More generalized model.
# Functions
def extract_data(filename):
print("extracting data...")
# arrays to hold the labels and feature vectors.
taxCount = 0
taxCurrent = 0
labels = []
fvecs = []
rowCount = 0
#iterate over the rows, split the label from the features
#convert the labels to integers and features to floats
print("entering CNN loop")
for line in open(filename):
rowCount = rowCount + 1
row = line.split(',')
taxCurrent = row[3]
print("profile:", row[0:12])
fvecs.append([float(x) for x in row [4:10820]])
#convert the array of float arrasy into a numpy float matrix
fvecs_np = np.matrix(fvecs).astype(np.float32)
#convert the array of int lables inta a numpy array
labels_np = np.array(labels).astype(dtype=np.uint8)
#convert the int numpy array into a one-hot matrix
labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
print("arrays converted")
return fvecs_np, labels_onehot
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1): #Layer 1 : Convolutional layer
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME') # strides: one int per NHWC dimension of the input tensor
    x = tf.nn.bias_add(x, b) #add the learned per-channel bias
    return tf.nn.relu(x) #rectified linear unit (activation function)
def maxpool2d(x, k=2): #Layer 2 : keeps the max of each k x k window, downsampling the 4D feature map
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')
# Create model
def conv_net(x, weights, biases, dropout):
    print("conv_net setup")
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 104, 104, 1]) #104x104 -->52x52 after pool 1, -->26x26x64 after pool 2
    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1']) #defined above already
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)
    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2']) #wc2 and bc2 are the second layer's weights and bias...could actually skip this layer...maybe
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)
    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1) #activation function for the NN
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['Wout']), biases['Bout'])
    return out
def Train_Network(Txid_IN, Sess_File_Name):
    import tensorflow as tf
    x,y = 0,0
    weights = {}
    biases = {}
    # tf Graph input
    print("setting placeholders")
    x = tf.placeholder(tf.float32, [None, n_input], name="x") #Gateway for data (images)
    y = tf.placeholder(tf.float32, [None, n_classes], name="y") # Gateway for data (labels)
    keep_prob = tf.placeholder(tf.float32) #dropout # Gateway for dropout(keep probability)
    # Store layers weight & bias
    #CREATE weights
    weights = {
        # 5x5 conv, 1 input, 32 outputs
        'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32]),name="wc1"),
        # 5x5 conv, 32 inputs, 64 outputs
        'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64]),name="wc2"),
        # fully connected, 26*26*64 inputs, 1024 outputs
        'wd1': tf.Variable(tf.random_normal([26*26*64, 1024]),name="wd1"),
        # 1024 inputs, n_classes outputs (class prediction)
        'Wout': tf.Variable(tf.random_normal([1024, n_classes]),name="Wout")
    }
    biases = {
        'bc1': tf.Variable(tf.random_normal([32]), name="bc1"),
        'bc2': tf.Variable(tf.random_normal([64]), name="bc2"),
        'bd1': tf.Variable(tf.random_normal([1024]), name="bd1"),
        'Bout': tf.Variable(tf.random_normal([n_classes]), name="Bout")
    }
    # Construct model
    print("constructing model")
    pred = conv_net(x, weights, biases, keep_prob)
    # Define loss(cost) and optimizer
    #cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y)) Deprecated version of the statement
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)) #added reduce_mean 6/27
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # Evaluate model
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    print("%% ", correct_pred)
    print("%% ", accuracy)
    # Initializing the variables
    #init = tf.initialize_all_variables()
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    fvecs_np, labels_onehot = extract_data("MicroarrayDataOUT.csv") #TODO: CHANGE TO THE PICORNAVIRUS DATA FILE!!!
    print("starting session")
    # Launch the graph
    FitStep = 0
    with tf.Session() as sess: #graph is encapsulated by its session
        sess.run(init)
        step = 1
        # Keep training until reach max iterations (training_iters)
        while step * batch_size < training_iters:
            if FitStep >= 5:
                break
            #iterate and train
            print(fvecs_np, labels_onehot)
            for step in range(num_epochs * train_size // batch_size):
                sess.run(optimizer, feed_dict={x: fvecs_np, y: labels_onehot, keep_prob: dropout}) #no dropout???...added keep_prob: dropout
                if FitStep >= 5:
                    break
                ###batch_x, batch_y = mnist.train.next_batch(batch_size)
                # Run optimization op (backprop)
                ###sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                ###                               keep_prob: dropout}) <<<<SOMETHING IS WRONG IN HERE?!!!
                if step % display_step == 0:
                    # Calculate batch loss and accuracy
                    loss, acc = sess.run([cost, accuracy], feed_dict={x: fvecs_np,
                                                                      y: labels_onehot,
                                                                      keep_prob: 1.})
                    print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                          "{:.6f}".format(np.mean(loss)) + ", Training Accuracy= " + \
                          "{:.5f}".format(acc))
                    TrainAcc = float("{:.5f}".format(acc))
                    #print("******", TrainAcc)
                    if TrainAcc >= .99: #Changed from .95 temporarily
                        FitStep = FitStep+1
                        saver.save(sess, Sess_File_Name, global_step=1000)
                        print("Saved Session:", Sess_File_Name)
            step += 1
        print("Optimization Finished!")
        print("Testing Accuracy:", \
              sess.run(accuracy, feed_dict={x: fvecs_np[:256],
                                            y: labels_onehot[:256],
                                            keep_prob: 1.}))
        #feed_dictTEST = {x: fvecs_np[50]}
        #best = sess.run([prediction], feed_dictTEST)
def Tax_Iterator(CSV_inFile, CSV_outFile): #Deprecated
    #Need to copy *.csv file to MySQL for sorting
    resultFileINIT = open(CSV_outFile,'w')
    TaxCount = 0
    TaxThreshold = 2165
    ThresholdStep = 2165
    PrevTax = 0
    linecounter = 0
    #Open all GenBank profile list
    for line in open(CSV_inFile):
        linecounter = linecounter+1
        resultFile = open(CSV_outFile,'a')
        wr = csv.writer(resultFile, dialect='excel')
        # Check for new TXID
        row = line.split(',')
        print(row[7], "===", PrevTax)
        if row[7] != PrevTax:
            TaxCount = TaxCount+1
            PrevTax = row[7]
        #Check if current Tax count is < or > threshold
        # < threshold
        print(TaxCount,"=+=", TaxThreshold)
        if TaxCount<=3300:
            CurrentTax= row[7]
            CurrTxCount = CurrentTax
            print("TaxCount=", TaxCount)
            print("Add to CSV")
            print("row:", CurrentTax, "***", row[0:15])
        # is > threshold
        # but same TXID....
        print(row[7], "=-=", CurrentTax)
        if row[7]==CurrentTax:
            CurrentTax= row[7]
            print("TaxCount=", TaxCount)
            print("Add to CSV")
            print("row:", CurrentTax, "***", row[0:15])
        # but different TXID...
        print(row[7], "=*=", CurrentTax)
        if row[7]>CurrentTax:
            Sess_File_Name = "CNN_VirusIDvSPECIES_XXALL"+ str(TaxThreshold-ThresholdStep)
            print("<<<< Start Training >>>>")
            print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
            Train_Network(CurrTxCount, Sess_File_Name)
            print("Training complete")
            resultFileINIT = open(CSV_outFile,'w')
            CurrentTax= row[7]
            #reset tax count
            CurrTxCount = 0
            TaxCount = 0
    Sess_File_Name = "MicroarrayCNN_Data"+ str(TaxThreshold+ThresholdStep)
    print("<<<< Start Training >>>>")
    print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
    Train_Network(CurrTxCount, Sess_File_Name)
    resultFileINIT = open(CSV_outFile,'w')
    CurrentTax= row[7]
Tax_Iterator("MicroarrayInput.csv", "MicroarrayOutput.csv")
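The per-taxon chunking that Tax_Iterator does (collect rows until a threshold number of distinct TXIDs has been seen, then train on them) could also be done as a stream, so only one chunk is held in memory at a time. The following is a minimal sketch, assuming the input CSV is already sorted so that all rows for one taxon are contiguous (the code's own comment says the file needs sorting) and that the TXID is in column 7 as in Tax_Iterator. `taxon_chunks` and `max_taxa` are hypothetical names, not part of the original code; `max_taxa` plays the role of TaxThreshold.

```python
import csv
import io
from itertools import groupby


def taxon_chunks(lines, taxon_col=7, max_taxa=2165):
    """Yield lists of CSV rows covering at most `max_taxa` consecutive taxa.

    Assumes rows are already sorted by the taxon column, so every taxon's
    rows are contiguous. Only the current chunk is kept in memory.
    """
    chunk, taxa_in_chunk = [], 0
    for _taxon, rows in groupby(csv.reader(lines), key=lambda r: r[taxon_col]):
        if taxa_in_chunk == max_taxa:
            # Chunk is full: hand it to the caller and start a new one.
            yield chunk
            chunk, taxa_in_chunk = [], 0
        chunk.extend(rows)
        taxa_in_chunk += 1
    if chunk:
        # Whatever is left after the last full chunk.
        yield chunk


# Tiny in-memory example: 3 taxa in column 7, at most 2 taxa per chunk.
sample = io.StringIO(
    "a,0,0,0,0,0,0,tx1\n"
    "b,0,0,0,0,0,0,tx1\n"
    "c,0,0,0,0,0,0,tx2\n"
    "d,0,0,0,0,0,0,tx3\n"
)
for chunk in taxon_chunks(sample, max_taxa=2):
    print([row[0] for row in chunk])
# ['a', 'b', 'c']
# ['d']
```

Each yielded chunk could then be converted into feature and label arrays and passed straight to a training function, instead of rewriting an intermediate CSV for every chunk. Whether this avoids the "Killed -9" error still depends on how large a single chunk is.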
You defined prediction as prediction = tf.argmax(y, 1), and in both feed_dicts you feed labels_onehot for y. Consequently, your "prediction" is always equal to the labels.
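To see why this makes the result meaningless, here is a small NumPy sketch (made-up logits and labels, no TensorFlow) that compares a "prediction" taken from the labels themselves with one taken from the network output. In the TensorFlow code, the second one corresponds to tf.argmax(pred, 1), where pred is the logits tensor returned by conv_net, as already used for correct_pred.

```python
import numpy as np


def accuracy_from(pred_indices, labels_onehot):
    """Fraction of rows where the predicted class matches the labelled class."""
    return float(np.mean(pred_indices == np.argmax(labels_onehot, axis=1)))


# 4 examples, 3 classes; labels are class 0, 2, 1, 2 (made-up data).
labels_onehot = np.eye(3, dtype=np.float32)[[0, 2, 1, 2]]
logits = np.array([[2.0, 0.1, 0.3],   # predicts class 0 (correct)
                   [0.2, 1.5, 0.1],   # predicts class 1 (label is 2)
                   [0.1, 0.9, 0.4],   # predicts class 1 (correct)
                   [0.3, 0.2, 1.1]])  # predicts class 2 (correct)

# Buggy: "prediction" derived from y itself, so it always matches the labels.
buggy_prediction = np.argmax(labels_onehot, axis=1)
# Correct: prediction derived from the network output (the logits).
model_prediction = np.argmax(logits, axis=1)

print(accuracy_from(buggy_prediction, labels_onehot))  # 1.0
print(accuracy_from(model_prediction, labels_onehot))  # 0.75
```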
As you didn't post the code you used to train your network, I can't tell you what exactly you need to change.
Edit: I have trouble understanding the underlying problem you're trying to solve. Based on your code, you're trying to train a neural network with 2165 different classes using 1609 training examples. How is this even possible? Even if every example had its own class, 556 (2165 - 1609) classes would still have no training example at all. Or does one image belong to many classes? From your statement at the beginning of your question, I had assumed you're trying to output a real-valued number between 0 and 1.
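One way to check this on the actual data is to count how many class indices never occur among the integer labels. The sketch below assumes the labels are available as a 1-D array of non-negative integer class ids (the label column isn't shown in the posted code); `classes_without_examples` is a hypothetical helper, and the example labels are a best case in which each of the 1609 examples has its own class.

```python
import numpy as np


def classes_without_examples(labels, n_classes):
    """Count class indices in [0, n_classes) that never appear in `labels`."""
    # bincount gives one count per class id; ids >= n_classes are ignored by the slice.
    counts = np.bincount(labels, minlength=n_classes)[:n_classes]
    return int(np.sum(counts == 0))


n_classes = 2165
train_size = 1609
# Best case for coverage: every training example has a distinct class.
labels = np.arange(train_size)
print(classes_without_examples(labels, n_classes))  # 556
```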
I'm actually surprised that the code worked, as it looks like you're adding only a single number per example to your labels list, but your model expects a list of length 2165 for each training example.
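For reference, this is roughly what the labels have to look like to match y = tf.placeholder(tf.float32, [None, n_classes]): one row of length n_classes per example, with a single 1.0 in the column of that example's class. The sketch uses the same broadcasting comparison as extract_data; `to_one_hot` is a hypothetical helper. It keeps the ids as int64, because casting ids up to 2164 to uint8 would silently wrap every id above 255.

```python
import numpy as np


def to_one_hot(int_labels, n_classes):
    """Turn a 1-D array of integer class ids into an (N, n_classes) float32 one-hot matrix."""
    int_labels = np.asarray(int_labels, dtype=np.int64)
    if int_labels.ndim != 1:
        raise ValueError("expected exactly one integer label per example")
    if int_labels.size and (int_labels.min() < 0 or int_labels.max() >= n_classes):
        raise ValueError("label out of range for n_classes")
    # Compare each label with 0..n_classes-1; broadcasting gives shape (N, n_classes).
    return (np.arange(n_classes) == int_labels[:, None]).astype(np.float32)


onehot = to_one_hot([3, 0, 2164], n_classes=2165)
print(onehot.shape)        # (3, 2165)
print(onehot.sum(axis=1))  # [1. 1. 1.]
```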
