How to feed a model with "a list of outputs"? - keras

Sorry for the title, but I couldn't come up with a better description.
I am trying to train, in batches, a model which should have 13 fully connected output layers. Each output layer has only two nodes (but is fully connected, as stated).
Building the model's output looks like this:
outputs = list()
for i in range(num_labels):
    out_y = Dense(2, activation='softmax', name='out_{:d}'.format(i))(convolution_layer)
    outputs.append(out_y)
self.model = Model(input=inputs, output=outputs)
However, I can't manage to feed this model. I've tried to go with a [batch_size, 13, 1, 2] sized output array:
y = np.zeros((batch_size, 13, 1, 2))
But for a batch of size 2 I get:
ValueError: The model expects 13 input arrays, but only received one array. Found: array with shape (2, 13, 1, 2)
I've tried several other things, but it's simply not clear to me what the input for the model should look like.
How can I train this model?
I have also tried to pass a list of lists of numpy arrays:
Here the first level of the list indexes the samples of the batch (here 2) and the second level holds, for each sample, the list of 13 numpy arrays. Yet I am getting:
ValueError: Error when checking model target: you are passing a list as input to your model, but the model expects a list of 13 Numpy arrays instead. The list you passed was: [[array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 0., 1.]), array([ 1., 0.]), array([
As suggested, I also tried to return a list() of numpy arrays of size [13,2]:
Where the error becomes:
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 13 arrays but instead got the following list of 2 arrays: [array([[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 0., 1.],
[ 1., 0.],
[ ...
The code
Below you can find the current code, which generates one sample in sample_generator and a full batch in batch_generator (which uses sample_generator).
Please note: the code now shows how I generate a list() of [13,2] ndarrays, where the number of such ndarrays in that list is defined by batch_size.
def batch_generator(w2v, file_path, meta_info, batch_size, sample_generator_fn, embedding_size):
    x = np.zeros((batch_size, meta_info.max_sequence_length, embedding_size, 1))
    y = list() #np.zeros((batch_size, 13, 1, 2))
    file = open(file_path)
    try:
        while True:
            x[:] = 0.0
            y = list() #y[:] = 0.0
            for batch in range(batch_size):
                sentence_info_json = file.readline()
                if sentence_info_json == '':
                    # end of file: rewind and read the first line again
                    file.seek(0)
                    sentence_info_json = file.readline()
                sample = sample_generator_fn(w2v, sentence_info_json, meta_info)
                if not sample:
                    continue
                sentence_embedding = sample[0]
                final_length = len(sentence_embedding)
                x[batch, :final_length, :, 0] = sentence_embedding
                y.append(sample[1])
            shuffled = np.asarray(range(batch_size))
            np.random.shuffle(shuffled)
            x = x[shuffled]
            #y = y[shuffled]
            y = [y[i] for i in shuffled]
            yield x, y
    except Exception as e:
        print('Error in generator.')
        raise e
def sample_generator(w2v, sentence_info_json, meta_info):
    if not sentence_info_json:
        return None
    sentence_info = json.loads(sentence_info_json)
    tokens = [token['word'] for token in sentence_info['corenlp']['tokens']]
    sentence = Sentence(tokens=tokens)
    sentence_embedding = w2v.get_word_vectors(sentence.tokens.tolist())
    sentence_embedding = np.asarray([word_vector for word_vector in sentence_embedding if word_vector is not None])
    final_length = len(sentence_embedding)
    if final_length == 0:
        return None
    y = np.zeros((2, len(meta_info.category_dict)))
    y[1, :] = 1.
    #y_list = []
    # one [0, 1] row per category; flipped to [1, 0] for categories that occur
    y_tar = np.zeros((len(meta_info.category_dict), 2))
    for i in range(len(meta_info.category_dict)):
        y_tar[i][1] = 1.0
        # y_list.append(np.asarray([0.0, 1.0]))
    for opinion in sentence_info['opinions']:
        index = meta_info.category_dict[opinion['category']]
        y_tar[index][0] = 1.0
        y_tar[index][1] = 0.0
        #y_list[index][0] = 1.0
        #y_list[index][1] = 0.0
    return sentence_embedding, y_tar
As requested, the call to fit_generator()
train_file, train_meta_info,
num_batches, sample_generator,
# validation_data=batch_generator(test_file_path, train_meta_info),
# nb_val_samples=100,

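Your output should be a list, as specified in the error. Each element of the list should be a numpy array of size [batch_size, nb_outputs], where nb_outputs is the number of nodes of the corresponding output layer. So in your case it is a list of 13 elements, each of size [batch_size, 2], rather than a list of batch_size elements of size [13, 2].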
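A minimal sketch of that conversion, assuming each sample's target is a [13, 2] array like the one returned by sample_generator in the question. The toy values and the names per_sample and stacked are made up for illustration.

```python
import numpy as np

batch_size, num_labels = 2, 13

# Toy per-sample targets of shape (13, 2): every label starts as [0, 1]
per_sample = [np.tile([0.0, 1.0], (num_labels, 1)) for _ in range(batch_size)]
per_sample[0][9] = [1.0, 0.0]  # sample 0 has label 9 switched on

# Stack to (batch_size, 13, 2), then split along the label axis
stacked = np.stack(per_sample)
y = [stacked[:, i, :] for i in range(num_labels)]

print(len(y), y[0].shape)
```

Yielding y instead of the per-sample list gives the model one [batch_size, 2] array per output layer, in the order in which the outputs were listed when the model was built.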


How to evaluate a pyTorch/DGL tensor

From a DGL graph I want to see the adjacency matrix with
adjM = g.adjacency_matrix()
and I get the following which is fine:
tensor(indices=tensor([[0, 0, 0, 1],
[1, 2, 3, 3]]),
values=tensor([1., 1., 1., 1.]),
size=(4, 4), nnz=4, layout=torch.sparse_coo)
Now I want to have the adjacency matrix and the node values each by itself. I imagine something of this kind:
adjMatrix = adjM.indices # or
adjMatrix = adjM[0]
nodeValues = adjM.values # or
nodeValues = adjM[1]
But this form is not accepted by PyTorch/DGL.
My beginner's questions:
how do I do this correctly and successfully? and
is there a tutorial for a newbie? (I have searched a lot just for this detail...!)
The DGL documentation describes the usage of DGLGraph.adj(). As the docs say, it returns the adjacency matrix, and the return type is a sparse tensor.
I noticed that the output you posted is a sparse tensor as well.
You can proceed as follows to get the entire adjacency matrix.
I create a DGL graph g and get its adjacency matrix as adj:
g = dgl.graph(([0, 1, 2], [1, 2, 3]))
adj = g.adj()
output is:
tensor(indices=tensor([[0, 1, 2],
[1, 2, 3]]),
values=tensor([1., 1., 1.]),
size=(4, 4), nnz=3, layout=torch.sparse_coo)
We can see that adj is stored in sparse form and that the sparse layout is COO. We can use the following code to verify that adj is a sparse tensor:
Output:
So we can use to_dense() to get the original adjacency matrix.
The result is:
tensor([[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.],
[0., 0., 0., 0.]])
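A small sketch of reading the pieces of such a tensor, assuming the object is a plain torch.sparse_coo tensor like the one printed in the question. The tensor is rebuilt by hand with torch only so that the example runs without DGL, and the names edge_index and edge_values are illustrative. Note that indices and values are methods (they need parentheses), and that the values belong to the non-zero entries, i.e. the edges, not to the nodes.

```python
import torch

# Same COO data as in the question's output: 4 nodes, 4 edges
indices = torch.tensor([[0, 0, 0, 1],
                        [1, 2, 3, 3]])
values = torch.ones(4)
adj = torch.sparse_coo_tensor(indices, values, size=(4, 4)).coalesce()

print(adj.is_sparse)          # check that the tensor is sparse
edge_index = adj.indices()    # 2 x nnz tensor: row 0 = source, row 1 = destination
edge_values = adj.values()    # one value per stored entry
dense = adj.to_dense()        # full 4 x 4 adjacency matrix
print(dense)
```

The coalesce() call is there because indices() and values() are only available on a coalesced sparse COO tensor.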
When you have a problem with DGL you can check the Deep Graph Library Tutorials and Documentation.

How do I mask a feed forward layer based on tensor in pytorch?

I have a really simple network with 2 inputs (x and m).
x is size 100
m is size 3
My network is simply...
f_1 = linear_layer(x)
f_2 = linear_layer(f_1)
f_3 = linear_layer(f_1)
f_4 = linear_layer(f_1)
f_5 = softmax(linear_layer(sum(f_2, f_3, f_4)))
Based on the vector m, I want to zero out and ignore f_2, f_3, f_4 in the final sum and in the resulting gradient calculation. Is there a way to create a mask based on vector m to achieve this?
Ok, here is how you do it. Use list comprehensions to make it more generic:
# example input and output
x = torch.ones(5)
y = torch.zeros(3)
# mask tensor
mask = torch.tensor([0, 1, 0])
# initial layer
z0 = torch.nn.Linear(5, 5)
# layers to potentially mask
z1 = torch.nn.Linear(5, 3)
z2 = torch.nn.Linear(5, 3)
z3 = torch.nn.Linear(5, 3)
# defines how the data passes through the layers, specific mask element is applied to each of the maskable layers
layer1_output = z0(x)
layer2_output = mask[0]*z1(layer1_output) + mask[1]*z2(layer1_output) + mask[2]*z3(layer1_output)
# loss function
loss = torch.nn.functional.binary_cross_entropy_with_logits(layer2_output, y)
# run it and see
As shown below, the mask tensor effectively selects which sub-networks take part in the computation: the gradients of the layers whose mask element is 0 are all zero.
tensor([[ 0.0354, 0.0354, 0.0354, 0.0354, 0.0354],
[-0.0986, -0.0986, -0.0986, -0.0986, -0.0986],
[-0.0372, -0.0372, -0.0372, -0.0372, -0.0372],
[-0.0168, -0.0168, -0.0168, -0.0168, -0.0168],
[-0.0133, -0.0133, -0.0133, -0.0133, -0.0133]])
tensor([[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.]])
tensor([[-0.0422, 0.1314, 0.1108, -0.1644, 0.0906],
[-0.0240, 0.0747, 0.0630, -0.0934, 0.0515],
[-0.0251, 0.0781, 0.0659, -0.0977, 0.0539]])
tensor([[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.],
[-0., 0., 0., -0., 0.]])
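The more generic variant mentioned at the start of the answer could look like the sketch below. It assumes the same toy sizes as above; the names first and branches are made up here, and the mask is stored as floats so that the multiplication needs no type promotion.

```python
import torch

torch.manual_seed(0)

x = torch.ones(5)
y = torch.zeros(3)
mask = torch.tensor([0.0, 1.0, 0.0])

first = torch.nn.Linear(5, 5)
# one maskable branch per mask element
branches = torch.nn.ModuleList([torch.nn.Linear(5, 3) for _ in range(len(mask))])

hidden = first(x)
# masked sum over all branches
out = sum(m * branch(hidden) for m, branch in zip(mask, branches))

loss = torch.nn.functional.binary_cross_entropy_with_logits(out, y)
loss.backward()

for m, branch in zip(mask, branches):
    # total absolute weight gradient per branch
    print(m.item(), branch.weight.grad.abs().sum().item())
```

Branches multiplied by 0 still run in the forward pass; they only receive zero gradients. Skipping them entirely would need an explicit if on the mask element.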

Why do 'loss.backward()' and 'weight.grad' return a tensor containing all zeros?

When I run 'loss.backward()' and then inspect 'weight.grad', I get a tensor containing all zeros. Also, 'weight.grad_fn' returns None.
However, it all seems to return the correct result for the second layer 'w2'.
If I play with simple operations such as x*2 or x**2 'backward()' and '.grad' return correct results
Here's my code:
import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms
# Getting MNIST data
num_workers = 0
batch_size = 64
transform = transforms.ToTensor()
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
dataiter = iter(train_loader)
images, labels = next(dataiter)
#### NN Part
def activation(x):
    return 1/(1+torch.exp(-x))
# images is already a tensor, so no conversion from numpy is needed
inputs = images
# Flatten the inputs format from (64,1,28,28) into (64,784)
inputs = inputs.reshape(images.shape[0], int(images.shape[1]*images.shape[2]*images.shape[3]))
w1 = torch.randn(784, 256, requires_grad=True)# n_input, n_hidden
b1 = torch.randn(256)# n_hidden
w2 = torch.randn(256, 10, requires_grad=True)# n_hidden, n_output
b2 = torch.randn(10)# n_output
h = activation(torch.mm(inputs, w1) + b1)
y = torch.mm(h, w2) + b2
By the way, I get the same problem if I run it this way:
images = images.reshape(images.shape[0], -1)
# the layers at indices 1, 3 and 5 were lost from the post; ReLU and LogSoftmax are assumed here
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
logits = model(images)
criterion = nn.NLLLoss()
loss = criterion(logits, labels)
print('Before backward pass: ', model[0].weight.grad)
loss.backward()
print('After: ', model[0].weight.grad)
#print('After: ', model[2].weight.grad)
#print('After: ', model[4].weight.grad)
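The gradients of w1 are not all zero; there are simply a lot of zeros, especially around the border, because the MNIST images have a lot of black pixels (zeros). The gradient of each weight in w1 is the input pixel it multiplies times the error flowing back from the hidden unit, so a pixel that is zero in every image of the batch yields zero gradients for its entire row of w1.
When you print w1.grad, PyTorch abbreviates the output and shows only the first and last few rows and columns (the border pixels), so you just can't see the non-zero values.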
# => tensor([[0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# ...,
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.],
# [0., 0., 0., ..., 0., 0., 0.]])
# Indices of non-zero elements
# => tensor([[ 71, 0],
# [ 71, 1],
# [ 71, 2],
# ...,
# [746, 253],
# [746, 254],
# [746, 255]])
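The effect can be reproduced without MNIST. The sketch below is a toy stand-in, not the asker's network: 8 samples with 6 "pixels", of which pixels 0 and 5 are always black, and a sum over sigmoid activations as a made-up loss. It also shows that grad_fn being None is expected for a tensor created directly by the user (a leaf tensor), so that part of the question does not indicate a problem.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for flattened images: pixels 0 and 5 are zero in every sample
inputs = torch.rand(8, 6)
inputs[:, [0, 5]] = 0.0

w1 = torch.randn(6, 4, requires_grad=True)
loss = torch.sigmoid(inputs @ w1).sum()
loss.backward()

print(w1.grad_fn)   # leaf tensors have no grad_fn
print(w1.grad)      # rows 0 and 5 are zero, the other rows are not

# Row indices (input pixels) that received a non-zero gradient
nonzero_rows = w1.grad.nonzero()[:, 0].unique()
print(nonzero_rows)
```

On the real data, w1.grad.nonzero() plays the same role: it lists the positions that the abbreviated printout hides.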

How to add to pytorch tensor at indices?

I have to admit, I'm a bit confused by the scatter* and index* operations - I'm not sure any of them do exactly what I'm looking for, which is very simple:
Given some 2-D tensor
z = tensor([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
And a list (or tensor?) of 2-d indexes:
inds = tensor([[0, 0],
[1, 1],
[1, 2]])
I want to add a scalar to z at those indexes (and do it efficiently):
znew = z.something_add(inds, 3)
znew = tensor([[4., 1., 1., 1.],
[1., 4., 4., 1.],
[1., 1., 1., 1.]])
If I have to I can make that scalar a tensor of whatever shape (where all elements = 3), but I'd rather not...
You must provide two lists to your indexing. The first having the row positions and the second the column positions. In your example, it would be:
z[[0, 1, 1], [0, 1, 2]] += 3
torch.Tensor indexing follows NumPy. See the NumPy indexing documentation for more details.
This code achieves what you want:
z_new = z.clone() # copy the tensor
z_new[inds[:, 0], inds[:, 1]] += 3 # modify selected indices of new tensor
In PyTorch, you can index each axis of a tensor with another tensor.
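One case the snippets above do not cover is an index pair that appears more than once. As far as I can tell, the += form then adds the scalar only once at that position. If every occurrence should count, index_put_ with accumulate=True is one option. The sketch below assumes the z and inds from the question; inds_dup is a made-up example with a repeated pair.

```python
import torch

z = torch.ones(3, 4)
inds = torch.tensor([[0, 0],
                     [1, 1],
                     [1, 2]])

# Same result as z_new[inds[:, 0], inds[:, 1]] += 3 when all pairs are distinct
z_new = z.clone()
z_new.index_put_((inds[:, 0], inds[:, 1]), torch.tensor(3.0), accumulate=True)
print(z_new)

# With a repeated pair, accumulate=True adds the value once per occurrence
inds_dup = torch.tensor([[0, 0],
                         [0, 0]])
z_dup = z.clone()
z_dup.index_put_((inds_dup[:, 0], inds_dup[:, 1]), torch.tensor(3.0), accumulate=True)
print(z_dup[0, 0])
```

The value is passed as a 0-dimensional tensor and is broadcast over all indexed positions, so there is no need to build a tensor full of 3s.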

Unable to transform string column to categorical matrix using Keras and Sklearn

I am trying to build a simple Keras model, with Python3.6 on MacOS, to predict house prices in a given range but I fail to transform the output into a category matrix. I am using this dataset from Kaggle.
I've created a new column in the dataframe with different price ranges as strings to serve as target output in my model, then use keras.utils and Sklearn LabelEncoder to try to create the output binary matrix but I keep getting the error:
ValueError: invalid literal for int() with base 10: '0 - 50000'
Here is my code:
import pandas as pd
import numpy as np
from keras.layers import Dense
from keras.models import Sequential, load_model
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical, np_utils
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
seed = 7
data = pd.read_csv("Melbourne_housing_FULL.csv")
data.fillna(0, inplace=True)
price_range = 50000
bins = np.arange(0, 12000000, price_range)
labels = ['{} - {}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
#correct first value
labels[0] = '0 - 50000'
print(labels[:10])
['0 - 50000', '50001 - 100000', '100001 - 150000', '150001 - 200000',
'200001 - 250000', '250001 - 300000', '300001 - 350000', '350001 - 400000',
'400001 - 450000', '450001 - 500000']
data['PriceRange'] = pd.cut(data.Price, bins=bins, labels=labels)
output_len = len(labels)
Everything is correct here until I run the next piece:
predictors = data.drop(['Suburb', 'Address', 'SellerG', 'CouncilArea',
'Propertycount', 'Date', 'Type', 'Price', 'PriceRange'], axis=1).as_matrix()
target = data['PriceRange']
# encode class values as integers
encoder = LabelEncoder()
encoded_Y = encoder.transform(target)
target = np_utils.to_categorical(data.PriceRange)
n_cols = predictors.shape[1]
And I get the ValueError: invalid literal for int() with base 10: '0 - 50000'
Can someone help me here? I don't really understand what I am doing wrong.
Many thanks
That's because np_utils.to_categorical expects y to contain integers, but you have strings. Either convert them to integers by mapping each category to a key, i.e.:
cats = data.PriceRange.values.categories
di = dict(zip(cats,np.arange(len(cats))))
#{'0 - 50000': 0,
# '10000001 - 10050000': 200,
# '1000001 - 1050000': 20,
# '100001 - 150000': 2,
# '10050001 - 10100000': 201,
# '10100001 - 10150000': 202,
target = np_utils.to_categorical(data.PriceRange.map(di))
Or, since you are using pandas, you can use pd.get_dummies to get a one-hot encoding:
onehot = pd.get_dummies(data.PriceRange)
target_labels = onehot.columns
target = onehot.values  # as_matrix() was removed in pandas 1.0
array([[ 1., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 1., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
This takes only one line of code.
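For reference, a self-contained sketch of the integer-code route that needs neither Keras nor the Kaggle file. The prices, the reduced bin range and the variable names are made up for illustration. It relies on the fact that pd.cut returns a categorical column whose integer codes are available through .cat.codes, and it builds the one-hot matrix with np.eye, which should mirror what to_categorical produces from integer codes.

```python
import numpy as np
import pandas as pd

# Toy prices instead of the Melbourne housing data
prices = pd.Series([30000, 120000, 75000, 49000])

bins = np.arange(0, 200001, 50000)
labels = ['{} - {}'.format(i + 1, j) for i, j in zip(bins[:-1], bins[1:])]
labels[0] = '0 - 50000'

price_range = pd.cut(prices, bins=bins, labels=labels)

# Integer code of each row's category (-1 would mark a price outside all bins)
codes = price_range.cat.codes.to_numpy()

# One row of the identity matrix per sample gives the one-hot target
onehot = np.eye(len(labels))[codes]
print(codes)
print(onehot)
```

With the default settings of pd.cut, a price of exactly 0 falls outside the first bin and gets code -1, so rows filled with 0 by fillna would need separate handling.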
