Using Categorical with multi-dimensional p in PyMC3 - theano

I am running into problems when I am trying to use pm.Categorical to sample many instances at once (either with multidimensional p or using theano.scan). What is the best way to go here? My goal is to sample one response per draw for each of many individuals, based on the probability of each response for each individual.
Here is a working example for many individuals, one draw:
import pymc3 as pm
import theano.tensor as T
import numpy as np
actions = T.as_tensor_variable(np.array([0, 1, 2])) # indiv0 selects response0, indiv1 response1, etc.
with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2, shape=[3, 3]) # prob. for each indiv. for each response
    actions = pm.Categorical('actions', p=p, observed=actions)
    trace = pm.sample()
Everything works fine here. The model samples nine p's, one for each individual and each response. But what I'd like to do is to sample several responses for each individual. I first tried to add a dimension of trials to the actions data:
n_trials = 10
actions = T.as_tensor_variable(np.array([[0, 1, 2]] * n_trials)) # same thing, 10 trials
with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2, shape=[3, 3]) # same as above, but more trials
    actions = pm.Categorical('actions', p=p, observed=actions)
    trace = pm.sample()
Obviously, this leads to an error:
Wrong number of dimensions: expected 1, got 2 with shape (10, 3).
Surprisingly, it also leads to an error when I draw a separate p for each sample, each individual, and each response:
p = pm.Beta('p', alpha=2, beta=2, shape=[n_trials, 3, 3])
It seems like pm.Categorical only accepts 2-dimensional p. To keep p 2-dimensional, I tried scanning:
actions = T.as_tensor_variable(np.array([[0, 1, 2]] * 10))
with pm.Model() as model:
    p = pm.Beta('p', alpha=2, beta=2, shape=[3, 3]) # same as above
    actions, _ = theano.scan(fn=lambda action, p: pm.Categorical('actions', p=p, observed=action),
    trace = pm.sample()
The code fails with the following error message:
theano.gof.fg.MissingInputError: Input 0 of the graph (indices start from 0), used to compute Elemwise{Cast{int64}}(<TensorType(int32, vector)>), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.
So maybe it is not possible to use pm.Categorical within theano.scan? Any help as to how to approach this problem would be super helpful!

The following solved my problem. I made actions 1-dimensional and p 2-dimensional:
n_trials, n_subj, n_actions = 10, 3, 3
actions = T.as_tensor_variable(np.array([0, 1, 2] * n_trials)) # shape (30, ) instead of (10, 3)
with pm.Model() as model:
    p = pm.Uniform('p', lower=0, upper=1, shape=[n_subj, n_actions])
    indiv = T.tile(T.arange(n_subj), n_trials)
    p = p[indiv]
    actions = pm.Categorical('actions', p=p, observed=actions)
    trace = pm.sample()
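The indexing trick in this answer can be checked in plain NumPy. The sketch below is only an illustration: the random `p` values and the name `p_per_obs` are made up here and are not part of the original model. It shows that tiling the individual indices and indexing `p` with them yields one row of probabilities per flattened observation.

```python
import numpy as np

n_trials, n_subj, n_actions = 10, 3, 3

# Hypothetical probabilities: one row per individual, one column per response.
rng = np.random.default_rng(0)
p = rng.random((n_subj, n_actions))
p = p / p.sum(axis=1, keepdims=True)  # normalize each row to sum to 1

# Flattened observations in trial-major order: [0, 1, 2, 0, 1, 2, ...].
actions = np.array([0, 1, 2] * n_trials)

# Index of the individual who produced each flattened observation.
indiv = np.tile(np.arange(n_subj), n_trials)

# One probability row per observation.
p_per_obs = p[indiv]

print(actions.shape, indiv.shape, p_per_obs.shape)
```

Because the observations are flattened trial by trial, `indiv` has to repeat `0, 1, 2` in the same order; if the data were flattened individual by individual, `np.repeat` would be the matching choice instead.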


NameError: name 'players_data' is not defined

I got this error. How do I define players_data?
NameError Traceback (most recent call last)
----> 1 data = np.vstack((asia[1:], eu[1:], na[1:], oc[1:], sea[1:], players_data[1:]))
2 df = pd.DataFrame({data[0, i]: data[1:, i] for i in range(data.shape[1])})
3 m = asfloat(data[1:, :4])
NameError: name players_data is not defined
asia = open_exl('pubg_as.xls', 0)
eu = open_exl('pubg_eu.xls', 0)
na = open_exl('pubg_na.xls', 0)
oc = open_exl('pubg_oc.xls', 0)
sea = open_exl('pubg_sea.xls', 0)
# Load all data
all_data = np.genfromtxt('PUBG_Player_Statistics.csv', delimiter=',')
all_data[:, 28] = all_data[:, 28] * 100
# Train data
train_data = all_data[1:2000, :][:, [3, 2, 28, 9]]
test_data = all_data[2000:, :][:, [3, 2, 28, 9]]
data = np.vstack((asia[1:], eu[1:], na[1:], oc[1:], sea[1:], players_data[1:]))
df = pd.DataFrame({data[0, i]: data[1:, i] for i in range(data.shape[1])})
m = asfloat(data[1:, :4])
It is hard to know what value players_data originally held, since this is incomplete code from another user; however, based on what they were doing, my guess is that players_data is:
players_data = train_data
But, why??
They used kmeans algorithm to create 6 clusters that represent the following categories:
['Normal player', 'Waller', 'Experienced Player', 'Both', 'God', 'Aimbot']
The first 5 variables passed to vstack hold information about the best players on the 5 servers. They wanted to use this information and combine it with data from "normal players".
In the end, they used neither "train_data" nor "test_data"; however, they mentioned the following:
The reason we mix data from the normal data set is to increase the density of normal players, and make the clustering tight. After test, we think 2000 rows of data has the best performance.
An important thing to notice is that, in the train and test data, they selected 4 columns:
[3, 2, 28, 9]
These are the same columns that they used in the "top performers" files:
def open_exl(address, idx):
data = xlrd.open_workbook(address)
table = data.sheets()[idx]
rows = table.nrows
ct_data = []
for row in range(rows):
return np.array(ct_data)[:, :4]
Since the code has inconsistencies, it might not be possible to reproduce their results; however, it seems like a great opportunity to play with the data and compare your results with their earlier investigation.
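As a rough illustration of why the column selection matters: np.vstack only works when every block has the same number of columns. The sketch below uses made-up arrays (`top_players` and `normal_players` are hypothetical stand-ins for one server sheet and for the guessed players_data), so it says nothing about the real data.

```python
import numpy as np

# Hypothetical stand-in for one "top performers" sheet: 4 columns per player.
top_players = np.arange(12, dtype=float).reshape(3, 4)

# Hypothetical stand-in for the full CSV: many columns per player.
all_data = np.arange(5 * 30, dtype=float).reshape(5, 30)

# Select the same 4 columns as in the question so the shapes are compatible.
normal_players = all_data[:, [3, 2, 28, 9]]

# Stacking works because both blocks now have exactly 4 columns.
stacked = np.vstack((top_players, normal_players))
print(stacked.shape)  # (8, 4)
```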

Tensorflow map_fn Out of Memory Issues

I am having issues with my code running out of memory on large data sets. I attempted to chunk the data to feed it into the calculation graph but I eventually get an out of memory error. Would setting it up to use the feed_dict functionality get around this problem?
My code is set up as follows, with a nested map_fn call because of the nested output of the tf_itertools_product_2D_nest function.
The tf_itertools_product_2D_nest function is taken from the question "Cartesian Product in Tensorflow".
I also tried a variation where I made a list of tensor-lists which was significantly slower than doing it purely in tensorflow so I'd prefer to avoid that method.
import tensorflow as tf
import numpy as np
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.9
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess = tf.Session()
tensorboard_log_dir = "../log/"
def tf_itertools_product_2D_nest(a,b): #does not work on nested tensors
    a, b = a[ None, :, None ], b[ :, None, None ]
    n_feat_dimension_in_common = tf.shape(a)[-1]
    c = tf.concat( [ a + tf.zeros_like( b ), tf.zeros_like( a ) + b ], axis = 2 )
    return c
def do_calc(arr_pair):
    arr_1 = arr_pair[0]
    arr_binary = arr_pair[1]
    return tf.reduce_max(tf.cumsum(arr_1*arr_binary))
def calc_row_wrapper(row):
    return tf.map_fn(do_calc,row)
for i in range(0,10):
    a = tf.constant(np.random.random((7,10))*10,tf.float64)
    b = tf.constant(np.random.randint(2, size=(3,10)),tf.float64)
    a_b_itertools_product = tf_itertools_product_2D_nest(a,b)
    '''Creates array like this:
    [ [[arr_a0,arr_b0], [arr_a1,arr_b0],...],
    [[arr_a0,arr_b1], [arr_a1,arr_b1],...],
    [[arr_a0,arr_b2], [arr_a1,arr_b2],...],
    ]'''
    with tf.summary.FileWriter(tensorboard_log_dir, sess.graph) as writer:
        result_array = sess.run(tf.map_fn(calc_row_wrapper,a_b_itertools_product),
                                options=run_options, run_metadata=run_metadata)
        writer.add_run_metadata(run_metadata,"iteration {}".format(i))
    # result_array should be an array with 3 rows (1 for each binary vector in b) and 7 columns (1 for each row in a)
I can imagine that is unnecessarily consuming memory due to the extra dimension added. Is there a way to mimic the outcome of the standard itertools.product() function to output 1 long list of every possible combination of items in the 2 input iterables? Like the result of:
# [([1, 2], [5, 6]), ([1, 2], [7, 8]), ([3, 4], [5, 6]), ([3, 4], [7, 8])]
That would eliminate the need to call map_fn twice.
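For reference, the flat list shown in the comment above is what itertools.product returns for two small lists. The inputs `a` and `b` below are assumed; they were chosen to reproduce that output.

```python
import itertools

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Every (item_from_a, item_from_b) pair, as one flat list.
pairs = list(itertools.product(a, b))
print(pairs)
# [([1, 2], [5, 6]), ([1, 2], [7, 8]), ([3, 4], [5, 6]), ([3, 4], [7, 8])]
```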
When map_fn is called within a loop as in my code, will it keep spawning new graph nodes for every iteration? There appears to be a big "map_" node for every iteration in this code's Tensorboard graph.
[Image: Tensorboard default view]
When I select a particular iteration by its tag in Tensorboard, only the map node corresponding to that iteration is highlighted, with all the others grayed out. Does that mean that during that iteration only its own map node is present, and that the map nodes from previous iterations no longer exist in memory?
[Image: Tensorboard single-iteration view]
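One way to avoid the nested map_fn altogether, sketched here in NumPy under the assumption that the per-pair computation stays as simple as do_calc, is to broadcast the two inputs against each other and reduce along the last axis. This only illustrates the idea and is not a tested TensorFlow solution; the same broadcasting pattern is expected to carry over to tensor operations.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((7, 10)) * 10                         # 7 value rows
b = rng.integers(0, 2, size=(3, 10)).astype(float)   # 3 binary rows

# Broadcast to shape (3, 7, 10): every row of b against every row of a.
products = a[np.newaxis, :, :] * b[:, np.newaxis, :]

# Max of the running sum along the last axis, one value per (b row, a row) pair.
result = np.cumsum(products, axis=-1).max(axis=-1)

print(result.shape)  # (3, 7)
```

The intermediate array of shape (3, 7, 10) is still materialized, so very large inputs would probably still need to be processed in chunks.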

numpy matrix not functioning as intended

This is my code:
import random
import numpy as np
import math
populacao = 5
x_min = -10
x_max = 10
nbin = 4
def fitness(xy, populacao, resultado):
    fit = np.matrix(resultado)
    xy_fit = np.append(xy, fit.T, axis = 1)
    xy_fit_sorted = xy_fit[np.argsort(xy_fit[:,-1].T),:]
    return xy_fit_sorted
def codifica(x, x_min, x_max,n):
    x = float(x)
    xdec = round((x-x_min)/(x_max-x_min)*(2**n-1))
    xbin = int(bin(xdec)[2:])
xy = np.array([[1, 2],[3,4],[0,0],[-5,-1],[9,-2]])
resultado = np.array([5, 25, 0, 26, 85])
xy_fit_sorted = np.array(fitness(xy, populacao, resultado))
parents = (xy_fit_sorted[:,:2])
The problem I'm having is that, to select 2 rows of "xy_fit_sorted", I have to do this strange thing:
parents = (xy_fit_sorted[:,:2])
Instead of what makes sense in my mind:
parents = (xy_fit_sorted[:1,:])
It's as if the whole matrix were in one line.
I'm not sure what most of your code is doing, so here's just a guess: are you thrown off by the shape of xy_fit_sorted being (1, 5, 3), having an extra zero axis?
That could be fixed e.g. by constructing xy_fit without the use of np.matrix:
xy_fit = np.append(xy, resultado[:, np.newaxis], axis=1)
Then xy_fit_sorted comes out with a shape of (5, 3).
The underlying issue is that an np.matrix is always 2-D. When indexing xy_fit[...], you intend to index with a vector. But when xy_fit is an np.matrix, xy_fit[:,-1].T is not a vector but a 2-D array as well (of shape (1,5)). This leads to xy_fit_sorted having an extra dimension.
Note that the NumPy documentation says the following about np.matrix:
It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.
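A minimal sketch of the plain-array version, using the data from the question, may make the shape difference concrete. The names `order` and `with_extra_axis` are introduced here for illustration, and selecting the first two rows as `parents` is an assumption about what the question intends.

```python
import numpy as np

xy = np.array([[1, 2], [3, 4], [0, 0], [-5, -1], [9, -2]])
resultado = np.array([5, 25, 0, 26, 85])

# Append the fitness values as a third column, using plain arrays only.
xy_fit = np.append(xy, resultado[:, np.newaxis], axis=1)

# A 1-D index keeps the result 2-D: shape (5, 3).
order = np.argsort(xy_fit[:, -1])
xy_fit_sorted = xy_fit[order, :]

# A 2-D index of shape (1, 5) adds a leading axis: shape (1, 5, 3).
with_extra_axis = xy_fit[order[np.newaxis, :], :]

# The two rows with the lowest fitness values.
parents = xy_fit_sorted[:2, :]
print(xy_fit_sorted.shape, with_extra_axis.shape)
print(parents)
```

The `with_extra_axis` line reproduces the extra dimension with ordinary arrays: indexing with a (1, 5) index array is what produces the (1, 5, 3) result.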

Implementing word dropout in pytorch

I want to add word dropout to my network so that I can have sufficient training examples for training the embedding of the "unk" token. As far as I'm aware, this is standard practice. Let's assume the index of the unk token is 0, and the index for padding is 1 (we can switch them if that's more convenient).
This is a simple CNN network which implements word dropout the way I would have expected it to work:
import torch
import torch.nn as nn
import torch.nn.functional as F
class Classifier(nn.Module):
    def __init__(self, params):
        super(Classifier, self).__init__()
        self.params = params
        self.word_dropout = nn.Dropout(params["word_dropout"])
        self.pad = torch.nn.ConstantPad1d(max(params["window_sizes"])-1, 1)
        self.embedding = nn.Embedding(params["vocab_size"], params["word_dim"], padding_idx=1)
        self.convs = nn.ModuleList([nn.Conv1d(1, params["feature_num"], params["word_dim"] * window_size, stride=params["word_dim"], bias=False) for window_size in params["window_sizes"]])
        self.dropout = nn.Dropout(params["dropout"])
        self.fc = nn.Linear(params["feature_num"] * len(params["window_sizes"]), params["num_classes"])
    def forward(self, x, l):
        x = self.word_dropout(x)
        x = self.pad(x)
        embedded_x = self.embedding(x)
        embedded_x = embedded_x.view(-1, 1, x.size()[1] * self.params["word_dim"]) # [batch_size, 1, seq_len * word_dim]
        features = [F.relu(conv(embedded_x)) for conv in self.convs]
        pooled = [F.max_pool1d(feat, feat.size()[2]).view(-1, self.params["feature_num"]) for feat in features]
        pooled = torch.cat(pooled, 1)
        pooled = self.dropout(pooled)
        logit = self.fc(pooled)
        return logit
Don't mind the padding - pytorch doesn't have an easy way of using non zero padding in CNNs, much less trainable non-zero padding, so I'm doing it manually. Dropout also doesn't allow me to use non zero dropout, and I want to separate the padding token from the unk token. I'm keeping it in my example because it's the reason for this question's existence.
This doesn't work because dropout wants Float Tensors so that it can scale them properly, while my input is Long Tensors that don't need to be scaled.
Is there an easy way of doing this in pytorch? I essentially want to use LongTensor-friendly dropout (bonus: better if it will let me specify a dropout constant that isn't 0, so that I could use zero padding).
Actually I would do it outside of your model, before converting your input into a LongTensor.
This would look like this:
import random
import torch
def add_unk(input_token_id, p):
    # random.random() gives you a value between 0 and 1
    # to avoid switching your padding to 0 we add 'input_token_id > 1'
    if random.random() < p and input_token_id > 1:
        return 0
    return input_token_id
# then you have your input token_id
# for this example I take just a random number, let's say 127
input_token_id = 127
# let p be your probability for UNK
p = 0.01
your_input_tensor = torch.LongTensor([add_unk(input_token_id, p)])
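The snippet above handles a single token. A minimal sketch of applying it to a whole padded sequence could look like the following; the helper `word_dropout`, the constants `UNK_ID` and `PAD_ID`, and the example ids are hypothetical and only follow the index convention from the question (unk is 0, padding is 1).

```python
import random

UNK_ID = 0  # assumed index of the unk token, as in the question
PAD_ID = 1  # assumed index of the padding token, as in the question

def add_unk(input_token_id, p):
    # Replace a regular token by UNK with probability p; keep unk/padding as is.
    if random.random() < p and input_token_id > PAD_ID:
        return UNK_ID
    return input_token_id

def word_dropout(token_ids, p):
    # Apply add_unk independently to every token of one sequence.
    return [add_unk(token_id, p) for token_id in token_ids]

random.seed(0)
sentence = [127, 54, 9, 1, 1]  # made-up ids, padded with PAD_ID at the end
print(word_dropout(sentence, p=0.3))
```

The resulting list can then be converted with torch.LongTensor in the same way as the single token above.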
So there are two options that come to my mind which are actually GPU-friendly. In general, both solutions should be much more efficient.
Option one - Doing computation directly in forward():
If you're not using torch.utils and don't plan to use it later, this is probably the way to go.
Instead of doing the computation beforehand, we just do it in the forward() method of the main PyTorch class. However, I see no (simple) way of doing this in torch 0.3.1, so you would need to upgrade to version 0.4.0.
So imagine x is your input vector:
>>> x = torch.tensor(range(10))
>>> x
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
probs is a vector of uniform random numbers, which we can later compare against our dropout probability:
>>> probs = torch.empty(10).uniform_(0, 1)
>>> probs
tensor([ 0.9793, 0.1742, 0.0904, 0.8735, 0.4774, 0.2329, 0.0074,
0.5398, 0.4681, 0.5314])
Now we apply the dropout probabilities probs on our input x:
>>> torch.where(probs > 0.2, x, torch.zeros(10, dtype=torch.int64))
tensor([ 0, 0, 0, 3, 4, 5, 0, 7, 8, 9])
Note: To see some effect I chose a dropout probability of 0.2 here. In reality you probably want it to be smaller.
You can pick any token / id you like for this; here is an example with 42 as the unknown token id:
>>> unk_token = 42
>>> torch.where(probs > 0.2, x, torch.empty(10, dtype=torch.int64).fill_(unk_token))
tensor([ 0, 42, 42, 3, 4, 5, 42, 7, 8, 9])
torch.where comes with PyTorch 0.4.0.
I don't know the shapes in your network, but your forward() should then look something like this (when using mini-batching you need to flatten the input before applying dropout):
def forward_train(self, x, l):
    # probabilities
    probs = torch.empty(x.size(0)).uniform_(0, 1)
    # applying word dropout
    x = torch.where(probs > 0.02, x, torch.zeros(x.size(0), dtype=torch.int64))
    # continue like before ...
    x = self.pad(x)
    embedded_x = self.embedding(x)
    embedded_x = embedded_x.view(-1, 1, x.size()[1] * self.params["word_dim"]) # [batch_size, 1, seq_len * word_dim]
    features = [F.relu(conv(embedded_x)) for conv in self.convs]
    pooled = [F.max_pool1d(feat, feat.size()[2]).view(-1, self.params["feature_num"]) for feat in features]
    pooled = torch.cat(pooled, 1)
    pooled = self.dropout(pooled)
    logit = self.fc(pooled)
    return logit
Note: I named the function forward_train(), so you should use another forward() without dropout for evaluation / prediction. But you could also use a single forward() with an if condition on self.training, which train() and eval() toggle.
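The forward_train() sketch above draws one random number per entry of the first dimension and would also replace padding tokens. A variant that draws one number per token and leaves padding untouched is sketched below in NumPy, as a stand-in so it runs without PyTorch; a mask built the same way should work with torch.where. The function name `word_dropout`, the constants and the example ids are hypothetical and follow the index convention from the question.

```python
import numpy as np

UNK_ID = 0  # assumed unk index, as in the question
PAD_ID = 1  # assumed padding index, as in the question

def word_dropout(token_ids, p, rng):
    # Draw one uniform number per token and drop where it falls below p.
    probs = rng.random(token_ids.shape)
    drop = (probs < p) & (token_ids != PAD_ID)  # never touch padding
    return np.where(drop, UNK_ID, token_ids)

rng = np.random.default_rng(0)
batch = np.array([[5, 8, 13, 1, 1],
                  [7, 42, 9, 6, 1]])  # made-up ids, shape [batch_size, seq_len]
dropped = word_dropout(batch, p=0.5, rng=rng)
print(dropped)
```

Because the mask has the same shape as the input, no flattening is needed for mini-batches in this variant.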
Option two: using torch.utils.data.Dataset
If you're using the Dataset class provided by torch.utils.data, it is very easy to do this kind of pre-processing efficiently. A DataLoader wrapping the Dataset can load samples in parallel worker processes (when num_workers is greater than 0), so the code sample above just has to be executed in the __getitem__ method of your Dataset class.
This could look like this:
def __getitem__(self, index):
    'Generates one sample of data'
    # Select sample
    ID = self.input_tokens[index]
    # Load data and get label
    # using the add_unk function from the code above
    X = torch.LongTensor([add_unk(ID, p=0.01)])
    y = self.targets[index]
    return X, y
This is a bit out of context and doesn't look very elegant but I think you get the idea. According to this blog post of Shervine Amidi at Stanford it should be no problem to do more complex pre-processing steps in this function:
Since our code [Dataset is meant] is designed to be multicore-friendly, note that you
can do more complex operations instead (e.g. computations from source
files) without worrying that data generation becomes a bottleneck in
the training process.
The linked blog post - "A detailed example of how to generate your data in parallel with PyTorch" - also provides a good guide for implementing data generation with Dataset and DataLoader.
I guess you'll prefer option one - only two lines and it should be very efficient. :)
Good luck!

Why do we need Theano reshape?

I don't understand why we need the tensor.reshape() function in Theano. The documentation says:
Returns a view of this tensor that has been reshaped as in numpy.reshape.
As far as I understand, theano.tensor.var.TensorVariable is an entity used for building computation graphs, and it is completely independent of shapes. For instance, when you create your function you can pass it a 2x2 matrix or a 100x200 matrix. I thought reshape would somehow restrict this variety, but it does not. Consider the following example:
X = tensor.matrix('X')
X_resh = X.reshape((3, 3))
Y = X_resh ** 2
f = theano.function([X_resh], Y)
print(f(numpy.array([[1, 2], [3, 4]])))
As I understood, it should give an error since I passed matrix 2x2 not 3x3, but it computes element-wise squares perfectly.
So what is the shape of the theano tensor variable and where should we use it?
There is an error in the provided code though Theano fails to point this out.
Instead of
f = theano.function([X_resh], Y)
you should really use
f = theano.function([X], Y)
Using the original code, you are actually providing the tensor after the reshape as the input, so the reshape command never gets executed. This can be seen by printing the compiled graph (for example with theano.printing.debugprint(f)), which prints
Elemwise{sqr,no_inplace} [id A] '' 0
|<TensorType(float64, matrix)> [id B]
Note that there is no reshape operation in this compiled execution graph.
If one changes the code so that X is used as the input instead of X_resh then Theano throws an error including the message
ValueError: total size of new array must be unchanged Apply node that
caused the error: Reshape{2}(X, TensorConstant{(2L,) of 3})
This is expected because one cannot reshape a tensor with shape (2, 2) (i.e. 4 elements) into a tensor with shape (3, 3) (i.e. 9 elements).
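The element-count rule is the same one NumPy enforces, so it can be checked without Theano. The sketch below is a NumPy analogue, not Theano code; the flag `reshape_failed` is introduced only for illustration.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])  # 4 elements

# Reshaping to another 4-element shape works.
print(x.reshape((4, 1)).shape)

# Reshaping to (3, 3) would need 9 elements, so NumPy raises ValueError.
try:
    x.reshape((3, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```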
To address the broader question: we can use symbolic expressions in the target shape, and those expressions can be functions of the input tensor's symbolic shape. Here are some examples:
import numpy
import theano
import theano.tensor
X = theano.tensor.matrix('X')
X_vector = X.reshape((X.shape[0] * X.shape[1],))
X_row = X.reshape((1, X.shape[0] * X.shape[1]))
X_column = X.reshape((X.shape[0] * X.shape[1], 1))
X_3d = X.reshape((-1, X.shape[0], X.shape[1]))
f = theano.function([X], [X_vector, X_row, X_column, X_3d])
for output in f(numpy.array([[1, 2], [3, 4]])):
    print(output.shape, output)
