Broadcasting of shared variables in logistic regression with theano - theano

I am currently going through the theano tutorial for logistic regression, very much like discussed in this post:
what-does-negative-log-likelihood-of-logistic-regression-in-theano-look-like. However, the original tutorial uses shared variables W and b, as well as a matrix called input. Input is a n x n_in matrix, W is n_in x n_out and b is a n_out x 1 column vector.
self.W = theano.shared(
value=numpy.zeros(
(n_in, n_out),
dtype=theano.config.floatX
),
name='W',
borrow=True
)
self.b = theano.shared(
value=numpy.zeros(
(n_out,),
dtype=theano.config.floatX
),
name='b',
borrow=True
)
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
Now, as far as I understood from the documentation of shared variables, the broadcasting pattern of shared variables defaults to false. So why is it that this line of code does not throw an error because of mismatched dimensions?
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
After all, we are adding a matrix T.dot(input, self.W) to a vector b. Do shared variables broadcast by default after all? Even with broadcasting, the dimension don't add up. T.dot(input, self.W) is n x n_out matrix and b a n_out x 1 vector.
What am I missing?

Shared variables are not broadcastable by default as their shape can change, however, Theano takes care of necessary broadcasting with Tensors when operations like adding scalar to a matrix, adding vector to a matrix etc. are requested. Check the documentation for more details.
Note that input is a symbolic variable not a shared variable. The dimension mismatch would have occurred if second dimension of self.W i.e n_out would be different from first dimension of vector b, which is not the case here, so they perfectly add up, it's the same in numpy as well:
import numpy as np
a = np.asarray([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
b = np.asarray([1,2,3])
print(a + b)

Related

PyTorch doubly stochastic normalisation of 3D tensor

I'm trying to implement double stochastic normalisation of an N x N x P tensor as described in Section 3.2 in Gong, CVPR 2019. This can be done easily in the N x N case using matrix operations but I am stuck with the 3D tensor case. What I have so far is
def doubly_stochastic_normalise(E):
"""E: n x n x f"""
E = E / torch.sum(E, dim=1, keepdim=True) # normalised across rows
F = E / torch.sum(E, dim=0, keepdim=True) # normalised across cols
E = torch.einsum('ijp,kjp->ikp', E, F)
return E
but I'm wondering if there is a method without einsum.
In this setting, you can always fall back to using torch.matmul (batched matrix multiplication to be more precise). However, this requires you to transpose the axis. Recall the matrix multiplication for two 3D inputs, in einsum notation, it gives us:
bik,bkj->bij
Notice how the k dimension gets reduces. To get to this setting, we need to transpose the inputs of the operator. In your case we have:
ijp ? kjp -> ikp
↓ ↓ ↑
pij # pjk -> pik
This translates to:
>>> (E.permute(2,0,1) # F.permute(2,1,0)).permute(1,2,0)
# ijp ➝ pij kjp ➝ pjk pik ➝ ikp
You can argue your method is not only shorter but also a lot more readable. I would therefore stick with torch.einsum. The reason why the einsum operator is so useful here is because you can perform axes transpositions on the fly.

How to set up the number of inputs neurons in sklearn MLPClassifier?

Given a dataset of n samples, m features, and using [sklearn.neural_network.MLPClassifier][1], how can I set hidden_layer_sizes to start with m inputs? For instance, I understand that if hidden_layer_sizes= (10,10) it means there are 2 hidden layers each of 10 neurons (i.e., units) but I don't know if this also implies 10 inputs as well.
Thank you
This classifier/regressor, as implemented, is doing this automatically when calling fit.
This can be seen in it's code here.
Excerpt:
n_samples, n_features = X.shape
# Ensure y is 2D
if y.ndim == 1:
y = y.reshape((-1, 1))
self.n_outputs_ = y.shape[1]
layer_units = ([n_features] + hidden_layer_sizes +
[self.n_outputs_])
You see, that your potentially given hidden_layer_sizes is surrounded by layer-dimensions defined by your data within .fit(). This is the reason, the signature reads like this with a subtraction of 2!:
Parameters
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
The ith element represents the number of neurons in the ith hidden layer.

Why do we need Theano reshape?

I don't understand why do we need tensor.reshape() function in Theano. It is said in the documentation:
Returns a view of this tensor that has been reshaped as in
numpy.reshape.
As far as I understood, theano.tensor.var.TensorVariable is some entity that is used for computation graphs creation. And it is absolutely independent of shapes. For instance when you create your function you can pass there matrix 2x2 or matrix 100x200. As I thought reshape somehow restricts this variety. But it is not. Suppose the following example:
X = tensor.matrix('X')
X_resh = X.reshape((3, 3))
Y = X_resh ** 2
f = theano.function([X_resh], Y)
print(f(numpy.array([[1, 2], [3, 4]])))
As I understood, it should give an error since I passed matrix 2x2 not 3x3, but it computes element-wise squares perfectly.
So what is the shape of the theano tensor variable and where should we use it?
There is an error in the provided code though Theano fails to point this out.
Instead of
f = theano.function([X_resh], Y)
you should really use
f = theano.function([X], Y)
Using the original code you are actually providing the tensor after the reshape so the reshape command never gets executed. This can be seen by adding
theano.printing.debugprint(f)
which prints
Elemwise{sqr,no_inplace} [id A] '' 0
|<TensorType(float64, matrix)> [id B]
Note that there is no reshape operation in this compiled execution graph.
If one changes the code so that X is used as the input instead of X_resh then Theano throws an error including the message
ValueError: total size of new array must be unchanged Apply node that
caused the error: Reshape{2}(X, TensorConstant{(2L,) of 3})
This is expected because one cannot reshape a tensor with shape (2, 2) (i.e. 4 elements) into a tensor with shape (3, 3) (i.e. 9 elements).
To address the broader question, we can use symbolic expressions in the target shape and those expressions can be functions of the input tensor's symbolic shape. Here's some examples:
import numpy
import theano
import theano.tensor
X = theano.tensor.matrix('X')
X_vector = X.reshape((X.shape[0] * X.shape[1],))
X_row = X.reshape((1, X.shape[0] * X.shape[1]))
X_column = X.reshape((X.shape[0] * X.shape[1], 1))
X_3d = X.reshape((-1, X.shape[0], X.shape[1]))
f = theano.function([X], [X_vector, X_row, X_column, X_3d])
for output in f(numpy.array([[1, 2], [3, 4]])):
print output.shape, output

Role of class_weight in loss functions for linearSVC and LogisticRegression

I am trying to figure out what exactly the loss function formula is and how I can manually calculate it when class_weight='auto' in case of svm.svc, svm.linearSVC and linear_model.LogisticRegression.
For balanced data, say you have a trained classifier: clf_c. Logistic loss should be (am I correct?):
def logistic_loss(x,y,w,b,b0):
'''
x: nxp data matrix where n is number of data points and p is number of features.
y: nx1 vector of true labels (-1 or 1).
w: nx1 vector of weights (vector of 1./n for balanced data).
b: px1 vector of feature weights.
b0: intercept.
'''
s = y
if 0 in np.unique(y):
print 'yes'
s = 2. * y - 1
l = np.dot(w, np.log(1 + np.exp(-s * (np.dot(x, np.squeeze(b)) + b0))))
return l
I realized that logisticRegression has predict_log_proba() which gives you exactly that when data is balanced:
b, b0 = clf_c.coef_, clf_c.intercept_
w = np.ones(len(y))/len(y)
-(clf_c.predict_log_proba(x[xrange(len(x)), np.floor((y+1)/2).astype(np.int8)]).mean() == logistic_loss(x,y,w,b,b0)
Note, np.floor((y+1)/2).astype(np.int8) simply maps y=(-1,1) to y=(0,1).
But this does not work when data is imbalanced.
What's more, you expect the classifier (here, logisticRegression) to perform similarly (in terms of loss function value) when data in balance and class_weight=None versus when data is imbalanced and class_weight='auto'. I need to have a way to calculate the loss function (without the regularization term) for both scenarios and compare them.
In short, what does class_weight = 'auto' exactly mean? Does it mean class_weight = {-1 : (y==1).sum()/(y==-1).sum() , 1 : 1.} or rather class_weight = {-1 : 1./(y==-1).sum() , 1 : 1./(y==1).sum()}?
Any help is much much appreciated. I tried going through the source code, but I am not a programmer and I am stuck.
Thanks a lot in advance.
class_weight heuristics
I am a bit puzzled by your first proposition for the class_weight='auto' heuristic, as:
class_weight = {-1 : (y == 1).sum() / (y == -1).sum(),
1 : 1.}
is the same as your second proposition if we normalize it so that the weights sum to one.
Anyway to understand what class_weight="auto" does, see this question:
what is the difference between class weight = none and auto in svm scikit learn.
I am copying it here for later comparison:
This means that each class you have (in classes) gets a weight equal
to 1 divided by the number of times that class appears in your data
(y), so classes that appear more often will get lower weights. This is
then further divided by the mean of all the inverse class frequencies.
Note how this is not completely obvious ;).
This heuristic is deprecated and will be removed in 0.18. It will be replaced by another heuristic, class_weight='balanced'.
The 'balanced' heuristic weighs classes proportionally to the inverse of their frequency.
From the docs:
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data:
n_samples / (n_classes * np.bincount(y)).
np.bincount(y) is an array with the element i being the count of class i samples.
Here's a bit of code to compare the two:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import compute_class_weight
n_classes = 3
n_samples = 1000
X, y = make_classification(n_samples=n_samples, n_features=20, n_informative=10,
n_classes=n_classes, weights=[0.05, 0.4, 0.55])
print("Count of samples per class: ", np.bincount(y))
balanced_weights = n_samples /(n_classes * np.bincount(y))
# Equivalent to the following, using version 0.17+:
# compute_class_weight("balanced", [0, 1, 2], y)
print("Balanced weights: ", balanced_weights)
print("'auto' weights: ", compute_class_weight("auto", [0, 1, 2], y))
Output:
Count of samples per class: [ 57 396 547]
Balanced weights: [ 5.84795322 0.84175084 0.60938452]
'auto' weights: [ 2.40356854 0.3459682 0.25046327]
The loss functions
Now the real question is: how are these weights used to train the classifier?
I don't have a thorough answer here unfortunately.
For SVC and linearSVC the docstring is pretty clear
Set the parameter C of class i to class_weight[i]*C for SVC.
So high weights mean less regularization for the class and a higher incentive for the svm to classify it properly.
I do not know how they work with logistic regression. I'll try to look into it but most of the code is in liblinear or libsvm and I'm not too familiar with those.
However, note that the weights in class_weight do not influence directly methods such as predict_proba. They change its ouput because the classifier optimizes a different loss function.
Not sure this is clear, so here's a snippet to explain what I mean (you need to run the first one for the imports and variable definition):
lr = LogisticRegression(class_weight="auto")
lr.fit(X, y)
# We get some probabilities...
print(lr.predict_proba(X))
new_lr = LogisticRegression(class_weight={0: 100, 1: 1, 2: 1})
new_lr.fit(X, y)
# We get different probabilities...
print(new_lr.predict_proba(X))
# Let's cheat a bit and hand-modify our new classifier.
new_lr.intercept_ = lr.intercept_.copy()
new_lr.coef_ = lr.coef_.copy()
# Now we get the SAME probabilities.
np.testing.assert_array_equal(new_lr.predict_proba(X), lr.predict_proba(X))
Hope this helps.

Bad input argument to theano function

I am new to theano. I am trying to implement simple linear regression but my program throws following error:
TypeError: ('Bad input argument to theano function with name "/home/akhan/Theano-Project/uog/theano_application/linear_regression.py:36" at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
Here is my code:
import theano
from theano import tensor as T
import numpy as np
import matplotlib.pyplot as plt
x_points=np.zeros((9,3),float)
x_points[:,0] = 1
x_points[:,1] = np.arange(1,10,1)
x_points[:,2] = np.arange(1,10,1)
y_points = np.arange(3,30,3) + 1
X = T.vector('X')
Y = T.scalar('Y')
W = theano.shared(
value=np.zeros(
(3,1),
dtype=theano.config.floatX
),
name='W',
borrow=True
)
out = T.dot(X, W)
predict = theano.function(inputs=[X], outputs=out)
y = predict(X) # y = T.dot(X, W) work fine
cost = T.mean(T.sqr(y-Y))
gradient=T.grad(cost=cost,wrt=W)
updates = [[W,W-gradient*0.01]]
train = theano.function(inputs=[X,Y], outputs=cost, updates=updates, allow_input_downcast=True)
for i in np.arange(x_points.shape[0]):
print "iteration" + str(i)
train(x_points[i,:],y_points[i])
sample = np.arange(x_points.shape[0])+1
y_p = np.dot(x_points,W.get_value())
plt.plot(sample,y_p,'r-',sample,y_points,'ro')
plt.show()
What is the explanation behind this error? (didn't got from the error message). Thanks in Advance.
There's an important distinction in Theano between defining a computation graph and a function which uses such a graph to compute a result.
When you define
out = T.dot(X, W)
predict = theano.function(inputs=[X], outputs=out)
you first set up a computation graph for out in terms of X and W. Note that X is a purely symbolic variable, it doesn't have any value, but the definition for out tells Theano, "given a value for X, this is how to compute out".
On the other hand, predict is a theano.function which takes the computation graph for out and actual numeric values for X to produce a numeric output. What you pass into a theano.function when you call it always has to have an actual numeric value. So it simply makes no sense to do
y = predict(X)
because X is a symbolic variable and doesn't have an actual value.
The reason you want to do this is so that you can use y to further build your computation graph. But there is no need to use predict for this: the computation graph for predict is already available in the variable out defined earlier. So you can simply remove the line defining y altogether and then define your cost as
cost = T.mean(T.sqr(out - Y))
The rest of the code will then work unmodified.

Resources