How can we monitor/analyse the givens of a theano function?
As an example consider the following function:
train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: X_train[index * batch_size: (index + 1) * batch_size],
        y: y_train[index * batch_size: (index + 1) * batch_size]
    }
)
What would be a way to monitor/analyse the shared variables x and y?
If you're following/using the code from the Theano tutorials, as appears to be the case, then X_train and y_train are shared variables containing your training data (X_train is the input and y_train is the true/actual output that you want your model to predict when correct).
The contents of these shared variables never (or, at least, shouldn't) change because your training data is normally static while a model is training.
So, looking at the contents of the shared variables X_train and y_train is just the same as looking at your training data. You can presumably just look at the data wherever you load it from (e.g. CSV data files, saved numpy arrays, etc.).
If you really want to look at the contents of a shared variable then you can do this using the get_value() method which returns the underlying numpy array:
x_data = X_train.get_value()
print(x_data.shape)
# etc.
Theano is not involved at all here. Nothing is symbolic, it's just concrete numpy arrays.
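If what you actually want to inspect is the slice that a particular minibatch receives through givens, you can reproduce that slice on the numpy array returned by get_value(). A minimal sketch, assuming the batch_size from the question and a concrete minibatch index i:

batch_size = 100   # whatever value you compiled train_model with
i = 0              # concrete minibatch index, same role as the symbolic `index`

x_batch = X_train.get_value()[i * batch_size: (i + 1) * batch_size]
y_batch = y_train.get_value()[i * batch_size: (i + 1) * batch_size]
print(x_batch.shape, y_batch.shape)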
I am trying to build my own loss function as follows:
import numpy as np
from keras import backend as K
def MyLoss(self, x_input, x_reconstruct):
    a = np.copy(x_reconstruct)
    a = np.asarray(a, dtype='float16')
    a = np.floor(4*a)/4
    return K.mean(K.square(a - x_input), axis=-1)
When compiling, it says:
ValueError: setting an array element with a sequence
Both x_input and x_reconstruct are [m, n, 1] np arrays. The last line of code is actually copied directly from Keras' built-in MSE loss function.
Also, I suppose the loss is calculated per sample. If the dimensions of both the input and the reconstructed input are [m, n, 1], the result of Keras' built-in loss will also be a matrix of size [m, n]. So why does the built-in loss work properly?
I then tried to use np's functions directly:
def MyLoss(self, x_input, x_reconstruct):
    a = np.copy(x_reconstruct)
    a = np.asarray(a, dtype=self.precision)
    a = np.floor(4*a)/4
    Diff = a - x_input
    xx = np.mean(np.square(Diff), axis=-1)
    yy = np.sum(xx)
    return yy
yet the error persists. What mistake did I make? How should I write the code?
Having borrowed the suggestion from Make a Custom loss function in Keras in detail, I tried the following:
def MyLoss(self, x_input, x_reconstruct):
    if self.precision == 'float16':
        K.set_floatx('float16')
        K.set_epsilon(1e-4)
    a = K.cast_to_floatx(x_input)
    a = K.round(a*4.-0.5)/4.0
    return K.sum(K.mean(K.square(x_input-a), axis=-1))
But the same error occurs.
You cannot use numpy arrays in your loss. You have to use TensorFlow or Keras backend operations. Try this, maybe:
import tensorflow as tf
import keras.backend as K
def MyLoss(x_input, x_reconstruct):
    a = tf.cast(x_input, dtype=tf.float16)  # note: dtype should be tf.float16, not the string 'tf.float16'
    a = tf.floor(4*a)/4
    return K.mean(K.square(a - x_input), axis=-1)
I found the answer myself; let me share it here.
If I write code like this
def MyLoss(self, y_true, y_pred):
    if self.precision == 'float16':
        K.set_floatx('float16')
        K.set_epsilon(1e-4)
    return K.mean(K.square(y_true-K.round(y_pred*4.-0.5)/4.0), axis=-1)
It works. The trick, I think, is that I cannot use K.cast_to_floatx(y_true); instead, I simply use y_true directly. I still do not understand why...
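The likely reason the earlier attempts failed is that Keras passes symbolic tensors into the loss function, so anything that forces them into a concrete numpy array (np.copy, np.asarray) raises the ValueError above. A minimal sketch of the same backend-only loss written as a standalone function (the compile call and model are placeholders, not from the original post):

from keras import backend as K

def quantized_mse(y_true, y_pred):
    # round predictions to the nearest 1/4 using backend ops only, so everything stays symbolic
    a = K.round(y_pred * 4. - 0.5) / 4.0
    return K.mean(K.square(y_true - a), axis=-1)

# model.compile(optimizer='adam', loss=quantized_mse)  # hypothetical model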
I am working on a multi-label image classification problem with Keras, so I use the functions flow_from_dataframe() and fit_generator().
I have about 2000 classes and, as you can guess, they are highly skewed / imbalanced. After searching a bit I came across the arguments class_weight and classes and decided to give them a try. My problem is, I am not sure if I am using them correctly. Here is an example:
Let's assume that I have flattened all class occurrences so that I get the following list of (duplicated) labels:
labels = ['classD', 'classA', 'classA', 'classC', 'classD', 'classD']
And this is the function that computes classes and class_weight:
from collections import Counter
def get_classes_weights(l, n):
    counter = Counter(l).most_common(n)
    classes = [cls for cls, ocu in counter]
    majority = max([ocu for cls, ocu in counter])
    weights = {idx: float(majority/ocu) for idx, (cls, ocu) in enumerate(counter)}
    return classes, weights
Let's also assume that I want to consider only the top-2 classes:
classes, class_weight = get_classes_weights(labels, 2)
This gives:
classes: ['classD', 'classA']
and:
class_weight: {0: 1.0, 1: 1.5}
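(As a quick check of the arithmetic: classD is the majority class with 3 occurrences, so its weight is 3/3 = 1.0; classA occurs twice, so its weight is 3/2 = 1.5.)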
And finally, this is how I use them within the functions:
generator_train.flow_from_dataframe(
    classes=classes,
)

model.fit_generator(
    class_weight=class_weight
)
So my questions are:
Is the above the right way to apply weights given that I work on a multi-label image classification problem?
Does my validation set need to be balanced, or is it OK if it has been taken from the same distribution as the training set (a random 20% / 80% split, respectively)?
I'm trying to use PyTorch with a complex loss function. To speed up the code, I would like to use the PyTorch multiprocessing package.
In a first trial, I feed 10x1 features into the NN and get a 10x4 output.
After that, I want to pass the 10x4 parameters into a function to do some calculation. (The calculation will become more complex in the future.)
After the calculation, the function returns a 10x1 array. This array is set as NN_energy and used to calculate the loss.
Besides, I also want to know whether there is another way to create an array that supports backward() to store NN_energy, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing

def func(msg, BOP):
    ans = (BOP[msg][0] + BOP[msg][1] / BOP[msg][2]) * BOP[msg][3]
    return ans

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature , n_hidden_1)  # hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2)  # hidden layer
        self.predict  = torch.nn.Linear(n_hidden_2, n_output  )  # output layer

    def forward(self, x):
        x = torch.tanh(self.hidden_1(x))  # activation function for hidden layer
        x = torch.tanh(self.hidden_2(x))  # activation function for hidden layer
        x = self.predict(x)               # linear output
        return x

if __name__ == '__main__':  # apply_async
    Data_in      = Variable(torch.from_numpy(np.asarray(list(range( 0, 10))).reshape(10, 1)).float())
    Ground_truth = Variable(torch.from_numpy(np.asarray(list(range(20, 30))).reshape(10, 1)).float())

    net = Net(n_feature=1, n_hidden_1=15, n_hidden_2=15, n_output=4)  # define the network
    optimizer = torch.optim.Rprop(net.parameters())
    loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

    NN_output = net(Data_in)

    args = range(0, 10)
    pool = multiprocessing.Pool()
    return_data = pool.map(func, zip(args, NN_output))
    pool.close()
    pool.join()

    NN_energy = net(Data_in)[0:10, 0]
    for i in range(0, 10):
        NN_energy[i] = return_data[i]

    loss = torch.sqrt(loss_func(NN_energy, Ground_truth))  # must be (1. nn output, 2. target)
    print(loss)
Error message:

File "C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 126, in reduce_tensor
    raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
First of all, the torch Variable API has been deprecated for a very long time; just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong on many levels: np.asarray of a list is useless since a copy will be performed anyway, and np.array takes a list as input by design. Then, np.arange is available to return a range as a numpy array, and an equivalent exists in Torch. Next, specifying both dimensions for reshape is unnecessary and error prone; you could simply do reshape((-1, 1)), or better yet unsqueeze(-1).
Here is the simplified expression torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
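Applied to the question's setup, a minimal sketch of both tensors built this way (requires_grad is only needed if you want gradients with respect to the inputs themselves, so it is omitted here):

import torch

Data_in      = torch.arange( 0, 10, dtype=torch.float32).unsqueeze(-1)  # shape (10, 1)
Ground_truth = torch.arange(20, 30, dtype=torch.float32).unsqueeze(-1)  # shape (10, 1)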
Using a multiprocessing pool is bad practice if batch processing is possible; batching will be both more efficient and more readable. Indeed, performing N small algebraic operations in parallel is always slower than a single larger algebraic operation, and even more so on GPU. More importantly, computing the gradient is not supported by multiprocessing, hence the error that you get. That said, this is only partially true, because it has been supported for CPU tensors since 1.6.0. Have a look at the official release changelog.
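For example, a minimal sketch of what a batched version of the question's func could look like, reusing the names from the code above and keeping the whole computation inside autograd so no pool is needed:

def batched_func(BOP):
    # BOP is the full (10, 4) network output; same formula as func, applied row-wise
    return (BOP[:, 0] + BOP[:, 1] / BOP[:, 2]) * BOP[:, 3]

NN_output = net(Data_in)                           # shape (10, 4), tracked by autograd
NN_energy = batched_func(NN_output).unsqueeze(-1)  # shape (10, 1), still differentiable
loss = torch.sqrt(loss_func(NN_energy, Ground_truth))
loss.backward()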
Could you post a more representative example of what the func method could be, to make sure you really need it?
NB: distributed autograd, which is what you are looking for, is now available in PyTorch as an experimental feature, in beta since 1.6.0. Have a look at the official documentation.
In theano, it was very easy to get the gradient of some variable w.r.t. a given loss:
loss = f(x, w)
dl_dw = tt.grad(loss, wrt=w)
I get that pytorch goes by a different paradigm, where you'd do something like:
loss = f(x, w)
loss.backward()
dl_dw = w.grad
The thing is I might not want to do a full backwards propagation through the graph - just along the path needed to get to w.
I know you can define Variables with requires_grad=False if you don't want to backpropagate through them. But then you have to decide that at variable-creation time (and the requires_grad=False property is attached to the variable rather than to the call that gets the gradient, which seems odd).
My question is: is there some way to backpropagate on demand (i.e. only backpropagate along the path needed to compute dl_dw, as you would in theano)?
It turns out that this is really easy. Just use torch.autograd.grad.
Example:
import torch
import numpy as np
from torch.autograd import grad
x = torch.autograd.Variable(torch.from_numpy(np.random.randn(5, 4)))
w = torch.autograd.Variable(torch.from_numpy(np.random.randn(4, 3)), requires_grad=True)
y = torch.autograd.Variable(torch.from_numpy(np.random.randn(5, 3)))
loss = ((x.mm(w) - y)**2).sum()
(d_loss_d_w, ) = grad(loss, w)
assert np.allclose(d_loss_d_w.data.numpy(), (x.transpose(0, 1).mm(x.mm(w)-y)*2).data.numpy())
Thanks to JerryLin for answering the question here.
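For reference, the same example without the deprecated Variable wrapper (a sketch assuming a reasonably recent PyTorch version):

import torch
from torch.autograd import grad

x = torch.randn(5, 4)
w = torch.randn(4, 3, requires_grad=True)
y = torch.randn(5, 3)

loss = ((x.mm(w) - y) ** 2).sum()
(d_loss_d_w,) = grad(loss, w)  # only traverses the path needed to reach w

# same analytic check as above: d(loss)/dw = 2 * x^T (x w - y)
assert torch.allclose(d_loss_d_w, 2 * x.t().mm(x.mm(w) - y))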
I am trying to figure out what exactly the loss function formula is, and how I can manually calculate it when class_weight='auto', in the case of svm.SVC, svm.LinearSVC and linear_model.LogisticRegression.
For balanced data, say you have a trained classifier: clf_c. Logistic loss should be (am I correct?):
def logistic_loss(x, y, w, b, b0):
    '''
    x:  nxp data matrix where n is the number of data points and p the number of features.
    y:  nx1 vector of true labels (-1 or 1).
    w:  nx1 vector of weights (vector of 1./n for balanced data).
    b:  px1 vector of feature weights.
    b0: intercept.
    '''
    s = y
    if 0 in np.unique(y):
        print('yes')
        s = 2. * y - 1
    l = np.dot(w, np.log(1 + np.exp(-s * (np.dot(x, np.squeeze(b)) + b0))))
    return l
I realized that LogisticRegression has predict_log_proba(), which gives you exactly that when the data is balanced:
b, b0 = clf_c.coef_, clf_c.intercept_
w = np.ones(len(y)) / len(y)
-clf_c.predict_log_proba(x)[np.arange(len(x)), np.floor((y + 1) / 2).astype(np.int8)].mean() == logistic_loss(x, y, w, b, b0)
Note, np.floor((y+1)/2).astype(np.int8) simply maps y=(-1,1) to y=(0,1).
But this does not work when data is imbalanced.
What's more, you would expect the classifier (here, LogisticRegression) to perform similarly (in terms of loss function value) when the data is balanced and class_weight=None versus when the data is imbalanced and class_weight='auto'. I need a way to calculate the loss function (without the regularization term) for both scenarios and compare them.
In short, what does class_weight = 'auto' exactly mean? Does it mean class_weight = {-1 : (y==1).sum()/(y==-1).sum() , 1 : 1.} or rather class_weight = {-1 : 1./(y==-1).sum() , 1 : 1./(y==1).sum()}?
Any help is much appreciated. I tried going through the source code, but I am not a programmer and I got stuck.
Thanks a lot in advance.
class_weight heuristics
I am a bit puzzled by your first proposition for the class_weight='auto' heuristic, as:
class_weight = {-1 : (y == 1).sum() / (y == -1).sum(),
1 : 1.}
is the same as your second proposition if we normalize it so that the weights sum to one.
Anyway, to understand what class_weight="auto" does, see this question:
what is the difference between class weight = none and auto in svm scikit learn.
I am copying it here for later comparison:
This means that each class you have (in classes) gets a weight equal
to 1 divided by the number of times that class appears in your data
(y), so classes that appear more often will get lower weights. This is
then further divided by the mean of all the inverse class frequencies.
Note how this is not completely obvious ;).
This heuristic is deprecated and will be removed in 0.18. It will be replaced by another heuristic, class_weight='balanced'.
The 'balanced' heuristic weighs classes proportionally to the inverse of their frequency.
From the docs:
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data:
n_samples / (n_classes * np.bincount(y)).
np.bincount(y) is an array with the element i being the count of class i samples.
Here's a bit of code to compare the two:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import compute_class_weight
n_classes = 3
n_samples = 1000
X, y = make_classification(n_samples=n_samples, n_features=20, n_informative=10,
n_classes=n_classes, weights=[0.05, 0.4, 0.55])
print("Count of samples per class: ", np.bincount(y))
balanced_weights = n_samples /(n_classes * np.bincount(y))
# Equivalent to the following, using version 0.17+:
# compute_class_weight("balanced", [0, 1, 2], y)
print("Balanced weights: ", balanced_weights)
print("'auto' weights: ", compute_class_weight("auto", [0, 1, 2], y))
Output:
Count of samples per class: [ 57 396 547]
Balanced weights: [ 5.84795322 0.84175084 0.60938452]
'auto' weights: [ 2.40356854 0.3459682 0.25046327]
The loss functions
Now the real question is: how are these weights used to train the classifier?
I don't have a thorough answer here unfortunately.
For SVC and LinearSVC the docstring is pretty clear:
Set the parameter C of class i to class_weight[i]*C for SVC.
So high weights mean less regularization for the class and a higher incentive for the SVM to classify it properly.
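As an illustration (a sketch, not taken from the original post), with C=1.0 and class_weight={0: 10, 1: 1, 2: 1} the effective penalty is 10.0 for class 0 and 1.0 for the other classes:

from sklearn.svm import SVC

# effective penalty per class: C_i = class_weight[i] * C
clf = SVC(C=1.0, class_weight={0: 10, 1: 1, 2: 1})
clf.fit(X, y)  # reusing X, y from the make_classification snippet above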
I do not know how they work with logistic regression. I'll try to look into it but most of the code is in liblinear or libsvm and I'm not too familiar with those.
However, note that the weights in class_weight do not directly influence methods such as predict_proba. They change its output only because the classifier optimizes a different loss function and therefore learns different parameters.
Not sure this is clear, so here's a snippet to explain what I mean (you need to run the first snippet above for the imports and variable definitions):
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(class_weight="auto")
lr.fit(X, y)
# We get some probabilities...
print(lr.predict_proba(X))
new_lr = LogisticRegression(class_weight={0: 100, 1: 1, 2: 1})
new_lr.fit(X, y)
# We get different probabilities...
print(new_lr.predict_proba(X))
# Let's cheat a bit and hand-modify our new classifier.
new_lr.intercept_ = lr.intercept_.copy()
new_lr.coef_ = lr.coef_.copy()
# Now we get the SAME probabilities.
np.testing.assert_array_equal(new_lr.predict_proba(X), lr.predict_proba(X))
Hope this helps.