Purpose of 'givens' variables in Theano.function - theano

I was reading the code for the logistic regression example given at http://deeplearning.net/tutorial/logreg.html. I am confused about the difference between the inputs and givens arguments of a function. The functions that compute the mistakes made by a model on a minibatch are:
test_model = theano.function(inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]})

validate_model = theano.function(inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]})
Why couldn't/wouldn't one just make x and y shared input variables and let them be defined when an actual model instance is created?

The givens parameter allows you to separate the description of the model from the exact definition of its inputs. This is a consequence of what givens does: it modifies the computation graph before compiling it. In other words, each key in givens is substituted in the graph with its associated value.
In the deep learning tutorial, we use a normal Theano variable to build the model, and we use givens to speed up GPU execution. If we kept the dataset on the CPU, we would transfer a mini-batch to the GPU at each function call; since we do many iterations over the dataset, we would end up transferring the dataset to the GPU many times. As the dataset is small enough to fit on the GPU, we put it in a shared variable so that it is transferred to the GPU if one is available (or stays on the CPU otherwise). Then, when compiling the function, we swap the input for a slice of the dataset corresponding to the mini-batch to use. The input of the Theano function is then just the index of the mini-batch we want.

I don't think anything is stopping you from doing it that way (I didn't try the updates= dictionary with an input variable directly, but why not). Note, however, that to push data to a GPU in a useful manner, you will need it to be in a shared variable (from which x and y are taken in this example).
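To make the substitution concrete, here is a standalone toy sketch (separate from the tutorial code, with made-up data) showing that the key in givens is replaced by a slice of a shared variable at compile time, so only an integer index is passed at call time:
import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')
index = T.lscalar('index')
# the whole "dataset" lives in a shared variable (on the GPU if one is available)
data = theano.shared(np.arange(10, dtype=theano.config.floatX), name='data')

# without givens: the caller must pass x explicitly on every call
f_direct = theano.function([x], x.sum())

# with givens: x is replaced in the graph by a slice of the shared data,
# so the compiled function only takes the mini-batch index
f_given = theano.function([index], x.sum(),
                          givens={x: data[index * 2:(index + 1) * 2]})

print(f_direct(np.array([1., 2.], dtype=theano.config.floatX)))  # 3.0
print(f_given(0))  # 1.0, i.e. data[0] + data[1]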

Related

How does a trained SVR model predict values?

I've been trying to understand how a model trained with support vector machines for regression predicts values. I have trained a model with sklearn.svm.SVR, and now I'm wondering how to "manually" predict the outcome of an input.
Some background: the model is trained with kernel SVR, using the RBF kernel and the dual formulation. So now I have arrays of the dual coefficients, the indices of the support vectors, and the support vectors themselves.
I found the function that is used to fit the hyperplane, but I've been unsuccessful in applying it to "manually" predict outcomes without the .predict function.
The few things I tried all involve dot products of the input (feature) array and the support vectors.
If anyone ever needs this, I've managed to understand the equation and code it in Python.
The following is the equation used for the dual formulation:
f(x) = Σ_{i=1..N} αᵢ yᵢ xᵢᵀ x + b
where N is the number of observations, and αᵢ multiplied by yᵢ are the dual coefficients found in the model's attribute model.dual_coef_. The xᵢᵀ are some of the observations used for training (the support vectors), accessed through the attribute model.support_vectors_ (transposed to allow multiplication of the two matrices), x is the input vector containing a value for each feature (it's the one observation for which we want a prediction), and b is the intercept accessed by model.intercept_.
The xᵢᵀ and x, however, are the observations transformed into a higher-dimensional space, as explained by mery in this post.
The transformation by the RBF kernel can be computed either manually, step by step, or by using sklearn.metrics.pairwise.rbf_kernel.
With the latter, the code looks like this (in my case there are 589 support vectors and 40 features).
First we access the coefficients and vectors:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

support_vectors = model.support_vectors_
dual_coefs = model.dual_coef_[0]
Then:
pred = (np.matmul(dual_coefs.reshape(1, 589),
                  rbf_kernel(support_vectors.reshape(589, 40),
                             Y=input_array.reshape(1, 40),
                             gamma=model.get_params()['gamma']))
        + model.intercept_)
If the RBF function needs to be applied manually, step by step, then:
vrbf = support_vectors.reshape(589, 40) - input_array.reshape(1, 40)
pred = (np.matmul(dual_coefs.reshape(1, 589),
                  np.diag(np.exp(-model.get_params()['gamma'] *
                                 np.matmul(vrbf, vrbf.T))
                          ).reshape(589, 1))
        + model.intercept_)
I placed the .reshape() function even where it is not necessary, just to emphasize the shapes for the matrix operations.
These both give the same result as model.predict(input_array).
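As a sanity check, here is a small self-contained version of the same computation on toy data (the dataset, the gamma value, and the variable names below are made up for illustration), compared against model.predict:
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.rand(200, 40)                  # toy training data: 200 samples, 40 features
y = X[:, 0] + 0.1 * rng.randn(200)

model = SVR(kernel='rbf', gamma=0.1).fit(X, y)   # numeric gamma so get_params()['gamma'] is usable
input_array = rng.rand(1, 40)          # one observation to predict

manual = (np.matmul(model.dual_coef_,
                    rbf_kernel(model.support_vectors_, Y=input_array,
                               gamma=model.get_params()['gamma']))
          + model.intercept_)

print(np.allclose(manual.ravel(), model.predict(input_array)))  # True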

How to improve this toy Jax optimizer code with while loops and saved history?

I'm writing a custom optimizer that I want to be JIT-able with JAX, featuring 1) breaking when the maximum number of steps is reached, 2) breaking when a tolerance is reached, and 3) saving the history of the steps taken. I'm relatively new to some of this stuff in JAX, but reading the docs I have this solution:
import jax, jax.numpy as jnp

@jax.jit
def optimizer(x, tol=1, max_steps=5):
    def cond(arg):
        step, x, history = arg
        return (step < max_steps) & (x > tol)

    def body(arg):
        step, x, history = arg
        x = x / 2  # simulate taking an optimizer step
        history = history.at[step].set(x)  # simulate saving current step
        return (step + 1, x, history)

    return jax.lax.while_loop(
        cond,
        body,
        (0, x, jnp.full(max_steps, jnp.nan))
    )

optimizer(10.)  # works
My question is whether this can be improved in some way? In particular, is there a way to avoid pre-allocating the history? This isn't ideal since the real thing is a lot more complicated than a single array, and there's obviously the potential for wasted memory if the tolerance is reached well before the maximum number of steps.
is there a way to avoid pre-allocating the history?
No, not as I understand JAX:
in JAX, a value's 'type' includes its shape, which means the input and output of the body function MUST have the same shape. If you tried to grow the history dynamically, say with jnp.vstack((history, x)), JAX would reject it because the shape of the loop carry would change between iterations.
There is a way, if you expect the tolerance to often be reached before the maximum number of steps.
JAX implements sparse matrices (and pytrees of them) in jax.experimental.sparse. They have the same shape as the maximum history size, and therefore satisfy the "fixed size" requirement for XLA, but of course they only store the nonzero elements in memory.
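A minimal illustration of that idea (not a full optimizer, just showing that a BCOO array keeps the full logical shape while only storing its nonzero entries):
import jax.numpy as jnp
from jax.experimental import sparse

max_steps = 1000
# pretend the optimizer converged after 3 steps; the rest of the history is unused
dense_history = jnp.zeros(max_steps).at[:3].set(jnp.array([5.0, 2.5, 1.25]))

sparse_history = sparse.BCOO.fromdense(dense_history)
print(sparse_history.shape)  # (1000,) -- same logical shape as the dense array
print(sparse_history.nse)    # 3 -- only the nonzero entries are stored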

Is there a element-map function in pytorch?

I'm new to PyTorch and I come from functional programming languages (where the map function is used everywhere). The problem is that I have a tensor and I want to apply some operation to each element of the tensor. The operation may vary, so I need a function like this:
map : (Numeric -> Numeric) -> Tensor -> Tensor
e.g. map(lambda x: x if x < 255 else -1, tensor) # the example is simple, but the lambda may be very complex
Is there such a function in PyTorch? How should I implement such function?
Most mathematical operations that are implemented for tensors (and similarly for ndarrays in numpy) are actually applied element-wise, so you could write, for instance:
mask = tensor < 255
result = tensor * mask + (-1) * ~mask
This is quite a general approach. For the case you have right now, where you only want to modify certain elements, you can also use "logical indexing", which lets you overwrite the current tensor in place:
tensor[~mask] = -1  # i.e. overwrite the elements that are >= 255
So in Python there actually is a map() function, but usually there are better ways to do it (better in Python; in other languages, like Haskell, map/fmap is obviously preferred in most contexts).
So the key take-away here is that the preferred method takes advantage of vectorization. This also makes the code more efficient, as those tensor operations are implemented in a low-level language, while map() is essentially a Python for-loop, which is a lot slower.
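For the example lambda from the question (x if x < 255 else -1), a vectorized sketch could also use torch.where (the tensor values below are just illustrative):
import torch

t = torch.tensor([10., 300., 254., 255.])

# vectorized: a single fused element-wise select
result = torch.where(t < 255, t, torch.full_like(t, -1))
print(result)  # tensor([ 10.,  -1., 254.,  -1.])

# Python-level map: runs the lambda once per element and returns a plain list
slow = list(map(lambda x: x if x < 255 else -1, t.tolist()))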

Multiclass semantic segmentation model evaluation

I am doing a project on multiclass semantic segmentation. I have formulated a model that outputs pretty decent segmented images by decreasing the loss value. However, I cannot evaluate the model performance with metrics such as mean IoU or the Dice coefficient.
In the case of binary semantic segmentation it was easy to just set a threshold of 0.5 to classify outputs as object or background, but that does not work for multiclass semantic segmentation. Could you please tell me how to obtain model performance on the aforementioned metrics? Any help will be highly appreciated!
By the way, I am using PyTorch framework and CamVid dataset.
If anyone is interested in this answer, please also look at this issue. The author of the issue points out that mIoU can be computed in a different way (and that method is more accepted in literature). So, consider that before using the implementation for any formal publication.
Basically, the other method suggested by the issue poster is to separately accumulate the intersections and unions over the entire dataset and divide them only at the final step. The method in the original answer below computes the intersection and union for a batch of images, divides them to get the IoU for the current batch, and then takes the mean of the IoUs over the entire dataset.
However, this original method is problematic because the final mean IoU varies with the batch size. The mIoU does not vary with the batch size for the method mentioned in the issue, as the separate accumulation ensures that the batch size is irrelevant (though a larger batch size can definitely help speed up the evaluation).
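For reference, a minimal sketch (not the original answer's code) of that dataset-level accumulation; pred_classes is assumed to already be an integer class map, and loader/model in the usage comment are placeholders:
import numpy as np

def accumulate_iou_stats(label, pred_classes, num_classes, inter, union):
    # label, pred_classes: integer class maps (torch tensors) of the same shape
    for c in range(num_classes):
        pred_c = (pred_classes == c)
        target_c = (label == c)
        inter[c] += (pred_c & target_c).sum().item()
        union[c] += (pred_c | target_c).sum().item()
    return inter, union

# usage over a full dataset:
# inter, union = np.zeros(num_classes), np.zeros(num_classes)
# for images, labels in loader:
#     preds = model(images).argmax(dim=1)
#     inter, union = accumulate_iou_stats(labels, preds, num_classes, inter, union)
# miou = np.nanmean(np.where(union > 0, inter / np.maximum(union, 1), np.nan))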
Original answer:
Given below is an implementation of mean IoU (Intersection over Union) in PyTorch.
import numpy as np
import torch
import torch.nn.functional as F

def mIOU(label, pred, num_classes=19):
    pred = F.softmax(pred, dim=1)
    pred = torch.argmax(pred, dim=1).squeeze(1)
    iou_list = list()
    present_iou_list = list()

    pred = pred.view(-1)
    label = label.view(-1)
    # Note: Following for loop goes from 0 to (num_classes-1)
    # and ignore_index is num_classes, thus ignore_index is
    # not considered in computation of IoU.
    for sem_class in range(num_classes):
        pred_inds = (pred == sem_class)
        target_inds = (label == sem_class)
        if target_inds.long().sum().item() == 0:
            iou_now = float('nan')
        else:
            intersection_now = (pred_inds[target_inds]).long().sum().item()
            union_now = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection_now
            iou_now = float(intersection_now) / float(union_now)
            present_iou_list.append(iou_now)
        iou_list.append(iou_now)
    return np.mean(present_iou_list)
The prediction of your model will have one channel per class (class scores), so first take a softmax (if your model doesn't already apply one), followed by an argmax to get the index with the highest probability at each pixel. Then, we calculate the IoU for each class (and take the mean over them at the end).
We can reshape both the prediction and the label as 1-D vectors (I read that it makes the computation faster). For each class, we first identify the indices of that class using pred_inds = (pred == sem_class) and target_inds = (label == sem_class). The resulting pred_inds and target_inds will have 1 at pixels labelled as that particular class while 0 for any other class.
Then, there is a possibility that the target does not contain that particular class at all. This will make that class's IoU calculation invalid, as the class is not present in the target. So, you assign such classes a NaN IoU (so you can identify them later) and do not involve them in the calculation of the mean.
If the particular class is present in the target, then pred_inds[target_inds] will give a vector of 1s and 0s where indices with 1 are those where prediction and target are equal and zero otherwise. Taking the sum of all elements of this will give us the intersection.
If we add all the elements of pred_inds and target_inds, we get the union plus the intersection of pixels of that particular class. So, we subtract the already-computed intersection to get the union. Then, we can divide the intersection by the union to get the IoU of that particular class and add it to the list of valid IoUs.
At the end, you take the mean of the list of valid IoUs to get the mIoU. If you want the Dice coefficient, you can calculate it in a similar fashion.
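For example, a per-class Dice sketch under the same assumptions as the mIoU function above (flattened pred and label tensors, NaN for absent classes) could look like:
def dice_for_class(pred, label, sem_class):
    pred_inds = (pred == sem_class)
    target_inds = (label == sem_class)
    intersection = (pred_inds & target_inds).long().sum().item()
    denom = pred_inds.long().sum().item() + target_inds.long().sum().item()
    # Dice = 2 * |pred ∩ target| / (|pred| + |target|); NaN if the class is absent everywhere
    return float('nan') if denom == 0 else 2.0 * intersection / denom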

How do I classify using SVM Classifier?

I'm doing a project on liver tumor classification. I initially used the region growing method for liver segmentation, and from that I segmented the tumor using FCM.
I then obtained the texture features using the Gray Level Co-occurrence Matrix. My output for that was:
stats =
    autoc: [1.857855266614132e+000 1.857955341199538e+000]
    contr: [5.103143332457753e-002 5.030548650257343e-002]
    corrm: [9.512661919561399e-001 9.519459060378332e-001]
    corrp: [9.512661919561385e-001 9.519459060378338e-001]
    cprom: [7.885631654779597e+001 7.905268525471267e+001]
Now how should I give this as an input to the SVM program?
function [itr] = multisvm( T,C,tst )
%MULTISVM(2.0) classifies the class of given training vector according to the
% given group and gives us result that which class it belongs.
% We have also to input the testing matrix
%Inputs: T=Training Matrix, C=Group, tst=Testing matrix
%Outputs: itr=Resultant class(Group,USE ROW VECTOR MATRIX) to which tst set belongs
%----------------------------------------------------------------------%
% IMPORTANT: DON'T USE THIS PROGRAM FOR CLASS LESS THAN 3,              %
% OTHERWISE USE svmtrain,svmclassify DIRECTLY or                        %
% add an else condition also for that case in this program.            %
% Modify required data to use Kernel Functions and Plot also           %
%----------------------------------------------------------------------%
% Date:11-08-2011(DD-MM-YYYY)                                           %
% This function for multiclass Support Vector Machine is written by
% ANAND MISHRA (Machine Vision Lab. CEERI, Pilani, India)
% and this is free to use. email: anand.mishra2k88#gmail.com
% Updated version 2.0 Date:14-10-2011(DD-MM-YYYY)

u = unique(C);
N = length(u);
c4 = [];
c3 = [];
j = 1;
k = 1;
if (N > 2)
    itr = 1;
    classes = 0;
    cond = max(C) - min(C);
    while ((classes ~= 1) && (itr <= length(u)) && size(C,2) > 1 && cond > 0)
        % This while loop is the multiclass SVM Trick
        c1 = (C == u(itr));
        newClass = c1;
        svmStruct = svmtrain(T, newClass);
        classes = svmclassify(svmStruct, tst);
        % This is the loop for Reduction of Training Set
        for i = 1:size(newClass,2)
            if newClass(1,i) == 0
                c3(k,:) = T(i,:);
                k = k + 1;
            end
        end
        T = c3;
        c3 = [];
        k = 1;
        % This is the loop for reduction of group
        for i = 1:size(newClass,2)
            if newClass(1,i) == 0
                c4(1,j) = C(1,i);
                j = j + 1;
            end
        end
        C = c4;
        c4 = [];
        j = 1;
        cond = max(C) - min(C); % Condition for avoiding group
                                % to contain similar type of values
                                % and the reduce them to process
        % This condition can select the particular value of iteration
        % base on classes
        if classes ~= 1
            itr = itr + 1;
        end
    end
end
end
Kindly guide me.
You have to take all the feature values you get and concatenate them into a feature vector. Then for the SVM the features should be normalized so that the values in each dimension vary between -1 and 1, if I remember correctly. I think libsvm has a function for doing the normalization.
So assuming your feature vector ends up having N dimensions, and you have M training instances, your training set should be an M x N matrix. Then if you have P test instances, your test set should be a P x N matrix.
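If it helps, here is an illustrative sketch of that preparation in Python with scikit-learn rather than MATLAB (all array names and sizes below are placeholders): concatenate the GLCM statistics into one row per image, scale each feature to [-1, 1], and train/apply the SVM.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# M x N training matrix, one row of concatenated GLCM features per image
features_train = np.random.rand(20, 10)
labels = np.random.randint(0, 3, size=20)     # one class label per training row
features_test = np.random.rand(5, 10)         # P x N test matrix

scaler = MinMaxScaler(feature_range=(-1, 1)).fit(features_train)
clf = SVC(kernel='rbf').fit(scaler.transform(features_train), labels)
predicted = clf.predict(scaler.transform(features_test))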
May I also suggest a very popular implementation of SVM called SVMLight: http://svmlight.joachims.org/.
You can find examples on the website on how to use it. A MEX/MATLAB wrapper for it is also available.
As pointed out by Dima, you need to concatenate the features.
By the way, can you tell me which dataset you are using for liver tumor classification?
Is it publicly available for download?
