I'm trying to create a matrix of gradients, holding the gradient of each observation with respect to each parameter at each epoch. If my model has 100 observations, 1000 parameters and 10 epochs, the matrix should have shape (100, 1000, 10).
The problem is that I'm not able to get those gradients. The parameters and the observations are set with requires_grad=True.
I've tried to run this after passing each observation through the net:
for p in net.parameters():
    paramgradlist.append(p.grad)
But the gradient of each parameter stays the same for all observations.
Thank you
You are not copying the gradients; you are only storing references to them. In the end, every entry in your list points to the same tensors, so all your observations end up with the same (final) gradient values.
Instead, you could clone the gradients before appending them to the list:
for p in net.parameters():
    paramgradlist.append(p.grad.clone())
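A minimal sketch of how the cloned gradients could then be assembled into the (observations, parameters, epochs) structure asked about above. Everything beyond net is illustrative (loss_fn, dataset and num_epochs are assumed names, not from the original post):

import torch

all_grads = []                                   # one entry per epoch
for epoch in range(num_epochs):
    epoch_grads = []
    for x, y in dataset:                         # one observation at a time
        net.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        # clone and flatten every parameter gradient so later backward passes don't overwrite it
        flat = torch.cat([p.grad.detach().clone().view(-1) for p in net.parameters()])
        epoch_grads.append(flat)
    all_grads.append(torch.stack(epoch_grads))   # (num_obs, num_params)
grad_tensor = torch.stack(all_grads, dim=-1)     # (num_obs, num_params, num_epochs)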
I want a function similar to scipy.linalg.circulant (https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.circulant.html) to create a circulant matrix using PyTorch. I need this as part of my deep learning model (in order to reduce over-parametrization in some of my fully connected layers, as suggested in https://arxiv.org/abs/1907.08448, Fig. 3).
The input of the function shall be a 1D torch tensor, and the output should be the 2D circulant matrix.
You can make use of unfold to extract sliding windows. But to get the correct order you need to flip (and later un-flip) the tensor, and first concatenate the flipped tensor to itself.
circ = lambda v: torch.cat([f := v.flip(0), f[:-1]]).unfold(0, len(v), 1).flip(0)
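As a quick sanity check of the one-liner (Python 3.8+ for the walrus operator; the expected output below is worked out by hand from how unfold and flip behave, and follows scipy.linalg.circulant's column convention):

import torch

circ = lambda v: torch.cat([f := v.flip(0), f[:-1]]).unfold(0, len(v), 1).flip(0)
print(circ(torch.tensor([0, 1, 2])))
# tensor([[0, 2, 1],
#         [1, 0, 2],
#         [2, 1, 0]])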
Here is a generic function for PyTorch tensors that builds the circulant matrix along one dimension. It is based on unfold and works for 2D circulant matrices as well as higher-dimensional tensors.
import torch

def circulant(tensor, dim):
    """Get a circulant version of the tensor along the {dim} dimension.

    The additional axis is appended as the last dimension.
    E.g. tensor=[0,1,2], dim=0 --> [[0,1,2],[2,0,1],[1,2,0]]
    """
    S = tensor.shape[dim]
    flipped = tensor.flip((dim,))
    # concatenate the flipped tensor to (almost) itself, then slide a window of size S over it
    tmp = torch.cat([flipped, torch.narrow(flipped, dim=dim, start=0, length=S - 1)], dim=dim)
    return tmp.unfold(dim, S, 1).flip((-1,))
Essentially, this is a PyTorch version of scipy.linalg.circulant and works for multi-dimensional tensors.
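A usage sketch for the generic function above, applied along dim=1 of a small batch (the output is worked out by hand; note it follows the row-shift convention from the docstring, which is the transpose of scipy.linalg.circulant's column convention):

import torch

x = torch.arange(6).reshape(2, 3)   # two length-3 vectors
c = circulant(x, dim=1)             # shape (2, 3, 3): one circulant matrix per row
# c[0] == [[0, 1, 2],
#          [2, 0, 1],
#          [1, 2, 0]]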
Also a similar question: Create array/tensor of cycle shifted arrays
I am doing a project on multiclass semantic segmentation. I have formulated a model that outputs pretty decent segmented images by decreasing the loss value. However, I cannot evaluate the model's performance in metrics such as mean IoU or the Dice coefficient.
In the case of binary semantic segmentation it was easy to just set a threshold of 0.5 to classify the outputs as object or background, but that does not work for multiclass semantic segmentation. Could you please tell me how to obtain the model's performance on the aforementioned metrics? Any help will be highly appreciated!
By the way, I am using the PyTorch framework and the CamVid dataset.
If anyone is interested in this answer, please also look at this issue. The author of the issue points out that mIoU can be computed in a different way (and that method is more accepted in the literature). So, consider that before using this implementation for any formal publication.
Basically, the other method suggested by the issue poster is to accumulate the intersections and unions separately over the entire dataset and divide them only at the final step. The method in the original answer below computes the intersection and union for a batch of images, divides them to get the IoU for the current batch, and then takes the mean of the per-batch IoUs over the entire dataset.
However, the original method given below is problematic because the resulting mean IoU varies with the batch size. The mIoU from the method mentioned in the issue does not, since the separate accumulation makes the batch size irrelevant (though a higher batch size can still speed up evaluation). A minimal sketch of that dataset-level accumulation is given below, before the original answer.
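A minimal sketch of the accumulate-then-divide approach, assuming pred is a (B, C, H, W) tensor of class scores, label is a (B, H, W) tensor of class indices, and NUM_CLASSES / predictions_and_labels are illustrative placeholders:

import torch

def update_iou_counts(pred, label, inter, union, num_classes):
    # accumulate per-class intersection and union pixel counts across the whole dataset
    pred = pred.argmax(dim=1).view(-1)
    label = label.view(-1)
    for c in range(num_classes):
        p, t = (pred == c), (label == c)
        i = (p & t).sum().item()
        inter[c] += i
        union[c] += p.sum().item() + t.sum().item() - i

inter, union = [0] * NUM_CLASSES, [0] * NUM_CLASSES
for pred, label in predictions_and_labels:   # iterate over the whole evaluation set
    update_iou_counts(pred, label, inter, union, NUM_CLASSES)
# divide only once, at the end, ignoring classes that never appear
miou = sum(i / u for i, u in zip(inter, union) if u > 0) / max(1, sum(u > 0 for u in union))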
Original answer:
Given below is an implementation of mean IoU (Intersection over Union) in PyTorch.
import numpy as np
import torch
import torch.nn.functional as F

def mIOU(label, pred, num_classes=19):
    pred = F.softmax(pred, dim=1)
    pred = torch.argmax(pred, dim=1).squeeze(1)
    iou_list = list()
    present_iou_list = list()

    pred = pred.view(-1)
    label = label.view(-1)
    # Note: Following for loop goes from 0 to (num_classes-1)
    # and ignore_index is num_classes, thus ignore_index is
    # not considered in computation of IoU.
    for sem_class in range(num_classes):
        pred_inds = (pred == sem_class)
        target_inds = (label == sem_class)
        if target_inds.long().sum().item() == 0:
            iou_now = float('nan')
        else:
            intersection_now = (pred_inds[target_inds]).long().sum().item()
            union_now = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection_now
            iou_now = float(intersection_now) / float(union_now)
            present_iou_list.append(iou_now)
        iou_list.append(iou_now)
    return np.mean(present_iou_list)
Your model's prediction will have one channel per class, so first take the softmax (if your model doesn't apply it already), followed by argmax to get, at each pixel, the index of the class with the highest probability. Then, we calculate the IoU for each class (and take the mean over the classes at the end).
We can reshape both the prediction and the label as 1-D vectors (I read that it makes the computation faster). For each class, we first identify the indices of that class using pred_inds = (pred == sem_class) and target_inds = (label == sem_class). The resulting pred_inds and target_inds will have 1 at pixels labelled as that particular class while 0 for any other class.
Then, there is a possibility that the target does not contain that particular class at all. This makes that class's IoU calculation invalid, as the class is not present in the target. So, you assign such classes a NaN IoU (so you can identify them later) and do not involve them in the calculation of the mean.
If the particular class is present in the target, then pred_inds[target_inds] gives a vector of 1s and 0s, where the 1s mark the positions at which the prediction and the target agree on that class. Taking the sum of all its elements gives us the intersection.
If we add all the elements of pred_inds and target_inds, we get the union plus the intersection of pixels of that particular class, so we subtract the already-calculated intersection to get the union. Then we divide the intersection by the union to get the IoU of that particular class and add it to the list of valid IoUs.
At the end, you take the mean of the entire list to get the mIoU. If you want the Dice Coefficient, you can calculate it in a similar fashion.
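For reference, a hedged sketch of the analogous per-class Dice computation, reusing the pred_inds and target_inds booleans from the function above (Dice is twice the intersection divided by the sum of the two areas):

intersection_now = (pred_inds & target_inds).long().sum().item()
denominator = pred_inds.long().sum().item() + target_inds.long().sum().item()
dice_now = 2.0 * intersection_now / denominator if denominator > 0 else float('nan')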
I'm working with time-variant graph embedding, where at each time step the adjacency matrix of the graph changes. The main idea is to perform the node embedding at each timestep of the graph by looking at a set of node features and the adjacency matrix. The node embedding step is long and complicated, and is not part of the core of the problem, so I will skip this part. Suffice it to say that I use a Graph Convolutional Network to embed the nodes.
Consider that I have a stack of B adjacency matrices A with sizes NxN, where B = batch size and N = number of nodes in the graph. Also, the matrices are stacked according to a time series, where the matrix at index i comes before the matrix at index i+1. I have already embedded the nodes of the graph, which results in a matrix of dimensions B x N x E, where E = size of the embedding (parameter). Note that the model has to deal with any graph, therefore N is not a parameter. Another important comment is that each batch contains adjacency matrices from the same graph, and therefore all matrices of a batch have the same number of nodes, but the matrices in other batches may have a different number of nodes.
I now need to pass these embeddings through an LSTM cell. I have never used Keras before, so I'm having a hard time making the Keras LSTM blend into my TensorFlow code. What I want to do is: pass each node embedding through an LSTM such that the number of timesteps = B and the LSTM batch size = N, that is, the input to my LSTM has shape [N, B, E], where N and B are known only at execution time. I want the output of my LSTM to have shape [B, E*E]. The embedding matrix is called self.embed_mat here. Here is my code:
def _LSTM_layer(self):
    with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE), tf.device(self.device):
        in_shape = tf.shape(self.embed_mat)
        lstm_input = tf.reshape(self.embed_mat, [in_shape[1], in_shape[0], EMBED_SIZE])  # lstm = [N, B, E]
        input_plh = K.placeholder(name="lstm_input", shape=(None, None, EMBED_SIZE))
        lstm = LSTM(EMBED_SIZE*EMBED_SIZE, input_shape=(None, None, EMBED_SIZE))
        get_output = K.function(inputs=[input_plh], outputs=[lstm(input_plh)])
        h = get_output([lstm_input])
I am a bit lost with the K.function part. All I want is the output tensor of the LSTM cell. I've seen that in order to get that with Keras we need to use K.function, but I don't quite get what it does. When I call get_output([lstm_input]), I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'worker_global/A/shape' with dtype int64 and shape [?]
Here, A is the stack of adjacency matrices with dimension BxNxN. What is going on here? Does the value of N need to be known during the graph-building step? I think I made some dumb mistake with the LSTM cell, but I can't figure out what it is.
Thanks in advance!
Suppose you want the output of an LSTM layer "out" given an input "inp" in a Keras Sequential() model called "model", where "inp" is the first/input layer and "out" is an LSTM layer that, for the sake of this example, sits in the 4th position of the model. You can obtain the output of that LSTM layer for the data you call "lstm_input" above with the following code:
inp = model.layers[0].input
out = model.layers[3].output
inp_to_out = K.function([inp], [out])
output = inp_to_out([lstm_input])
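If you are on a recent tf.keras, an equivalent approach (a sketch, assuming lstm_input is a concrete array such as a NumPy array rather than a symbolic tensor) is to wrap the layers of interest in an intermediate Model instead of using K.function:

from tensorflow import keras

# sub-model from the Sequential model's input to its 4th layer's output
inp_to_out = keras.Model(inputs=model.layers[0].input, outputs=model.layers[3].output)
output = inp_to_out.predict(lstm_input)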
For a function approximation problem I'm trying to accumulate gradients, but I find that sometimes some of these gradients are NaN (i.e. undefined), even though the loss is always real. I think this might be due to numerical instabilities, and I'm basically looking for a simple method to remove the NaNs from the computed gradients.
Starting with the solution to this question I tried doing the following:
# Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()
## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()
## Creation of a list of variables with the same shape as the trainable ones
# initialized with 0s
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
## Calls the compute_gradients function of the optimizer to obtain... the list of gradients
gvs_ = opt.compute_gradients(rmse, tvs)
gvs = tf.where(tf.is_nan(gvs_), tf.zeros_like(gvs_), gvs_)
## Adds to each element from the list you initialized earlier with zeros its gradient (works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
## Define the training step (part with variable value update)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
So basically, the key idea is this line:
gvs = tf.where(tf.is_nan(gvs_), tf.zeros_like(gvs_), gvs_)
But when I apply this idea I obtain the following error:
ValueError: Tried to convert 'x' to a tensor and failed. Error:
Dimension 1 in both shapes must be equal, but are 30 and 9. Shapes are
[2,30] and [2,9]. From merging shape 2 with other shapes. for
'IsNan/packed' (op: 'Pack') with input shapes: [2,9,30], [2,30,9],
[2,30], [2,9].
compute_gradients returns a list of (gradient, variable) pairs in your case, not a single tensor, so tf.where cannot be applied to the whole list at once. You may want to do:
gvs_ = [(tf.where(tf.is_nan(grad), tf.zeros_like(grad), grad), val) for grad,val in gvs_]
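With that fix, the rest of the accumulation code from the question should work unchanged, since each element is again a (gradient, variable) pair. A short sketch of the corrected chain (TF 1.x graph mode assumed, as in the question):

gvs = [(tf.where(tf.is_nan(grad), tf.zeros_like(grad), grad), var)
       for grad, var in opt.compute_gradients(rmse, tvs)]
accum_ops = [accum_vars[i].assign_add(grad) for i, (grad, _) in enumerate(gvs)]
train_step = opt.apply_gradients([(accum_vars[i], var) for i, (_, var) in enumerate(gvs)])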
I am trying to implement the Expectation-Maximization algorithm (Gaussian Mixture Model) on a data set data=[[x,y],...]. I am using mv_norm.pdf(data, mean, cov) to calculate the cluster responsibilities. But after recomputing the covariance matrices for 6-7 iterations, a covariance matrix becomes singular, i.e. its determinant is 0 (a very small value), and hence it gives errors such as
ValueError: the input matrix must be positive semidefinite
and
raise np.linalg.LinAlgError('singular matrix')
Can someone suggest any solution for this?
import copy
from scipy.stats import multivariate_normal as mv_norm

# E-step: Compute cluster responsibilities, given cluster parameters
def calculate_cluster_responsibility(data, centroids, cov_m):
    pdfmain = [[] for i in range(0, len(data))]
    for i in range(0, len(data)):
        sum1 = 0
        pdfeach = [[] for m in range(0, len(centroids))]
        pdfeach[0] = 1/3. * mv_norm.pdf(data[i], mean=centroids[0], cov=[[cov_m[0][0][0], cov_m[0][0][1]], [cov_m[0][1][0], cov_m[0][1][1]]])
        pdfeach[1] = 1/3. * mv_norm.pdf(data[i], mean=centroids[1], cov=[[cov_m[1][0][0], cov_m[1][0][1]], [cov_m[1][1][0], cov_m[1][1][1]]])
        pdfeach[2] = 1/3. * mv_norm.pdf(data[i], mean=centroids[2], cov=[[cov_m[2][0][0], cov_m[2][0][1]], [cov_m[2][1][0], cov_m[2][1][1]]])
        sum1 += pdfeach[0] + pdfeach[1] + pdfeach[2]
        pdfeach[:] = [x / sum1 for x in pdfeach]
        pdfmain[i] = pdfeach
    global old_pdfmain
    if old_pdfmain == pdfmain:
        return
    old_pdfmain = copy.deepcopy(pdfmain)
    softcounts = [sum(i) for i in zip(*pdfmain)]
    calculate_cluster_weights(data, centroids, pdfmain, softcounts)
Initially, I've passed [[3,0],[0,3]] for each cluster covariance since expected number of clusters is 3.
Can someone suggest any solution for this?
The problem is that your data lies on a manifold of dimension strictly smaller than the dimension of the input data. For example, your data lies on a circle while you have 3-dimensional points. As a consequence, when your method tries to estimate a 3-dimensional ellipsoid (covariance matrix) that fits your data, it fails, since the optimal one is a 2-dimensional ellipse (the third dimension is 0).
How to fix it? You will need some regularization of your covariance estimator. There are many possible solutions, all in the M-step, not the E-step; the problem is with computing the covariance:
Simple solution: instead of doing something like cov = np.cov(X), add a small regularizing term, e.g. cov = np.cov(X) + eps * np.identity(X.shape[1]) with a small eps (see the sketch after this list).
Use a nicer estimator, such as the LedoitWolf estimator from scikit-learn.
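A hedged sketch of the first option, applied at the point in the M-step where one cluster's covariance is recomputed (X, r and eps are illustrative names, not from the question's code):

import numpy as np

eps = 1e-6
# weighted covariance for one cluster, given responsibilities r (shape (N,)) and data X (shape (N, D))
mu = (r[:, None] * X).sum(axis=0) / r.sum()
diff = X - mu
cov_k = (r[:, None] * diff).T @ diff / r.sum()
cov_k += eps * np.eye(X.shape[1])   # small ridge term keeps cov_k positive definite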
Initially, I've passed [[3,0],[0,3]] for each cluster covariance since expected number of clusters is 3.
This makes no sense: the covariance matrix values have nothing to do with the number of clusters. You can initialize it with anything more or less reasonable.