I have a tensor of ground truth values of 3D points of G=[18000x3], and an output from my network of the same size O=[18000x3].
I need to compute a loss so that I basically have the square root of the distance between each 3D point, summed over all keypoints and normalized over 18000. How do I write this efficiently?
Just write the expression you propose using the vectorized operations provided by PyTorch. In this case:
loss = (O - G).pow(2).sum(dim=1).sqrt().mean()
Check out pow, sum, sqrt and mean.
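For reference, a self-contained sketch (random tensors stand in for G and O here); torch.linalg.norm gives the same per-point Euclidean distance:

import torch

G = torch.randn(18000, 3)   # ground-truth 3D points
O = torch.randn(18000, 3)   # network output, same shape

# squared differences -> sum over xyz -> per-point distance -> mean over the 18000 points
loss = (O - G).pow(2).sum(dim=1).sqrt().mean()

# equivalent, using the built-in vector norm
loss_alt = torch.linalg.norm(O - G, dim=1).mean()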
Related
I have a GAN for a graph prediction task in which there are torch_geometric.nn.NNConv layers. I want to add the eigenvector centrality difference between the ground-truth and predicted graphs to my loss function. To calculate eigenvector centrality, I intended to use the eigenvector_centrality function from the NetworkX library. However, this function requires its input to be a NetworkX graph, which in turn requires converting the torch.tensor output of my Generator network to a numpy.array. So I need to detach() the tensor, which will cause PyTorch to lose all gradient tracking. Thus, how can I properly implement eigenvector centrality for my loss function? Thanks.
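(To make the issue concrete, the problematic step looks roughly like the sketch below; the random adjacency matrix is only a stand-in for the actual Generator output.)

import networkx as nx
import torch

# stand-in for the Generator output: a predicted weighted adjacency matrix with gradients
adj = torch.rand(10, 10, requires_grad=True)

# NetworkX needs a plain numpy array, so everything below is cut off from autograd
G_pred = nx.from_numpy_array(adj.detach().cpu().numpy())
centrality = nx.eigenvector_centrality(G_pred)   # dict of Python floats, no gradient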
I am currently working on a project that uses Linear Discriminant Analysis to transform some high-dimensional feature set into a scalar value according to some binary labels.
So I train LDA on the data and the labels and then use either transform(X) or decision_function(X) to project the data into a one-dimensional space.
I would like to understand the difference between these two functions. My intuition would be that the decision_function(X) would be transform(X) + bias, but this is not the case.
Also, I found that those two functions give a different AUC score, and thus indicate that it is not a monotonic transformation as I would have thought.
In the documentation, it states that the transform(X) projects the data to maximize class separation, but I would have expected decision_function(X) to do this.
I hope someone could help me understand the difference between these two.
LDA projects your multivariate data onto a 1D space. The projection is a linear combination of all your attributes (columns in X), and the weight of each attribute is determined by maximizing class separation. Subsequently, a threshold value in this 1D space is determined which gives the best classification results. transform(X) gives you the value of each observation in this 1D space, x' = transform(X). decision_function(X) gives you the confidence score used for classification, which for LDA is the log of the posterior odds of an observation belonging to the positive class, log(P(y=1|x) / P(y=0|x)).
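A small sketch to inspect both outputs side by side (the synthetic data here is made up purely for illustration):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# synthetic binary data just for illustration
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(1.0, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis().fit(X, y)

proj = lda.transform(X).ravel()   # coordinate in the 1D discriminant space
score = lda.decision_function(X)  # confidence score / log posterior-odds used for classification

print(proj[:3], score[:3])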
I have a prediction model in PyTorch that takes inputs and generates outputs in a specific coordinate system. In my process I transform the output and ground truth into a different coordinate system (2-dimensional translation and rotation). I can now calculate the loss in both coordinate systems, and it gives the same values (RMSE and NLL loss).
Does it matter which loss I use for the training to run loss.backward() on?
TLDR:
Does it matter which loss I use for the training to run loss.backward() on?
No for MSE, Yes for NLL.
Assuming that the ground truth vector is x and the output vector is y,
Old MSE = (x-y).T.dot(x-y)
After the transformation, the ground truth vector becomes A.dot(x) and the output becomes A.dot(y), where A is the transformation matrix.
New MSE = (x-y).T.dot(M).dot(x-y), where M = A.T.dot(A)
Because the transformation here is a rotation, A is orthogonal and A.T.dot(A) = I (the translation part cancels in the difference x - y).
So M is always the identity matrix, and hence the MSE remains unchanged.
Now, NLL loss, which is typically applied after nn.LogSoftmax, just computes
-Y[range(N), x].mean(), where Y is the output of nn.LogSoftmax and x holds the target class indices.
(I am referring to this).
This is not the same as what you'd get after you linearly transform output and target.
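A quick numerical check of the MSE claim (a sketch; the angle, translation, and shapes are arbitrary):

import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(100, 2)   # ground truth in the original frame
y = torch.randn(100, 2)   # predictions in the original frame

theta = 0.7
A = torch.tensor([[math.cos(theta), -math.sin(theta)],
                  [math.sin(theta),  math.cos(theta)]])   # rotation matrix (orthogonal)
t = torch.tensor([3.0, -1.0])                             # translation

print(F.mse_loss(y, x))                          # loss in the original frame
print(F.mse_loss(y @ A.T + t, x @ A.T + t))      # same value after rotation + translation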
I am trying to build a CNN (in Keras) that can estimate the rotation of an image (or a 2d object). So basically, the input is an image and the output should be its rotation.
My first experiment is to estimate the rotation of MNIST digits (starting with only one digit "class", let's say the "3"). So what I did was extract all 3s from the MNIST set and then build a "rotated 3s" dataset by randomly rotating these images multiple times and storing the rotated images together with their rotation angles as ground-truth labels.
So my first problem was that a 2d rotation is cyclic and I didn't know how to model this behavior. Therefore, I encoded the angle as y=sin(ang), x = cos(ang). This gives me my dataset (the rotated 3s images) and the corresponding labels (x and y values).
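(For reference, the encode/decode step described here amounts to something like the following sketch, assuming angles in degrees; the decoding with atan2 is what is used further below to read the predicted angle back.)

import numpy as np

def encode_angle(ang_deg):
    # angle in degrees -> (x, y) = (cos, sin), both in [-1, 1]
    rad = np.deg2rad(ang_deg)
    return np.cos(rad), np.sin(rad)

def decode_angle(x, y):
    # (cos, sin) prediction -> angle in degrees, wrapped to [0, 360)
    return np.rad2deg(np.arctan2(y, x)) % 360

x, y = encode_angle(40)
print(decode_angle(x, y))   # ~40.0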
For the CNN, as a start, I just took the Keras MNIST CNN example (https://keras.io/examples/mnist_cnn/) and replaced the last dense layer (that had 10 outputs and a softmax activation) with a dense layer that has 2 outputs (x and y) and a tanh activation (since y=sin(ang), x=cos(ang) are within [-1,1]).
The last thing I had to decide was the loss function, where I basically want a distance measure for angles. Therefore I thought "cosine_proximity" was the way to go.
When training the network I can see that the loss is decreasing and converging to a certain point. However, when I then check the predictions vs the ground truth I observe a (for me) fairly surprising behavior. Almost all x and y predictions tend towards 0 or +/-1. And since the "decoding" of my rotation is ang=atan2(y,x), the predictions are usually either +/-0°, 45°, 90°, 135° or 180°.
However, my training and test data has only angles of 0°, 20°, 40°, ... 360°.
This doesn't really change if I change the complexity of the network. I also played around with the optimizer parameters without any success.
Is there anything wrong with the assumptions:
- x,y encoding for angle
- tanh activation to have values in [-1,1]
- cosine_proximity as loss function
Thanks in advance for any advice, tips or pointing me towards a possible mistake I made!
It's hard to give you an exact answer, so let's try with some ideas:
Change from Cosine Proximity to MSE or other losses and check if something changes (see the sketch below).
Change the way you encode the target. You could just represent the angle as a number between 0 and 1. It doesn't seem to be a problem even if the angles are cyclic.
Ensure your preprocessing/augmentation steps make sense for this particular task.
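For the first point, a minimal sketch of the two-output tanh head described in the question, trained with MSE instead of cosine proximity (using tf.keras; the convolutional layers are placeholders, not the exact layers from the Keras example):

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(2, activation="tanh"),   # predicts (cos, sin) of the angle
])
model.compile(optimizer="adam", loss="mse")     # MSE instead of cosine proximity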
I am attempting to compute a linear combination of n tensors of the same dimension in Tensorflow. The scalar coefficients are Tensorflow Variables.
Since tf.scalar_mul does not generalise to multiplying a vector of tensors by a vector of scalars, I have thus far used tf.gather and performed each multiplication individually in a python for loop, and then converted the list of results to a tensor and summed them across the zeroth axis. Like so:
coefficients = tf.Variable(tf.constant(initial_value, shape=[n]))  # n scalar weights

components = []
for i in range(n):
    # multiply the i-th tensor by its scalar coefficient
    components.append(tf.scalar_mul(tf.gather(coefficients, i), tensors[i]))

# stack the weighted tensors and sum them along the new axis
combination = tf.reduce_sum(tf.convert_to_tensor(components), axis=0)
This works fine, but does not scale well at all. My application requires computing n linear combinations, meaning I have n^2 gather and multiply operations. With large values of n the computation time is poor and the memory usage of the program is unreasonably large.
Is there a more natural way of computing a linear combination like this in Tensorflow that would be faster and less resource intensive?
Use broadcasting. Assuming coefficients has shape (n,) and tensors has shape (n, ...), you can simply use
coefficients[:, tf.newaxis, ...] * tensors
Here, you would need to repeat tf.newaxis as many times as tensors has dimensions besides the one of size n. So, e.g., if tensors has shape (n, a, b), you would use coefficients[:, tf.newaxis, tf.newaxis].
This will turn coefficients into a tensor with the same number of dimensions as tensors, but all dimensions except the first one are of size 1, so they can be broadcast to the shape of tensors.
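A minimal sketch under these assumptions (TF 2.x style; the shape (n, a, b) and values are arbitrary), summing over the first axis to get the linear combination:

import tensorflow as tf

n, a, b = 4, 3, 5
coefficients = tf.Variable(tf.ones([n]))          # n scalar weights
tensors = tf.random.normal([n, a, b])             # n tensors stacked along axis 0

# broadcast the weights over the trailing dimensions, then sum over n
weighted = coefficients[:, tf.newaxis, tf.newaxis] * tensors
combination = tf.reduce_sum(weighted, axis=0)     # shape (a, b)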
Some alternatives:
Define coefficients as a variable with the correct number of dimensions in the first place (a little ugly in my opinion).
Use tf.reshape to reshape coefficients to (n, 1, ...) instead if you don't like the indexing syntax.
Use tf.transpose to shift the dimension of size n to the end of tensors. Then the dimensions align for broadcasting without needing to add dimensions to coefficients.
Also see the numpy docs on broadcasting -- it works essentially the same way in Tensorflow.
There is a new PyPI module called TWIT, Tensor Weighted Interpolative Transfer, that will do this fast. It is written in C for the core operations.