Currently I have PyTorch tensors with shape (batch_size, height, width, channel_size) and I want to convert them to a mini-batch as described here. My current idea is to convert each example from tensor representation to graph representation separately and then group them together. I want to do all of this without saving and loading files, since that would surely hurt speed (I notice that "Creating In Memory Datasets" works that way).
Yet I didn't find any function for the example-grouping part. Could anyone please suggest a plausible workflow for it, and is there a smarter way to do this conversion from tensor to mini-batch for pytorch-geometric?
I think I'm facing a similar question. If I understand your question correctly, you want to perform the following transformation:
Input: a tensor of shape [#batch, #vertex, #feature]
Output: a torch_geometric.data.Batch holding one large batched graph
My implementation is:
from torch_geometric.data import Data, DataLoader
# `features`: [#batch, #vertex, #feature]; `edges`: shared edge_index of shape [2, #edges]
loader = DataLoader([Data(x=f, edge_index=edges) for f in features], batch_size=batch_size)
batch = next(iter(loader))  # a torch_geometric.data.Batch
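If you want to skip the DataLoader entirely, torch_geometric.data.Batch.from_data_list is the function that does the grouping step in memory. A minimal sketch for your (batch_size, height, width, channel_size) tensors, assuming each pixel becomes a node with its channels as features and assuming you already have an edge_index describing the grid connectivity:

import torch
from torch_geometric.data import Data, Batch

def tensors_to_batch(t, edge_index):
    # t: (batch_size, height, width, channel_size)
    b, h, w, c = t.shape
    data_list = [Data(x=t[i].reshape(h * w, c), edge_index=edge_index)
                 for i in range(b)]          # one node per pixel
    return Batch.from_data_list(data_list)   # grouping step, no file I/O

Batch.from_data_list concatenates the node features and offsets each example's edge_index, which is exactly the grouping you were looking for.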
I am new to PyTorch and I would like to add a mean-variance normalization layer to my network that will normalize features to zero mean and unit standard deviation. I got a bit confused reading the documentation; could anyone give me some leads?
As @Ivan commented, the normalization can be done on many levels. However, as you say
normalize features to zero mean and unit standard deviation
I suppose you just want to feed unbiased data to the network. If that's the case, you should treat it as a data preprocessing step rather than a layer of your model, and basically do:
X = (X - torch.mean(X, dim=0)) / torch.std(X, dim=0)
As an alternative, you can use torchvision.transforms:
preprocess = torchvision.transforms.Normalize(mean=torch.mean(X, dim=0), std=torch.std(X, dim=0))
X = preprocess(X)
as in this ResNet native example. Note the reasonable assumption that future data will always have roughly the same mean and std as the set used for the initial calculation (typically the training set). For this reason, we should preserve the initially calculated values and reuse them for preprocessing in any future inference scenario.
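A minimal sketch of that idea, with placeholder tensors standing in for your actual data:

import torch

X_train = torch.randn(1000, 20)  # placeholder training features
X_test = torch.randn(100, 20)    # placeholder inference features

# Compute the statistics once, on the training set only.
mean = torch.mean(X_train, dim=0)
std = torch.std(X_train, dim=0)

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std  # reuse the *training* statistics at inference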
I want to make a Keras Gaussian-noise layer that applies noise with a different stddev level to each column of the dataset. However, since I don't know much about coding, there is a big problem that I cannot solve by myself.
Starting from the source code of the Keras GaussianNoise layer, I wrote the code below:
def call(self, inputs, training=None):
    def noised():
        temp = inputs
        for i in range(100):
            temp[:, i] = temp[:, i] + K.random_normal(shape=(len(inputs), 1),
                                                      mean=0., stddev=self.stddev[i])
        return temp
    return K.in_train_phase(noised, inputs, training=training)
However, it shows an error like:
object of type 'Tensor' has no len()
I believe the error comes from using the wrong kind of shape.
This is because the original code, shown below:

def noised():
    return inputs + K.random_normal(shape=K.shape(inputs),
                                    mean=0.,
                                    stddev=self.stddev)

uses a symbolic shape (K.shape(inputs)), whereas what I used is a concrete integer (len(inputs)).
However, I have no idea how to overcome the problem.
It would be a great help if you could show me a way to solve it.
Thank you so much for your assistance.
I know it's super late, but maybe it's still interesting for other people. I'm using TensorFlow 2.3.0 and I can just use NumPy-style slicing. So slice the tensor, apply the individual layers, and merge them back together:
from tensorflow.keras.layers import GaussianNoise, Concatenate

inputs = tf.keras.Input(shape=(None, 3))
x1 = GaussianNoise(0.1)(inputs[:, :, 0:1])
x2 = GaussianNoise(0.2)(inputs[:, :, 1:2])
x3 = GaussianNoise(0.3)(inputs[:, :, 2:3])
x = Concatenate()([x1, x2, x3])
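If there are many columns, slicing each one by hand gets tedious. Below is a minimal sketch of a custom layer that does the same per-column scaling in one go, assuming TF 2.x; the class name ColumnwiseGaussianNoise is made up for this example:

import tensorflow as tf

class ColumnwiseGaussianNoise(tf.keras.layers.Layer):
    """Adds zero-mean Gaussian noise with a per-column stddev (train time only)."""
    def __init__(self, stddevs, **kwargs):
        super().__init__(**kwargs)
        # One stddev per input column, e.g. [0.1, 0.2, 0.3].
        self.stddevs = tf.constant(stddevs, dtype=tf.float32)

    def call(self, inputs, training=None):
        if training:
            # Unit noise broadcast against the per-column stddevs;
            # tf.shape keeps the shape symbolic, avoiding the len() issue.
            noise = tf.random.normal(tf.shape(inputs)) * self.stddevs
            return inputs + noise
        return inputs

# Usage: x = ColumnwiseGaussianNoise([0.1, 0.2, 0.3])(inputs)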
I am pretty new to TensorFlow, and I am currently learning it through the tutorial at https://www.tensorflow.org/get_started/get_started
It is said in the manual that:
We've created a model, but we don't know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.
A loss function measures how far apart the current model is from the provided data. We'll use a standard loss model for linear regression, which sums the squares of the deltas between the current model and the provided data. linear_model - y creates a vector where each element is the corresponding example's error delta. We call tf.square to square that error. Then, we sum all the squared errors to create a single scalar that abstracts the error of all examples using tf.reduce_sum.
q1."we don't know how good it is yet.", I didn't understand this
quote as the simple model created is a simple slope equation and on
what it should train for?, as the model is a simple slope. Is it
require an perfect slope or what? why am I training that model and
for what?
q2.what is a loss function? Is loss function is used to determine the
accuracy of the model? Why is it required?
q3. I didn't understand " 'sums the squares of the deltas' between
the current model and the provided data."
q4.I didn't understood this part of code,"squared_deltas =
tf.square(linear_model - y)
This is the code:
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
These may be simple questions, but I am a beginner to TensorFlow and am having a hard time understanding it.
1) So you're kind of right to ask why we should train for such a simple problem, but this is just an introductory piece. With any machine learning task you need to evaluate your model to see how good it is. In this case you are just training to find the coefficients for the line of best fit.
2) A loss function in any machine learning context represents the error of your model. It is usually a function of the "distance" between your calculated values and the ground-truth values. Think of it as an internal evaluation score. You want to minimise your loss, so the gradients and parameter updates are computed from it.
3/4) Your question here is more to do with least-squares regression, a statistical method for fitting a line of best fit to a set of points. The deltas represent the differences between your calculated values and the true values. The aim is to minimise the total area of the squares and hence minimise the error, giving a better line of best fit.
What you are doing in this Tensorflow example is creating a machine learning model that will learn the coefficients for the line of best fit automatically using a least squares based system.
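For completeness, a hedged sketch of how that same getting-started example typically continues (TF1-style API, reusing sess, x, and loss from the snippet above): gradient descent adjusts W and b to minimise the summed squared deltas.

optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

sess.run(tf.global_variables_initializer())
for _ in range(1000):
    sess.run(train, {x: [1, 2, 3, 4], y: [0, -1, -2, -3]})
# For this data, W and b converge towards the exact fit W = -1, b = 1.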
Pretty much all of your questions have to do with the loss function.
The loss function is a function that measures how far your outputs are from the expected (correct) outputs.
It has two uses:
- It helps the algorithm determine whether a tweak to the weights is moving in a good or a bad direction.
- It gives a measure of accuracy (roughly, how often your system guesses the correct answer).
Here, the loss is the sum of the squared deltas, where each delta is the difference between the expected output and the actual output.
I think it's squared to magnify the errors the algorithm makes.
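To make the "sum of the squared deltas" concrete, here is the same computation written out in plain Python, assuming the tutorial's initial values W = 0.3 and b = -0.3:

x = [1, 2, 3, 4]
y = [0, -1, -2, -3]
W, b = 0.3, -0.3

deltas = [(W * xi + b) - yi for xi, yi in zip(x, y)]  # per-example errors: [0.0, 1.3, 2.6, 3.9]
loss = sum(d ** 2 for d in deltas)                    # 0 + 1.69 + 6.76 + 15.21
print(loss)                                           # ~23.66, matching the tutorial's output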
I developed a CNN using MatConvNet and am able to visualize the weights of the 1st layer. It looked very similar to what is shown here (also attached below in case I am not specific enough): http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
My question is: what are the weight gradients? I'm not sure what those are, and I am unable to generate them...
Weights in a NN
In a neural network, a series of linear functions represented as matrices is applied to the features (usually with a nonlinearity between them). These functions are determined by the values in the matrices, referred to as weights.
You can visualize the weights of an ordinary neural network, but it usually means something slightly different to visualize the convolutional layers of a CNN. These layers are designed to learn a feature computation over the space.
When you visualize the weights, you're looking for patterns. A nice smooth filter may mean that the weights are well learned and "looking for something in particular". A noisy weight visualization may mean that you've undertrained your network, overfit it, need more regularization, or something else nefarious (a decent source for these claims).
From this decent review of weight visualizations, we can see patterns start to emerge when the weights are treated as images.
Weight Gradients
"Visualizing the gradient" means taking the gradient matrix and treating like an image [1], just like you took the weight matrix and treated it like an image before.
A gradient is just a derivative; for images, it's usually computed as a finite difference - grossly simplified, the X gradient subtracts pixels next to each other in a row, and the Y gradient subtracts pixels next to each other in a column.
For the common example of a filter that extracts edges, we may see a strong gradient in a particular direction. By visualizing the gradients (taking the matrix of finite differences and treating it like an image), you can get a more immediate idea of how your filter is operating on the input. There are a lot of cutting edge techniques (eg, eg) for interpreting these results, but making the image pop up is the easy part!
A similar technique involves visualizing the activations after a forward pass over the input. In this case, you're looking at how the input was changed by the weights; by visualizing the weights, you're looking at how you expect them to change the input.
Don't over-think it: the weights are interesting because they let us see how the function behaves, and the gradients of the weights are just another feature to help explain what's going on. There's nothing sacred about that feature: here are some cool clustering features (t-SNE) from the Google paper that look at space separability.
[1] It can be more complicated if you introduce weight sharing, but not by much.
My answer here covers this question: https://stackoverflow.com/a/68988426/10661506
Long story short, the weight gradient of layer l is the gradient of the loss with respect to the weights of layer l.
If you have a correct implementation of backpropagation, you already have access to these gradients, as they are needed to compute the weight update at every layer.
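As an illustration, a minimal PyTorch sketch (PyTorch rather than MatConvNet, purely to keep the example short) of where those gradients live after a backward pass:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=3)  # a single conv layer
x = torch.randn(1, 3, 32, 32)
loss = model(x).sum()                    # any scalar loss will do
loss.backward()

# model.weight.grad holds dLoss/dWeight; it has the same shape as the
# weights, so it can be visualized exactly like the weights themselves.
print(model.weight.grad.shape)           # torch.Size([16, 3, 3, 3])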
In my Theano program, I want to split a tensor matrix into two parts, with each part making a different contribution to the error function. Can anyone tell me whether automatic differentiation supports this?
For example, for a tensor matrix variable M, I want to split it into M1 = M[:300] and M2 = M[300:], and then define the cost function as 0.5 * M1 * w + 0.8 * M2 * w. Is it still possible to get the gradient with T.grad(cost, w)?
More specifically, I want to construct an autoencoder in which different features contribute to the total cost with different weights.
Thanks to anyone who answers my question.
Theano supports this out of the box; you have nothing particular to do. If Theano didn't support something in the graph, it would raise an error, but you won't get one here as long as there isn't a problem in the way you call it. Your current pseudo-code should work.
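A minimal sketch of that setup (the shapes are illustrative, and note that T.grad needs a scalar cost, so the two contributions are summed here):

import numpy as np
import theano
import theano.tensor as T

M = T.matrix('M')
w = theano.shared(np.ones(10, dtype=theano.config.floatX), name='w')

M1, M2 = M[:300], M[300:]                 # slicing is differentiable
cost = (0.5 * T.dot(M1, w) + 0.8 * T.dot(M2, w)).sum()

grad_w = T.grad(cost, w)                  # autodiff works through the slices
f = theano.function([M], grad_w)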