I have read many papers where convolutional neural networks are used for super-resolution, image segmentation, autoencoders and so on. They use different kinds of upsampling, also called deconvolution, which is discussed here in a different question.
Here in TensorFlow there is a function for it.
Here in Keras there are some as well.
I implemented the Keras one:
x = tf.keras.layers.UpSampling1D(size=2)(x)
and I used this one, taken from a super-resolution repo here:
class SubPixel1D(tf.keras.layers.Layer):
    def __init__(self, r):
        super(SubPixel1D, self).__init__()
        self.r = r

    def call(self, inputs):
        with tf.name_scope('subpixel'):
            X = tf.transpose(inputs, [2, 1, 0])  # (r, w, b)
            X = tf.compat.v1.batch_to_space_nd(X, [self.r], [[0, 0]])  # (1, r*w, b)
            X = tf.transpose(X, [2, 1, 0])
            return X
But I realized that neither layer has any parameters in my model summary. Don't these layers need parameters so they can learn the upsampling?
In Keras, UpSampling simply copies your input up to the requested size; you can find the documentation here. So there is no need for these layers to have parameters.
I think you have confused upsampling with transposed convolution (deconvolution).
If you look at the actual source code on GitHub, UpSampling1D just repeats each time step, which is nearest-neighbor interpolation; UpSampling2D additionally offers bilinear interpolation. Neither interpolation scheme has learnable parameters such as weights or biases; those only appear if the layer is followed by a convolution layer.
SubPixel1D likewise uses no convolution or other learnable layer, hence it has no trainable parameters either.
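To make the point concrete, here is a minimal pure-Python sketch of what the two operations do to a single sample. The function names `upsample_nearest_1d` and `subpixel_shuffle_1d` are illustrative, not library functions, and the channel ordering in the shuffle is an assumption based on how `batch_to_space_nd` splits its leading dimension. Both functions only copy or rearrange existing values, so there is nothing for training to adjust.

```python
def upsample_nearest_1d(x, size):
    """Repeat every time step `size` times; x is a list of steps, each a list of channels."""
    return [list(step) for step in x for _ in range(size)]


def subpixel_shuffle_1d(x, r):
    """Rearrange (w, c) into (w * r, c // r) by moving channels into the time axis."""
    w, c = len(x), len(x[0])
    assert c % r == 0, "channel count must be divisible by r"
    c_out = c // r
    out = []
    for t in range(w):
        for i in range(r):
            # Assumed ordering: block i takes channels i*c_out ... (i+1)*c_out - 1
            out.append([x[t][i * c_out + k] for k in range(c_out)])
    return out


x = [[1, 2], [3, 4], [5, 6]]          # 3 time steps, 2 channels
up = upsample_nearest_1d(x, size=2)   # 6 time steps, 2 channels
sp = subpixel_shuffle_1d(x, r=2)      # 6 time steps, 1 channel
print(up)
print(sp)
```

A learned upsampling would put a convolution in front of the shuffle or use a transposed convolution; the parameters then live in that layer, not in the rearrangement itself.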
Related
I am dealing with a dataset with 6 features and around 1000 samples in total. I am hoping to use unsupervised learning, with dimensionality reduction followed by clustering, because the labels for these data are often incorrect. I tested linear methods like PCA first and found that the data are not linearly separable. I am now using an autoencoder to perform dimensionality reduction, but have some questions regarding the number of nodes. I am testing a simple autoencoder in PyTorch at the moment, with one hidden layer. However, I am uncertain what number of nodes would be appropriate for this problem. I am a bit confused about whether the advice here about node selection refers to 'input layer size' as the total training data size (0.2*1000 samples), the total dataset (1000 samples), or the features themselves (6 features).
Here is my current PyTorch code aimed at handling this problem:
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=6, latent_dim=2):
        super(Autoencoder, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.encode = nn.Sequential(
            nn.Linear(self.input_dim, 32),
            nn.LeakyReLU(0.02),
            nn.Linear(32, self.latent_dim),  # input width must match the 32 units above
        )
        self.decode = nn.Sequential(
            nn.Linear(self.latent_dim, 32),
            nn.LeakyReLU(0.02),
            nn.Linear(32, self.input_dim),
        )
        self.apply(weights_init)  # weights_init is a custom initializer defined elsewhere

    def encoded(self, x):
        # encodes data to latent space
        return self.encode(x)

    def decoded(self, x):
        # decodes latent space data to 'real' space
        return self.decode(x)

    def forward(self, x):
        en = self.encoded(x)
        de = self.decoded(en)
        return de
This yields a training/test loss as follows:
and latent space as follows:
I would greatly appreciate any advice on this subject. I recognize this is likely a rather simple question, so apologies in advance!
Fitting a single polynomial to a bunch of data is pretty easy in PyTorch using an nn.Linear layer. I've included a trivial example at the end of this post. But suppose I have tons of data split into groups, and I want to fit a different polynomial to each group. As an example, find the particular quadratic coefficients that fit each column in this image:
In other words, I want to simultaneously find the coefficients for N polynomials of order n, given m data per set to be fit:
In the image above, there are m=80 points per dataset, and N=100 sets to fit.
This lends itself perfectly to tensor manipulation, and PyTorch on a GPU should make it blindingly fast by fitting all N at once. The problem is that I'm having a terrible brain fart and haven't been able to wrap my head around the right layer configuration. Basically I need N nn.Linear layers, each operating on its own dataset. If this were convolution, I'd use a depthwise layer...
Example network to fit one polynomial where X are the m x p abscissa data, y are the m ordinate data, and we want to find the p coefficients.
class polyfit(torch.nn.Module):
    def __init__(self, n=2):
        super(polyfit, self).__init__()
        self.poly = torch.nn.Linear(n, 1, bias=False)

    def forward(self, x):
        print(x.shape, self.poly)
        return self.poly(x)

model = polyfit(n)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for epoch in range(100):  # or however I want to run the loops
    output = model(X)
    mse = loss(output, y)
    optimizer.zero_grad()
    mse.backward()
    optimizer.step()
Figured it out after thinking about my depthwise convolution comment. A Conv1d whose kernel has just 3 parameters, applied to a tensor with values [1, x, x**2], is a quadratic, the same as a Linear layer with n=3. So, with n now denoting the polynomial order, the layer needs to be:
self.poly = torch.nn.Conv1d(N,N,n+1,bias=False,groups=N)
Just make sure the X and y tensors have shapes [m, N, n+1] and [m, N, 1] respectively, so that each of the N channels sees its own n+1 features [1, x, ..., x**n].
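As a rough sketch of how the grouped convolution could be wired up end to end, the snippet below fits N quadratics on synthetic data. The data generation, the variable names (`order`, `true_coef`, `fitted_coef`) and the use of Adam instead of the SGD optimizer from the question are assumptions made only so that the example converges in a few hundred steps.

```python
import torch

torch.manual_seed(0)
N, m, order = 5, 80, 2                      # 5 datasets, 80 points each, quadratics

x = torch.rand(m, N) * 2 - 1                # abscissae in [-1, 1], one column per dataset
true_coef = torch.randn(N, order + 1)       # synthetic ground-truth coefficients

# Features [1, x, x**2] for every point: shape (m, N, order + 1)
X = torch.stack([x ** p for p in range(order + 1)], dim=-1)
y = (X * true_coef).sum(dim=-1, keepdim=True)   # shape (m, N, 1)

# groups=N gives each channel its own kernel of length order + 1
poly = torch.nn.Conv1d(N, N, order + 1, bias=False, groups=N)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(poly.parameters(), lr=0.05)

initial_loss = loss_fn(poly(X), y).item()
for _ in range(500):
    optimizer.zero_grad()
    loss = loss_fn(poly(X), y)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(poly(X), y).item()

# One row of coefficients per dataset, ordered [constant, linear, quadratic]
fitted_coef = poly.weight.detach().squeeze(1)   # shape (N, order + 1)
print(initial_loss, final_loss)
```

Because the kernel spans the whole length-3 feature axis, each output channel is a single dot product between one dataset's features and its own three weights, which is exactly N independent Linear layers evaluated in one call.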
I'm wondering about the following thing:
I would like to apply some transfer learning on a project I'm working on using an Artificial Neural Network. I have two (chemical) datasets which have a different distribution of values but can be related from a physical point of view.
Given that the first quantity varies between 0 and 12 and the second between 10^{-13} and 10^{13}, how should I set up the network? Maybe with some intermediate normalization layer?
My initial attempt relies on building a first network in the following way:
class Model(nn.Module):
    def __init__(self, in_features, h1, h2, out_features=1):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(in_features, h1)   # input layer
        self.fc2 = nn.Linear(h1, h2)            # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
After training this model for some epochs, I would use the pre-trained weights on the other dataset, which has a different data distribution (between 10^{-13} and 10^{13}). However, I'm not sure what kind of normalization or intermediate layer I should put in front to shift the output so that it matches the other distribution.
I am new to deep learning and have started learning PyTorch. I created a classifier model with the following structure.
class model(nn.Module):
    def __init__(self):
        super(model, self).__init__()
        resnet = models.resnet34(pretrained=True)
        layers = list(resnet.children())[:8]
        self.features1 = nn.Sequential(*layers[:6])
        self.features2 = nn.Sequential(*layers[6:])
        self.classifier = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 3))

    def forward(self, x):
        x = self.features1(x)
        x = self.features2(x)
        x = F.relu(x)
        x = nn.AdaptiveAvgPool2d((1, 1))(x)
        x = x.view(x.shape[0], -1)
        return self.classifier(x)
Basically I want to classify among three classes {0, 1, 2}. While evaluating, I passed in an image and the model returned a tensor with three values, like below:
tensor([[-0.1526, 1.3511, -1.0384]], device='cuda:0', grad_fn=<AddmmBackward>)
So my question is: what are these three numbers? Are they probabilities?
P.S. Please pardon me if I asked something too silly.
The final layer of your model's self.classifier, nn.Linear (a fully connected layer), produces values that we can call scores, for example [10.3, -3.5, -12.0]. You can see the same in your example: [-0.1526, 1.3511, -1.0384]. These values are not normalized and cannot be interpreted as probabilities.
This is just the "raw, unscaled" network output, which is hard to use or interpret directly. That is why the common practice is to convert the scores to a normalized probability distribution by applying softmax after the final layer, as @skinny_func has already described. You then get probabilities in the range 0 to 1, which is a more intuitive representation.
So after training, apply softmax to the output tensor to get the probability of each class, then choose the class with the maximal value (highest probability).
In your case:
prob = torch.nn.functional.softmax(model(x), dim=1)
_, pred_class = torch.max(prob, dim=1)
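To see what softmax does to the three scores from the question, here is a small pure-Python sketch that does not need a GPU or a model. The helper name `softmax` is just an illustrative local function, not the PyTorch one.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-0.1526, 1.3511, -1.0384]          # the scores from the question
probs = softmax(logits)                      # roughly [0.17, 0.76, 0.07]
pred_class = probs.index(max(probs))         # index of the most likely class
print(probs, pred_class)
```

Softmax preserves the ordering of the scores, so taking the argmax of the raw output gives the same predicted class; softmax is only needed when you want the probabilities themselves.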
Given a simple 2-layer neural network, the traditional idea is to compute the gradient w.r.t. the weights/model parameters. For an experiment, I want to compute the gradient of the error w.r.t. the input. Are there existing PyTorch methods that allow me to do this?
More concretely, consider the following neural network:
import torch.nn as nn
import torch.nn.functional as F
class NeuralNet(nn.Module):
    def __init__(self, n_features, n_hidden, n_classes, dropout):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.sigmoid = nn.Sigmoid()
        self.fc2 = nn.Linear(n_hidden, n_classes)
        self.dropout = dropout

    def forward(self, x):
        x = self.sigmoid(self.fc1(x))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
I instantiate the model and an optimizer for the weights as follows:
import torch.optim as optim
model = NeuralNet(n_features=args.n_features,
                  n_hidden=args.n_hidden,
                  n_classes=args.n_classes,
                  dropout=args.dropout)
optimizer_w = optim.SGD(model.parameters(), lr=0.001)
While training, I update the weights as usual. Now, given that I have values for the weights, I should be able to use them to compute the gradient w.r.t. the input. I am unable to figure out how.
import time

def train(epoch):
    t = time.time()
    model.train()
    optimizer_w.zero_grad()
    output = model(features)
    loss_train = F.nll_loss(output[idx_train], labels[idx_train])
    acc_train = accuracy(output[idx_train], labels[idx_train])
    loss_train.backward()
    optimizer_w.step()
    # grad_features = loss_train.backward() w.r.t to features
    # features -= 0.001 * grad_features

for epoch in range(args.epochs):
    train(epoch)
It is possible: just set input.requires_grad = True for each input batch you feed in, and after loss.backward() input.grad will hold the expected gradient. In other words, if the input to your model (which you call features in your code) is some M x N x ... tensor, features.grad will be a tensor of the same shape, where each element of grad holds the gradient with respect to the corresponding element of features. In my comments below, I use i as a generalized index: if your features have, for instance, 3 dimensions, replace it with features.grad[i, j, k], etc.
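A minimal sketch of this, assuming a small stand-in network and random data in place of the model and `features` from the question (the layer sizes and labels are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in two-layer network; the sizes are arbitrary
model = nn.Sequential(nn.Linear(4, 8), nn.Sigmoid(), nn.Linear(8, 3))

features = torch.randn(10, 4, requires_grad=True)   # track gradients for the input
labels = torch.randint(0, 3, (10,))

loss = F.cross_entropy(model(features), labels)
loss.backward()                                      # fills features.grad

print(features.grad.shape)    # same shape as features: torch.Size([10, 4])
```

The weight gradients are computed by the same backward() call, so the usual optimizer step for the parameters can still follow.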
Regarding the error you're getting: PyTorch operations build a tree representing the mathematical operation they describe, which is then used for differentiation. For instance, c = a + b will create a tree where a and b are leaf nodes and c is not a leaf (since it results from other expressions). Your model is the expression, and its inputs as well as its parameters are the leaves, whereas all intermediate and final outputs are not leaves. You can think of leaves as "constants" or "parameters" and of all other variables as functions of those. This message tells you that you can only set requires_grad on leaf variables.
Your problem is that in the first iteration, features is random (or however else you initialize it) and is therefore a valid leaf. After your first iteration, features is no longer a leaf, since it becomes an expression calculated from the previous values. In pseudocode, you have
f_1 = initial_value # valid leaf
f_2 = f_1 + your_grad_stuff # not a leaf: f_2 is a function of f_1
To deal with that you need to use detach, which breaks the links in the tree and makes autograd treat a tensor as if it were constant, no matter how it was created. In particular, no gradient calculations will be backpropagated through detach. So you need something like
features = features.detach() - 0.01 * features.grad
Note: perhaps you need to sprinkle a couple more detaches here and there, which is hard to say without seeing your whole code and knowing the exact purpose.
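Putting the pieces together, the loop below is a rough sketch of repeated gradient steps on the input. The stand-in network, the random data, the step size of 0.1 and the choice to freeze the weights are all assumptions made to keep the example small. Note that the result of the detach expression does not require gradients on its own, so `requires_grad_(True)` is called on it again; this is allowed because the new tensor is a leaf.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.Sigmoid(), nn.Linear(8, 3))
for p in model.parameters():
    p.requires_grad_(False)     # freeze the weights so only the input changes here

labels = torch.randint(0, 3, (10,))
features = torch.randn(10, 4, requires_grad=True)   # a valid leaf tensor
losses = []

for step in range(50):
    loss = F.cross_entropy(model(features), labels)
    losses.append(loss.item())
    loss.backward()                                  # fills features.grad
    # detach() cuts the history, so the updated tensor is a fresh leaf
    features = (features.detach() - 0.1 * features.grad).requires_grad_(True)

print(features.is_leaf, losses[0], losses[-1])
```

Without the detach, the second iteration would try to backpropagate through the previous update as well, which is the situation the leaf-variable error is pointing at.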