I'm wondering about the following:
I would like to apply some transfer learning on a project I'm working on using an Artificial Neural Network. I have two (chemical) datasets which have a different distribution of values but can be related from a physical point of view.
Given that the first quantity varies between 0 and 12 and the second between 10^{-13} and 10^{13}, how should I set up the network? Maybe with some intermediate normalization layer?
My initial attempt relies on building a first network in the following way:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features, h1, h2, out_features=1):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(in_features, h1)   # input layer
        self.fc2 = nn.Linear(h1, h2)            # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
After training this model for some epochs I would use the pre-trained weights to run over the other dataset, which has a different data distribution (between 10^{-13} and 10^{13}), but I'm not sure what kind of normalization or intermediate layer I should add in order to shift the output towards the other distribution.
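For instance, one possible setup (just a sketch of one option, assuming the second target is strictly positive, which the 10^{-13} to 10^{13} range suggests) is to keep the network itself unchanged and fine-tune the pre-trained weights on a log10-transformed target, so that its range becomes comparable to the 0-12 range of the first dataset:

import torch

# Sketch only: log10 maps targets spanning 1e-13 .. 1e13 onto roughly [-13, 13],
# a scale comparable to the first dataset's 0..12 range.
y_raw = torch.tensor([1e-13, 3.2e-5, 7.1e4, 1e13])   # made-up example values
y_log = torch.log10(y_raw)                            # targets used for fine-tuning

# At prediction time the transform is inverted: y_physical = 10 ** y_pred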
I built a simple PyTorch model as below. However, I receive an error message that the mat1 and mat2 sizes are not aligned. How do I tweak the code to allow for the flexibility of different data dimensions?
import torch
import torch.nn as nn

class simpleNet(nn.Module):
    def __init__(self, input_dim, hidden_size, num_classes):
        """
        :param input_dim: input feature dimension
        :param hidden_size: hidden dimension
        :param num_classes: total number of classes
        """
        super(simpleNet, self).__init__()
        # hidden layer
        self.hidden = nn.Linear(input_dim, hidden_size)
        # second fully connected layer that outputs the class scores
        self.output = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.hidden(x)
        x = torch.sigmoid(x)
        out = self.output(x)
        return out
I am just trying to build a toy neural network using PyTorch.
For your neural network to work, the output size of each layer must equal the input size of the next layer. Since this is only a snippet of your architecture without the initialization code, I cannot tell exactly what you should change, but having mismatched dimensions between consecutive layers is not good practice. As a brute-force workaround you can use torch's reshape function to make the output of the previous layer match the input of the next. Refer to: https://pytorch.org/docs/stable/generated/torch.reshape.html
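For illustration (a hypothetical sketch with made-up dimensions, using the class from the question), the in_features of the first Linear layer must equal the feature dimension of the batch you pass in, and each layer's output size must equal the next layer's input size:

import torch

# Hypothetical dimensions for illustration only.
model = simpleNet(input_dim=20, hidden_size=64, num_classes=10)
x = torch.randn(8, 20)   # batch of 8 samples with 20 features each -> matches input_dim
out = model(x)           # shape: (8, 10), one score per class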
I've joined a new project where someone has defined a class similar to the following:
import torch.nn as nn

class FeatureClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_norm = nn.LayerNorm(512)
        self.flatten = nn.Flatten()
        self.dropout = nn.Dropout(0.1)
        self.fc1 = nn.Linear(512, 2)

    def forward(self, x):
        x = self.layer_norm(x)
        x = self.flatten(nn.functional.relu(x))
        x = self.dropout(x)
        x = self.fc1(nn.functional.relu(x))
        return x
This is returning quite poor results. My impression is that it is effectively some data manipulation and a ReLU, so essentially a linear regression on a subset of normalised data. The author of the code, however, contends that it is non-linear. What aspect of this network might make it non-linear?
The code returns quite poor fits and losses. I tried upping the dropout to 0.5 to see if that had an impact, but it did not. In my mind this confirms the linear behaviour to a degree, as with a larger dropout I would expect the behaviour to change if any complex information was being extracted.
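One way to probe this numerically (just a sketch, assuming random 512-dimensional inputs) is to check whether the forward pass is additive, which a purely linear map would be:

import torch

# Sketch: a linear map f satisfies f(a + b) == f(a) + f(b).
model = FeatureClassifier().eval()   # eval() turns dropout off
a, b = torch.randn(1, 512), torch.randn(1, 512)
with torch.no_grad():
    gap = (model(a + b) - (model(a) + model(b))).abs().max()
print(gap)   # a clearly non-zero gap means the mapping is not linear
             # (both LayerNorm and the ReLUs break additivity)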
I am dealing with a dataset with 6 features and around 1000 samples in total. I am hoping to use unsupervised learning -- dimensionality reduction followed by clustering -- as the labels for these data are often incorrect. I tested linear methods like PCA first and found that the data are not linearly separable, so I am now using an autoencoder to perform dimensionality reduction, but I have some questions regarding the number of nodes. I am testing a simple autoencoder in PyTorch at the moment, with one hidden layer. However, I am uncertain of the number of nodes that would be appropriate for this problem. I am also confused about whether the usual advice on node selection takes 'input layer size' to mean the total training data size (0.2*1000 samples), the total dataset (1000 samples), or the features themselves (6 features).
Here is my current PyTorch code aimed at handling this problem:
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=6, latent_dim=2):
        super(Autoencoder, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.encode = nn.Sequential(nn.Linear(self.input_dim, 32),
                                    nn.LeakyReLU(0.02),
                                    nn.Linear(32, self.latent_dim),  # in_features must match the 32 units above
                                    )
        self.decode = nn.Sequential(nn.Linear(self.latent_dim, 32),
                                    nn.LeakyReLU(0.02),
                                    nn.Linear(32, self.input_dim)
                                    )
        self.apply(weights_init)  # weights_init is a custom initialisation function defined elsewhere

    def encoded(self, x):
        # encodes data to latent space
        return self.encode(x)

    def decoded(self, x):
        # decodes latent space data back to 'real' space
        return self.decode(x)

    def forward(self, x):
        en = self.encoded(x)
        de = self.decoded(en)
        return de
This yields a training/test loss as follows:
and latent space as follows:
I would greatly appreciate any advice on this subject. I recognize this is likely a rather simple question, so apologies in advance!
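For reference, a minimal training-step sketch for such an autoencoder (assuming an MSE reconstruction loss and an Adam optimizer, which are assumptions since the actual training code is not shown above):

import torch
import torch.nn as nn

# Sketch only; relies on the Autoencoder class (and its weights_init helper) defined above.
model = Autoencoder(input_dim=6, latent_dim=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(64, 6)           # a dummy batch of 64 samples with 6 features
optimizer.zero_grad()
loss = criterion(model(x), x)    # reconstruction error against the input itself
loss.backward()
optimizer.step()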
So I am new to deep learning and started learning PyTorch. I created a classifier model with the following structure.
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class model(nn.Module):
    def __init__(self):
        super(model, self).__init__()
        resnet = models.resnet34(pretrained=True)
        layers = list(resnet.children())[:8]
        self.features1 = nn.Sequential(*layers[:6])
        self.features2 = nn.Sequential(*layers[6:])
        self.classifier = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 3))

    def forward(self, x):
        x = self.features1(x)
        x = self.features2(x)
        x = F.relu(x)
        x = nn.AdaptiveAvgPool2d((1, 1))(x)
        x = x.view(x.shape[0], -1)
        return self.classifier(x)
So basically I wanted to classify among three classes {0, 1, 2}. While evaluating, I passed in an image and it returned a tensor with three values like below:
tensor([[-0.1526, 1.3511, -1.0384]], device='cuda:0', grad_fn=<AddmmBackward>)
So my question is: what are these three numbers? Are they probabilities?
P.S. Please pardon me if I asked something too silly.
The final nn.Linear layer (fully connected layer) of your model's self.classifier produces values that we can call scores; for example, they may be [10.3, -3.5, -12.0], just like in your example: [-0.1526, 1.3511, -1.0384]. These are not normalized and cannot be interpreted as probabilities.
As you can see, it's just a kind of "raw", unscaled network output. In other words these values are not normalized, and it's hard to use or interpret them, which is why the common practice is to convert them to a normalized probability distribution by applying softmax after the final layer, as #skinny_func has already described. After that you will get probabilities in the range of 0 to 1, which is a more intuitive representation.
So after training, what you want to do is apply softmax to the output tensor to extract the probability of each class, and then choose the class with the maximal value (highest probability).
In your case:
prob = torch.nn.functional.softmax(model(x), dim=1)
_, pred_class = torch.max(prob, dim=1)
I have read many papers where convolutional neural networks are used for super-resolution, image segmentation, autoencoders and so on. They use different kinds of upsampling, a.k.a. deconvolutions, and there is a discussion over here in a different question.
Here in Tensorflow there is a function
Here in Keras there are some
I implemented the Keras one:
x = tf.keras.layers.UpSampling1D(size=2)(x)
and I used this one, stolen from a super-resolution repo here:
import tensorflow as tf

class SubPixel1D(tf.keras.layers.Layer):
    def __init__(self, r):
        super(SubPixel1D, self).__init__()
        self.r = r

    def call(self, inputs):
        with tf.name_scope('subpixel'):
            X = tf.transpose(inputs, [2, 1, 0])                          # (r, w, b)
            X = tf.compat.v1.batch_to_space_nd(X, [self.r], [[0, 0]])    # (1, r*w, b)
            X = tf.transpose(X, [2, 1, 0])
            return X
But I realized that neither of them has parameters in my model summary. Don't those functions need parameters so they can learn the upsampling?
In Keras, UpSampling simply repeats your input to the size provided (you can find the documentation here), so there is no need for these layers to have parameters.
I think you have confused upsampling with Transposed Convolution/ Deconvolution.
In UpSampling1D, if you look at the actual source code on GitHub, the up-sampling is simple nearest-neighbour repetition (the 2D variant also offers bilinear interpolation). Such interpolation schemes have no learnable parameters, no weights or biases, unless they are followed by a convolution layer.
Since SubPixel1D also uses no convolution or other learnable layers, it has no training parameters either.
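To make the difference concrete, here is a small sketch (assuming TensorFlow 2.x, where Conv1DTranspose is available) comparing a parameter-free upsampling layer with a learnable transposed convolution:

import tensorflow as tf

inputs = tf.keras.Input(shape=(100, 8))   # (steps, channels)

# UpSampling1D just repeats each time step: no weights, 0 parameters.
fixed = tf.keras.layers.UpSampling1D(size=2)(inputs)

# Conv1DTranspose learns its upsampling kernel: kernel_size*channels*filters + filters
# = 4*8*8 + 8 = 264 parameters with these (made-up) sizes.
learned = tf.keras.layers.Conv1DTranspose(filters=8, kernel_size=4, strides=2,
                                          padding='same')(inputs)

tf.keras.Model(inputs, [fixed, learned]).summary()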