I am trying to implement an unsupervised domain adaptation network following the paper GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation, presented at CVPR 2019 (can be found at this link). I have some trouble understanding some parts of the paper.
I have included the image from the paper explaining the structure of the model. I have trouble understanding whether the input of the model is a single image or multiple images, since there is a domain classification network that should classify the domain the image comes from, but at the same time there is a part in which the alignment of the class centroids is evaluated.
Moreover, there is no indication of how to compute the class centroids themselves, and since I am not an expert in this matter, I wonder how it is possible to compute them and optimize them using the loss function given in the paper.
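For what it's worth, in similar centroid-alignment methods the class centroid is usually just the mean feature vector of the samples assigned to each class (ground-truth labels on the source, pseudo-labels on the target), often smoothed with a moving average across batches. Below is a minimal sketch of that common recipe; this is my assumption of how it works, not something spelled out in the paper:

import torch

def class_centroids(features, labels, num_classes):
    # Hypothetical sketch, not the paper's exact procedure.
    # features: (batch, feat_dim); labels: (batch,) with values in [0, num_classes)
    centroids = torch.zeros(num_classes, features.size(1), device=features.device)
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            # mean feature vector of the samples assigned to class k
            centroids[k] = features[mask].mean(dim=0)
    return centroids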
The last thing I'm wondering about is an error I get in my code (I'm using PyTorch to implement the solution). This is the code I wrote for the model:
import numpy as np
import networkx as nx
import torch
import torch.nn as nn
import torch_geometric.nn as geometric_nn
from torchvision.models import resnet50, alexnet

class GCAN(nn.Module):
    def __init__(self, num_classes, gcn_in_channels=256, gcn_out_channels=150):
        super(GCAN, self).__init__()
        self.cnn = resnet50(pretrained=True)
        resnet_features = self.cnn.fc.in_features
        combined_features = resnet_features + gcn_out_channels
        self.cnn = nn.Sequential(*list(self.cnn.children())[:-1])  # drop the fc layer
        self.dsa = alexnet(pretrained=True)
        self.gcn = geometric_nn.GCNConv(in_channels=gcn_in_channels,
                                        out_channels=gcn_out_channels)
        # domain classification head
        self.domain_alignment = nn.Sequential(
            nn.Linear(in_features=combined_features, out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=1024),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=1),
            nn.Sigmoid()
        )
        # label classification head
        self.classifier = nn.Sequential(
            nn.Linear(in_features=combined_features, out_features=1024),
            nn.Dropout(p=0.2),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=1024),
            nn.Dropout(p=0.2),
            nn.ReLU(),
            nn.Linear(in_features=1024, out_features=num_classes),
            nn.Softmax(dim=1)
        )

    def forward(self, xs):
        resnet_features = self.cnn(xs)
        scores = self.dsa(xs)
        scores = scores.cpu().detach().numpy()
        adjacency_matrix = np.matmul(scores, np.transpose(scores))
        graph = nx.from_numpy_matrix(adjacency_matrix)  # networkx graph
        gcn_features = self.gcn(graph)  # this is the line that raises the error
        concat_features = torch.cat((resnet_features, gcn_features))
        domain_classification = self.domain_alignment(concat_features)
        pseudo_label = self.classifier(concat_features)
        return domain_classification, pseudo_label
When I try to plot the model summary I get the following error:
forward() missing 1 required positional argument: 'edge_index'
But looking at the documentation of the GCN convolution (which is the part that raises the error), I see that I have given the layer both in_channels and out_channels. What am I missing in this case?
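For completeness, here is a minimal sketch of how I understand a GCNConv layer is supposed to be called (based on the torch_geometric documentation; the shapes are illustrative): the constructor arguments only define the layer, while the forward pass needs the node features plus an edge_index tensor.

import torch
from torch_geometric.nn import GCNConv
from torch_geometric.utils import dense_to_sparse

gcn = GCNConv(in_channels=256, out_channels=150)
x = torch.randn(8, 256)                         # node features, one row per sample
adj = torch.rand(8, 8)                          # dense similarity/adjacency matrix
edge_index, edge_weight = dense_to_sparse(adj)  # convert to sparse COO indices
out = gcn(x, edge_index)                        # forward needs x AND edge_index -> (8, 150)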
Related
I came across this example of a neural network implementation in PyTorch. The link to the article is: https://machine-learning.tokyo/neural-network-approach-for-boston-house-prices-dataset-by-pytorch/. I would like to know what neural network is used in this article, that is, the name of its type and/or structure.
Here is the code of the neural network:
import torch.nn as nn
import torch.nn.functional as F

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(X.shape[1], 32)  # input: X.shape[1] = 13, output: 32
        self.layer2 = nn.Linear(32, 16)
        self.layer3 = nn.Linear(16, 1)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)
        return x

model = NeuralNetwork()
I tried to look up various definitions and categorizations of neural networks, and I think the network used in this example is a multilayer perceptron (MLP). However, I'm not very experienced in this field and I'm unsure whether my assumption is correct.
I'm wondering about the following:
I would like to apply some transfer learning to a project I'm working on using an artificial neural network. I have two (chemical) datasets which have different distributions of values but can be related from a physical point of view.
Given that the first quantity varies between 0 and 12 and the second between 10^{-13} and 10^{13}, how should I set up the network? Maybe with some intermediate normalization layer?
My initial attempt relies on building a first network in the following way:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features, h1, h2, out_features=1):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(in_features, h1)   # input layer
        self.fc2 = nn.Linear(h1, h2)            # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
After training this model for some epochs, I would use the pre-trained weights on the other dataset, which has a different data distribution (between 10^{-13} and 10^{13}), but I'm not sure what kind of normalization or intermediate layer I should put in front to shift the outputs to match the other distribution.
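One idea I'm considering (a sketch of a general technique, not something I found in a reference) is to train the second model in log space, so that targets spanning 10^{-13} to 10^{13} are compressed into roughly [-13, 13], and to map predictions back afterwards:

import torch

def to_log_space(y, eps=1e-30):
    # compress positive targets spanning many orders of magnitude
    return torch.log10(y.clamp(min=eps))

def from_log_space(y_log):
    # map network outputs back to the original scale
    return torch.pow(10.0, y_log)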
So I am new to deep learning and started learning PyTorch. I created a classifier model with the following structure.
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class model(nn.Module):
    def __init__(self):
        super(model, self).__init__()
        resnet = models.resnet34(pretrained=True)
        layers = list(resnet.children())[:8]
        self.features1 = nn.Sequential(*layers[:6])
        self.features2 = nn.Sequential(*layers[6:])
        self.classifier = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 3))

    def forward(self, x):
        x = self.features1(x)
        x = self.features2(x)
        x = F.relu(x)
        x = nn.AdaptiveAvgPool2d((1, 1))(x)
        x = x.view(x.shape[0], -1)
        return self.classifier(x)
So basically I wanted to classify among three classes {0, 1, 2}. While evaluating, I passed an image and it returned a tensor with three values like the one below:
tensor([[-0.1526, 1.3511, -1.0384]], device='cuda:0', grad_fn=<AddmmBackward>)
So my question is: what are these three numbers? Are they probabilities?
P.S. Please pardon me If I asked something too silly.
The final nn.Linear layer (fully connected layer) of self.classifier in your model produces values that we can call scores; for example, they may be [10.3, -3.5, -12.0], just like in your example: [-0.1526, 1.3511, -1.0384]. These are not normalized and cannot be interpreted as probabilities.
As you can see, it's just a kind of "raw", unscaled network output; in other words, these values are not normalized, and it's hard to use or interpret them. That's why the common practice is to convert them to a normalized probability distribution by applying softmax after the final layer, as @skinny_func has already described. After that you will get probabilities in the range 0 to 1, which is a more intuitive representation.
So after training, what you want to do is apply softmax to the output tensor to extract the probability of each class, and then choose the maximal value (highest probability).
In your case:
prob = torch.nn.functional.softmax(model(x), dim=1)
_, pred_class = torch.max(prob, dim=1)
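Note that softmax is monotonic, so if you only need the predicted class you can equivalently skip it and take the argmax of the raw scores directly:

pred_class = model(x).argmax(dim=1)  # same predicted class, without softmax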
I need to visualize the output of a VGG16 model which classifies 14 different classes.
I loaded the trained model and replaced the classifier layer with an Identity() layer, but it doesn't separate the output into categories.
Here is the snippet:
The number of samples here is 1000 images.
import numpy
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# model, optimizer and test_loader are defined earlier
epoch = 800
PATH = 'vgg16_epoch{}.pth'.format(epoch)
checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']

class Identity(nn.Module):
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x

model.classifier._modules['6'] = Identity()  # replace the last fc layer
model.eval()

logits_list = numpy.empty((0, 4096))
targets = []
with torch.no_grad():
    for step, (t_image, target, classess, image_path) in enumerate(test_loader):
        t_image = t_image.cuda()
        target = target.cuda()
        target = target.data.cpu().numpy()
        targets.append(target)
        logits = model(t_image)
        print(logits.shape)
        logits = logits.data.cpu().numpy()
        print(logits.shape)
        logits_list = numpy.append(logits_list, logits, axis=0)
        print(logits_list.shape)

tsne = TSNE(n_components=2, verbose=1, perplexity=10, n_iter=1000)
tsne_results = tsne.fit_transform(logits_list)
target_ids = range(len(targets))
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], c=target_ids, cmap=plt.cm.get_cmap("jet", 14))
plt.colorbar(ticks=range(14))
plt.legend()
plt.show()
Here is what this script produced: I am not sure why I get all the colors in each cluster!
The VGG16 outputs over 25k features to the classifier. I believe that's too many for t-SNE. It's a good idea to include a new nn.Linear layer to reduce this number, so t-SNE may work better. In addition, I'd recommend two different ways to get the features from the model:
The best way to get them, regardless of the model, is by using the register_forward_hook method. You may find a notebook here with an example.
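For illustration, here is a minimal sketch of the hook approach (assuming a torchvision VGG16 stored in model; hooking classifier[4], the ReLU after the second 4096-d Linear layer, is just one reasonable choice):

feats = {}

def hook(module, inputs, output):
    # stash the activations produced by the hooked layer
    feats['penultimate'] = output.detach().cpu()

handle = model.classifier[4].register_forward_hook(hook)
_ = model(t_image)  # the forward pass fills feats['penultimate']
handle.remove()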
If you don't want to use the hook, I'd suggest this alternative. After loading your model, you may use the following class to extract the features:
import torch
import torch.nn as nn

class FeatNet(nn.Module):
    def __init__(self, vgg):
        super(FeatNet, self).__init__()
        self.features = nn.Sequential(*list(vgg.children())[:-1])

    def forward(self, img):
        x = self.features(img)
        return torch.flatten(x, 1)  # flatten to (batch, features) before t-SNE
Now, you just need to call FeatNet(img) to get the features.
To include the feature reducer, as I suggested before, you need to retrain your model doing something like:
import torch
import torch.nn as nn

class FeatNet(nn.Module):
    def __init__(self, vgg):
        super(FeatNet, self).__init__()
        self.features = nn.Sequential(*list(vgg.children())[:-1])
        self.feat_reducer = nn.Sequential(
            nn.Linear(25088, 1024),
            nn.BatchNorm1d(1024),
            nn.ReLU()
        )
        self.classifier = nn.Linear(1024, 14)

    def forward(self, img):
        x = self.features(img)
        x = torch.flatten(x, 1)  # flatten the conv features before the Linear layer
        x_r = self.feat_reducer(x)
        return self.classifier(x_r)
Then, you can run your model and return x_r, that is, the reduced features. As I said, 25k features are too many for t-SNE. Another way to reduce this number is by using PCA instead of nn.Linear. In this case, you send the 25k features to PCA and then train t-SNE on the PCA's output. I prefer using nn.Linear, but you need to test to see which one gives you a better result.
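A minimal sketch of the PCA route (assuming scikit-learn and the logits_list array built in your loop; n_components=50 is just an illustrative choice):

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# reduce the raw features with PCA first, then run t-SNE on the result
pca = PCA(n_components=50)
reduced = pca.fit_transform(logits_list)
tsne_results = TSNE(n_components=2, perplexity=10).fit_transform(reduced)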
I am a beginner and I am trying to implement AlexNet for image classification. The PyTorch implementation of AlexNet is as follows:
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x
However, I am trying to use the network for an input size of (3, 448, 224) with number of classes = 8.
I have no idea how to change x.view in the forward method and how many layers I should drop to get optimal performance. Please help.
As stated in https://github.com/pytorch/vision/releases:
Most of the pretrained models provided in torchvision (the newest version) already use self.avgpool = nn.AdaptiveAvgPool2d((size, size)) to resolve the incompatibility with different input sizes, so you don't have to worry about it much.
Below is the code; it's very short.

import torchvision
import torch.nn as nn

num_classes = 8
model = torchvision.models.alexnet(pretrained=True)
# replace the last classifier layer
model.classifier[6] = nn.Linear(4096, num_classes)
# now you can train it with your dataset of size (3, 448, 224)
Transfer learning
There are two popular ways to do transfer learning. Suppose that we trained a model M on a very large dataset D_large, and now we would like to transfer the "knowledge" learned by M to a new model, M', on another dataset such as D_other (which is smaller than D_large).
1. Use (most) parts of M as the architecture of our new M' and initialize those parts with the weights trained on D_large. We then start training M' on D_other and let it adapt the weights of those parts taken from M to find the optimal weights for our new dataset. This is usually referred to as fine-tuning the model M'.
2. Same as the above method, except that before training M' we freeze all the parameters of those parts and then train M' on D_other. In both cases, the parts from M are mostly the first components of M' (the base); in this case, however, we refer to those parts of M as a feature extractor for the input dataset. The accuracy obtained by the two methods may differ a little, but freezing guarantees the model doesn't overfit on the small dataset, which is a good point in terms of accuracy. On the other hand, when we freeze the weights of M, we don't need to store the intermediate values (the hidden outputs from each hidden layer) in the forward pass and also don't need to compute the gradients for those parts during the backward pass. This improves the speed of training and reduces the memory required.
The implementation
Along with AlexNet, a lot of pretrained models on ImageNet are already provided by the Facebook team, such as ResNet and VGG.
To best fit your requirements in terms of model size, it would be nice to use VGG11 or ResNet, which have the fewest parameters in their model families.
I'll just pick VGG11 as an example:
1. Obtain the pretrained model from torchvision.
2. Freeze all the parameters of this model.
3. Replace the last layer in the model with your new Linear layer to perform the classification. This means that you can reuse almost everything from M in M'.
import torchvision
import torch.nn as nn

# obtain the pretrained model
model = torchvision.models.vgg11(pretrained=True)

# freeze the params
for param in model.parameters():
    param.requires_grad = False

# replace the classifier with your own
num_classes = 8
model.classifier[6] = nn.Linear(in_features=4096, out_features=num_classes)

# start training with your dataset
Warnings
In old versions of the torchvision package there is no self.avgpool = nn.AdaptiveAvgPool2d((size, size)), which makes it harder to train on an input size different from the [3, 224, 224] used in training on ImageNet. You can work around it with a little effort, as below:
import torch.nn as nn
import torchvision

class OurVGG11(nn.Module):
    def __init__(self, num_classes=8):
        super(OurVGG11, self).__init__()
        self.vgg11 = torchvision.models.vgg11(pretrained=True)
        for param in self.vgg11.parameters():
            param.requires_grad = False
        # add an adaptive avgpool here
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        # replace the last classifier layer
        self.vgg11.classifier[-1] = nn.Linear(4096, num_classes)

    def forward(self, x):
        x = self.vgg11.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), 512 * 7 * 7)
        x = self.vgg11.classifier(x)
        return x

model = OurVGG11()
# now start training `model` on our dataset.
Try it out with different models in torchvision.models.