Like nn.Conv2d or nn.AvgPool2d, which slide a kernel over a tensor, I would like to compute the local variances of a tensor with a given kernel size. How can I achieve this? Do I need to touch PyTorch's source code?
If it's only the variance you are after, you can use the fact that
var(x) = E[x^2] - E[x]^2
Using avg_pool2d you can estimate the local average of x and of x squared:
import torch.nn.functional as nnf
running_var = nnf.avg_pool2d(x**2, kernel_size=2, stride=1) - nnf.avg_pool2d(x, kernel_size=2, stride=1)**2
However, if you want a more general way of performing "sliding window" operations, you should familiarize yourself with unfold and fold:
u = nnf.unfold(x, kernel_size=2, stride=1) # get all kernel_size patches as vectors
running_var2 = torch.var(u, unbiased=False, dim=1)
# reshape to the output spatial size (for kernel_size=2, stride=1 this is H-1 x W-1)
running_var2 = running_var2.reshape(x.shape[0], 1, x.shape[2]-1, x.shape[3]-1)
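A quick self-contained check that both methods agree (with a hypothetical single-channel input, since taking the variance over dim=1 of the unfolded tensor would mix channels otherwise):
import torch
import torch.nn.functional as nnf

x = torch.randn(2, 1, 8, 8)  # hypothetical single-channel input
v1 = nnf.avg_pool2d(x**2, kernel_size=2, stride=1) - nnf.avg_pool2d(x, kernel_size=2, stride=1)**2
u = nnf.unfold(x, kernel_size=2, stride=1)
v2 = torch.var(u, unbiased=False, dim=1).reshape(x.shape[0], 1, x.shape[2]-1, x.shape[3]-1)
print(torch.allclose(v1, v2, atol=1e-6))  # True: both compute the local (biased) variance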
I want to train a model to sum its three inputs, so it is as simple as possible.
First, the weights are initialized randomly. This produces a bad error estimate (approx. 0.5).
Then I initialize the weights with zeros. There are two options:
the shape of the weights tensor is [1, 3]
the shape of the weights tensor is [3]
When I choose the 1st option the model still works badly and can't learn this simple formula.
When I choose the 2nd option it works perfectly, with an error of 10e-12.
Why does the result depend on the shape of the weights? Why do I need to initialize the model with zeros to solve this simple problem?
import torch
from torch.nn import Sequential as Seq, Linear as Lin
from torch.optim.lr_scheduler import ReduceLROnPlateau
X = torch.rand((1024, 3))
y = (X[:,0] + X[:,1] + X[:,2])
m = Seq(Lin(3, 1, bias=False))
# 1 option
m[0].weight = torch.nn.parameter.Parameter(torch.tensor([[0, 0, 0]], dtype=torch.float))
# 2 option
#m[0].weight = torch.nn.parameter.Parameter(torch.tensor([0, 0, 0], dtype=torch.float))
optim = torch.optim.SGD(m.parameters(), lr=10e-2)
scheduler = ReduceLROnPlateau(optim, 'min', factor=0.5, patience=20, verbose=True)
mse = torch.nn.MSELoss()
for epoch in range(500):
    optim.zero_grad()
    out = m(X)
    loss = mse(out, y)
    loss.backward()
    optim.step()
    if epoch % 20 == 0:
        print(loss.item())
    scheduler.step(loss)
The first option doesn't learn because it fails due to broadcasting: out.shape == (1024, 1) while the corresponding target y has shape (1024,). MSELoss, as expected, computes the mean of the tensor (out - y)^2, which in this case has shape (1024, 1024), clearly the wrong objective for this task. With the second option, the tensor (out - y)^2 has shape (1024,) and its mean corresponds to the actual MSE. The default approach, without explicitly changing the weight shape (through options 1 and 2), would work if you set the target shape to (1024, 1), for example by y = y.unsqueeze(-1) after the definition of y.
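A quick way to see the broadcasting problem in isolation (shapes are the ones from the question):
import torch

out = torch.randn(1024, 1)   # model output
y = torch.randn(1024)        # target
print((out - y).shape)                 # torch.Size([1024, 1024]) -- silent broadcasting
print((out - y.unsqueeze(-1)).shape)   # torch.Size([1024, 1])   -- the intended residual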
Fitting a single polynomial to a bunch of data is pretty easy in Pytorch using an nn.Linear layer. I've included a trivial example at the end of this post. But suppose I have tons of data split into groups, and I want to fit a different polynomial to each group. As an example, find the particular quadratic coefficients that fit each column in this image:
In other words, I want to simultaneously find the coefficients for N polynomials of order n, given m data per set to be fit:
In the image above, there are m=80 points per dataset, and N=100 sets to fit.
This perfectly lends itself to tensor manipulation and Pytorch on a gpu should make this blindingly fast by fitting all N at once. Problem is, I'm having a terrible brain fart, and haven't been able to wrap my head around the right layer configuration. Basically I need N nn.Linear layers, each operating on its own dataset. If this were convolution, I'd use a depthwise layer...
Example network to fit one polynomial where X are the m x p abscissa data, y are the m ordinate data, and we want to find the p coefficients.
import torch

class polyfit(torch.nn.Module):
    def __init__(self, n=2):
        super(polyfit, self).__init__()
        self.poly = torch.nn.Linear(n, 1, bias=False)
    def forward(self, x):
        return self.poly(x)

model = polyfit(n=2)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for epoch in range(100):  # or however I want to run the loops
    output = model(X)
    mse = loss(output, y)
    optimizer.zero_grad()
    mse.backward()
    optimizer.step()
Figured it out after thinking about my depthwise-convolution comment. A Conv1d with just 3 parameters, dotted with a tensor holding the values [1, x, x**2], evaluates a quadratic, the same as a Linear layer with n=3. So the layer needs to be:
self.poly = torch.nn.Conv1d(N,N,n+1,bias=False,groups=N)
Just have to make sure the X, y tensors have the right shapes of [m, N, n+1] and [m, N, 1] respectively (the kernel spans all n+1 polynomial terms).
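A minimal end-to-end sketch of this approach on synthetic data (all sizes and the learning rate are hypothetical, chosen to match the example: N=100 quadratics, m=80 points each):
import torch

N, n, m = 100, 2, 80                                         # datasets, polynomial order, points per set
x = torch.rand(m, N)
X = torch.stack([x**0, x**1, x**2], dim=2)                   # (m, N, n+1): [1, x, x^2] per point
true_coeffs = torch.randn(N, n + 1)
y = (X * true_coeffs.unsqueeze(0)).sum(dim=2, keepdim=True)  # (m, N, 1)

poly = torch.nn.Conv1d(N, N, n + 1, bias=False, groups=N)    # N independent "linear" fits
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(poly.parameters(), lr=1e-2)
for epoch in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(poly(X), y)   # m acts as the batch dimension
    loss.backward()
    optimizer.step()
print(loss.item())  # the loss should decrease toward 0 as the coefficients are recovered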
Is there any way in PyTorch to reduce the dimensions of a tensor within a model?
Adaptive average pooling, or in fact any typical pooling in PyTorch, does not reduce the number of dimensions of a tensor; it only changes the spatial size.
You can find all the pooling types PyTorch offers here:
https://pytorch.org/docs/master/nn.html#pooling-layers
I suggest using this template code to try out different poolings and their effect on dimensions:
import torch
import torch.nn as nn

m = nn.AdaptiveAvgPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)
output = m(input)
print(output.size())  # torch.Size([1, 64, 5, 7])
In order to reduce dimensions in PyTorch models you can add a block that applies squeeze() to the tensor, or flattens it with, for example, example_tensor.view(-1, x, y).
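For instance, a minimal sketch using nn.Flatten (available in recent PyTorch versions) inside a model:
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),  # (B, C, H, W) -> (B, C, 1, 1)
    nn.Flatten(),                  # (B, C, 1, 1) -> (B, C)
)
x = torch.randn(2, 64, 224, 224)
print(model(x).shape)  # torch.Size([2, 64])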
This code should work to compress (1,64,224,224) --> (1,64)
import torch
import torch.nn as nn
m = nn.AdaptiveAvgPool2d((1,1))
input = torch.randn(1, 64, 224, 224)
output = m(input).view(1,-1)
print(output.size()) #torch.Size([1, 64])
1D convolution is pretty simple when done by hand. However, I want to implement what is done here using nn.Conv1d, and it is not so simple for me. In this example h=[1,2,-1], x=[4,1,2,5] and the output is going to be y=[4,9,0,8,8,-5]. To do it using PyTorch we need to define h=nn.Conv1d(in, out, k) and x=torch.tensor(*), and y=h(x) should be the result.
Note: please do not use nn.Conv2d to implement it.
First, you should be aware that the term "convolution" used in basically all literature related to convolutional neural networks (CNNs) actually corresponds to the correlation operation, not the convolution operation.
The only difference (for real-valued inputs) between correlation and convolution is that in convolution the kernel is flipped/mirrored before sliding it across the signal, whereas in correlation no such flipping occurs.
There are also some extra operations that convolution layers in CNNs perform that are not part of the definition of convolution. They apply an offset (a.k.a. bias), they operate on mini-batches, and they map multi-channel inputs to multi-channel outputs.
Therefore, in order to recreate a convolution operation using a convolution layer we should (i) disable bias, (ii) flip the kernel, and (iii) set batch-size, input channels, and output channels to one.
For example, a PyTorch implementation of the convolution operation using nn.Conv1d looks like this:
import torch
from torch import nn
x = torch.tensor([4, 1, 2, 5], dtype=torch.float)
k = torch.tensor([1, 2, -1], dtype=torch.float)
# Define these constants to differentiate the various usages of "1".
BATCH_SIZE, IN_CH, OUT_CH = 1, 1, 1
# Pad with len(k)-1 zeros to ensure all non-zero outputs are computed.
h = nn.Conv1d(IN_CH, OUT_CH, kernel_size=len(k), padding=len(k) - 1, bias=False)
# Copy flipped k into h.weight.
# h.weight is shape (OUT_CH, IN_CH, kernel_size), reshape k accordingly.
# Perform copy inside no_grad context to avoid autograd issues.
with torch.no_grad():
    h.weight.copy_(torch.flip(k, dims=[0]).reshape(OUT_CH, IN_CH, -1))
# Input shape to h is assumed to be (BATCH_SIZE, IN_CH, SIGNAL_LENGTH), reshape x accordingly.
# Output shape of h is (BATCH_SIZE, OUT_CH, OUTPUT_LENGTH), reshape output to 1D signal.
y = h(x.reshape(BATCH_SIZE, IN_CH, -1)).reshape(-1)
which results in
>>> print(y)
tensor([ 4., 9., 0., 8., 8., -5.], grad_fn=<ViewBackward>)
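As a sanity check, this matches NumPy's convolution, which also implements the true, kernel-flipping convolution:
import numpy as np

print(np.convolve([4, 1, 2, 5], [1, 2, -1]))  # [ 4  9  0  8  8 -5]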
I know there is Conv2DTranspose in Keras, which can be used on images. We need to use it in NLP, so a 1D deconvolution is needed.
How do we implement Conv1DTranspose in Keras?
Use the Keras backend to fit the input tensor to a 2D transpose convolution. Avoid relying on explicit transpose operations, as they consume a lot of time.
import keras.backend as K
from keras.layers import Conv2DTranspose, Lambda
def Conv1DTranspose(input_tensor, filters, kernel_size, strides=2, padding='same'):
    """
    input_tensor: tensor with shape (batch_size, time_steps, dims)
    filters: int, output dimension, i.e. the output tensor will have the shape (batch_size, time_steps, filters)
    kernel_size: int, size of the convolution kernel
    strides: int, convolution step size
    padding: 'same' | 'valid'
    """
    x = Lambda(lambda x: K.expand_dims(x, axis=2))(input_tensor)
    x = Conv2DTranspose(filters=filters, kernel_size=(kernel_size, 1), strides=(strides, 1), padding=padding)(x)
    x = Lambda(lambda x: K.squeeze(x, axis=2))(x)
    return x
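A hypothetical usage sketch (sizes made up), wrapping the helper in a functional-API model to check the output shape:
from keras.layers import Input
from keras.models import Model

inp = Input(shape=(100, 16))                          # (time_steps, dims), hypothetical
out = Conv1DTranspose(inp, filters=8, kernel_size=3)  # strides=2, padding='same'
model = Model(inp, out)
print(model.output_shape)  # (None, 200, 8): the time dimension is upsampled by strides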
In my answer, I assume you were previously using Conv1D for the convolution.
Conv2DTranspose is new in Keras 2; what it does used to be achieved with a combination of UpSampling2D and a convolution layer. On StackExchange [Data Science] there is a very interesting discussion about what deconvolutional layers are (one answer includes very useful animated GIFs).
Also check the interesting discussion "Why all convolutions (no deconvolutions) in 'Building Autoencoders in Keras'?". Here is an excerpt: "As Francois has explained multiple times already, a deconvolution layer is only a convolution layer with an upsampling. I don't think there is an official deconvolution layer. The result is the same." (The discussion goes on; it might be that they are approximately, not exactly, the same. Also, since then, Keras 2 introduced Conv2DTranspose.)
The way I understand it, a combination of UpSampling1D followed by Convolution1D is what you are looking for; I see no reason to go to 2D. A minimal sketch of that combination is below.
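(The layer sizes and the names M and seq_length are placeholders, matching the Conv2DTranspose snippet further down.)
from keras.models import Sequential
from keras.layers import UpSampling1D, Conv1D

model = Sequential()
model.add(UpSampling1D(size=2, input_shape=(seq_length, M)))  # double the time dimension
model.add(Conv1D(filters=M, kernel_size=3, padding='same'))   # learnable smoothing after upsampling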
If however you want to go with Conv2DTranspose, you will need to first Reshape the input from 1D to 2D e.g.
model = Sequential()
model.add(
    Conv1D(
        filters=3,
        kernel_size=kernel_size,
        input_shape=(seq_length, M),  # when using this layer as the first layer in a model, provide an input_shape argument
    )
)
model.add(
    Reshape((-1, 1, M))
)
model.add(
    keras.layers.Conv2DTranspose(
        filters=M,
        kernel_size=(10, 1),
        data_format="channels_last"
    )
)
The inconvenient part of using Conv2DTranspose is that you need to specify seq_length and cannot leave it as None (arbitrary-length series).
Unfortunately, the same is true of UpSampling1D with the TensorFlow backend (Theano seems to be better here once again; too bad it's not going to be around).
In TensorFlow v2.2.0 the Conv1DTranspose layer has been implemented in the tf.keras.layers API. Check it out!
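A quick check, assuming a TensorFlow version that ships this layer:
import tensorflow as tf

layer = tf.keras.layers.Conv1DTranspose(filters=8, kernel_size=3, strides=2, padding='same')
x = tf.random.normal((1, 100, 16))
print(layer(x).shape)  # (1, 200, 8)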
You can reshape the tensor to occupy an extra dimension, run the deconvolution, and then reshape it back. In practice this works, though I have not thought very hard about whether it has theoretical implications (it seems theoretically fine too, since you are not going to "convolve" over that dimension).
# Assuming x has shape (batch, time_steps, channels); channels, filters and kernel are placeholders.
x = Reshape((-1, 1, channels))(x)                  # add a dummy "width" axis
x = Conv2DTranspose(filters, (kernel, 1))(x)       # deconvolve along the time axis only
x = Lambda(K.squeeze, arguments={"axis": 2})(x)    # drop the dummy axis again