I have these two variables:
result - a tensor of shape 1 x 251 x 20
kernel - a tensor of shape 1 x 10 x 10
When I run the command:
from torch.nn import functional as F
result = F.conv2d(result, kernel)
I get the error:
RuntimeError: expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]
I am not passing any stride. What am I doing wrong?
import torch
import torch.nn.functional as F
image = torch.rand(16, 3, 32, 32)
filter = torch.rand(1, 3, 5, 5)
out_feat_F = F.conv2d(image, filter, stride=1, padding=0)
print(out_feat_F.shape)
Out:
torch.Size([16, 1, 28, 28])
This gives the same output shape as the module form:
import torch
import torch.nn
image = torch.rand(16, 3, 32, 32)
conv_filter = torch.nn.Conv2d(in_channels=3, out_channels=1, kernel_size=5, stride=1, padding=0)
output_feature = conv_filter(image)
print(output_feature.shape)
Out:
torch.Size([16, 1, 28, 28])
Padding is by default 0, stride is by default 1.
The last two dimensions of the filter in the first example correspond to the kernel size in the second example.
kernel_size=5 is the same as kernel_size=(5, 5).
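Applied to the shapes in the question, the likely cause is that both tensors are 3-D, while F.conv2d expects an input of shape (batch, in_channels, H, W) and a weight of shape (out_channels, in_channels, kH, kW). A minimal sketch, assuming the leading 1 in both tensors is meant as a single channel (the random values are stand-ins for the real data):

```python
import torch
import torch.nn.functional as F

# Stand-ins with the shapes from the question.
result = torch.rand(1, 251, 20)
kernel = torch.rand(1, 10, 10)

# Add the missing leading dimension to each tensor.
inp = result.unsqueeze(0)     # (batch=1, in_channels=1, 251, 20)
weight = kernel.unsqueeze(0)  # (out_channels=1, in_channels=1, 10, 10)

out = F.conv2d(inp, weight)   # stride defaults to 1, padding to 0
print(out.shape)              # torch.Size([1, 1, 242, 11])
```

The output height and width follow from 251 - 10 + 1 = 242 and 20 - 10 + 1 = 11.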
I have a PyTorch tensor of size (1, 4, 128, 128) (batch, channel, height, width), and I want to 'upsample' it to (1, 3, 256, 256).
I thought I could use interpolate (a function in nn.functional).
However, after reading the documentation and applying this function, I can only get an output of shape (1, 4, 256, 256), so maybe it is not the function I am looking for. The code I used is the following:
import torch.nn as nn
# x.shape -> (1, 4, 128, 128)
x_0 = nn.functional.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
# x_0.shape -> (1, 4, 256, 256)
How can I go from (1, 4, 128, 128) to (1, 3, 256, 256)?
Below is the network that I am trying to replicate, but I got stuck at the upsample layer.
What about PyTorch's nn.Upsample module?
upsample = nn.Upsample(scale_factor=2)
x = upsample(x)
Note that this only changes the spatial dimensions. I'm not sure it's what you are looking for, since you want the second dimension to change from 4 to 3.
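Since interpolation leaves the channel count untouched, one common way to also go from 4 to 3 channels is to follow the upsampling with a learned 1x1 convolution. This is only a sketch under that assumption. The network being replicated is not shown here, so the actual layer it uses may differ, and the random input is a stand-in:

```python
import torch
import torch.nn as nn

# Stand-in input with the shape from the question.
x = torch.rand(1, 4, 128, 128)

up = nn.Sequential(
    # Doubles height and width; channels stay at 4.
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    # A 1x1 convolution mixes the 4 channels into 3 without touching H or W.
    nn.Conv2d(in_channels=4, out_channels=3, kernel_size=1),
)

y = up(x)
print(y.shape)  # torch.Size([1, 3, 256, 256])
```

The 1x1 convolution has trainable weights, so unlike plain interpolation it has to be trained together with the rest of the model.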
I have an image tensor of shape:
N, C, H, W = 5, 512, 13, 13
I need to take the mean across the H and W dimensions so that the output has shape:
N, C, 1, 1
I am currently using a for loop, but is there a better way to do this, for example with reshape?
import torch
tz = torch.rand(5, 512, 13, 13)
tzm = tz.mean(dim=(2,3), keepdim=True)
tzm.shape
Output
torch.Size([5, 512, 1, 1])
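For comparison, the same reduction can also be written as global average pooling. The sketch below checks that F.adaptive_avg_pool2d with an output size of 1 agrees with the mean over dims 2 and 3 (the tolerance is an arbitrary choice to absorb float32 rounding):

```python
import torch
import torch.nn.functional as F

tz = torch.rand(5, 512, 13, 13)

# Mean over H and W, keeping them as size-1 dimensions.
mean_hw = tz.mean(dim=(2, 3), keepdim=True)

# Global average pooling to a 1x1 spatial output.
pooled = F.adaptive_avg_pool2d(tz, output_size=1)

print(pooled.shape)
print(torch.allclose(mean_hw, pooled, atol=1e-6))
```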
I have multiple torch tensors with the following shapes:
x1 = torch.Size([1, 512, 177])
x2 = torch.Size([1, 512, 250])
x3 = torch.Size([1, 512, 313])
How can I pad all these tensors with zeros along the last dimension so that they share one shape, such as ([1, 512, 350])?
What I tried was to convert them into NumPy arrays and use these lines of code:
if len(x1) < 350:
    ff = np.pad(f, [(0, self.max_len - f.shape[0]), ], mode='constant')
    f = ff
Unfortunately, this doesn't affect the last dimension, and the shapes are still not equal.
Any help will be appreciated
Thanks
You can simply do:
import torch.nn.functional as F
x = F.pad(x, (0, self.max_len - x.size(2)), "constant", 0)
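As a self-contained sketch of the same call on the three shapes from the question: the pad tuple (0, k) means no padding on the left and k zeros on the right of the last dimension. The names max_len, tensors, and padded are illustrative, and the random tensors stand in for the real data:

```python
import torch
import torch.nn.functional as F

max_len = 350  # target length of the last dimension, taken from the question

# Stand-ins with the shapes from the question.
tensors = [torch.rand(1, 512, n) for n in (177, 250, 313)]

# Pad only the right side of the last dimension with zeros.
padded = [
    F.pad(t, (0, max_len - t.size(-1)), mode="constant", value=0)
    for t in tensors
]

for p in padded:
    print(p.shape)  # torch.Size([1, 512, 350]) for each tensor

# With equal shapes, the tensors can be stacked into one batch.
batch = torch.cat(padded, dim=0)
print(batch.shape)  # torch.Size([3, 512, 350])
```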
This is the question:
Before we define the model, we define the size of our alphabet. Our alphabet consists of lowercase English letters, and additionally a special character used for space between symbols or before and after the word. For the first part of this assignment, we don't need that extra character.
Our end goal is to learn to transcribe words of arbitrary length. However, first, we pre-train our simple convolutional neural net to recognize single characters. In order to be able to use the same model for one character and for entire words, we are going to design the model in a way that makes sure that the output size for one character (or when input image size is 32x18) is 1x27, and Kx27 whenever the input image is wider. K here will depend on particular architecture of the network, and is affected by strides, poolings, among other things. A little bit more formally, our model $f_\theta$, for an input image $x$, gives output energies $l = f_\theta(x)$. If $x \in \mathbb{R}^{32 \times 18}$, then $l \in \mathbb{R}^{1 \times 27}$. If $x \in \mathbb{R}^{32 \times 100}$, for example, our model may output $l \in \mathbb{R}^{10 \times 27}$, where $l_i$ corresponds to a particular window in $x$, for example from $x_{0,9i}$ to $x_{32,9i+18}$ (again, this will depend on the particular architecture).
The code:
# constants for number of classes in total, and for the special extra character for empty space
ALPHABET_SIZE = 27  # includes the extra character for space in between
BETWEEN = 26
print(alphabet.shape) # RETURNS: torch.Size([32, 340])
My CNN Block:
import torch
from torch import nn
import torch.nn.functional as F
"""
Remember basics:
1. Bigger strides = less overlap
2. More filters = More features
Image shape = 32, 18
Alphabet shape = 32, 340
"""
class SimpleNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_block = torch.nn.Sequential(
            nn.Conv2d(3, 32, 3),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 32, 3),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 32, 3),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, 3),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, 3),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(2)
        )

    def forward(self, x):
        x = self.cnn_block(x)
        # after applying cnn_block, x.shape should be:
        # batch_size, alphabet_size, 1, width
        return x[:, :, 0, :].permute(0, 2, 1)

model = SimpleNet()
alphabet_energies = model(alphabet.view(1, 1, *alphabet.shape))
import matplotlib.pyplot as plt

def plot_energies(ce):
    fig = plt.figure(dpi=200)
    ax = plt.axes()
    im = ax.imshow(ce.cpu().T)
    ax.set_xlabel('window locations →')
    ax.set_ylabel('← classes')
    ax.xaxis.set_label_position('top')
    ax.set_xticks([])
    ax.set_yticks([])
    # place the colorbar in its own axes just to the right of the image
    cax = fig.add_axes([ax.get_position().x1 + 0.01, ax.get_position().y0, 0.02, ax.get_position().height])
    plt.colorbar(im, cax=cax)

plot_energies(alphabet_energies[0].detach())
I get the error in the title at alphabet_energies = model(alphabet.view(1, 1, *alphabet.shape))
Any help would be appreciated.
Start by replacing nn.Conv2d(3, 32, 3) with nn.Conv2d(1, 32, 3).
Your model begins with a Conv2d that maps 3 input channels to 32, but your input image has only 1 channel (a greyscale image).
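The mismatch can be reproduced on the first layer alone. In this sketch the random alphabet tensor is a stand-in with the shape reported in the question, and conv_rgb and conv_gray are illustrative names:

```python
import torch
from torch import nn

# Stand-in for the asker's `alphabet` tensor (same shape, random values).
alphabet = torch.rand(32, 340)
x = alphabet.view(1, 1, *alphabet.shape)  # (batch=1, channels=1, 32, 340)

conv_rgb = nn.Conv2d(3, 32, 3)   # expects 3 input channels
conv_gray = nn.Conv2d(1, 32, 3)  # expects 1 input channel

try:
    conv_rgb(x)
    mismatch_raised = False
except RuntimeError as err:
    # PyTorch reports that the input has the wrong number of channels.
    mismatch_raised = True
    print("channel mismatch:", err)

out = conv_gray(x)
print(out.shape)  # torch.Size([1, 32, 30, 338])
```

The first argument of nn.Conv2d must match the size of dimension 1 of the input, which is 1 after the view(1, 1, ...) call.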
I have 10,000 RGB images in an ndarray of shape (10000, 32, 32, 3).
I'd like to efficiently compress the images (take the means of colors) to 2x2, 4x4, etc. using numpy. The only idea I've got so far is to manually split the images, compress them, and put the pieces back together in loops. Is there a more elegant solution?
You could do something like this, using scipy.ndimage.zoom:
import numpy as np
import scipy.ndimage as si
def resample(img, dims):
    orig = img.shape[1]
    new_imgs = []
    for dim in dims:
        factor = dim / orig
        new_img = si.zoom(img, zoom=[1, factor, factor, 1])
        new_imgs.append(new_img)
    return new_imgs
For example, with random data:
>>> img = np.random.random((100, 32, 32, 3))
>>> new_imgs = resample(img, dims=[2, 4, 8, 16, 32])
>>> [img.shape for img in new_imgs]
[(100, 2, 2, 3),
(100, 4, 4, 3),
(100, 8, 8, 3),
(100, 16, 16, 3),
(100, 32, 32, 3)]
As noted in a comment, you might need to adjust the mode parameter of the zoom function.
You can use scikit-image's view_as_blocks and np.mean():
import numpy as np
from skimage.util import view_as_blocks
images = np.random.rand(10000, 32, 32, 3)
# each 4x4 spatial block becomes its own pair of trailing axes, which are then averaged
images_rescaled = view_as_blocks(images, (1, 4, 4, 1)).mean(axis=(-2, -3)).squeeze()
images_rescaled.shape
# (10000, 8, 8, 3)
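The same block averaging can also be sketched with NumPy alone, by reshaping so that each block gets its own pair of axes and then averaging over them. The helper name block_mean is hypothetical, and it assumes the block size divides both the height and the width:

```python
import numpy as np

def block_mean(images, block):
    """Average non-overlapping block x block patches of (N, H, W, C) images."""
    n, h, w, c = images.shape
    if h % block or w % block:
        raise ValueError("block must divide both H and W")
    # Split H into (h // block, block) and W into (w // block, block),
    # then average over the two within-block axes.
    tiles = images.reshape(n, h // block, block, w // block, block, c)
    return tiles.mean(axis=(2, 4))

images = np.random.rand(100, 32, 32, 3)
small = block_mean(images, 4)
print(small.shape)                   # (100, 8, 8, 3)
print(block_mean(images, 16).shape)  # (100, 2, 2, 3)
```

A block size of 16 gives the 2x2 images from the question, and a block size of 8 gives 4x4.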