I have a PyTorch tensor of size (1, 4, 128, 128) (batch, channel, height, width), and I want to 'upsample' it to (1, 3, 256, 256)
I thought of using interpolate (a function in nn.functional).
However, after reading the documentation and applying the function, I only get an output of shape (1, 4, 256, 256), so maybe it is not the function I am looking for. The code that I used is the following:
import torch.nn as nn
#x.shape -> (1,4,128,128)
x_0 = nn.functional.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
#x_0.shape -> (1,4,256,256)
How can I do that (from (1, 4, 128, 128) to (1, 3, 256, 256))?
Below is the network that I am trying to replicate, but I got stuck at the upsample layer.
What about PyTorch's nn.Upsample module:
upsample = nn.Upsample(scale_factor=2)
x = upsample(x)
Not sure if that's what you are looking for since you want the second dimension to change from 4 to 3.
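One way to get both changes: interpolate (or nn.Upsample) handles the spatial doubling, and a learned 1x1 convolution maps 4 channels to 3. The Conv2d here is an assumption on my part, since the reference network isn't shown — a minimal sketch:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 4, 128, 128)

# spatial upsampling only changes height/width: (1, 4, 128, 128) -> (1, 4, 256, 256)
x_up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)

# a learned 1x1 convolution maps 4 channels to 3: (1, 4, 256, 256) -> (1, 3, 256, 256)
to_rgb = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=1)
out = to_rgb(x_up)
print(out.shape)  # torch.Size([1, 3, 256, 256])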
How do I change the number of input channels in the torchvision ConvNeXt model? I am working with grayscale images and want 1 input channel instead of 3.
import torch
from torchvision.models.convnext import ConvNeXt, CNBlockConfig
# this is the given configuration for the 'tiny' model
block_setting = [
    CNBlockConfig(96, 192, 3),
    CNBlockConfig(192, 384, 3),
    CNBlockConfig(384, 768, 9),
    CNBlockConfig(768, None, 3),
]
model = ConvNeXt(block_setting)
# my sample image (N, C, H, W) = (16, 1, 50, 50)
im = torch.randn(16, 1, 50, 50)
# forward pass
model(im)
output:
RuntimeError: Given groups=1, weight of size [96, 3, 4, 4], expected input[16, 1, 50, 50] to have 3 channels, but got 1 channels instead
However, if I change my input shape to (16, 3, 50, 50) it seems to work fine.
The torchvision source code seems to be based off their GitHub implementation, but where do I specify in_chans with the torchvision interface?
You can replace the whole input layer; model._modules["features"][0][0] is
nn.Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
Then you only need to change in_channels (this assumes import torch.nn as nn):
>>> model._modules["features"][0][0] = nn.Conv2d(1, 96, kernel_size=(4, 4), stride=(4, 4))
>>> model(im)
tensor([[-0.4854, -0.1925, 0.1051, ..., -0.2310, -0.8830, -0.0251],
[ 0.3332, -0.4205, -0.3007, ..., 0.8530, 0.1429, -0.3819],
[ 0.1794, -0.7546, -0.7835, ..., -0.8072, -0.0972, 0.7413],
...,
[ 0.1356, 0.0868, 0.6135, ..., -0.1382, -0.2001, 0.2415],
[-0.1612, -0.4812, 0.1271, ..., -0.6594, 0.2706, 1.0833],
[ 0.0243, -0.5039, -0.4086, ..., 0.4233, 0.0389, 0.2787]],
grad_fn=<AddmmBackward0>)
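Putting the question's setup and the fix together, a minimal end-to-end sketch (the output shape reflects the default num_classes=1000 of the torchvision ConvNeXt constructor):

import torch
import torch.nn as nn
from torchvision.models.convnext import ConvNeXt, CNBlockConfig

# 'tiny' configuration from the question
block_setting = [
    CNBlockConfig(96, 192, 3),
    CNBlockConfig(192, 384, 3),
    CNBlockConfig(384, 768, 9),
    CNBlockConfig(768, None, 3),
]
model = ConvNeXt(block_setting)

# swap the stem convolution so it accepts 1-channel (grayscale) input
model._modules["features"][0][0] = nn.Conv2d(1, 96, kernel_size=(4, 4), stride=(4, 4))

im = torch.randn(16, 1, 50, 50)
out = model(im)
print(out.shape)  # torch.Size([16, 1000])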
I have 10,000 images in RGB in an ndarray the size of (10000, 32, 32, 3).
I'd like to efficiently compress the images (take the means of colors) to 2x2, 4x4 etc. using numpy. The only idea I've got so far is to manually split the images, compress, and put together the pieces within the loops. Is there a more elegant solution?
You could do something like this, using scipy.ndimage.zoom:
import numpy as np
import scipy.ndimage as si
def resample(img, dims):
    orig = img.shape[1]
    new_imgs = []
    for dim in dims:
        factor = dim / orig
        # zoom factor 1 keeps the batch and channel axes unchanged
        new_img = si.zoom(img, zoom=[1, factor, factor, 1])
        new_imgs.append(new_img)
    return new_imgs
For example, with random data:
>>> img = np.random.random((100, 32, 32, 3))
>>> new_imgs = resample(img, dims=[2, 4, 8, 16, 32])
>>> [img.shape for img in new_imgs]
[(100, 2, 2, 3),
(100, 4, 4, 3),
(100, 8, 8, 3),
(100, 16, 16, 3),
(100, 32, 32, 3)]
Note that you might need to adjust the mode parameter of the zoom function depending on how you want the array edges handled.
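For instance, a small self-contained sketch that switches the edge handling to 'nearest' while downsampling 32 -> 4 (the choice of mode here is purely illustrative; see the SciPy docs for the available options):

import numpy as np
import scipy.ndimage as si

img = np.random.random((100, 32, 32, 3))
# mode controls how values outside the array edges are treated during interpolation
small = si.zoom(img, zoom=[1, 0.125, 0.125, 1], mode='nearest')
print(small.shape)  # (100, 4, 4, 3)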
You can use scikit-image's view_as_blocks together with np.mean():
import numpy as np
import skimage.util
images = np.random.rand(10000, 32, 32, 3)
images_rescaled = skimage.util.view_as_blocks(images, (1, 4, 4, 1)).mean(axis=(-2, -3)).squeeze()
images_rescaled.shape
# (10000, 8, 8, 3)
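If you would rather avoid the scikit-image dependency, the same block averaging can be written with plain NumPy reshaping. A sketch (not from either answer above), assuming the target size divides 32 evenly:

import numpy as np

images = np.random.rand(10000, 32, 32, 3)

def block_mean(imgs, size):
    # split each HxW image into (size x size) blocks and average each block
    n, h, w, c = imgs.shape
    f = h // size  # block edge length; assumes h and w are divisible by size
    return imgs.reshape(n, size, f, size, f, c).mean(axis=(2, 4))

print(block_mean(images, 8).shape)  # (10000, 8, 8, 3)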
I am trying to use ResNet50 Pretrained network for segmentation problem.
I remove the last layer and add my desired layer. But when I try to fit, I get the following error:
ValueError: Error when checking target: expected conv2d_1 to have shape (16, 16, 1) but got array with shape (512, 512, 1)
I have two folders: images and masks. images are RGB and masks are in grayscale.
The shape is 512x512 for all images.
I cannot figure out which part I am doing wrong.
Any help will be appreciated.
from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Conv2D
from keras.models import Model
from keras.optimizers import Adam

image_input = Input(shape=(512, 512, 3))
model = ResNet50(input_tensor=image_input, weights='imagenet', include_top=False)
x = model.output
x = Conv2D(1, (1, 1), padding="same", activation="sigmoid")(x)
model = Model(inputs=model.input, outputs=x)
model.summary()
The relevant line from the summary output is:
conv2d_1 (Conv2D) (None, 16, 16, 1) 2049 activation_49[0][0]
for layer in model.layers[:-1]:
    layer.trainable = False
for layer in model.layers[-1:]:
    layer.trainable = True
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])
Your network gives an output of shape (16, 16, 1), but your y (target) has shape (512, 512, 1).
Run the following to see this.
from keras.applications.resnet50 import ResNet50
from keras.layers import Input
image_input=Input(shape=(512, 512, 3))
model = ResNet50(input_tensor=image_input,weights='imagenet',include_top=False)
model.summary()
# Output shows that the ResNet50 network has output of shape (16,16,2048)
from keras.layers import Conv2D
conv2d = Conv2D(1, (1,1), padding="same", activation="sigmoid")
conv2d.compute_output_shape((None, 16, 16, 2048))
# Output shows the shape your network's output will have.
Either your y or the way you use ResNet50 has to change. Read about ResNet50 to see what you are missing.
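One common way to change "the way you use ResNet50" for segmentation is to upsample the 16x16 feature map so the output matches the 512x512 masks. A minimal sketch, not a full solution — the single UpSampling2D(32) is a deliberately crude decoder; a real segmentation head usually upsamples gradually with skip connections:

from keras.applications.resnet50 import ResNet50
from keras.layers import Input, Conv2D, UpSampling2D
from keras.models import Model

image_input = Input(shape=(512, 512, 3))
backbone = ResNet50(input_tensor=image_input, weights='imagenet', include_top=False)

x = backbone.output                                              # (None, 16, 16, 2048)
x = Conv2D(1, (1, 1), padding="same", activation="sigmoid")(x)   # (None, 16, 16, 1)
x = UpSampling2D(size=(32, 32))(x)                               # (None, 512, 512, 1), matches the masks
model = Model(inputs=backbone.input, outputs=x)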
I am a learner who is just beginning to learn deep learning.
I just started using Keras.
I want to implement SRCNN.
This problem occurs when I first load a picture to test the model.
Problem:
ValueError: Error when checking input: expected conv2d_1_input to have
4 dimensions, but got array with shape (80, 120, 3)
My code is as follows:
from PIL import Image
import numpy as np
from keras import Sequential
from keras.layers import Conv2D, Activation
input_image = Image.open('../../res/image/120x80/120x80 (1).png')
input_image_array = np.array(input_image)
model = Sequential()
model.add(Conv2D(64, (9, 9), data_format='channels_last', activation='relu', input_shape=(80, 120, 3)))
model.add(Conv2D(35, (1, 1), data_format='channels_last', activation='relu', input_shape=(80, 120, 3)))
model.add(Conv2D(1, (5, 5), data_format='channels_last', input_shape=(120, 80, 3)))
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(input_image_array, input_image_array)
print(model.summary())
To feed a single input image, you need to include the samples dimension (the first one), so you need to add a dimension of size one:
input_image_array = np.array(input_image)
input_image_array = input_image_array[np.newaxis, :, :, :]
This will change the shape to (1, 80, 120, 3) which corresponds to one image sample.
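Equivalently, np.expand_dims adds the same batch axis; a small sketch reusing the question's file path:

import numpy as np
from PIL import Image

input_image = Image.open('../../res/image/120x80/120x80 (1).png')
input_image_array = np.array(input_image)                       # (80, 120, 3)
input_image_array = np.expand_dims(input_image_array, axis=0)   # (1, 80, 120, 3)

np.newaxis indexing and np.expand_dims are interchangeable here; both produce the (1, 80, 120, 3) array the Conv2D input expects.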