I have 10,000 RGB images in an ndarray of shape (10000, 32, 32, 3).
I'd like to efficiently downsample the images (by taking the mean of the colors in each block) to 2x2, 4x4, etc. using NumPy. The only idea I've had so far is to manually split the images, compress them, and reassemble the pieces inside loops. Is there a more elegant solution?
You could do something like this, using scipy.ndimage.zoom:
import numpy as np
import scipy.ndimage as si
def resample(img, dims):
    orig = img.shape[1]
    new_imgs = []
    for dim in dims:
        # scale factor relative to the original spatial size
        factor = dim / orig
        # zoom only the two spatial axes; leave the batch and channel axes untouched
        new_img = si.zoom(img, zoom=[1, factor, factor, 1])
        new_imgs.append(new_img)
    return new_imgs
For example, with random data:
>>> img = np.random.random((100, 32, 32, 3))
>>> new_imgs = resample(img, dims=[2, 4, 8, 16, 32])
>>> [img.shape for img in new_imgs]
[(100, 2, 2, 3),
(100, 4, 4, 3),
(100, 8, 8, 3),
(100, 16, 16, 3),
(100, 32, 32, 3)]
Note from the comments that you might need to adjust the mode parameter of zoom, depending on how you want the array edges handled.
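For instance, a minimal sketch that passes mode (and a lower interpolation order) explicitly; the right choices depend on the edge behaviour and smoothing you want:
# halve the spatial resolution with bilinear interpolation and 'nearest' edge handling
half = si.zoom(img, zoom=[1, 0.5, 0.5, 1], order=1, mode='nearest')
half.shape
# (100, 16, 16, 3)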
You can use scikit-image's view_as_blocks together with np.mean():
import numpy as np
import skimage.util
images = np.random.rand(10000, 32, 32, 3)
images_rescaled = skimage.util.view_as_blocks(images, (1, 4, 4, 1)).mean(axis=(-2, -3)).squeeze()
images_rescaled.shape
# (10000, 8, 8, 3)
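If you would rather avoid the extra dependency, the same block averaging can be done with a plain NumPy reshape; a minimal sketch (assuming the block size divides the image side evenly):
import numpy as np

def block_mean(images, block_size):
    # average non-overlapping block_size x block_size patches of (N, H, W, C) images
    n, h, w, c = images.shape
    blocks = images.reshape(n, h // block_size, block_size, w // block_size, block_size, c)
    return blocks.mean(axis=(2, 4))

images = np.random.rand(10000, 32, 32, 3)
print(block_mean(images, 4).shape)
# (10000, 8, 8, 3)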
How do I change the number of input channels in the torchvision ConvNeXt model? I am working with grayscale images and want 1 input channel instead of 3.
import torch
from torchvision.models.convnext import ConvNeXt, CNBlockConfig
# this is the given configuration for the 'tiny' model
block_setting = [
    CNBlockConfig(96, 192, 3),
    CNBlockConfig(192, 384, 3),
    CNBlockConfig(384, 768, 9),
    CNBlockConfig(768, None, 3),
]
model = ConvNeXt(block_setting)
# my sample image (N, C, W, H) = (16, 1, 50, 50)
im = torch.randn(16, 1, 50, 50)
# forward pass
model(im)
output:
RuntimeError: Given groups=1, weight of size [96, 3, 4, 4], expected input[16, 1, 50, 50] to have 3 channels, but got 1 channels instead
However, if I change my input shape to (16, 3, 50, 50) it seems to work fine.
The torchvision source code seems to be based on their GitHub implementation, but where do I specify in_chans with the torchvision interface?
You can replace the whole input layer; model._modules["features"][0][0] is
nn.Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
Then you only need to change in_channels (nn here is torch.nn):
>>> from torch import nn
>>> model._modules["features"][0][0] = nn.Conv2d(1, 96, kernel_size=(4, 4), stride=(4, 4))
>>> model(im)
tensor([[-0.4854, -0.1925, 0.1051, ..., -0.2310, -0.8830, -0.0251],
[ 0.3332, -0.4205, -0.3007, ..., 0.8530, 0.1429, -0.3819],
[ 0.1794, -0.7546, -0.7835, ..., -0.8072, -0.0972, 0.7413],
...,
[ 0.1356, 0.0868, 0.6135, ..., -0.1382, -0.2001, 0.2415],
[-0.1612, -0.4812, 0.1271, ..., -0.6594, 0.2706, 1.0833],
[ 0.0243, -0.5039, -0.4086, ..., 0.4233, 0.0389, 0.2787]],
grad_fn=<AddmmBackward0>)
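Alternatively, if you prefer to leave the first convolution at 3 input channels (for example to reuse pretrained weights), you can expand the grayscale input instead; a minimal sketch, assuming the unmodified model:
# repeat the single grayscale channel three times along the channel dimension
im_rgb = im.repeat(1, 3, 1, 1)   # (16, 1, 50, 50) -> (16, 3, 50, 50)
out = model(im_rgb)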
I want to create a binary mask from some ordered points.
For example, for the points:
ps = [
    [64, 64, 0],
    [128, 64, 0],
    [128, 128, 0],
    [64, 128, 0],
]
I would like it to create a binary mask that agrees with:
import numpy as np
img = np.zeros(shape=[256, 256])
img[64:128, 64:128] = 1
I think vtkPolyDataToImageStencil and vtkImageStencilToImage may be the solution. My code is:
import vtkmodules.all as vtk
from vtk.util.numpy_support import vtk_to_numpy, numpy_to_vtk
ps = [
    [64, 64, 0],
    [128, 64, 0],
    [128, 128, 0],
    [64, 128, 0],
]
polydata = vtk.vtkPolyData()
points = vtk.vtkPoints()
polygon = vtk.vtkPolygon()
polygon.GetPointIds().SetNumberOfIds(len(ps))
for idx, p in enumerate(ps):
    points.InsertNextPoint(p[0], p[1], p[2])
    polygon.GetPointIds().SetId(idx, idx)
polygons = vtk.vtkCellArray()
polygons.InsertNextCell(polygon)
polydata.SetPoints(points)
polydata.SetPolys(polygons)
polyDataToImageStencil = vtk.vtkPolyDataToImageStencil()
polyDataToImageStencil.SetInputData(polydata)
polyDataToImageStencil.SetOutputOrigin(0, 0, 0)
polyDataToImageStencil.SetOutputSpacing([1, 1, 1])
polyDataToImageStencil.SetOutputWholeExtent([0, 255, 0, 255, 0, 0])
polyDataToImageStencil.Update()
imgStencilToImage = vtk.vtkImageStencilToImage()
imgStencilToImage.SetInputConnection(polyDataToImageStencil.GetOutputPort())
imgStencilToImage.SetInsideValue(1)
imgStencilToImage.SetOutsideValue(0)
imgStencilToImage.Update()
vtkMask = imgStencilToImage.GetOutput()
mask = vtk_to_numpy(vtkMask.GetPointData().GetScalars())
print(mask.sum())
However, the final mask is all 0.
What's wrong with my code? Any suggestion is appreciated.
I would recommend using OpenCV. I have never done this with binary (black & white) images; my example works with full-color images.
import numpy as np
import cv2 as cv
def mask_from_polygons(ps, x, y, backgr=(255, 255, 255), foregr=(0, 0, 0)):
    # start from a solid background-colored image of size (y, x)
    result = np.empty((y, x, 3), np.uint8)
    result[:] = backgr
    # fillPoly expects integer (x, y) pairs, so keep only the first two coordinates
    pts = np.array(ps)[:, :2].round().astype(np.int32)
    cv.fillPoly(result, [pts], foregr)
    return result
It should not be too difficult to adapt to black & white.
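As a minimal single-channel sketch producing the kind of mask the question asks for (a (256, 256) array with 1 inside the polygon), assuming the same ps as above:
import numpy as np
import cv2 as cv

mask = np.zeros((256, 256), dtype=np.uint8)
pts = np.array(ps)[:, :2].astype(np.int32)  # drop the z coordinate
cv.fillPoly(mask, [pts], 1)                 # fill the polygon (boundary included) with 1s
print(mask.sum())                           # nonzero, unlike the all-zero VTK result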
I am trying to use the permute function to swap the axes of my tensor, but for some reason the output is not as expected. The output of the code is torch.Size([512, 256, 3, 3]), but I would expect torch.Size([256, 512, 3, 3]). It doesn't look like I can use flip to swap the 0 and 1 indices. Is there something I am missing? I want to change the tensor so that its shape is (256, 512, 3, 3).
Reproducible code:
import torch
wtf = torch.rand(3, 3, 512, 256)
wtf = wtf.permute(2, 3, 1, 0)
print(wtf.shape)
The numbers passed to torch.permute are the indices of the axes, in the order you want them to appear in the new tensor.
Having set x = torch.rand(3, 3, 512, 256):
If you want to invert the order of the axes: the initial order is 0, 1, 2, 3 and you want 3, 2, 1, 0:
>>> x.permute(3, 2, 1, 0).shape
torch.Size([256, 512, 3, 3])
Inverting the axis order is essentially the transpose operation (note that newer PyTorch versions warn that .T on tensors with more than two dimensions is deprecated):
>>> x.T.shape
torch.Size([256, 512, 3, 3])
If you want to move the last two axes to the front (reversed) while keeping the two size-3 axes in their original relative order: the original order is 0, 1, 2, 3 and the resulting order is 3, 2, 0, 1:
>>> x.permute(3, 2, 0, 1).shape
torch.Size([256, 512, 3, 3])
The difference between the two options is the final order of the two size-3 axes: both give the shape (256, 512, 3, 3), but in the first they are swapped and in the second they keep their original order, so the elements are arranged differently.
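One extra point worth noting (a small sketch, not part of the original answer): permute returns a non-contiguous view, so operations that need contiguous memory (such as view) may require a .contiguous() call afterwards:
import torch

x = torch.rand(3, 3, 512, 256)
y = x.permute(3, 2, 1, 0)
print(y.shape)            # torch.Size([256, 512, 3, 3])
print(y.is_contiguous())  # False: permute changes the view, not the underlying storage
y = y.contiguous()        # materialize the permuted layout when needed
print(y.is_contiguous())  # True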
I have a tensor A of size torch.Size([32, 32, 3, 3]) and I want to split it and extract a tensor B of size torch.Size([16, 16, 3, 3]) from it. The tensor can be 1d or 4d, and the split has to follow the given new tensor dimensions. I have been able to generate the target dimensions, but I'm unable to split and extract the values from the source tensor. I have tried torch.narrow, but it takes only 3 arguments and I need 4 in many cases. torch.split takes dim as an int, so the tensor is split along one dimension only, but I want to split it along multiple dimensions.
You have multiple options:
use .split multiple times
use .narrow multiple times
use slicing
e.g.:
t = torch.rand(32, 32, 3, 3)
t0, t1 = t.split((16, 16), 0)
print(t0.shape, t1.shape)
>>> torch.Size([16, 32, 3, 3]) torch.Size([16, 32, 3, 3])
t00, t01 = t0.split((16, 16), 1)
print(t00.shape, t01.shape)
>>> torch.Size([16, 16, 3, 3]) torch.Size([16, 16, 3, 3])
t00_alt, t01_alt = t[:16, :16, :, :], t[16:, 16:, :, :]
print(t00_alt.shape, t01_alt.shape)
>>> torch.Size([16, 16, 3, 3]) torch.Size([16, 16, 3, 3])
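Since the question mentions that torch.narrow only handles one dimension at a time: chaining narrow calls, one per dimension, carves out the same block; a small sketch:
# narrow(dim, start, length): take 16 entries starting at index 0 along dims 0 and 1
b = t.narrow(0, 0, 16).narrow(1, 0, 16)
print(b.shape)
>>> torch.Size([16, 16, 3, 3])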
I am attempting to stride over the channel dimension, and the following code exhibits surprising behaviour. I expect tf.nn.max_pool and tf.nn.avg_pool to produce tensors of identical shape when given exactly the same arguments, but this is not the case.
import tensorflow as tf
x = tf.get_variable('x', shape=(100, 32, 32, 64),
                    initializer=tf.constant_initializer(5), dtype=tf.float32)
ksize = (1, 2, 2, 2)
strides = (1, 2, 2, 2)
max_pool = tf.nn.max_pool(x, ksize, strides, padding='SAME')
avg_pool = tf.nn.avg_pool(x, ksize, strides, padding='SAME')
print(max_pool.shape)
print(avg_pool.shape)
This prints
$ python ex04/mini.py
(100, 16, 16, 32)
(100, 16, 16, 64)
Clearly, I am misunderstanding something.
The link https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues/19 states:
The first and last stride must always be 1,
because the first is for the image-number and
the last is for the input-channel.
It turns out this is actually a bug:
https://github.com/tensorflow/tensorflow/issues/14886#issuecomment-352934112
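If you need channel pooling before a fix lands, one workaround sketch (written against TF 2.x APIs, assuming an even channel count) is to pool spatially first and then reduce over channel pairs:
import tensorflow as tf  # assuming TF 2.x

x = tf.random.normal((100, 32, 32, 64))

# spatial pooling with a channel ksize/stride of 1 behaves consistently
spatial = tf.nn.max_pool2d(x, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding='SAME')

# then pool over the channel axis by grouping channels in pairs and reducing
n, h, w, c = spatial.shape
grouped = tf.reshape(spatial, (n, h, w, c // 2, 2))
channel_pooled = tf.reduce_max(grouped, axis=-1)

print(channel_pooled.shape)  # (100, 16, 16, 32)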