Padding a tensor until reaching required size - pytorch

I'm working with certain tensors with shape (X, 42), where X can range between 50 and 70.
I want to pad each tensor I get until it reaches a size of 70, so all tensors will be (70, 42).
Is there any way to do this when the beginning size X is a variable? Thanks for the help!

Use torch.nn.functional.pad, which pads a tensor:
import torch
import torch.nn.functional as F
source = torch.rand((3,42))
source.shape
>>> torch.Size([3, 42])
# here, pad = (padding_left, padding_right, padding_top, padding_bottom)
source_pad = F.pad(source, pad=(0, 0, 0, 70 - source.shape[0]))
source_pad.shape
>>> torch.Size([70, 42])
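By default F.pad pads with zeros; if you need a different constant, pass the value argument, e.g. F.pad(source, pad=(0, 0, 0, 70 - source.shape[0]), value=-1.0). Note that the pad tuple is specified starting from the last dimension, which is why the first two entries (for the 42-wide dimension) are zero here.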

You can easily do so by:
pad_x = torch.zeros((70, x.size(1)), device=x.device, dtype=x.dtype)
pad_x[:x.size(0), :] = x
This will give you pad_x, with zero padding at the end of x.
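If this comes up often, a small helper can wrap either approach (a minimal sketch; pad_to and target_rows are hypothetical names, not part of PyTorch):
import torch
import torch.nn.functional as F

def pad_to(x: torch.Tensor, target_rows: int = 70) -> torch.Tensor:
    # pad a (X, 42) tensor with zero rows at the bottom until it has target_rows rows
    missing = target_rows - x.shape[0]
    if missing < 0:
        raise ValueError(f"tensor has {x.shape[0]} rows, more than {target_rows}")
    return F.pad(x, pad=(0, 0, 0, missing))

batch = [pad_to(torch.rand(n, 42)) for n in (50, 63, 70)]
print(torch.stack(batch).shape)  # torch.Size([3, 70, 42])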

Related

Generate 8 bit image with numpy

I'm trying to generate an image of all 8 bit colours. And this is the important bit: 1 pixel represents 1 unique colour. That's 2^8 or 256 colours, so it should be a 16 x 16 image.
The plan is to be able to change the bit depth and create a different image, i.e. 65536 colours for 16 bit.
Here's what I have:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# --------------------------------------------------------------
def create_image(output, width, height, pixels):
    # Convert the pixels into an array using numpy
    array = np.array(pixels, dtype=np.uint8)
    img = Image.fromarray(array)
    img.save(output)
# --------------------------------------------------------------
bit = 8
cmap = plt.get_cmap("viridis", 2**bit)
a = cmap(np.linspace(0, 1, 2**bit))
numOfCols = len(a)  # number of colours
x = int(np.sqrt(2**bit) * 2)
y = int(np.sqrt(2**bit) * 2)
arr = np.reshape(a, (x, y))
create_image("test.png", x, y, arr)
I'm new to numpy and I may have the initial size of the array wrong, as I get an error
ValueError: cannot reshape array of size 1024 into shape (16,16)
if I try to force it into an array that's 16 x 16.
Secondly, the image is just black, which is great for coffee, not so good for my results.
How do I transfer the array with all the colour data to the image properly?
First of all, your colormap generates an array of values in the following fashion:
In [71]: mymap = cmap(np.linspace(0, 1, 2 ** bit))
In [72]: mymap
Out[72]:
array([[0.267004, 0.004874, 0.329415, 1. ],
[0.26851 , 0.009605, 0.335427, 1. ],
[0.269944, 0.014625, 0.341379, 1. ],
...,
[0.974417, 0.90359 , 0.130215, 1. ],
[0.983868, 0.904867, 0.136897, 1. ],
[0.993248, 0.906157, 0.143936, 1. ]])
In this question, it's noted that PIL cannot handle the 32-bit floating point RGB format.
It does support tuples of three 8-bit integers, so our goal is to make these values integers, scale them to the 0-255 range, and remove the last column (opacity).
# Drop the alpha channel (the column of ones)
mymap = mymap[:, :-1]
# Scale to 0-255 and convert to uint8 (multiplying by 256 could overflow at 1.0)
mymap = (mymap * 255).astype(np.uint8)
Now we have to properly reshape it into a 16x16 array.
You actually have to reshape into (16, 16, 3), as the result would be a 3d array.
mymap = mymap.reshape(16, 16, 3)
And, finally, make a PIL image out of that and write it out:
img = Image.fromarray(mymap)
img.save("output.png")
My result looks like this (please zoom in, as it's only 16x16 pixels):
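Putting the pieces together, a minimal corrected version of the asker's script might look like this (a sketch; note that np.sqrt(2**bit) is only a whole number when bit is even):
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

bit = 8
side = int(np.sqrt(2 ** bit))               # 16 for 8-bit, 256 for 16-bit

cmap = plt.get_cmap("viridis", 2 ** bit)
rgba = cmap(np.linspace(0, 1, 2 ** bit))    # shape (256, 4), floats in [0, 1]
rgb = (rgba[:, :3] * 255).astype(np.uint8)  # drop alpha, scale to 0-255

Image.fromarray(rgb.reshape(side, side, 3)).save("test.png")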

Maxpool of an image in pytorch

I'm trying to just apply maxpool2d (from torch.nn) on a single image (not as a maxpool layer). Here is my code right now:
name = 'astronaut'
imshow(images[name], name)
img = images[name]
# pool of square window of size=3, stride=1
m = nn.MaxPool2d(3, stride=1)
img_transform = torch.Tensor(images[name])
plt.imshow(m(img_transform).view((512,510)))
The issue is, this code gives me a very green image as a result. I am sure the problem is with the dimensions of view, but I was unable to find out how to apply maxpool to just one image, so I couldn't fix it. The dimension of the image I'm considering is 512x512. The arguments for view make no sense to me right now; it's just the only number that gives a result...
If for example, I gave 512,512 as the argument for view, I get the following error:
RuntimeError: shape '[512, 512]' is invalid for input of size 261120
If anyone can tell me how to apply maxpool, avgpool, or minpool to an image and display the result I would be super grateful!
Thanks (:
Assuming your image is a numpy.array upon loading (please see comments for explanation of each step):
import numpy as np
import torch
# Assuming you have 3 color channels in your image
# Assuming your data is in Height, Width, Channels format
numpy_img = np.random.randint(low=0, high=255, size=(512, 512, 3))
# Transform to tensor
tensor_img = torch.from_numpy(numpy_img)
# PyTorch takes images in Channels, Height, Width format
# We have to switch the dimensions using `permute`
tensor_img = tensor_img.permute(2, 0, 1)
tensor_img.shape # Shape [3, 512, 512]
# Layers always need batch as first dimension (even for one image)
# unsqueeze will add it for you
ready_tensor_img = tensor_img.unsqueeze(dim=0)
ready_tensor_img.shape # Shape [1, 3, 512, 512]
pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
# You need to cast your image to float as
# pooling is not implemented for Tensors of type long
new_img = pooling(ready_tensor_img.float())
If your image is black and white you would need shape [1, 1, 512, 512] (single channel only); you can't leave out or squeeze those dimensions, they always have to be there for any torch.nn.Module!
To transform the tensor back into an image you can use similar steps:
# Cast to long and squeeze batch dimension
no_batch = new_img.long().squeeze(dim=0)
# Unpermute
width_height_channels = no_batch.permute(1, 2, 0)
width_height_channels.shape # Shape: [510, 510, 3]
# Cast to numpy and you have your image
final_image = width_height_channels.numpy()
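For a self-contained end-to-end version (a sketch assuming the question's images came from scikit-image, whose data.astronaut() is a 512x512x3 uint8 array):
import torch
import matplotlib.pyplot as plt
from skimage import data

img = data.astronaut()                                      # (512, 512, 3) uint8
tensor_img = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float()

pooling = torch.nn.MaxPool2d(kernel_size=3, stride=1)
pooled = pooling(tensor_img)                                # (1, 3, 510, 510)

result = pooled.squeeze(0).permute(1, 2, 0).byte().numpy()  # back to (510, 510, 3)
plt.imshow(result)
plt.show()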

What would be the equivalent of keras.layers.Masking in pytorch?

I have time-series sequences whose lengths I need to fix to a given number by padding with zeros. In Keras I could use keras.layers.Masking to make further computations ignore those padded zeros, and I am wondering how this can be done in PyTorch.
Either PyTorch can't handle sequences with varying lengths, in which case I need to do the padding in PyTorch and find the equivalent of Keras's Masking layer, or, if PyTorch handles variable-length sequences directly, how is it done?
You can use the PackedSequence class as the equivalent of Keras masking; you can find more utilities in torch.nn.utils.rnn.
Here is an example of packing variable-length sequence inputs for an RNN:
import torch
import torch.nn as nn

batch_size = 3
max_length = 3
hidden_size = 2
n_layers = 1
feature_size = 1

# container: (batch, seq, feature), because we use batch_first=True
batch_in = torch.zeros((batch_size, max_length, feature_size))
# data: three zero-padded sequences of true lengths 3, 2 and 1
batch_in[0, :, 0] = torch.tensor([1., 2., 3.])
batch_in[1, :, 0] = torch.tensor([1., 2., 0.])
batch_in[2, :, 0] = torch.tensor([1., 0., 0.])
seq_lengths = [3, 2, 1]  # true (unpadded) length of each sequence
# pack it: padded timesteps are dropped from the packed data
pack = torch.nn.utils.rnn.pack_padded_sequence(batch_in, seq_lengths, batch_first=True)
# pack.data has shape (6, 1): the 3 + 2 + 1 real timesteps
# pack.batch_sizes is tensor([3, 2, 1]): how many sequences are still active at each step
# initialize
rnn = nn.RNN(feature_size, hidden_size, n_layers, batch_first=True)
h0 = torch.randn(n_layers, batch_size, hidden_size)
# forward: the RNN never sees the padded timesteps
out, _ = rnn(pack, h0)
# unpack back to a padded tensor plus the original lengths
unpacked, unpacked_len = torch.nn.utils.rnn.pad_packed_sequence(out, batch_first=True)
# unpacked has shape (3, 3, 2); entries past each sequence's length are zeros
Additionally, you may find this article useful. [Jump to the section "How the PackedSequence object works"] - link
You can use a packed sequence to mask timesteps in the sequence dimension:
batch_mask = ...  # boolean mask, e.g. (seq x batch)
# move the padding to the end so it is cut off when packing
compact_seq = torch.zeros_like(x)
for i, seq_len in enumerate(batch_mask.sum(0)):
    compact_seq[:seq_len, i] = x[batch_mask[:, i], i]
# pack in the sequence dimension
packed_x = pack_padded_sequence(compact_seq, batch_mask.sum(0).cpu().numpy(), enforce_sorted=False)
packed_scores, rnn_hxs = gru(packed_x, rnn_hxs)  # gru is an nn.GRU module built elsewhere
# restore the sequence dimension
scores, _ = pad_packed_sequence(packed_scores)
# restore the original order, moving the padding back into place
scores = torch.zeros((*batch_mask.shape, scores.size(-1))).to(scores.device).masked_scatter(batch_mask.unsqueeze(-1), scores)
Or, to mask in the batch dimension instead, use a masked select/scatter:
batch_mask = torch.any(x, -1).unsqueeze(-1)  # boolean mask (batch, 1)
batch_x = torch.masked_select(x, batch_mask).reshape(-1, x.size(-1))
batch_rnn_hxs = torch.masked_select(rnn_hxs, batch_mask).reshape(-1, rnn_hxs.size(-1))
batch_rnn_hxs = gru_cell(batch_x, batch_rnn_hxs)  # gru_cell is an nn.GRUCell module built elsewhere
rnn_hxs = rnn_hxs.masked_scatter(batch_mask, batch_rnn_hxs)  # restore batch
Note that using the scatter function is safe for gradient backpropagation.
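As a small self-contained illustration of the packing approach (hypothetical shapes; enforce_sorted=False keeps the batch in its original order):
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

x = torch.randn(5, 4, 8)                 # (seq, batch, feature), zero-padded
lengths = torch.tensor([3, 5, 2, 4])     # true length of each sequence
packed = pack_padded_sequence(x, lengths, enforce_sorted=False)

gru = torch.nn.GRU(input_size=8, hidden_size=16)
packed_out, h = gru(packed)

out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape)  # torch.Size([5, 4, 16]); positions past each length are zeros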

Index 150 out of bounds in axis 0 with size 1

I was making a histogram using a numpy array in Python with OpenCV. The code is as follows:
#finding histogram of an image
import numpy as np
import cv2
img = cv2.imread("cr7.jpg")
gry_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
a = np.zeros((1, 256), dtype=np.uint8)
#finding how many times a particular pixel intensity repeats
for x in range(0, 183):  #size of gray_img is (184,275)
    for y in range(0, 274):
        g = gry_img[x, y]
        a[g] = a[g] + 1
print(a)
Error is as follows:
IndexError: index 150 is out of bounds for axis 0 with size 1
Since you haven't supplied the image, one can only guess, but it seems you've made a mistake with the dimensions of the image; alternatively, the issue is entirely with the shape of your results array a.
The code you have is rather fragile, and here is a cleaner way to interact with images. I use an image from opencv's data directory: aero1.jpg.
The code here resolves both potential issues identified above, whichever one it was:
import numpy as np
import cv2

fname = 'aero1.jpg'
im = cv2.imread(fname)
gry_img = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
gry_img.shape
>>> (480, 640)
# note that the image is 640px wide by 480 tall;
# the numpy array shows the number of rows first:
# rows are in y / columns are in x
# NOTE the results array `a` need only be 1-dimensional, not 2d (1x256);
# use an integer dtype wider than uint8, since counts can exceed 255
a = np.zeros((256,), dtype=np.int64)
# iterate over all pixels, whatever the shape of the image
height, width = gry_img.shape
for x in range(width):
    for y in range(height):
        g = gry_img[y, x]  # NOTE y, x not x, y
        a[g] += 1
But note that you could also achieve this easily with the numpy function np.histogram (docs), with slightly careful handling of the bin edges:
histb, bin_edges = np.histogram(gry_img.reshape(-1), bins=np.arange(257))
# check that we arrived at the same result as iterating manually:
(a == histb).all()
>>> True
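If you don't want to manage bin edges yourself, np.bincount and OpenCV's own cv2.calcHist produce the same counts (a sketch on the same aero1.jpg):
import numpy as np
import cv2

gry_img = cv2.cvtColor(cv2.imread('aero1.jpg'), cv2.COLOR_BGR2GRAY)
counts = np.bincount(gry_img.ravel(), minlength=256)            # same as `a` above
hist_cv = cv2.calcHist([gry_img], [0], None, [256], [0, 256])   # float32, shape (256, 1)
print((counts == hist_cv.ravel().astype(np.int64)).all())       # True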

Theano Reshaping

I am unable to clearly comprehend theano's reshape. I have an image matrix of shape:
[batch_size, stack1_size, stack2_size, height, width]
where there are stack2_size stacks of images, each having stack1_size channels. I now want to convert them into the following shape:
[batch_size, stack1_size*stack2_size, 1 , height, width]
such that all the stacks will be combined together into one stack of all channels. I am not sure whether reshape will do this for me. I see that reshape seems not to keep the pixels in lexicographic order when the mixed dimensions are in the middle. I have been trying to achieve this with a combination of dimshuffle, reshape and concatenate, but to no avail. I would appreciate some help.
Thanks.
Theano reshape works just like numpy reshape with its default order, i.e. 'C':
'C' means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest.
Here's an example showing that the image pixels remain in the same order after a reshape via either numpy or Theano.
import numpy
import theano
import theano.tensor

def main():
    batch_size = 2
    stack1_size = 3
    stack2_size = 4
    height = 5
    width = 6
    data = numpy.arange(batch_size * stack1_size * stack2_size * height * width).reshape(
        (batch_size, stack1_size, stack2_size, height, width))
    reshaped_data = data.reshape([batch_size, stack1_size * stack2_size, 1, height, width])
    print(data[0, 0, 0])
    print(reshaped_data[0, 0, 0])
    x = theano.tensor.TensorType('int64', (False,) * 5)()
    reshaped_x = x.reshape((x.shape[0], x.shape[1] * x.shape[2], 1, x.shape[3], x.shape[4]))
    f = theano.function(inputs=[x], outputs=reshaped_x)
    print(f(data)[0, 0, 0])

main()
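As a quick numpy-only sanity check, C-order reshaping merges the two stack axes so that combined channel k corresponds to original stacks (k // stack2_size, k % stack2_size):
import numpy as np

batch, s1, s2, h, w = 2, 3, 4, 5, 6
data = np.arange(batch * s1 * s2 * h * w).reshape(batch, s1, s2, h, w)
combined = data.reshape(batch, s1 * s2, 1, h, w)

for k in range(s1 * s2):
    assert (combined[0, k, 0] == data[0, k // s2, k % s2]).all()
print("pixel order preserved")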
