Unusual order of dimensions of an image matrix in python - python-3.x

I downloaded a dataset which contains a MATLAB file called 'depths.mat' which contains a 3-dimensional matrix with the dimensions 480 x 640 x 1449. These are actually 1449 images, each with the dimension 640 x 480. I successfully loaded it into python using the scipy library but the problem is the unusual order of the dimensions. This makes Python think that there are 480 images with the dimensions 640 x 1449. I tried to reshape the matrix in python, but a simple reshape operation did not solve my problem.
Any suggestions are welcome. Thank you.

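You misunderstood. You do not want to reshape, you want to transpose. A reshape keeps the elements in the same flat order and only regroups them, which scrambles the images. A transpose reorders the axes themselves. In MATLAB the image index is the last one, A(row, col, image), while in NumPy you normally want it first, P[image, row, col]. So once you have loaded the entire matrix, move the last dimension to the front. (Swapping the first and last dimensions also puts the images first, but gives shape (1449, 640, 480), i.e. every image transposed.)
You can do this with the swapaxes, moveaxis or transpose functions, but beware! They do not make a copy or change the data. They only change how the indices of the ndarray map onto the underlying memory. If you have enough RAM, your best bet is to make a copy and discard the original.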
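Here is a minimal sketch of that fix. It assumes the variable stored in the file is called `depths`, which is a guess, so check the keys of the dict that `scipy.io.loadmat` returns. `np.moveaxis` moves the image axis to the front so that each `stack[k]` is one 480 x 640 image, and `np.ascontiguousarray` makes the copy recommended above. A small random array with the same layout stands in for the real data so the sketch runs on its own.

```python
import numpy as np

def to_image_stack(mat_array: np.ndarray) -> np.ndarray:
    """Turn a MATLAB-style (rows, cols, n_images) array into (n_images, rows, cols).

    np.moveaxis only returns a view with new strides; np.ascontiguousarray then
    copies the data into a fresh C-ordered buffer so each image is contiguous.
    """
    return np.ascontiguousarray(np.moveaxis(mat_array, -1, 0))

# With the real file this would be (hypothetical key name 'depths'):
#   from scipy.io import loadmat
#   depths = loadmat('depths.mat')['depths']       # shape (480, 640, 1449)
depths = np.random.rand(48, 64, 5)                  # small stand-in with the same layout
stack = to_image_stack(depths)
print(stack.shape)                                  # (5, 48, 64)
print(np.array_equal(stack[2], depths[:, :, 2]))    # True
```

After the copy, the original `depths` array can be deleted to free memory.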


Is it possible to add tensors of different sizes together in pytorch?

I have an image gradient of size (3, 224, 224) and a patch of size (1, 768). Is it possible to add this gradient to the patch and get a result the size of the patch, (1, 768)?
Forgive my inquisitiveness. I know PyTorch also uses broadcasting, but I am not sure whether I will be able to do that with two tensors of different shapes, in a way similar to the line below:
torch.add(a, b)
For example:
The end product would be the same patch on the left with the gradient of an entire image on the right added to it. My understanding is that it’s not possible, but knowledge isn’t bounded.
No. Whether two tensors are broadcastable is defined by the following rules:
- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, or one of them must be 1, or one of them must not exist.
Because the second bullet doesn't hold in your example (i.e., 768 != 224, 1 not in {224, 768}), you can't broadcast the add. If you have some meaningful way to reshape your gradients, you might be able to.
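PyTorch follows the same broadcasting semantics as NumPy, so the rules can be checked without any tensors at all. Below is a small pure-Python sketch of the check. The function name `broadcast_shape` is made up for illustration, and it rejects empty shapes to mirror the first rule as stated, even though current PyTorch and NumPy also broadcast 0-d tensors. For real code, `torch.broadcast_shapes` and `np.broadcast_shapes` already do this.

```python
from itertools import zip_longest

def broadcast_shape(a: tuple, b: tuple) -> tuple:
    """Return the broadcast shape of a and b, or raise ValueError.

    Both shapes need at least one dimension, and, walking from the trailing
    dimension backwards, each pair of sizes must be equal, contain a 1,
    or have one side missing.
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("each tensor needs at least one dimension")
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=None):
        if x is None or y is None:
            result.append(y if x is None else x)   # one side has no such dimension
        elif x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"sizes {x} and {y} are not broadcastable")
    return tuple(reversed(result))

print(broadcast_shape((3, 1, 224), (1, 224)))    # (3, 1, 224)
try:
    broadcast_shape((3, 224, 224), (1, 768))
except ValueError as err:
    print(err)                                    # sizes 224 and 768 are not broadcastable
```

The failing case stops at the very first (trailing) pair, 224 vs 768, which is exactly the mismatch described above.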
I figured out how to do it myself. I divided the image gradient (right) into 16 x 16 patches and wrote a loop that adds each patch to the original image patch (left). This way, I was able to add a 224 x 224 image gradient to a 16 x 16 patch. I just wanted to see what would happen if I did this.
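The loop described above can also be written with a single reshape. This is a minimal NumPy sketch under two assumptions that the post does not state: the (1, 768) patch is a flattened 3 x 16 x 16 patch in channel-first order (3 * 16 * 16 = 768), and "adding each patch" means summing all 14 x 14 = 196 gradient patches into it. The helper names are hypothetical. The same steps work on PyTorch tensors with `reshape` and `permute`.

```python
import numpy as np

def split_into_patches(grad: np.ndarray, p: int = 16) -> np.ndarray:
    """(C, H, W) -> (num_patches, C * p * p), one flattened p x p patch per row."""
    c, h, w = grad.shape
    return (grad.reshape(c, h // p, p, w // p, p)   # split H and W into p-sized blocks
                .transpose(1, 3, 0, 2, 4)           # -> (block_row, block_col, C, p, p)
                .reshape(-1, c * p * p))

def add_gradient_patches(patch: np.ndarray, grad: np.ndarray, p: int = 16) -> np.ndarray:
    """Add every p x p patch of grad to patch; the result keeps patch's (1, C*p*p) shape."""
    return patch + split_into_patches(grad, p).sum(axis=0, keepdims=True)

grad = np.random.rand(3, 224, 224)
patch = np.random.rand(1, 768)
out = add_gradient_patches(patch, grad)
print(out.shape)   # (1, 768)
```

Summing is only one possible reading of "adding each patch"; if the patches should instead be added one at a time and inspected separately, `split_into_patches(grad)` already gives them as rows.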

Resize float32 array with K-nearest neighbour in the same way as scipy.misc.imresize or tf.image.resize

I am going to create a network with many of the same characteristics as pix2pix: https://github.com/affinelayer/pix2pix-tensorflow.
My adjustment is that I will not be using images, but matrices with float32 values. This introduces a lot of problems and there is a lot to rewrite. Most of the code can easily be rewritten, but I've encountered a problem.
The network has a separable convolutional layer where the image is resized using tf.image.resize. This function supports different resize methods, such as K-Nearest Neighbors, and I don't want to lose that feature. Both scipy.misc.imresize and tf.image.resize are limited to int values and do not support anything higher than uint16. If I were to convert the data to those formats, I would lose precision.
Is there a way to create this efficiently in numpy (or any equivalent) supporting float32?
Sorry for not introducing any code, but the problem more or less explains itself without (I hope).
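Try using scipy.ndimage.zoom (older SciPy versions also expose it as scipy.ndimage.interpolation.zoom, a namespace that is now deprecated). It works for floating-point images.
Use it as below:
import scipy.ndimage
image = scipy.ndimage.zoom(image, 0.5)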
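`scipy.ndimage.zoom` also takes an `order` argument (the spline order), and `order=0` gives nearest-neighbour interpolation. If you would rather stay in plain NumPy, a nearest-neighbour resize is just integer indexing, so float32 values are copied unchanged and no precision is lost. Below is a minimal sketch. The function name is made up, and its pixel-centre convention is an assumption, so it is not guaranteed to match `tf.image.resize` or `scipy.ndimage.zoom` pixel for pixel.

```python
import numpy as np

def resize_nearest(arr: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of the first two axes of arr (any dtype, e.g. float32).

    Output pixel (i, j) copies the input pixel that contains the mapped centre
    of the output pixel: src = floor((dst + 0.5) * in_size / out_size).
    """
    in_h, in_w = arr.shape[:2]
    rows = np.floor((np.arange(out_h) + 0.5) * in_h / out_h).astype(np.intp)
    cols = np.floor((np.arange(out_w) + 0.5) * in_w / out_w).astype(np.intp)
    rows = np.minimum(rows, in_h - 1)   # guard against floating-point overshoot
    cols = np.minimum(cols, in_w - 1)
    # Broadcasting rows against cols selects a full (out_h, out_w) grid; any
    # trailing axes (e.g. channels) are carried along unchanged.
    return arr[rows[:, None], cols[None, :]]

small = np.array([[0.25, 1.25], [2.25, 3.25]], dtype=np.float32)
big = resize_nearest(small, 4, 4)
print(big.dtype)   # float32
```

Because the output is built purely by indexing, each output value is exactly one of the input values, which is the property nearest-neighbour resizing is meant to have.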

python incorrect size with getsizeof() and .nbytes with nested lists

I apologise if this is a duplicate issue, but I've been having some issues with .nbytes and sys.getsizeof().
In particular, I have a list which contains numpy arrays, each array is a 3D representation of an image (row, column, RGB) and each of these images have different dimensions.
There are over 4000 images, and this may increase in the future, as I plan to use them for machine learning.
When I use .nbytes with one image, I get the correct size, but when I try to evaluate the whole lot, I get an incorrect size:
# size of image 1 in bytes
print("size of first image: %d bytes" % images[0].nbytes)
# size of all images in bytes
print("total size of all images: %d bytes" % images.nbytes)
size of first image: 60066 bytes
total size of all images: 36600 bytes
Are the only ways around this to either loop through all the images or change to a monstrous 4D array instead of a list of 3D arrays? Is there another function which better evaluates size for this kind of nested setup?
I'm running Python 3.6.7.
Try running images.dtype. What does it return? If it's dtype('O'), that explains your problem: images is not a list, but is instead a NumPy array of type object, which is generally a Bad Idea™️. Technically, it'll be a 1D array holding a bunch of 3D arrays, and .nbytes of such an array only counts the pointers to those arrays, not the image data itself.
NumPy arrays are best suited to numerical data. They're flexible enough to hold arbitrary Python objects, but doing so greatly impairs both their functionality and their efficiency. Unless you have a clear reason in mind, you should generally just use a plain Python list [] in these situations.
You may actually be best off converting images to a 4D array, as this is the only way that images.nbytes will work correctly. You can't do this if your images are different sizes, but if they all have the same shape (x, y, z) it's actually pretty straightforward:
images = np.array([a for a in images])
Now images.shape will be (n, x, y, z), where n is the total number of images. You can access the 3D array that represents the ith image by just indexing images:
image_i = images[i]
Alternatively, you can convert images to a normal Python list:
images = list(images)
If you don't want to bother with any of those conversions, you can always get the size of all the subarrays via iteration:
totalsize = sum(arr.nbytes for arr in images)
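A small reproduction of the problem and of the iteration fix, using made-up image shapes. The object array is built explicitly with `np.empty(..., dtype=object)`, because recent NumPy versions refuse to create a ragged array from a list of differently shaped arrays unless `dtype=object` is passed.

```python
import numpy as np

def total_nbytes(arrays) -> int:
    """Sum the data-buffer sizes of a list (or object array) of NumPy arrays."""
    return sum(int(arr.nbytes) for arr in arrays)

# Three hypothetical RGB images of different sizes, stored as an object array.
shapes = [(10, 20, 3), (15, 5, 3), (8, 8, 3)]
images = np.empty(len(shapes), dtype=object)
for i, shape in enumerate(shapes):
    images[i] = np.zeros(shape, dtype=np.uint8)

print(images.dtype)          # object
print(images.nbytes)         # only the pointers: len(images) * images.itemsize
print(total_nbytes(images))  # 600 + 225 + 192 = 1017
```

The same `total_nbytes` call works unchanged after `images = list(images)`.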

Plotting Hidden Weights

I've been interested in neural networks for a while now and have just started following the deep learning tutorials. I have what I hope is a relatively straightforward question that someone may be able to answer.
In the multilayer perceptron tutorial, I am interested in seeing the state of the network at different layers (something similar to what is seen in this paper: http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247 ). For instance, I am able to write out the weights of the hidden layer using:
W_open = open('mlp_w_pickle.pkl', 'wb')  # binary mode, since protocol -1 writes a binary pickle
cPickle.dump(classifier.hiddenLayer.W.get_value(borrow=True), W_open, -1)
When I plot this using the tile plotting in utils.py, I get a nice plot [edit: plot removed as I don't have enough reputation to post images].
If I wanted to plot the weights at the logRegressionLayer, such that
cPickle.dump(classifier.logRegressionLayer.W.get_value(borrow=True), W_open, -1)
what would I actually have to do? The above doesn't seem to work: it returns a 2D array of shape (500, 10). I understand that the 500 relates to the number of hidden units. The paragraph on the Miscellaneous page:
Plotting the weights is a bit more tricky. We have n_hidden hidden units, each of them corresponding to a column of the weight matrix. A column has the same shape as the visible, where the weight corresponding to the connection with visible unit j is at position j. Therefore, if we reshape every such column, using numpy.reshape, we get a filter image that tells us how this hidden unit is influenced by the input image.
confuses me a little. I am unsure exactly how I would put it together.
Thanks to all - sorry if the question is confusing!
You could plot them just like the weights in the first layer, but they will not necessarily make much sense.
Consider the weights in the first layer of a neural network. If the inputs have size 784 (e.g. MNIST images) and there are 2000 hidden units in the first layer then the first layer weights are a matrix of size 784x2000 (or maybe the transpose depending on how it's implemented). Those weights can be plotted as either 784 patches of size 2000 or, more usually, 2000 patches of size 784. In this latter case each patch can be plotted as a 28x28 image which directly ties back to the original inputs and thus is interpretable.
For your higher-level regression layer, you could plot 10 tiles, each of size 500 (e.g. patches of size 22x23, padded with 6 extra values since 22 x 23 = 506), or 500 patches of size 10. Either might illustrate some patterns that are being found, but it may be difficult to tie those patterns back to the original inputs.
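A minimal NumPy sketch of turning weight columns into image tiles, as described above. It assumes, as in the tutorial paragraph quoted in the question, that each column of `W` belongs to one unit of the layer. The helper name `columns_to_tiles` is made up, and zero padding is just one arbitrary way to fill the 506 slots of a 22 x 23 tile from a 500-long column. The resulting tiles can then be passed to any image-plotting routine (for example matplotlib's `imshow`).

```python
import numpy as np

def columns_to_tiles(W: np.ndarray, tile_shape: tuple) -> np.ndarray:
    """Reshape every column of W into a 2-D tile; returns shape (n_columns, h, w).

    Columns shorter than h * w are zero-padded at the end before reshaping.
    """
    h, w = tile_shape
    n_in, n_cols = W.shape
    if n_in > h * w:
        raise ValueError("tile_shape is too small to hold one column of W")
    padded = np.zeros((h * w, n_cols), dtype=W.dtype)
    padded[:n_in, :] = W
    # Row k of padded.T is column k of W; reshape each row to (h, w).
    return padded.T.reshape(n_cols, h, w)

# First layer: 784 inputs (28 x 28 MNIST pixels) -> one 28 x 28 tile per hidden unit.
W_hidden = np.random.randn(784, 2000)
print(columns_to_tiles(W_hidden, (28, 28)).shape)   # (2000, 28, 28)

# Regression layer of shape (500, 10): one padded 22 x 23 tile per output class.
W_logreg = np.random.randn(500, 10)
print(columns_to_tiles(W_logreg, (22, 23)).shape)   # (10, 22, 23)
```

For the first layer each tile lines up pixel for pixel with the input image, which is why it is interpretable; for the (500, 10) layer the tile layout is arbitrary, so any visible structure has no spatial meaning.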

How to average a stack of images together using Pillow? [duplicate]

For example, I have 100 pictures with the same resolution, and I want to merge them into one picture. In the final picture, the RGB value of each pixel is the average of the 100 pictures' values at that position. I know the getdata function can work in this situation, but is there a simpler and faster way to do this in PIL (Python Imaging Library)?
Let's assume that your images are all .png files and they are all stored in the current working directory. The python code below will do what you want. As Ignacio suggests, using numpy along with PIL is the key here. You just need to be a little bit careful about switching between integer and float arrays when building your average pixel intensities.
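import os, numpy
from PIL import Image
# Access all PNG files in directory
allfiles = os.listdir(os.getcwd())
imlist = [filename for filename in allfiles if filename[-4:] in [".png", ".PNG"]]
# Assuming all images are the same size, get dimensions of first image
w, h = Image.open(imlist[0]).size
N = len(imlist)
# Create a numpy array of floats to store the average (assume RGB images)
arr = numpy.zeros((h, w, 3), numpy.float64)
# Build up average pixel intensities, casting each image as an array of floats
for im in imlist:
    imarr = numpy.array(Image.open(im).convert("RGB"), dtype=numpy.float64)
    arr = arr + imarr / N
# Round values in array and cast as 8-bit integer
arr = numpy.array(numpy.round(arr), dtype=numpy.uint8)
# Generate, save and preview final image
out = Image.fromarray(arr)
out.save("Average.png")
out.show()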
The image below was generated from a sequence of HD video frames using the code above.
I find it difficult to imagine a situation where memory is an issue here, but in the (unlikely) event that you absolutely cannot afford to create the array of floats required for my original answer, you could use PIL's blend function, as suggested by @mHurley, as follows:
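# Alternative method using PIL blend function (reuses imlist and N from above)
avg = Image.open(imlist[0])
for i in range(1, N):
    avg = Image.blend(avg, Image.open(imlist[i]), 1.0 / (i + 1))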
You could derive the correct sequence of alpha values, starting with the definition from PIL's blend function:
out = image1 * (1.0 - alpha) + image2 * alpha
Think about applying that function recursively to a vector of numbers (rather than images) to get the mean of the vector. For a vector of length N, you would need N-1 blending operations, with N-1 different values of alpha.
However, it's probably easier to think intuitively about the operations. At each step you want the avg image to contain equal proportions of the source images from earlier steps. When blending the first and second source images, alpha should be 1/2 to ensure equal proportions. When blending the third with the average of the first two, you would like the new image to be made up of 1/3 of the third image, with the remainder made up of the average of the previous images (current value of avg), and so on.
In principle this new answer, based on blending, should be fine. However I don't know exactly how the blend function works. This makes me worry about how the pixel values are rounded after each iteration.
The image below was generated from 288 source images using the code from my original answer:
On the other hand, this image was generated by repeatedly applying PIL's blend function to the same 288 images:
I hope you can see that the outputs from the two algorithms are noticeably different. I expect this is because of the accumulation of small rounding errors during repeated application of Image.blend.
I strongly recommend my original answer over this alternative.
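Both the alpha sequence and the rounding concern above can be checked on plain numbers instead of images. In this minimal sketch, `blend` applies the formula from PIL's documentation to floats, and using alpha = 1/k for the k-th value reproduces the ordinary mean. The `round_each_step` flag is an assumption: it stands in for an 8-bit image being rounded back to integers after every blend. It does not claim to reproduce how Pillow actually rounds, but it shows how per-step rounding can drift far from the true mean.

```python
def blend(a: float, b: float, alpha: float) -> float:
    # Same formula as PIL's Image.blend: out = image1 * (1.0 - alpha) + image2 * alpha
    return a * (1.0 - alpha) + b * alpha

def running_blend_mean(values, round_each_step=False):
    """Fold values together with blend(), using alpha = 1/k for the k-th value."""
    avg = float(values[0])
    for k, v in enumerate(values[1:], start=2):
        avg = blend(avg, float(v), 1.0 / k)
        if round_each_step:
            # Hypothetical stand-in for storing the result as integers after each blend.
            avg = float(round(avg))
    return avg

print(running_blend_mean([10, 20, 30, 40]))                        # about 25.0
print(running_blend_mean([0, 0, 1, 1, 1]))                         # about 0.6, the true mean
print(running_blend_mean([0, 0, 1, 1, 1], round_each_step=True))   # 0.0
```

In the rounded case every intermediate average (1/3, 1/4, 1/5 of a new value) is below 0.5, so it rounds back to 0 each time and later images never register. This is the kind of accumulated error that keeping a float accumulator avoids.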
One can also use NumPy's mean function for averaging. The code looks cleaner and runs faster.
Here is a comparison of timings and results for 700 noisy grayscale images of faces:
import numpy as np
from PIL import Image
def average_img_1(imlist):
    # Assuming all images are the same size, get dimensions of first image
    w, h = Image.open(imlist[0]).size
    N = len(imlist)
    # Create a numpy array of floats to store the average (grayscale images in this test)
    arr = np.zeros((h, w), np.float64)
    # Build up average pixel intensities, casting each image as an array of floats
    for im in imlist:
        imarr = np.array(Image.open(im), dtype=np.float64)
        arr = arr + imarr / N
    out = Image.fromarray(arr)
    return out
def average_img_2(imlist):
    # Alternative method using PIL blend function
    N = len(imlist)
    avg = Image.open(imlist[0])
    for i in range(1, N):
        img = Image.open(imlist[i])
        avg = Image.blend(avg, img, 1.0 / float(i + 1))
    return avg
def average_img_3(imlist):
    # Alternative method using numpy mean function
    images = np.array([np.array(Image.open(fname)) for fname in imlist])
    arr = np.array(np.mean(images, axis=0), dtype=np.uint8)
    out = Image.fromarray(arr)
    return out
100 loops, best of 3: 362 ms per loop
100 loops, best of 3: 340 ms per loop
100 loops, best of 3: 311 ms per loop
BTW, the results of averaging are quite different. I think the first method loses information during averaging, and the second one has some artifacts.
In case anybody is interested in a bare-bones NumPy solution (I was actually looking for one), here's the code:
mean_frame = np.mean(([frame for frame in frames]), axis=0)
I would consider creating an x-by-y array of integer RGB values, all starting at (0, 0, 0), then adding in the RGB value of each pixel from each file, dividing all the values by the number of images, and creating the image from the result. You will probably find that NumPy can help.
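Here is a minimal sketch of that integer-accumulator idea. It assumes every frame is a same-shaped uint8 array; for files, something like `np.asarray(Image.open(f).convert("RGB"))` per filename would produce one. The accumulator is `uint32` rather than `uint8` so the running sum cannot overflow: 255 * n stays below 2**32 for up to about 16.8 million frames. The function name is made up for illustration.

```python
import numpy as np

def average_uint8_frames(frames):
    """Average an iterable of same-shaped uint8 frames using an integer sum.

    Only the running sum and the current frame are held in memory, and the
    final division rounds half up instead of truncating.
    """
    frames = iter(frames)
    total = np.asarray(next(frames), dtype=np.uint32).copy()   # wide accumulator
    count = 1
    for frame in frames:
        total += np.asarray(frame, dtype=np.uint32)
        count += 1
    return ((total + count // 2) // count).astype(np.uint8)

black = np.zeros((2, 2, 3), dtype=np.uint8)
white = np.full((2, 2, 3), 255, dtype=np.uint8)
print(average_uint8_frames([black, white])[0, 0])   # [128 128 128]
```

`Image.fromarray` turns the returned uint8 array back into an image. Because the function accepts any iterable, a generator that opens one file at a time keeps memory use to roughly two frames.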
I ran into MemoryErrors when trying the method in the accepted answer. I found an optimization that seems to produce the same result. Basically, you blend one image at a time, instead of adding them all up and dividing.
avg = Image.open(images_to_blend[0])
for i, im in enumerate(images_to_blend[1:], start=2):  # assuming your list is filenames, not images
    img = Image.open(im)
    avg = Image.blend(avg, img, 1.0 / i)  # the i-th image gets weight 1/i, giving a running mean
This does two things: you don't need two very dense copies of the image while you're turning it into an array, and you don't have to use 64-bit floats at all. You get similarly high precision with smaller numbers. The results APPEAR to be the same, though I'd appreciate it if someone checked my math.
