indexing through an array in blocks - python-3.x

I have an image represented as an array (img), and I'd like to make many copies of it, zeroing out a different square block in each copy (zero out 0:2, 0:2 in the first copy, 0:2, 3:5 in the next, and so on). I've used np.broadcast_to to create the copies, but I'm having trouble indexing both across the copies and across the block locations within each copy in order to zero out the squares.
I think I'm looking for something like skimage.util.view_as_blocks, but I need to be able to write to the original array, not just read.
The idea behind this is to pass all the copies of the image through a neural network. The copy that performs worst should be the one whose zeroed-out block contains the class (object) I am trying to identify.
img = np.arange(10*10).reshape(10,10)
img_copies = np.broadcast_to(img, [100, 10, 10])
z = np.zeros(2*2).reshape(2,2)
Thanks
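A side note on the attempt above: np.broadcast_to returns a read-only view by default, so the broadcast "copies" cannot be written to, which is why the answer below materialises real copies with np.tile instead. A minimal sketch illustrating the difference:

import numpy as np

img = np.arange(10 * 10).reshape(10, 10)

# Broadcast view: no data is copied, and the result is read-only.
view = np.broadcast_to(img, (100, 10, 10))
print(view.flags.writeable)   # False -> assigning into it raises ValueError

# Real copies: each of the 100 slices can be modified independently.
copies = np.tile(img, (100, 1, 1))
copies[0, 0:2, 0:2] = 0       # zero out one block in the first copy only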

I think I have cracked it! Here's an approach using masking along a 6D reshaped array -
def block_masked_arrays(img, BSZ):
    # Store shape params
    m = img.shape[0]//BSZ
    n = m**2

    # Make copies of the input array by replicating it along a new first axis.
    # Reshape so that the block sizes are exposed by going higher dimensional.
    img3D = np.tile(img, (n,1,1)).reshape(m,m,m,BSZ,m,BSZ)

    # Create a square boolean matrix that is True only on the diagonal.
    # Reshape and broadcast it to match the "blocky" reshaped input array.
    mask = np.eye(n, dtype=bool).reshape(m,m,m,1,m,1)

    # Use the mask to zero out the appropriate blocks. Reshape back to 3D.
    img3D[np.broadcast_to(mask, img3D.shape)] = 0
    img3D.shape = (n, m*BSZ, -1)
    return img3D
Sample run -
In [339]: img
Out[339]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [340]: block_masked_arrays(img, BSZ=2)
Out[340]:
array([[[ 0,  0,  2,  3],
        [ 0,  0,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[ 0,  1,  0,  0],
        [ 4,  5,  0,  0],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 0,  0, 10, 11],
        [ 0,  0, 14, 15]],

       [[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9,  0,  0],
        [12, 13,  0,  0]]])

Related

creating tensor by composition of smaller tensors

I would like to create a 4x4 tensor that is composed of four smaller 2x2 tensors in this manner:
The tensor I would like to create:
in_t = torch.tensor([[14,  7,  6,  2],
                     [ 4,  8, 11,  1],
                     [ 3,  5,  9, 10],
                     [12, 15, 16, 13]])
I would like to create this tensor composed from these four smaller tensors:
a = torch.tensor([[14, 7], [ 4, 8]])
b = torch.tensor([[6, 2], [11, 1]])
c = torch.tensor([[3, 5], [12, 15]])
d = torch.tensor([[9, 10], [16, 13]])
I have tried to use torch.cat like this:
mm_ab = torch.cat((a,b,c,d), dim=0)
but I end up with an 8x2 tensor.
You can control the layout of your tensor and achieve the desired result with a combination of torch.transpose and torch.reshape. You can perform an outer transpose followed by an inner transpose:
>>> stack = torch.stack((a,b,c,d))
>>> stack
tensor([[[14,  7],
         [ 4,  8]],

        [[ 6,  2],
         [11,  1]],

        [[ 3,  5],
         [12, 15]],

        [[ 9, 10],
         [16, 13]]])
Reshape-transpose-reshape-transpose-reshape:
>>> stack.reshape(4,2,-1).transpose(0,1).reshape(-1,2,4).transpose(0,1).reshape(-1,4)
tensor([[14,  7,  6,  2],
        [ 4,  8, 11,  1],
        [ 3,  5,  9, 10],
        [12, 15, 16, 13]])
Essentially, the reshapes let you group and view the tensor differently, while the transposes alter its layout (the result is no longer contiguous); together they let you reach the desired output.
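For reference, the same block layout can also be reached with a 4-D view and a single permute. This is my own sketch using the same a, b, c, d as above, not part of the original answer:

import torch

a = torch.tensor([[14,  7], [ 4,  8]])
b = torch.tensor([[ 6,  2], [11,  1]])
c = torch.tensor([[ 3,  5], [12, 15]])
d = torch.tensor([[ 9, 10], [16, 13]])

stack = torch.stack((a, b, c, d))            # shape (4, 2, 2)
# View the stack as a 2x2 grid of 2x2 blocks, swap the "row inside block"
# and "block column" axes, then flatten back to 4x4.
out = stack.reshape(2, 2, 2, 2).permute(0, 2, 1, 3).reshape(4, 4)
print(out)                                   # matches in_t from the question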
If you lay your tensors out in the 2x2 arrangement below and concatenate them, you will get exactly your output:

tensor a | tensor b
tensor c | tensor d
You actually started with a good, simple approach; this is the completion of your attempt:
p1 = torch.concat((a,b),axis=1)
p2 = torch.concat((c,d),axis=1)
p3 = torch.concat((p1,p2),axis=0)
print(p3)
# output
tensor([[14,  7,  6,  2],
        [ 4,  8, 11,  1],
        [ 3,  5,  9, 10],
        [12, 15, 16, 13]])

Pytorch: Most computationally and memory efficient way to make a series of concatenations from extracting tensor rows?

Say that this is my sample tensor
sample = torch.tensor(
    [[2, 7, 3, 1, 1],
     [9, 5, 8, 2, 5],
     [0, 4, 0, 1, 4],
     [5, 4, 9, 0, 0]]
)
I want a new tensor consisting of concatenations of pairs of rows from the sample tensor. So I have a tensor containing the pairs of row numbers that I want concatenated into single rows of the new tensor:
cat_indices = torch.tensor([[0, 1], [1, 2], [0, 2], [2, 3]])
The current method I am using is this
torch.cat((sample[cat_indices[:,0]], sample[cat_indices[:,1]]), dim=1)
Which gives the desired result
tensor([[2, 7, 3, 1, 1, 9, 5, 8, 2, 5],
        [9, 5, 8, 2, 5, 0, 4, 0, 1, 4],
        [2, 7, 3, 1, 1, 0, 4, 0, 1, 4],
        [0, 4, 0, 1, 4, 5, 4, 9, 0, 0]])
Is this the most memory- and computationally-efficient way of doing this? I am not sure, because I index with cat_indices twice and then perform a concatenation operation.
I feel that there should be a way to do this via some sort of view. Perhaps advanced indexing. I've tried things like sample[cat_indices[:,0], cat_indices[:,1]] or sample[cat_indices[0], cat_indices[1]] but I can't make the view come out right.
What you have should be pretty fast. An alternative is
sample[cat_indices].reshape(cat_indices.shape[0],-1)
You would have to benchmark the performance on your machine though to see which is better.
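A minimal benchmarking sketch using torch.utils.benchmark, with made-up sizes purely for illustration (not part of the original answer):

import torch
import torch.utils.benchmark as benchmark

sample = torch.randn(10_000, 5)
cat_indices = torch.randint(0, 10_000, (50_000, 2))

# Original approach: two indexing passes followed by a concatenation.
t_cat = benchmark.Timer(
    stmt="torch.cat((sample[idx[:, 0]], sample[idx[:, 1]]), dim=1)",
    globals={"sample": sample, "idx": cat_indices},
)
# Alternative: one fancy-indexing pass followed by a reshape.
t_gather = benchmark.Timer(
    stmt="sample[idx].reshape(idx.shape[0], -1)",
    globals={"sample": sample, "idx": cat_indices},
)
print(t_cat.timeit(100))
print(t_gather.timeit(100))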

Sampling from a 2d numpy array

I was wondering if there was a reasonably efficient way of sampling from a 2d numpy array. If I have a generic array:
dims = (4,4)
test_array = np.arange(np.prod(dims)).reshape(*dims)
test_array
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Then I'd like to randomly set, say, two elements from it to a specific value (let's say 100). I've tried creating an indexing array and then applying that:
sample_from = np.random.randint(low=0, high=4, size=(2,2))
sample_from
array([[0, 2],
       [1, 1]])
But if I try using this to index, it gives me a slightly unexpected answer:
test_array[sample_from]
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [ 4,  5,  6,  7]]])
What I would have expected (and the kind of result I'd like) is if I'd just entered the indexing array directly:
test_array[[0,2],[1,1]] = 100
test_array
giving:
array([[  0, 100,   2,   3],
       [  4,   5,   6,   7],
       [  8, 100,  10,  11],
       [ 12,  13,  14,  15]])
Any help gratefully received.
You could use np.random.choice + np.unravel_index to assign directly to your array.
test_array[
    np.unravel_index(np.random.choice(np.prod(dims), 2, replace=False), dims)
] = 100
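For completeness, a small end-to-end sketch of the same idea; the use of the newer np.random.default_rng Generator is my addition, not part of the original answer:

import numpy as np

rng = np.random.default_rng(0)
dims = (4, 4)
test_array = np.arange(np.prod(dims)).reshape(*dims)

# Draw 2 distinct flat positions, convert them to (row, col) index arrays,
# and assign in place.
flat = rng.choice(np.prod(dims), size=2, replace=False)
test_array[np.unravel_index(flat, dims)] = 100
print(test_array)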

CSV Writer omits many items when saving a matrix with many zeros

I am saving a matrix using python csv writer in the following way:
def write_to_disk(csv_path, mtx_norm, cell_ids, gene_symbols):
    print('writing the results to disk')
    with open(csv_path, 'w', encoding='utf8') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(["", cell_ids])
        for idx, row in enumerate(mtx_norm):
            writer.writerow([gene_symbols[idx], row])
I have plenty of zeros in the matrix, and what the csv writer appears to be doing is contracting the runs of repeated numbers (zeros in this case), saving just a ... character in their place. So the file ends up as a bunch of arrays of various lengths, and I then have trouble opening it and using it. I can open non-contracted csv files in the following way:
data = np.genfromtxt(open(path_to_data, "r"), delimiter=",")
But not the files saved by the csv writer. Is there a way to avoid this contraction, and/or to open both types of csv files and convert them into one format - a numpy 2D array without these ... items?
If you work with numpy arrays, you should consider using the numpy.savetxt() function instead: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.savetxt.html. For example:
import numpy as np
a = np.random.randint(0, 10, (10, 10), dtype=int)
a[1:5, 1:8] = 0
np.savetxt('1.txt', a, fmt='%d', delimiter=',')
File content:
0,8,5,8,0,7,5,8,0,9
0,0,0,0,0,0,0,0,3,4
5,0,0,0,0,0,0,0,7,3
9,0,0,0,0,0,0,0,7,5
7,0,0,0,0,0,0,0,6,9
9,9,9,9,2,7,5,0,0,7
4,6,9,0,7,5,2,4,7,5
2,5,1,9,4,9,3,5,3,7
3,3,6,8,5,7,5,8,5,5
9,4,1,2,0,9,2,2,8,2
You can load the data with numpy.loadtxt() https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html:
a = np.loadtxt('1.txt', delimiter=',', dtype=int)
Then a is:
array([[0, 8, 5, 8, 0, 7, 5, 8, 0, 9],
       [0, 0, 0, 0, 0, 0, 0, 0, 3, 4],
       [5, 0, 0, 0, 0, 0, 0, 0, 7, 3],
       [9, 0, 0, 0, 0, 0, 0, 0, 7, 5],
       [7, 0, 0, 0, 0, 0, 0, 0, 6, 9],
       [9, 9, 9, 9, 2, 7, 5, 0, 0, 7],
       [4, 6, 9, 0, 7, 5, 2, 4, 7, 5],
       [2, 5, 1, 9, 4, 9, 3, 5, 3, 7],
       [3, 3, 6, 8, 5, 7, 5, 8, 5, 5],
       [9, 4, 1, 2, 0, 9, 2, 2, 8, 2]])
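As a side note (my diagnosis, not part of the original answer): the ... does not come from the csv module itself, but from passing a whole NumPy row as a single cell, so str(row) is used and NumPy abbreviates long arrays. If you want to keep csv.writer for the header layout in the question, writing the elements individually avoids the truncation; a minimal sketch, assuming cell_ids and gene_symbols are plain sequences:

import csv

def write_to_disk(csv_path, mtx_norm, cell_ids, gene_symbols):
    with open(csv_path, 'w', encoding='utf8', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        # Spread the ids across columns instead of dumping the whole list into one cell.
        writer.writerow([""] + list(cell_ids))
        for idx, row in enumerate(mtx_norm):
            # Convert the NumPy row to a list so every value gets its own cell.
            writer.writerow([gene_symbols[idx]] + list(row))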

Why does the HoughLinesP output a 3D array, instead of a 2D array?

I was working with cv2.HoughLinesP (Python, NumPy, OpenCV 3) on an image, and this is a sample of the output I get -
[[[539 340 897 538]]

 [[533 340 877 538]]

 [[280 460 346 410]]

 [[292 462 353 411]]

 [[540 343 798 492]]]
Its shape is (5, 1, 4).
I am trying to understand in what scenario the function would output something like (5, 2, 4) or (5, 3, 4). I can't think of any; for all the images I have worked with so far, it's a 3D array with the middle dimension equal to 1.
Wouldn't just a 2D array be sufficient and perhaps more efficient?
I asked on the OpenCV Q&A and got the following response -
opencv is a c++ library, and the python wrappers are auto-generated from some scripts, so in c++ we have:

vector<Vec4i> lines; to hold the hough results.

now unfortunately, Vec4i is a descendant of Matx, which is actually a 2d thing, so in python you get:

[     # one for the vector
  [   # one for the 1st dim of Vec4i (1, pretty useless, admittedly :)
    [ # one for the 2nd dim of Vec4i (4 elements)

again, i think, you'll just have to live with it.
If you don't want the extra dimension, and since its size is 1, just use squeeze:
>>> a = np.arange(5*4).reshape(5,1,4)
>>> a
array([[[ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11]],

       [[12, 13, 14, 15]],

       [[16, 17, 18, 19]]])
>>> a.squeeze()
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])
Sometimes the extra axis comes in handy:
>>> a.swapaxes(1,2)
array([[[ 0],
        [ 1],
        [ 2],
        [ 3]],
        ... Snip
       [[16],
        [17],
        [18],
        [19]]])
In light of the update, and assuming only the first line is needed, either of these returns the same result when the extra dimension isn't wanted:
>>> a[0].squeeze()
array([0, 1, 2, 3])
>>> a.squeeze()[0]
array([0, 1, 2, 3])
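A small addition of my own (not from the original answer): if you prefer being explicit about the target shape, reshaping away the singleton axis gives the same 2D result as squeeze when the middle axis is 1:

import numpy as np

a = np.arange(5 * 4).reshape(5, 1, 4)   # stand-in for the HoughLinesP output
lines_2d = a.reshape(-1, 4)             # same as a.squeeze() here
print(lines_2d.shape)                   # (5, 4)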
