I am saving a matrix using Python's csv writer in the following way:
import csv
def write_to_disk(csv_path, mtx_norm, cell_ids, gene_symbols):
    print('writing the results to disk')
    with open(csv_path, 'w', encoding='utf8') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(["", cell_ids])
        for idx, row in enumerate(mtx_norm):
            writer.writerow([gene_symbols[idx], row])
I have plenty of zeros in the matrix, and the csv writer seems to contract long runs of similar numbers (zeros in this case), saving just a ... in their place. So the file ends up as a bunch of arrays of varying length, and I then have trouble opening and using it. I can open a non-contracted CSV in the following way:
data = np.genfromtxt(open(path_to_data, "r"), delimiter=",")
But not the files saved by the csv writer. Is there a way to avoid this contraction and/or to open both types of CSV file, converting them into one format (a NumPy 2D array without these ... items)?
If you work with NumPy arrays, consider using the numpy.savetxt() function instead: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.savetxt.html. For example:
import numpy as np
a = np.random.randint(0, 10, (10, 10), dtype=int)
a[1:5, 1:8] = 0
np.savetxt('1.txt', a, fmt='%d', delimiter=',')
File content:
0,8,5,8,0,7,5,8,0,9
0,0,0,0,0,0,0,0,3,4
5,0,0,0,0,0,0,0,7,3
9,0,0,0,0,0,0,0,7,5
7,0,0,0,0,0,0,0,6,9
9,9,9,9,2,7,5,0,0,7
4,6,9,0,7,5,2,4,7,5
2,5,1,9,4,9,3,5,3,7
3,3,6,8,5,7,5,8,5,5
9,4,1,2,0,9,2,2,8,2
You can load the data with numpy.loadtxt() https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html:
a = np.loadtxt('1.txt', delimiter=',', dtype=int)
Then a is:
array([[0, 8, 5, 8, 0, 7, 5, 8, 0, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 3, 4],
[5, 0, 0, 0, 0, 0, 0, 0, 7, 3],
[9, 0, 0, 0, 0, 0, 0, 0, 7, 5],
[7, 0, 0, 0, 0, 0, 0, 0, 6, 9],
[9, 9, 9, 9, 2, 7, 5, 0, 0, 7],
[4, 6, 9, 0, 7, 5, 2, 4, 7, 5],
[2, 5, 1, 9, 4, 9, 3, 5, 3, 7],
[3, 3, 6, 8, 5, 7, 5, 8, 5, 5],
[9, 4, 1, 2, 0, 9, 2, 2, 8, 2]])
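A likely cause of the ... in the question, offered as an explanation rather than something stated in the answer above: writer.writerow([gene_symbols[idx], row]) puts the whole NumPy row into a single cell, and the csv module converts non-string values to text with str(). For arrays with more than 1000 elements (NumPy's default print threshold), str() abbreviates the middle with .... The sketch below keeps the question's write_to_disk signature but unpacks each row into separate cells; read_from_disk is a hypothetical helper, and the demo data (a mostly-zero 3 x 1500 matrix with made-up cell and gene names) is only for illustration.

```python
import csv
import os
import tempfile

import numpy as np


def write_to_disk(csv_path, mtx_norm, cell_ids, gene_symbols):
    with open(csv_path, 'w', encoding='utf8', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        # Unpack the ids so that each one becomes its own CSV cell
        writer.writerow([""] + [str(c) for c in cell_ids])
        for symbol, row in zip(gene_symbols, mtx_norm):
            # tolist() gives plain Python numbers, written one per cell,
            # so str(row) (and its "..." summarisation) is never used
            writer.writerow([symbol] + np.asarray(row).tolist())


def read_from_disk(csv_path):
    # Hypothetical reader: header row -> cell ids, first column -> gene symbols
    with open(csv_path, encoding='utf8', newline='') as csvfile:
        rows = list(csv.reader(csvfile))
    cell_ids = rows[0][1:]
    gene_symbols = [r[0] for r in rows[1:]]
    mtx = np.array([r[1:] for r in rows[1:]], dtype=float)
    return mtx, cell_ids, gene_symbols


# Hypothetical demo data: a mostly-zero 3 x 1500 matrix
mtx = np.zeros((3, 1500))
mtx[0, 10] = 2.5
print('...' in str(mtx[0]))  # True: this is where the "..." comes from

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_to_disk(path, mtx, [f'cell{i}' for i in range(1500)], ['g1', 'g2', 'g3'])
loaded, ids, genes = read_from_disk(path)
print(loaded.shape)  # (3, 1500)
```

Unpacking the row keeps the question's csv-based approach; numpy.savetxt from the answer above remains the simpler option when no row or column labels are needed.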
>>> b
tensor([[ 6, 7, 12, 7, 8],
[ 0, 1, 6, 1, 2],
[ 0, 1, 6, 1, 2],
[ 2, 3, 8, 3, 4],
[ 2, 3, 8, 3, 4],
[ 2, 3, 8, 3, 4],
[10, 11, 16, 11, 12],
[-1, 0, 5, 0, 1],
[-2, -1, 4, -1, 0],
[ 2, 3, 8, 3, 4],
[ 1, 2, 7, 2, 3],
[ 1, 2, 7, 2, 3],
[ 2, 3, 8, 3, 4],
[ 5, 6, 11, 6, 7],
[-2, -1, 4, -1, 0],
[-3, -2, 3, -2, -1],
[-5, -4, 1, -4, -3],
[ 1, 2, 7, 2, 3],
[12, 13, 18, 13, 14],
[-3, -2, 3, -2, -1],
[ 2, 3, 8, 3, 4],
[ 3, 4, 9, 4, 5],
[10, 11, 16, 11, 12],
[-6, -5, 0, -5, -4],
[ 9, 10, 15, 10, 11],
[12, 13, 18, 13, 14],
[-3, -2, 3, -2, -1],
[-2, -1, 4, -1, 0],
[-4, -3, 2, -3, -2],
[-1, 0, 5, 0, 1],
[ 2, 3, 8, 3, 4],
[ 4, 5, 10, 5, 6],
[-1, 0, 5, 0, 1],
[ 5, 6, 11, 6, 7],
[ 7, 8, 13, 8, 9],
[ 3, 4, 9, 4, 5],
[ 2, 3, 8, 3, 4],
[ 4, 5, 10, 5, 6],
[-4, -3, 2, -3, -2],
[ 2, 3, 8, 3, 4],
[-1, 0, 5, 0, 1],
[ 2, 3, 8, 3, 4],
[ 4, 5, 10, 5, 6],
[ 9, 10, 15, 10, 11],
[-1, 0, 5, 0, 1],
[-4, -3, 2, -3, -2],
[ 0, 1, 6, 1, 2],
[ 4, 5, 10, 5, 6],
[ 6, 7, 12, 7, 8],
[-2, -1, 4, -1, 0]])
>>> torch.mode(b, 0)
torch.return_types.mode(
values=tensor([2, 3, 8, 3, 4]),
indices=tensor([20, 20, 20, 20, 20]))
I don't know why the indices are all equal to 20.
The torch.mode documentation describes it as follows:
https://pytorch.org/docs/stable/generated/torch.mode.html#torch.mode
torch.mode(input, dim=-1, keepdim=False, *, out=None)
Returns a namedtuple (values, indices) where values is the mode value of each row of the input tensor in the given dimension dim, i.e. a value which appears most often in that row, and indices is the index location of each mode value found.
By default, dim is the last dimension of the input tensor.
If keepdim is True, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensors having 1 fewer dimension than input.
This is because of the way the tensor b is built. The row [2, 3, 8, 3, 4] is repeated many times, so the column modes are [2, 3, 8, 3, 4], and, more importantly, the mode indices are all equal precisely because the modes occur together in the same rows; if you look at the row with index 20 (i.e., the 21st row), it is exactly [2, 3, 8, 3, 4].
I am assuming that you constructed b similarly to the example in the torch.mode documentation, which I believe is a poor choice of example, as it leads to confusion like the one you are having.
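To see this without torch, the sketch below rebuilds the b printed in the question in NumPy. Assumption: b is reconstructed from its printed first column plus the per-column offsets [0, 1, 6, 1, 2], which reproduces every printed row. For each column it lists the rows holding that column's mode; every column gives the same row set, and row 20 is one of the ten rows equal to [2, 3, 8, 3, 4]. The sketch does not say which of those rows torch.mode reports.

```python
import numpy as np

# First column of the b printed in the question, one value per row
first_col = np.array([6, 0, 0, 2, 2, 2, 10, -1, -2, 2, 1, 1, 2, 5, -2, -3, -5,
                      1, 12, -3, 2, 3, 10, -6, 9, 12, -3, -2, -4, -1, 2, 4, -1,
                      5, 7, 3, 2, 4, -4, 2, -1, 2, 4, 9, -1, -4, 0, 4, 6, -2])
offsets = np.array([0, 1, 6, 1, 2])
b = first_col[:, None] + offsets  # every column is a shifted copy of column 0


def column_modes(m):
    """Return each column's mode and the row indices where it occurs."""
    modes, where = [], []
    for col in m.T:
        values, counts = np.unique(col, return_counts=True)
        mode = values[np.argmax(counts)]
        modes.append(mode)
        where.append(np.flatnonzero(col == mode))
    return np.array(modes), where


modes, where = column_modes(b)
print(modes)              # [2 3 8 3 4]
print(where[0].tolist())  # [3, 4, 5, 9, 12, 20, 30, 36, 39, 41]
```

Because each column is the first column plus a constant, a row holds the mode in one column exactly when it holds the mode in all of them, so any index torch reports will be the same across columns.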
Instead, consider the following:
>>> b = torch.randint(4, (5, 7))
>>> b
tensor([[0, 0, 0, 2, 0, 0, 2],
[0, 3, 0, 0, 2, 0, 1],
[2, 2, 2, 0, 0, 0, 3],
[2, 2, 3, 0, 1, 1, 0],
[1, 1, 0, 0, 2, 0, 2]])
>>> torch.mode(b, 0)
torch.return_types.mode(
values=tensor([0, 2, 0, 0, 0, 0, 2]),
indices=tensor([1, 3, 4, 4, 2, 4, 4]))
In the above, the column modes of b are [0, 2, 0, 0, 0, 0, 2], and the indices returned by torch.mode are [1, 3, 4, 4, 2, 4, 4]. This makes sense because each index points at an occurrence of that column's mode. For example, in the first column, 0 is a most common element (it ties with 2, each appearing twice) and there is a 0 at index 1. Similarly, in the second column, 2 is the most common element and there is a 2 at index 3. This holds for all columns. If you want the modes of the rows instead, use torch.mode(b, 1).
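A quick NumPy cross-check of the printed result (assumption: it only reuses the numbers shown above, so torch is not needed). It confirms that each returned index holds its column's mode and that the reported value is always a most frequent one, and it shows that columns 0 and 4 contain a tie between 0 and 2.

```python
import numpy as np

# The 5 x 7 b and the torch.mode(b, 0) result printed above
b = np.array([[0, 0, 0, 2, 0, 0, 2],
              [0, 3, 0, 0, 2, 0, 1],
              [2, 2, 2, 0, 0, 0, 3],
              [2, 2, 3, 0, 1, 1, 0],
              [1, 1, 0, 0, 2, 0, 2]])
values = np.array([0, 2, 0, 0, 0, 0, 2])
indices = np.array([1, 3, 4, 4, 2, 4, 4])
cols = np.arange(b.shape[1])

# Each reported index points at a copy of that column's mode value
print((b[indices, cols] == values).all())  # True

# Occurrences of the reported value vs. the highest count in each column
value_counts = (b == values).sum(axis=0)
max_counts = np.array([np.bincount(b[:, j]).max() for j in cols])
print((value_counts == max_counts).all())  # True

# Columns in which more than one value reaches the highest count
n_modes = np.array([(np.bincount(b[:, j]) == max_counts[j]).sum() for j in cols])
print(np.flatnonzero(n_modes > 1))  # [0 4]
```

In both tie columns torch returned 0; the documentation text quoted above does not say how ties are broken, so it may be safer not to rely on that choice.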
I have the indices of a 2D array. I want to partition the indices such that the corresponding entries form blocks (block size is given as input m and n).
For example, if the indices are as given below
(array([0, 0, 0, 0, 1, 1, 1, 1, 6, 6, 6, 6, 7, 7, 7, 7]), array([0, 1, 7, 8, 0, 1, 7, 8, 0, 1, 7, 8, 0, 1, 7, 8]))
for the original matrix (from which the indices are generated)
array([[3, 4, 2, 0, 1, 1, 0, 2, 4],
[1, 3, 2, 0, 0, 1, 0, 4, 0],
[1, 0, 0, 1, 1, 0, 1, 1, 3],
[0, 0, 0, 3, 3, 0, 4, 0, 4],
[4, 3, 4, 2, 1, 1, 0, 0, 4],
[0, 1, 0, 4, 4, 2, 2, 2, 1],
[2, 4, 0, 1, 1, 0, 0, 2, 1],
[0, 4, 1, 3, 3, 2, 3, 2, 4]])
and if the block size is (2,2), then the blocks should be
[[3, 4],
[1, 3]]
[[2, 4],
[4, 0]]
[[2, 4],
[0, 4]]
[[2, 1],
[2, 4]]
Is there an efficient way to do this?
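Does this help? Assuming A is your matrix, this splits the whole of A into consecutive row_size x col_size blocks (it does not use the index arrays, and leftover rows or columns that do not fill a block are skipped):
row_size = 2
col_size = 3
for i in range(A.shape[0] // row_size):
    for j in range(A.shape[1] // col_size):
        print(A[row_size*i:row_size*i + row_size, col_size*j:col_size*j + col_size])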
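A vectorised sketch that starts from the index arrays instead of tiling the whole matrix. Assumptions (not stated in the question): the indices cover a full rectangular grid of rows x columns in row-major order, as in the example, and the grid's height and width are divisible by the block shape (m, n). blocks_from_indices is a hypothetical helper name.

```python
import numpy as np


def blocks_from_indices(A, idx, block):
    """Split the entries A[idx] into (m, n) blocks.

    Assumes idx = (rows, cols) covers a full grid in row-major order and
    that the grid size is divisible by the block shape.
    """
    rows, cols = idx
    m, n = block
    n_rows = len(np.unique(rows))
    n_cols = len(np.unique(cols))
    # Gather the selected entries back into their rectangular sub-matrix
    sub = A[rows, cols].reshape(n_rows, n_cols)
    # Axes become (block_row, row_in_block, block_col, col_in_block);
    # swapping the middle two groups the entries block by block
    return (sub.reshape(n_rows // m, m, n_cols // n, n)
               .swapaxes(1, 2)
               .reshape(-1, m, n))


A = np.array([[3, 4, 2, 0, 1, 1, 0, 2, 4],
              [1, 3, 2, 0, 0, 1, 0, 4, 0],
              [1, 0, 0, 1, 1, 0, 1, 1, 3],
              [0, 0, 0, 3, 3, 0, 4, 0, 4],
              [4, 3, 4, 2, 1, 1, 0, 0, 4],
              [0, 1, 0, 4, 4, 2, 2, 2, 1],
              [2, 4, 0, 1, 1, 0, 0, 2, 1],
              [0, 4, 1, 3, 3, 2, 3, 2, 4]])
idx = (np.array([0, 0, 0, 0, 1, 1, 1, 1, 6, 6, 6, 6, 7, 7, 7, 7]),
       np.array([0, 1, 7, 8, 0, 1, 7, 8, 0, 1, 7, 8, 0, 1, 7, 8]))

blocks = blocks_from_indices(A, idx, (2, 2))
print(blocks.tolist())
# [[[3, 4], [1, 3]], [[2, 4], [4, 0]], [[2, 4], [0, 4]], [[2, 1], [2, 4]]]
```

The reshape/swapaxes step avoids Python loops entirely; if the indices do not form a full grid, the first reshape fails and a loop-based grouping would be needed instead.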
I am fairly new to NumPy and torch, and I am struggling to understand what seem to me the most basic operations.
For instance, given this tensor:
A = tensor([[[6, 3, 8, 3],
[1, 0, 9, 9]],
[[4, 9, 4, 1],
[8, 1, 3, 5]],
[[9, 7, 5, 6],
[3, 7, 8, 1]]])
And this other tensor:
B = tensor([1, 0, 1])
I would like to use B as indices into A so that I get a 3 by 4 tensor that looks like this:
[[1, 0, 9, 9],
[4, 9, 4, 1],
[3, 7, 8, 1]]
Thanks!
Ok, my mistake was to assume this:
A[:, B]
is equal to this:
A[[0, 1, 2], B]
Or more generally the solution I wanted is:
A[range(B.shape[0]), B]
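The difference is easiest to see from the result shapes. The sketch below uses NumPy, assuming torch indexing follows the same rules for these two cases (which the working fix A[range(B.shape[0]), B] suggests): with a slice, B is applied to axis 1 of every matrix, while two index arrays are paired element by element.

```python
import numpy as np

# NumPy copies of the A and B from the question
A = np.array([[[6, 3, 8, 3], [1, 0, 9, 9]],
              [[4, 9, 4, 1], [8, 1, 3, 5]],
              [[9, 7, 5, 6], [3, 7, 8, 1]]])
B = np.array([1, 0, 1])

# Slice + index array: every matrix A[k] gets rows B, giving 3 x 3 x 4
print(A[:, B].shape)  # (3, 3, 4)

# Two index arrays are zipped: (0, B[0]), (1, B[1]), (2, B[2])
picked = A[np.arange(len(B)), B]
print(picked.tolist())  # [[1, 0, 9, 9], [4, 9, 4, 1], [3, 7, 8, 1]]
```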
Alternatively, you can use torch.gather:
>>> indexer = B.view(-1, 1, 1).expand(-1, -1, 4)
>>> indexer
tensor([[[1, 1, 1, 1]],
[[0, 0, 0, 0]],
[[1, 1, 1, 1]]])
>>> A.gather(1, indexer).view(len(B), -1)
tensor([[1, 0, 9, 9],
[4, 9, 4, 1],
[3, 7, 8, 1]])
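For comparison, NumPy's counterpart of torch.gather is np.take_along_axis (a swapped-in technique, not part of the answer above). The sketch mirrors the view/expand steps with reshape and np.broadcast_to, reusing the question's A and B.

```python
import numpy as np

A = np.array([[[6, 3, 8, 3], [1, 0, 9, 9]],
              [[4, 9, 4, 1], [8, 1, 3, 5]],
              [[9, 7, 5, 6], [3, 7, 8, 1]]])
B = np.array([1, 0, 1])

# Same role as B.view(-1, 1, 1).expand(-1, -1, 4): shape (3, 1, 4)
indexer = np.broadcast_to(B.reshape(-1, 1, 1), (len(B), 1, A.shape[2]))

# Take one row per matrix along axis 1, then drop that length-1 axis
picked = np.take_along_axis(A, indexer, axis=1)
print(picked.shape)                # (3, 1, 4)
print(picked[:, 0, :].tolist())    # [[1, 0, 9, 9], [4, 9, 4, 1], [3, 7, 8, 1]]
```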
I'm hoping to calculate the distances between every pair of points in an (Nx1) NumPy array, e.g.:
a = [2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]
I'm hoping to get a square matrix with the (normed) distances between each point:
sq = [[0, |2-5|, |2-5|, |2-12|, |2-5|, ...],
[|5-2|, 0, ...], ...]
So far, what I have doesn't work; it gives wrong values for the square distance matrix. Is there also a way to vectorise my method (I'm not sure that is the correct term)? I am unfamiliar with advanced indexing.
What I currently have is the following:
sq = np.zero((len(a), len(a))
for i in a:
    for j in len(a+1):
        sq[i,j] = np.abs(a[:,0] - a[:,0])
Would appreciate any help!
I think that, by exploiting NumPy broadcasting, this is a faster solution:
import numpy as np
a = [2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]
a = np.array(a).reshape(-1,1)
sq = np.abs(a.T-a)
sq
array([[ 0, 3, 3, 10, 3, 1, 8, 6, 1, 1, 1],
[ 3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4],
[ 3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4],
[10, 7, 7, 0, 7, 9, 2, 4, 11, 9, 11],
[ 3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4],
[ 1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2],
[ 8, 5, 5, 2, 5, 7, 0, 2, 9, 7, 9],
[ 6, 3, 3, 4, 3, 5, 2, 0, 7, 5, 7],
[ 1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0],
[ 1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2],
[ 1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0]])
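To spell out what the broadcasting does, the sketch below writes the two shapes explicitly (a column of shape (N, 1) and a row of shape (1, N)) and checks the result against a plain double loop. It uses the question's values; ravel() is there only so that an (N, 1) input, as in the question, works too.

```python
import numpy as np

a = np.array([2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]).ravel()  # ravel() flattens an (N, 1) input

col = a[:, None]   # shape (11, 1)
row = a[None, :]   # shape (1, 11)
# Broadcasting stretches both to (11, 11): entry [i, j] is a[i] - a[j]
sq = np.abs(col - row)

# Same values as an explicit double loop
loop = [[abs(int(x) - int(y)) for y in a] for x in a]
print(sq.shape)             # (11, 11)
print(sq.tolist() == loop)  # True
print(sq[0].tolist())       # [0, 3, 3, 10, 3, 1, 8, 6, 1, 1, 1]
```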
With NumPy, the following line might be the shortest route to your result:
import numpy as np
a = np.array([2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1])
sq = np.array([np.array([(np.abs(i - j)) for j in a]) for i in a])
print(sq)
The following would give you the desired result without NumPy:
a = [2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1]
sq = []
for i in a:
    distances = []
    for j in a:
        distances.append(abs(i-j))
    sq.append(distances)
print(sq)
Both give the following values (the NumPy version prints them in array form, without commas):
[[0, 3, 3, 10, 3, 1, 8, 6, 1, 1, 1], [3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4], [3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4], [10, 7, 7, 0, 7, 9, 2, 4, 11, 9, 11], [3, 0, 0, 7, 0, 2, 5, 3, 4, 2, 4], [1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2], [8, 5, 5, 2, 5, 7, 0, 2, 9, 7, 9], [6, 3, 3, 4, 3, 5, 2, 0, 7, 5, 7], [1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0], [1, 2, 2, 9, 2, 0, 7, 5, 2, 0, 2], [1, 4, 4, 11, 4, 2, 9, 7, 0, 2, 0]]
There may be more than one way to do this, but one way is to use only NumPy array operations instead of Python loops, because NumPy executes array operations in compiled code, which is much faster than looping in Python.
One way to do this using only array operations is to create an NxN matrix b in which row i repeats the element a[i] N times.
E.g.:
a = [1, 2, 3]
b = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
Then you can compute the element-wise difference, with a broadcast across the rows of b:
ans = abs(b - a)
Assuming a is a NumPy array, you can do:
b = np.repeat(a,a.shape).reshape((a.shape[0],a.shape[0]))
ans = np.abs(b - a)
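To make the layout of b concrete, the sketch below uses the question's 11 values (and writes np.repeat with a plain integer count, an assumption equivalent to passing a.shape for a 1D array). It prints a corner of b and checks the result against the broadcasting one-liner a[:, None] - a[None, :].

```python
import numpy as np

a = np.array([2, 5, 5, 12, 5, 3, 10, 8, 1, 3, 1])
n = len(a)

# np.repeat copies each element n times in a row, so b[i, j] == a[i]
b = np.repeat(a, n).reshape(n, n)
# b - a broadcasts a across the rows: entry [i, j] is a[i] - a[j]
ans = np.abs(b - a)

print(b[:2, :3].tolist())  # [[2, 2, 2], [5, 5, 5]]
print(np.array_equal(ans, np.abs(a[:, None] - a[None, :])))  # True
```

Building b explicitly uses N x N extra memory; the broadcasting version computes the same matrix without materialising the repeated copy first.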
I am trying to replace values in a 2-dimensional array with values from another array. The last row of my array is repeated, even in rows I did not write to.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
AA = [[0] * (len(A)+2)] * (len(A)+2)
print(AA)
for r in range(len(A)):
    for c in range(len(A[r])):
        AA[r+1][c+1] = A[r][c]
        print(AA[r+1][c+1], " ")
print(AA)
I expect an output like:
[[0, 0, 0, 0, 0], [0, 1, 2, 3, 0], [0, 4, 5, 6, 0], [0, 7, 8, 9, 0], [0, 0, 0, 0, 0]]
But actual output is:
[[0, 7, 8, 9, 0], [0, 7, 8, 9, 0], [0, 7, 8, 9, 0], [0, 7, 8, 9, 0], [0, 7, 8, 9, 0]]
The culprit is your AA declaration. [[0]*(len(A)+2)]*(len(A)+2) does not create a proper matrix: the outer * repeats a reference to the same inner list, so every row is one and the same list, and each write shows up in all rows. More details in this discussion.
Try:
AA = [[0 for _ in range(len(A)+2)] for _ in range(len(A) + 2)]
Instead of:
AA = [[0]*(len(A)+2)]*(len(A)+2)
That should work fine!
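A small sketch of why the original declaration fails and the comprehension works, reusing the question's A (the name shared is just for illustration). It checks object identity directly with is.

```python
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = len(A) + 2

shared = [[0] * n] * n
# Every row is the same list object, so one write shows up in all rows
print(all(row is shared[0] for row in shared))  # True
shared[1][1] = 99
print([row[1] for row in shared])  # [99, 99, 99, 99, 99]

AA = [[0 for _ in range(n)] for _ in range(n)]  # n independent rows
for r in range(len(A)):
    for c in range(len(A[r])):
        AA[r + 1][c + 1] = A[r][c]
print(AA)  # [[0, 0, 0, 0, 0], [0, 1, 2, 3, 0], [0, 4, 5, 6, 0], [0, 7, 8, 9, 0], [0, 0, 0, 0, 0]]
```

The inner [0] * n is safe because integers are immutable; only the outer multiplication of a list of lists causes the sharing.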