How to select rows from two different Numpy arrays conditionally? - python-3.x

I have two NumPy 2D arrays and I want to build a single 2D array by selecting rows from the original two arrays. The selection is done conditionally, row by row. Here is the plain Python way:
import numpy as np
a = np.array([4, 0, 1, 2, 4])
b = np.array([0, 4, 3, 2, 0])
y = np.array([[0, 0, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 1],
[0, 0, 1, 0]])
x = np.array([[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 1, 0, 0],
[1, 1, 1, 1],
[0, 0, 1, 0]])
z = np.empty(shape=x.shape, dtype=x.dtype)
for i in range(x.shape[0]):
    z[i] = y[i] if a[i] >= b[i] else x[i]
print(z)
Looking at numpy.select, I tried np.select([a >= b, a < b], [y, x], -1), but got ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (5,) and arg 1 with shape (5, 4).
Could someone help me write this in a more efficient NumPy manner?

This should do the trick, but it would be helpful if you could show an example of your expected output:
>>> np.where((a >= b)[:, None], y, x)
array([[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 1, 0, 0],
[0, 0, 1, 1],
[0, 0, 1, 0]])
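The np.select attempt in the question fails because a condition of shape (5,) cannot broadcast against choices of shape (5, 4). The following is a minimal sketch, reusing the arrays from the question, of how adding a trailing axis to the condition makes both np.where and np.select work; the names mask, z_where, z_select and z_loop are only illustrative.

```python
import numpy as np

a = np.array([4, 0, 1, 2, 4])
b = np.array([0, 4, 3, 2, 0])
y = np.array([[0, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]])
x = np.array([[0, 0, 0, 0],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 1, 0]])

# Shape (5,) -> (5, 1), so the per-row condition broadcasts across the 4 columns.
mask = (a >= b)[:, None]

# Take the row from y where the condition holds, otherwise from x.
z_where = np.where(mask, y, x)

# np.select also works once every condition carries the extra axis.
z_select = np.select([mask, ~mask], [y, x], default=-1)

# Reference result from the explicit loop in the question.
z_loop = np.array([y[i] if a[i] >= b[i] else x[i] for i in range(x.shape[0])])
```

With only two cases, np.where is the simpler choice; np.select becomes useful when there are three or more conditions.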

Related

accessing elements through sparse matrix in scipy

I have below code in python
# dense to sparse
from numpy import array
from scipy.sparse import csr_matrix
# create dense matrix
A = array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
print(A)
# convert to sparse matrix (CSR method)
S = csr_matrix(A)
print(S)
# reconstruct dense matrix
B = S.todense()
print(B)
When I add the following statement to the code above:
print(B[0])
I get the following output:
[[1 0 0 1 0 0]]
How can I loop through the above values, i.e., 1, 0, 0, 1, 0, 0?
In [2]: from scipy.sparse import csr_matrix
...: # create dense matrix
...: A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
...: S = csr_matrix(A)
In [3]: A
Out[3]:
array([[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])
In [4]: S
Out[4]:
<3x6 sparse matrix of type '<class 'numpy.int64'>'
with 5 stored elements in Compressed Sparse Row format>
S.toarray() or S.A for short, makes a dense ndarray:
In [5]: S.A
Out[5]:
array([[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])
todense makes a np.matrix object, which is always 2d
In [6]: S.todense()
Out[6]:
matrix([[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])
In [7]: S.todense()[0]
Out[7]: matrix([[1, 0, 0, 1, 0, 0]])
In [9]: S.todense()[0][0]
Out[9]: matrix([[1, 0, 0, 1, 0, 0]])
To iterate by 'columns' we have to do something like:
In [10]: [S.todense()[0][:,i] for i in range(3)]
Out[10]: [matrix([[1]]), matrix([[0]]), matrix([[0]])]
In [11]: [S.todense()[0][0,i] for i in range(3)]
Out[11]: [1, 0, 0]
There is a shortcut for converting a 1d row np.matrix to a 1d ndarray:
In [12]: S.todense()[0].A1
Out[12]: array([1, 0, 0, 1, 0, 0])
Getting a 1d array from a "row" of an ndarray is simpler:
In [14]: S.toarray()[0]
Out[14]: array([1, 0, 0, 1, 0, 0])
np.matrix is generally discouraged, as a remnant from a time when the transition from MATLAB was more important. The fact that scipy.sparse is modeled on np.matrix (but not subclassed from it) is now the main reason for keeping np.matrix. Row and column sums of a sparse matrix return a dense np.matrix.
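To tie this back to the original question about looping over the values of a row, here is a minimal sketch. It assumes SciPy is installed and reuses the A and S from above; the names row0, values and nonzeros are only illustrative. The first option densifies the matrix, the second reads the CSR attributes indptr, indices and data to visit only the stored entries of one row.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[1, 0, 0, 1, 0, 0], [0, 0, 2, 0, 0, 1], [0, 0, 0, 2, 0, 0]])
S = csr_matrix(A)

# Option 1: convert to a plain ndarray; a row is then an ordinary 1d array
# that can be looped over element by element.
row0 = S.toarray()[0]
values = [int(v) for v in row0]

# Option 2: visit only the stored (nonzero) entries of row 0 without densifying.
# In CSR format, row r occupies the slice indptr[r]:indptr[r + 1] of indices and data.
start, stop = S.indptr[0], S.indptr[1]
nonzeros = [(int(col), int(val))
            for col, val in zip(S.indices[start:stop], S.data[start:stop])]

for v in values:
    print(v)
```

For a large matrix, the second option avoids allocating the full dense array just to read one row.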

Multiclass vs. multilabel fitting

In scikit-learn tutorials, I found the following paragraphs in the section 'Multiclass vs. multilabel fitting'.
I couldn't understand why the following code generates the given results.
First
>>> from sklearn.svm import SVC
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.preprocessing import LabelBinarizer
>>> X = [[1, 2], [2, 4], [4, 5], [3, 2], [3, 1]]
>>> y = [0, 0, 1, 1, 2]
>>> classif = OneVsRestClassifier(estimator=SVC(random_state=0))
>>> classif.fit(X, y).predict(X)
array([0, 0, 1, 1, 2])
>>> y = LabelBinarizer().fit_transform(y)
>>> classif.fit(X, y).predict(X)
array([[1, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 0],
       [0, 0, 0]])
Next
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> y = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
>>> y = MultiLabelBinarizer().fit_transform(y)
>>> classif.fit(X, y).predict(X)
array([[1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [1, 0, 1, 0, 0]])
Label binarization in scikit-learn will transform your targets and represent them in a label indicator matrix. This label indicator matrix has the shape (n_samples, n_classes) and is composed as follows:
each row represents a sample
each column represents a class
each element is 1 if the sample is labeled with the class and 0 if not
In your first example, you have a target collection with 5 samples and 3 classes. That's why transforming y with LabelBinarizer results in a 5x3 matrix. In your case, [1, 0, 0] corresponds to class 0, [0, 1, 0] corresponds to class 1 and so forth. Notice that in each row there is only one element set to 1, since each sample can have one label only.
In your next example, you have a target collection with 5 samples and 5 classes. That's why transforming y with MultiLabelBinarizer results in a 5x5 matrix. In your case, [1, 1, 0, 0, 0] corresponds to the multilabel [0, 1], [0, 1, 0, 1, 0] corresponds to the multilabel [1, 3] and so forth. The key difference from the first example is that each row can have multiple elements set to 1, because each sample can have multiple labels/classes.
The predicted values follow the very same pattern. They are not identical to the original values in y, however, because your classification model did not predict every label correctly. You can check this with the inverse_transform() method of the binarizers:
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
# plain list of lists: the label sets have different lengths, so they do not form a regular NumPy array
y = [[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]]
y_bin = mlb.fit_transform(y)
print(y_bin)
# direct transformation
[[1 1 0 0 0]
 [1 0 1 0 0]
 [0 1 0 1 0]
 [1 0 1 1 0]
 [0 0 1 0 1]]
# prediction of your classifier
y_pred = np.array([[1, 1, 0, 0, 0],
                   [1, 0, 1, 0, 0],
                   [0, 1, 0, 1, 0],
                   [1, 0, 1, 0, 0],
                   [1, 0, 1, 0, 0]])
# inverting the binarized values to the original classes
y_inv = mlb.inverse_transform(y_pred)
print(y_inv)
# output
[(0, 1), (0, 2), (1, 3), (0, 2), (0, 2)]
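To make the label indicator layout concrete without depending on scikit-learn, here is a minimal NumPy-only sketch. The helpers to_indicator and from_indicator are hypothetical names written for this illustration. They are not scikit-learn's implementation and only mimic what fit_transform and inverse_transform produce for this small example.

```python
import numpy as np

def to_indicator(label_sets):
    """Build a (n_samples, n_classes) 0/1 matrix from a list of label lists."""
    classes = sorted({label for labels in label_sets for label in labels})
    column = {label: j for j, label in enumerate(classes)}
    matrix = np.zeros((len(label_sets), len(classes)), dtype=int)
    for i, labels in enumerate(label_sets):
        for label in labels:
            matrix[i, column[label]] = 1  # row = sample, column = class
    return classes, matrix

def from_indicator(classes, matrix):
    """Map each 0/1 row back to the tuple of classes whose column is set."""
    return [tuple(classes[j] for j in np.flatnonzero(row)) for row in matrix]

# Multilabel targets from the second example: 5 samples, 5 distinct classes.
classes, y_bin = to_indicator([[0, 1], [0, 2], [1, 3], [0, 2, 3], [2, 4]])

# Prediction copied from the question, decoded back to label tuples.
y_pred = np.array([[1, 1, 0, 0, 0],
                   [1, 0, 1, 0, 0],
                   [0, 1, 0, 1, 0],
                   [1, 0, 1, 0, 0],
                   [1, 0, 1, 0, 0]])
decoded = from_indicator(classes, y_pred)

# Single-label targets from the first example give exactly one 1 per row.
_, y_single = to_indicator([[0], [0], [1], [1], [2]])
```

Comparing decoded with the original label sets shows that the last two samples were predicted as (0, 2) instead of (0, 2, 3) and (2, 4).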

Upsampling xarray DataArray similar to np.repeat()?

I'm hoping to upsample values in a large 2-dimensional DataArray (below). Is there an xarray tool similar to np.repeat() which can be applied in each dimension (x and y)? In the example below, I would like to duplicate each array entry in both x and y.
import xarray as xr
import numpy as np
x = np.arange(3)
y = np.arange(3)
x_mesh,y_mesh = np.meshgrid(x, y)
arr = x_mesh*y_mesh
df = xr.DataArray(arr, coords={'x':x, 'y':y}, dims=['x','y'])
Input:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Desired output:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 2, 2],
[0, 0, 1, 1, 2, 2],
[0, 0, 2, 2, 4, 4],
[0, 0, 2, 2, 4, 4]])
I am aware of the xesmf regridding tools, but they seem more complicated than necessary for the application I have in mind.
There is a simple solution for this with np.kron.
>>> arr
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
>>> np.int_(np.kron(arr, np.ones((2,2))))
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 2, 2],
[0, 0, 1, 1, 2, 2],
[0, 0, 2, 2, 4, 4],
[0, 0, 2, 2, 4, 4]])
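Since the question asks for something like np.repeat(), here is a minimal sketch that applies np.repeat along each axis in turn and compares it with the np.kron approach. It works on the underlying NumPy values only; rebuilding an xarray DataArray would also need new x and y coordinates of length 6, which is not shown here. The names up_repeat and up_kron are only illustrative.

```python
import numpy as np

arr = np.array([[0, 0, 0],
                [0, 1, 2],
                [0, 2, 4]])

# Repeat every row twice, then every column twice.
up_repeat = np.repeat(np.repeat(arr, 2, axis=0), 2, axis=1)

# Same result with np.kron; a block of ones with the input's integer dtype
# keeps the result integer, so no cast back with np.int_ is needed.
up_kron = np.kron(arr, np.ones((2, 2), dtype=arr.dtype))

print(up_repeat)
```

Both variants generalise to different factors per axis by changing the repeat counts or the shape of the block of ones.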

Hermitian Adjacency Matrix of Digraph

I am trying to find a pythonic way to calculate the Hermitian adjacency matrix in Python and I'm really struggling. The definition of a Hermitian Adjacency matrix is shown in this image:
It works as follows. Let's say we have two nodes named i and j. If there is a directed edge going both from i to j and from j to i, then the corresponding matrix value at location [i, j] should be set to 1. If there is only a directed edge from i to j, then the matrix element at location [i, j] should be set to +i. And if there is only a directed edge from j to i, then the matrix element at location [i, j] should be set to -i. All other matrix values are set to 0.
I cannot figure out a smart way to make this Hermitian Adjacency Matrix that doesn't involve iterating through my nodes one by one. Any advice?
I don't think there's a built-in for this, so I've cobbled together my own vectorised solution:
import numpy as np
import networkx as nx
# Create standard adjacency matrix and convert it from sparse to a dense array
# (G is assumed to be an existing networkx directed graph)
A = nx.linalg.graphmatrix.adjacency_matrix(G).toarray()
# Add to its transpose
B = A + A.T
# Get row index matrix
I = np.indices(B.shape)[0] + 1
# Apply vectorised formula to get Hermitian adjacency matrix
H = np.multiply(B/2 * (2*I)**(B%2), 2*A-1).astype(int)
Explanation
Let's start with a directed graph:
We start by creating the normal adjacency matrix using nx.linalg.graphmatrix.adjacency_matrix(), giving us the following matrix:
>>> A = nx.linalg.graphmatrix.adjacency_matrix(G).toarray()
[[1, 1, 0, 1, 0, 1, 0, 0],
[1, 0, 0, 1, 0, 0, 1, 0],
[1, 1, 1, 1, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 0, 1, 0, 0, 0, 0],
[1, 1, 0, 0, 1, 0, 1, 1],
[0, 1, 0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0]]
We can then add this matrix to its transpose, giving us 2 in every location where there is a directed edge going from i to j and vice-versa, a 1 in every location where only one of these edges exists, and a 0 in every location where no edge exists:
>>> B = A + A.T
>>> B
[[2, 2, 1, 1, 1, 2, 0, 0],
[2, 0, 1, 2, 0, 1, 2, 0],
[1, 1, 2, 1, 0, 1, 0, 0],
[1, 2, 1, 0, 1, 0, 0, 0],
[1, 0, 0, 1, 0, 1, 1, 1],
[2, 1, 1, 0, 1, 0, 1, 1],
[0, 2, 0, 0, 1, 1, 0, 1],
[0, 0, 0, 0, 1, 1, 1, 0]]
Now, we want to apply a function to the matrix so that 0 maps to 0, 2 maps to 1, and 1 maps to the row number i. We can use np.indices() to get the row number, and the following equation: x/2 * (2*i)**(x%2), where i is the row number and x is the element. Finally, we need to multiply elements in positions where no edge ij exists by -1. This can be vectorised as follows:
>>> I = np.indices(B.shape)[0] + 1
>>> H = np.multiply(B/2 * (2*I)**(B%2), 2*A-1).astype(int)
>>> H
[[ 1, 1, -1, 1, -1, 1, 0, 0],
[ 1, 0, -2, 1, 0, -2, 1, 0],
[ 3, 3, 1, 3, 0, 3, 0, 0],
[-4, 1, -4, 0, -4, 0, 0, 0],
[ 5, 0, 0, 5, 0, -5, -5, -5],
[ 1, 6, -6, 0, 6, 0, 6, 6],
[ 0, 1, 0, 0, 7, -7, 0, 7],
[ 0, 0, 0, 0, 8, -8, -8, 0]]
As required.
We can check that this is correct by using a naïve iterate-through-nodes approach:
>>> check = np.zeros([8,8])
>>> for i in G.nodes:
...     for j in G.nodes:
...         if (i, j) in G.edges:
...             if (j, i) in G.edges:
...                 check[i-1, j-1] = 1
...             else:
...                 check[i-1, j-1] = i
...         else:
...             if (j, i) in G.edges:
...                 check[i-1, j-1] = -i
...             else:
...                 check[i-1, j-1] = 0
...
>>> (check == H).all()
True
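Note that in the usual definition of the Hermitian adjacency matrix, the i in +i and -i is the imaginary unit, not the row number used in the answer above. Under that reading, a minimal sketch that starts from any dense 0/1 adjacency matrix looks like this. The small matrix A below is made-up example data, not the graph from the answer, and the names both and H are only illustrative.

```python
import numpy as np

# Hypothetical 0/1 adjacency matrix: A[u, v] == 1 means there is an edge u -> v.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])

# 1 where edges exist in both directions, 0 elsewhere.
both = A * A.T

# A - A.T is +1 where only u -> v exists, -1 where only v -> u exists,
# and 0 where both or neither exist, so it supplies the +i / -i entries.
H = both + 1j * (A - A.T)

print(H)
```

Because both is symmetric and A - A.T is antisymmetric, H equals its own conjugate transpose, which is what makes the matrix Hermitian.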

regarding the array indexing in numpy in a given example

The following example is about index array
import numpy as np
labels = np.array([0, 1, 2, 0, 4])
image = np.array([[0, 0, 1, 1, 1],
[2, 2, 0, 0, 0],
[0, 0, 3, 0, 4]])
And the labels[image] gives the following result
array([[0, 0, 1, 1, 1],
[2, 2, 0, 0, 0],
[0, 0, 0, 0, 4]])
It is not clear to me how this works, i.e., how labels[image] produces that result. Thanks.
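One way to see what labels[image] does: indexing a 1d array with an integer array uses each element of the index array as a position in labels, and the result takes the shape of the index array. The following minimal sketch compares the expression with an explicit loop; the name manual is only illustrative.

```python
import numpy as np

labels = np.array([0, 1, 2, 0, 4])
image = np.array([[0, 0, 1, 1, 1],
                  [2, 2, 0, 0, 0],
                  [0, 0, 3, 0, 4]])

# Integer array indexing: every entry of image is looked up in labels,
# and the result has the same shape as image.
result = labels[image]

# Equivalent explicit loop over every position of image.
manual = np.empty_like(image)
for r in range(image.shape[0]):
    for c in range(image.shape[1]):
        manual[r, c] = labels[image[r, c]]

print(result)
```

The only entry that changes is the 3 in the last row, because labels[3] is 0; every other index k happens to satisfy labels[k] == k.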
