Accessing elements in an sklearn sparse array - python-3.x

I'm trying to write a function for a Euclidean minimum spanning tree. Where I have run into trouble is finding the k nearest neighbors. As you can see below, I call a function that returns a sparse array containing the indexes of each point and its nearest neighbor, along with the distance between them. However, I cannot access the elements the way I assumed I would:
for p1, p2, w in A:
    do things
This raises an error saying that A yields only 1 item (not 3). Is there a way to access the elements of this data set so that I can form edges with the distance as the weight? I am pretty new to Python and still trying to learn the finer details of the language.
from sklearn.neighbors import kneighbors_graph
from kruskalsalgorithm import *
import networkx as nx
def EMST(inlist):
    graph = nx.Graph()
    for a,b in inlist:
        graph.add_node((a,b))
    print("nodes = ", graph.nodes())
    A = kneighbors_graph(graph.nodes(),1,mode='distance', metric='euclidean',include_self=False,n_jobs=-1)
    A.toarray()
This is how I am testing my function
mylist = [[2,3],[4,2],[9,4],[3,1]]
EMST(mylist)
and my output is:
nodes = [(2, 3), (4, 2), (9, 4), (3, 1)]
(0, 1) 2.2360679775
(1, 3) 1.41421356237
(2, 1) 5.38516480713
(3, 1) 1.41421356237

You did not really explain what exactly you want to do, and there are many possibilities.
In general you should follow the scipy.sparse docs. In your case, sklearn's function guarantees that the result is in CSR format.
One potential usage is something like:
from scipy import sparse as sp
import numpy as np
np.random.seed(1)
mat = sp.random(4,4, density=0.4)
print(mat)
I, J, V = sp.find(mat)
print(I)
print(J)
print(V)
Output:
(3, 0) 0.846310916686
(1, 3) 0.313273516932
(3, 1) 0.524548159573
(2, 0) 0.44345289378
(2, 1) 0.22957721373
(2, 2) 0.534413908947
[2 3 2 3 2 1]
[0 0 1 1 2 3]
[ 0.44345289 0.84631092 0.22957721 0.52454816 0.53441391 0.31327352]
Of course you could do:
for a, b, w in zip(I, J, V):
    print(a, b, w)
which prints:
2 0 0.44345289378
3 0 0.846310916686
2 1 0.22957721373
3 1 0.524548159573
2 2 0.534413908947
1 3 0.313273516932

I can recreate your display with:
In [65]: from scipy import sparse
In [72]: row = np.array([0,1,2,3])
In [73]: col = np.array([1,3,1,1])
In [74]: data = np.array([5,2,29,2])**.5
In [75]: M = sparse.csr_matrix((data, (row, col)), shape=(4,4))
In [76]: M
Out[76]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [77]: print(M)
(0, 1) 2.23606797749979
(1, 3) 1.4142135623730951
(2, 1) 5.385164807134504
(3, 1) 1.4142135623730951
In [78]: M.A # M.toarray()
Out[78]:
array([[0. , 2.23606798, 0. , 0. ],
[0. , 0. , 0. , 1.41421356],
[0. , 5.38516481, 0. , 0. ],
[0. , 1.41421356, 0. , 0. ]])
The points are pts = [(2, 3), (4, 2), (9, 4), (3, 1)]. The distance from pts[0] to pts[1] is sqrt(5), etc.
Sparse coo format gives access to the coordinates and distances. sparse.find also produces these arrays.
In [83]: Mc = M.tocoo()
In [84]: Mc.row
Out[84]: array([0, 1, 2, 3], dtype=int32)
In [85]: Mc.col
Out[85]: array([1, 3, 1, 1], dtype=int32)
In [86]: Mc.data
Out[86]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
Checking the point and matrix match:
In [95]: pts = np.array([(2, 3), (4, 2), (9, 4), (3, 1)])
In [96]: pts
Out[96]:
array([[2, 3],
[4, 2],
[9, 4],
[3, 1]])
In [97]: for r,c,d in zip(*sparse.find(M)):
    ...:     print(((pts[r]-pts[c])**2).sum()**.5)
    ...:
2.23606797749979
5.385164807134504
1.4142135623730951
1.4142135623730951
Or getting all closest distances at once:
In [107]: np.sqrt(((pts[row,:]-pts[col,:])**2).sum(1))
Out[107]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [110]: np.linalg.norm(pts[row,:]-pts[col,:],axis=1)
Out[110]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
A 'brute force' minimum distance calc:
All pairwise distances:
In [112]: dist = np.linalg.norm(pts[None,:,:]-pts[:,None,:],axis=2)
In [113]: dist
Out[113]:
array([[0. , 2.23606798, 7.07106781, 2.23606798],
[2.23606798, 0. , 5.38516481, 1.41421356],
[7.07106781, 5.38516481, 0. , 6.70820393],
[2.23606798, 1.41421356, 6.70820393, 0. ]])
(compare this with Out[78])
'blank' out the diagonal
In [114]: D = dist + np.eye(4)*100
Minimum distance and coordinates (by row):
In [116]: np.min(D, axis=1)
Out[116]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [117]: np.argmin(D, axis=1)
Out[117]: array([1, 3, 1, 1], dtype=int32)
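To connect this back to the original loop, here is a minimal sketch of turning the sparse matrix into (p1, p2, w) edges. It assumes the matrix is rebuilt by hand with the same values as M above, as a stand-in for the kneighbors_graph result; the names A, coo and edges are only illustrative.

```python
import numpy as np
from scipy import sparse

pts = [(2, 3), (4, 2), (9, 4), (3, 1)]

# Hand-built stand-in for the kneighbors_graph output (assumption:
# same rows, columns and distances as the matrix M shown above).
row = np.array([0, 1, 2, 3])
col = np.array([1, 3, 1, 1])
data = np.array([5, 2, 29, 2]) ** 0.5
A = sparse.csr_matrix((data, (row, col)), shape=(4, 4))

# COO format exposes parallel arrays of row index, column index and value.
coo = A.tocoo()

# Map the integer indexes back to the point tuples to form weighted edges.
edges = [(pts[i], pts[j], float(w)) for i, j, w in zip(coo.row, coo.col, coo.data)]

for p1, p2, w in edges:
    print(p1, p2, round(w, 4))
```

Each tuple in edges can then be unpacked as p1, p2, w, which is the shape the loop in the question expected.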

Related

Iterate over i,j in a function returning two arrays

I am trying to accept two arrays from a function and iterate over value pairs in an array
import numpy as np
a = np.zeros(10).astype(np.uint8)
a[0:4] = 1
hist = np.zeros(4)
values, counts = np.unique(a, return_counts=True)
for u, c in zip(values, counts):
    hist[u] += c
# This works. hist: [6. 4. 0. 0.]
# for u, c in zip(np.unique(a, return_counts=True)): # ValueError: not enough values to unpack (expected 2, got 1)
#     hist[u] += c
# for u, c in np.unique(a, return_counts=True): # IndexError: index 6 is out of bounds for axis 0 with size 4
#     hist[u] += c
The code works if I first accept two arrays and then use for k,v in zip(arr1, arr2).
Is it possible to write for k,v in function_returning_two_arrays(args) as a one-line statement?
Update: both zip(*arg) and [arg] work. Could you please elaborate on this syntax? A link to an article would be enough, and then I can accept the answer. I understand that * unpacks a tuple, but what does [some_tuple] do?
Other than the unique step, this is just basic Python.
In [78]: a = np.zeros(10).astype(np.uint8)
...: a[0:4] = 1
...: ret = np.unique(a, return_counts=True)
unique returns a tuple of arrays, which can be used as is, or unpacked into 2 variables. I think unpacking makes the code clearer.
In [79]: ret
Out[79]: (array([0, 1], dtype=uint8), array([6, 4]))
In [80]: values, counts = ret
In [81]: values
Out[81]: array([0, 1], dtype=uint8)
In [82]: counts
Out[82]: array([6, 4])
The following just makes a list with 1 item - the tuple
In [83]: [ret]
Out[83]: [(array([0, 1], dtype=uint8), array([6, 4]))]
That's different from making a list of the two arrays - which just changes the tuple "wrapper" to a list:
In [84]: [values, counts]
Out[84]: [array([0, 1], dtype=uint8), array([6, 4])]
zip takes multiple items (it has a *args signature)
In [85]: list(zip(*ret)) # same as zip(values, counts)
Out[85]: [(0, 6), (1, 4)]
In [86]: [(i,j) for i,j in zip(*ret)] # using that in an iteration
Out[86]: [(0, 6), (1, 4)]
In [87]: [(i,j) for i,j in zip(values, counts)]
Out[87]: [(0, 6), (1, 4)]
So it pairs the nth element of values with the nth element of counts
Iterating over the [ret] list does something entirely different, or rather it does nothing. Compare with Out[83]:
In [88]: [(i,j) for i,j in [ret]]
Out[88]: [(array([0, 1], dtype=uint8), array([6, 4]))]
I think of list(zip(*arg)) as a list version of transpose:
In [90]: np.transpose(ret)
Out[90]:
array([[0, 6],
[1, 4]])
In [91]: [(i,j) for i,j in np.transpose(ret)]
Out[91]: [(0, 6), (1, 4)]
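The same one-line pattern can be sketched in plain Python without NumPy. The helper values_and_counts below is hypothetical and only mimics the "function returning two parallel sequences" shape of np.unique(a, return_counts=True).

```python
def values_and_counts(items):
    """Hypothetical helper: return two parallel lists (sorted unique values, their counts)."""
    values = sorted(set(items))
    counts = [items.count(v) for v in values]
    return values, counts


hist = [0, 0, 0, 0]

# zip(*f(...)) unpacks the returned tuple into separate arguments for zip,
# so each loop step receives one (value, count) pair.
for u, c in zip(*values_and_counts([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])):
    hist[u] += c

print(hist)  # [6, 4, 0, 0]
```

Without the star, zip receives a single argument (the tuple), so each step yields a 1-tuple and the two-name unpacking fails.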

python find the top N weighted edges regardless of weight

I am looking for a way to find the 5 highest-weighted edges of a node. Is there a way to specify that I want exactly the top 5 edges without a specific threshold value (i.e. something that works for any weighted graph)?
You could sort the edges by weight and build a dictionary that maps each node to its edges, sorted by weight in non-increasing order.
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for u,v in sorted(G.edges(), key=lambda x: G.get_edge_data(x[0], x[1])["weight"], reverse=True):
...     res[u].append((u,v))
...     res[v].append((u,v))
...
Then, given a node (e.g., 0), you could get the top N (e.g., 5) weighted edges as
>>> res[0][:5]
[(0, 7), (0, 2), (0, 6), (0, 1), (0, 3)]
If you only need to do it for a node (e.g., 0), you can directly do:
>>> sorted_edges_u = sorted(G.edges(0), key=lambda x: G.get_edge_data(x[0], x[1])["weight"], reverse=True)
>>> sorted_edges_u[:5]
[(0, 7), (0, 2), (0, 6), (0, 1), (0, 3)]
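If only the top N are needed, a full sort is not required. The sketch below swaps in heapq.nlargest from the standard library and works on plain (u, v, weight) triples; the edge list and the helper name top_n_edges are made up for illustration. In networkx, G.edges(node, data="weight") yields triples of this shape.

```python
import heapq

# Hypothetical (u, v, weight) triples for the edges touching node 0.
edges_of_node_0 = [
    (0, 1, 0.4), (0, 2, 0.9), (0, 3, 0.1), (0, 5, 0.2),
    (0, 6, 0.7), (0, 7, 1.5), (0, 8, 0.05),
]


def top_n_edges(weighted_edges, n=5):
    """Return the n heaviest edges, heaviest first, with no threshold value."""
    return heapq.nlargest(n, weighted_edges, key=lambda edge: edge[2])


top5 = top_n_edges(edges_of_node_0)
print([(u, v) for u, v, _ in top5])
# [(0, 7), (0, 2), (0, 6), (0, 1), (0, 5)]
```

If a node has fewer than n edges, nlargest simply returns all of them.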

List of list to get element whose values greater than 3

I have 2 lists, each of size 250000. I want to iterate through the lists and return the values that are greater than 3.
For example:
import itertools
from array import array
import numpy as np
input = (np.array([list([8,1]), list([2,3,4]), list([5,3])],dtype=object), np.array([1,0,0,0,1,1,1]))
X = input[0]
y = input[1]
res = [ u for s in X for u in zip(y,s) ]
res
I don't get the expected output.
Actual res : [(1, 8), (0, 1), (1, 2), (0, 3), (0, 4), (1, 5), (0, 3)]
Expected output 1 : [(8,1), (1,0), (2, 0), (3, 0), (4, 1), (5, 1), (3, 1)]
Expected output 2 : [(8,1), (4, 1), (5, 1)] ---> for greater than 3
I took references from Stack Overflow and tried itertools as well.
Using NumPy to store lists of non-uniform lengths creates a whole lot of issues, like the ones you are seeing. If it were an array of integers, you could simply do
X[X > 3]
but since it is an array of lists, you have to jump through all sorts of hoops to get what you want, and basically lose all the advantages of using NumPy in the first place. You could just as well use lists of lists and skip NumPy altogether.
As an alternative I would recommend using Pandas or something else more suitable than NumPy:
import pandas as pd
df = pd.DataFrame({
    'group': [0, 0, 1, 1, 1, 2, 2],
    'data': [8, 1, 2, 3, 4, 5, 4],
    'flag': [1, 0, 0, 0, 1, 1, 1],
})
df[df['data'] > 3]
# group data flag
# 0 0 8 1
# 4 1 4 1
# 5 2 5 1
# 6 2 4 1
Use filter.
For example:
input = [1, 3, 2, 5, 6, 7, 8, 22]
# result contains the even numbers of the list
result = list(filter(lambda x: x % 2 == 0, input))
This should give you result = [2, 6, 8, 22]. In Python 3, filter returns an iterator, hence the list() call.
I'm not sure I quite understand what you're trying to do, but filter is probably a good way.
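For the data in the question, a plain-Python sketch that yields both expected outputs might look like this. It assumes X is kept as a list of lists and that y has one flag per flattened value; the names flat, pairs and above_3 are illustrative.

```python
from itertools import chain

X = [[8, 1], [2, 3, 4], [5, 3]]
y = [1, 0, 0, 0, 1, 1, 1]

# Flatten the nested lists so each value lines up with its flag in y.
flat = list(chain.from_iterable(X))

# Expected output 1: every value paired with its flag.
pairs = list(zip(flat, y))
print(pairs)
# [(8, 1), (1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (3, 1)]

# Expected output 2: keep only the pairs whose value is greater than 3.
above_3 = [(value, flag) for value, flag in pairs if value > 3]
print(above_3)
# [(8, 1), (4, 1), (5, 1)]
```

The original attempt restarted zip(y, s) for every sublist, so each sublist was paired with the beginning of y again, with the flag placed first.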

Stacking numpy array with offset

I'm trying to locate some values into an np.array with searchsorted and I would like to get the resulting array stacked with itself modulo an offset.
I can do this with:
import numpy as np
a = np.array([(1, 3.5), (1, 2.1), (1, 5.8), (1, 0.)])
b = np.arange(0.5, 5.5, 1.)
c = np.searchsorted(b, a[:, 1])
d = np.column_stack((c, c + 1))
but I'd like to do it more directly, something similar to:
c = np.column_stack((np.searchsorted(b, a[:, 1]), np.searchsorted(b, a[:, 1]) + 1))
without repeating the call to np.searchsorted.
For the example above, the result should be:
[[3 4]
[2 3]
[5 6]
[0 1]]
Any idea?
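One possible way to avoid the repeated call is NumPy broadcasting: adding a row vector of offsets to the search result reshaped as a column. This is only a sketch under the assumption that the offsets are small fixed integers such as (0, 1); the name offsets is illustrative.

```python
import numpy as np

a = np.array([(1, 3.5), (1, 2.1), (1, 5.8), (1, 0.)])
b = np.arange(0.5, 5.5, 1.)

# Offsets to stack side by side; (0, 1) reproduces column_stack((c, c + 1)).
offsets = np.arange(2)

# searchsorted is called once; [:, None] makes a column of shape (4, 1)
# and broadcasting against offsets of shape (2,) gives shape (4, 2).
d = np.searchsorted(b, a[:, 1])[:, None] + offsets

print(d)
# [[3 4]
#  [2 3]
#  [5 6]
#  [0 1]]
```

Changing offsets, for example to np.arange(3), would stack more shifted copies without further calls.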

Get degree of each nodes in a graph by Networkx in python

Suppose I have a data set like below that shows an undirected graph:
1 2
1 3
1 4
3 5
3 6
7 8
8 9
10 11
I have a Python script like this:
for s in ActorGraph.degree():
    print(s)
The result is a dictionary whose keys are node names and whose values are the degrees of the nodes:
('9', 1)
('5', 1)
('11', 1)
('8', 2)
('6', 1)
('4', 1)
('10', 1)
('7', 1)
('2', 1)
('3', 3)
('1', 3)
The networkx documentation suggests using values() to get the node degrees.
Now I would like to have just the degrees of the nodes. I use this part of the script, but it doesn't work and says the object has no attribute 'values':
for s in ActorGraph.degree():
    print(s.values())
How can I do it?
You are using version 2.0 of networkx, which changed G.degree() from returning a dict to returning a dict-like (but not dict) DegreeView. See this guide.
To have the degrees in a list you can use a list-comprehension:
degrees = [val for (node, val) in G.degree()]
I'd like to add the following: if you're initializing the undirected graph with nx.Graph() and adding the edges afterwards, beware that networkx doesn't guarantee that the order of nodes will be preserved. This also applies to degree(). It means that if you use the list comprehension approach and then try to access a degree by list index, the indexes may not correspond to the right nodes. If you'd like them to correspond, you can instead do:
degrees = [val for (node, val) in sorted(G.degree(), key=lambda pair: pair[0])]
Here's a simple example to illustrate this:
>>> edges = [(0, 1), (0, 3), (0, 5), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (2, 5)]
>>> g = nx.Graph()
>>> g.add_edges_from(edges)
>>> print(g.degree())
[(0, 3), (1, 4), (3, 3), (5, 2), (2, 4), (4, 2)]
>>> print([val for (node, val) in g.degree()])
[3, 4, 3, 2, 4, 2]
>>> print([val for (node, val) in sorted(g.degree(), key=lambda pair: pair[0])])
[3, 4, 4, 3, 2, 2]
You can also use a dict comprehension to get an actual dictionary:
degrees = {node:val for (node, val) in G.degree()}
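Applied to the edge list from the question, a small sketch could look like the following. It assumes networkx 2.x or later and integer node names (the question's output shows string names); degree_of and only_degrees are illustrative names.

```python
import networkx as nx

# Edge list from the question, with integer node names as an assumption.
edges = [(1, 2), (1, 3), (1, 4), (3, 5), (3, 6), (7, 8), (8, 9), (10, 11)]

G = nx.Graph()
G.add_edges_from(edges)

# dict() over the DegreeView gives a real dictionary keyed by node,
# so lookups do not depend on node ordering.
degree_of = dict(G.degree())
print(degree_of[3])  # 3

# Iterating the DegreeView yields (node, degree) pairs.
only_degrees = [deg for _, deg in G.degree()]
print(sorted(only_degrees))
```

Looking degrees up by node through the dictionary sidesteps the ordering concern raised above.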
