python3: normalize matrix of transition probabilities

I have Python code partially borrowed from Generating Markov transition matrix in Python:
# xstates is a dictionary
# n is the matrix size
def prob(xstates, n):
    # we want to do smoothing, so create matrix of all 1s
    M = [[1] * n for _ in range(n)]
    # populate matrix by (row, column)
    for key, val in xstates.items():
        (row, col) = key
        M[row][col] = val
    # and finally calculate probabilities
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f/s for f in row]
    return M
xstates here comes in the form of a dictionary, e.g.:
{(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (3, 2): 5, (2, 3): 5, (6, 2): 4, (5, 6): 4, (4, 2): 2, (0, 2): 1}
where (1, 2) means state 1 transitions to state 2, and similarly for the other keys.
This function generates the matrix of transition probabilities; the sum of all elements in a row is 1. Now I need to normalize the values. How would I do that? Can I do it with the numpy library?

import numpy as np

M = np.random.random([3, 2])
print(M)
# normalize so each row sums to 1
M = M / M.sum(axis=1)[:, np.newaxis]
print(M)
# normalize so each column sums to 1
M = M / M.sum(axis=0)[np.newaxis, :]
print(M)
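Putting the two together, a minimal sketch of the question's function in numpy form (prob_np is a made-up name; it assumes the same xstates dictionary and n = 7, the largest state index plus one):

import numpy as np

def prob_np(xstates, n):
    # start from all ones for smoothing, as in the original prob()
    M = np.ones((n, n), dtype=np.float64)
    for (row, col), val in xstates.items():
        M[row, col] = val
    # row-normalize: every row sums to 1
    return M / M.sum(axis=1)[:, np.newaxis]

xstates = {(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (3, 2): 5,
           (2, 3): 5, (6, 2): 4, (5, 6): 4, (4, 2): 2, (0, 2): 1}
print(prob_np(xstates, 7).sum(axis=1))  # every entry is 1.0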

Related

Changing cells in a 4D numpy matrix subject to conditions on axis indexes

Suppose I have a 4D numpy array A with indexes i, j, k, l for the four dimensions, say 50 x 40 x 30 x 20. Also suppose I have some other list B.
How can I set all cells in A that satisfy some condition to 0? Is there a way to do it efficiently, without loops (i.e. with vectorization)?
Example condition: all cells whose 3rd-dimension index k satisfies B[k] == x.
For instance, if we have the 2D matrix A = [[1,2],[3,4]] and B = [7,8], then for the 1st dimension of A (i.e. rows), I want to zero out all cells whose index i along that dimension satisfies B[i] == 7. In this case, A becomes
A = [[0,0],[3,4]].
You can specify boolean arrays for specific axes:
import numpy as np

i, j, k, l = 50, 40, 30, 20
a = np.random.random((i, j, k, l))
b_k = np.random.random(k)
b_j = np.random.random(j)

# boolean mask along axis k (axes are i, j, k, l)
a[:, :, b_k < 0.5, :] = 0

# You can also combine conditions along different axes. Note that passing
# several boolean masks at once pairs their True positions elementwise and
# requires equal counts; np.ix_ takes the full cross product instead:
a[np.ix_(np.arange(i), b_j > 0.5, b_k < 0.5, np.arange(l))] = 0

# Or work with the index explicitly
condition_k = np.arange(k) % 3 == 0  # is the index divisible by 3?
a[:, :, condition_k, :] = 0
To work with the example you have given:
a = np.array([[1, 2],
              [3, 4]])
b = np.array([7, 8])

# boolean mask along axis i (rows)
a[b == 7, :] = 0
# array([[0, 0],
#        [3, 4]])
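Mapping this back to the 4D question, a minimal sketch (B and x are made up here, since the question leaves them abstract):

import numpy as np

i, j, k, l = 50, 40, 30, 20
A = np.random.random((i, j, k, l))
B = np.random.randint(0, 5, size=k)  # hypothetical array with one entry per k-index
x = 3

# zero every cell whose third-dimension index k satisfies B[k] == x
A[:, :, B == x, :] = 0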
Does the following help?
import numpy as np

A = np.arange(16, dtype='float64').reshape(2, 2, 2, 2)
A[A == 2] = 3.14
I'm replacing the entry equal to 2 with 3.14; you can set it to some other value.

Detect ranges of integers in a list in python

I am trying to write a function "detect_range" which detects ranges of integers in a list, say:
a = [2, 4, 5, 6, 7, 8, 10, 12, 13]
import itertools

def detect_range(L):
    for i, j in itertools.groupby(enumerate(L), lambda x: x[1] - x[0]):
        j = list(j)
        yield j[0][1], j[-1][1]

print(list(detect_range(a)))
It prints:
[(2, 2), (4, 8), (10, 10), (12, 13)]
However, I do not want single integers like 2 and 10 to be printed as pairs, but on their own. So the output I am looking for from this code is:
[2, (4, 9), 10, (12, 14)]
If you insist on using itertools, you should add an if-statement to differentiate between the two cases.
To make the code more readable, I added the temporary variables start and length.
import itertools

def detect_range(L):
    for i, j in itertools.groupby(enumerate(L), lambda x: x[1] - x[0]):
        j = list(j)
        start = j[0][1]
        length = len(j)
        if length == 1:
            yield start
        else:
            yield (start, start + length)

print(list(detect_range(a)))
[2, (4, 9), 10, (12, 14)]
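For reference, the groupby key works because enumerate pairs each value with its position, so x[1] - x[0] stays constant exactly along a run of consecutive integers. A quick check with the question's list:

import itertools

a = [2, 4, 5, 6, 7, 8, 10, 12, 13]
for key, group in itertools.groupby(enumerate(a), lambda x: x[1] - x[0]):
    print(key, list(group))
# 2 [(0, 2)]
# 3 [(1, 4), (2, 5), (3, 6), (4, 7), (5, 8)]
# 4 [(6, 10)]
# 5 [(7, 12), (8, 13)]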
Otherwise, you could scrap itertools and simply implement your own algorithm:
def detect_range(input_list):
    start = None
    length = 0
    for elem in input_list:
        # first element
        if start is None:
            start = elem
            length = 1
            continue
        # element in a row, just count up
        if elem == start + length:
            length += 1
            continue
        # otherwise, yield the run that just ended
        if length == 1:
            yield start
        else:
            yield (start, start + length)
        start = elem
        length = 1
    # yield the final run
    if length == 1:
        yield start
    else:
        yield (start, start + length)

print(list(detect_range(a)))
[2, (4, 9), 10, (12, 14)]
Change it to:
if j[0][1] == j[-1][1]:
    yield j[0][1]
else:
    yield j[0][1], j[-1][1]
You can change the yield statement to have a condition:
def detect_range(L):
    for i, j in itertools.groupby(enumerate(L), lambda x: x[1] - x[0]):
        j = list(j)
        yield (j[0][1], j[-1][1]) if j[0][1] != j[-1][1] else j[0][1]
Output:
[2, (4, 8), 10, (12, 13)]
Note that the expected output in the question differs from this output in the range end values (apart from the single 2 and 10); this code assumes that was a typo and yields inclusive ends, e.g. (4, 8) instead of (4, 9).

python: create numpy array from dictionary, where keys are coordinates

I have a dictionary of the following form:
{(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (2, 6): 5, (6, 2): 5, (4, 2): 4, (3, 4): 4, (5, 2): 2, (0, 2): 1}
where each key represents coordinates into the matrix, and the value is the actual value to be set at those coordinates.
At the moment I create and populate the matrix in the following way:
import numpy as np

def build_matrix(data, n):
    M = np.zeros(shape=(n, n), dtype=np.float64)
    for key, val in data.items():
        (row, col) = key
        M[row, col] = val
    return M
Is there a way to do this more concisely, using numpy's API? I looked at np.array() and np.asarray(), but neither seems to fit my needs.
The shortest version, given n and the input dictionary d itself, seems to be:
M = np.zeros(shape=(n, n), dtype=np.float64)
M[tuple(zip(*d.keys()))] = list(d.values())
That tuple(zip(*d.keys())) is basically transposing the nested items and packing them into tuples, as needed for integer-array indexing into NumPy arrays.
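For example, with a shortened version of the question's dictionary, the transpose step yields one tuple of row indices and one of column indices (dictionary insertion order is preserved in Python 3.7+):

d = {(2, 2): 387, (1, 2): 25, (0, 1): 15}
print(tuple(zip(*d.keys())))   # ((2, 1, 0), (2, 2, 1))
print(list(d.values()))        # [387, 25, 15]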
Generic case
To handle the generic case, when n is not given and must be derived from the extents of the keys, along with the dtype from the dictionary values, it would be:
idx_ar = np.array(list(d.keys()))
out_shp = idx_ar.max(0)+1
data = np.array(list(d.values()))
M = np.zeros(shape=out_shp, dtype=data.dtype)
M[tuple(idx_ar.T)] = data
If you don't mind using scipy, what you've basically created is a sparse dok_matrix (Dictionary of Keys)
from scipy.sparse import dok_matrix
out = dok_matrix((n, n))
out.update(data)
out = out.todense()
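For completeness, a runnable sketch with the question's dictionary (assuming n = 7, since the largest index is 6; some scipy versions disallow dok_matrix.update, in which case plain item assignment does the same job):

from scipy.sparse import dok_matrix

data = {(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (2, 6): 5,
        (6, 2): 5, (4, 2): 4, (3, 4): 4, (5, 2): 2, (0, 2): 1}
n = 7
out = dok_matrix((n, n))
for (row, col), val in data.items():
    out[row, col] = val   # item assignment works across scipy versions
M = out.toarray()
print(M[2, 2])   # 387.0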

Graph reduction

I have been working on a piece of code to reduce a graph. The problem is that there are some branches that I want to remove. Once I remove a branch, I can either merge the nodes or not, depending on the number of paths between the nodes the branch joined.
(The original post illustrated the desired reduction with before-and-after figures of the graph.)
The code I have is the following:
from networkx import DiGraph, all_simple_paths, draw
from matplotlib import pyplot as plt

# data preparation
branches = [(2, 1), (3, 2), (4, 3), (4, 13), (7, 6), (6, 5), (5, 4),
            (8, 7), (9, 8), (9, 10), (10, 11), (11, 12), (12, 1), (13, 9)]
branches_to_remove_idx = [11, 10, 9, 8, 6, 5, 3, 2, 0]
ft_dict = dict()
graph = DiGraph()
for i, br in enumerate(branches):
    graph.add_edge(br[0], br[1])
    ft_dict[i] = (br[0], br[1])

# Processing -----------------------------------------------------
for idx in branches_to_remove_idx:
    # get the nodes that define the edge to remove
    f, t = ft_dict[idx]
    # get the number of paths from 'f' to 't'
    n_paths = len(list(all_simple_paths(graph, f, t)))
    if n_paths == 1:
        # remove the branch and merge the nodes 'f' and 't'
        #
        # This is what I have no clue how to do
        #
        pass
    else:
        # remove the branch and that's it
        graph.remove_edge(f, t)
        print('Simple removal of', f, t)
# -----------------------------------------------------------------

draw(graph, with_labels=True)
plt.show()
I feel that there should be a simpler direct way to obtain the last figure from the first, given the branch indices, but I have no clue.
I think this is more or less what you want. I am merging all nodes that are in chains (connected nodes of degree 2) into one hypernode. I return the new graph and a dictionary mapping each hypernode to the nodes it contracted.
import networkx as nx

def contract(g):
    """
    Contract chains of neighbouring vertices with degree 2 into one hypernode.

    Arguments:
    ----------
    g -- networkx.Graph instance

    Returns:
    --------
    h -- networkx.Graph instance
        the contracted graph
    hypernode_to_nodes -- dict: int hypernode -> [v1, v2, ..., vn]
        dictionary mapping hypernodes to nodes
    """
    # create subgraph of all nodes with degree 2
    is_chain = [node for node, degree in g.degree() if degree == 2]
    chains = g.subgraph(is_chain)

    # contract connected components (which should be chains of variable length) into a single node
    components = [chains.subgraph(c) for c in nx.connected_components(chains)]

    hypernode = max(g.nodes()) + 1
    hypernodes = []
    hyperedges = []
    hypernode_to_nodes = dict()
    false_alarms = []
    for component in components:
        if component.number_of_nodes() > 1:
            hypernodes.append(hypernode)
            vs = list(component.nodes())
            hypernode_to_nodes[hypernode] = vs

            # create new edges from the neighbours of the chain ends to the hypernode
            component_edges = list(component.edges())
            for v, w in [e for e in g.edges(vs) if not ((e in component_edges) or (e[::-1] in component_edges))]:
                if v in component:
                    hyperedges.append([hypernode, w])
                else:
                    hyperedges.append([v, hypernode])

            hypernode += 1
        else:
            # nothing to collapse as there is only a single node in the component
            false_alarms.extend(component.nodes())

    # initialise the new graph with all other nodes; .copy() makes the subgraph view mutable
    not_chain = [node for node in g.nodes() if node not in is_chain]
    h = g.subgraph(not_chain + false_alarms).copy()
    h.add_nodes_from(hypernodes)
    h.add_edges_from(hyperedges)

    return h, hypernode_to_nodes
edges = [(2, 1), (3, 2), (4, 3), (4, 13), (7, 6), (6, 5), (5, 4),
         (8, 7), (9, 8), (9, 10), (10, 11), (11, 12), (12, 1), (13, 9)]
g = nx.Graph(edges)
h, hypernode_to_nodes = contract(g)

print("Edges in contracted graph:")
print(h.edges())
print('')
print("Hypernodes:")
for hypernode, nodes in hypernode_to_nodes.items():
    print("{} : {}".format(hypernode, nodes))
This returns for your example:
Edges in contracted graph:
[(9, 13), (9, 14), (9, 15), (4, 13), (4, 14), (4, 15)]
Hypernodes:
14 : [1, 2, 3, 10, 11, 12]
15 : [8, 5, 6, 7]
I built this function that scales much better and runs faster with larger graphs:
from collections import Counter
from functools import reduce
import networkx as nx
import pandas as pd

def add_dicts(vector):
    # merge a sequence of dicts by summing their values key-wise
    l = list(map(lambda x: Counter(x), vector))
    return reduce(lambda x, y: x + y, l)

def consolidate_dup_edges(g):
    # collapse parallel edges, combining their weight dicts
    edges = pd.DataFrame(g.edges(data=True), columns=['start', 'end', 'weight'])
    edges_consolidated = edges.groupby(['start', 'end']).agg({'weight': add_dicts}).reset_index()
    return nx.from_edgelist(list(edges_consolidated.itertuples(index=False, name=None)))

def graph_reduce(g):
    g = consolidate_dup_edges(g)
    # degree-2 nodes are intermediate: bridge their two neighbours directly
    is_deg2 = [node for node, degree in g.degree() if degree == 2]
    is_deg2_descendents = list(map(lambda x: tuple(nx.descendants_at_distance(g, x, 1)), is_deg2))
    edges_on_deg2 = list(map(lambda x: list(map(lambda e: e[2], g.edges(x, data=True))), is_deg2))
    edges_on_deg2 = list(map(lambda x: add_dicts(x), edges_on_deg2))
    new_edges = list(zip(is_deg2_descendents, edges_on_deg2))
    new_edges = [(a, b, c) for (a, b), c in new_edges]
    g.remove_nodes_from(is_deg2)
    g.add_edges_from(new_edges)
    g.remove_edges_from(nx.selfloop_edges(g))
    # drop dangling nodes
    g.remove_nodes_from([node for node, degree in g.degree() if degree <= 1])
    return consolidate_dup_edges(g)
The graph_reduce function removes nodes with degree 1, removes intermediate nodes with degree 2, and reconnects the nodes that each degree-2 node was connected to. The best results come from running it iteratively until the number of nodes plateaus at a stable value. Note that this only works on undirected graphs.
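A hedged sketch of that iteration, assuming an undirected nx.Graph g whose edges carry dict-valued weight attributes as the functions above expect:

# keep reducing until the node count stops shrinking
prev = None
while prev != g.number_of_nodes():
    prev = g.number_of_nodes()
    g = graph_reduce(g)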

Accessing elements in an sklearn sparse array

I'm trying to write a function for a Euclidean minimum spanning tree. Where I have run into trouble is finding the K nearest neighbors: as you can see, I call the function that returns a sparse array containing the indexes of each point's nearest neighbor and the distance to it; however, I cannot access the elements as I assumed I would:
for p1, p2, w in A:
    # do things
as this returns an error that A only yields 1 item (not 3). Is there a way to access the elements of each entry in this data set to form edges with the distance as weight? I am pretty new to Python and still trying to learn all the finer details of the language.
from sklearn.neighbors import kneighbors_graph
from kruskalsalgorithm import *
import networkx as nx

def EMST(inlist):
    graph = nx.Graph()
    for a, b in inlist:
        graph.add_node((a, b))
    print("nodes = ", graph.nodes())
    A = kneighbors_graph(graph.nodes(), 1, mode='distance', metric='euclidean',
                         include_self=False, n_jobs=-1)
    A.toarray()
This is how I am testing my function:
mylist = [[2,3],[4,2],[9,4],[3,1]]
EMST(mylist)
and my output is:
nodes = [(2, 3), (4, 2), (9, 4), (3, 1)]
(0, 1) 2.2360679775
(1, 3) 1.41421356237
(2, 1) 5.38516480713
(3, 1) 1.41421356237
You did not really explain what exactly you want to do; there are a lot of possibilities imaginable.
But in general you should follow the scipy.sparse docs. In your case, sklearn's function guarantees CSR format.
One potential usage is something like:
from scipy import sparse as sp
import numpy as np
np.random.seed(1)
mat = sp.random(4,4, density=0.4)
print(mat)
I, J, V = sp.find(mat)
print(I)
print(J)
print(V)
Output:
(3, 0) 0.846310916686
(1, 3) 0.313273516932
(3, 1) 0.524548159573
(2, 0) 0.44345289378
(2, 1) 0.22957721373
(2, 2) 0.534413908947
[2 3 2 3 2 1]
[0 0 1 1 2 3]
[ 0.44345289 0.84631092 0.22957721 0.52454816 0.53441391 0.31327352]
Of course you could do:
for a, b, w in zip(I, J, V):
    print(a, b, w)
which prints:
2 0 0.44345289378
3 0 0.846310916686
2 1 0.22957721373
3 1 0.524548159573
2 2 0.534413908947
1 3 0.313273516932
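Since the original goal was to build edges weighted by distance, one hedged sketch is to feed these triples straight into networkx (in the question's setting, I, J, V would come from sp.find applied to the kneighbors_graph output, with points identified by their integer index):

import networkx as nx

g = nx.Graph()
# I and J index the points; V holds the corresponding distances
g.add_weighted_edges_from(zip(I, J, V))
print(g.edges(data=True))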
I can recreate your display with:
In [65]: from scipy import sparse
In [72]: row = np.array([0,1,2,3])
In [73]: col = np.array([1,3,1,1])
In [74]: data = np.array([5,2,29,2])**.5
In [75]: M = sparse.csr_matrix((data, (row, col)), shape=(4,4))
In [76]: M
Out[76]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>
In [77]: print(M)
(0, 1) 2.23606797749979
(1, 3) 1.4142135623730951
(2, 1) 5.385164807134504
(3, 1) 1.4142135623730951
In [78]: M.A    # same as M.toarray()
Out[78]:
array([[0.        , 2.23606798, 0.        , 0.        ],
       [0.        , 0.        , 0.        , 1.41421356],
       [0.        , 5.38516481, 0.        , 0.        ],
       [0.        , 1.41421356, 0.        , 0.        ]])
The points are pts = [(2, 3), (4, 2), (9, 4), (3, 1)]. The distance from pts[0] to pts[1] is sqrt(5), etc.
Sparse coo format gives access to the coordinates and distances. sparse.find also produces these arrays.
In [83]: Mc = M.tocoo()
In [84]: Mc.row
Out[84]: array([0, 1, 2, 3], dtype=int32)
In [85]: Mc.col
Out[85]: array([1, 3, 1, 1], dtype=int32)
In [86]: Mc.data
Out[86]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
Checking that the points and the matrix match:
In [95]: pts = np.array([(2, 3), (4, 2), (9, 4), (3, 1)])
In [96]: pts
Out[96]:
array([[2, 3],
       [4, 2],
       [9, 4],
       [3, 1]])
In [97]: for r,c,d in zip(*sparse.find(M)):
    ...:     print(((pts[r]-pts[c])**2).sum()**.5)
    ...:
2.23606797749979
5.385164807134504
1.4142135623730951
1.4142135623730951
Or getting all closest distances at once:
In [107]: np.sqrt(((pts[row,:]-pts[col,:])**2).sum(1))
Out[107]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [110]: np.linalg.norm(pts[row,:]-pts[col,:],axis=1)
Out[110]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
A 'brute force' minimum distance calc:
All pairwise distances:
In [112]: dist = np.linalg.norm(pts[None,:,:]-pts[:,None,:],axis=2)
In [113]: dist
Out[113]:
array([[0.        , 2.23606798, 7.07106781, 2.23606798],
       [2.23606798, 0.        , 5.38516481, 1.41421356],
       [7.07106781, 5.38516481, 0.        , 6.70820393],
       [2.23606798, 1.41421356, 6.70820393, 0.        ]])
(compare this with Out[78])
'Blank' out the diagonal so a point is never its own nearest neighbour:
In [114]: D = dist + np.eye(4)*100
Minimum distance and coordinates (by row):
In [116]: np.min(D, axis=1)
Out[116]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [117]: np.argmin(D, axis=1)
Out[117]: array([1, 3, 1, 1], dtype=int32)
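As a closing sketch (not from the answers above): with the full pairwise distance matrix, one could skip the nearest-neighbor step entirely and let networkx compute the Euclidean minimum spanning tree from a complete weighted graph:

import networkx as nx
import numpy as np

pts = np.array([(2, 3), (4, 2), (9, 4), (3, 1)])
dist = np.linalg.norm(pts[None, :, :] - pts[:, None, :], axis=2)

g = nx.Graph()
for r in range(len(pts)):
    for c in range(r + 1, len(pts)):
        g.add_edge(r, c, weight=dist[r, c])

mst = nx.minimum_spanning_tree(g)
print(sorted(mst.edges(data='weight')))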
