I have a dictionary of the following form:
{(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (2, 6): 5, (6, 2): 5, (4, 2): 4, (3, 4): 4, (5, 2): 2, (0, 2): 1}
where each key gives the (row, column) coordinates in the matrix and each value is the value to store at those coordinates.
At the moment I create and populate the matrix in the following way:
import numpy as np
def build_matrix(data, n):
    M = np.zeros(shape=(n, n), dtype=np.float64)
    for key, val in data.items():
        (row, col) = key
        M[row][col] = val
    return M
Is there a shorter way to do this using numpy's API? I looked at np.array() and np.asarray(), but neither seems to fit my needs.
The shortest version, given n and the input dictionary d, seems to be:
M = np.zeros(shape=(n, n), dtype=np.float64)
M[tuple(zip(*d.keys()))] = list(d.values())
The expression tuple(zip(*d.keys())) transposes the (row, col) keys into one tuple of row indices and one tuple of column indices, which is the form NumPy needs for integer indexing into arrays. More info on transposing nested items.
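To make the transposition step concrete, here is a small sketch on a three-entry dictionary. The dictionary and the 3x3 shape are illustrative choices, not part of the original question.

```python
import numpy as np

# illustrative subset of the dictionary from the question
d = {(2, 2): 387, (1, 2): 25, (0, 1): 15}

# zip(*keys) turns the (row, col) pairs into one tuple of rows and one of columns
rows_cols = tuple(zip(*d.keys()))
print(rows_cols)  # ((2, 1, 0), (2, 2, 1))

M = np.zeros((3, 3), dtype=np.float64)
# a tuple of index sequences selects one element per (row, col) pair
M[rows_cols] = list(d.values())
print(M)
```

The values line up with the indices because d.keys() and d.values() iterate in the same order.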
Generic case
To handle the generic case, where n is not given and the shape has to be derived from the extents of the keys, along with the dtype from the dictionary values, it would be:
idx_ar = np.array(list(d.keys()))
out_shp = idx_ar.max(0)+1
data = np.array(list(d.values()))
M = np.zeros(shape=out_shp, dtype=data.dtype)
M[tuple(idx_ar.T)] = data
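If you don't mind using scipy, what you've basically created is a sparse dok_matrix (Dictionary of Keys):
from scipy.sparse import dok_matrix
out = dok_matrix((n, n))
# dok_matrix.update() raises NotImplementedError in many SciPy versions,
# so assign the entries one by one
for (row, col), val in data.items():
    out[row, col] = val
out = out.todense()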
Related
I am looking for a way to find the 5 highest-weighted edges of a node. Is there a way to specify that I want exactly the top 5 edges without a specific threshold value (i.e., something that works for any weighted graph)?
You could consider the edges sorted by weight and build a dictionary that maps a node with its edges, sorted by weight in a non-increasing way.
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for u,v in sorted(G.edges(), key=lambda x: G.get_edge_data(x[0], x[1])["weight"], reverse=True):
...     res[u].append((u,v))
...     res[v].append((u,v))
...
Then, given a node (e.g., 0), you could get the top N (e.g., 5) weighted edges as
>>> res[0][:5]
[(0, 7), (0, 2), (0, 6), (0, 1), (0, 3)]
If you only need to do it for a node (e.g., 0), you can directly do:
>>> sorted_edges_u = sorted(G.edges(0), key=lambda x: G.get_edge_data(x[0], x[1])["weight"], reverse=True)
>>> sorted_edges_u[:5]
[(0, 7), (0, 2), (0, 6), (0, 1), (0, 3)]
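If only the top few edges are needed, a full sort can be avoided with heapq.nlargest. The sketch below works on plain (u, v, weight) triples. The sample weights and the helper name top_edges are made up for illustration. With NetworkX, such triples can be obtained from G.edges(0, data="weight").

```python
import heapq

# hypothetical sample data: (u, v, weight) triples for the edges incident to node 0
edges_of_0 = [(0, 1, 0.4), (0, 2, 0.9), (0, 3, 0.1),
              (0, 6, 0.7), (0, 7, 1.5), (0, 8, 0.05)]

def top_edges(weighted_edges, k=5):
    """Return the k heaviest (u, v) pairs, heaviest first."""
    best = heapq.nlargest(k, weighted_edges, key=lambda e: e[2])
    return [(u, v) for u, v, _ in best]

print(top_edges(edges_of_0))  # [(0, 7), (0, 2), (0, 6), (0, 1), (0, 3)]
```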
I have some Python code partially borrowed from Generating Markov transition matrix in Python:
# xstates is a dictionary
# n - is the matrix size
def prob(xstates, n):
    # we want to do smoothing, so create matrix of all 1s
    M = [[1] * n for _ in range(n)]
    # populate matrix by (row, column)
    for key, val in xstates.items():
        (row, col) = key
        M[row][col] = val
    # and finally calculate probabilities
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f/s for f in row]
    return M
Here xstates is a dictionary, e.g.:
{(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (3, 2): 5, (2, 3): 5, (6, 2): 4, (5, 6): 4, (4, 2): 2, (0, 2): 1}
where (1, 2) means that state 1 transitions to state 2, and similarly for the other keys.
This function generates the matrix of transition probabilities, in which the elements of each row sum to 1. Now I need to normalize the values. How would I do that? Can I do it with the numpy library?
import numpy as np
M = np.random.random([3,2])
print(M)
# normalize so that each row sums to 1
M = M / M.sum(axis=1)[:, np.newaxis]
print(M)
# normalize so that each column sums to 1
M = M / M.sum(axis=0)[np.newaxis,:]
print(M)
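Putting the question's smoothing step together with the row normalization above, a NumPy version of the whole function might look like the sketch below. The name transition_matrix and the tiny two-state input are illustrative assumptions. As in the original prob function, observed counts overwrite the initial ones rather than being added to them.

```python
import numpy as np

def transition_matrix(xstates, n):
    """Smoothed, row-normalized transition matrix from {(row, col): count}."""
    # start from ones, as in the question's code, so no row sums to zero
    M = np.ones((n, n), dtype=np.float64)
    for (row, col), count in xstates.items():
        M[row, col] = count  # overwrite, mirroring the original function
    # keepdims=True keeps the sums as an (n, 1) column, so each row is
    # divided by its own sum
    return M / M.sum(axis=1, keepdims=True)

P = transition_matrix({(0, 1): 3, (1, 0): 1, (1, 1): 1}, 2)
print(P)  # [[0.25 0.75]
          #  [0.5  0.5 ]]
```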
I have been working on a piece of code to reduce a graph. The problem is that there are some branches that I want to remove. Once I remove a branch, I may or may not merge the nodes, depending on the number of paths between the nodes the branch joined.
Maybe the following example illustrates what I want:
The code I have is the following:
from networkx import DiGraph, all_simple_paths, draw
from matplotlib import pyplot as plt
# data preparation
branches = [(2,1), (3,2), (4,3), (4,13), (7,6), (6,5), (5,4),
            (8,7), (9,8), (9,10), (10,11), (11,12), (12,1), (13,9)]
branches_to_remove_idx = [11, 10, 9, 8, 6, 5, 3, 2, 0]
ft_dict = dict()
graph = DiGraph()
for i, br in enumerate(branches):
    graph.add_edge(br[0], br[1])
    ft_dict[i] = (br[0], br[1])
# Processing -----------------------------------------------------
for idx in branches_to_remove_idx:
    # get the nodes that define the edge to remove
    f, t = ft_dict[idx]
    # get the number of paths from 'f' to 't'
    n_paths = len(list(all_simple_paths(graph, f, t)))
    if n_paths == 1:
        # remove branch and merge the nodes 'f' and 't'
        #
        # This is what I have no clue how to do
        #
        pass
    else:
        # remove the branch and that's it
        graph.remove_edge(f, t)
        print('Simple removal of', f, t)
# -----------------------------------------------------------------
draw(graph, with_labels=True)
plt.show()
I feel that there should be a simpler direct way to obtain the last figure from the first, given the branch indices, but I have no clue.
I think this is more or less what you want. I am merging all nodes that are in chains (connected nodes of degree 2) into one hypernode. I return the new graph and a dictionary mapping each hypernode to the contracted nodes.
import networkx as nx
def contract(g):
    """
    Contract chains of neighbouring vertices with degree 2 into one hypernode.
    Arguments:
    ----------
    g -- networkx.Graph instance
    Returns:
    --------
    h -- networkx.Graph instance
        the contracted graph
    hypernode_to_nodes -- dict: int hypernode -> [v1, v2, ..., vn]
        dictionary mapping hypernodes to nodes
    """
    # create subgraph of all nodes with degree 2
    # (g.degree() replaces degree_iter(), which was removed in NetworkX 2.x)
    is_chain = [node for node, degree in g.degree() if degree == 2]
    chains = g.subgraph(is_chain)
    # contract connected components (which should be chains of variable length) into single node
    # (connected_component_subgraphs was removed in NetworkX 2.4)
    components = [chains.subgraph(c) for c in nx.connected_components(chains)]
    hypernode = max(g.nodes()) + 1
    hypernodes = []
    hyperedges = []
    hypernode_to_nodes = dict()
    false_alarms = []
    for component in components:
        if component.number_of_nodes() > 1:
            hypernodes.append(hypernode)
            vs = [node for node in component.nodes()]
            hypernode_to_nodes[hypernode] = vs
            # create new edges from the neighbours of the chain ends to the hypernode
            component_edges = [e for e in component.edges()]
            for v, w in [e for e in g.edges(vs) if not ((e in component_edges) or (e[::-1] in component_edges))]:
                if v in component:
                    hyperedges.append([hypernode, w])
                else:
                    hyperedges.append([v, hypernode])
            hypernode += 1
        else:  # nothing to collapse as there is only a single node in component:
            false_alarms.extend([node for node in component.nodes()])
    # initialise new graph with all other nodes
    not_chain = [node for node in g.nodes() if node not in is_chain]
    # copy() is needed because subgraph views are read-only in NetworkX 2.x
    h = g.subgraph(not_chain + false_alarms).copy()
    h.add_nodes_from(hypernodes)
    h.add_edges_from(hyperedges)
    return h, hypernode_to_nodes
edges = [(2, 1),
         (3, 2),
         (4, 3),
         (4, 13),
         (7, 6),
         (6, 5),
         (5, 4),
         (8, 7),
         (9, 8),
         (9, 10),
         (10, 11),
         (11, 12),
         (12, 1),
         (13, 9)]
g = nx.Graph(edges)
h, hypernode_to_nodes = contract(g)
print("Edges in contracted graph:")
print(h.edges())
print('')
print("Hypernodes:")
for hypernode, nodes in hypernode_to_nodes.items():
    print("{} : {}".format(hypernode, nodes))
This returns for your example:
Edges in contracted graph:
[(9, 13), (9, 14), (9, 15), (4, 13), (4, 14), (4, 15)]
Hypernodes:
14 : [1, 2, 3, 10, 11, 12]
15 : [8, 5, 6, 7]
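For the single step the question leaves open (removing one branch and merging its two end nodes), a minimal sketch on a plain list of directed edges could look like this. The helper name merge_nodes and the three-edge input are illustrative assumptions. NetworkX also provides nx.contracted_nodes for merging two nodes of a graph.

```python
def merge_nodes(edges, keep, drop):
    """Remove the branch between keep and drop, then relabel drop as keep."""
    merged = []
    for u, v in edges:
        if {u, v} == {keep, drop}:
            continue  # this is the removed branch itself
        u = keep if u == drop else u
        v = keep if v == drop else v
        if (u, v) not in merged:  # avoid duplicate parallel edges
            merged.append((u, v))
    return merged

# hypothetical input: the first three branches from the question
branches = [(2, 1), (3, 2), (4, 3)]
print(merge_nodes(branches, 3, 2))  # [(3, 1), (4, 3)]
```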
I built this function that scales much better and runs faster with larger graphs:
from collections import Counter
from functools import reduce
import networkx as nx
import pandas as pd
def add_dicts(vector):
    # sum a sequence of attribute dicts key by key
    l = list(map(lambda x: Counter(x),vector))
    return reduce(lambda x,y:x+y,l)
def consolidate_dup_edges(g):
    edges = pd.DataFrame(g.edges(data=True),columns=['start','end','weight'])
    edges_consolidated = edges.groupby(['start','end']).agg({'weight':add_dicts}).reset_index()
    return nx.from_edgelist(list(edges_consolidated.itertuples(index=False,name=None)))
def graph_reduce(g):
    g = consolidate_dup_edges(g)
    is_deg2 = [node for node, degree in g.degree() if degree == 2]
    is_deg2_descendents =list(map(lambda x: tuple(nx.descendants_at_distance(g,x,1)),is_deg2))
    edges_on_deg2= list(map(lambda x: list(map(lambda x:x[2],g.edges(x,data=True))),is_deg2))
    edges_on_deg2= list(map(lambda x: add_dicts(x),edges_on_deg2))
    new_edges = list(zip(is_deg2_descendents,edges_on_deg2))
    new_edges = [(a,b,c) for (a,b),c in new_edges]
    g.remove_nodes_from(is_deg2)
    g.add_edges_from(new_edges)
    g.remove_edges_from(nx.selfloop_edges(g))
    g.remove_nodes_from([node for node, degree in g.degree() if degree <= 1])
    return consolidate_dup_edges(g)
The graph_reduce function removes nodes with degree 1, removes intermediate nodes with degree 2, and connects the two neighbours of each removed degree-2 node directly. The effect is greatest when the function is run iteratively until the number of nodes plateaus. This only works on undirected graphs.
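The "run until the node count plateaus" idea can be sketched independently of NetworkX. In the example below, a much simpler reduction pass (pruning nodes of degree 1 or less from a dict-of-sets adjacency structure) stands in for graph_reduce. Both helper names and the sample graph are illustrative assumptions.

```python
def prune_leaves(adj):
    """One reduction pass: drop every node that currently has degree <= 1."""
    keep = {n for n, nbrs in adj.items() if len(nbrs) > 1}
    return {n: adj[n] & keep for n in keep}

def reduce_until_stable(adj, step):
    """Apply `step` repeatedly until the number of nodes stops changing."""
    while True:
        reduced = step(adj)
        if len(reduced) == len(adj):
            return reduced
        adj = reduced

# hypothetical example: a triangle (1, 2, 3) with a two-node tail 3 - 4 - 5
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(reduce_until_stable(adj, prune_leaves))
```

A single pass only removes node 5. The second pass removes node 4, which has become a leaf, and the third pass confirms that the triangle is stable.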
I have to merge all the tuples that share at least one element with each other.
tups=[(1,2),(2,3),(8,9),(4,5),(15,12),(9,6),(7,8),(3,11),(1,15)]
The first tuple (1,2) should be merged with (2,3),(3,11),(1,15),(15,12), since each of these tuples shares an item with one of the preceding tuples. So the final output should be
lst1 = [1,2,3,11,12,15]
lst2=[6,7,8,9] since (8,9),(9,6) and (7,8) have matching elements
My code so far:
finlst=[]
for items in range(len(tups)):
    for resid in range(len(tups)):
        if(tups[items] != tups[resid] ):
            if(tups[items][0]==tups[resid][0] or tups[items][0]==tups[resid][1]):
                finlst.append(list(set(tups[items]+tups[resid])))
You could do it like this, using sets that are expanded with matching tuples:
tups = [(1, 2), (2, 3), (8, 9), (4, 5), (15, 12), (9, 6), (7, 8), (3, 11), (1, 15)]
groups = []
for t in tups:
    for group in groups:
        # find a group that has at least one element in common with the tuple
        if any(x in group for x in t):
            # extend the group with the items from the tuple
            group.update(t)
            # break from the group-loop as we don’t need to search any further
            break
    else:
        # otherwise (if the group-loop ended without being cancelled with `break`)
        # create a new group from the tuple
        groups.append(set(t))
# output
for group in groups:
    print(group)
{1, 2, 3, 11, 15}
{8, 9, 6, 7}
{4, 5}
{12, 15}
Since this solution iterates over the original tuple list once and in order, it does not work for inputs where a later tuple connects two groups that already exist: the tuple is added to the first matching group, and the second group (such as {12, 15} above) stays separate. For that, we could use the following solution instead, which uses fixed-point iteration to combine the groups for as long as that still works:
tups = [(1, 2), (3, 4), (1, 4)]
import itertools
groups = [set(t) for t in tups]
while True:
    for a, b in itertools.combinations(groups, 2):
        # if the groups can be merged
        if len(a & b):
            # construct new groups list
            groups = [g for g in groups if g != a and g != b]
            groups.append(a | b)
            # break the for loop and restart
            break
    else:
        # the for loop ended naturally, so no overlapping groups were found
        break
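The fixed-point loop restarts its pairwise scan after every merge, which can get slow for long lists. A different technique, swapped in here as an alternative and not taken from the answer above, is a disjoint-set (union-find) structure that handles each tuple once. The function name merge_tuples is an illustrative choice.

```python
def merge_tuples(tups):
    """Group items that are connected through any chain of shared tuples."""
    parent = {}

    def find(x):
        # return the representative of x's group, registering x if it is new
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for t in tups:
        first = find(t[0])
        for item in t[1:]:
            parent[find(item)] = first  # union the item's group into the first one

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(g) for g in groups.values())

tups = [(1, 2), (2, 3), (8, 9), (4, 5), (15, 12), (9, 6), (7, 8), (3, 11), (1, 15)]
print(merge_tuples(tups))  # [[1, 2, 3, 11, 12, 15], [4, 5], [6, 7, 8, 9]]
```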
I found a solution: this is really a graph-theory problem about connectivity (see Connectivity, graph theory).
We can use NetworkX for this; it's pretty much guaranteed to be correct:
def uniqueGroup(groups):
    import networkx
    from networkx.algorithms.components.connected import connected_components
    def to_graph(groups):
        G = networkx.Graph()
        for part in groups:
            # each sublist is a bunch of nodes
            G.add_nodes_from(part)
            # it also implies a number of edges:
            G.add_edges_from(to_edges(part))
        return G
    def to_edges(part):
        """
        treat `part` as a path and yield its edges
        to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)]
        """
        it = iter(part)
        last = next(it)
        for current in it:
            yield last, current
            last = current
    G = to_graph(groups)
    # connected_components returns a generator, so turn it into a list
    return list(connected_components(G))
Output:
tups = [(1, 2),(3,4),(1,4)]
uniqueGroup(tups)
[{1, 2, 3, 4}]
I'm trying to create the model shown below with PyMC 3 but can't figure out how to properly map probabilities to the observed data with a lambda function.
import numpy as np
import pymc as pm
data = np.array([[0, 0, 1, 1, 2],
                 [0, 1, 2, 2, 2],
                 [2, 2, 1, 1, 0],
                 [1, 1, 2, 0, 1]])
(D, W) = data.shape
V = len(set(data.ravel()))
T = 3
a = np.ones(T)
b = np.ones(V)
with pm.Model() as model:
    theta = [pm.Dirichlet('theta_%s' % i, a, shape=T) for i in range(D)]
    z = [pm.Categorical('z_%i' % i, theta[i], shape=W) for i in range(D)]
    phi = [pm.Dirichlet('phi_%i' % i, b, shape=V) for i in range(T)]
    w = [pm.Categorical('w_%i_%i' % (i, j),
                        p=lambda z=z[i][j], phi_=phi: phi_[z],  # Error is here
                        observed=data[i, j])
         for i in range(D) for j in range(W)]
The error I get is
AttributeError: 'function' object has no attribute 'shape'
In the model I'm attempting to build, the elements of z indicate which element in phi gives the probability of the corresponding observed value in data (placed in RV w). In other words,
P(data[i,j]) <- phi[z[i,j]][data[i,j]]
I'm guessing I need to define the probability with a Theano expression or use Theano as_op but I don't see how it can be done for this model.
You should specify your categorical p values as Deterministic objects before passing them on to w. Otherwise, the as_op implementation would look something like this:
@theano.compile.ops.as_op(itypes=[t.lscalar, t.dscalar, t.dscalar],otypes=[t.dvector])
def p(z=z, phi=phi):
    return [phi[z[i,j]] for i in range(D) for j in range(W)]
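The mapping the question describes, P(data[i,j]) <- phi[z[i,j]][data[i,j]], can be illustrated in plain NumPy with fixed arrays standing in for the random variables. The concrete values of z and phi below are made up. This sketch only shows the indexing pattern and is not a PyMC or Theano model.

```python
import numpy as np

data = np.array([[0, 0, 1, 1, 2],
                 [0, 1, 2, 2, 2]])
# hypothetical fixed values standing in for the random variables z and phi
z = np.array([[0, 0, 1, 1, 2],
              [2, 2, 1, 0, 0]])   # topic assigned to each observation
phi = np.array([[0.7, 0.2, 0.1],  # row t: distribution over the V values for topic t
                [0.1, 0.8, 0.1],
                [0.2, 0.2, 0.6]])

# phi[z] has shape (D, W, V): the full distribution selected for each observation
p = phi[z]
# for each (i, j), pick the probability of the value that was actually observed
likelihood = phi[z, data]
print(p.shape)     # (2, 5, 3)
print(likelihood)
```

In a symbolic model the same selection would have to be expressed on tensors and not on a Python lambda, which is what the Deterministic suggestion above amounts to.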