Merge tuples in list with similar elements - python-3.x

I have to merge all the tuples that share at least one element with each other.
tups = [(1,2), (2,3), (8,9), (4,5), (15,12), (9,6), (7,8), (3,11), (1,15)]
The first tuple (1,2) should be merged with (2,3), (3,11), (1,15) and (15,12), since each of these tuples shares an element with one of the preceding tuples. So the final output should be
lst1 = [1, 2, 3, 11, 12, 15]
lst2 = [6, 7, 8, 9], since (8,9), (9,6) and (7,8) have matching elements.
My code so far:
finlst = []
for items in range(len(tups)):
    for resid in range(len(tups)):
        if tups[items] != tups[resid]:
            if tups[items][0] == tups[resid][0] or tups[items][0] == tups[resid][1]:
                finlst.append(list(set(tups[items] + tups[resid])))

You could do it like this, using sets that are expanded with matching tuples:
tups = [(1, 2), (2, 3), (8, 9), (4, 5), (15, 12), (9, 6), (7, 8), (3, 11), (1, 15)]

groups = []
for t in tups:
    for group in groups:
        # find a group that has at least one element in common with the tuple
        if any(x in group for x in t):
            # extend the group with the items from the tuple
            group.update(t)
            # break from the group-loop as we don't need to search any further
            break
    else:
        # otherwise (if the group-loop ended without being cancelled with `break`)
        # create a new group from the tuple
        groups.append(set(t))

# output
for group in groups:
    print(group)
{1, 2, 3, 11, 15}
{8, 9, 6, 7}
{4, 5}
{12, 15}
Since this solution iterates the original tuple list only once and in order, it will not work for inputs where the connections only become visible later, e.g. [(1, 2), (3, 4), (1, 4)], where (1, 2) and (3, 4) end up in separate groups before (1, 4) connects them. For such inputs, we could use the following solution instead, which uses fixed-point iteration to keep combining groups until no more merges are possible:
import itertools

tups = [(1, 2), (3, 4), (1, 4)]

groups = [set(t) for t in tups]
while True:
    for a, b in itertools.combinations(groups, 2):
        # if the groups share at least one element, they can be merged
        if a & b:
            # construct a new groups list with `a` and `b` replaced by their union
            groups = [g for g in groups if g != a and g != b]
            groups.append(a | b)
            # break the for loop and restart
            break
    else:
        # the for loop ended naturally, so no overlapping groups were found
        break
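For the example input above, printing groups after the loop shows that all three tuples have collapsed into a single merged set:
print(groups)
[{1, 2, 3, 4}]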

I found the solution; it's more of a graph theory problem related to connectivity (connected components).
We can use NetworkX for this; it's pretty much guaranteed to be correct:
import networkx
from networkx.algorithms.components.connected import connected_components

def uniqueGroup(groups):

    def to_graph(groups):
        G = networkx.Graph()
        for part in groups:
            # each sublist is a bunch of nodes
            G.add_nodes_from(part)
            # it also implies a number of edges:
            G.add_edges_from(to_edges(part))
        return G

    def to_edges(groups):
        """
        Treat `groups` as a graph and return its edges:
        to_edges(['a','b','c','d']) -> [(a,b), (b,c), (c,d)]
        """
        it = iter(groups)
        last = next(it)
        for current in it:
            yield last, current
            last = current

    G = to_graph(groups)
    return connected_components(G)
Output (connected_components returns a generator of sets, so we iterate over it):
tups = [(1, 2), (3, 4), (1, 4)]
for group in uniqueGroup(tups):
    print(group)
{1, 2, 3, 4}
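Applied to the original input, the connected components should give exactly the expected groups (the element order within each printed set may vary):
tups = [(1,2), (2,3), (8,9), (4,5), (15,12), (9,6), (7,8), (3,11), (1,15)]
for group in uniqueGroup(tups):
    print(group)
{1, 2, 3, 11, 12, 15}
{8, 9, 6, 7}
{4, 5}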

Related

Is it possible to chain iterators on a list in Python?

I have a list with negative and positive numbers:
a = [10, -5, 30, -23, 9]
And I want to get two lists of tuples with the values and their indices in the original list, so I did this.
b = list(enumerate(a))
positive = list(filter(lambda x: x[1] > 0, b))
negative = list(filter(lambda x: x[1] < 0, b))
But it feels kind of bad casting to list in between. Is there a way to chain these iterators immediately?
The iterator returned by enumerate can only be consumed once, which is why you have to convert it to a list or a tuple if you want to iterate over it multiple times.
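A quick demonstration of that exhaustion:
a = [10, -5, 30, -23, 9]
b = enumerate(a)
print(list(b))  # [(0, 10), (1, -5), (2, 30), (3, -23), (4, 9)]
print(list(b))  # [] -- the iterator is already exhausted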
If you don't want to convert it, or find looping over the list twice too inefficient for your liking, consider a standard for loop: it lets you append to both lists in a single pass over the original list:
a = [10, -5, 30, -23, 9]
positive, negative = [], []

for idx, val in enumerate(a):
    if val == 0:
        continue
    (positive if val > 0 else negative).append((idx, val))
print(f"Positive: {positive}")
print(f"Negative: {negative}")
Output:
Positive: [(0, 10), (2, 30), (4, 9)]
Negative: [(1, -5), (3, -23)]

python: create numpy array from dictionary, where key is coordinates

I have a dictionary of the following form:
{(2, 2): 387, (1, 2): 25, (0, 1): 15, (2, 1): 12, (2, 6): 5, (6, 2): 5, (4, 2): 4, (3, 4): 4, (5, 2): 2, (0, 2): 1}
where each key represents the coordinates in the matrix, and the value is the actual value to be set at those coordinates.
At the moment I create and populate the matrix in the following way:
import numpy as np

def build_matrix(data, n):
    M = np.zeros(shape=(n, n), dtype=np.float64)
    for key, val in data.items():
        (row, col) = key
        M[row][col] = val
    return M
Is there a way to do it shorter, using numpy's API? I looked at np.array() and np.asarray(), but none seem to fit my needs.
Given n and the input dictionary d, the shortest version seems to be:
M = np.zeros(shape=(n, n), dtype=np.float64)
M[tuple(zip(*d.keys()))] = list(d.values())
That tuple(zip(*d.keys())) is basically transposing the nested items and packing them into tuples, as needed for integer-indexing into NumPy arrays.
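To see that transposition on a small subset of the keys:
d = {(2, 2): 387, (1, 2): 25, (0, 1): 15}
print(tuple(zip(*d.keys())))  # ((2, 1, 0), (2, 2, 1)) -- row indices, then column indices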
Generic case
To handle the generic case, when n is not given and both the shape (based on the extents of the keys) and the dtype (from the dictionary values) need to be derived, it would be:
idx_ar = np.array(list(d.keys()))
out_shp = idx_ar.max(0)+1
data = np.array(list(d.values()))
M = np.zeros(shape=out_shp, dtype=data.dtype)
M[tuple(idx_ar.T)] = data
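For the dictionary from the question, this should produce a 7x7 array, since the largest row and column indices are both 6:
print(M.shape)  # (7, 7)
print(M[2, 2])  # 387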
If you don't mind using scipy, what you've basically created is a sparse dok_matrix (Dictionary of Keys)
from scipy.sparse import dok_matrix
out = dok_matrix((n, n))
out.update(data)
out = out.todense()
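A minimal end-to-end sketch, here with a small subset of the question's dictionary (using item-wise assignment rather than update, in case your SciPy version restricts dict-style updates on sparse matrices):
from scipy.sparse import dok_matrix

data = {(2, 2): 387, (1, 2): 25, (0, 1): 15}
n = 3
out = dok_matrix((n, n))
for key, val in data.items():
    out[key] = val  # dok_matrix supports assignment by (row, col) tuple
print(out.todense())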

Graph reduction

I have been working on a piece of code to reduce a graph. The problem is that there are some branches that I want to remove. Once I remove a branch, I can merge the nodes or not, depending on the number of paths between the nodes the branch joined.
Maybe the following example illustrates what I want:
The code I have is the following:
from networkx import DiGraph, all_simple_paths, draw
from matplotlib import pyplot as plt

# data preparation
branches = [(2,1), (3,2), (4,3), (4,13), (7,6), (6,5), (5,4),
            (8,7), (9,8), (9,10), (10,11), (11,12), (12,1), (13,9)]
branches_to_remove_idx = [11, 10, 9, 8, 6, 5, 3, 2, 0]

ft_dict = dict()
graph = DiGraph()
for i, br in enumerate(branches):
    graph.add_edge(br[0], br[1])
    ft_dict[i] = (br[0], br[1])

# Processing -----------------------------------------------------
for idx in branches_to_remove_idx:
    # get the nodes that define the edge to remove
    f, t = ft_dict[idx]

    # get the number of paths from 'f' to 't'
    n_paths = len(list(all_simple_paths(graph, f, t)))

    if n_paths == 1:
        # remove branch and merge the nodes 'f' and 't'
        #
        # This is what I have no clue how to do
        #
        pass
    else:
        # remove the branch and that's it
        graph.remove_edge(f, t)
        print('Simple removal of', f, t)
# -----------------------------------------------------------------

draw(graph, with_labels=True)
plt.show()
I feel that there should be a simpler direct way to obtain the last figure from the first, given the branch indices, but I have no clue.
I think this is more or less what you want. I am merging all nodes that are in chains (connected nodes of degree 2) into one hypernode, and I return the new graph together with a dictionary mapping each hypernode to the nodes it contracted.
import networkx as nx

def contract(g):
    """
    Contract chains of neighbouring vertices with degree 2 into one hypernode.

    Arguments:
    ----------
    g -- networkx.Graph instance

    Returns:
    --------
    h -- networkx.Graph instance
        the contracted graph
    hypernode_to_nodes -- dict: int hypernode -> [v1, v2, ..., vn]
        dictionary mapping hypernodes to nodes
    """
    # create subgraph of all nodes with degree 2
    is_chain = [node for node, degree in g.degree() if degree == 2]
    chains = g.subgraph(is_chain)

    # contract connected components (which should be chains of variable length) into single node
    components = [chains.subgraph(c) for c in nx.connected_components(chains)]

    hypernode = max(g.nodes()) + 1
    hypernodes = []
    hyperedges = []
    hypernode_to_nodes = dict()
    false_alarms = []
    for component in components:
        if component.number_of_nodes() > 1:
            hypernodes.append(hypernode)
            vs = list(component.nodes())
            hypernode_to_nodes[hypernode] = vs

            # create new edges from the neighbours of the chain ends to the hypernode
            component_edges = list(component.edges())
            for v, w in [e for e in g.edges(vs)
                         if not ((e in component_edges) or (e[::-1] in component_edges))]:
                if v in component:
                    hyperedges.append([hypernode, w])
                else:
                    hyperedges.append([v, hypernode])

            hypernode += 1
        else:  # nothing to collapse as there is only a single node in the component
            false_alarms.extend(component.nodes())

    # initialise new graph with all other nodes
    not_chain = [node for node in g.nodes() if node not in is_chain]
    h = g.subgraph(not_chain + false_alarms).copy()  # copy: subgraph views are read-only

    h.add_nodes_from(hypernodes)
    h.add_edges_from(hyperedges)

    return h, hypernode_to_nodes
edges = [(2, 1),
(3, 2),
(4, 3),
(4, 13),
(7, 6),
(6, 5),
(5, 4),
(8, 7),
(9, 8),
(9, 10),
(10, 11),
(11, 12),
(12, 1),
(13, 9)]
g = nx.Graph(edges)
h, hypernode_to_nodes = contract(g)
print("Edges in contracted graph:")
print(h.edges())
print('')
print("Hypernodes:")
for hypernode, nodes in hypernode_to_nodes.items():
    print("{} : {}".format(hypernode, nodes))
This returns for your example:
Edges in contracted graph:
[(9, 13), (9, 14), (9, 15), (4, 13), (4, 14), (4, 15)]
Hypernodes:
14 : [1, 2, 3, 10, 11, 12]
15 : [8, 5, 6, 7]
I built this function that scales much better and runs faster with larger graphs:
from collections import Counter
from functools import reduce
import networkx as nx
import pandas as pd

def add_dicts(vector):
    # sum a sequence of edge-attribute dicts, adding values of matching keys
    l = list(map(lambda x: Counter(x), vector))
    return reduce(lambda x, y: x + y, l)

def consolidate_dup_edges(g):
    # merge parallel edges between the same node pair, summing their weight dicts
    edges = pd.DataFrame(g.edges(data=True), columns=['start', 'end', 'weight'])
    edges_consolidated = edges.groupby(['start', 'end']).agg({'weight': add_dicts}).reset_index()
    return nx.from_edgelist(list(edges_consolidated.itertuples(index=False, name=None)))

def graph_reduce(g):
    g = consolidate_dup_edges(g)
    # degree-2 nodes are intermediate nodes that can be contracted away
    is_deg2 = [node for node, degree in g.degree() if degree == 2]
    is_deg2_descendents = list(map(lambda x: tuple(nx.descendants_at_distance(g, x, 1)), is_deg2))
    # collect and sum the edge attributes on both sides of each degree-2 node
    edges_on_deg2 = list(map(lambda x: list(map(lambda x: x[2], g.edges(x, data=True))), is_deg2))
    edges_on_deg2 = list(map(lambda x: add_dicts(x), edges_on_deg2))
    # reconnect each degree-2 node's two neighbours directly
    new_edges = list(zip(is_deg2_descendents, edges_on_deg2))
    new_edges = [(a, b, c) for (a, b), c in new_edges]
    g.remove_nodes_from(is_deg2)
    g.add_edges_from(new_edges)
    g.remove_edges_from(nx.selfloop_edges(g))
    g.remove_nodes_from([node for node, degree in g.degree() if degree <= 1])
    return consolidate_dup_edges(g)
The graph_reduce function basically removes nodes with degree 1, removes intermediate nodes with degree 2, and connects the nodes that each degree-2 node was connected to. The best results come from running it iteratively until the number of nodes stabilizes, as sketched below. Note that this only works on undirected graphs.
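A minimal driver for that iteration might look like this (a sketch; reduce_to_fixed_point is a hypothetical helper name, and g is assumed to be an undirected nx.Graph with dict edge attributes as expected by consolidate_dup_edges):
def reduce_to_fixed_point(g):
    # repeatedly apply graph_reduce until the node count stops shrinking
    prev = None
    while prev != g.number_of_nodes():
        prev = g.number_of_nodes()
        g = graph_reduce(g)
    return g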

going through a list of objects in Python by order of an attribute until exhaustion [duplicate]

Are there any arguments other than key, for example value?
Arguments of sort and sorted
Both sort and sorted have three keyword arguments: cmp, key and reverse.
L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
Using key and reverse is preferred, because they work much faster than an equivalent cmp.
key should be a function which takes an item and returns a value to compare and sort by. reverse allows you to reverse the sort order.
Using key argument
You can use operator.itemgetter as a key argument to sort by second, third etc. item in a tuple.
Example
>>> from operator import itemgetter
>>> a = range(5)
>>> b = a[::-1]
>>> c = map(lambda x: chr(((x+3)%5)+97), a)
>>> sequence = zip(a,b,c)
# sort by first item in a tuple
>>> sorted(sequence, key = itemgetter(0))
[(0, 4, 'd'), (1, 3, 'e'), (2, 2, 'a'), (3, 1, 'b'), (4, 0, 'c')]
# sort by second item in a tuple
>>> sorted(sequence, key = itemgetter(1))
[(4, 0, 'c'), (3, 1, 'b'), (2, 2, 'a'), (1, 3, 'e'), (0, 4, 'd')]
# sort by third item in a tuple
>>> sorted(sequence, key = itemgetter(2))
[(2, 2, 'a'), (3, 1, 'b'), (4, 0, 'c'), (0, 4, 'd'), (1, 3, 'e')]
Explanation
Sequences can contain any objects, even ones that are not comparable, but if we can define a function that produces something comparable for each item, we can pass this function as the key argument to sort or sorted.
itemgetter, in particular, creates such a function that fetches the given item from its operand. An example from its documentation:
After f = itemgetter(2), the call f(r) returns r[2].
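For example:
>>> from operator import itemgetter
>>> f = itemgetter(2)
>>> f(['a', 'b', 'c', 'd'])
'c'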
Mini-benchmark, key vs cmp
Just out of curiosity, comparing the performance of key and cmp (smaller is better):
>>> from timeit import Timer
>>> Timer(stmt="sorted(xs,key=itemgetter(1))",setup="from operator import itemgetter;xs=range(100);xs=zip(xs,xs);").timeit(300000)
6.7079150676727295
>>> Timer(stmt="sorted(xs,key=lambda x:x[1])",setup="xs=range(100);xs=zip(xs,xs);").timeit(300000)
11.609490871429443
>>> Timer(stmt="sorted(xs,cmp=lambda a,b: cmp(a[1],b[1]))",setup="xs=range(100);xs=zip(xs,xs);").timeit(300000)
22.335839986801147
So, sorting with key seems to be at least twice as fast as sorting with cmp. Using itemgetter instead of lambda x: x[1] makes sort even faster.
Besides key=, the sort method of lists in Python 2.x could alternatively take a cmp= argument (not a good idea; it has been removed in Python 3). With either or neither of these two, you can always pass reverse=True to have the sort go downwards (instead of upwards, the default, which you can also request explicitly with reverse=False if you are so inclined). I have no idea what the value argument you're mentioning is supposed to do.
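For completeness: if you have a legacy cmp-style function that you need to port to Python 3, functools.cmp_to_key adapts it into a key function:
>>> from functools import cmp_to_key
>>> xs = [(1, 3), (2, 1), (0, 2)]
>>> sorted(xs, key=cmp_to_key(lambda a, b: a[1] - b[1]))
[(2, 1), (0, 2), (1, 3)]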
Yes, it takes other arguments, but no value.
>>> print list.sort.__doc__
L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1
What would a value argument even mean?

Python - eliminating common elements from two lists

If I have two lists:
a = [1,2,1,2,4] and b = [1,2,4]
how do i get
a - b = [1,2,4]
such that one element from b removes only one element from a if that element is present in a.
You can use itertools.zip_longest to zip lists of different lengths, then use a list comprehension:
>>> from itertools import zip_longest
>>> [i for i, j in zip_longest(a, b) if i != j]
[1, 2, 4]
Demo:
>>> list(zip_longest(a, b))
[(1, 1), (2, 2), (1, 4), (2, None), (4, None)]
a = [1, 2, 1, 2, 4]
b = [1, 2, 4]
c = set(a) & set(b)
d = list(c)
This answer is just a small modification of the answer to this topic:
Find non-common elements in lists
and, since you cannot index a set object:
https://www.daniweb.com/software-development/python/threads/462906/typeerror-set-object-does-not-support-indexing
