Disjoint set implementation in Python - python-3.x

I am relatively new to Python. I am studying Disjoint sets, and implemented it as follows:
class DisjointSet:
def __init__(self, vertices, parent):
self.vertices = vertices
self.parent = parent
def find(self, item):
if self.parent[item] == item:
return item
else:
return self.find(self.parent[item])
def union(self, set1, set2):
self.parent[set1] = set2
Now in the driver code:
def main():
vertices = ['a', 'b', 'c', 'd', 'e', 'h', 'i']
parent = {}
for v in vertices:
parent[v] = v
ds = DisjointSet(vertices, parent)
print("Print all vertices in genesis: ")
ds.union('b', 'd')
ds.union('h', 'b')
print(ds.find('h')) # prints d (OK)
ds.union('h', 'i')
print(ds.find('i')) # prints i (expecting d)
main()
So, at first I initialized all nodes as individual disjoint sets. Then unioned bd and hb which makes the set: hbd then hi is unioned, which should (as I assumed) give us the set: ihbd. I understand that due to setting the parent in this line of union(set1, set2):
self.parent[set1] = set2
I am setting the parent of h as i and thus removing it from the set of bd. How can I achieve a set of ihbd where the order of the params in union() won't yield different results?

Your program is not working correctly because you have misunderstood the algorithm for disjoint set implementation. Union is implemented by modifying the parent of the root node rather than the node provided as input. As you have already noticed, blindly modifying parents of any node you receive in input will just destroy previous unions.
Here's a correct implementation:
def union(self, set1, set2):
root1 = self.find(set1)
root2 = self.find(set2)
self.parent[root1] = root2
I would also suggest reading Disjoint-set data structure for more info as well as possible optimizations.

To make your implementation faster, you may want to update the parent as you find()
def find(self, item):
if self.parent[item] == item:
return item
else:
res = self.find(self.parent[item])
self.parent[item] = res
return res

Related

Difference between lists direct assignment and slice assignment

I have:
def reverseString(self, s: List[str]) -> None:
s[:] = s[::-1] # Works
... and
def reverseString(self, s: List[str]) -> None:
s = s[::-1] # Doesn't work
Where s is a list of characters lets say s = ["k","a","k","a","s","h","i"]
While doing a question on leetcode it rejected when I used s = ... but accepted when I used s[:] = ... and also it was written that DO NOT RETURN ANYTHING but return s.reverse also worked.
This is actually a bit complex and requires two explanations.
First, a python function argument act as label on the passed object. For example, in the following code, arg is the local label (/name) attached to the initial list. When the label arg is re-used, for example by attaching it to a new object (17), the original list is not reachable anymore within function f.
On the other hand, outside of f, the list labeled L is still here, untouched:
def f(arg):
arg = 17
print(arg) # 17
L = ['a', 'list']
f(L)
print(L) # ['a', 'list']
That explains why the following function doesn't reverse your list in place:
def reverse_list(arg):
arg = arg[::-1]
print(arg) # ['list', 'a']
L = ['a', 'list']
reverse_list(L)
print(L) # ['a', 'list']
This function simply attach the label arg to a new list (that is indeed equal to the reversed list).
Secondly, the difference between arg[:] = ... and arg = ... is that the first will modify the content of the list (instead of attaching the label arg to a new object). This is the reason why the following works as expected:
def alt_reverse_list(arg):
arg[:] = arg[::-1]
L = ['a', 'list']
alt_reverse_list(L)
print(L) # ['list', 'a']
In this second example we say that the list has been mutated (modified in place). Here is a detailed explanation on slice assignments
For the same reason, calling arg.reverse() would have worked.
Identifying objects
Using the id() function can help figure out what is going on with the argument in the first example (where we don't mutate the list but affect a new value):
def reverse_list(arg):
print("List ID before: ", id(arg))
arg = arg[::-1]
print("List ID after: ", id(arg))
L = ['a', 'list']
print("Original list ID: ", id(L))
reverse_list(L)
print("Final list ID: ", id(L))
Which will print something like:
Original list ID: 140395368281088
List ID before: 140395368281088
List ID after: 140395368280447 <--- intruder spotted
Final list ID: 140395368281088
Here we can clearly see that after calling arg = arg[::-1] the object we are manipulating under the name arg is not the same. This shows why the function doesn't have any (side) effect.

Python: Build Graphs with set of sets adjacency list

Hello this is my first post and I'm not fluent in English, so forgive me if I do any wrong post protocol ....
so, I have this current implementation of a graph, through an adjacency matrix:
class GFG:
def __init__(self,graph):
self.graph = graph
from which I start like this:
my_shape = (N,M)
bpGraph = np.zeros(my_shape)
and add edges like this:
vertex_a = 2
vertex_b = 5
bpGraph [vertex_a, vertex_b] = 1
g = GFG(bpGraph)
and check for an edge like this:
if bpGraph [vertice_a, vertice_b]:
do something...
But this type of structure can take a lot of space if I have a lot of vertices, so I would like to switch to a set map
class GFG:
def __init__(self,N,M):
mysetdict = {
"dummy_a": {"dummy_b","dummy_c"},
}
self.graph = mysetdict
def add_edges(self,N,M):
N_set = {}
if str(N) in self.graph:
N_set = self.graph[str(N)]
N_set.add(str(M))
self.graph[str(N)] = N_set
def check_edges(self,N,M):
return str(M) in self.graph[str(N)]
When I try to add a vertex I get this error:
g = GFG (3,4)
g.add_edges (0, 0)
AttributeError: 'dict' object has no attribute 'add'
---> 16 N_set.add(str(M))
im trying many ways to build this set and check if a entry exists before add a vertex_a to do a edge with vertex_b, but all my tryes results in a atribute error.
any suggestions?
solved, just doing N_set = set()
def add_edges(self,N,M):
N_set = set()
if str(N) in self.m_graph:
N_set = self.m_graph[str(N)]
N_set.add(str(M))
self.m_graph[str(N)] = N_set

What's the underlying implementation for most_common method of Counter?

I found a pyi file which has the following def
def most_common(self, n: Optional[int] = ...) -> List[Tuple[_T, int]]: ...
How could this happen? List is not defined, and no implementation?
Just highlight some valuable suggestions here for followers:
List is imported from the typing module; it's not the same thing as list. The .pyi file doesn't need to import it because stub files are never executed; they just have to be syntactically valid Python
If you use from future import annotations, you won't have to import typing to use List et al. in function annotations in .py files, either, since function annotations will be treated as string literals. (Starting in Python 4, that will be the default behavior. See PEP 563 for details.)
You are looking at the pyi file which is used solely for annotations. It is never executed by the Python interpreter. You can learn more about pyi files by reading PEP484.
Using a debugger, put a breakpoint on the line where you call most_commonand then step into the method.
Python 3.7 implementation.
...\Lib\collections\__init__.py:
def most_common(self, n=None):
'''List the n most common elements and their counts from the most
common to the least. If n is None, then list all element counts.
>>> Counter('abcdeabcdabcaba').most_common(3)
[('a', 5), ('b', 4), ('c', 3)]
'''
# Emulate Bag.sortedByCount from Smalltalk
if n is None:
return sorted(self.items(), key=_itemgetter(1), reverse=True)
return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
_heapq.nlargest (in ...\Lib\heapq.py) implementation:
def nlargest(n, iterable, key=None):
"""Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
"""
# Short-cut for n==1 is to use max()
if n == 1:
it = iter(iterable)
sentinel = object()
if key is None:
result = max(it, default=sentinel)
else:
result = max(it, default=sentinel, key=key)
return [] if result is sentinel else [result]
# When n>=size, it's faster to use sorted()
try:
size = len(iterable)
except (TypeError, AttributeError):
pass
else:
if n >= size:
return sorted(iterable, key=key, reverse=True)[:n]
# When key is none, use simpler decoration
if key is None:
it = iter(iterable)
result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
if not result:
return result
heapify(result)
top = result[0][0]
order = -n
_heapreplace = heapreplace
for elem in it:
if top < elem:
_heapreplace(result, (elem, order))
top, _order = result[0]
order -= 1
result.sort(reverse=True)
return [elem for (elem, order) in result]
# General case, slowest method
it = iter(iterable)
result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
if not result:
return result
heapify(result)
top = result[0][0]
order = -n
_heapreplace = heapreplace
for elem in it:
k = key(elem)
if top < k:
_heapreplace(result, (k, order, elem))
top, _order, _elem = result[0]
order -= 1
result.sort(reverse=True)
return [elem for (k, order, elem) in result]

Most "pythonic" way of populating a nested indexed list from a flat list

I have a situation where I am generating a number of template nested lists with n organised elements where each number in the template corresponds to the index from a flat list of n values:
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
For each of these templates, the values inside them correspond to the index value from a flat list like so:
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
to get;
B = [[["c","e"],["a","d"]], [["b","f"],["g","h"]],[["k","j"],["i","l"],["n","m"]]]
How can I populate the structure S with the values from list A to get B, considering that:
- the values of list A can change in value but not in a number
- the template can have any depth of nested structure of but will only use an index from A once as the example shown above.
I did this with the very ugly append unflatten function below that works if the depth of the template is not more then 3 levels. Is there a better way of accomplishing it using generators, yield so it works for any arbitrary depth of template.
Another solution I thought but couldn't implement is to set the template as a string with generated variables and then assigning the variables with new values using eval()
def unflatten(item, template):
# works up to 3 levels of nested lists
tree = []
for el in template:
if isinstance(el, collections.Iterable) and not isinstance(el, str):
tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
tree[-1][-1].append([])
else:
tree[-1][-1].append(item[el3])
else:
tree[-1].append(item[el2])
else:
tree.append(item[el])
return tree
I need a better solution that can be employed to accomplish this when doing the above recursively and for n = 100's of organised elements.
UPDATE 1
The timing function I am using is this one:
def timethis(func):
'''
Decorator that reports the execution time.
'''
#wraps(func)
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
print(func.__name__, end-start)
return result
return wrapper
and I am wrapping the function suggested by #DocDrivin inside another to call it with a one-liner. Below it is my ugly append function.
#timethis
def unflatten(A, S):
for i in range(100000):
# making sure that you don't modify S
rebuilt_list = copy.deepcopy(S)
# create the mapping dict
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(alist):
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
# might be a good idea to catch key errors here
alist[idx] = adict[entry]
#build list
worker(rebuilt_list)
return rebuilt_list
#timethis
def unflatten2(A, S):
for i in range (100000):
#up to level 3
temp_tree = []
for i, el in enumerate(S):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
temp_tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
temp_tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
temp_tree[-1][-1].append([])
else:
temp_tree[-1][-1].append(A[el3])
else:
temp_tree[-1].append(A[el2])
else:
temp_tree.append(A[el])
return temp_tree
The recursive method is much better syntax, however, it is considerably slower then using the append method.
You can do this by using recursion:
import copy
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
# making sure that you don't modify S
B = copy.deepcopy(S)
# create the mapping dict
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(alist):
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
# might be a good idea to catch key errors here
alist[idx] = adict[entry]
worker(B)
print(B)
This yields the following output for B:
[[['c', 'e'], ['a', 'd']], [['b', 'f'], ['g', 'h']], [['k', 'j'], ['i', 'l'], ['n', 'm']]]
I did not check if the list entry can actually be mapped with the dict, so you might want to add a check (marked the spot in the code).
Small edit: just saw that your desired output (probably) has a typo. Index 3 maps to "d", not to "c". You might want to edit that.
Big edit: To prove that my proposal is not as catastrophic as it seems at a first glance, I decided to include some code to test its runtime. Check this out:
import timeit
setup1 = '''
import copy
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(olist):
alist = copy.deepcopy(olist)
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
alist[idx] = adict[entry]
return alist
'''
code1 = '''
worker(S)
'''
setup2 = '''
import collections
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
def unflatten2(A, S):
#up to level 3
temp_tree = []
for i, el in enumerate(S):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
temp_tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
temp_tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
temp_tree[-1][-1].append([])
else:
temp_tree[-1][-1].append(A[el3])
else:
temp_tree[-1].append(A[el2])
else:
temp_tree.append(A[el])
return temp_tree
'''
code2 = '''
unflatten2(A, S)
'''
print(f'Recursive func: { [i/10000 for i in timeit.repeat(setup = setup1, stmt = code1, repeat = 3, number = 10000)] }')
print(f'Original func: { [i/10000 for i in timeit.repeat(setup = setup2, stmt = code2, repeat = 3, number = 10000)] }')
I am using the timeit module to do my tests. When running this snippet, you will get an output similar to this:
Recursive func: [8.74395573977381e-05, 7.868373290111777e-05, 7.9051584698027e-05]
Original func: [3.548609419958666e-05, 3.537480780214537e-05, 3.501355930056888e-05]
These are the average times of 10000 iterations, and I decided to run it 3 times to show the fluctuation. As you can see, my function in this particular case is 2.22 to 2.50 times slower than the original, but still acceptable. The slowdown is probably due to using deepcopy.
Your test has some flaws, e.g. you redefine the mapping dict at every iteration. You wouldn't do that normally, instead you would give it as a param to the function after defining it once.
You can use generators with recursion
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
S = [[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = {k: v for k, v in enumerate(A)}
def worker(alist):
for e in alist:
if isinstance(e, list):
yield list(worker(e))
else:
yield A[e]
def do(alist):
return list(worker(alist))
This is also a recursive approach, just avoiding individual item assignment and letting list do the work by reading the values "hot off the CPU" from your generator. If you want, you can Try it online!-- setup1 and setup2 copied from #DocDriven 's answer (but I recommend you don't exaggerate with the numbers, do it locally if you want to play around).
Here are example time numbers:
My result: [0.11194685893133283, 0.11086182110011578, 0.11299032904207706]
result1: [1.0810202199500054, 1.046933784848079, 0.9381260159425437]
result2: [0.23467918601818383, 0.236218704842031, 0.22498539905063808]

How to say if a tree is included in another one?

I would like to create an alogritmo that allows me to say if a tree is included in another. Thanks to this site I managed an algorithm that allows me to know for binary trees, but I would like to generalize it.
def isSubtree(T,S):
'''
function to say if a tree S is a subtree of another one, T
'''
# Base Case
if S is None:
return True
if T is None:
return True
# Check the tree with root as current node
if (areIdentical(T, S)):
return True
# IF the tree with root as current node doesn't match
# then try left and right subtreee one by one
return isSubtree(T.children, S) or isSubtree(T.children, S) # won't work because we have several children
def areIdentical(root1, root2):
'''
function to say if two roots are identical
'''
# Base Case
if root1 is None and root2 is None:
return True
if root1 is None or root2 is None:
return False
# Check if the data of both roots their and children are the same
return (root1.data == root2.data and
areIdentical(root1.children, root2.children)) # Here problem because it won't work for children
Expectd output
For instance :
>>># first tree creation
>>>text = start becoming popular
>>>textSpacy = spacy_nlp(text1)
>>>treeText = nltk_spacy_tree(textSpacy)
>>>t = WordTree(treeText[0])
>>># second tree creation
>>>question = When did Beyonce start becoming popular?
>>>questionSpacy = spacy_nlp(question)
>>>treeQuestion = nltk_spacy_tree(questionSpacy)
>>>q = WordTree(treeQuestion[0])
>>># tree comparison
>>>isSubtree(t,q)
True
In case this may be useful, here is the WordTree class that I used:
class WordTree:
'''Tree for spaCy dependency parsing array'''
def __init__(self, tree, is_subtree=False):
"""
Construct a new 'WordTree' object.
:param array: The array contening the dependency
:param parent: The parent of the array if exists
:return: returns nothing
"""
self.parent = []
self.children = []
self.data = tree.label().split('_')[0] # the first element of the tree # We are going to add the synonyms as well.
for subtree in tree:
if type(subtree) == Tree:
# Iterate through the depth of the subtree.
t = WordTree(subtree, True)
t.parent=tree.label().split('_')[0]
elif type(subtree) == str:
surface_form = subtree.split('_')[0]
self.children.append(surface_form)
It works very well with trees made with Spacy phrases.
question = "When did Beyonce start becoming popular?"
questionSpacy = spacy_nlp(question)
treeQuestion = nltk_spacy_tree(questionSpacy)
t = WordTree(treeQuestion[0])
You can just iterate over all the children of T, and if S is a subtree of any of the children of T, then S is a subtree of T. Also, you should return False when T is None because it means that you are already at a leaf of T and S is still not found to be a subtree:
def isSubtree(T,S):
if S is None:
return True
if T is None:
return False
if areIdentical(T, S):
return True
return any(isSubtree(c, S) for c in T.children)

Resources