Iteration to make a union of sets


I have a dictionary whose values are sets, and I would like to build the union of all of these sets using a for loop. I have tried set.union() inside a for loop, but I do not think it is working. Is there a simple way to do this iteration?
for key in self.thisDict.keys():
    for otherKey in self.thisDict.keys():
        if otherKey!=key:
            unionSet=set.union(self.thisDict[otherKey])
I think the problem is that I am not actually building a union of all the sets. I am dealing with a lot of data, so it is hard to tell, but when I print the unionSet object I am creating, it is nowhere near as large as I expect it to be.

It's a fairly naive approach: create a result set, iterate over the dict values, and update the result set with the values found in each iteration. |= is the in-place operator form of the set.update method.
d = {1: {1, 2, 3}, 2: {4, 5, 6}}
result = set()
for v in d.values():
    result |= v
assert result == {1, 2, 3, 4, 5, 6}
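For context, here is a hedged sketch of why the loop in the question probably comes up short. Calling set.union on the class with a single set argument simply returns a copy of that set, and each pass overwrites unionSet, so only the last set visited survives. The name this_dict and its contents are stand-ins for self.thisDict, not taken from the question.

```python
# Stand-in data; `this_dict` is a hypothetical substitute for self.thisDict.
this_dict = {"a": {1, 2}, "b": {3}, "c": {4, 5}}

# What the statement in the question computes: a copy of one set only.
copy_of_c = set.union(this_dict["c"])
assert copy_of_c == {4, 5}

# Accumulating into the same variable, instead of overwriting it,
# builds up the full union.
union_set = set()
for values in this_dict.values():
    union_set = union_set.union(values)

assert union_set == {1, 2, 3, 4, 5}
```

The nested loop over keys is not needed for a plain union; a single pass over the values is enough.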

A simple set comprehension will do:
>>> d = {1: {1, 2, 3}, 2: {4, 5, 6}}
>>> {element for value in d.values() for element in value}
{1, 2, 3, 4, 5, 6}
To my eye, this is more readable:
>>> from itertools import chain
>>> set(chain.from_iterable(d.values()))
{1, 2, 3, 4, 5, 6}

Related

What's the difference between .union and | for sets in python?

>>> a = set([1, 2, 3, 4])
>>> b = set([3, 4, 5, 6])
>>> a|b
{1, 2, 3, 4, 5, 6}
>>> a.union(b)
{1, 2, 3, 4, 5, 6}

For two sets, there is no difference in the result.
In fact, the official Python documentation on sets lists them together.
There is a small difference: | is an operator, so it is subject to operator precedence (e.g. when mixed with other set operators), whereas with the method form the call parentheses fix the evaluation order explicitly. The method also accepts any iterable as its argument, while the operator requires both operands to be sets.
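A small sketch of where the two forms diverge, using made-up sets a, b and c: the operator takes part in normal precedence rules, and only the method accepts a non-set iterable.

```python
a = {1, 2}
b = {2, 3}
c = {1, 2}

# `-` binds more tightly than `|`, so this is a | (b - c).
operator_result = a | b - c
# The method call is evaluated first, so this is (a | b) - c.
method_result = a.union(b) - c

assert operator_result == {1, 2, 3}
assert method_result == {3}

# The method accepts any iterable; the operator needs a set on both sides.
from_list = a.union([5, 6])
try:
    a | [5, 6]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

assert from_list == {1, 2, 5, 6}
assert operator_accepts_list is False
```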

Avoiding overlapping tuples in python while appending to a list

I am doing pattern matching and appending the match indices to a list. While appending, I would like to avoid adding any indices that overlap with ones already in the list. I worked out the example code below, but it does not print exactly what I require.
import re
pat_list=[]
for i in ['ALA', 'AL', 'LA']:
    for p in re.finditer(i, 'MALAYALAM'):
        if p:
            print (i, p.span())
            if len(pat_list)==0:
                pat_list.append(p.span())
                print ('LIST',pat_list)
            if len(pat_list) >0:
                res=[(idx[0], idx[1]) for idx in pat_list if not p.span()[0] >= idx[0] and
                     p.span()[0]<= idx[1] or p.span()[1] >= idx[0] and p.span()[1]<= idx[1] ]
                print ('RES',res)
What I expect to have in the list is [(1,4), (5,8)]; the rest of the indices should not be added.
I would be very grateful for any suggestion or help!

This isn't the most optimized code, but I've implemented it using sets so that it is easy to see what I'm trying to do.
import re

word = 'MALAYALAM'
to_find = ['ALA', 'AL', 'LA']
indices = []
# Build a list of sets so that the issubset method can be used
for piece in to_find:
    for found in re.finditer(piece, word):
        indices.append(set(range(found.start(), found.end() + 1)))
# indices: [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 2, 3}, {5, 6, 7}, {2, 3, 4}, {6, 7, 8}]
non_overlap = []
for left in indices:
    is_subset = False
    for right in indices:
        if left==right:
            continue
        if left.issubset(right):
            is_subset = True
            break
    # left is not contained in any other set, so keep it
    if not is_subset:
        non_overlap.append((min(left), max(left)))
# non_overlap: [(1, 4), (5, 8)]
There are definitely more efficient methods out there, but this is one of the easiest solutions.
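As a different technique from the subset check above, here is a hedged sketch that keeps a span only if it does not overlap any span already kept, which is closer to what the question describes. The function name non_overlapping_spans is made up for illustration, and treating the patterns as literal text via re.escape is an assumption.

```python
import re

def non_overlapping_spans(patterns, text):
    """Collect match spans in pattern order, skipping any that overlap a kept span."""
    kept = []
    for pattern in patterns:
        # Assumption: patterns are literal strings, so escape them.
        for match in re.finditer(re.escape(pattern), text):
            start, end = match.span()
            # Half-open spans [start, end) are disjoint when one ends
            # before (or exactly where) the other starts.
            if all(end <= s or start >= e for s, e in kept):
                kept.append((start, end))
    return kept

spans = non_overlapping_spans(['ALA', 'AL', 'LA'], 'MALAYALAM')
assert spans == [(1, 4), (5, 8)]
```

The result depends on pattern order: earlier patterns claim their spans first, so listing the longest pattern first gives the output the question expects.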

Delete all values from a dictionary that occur more than once

I use a dictionary that looks somewhat like this:
data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3], 5: [1, 2, 3]}
I want to delete every key whose value also appears under another key, i.e. all entries whose values are exactly the same. So my dictionary should look like this:
data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4]}
I've tried to use the approach from: Removing Duplicates From Dictionary
But although I tried adapting it, it gets complicated really fast, and there is probably an easier way to do this. I've also tried using the count() function, but it did not work. Here is what it looks like. Maybe I declared it the wrong way?
no_duplicates = [value for value in data.values() if data.count(value) == 1]
Is there an easy way to remove all key-value pairs that are not unique with respect to their values?

You can do this with a dictionary comprehension that keeps only the key-value pairs whose value occurs exactly once:
def get_unique_dict(data):
    # Get the list of dictionary values
    values = list(data.values())
    # Make a new dictionary with key-value pairs where the value occurs exactly once
    return {key: value for key, value in data.items() if values.count(value) == 1}

data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3], 5: [1, 2, 3]}
print(get_unique_dict(data))
The output will be:
{1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4]}
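Calling values.count inside the comprehension rescans the whole list for every key. For larger dictionaries, a sketch along the following lines counts each value once with collections.Counter. Lists are not hashable, so the values are converted to tuples for counting; the function name get_unique_dict_counter is hypothetical.

```python
from collections import Counter

def get_unique_dict_counter(data):
    # Lists cannot be dictionary keys, so count tuple copies of them.
    counts = Counter(tuple(value) for value in data.values())
    # Keep only the pairs whose value was seen exactly once.
    return {key: value for key, value in data.items()
            if counts[tuple(value)] == 1}

data = {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4], 4: [1, 2, 3], 5: [1, 2, 3]}
unique = get_unique_dict_counter(data)
assert unique == {1: [3, 5], 2: [1, 2], 3: [1, 2, 3, 4]}
```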

How to find how many times last value in the inner list are duplicated in a nested list of python?

I have a list:
a = [[0, 2, 5, 12], [1, 2, 3, 4, 8, 9], [2, 4, 12], [3, 8, 9], [4, 5, 7]]
I want to get 2, because a[0][-1] == a[2][-1] and a[1][-1] == a[3][-1]:
two of the last values are duplicated.
I tried sum(1 for sublist in a if sublist[-1] == sublist[-1]),
but it failed and I got stuck.

You could use collections.Counter to count the occurrences of the last items:
from collections import Counter
last = Counter([i[-1] for i in a])
# Counter({12: 2, 9: 2, 7: 1})
And then count how many of these are greater than 1:
sum(1 for k,v in last.items() if v>1)
# 2

Build a set of the last values, then subtract its length from the length of the list:
len(a) - len({x[-1] for x in a})
# 2

Since the position of the duplicates does not matter, the answer is simply the length of the list minus the number of unique last values. So:
last_elements=[i[-1] for i in a]
num_duplicates=len(last_elements)-len(set(last_elements))
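The Counter approach and the length-difference approach agree on the list in the question, but they measure slightly different things, which shows up once a value occurs three or more times. The sketch below compares them; the function names and the tripled list are made up for illustration.

```python
from collections import Counter

def duplicated_values(rows):
    """Number of distinct last values that occur more than once."""
    counts = Counter(row[-1] for row in rows)
    return sum(1 for n in counts.values() if n > 1)

def surplus_items(rows):
    """Number of rows whose last value repeats an earlier one."""
    last = [row[-1] for row in rows]
    return len(last) - len(set(last))

a = [[0, 2, 5, 12], [1, 2, 3, 4, 8, 9], [2, 4, 12], [3, 8, 9], [4, 5, 7]]
assert duplicated_values(a) == 2
assert surplus_items(a) == 2

# One value shared by three rows: one duplicated value, two surplus rows.
tripled = [[1, 9], [2, 9], [3, 9]]
assert duplicated_values(tripled) == 1
assert surplus_items(tripled) == 2
```

Which one is wanted depends on whether "how many values are duplicated" or "how many extra copies exist" is the intended question.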

conflicting cases in python lists

I have two lists of sets -
attribute = [{0, 1, 2, 3, 6, 7}, {4, 5}]
and
decision = [{0, 1, 2}, {3, 4}, {5}, {6, 7}]
I want -
{3, 4}
Here, {3, 4} is conflicting, as it is neither a subset of {0, 1, 2, 3, 6, 7}, nor of {4, 5}.
My code -
check = []
for i in attribute:
    for j in decision:
        if j.issubset(i):
            check.append(j)
print(check)
for x in decision:
    if not x in check:
        temp = x
print(temp)
This gives me {3, 4}, but is there an easier and/or faster way to do this?

You can use the following list comprehension:
[d for d in decision if not any(d <= a for a in attribute)]
This returns:
[{3, 4}]
If you want just the first set that satisfies the criteria, you can use next with a generator expression instead:
next(d for d in decision if not any(d <= a for a in attribute))
This returns:
{3, 4}
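One caveat with the bare next call is that it raises StopIteration when no set conflicts. A minimal sketch of a guarded version passes a default as the second argument; the helper name first_conflict is hypothetical.

```python
attribute = [{0, 1, 2, 3, 6, 7}, {4, 5}]
decision = [{0, 1, 2}, {3, 4}, {5}, {6, 7}]

def first_conflict(decision, attribute):
    # d <= a tests whether d is a subset of a; None is returned
    # when every decision set fits inside some attribute set.
    return next(
        (d for d in decision if not any(d <= a for a in attribute)),
        None,
    )

conflict = first_conflict(decision, attribute)
no_conflict = first_conflict([{4}, {0, 1}], attribute)
assert conflict == {3, 4}
assert no_conflict is None
```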

result = [i for i in decision if not [j for j in attribute if i.issubset(j)]]
result is the list of all sets in decision that are not a subset of any set in attribute.
This is the compact version of:
result = []
for i in decision:
    tmp_list = []
    for j in attribute:
        if i.issubset(j):
            tmp_list.append(j)
    if not tmp_list:
        result.append(i)
