I have two lists of sets -
attribute = [{0, 1, 2, 3, 6, 7}, {4, 5}]
and
decision = [{0, 1, 2}, {3, 4}, {5}, {6, 7}]
I want -
{3, 4}
Here, {3, 4} is conflicting, as it is neither a subset of {0, 1, 2, 3, 6, 7}, nor of {4, 5}.
My code -
check = []
for i in attribute:
    for j in decision:
        if j.issubset(i):
            check.append(j)
print(check)
for x in decision:
    if not x in check:
        temp = x
print(temp)
This gives me {3, 4}, but is there an easier and/or faster way to do this?
You can use the following list comprehension:
[d for d in decision if not any(d <= a for a in attribute)]
This returns:
[{3, 4}]
If you want just the first set that satisfies the criteria, you can use next with a generator expression instead:
next(d for d in decision if not any(d <= a for a in attribute))
This returns:
{3, 4}
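One caveat: next raises StopIteration when no set satisfies the criteria. A minimal sketch of a safer variant follows; the helper name first_conflict is hypothetical, and returning None for "no conflict" is an assumption about what the caller wants.

```python
attribute = [{0, 1, 2, 3, 6, 7}, {4, 5}]
decision = [{0, 1, 2}, {3, 4}, {5}, {6, 7}]

def first_conflict(decision, attribute):
    """Return the first set in decision that fits inside no set in attribute.

    Returns None (an assumed default) when every set fits somewhere.
    """
    conflicts = (d for d in decision if not any(d <= a for a in attribute))
    # The second argument to next() is returned instead of raising StopIteration.
    return next(conflicts, None)

print(first_conflict(decision, attribute))
print(first_conflict([{0, 1}], attribute))
```

The first call prints {3, 4}; the second prints None because {0, 1} is a subset of the first attribute set.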
result = [i for i in decision if not [j for j in attribute if i.issubset(j)]]
result is the list of all sets in decision that are not a subset of any set in attribute. :)
This is the compact version of:
result = []
for i in decision:
    tmp_list = []
    for j in attribute:
        if i.issubset(j):
            tmp_list.append(j)
    if not tmp_list:
        result.append(i)
Just an example:
set_0={0,3,4}
set_1={1,3,4}
set_2={1,5,23,8,24}
set_4={1,2,6,10}
set_5={1,60,34,2}
set_6={1,45,32,4}
set_7={1,6,9,14}
set_8={1,56,3,23}
set_9={1,34,23,3}
all_intersection=set.intersection(set_0,set_1,set_2,set_3,set_4, set_5, set_6, set_7, set_8, set_9)
gives the empty set. Is there a way to find the intersection among all possible combinations of 9 out of 10 sets in a Pythonic way (perhaps without a brute-force approach)?
For this dataset I would expect to retrieve 1.
Calling set.intersection(set_0, set_1, ...) on the class itself is valid Python, but the call as posted raises a NameError because set_3 is never defined.
More importantly, an intersection will not find the most common value; it only tells you which values appear in every one of the sets. The Counter class from collections can tell you which values are the most common.
from collections import Counter
set_0 = {0, 3, 4}
set_1 = {1, 3, 4}
set_2 = {1, 5, 23, 8, 24}
set_4 = {1, 2, 6, 10} # you're missing set_3
set_5 = {1, 60, 34, 2}
set_6 = {1, 45, 32, 4}
set_7 = {1, 6, 9, 14}
set_8 = {1, 56, 3, 23}
set_9 = {1, 34, 23, 3}
my_sets = (set_0, set_1, set_2, set_4, set_5, set_6, set_7, set_8, set_9)
values_of_interest = set().union(*my_sets)
values_shared_among_all_sets = values_of_interest.intersection(*my_sets)
counter = Counter(item for collection in my_sets for item in collection)
the_5_most_common_values = counter.most_common(5)
print(f"values in all sets: {values_shared_among_all_sets}")
print(f"most common 5 values: {the_5_most_common_values}")
# this is the output
values in all sets: set()
most common 5 values: [(1, 8), (3, 4), (4, 3), (23, 3), (2, 2)]
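If the goal is "values shared by all but one of the sets", the same counting idea can be thresholded instead of ranked. The sketch below is one way to do that, not the answer's own code; the helper name common_to_at_least is hypothetical. Because each set contains a value at most once, a value's count equals the number of sets it appears in.

```python
from collections import Counter

def common_to_at_least(sets, k):
    """Return the values that occur in at least k of the given sets."""
    counts = Counter(item for s in sets for item in s)
    return {item for item, n in counts.items() if n >= k}

my_sets = (
    {0, 3, 4}, {1, 3, 4}, {1, 5, 23, 8, 24}, {1, 2, 6, 10}, {1, 60, 34, 2},
    {1, 45, 32, 4}, {1, 6, 9, 14}, {1, 56, 3, 23}, {1, 34, 23, 3},
)

# Values present in all sets except possibly one.
print(common_to_at_least(my_sets, len(my_sets) - 1))
```

For the nine sets listed, this prints {1}, since 1 appears in eight of them. It avoids enumerating every combination of sets.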
I am doing pattern matching and appending the match indices to a list. While appending, I would like to skip any indices that overlap with ones already in the list. The example code below is what I worked out, but it does not print exactly what I need.
import re
pat_list=[]
for i in ['ALA', 'AL', 'LA']:
    for p in re.finditer(i, 'MALAYALAM'):
        if p:
            print (i, p.span())
            if len(pat_list)==0:
                pat_list.append(p.span())
                print ('LIST',pat_list)
            if len(pat_list) >0:
                res=[(idx[0], idx[1]) for idx in pat_list if not p.span()[0] >= idx[0] and
                     p.span()[0]<= idx[1] or p.span()[1] >= idx[0] and p.span()[1]<= idx[1] ]
                print ('RES',res)
What I expect to have in the list is [(1,4), (5,8)]; the rest of the indices should not be added.
For any suggestion or help, I will be very grateful!
This isn't the most optimized code. But I've implemented it using set so that it can be easily understood what I'm trying to do.
import re
word = 'MALAYALAM'
to_find = ['ALA', 'AL', 'LA']
indices = []
# Build a list of sets so that the issubset method can be used.
# found.end() is exclusive, so the + 1 makes max(set) equal the span's end.
for piece in to_find:
    for found in re.finditer(piece, word):
        indices.append(set(range(found.start(), found.end() + 1)))
# indices: [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 2, 3}, {5, 6, 7}, {2, 3, 4}, {6, 7, 8}]
non_overlap = []
for left in indices:
    is_subset = False
    for right in indices:
        if left == right:
            continue
        if left.issubset(right):
            is_subset = True
            break
    # Keep left only if it is not contained in any other set.
    if not is_subset:
        non_overlap.append((min(left), max(left)))
# non_overlap: [(1, 4), (5, 8)]
There are definitely more efficient methods out there, but this is one of the easiest solutions.
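As an alternative sketch, the spans can be compared directly as half-open intervals, skipping any match that overlaps a span already kept. This is a different technique from the subset check above: it is order-dependent (patterns listed earlier win), and it treats the patterns as literal strings via re.escape, which is an assumption. The function name non_overlapping_spans is hypothetical.

```python
import re

def non_overlapping_spans(patterns, text):
    """Collect match spans, skipping any that overlap a span already kept."""
    kept = []
    for pattern in patterns:
        for match in re.finditer(re.escape(pattern), text):
            start, end = match.span()
            # Half-open spans [start, end) are disjoint when one ends
            # at or before the point where the other starts.
            if all(end <= s or start >= e for s, e in kept):
                kept.append((start, end))
    return kept

print(non_overlapping_spans(['ALA', 'AL', 'LA'], 'MALAYALAM'))
```

For the example input this prints [(1, 4), (5, 8)].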
I want to build a list in which each missing value in the original list is replaced by the average of its two adjacent values. Assume missing data is denoted by -99.
def clean_missing_data():
    data_list = []
    for number, adjacent in enumerate(a):
        if (number != -99):
            data_list.append(number)
        else:
            adjacent_left = a[number-1]
            adjacent_right = a[number+1]
            fill_in = (adjacent_left + adjacent_right) / 2
            data_list.append(fill_in)
    return data_list
a = [1,2,3,-99,5]
check_data = clean_missing_data()
print('original test case:', a)
print('After clearing, the test case became:', check_data)
OUTPUT
original test case: [1, 2, 3, -99, 5]
After clearing, the test case became: [0, 1, 2, 3, 4]
For this test case, the missing value is the fourth number in the list (denoted by -99), so it should be replaced by the average of the adjacent values, 3 and 5.
In essence, it means: [1,2,3, (3+5)/2, 5]
Please help!
The requirements are a bit unclear, so I'm not 100% sure this does exactly what you want, but this is my best guess for now.
def get_right_number(numbers, i):
    """ Recursive function to search for the first valid number to the right """
    if i >= len(numbers) - 1:
        right = -99
    else:
        right = numbers[i + 1]
        if right == -99:
            right = get_right_number(numbers, i+1)
    return right
def clean_missing_data(numbers):
    print(f'Input: {numbers}')
    if all(x == -99 for x in numbers):
        print('All values in list are invalid. Could not compute.')
        return
    clean_numbers = []
    for i in range(len(numbers)):
        if numbers[i] != -99:
            clean_numbers.append(numbers[i])
        else:
            valid_count = 0
            if i == 0:
                left = 0
            else:
                left = clean_numbers[i - 1]
                valid_count += 1
            right = get_right_number(numbers, i)
            if right == -99:
                right = 0
            else:
                valid_count += 1
            average = (left + right) / valid_count
            clean_numbers.append(average)
    print(f'Output: {clean_numbers}\n')
    return clean_numbers
Here are my test cases (print is embedded in the clean method above):
clean_missing_data([1, 2, 3, 4, 5])
clean_missing_data([1, 2, 3, -99, 5])
clean_missing_data([-99, 2, 3, 4, 5])
clean_missing_data([-99, -99, 3, 4, 5])
clean_missing_data([1, 2, 3, 4, -99])
clean_missing_data([1, 2, 3, -99, -99])
clean_missing_data([1, -99, -99, -99, 5])
clean_missing_data([-99, -99, -99, -99, -99])
Here are the outputs:
Input: [1, 2, 3, 4, 5]
Output: [1, 2, 3, 4, 5]
Input: [1, 2, 3, -99, 5]
Output: [1, 2, 3, 4.0, 5]
Input: [-99, 2, 3, 4, 5]
Output: [2.0, 2, 3, 4, 5]
Input: [-99, -99, 3, 4, 5]
Output: [3.0, 3.0, 3, 4, 5]
Input: [1, 2, 3, 4, -99]
Output: [1, 2, 3, 4, 4.0]
Input: [1, 2, 3, -99, -99]
Output: [1, 2, 3, 3.0, 3.0]
Input: [1, -99, -99, -99, 5]
Output: [1, 3.0, 4.0, 4.5, 5]
Input: [-99, -99, -99, -99, -99]
All values in list are invalid. Could not compute.
Note that when there is a run of invalid numbers, the code fetches the nearest valid number to the right and averages it with the value on the left, which may itself be a previously filled-in average. That new average is then used in the calculation for the next number, and so on. This performs a kind of interpolation, but strictly speaking it is not linear interpolation. Without complete requirements, this will have to do for now (on time and under budget!)
If you need to change the requirements, you can tweak the code above until all the test cases do what you need. I'm also sure there's a cleaner way to do this, but I'll leave that to you to figure out. Good luck!
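If true linear interpolation across a run of missing values is what is actually wanted, a minimal sketch could look like the following. It is an alternative to the code above, not a description of it; the function name interpolate_missing and the choice to copy the nearest valid value at the edges are assumptions.

```python
def interpolate_missing(numbers, missing=-99):
    """Fill missing entries by linear interpolation between valid neighbours."""
    valid = [i for i, x in enumerate(numbers) if x != missing]
    if not valid:
        raise ValueError("no valid values to interpolate from")
    result = list(numbers)
    for i, x in enumerate(numbers):
        if x != missing:
            continue
        # Nearest valid positions on each side (None if there is none).
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            result[i] = numbers[right]   # assumed edge rule: copy nearest value
        elif right is None:
            result[i] = numbers[left]    # assumed edge rule: copy nearest value
        else:
            t = (i - left) / (right - left)
            result[i] = numbers[left] + t * (numbers[right] - numbers[left])
    return result

print(interpolate_missing([1, -99, -99, -99, 5]))
```

For [1, -99, -99, -99, 5] this prints [1, 2.0, 3.0, 4.0, 5], whereas the averaging approach above yields [1, 3.0, 4.0, 4.5, 5].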
You mixed up the variables number and adjacent. enumerate(a) yields pairs of an index (the position in the list) and an element (the value at that position), so your first variable holds the index, not the value. With conventional names, your code becomes
def clean_missing_data():
    data_list = []
    for index, element in enumerate(a):
        if (element != -99):
            data_list.append(element)
        else:
            adjacent_left = a[index - 1]
            adjacent_right = a[index + 1]
            fill_in = (adjacent_left + adjacent_right) / 2
            data_list.append(fill_in)
    return data_list
a = [1,2,3,-99,5]
check_data = clean_missing_data()
print('original test case:', a)
print('After clearing, the test case became:', check_data)
which gives [1, 2, 3, 4.0, 5], where 4.0 is of course numerically equal to 4.
You do need to understand that there are still some problems with the code. What if the first or last number is -99? What if two adjacent numbers are -99? But this should at least work for the example you gave!
Is there an elegant/Pythonic method for creating a dictionary from a nested list with enumerate that enumerates at the sublist level rather than at the list-of-lists level?
Please consider this example:
nested_list = [["ca", "at"], ["li", "if", "fe"], ["ca", "ar"]]
I have tried to convert it into a dictionary that looks like this:
# Desired output.
# {'ca': 0, 'at': 1, 'li': 2, 'if': 3, 'fe': 4, 'ar': 5}
Here is my best attempt, but it appears to enumerate the list at the upper level and overwrite the value for duplicate keys - which is undesirable.
item_dict = {item: i for i, item in enumerate(nested_list) for item in item}
# Current output.
# {'ca': 2, 'at': 0, 'li': 1, 'if': 1, 'fe': 1, 'ar': 2}
Am I better off splitting this task into an un-nesting step, and then a dictionary comprehension step?
All insight is appreciated, thank you.
Using itertools.chain
Ex:
from itertools import chain
nested_list = [["ca", "at"], ["li", "if", "fe"], ["ca", "ar"]]
result = {}
c = 0
for i in chain.from_iterable(nested_list):
    if i not in result:
        result[i] = c
        c += 1
print(result)
Output:
{'ca': 0, 'at': 1, 'li': 2, 'if': 3, 'fe': 4, 'ar': 5}
I think I've solved it, but it's quite ugly...
{key: i for i, key in enumerate(set([item for item in nested_list for item in item]))}
# Output.
{'if': 0, 'at': 1, 'fe': 2, 'li': 3, 'ca': 4, 'ar': 5}
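Because a set has no guaranteed iteration order, the numbering from the set-based version can differ from the desired output, as it does above. A sketch that keeps first-seen order in a single comprehension, assuming Python 3.7+ (where dict preserves insertion order), uses dict.fromkeys to drop duplicates:

```python
from itertools import chain

nested_list = [["ca", "at"], ["li", "if", "fe"], ["ca", "ar"]]

# dict.fromkeys keeps only the first occurrence of each item, in order.
unique_items = dict.fromkeys(chain.from_iterable(nested_list))
item_dict = {item: i for i, item in enumerate(unique_items)}

print(item_dict)
```

This prints {'ca': 0, 'at': 1, 'li': 2, 'if': 3, 'fe': 4, 'ar': 5}.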
I have a dictionary whose values are sets, and I would like to take the union of all of these sets using a for loop. I have tried set.union() inside a for loop, but I do not think it is working. Is there a simple way to do this?
for key in self.thisDict.keys():
    for otherKey in self.thisDict.keys():
        if otherKey!=key:
            unionSet=set.union(self.thisDict[otherKey])
I think the problem is that I am not taking the union of all the sets. I am dealing with a lot of data, so it is hard to tell, but when I print the unionSet object it is nowhere near as large as I expect.
This is a fairly naive approach: create a result set, iterate over the dict values, and update the result set with the values found in each iteration. The |= operator is the in-place form of the set.update method.
d = {1: {1, 2, 3}, 2: {4, 5, 6}}
result = set()
for v in d.values():
    result |= v
assert result == {1, 2, 3, 4, 5, 6}
A simple set comprehension will do:
>>> d = {1: {1, 2, 3}, 2: {4, 5, 6}}
>>> {element for value in d.values() for element in value}
{1, 2, 3, 4, 5, 6}
To my eye, this is more readable:
>>> from itertools import chain
>>> set(chain.from_iterable(d.values()))
{1, 2, 3, 4, 5, 6}
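Another option, sketched here with the same example dictionary, is to unpack all the values into a single set.union call. Starting from an empty set() means the call also works when the dictionary is empty.

```python
d = {1: {1, 2, 3}, 2: {4, 5, 6}}

# set().union accepts any number of iterables and returns a new set.
result = set().union(*d.values())
empty_result = set().union(*{}.values())

print(result)
print(empty_result)
```

The first print shows {1, 2, 3, 4, 5, 6}; the second shows set().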