Grouping elements in list by equivalence class - python-3.x

I have an issue when trying to make new lists from one list by applying sets.
Suppose I have the following list:
L=[[(a),(b),(c)],[(b),(c),(a)],[(a),(c),(b)],[(a),(d),(b)]]
And I wish to create just ONE list from the lists in L which have the same elements. We can clearly see that:
[(a),(b),(c)], [(b),(c),(a)] and [(a),(c),(b)]
are the same when seen as sets, because they all share the elements (a), (b) and (c).
So if I wish to create new lists from L applying this rule:
I would then need two new lists, which are:
[(a),(b),(c)] and [(a),(d),(b)]
since
[(a),(d),(b)]
seen as a set differs from the rest of the lists.
What would be an optimal way to do this? I know how to convert an element inside L to a set, but if I wish to apply this rule in order to create only two independent lists, what should I do?

A set of frozensets would get you roughly what you want (though it won't preserve order):
unique_sets = {frozenset(lst) for lst in L}
Though order is lost in the set conversion, converting back to a list of lists is fairly easy:
unique_lists = [list(s) for s in unique_sets]
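If the original order matters, one possible variant is to use the frozenset only as a dictionary key and keep the first list seen for each key. This is a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order) and hashable elements; the helper name unique_by_elements is made up for illustration, and the elements are written as strings here.

```python
def unique_by_elements(lists):
    """Keep the first list seen for each distinct set of elements."""
    seen = {}
    for lst in lists:
        # frozenset ignores order and duplicates, so equivalent lists share a key
        seen.setdefault(frozenset(lst), lst)
    return list(seen.values())


L = [['a', 'b', 'c'], ['b', 'c', 'a'], ['a', 'c', 'b'], ['a', 'd', 'b']]
result = unique_by_elements(L)
print(result)  # [['a', 'b', 'c'], ['a', 'd', 'b']]
```

Each representative keeps the element order it had in L, which the plain set-of-frozensets approach does not guarantee.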

You can make a set of frozensets to get only the unique collections ignoring order and counts of items:
set(map(frozenset, L))
# {frozenset({'a', 'd', 'b'}), frozenset({'a', 'c', 'b'})}
It's then pretty trivial to convert those back to lists:
list(map(list, set(map(frozenset, L))))
# [['a', 'd', 'b'], ['a', 'c', 'b']]

If you'd be willing to write a hash method for set then you could do:
import itertools
[k for k, g in itertools.groupby(sorted([set(y) for y in L], key = your_hash))]
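As a concrete sketch of the groupby idea, the placeholder key could be replaced by the sorted list of distinct elements. This assumes the elements can be compared with each other (strings here); sort_key is a hypothetical name, not something from the answer above.

```python
import itertools

L = [['a', 'b', 'c'], ['b', 'c', 'a'], ['a', 'c', 'b'], ['a', 'd', 'b']]


def sort_key(lst):
    # Equivalent lists (same elements, any order) produce the same key
    return sorted(set(lst))


# groupby only merges adjacent items, so sort by the same key first
ordered = sorted(L, key=sort_key)
groups = [list(g) for _, g in itertools.groupby(ordered, key=sort_key)]

# One representative per equivalence class
representatives = [group[0] for group in groups]
print(representatives)  # [['a', 'b', 'c'], ['a', 'd', 'b']]
```

Unlike the set-based answers, this also gives you every member of each class in groups, at the cost of an O(n log n) sort.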

Related

How to find match between two 2D lists in Python?

Let's say I have two 2D lists like this:
list1 = [ ['A', 5], ['X', 7], ['P', 3]]
list2 = [ ['B', 9], ['C', 5], ['A', 3]]
I want to compare these two lists and find where the 2nd item matches between the two lists, e.g., here we can see that the numbers 5 and 3 appear in both lists. The first item is not relevant to the comparison.
How do I compare the lists and copy those values that appear in 2nd column of both lists? Using 'x in list' does not work since these are 2D lists. Do I create another copy of the lists with just the 2nd column copied across?
It is possible that this can be done using list comprehension but I am not sure about it so far.
There might be a duplicate for this but I have not found it yet.
The pursuit of one-liners is a futile exercise. They aren't always more efficient than the regular loopy way, and almost always less readable when you're writing anything more complicated than one or two nested loops. So let's get a multi-line solution first. Once we have a working solution, we can try to convert it to a one-liner.
Now the solution you shared in the comments works, but it doesn't handle duplicate elements and also is O(n^2) because it contains a nested loop. https://wiki.python.org/moin/TimeComplexity
list_common = [x[1] for x in list1 for y in list2 if x[1] == y[1]]
A few key things to remember:
A single loop O(n) is better than a nested loop O(n^2).
Membership lookup in a set O(1) is much quicker than lookup in a list O(n).
Sets also get rid of duplicates for you.
Python includes set operations like union, intersection, etc.
Let's code something using these points:
# Create a set containing all numbers from list1
set1 = set(x[1] for x in list1)
# Create a set containing all numbers from list2
set2 = set(x[1] for x in list2)
# Intersection contains numbers in both sets
intersection = set1.intersection(set2)
# If you want, convert this to a list
list_common = list(intersection)
Now, to convert this to a one-liner:
list_common = list(set(x[1] for x in list1).intersection(x[1] for x in list2))
We don't need to explicitly convert x[1] for x in list2 to a set because set.intersection() accepts any iterable, including a generator expression, and handles the conversion internally.
This gives you the result in O(n) time on average, and also gets rid of duplicates in the process.
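If you need the matching rows rather than just the shared numbers, a similar set lookup can be used. This is only a sketch under the assumption that you want the rows of list1, in their original order, whose second item also occurs in list2; the names numbers2 and matches are illustrative.

```python
list1 = [['A', 5], ['X', 7], ['P', 3]]
list2 = [['B', 9], ['C', 5], ['A', 3]]

# Build the lookup set once: O(len(list2))
numbers2 = {row[1] for row in list2}

# Single pass over list1 with O(1) average membership tests
matches = [row for row in list1 if row[1] in numbers2]
print(matches)  # [['A', 5], ['P', 3]]
```

Unlike the intersection, this keeps the order of list1 and keeps repeated numbers if list1 contains them.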

Is there a way to create combinations that preserve the order of elements in a list?

I have a function that supplies me with a list of lists. The length of the list corresponds to the length of the combination, while the length of the sublist corresponds to the different letters that can be used in that position. So, for instance the expected combinations for this list [['W'], ['I'], ['C', 'J'], ['K', 'Y']] are "WICK", "WICY", "WIJK", and "WIJY". I know how to generate those combinations using nested for loops as follows:
for a in lst[0]:
    for b in lst[1]:
        for c in lst[2]:
            for d in lst[3]:
                print(a+b+c+d)
However, since the length of each list may vary, doing it manually is not feasible for my program. Is there a way I can do this automatically?
I think what you're looking for is product (short for Cartesian product) which is in the itertools module. You can read about it here.
Here is the sample code:
import itertools as it
data = [['W'], ['I'], ['C', 'J'], ['K', 'Y']] #not a very good variable name
combos = list(it.product(*data))
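Since product yields tuples, the strings from the question can be obtained by joining each tuple. A small sketch, assuming every sublist contains single-character strings:

```python
import itertools

data = [['W'], ['I'], ['C', 'J'], ['K', 'Y']]

# product(*data) picks one item from each sublist, with the last position varying fastest
words = [''.join(combo) for combo in itertools.product(*data)]
print(words)  # ['WICK', 'WICY', 'WIJK', 'WIJY']
```

This works for any number of sublists, which is what the hand-written nested loops cannot do.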
This is hardcoded for the 4 elements in the parent list, as per the question:
lst = [['W'], ['I'], ['C', 'J'], ['K', 'Y']]
for a in range(len(lst[0])):
    for b in range(len(lst[1])):
        for c in range(len(lst[2])):
            for d in range(len(lst[3])):
                print(lst[0][a]+lst[1][b]+lst[2][c]+lst[3][d])

Why is sorted() not sorting my list of strings?

Problem:
I was trying to sort my list of strings alphabetically, but maybe I overlooked something very minor. I have tried both .sort() and sorted(), but maybe I didn't use them correctly?
Here is my Code:
words = input("Words: ")
list1 = []
list1.append(words.split())
print(sorted(list1))
Expected output-
Input: "a b d c"
Output: ['a', 'b', 'c', 'd']
Current output-
Input: "a b d c"
Output: [['a', 'b', 'd', 'c']]
Your code is not working because you are trying to sort a list inside a list.
When you call words.split() it returns a list. So when you do list1.append(words.split()) it is appending a list into list1.
You should do this:
words = input("Words: ")
list1 = words.split()
print(sorted(list1))
You can try a simple method as follows:
list1 = [i for i in input('Words: ').split(' ')]
print(sorted(list1))
I've tested it, and it works.
Without deviating from your current effort, the only modification you need to make to fix your code is:
words = input("Words: ")
list1 = []
list1.append(words.split())
print(sorted(list1[0]))
Explanation of what you were doing wrong:
The root cause of your confusion is append(). According to the Python docs, append() takes exactly one argument and adds it to the list as a single element.
So when you pass it
words.split()
which is itself a list, the whole list is appended as one element, giving you a nested list (i.e. a list inside another list).
To support my explanation, you can see that your code is fixed by a simple [0]:
print(sorted(list1[0]))
That is because your input is stored as a list of lists, and the inner list sits at the first position (note that the first index in a Python list is 0, hence the usage of list1[0]).
Please let me know if I could have explained it in a simpler way or if you have any other confusion arising from the above explanation.
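The difference can be seen by comparing append() with extend() on the same input. This is a small illustration with a fixed string in place of input(); the variable names nested and flat are made up here.

```python
words = "a b d c"

nested = []
nested.append(words.split())  # adds the whole list as ONE element
print(nested)                 # [['a', 'b', 'd', 'c']]

flat = []
flat.extend(words.split())    # adds each word as its own element
print(flat)                   # ['a', 'b', 'd', 'c']

print(sorted(flat))           # ['a', 'b', 'c', 'd']
```

Sorting nested only orders its single element (the inner list), which is why the original output looked unsorted.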

Python: Faster way to filter a list using list comprehension

Consider the following problem: I want to keep the elements of list1 that belong to list2. So I can do something like this:
filtered_list = [w for w in list1 if w in list2]
I need to repeat this same procedure for different examples of list1 (about 20000 different examples) and a "constant" (frozen) list2.
How can I speed up the process?
I also know the following properties:
1) list1 has repeated elements, is not sorted, and has about 10000 (ten thousand) items.
2) list2 is a giant sorted list (about 200000, i.e. two hundred thousand, entries) and each element is unique.
The first thing that comes to me is that maybe I can use a kind of binary search. However, is there a way to do this in Python?
Furthermore, I do not mind whether filtered_list keeps the same order of items as list1. So maybe I can check only a deduplicated version of list1 and, after removing the elements that do not belong to list2, restore the repeated items.
Is there a fast way to do this in Python 3?
Convert list2 to a set:
# do once
set2 = set(list2)
# then every time
filtered_list = [w for w in list1 if w in set2]
x in list2 is sequential; x in set2 uses the same mechanism as dictionaries, resulting in a very quick lookup.
If list1 didn't have duplicates, converting both to sets and taking set intersection would be the way to go:
filtered_set = set1 & set2
but with duplicates you're stuck with iterating over list1 as above.
(As you said, you could even see elements that you should delete, using set1 - set2, but then you'd still be stuck in a loop in order to delete - there shouldn't be any difference in performance between filtering keepers vs filtering trash, you still have to iterate over list1, so that's no win over the method above.)
EDIT in response to comment: Converting list1 to a Counter might (EDIT: or not; testing needed!) speed it up if you can use it normally like that (i.e. you never have a list, you always just deal with a Counter). But if you have to preprocess list1 into counter1 each time you do the above operation, again it's no win - creating a Counter will again involve a loop.
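On the binary search part of the question: since list2 is already sorted, the standard library bisect module can do the lookup in O(log n) without building a set. The following is a sketch with small stand-in lists; contains_sorted is a hypothetical helper name. In practice the set lookup above is usually the faster choice, so this is mainly useful if the extra memory for a set is a concern.

```python
from bisect import bisect_left


def contains_sorted(sorted_list, item):
    """Binary search membership test; sorted_list must be sorted ascending."""
    i = bisect_left(sorted_list, item)
    return i < len(sorted_list) and sorted_list[i] == item


list2 = [1, 3, 5, 7, 9]      # stand-in for the large sorted list
list1 = [5, 2, 5, 9, 4, 1]   # stand-in for an unsorted list with repeats

filtered_list = [w for w in list1 if contains_sorted(list2, w)]
print(filtered_list)  # [5, 5, 9, 1]
```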

Set indexing python

Can we do indexing in a python set, to attain an element from a specific index?
Like accessing a certain element from the below set:
st = {'a', 'b', 'g'}
How to return the second element by indexing?
No. A set is unordered by definition:
A set is an unordered collection with no duplicate elements.
In addition to @learner8269's answer, if you just need the index of a particular element (for some unknown reason), you can get it using enumerate(). Keep in mind that this position reflects the set's arbitrary iteration order.
ele = 4
s = set(range(2,10))
for i,j in enumerate(s):
    if j == ele:
        break
print('Index of %d: %d'%(ele,i))
You want to get a subset using indexes. A set is an unordered collection with non-duplicated items. The set's pop method removes an arbitrary item from the set. So it is impossible in general, but if you want to remove a limited number of arbitrary items (I can't imagine why anybody needs it), you can call pop multiple times in a loop.
If you have influence on the construction of the set, you can create a dictionary instead of that set where each value of the original set is the key to your index or vice versa whatever suits the need.
>>> {key:i for i,key in enumerate(st)}
{'g': 0, 'a': 1, 'b': 2}
>>> {i:key for i,key in enumerate(st)}
{0: 'g', 1: 'a', 2: 'b'}
As a set is unordered, we need a workaround to get what is expected. The following code would do the job:
mySet = set([1, 2, 3])
list(mySet)[0]
This creates a new list which contains each member of the set, so it won't be a good choice if your set is really large.
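Two possible refinements, sketched under the assumption that the elements are sortable: sorting gives a well-defined "second element", while itertools.islice avoids copying the whole set when any element at that iteration position will do. The helper name nth is made up for illustration.

```python
from itertools import islice

st = {'a', 'b', 'g'}

# Deterministic: the second element in sorted order
second_sorted = sorted(st)[1]
print(second_sorted)  # b


def nth(iterable, n):
    """Return the item at position n of the iteration order, without building a list."""
    return next(islice(iterable, n, None))


# Arbitrary: depends on the set's internal iteration order
second_any = nth(st, 1)
```

The islice version raises StopIteration if the set has fewer than n + 1 elements.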
