How to find match between two 2D lists in Python? - python-3.x

Let's say I have two 2D lists like this:
list1 = [ ['A', 5], ['X', 7], ['P', 3]]
list2 = [ ['B', 9], ['C', 5], ['A', 3]]
I want to compare these two lists and find where the second item matches between the two lists, e.g. here we can see that the numbers 5 and 3 appear in both lists. The first item is actually not relevant to the comparison.
How do I compare the lists and copy those values that appear in the second column of both lists? Using 'x in list' does not work since these are 2D lists. Do I create another copy of the lists with just the second column copied across?
It is possible that this can be done using list comprehension but I am not sure about it so far.
There might be a duplicate for this but I have not found it yet.

The pursuit of one-liners is a futile exercise. They aren't always more efficient than the regular loopy way, and they are almost always less readable when you're writing anything more complicated than one or two nested loops. So let's get a multi-line solution first. Once we have a working solution, we can try to convert it to a one-liner.
Now the solution you shared in the comments works, but it doesn't handle duplicate elements and also is O(n^2) because it contains a nested loop. https://wiki.python.org/moin/TimeComplexity
list_common = [x[1] for x in list1 for y in list2 if x[1] == y[1]]
A few key things to remember:
A single loop O(n) is better than a nested loop O(n^2).
Membership lookup in a set O(1) is much quicker than lookup in a list O(n).
Sets also get rid of duplicates for you.
Python includes set operations like union, intersection, etc.
Let's code something using these points:
# Create a set containing all numbers from list1
set1 = set(x[1] for x in list1)
# Create a set containing all numbers from list2
set2 = set(x[1] for x in list2)
# Intersection contains numbers in both sets
intersection = set1.intersection(set2)
# If you want, convert this to a list
list_common = list(intersection)
Now, to convert this to a one-liner:
list_common = list(set(x[1] for x in list1).intersection(x[1] for x in list2))
We don't need to explicitly convert x[1] for x in list2 to a set because the set.intersection() function takes generator expressions and internally handles the conversion to a set.
This gives you the result in O(n) time, and also gets rid of duplicates in the process.
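Putting it together with the sample lists from the question (a quick check; the order of a list built from a set is arbitrary):
list1 = [['A', 5], ['X', 7], ['P', 3]]
list2 = [['B', 9], ['C', 5], ['A', 3]]
# Second items of list1 -> {5, 7, 3}; intersect with second items of list2 -> {5, 3}
list_common = list(set(x[1] for x in list1).intersection(x[1] for x in list2))
print(list_common)  # e.g. [3, 5]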

Related

Is there a way to create combinations that preserve the order of elements in a list?

I have a function that supplies me with a list of lists. The length of the list corresponds to the length of the combination, while the length of the sublist corresponds to the different letters that can be used in that position. So, for instance the expected combinations for this list [['W'], ['I'], ['C', 'J'], ['K', 'Y']] are "WICK", "WICY", "WIJK", and "WIJY". I know how to generate those combinations using nested for loops as follows:
for a in lst[0]:
    for b in lst[1]:
        for c in lst[2]:
            for d in lst[3]:
                print(a+b+c+d)
However, since the length of each list may vary, doing it manually is not feasible for my program. Is there a way I can do this automatically?
I think what you're looking for is product (short for Cartesian product) which is in the itertools module. You can read about it here.
Here is the sample code:
import itertools as it
data = [['W'], ['I'], ['C', 'J'], ['K', 'Y']] #not a very good variable name
combos = list(it.product(*data))
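One detail worth noting: it.product(*data) yields tuples such as ('W', 'I', 'C', 'K'), so if you want the joined strings from the question, you can add a str.join step. A small sketch:
import itertools as it
data = [['W'], ['I'], ['C', 'J'], ['K', 'Y']]
# join each tuple of single characters into one string
combos = [''.join(c) for c in it.product(*data)]
print(combos)  # ['WICK', 'WICY', 'WIJK', 'WIJY']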
This is hardcoded for 4 elements in the parent list, as per the question:
lst = [['W'], ['I'], ['C', 'J'], ['K', 'Y']]
for a in range(len(lst[0])):
    for b in range(len(lst[1])):
        for c in range(len(lst[2])):
            for d in range(len(lst[3])):
                print(lst[0][a]+lst[1][b]+lst[2][c]+lst[3][d])

Python: Faster way to filter a list using list comprehension

Consider the following problem: I want to keep the elements of list1 that belong to list2. So I can do something like this:
filtered_list = [w for w in list1 if w in list2]
I need to repeat this same procedure for different examples of list1 (about 20000 different examples) and a "constant" (frozen) list2.
How can I speed up the process?
I also know the following properties:
1) list1 has repeated elements, it is not sorted, and it has about 10000 (ten thousand) items.
2) list2 is a giant sorted list (about 200000 - two hundred thousand - entries) and each element is unique.
The first thing that comes to me is that maybe I can use a kind of binary search. However, is there a way to do this in Python?
Furthermore, I do not mind if filtered_list has the same order of items as list1. So maybe I can check only a deduplicated version of list1 and, after removing the elements of list1 that do not belong to list2, restore the repeated items.
Is there a fast way to do this in Python 3?
Convert list2 to a set:
# do once
set2 = set(list2)
# then every time
filtered_list = [w for w in list1 if w in set2]
x in list2 is a sequential scan; x in set2 uses hashing, the same mechanism as dictionaries, resulting in a very quick (average O(1)) lookup.
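As for the binary search the question mentions: it is possible with the standard bisect module, since list2 is sorted, but each lookup is O(log n) versus the set's average O(1), so the set approach above is usually both simpler and faster. A minimal sketch for comparison (in_sorted is just a helper name used here):
import bisect
def in_sorted(sorted_seq, x):
    # binary search: find the insertion point and check for an exact match
    i = bisect.bisect_left(sorted_seq, x)
    return i < len(sorted_seq) and sorted_seq[i] == x
filtered_list = [w for w in list1 if in_sorted(list2, w)]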
If list1 didn't have duplicates, converting both to sets and taking set intersection would be the way to go:
filtered_set = set1 & set2
but with duplicates you're stuck with iterating over list1 as above.
(As you said, you could even see which elements should be deleted, using set1 - set2, but then you'd still be stuck with a loop in order to delete them. There shouldn't be any difference in performance between filtering keepers and filtering trash; you still have to iterate over list1, so that's no win over the method above.)
EDIT in response to comment: Converting list1 to a Counter might (EDIT: or not; testing needed!) speed it up if you can use it normally like that (i.e. you never have a list, you always just deal with a Counter). But if you have to preprocess list1 into counter1 each time you do the above operation, again it's no win - creating a Counter will again involve a loop.
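For what it's worth, if you do keep list1 as a Counter everywhere, the filtering might look like this (a sketch only, not benchmarked; duplicates are preserved, though grouped together rather than in their original positions):
from collections import Counter
counter1 = Counter(list1)   # counts each element of list1
set2 = set(list2)           # frozen lookup set, built once
kept = Counter({w: c for w, c in counter1.items() if w in set2})
filtered_list = list(kept.elements())  # repeats each kept element as many times as it appeared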

Grouping elements in list by equivalence class

I have an issue when trying to make new lists from one list by applying sets.
Suppose I have the following list:
L=[[(a),(b),(c)],[(b),(c),(a)],[(a),(c),(b)],[(a),(d),(b)]]
And I wish to create just ONE list from the lists in L which have the same elements. We can clearly see that:
[(a),(b),(c)], [(b),(c),(a)] and [(a),(c),(b)]
when seen as sets, they are the same, because all share the elements (a), (b) and (c).
So if I wish to create new lists from L applying this rule:
I would then need two new lists, which are:
[(a),(b),(c)] and [(a),(d),(b)]
since
[(a),(d),(b)]
seen as a set differs from the rest of the lists.
What would be an optimal way to do this? I know how to convert an element inside L as a set, but if I wish to apply this rule in order to create only two independent lists, what should I do?
A set of frozensets would get you roughly what you want (though it won't preserve order):
unique_sets = {frozenset(lst) for lst in L}
Though order is lost in the set conversion, converting back to a list of lists is fairly easy:
unique_lists = [list(s) for s in unique_sets]
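If you would rather keep each group's first representative exactly as it appears in L (something the set conversion above throws away), a small loop with a "seen" set does it; a sketch:
seen = set()
unique_lists = []
for lst in L:
    key = frozenset(lst)
    if key not in seen:           # first time we meet this equivalence class
        seen.add(key)
        unique_lists.append(lst)  # keep the original sublist, order intact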
You can make a set of frozensets to get only the unique collections ignoring order and counts of items:
set(map(frozenset, L))
# {frozenset({'a', 'd', 'b'}), frozenset({'a', 'c', 'b'})}
It's then pretty trivial to convert those back to lists:
list(map(list, set(map(frozenset, L))))
# [['a', 'd', 'b'], ['a', 'c', 'b']]
If you'd be willing to write a hash function for sets, then you could do:
import itertools
[k for k, g in itertools.groupby(sorted([set(y) for y in L], key=your_hash))]

Get names of elements of an OrderedDict in pandas [duplicate]

With Python 2.7, I can get dictionary keys, values, or items as a list:
>>> newdict = {1:0, 2:0, 3:0}
>>> newdict.keys()
[1, 2, 3]
With Python >= 3.3, I get:
>>> newdict.keys()
dict_keys([1, 2, 3])
How do I get a plain list of keys with Python 3?
This will convert the dict_keys object to a list:
list(newdict.keys())
On the other hand, you should ask yourself whether or not it matters. It is Pythonic to assume duck typing -- if it looks like a duck and it quacks like a duck, it is a duck. The dict_keys object can be iterated over just like a list. For instance:
for key in newdict.keys():
    print(key)
Note that dict_keys doesn't support insertion (newdict[k] = v), though you may not need it.
Python >= 3.5 alternative: unpack into a list literal [*newdict]
New unpacking generalizations (PEP 448) were introduced with Python 3.5 allowing you to now easily do:
>>> newdict = {1:0, 2:0, 3:0}
>>> [*newdict]
[1, 2, 3]
Unpacking with * works with any object that is iterable and, since dictionaries return their keys when iterated through, you can easily create a list by using it within a list literal.
Adding .keys(), i.e. [*newdict.keys()], might help in making your intent a bit more explicit, though it will cost you a function look-up and invocation (which, in all honesty, isn't something you should really be worried about).
The *iterable syntax is similar to doing list(iterable) and its behaviour was initially documented in the Calls section of the Python Reference manual. With PEP 448 the restriction on where *iterable could appear was loosened, allowing it to also be placed in list, set and tuple literals; the reference manual on Expression lists was also updated to state this.
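To make the distinction concrete, a small sketch (the helper f is only for illustration):
def f(*args):
    return args
newdict = {1: 0, 2: 0, 3: 0}
f(*newdict)    # (1, 2, 3) - unpacking in a call, allowed long before PEP 448
[*newdict]     # [1, 2, 3] - unpacking in a list literal, allowed since Python 3.5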
It is equivalent to list(newdict), with the difference that it's faster (at least for small dictionaries) because no function call is actually performed:
%timeit [*newdict]
1000000 loops, best of 3: 249 ns per loop
%timeit list(newdict)
1000000 loops, best of 3: 508 ns per loop
%timeit [k for k in newdict]
1000000 loops, best of 3: 574 ns per loop
With larger dictionaries the speed is pretty much the same (the overhead of iterating through a large collection trumps the small cost of a function call).
In a similar fashion, you can create tuples and sets of dictionary keys:
>>> *newdict,
(1, 2, 3)
>>> {*newdict}
{1, 2, 3}
Beware of the trailing comma in the tuple case!
list(newdict) works in both Python 2 and Python 3, providing a simple list of the keys in newdict. keys() isn't necessary.
You can also use a list comprehension:
>>> newdict = {1:0, 2:0, 3:0}
>>> [k for k in newdict.keys()]
[1, 2, 3]
Or, shorter,
>>> [k for k in newdict]
[1, 2, 3]
Note: Order is not guaranteed on versions under 3.7 (ordering is still only an implementation detail with CPython 3.6).
A bit off on the "duck typing" definition -- dict.keys() returns an iterable object, not a list-like object. It will work anywhere an iterable will work -- but not every place a list will. A list is also an iterable, but an iterable is NOT a list (or sequence...).
In real use-cases, the most common thing to do with the keys in a dict is to iterate through them, so this makes sense. And if you do need them as a list you can call list().
Very similarly for zip() -- in the vast majority of cases, it is iterated through -- why create an entire new list of tuples just to iterate through it and then throw it away again?
This is part of a large trend in python to use more iterators (and generators), rather than copies of lists all over the place.
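For instance, in Python 3 zip() returns a lazy iterator that is consumed as you go; a small illustration:
pairs = zip([1, 2, 3], ['a', 'b', 'c'])
for left, right in pairs:   # iterate without ever building a list of tuples
    print(left, right)
print(list(pairs))          # [] - the iterator is already exhausted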
dict.keys() should work with comprehensions, though -- check carefully for typos or something... it works fine for me:
>>> d = dict(zip(['Sounder V Depth, F', 'Vessel Latitude, Degrees-Minutes'], [None, None]))
>>> [key.split(", ") for key in d.keys()]
[['Sounder V Depth', 'F'], ['Vessel Latitude', 'Degrees-Minutes']]
If you need to store the keys separately, here's a solution that requires less typing than every other solution presented thus far, using Extended Iterable Unpacking (Python3.x+):
newdict = {1: 0, 2: 0, 3: 0}
*k, = newdict
k
# [1, 2, 3]
Operation vs. number of characters:
k = list(d)   # 9 characters (excluding whitespace)
k = [*d]      # 6 characters
*k, = d       # 5 characters
Converting to a list without using the keys method makes it more readable:
list(newdict)
and, when looping through dictionaries, there's no need for keys():
for key in newdict:
    print(key)
unless you are modifying it within the loop which would require a list of keys created beforehand:
for key in list(newdict):
    del newdict[key]
On Python 2 there is a marginal performance gain using keys().
Yes, there is a better and simpler way to do this in Python 3.x: use the built-in list() function.
newdict = {1:0, 2:0, 3:0}
key_list = list(newdict)
print(key_list)
#[1, 2, 3]
I can think of 2 ways in which we can extract the keys from the dictionary.
Method 1:
Get the keys using the .keys() method and then convert them to a list.
some_dict = {1: 'one', 2: 'two', 3: 'three'}
list_of_keys = list(some_dict.keys())
print(list_of_keys)
-->[1,2,3]
Method 2:
Create an empty list and then append the keys to it in a loop.
You can get the values with the same loop as well (use .keys() for just the keys and .items() for both keys and values).
list_of_keys = []
list_of_values = []
for key, val in some_dict.items():
    list_of_keys.append(key)
    list_of_values.append(val)
print(list_of_keys)
-->[1,2,3]
print(list_of_values)
-->['one','two','three']
Beyond the classic (and probably more correct) way to do this (some_dict.keys()), there is also a more "cool" and surely more interesting way:
some_dict = { "foo": "bar", "cool": "python!" }
print( [*some_dict] == ["foo", "cool"] ) # True
Note: this solution shouldn't be used in production code; I showed it here just because I thought it was quite interesting from the *-operator-over-dictionary point of view. Also, I'm not sure whether this is a documented feature or not, and its behaviour may change in later versions :)
You can also use the simple method below:
keys = newdict.keys()
print(keys)
Here is a way to get the key list in one line of code:
dict_variable = {1:"a",2:"b",3:"c"}
[key_val for key_val in dict_variable.keys()]

Python: How do I compare all items in one list as a string to all items in another list?

for each_word1 in list_a:
    ### compare all values to... ###
    for each_word2 in list_b:
        if each_word2 in list_b == any_word in list_a:
            add each_word2 to list_c
### something like that ###
Python has sets, which are ideal for a situation like this. They are data structures that excel at fast membership tests and support the same primitive operations we learned in basic math back in the first school grades.
Just convert one of your lists to a set (this will remove duplicates, if any) and use the intersection operation. Convert the result back to a list if you need to:
list_c = list(set(list_a).intersection(list_b))
This is what I would do (if I understand you correctly).
list_a = ['abc', 'def', 'ghi']
list_b = ['ghi', 'jkl', 'mno']
list_c = []
for string in list_a:
    if string in list_b:
        list_c.append(string)
