Split/partition list based on invariant/hash? - python-3.x

I have a list [a1, a2, ...] and would like to split it based on the value of a function f(a).
For example, if the input is the list [0, 1, 2, 3, 4] and the function def f(x): return x % 3,
I would like to return the lists [0, 3], [1, 4], [2], since the elements of the first group all take the value 0 under f, those of the second group take the value 1, etc.
Something like this works:
return [[x for x in lst if f(x) == val] for val in set(map(f,lst))],
But it does not seem optimal (nor Pythonic), since the inner loop unnecessarily scans the entire list and computes the same f values of the elements several times.
I'm looking for a solution that ideally computes the value of f only once for every element...

If you're not irrationally ;-) set on a one-liner, it's straightforward:
from collections import defaultdict
lst = [0,1,2,3,4]
f = lambda x: x % 3
d = defaultdict(list)
for x in lst:
    d[f(x)].append(x)
print(list(d.values()))
displays what you want. f() is executed len(lst) times, which can't be beat
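For this input the printed result is [[0, 3], [1, 4], [2]]; since dicts keep insertion order in Python 3.7+, the groups appear in order of first appearance of each key.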
EDIT: or, if you must:
from itertools import groupby
print([[pair[1] for pair in grp]
       for ignore, grp in
       groupby(sorted((f(x), x) for x in lst),
               key=lambda pair: pair[0])])
That doesn't require that f() produce values usable as dict keys, but incurs the extra expense of a sort, and is close to incomprehensible. Clarity is much more Pythonic than striving for one-liners.

@Tim Peters is right. Here is the mentioned setdefault approach and another itertools.groupby option.
Given
import itertools as it
iterable = range(5)
keyfunc = lambda x: x % 3
Code
setdefault
d = {}
for x in iterable:
    d.setdefault(keyfunc(x), []).append(x)
list(d.values())
groupby
[list(g) for _, g in it.groupby(sorted(iterable, key=keyfunc), key=keyfunc)]
See the itertools.groupby docs for more.
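If you want this wrapped up for reuse, a minimal sketch (partition_by is just a name used here, not a standard function):

from collections import defaultdict

def partition_by(iterable, keyfunc):
    # Group items by keyfunc(item); keyfunc is called exactly once per item.
    groups = defaultdict(list)
    for x in iterable:
        groups[keyfunc(x)].append(x)
    return list(groups.values())

print(partition_by(range(5), lambda x: x % 3))  # [[0, 3], [1, 4], [2]]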

Related

creating a list of tuples based on successive items of initial list [duplicate]

I sometimes need to iterate a list in Python looking at the "current" element and the "next" element. I have, till now, done so with code like:
for current, next in zip(the_list, the_list[1:]):
    # Do something
This works and does what I expect, but is there a more idiomatic or efficient way to do the same thing?
Some answers to this problem can simplify by addressing the specific case of taking only two elements at a time. For the general case of N elements at a time, see Rolling or sliding window iterator?.
The documentation for 3.8 provides this recipe:
import itertools
def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)
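For example, the recipe yields overlapping pairs:
>>> list(pairwise([1, 2, 3, 4, 5]))
[(1, 2), (2, 3), (3, 4), (4, 5)]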
For Python 2, use itertools.izip instead of zip to get the same kind of lazy iterator (zip will instead create a list):
import itertools
def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.izip(a, b)
How this works:
First, two parallel iterators, a and b, are created (the tee() call), both pointing to the first element of the original iterable. The second iterator, b, is moved one step forward (the next(b, None) call). At this point a points to s0 and b points to s1. Both a and b can traverse the original iterator independently - the izip function takes the two iterators and makes pairs of the returned elements, advancing both iterators at the same pace.
Since tee() can take an n parameter (the number of iterators to produce), the same technique can be adapted to produce a larger "window". For example:
def threes(iterator):
    "s -> (s0, s1, s2), (s1, s2, s3), (s2, s3, s4), ..."
    a, b, c = itertools.tee(iterator, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)
Caveat: If one of the iterators produced by tee advances further than the others, then the implementation needs to keep the consumed elements in memory until every iterator has consumed them (it cannot 'rewind' the original iterator). Here it doesn't matter because one iterator is only 1 step ahead of the other, but in general it's easy to use a lot of memory this way.
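For example, running the three-element window over a short list (a quick check of the function above):
>>> list(threes([1, 2, 3, 4, 5]))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]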
Roll your own!
def pairwise(iterable):
    it = iter(iterable)
    a = next(it, None)
    for b in it:
        yield (a, b)
        a = b
Starting in Python 3.10, this is the exact role of the pairwise function:
from itertools import pairwise
list(pairwise([1, 2, 3, 4, 5]))
# [(1, 2), (2, 3), (3, 4), (4, 5)]
or simply pairwise([1, 2, 3, 4, 5]) if you don't need the result as a list.
I'm just putting this out there; I'm very surprised no one has thought of enumerate().
for (index, thing) in enumerate(the_list):
    if index < len(the_list) - 1:
        current, next_ = thing, the_list[index + 1]
        # do something
Since the_list[1:] actually creates a copy of the whole list (excluding its first element), and zip() in Python 2 creates a full list of tuples as soon as it is called, in total three copies of your list are created. If your list is very large, you might prefer
from itertools import izip, islice
for current_item, next_item in izip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)
which does not copy the list at all.
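In Python 3 there is no izip (the built-in zip is already lazy), so the same idea would be written as (a sketch along the same lines):
from itertools import islice
for current_item, next_item in zip(the_list, islice(the_list, 1, None)):
    print(current_item, next_item)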
Iterating by index can do the same thing (this is Python 2 - hence xrange and the tuple-style output below):
#!/usr/bin/python
the_list = [1, 2, 3, 4]
for i in xrange(len(the_list) - 1):
    current_item, next_item = the_list[i], the_list[i + 1]
    print(current_item, next_item)
Output:
(1, 2)
(2, 3)
(3, 4)
I am really surprised nobody has mentioned the shorter, simpler and most importantly general solution:
Python 3:
from itertools import islice
def n_wise(iterable, n):
    return zip(*(islice(iterable, i, None) for i in range(n)))
Python 2:
from itertools import izip, islice
def n_wise(iterable, n):
    return izip(*(islice(iterable, i, None) for i in xrange(n)))
It works for pairwise iteration by passing n=2, but can handle any higher number:
>>> for a, b in n_wise('Hello!', 2):
...     print(a, b)
H e
e l
l l
l o
o !
>>> for a, b, c, d in n_wise('Hello World!', 4):
...     print(a, b, c, d)
H e l l
e l l o
l l o
l o   W
o   W o
  W o r
W o r l
o r l d
r l d !
As of 16th May 2020, this is now a simple import:
from more_itertools import pairwise
for current, nxt in pairwise(your_iterable):
    print(f'Current = {current}, next = {nxt}')
Docs for more-itertools
Under the hood this code is the same as that in the other answers, but I much prefer imports when available.
If you don't already have it installed then:
pip install more-itertools
Example
For instance, if you had the Fibonacci sequence, you could calculate the ratios of subsequent pairs as:
from more_itertools import pairwise
fib = [1, 1, 2, 3, 5, 8, 13]
for current, nxt in pairwise(fib):
    ratio = current / nxt
    print(f'Current = {current}, next = {nxt}, ratio = {ratio}')
As others have pointed out, itertools.pairwise() is the way to go on recent versions of Python. However, for 3.8+, a fun and somewhat more concise (compared to the other solutions that have been posted) option that does not require an extra import comes via the walrus operator:
def pairwise(iterable):
    it = iter(iterable)  # accept any iterable, not just an iterator
    a = next(it)
    yield from ((a, a := b) for b in it)
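For example, this produces the same pairs as the earlier recipes:
>>> list(pairwise([1, 2, 3, 4, 5]))
[(1, 2), (2, 3), (3, 4), (4, 5)]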
A basic solution:
def neighbors(seq):
    i = 0
    while i + 1 < len(seq):
        yield (seq[i], seq[i + 1])
        i += 1

for (x, y) in neighbors(the_list):
    print(x, y)
Pairs from a list using a list comprehension
the_list = [1, 2, 3, 4]
pairs = [[the_list[i], the_list[i + 1]] for i in range(len(the_list) - 1)]
for [current_item, next_item] in pairs:
    print(current_item, next_item)
Output:
1 2
2 3
3 4
code = '0016364ee0942aa7cc04a8189ef3'
# Getting the current and next item
print([code[idx] + code[idx + 1] for idx in range(len(code) - 1)])
# Getting non-overlapping pairs
print([code[idx * 2] + code[idx * 2 + 1] for idx in range(len(code) // 2)])

How to create two lists with one generator in Python

I am trying to create two separate lists from a base list with only one generator, but I do not know how to do it.
This is the idea: I am wondering if there is a way to create the lists b and c below while only looping through a once.
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
b = [x[:2] for x in a]
c = [x[2:] for x in a]
What I did before this was just use a for loop over a, appending x[:2] and x[2:] to b and c on every iteration, but after using the timeit module I found that using a generator is actually faster. So I moved on to using two separate generators, but now, after timing the above code with timeit, it seems to be just as slow as before the generators. I suspect it is because I am iterating through the list a twice now.
So basically my question is: what is the most efficient way to create b and c given a two-dimensional list? For my application the base list a is quite large, so I need this to be as efficient as possible.
TL;DR: I would suggest you keep using list comprehensions, and benchmark before making any optimizations (if you really think you need them).
I tried four ways:
Using loops:
def use_loop(a):
    b = []
    c = []
    for item in a:
        b.append(item[:2])
        c.append(item[2:])
    return (b, c)
Using list comprehension twice:
def use_comprehension(a):
    b = [x[:2] for x in a]
    c = [x[2:] for x in a]
    return (b, c)
Using list comprehension with zip
def use_comprehension_with_zip(a):
    # zip(*...) transposes the list of (head, tail) pairs into two tuples
    b, c = zip(*[(x[:2], x[2:]) for x in a])
    return (list(b), list(c))
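(The zip variant first builds an intermediate list of (head, tail) pairs and then unpacks it, which is presumably why it comes out a bit slower in the timings below.)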
Using threads is overkill and it will definitely increase your time.
import threading

def get_shorter_list(a, index, ans):
    if index == 0:
        for item in a:
            ans.append(item[:2])
    else:
        for item in a:
            ans.append(item[2:])

def use_threads(a):
    b = []
    c = []
    data = {0: b, 1: c}
    threads = []
    for x in range(2):
        thread = threading.Thread(target=get_shorter_list, args=(a, x, data.get(x)))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return (b, c)
Then I used timeit on the four methods:
import timeit

function_list = [
    use_loop,
    use_comprehension,
    use_comprehension_with_zip,
    use_threads,
]
assert use_loop(a) == use_comprehension(a) == use_comprehension_with_zip(a) == use_threads(a)
for function in function_list:
    print(f'Time taken by {function.__name__}: {timeit.timeit("function(a)", globals=globals(), number=10000)}')
The results did not show a huge degradation from using list comprehensions, which I always prefer for clarity, brevity and speed.
Time taken by use_loop: 0.01102178400469711
Time taken by use_comprehension: 0.011585937994823325
Time taken by use_comprehension_with_zip: 0.0187399349961197
Time taken by use_threads: 2.036599840997951

Permutations in a list

I have a list containing n integers. The i-th element of the list a, a[i], can be swapped for any integer x such that 0 ≤ x ≤ a[i]. For example, if a[i] is 3 it can take the values 0, 1, 2, 3.
The task is to find all permutations of such a list. For example, if the list is
my_list = [2,1,4]
then the possible permutations are:
[0,0,0], [0,0,1], ... [0,0,4],
[0,1,0], [0,1,1], ... [0,1,4],
[1,0,0], [1,0,1], ... [1,0,4],
[1,1,0], [1,1,1], ... [1,1,4],
[2,0,0], [2,0,1], ... [2,0,4],
[2,1,0], [2,1,1], ... [2,1,4]
How to find all such permutations?
You could use a combination of range, to get all the 'valid' values for each element of the list, and itertools.product:
import itertools
my_list = [2,1,4]
# get a list of lists with all the possible values
plist = [list(range(y + 1)) for y in my_list]
permutations = sorted(itertools.product(*plist))
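For the sample my_list this yields 3 * 2 * 5 = 30 tuples, starting with:
>>> permutations[:5]
[(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 0, 4)]
(wrap each tuple in list() if you need lists rather than tuples)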
For more on itertools.product, see e.g. the docs.
Here's a solution:
my_list = [2, 1, 4]

def permutation_list(p_list):
    # Treat the limits as the digits of one number, e.g. [2, 1, 4] -> 214,
    # then walk every integer from 0 up to and including that number,
    # zero-pad it to the full width, and keep only the candidates whose
    # digits stay within the per-position limits.
    # Note: this only works while every limit is a single digit (0-9).
    the_number = int("".join(map(str, p_list)))
    total_len = len(str(the_number))
    r_list = []
    for i in range(the_number + 1):
        digits = [int(x) for x in str(i).zfill(total_len)]
        if all(d <= limit for d, limit in zip(digits, p_list)):
            r_list.append(digits)
    return r_list

print(permutation_list(my_list))
Explanation:
The basic idea is just getting all the numbers up to the given number; for example, up to 4 that is 0, 1, 2, 3 and 4.
I have achieved this by first converting the list into an integer,
then walking through all the numbers up to (and including) the_number and keeping only those whose digits stay within the limits.
Try this. Let me know if I misunderstood your question
def permute(l, cnt, n):
    if cnt == n:
        print(l)
        return
    limit = l[cnt]
    for i in range(limit + 1):
        l[cnt] = i
        permute(l[:n], cnt + 1, n)

l = [2, 1, 4]
permute(l, 0, 3)
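For [2, 1, 4] this prints all 30 lists, starting with [0, 0, 0] and ending with [2, 1, 4].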

Check if element is occurring very first time in python list

I have a list with values occurring multiple times. I want to loop over the list and check whether a value is occurring for the very first time.
For example, let's say I have a list like
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
Now, at every first occurrence of an element, I want to perform some set of tasks.
How do I get the first occurrence of each element?
Thanks in advance!!
Use a set to check whether you have already processed that item:
visited = set()
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
for e in L:
    if e not in visited:
        visited.add(e)
        # process first time tasks
    else:
        # process not first time tasks
        pass
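As an aside (not part of the answer above): if all you need is each element in order of first appearance, dict.fromkeys gives it directly on Python 3.7+, where dicts preserve insertion order:
>>> list(dict.fromkeys(['a', 'a', 'a', 'b', 'b', 'e']))
['a', 'b', 'e']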
You can use unique_everseen from the itertools recipes.
This function returns a generator which yields only the first occurrence of each element.
Code
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Example
lst = ['a', 'a', 'b', 'c', 'b']
for x in unique_everseen(lst):
    print(x)  # Do something with the element
Output
a
b
c
The function unique_everseen also allows passing a key for comparison of elements. This is useful in many cases, for example if you also need to know the position of each first occurrence.
Example
lst = ['a', 'a', 'b', 'c', 'b']
for i, x in unique_everseen(enumerate(lst), key=lambda x: x[1]):
    print(i, x)
Output
0 a
2 b
3 c
Why not use this?
L = ['a','a','a','b','b','b','b','b','e','e','e'.......]
for idx, item in enumerate(L):
    if L.index(item) == idx:
        print("This is the first occurrence")
For very long lists it is less efficient than building a set prior to the loop (each L.index call is itself a linear scan), but it seems more direct to write.

If I have duplicates in a list with brackets, what should I do

Suppose I have the following list:
m=[1,2,[1],1,2,[1]]
I wish to take away all duplicates. If it were not for the brackets inside the list, then I could use:
m=list(set(m))
but when I do this, I get the error:
TypeError: unhashable type: 'list'
What command will help me remove duplicates so that I am left with only the list
m=[1,2,[1]]
Thank you
You can do something along these lines:
m=[1,2,[1],1,2,[1]]
seen=set()
nm=[]
for e in m:
    try:
        x = {e}           # hashability test: raises TypeError if e is unhashable
        x = e             # hashable: use the element itself as the "seen" key
    except TypeError:
        x = frozenset(e)  # unhashable (a list): use a hashable stand-in instead
    if x not in seen:
        seen.add(x)
        nm.append(e)
>>> nm
[1, 2, [1]]
From comments: This method preserves the order of the original list. If you want the numeric types in order first and the other types second, you can do:
sorted(nm, key=lambda e: 0 if isinstance(e, (int,float)) else 1)
The first step will be to convert the inner lists to tuples:
>>> new_list = [tuple(i) if type(i) == list else i for i in m]
Then create a set to remove duplicates:
>>> no_duplicates = set(new_list)
>>> no_duplicates
{1, 2, (1,)}
and you can convert that into a list if you wish.
For a more generic solution you can serialize each list item with pickle.dumps before passing them to set(), and then de-serialize the items with pickle.loads:
import pickle
m = list(map(pickle.loads, set(map(pickle.dumps, m))))
If you want the original order to be maintained, you can use a dict (which has become ordered since Python 3.6+) instead of a set:
import pickle
m = list(map(pickle.loads, {k: 1 for k in map(pickle.dumps, m)}))
Or if you need to be compatible with Python 3.5 or earlier versions, you can use collections.OrderedDict instead:
import pickle
from collections import OrderedDict
m = list(map(pickle.loads, OrderedDict((k, 1) for k in map(pickle.dumps, m))))
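Applied to the example list, each variant leaves you with the elements 1, 2 and [1]; the dict-based variants also preserve the original order, giving [1, 2, [1]], while the plain set version returns them in arbitrary order.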
result = []
for i in m:
    flag = True
    for j in result:
        if i == j:
            flag = False
    if flag:
        result.append(i)
Result will be: [1, 2, [1]]
There are ways to make this code shorter, but I'm writing it more verbosely for readability. Also, note that this method is O(n^2), so I wouldn't recommend it for long lists. But its benefit is simplicity.
Simple solution:
m = [1,2,[1],1,2,[1]]
l = []
for i in m:
    if i not in l:
        l.append(i)
print(l)
[1, 2, [1]]
