Pytorch how to reshape/reduce the number of filters without altering the shape of the individual filters - pytorch

With a 3D tensor of shape (number of filters, height, width), how can one reduce the number of filters with a reshape which keeps the original filters together as whole blocks?
Assume the new size has dimensions chosen such that a whole number of the original filters can fit side by side in one of the new filters. So an original size of (4, 2, 2) can be reshaped to (2, 2, 4).
A visual explanation of the side by side reshape where you see the standard reshape will alter the individual filter shapes:
I have tried various pytorch functions such as gather and select_index but not found a way to get to the end result in a general manner (i.e. works for different numbers of filters and different filter sizes).
I think it would be easier to rearrange the tensor values after performing the reshape but could not get a tensor of the pytorch reshaped form:
[[[1,2,3,4],
[5,6,7,8]],
[[9,10,11,12],
[13,14,15,16]]]
to:
[[[1,2,5,6],
[3,4,7,8]],
[[9,10,13,14],
[11,12,15,16]]]
for completeness, the original tensor before reshaping:
[[[1,2],
[3,4]],
[[5,6],
[7,8]],
[[9,10],
[11,12]],
[[13,14],
[15,16]]]

Another option is to construct a list of parts and concatenate them
x = torch.arange(4).reshape(4, 1, 1).repeat(1, 2, 2)
y = torch.cat([x[i::2] for i in range(2)], dim=2)
print('Before\n', x)
print('After\n', y)
which gives
Before
tensor([[[0, 0],
[0, 0]],
[[1, 1],
[1, 1]],
[[2, 2],
[2, 2]],
[[3, 3],
[3, 3]]])
After
tensor([[[0, 0, 1, 1],
[0, 0, 1, 1]],
[[2, 2, 3, 3],
[2, 2, 3, 3]]])
Or a little more generally we could write a function that takes groups of neighbors along a source dimension and concatenates them along a destination dimension
def group_neighbors(x, group_size, src_dim, dst_dim):
assert x.shape[src_dim] % group_size == 0
return torch.cat([x[[slice(None)] * (src_dim) + [slice(i, None, group_size)] + [slice(None)] * (len(x.shape) - (src_dim + 2))] for i in range(group_size)], dim=dst_dim)
x = torch.arange(4).reshape(4, 1, 1).repeat(1, 2, 2)
# read as "take neighbors in groups of 2 from dimension 0 and concatenate them in dimension 2"
y = group_neighbors(x, group_size=2, src_dim=0, dst_dim=2)
print('Before\n', x)
print('After\n', y)

You could do it by chunking tensor and then recombining.
def side_by_side_reshape(x):
n_pairs = x.shape[0] // 2
filter_size = x.shape[-1]
x = x.reshape((n_pairs, 2, filter_size, filter_size))
return torch.stack(list(map(lambda x: torch.hstack(x.unbind()), k)))
>> p = torch.arange(1, 91).reshape((10, 3, 3))
>> side_by_side_reshape(p)
tensor([[[ 1, 2, 3, 10, 11, 12],
[ 4, 5, 6, 13, 14, 15],
[ 7, 8, 9, 16, 17, 18]],
[[19, 20, 21, 28, 29, 30],
[22, 23, 24, 31, 32, 33],
[25, 26, 27, 34, 35, 36]],
[[37, 38, 39, 46, 47, 48],
[40, 41, 42, 49, 50, 51],
[43, 44, 45, 52, 53, 54]],
[[55, 56, 57, 64, 65, 66],
[58, 59, 60, 67, 68, 69],
[61, 62, 63, 70, 71, 72]],
[[73, 74, 75, 82, 83, 84],
[76, 77, 78, 85, 86, 87],
[79, 80, 81, 88, 89, 90]]])
but I know it's not ideal since there is map, list and unbind which disrupts memory. This is what I offer till I figure out how to do it via view only (so a real reshape)

Related

Plot a dataframe based on specific group/id in Python

I have a dataset given as such:
#Load the required libraries
import pandas as pd
import matplotlib.pyplot as plt
#Create dataset
data = {'id': [1, 1, 1, 1, 1,1, 1,
2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3,
4, 4, 4, 4,
5, 5, 5, 5, 5,5],
'cycle': [1,2, 3, 4, 5,6,7,
1,2, 3,4,5,6,
1,2, 3, 4, 5,
1,2, 3, 4,
1,2, 3, 4, 5,6,],
'Salary': [7, 7, 7,7,7,7,7,
4, 4, 4,4,4,4,
8,8,8,8,8,
10,10,10,10,
15, 15,15,15,15,15,],
'Jobs': [123, 18, 69, 65, 120, 11, 52,
96, 120,10, 141, 52,6,
101,99, 128, 1, 141,
141,123, 12, 66,
12, 128, 66, 100, 141, 52,],
'Days': [123, 128, 66, 66, 120, 141, 52,
96, 120,120, 141, 52,96,
15,123, 128, 120, 141,
141,123, 128, 66,
123, 128, 66, 120, 141, 52,],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
The above dataframe looks as such:
In order to plot the 'cycle' vs 'Salary' for id =1, I have used following codes:
plt.plot(df.groupby(by="id").get_group(1)['cycle'], df.groupby(by="id").get_group(1)['Salary'], label = 'id=1')
plt.xlabel('cycle')
plt.ylabel('Salary')
plt.legend()
plt.xlim(0, 10)
plt.ylim(0, 20)
plt.show()
The plot looks as such:
However, I wish to plot the 'cycle' vs 'Salary' for all id's in one single plot. The graph need to look as such:
Can somebody please let me know how to achieve this task in Python.
Use a pivot:
ax = df.pivot(index='cycle', columns='id', values='Salary').plot()
# display
ax.set_ylim(bottom=0)
ax.set_xlim(left=0)
ax.set_ylabel('Salary')
Output:
swapping axes
using seaborn.lineplot
import seaborn as sns
sns.lineplot(data=df, x='Salary', y='cycle', hue='id',
palette='Set1', estimator=None)
Output:

randomly sample from a high dimensional array along with a specific dimension

There has a 3-dimensional array x of shape (2000,60,5). If we think it represents a video, the 2000 can represent 2000 frames. I would like to randomly sample it along with the first dimension, i.e., get a set of frame samples. For instance, how to get an array of (500,60,5) which is randomly sampled from x along with the first dimension?
You can pass x as the first argument of the choice method. If you don't want repeated frames in your sample, use replace=False.
For example,
In [10]: x = np.arange(72).reshape(9, 2, 4) # Small array for the demo.
In [11]: x
Out[11]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55]],
[[56, 57, 58, 59],
[60, 61, 62, 63]],
[[64, 65, 66, 67],
[68, 69, 70, 71]]])
Sample "frames" from x with the choice method of NumPy random generator instance.
In [12]: rng = np.random.default_rng()
In [13]: rng.choice(x, size=3)
Out[13]:
array([[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [14]: rng.choice(x, size=3, replace=False)
Out[14]:
array([[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]]])
Note that the frames will be in random order; if you want to preserve the order, you could use choice to generate an array of indices, then use the sorted indices to pull the frames out of x.

How to Split list into multiple different lists in Python 3 [duplicate]

How do I split a list of arbitrary length into equal sized chunks?
See How to iterate over a list in chunks if the data result will be used directly for a loop, and does not need to be stored.
For the same question with a string input, see Split string every nth character?. The same techniques generally apply, though there are some variations.
Here's a generator that yields evenly-sized chunks:
def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in range(0, len(lst), n):
yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
For Python 2, using xrange instead of range:
def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in xrange(0, len(lst), n):
yield lst[i:i + n]
Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:
[lst[i:i + n] for i in range(0, len(lst), n)]
For Python 2:
[lst[i:i + n] for i in xrange(0, len(lst), n)]
Something super simple:
def chunks(xs, n):
n = max(1, n)
return (xs[i:i+n] for i in range(0, len(xs), n))
For Python 2, use xrange() instead of range().
I know this is kind of old but nobody yet mentioned numpy.array_split:
import numpy as np
lst = range(50)
np.array_split(lst, 5)
Result:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
Directly from the (old) Python documentation (recipes for itertools):
from itertools import izip, chain, repeat
def grouper(n, iterable, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
The current version, as suggested by J.F.Sebastian:
#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(n, iterable, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
I guess Guido's time machine works—worked—will work—will have worked—was working again.
These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
I'm surprised nobody has thought of using iter's two-argument form:
from itertools import islice
def chunk(it, size):
it = iter(it)
return iter(lambda: tuple(islice(it, size)), ())
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:
from itertools import islice, chain, repeat
def chunk_pad(it, size, padval=None):
it = chain(iter(it), repeat(padval))
return iter(lambda: tuple(islice(it, size)), (padval,) * size)
Demo:
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
if padval == _no_padding:
it = iter(it)
sentinel = ()
else:
it = chain(iter(it), repeat(padval))
sentinel = (padval,) * size
return iter(lambda: tuple(islice(it, size)), sentinel)
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
I believe this is the shortest chunker proposed that offers optional padding.
As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
it = iter(it)
chunker = iter(lambda: tuple(islice(it, size)), ())
if padval == _no_padding:
yield from chunker
else:
for ch in chunker:
yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
Demo:
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
Here is a generator that work on arbitrary iterables:
def split_seq(iterable, size):
it = iter(iterable)
item = list(itertools.islice(it, size))
while item:
yield item
item = list(itertools.islice(it, size))
Example:
>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
Simple yet elegant
L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]
or if you prefer:
def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)
Don't reinvent the wheel.
UPDATE: The upcoming Python 3.12 introduces itertools.batched, which solves this problem at last. See below.
Given
import itertools as it
import collections as ct
import more_itertools as mit
iterable = range(11)
n = 3
Code
itertools.batched++
list(it.batched(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
more_itertools+
list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]
list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
(or DIY, if you want)
The Standard Library
list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
d.setdefault(i//n, []).append(x)
list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
dd[i//n].append(x)
list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
References
more_itertools.chunked (related posted)
more_itertools.sliced
more_itertools.grouper (related post)
more_itertools.windowed (see also stagger, zip_offset)
more_itertools.chunked_even
zip_longest (related post, related post)
setdefault (ordered results requires Python 3.6+)
collections.defaultdict (ordered results requires Python 3.6+)
+ A third-party library that implements itertools recipes and more. > pip install more_itertools
++Included in Python Standard Library 3.12+. batched is similar to more_itertools.chunked.
How do you split a list into evenly sized chunks?
"Evenly sized chunks", to me, implies that they are all the same length, or barring that option, at minimal variance in length. E.g. 5 baskets for 21 items could have the following results:
>>> import statistics
>>> statistics.variance([5,5,5,5,1])
3.2
>>> statistics.variance([5,4,4,4,4])
0.19999999999999998
A practical reason to prefer the latter result: if you were using these functions to distribute work, you've built-in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.
Critique of other answers here
When I originally wrote this answer, none of the other answers were evenly sized chunks - they all leave a runt chunk at the end, so they're not well balanced, and have a higher than necessary variance of lengths.
For example, the current top answer ends with:
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
Others, like list(grouper(3, range(7))), and chunk(range(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None's are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.
Why can't we divide these better?
Cycle Solution
A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup:
from itertools import cycle
items = range(10, 75)
number_of_baskets = 10
Now we need our lists into which to populate the elements:
baskets = [[] for _ in range(number_of_baskets)]
Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, it exactly what we want:
for element, basket in zip(items, cycle(baskets)):
basket.append(element)
Here's the result:
>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
[11, 21, 31, 41, 51, 61, 71],
[12, 22, 32, 42, 52, 62, 72],
[13, 23, 33, 43, 53, 63, 73],
[14, 24, 34, 44, 54, 64, 74],
[15, 25, 35, 45, 55, 65],
[16, 26, 36, 46, 56, 66],
[17, 27, 37, 47, 57, 67],
[18, 28, 38, 48, 58, 68],
[19, 29, 39, 49, 59, 69]]
To productionize this solution, we write a function, and provide the type annotations:
from itertools import cycle
from typing import List, Any
def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
baskets = [[] for _ in range(min(maxbaskets, len(items)))]
for item, basket in zip(items, cycle(baskets)):
basket.append(item)
return baskets
In the above, we take our list of items, and the max number of baskets. We create a list of empty lists, in which to append each element, in a round-robin style.
Slices
Another elegant solution is to use slices - specifically the less-commonly used step argument to slices. i.e.:
start = 0
stop = None
step = number_of_baskets
first_basket = items[start:stop:step]
This is especially elegant in that slices don't care how long the data are - the result, our first basket, is only as long as it needs to be. We'll only need to increment the starting point for each basket.
In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:
from typing import List, Any
def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
n_baskets = min(maxbaskets, len(items))
return [items[i::n_baskets] for i in range(n_baskets)]
And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.
I don't expect most use-cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets, it could save nearly half the memory usage.
from itertools import islice
from typing import List, Any, Generator
def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
n_baskets = min(maxbaskets, len(items))
for i in range(n_baskets):
yield islice(items, i, None, n_baskets)
View results with:
from pprint import pprint
items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])
Updated prior solutions
Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:
def baskets_from(items, maxbaskets=25):
baskets = [[] for _ in range(maxbaskets)]
for i, item in enumerate(items):
baskets[i % maxbaskets].append(item)
return filter(None, baskets)
And I created a generator that does the same if you put it into a list:
def iter_baskets_from(items, maxbaskets=3):
'''generates evenly balanced baskets from indexable iterable'''
item_count = len(items)
baskets = min(item_count, maxbaskets)
for x_i in range(baskets):
yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
'''
generates balanced baskets from iterable, contiguous contents
provide item_count if providing a iterator that doesn't support len()
'''
item_count = item_count or len(items)
baskets = min(item_count, maxbaskets)
items = iter(items)
floor = item_count // baskets
ceiling = floor + 1
stepdown = item_count % baskets
for x_i in range(baskets):
length = ceiling if x_i < stepdown else floor
yield [items.next() for _ in range(length)]
Output
To test them out:
print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))
Which prints out:
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]
Notice that the contiguous generator provide chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.
def chunk(input, size):
return map(None, *([iter(input)] * size))
If you know list size:
def SplitList(mylist, chunk_size):
return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
If you don't (an iterator):
def IterChunks(sequence, chunk_size):
res = []
for item in sequence:
res.append(item)
if len(res) >= chunk_size:
yield res
res = []
if res:
yield res # yield the last, incomplete, portion
In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
I saw the most awesome Python-ish answer in a duplicate of this question:
from itertools import zip_longest
a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
You can create n-tuple for any n. If a = range(1, 15), then the result will be:
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
[AA[i:i+SS] for i in range(len(AA))[::SS]]
Where AA is array, SS is chunk size. For example:
>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
To expand the ranges in py3 do
(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
With Assignment Expressions in Python 3.8 it becomes quite nice:
import itertools
def batch(iterable, size):
it = iter(iterable)
while item := list(itertools.islice(it, size)):
yield item
This works on an arbitrary iterable, not just a list.
>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
UPDATE
Starting with Python 3.12, this exact implementation is available as itertools.batched
If you had a chunk size of 3 for example, you could do:
zip(*[iterable[i::3] for i in range(3)])
source:
http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I would use this when my chunk size is fixed number I can type, e.g. '3', and would never change.
The toolz library has the partition function for this:
from toolz.itertoolz.core import partition
list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
I was curious about the performance of different approaches and here it is:
Tested on Python 3.5.1
import time
batch_size = 7
arr_len = 298937
#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
if not arr:
break
tmp = arr[0:batch_size]
arr = arr[batch_size:-1]
print(time.time() - start)
#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)
#----------batches 1------------
def batch(iterable, n=1):
l = len(iterable)
for ndx in range(0, l, n):
yield iterable[ndx:min(ndx + n, l)]
print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#----------batches 2------------
from itertools import islice, chain
def batch(iterable, size):
sourceiter = iter(iterable)
while True:
batchiter = islice(sourceiter, size)
yield chain([next(batchiter)], batchiter)
print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#---------chunks-------------
def chunks(l, n):
"""Yield successive n-sized chunks from l."""
for i in range(0, len(l), n):
yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
tmp = x
print(time.time() - start)
#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(iterable, n, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
tmp = x
print(time.time() - start)
Results:
slice
31.18285083770752
index
0.02184295654296875
batches 1
0.03503894805908203
batches 2
0.22681021690368652
chunks
0.019841909408569336
grouper
0.006506919860839844
You may also use get_chunks function of utilspie library as:
>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]
You can install utilspie via pip:
sudo pip install utilspie
Disclaimer: I am the creator of utilspie library.
I like the Python doc's version proposed by tzot and J.F.Sebastian a lot,
but it has two shortcomings:
it is not very explicit
I usually don't want a fill value in the last chunk
I'm using this one a lot in my code:
from itertools import islice
def chunks(n, iterable):
iterable = iter(iterable)
while True:
yield tuple(islice(iterable, n)) or iterable.next()
UPDATE: A lazy chunks version:
from itertools import chain, islice
def chunks(n, iterable):
iterable = iter(iterable)
while True:
yield chain([next(iterable)], islice(iterable, n-1))
code:
def split_list(the_list, chunk_size):
result_list = []
while the_list:
result_list.append(the_list[:chunk_size])
the_list = the_list[chunk_size:]
return result_list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print split_list(a_list, 3)
result:
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
heh, one line version
In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))
In [49]: chunk(range(1,100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
[41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
[61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
[71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
Another more explicit version.
def chunkList(initialList, chunkSize):
"""
This function chunks a list into sub lists
that have a length equals to chunkSize.
Example:
lst = [3, 4, 9, 7, 1, 1, 2, 3]
print(chunkList(lst, 3))
returns
[[3, 4, 9], [7, 1, 1], [2, 3]]
"""
finalList = []
for i in range(0, len(initialList), chunkSize):
finalList.append(initialList[i:i+chunkSize])
return finalList
At this point, I think we need a recursive generator, just in case...
In python 2:
def chunks(li, n):
if li == []:
return
yield li[:n]
for e in chunks(li[n:], n):
yield e
In python 3:
def chunks(li, n):
if li == []:
return
yield li[:n]
yield from chunks(li[n:], n)
Also, in case of massive Alien invasion, a decorated recursive generator might become handy:
def dec(gen):
def new_gen(li, n):
for e in gen(li, n):
if e == []:
return
yield e
return new_gen
#dec
def chunks(li, n):
yield li[:n]
for e in chunks(li[n:], n):
yield e
Without calling len() which is good for large lists:
def splitter(l, n):
i = 0
chunk = l[:n]
while chunk:
yield chunk
i += n
chunk = l[i:i+n]
And this is for iterables:
def isplitter(l, n):
l = iter(l)
chunk = list(islice(l, n))
while chunk:
yield chunk
chunk = list(islice(l, n))
The functional flavour of the above:
def isplitter2(l, n):
return takewhile(bool,
(tuple(islice(start, n))
for start in repeat(iter(l))))
OR:
def chunks_gen_sentinel(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return iter(imap(tuple, continuous_slices).next,())
OR:
def chunks_gen_filter(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return takewhile(bool,imap(tuple, continuous_slices))
def split_seq(seq, num_pieces):
start = 0
for i in xrange(num_pieces):
stop = start + len(seq[i::num_pieces])
yield seq[start:stop]
start = stop
usage:
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for seq in split_seq(seq, 3):
print seq
See this reference
>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>
Python3
def chunks(iterable,n):
"""assumes n is an integer>0
"""
iterable=iter(iterable)
while True:
result=[]
for i in range(n):
try:
a=next(iterable)
except StopIteration:
break
else:
result.append(a)
if result:
yield result
else:
break
g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
Since everybody here talking about iterators. boltons has perfect method for that, called iterutils.chunked_iter.
from boltons import iterutils
list(iterutils.chunked_iter(list(range(50)), 11))
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49]]
But if you don't want to be mercy on memory, you can use old-way and store the full list in the first place with iterutils.chunked.
Consider using matplotlib.cbook pieces
for example:
import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
print s
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]

Creating a sliced list from a list [duplicate]

How do I split a list of arbitrary length into equal sized chunks?
See How to iterate over a list in chunks if the data result will be used directly for a loop, and does not need to be stored.
For the same question with a string input, see Split string every nth character?. The same techniques generally apply, though there are some variations.
Here's a generator that yields evenly-sized chunks:
def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in range(0, len(lst), n):
yield lst[i:i + n]
import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
For Python 2, using xrange instead of range:
def chunks(lst, n):
"""Yield successive n-sized chunks from lst."""
for i in xrange(0, len(lst), n):
yield lst[i:i + n]
Below is a list comprehension one-liner. The method above is preferable, though, since using named functions makes code easier to understand. For Python 3:
[lst[i:i + n] for i in range(0, len(lst), n)]
For Python 2:
[lst[i:i + n] for i in xrange(0, len(lst), n)]
Something super simple:
def chunks(xs, n):
n = max(1, n)
return (xs[i:i+n] for i in range(0, len(xs), n))
For Python 2, use xrange() instead of range().
I know this is kind of old but nobody yet mentioned numpy.array_split:
import numpy as np
lst = range(50)
np.array_split(lst, 5)
Result:
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
Directly from the (old) Python documentation (recipes for itertools):
from itertools import izip, chain, repeat
def grouper(n, iterable, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
The current version, as suggested by J.F.Sebastian:
#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(n, iterable, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
I guess Guido's time machine works—worked—will work—will have worked—was working again.
These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
I'm surprised nobody has thought of using iter's two-argument form:
from itertools import islice
def chunk(it, size):
it = iter(it)
return iter(lambda: tuple(islice(it, size)), ())
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:
from itertools import islice, chain, repeat
def chunk_pad(it, size, padval=None):
it = chain(iter(it), repeat(padval))
return iter(lambda: tuple(islice(it, size)), (padval,) * size)
Demo:
>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
if padval == _no_padding:
it = iter(it)
sentinel = ()
else:
it = chain(iter(it), repeat(padval))
sentinel = (padval,) * size
return iter(lambda: tuple(islice(it, size)), sentinel)
Demo:
>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]
I believe this is the shortest chunker proposed that offers optional padding.
As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here's a final variation that works around that problem in a reasonable way:
_no_padding = object()
def chunk(it, size, padval=_no_padding):
it = iter(it)
chunker = iter(lambda: tuple(islice(it, size)), ())
if padval == _no_padding:
yield from chunker
else:
for ch in chunker:
yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))
Demo:
>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]
Here is a generator that work on arbitrary iterables:
def split_seq(iterable, size):
it = iter(iterable)
item = list(itertools.islice(it, size))
while item:
yield item
item = list(itertools.islice(it, size))
Example:
>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
Simple yet elegant
L = range(1, 1000)
print [L[x:x+10] for x in xrange(0, len(L), 10)]
or if you prefer:
def chunks(L, n): return [L[x: x+n] for x in xrange(0, len(L), n)]
chunks(L, 10)
Don't reinvent the wheel.
UPDATE: The upcoming Python 3.12 introduces itertools.batched, which solves this problem at last. See below.
Given
import itertools as it
import collections as ct
import more_itertools as mit
iterable = range(11)
n = 3
Code
itertools.batched++
list(it.batched(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
more_itertools+
list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]
list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
list(mit.chunked_even(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
(or DIY, if you want)
The Standard Library
list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]
d = {}
for i, x in enumerate(iterable):
d.setdefault(i//n, []).append(x)
list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
dd[i//n].append(x)
list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
References
more_itertools.chunked (related posted)
more_itertools.sliced
more_itertools.grouper (related post)
more_itertools.windowed (see also stagger, zip_offset)
more_itertools.chunked_even
zip_longest (related post, related post)
setdefault (ordered results requires Python 3.6+)
collections.defaultdict (ordered results requires Python 3.6+)
+ A third-party library that implements itertools recipes and more. > pip install more_itertools
++Included in Python Standard Library 3.12+. batched is similar to more_itertools.chunked.
How do you split a list into evenly sized chunks?
"Evenly sized chunks", to me, implies that they are all the same length, or barring that option, at minimal variance in length. E.g. 5 baskets for 21 items could have the following results:
>>> import statistics
>>> statistics.variance([5,5,5,5,1])
3.2
>>> statistics.variance([5,4,4,4,4])
0.19999999999999998
A practical reason to prefer the latter result: if you were using these functions to distribute work, you've built-in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.
Critique of other answers here
When I originally wrote this answer, none of the other answers were evenly sized chunks - they all leave a runt chunk at the end, so they're not well balanced, and have a higher than necessary variance of lengths.
For example, the current top answer ends with:
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
Others, like list(grouper(3, range(7))), and chunk(range(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None's are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.
Why can't we divide these better?
Cycle Solution
A high-level balanced solution using itertools.cycle, which is the way I might do it today. Here's the setup:
from itertools import cycle
items = range(10, 75)
number_of_baskets = 10
Now we need our lists into which to populate the elements:
baskets = [[] for _ in range(number_of_baskets)]
Finally, we zip the elements we're going to allocate together with a cycle of the baskets until we run out of elements, which, semantically, it exactly what we want:
for element, basket in zip(items, cycle(baskets)):
basket.append(element)
Here's the result:
>>> from pprint import pprint
>>> pprint(baskets)
[[10, 20, 30, 40, 50, 60, 70],
[11, 21, 31, 41, 51, 61, 71],
[12, 22, 32, 42, 52, 62, 72],
[13, 23, 33, 43, 53, 63, 73],
[14, 24, 34, 44, 54, 64, 74],
[15, 25, 35, 45, 55, 65],
[16, 26, 36, 46, 56, 66],
[17, 27, 37, 47, 57, 67],
[18, 28, 38, 48, 58, 68],
[19, 29, 39, 49, 59, 69]]
To productionize this solution, we write a function, and provide the type annotations:
from itertools import cycle
from typing import List, Any
def cycle_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
baskets = [[] for _ in range(min(maxbaskets, len(items)))]
for item, basket in zip(items, cycle(baskets)):
basket.append(item)
return baskets
In the above, we take our list of items, and the max number of baskets. We create a list of empty lists, in which to append each element, in a round-robin style.
Slices
Another elegant solution is to use slices - specifically the less-commonly used step argument to slices. i.e.:
start = 0
stop = None
step = number_of_baskets
first_basket = items[start:stop:step]
This is especially elegant in that slices don't care how long the data are - the result, our first basket, is only as long as it needs to be. We'll only need to increment the starting point for each basket.
In fact this could be a one-liner, but we'll go multiline for readability and to avoid an overlong line of code:
from typing import List, Any
def slice_baskets(items: List[Any], maxbaskets: int) -> List[List[Any]]:
n_baskets = min(maxbaskets, len(items))
return [items[i::n_baskets] for i in range(n_baskets)]
And islice from the itertools module will provide a lazily iterating approach, like that which was originally asked for in the question.
I don't expect most use-cases to benefit very much, as the original data is already fully materialized in a list, but for large datasets, it could save nearly half the memory usage.
from itertools import islice
from typing import List, Any, Generator
def yield_islice_baskets(items: List[Any], maxbaskets: int) -> Generator[List[Any], None, None]:
n_baskets = min(maxbaskets, len(items))
for i in range(n_baskets):
yield islice(items, i, None, n_baskets)
View results with:
from pprint import pprint
items = list(range(10, 75))
pprint(cycle_baskets(items, 10))
pprint(slice_baskets(items, 10))
pprint([list(s) for s in yield_islice_baskets(items, 10)])
Updated prior solutions
Here's another balanced solution, adapted from a function I've used in production in the past, that uses the modulo operator:
def baskets_from(items, maxbaskets=25):
baskets = [[] for _ in range(maxbaskets)]
for i, item in enumerate(items):
baskets[i % maxbaskets].append(item)
return filter(None, baskets)
And I created a generator that does the same if you put it into a list:
def iter_baskets_from(items, maxbaskets=3):
'''generates evenly balanced baskets from indexable iterable'''
item_count = len(items)
baskets = min(item_count, maxbaskets)
for x_i in range(baskets):
yield [items[y_i] for y_i in range(x_i, item_count, baskets)]
And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):
def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
'''
generates balanced baskets from iterable, contiguous contents
provide item_count if providing a iterator that doesn't support len()
'''
item_count = item_count or len(items)
baskets = min(item_count, maxbaskets)
items = iter(items)
floor = item_count // baskets
ceiling = floor + 1
stepdown = item_count % baskets
for x_i in range(baskets):
length = ceiling if x_i < stepdown else floor
yield [items.next() for _ in range(length)]
Output
To test them out:
print(baskets_from(range(6), 8))
print(list(iter_baskets_from(range(6), 8)))
print(list(iter_baskets_contiguous(range(6), 8)))
print(baskets_from(range(22), 8))
print(list(iter_baskets_from(range(22), 8)))
print(list(iter_baskets_contiguous(range(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(range(26), 5))
print(list(iter_baskets_from(range(26), 5)))
print(list(iter_baskets_contiguous(range(26), 5)))
Which prints out:
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]
Notice that the contiguous generator provide chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.
def chunk(input, size):
return map(None, *([iter(input)] * size))
If you know list size:
def SplitList(mylist, chunk_size):
return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]
If you don't (an iterator):
def IterChunks(sequence, chunk_size):
res = []
for item in sequence:
res.append(item)
if len(res) >= chunk_size:
yield res
res = []
if res:
yield res # yield the last, incomplete, portion
In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).
I saw the most awesome Python-ish answer in a duplicate of this question:
from itertools import zip_longest
a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
You can create n-tuple for any n. If a = range(1, 15), then the result will be:
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.
[AA[i:i+SS] for i in range(len(AA))[::SS]]
Where AA is array, SS is chunk size. For example:
>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
To expand the ranges in py3 do
(py3) >>> [list(AA[i:i+SS]) for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
With Assignment Expressions in Python 3.8 it becomes quite nice:
import itertools
def batch(iterable, size):
it = iter(iterable)
while item := list(itertools.islice(it, size)):
yield item
This works on an arbitrary iterable, not just a list.
>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]
UPDATE
Starting with Python 3.12, this exact implementation is available as itertools.batched
If you had a chunk size of 3 for example, you could do:
zip(*[iterable[i::3] for i in range(3)])
source:
http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/
I would use this when my chunk size is fixed number I can type, e.g. '3', and would never change.
The toolz library has the partition function for this:
from toolz.itertoolz.core import partition
list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
I was curious about the performance of different approaches and here it is:
Tested on Python 3.5.1
import time
batch_size = 7
arr_len = 298937
#---------slice-------------
print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
if not arr:
break
tmp = arr[0:batch_size]
arr = arr[batch_size:-1]
print(time.time() - start)
#-----------index-----------
print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)
#----------batches 1------------
def batch(iterable, n=1):
l = len(iterable)
for ndx in range(0, l, n):
yield iterable[ndx:min(ndx + n, l)]
print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#----------batches 2------------
from itertools import islice, chain
def batch(iterable, size):
sourceiter = iter(iterable)
while True:
batchiter = islice(sourceiter, size)
yield chain([next(batchiter)], batchiter)
print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
tmp = x
print(time.time() - start)
#---------chunks-------------
def chunks(l, n):
"""Yield successive n-sized chunks from l."""
for i in range(0, len(l), n):
yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
tmp = x
print(time.time() - start)
#-----------grouper-----------
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)
def grouper(iterable, n, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
tmp = x
print(time.time() - start)
Results:
slice
31.18285083770752
index
0.02184295654296875
batches 1
0.03503894805908203
batches 2
0.22681021690368652
chunks
0.019841909408569336
grouper
0.006506919860839844
You may also use get_chunks function of utilspie library as:
>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]
You can install utilspie via pip:
sudo pip install utilspie
Disclaimer: I am the creator of utilspie library.
I like the Python doc's version proposed by tzot and J.F.Sebastian a lot,
but it has two shortcomings:
it is not very explicit
I usually don't want a fill value in the last chunk
I'm using this one a lot in my code:
from itertools import islice
def chunks(n, iterable):
iterable = iter(iterable)
while True:
yield tuple(islice(iterable, n)) or iterable.next()
UPDATE: A lazy chunks version:
from itertools import chain, islice
def chunks(n, iterable):
iterable = iter(iterable)
while True:
yield chain([next(iterable)], islice(iterable, n-1))
code:
def split_list(the_list, chunk_size):
result_list = []
while the_list:
result_list.append(the_list[:chunk_size])
the_list = the_list[chunk_size:]
return result_list
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print split_list(a_list, 3)
result:
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
heh, one line version
In [48]: chunk = lambda ulist, step: map(lambda i: ulist[i:i+step], xrange(0, len(ulist), step))
In [49]: chunk(range(1,100), 10)
Out[49]:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
[41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
[61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
[71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
[81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
[91, 92, 93, 94, 95, 96, 97, 98, 99]]
Another more explicit version.
def chunkList(initialList, chunkSize):
"""
This function chunks a list into sub lists
that have a length equals to chunkSize.
Example:
lst = [3, 4, 9, 7, 1, 1, 2, 3]
print(chunkList(lst, 3))
returns
[[3, 4, 9], [7, 1, 1], [2, 3]]
"""
finalList = []
for i in range(0, len(initialList), chunkSize):
finalList.append(initialList[i:i+chunkSize])
return finalList
At this point, I think we need a recursive generator, just in case...
In python 2:
def chunks(li, n):
if li == []:
return
yield li[:n]
for e in chunks(li[n:], n):
yield e
In python 3:
def chunks(li, n):
if li == []:
return
yield li[:n]
yield from chunks(li[n:], n)
Also, in case of massive Alien invasion, a decorated recursive generator might become handy:
def dec(gen):
def new_gen(li, n):
for e in gen(li, n):
if e == []:
return
yield e
return new_gen
#dec
def chunks(li, n):
yield li[:n]
for e in chunks(li[n:], n):
yield e
Without calling len() which is good for large lists:
def splitter(l, n):
i = 0
chunk = l[:n]
while chunk:
yield chunk
i += n
chunk = l[i:i+n]
And this is for iterables:
def isplitter(l, n):
l = iter(l)
chunk = list(islice(l, n))
while chunk:
yield chunk
chunk = list(islice(l, n))
The functional flavour of the above:
def isplitter2(l, n):
return takewhile(bool,
(tuple(islice(start, n))
for start in repeat(iter(l))))
OR:
def chunks_gen_sentinel(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return iter(imap(tuple, continuous_slices).next,())
OR:
def chunks_gen_filter(n, seq):
continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
return takewhile(bool,imap(tuple, continuous_slices))
def split_seq(seq, num_pieces):
start = 0
for i in xrange(num_pieces):
stop = start + len(seq[i::num_pieces])
yield seq[start:stop]
start = stop
usage:
seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for seq in split_seq(seq, 3):
print seq
See this reference
>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>>
Python3
def chunks(iterable,n):
"""assumes n is an integer>0
"""
iterable=iter(iterable)
while True:
result=[]
for i in range(n):
try:
a=next(iterable)
except StopIteration:
break
else:
result.append(a)
if result:
yield result
else:
break
g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
Since everybody here talking about iterators. boltons has perfect method for that, called iterutils.chunked_iter.
from boltons import iterutils
list(iterutils.chunked_iter(list(range(50)), 11))
Output:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
[44, 45, 46, 47, 48, 49]]
But if you don't want to be mercy on memory, you can use old-way and store the full list in the first place with iterutils.chunked.
Consider using matplotlib.cbook pieces
for example:
import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
print s
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]

How to slice a Tensor by another Tensor in Keras?

I have a Keras network with two inputs:
image of shape (128, 128, 3)
bounding-box of shape (4), i.e. (x0, y0, x1, y1)
In my network definition, I need to include the extraction of the image patch defined by the bounding-box from the input image, but I do not know how (or my attempts did not work). Here is my current attempt to achieve this, can someone please help me to understand slicing Tensors by Values of other Tensors in Keras?
# get masked image and bounding box information as inputs
masked_img = Input(shape=self.input_shape)
mask_bounding_box = Input(shape=(4,))
# fill in the masked region and extract the fill-in region
filled_img = self.generator(masked_img)
fill_in = K.slice(filled_img, (int(mask_bounding_box[0]), int(mask_bounding_box[1])),
(int(mask_bounding_box[2]), int(mask_bounding_box[3])))
Does anybody know how to do this? Any hint in the right direction would help me, please ...
Thanks in advance!
here's a native numpy solution.
import numpy as np
a = np.arange(48).reshape(3,4,4)
a
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]]])
box = (1,1,2,2) # slicing from (1,1) to (2,2)
b = a[:, box[0]:box[2]+1, box[1]:box[3]+1] # slicing on all channels
b
array([[[ 5, 6],
[ 9, 10]],
[[21, 22],
[25, 26]],
[[37, 38],
[41, 42]]])
Keras.backend.slice() requires starts and offsets, so you could do it like this:
import keras.backend as K
start=(0,1,1) # 1st channel, x1, y1
sizes=(3,2,2) # number of channels, x2-x1+1, y2-y1+1
with sess.as_default():
b=K.slice(a, start, sizes)
print(b.eval())
[[[ 5 6]
[ 9 10]]
[[21 22]
[25 26]]
[[37 38]
[41 42]]]

Resources