What is the preferred way to concatenate sequences in Python 3? - python-3.x

Right now, I'm doing:
import functools
import operator
def concatenate(sequences):
    return functools.reduce(operator.add, sequences)
print(concatenate([['spam', 'eggs'], ['ham']]))
# ['spam', 'eggs', 'ham']
Needing to import two separate modules to do this seems clunky.
An alternative could be:
def concatenate(sequences):
    concatenated_sequence = []
    for sequence in sequences:
        concatenated_sequence += sequence
    return concatenated_sequence
However, this is incorrect because you don't know that the sequences are lists.
You could do:
import copy
def concatenate(sequences):
    head, *tail = sequences
    concatenated_sequence = copy.copy(head)
    # Iterate over the tail only; head is already in the result.
    for sequence in tail:
        concatenated_sequence += sequence
    return concatenated_sequence
But that seems horribly bug-prone: a direct call to copy? (I know head.copy() works for lists, but copy isn't part of the Sequence ABC, so you can't rely on it; tuples and strings don't have it.) You have to copy to prevent mutation in case you get handed a MutableSequence. Moreover, this solution forces you to unpack the entire set of sequences first. Trying again:
import copy
def concatenate(sequences):
    iterable = iter(sequences)
    head = next(iterable)
    concatenated_sequence = copy.copy(head)
    for sequence in iterable:
        concatenated_sequence += sequence
    return concatenated_sequence
But come on... this is python! So... what is the preferred way to do this?

I'd use itertools.chain.from_iterable() instead:
import itertools
def chained(sequences):
    return itertools.chain.from_iterable(sequences)
or, since you tagged this with python-3.3 you could use the new yield from syntax (look ma, no imports!):
def chained(sequences):
    for seq in sequences:
        yield from seq
Both return iterators (use list() on them if you must materialize the full list). Most of the time you do not need to construct a whole new sequence from the concatenated sequences; you just want to loop over them to process and/or search for something.
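To illustrate the lazy behaviour, here is a minimal sketch. The sample data and the names flat, first_item and remaining are made up for illustration only.

```python
import itertools

# Mixed sequence types are fine: chaining only needs each one to be iterable.
sequences = [['spam', 'eggs'], ('ham',), 'ab']

# chain.from_iterable returns a lazy iterator; nothing is copied yet.
flat = itertools.chain.from_iterable(sequences)

# Items are produced one at a time, across the boundaries of the inputs.
first_item = next(flat)

# list() materializes whatever has not been consumed yet.
remaining = list(flat)

print(first_item)  # spam
print(remaining)   # ['eggs', 'ham', 'a', 'b']
```

Note that the string 'ab' is split into characters, which is one reason str.join() is the better tool when the goal is a concatenated string.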
Note that for strings, you should use str.join() instead of any of the techniques described either in my answer or your question:
concatenated = ''.join(sequence_of_strings)
Combined, to handle sequences quickly and correctly, I'd use:
def chained(sequences):
    for seq in sequences:
        yield from seq
def concatenate(sequences):
    sequences = iter(sequences)
    first = next(sequences)
    if hasattr(first, 'join'):
        return first + ''.join(sequences)
    return first + type(first)(chained(sequences))
This works for tuples, lists and strings:
>>> concatenate(['abcd', 'efgh', 'ijkl'])
'abcdefghijkl'
>>> concatenate([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> concatenate([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
(1, 2, 3, 4, 5, 6, 7, 8, 9)
and uses the faster ''.join() for a sequence of strings.

What is wrong with:
from itertools import chain
def chain_sequences(*sequences):
    return chain(*sequences)
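One difference worth noting, offered as a sketch rather than a verdict: unpacking with * has to consume the whole outer iterable before chain is even called, whereas chain.from_iterable reads it lazily. The generator endless_pairs below is hypothetical and exists only to make the difference visible.

```python
from itertools import chain, count, islice

def endless_pairs():
    """Hypothetical source that never stops yielding two-item lists."""
    for n in count():
        yield [n, n + 1]

# from_iterable pulls inner sequences only as needed, so taking the
# first five items of an endless outer iterable is fine.
lazy = chain.from_iterable(endless_pairs())
first_five = list(islice(lazy, 5))
print(first_five)  # [0, 1, 1, 2, 2]

# chain(*endless_pairs()) would never return: the * unpacking must
# exhaust the generator to build the argument tuple first.
```

For ordinary, already-built lists of sequences the two spellings produce the same items.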

Use itertools.chain.from_iterable.
import itertools
def concatenate(sequences):
    return list(itertools.chain.from_iterable(sequences))
The call to list is needed only if you need an actual new list, so skip it if you just iterate over this new sequence once.

Related

What is the best possible way to find the first AND the last occurrences of an element in a list in Python?

The basic way I usually do this is with list.index(element) and reversed_list.index(element), but this is too slow when I need to search for many elements and the list is very long, say 10^5 or 10^6 elements or even more. What is the fastest way to do the same?
You can build auxiliary lookup structures:
lst = [1,2,3,1,2,3] # super long list
last = {n: i for i, n in enumerate(lst)}
first = {n: i for i, n in reversed(list(enumerate(lst)))}
last[3]
# 5
first[3]
# 2
The construction of the lookup dicts takes linear time, but then the lookup itself is constant.
Whereas calls to list.index() take linear time, and repeatedly doing so is then quadratic (given the number of lookups you make depends on the size of the list).
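As a rough sanity check of that trade-off, the sketch below builds both dicts once and confirms that they agree with the list.index() approach. The sample list and the helper name last_index_slow are made up for illustration.

```python
lst = [1, 2, 3, 1, 2, 3]

# One linear pass each; later duplicates overwrite earlier ones,
# so iterating forwards keeps the last index and backwards the first.
last = {n: i for i, n in enumerate(lst)}
first = {n: i for i, n in reversed(list(enumerate(lst)))}

def last_index_slow(seq, value):
    """Linear-time reference: index of the final occurrence of value."""
    return len(seq) - 1 - seq[::-1].index(value)

# Every dict lookup is O(1) on average; each .index() call rescans the list.
for value in set(lst):
    assert first[value] == lst.index(value)
    assert last[value] == last_index_slow(lst, value)

print(first[3], last[3])  # 2 5
```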
You could also build a single structure in one iteration:
from collections import defaultdict
lookup = defaultdict(lambda: [None, None])
for i, n in enumerate(lst):
    lookup[n][1] = i
    if lookup[n][0] is None:
        lookup[n][0] = i
lookup[3]
# [2, 5]
lookup[2]
# [1, 4]
Well, someone needs to do the work of finding the element, and in a large list this can take time! Without more information or a code example it is difficult to help, but the usual go-to answer is to use another data structure. For example, if you keep your elements in a dictionary instead of a list, with the element as the key and a list of indices as the value, lookups will be much quicker.
You can just remember first and last index for every element in the list:
In [9]: l = [random.randint(1, 10) for _ in range(100)]
In [10]: first_index = {}
In [11]: last_index = {}
In [12]: for idx, x in enumerate(l):
    ...:     if x not in first_index:
    ...:         first_index[x] = idx
    ...:     last_index[x] = idx
    ...:
In [13]: [(x, first_index.get(x), last_index.get(x)) for x in range(1, 11)]
Out[13]:
[(1, 3, 88),
(2, 23, 90),
(3, 10, 91),
(4, 13, 98),
(5, 11, 57),
(6, 4, 99),
(7, 9, 92),
(8, 19, 95),
(9, 0, 77),
(10, 2, 87)]
In [14]: l[0]
Out[14]: 9
Your approach sounds good. I did some testing:
import numpy as np
long_list = list(np.random.randint(0, 100_000, 100_000_000))
# This takes 10ms on my machine
long_list.index(999)
# This takes 1,100ms on my machine
long_list[::-1].index(999)
# This takes 1,300ms on my machine
list(reversed(long_list)).index(999)
# This takes 200ms on my machine
long_list.reverse()
long_list.index(999)
long_list.reverse()
But at the end of the day, a Python list does not seem like the best data structure for this.
As others have suggested, you can build a dict:
indexes = {}
for i, val in enumerate(long_list):
    if val in indexes:
        indexes[val].append(i)
    else:
        indexes[val] = [i]
This is memory-expensive, but it solves your problem (how well depends on how often you modify the original list, since the dict has to be kept in sync with it).
You can then do:
# This takes 0.02ms on my machine
ix = indexes.get(999)
ix[0], ix[-1]

List of dicts - Partial shuffle

Let's suppose I have this:
my_list = [{'id':'2','value':'4'},
{'id':'6','value':'3'},
{'id':'4','value':'5'},
{'id':'9','value':'10'},
{'id':'0','value':'9'}]
and I want to shuffle the list but I want to do it partly - by this I mean that I do not want to shuffle all the elements but only a percentage of them (eg 40%).
For example like this:
my_list = [{'id':'4','value':'5'},
{'id':'6','value':'3'},
{'id':'2','value':'4'},
{'id':'9','value':'10'},
{'id':'0','value':'9'}]
How can this be efficiently done?
random.shuffle does not allow you to specify only part of a list; it always shuffles the entire list.
A trade-off between effort, speed, and memory footprint would be to slice out the part of the list you want to shuffle, do it, and then assign it back to that slice:
>>> from random import shuffle
>>> x = list(range(10))
>>> y = x[:5]
>>> shuffle(y)
>>> x[:5] = y
>>> x
[2, 1, 4, 3, 0, 5, 6, 7, 8, 9]
My solution is the following:
from random import sample
shuffle_percentage = 0.4
x = sample(range(len(my_list)), int(len(my_list) * shuffle_percentage))
for index in range(0, len(x)-1, 2):
    my_list[x[index]], my_list[x[index+1]] = my_list[x[index+1]], my_list[x[index]]
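A variation on the same idea, as a sketch under assumptions: instead of swapping sampled positions pairwise, pick a random subset of positions and shuffle only the values sitting in them. The function name partial_shuffle and its rng parameter are hypothetical, not part of the random module.

```python
import random

def partial_shuffle(items, fraction, rng=random):
    """Shuffle roughly `fraction` of the positions of `items` in place."""
    count = int(len(items) * fraction)
    # Pick which positions take part; all others keep their element.
    positions = rng.sample(range(len(items)), count)
    values = [items[p] for p in positions]
    rng.shuffle(values)
    # Write the shuffled values back into the same chosen positions.
    for p, v in zip(positions, values):
        items[p] = v
    return positions

data = list(range(10))
# A seeded Random instance keeps the demonstration reproducible.
touched = partial_shuffle(data, 0.4, random.Random(0))
print(data, sorted(touched))
```

An element in a chosen position can still land back where it started by chance, so the fraction is an upper bound on how many items actually move.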

python 3 function with one argument returning list of integers

What should the 're' function look like if it must receive just one argument 's' and must return a list of the integers from 1 to 12 inclusive (for example)?
So the result in the interactive console has to be:
>>> re(12)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
First of all, you used def incorrectly: to define a function, you have to end the def line with : and put the function body, indented, below it; if you only want to call a function, remove def.
Python has a built-in immutable sequence type, range(), which takes one to three arguments: start, stop and step. Here we only need the first two. To get a list we also need another built-in, the mutable sequence type list(); you can read more about lists in the Python documentation. We will use list() as the type constructor, list() or list(iterable), as described on the built-in types page:
Lists may be constructed in several ways:
Using a pair of square brackets to denote the empty list: []
Using square brackets, separating items with commas: [a], [a, b, c]
Using a list comprehension: [x for x in iterable]
Using the type constructor: list() or list(iterable)
The constructor builds a list whose items are the same and in the same
order as iterable’s items. iterable may be either a sequence, a
container that supports iteration, or an iterator object. If iterable
is already a list, a copy is made and returned, similar to
iterable[:]. For example, list('abc') returns ['a', 'b', 'c'] and
list( (1, 2, 3) ) returns [1, 2, 3]. If no argument is given, the
constructor creates a new empty list, [].
Now that we understand how list() works, we can go back to range() usage:
The arguments to the range constructor must be integers (either built-in int or any object that implements the __index__ special
method). If the step argument is omitted, it defaults to 1. If the
start argument is omitted, it defaults to 0. If step is zero,
ValueError is raised.
For a positive step, the contents of a range r are determined by the formula r[i] = start + step*i where i >= 0 and r[i] < stop.
For a negative step, the contents of the range are still determined by the formula r[i] = start + step*i, but the constraints
are i >= 0 and r[i] > stop.
A range object will be empty if r[0] does not meet the value constraint. Ranges do support negative indices, but these are
interpreted as indexing from the end of the sequence determined by the
positive indices.
Ranges containing absolute values larger than sys.maxsize are permitted but some features (such as len()) may raise OverflowError.
Range examples:
>>>
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(1, 11))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(range(0, 30, 5))
[0, 5, 10, 15, 20, 25]
>>> list(range(0, 10, 3))
[0, 3, 6, 9]
>>> list(range(0, -10, -1))
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
>>> list(range(0))
[]
>>> list(range(1, 0))
[]
Ranges implement all of the common sequence operations except concatenation and repetition (due to the fact that range objects can
only represent sequences that follow a strict pattern and repetition
and concatenation will usually violate that pattern).
start:
The value of the start parameter (or 0 if the parameter was not supplied)
stop:
The value of the stop parameter
step:
The value of the step parameter (or 1 if the parameter was not supplied)
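The start, stop and step attributes, together with the sequence operations described in the quoted documentation, can be inspected directly. A minimal sketch (the variable names r and every_fourth are only for illustration):

```python
r = range(1, 13)

# A range stores only start, stop and step, not the numbers themselves.
print(r.start, r.stop, r.step)   # 1 13 1

# It still behaves like a sequence: length, indexing, membership, slicing.
print(len(r))                    # 12
print(r[0], r[-1])               # 1 12
print(12 in r, 13 in r)          # True False

# Slicing a range gives another range; list() materializes the values.
every_fourth = r[::4]
print(list(every_fourth))        # [1, 5, 9]
```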
Many other operations also produce lists, including the sorted() built-in.
The answer to your question looks like this:
def re(ending_number):
    return list(range(1, ending_number + 1))
list_of_twelve = re(12) # list_of_twelve will contain [1, 2, ..., 12]
I would avoid using "re" as a function name, since re is also the name of Python's standard-library module for regular expressions.
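To make the naming concern concrete, here is a small sketch; numbers_up_to is just a hypothetical replacement name.

```python
import re  # the standard-library regular expression module

def numbers_up_to(ending_number):
    """Return [1, 2, ..., ending_number] without shadowing `re`."""
    return list(range(1, ending_number + 1))

print(numbers_up_to(12))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Because the function has its own name, `re` still refers to the module.
print(re.findall(r"\d+", "1 to 12"))  # ['1', '12']
```

Had the function been called re in the same module, the second call would fail, because re would then refer to the function rather than the module.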
Lycopersicum's answer does a good job of explaining range(), which is the fastest and most straightforward way of approaching your problem. In general, it is best to use Python's built-in functions, because they run as compiled C code rather than slower Python code.
I just thought I'd share a little bit about why you should use range().
There are other ways to generate a list of numbers. First, you can build the list directly using a loop:
def listOfNumbers (number):
    start = 1
    listOf = []
    while (start <= number):
        listOf.append(start)
        start = start + 1
    return listOf
In this case, you simply use listOfNumbers(12) and you will get a list of numbers. However, this stores a list in memory and is slow, so not good for very large numbers.
On the other hand, you could use a generator (which is very much like range()). A generator does not store data in a list. Instead, it just "yields" numbers one at a time until the code stops. It's much faster:
def generatorOfNumbers (number):
    start = 1
    while start <= number:
        yield start
        start += 1
Then you can call it one of two ways to produce a list:
def listFromGenerator1 (number):
    return [x for x in generatorOfNumbers(number)]
def listFromGenerator2 (number):
    return list(generatorOfNumbers (number))
When I time these approaches I get:
timed(listOfNumbers) # time for list of 10000
...
Elapsed Time: 2.16007232666
Elapsed Time: 1.32894515991
Elapsed Time: 2.09093093872
Elapsed Time: 1.99699401855
Elapsed Time: 3.2000541687
... timed(listFromGenerator1)
...
Elapsed Time: 1.33109092712
Elapsed Time: 1.30605697632
Elapsed Time: 1.93309783936
Elapsed Time: 1.79386138916
Elapsed Time: 1.90401077271
... timed(listFromGenerator2)
...
Elapsed Time: 0.869989395142
Elapsed Time: 1.08408927917
Elapsed Time: 1.65319442749
Elapsed Time: 1.53398513794
Elapsed Time: 1.36089324951
... timed(listFromRange) # Lycopersicum's approach
...
Elapsed Time: 0.346899032593
Elapsed Time: 0.284194946289
Elapsed Time: 0.282049179077
Elapsed Time: 0.295877456665
Elapsed Time: 0.303983688354
In conclusion, always use built-in functions whenever possible rather than trying to build your own. That includes the (slight) preference for list() vs a list comprehension.
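The timed helper used for the measurements above is not shown, so as a stand-in here is a minimal sketch using the standard timeit module. The function names list_with_loop and list_with_range are assumptions made for this example, and the numbers it prints will differ from machine to machine.

```python
import timeit

def list_with_loop(number):
    """Build the list by hand with a while loop."""
    result = []
    start = 1
    while start <= number:
        result.append(start)
        start += 1
    return result

def list_with_range(number):
    """Build the same list with the range() and list() built-ins."""
    return list(range(1, number + 1))

# Both must agree before comparing speed.
assert list_with_loop(1000) == list_with_range(1000)

for func in (list_with_loop, list_with_range):
    # timeit.timeit runs the callable `number` times and returns total seconds.
    seconds = timeit.timeit(lambda: func(10_000), number=200)
    print(f"{func.__name__}: {seconds:.4f} s")
```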

Get a list of all number in a certain range containing only certain digits without checking each number

Is there a way to create a list of all numbers less than 10,000 that do not contain any of the digits 0, 2, 4, 5, 6, 8? Of course one can simply type something like:
bads = ['0', '2', '4', '5', '6', '8']
goods = []
for n in range(1, 10000, 2):
    if not any(bad in str(n) for bad in bads):
        goods.append(n)
However, I'm looking for a method which instead considers the digits 1, 3, 7, 9 and creates all possible unique strings of permutations of these numbers of size 4 or less, duplicate digits allowed. Does itertools, for example, have something that would easily do this? I looked at the permutations method, but that doesn't produce numbers with repeated digits from the collection, and the product method doesn't seem to be what I'm after either, given that it simply would return Cartesian products of 1, 3, 7, 9 with itself.
Here's a simple-minded approach using permutations and combinations_with_replacement from itertools:
from itertools import permutations, combinations_with_replacement
def digit_combinations(power_of_ten):
    numbers = set()
    for n in range(1, power_of_ten + 1):
        for combination in combinations_with_replacement("1379", n):
            numbers |= set(permutations(combination, len(combination)))
    return sorted(int(''.join(number)) for number in numbers)
print(digit_combinations(4))
OUTPUT
[1, 3, 7, 9, 11, 13, 17, 19, ..., 9971, 9973, 9977, 9979, 9991, 9993, 9997, 9999]
It could be made more space efficient using generators, but depending on the range, it might not be worth it. (For up to 10,000 there are only 340 numbers.) For numbers to 10^4, this code takes roughly as long as your simple example. But for 10^7, this code runs over 40x faster on my system than your simple example.
Could you include your idea for the generator?
Here's a basic rework of the code above into generator form:
from itertools import permutations, combinations_with_replacement
def digit_combinations_generator(power_of_ten):
    for n in range(1, power_of_ten + 1):
        for combination in combinations_with_replacement("1379", n):
            for number in set(permutations(combination, len(combination))):
                yield int(''.join(number))
generator = digit_combinations_generator(4)
while True:
    try:
        print(next(generator), end=', ')
    except StopIteration:
        print()
        break
This does not return the numbers sorted; it just hands them out as fast as it generates them.
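As an alternative sketch (not the approach used in the answers above): itertools.product with its repeat argument yields every string of a given length over the digits 1, 3, 7, 9 exactly once, so no set is needed for de-duplication. The function name numbers_from_digits is hypothetical.

```python
from itertools import product

def numbers_from_digits(max_digits, digits="1379"):
    """Yield every number with 1..max_digits digits drawn from `digits`."""
    for length in range(1, max_digits + 1):
        # product(..., repeat=length) enumerates all tuples of that length,
        # repeated digits included, in lexicographic order.
        for combo in product(digits, repeat=length):
            yield int("".join(combo))

numbers = list(numbers_from_digits(4))
print(len(numbers))               # 340
print(numbers[:6], numbers[-2:])  # [1, 3, 7, 9, 11, 13] [9997, 9999]
```

Because the digits are given in ascending order and shorter lengths come first, the numbers come out already sorted.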

How to assign values to a dictionary

I am creating a function which is supposed to return a dictionary with keys and values from different lists. But I am having problems getting the mean of a list of numbers as the values of the dictionary. However, I think I am getting the keys properly.
This is what I get so far:
def exp (magnitudes,measures):
    """return for each magnitude the associated mean of numbers from a list"""
    dict_expe = {}
    for mag in magnitudes:
        dict_expe[mag] = 0
        for mea in measures:
            summ = 0
            for n in mea:
                summ += n
            dict_expe[mag] = summ/len(mea)
    return dict_expe
print(exp(['mag1', 'mag2', 'mag3'], [[1,2,3],[3,4],[5]]))
The output should be:
{mag1 : 2, mag2: 3.5, mag3: 5}
But what I am getting is always 5 as the value of every key. I thought about the zip() function, but I'm trying to avoid it because it requires both lists to have the same length.
The average of a sequence is sum(sequence) / len(sequence), so you need to iterate through magnitudes and measures together, calculate these means (arithmetic averages) and store them in a dictionary.
There are much more pythonic ways you can achieve this. All of these examples produce {'mag1': 2.0, 'mag2': 3.5, 'mag3': 5.0} as result.
Using for i in range() loop:
def exp(magnitudes, measures):
    means = {}
    for i in range(len(magnitudes)):
        means[magnitudes[i]] = sum(measures[i]) / len(measures[i])
    return means
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
But if you need both indices and values of a list you can use for i, val in enumerate(sequence) approach which is much more suitable in this case:
def exp(magnitudes, measures):
    means = {}
    for i, mag in enumerate(magnitudes):
        means[mag] = sum(measures[i]) / len(measures[i])
    return means
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
Another problem hides here: the index i belongs to magnitudes, but we also use it to read from measures. This is not a big deal if magnitudes and measures have the same length, but if magnitudes is longer you will get an IndexError. So the zip function seems like the best choice here (it does not require the two lists to be the same length; it simply stops at the end of the shortest one):
def exp(magnitudes, measures):
    means = {}
    for mag, mes in zip(magnitudes, measures):
        means[mag] = sum(mes) / len(mes)
    return means
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
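How zip behaves with inputs of different lengths is worth seeing once. The sketch below uses a deliberately short measures list, and checked_pairs is a hypothetical helper for the case where a silent mismatch should be an error instead. (Python 3.10 and later also accept zip(..., strict=True) for the same purpose.)

```python
magnitudes = ['mag1', 'mag2', 'mag3']
measures = [[1, 2, 3], [3, 4]]   # one list short, on purpose

# zip() stops at the shortest input, so 'mag3' is silently dropped.
pairs = list(zip(magnitudes, measures))
print(pairs)  # [('mag1', [1, 2, 3]), ('mag2', [3, 4])]

def checked_pairs(keys, values):
    """Hypothetical helper: refuse mismatched lengths instead of truncating."""
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} keys but {len(values)} value lists")
    return zip(keys, values)

try:
    checked_pairs(magnitudes, measures)
except ValueError as exc:
    print("mismatch:", exc)   # mismatch: 3 keys but 2 value lists
```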
So feel free to use whichever example suits your requirements or that you like best, and don't forget to add a docstring.
You probably don't need anything more compact, but it can be even shorter when a dictionary comprehension comes into play:
def exp(magnitudes, measures):
    return {mag: sum(mes) / len(mes) for mag, mes in zip(magnitudes, measures)}
print(exp(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]]))
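As a further option, and only as a sketch: the standard statistics module can replace the sum(mes) / len(mes) expression. The function name mean_by_magnitude is made up here, and statistics.fmean needs Python 3.8 or later.

```python
from statistics import fmean

def mean_by_magnitude(magnitudes, measures):
    """Map each magnitude to the mean of its measurements."""
    # fmean always returns a float and raises StatisticsError on an
    # empty list, instead of the ZeroDivisionError from sum() / len().
    return {mag: fmean(mes) for mag, mes in zip(magnitudes, measures)}

result = mean_by_magnitude(['mag1', 'mag2', 'mag3'], [[1, 2, 3], [3, 4], [5]])
print(result)  # {'mag1': 2.0, 'mag2': 3.5, 'mag3': 5.0}
```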
