Sorted insert on a python list - python-3.x

Good morning everybody,
Here is my function that is supposed to make a recursive sorted insertion of a couple of data:
def sorted_insert(w_i,sim,neighbors):
if neighbors==[]:
neighbors.append((w_i,sim))
elif neighbors[0][1]<sim:
neighbors.insert(0,(w_i,sim))
else:
sorted_insert(w_i,sim,neighbors[1:])
return neighbors
The problem is, this function doesn't insert values in the middle, here is a series of insertions :
>>> n=[]
>>> n=sorted_insert("w1",0.6,n)
>>> n=sorted_insert("w1",0.3,n)
>>> n=sorted_insert("w1",0.5,n)
>>> n=sorted_insert("w1",0.8,n)
>>> n=sorted_insert("w1",0.7,n)
>>> n
[('w1', 0.8), ('w1', 0.6)]
Is there someone that can correct my function ?
Thanks in advance.

This should work.
def sorted_insert(w_i,sim,neighbors, i=0):
if len(neighbors) == i or sim > neighbors[i][1]:
neighbors.insert(i, (w_i,sim))
else:
sorted_insert(w_i,sim,neighbors, i+1)
n=[]
sorted_insert("w1",0.6,n)
sorted_insert("w1",0.3,n)
sorted_insert("w1",0.5,n)
sorted_insert("w1",0.8,n)
sorted_insert("w1",0.7,n)
print n
# [('w1', 0.8), ('w1', 0.7), ('w1', 0.6), ('w1', 0.5), ('w1', 0.3)]

Related

Probabilistic random selection [duplicate]

I needed to write a weighted version of random.choice (each element in the list has a different probability for being selected). This is what I came up with:
def weightedChoice(choices):
"""Like random.choice, but each element can have a different chance of
being selected.
choices can be any iterable containing iterables with two items each.
Technically, they can have more than two items, the rest will just be
ignored. The first item is the thing being chosen, the second item is
its weight. The weights can be any numeric values, what matters is the
relative differences between them.
"""
space = {}
current = 0
for choice, weight in choices:
if weight > 0:
space[current] = choice
current += weight
rand = random.uniform(0, current)
for key in sorted(space.keys() + [current]):
if rand < key:
return choice
choice = space[key]
return None
This function seems overly complex to me, and ugly. I'm hoping everyone here can offer some suggestions on improving it or alternate ways of doing this. Efficiency isn't as important to me as code cleanliness and readability.
Since version 1.7.0, NumPy has a choice function that supports probability distributions.
from numpy.random import choice
draw = choice(list_of_candidates, number_of_items_to_pick,
p=probability_distribution)
Note that probability_distribution is a sequence in the same order of list_of_candidates. You can also use the keyword replace=False to change the behavior so that drawn items are not replaced.
Since Python 3.6 there is a method choices from the random module.
In [1]: import random
In [2]: random.choices(
...: population=[['a','b'], ['b','a'], ['c','b']],
...: weights=[0.2, 0.2, 0.6],
...: k=10
...: )
Out[2]:
[['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['b', 'a'],
['c', 'b'],
['c', 'b']]
Note that random.choices will sample with replacement, per the docs:
Return a k sized list of elements chosen from the population with replacement.
Note for completeness of answer:
When a sampling unit is drawn from a finite population and is returned
to that population, after its characteristic(s) have been recorded,
before the next unit is drawn, the sampling is said to be "with
replacement". It basically means each element may be chosen more than
once.
If you need to sample without replacement, then as #ronan-paixão's brilliant answer states, you can use numpy.choice, whose replace argument controls such behaviour.
def weighted_choice(choices):
total = sum(w for c, w in choices)
r = random.uniform(0, total)
upto = 0
for c, w in choices:
if upto + w >= r:
return c
upto += w
assert False, "Shouldn't get here"
Arrange the weights into a
cumulative distribution.
Use random.random() to pick a random
float 0.0 <= x < total.
Search the
distribution using bisect.bisect as
shown in the example at http://docs.python.org/dev/library/bisect.html#other-examples.
from random import random
from bisect import bisect
def weighted_choice(choices):
values, weights = zip(*choices)
total = 0
cum_weights = []
for w in weights:
total += w
cum_weights.append(total)
x = random() * total
i = bisect(cum_weights, x)
return values[i]
>>> weighted_choice([("WHITE",90), ("RED",8), ("GREEN",2)])
'WHITE'
If you need to make more than one choice, split this into two functions, one to build the cumulative weights and another to bisect to a random point.
If you don't mind using numpy, you can use numpy.random.choice.
For example:
import numpy
items = [["item1", 0.2], ["item2", 0.3], ["item3", 0.45], ["item4", 0.05]
elems = [i[0] for i in items]
probs = [i[1] for i in items]
trials = 1000
results = [0] * len(items)
for i in range(trials):
res = numpy.random.choice(items, p=probs) #This is where the item is selected!
results[items.index(res)] += 1
results = [r / float(trials) for r in results]
print "item\texpected\tactual"
for i in range(len(probs)):
print "%s\t%0.4f\t%0.4f" % (items[i], probs[i], results[i])
If you know how many selections you need to make in advance, you can do it without a loop like this:
numpy.random.choice(items, trials, p=probs)
As of Python v3.6, random.choices could be used to return a list of elements of specified size from the given population with optional weights.
random.choices(population, weights=None, *, cum_weights=None, k=1)
population : list containing unique observations. (If empty, raises IndexError)
weights : More precisely relative weights required to make selections.
cum_weights : cumulative weights required to make selections.
k : size(len) of the list to be outputted. (Default len()=1)
Few Caveats:
1) It makes use of weighted sampling with replacement so the drawn items would be later replaced. The values in the weights sequence in itself do not matter, but their relative ratio does.
Unlike np.random.choice which can only take on probabilities as weights and also which must ensure summation of individual probabilities upto 1 criteria, there are no such regulations here. As long as they belong to numeric types (int/float/fraction except Decimal type) , these would still perform.
>>> import random
# weights being integers
>>> random.choices(["white", "green", "red"], [12, 12, 4], k=10)
['green', 'red', 'green', 'white', 'white', 'white', 'green', 'white', 'red', 'white']
# weights being floats
>>> random.choices(["white", "green", "red"], [.12, .12, .04], k=10)
['white', 'white', 'green', 'green', 'red', 'red', 'white', 'green', 'white', 'green']
# weights being fractions
>>> random.choices(["white", "green", "red"], [12/100, 12/100, 4/100], k=10)
['green', 'green', 'white', 'red', 'green', 'red', 'white', 'green', 'green', 'green']
2) If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence.
Specifying both weights and cum_weights raises a TypeError.
>>> random.choices(["white", "green", "red"], k=10)
['white', 'white', 'green', 'red', 'red', 'red', 'white', 'white', 'white', 'green']
3) cum_weights are typically a result of itertools.accumulate function which are really handy in such situations.
From the documentation linked:
Internally, the relative weights are converted to cumulative weights
before making selections, so supplying the cumulative weights saves
work.
So, either supplying weights=[12, 12, 4] or cum_weights=[12, 24, 28] for our contrived case produces the same outcome and the latter seems to be more faster / efficient.
Crude, but may be sufficient:
import random
weighted_choice = lambda s : random.choice(sum(([v]*wt for v,wt in s),[]))
Does it work?
# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]
# initialize tally dict
tally = dict.fromkeys(choices, 0)
# tally up 1000 weighted choices
for i in xrange(1000):
tally[weighted_choice(choices)] += 1
print tally.items()
Prints:
[('WHITE', 904), ('GREEN', 22), ('RED', 74)]
Assumes that all weights are integers. They don't have to add up to 100, I just did that to make the test results easier to interpret. (If weights are floating point numbers, multiply them all by 10 repeatedly until all weights >= 1.)
weights = [.6, .2, .001, .199]
while any(w < 1.0 for w in weights):
weights = [w*10 for w in weights]
weights = map(int, weights)
If you have a weighted dictionary instead of a list you can write this
items = { "a": 10, "b": 5, "c": 1 }
random.choice([k for k in items for dummy in range(items[k])])
Note that [k for k in items for dummy in range(items[k])] produces this list ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'c', 'b', 'b', 'b', 'b', 'b']
Here's is the version that is being included in the standard library for Python 3.6:
import itertools as _itertools
import bisect as _bisect
class Random36(random.Random):
"Show the code included in the Python 3.6 version of the Random class"
def choices(self, population, weights=None, *, cum_weights=None, k=1):
"""Return a k sized list of population elements chosen with replacement.
If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.
"""
random = self.random
if cum_weights is None:
if weights is None:
_int = int
total = len(population)
return [population[_int(random() * total)] for i in range(k)]
cum_weights = list(_itertools.accumulate(weights))
elif weights is not None:
raise TypeError('Cannot specify both weights and cumulative weights')
if len(cum_weights) != len(population):
raise ValueError('The number of weights does not match the population')
bisect = _bisect.bisect
total = cum_weights[-1]
return [population[bisect(cum_weights, random() * total)] for i in range(k)]
Source: https://hg.python.org/cpython/file/tip/Lib/random.py#l340
A very basic and easy approach for a weighted choice is the following:
np.random.choice(['A', 'B', 'C'], p=[0.3, 0.4, 0.3])
import numpy as np
w=np.array([ 0.4, 0.8, 1.6, 0.8, 0.4])
np.random.choice(w, p=w/sum(w))
I'm probably too late to contribute anything useful, but here's a simple, short, and very efficient snippet:
def choose_index(probabilies):
cmf = probabilies[0]
choice = random.random()
for k in xrange(len(probabilies)):
if choice <= cmf:
return k
else:
cmf += probabilies[k+1]
No need to sort your probabilities or create a vector with your cmf, and it terminates once it finds its choice. Memory: O(1), time: O(N), with average running time ~ N/2.
If you have weights, simply add one line:
def choose_index(weights):
probabilities = weights / sum(weights)
cmf = probabilies[0]
choice = random.random()
for k in xrange(len(probabilies)):
if choice <= cmf:
return k
else:
cmf += probabilies[k+1]
If your list of weighted choices is relatively static, and you want frequent sampling, you can do one O(N) preprocessing step, and then do the selection in O(1), using the functions in this related answer.
# run only when `choices` changes.
preprocessed_data = prep(weight for _,weight in choices)
# O(1) selection
value = choices[sample(preprocessed_data)][0]
If you happen to have Python 3, and are afraid of installing numpy or writing your own loops, you could do:
import itertools, bisect, random
def weighted_choice(choices):
weights = list(zip(*choices))[1]
return choices[bisect.bisect(list(itertools.accumulate(weights)),
random.uniform(0, sum(weights)))][0]
Because you can build anything out of a bag of plumbing adaptors! Although... I must admit that Ned's answer, while slightly longer, is easier to understand.
I looked the pointed other thread and came up with this variation in my coding style, this returns the index of choice for purpose of tallying, but it is simple to return the string ( commented return alternative):
import random
import bisect
try:
range = xrange
except:
pass
def weighted_choice(choices):
total, cumulative = 0, []
for c,w in choices:
total += w
cumulative.append((total, c))
r = random.uniform(0, total)
# return index
return bisect.bisect(cumulative, (r,))
# return item string
#return choices[bisect.bisect(cumulative, (r,))][0]
# define choices and relative weights
choices = [("WHITE",90), ("RED",8), ("GREEN",2)]
tally = [0 for item in choices]
n = 100000
# tally up n weighted choices
for i in range(n):
tally[weighted_choice(choices)] += 1
print([t/sum(tally)*100 for t in tally])
A general solution:
import random
def weighted_choice(choices, weights):
total = sum(weights)
treshold = random.uniform(0, total)
for k, weight in enumerate(weights):
total -= weight
if total < treshold:
return choices[k]
Here is another version of weighted_choice that uses numpy. Pass in the weights vector and it will return an array of 0's containing a 1 indicating which bin was chosen. The code defaults to just making a single draw but you can pass in the number of draws to be made and the counts per bin drawn will be returned.
If the weights vector does not sum to 1, it will be normalized so that it does.
import numpy as np
def weighted_choice(weights, n=1):
if np.sum(weights)!=1:
weights = weights/np.sum(weights)
draws = np.random.random_sample(size=n)
weights = np.cumsum(weights)
weights = np.insert(weights,0,0.0)
counts = np.histogram(draws, bins=weights)
return(counts[0])
It depends on how many times you want to sample the distribution.
Suppose you want to sample the distribution K times. Then, the time complexity using np.random.choice() each time is O(K(n + log(n))) when n is the number of items in the distribution.
In my case, I needed to sample the same distribution multiple times of the order of 10^3 where n is of the order of 10^6. I used the below code, which precomputes the cumulative distribution and samples it in O(log(n)). Overall time complexity is O(n+K*log(n)).
import numpy as np
n,k = 10**6,10**3
# Create dummy distribution
a = np.array([i+1 for i in range(n)])
p = np.array([1.0/n]*n)
cfd = p.cumsum()
for _ in range(k):
x = np.random.uniform()
idx = cfd.searchsorted(x, side='right')
sampled_element = a[idx]
There is lecture on this by Sebastien Thurn in the free Udacity course AI for Robotics. Basically he makes a circular array of the indexed weights using the mod operator %, sets a variable beta to 0, randomly chooses an index,
for loops through N where N is the number of indices and in the for loop firstly increments beta by the formula:
beta = beta + uniform sample from {0...2* Weight_max}
and then nested in the for loop, a while loop per below:
while w[index] < beta:
beta = beta - w[index]
index = index + 1
select p[index]
Then on to the next index to resample based on the probabilities (or normalized probability in the case presented in the course).
On Udacity find Lesson 8, video number 21 of Artificial Intelligence for Robotics where he is lecturing on particle filters.
Another way of doing this, assuming we have weights at the same index as the elements in the element array.
import numpy as np
weights = [0.1, 0.3, 0.5] #weights for the item at index 0,1,2
# sum of weights should be <=1, you can also divide each weight by sum of all weights to standardise it to <=1 constraint.
trials = 1 #number of trials
num_item = 1 #number of items that can be picked in each trial
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# gives number of times an item was selected at a particular index
# this assumes selection with replacement
# one possible output
# selected_item_arr
# array([[0, 0, 1]])
# say if trials = 5, the the possible output could be
# selected_item_arr
# array([[1, 0, 0],
# [0, 0, 1],
# [0, 0, 1],
# [0, 1, 0],
# [0, 0, 1]])
Now let's assume, we have to sample out 3 items in 1 trial. You can assume that there are three balls R,G,B present in large quantity in ratio of their weights given by weight array, the following could be possible outcome:
num_item = 3
trials = 1
selected_item_arr = np.random.multinomial(num_item, weights, trials)
# selected_item_arr can give output like :
# array([[1, 0, 2]])
you can also think number of items to be selected as number of binomial/ multinomial trials within a set. So, the above example can be still work as
num_binomial_trial = 5
weights = [0.1,0.9] #say an unfair coin weights for H/T
num_experiment_set = 1
selected_item_arr = np.random.multinomial(num_binomial_trial, weights, num_experiment_set)
# possible output
# selected_item_arr
# array([[1, 4]])
# i.e H came 1 time and T came 4 times in 5 binomial trials. And one set contains 5 binomial trails.
let's say you have
items = [11, 23, 43, 91]
probability = [0.2, 0.3, 0.4, 0.1]
and you have function which generates a random number between [0, 1) (we can use random.random() here).
so now take the prefix sum of probability
prefix_probability=[0.2,0.5,0.9,1]
now we can just take a random number between 0-1 and use binary search to find where that number belongs in prefix_probability. that index will be your answer
Code will go something like this
return items[bisect.bisect(prefix_probability,random.random())]
One way is to randomize on the total of all the weights and then use the values as the limit points for each var. Here is a crude implementation as a generator.
def rand_weighted(weights):
"""
Generator which uses the weights to generate a
weighted random values
"""
sum_weights = sum(weights.values())
cum_weights = {}
current_weight = 0
for key, value in sorted(weights.iteritems()):
current_weight += value
cum_weights[key] = current_weight
while True:
sel = int(random.uniform(0, 1) * sum_weights)
for key, value in sorted(cum_weights.iteritems()):
if sel < value:
break
yield key
Using numpy
def choice(items, weights):
return items[np.argmin((np.cumsum(weights) / sum(weights)) < np.random.rand())]
I needed to do something like this really fast really simple, from searching for ideas i finally built this template. The idea is receive the weighted values in a form of a json from the api, which here is simulated by the dict.
Then translate it into a list in which each value repeats proportionally to it's weight, and just use random.choice to select a value from the list.
I tried it running with 10, 100 and 1000 iterations. The distribution seems pretty solid.
def weighted_choice(weighted_dict):
"""Input example: dict(apples=60, oranges=30, pineapples=10)"""
weight_list = []
for key in weighted_dict.keys():
weight_list += [key] * weighted_dict[key]
return random.choice(weight_list)
I didn't love the syntax of any of those. I really wanted to just specify what the items were and what the weighting of each was. I realize I could have used random.choices but instead I quickly wrote the class below.
import random, string
from numpy import cumsum
class randomChoiceWithProportions:
'''
Accepts a dictionary of choices as keys and weights as values. Example if you want a unfair dice:
choiceWeightDic = {"1":0.16666666666666666, "2": 0.16666666666666666, "3": 0.16666666666666666
, "4": 0.16666666666666666, "5": .06666666666666666, "6": 0.26666666666666666}
dice = randomChoiceWithProportions(choiceWeightDic)
samples = []
for i in range(100000):
samples.append(dice.sample())
# Should be close to .26666
samples.count("6")/len(samples)
# Should be close to .16666
samples.count("1")/len(samples)
'''
def __init__(self, choiceWeightDic):
self.choiceWeightDic = choiceWeightDic
weightSum = sum(self.choiceWeightDic.values())
assert weightSum == 1, 'Weights sum to ' + str(weightSum) + ', not 1.'
self.valWeightDict = self._compute_valWeights()
def _compute_valWeights(self):
valWeights = list(cumsum(list(self.choiceWeightDic.values())))
valWeightDict = dict(zip(list(self.choiceWeightDic.keys()), valWeights))
return valWeightDict
def sample(self):
num = random.uniform(0,1)
for key, val in self.valWeightDict.items():
if val >= num:
return key
Provide random.choice() with a pre-weighted list:
Solution & Test:
import random
options = ['a', 'b', 'c', 'd']
weights = [1, 2, 5, 2]
weighted_options = [[opt]*wgt for opt, wgt in zip(options, weights)]
weighted_options = [opt for sublist in weighted_options for opt in sublist]
print(weighted_options)
# test
counts = {c: 0 for c in options}
for x in range(10000):
counts[random.choice(weighted_options)] += 1
for opt, wgt in zip(options, weights):
wgt_r = counts[opt] / 10000 * sum(weights)
print(opt, counts[opt], wgt, wgt_r)
Output:
['a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'd', 'd']
a 1025 1 1.025
b 1948 2 1.948
c 5019 5 5.019
d 2008 2 2.008
In case you don't define in advance how many items you want to pick (so, you don't do something like k=10) and you just have probabilities, you can do the below. Note that your probabilities do not need to add up to 1, they can be independent of each other:
soup_items = ['pepper', 'onion', 'tomato', 'celery']
items_probability = [0.2, 0.3, 0.9, 0.1]
selected_items = [item for item,p in zip(soup_items,items_probability) if random.random()<p]
print(selected_items)
>>>['pepper','tomato']
Step-1: Generate CDF F in which you're interesting
Step-2: Generate u.r.v. u
Step-3: Evaluate z=F^{-1}(u)
This modeling is described in course of probability theory or stochastic processes. This is applicable just because you have easy CDF.

Python Pandas How to get rid of groupings with only 1 row?

In my dataset, I am trying to get the margin between two values. The code below runs perfectly if the fourth race was not included. After grouping based on a column, it seems that sometimes, there will be only 1 value, therefore, no other value to get a margin out of. I want to ignore these groupings in that case. Here is my current code:
import pandas as pd
data = {'Name':['A', 'B', 'B', 'C', 'A', 'C', 'A'], 'RaceNumber':
[1, 1, 2, 2, 3, 3, 4], 'PlaceWon':['First', 'Second', 'First', 'Second', 'First', 'Second', 'First'], 'TimeRanInSec':[100, 98, 66, 60, 75, 70, 75]}
df = pd.DataFrame(data)
print(df)
def winning_margin(times):
times = list(times)
winner = min(times)
times.remove(winner)
return min(times) - winner
winning_margins = df[['RaceNumber', 'TimeRanInSec']] \
.groupby('RaceNumber').agg(winning_margin)
winning_margins.columns = ['margin']
winners = df.loc[df.PlaceWon == 'First', :]
winners = winners.join(winning_margins, on='RaceNumber')
avg_margins = winners[['Name', 'margin']].groupby('Name').mean()
avg_margins
How about returning a NaN if times does not have enough elements:
import numpy as np
def winning_margin(times):
if len(times) <= 1: # New code
return np.NaN # New code
times = list(times)
winner = min(times)
times.remove(winner)
return min(times) - winner
your code runs with this change and seem to produce sensible results. But you can furthermore remove NaNs later if you want eg in this line
winning_margins = df[['RaceNumber', 'TimeRanInSec']] \
.groupby('RaceNumber').agg(winning_margin).dropna() # note the addition of .dropna()
You could get the winner and margin in one step:
def get_margin(x):
if len(x) < 2:
return np.NaN
i = x['TimeRanInSec'].idxmin()
nl = x['TimeRanInSec'].nsmallest(2)
margin = nl.max()-nl.min()
return [x['Name'].loc[i], margin]
Then:
df.groupby('RaceNumber').apply(get_margin).dropna()
RaceNumber
1 [B, 2]
2 [C, 6]
3 [C, 5]
(the data has the 'First' indicator corresponding to the slower time in the data)

Using python need to get the substrings

Q)After executing the code Need to print the values [1, 12, 123, 2, 23, 3, 13], but iam getting [1, 12, 123, 2, 23, 3]. I have missing the letter 13. can any one tell me the reason to overcome that error?
def get_all_substrings(string):
length = len(string)
list = []
for i in range(length):
for j in range(i,length):
list.append(string[i:j+1])
return list
values = get_all_substrings('123')
results = list(map(int, values))
print(results)
count = 0
for i in results:
if i > 1 :
if (i % 2) != 0:
count += 1
print(count)
Pretty straight forward issue in your nested for loops within get_all_substrings(), lets walk it!
You are iterating over each element of your string 123:
for i in range(length) # we know length to be 3, so range is 0, 1, 2
You then iterate each subsequent element from the current i:
for j in range(i,length)
Finally you append a string from position i to j+1 using the slice operator:
list.append(string[i:j+1])
But what exactly is happening? Well we can step through further!
The first value of i is 0, so lets skip the first for, go to the second:
for j in range(0, 3): # i.e. the whole string!
# you would eventually execute all of the following
list.append(string[0:0 + 1]) # '1'
list.append(string[0:1 + 1]) # '12'
list.append(string[0:2 + 1]) # '123'
# but wait...were is '13'???? (this is your hint!)
The next value of i is 1:
for j in range(1, 3):
# you would eventually execute all of the following
list.append(string[1:1 + 1]) # '2'
list.append(string[1:2 + 1]) # '23'
# notice how we are only grabbing values of position i or more?
Finally you get to i is 2:
for j in range(2, 3): # i.e. the whole string!
# you would eventually execute all of the following
list.append(string[2:2 + 1]) # '3'
I've shown you what is happening (as you've asked in your question), I leave it to you to devise your own solution. A couple notes:
You need to look at all index combinations from position i
Dont name objects by their type (i.e. dont name a list object list)
I would try something like this using itertools and powerset() recipe
from itertools import chain, combinations
def powerset(iterable):
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
output = list(map(''.join, powerset('123')))
output.pop(0)
Here is another option, using combinations
from itertools import combinations
def get_sub_ints(raw):
return [''.join(sub) for i in range(1, len(raw) + 1) for sub in combinations(raw, i)]
if __name__ == '__main__':
print(get_sub_ints('123'))
>>> ['1', '2', '3', '12', '13', '23', '123']

what does the colon mean? I am searching for a long time on net. But no use [duplicate]

How does Python's slice notation work? That is: when I write code like a[x:y:z], a[:], a[::2] etc., how can I understand which elements end up in the slice? Please include references where appropriate.
See Why are slice and range upper-bound exclusive? for more discussion of the design decisions behind the notation.
See Pythonic way to return list of every nth item in a larger list for the most common practical usage of slicing (and other ways to solve the problem): getting every Nth element of a list. Please use that question instead as a duplicate target where appropriate.
The syntax is:
a[start:stop] # items start through stop-1
a[start:] # items start through the rest of the array
a[:stop] # items from the beginning through stop-1
a[:] # a copy of the whole array
There is also the step value, which can be used with any of the above:
a[start:stop:step] # start through not past stop, by step
The key point to remember is that the :stop value represents the first value that is not in the selected slice. So, the difference between stop and start is the number of elements selected (if step is 1, the default).
The other feature is that start or stop may be a negative number, which means it counts from the end of the array instead of the beginning. So:
a[-1] # last item in the array
a[-2:] # last two items in the array
a[:-2] # everything except the last two items
Similarly, step may be a negative number:
a[::-1] # all items in the array, reversed
a[1::-1] # the first two items, reversed
a[:-3:-1] # the last two items, reversed
a[-3::-1] # everything except the last two items, reversed
Python is kind to the programmer if there are fewer items than you ask for. For example, if you ask for a[:-2] and a only contains one element, you get an empty list instead of an error. Sometimes you would prefer the error, so you have to be aware that this may happen.
Relationship with the slice object
A slice object can represent a slicing operation, i.e.:
a[start:stop:step]
is equivalent to:
a[slice(start, stop, step)]
Slice objects also behave slightly differently depending on the number of arguments, similarly to range(), i.e. both slice(stop) and slice(start, stop[, step]) are supported.
To skip specifying a given argument, one might use None, so that e.g. a[start:] is equivalent to a[slice(start, None)] or a[::-1] is equivalent to a[slice(None, None, -1)].
While the :-based notation is very helpful for simple slicing, the explicit use of slice() objects simplifies the programmatic generation of slicing.
The Python tutorial talks about it (scroll down a bit until you get to the part about slicing).
The ASCII art diagram is helpful too for remembering how slices work:
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
0 1 2 3 4 5 6
-6 -5 -4 -3 -2 -1
One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n.
Enumerating the possibilities allowed by the grammar for the sequence x:
>>> x[:] # [x[0], x[1], ..., x[-1] ]
>>> x[low:] # [x[low], x[low+1], ..., x[-1] ]
>>> x[:high] # [x[0], x[1], ..., x[high-1]]
>>> x[low:high] # [x[low], x[low+1], ..., x[high-1]]
>>> x[::stride] # [x[0], x[stride], ..., x[-1] ]
>>> x[low::stride] # [x[low], x[low+stride], ..., x[-1] ]
>>> x[:high:stride] # [x[0], x[stride], ..., x[high-1]]
>>> x[low:high:stride] # [x[low], x[low+stride], ..., x[high-1]]
Of course, if (high-low)%stride != 0, then the end point will be a little lower than high-1.
If stride is negative, the ordering is changed a bit since we're counting down:
>>> x[::-stride] # [x[-1], x[-1-stride], ..., x[0] ]
>>> x[high::-stride] # [x[high], x[high-stride], ..., x[0] ]
>>> x[:low:-stride] # [x[-1], x[-1-stride], ..., x[low+1]]
>>> x[high:low:-stride] # [x[high], x[high-stride], ..., x[low+1]]
Extended slicing (with commas and ellipses) are mostly used only by special data structures (like NumPy); the basic sequences don't support them.
>>> class slicee:
... def __getitem__(self, item):
... return repr(item)
...
>>> slicee()[0, 1:2, ::5, ...]
'(0, slice(1, 2, None), slice(None, None, 5), Ellipsis)'
The answers above don't discuss slice assignment. To understand slice assignment, it's helpful to add another concept to the ASCII art:
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
Slice position: 0 1 2 3 4 5 6
Index position: 0 1 2 3 4 5
>>> p = ['P','y','t','h','o','n']
# Why the two sets of numbers:
# indexing gives items, not lists
>>> p[0]
'P'
>>> p[5]
'n'
# Slicing gives lists
>>> p[0:1]
['P']
>>> p[0:2]
['P','y']
One heuristic is, for a slice from zero to n, think: "zero is the beginning, start at the beginning and take n items in a list".
>>> p[5] # the last of six items, indexed from zero
'n'
>>> p[0:5] # does NOT include the last item!
['P','y','t','h','o']
>>> p[0:6] # not p[0:5]!!!
['P','y','t','h','o','n']
Another heuristic is, "for any slice, replace the start by zero, apply the previous heuristic to get the end of the list, then count the first number back up to chop items off the beginning"
>>> p[0:4] # Start at the beginning and count out 4 items
['P','y','t','h']
>>> p[1:4] # Take one item off the front
['y','t','h']
>>> p[2:4] # Take two items off the front
['t','h']
# etc.
The first rule of slice assignment is that since slicing returns a list, slice assignment requires a list (or other iterable):
>>> p[2:3]
['t']
>>> p[2:3] = ['T']
>>> p
['P','y','T','h','o','n']
>>> p[2:3] = 't'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only assign an iterable
The second rule of slice assignment, which you can also see above, is that whatever portion of the list is returned by slice indexing, that's the same portion that is changed by slice assignment:
>>> p[2:4]
['T','h']
>>> p[2:4] = ['t','r']
>>> p
['P','y','t','r','o','n']
The third rule of slice assignment is, the assigned list (iterable) doesn't have to have the same length; the indexed slice is simply sliced out and replaced en masse by whatever is being assigned:
>>> p = ['P','y','t','h','o','n'] # Start over
>>> p[2:4] = ['s','p','a','m']
>>> p
['P','y','s','p','a','m','o','n']
The trickiest part to get used to is assignment to empty slices. Using heuristic 1 and 2 it's easy to get your head around indexing an empty slice:
>>> p = ['P','y','t','h','o','n']
>>> p[0:4]
['P','y','t','h']
>>> p[1:4]
['y','t','h']
>>> p[2:4]
['t','h']
>>> p[3:4]
['h']
>>> p[4:4]
[]
And then once you've seen that, slice assignment to the empty slice makes sense too:
>>> p = ['P','y','t','h','o','n']
>>> p[2:4] = ['x','y'] # Assigned list is same length as slice
>>> p
['P','y','x','y','o','n'] # Result is same length
>>> p = ['P','y','t','h','o','n']
>>> p[3:4] = ['x','y'] # Assigned list is longer than slice
>>> p
['P','y','t','x','y','o','n'] # The result is longer
>>> p = ['P','y','t','h','o','n']
>>> p[4:4] = ['x','y']
>>> p
['P','y','t','h','x','y','o','n'] # The result is longer still
Note that, since we are not changing the second number of the slice (4), the inserted items always stack right up against the 'o', even when we're assigning to the empty slice. So the position for the empty slice assignment is the logical extension of the positions for the non-empty slice assignments.
Backing up a little bit, what happens when you keep going with our procession of counting up the slice beginning?
>>> p = ['P','y','t','h','o','n']
>>> p[0:4]
['P','y','t','h']
>>> p[1:4]
['y','t','h']
>>> p[2:4]
['t','h']
>>> p[3:4]
['h']
>>> p[4:4]
[]
>>> p[5:4]
[]
>>> p[6:4]
[]
With slicing, once you're done, you're done; it doesn't start slicing backwards. In Python you don't get negative strides unless you explicitly ask for them by using a negative number.
>>> p[5:3:-1]
['n','o']
There are some weird consequences to the "once you're done, you're done" rule:
>>> p[4:4]
[]
>>> p[5:4]
[]
>>> p[6:4]
[]
>>> p[6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
In fact, compared to indexing, Python slicing is bizarrely error-proof:
>>> p[100:200]
[]
>>> p[int(2e99):int(1e99)]
[]
This can come in handy sometimes, but it can also lead to somewhat strange behavior:
>>> p
['P', 'y', 't', 'h', 'o', 'n']
>>> p[int(2e99):int(1e99)] = ['p','o','w','e','r']
>>> p
['P', 'y', 't', 'h', 'o', 'n', 'p', 'o', 'w', 'e', 'r']
Depending on your application, that might... or might not... be what you were hoping for there!
Below is the text of my original answer. It has been useful to many people, so I didn't want to delete it.
>>> r=[1,2,3,4]
>>> r[1:1]
[]
>>> r[1:1]=[9,8]
>>> r
[1, 9, 8, 2, 3, 4]
>>> r[1:1]=['blah']
>>> r
[1, 'blah', 9, 8, 2, 3, 4]
This may also clarify the difference between slicing and indexing.
Explain Python's slice notation
In short, the colons (:) in subscript notation (subscriptable[subscriptarg]) make slice notation, which has the optional arguments start, stop, and step:
sliceable[start:stop:step]
Python slicing is a computationally fast way to methodically access parts of your data. In my opinion, to be even an intermediate Python programmer, it's one aspect of the language that it is necessary to be familiar with.
Important Definitions
To begin with, let's define a few terms:
start: the beginning index of the slice, it will include the element at this index unless it is the same as stop, defaults to 0, i.e. the first index. If it's negative, it means to start n items from the end.
stop: the ending index of the slice, it does not include the element at this index, defaults to length of the sequence being sliced, that is, up to and including the end.
step: the amount by which the index increases, defaults to 1. If it's negative, you're slicing over the iterable in reverse.
How Indexing Works
You can make any of these positive or negative numbers. The meaning of the positive numbers is straightforward, but for negative numbers, just like indexes in Python, you count backwards from the end for the start and stop, and for the step, you simply decrement your index. This example is from the documentation's tutorial, but I've modified it slightly to indicate which item in a sequence each index references:
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
0 1 2 3 4 5
-6 -5 -4 -3 -2 -1
How Slicing Works
To use slice notation with a sequence that supports it, you must include at least one colon in the square brackets that follow the sequence (which actually implement the __getitem__ method of the sequence, according to the Python data model.)
Slice notation works like this:
sequence[start:stop:step]
And recall that there are defaults for start, stop, and step, so to access the defaults, simply leave out the argument.
Slice notation to get the last nine elements from a list (or any other sequence that supports it, like a string) would look like this:
my_list[-9:]
When I see this, I read the part in the brackets as "9th from the end, to the end." (Actually, I abbreviate it mentally as "-9, on")
Explanation:
The full notation is
my_list[-9:None:None]
and to substitute the defaults (actually when step is negative, stop's default is -len(my_list) - 1, so None for stop really just means it goes to whichever end step takes it to):
my_list[-9:len(my_list):1]
The colon, :, is what tells Python you're giving it a slice and not a regular index. That's why the idiomatic way of making a shallow copy of lists in Python 2 is
list_copy = sequence[:]
And clearing them is with:
del my_list[:]
(Python 3 gets a list.copy and list.clear method.)
When step is negative, the defaults for start and stop change
By default, when the step argument is empty (or None), it is assigned to +1.
But you can pass in a negative integer, and the list (or most other standard sliceables) will be sliced from the end to the beginning.
Thus a negative slice will change the defaults for start and stop!
Confirming this in the source
I like to encourage users to read the source as well as the documentation. The source code for slice objects and this logic is found here. First we determine if step is negative:
step_is_negative = step_sign < 0;
If so, the lower bound is -1 meaning we slice all the way up to and including the beginning, and the upper bound is the length minus 1, meaning we start at the end. (Note that the semantics of this -1 is different from a -1 that users may pass indexes in Python indicating the last item.)
if (step_is_negative) {
lower = PyLong_FromLong(-1L);
if (lower == NULL)
goto error;
upper = PyNumber_Add(length, lower);
if (upper == NULL)
goto error;
}
Otherwise step is positive, and the lower bound will be zero and the upper bound (which we go up to but not including) the length of the sliced list.
else {
lower = _PyLong_Zero;
Py_INCREF(lower);
upper = length;
Py_INCREF(upper);
}
Then, we may need to apply the defaults for start and stop—the default then for start is calculated as the upper bound when step is negative:
if (self->start == Py_None) {
start = step_is_negative ? upper : lower;
Py_INCREF(start);
}
and stop, the lower bound:
if (self->stop == Py_None) {
stop = step_is_negative ? lower : upper;
Py_INCREF(stop);
}
Give your slices a descriptive name!
You may find it useful to separate forming the slice from passing it to the list.__getitem__ method (that's what the square brackets do). Even if you're not new to it, it keeps your code more readable so that others that may have to read your code can more readily understand what you're doing.
However, you can't just assign some integers separated by colons to a variable. You need to use the slice object:
last_nine_slice = slice(-9, None)
The second argument, None, is required, so that the first argument is interpreted as the start argument otherwise it would be the stop argument.
You can then pass the slice object to your sequence:
>>> list(range(100))[last_nine_slice]
[91, 92, 93, 94, 95, 96, 97, 98, 99]
It's interesting that ranges also take slices:
>>> range(100)[last_nine_slice]
range(91, 100)
Memory Considerations:
Since slices of Python lists create new objects in memory, another important function to be aware of is itertools.islice. Typically you'll want to iterate over a slice, not just have it created statically in memory. islice is perfect for this. A caveat, it doesn't support negative arguments to start, stop, or step, so if that's an issue you may need to calculate indices or reverse the iterable in advance.
length = 100
last_nine_iter = itertools.islice(list(range(length)), length-9, None, 1)
list_last_nine = list(last_nine_iter)
and now:
>>> list_last_nine
[91, 92, 93, 94, 95, 96, 97, 98, 99]
The fact that list slices make a copy is a feature of lists themselves. If you're slicing advanced objects like a Pandas DataFrame, it may return a view on the original, and not a copy.
And a couple of things that weren't immediately obvious to me when I first saw the slicing syntax:
>>> x = [1,2,3,4,5,6]
>>> x[::-1]
[6,5,4,3,2,1]
Easy way to reverse sequences!
And if you wanted, for some reason, every second item in the reversed sequence:
>>> x = [1,2,3,4,5,6]
>>> x[::-2]
[6,4,2]
In Python 2.7
Slicing in Python
[a:b:c]
len = length of string, tuple or list
c -- default is +1. The sign of c indicates forward or backward, absolute value of c indicates steps. Default is forward with step size 1. Positive means forward, negative means backward.
a -- When c is positive or blank, default is 0. When c is negative, default is -1.
b -- When c is positive or blank, default is len. When c is negative, default is -(len+1).
Understanding index assignment is very important.
In forward direction, starts at 0 and ends at len-1
In backward direction, starts at -1 and ends at -len
When you say [a:b:c], you are saying depending on the sign of c (forward or backward), start at a and end at b (excluding element at bth index). Use the indexing rule above and remember you will only find elements in this range:
-len, -len+1, -len+2, ..., 0, 1, 2,3,4 , len -1
But this range continues in both directions infinitely:
...,-len -2 ,-len-1,-len, -len+1, -len+2, ..., 0, 1, 2,3,4 , len -1, len, len +1, len+2 , ....
For example:
0 1 2 3 4 5 6 7 8 9 10 11
a s t r i n g
-9 -8 -7 -6 -5 -4 -3 -2 -1
If your choice of a, b, and c allows overlap with the range above as you traverse using rules for a,b,c above you will either get a list with elements (touched during traversal) or you will get an empty list.
One last thing: if a and b are equal, then also you get an empty list:
>>> l1
[2, 3, 4]
>>> l1[:]
[2, 3, 4]
>>> l1[::-1] # a default is -1 , b default is -(len+1)
[4, 3, 2]
>>> l1[:-4:-1] # a default is -1
[4, 3, 2]
>>> l1[:-3:-1] # a default is -1
[4, 3]
>>> l1[::] # c default is +1, so a default is 0, b default is len
[2, 3, 4]
>>> l1[::-1] # c is -1 , so a default is -1 and b default is -(len+1)
[4, 3, 2]
>>> l1[-100:-200:-1] # Interesting
[]
>>> l1[-1:-200:-1] # Interesting
[4, 3, 2]
>>> l1[-1:-1:1]
[]
>>> l1[-1:5:1] # Interesting
[4]
>>> l1[1:-7:1]
[]
>>> l1[1:-7:-1] # Interesting
[3, 2]
>>> l1[:-2:-2] # a default is -1, stop(b) at -2 , step(c) by 2 in reverse direction
[4]
Found this great table at http://wiki.python.org/moin/MovingToPythonFromOtherLanguages
Python indexes and slices for a six-element list.
Indexes enumerate the elements, slices enumerate the spaces between the elements.
Index from rear: -6 -5 -4 -3 -2 -1 a=[0,1,2,3,4,5] a[1:]==[1,2,3,4,5]
Index from front: 0 1 2 3 4 5 len(a)==6 a[:5]==[0,1,2,3,4]
+---+---+---+---+---+---+ a[0]==0 a[:-2]==[0,1,2,3]
| a | b | c | d | e | f | a[5]==5 a[1:2]==[1]
+---+---+---+---+---+---+ a[-1]==5 a[1:-1]==[1,2,3,4]
Slice from front: : 1 2 3 4 5 : a[-2]==4
Slice from rear: : -5 -4 -3 -2 -1 :
b=a[:]
b==[0,1,2,3,4,5] (shallow copy of a)
After using it a bit I realise that the simplest description is that it is exactly the same as the arguments in a for loop...
(from:to:step)
Any of them are optional:
(:to:step)
(from::step)
(from:to)
Then the negative indexing just needs you to add the length of the string to the negative indices to understand it.
This works for me anyway...
I find it easier to remember how it works, and then I can figure out any specific start/stop/step combination.
It's instructive to understand range() first:
def range(start=0, stop, step=1): # Illegal syntax, but that's the effect
i = start
while (i < stop if step > 0 else i > stop):
yield i
i += step
Begin from start, increment by step, do not reach stop. Very simple.
The thing to remember about negative step is that stop is always the excluded end, whether it's higher or lower. If you want same slice in opposite order, it's much cleaner to do the reversal separately: e.g. 'abcde'[1:-2][::-1] slices off one char from left, two from right, then reverses. (See also reversed().)
Sequence slicing is same, except it first normalizes negative indexes, and it can never go outside the sequence:
TODO: The code below had a bug with "never go outside the sequence" when abs(step)>1; I think I patched it to be correct, but it's hard to understand.
def this_is_how_slicing_works(seq, start=None, stop=None, step=1):
if start is None:
start = (0 if step > 0 else len(seq)-1)
elif start < 0:
start += len(seq)
if not 0 <= start < len(seq): # clip if still outside bounds
start = (0 if step > 0 else len(seq)-1)
if stop is None:
stop = (len(seq) if step > 0 else -1) # really -1, not last element
elif stop < 0:
stop += len(seq)
for i in range(start, stop, step):
if 0 <= i < len(seq):
yield seq[i]
Don't worry about the is None details - just remember that omitting start and/or stop always does the right thing to give you the whole sequence.
Normalizing negative indexes first allows start and/or stop to be counted from the end independently: 'abcde'[1:-2] == 'abcde'[1:3] == 'bc' despite range(1,-2) == [].
The normalization is sometimes thought of as "modulo the length", but note it adds the length just once: e.g. 'abcde'[-53:42] is just the whole string.
I use the "an index points between elements" method of thinking about it myself, but one way of describing it which sometimes helps others get it is this:
mylist[X:Y]
X is the index of the first element you want.
Y is the index of the first element you don't want.
Index:
------------>
0 1 2 3 4
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
0 -4 -3 -2 -1
<------------
Slice:
<---------------|
|--------------->
: 1 2 3 4 :
+---+---+---+---+---+
| a | b | c | d | e |
+---+---+---+---+---+
: -4 -3 -2 -1 :
|--------------->
<---------------|
I hope this will help you to model the list in Python.
Reference: http://wiki.python.org/moin/MovingToPythonFromOtherLanguages
This is how I teach slices to newbies:
Understanding the difference between indexing and slicing:
Wiki Python has this amazing picture which clearly distinguishes indexing and slicing.
It is a list with six elements in it. To understand slicing better, consider that list as a set of six boxes placed together. Each box has an alphabet in it.
Indexing is like dealing with the contents of box. You can check contents of any box. But you can't check the contents of multiple boxes at once. You can even replace the contents of the box. But you can't place two balls in one box or replace two balls at a time.
In [122]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']
In [123]: alpha
Out[123]: ['a', 'b', 'c', 'd', 'e', 'f']
In [124]: alpha[0]
Out[124]: 'a'
In [127]: alpha[0] = 'A'
In [128]: alpha
Out[128]: ['A', 'b', 'c', 'd', 'e', 'f']
In [129]: alpha[0,1]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-129-c7eb16585371> in <module>()
----> 1 alpha[0,1]
TypeError: list indices must be integers, not tuple
Slicing is like dealing with boxes themselves. You can pick up the first box and place it on another table. To pick up the box, all you need to know is the position of beginning and ending of the box.
You can even pick up the first three boxes or the last two boxes or all boxes between 1 and 4. So, you can pick any set of boxes if you know the beginning and ending. These positions are called start and stop positions.
The interesting thing is that you can replace multiple boxes at once. Also you can place multiple boxes wherever you like.
In [130]: alpha[0:1]
Out[130]: ['A']
In [131]: alpha[0:1] = 'a'
In [132]: alpha
Out[132]: ['a', 'b', 'c', 'd', 'e', 'f']
In [133]: alpha[0:2] = ['A', 'B']
In [134]: alpha
Out[134]: ['A', 'B', 'c', 'd', 'e', 'f']
In [135]: alpha[2:2] = ['x', 'xx']
In [136]: alpha
Out[136]: ['A', 'B', 'x', 'xx', 'c', 'd', 'e', 'f']
Slicing With Step:
Till now you have picked boxes continuously. But sometimes you need to pick up discretely. For example, you can pick up every second box. You can even pick up every third box from the end. This value is called step size. This represents the gap between your successive pickups. The step size should be positive if You are picking boxes from the beginning to end and vice versa.
In [137]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']
In [142]: alpha[1:5:2]
Out[142]: ['b', 'd']
In [143]: alpha[-1:-5:-2]
Out[143]: ['f', 'd']
In [144]: alpha[1:5:-2]
Out[144]: []
In [145]: alpha[-1:-5:2]
Out[145]: []
How Python Figures Out Missing Parameters:
When slicing, if you leave out any parameter, Python tries to figure it out automatically.
If you check the source code of CPython, you will find a function called PySlice_GetIndicesEx() which figures out indices to a slice for any given parameters. Here is the logical equivalent code in Python.
This function takes a Python object and optional parameters for slicing and returns the start, stop, step, and slice length for the requested slice.
def py_slice_get_indices_ex(obj, start=None, stop=None, step=None):
length = len(obj)
if step is None:
step = 1
if step == 0:
raise Exception("Step cannot be zero.")
if start is None:
start = 0 if step > 0 else length - 1
else:
if start < 0:
start += length
if start < 0:
start = 0 if step > 0 else -1
if start >= length:
start = length if step > 0 else length - 1
if stop is None:
stop = length if step > 0 else -1
else:
if stop < 0:
stop += length
if stop < 0:
stop = 0 if step > 0 else -1
if stop >= length:
stop = length if step > 0 else length - 1
if (step < 0 and stop >= start) or (step > 0 and start >= stop):
slice_length = 0
elif step < 0:
slice_length = (stop - start + 1)/(step) + 1
else:
slice_length = (stop - start - 1)/(step) + 1
return (start, stop, step, slice_length)
This is the intelligence that is present behind slices. Since Python has an built-in function called slice, you can pass some parameters and check how smartly it calculates missing parameters.
In [21]: alpha = ['a', 'b', 'c', 'd', 'e', 'f']
In [22]: s = slice(None, None, None)
In [23]: s
Out[23]: slice(None, None, None)
In [24]: s.indices(len(alpha))
Out[24]: (0, 6, 1)
In [25]: range(*s.indices(len(alpha)))
Out[25]: [0, 1, 2, 3, 4, 5]
In [26]: s = slice(None, None, -1)
In [27]: range(*s.indices(len(alpha)))
Out[27]: [5, 4, 3, 2, 1, 0]
In [28]: s = slice(None, 3, -1)
In [29]: range(*s.indices(len(alpha)))
Out[29]: [5, 4]
Note: This post was originally written in my blog, The Intelligence Behind Python Slices.
Python slicing notation:
a[start:end:step]
For start and end, negative values are interpreted as being relative to the end of the sequence.
Positive indices for end indicate the position after the last element to be included.
Blank values are defaulted as follows: [+0:-0:1].
Using a negative step reverses the interpretation of start and end
The notation extends to (numpy) matrices and multidimensional arrays. For example, to slice entire columns you can use:
m[::,0:2:] ## slice the first two columns
Slices hold references, not copies, of the array elements. If you want to make a separate copy an array, you can use deepcopy().
You can also use slice assignment to remove one or more elements from a list:
r = [1, 'blah', 9, 8, 2, 3, 4]
>>> r[1:4] = []
>>> r
[1, 2, 3, 4]
This is just for some extra info...
Consider the list below
>>> l=[12,23,345,456,67,7,945,467]
Few other tricks for reversing the list:
>>> l[len(l):-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[:-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[len(l)::-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[::-1]
[467, 945, 7, 67, 456, 345, 23, 12]
>>> l[-1:-len(l)-1:-1]
[467, 945, 7, 67, 456, 345, 23, 12]
1. Slice Notation
To make it simple, remember slice has only one form:
s[start:end:step]
and here is how it works:
s: an object that can be sliced
start: first index to start iteration
end: last index, NOTE that end index will not be included in the resulted slice
step: pick element every step index
Another import thing: all start,end, step can be omitted! And if they are omitted, their default value will be used: 0,len(s),1 accordingly.
So possible variations are:
# Mostly used variations
s[start:end]
s[start:]
s[:end]
# Step-related variations
s[:end:step]
s[start::step]
s[::step]
# Make a copy
s[:]
NOTE: If start >= end (considering only when step>0), Python will return a empty slice [].
2. Pitfalls
The above part explains the core features on how slice works, and it will work on most occasions. However, there can be pitfalls you should watch out, and this part explains them.
Negative indexes
The very first thing that confuses Python learners is that an index can be negative!
Don't panic: a negative index means count backwards.
For example:
s[-5:] # Start at the 5th index from the end of array,
# thus returning the last 5 elements.
s[:-5] # Start at index 0, and end until the 5th index from end of array,
# thus returning s[0:len(s)-5].
Negative step
Making things more confusing is that step can be negative too!
A negative step means iterate the array backwards: from the end to start, with the end index included, and the start index excluded from the result.
NOTE: when step is negative, the default value for start is len(s) (while end does not equal to 0, because s[::-1] contains s[0]). For example:
s[::-1] # Reversed slice
s[len(s)::-1] # The same as above, reversed slice
s[0:len(s):-1] # Empty list
Out of range error?
Be surprised: slice does not raise an IndexError when the index is out of range!
If the index is out of range, Python will try its best to set the index to 0 or len(s) according to the situation. For example:
s[:len(s)+5] # The same as s[:len(s)]
s[-len(s)-5::] # The same as s[0:]
s[len(s)+5::-1] # The same as s[len(s)::-1], and the same as s[::-1]
3. Examples
Let's finish this answer with examples, explaining everything we have discussed:
# Create our array for demonstration
In [1]: s = [i for i in range(10)]
In [2]: s
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: s[2:] # From index 2 to last index
Out[3]: [2, 3, 4, 5, 6, 7, 8, 9]
In [4]: s[:8] # From index 0 up to index 8
Out[4]: [0, 1, 2, 3, 4, 5, 6, 7]
In [5]: s[4:7] # From index 4 (included) up to index 7(excluded)
Out[5]: [4, 5, 6]
In [6]: s[:-2] # Up to second last index (negative index)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7]
In [7]: s[-2:] # From second last index (negative index)
Out[7]: [8, 9]
In [8]: s[::-1] # From last to first in reverse order (negative step)
Out[8]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [9]: s[::-2] # All odd numbers in reversed order
Out[9]: [9, 7, 5, 3, 1]
In [11]: s[-2::-2] # All even numbers in reversed order
Out[11]: [8, 6, 4, 2, 0]
In [12]: s[3:15] # End is out of range, and Python will set it to len(s).
Out[12]: [3, 4, 5, 6, 7, 8, 9]
In [14]: s[5:1] # Start > end; return empty list
Out[14]: []
In [15]: s[11] # Access index 11 (greater than len(s)) will raise an IndexError
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-15-79ffc22473a3> in <module>()
----> 1 s[11]
IndexError: list index out of range
As a general rule, writing code with a lot of hardcoded index values leads to a readability
and maintenance mess. For example, if you come back to the code a year later, you’ll
look at it and wonder what you were thinking when you wrote it. The solution shown
is simply a way of more clearly stating what your code is actually doing.
In general, the built-in slice() creates a slice object that can be used anywhere a slice
is allowed. For example:
>>> items = [0, 1, 2, 3, 4, 5, 6]
>>> a = slice(2, 4)
>>> items[2:4]
[2, 3]
>>> items[a]
[2, 3]
>>> items[a] = [10,11]
>>> items
[0, 1, 10, 11, 4, 5, 6]
>>> del items[a]
>>> items
[0, 1, 4, 5, 6]
If you have a slice instance s, you can get more information about it by looking at its
s.start, s.stop, and s.step attributes, respectively. For example:
>>> a = slice(10, 50, 2)
>>> a.start
10
>>> a.stop
50
>>> a.step
2
>>>
The previous answers don't discuss multi-dimensional array slicing which is possible using the famous NumPy package:
Slicing can also be applied to multi-dimensional arrays.
# Here, a is a NumPy array
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[:2, 0:3:2]
array([[1, 3],
[5, 7]])
The ":2" before the comma operates on the first dimension and the "0:3:2" after the comma operates on the second dimension.
The rules of slicing are as follows:
[lower bound : upper bound : step size]
I- Convert upper bound and lower bound into common signs.
II- Then check if the step size is a positive or a negative value.
(i) If the step size is a positive value, upper bound should be greater than lower bound, otherwise empty string is printed. For example:
s="Welcome"
s1=s[0:3:1]
print(s1)
The output:
Wel
However if we run the following code:
s="Welcome"
s1=s[3:0:1]
print(s1)
It will return an empty string.
(ii) If the step size if a negative value, upper bound should be lesser than lower bound, otherwise empty string will be printed. For example:
s="Welcome"
s1=s[3:0:-1]
print(s1)
The output:
cle
But if we run the following code:
s="Welcome"
s1=s[0:5:-1]
print(s1)
The output will be an empty string.
Thus in the code:
str = 'abcd'
l = len(str)
str2 = str[l-1:0:-1] #str[3:0:-1]
print(str2)
str2 = str[l-1:-1:-1] #str[3:-1:-1]
print(str2)
In the first str2=str[l-1:0:-1], the upper bound is lesser than the lower bound, thus dcb is printed.
However in str2=str[l-1:-1:-1], the upper bound is not less than the lower bound (upon converting lower bound into negative value which is -1: since index of last element is -1 as well as 3).
In my opinion, you will understand and memorize better the Python string slicing notation if you look at it the following way (read on).
Let's work with the following string ...
azString = "abcdefghijklmnopqrstuvwxyz"
For those who don't know, you can create any substring from azString using the notation azString[x:y]
Coming from other programming languages, that's when the common sense gets compromised. What are x and y?
I had to sit down and run several scenarios in my quest for a memorization technique that will help me remember what x and y are and help me slice strings properly at the first attempt.
My conclusion is that x and y should be seen as the boundary indexes that are surrounding the strings that we want to extra. So we should see the expression as azString[index1, index2] or even more clearer as azString[index_of_first_character, index_after_the_last_character].
Here is an example visualization of that ...
Letters a b c d e f g h i j ...
↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
┊ ┊
Indexes 0 1 2 3 4 5 6 7 8 9 ...
┊ ┊
cdefgh index1 index2
So all you have to do is setting index1 and index2 to the values that will surround the desired substring. For instance, to get the substring "cdefgh", you can use azString[2:8], because the index on the left side of "c" is 2 and the one on the right size of "h" is 8.
Remember that we are setting the boundaries. And those boundaries are the positions where you could place some brackets that will be wrapped around the substring like this ...
a b [ c d e f g h ] i j
That trick works all the time and is easy to memorize.
I personally think about it like a for loop:
a[start:end:step]
# for(i = start; i < end; i += step)
Also, note that negative values for start and end are relative to the end of the list and computed in the example above by given_index + a.shape[0].
#!/usr/bin/env python
def slicegraphical(s, lista):
if len(s) > 9:
print """Enter a string of maximum 9 characters,
so the printig would looki nice"""
return 0;
# print " ",
print ' '+'+---' * len(s) +'+'
print ' ',
for letter in s:
print '| {}'.format(letter),
print '|'
print " ",; print '+---' * len(s) +'+'
print " ",
for letter in range(len(s) +1):
print '{} '.format(letter),
print ""
for letter in range(-1*(len(s)), 0):
print ' {}'.format(letter),
print ''
print ''
for triada in lista:
if len(triada) == 3:
if triada[0]==None and triada[1] == None and triada[2] == None:
# 000
print s+'[ : : ]' +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] == None and triada[1] == None and triada[2] != None:
# 001
print s+'[ : :{0:2d} ]'.format(triada[2], '','') +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] == None and triada[1] != None and triada[2] == None:
# 010
print s+'[ :{0:2d} : ]'.format(triada[1]) +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] == None and triada[1] != None and triada[2] != None:
# 011
print s+'[ :{0:2d} :{1:2d} ]'.format(triada[1], triada[2]) +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] != None and triada[1] == None and triada[2] == None:
# 100
print s+'[{0:2d} : : ]'.format(triada[0]) +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] != None and triada[1] == None and triada[2] != None:
# 101
print s+'[{0:2d} : :{1:2d} ]'.format(triada[0], triada[2]) +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] != None and triada[1] != None and triada[2] == None:
# 110
print s+'[{0:2d} :{1:2d} : ]'.format(triada[0], triada[1]) +' = ', s[triada[0]:triada[1]:triada[2]]
elif triada[0] != None and triada[1] != None and triada[2] != None:
# 111
print s+'[{0:2d} :{1:2d} :{2:2d} ]'.format(triada[0], triada[1], triada[2]) +' = ', s[triada[0]:triada[1]:triada[2]]
elif len(triada) == 2:
if triada[0] == None and triada[1] == None:
# 00
print s+'[ : ] ' + ' = ', s[triada[0]:triada[1]]
elif triada[0] == None and triada[1] != None:
# 01
print s+'[ :{0:2d} ] '.format(triada[1]) + ' = ', s[triada[0]:triada[1]]
elif triada[0] != None and triada[1] == None:
# 10
print s+'[{0:2d} : ] '.format(triada[0]) + ' = ', s[triada[0]:triada[1]]
elif triada[0] != None and triada[1] != None:
# 11
print s+'[{0:2d} :{1:2d} ] '.format(triada[0],triada[1]) + ' = ', s[triada[0]:triada[1]]
elif len(triada) == 1:
print s+'[{0:2d} ] '.format(triada[0]) + ' = ', s[triada[0]]
if __name__ == '__main__':
# Change "s" to what ever string you like, make it 9 characters for
# better representation.
s = 'COMPUTERS'
# add to this list different lists to experement with indexes
# to represent ex. s[::], use s[None, None,None], otherwise you get an error
# for s[2:] use s[2:None]
lista = [[4,7],[2,5,2],[-5,1,-1],[4],[-4,-6,-1], [2,-3,1],[2,-3,-1], [None,None,-1],[-5,None],[-5,0,-1],[-5,None,-1],[-1,1,-2]]
slicegraphical(s, lista)
You can run this script and experiment with it, below is some samples that I got from the script.
+---+---+---+---+---+---+---+---+---+
| C | O | M | P | U | T | E | R | S |
+---+---+---+---+---+---+---+---+---+
0 1 2 3 4 5 6 7 8 9
-9 -8 -7 -6 -5 -4 -3 -2 -1
COMPUTERS[ 4 : 7 ] = UTE
COMPUTERS[ 2 : 5 : 2 ] = MU
COMPUTERS[-5 : 1 :-1 ] = UPM
COMPUTERS[ 4 ] = U
COMPUTERS[-4 :-6 :-1 ] = TU
COMPUTERS[ 2 :-3 : 1 ] = MPUT
COMPUTERS[ 2 :-3 :-1 ] =
COMPUTERS[ : :-1 ] = SRETUPMOC
COMPUTERS[-5 : ] = UTERS
COMPUTERS[-5 : 0 :-1 ] = UPMO
COMPUTERS[-5 : :-1 ] = UPMOC
COMPUTERS[-1 : 1 :-2 ] = SEUM
[Finished in 0.9s]
When using a negative step, notice that the answer is shifted to the right by 1.
My brain seems happy to accept that lst[start:end] contains the start-th item. I might even say that it is a 'natural assumption'.
But occasionally a doubt creeps in and my brain asks for reassurance that it does not contain the end-th element.
In these moments I rely on this simple theorem:
for any n, lst = lst[:n] + lst[n:]
This pretty property tells me that lst[start:end] does not contain the end-th item because it is in lst[end:].
Note that this theorem is true for any n at all. For example, you can check that
lst = range(10)
lst[:-42] + lst[-42:] == lst
returns True.
In Python, the most basic form for slicing is the following:
l[start:end]
where l is some collection, start is an inclusive index, and end is an exclusive index.
In [1]: l = list(range(10))
In [2]: l[:5] # First five elements
Out[2]: [0, 1, 2, 3, 4]
In [3]: l[-5:] # Last five elements
Out[3]: [5, 6, 7, 8, 9]
When slicing from the start, you can omit the zero index, and when slicing to the end, you can omit the final index since it is redundant, so do not be verbose:
In [5]: l[:3] == l[0:3]
Out[5]: True
In [6]: l[7:] == l[7:len(l)]
Out[6]: True
Negative integers are useful when doing offsets relative to the end of a collection:
In [7]: l[:-1] # Include all elements but the last one
Out[7]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
In [8]: l[-3:] # Take the last three elements
Out[8]: [7, 8, 9]
It is possible to provide indices that are out of bounds when slicing such as:
In [9]: l[:20] # 20 is out of index bounds, and l[20] will raise an IndexError exception
Out[9]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [11]: l[-20:] # -20 is out of index bounds, and l[-20] will raise an IndexError exception
Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Keep in mind that the result of slicing a collection is a whole new collection. In addition, when using slice notation in assignments, the length of the slice assignments do not need to be the same. The values before and after the assigned slice will be kept, and the collection will shrink or grow to contain the new values:
In [16]: l[2:6] = list('abc') # Assigning fewer elements than the ones contained in the sliced collection l[2:6]
In [17]: l
Out[17]: [0, 1, 'a', 'b', 'c', 6, 7, 8, 9]
In [18]: l[2:5] = list('hello') # Assigning more elements than the ones contained in the sliced collection l [2:5]
In [19]: l
Out[19]: [0, 1, 'h', 'e', 'l', 'l', 'o', 6, 7, 8, 9]
If you omit the start and end index, you will make a copy of the collection:
In [14]: l_copy = l[:]
In [15]: l == l_copy and l is not l_copy
Out[15]: True
If the start and end indexes are omitted when performing an assignment operation, the entire content of the collection will be replaced with a copy of what is referenced:
In [20]: l[:] = list('hello...')
In [21]: l
Out[21]: ['h', 'e', 'l', 'l', 'o', '.', '.', '.']
Besides basic slicing, it is also possible to apply the following notation:
l[start:end:step]
where l is a collection, start is an inclusive index, end is an exclusive index, and step is a stride that can be used to take every nth item in l.
In [22]: l = list(range(10))
In [23]: l[::2] # Take the elements which indexes are even
Out[23]: [0, 2, 4, 6, 8]
In [24]: l[1::2] # Take the elements which indexes are odd
Out[24]: [1, 3, 5, 7, 9]
Using step provides a useful trick to reverse a collection in Python:
In [25]: l[::-1]
Out[25]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
It is also possible to use negative integers for step as the following example:
In[28]: l[::-2]
Out[28]: [9, 7, 5, 3, 1]
However, using a negative value for step could become very confusing. Moreover, in order to be Pythonic, you should avoid using start, end, and step in a single slice. In case this is required, consider doing this in two assignments (one to slice, and the other to stride).
In [29]: l = l[::2] # This step is for striding
In [30]: l
Out[30]: [0, 2, 4, 6, 8]
In [31]: l = l[1:-1] # This step is for slicing
In [32]: l
Out[32]: [2, 4, 6]
I want to add one Hello, World! example that explains the basics of slices for the very beginners. It helped me a lot.
Let's have a list with six values ['P', 'Y', 'T', 'H', 'O', 'N']:
+---+---+---+---+---+---+
| P | Y | T | H | O | N |
+---+---+---+---+---+---+
0 1 2 3 4 5
Now the simplest slices of that list are its sublists. The notation is [<index>:<index>] and the key is to read it like this:
[ start cutting before this index : end cutting before this index ]
Now if you make a slice [2:5] of the list above, this will happen:
| |
+---+---|---+---+---|---+
| P | Y | T | H | O | N |
+---+---|---+---+---|---+
0 1 | 2 3 4 | 5
You made a cut before the element with index 2 and another cut before the element with index 5. So the result will be a slice between those two cuts, a list ['T', 'H', 'O'].
Most of the previous answers clears up questions about slice notation.
The extended indexing syntax used for slicing is aList[start:stop:step], and basic examples are:
:
More slicing examples: 15 Extended Slices
The below is the example of an index of a string:
+---+---+---+---+---+
| H | e | l | p | A |
+---+---+---+---+---+
0 1 2 3 4 5
-5 -4 -3 -2 -1
str="Name string"
Slicing example: [start:end:step]
str[start:end] # Items start through end-1
str[start:] # Items start through the rest of the array
str[:end] # Items from the beginning through end-1
str[:] # A copy of the whole array
Below is the example usage:
print str[0] = N
print str[0:2] = Na
print str[0:7] = Name st
print str[0:7:2] = Nm t
print str[0:-1:2] = Nm ti
If you feel negative indices in slicing is confusing, here's a very easy way to think about it: just replace the negative index with len - index. So for example, replace -3 with len(list) - 3.
The best way to illustrate what slicing does internally is just show it in code that implements this operation:
def slice(list, start = None, end = None, step = 1):
# Take care of missing start/end parameters
start = 0 if start is None else start
end = len(list) if end is None else end
# Take care of negative start/end parameters
start = len(list) + start if start < 0 else start
end = len(list) + end if end < 0 else end
# Now just execute a for-loop with start, end and step
return [list[i] for i in range(start, end, step)]
I don't think that the Python tutorial diagram (cited in various other answers) is good as this suggestion works for positive stride, but does not for a negative stride.
This is the diagram:
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
0 1 2 3 4 5 6
-6 -5 -4 -3 -2 -1
From the diagram, I expect a[-4,-6,-1] to be yP but it is ty.
>>> a = "Python"
>>> a[2:4:1] # as expected
'th'
>>> a[-4:-6:-1] # off by 1
'ty'
What always work is to think in characters or slots and use indexing as a half-open interval -- right-open if positive stride, left-open if negative stride.
This way, I can think of a[-4:-6:-1] as a(-6,-4] in interval terminology.
+---+---+---+---+---+---+
| P | y | t | h | o | n |
+---+---+---+---+---+---+
0 1 2 3 4 5
-6 -5 -4 -3 -2 -1
+---+---+---+---+---+---+---+---+---+---+---+---+
| P | y | t | h | o | n | P | y | t | h | o | n |
+---+---+---+---+---+---+---+---+---+---+---+---+
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5

Python- converting to float using map()

If I have a list like the one given below, and want to convert the given integers to float, is there a way I can do it with map()?
my_list=[['Hello', 1 , 3],['Hi', 4 , 6]]
It's ugly, but yeah, it can be done:
my_list=[['Hello', 1 , 3],['Hi', 4 , 6]]
print(list(map(lambda lst: list(map(lambda x: float(x) if type(x) == int else x, lst)), my_list)))
# Output: [['Hello', 1.0, 3.0], ['Hi', 4.0, 6.0]]
EDIT
I personally prefer list comprehension in most cases as opposed to map/lambda, especially here where all the conversions to lists are annoying:
print([
[float(x) if type(x) == int else x for x in lst]
for lst in my_list
])
EDIT 2
And finally, an in-place solution that uses good old for loops:
for lst in my_list:
for i, item in enumerate(lst):
if type(item) == int:
lst[i] = float(item)
print(my_list)
If you are going to use map, you can increase readability by using it with previously defined functions. First make a function which converts an object to a float if possible, and leaves it alone if not:
def toFloat(x):
try:
return float(x)
except:
return x
Then you can map this across a list like:
def listToFloats(items):
return list(map(toFloat,items))
And then map this last function across a list of lists:
def listsToFloats(lists):
return list(map(listToFloats,lists))
For example:
>>> my_list=[['Hello', 1 , 3],['Hi', 4 , 6]]
>>> listsToFloats(my_list)
[['Hello', 1.0, 3.0], ['Hi', 4.0, 6.0]]

Resources