Find all roots of an equation within a given range - python-3.x

I am trying to find all the roots of an equation within a given range. The equation is:
import numpy as np
def f(x):
    return np.tan(x) - 3*x
from scipy.optimize import fsolve
In [14]: fsolve(f,0)
Out[14]: array([ 0.]) # one of the roots of the equation
However, for any other initial guess it returns 0, unless the guess is very close to a root.
In [15]: fsolve(f, 2)
Out[15]: array([ 0.]) # expected output 1.32419445
In [16]: fsolve(f,[1.32])
Out[16]: array([ 1.32419445])
In [17]: fsolve(f, 5)
Out[17]: array([ 0.]) # expected output 4.64068363
In [18]: fsolve(f,[4.64])
Out[18]: array([ 4.64068363])
Is there any way to find all the roots within a given range?

Every function, like a piece of wood, has its own "grain" that can present problems when working with it. One of my favorite methods is to rearrange the expression to get rid of the variable in the denominator. In your case, tan(x) has cos(x) in its denominator, and the rearranged form sin(x)-3*x*cos(x) behaves much better:
>>> from sympy import nsolve, sin, cos, pi
>>> from sympy.abc import x
>>> [nsolve(sin(x)-3*x*cos(x),i).n(2) for i in range(10)]
[0, 1.3, 1.3, -1.3, 4.6, 4.6, 1.3, 7.8, 7.8, 7.8]
Continuation is also a useful method for ill-behaved functions: a parameter is used to slowly turn on the ill-behaved part of the function. In your case, the x in 3*x*cos(x) makes things more difficult. But if you divide by the approximate value you are seeking and slowly change that divisor to 1, you can follow the approximate root to the desired root. Here is an example:
>>> a = 0.
>>> for j in range(5):
...     for i in range(10):
...         a = nsolve(sin(x)-3*x*cos(x)/(a + i*(1-a)/9),a)
...     print(a)
...     a += pi.n()+0.1
...
0
4.64068363077555
7.81133447513087
10.9651844009289
14.1135533715145

If a simple bisection (binary search) suits you, and you can provide two values of x at which f has different signs, the following might help:
from math import tan, fabs
eps = 1e-20
def test_func(x):
    return tan(x) - 3*x
def find_root(f, a, b):
    # print a coarse table of f to help choose a bracketing interval
    for i in range(20):
        x = i / 10.0
        print(x, f(x))
    fa = f(a)
    fb = f(b)
    if fa*fb > 0:
        raise ValueError("f(a) and f(b) need to have different signs")
    while True:
        if fabs(fa) < eps:
            return a
        elif fabs(fb) < eps:
            return b
        else:
            m = (a + b) / 2
            fm = f(m)
            if m == a or m == b:
                # the interval cannot be halved any further in floating point
                return m
            if fa*fm > 0:
                a, fa = m, fm
            else:
                b, fb = m, fm
r = find_root(test_func, 1.0, 1.5)
print (r, test_func(r)) # 1.324194449575503 3.1086244689504383e-15
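To cover a whole range rather than a single bracket, one option is to scan the range on a fine grid, bisect every sub-interval where the sign changes, and discard the sign changes caused by the poles of tan(x). The following is only a sketch under those assumptions: find_all_roots, bisect_root, samples and tol are names and parameters made up for illustration, and a grid that is too coarse can miss roots that lie close together.

```python
import math

def f(x):
    return math.tan(x) - 3 * x

def bisect_root(func, a, b, iterations=200):
    """Plain bisection on [a, b]; assumes func(a) and func(b) differ in sign."""
    fa = func(a)
    for _ in range(iterations):
        m = 0.5 * (a + b)
        fm = func(m)
        if fm == 0 or m == a or m == b:
            return m
        if fa * fm > 0:
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

def find_all_roots(func, lo, hi, samples=10_000, tol=1e-6):
    """Scan [lo, hi) on a uniform grid and refine every sign change."""
    roots = []
    step = (hi - lo) / samples
    a, fa = lo, func(lo)
    for k in range(1, samples + 1):
        b = lo + k * step
        fb = func(b)
        if fa == 0.0:
            roots.append(a)              # the grid point itself is a root
        elif fa * fb < 0:
            r = bisect_root(func, a, b)
            if abs(func(r)) < tol:       # a pole also flips the sign, but |f| is huge there
                roots.append(r)
        a, fa = b, fb
    return roots

roots = find_all_roots(f, 0.0, 8.0)
print(roots)
```

On the range 0 to 8 this should recover 0 together with the roots near 1.324, 4.641 and 7.811 quoted in this thread, while the sign flips at the poles of tan(x) are rejected by the tol check.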

Related

Is it possible to convert multiple if <n in interval> into an array?

Code like this
def IsEven(n):
    if n%2==0:
        return "Is even"
    else:
        return "Is odd"
can be converted to a function like this
def isEven(n):
    return ["Is even","Is odd"][n%2]
The question is whether code like this:
def intervalsToOutput(n):
    intervals=[(x1,x2), (x3,x4), (x5,x6), ... (xn,xn+1)]
    if x1<=n<=x2:
        return "in first interval"
    elif x3<=n<=x4:
        return "in second interval"
    elif x5<=n<=x6:
        return "in third interval"
    ...
    elif xn<=n<=xn+1:
        return "in last interval"
... can be replaced efficiently with a function like this:
(assuming the intervals (xi,xi+1) do not overlap and are sorted by xi)
def intervalsToOutput(n):
    intervals=[(x1,x2), (x3,x4), (x5,x6), ... (xn,xn+1)]
    answer=["in first interval","in second interval","in third interval",...,"in last interval"]
    return answer[index of (n in Interval)]
The best I have come up with (looking for speed) is
def intervalsToOutput(n):
    intervals=[(x1,x2), (x3,x4), (x5,x6), ... (xn,xn+1)]
    answer=["in first interval","in second interval","in third interval",...,"in last interval"]
    import bisect
    return answer[bisect.bisect_left(intervals, (n, )) - 1]  # -1 because the list is zero-based
What bisect_left does is find the position of n in intervals (in O(log(n)) time) by comparing (n,) with (xn,xn+1).
But the purpose of bisect is to find the insertion point, so it fails for any n with x_{i+1}<n<=x_{j}:
... (x_i, x_{i+1}), (x_j, x_{j+1}) ...
[x_i______x_i+1] fails on this interval x_j [________x_j+1]
EDIT:
This is a solution using intervaltree instead of bisect, but it has some overhead, because intervaltree returns a set, which may be empty
(I would also accept faster or more elegant solutions based on this code)
#generate intervals and answers
import random
min=0
max=20
numIntervals=6
def Intervals(min,max,n):
    randInts = random.sample(range(min, max), n * 2)
    randInts.sort()
    return [(x1,x2) for x1,x2 in zip(randInts[::2], randInts[1::2])]
intervals=Intervals(min,max,numIntervals)
answer=[f"In {n}th interval" for n in range(len(intervals))]
#Implement solution
import intervaltree
tree = intervaltree.IntervalTree()
[tree.addi(i[0],i[1],a) for i,a in zip(intervals,answer)]
def intervalsToOutput(x):
    answer=tree.at(x)
    if len(answer)>0:
        return answer.pop().data
    return "value not found"
print(intervals)
[print(f"{x} is ",intervalsToOutput(x)) for x in random.sample(range(min,max),numIntervals)]
I would do this:
intervals=[(1,2),(6,8),(22,45),(101,110)]
tgt=27
idx=next(i for i,t in enumerate(intervals) if t[0]<=tgt<=t[1])
>>> idx
2
If you want to catch a not-found condition, either use try / except:
try:
    idx=next(i for i,t in enumerate(intervals) if t[0]<=tgt<=t[1])
except StopIteration:
    idx=None  # not found
Or use the default form of next:
idx=next((i for i,t in enumerate(intervals) if t[0]<=tgt<=t[1]), None)
Fair point regarding this method being slow. Here is a monumentally faster way.
You can take the source of bisect and modify it with custom logic that inspects the tuple at each probe, checks whether t[0] <= x <= t[1] holds, and returns None if no tuple satisfies that condition.
Given a list of tuples of this form:
[(0, 3), (5, 10), (13, 17)] ... [(69735, 69739), (69742, 69746), (69749, 69752)]
You can have a custom bisect search to find a tuple that satisfies the condition stated:
def bisect_search(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None or hi>len(a):
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        # -1: x is below this tuple, 1: x is above it, 0: x is inside it
        f_mid=-1 if x<a[mid][0] else 1 if x>a[mid][-1] else 0
        if f_mid < 0:
            hi = mid
        elif f_mid > 0:
            lo = mid + 1
        else:
            return mid
    return None
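A variant that stays with the standard bisect module is to search only the lower bounds and then check the upper bound of the single candidate it returns. This is a minimal sketch that assumes sorted, non-overlapping intervals with both ends inclusive; make_interval_lookup and lookup are hypothetical names used only for illustration.

```python
from bisect import bisect_right

def make_interval_lookup(intervals):
    """intervals: sorted, non-overlapping (low, high) pairs, both ends inclusive."""
    starts = [low for low, _ in intervals]   # built once, reused for every query

    def lookup(n):
        i = bisect_right(starts, n) - 1      # last interval whose low is <= n
        if i >= 0 and n <= intervals[i][1]:  # the upper bound is checked explicitly
            return i
        return None                          # n is before, between, or after the intervals

    return lookup

lookup = make_interval_lookup([(1, 2), (6, 8), (22, 45), (101, 110)])
print(lookup(27), lookup(6), lookup(3))
```

The separate list of lower bounds costs extra memory, but each query remains a single O(log n) search followed by one comparison.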
Here is a benchmark for those two methods:
import random
import time
def gen_tuples(cnt, span=(1,5)):
    li=[]
    lo,hi=0,random.randint(*span)
    for _ in range(cnt):
        li.append((lo,hi))
        lo=li[-1][-1]+random.randint(*span)
        hi=lo+random.randint(span[0]+2,span[-1])
    rando=random.randint(cnt//2,cnt-1)
    to_find=li[rando][random.randint(0,len(li[rando])-1)]
    return (rando,to_find,li)
def next_(a, x):
    return next((i for i,t in enumerate(a) if t[0] <= x <= t[1]), None)
def bisect_search(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None or hi>len(a):
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        f_mid=-1 if x<a[mid][0] else 1 if x>a[mid][-1] else 0
        if f_mid < 0:
            hi = mid
        elif f_mid > 0:
            lo = mid + 1
        else:
            return mid
    return None
def cmpthese(funcs, args=(), cnt=1000, rate=True, micro=True):
    """Generate a Perl style function benchmark"""
    def pprint_table(table):
        """Perl style table output"""
        def format_field(field, fmt='{:,.0f}'):
            if type(field) is str: return field
            if type(field) is tuple: return field[1].format(field[0])
            return fmt.format(field)
        def get_max_col_w(table, index):
            return max([len(format_field(row[index])) for row in table])
        col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
        for i,row in enumerate(table):
            # left col
            row_tab=[row[0].ljust(col_paddings[0])]
            # rest of the cols
            row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
            print(' '.join(row_tab))
    results={}
    for i in range(cnt):
        for f in funcs:
            start=time.perf_counter_ns()
            f(*args)
            stop=time.perf_counter_ns()
            results.setdefault(f.__name__, []).append(stop-start)
    results={k:float(sum(v))/len(v) for k,v in results.items()}
    fastest=sorted(results,key=results.get, reverse=True)
    table=[['']]
    if rate: table[0].append('rate/sec')
    if micro: table[0].append('\u03bcsec/pass')
    table[0].extend(fastest)
    for e in fastest:
        tmp=[e]
        if rate:
            tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))
        if micro:
            tmp.append('{:,.1f}'.format(results[e]/float(cnt)))
        for x in fastest:
            if x==e: tmp.append('--')
            else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
        table.append(tmp)
    pprint_table(table)
if __name__=='__main__':
    import sys
    print(sys.version)
    cases=(
        ('small, found', True, 100),
        ('small, not found', False, 100),
        ('large, found', True, 10000),
        ('large, not found', False, 10000)
    )
    for txt, f, cnt in cases:
        rando,to_find,li=gen_tuples(cnt)
        tgt=to_find if f else -1
        args=(li, tgt)
        f1,f2=bisect_search(*args), next_(*args)
        print(f'\n{txt}: {f1}, {f2}')
        cmpthese([next_,bisect_search], args)
Prints:
3.10.2 (main, Feb 2 2022, 06:19:27) [Clang 13.0.0 (clang-1300.0.29.3)]
small, found: 84, 84
              rate/sec μsec/pass  next_ bisect_search
next_          121,533       8.2     --        -82.0%
bisect_search  675,074       1.5 455.5%            --
small, not found: None, None
              rate/sec μsec/pass  next_ bisect_search
next_          155,885       6.4     --        -81.6%
bisect_search  847,744       1.2 443.8%            --
large, found: 8824, 8824
              rate/sec μsec/pass    next_ bisect_search
next_            1,232     811.9       --        -99.6%
bisect_search  299,677       3.3 24230.9%            --
large, not found: None, None
              rate/sec μsec/pass    next_ bisect_search
next_            1,524     656.0       --        -99.6%
bisect_search  340,112       2.9 22211.5%            --
Which shows that the bisect_search method is substantially faster.
Note: It is tempting to try to use the key parameter of bisect in Python 3.10 to do the same thing. This is not possible, since the upper end of the range of the target tuple is not known, and comparing against (x,) only looks at the bottom of the range of the target tuple.
Well, here's one way, but don't do this (for readability, and probably performance). You should really just chain if, elif, and else statements (or use a for loop):
def really_esoteric(n):
    return ["first", "second", "n"][
        [n in range(0,10),
         n in range(11,16),
         n in range(16,30)].index(True)
    ]
REPL:
>>> really_esoteric(5)
'first'
>>> really_esoteric(13)
'second'
>>> really_esoteric(20)
'n'
Another way of going about this:
from bisect import bisect
def search_ranges(n, intervals):
    # Get the index on an insertion
    index = bisect(intervals, (n,))
    # Get the low and high values for the interval
    lower_low, lower_high = intervals[index - 1]
    upper_low, upper_high = intervals[index % len(intervals)]
    # Test if n is in the interval
    if n in range(lower_low, lower_high):
        return f"{n} in {intervals[index-1]}"
    elif n in range(upper_low, upper_high):
        return f"{n} in {intervals[index]}"
    else:
        return f"{n} not in an interval"
print(search_ranges(0, [(0, 10)]))
print(search_ranges(4, [(0, 10)]))
print(search_ranges(11, [(0, 10), (11,20)]))
print(search_ranges(12, [(0, 10), (15,20)]))
print(search_ranges(19, [(0, 10), (11,20)]))
print(search_ranges(11, [(0, 10), (11, 20), (21, 30)]))
Outputs:
0 in (0, 10)
4 in (0, 10)
11 in (11, 20)
12 not in an interval
19 in (11, 20)
11 in (11, 20)
Are the intervals random, or evenly spaced? For example, if the range is 0-10 and there are 5 intervals (0-2)(2-4)(4-6)(6-8)(8-10),
just do something like
def func(n):
    spacing = 2
    return n // spacing

find sequence of elements in numpy array [duplicate]

In Python or NumPy, what is the best way to find out the first occurrence of a subarray?
For example, I have
a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
What is the fastest way (run-time-wise) to find out where b occurs in a? I understand for strings this is extremely easy, but what about for a list or numpy ndarray?
Thanks a lot!
[EDITED] I prefer the numpy solution, since from my experience numpy vectorization is much faster than Python list comprehension. Meanwhile, the big array is huge, so I don't want to convert it into a string; that will be (too) long.
I'm assuming you're looking for a numpy-specific solution, rather than a simple list comprehension or for loop. One straightforward approach is to use the rolling window technique to search for windows of the appropriate size.
This approach is simple, works correctly, and is much faster than any pure Python solution. It should be sufficient for many use cases. However, it is not the most efficient approach possible, for a number of reasons. For an approach that is more complicated, but asymptotically optimal in the expected case, see the numba-based rolling hash implementation in norok2's answer.
Here's the rolling_window function:
>>> import numpy
>>> def rolling_window(a, size):
...     shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
...     strides = a.strides + (a.strides[-1],)
...     return numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
...
Then you could do something like
>>> a = numpy.arange(10)
>>> numpy.random.shuffle(a)
>>> a
array([7, 3, 6, 8, 4, 0, 9, 2, 1, 5])
>>> rolling_window(a, 3) == [8, 4, 0]
array([[False, False, False],
[False, False, False],
[False, False, False],
[ True, True, True],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
To make this really useful, you'd have to reduce it along axis 1 using all:
>>> numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
array([False, False, False, True, False, False, False, False], dtype=bool)
Then you could use that however you'd use a boolean array. A simple way to get the index out:
>>> bool_indices = numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
>>> numpy.mgrid[0:len(bool_indices)][bool_indices]
array([3])
For lists you could adapt one of these rolling window iterators to use a similar approach.
For very large arrays and subarrays, you could save memory like this:
>>> windows = rolling_window(a, 3)
>>> sub = [8, 4, 0]
>>> hits = numpy.ones((len(a) - len(sub) + 1,), dtype=bool)
>>> for i, x in enumerate(sub):
...     hits &= numpy.in1d(windows[:,i], [x])
...
>>> hits
array([False, False, False, True, False, False, False, False], dtype=bool)
>>> hits.nonzero()
(array([3]),)
On the other hand, this will probably be somewhat slower.
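On NumPy 1.20 or later, the same windows can be obtained from numpy.lib.stride_tricks.sliding_window_view, which avoids computing shape and strides by hand. The sketch below is an illustration of the same rolling-window idea, not part of the original answer; find_subarray is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_subarray(a, b):
    """Return the start indices at which 1-D array b occurs inside 1-D array a."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0 or b.size > a.size:
        return np.array([], dtype=np.intp)
    windows = sliding_window_view(a, b.size)   # read-only view, shape (len(a) - len(b) + 1, len(b))
    return np.flatnonzero((windows == b).all(axis=1))

print(find_subarray([7, 3, 6, 8, 4, 0, 9, 2, 1, 5], [8, 4, 0]))
```

As with the as_strided version, the comparison windows == b still materializes a boolean array with one row per window, so the memory caveat for very large inputs applies here too.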
The following code should work for lists (in Python 2, use xrange instead of range):
[x for x in range(len(a)) if a[x:x+len(b)] == b]
It returns a list of the indices at which the pattern starts.
(EDITED to include a deeper discussion, better code and more benchmarks)
Summary
For raw speed and efficiency, one can use a Cython or Numba accelerated version (when the input is a Python sequence or a NumPy array, respectively) of one of the classical algorithms.
The recommended approaches are:
find_kmp_cy() for Python sequences (list, tuple, etc.)
find_kmp_nb() for NumPy arrays
Other efficient approaches are find_rk_cy() and find_rk_nb(), which are more memory efficient but are not guaranteed to run in linear time.
If Cython / Numba are not available, both find_kmp() and find_rk() are again good all-around solutions for most use cases, although in the average case and for Python sequences the naïve approach in some form, notably find_pivot(), may be faster. For NumPy arrays, find_conv() (from @Jaime's answer) outperforms any non-accelerated naïve approach.
(Full code is below.)
Theory
This is a classical problem in computer science, known as the string-searching or string-matching problem.
The naïve approach, based on two nested loops, has a computational complexity of O(n + m) on average, but the worst case is O(n m).
Over the years, a number of alternative approaches have been developed which guarantee better worst-case performance.
Of the classical algorithms, the ones that can be best suited to generic sequences (since they do not rely on an alphabet) are:
the naïve algorithm (basically consisting of two nested loops)
the Knuth–Morris–Pratt (KMP) algorithm
the Rabin-Karp (RK) algorithm
This last algorithm relies on the computation of a rolling hash for its efficiency and therefore may require some additional knowledge of the input for optimal performance.
As a result, it is best suited for homogeneous data, for example numeric arrays.
A notable example of numeric arrays in Python is, of course, NumPy arrays.
Remarks
The naïve algorithm, by being so simple, lends itself to different implementations with various degrees of run-time speed in Python.
The other algorithms are less flexible in what can be optimized via language tricks.
Explicit looping in Python may be a speed bottleneck and several tricks can be used to perform the looping outside of the interpreter.
Cython is especially good at speeding up explicit loops for generic Python code.
Numba is especially good at speeding up explicit loops on NumPy arrays.
This is an excellent use-case for generators, so all the code will be using those instead of regular functions.
Python Sequences (list, tuple, etc.)
Based on the Naïve Algorithm
find_loop(), find_loop_cy() and find_loop_nb() are the explicit-loop-only implementations in pure Python, in Cython and with Numba JITing, respectively. Note the forceobj=True in the Numba version, which is required because we are using Python object inputs.
def find_loop(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        found = True
        for j in range(m):
            if seq[i + j] != subseq[j]:
                found = False
                break
        if found:
            yield i
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
def find_loop_cy(seq, subseq):
    cdef Py_ssize_t n = len(seq)
    cdef Py_ssize_t m = len(subseq)
    for i in range(n - m + 1):
        found = True
        for j in range(m):
            if seq[i + j] != subseq[j]:
                found = False
                break
        if found:
            yield i
import numba as nb
find_loop_nb = nb.jit(find_loop, forceobj=True)
find_loop_nb.__name__ = 'find_loop_nb'
find_all() replaces the inner loop with all() on a generator expression
def find_all(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if all(seq[i + j] == subseq[j] for j in range(m)):
            yield i
find_slice() replaces the inner loop with direct comparison == after slicing []
def find_slice(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i:i + m] == subseq:
            yield i
find_mix() and find_mix2() replace the inner loop with direct comparison == after slicing [], but include one or two additional short-circuiting checks on the first (and last) item, which may be faster because indexing with an int is much faster than slicing with a slice().
def find_mix(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i] == subseq[0] and seq[i:i + m] == subseq:
            yield i
def find_mix2(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i] == subseq[0] and seq[i + m - 1] == subseq[m - 1] \
                and seq[i:i + m] == subseq:
            yield i
find_pivot() and find_pivot2() replace the outer loop with multiple .index() calls using the first item of the sub-sequence, while using slicing for the inner loop, optionally with additional short-circuiting on the last item (the first matches by construction). The multiple .index() calls are wrapped in an index_all() generator (which may be useful on its own).
def index_all(seq, item, start=0, stop=-1):
    try:
        n = len(seq)
        if n > 0:
            start %= n
            stop %= n
            i = start
            while True:
                i = seq.index(item, i)
                if i <= stop:
                    yield i
                    i += 1
                else:
                    return
        else:
            return
    except ValueError:
        pass
def find_pivot(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    for i in index_all(seq, subseq[0], 0, n - m):
        if seq[i:i + m] == subseq:
            yield i
def find_pivot2(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    for i in index_all(seq, subseq[0], 0, n - m):
        if seq[i + m - 1] == subseq[m - 1] and seq[i:i + m] == subseq:
            yield i
Based on Knuth–Morris–Pratt (KMP) Algorithm
find_kmp() is a plain Python implementation of the algorithm. Since there is no simple looping or places where one could use slicing with a slice(), there is not much to be done for optimization, except using Cython (Numba would again require forceobj=True, which would lead to slow code).
def find_kmp(seq, subseq):
    n = len(seq)
    m = len(subseq)
    # : compute offsets
    offsets = [0] * m
    j = 1
    k = 0
    while j < m:
        if subseq[j] == subseq[k]:
            k += 1
            offsets[j] = k
            j += 1
        else:
            if k != 0:
                k = offsets[k - 1]
            else:
                offsets[j] = 0
                j += 1
    # : find matches
    i = j = 0
    while i < n:
        if seq[i] == subseq[j]:
            i += 1
            j += 1
        if j == m:
            yield i - j
            j = offsets[j - 1]
        elif i < n and seq[i] != subseq[j]:
            if j != 0:
                j = offsets[j - 1]
            else:
                i += 1
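To make the "compute offsets" step more concrete, here is that part of find_kmp() pulled out into a standalone function, as a sketch for illustration only (kmp_offsets is a hypothetical name). offsets[j] is the length of the longest proper prefix of subseq[:j + 1] that is also a suffix of it, which tells the matching loop how far it can fall back without re-reading seq.

```python
def kmp_offsets(subseq):
    """Failure table used by KMP: longest proper prefix that is also a suffix."""
    m = len(subseq)
    offsets = [0] * m
    j, k = 1, 0
    while j < m:
        if subseq[j] == subseq[k]:
            k += 1                 # the current prefix/suffix match grows by one item
            offsets[j] = k
            j += 1
        elif k != 0:
            k = offsets[k - 1]     # fall back to the next shorter candidate prefix
        else:
            offsets[j] = 0         # no prefix of subseq ends here
            j += 1
    return offsets

print(kmp_offsets([1, 2, 1, 2, 3]))   # [0, 0, 1, 2, 0]
print(kmp_offsets([0, 0, 0, 1]))      # [0, 1, 2, 0]
```

For [1, 2, 1, 2, 3], a mismatch after matching 1, 2, 1, 2 lets the search resume as if 1, 2 had already been matched, instead of starting over.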
find_kmp_cy() is a Cython implementation of the algorithm where the indices use a C integer data type (Py_ssize_t), which results in much faster code.
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
def find_kmp_cy(seq, subseq):
    cdef Py_ssize_t n = len(seq)
    cdef Py_ssize_t m = len(subseq)
    # : compute offsets
    offsets = [0] * m
    cdef Py_ssize_t j = 1
    cdef Py_ssize_t k = 0
    while j < m:
        if subseq[j] == subseq[k]:
            k += 1
            offsets[j] = k
            j += 1
        else:
            if k != 0:
                k = offsets[k - 1]
            else:
                offsets[j] = 0
                j += 1
    # : find matches
    cdef Py_ssize_t i = 0
    j = 0
    while i < n:
        if seq[i] == subseq[j]:
            i += 1
            j += 1
        if j == m:
            yield i - j
            j = offsets[j - 1]
        elif i < n and seq[i] != subseq[j]:
            if j != 0:
                j = offsets[j - 1]
            else:
                i += 1
Based on Rabin-Karp (RK) Algorithm
find_rk() is a pure Python implementation, which relies on Python's hash() for the computation (and comparison) of the hash. Such a hash is made rolling by means of a simple sum(). The roll-over is then computed from the previous hash by subtracting the result of hash() on the just-visited item seq[i - 1] and adding the result of hash() on the newly considered item seq[i + m - 1].
def find_rk(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if seq[:m] == subseq:
        yield 0
    hash_subseq = sum(hash(x) for x in subseq)  # compute hash
    curr_hash = sum(hash(x) for x in seq[:m])  # compute hash
    for i in range(1, n - m + 1):
        curr_hash += hash(seq[i + m - 1]) - hash(seq[i - 1])  # update hash
        if hash_subseq == curr_hash and seq[i:i + m] == subseq:
            yield i
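A small illustration of why find_rk() still compares seq[i:i + m] == subseq after the hashes agree: a sum of per-item hashes ignores order, so any permutation of the pattern collides with it. This is only a sketch; sum_hash is a hypothetical helper that mirrors the sum(hash(x) ...) expressions above.

```python
def sum_hash(items):
    """Order-insensitive hash: the sum of the per-item hashes."""
    return sum(hash(x) for x in items)

pattern = (1, 2, 3)
window = (3, 1, 2)                      # same items in a different order

same_hash = sum_hash(window) == sum_hash(pattern)
same_items = window == pattern
print(same_hash, same_items)            # True False

# The rolling update agrees with recomputing the hash from scratch.
seq = (3, 1, 2, 3)
m = 3
curr_hash = sum_hash(seq[:m])
curr_hash += hash(seq[m]) - hash(seq[0])  # slide the window by one item
print(curr_hash == sum_hash(seq[1:m + 1]))
```

The hash therefore only filters candidates cheaply; the explicit comparison is what guarantees correctness.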
find_rk_cy() is a Cython implementation of the algorithm where the indices use the appropriate C data type, which results in much faster code. Note that hash() truncates "the return value based on the bit width of the host machine."
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
def find_rk_cy(seq, subseq):
    cdef Py_ssize_t n = len(seq)
    cdef Py_ssize_t m = len(subseq)
    if seq[:m] == subseq:
        yield 0
    cdef Py_ssize_t hash_subseq = sum(hash(x) for x in subseq)  # compute hash
    cdef Py_ssize_t curr_hash = sum(hash(x) for x in seq[:m])  # compute hash
    cdef Py_ssize_t old_item, new_item
    for i in range(1, n - m + 1):
        old_item = hash(seq[i - 1])
        new_item = hash(seq[i + m - 1])
        curr_hash += new_item - old_item  # update hash
        if hash_subseq == curr_hash and seq[i:i + m] == subseq:
            yield i
Benchmarks
The above functions are evaluated on two inputs:
random inputs
def gen_input(n, k=2):
    return tuple(random.randint(0, k - 1) for _ in range(n))
(almost) worst inputs for the naïve algorithm
def gen_input_worst(n, k=-2):
    result = [0] * n
    result[k] = 1
    return tuple(result)
The subseq has fixed size (32).
Since there are so many alternatives, two separate groupings have been made, and some solutions with very small variations and almost identical timings have been omitted (i.e. find_mix2() and find_pivot2()).
For each group both inputs are tested.
For each benchmark the full plot and a zoom on the fastest approach are provided.
[Plots: Naïve on Random; Naïve on Worst; Other on Random; Other on Worst]
(Full code is available here.)
NumPy Arrays
Based on the Naïve Algorithm
find_loop(), find_loop_cy() and find_loop_nb() are the explicit-loop-only implementations in pure Python, in Cython and with Numba JITing, respectively. The code for the first two is the same as above and hence omitted. find_loop_nb() now enjoys fast JIT compilation. The inner loop has been written in a separate function because it can then be reused for find_rk_nb() (calling Numba functions inside Numba functions does not incur the function call penalty typical of Python).
@nb.jit
def _is_equal_nb(seq, subseq, m, i):
    for j in range(m):
        if seq[i + j] != subseq[j]:
            return False
    return True
@nb.jit
def find_loop_nb(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if _is_equal_nb(seq, subseq, m, i):
            yield i
find_all() is the same as above, while find_slice(), find_mix() and find_mix2() are almost identical to the above, the only difference is that seq[i:i + m] == subseq is now the argument of np.all(): np.all(seq[i:i + m] == subseq).
find_pivot() and find_pivot2() share the same ideas as above, except that they now use np.where() instead of index_all() and need to enclose the array equality inside an np.all() call.
def find_pivot(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    max_i = n - m
    for i in np.where(seq == subseq[0])[0]:
        if i > max_i:
            return
        elif np.all(seq[i:i + m] == subseq):
            yield i
def find_pivot2(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    max_i = n - m
    for i in np.where(seq == subseq[0])[0]:
        if i > max_i:
            return
        elif seq[i + m - 1] == subseq[m - 1] \
                and np.all(seq[i:i + m] == subseq):
            yield i
find_rolling() expresses the looping via a rolling window, and the matching is checked with np.all(). This vectorizes all the looping at the expense of creating large temporary objects, while still substantially applying the naïve algorithm. (The approach is from @senderle's answer.)
def rolling_window(arr, size):
    shape = arr.shape[:-1] + (arr.shape[-1] - size + 1, size)
    strides = arr.strides + (arr.strides[-1],)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
def find_rolling(seq, subseq):
    bool_indices = np.all(rolling_window(seq, len(subseq)) == subseq, axis=1)
    yield from np.mgrid[0:len(bool_indices)][bool_indices]
find_rolling2() is a slightly more memory-efficient variation of the above, where the vectorization is only partial and one explicit loop (along the expected shortest dimension, the length of subseq) is kept. (The approach is also from @senderle's answer.)
def find_rolling2(seq, subseq):
    windows = rolling_window(seq, len(subseq))
    hits = np.ones((len(seq) - len(subseq) + 1,), dtype=bool)
    for i, x in enumerate(subseq):
        hits &= np.in1d(windows[:, i], [x])
    yield from hits.nonzero()[0]
Based on Knuth–Morris–Pratt (KMP) Algorithm
find_kmp() is the same as above, while find_kmp_nb() is a straightforward JIT-compilation of that.
find_kmp_nb = nb.jit(find_kmp)
find_kmp_nb.__name__ = 'find_kmp_nb'
Based on Rabin-Karp (RK) Algorithm
find_rk() is the same as the above, except that again seq[i:i + m] == subseq is enclosed in an np.all() call.
find_rk_nb() is the Numba-accelerated version of the above. It uses _is_equal_nb(), defined earlier, to definitively determine a match, while for the hashing it uses a Numba-accelerated sum_hash_nb() function whose definition is pretty straightforward.
@nb.jit
def sum_hash_nb(arr):
    result = 0
    for x in arr:
        result += hash(x)
    return result
@nb.jit
def find_rk_nb(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if _is_equal_nb(seq, subseq, m, 0):
        yield 0
    hash_subseq = sum_hash_nb(subseq)  # compute hash
    curr_hash = sum_hash_nb(seq[:m])  # compute hash
    for i in range(1, n - m + 1):
        curr_hash += hash(seq[i + m - 1]) - hash(seq[i - 1])  # update hash
        if hash_subseq == curr_hash and _is_equal_nb(seq, subseq, m, i):
            yield i
find_conv() uses a pseudo Rabin-Karp method, where initial candidates are hashed using the np.dot() product and located on the correlation between seq and subseq with np.where(). The approach is pseudo because, while it still uses hashing to identify probable candidates, it may not be regarded as a rolling hash (it depends on the actual implementation of np.correlate()). Also, it needs to create a temporary array the size of the input. (The approach is from @Jaime's answer.)
def find_conv(seq, subseq):
    target = np.dot(subseq, subseq)
    candidates = np.where(np.correlate(seq, subseq, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(subseq))
    mask = np.all((np.take(seq, check) == subseq), axis=-1)
    yield from candidates[mask]
Benchmarks
Like before, the above functions are evaluated on two inputs:
random inputs
def gen_input(n, k=2):
    return np.random.randint(0, k, n)
(almost) worst inputs for the naïve algorithm
def gen_input_worst(n, k=-2):
    result = np.zeros(n, dtype=int)
    result[k] = 1
    return result
The subseq has fixed size (32).
These plots follow the same scheme as before, summarized below for convenience.
Since there are so many alternatives, two separate groupings have been made, and some solutions with very small variations and almost identical timings have been omitted (i.e. find_mix2() and find_pivot2()).
For each group both inputs are tested.
For each benchmark the full plot and a zoom on the fastest approach are provided.
[Plots: Naïve on Random; Naïve on Worst; Other on Random; Other on Worst]
(Full code is available here.)
A convolution based approach, that should be more memory efficient than the stride_tricks based approach:
def find_subsequence(seq, subseq):
    target = np.dot(subseq, subseq)
    candidates = np.where(np.correlate(seq,
                                       subseq, mode='valid') == target)[0]
    # some of the candidates entries may be false positives, double check
    check = candidates[:, np.newaxis] + np.arange(len(subseq))
    mask = np.all((np.take(seq, check) == subseq), axis=-1)
    return candidates[mask]
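The "false positives" mentioned in the comment arise because different windows can have the same dot product with subseq. The following pure-Python sketch reproduces the idea on a tiny input; correlate_valid is a hypothetical stand-in for np.correlate(seq, subseq, mode='valid') on real-valued data, written out only to make the arithmetic visible.

```python
def correlate_valid(seq, subseq):
    """Dot product of subseq with every length-m window of seq."""
    m = len(subseq)
    return [sum(s * p for s, p in zip(seq[i:i + m], subseq))
            for i in range(len(seq) - m + 1)]

seq = [3, 1, 2, 0]
subseq = [1, 2]

target = sum(p * p for p in subseq)            # 1*1 + 2*2 = 5
scores = correlate_valid(seq, subseq)          # [3*1 + 1*2, 1*1 + 2*2, 2*1 + 0*2]
candidates = [i for i, s in enumerate(scores) if s == target]
matches = [i for i in candidates if seq[i:i + len(subseq)] == subseq]

print(candidates)   # [0, 1]: the window [3, 1] also scores 5
print(matches)      # [1]
```

The window [3, 1] reaches the target without being equal to subseq, which is why the element-wise check on the candidates is needed.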
With really big arrays it may not be possible to use a stride_tricks approach, but this one still works:
haystack = np.random.randint(1000, size=int(1e6))
needle = np.random.randint(1000, size=(100,))
# Hide 10 needles in the haystack
place = np.random.randint(int(1e6) - 100 + 1, size=10)
for idx in place:
    haystack[idx:idx+100] = needle
In [3]: find_subsequence(haystack, needle)
Out[3]:
array([253824, 321497, 414169, 456777, 635055, 879149, 884282, 954848,
961100, 973481], dtype=int64)
In [4]: np.all(np.sort(place) == find_subsequence(haystack, needle))
Out[4]: True
In [5]: %timeit find_subsequence(haystack, needle)
10 loops, best of 3: 79.2 ms per loop
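You can call the tobytes() method (tostring() is a deprecated alias of it) to convert an array to a byte string, and then use fast string search. This method may be faster when you have many subarrays to check. Note that a match can start at a byte offset that is not a multiple of itemsize, so the resulting index should be verified against the array.
import numpy as np
a = np.array([1,2,3,4,5,6])
b = np.array([2,3,4])
print(a.tobytes().index(b.tobytes())//a.itemsize)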
Another try, but I'm sure there is a more Pythonic and efficient way to do that ...
def array_match(a, b):
    for i in range(0, len(a)-len(b)+1):
        if a[i:i+len(b)] == b:
            return i
    return None
a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
print(array_match(a,b))
1
(This first answer was not in the scope of the question, as cdhowie mentioned)
set(a) & set(b) == set(b)
Here is a rather straightforward option:
def first_subarray(full_array, sub_array):
    n = len(full_array)
    k = len(sub_array)
    matches = np.argwhere([np.all(full_array[start_ix:start_ix+k] == sub_array)
                           for start_ix in range(0, n-k+1)])
    return matches[0]
Then using the original a, b vectors we get:
a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
first_subarray(a, b)
Out[44]:
array([1], dtype=int64)
Quick comparison of three of the proposed solutions (average time over 100 iterations for randomly created vectors):
import time
import collections
import numpy as np
def function_1(seq, sub):
    # direct comparison
    seq = list(seq)
    sub = list(sub)
    return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i+len(sub)] == sub]
def function_2(seq, sub):
    # Jaime's solution
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]
def function_3(seq, sub):
    # HYRY solution
    return seq.tobytes().index(sub.tobytes())//seq.itemsize
# --- assessment time performance
N = 100
seq = np.random.choice([0, 1, 2, 3, 4, 5, 6], 3000)
sub = np.array([1, 2, 3])
tim = collections.OrderedDict()
tim.update({function_1: 0.})
tim.update({function_2: 0.})
tim.update({function_3: 0.})
for function in tim.keys():
    for _ in range(N):
        seq = np.random.choice([0, 1, 2, 3, 4], 3000)
        sub = np.array([1, 2, 3])
        start = time.time()
        function(seq, sub)
        end = time.time()
        tim[function] += end - start
timer_dict = collections.OrderedDict()
for key, val in tim.items():
    timer_dict.update({key.__name__: val / N})
print(timer_dict)
Which would result (on my old machine) in:
OrderedDict([
('function_1', 0.0008518099784851074),
('function_2', 8.157730102539063e-05),
('function_3', 6.124973297119141e-06)
])
First, convert the list to string.
a = ''.join(str(i) for i in a)
b = ''.join(str(i) for i in b)
After converting to string, you can easily find the index of substring with the following string function.
a.index(b)
Cheers!!

Automatically round arithmetic operations to eight decimals

I am doing a numerical analysis exercise where I need to calculate the solution of a linear system using a specific algorithm. My answer differs from the book's answer in some decimal places, which I believe is due to rounding errors. Is there a way to automatically round to eight decimal places after each arithmetic operation? The following is my Python code.
import numpy as np
A1 = [4, -1, 0, 0, -1, 4, -1, 0,\
0, -1, 4, -1, 0, 0, -1, 4]
A1 = np.array(A1).reshape([4,4])
I = -np.identity(4)
O = np.zeros([4,4])
A = np.block([[A1, I, O, O],
[I, A1, I, O],
[O, I, A1, I],
[O, O, I, A1]])
b = np.array([1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6])
def conj_solve(A, b, pre=False):
    n = len(A)
    C = np.identity(n)
    if pre:
        # diagonal preconditioner built from the square roots of A's diagonal
        for i in range(n):
            C[i, i] = np.sqrt(A[i, i])
    Ci = np.linalg.inv(C)
    Ct = np.transpose(Ci)
    x = np.zeros(n)
    r = b - np.matmul(A, x)
    w = np.matmul(Ci, r)
    v = np.matmul(Ct, w)
    alpha = np.dot(w, w)
    for i in range(MAX_ITER):
        if np.linalg.norm(v, np.inf) < TOL:
            print(i+1, "steps")
            print(x)
            print(r)
            return x
        u = np.matmul(A, v)
        t = alpha/np.dot(v, u)
        x = x + t*v
        r = r - t*u
        w = np.matmul(Ci, r)
        beta = np.dot(w, w)
        if np.abs(beta) < TOL:
            if np.linalg.norm(r, np.inf) < TOL:
                print(i+1, "steps")
                print(x)
                print(r)
                return x
        s = beta/alpha
        v = np.matmul(Ct, w) + s*v
        alpha = beta
    print("Max iteration exceeded")
    return x
MAX_ITER = 1000
TOL = 0.05
sol = conj_solve(A, b, pre=True)
Using this, I get 2.55516527 as first element of array which should be 2.55613420.
OR, is there a language/program where I can specify the precision of arithmetic?
Precision/rounding during the calculation is unlikely to be the issue.
To test this I ran the calculation with precisions that bracket the precision you are aiming for: once with np.float64, and once with np.float32. Here is a table of the printed results, their approximate decimal precision, and the result of the calculation (ie, the first printed array value).
numpy type    decimal places    result
------------------------------------------
np.float64    15                2.55516527
np.float32    6                 2.5551653
Given that these are so much in agreement, I doubt an intermediate precision of 8 decimal places is going to give an answer that's not between these two results (ie, 2.55613420 that's off in the 4th digit).
This part isn't part of my answer, but a comment on using mpmath. The questioner suggested it in the comments, and it was my first thought too, so I ran a quick test to see whether it behaved as I expected with low-precision calculations. It didn't, so I abandoned it (but I'm not an expert with it).
Here's my test function, basically multiplying 1/N by N and 1/N repeatedly to emphasise the error in 1/N.
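import mpmath
import numpy as np  # only needed for the np.float32 run below
def precision_test(dps=100, N=19, t=mpmath.mpf):
    with mpmath.workdps(dps):
        x = t(1)/t(N)
        print(x)
        y = x
        # multiply by 1/N and by N repeatedly so the error in 1/N accumulates
        for i in range(10000):
            y *= x
            y *= N
        print(y)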
This works as expected with, eg, np.float32:
precision_test(dps=2, N=3, t=np.float32)
# 0.33333334
# 0.3334327041164994
Note that the error has propagated into more significant digits, as expected.
But with mpmath, I could never get that to happen (testing with a range of dps and various prime N values):
precision_test(dps=2, N=3)
# 0.33
# 0.33
Because of this test, I decided mpmath is not going to give normal results for low precision calculations.
TL;DR:
mpmath didn't behave how I expected at low precision so I abandoned it.
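For comparison, Python's standard-library decimal module does round after every arithmetic operation to the precision set on its context. The sketch below only illustrates that behaviour; note that prec counts significant digits rather than decimal places, the function name one_third_roundtrip is made up here, and nothing in it suggests that such rounding would reproduce the book's value.

```python
from decimal import Decimal, localcontext


def one_third_roundtrip(digits=8):
    """Compute 1/3 and then (1/3)*3 with every result rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # significant digits, applied after each operation
        third = Decimal(1) / Decimal(3)
        back = third * 3
    return third, back


third, back = one_third_roundtrip()
print(third)  # 0.33333333
print(back)   # 0.99999999
```

Here the rounding error of the division is visible in the product, which is the kind of propagation the float32 run above shows.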

String concatenation queries

I have a list of characters, say x in number, denoted by b[1], b[2], b[3] ... b[x]. After x,
b[x+1] is the concatenation of b[1],b[2].... b[x] in that order. Similarly,
b[x+2] is the concatenation of b[2],b[3]....b[x],b[x+1].
So, basically, b[n] is the concatenation of the previous x terms of b, taken from left to right.
Given parameters p and q as queries, how can I find out which character among b[1], b[2], b[3]..... b[x] the qth character of b[p] corresponds to?
Note: x and b[1], b[2], b[3]..... b[x] are fixed for all queries.
I tried brute-forcing, but the string length increases exponentially for large x (x <= 100).
Example:
When x=3,
b[] = a, b, c, a b c, b c abc, c abc bcabc, abc bcabc cabcbcabc, //....
//Spaces for clarity, only commas separate array elements
So for a query where p=7, q=5, answer returned would be 3(corresponding to character 'c').
I am just having difficulty figuring out the maths behind it. Language is no issue
I wrote this answer as I figured it out, so please bear with me.
As you mentioned, it is much easier to find out where the character at b[p][q] comes from among the original x characters than to generate b[p] for large p. To do so, we will use a loop to find where the current b[p][q] came from, thereby reducing p until it is between 1 and x, and q until it is 1.
Let's look at an example for x=3 to see if we can get a formula:
p N(p) b[p]
- ---- ----
1 1 a
2 1 b
3 1 c
4 3 a b c
5 5 b c abc
6 9 c abc bcabc
7 17 abc bcabc cabcbcabc
8 31 bcabc cabcbcabc abcbcabccabcbcabc
9 57 cabcbcabc abcbcabccabcbcabc bcabccabcbcabcabcbcabccabcbcabc
The sequence is clear: N(p) = N(p-1) + N(p-2) + N(p-3), where N(p) is the number of characters in the pth element of b. Given p and x, you can just brute-force compute all the N for the range [1, p]. This will allow you to figure out which prior element of b b[p][q] came from.
To illustrate, say x=3, p=9 and q=45.
The chart above gives N(6)=9, N(7)=17 and N(8)=31. Since 45>9+17, you know that b[9][45] comes from b[8][45-(9+17)] = b[8][19].
Continuing iteratively/recursively, 19>9+5, so b[8][19] = b[7][19-(9+5)] = b[7][5].
Now 5>N(4) but 5<N(4)+N(5), so b[7][5] = b[5][5-3] = b[5][2].
b[5][2] = b[3][2-1] = b[3][1]
Since 3 <= x, we have our termination condition, and b[9][45] is c from b[3].
Something like this can very easily be computed either recursively or iteratively given starting p, q, x and b up to x. My method requires p array elements to compute N(p) for the entire sequence. This can be allocated in an array or on the stack if working recursively.
Here is a reference implementation in vanilla Python (no external imports, although numpy would probably help streamline this):
def so38509640(b, p, q):
"""
p, q are integers. b is a char sequence of length x.
list, string, or tuple are all valid choices for b.
"""
x = len(b)
# Trivial case
if p <= x:
if q != 1:
raise ValueError('q={} out of bounds for p={}'.format(q, p))
return p, b[p - 1]
# Construct list of counts
N = [1] * p
for i in range(x, p):
N[i] = sum(N[i - x:i])
print('N =', N)
# Error check
if q > N[-1]:
raise ValueError('q={} out of bounds for p={}'.format(q, p))
print('b[{}][{}]'.format(p, q), end='')
# Reduce p, q until p <= x
while p > x:
# Find which previous element character q comes from
offset = 0
for i in range(p - x - 1, p):
if i == p - 1:
raise ValueError('q={} out of bounds for p={}'.format(q, p))
if offset + N[i] >= q:
q -= offset
p = i + 1
print(' = b[{}][{}]'.format(p, q), end='')
break
offset += N[i]
print()
return p, b[p - 1]
Calling so38509640('abc', 9, 45) produces
N = [1, 1, 1, 3, 5, 9, 17, 31, 57]
b[9][45] = b[8][19] = b[7][5] = b[5][2] = b[3][1]
(3, 'c') # <-- Final answer
Similarly, for the example in the question, so38509640('abc', 7, 5) produces the expected result:
N = [1, 1, 1, 3, 5, 9, 17]
b[7][5] = b[5][2] = b[3][1]
(3, 'c') # <-- Final answer
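For small p, the result can be cross-checked by building the terms explicitly. This is only a sanity-check sketch, not part of the method above: the helper name brute_force is made up, and it stores each term as a list of 1-based indices into the original characters, which grows exponentially and is therefore only usable for small p.

```python
def brute_force(b, p, q):
    """Build b[1..p] explicitly and look up which original character b[p][q] is.

    Hypothetical checker: only feasible for small p.
    """
    x = len(b)
    # Each term is a list of 1-based indices into the original x characters.
    terms = [[i + 1] for i in range(x)]
    while len(terms) < p:
        # The next term concatenates the previous x terms, left to right.
        terms.append([idx for term in terms[-x:] for idx in term])
    origin = terms[p - 1][q - 1]
    return origin, b[origin - 1]


print(brute_force('abc', 7, 5))   # (3, 'c')
print(brute_force('abc', 9, 45))  # (3, 'c')
```

Both calls agree with the traces printed by so38509640 above.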
Sorry I couldn't come up with a better function name :) This is simple enough code that it should work equally well in Py2 and 3, despite differences in the range function/class.
I would be very curious to see if there is a non-iterative solution for this problem. Perhaps there is a way of doing this using modular arithmetic or something...

Python-3.x range() with step in float format [duplicate]

How do I iterate between 0 and 1 by a step of 0.1?
This says that the step argument cannot be zero:
for i in range(0, 1, 0.1):
print(i)
Rather than using a decimal step directly, it's much safer to express this in terms of how many points you want. Otherwise, floating-point rounding error is likely to give you a wrong result.
Use the linspace function from the NumPy library (which isn't part of the standard library but is relatively easy to obtain). linspace takes a number of points to return, and also lets you specify whether or not to include the right endpoint:
>>> np.linspace(0,1,11)
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
>>> np.linspace(0,1,10,endpoint=False)
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
If you really want to use a floating-point step value, use numpy.arange:
>>> import numpy as np
>>> np.arange(0.0, 1.0, 0.1)
array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
Floating-point rounding error will cause problems, though. Here's a simple case where rounding error causes arange to produce a length-4 array when it should only produce 3 numbers:
>>> np.arange(1, 1.3, 0.1)
array([1. , 1.1, 1.2, 1.3])
range() can only do integers, not floating point.
Use a list comprehension instead to obtain a list of steps:
[x * 0.1 for x in range(0, 10)]
More generally, a generator comprehension minimizes memory allocations:
xs = (x * 0.1 for x in range(0, 10))
for x in xs:
print(x)
Building on 'xrange([start], stop[, step])', you can define a generator that accepts and produces any type you choose (stick to types supporting + and <):
>>> def drange(start, stop, step):
... r = start
... while r < stop:
... yield r
... r += step
...
>>> i0=drange(0.0, 1.0, 0.1)
>>> ["%g" % x for x in i0]
['0', '0.1', '0.2', '0.3', '0.4', '0.5', '0.6', '0.7', '0.8', '0.9', '1']
>>>
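The trailing '1' in that output comes from accumulated rounding: after ten additions of 0.1 the running value is still slightly below 1.0, so the loop yields once more. The sketch below contrasts this with computing each value from an integer index. The name drange_indexed and the use of math.ceil to get the count are assumptions made for illustration, and the quotient itself can still round the wrong way for some inputs.

```python
import math


def drange_accumulate(start, stop, step):
    """Repeated addition, as in the drange generator above."""
    r = start
    while r < stop:
        yield r
        r += step


def drange_indexed(start, stop, step):
    """Hypothetical variant: derive every value from an integer index."""
    count = max(0, math.ceil((stop - start) / step))
    for i in range(count):
        yield start + i * step


print(len(list(drange_accumulate(0.0, 1.0, 0.1))))  # 11
print(len(list(drange_indexed(0.0, 1.0, 0.1))))     # 10
```

With the indexed form the error in each value does not build up from one element to the next, although individual values such as 3 * 0.1 are still not exactly 0.3.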
Increase the magnitude of i for the loop and then reduce it when you need it.
for i * 100 in range(0, 100, 10):
print i / 100.0
EDIT: I honestly cannot remember why I thought that would work syntactically
for i in range(0, 11, 1):
print i / 10.0
That should have the desired output.
NumPy is a bit overkill, I think.
[p/10 for p in range(0, 10)]
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
Generally speaking, to do a step-by-1/x up to y you would do
x=100
y=2
[p/x for p in range(0, int(x*y))]
[0.0, 0.01, 0.02, 0.03, ..., 1.97, 1.98, 1.99]
(1/x produced less rounding noise when I tested).
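scipy exposes a function arange, which generalizes Python's range() constructor to handle floats. It is NumPy's arange re-exported, and these top-level NumPy aliases were deprecated in later SciPy releases, so importing it from numpy is the more durable choice.
from numpy import arange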
Similar to R's seq function, this one returns a sequence in any order given the correct step value. The last value is equal to the stop value.
def seq(start, stop, step=1):
    n = int(round((stop - start)/float(step)))
    if n >= 1:
        return [start + step*i for i in range(n+1)]
    elif n == 0:
        # start and stop coincide (to within half a step), as in seq(1, 1)
        return [start]
    else:
        # step points away from stop
        return []
Results
seq(1, 5, 0.5)
[1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
seq(10, 0, -1)
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
seq(10, 0, -2)
[10, 8, 6, 4, 2, 0]
seq(1, 1)
[ 1 ]
The range() built-in function returns a sequence of integer values, I'm afraid, so you can't use it to do a decimal step.
I'd say just use a while loop:
i = 0.0
while i <= 1.0:
print i
i += 0.1
If you're curious, Python is converting your 0.1 to 0, which is why it's telling you the argument can't be zero.
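The exact error depends on the Python version. In Python 3 the built-in does not truncate the step; it rejects the float with a TypeError. A minimal sketch to see the message (the function name float_step_error is made up for illustration):

```python
def float_step_error():
    """Return the error message Python 3 gives for a float step in range()."""
    try:
        range(0, 1, 0.1)
    except TypeError as exc:
        return str(exc)
    return None


print(float_step_error())
```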
Here's a solution using itertools:
import itertools
def seq(start, end, step):
if step == 0:
raise ValueError("step must not be 0")
sample_count = int(abs(end - start) / step)
return itertools.islice(itertools.count(start, step), sample_count)
Usage Example:
for i in seq(0, 1, 0.1):
print(i)
[x * 0.1 for x in range(0, 10)]
in Python 2.7x gives you the result of:
[0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9]
but if you use:
[ round(x * 0.1, 1) for x in range(0, 10)]
gives you the desired:
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
import numpy as np
for i in np.arange(0, 1, 0.1):
print i
Best Solution: no rounding error
>>> step = .1
>>> N = 10 # number of data points
>>> [ x / pow(step, -1) for x in range(0, N + 1) ]
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Or, for a set range instead of set data points (e.g. continuous function), use:
>>> step = .1
>>> rnge = 1 # NOTE range = 1, i.e. span of data points
>>> N = int(rnge / step)
>>> [ x / pow(step,-1) for x in range(0, N + 1) ]
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
To implement a function: replace x / pow(step, -1) with f( x / pow(step, -1) ), and define f.
For example:
>>> import math
>>> def f(x):
return math.sin(x)
>>> step = .1
>>> rnge = 1 # NOTE range = 1, i.e. span of data points
>>> N = int(rnge / step)
>>> [ f( x / pow(step,-1) ) for x in range(0, N + 1) ]
[0.0, 0.09983341664682815, 0.19866933079506122, 0.29552020666133955, 0.3894183423086505,
0.479425538604203, 0.5646424733950354, 0.644217687237691, 0.7173560908995228,
0.7833269096274834, 0.8414709848078965]
And if you do this often, you might want to save the generated list r
r = list(map(lambda x: x/10.0, range(0, 10)))
for i in r:
print i
more_itertools is a third-party library that implements a numeric_range tool:
import more_itertools as mit
for x in mit.numeric_range(0, 1, 0.1):
print("{:.1f}".format(x))
Output
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
This tool also works for Decimal and Fraction.
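The same exact-stepping idea can be sketched with only the standard library by doing the arithmetic in fractions.Fraction and converting to float at the end. This is an illustrative sketch rather than how more_itertools implements numeric_range; the name fraction_range is made up, and the arguments are passed as strings so that '0.1' means exactly one tenth.

```python
from fractions import Fraction


def fraction_range(start, stop, step):
    """Yield floats from start (inclusive) to stop (exclusive), stepping exactly."""
    start, stop, step = Fraction(start), Fraction(stop), Fraction(step)
    if step == 0:
        raise ValueError("step must not be 0")
    current = start
    # Exact rational arithmetic: no rounding error accumulates in `current`.
    while (step > 0 and current < stop) or (step < 0 and current > stop):
        yield float(current)
        current += step


print(list(fraction_range('0', '1', '0.1')))
```

Passing the float 0.1 instead of the string '0.1' would carry the float's binary rounding error into the Fraction, so strings or integers are the safer inputs here.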
My versions use the original range function to create multiplicative indices for the shift. This allows the same syntax as the original range function.
I have made two versions, one using float, and one using Decimal, because I found that in some cases I wanted to avoid the roundoff drift introduced by the floating point arithmetic.
It is consistent with empty set results as in range/xrange.
Passing only a single numeric value to either function will return the standard range output to the integer ceiling value of the input parameter (so if you gave it 5.5, it would return range(6).)
Edit: the code below is now available as a package on PyPI: Franges
## frange.py
from math import ceil
# find best range function available to version (2.7.x / 3.x.x)
try:
_xrange = xrange
except NameError:
_xrange = range
def frange(start, stop = None, step = 1):
"""frange generates a set of floating point values over the
range [start, stop) with step size step
frange([start,] stop [, step ])"""
if stop is None:
for x in _xrange(int(ceil(start))):
yield x
else:
# create a generator expression for the index values
indices = (i for i in _xrange(0, int((stop-start)/step)))
# yield results
for i in indices:
yield start + step*i
## drange.py
import decimal
from math import ceil
# find best range function available to version (2.7.x / 3.x.x)
try:
_xrange = xrange
except NameError:
_xrange = range
def drange(start, stop = None, step = 1, precision = None):
"""drange generates a set of Decimal values over the
range [start, stop) with step size step
drange([start,] stop, [step [,precision]])"""
if stop is None:
for x in _xrange(int(ceil(start))):
yield x
else:
# find precision
if precision is not None:
decimal.getcontext().prec = precision
# convert values to decimals
start = decimal.Decimal(start)
stop = decimal.Decimal(stop)
step = decimal.Decimal(step)
# create a generator expression for the index values
indices = (
i for i in _xrange(
0,
int(((stop-start)/step).to_integral_value())
)
)
# yield results
for i in indices:
yield float(start + step*i)
## testranges.py
import frange
import drange
list(frange.frange(0, 2, 0.5)) # [0.0, 0.5, 1.0, 1.5]
list(drange.drange(0, 2, 0.5, precision = 6)) # [0.0, 0.5, 1.0, 1.5]
list(frange.frange(3)) # [0, 1, 2]
list(frange.frange(3.5)) # [0, 1, 2, 3]
list(frange.frange(0,10, -1)) # []
Lots of the solutions here still had floating point errors in Python 3.6 and didn't do exactly what I personally needed.
The function below takes integers or floats, doesn't require imports and doesn't return floating point errors.
def frange(x, y, step):
if int(x + y + step) == (x + y + step):
r = list(range(int(x), int(y), int(step)))
else:
f = 10 ** (len(str(step)) - str(step).find('.') - 1)
rf = list(range(int(x * f), int(y * f), int(step * f)))
r = [i / f for i in rf]
return r
Surprised no one has yet mentioned the recommended solution in the Python 3 docs:
See also:
The linspace recipe shows how to implement a lazy version of range suitable for floating point applications.
Once defined, the recipe is easy to use and does not require numpy or any other external libraries, but functions like numpy.linspace(). Note that rather than a step argument, the third num argument specifies the number of desired values, for example:
print(linspace(0, 10, 5))
# linspace(0, 10, 5)
print(list(linspace(0, 10, 5)))
# [0.0, 2.5, 5.0, 7.5, 10]
I quote a modified version of the full Python 3 recipe from Andrew Barnert below:
import collections.abc
import numbers
class linspace(collections.abc.Sequence):
"""linspace(start, stop, num) -> linspace object
Return a virtual sequence of num numbers from start to stop (inclusive).
If you need a half-open range, use linspace(start, stop, num+1)[:-1].
"""
def __init__(self, start, stop, num):
if not isinstance(num, numbers.Integral) or num <= 1:
raise ValueError('num must be an integer > 1')
self.start, self.stop, self.num = start, stop, num
self.step = (stop-start)/(num-1)
def __len__(self):
return self.num
def __getitem__(self, i):
if isinstance(i, slice):
return [self[x] for x in range(*i.indices(len(self)))]
if i < 0:
i = self.num + i
if i >= self.num:
raise IndexError('linspace object index out of range')
if i == self.num-1:
return self.stop
return self.start + i*self.step
def __repr__(self):
return '{}({}, {}, {})'.format(type(self).__name__,
self.start, self.stop, self.num)
def __eq__(self, other):
if not isinstance(other, linspace):
return False
return ((self.start, self.stop, self.num) ==
(other.start, other.stop, other.num))
def __ne__(self, other):
return not self==other
def __hash__(self):
return hash((type(self), self.start, self.stop, self.num))
This is my solution to get ranges with float steps.
Using this function it's not necessary to import numpy, nor install it.
I'm pretty sure that it could be improved and optimized. Feel free to do it and post it here.
from __future__ import division
from math import log
def xfrange(start, stop, step):
old_start = start #backup this value
digits = int(round(log(10000, 10)))+1 #get number of digits
magnitude = 10**digits
stop = int(magnitude * stop) #convert from
step = int(magnitude * step) #0.1 to 10 (e.g.)
if start == 0:
start = 10**(digits-1)
else:
start = 10**(digits)*start
data = [] #create array
#calc number of iterations
end_loop = int((stop-start)//step)
if old_start == 0:
end_loop += 1
acc = start
for i in xrange(0, end_loop):
data.append(acc/magnitude)
acc += step
return data
print xfrange(1, 2.1, 0.1)
print xfrange(0, 1.1, 0.1)
print xfrange(-1, 0.1, 0.1)
The output is:
[1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
[-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0]
For completeness of boutique, a functional solution:
def frange(a,b,s):
return [] if s > 0 and a > b or s < 0 and a < b or s==0 else [a]+frange(a+s,b,s)
You can use this function:
def frange(start,end,step):
return map(lambda x: x*step, range(int(start*1./step),int(end*1./step)))
It can be done using the NumPy library. The arange() function allows float steps, but it returns a NumPy array, which can be converted to a list using tolist() for convenience.
for i in np.arange(0, 1, 0.1).tolist():
print i
start and stop are inclusive rather than one or the other (usually stop is excluded) and without imports, and using generators
def rangef(start, stop, step, fround=5):
"""
Yields sequence of numbers from start (inclusive) to stop (inclusive)
by step (increment) with rounding set to n digits.
:param start: start of sequence
:param stop: end of sequence
:param step: int or float increment (e.g. 1 or 0.001)
:param fround: float rounding, n decimal places
:return:
"""
try:
i = 0
while stop >= start and step > 0:
if i==0:
yield start
elif start >= stop:
yield stop
elif start < stop:
if start == 0:
yield 0
if start != 0:
yield start
i += 1
start += step
start = round(start, fround)
else:
pass
except TypeError as e:
yield "type-error({})".format(e)
else:
pass
# passing
print(list(rangef(-100.0,10.0,1)))
print(list(rangef(-100,0,0.5)))
print(list(rangef(-1,1,0.2)))
print(list(rangef(-1,1,0.1)))
print(list(rangef(-1,1,0.05)))
print(list(rangef(-1,1,0.02)))
print(list(rangef(-1,1,0.01)))
print(list(rangef(-1,1,0.005)))
# failing: type-error:
print(list(rangef("1","10","1")))
print(list(rangef(1,10,"1")))
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
I know I'm late to the party here, but here's a trivial generator solution that's working in 3.6:
def floatRange(*args):
start, step = 0, 1
if len(args) == 1:
stop = args[0]
elif len(args) == 2:
start, stop = args[0], args[1]
elif len(args) == 3:
start, stop, step = args[0], args[1], args[2]
else:
raise TypeError("floatRange accepts 1, 2, or 3 arguments. ({0} given)".format(len(args)))
for num in start, step, stop:
if not isinstance(num, (int, float)):
raise TypeError("floatRange only accepts float and integer arguments. ({0} : {1} given)".format(type(num), str(num)))
for x in range(int((stop-start)/step)):
yield start + (x * step)
return
Then you can call it just like the original range(). Beyond the argument checks there's no error handling, but let me know if there is an error that can reasonably be caught and I'll update it. Or you can update it yourself; this is StackOverflow.
To counter the float precision issues, you could use the Decimal module.
This demands an extra effort of converting to Decimal from int or float while writing the code, but you can instead pass str and modify the function if that sort of convenience is indeed necessary.
from decimal import Decimal
def decimal_range(*args):
zero, one = Decimal('0'), Decimal('1')
if len(args) == 1:
start, stop, step = zero, args[0], one
elif len(args) == 2:
start, stop, step = args + (one,)
elif len(args) == 3:
start, stop, step = args
else:
raise ValueError('Expected 1 to 3 arguments, got %s' % len(args))
if not all([type(arg) == Decimal for arg in (start, stop, step)]):
raise ValueError('Arguments must be passed as <type: Decimal>')
# neglect bad cases
if (start == stop) or (start > stop and step >= zero) or \
(start < stop and step <= zero):
return []
current = start
while abs(current) < abs(stop):
yield current
current += step
Sample outputs -
from decimal import Decimal as D
list(decimal_range(D('2')))
# [Decimal('0'), Decimal('1')]
list(decimal_range(D('2'), D('4.5')))
# [Decimal('2'), Decimal('3'), Decimal('4')]
list(decimal_range(D('2'), D('4.5'), D('0.5')))
# [Decimal('2'), Decimal('2.5'), Decimal('3.0'), Decimal('3.5'), Decimal('4.0')]
list(decimal_range(D('2'), D('4.5'), D('-0.5')))
# []
list(decimal_range(D('2'), D('-4.5'), D('-0.5')))
# [Decimal('2'),
# Decimal('1.5'),
# Decimal('1.0'),
# Decimal('0.5'),
# Decimal('0.0'),
# Decimal('-0.5'),
# Decimal('-1.0'),
# Decimal('-1.5'),
# Decimal('-2.0'),
# Decimal('-2.5'),
# Decimal('-3.0'),
# Decimal('-3.5'),
# Decimal('-4.0')]
Add auto-correction for the possibility of an incorrect sign on step:
def frange(start,step,stop):
step *= 2*((stop>start)^(step<0))-1
return [start+i*step for i in range(int((stop-start)/step))]
My solution:
def seq(start, stop, step=1, digit=0):
x = float(start)
v = []
while x <= stop:
v.append(round(x,digit))
x += step
return v
Here is my solution which works fine with float_range(-1, 0, 0.01) and works without floating point representation errors. It is not very fast, but works fine:
from decimal import Decimal
def get_multiplier(_from, _to, step):
digits = []
for number in [_from, _to, step]:
pre = Decimal(str(number)) % 1
digit = len(str(pre)) - 2
digits.append(digit)
max_digits = max(digits)
return float(10 ** (max_digits))
def float_range(_from, _to, step, include=False):
"""Generates a range list of floating point values over the Range [start, stop]
with step size step
include=True - allows to include right value to if possible
!! Works fine with floating point representation !!
"""
mult = get_multiplier(_from, _to, step)
# print mult
int_from = int(round(_from * mult))
int_to = int(round(_to * mult))
int_step = int(round(step * mult))
# print int_from,int_to,int_step
if include:
result = range(int_from, int_to + int_step, int_step)
result = [r for r in result if r <= int_to]
else:
result = range(int_from, int_to, int_step)
# print result
float_result = [r / mult for r in result]
return float_result
print float_range(-1, 0, 0.01,include=False)
assert float_range(1.01, 2.06, 5.05 % 1, True) ==\
[1.01, 1.06, 1.11, 1.16, 1.21, 1.26, 1.31, 1.36, 1.41, 1.46, 1.51, 1.56, 1.61, 1.66, 1.71, 1.76, 1.81, 1.86, 1.91, 1.96, 2.01, 2.06]
assert float_range(1.01, 2.06, 5.05 % 1, False)==\
[1.01, 1.06, 1.11, 1.16, 1.21, 1.26, 1.31, 1.36, 1.41, 1.46, 1.51, 1.56, 1.61, 1.66, 1.71, 1.76, 1.81, 1.86, 1.91, 1.96, 2.01]
I am only a beginner, but I had the same problem, when simulating some calculations. Here is how I attempted to work this out, which seems to be working with decimal steps.
I am also quite lazy and so I found it hard to write my own range function.
Basically what I did is changed my xrange(0.0, 1.0, 0.01) to xrange(0, 100, 1) and used the division by 100.0 inside the loop.
I was also concerned about rounding errors, so I decided to test whether there are any. I had heard that if, for example, a 0.01 obtained from a calculation isn't exactly the float 0.01, comparing them returns False (if I am wrong, please let me know).
So I decided to test if my solution will work for my range by running a short test:
for d100 in xrange(0, 100, 1):
d = d100 / 100.0
fl = float("0.00"[:4 - len(str(d100))] + str(d100))
print d, "=", fl , d == fl
And it printed True for each.
Now, if I'm getting it totally wrong, please let me know.
The trick to avoid the round-off problem is to use a separate number to move through the range, one that starts half a step ahead of start.
# floating point range
def frange(a, b, stp=1.0):
i = a+stp/2.0
while i<b:
yield a
a += stp
i += stp
Alternatively, numpy.arange can be used.
My answer is similar to others using map(), without need of NumPy, and without using lambda (though you could). To get a list of float values from 0.0 to t_max in steps of dt:
def xdt(n):
return dt*float(n)
tlist = list(map(xdt, range(int(t_max/dt)+1)))

Resources