Polynomials and dictionaries - python-3.x

I have this exercise about polynomials and dictionaries, which I solved (see below), but I am sure there is a better and easier way to solve it (for questions 2 and 3). Can anyone show me another way to approach question 2 or 3? Thanks.
Here is the exercise:
In this exercise we want to work with polynomials of any degree. Each polynomial can be represented by a dictionary whose keys correspond to the powers of x and whose values correspond to the coefficients. For example, to represent the polynomial x^6 + 3*x^2, we can use the dictionary: {6: 1, 2: 3}
1. Write a function evaluer(p, x) that takes a polynomial p and a number x as arguments, and returns the value of the polynomial at point x.
Execution example:
evaluer({3: 1, 1: 2, 0: -1}, 2)
OUT: 11
2. Write a function somme_polynomes(p1, p2) that takes two polynomials (dictionaries) as arguments and returns a new dictionary representing the sum of the two polynomials p1 and p2.
Execution example:
somme_polynomes({3: 1, 2: 1, 0: 1}, {4: 2, 2: 3})
OUT: {0: 1, 2: 4, 3: 1, 4: 2}
3. Write a function produit_polynomes(p1, p2) that takes two polynomials as arguments and returns the product of the two polynomials in a new dictionary.
Execution example:
produit_polynomes({3: 1, 2: 1, 0: 1}, {4: 2, 2: 3})
OUT: {2: 3, 4: 5, 5: 3, 6: 2, 7: 2}
Here is what I did:
# 1)
def evaluer(p, x):
    c = 0
    for key, value in p.items():
        c += value * (x ** key)
    return c
# 2)
def somme_polynomes(p1, p2):
    p3 = {}
    for key, value in p1.items():
        for k, v in p2.items():
            p3.update({key: value})
            p3.update({k: v})
    for key in p1:
        if key in p2:
            add = p1[key] + p2[key]
            p3.update({key: add})
            if add == 0:
                del p3[key]
    return p3
# 3)
def produit_polynomes(p1, p2):
    p3 = {}
    for key, value in p1.items():
        for k, v in p2.items():
            if key + k in p3:
                p3[key + k] += value * v
            else:
                p3.update({key + k: value * v})
    return p3

Your code is fine; here are alternative ways of doing it using more of Python's language (generator expressions, dict comprehensions) and library (itertools, collections):
def evaluer(p, x):
    return sum(v * x**k for k, v in p.items())

def somme_polynomes(p1, p2):
    return {k: p1.get(k, 0) + p2.get(k, 0) for k in p1.keys() | p2.keys()}

import itertools as it
from collections import defaultdict

def produit_polynomes(p1, p2):
    p3 = defaultdict(int)
    for k1, k2 in it.product(p1, p2):
        p3[k1+k2] += p1[k1]*p2[k2]
    return dict(p3)
If you want to avoid importing any modules then produit_polynomes() could be written without the conditional as:
def produit_polynomes(p1, p2):
    p3 = {}
    for k1, v1 in p1.items():
        for k2, v2 in p2.items():
            p3[k1+k2] = p3.get(k1+k2, 0) + v1*v2
    return p3
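As a quick sanity check against the exercise's expected outputs (printed key order can vary between runs, so compare the dicts directly):
>>> evaluer({3: 1, 1: 2, 0: -1}, 2)
11
>>> somme_polynomes({3: 1, 2: 1, 0: 1}, {4: 2, 2: 3}) == {0: 1, 2: 4, 3: 1, 4: 2}
True
>>> produit_polynomes({3: 1, 2: 1, 0: 1}, {4: 2, 2: 3}) == {2: 3, 4: 5, 5: 3, 6: 2, 7: 2}
True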

Exercise 2 can be done in a more Pythonic way by using set union for the keys and a dict comprehension for the sums:
def somme_polynomes(p1, p2):
    return {p: p1.get(p, 0) + p2.get(p, 0) for p in p1.keys() | p2.keys()}
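One behavioral difference worth noting: your version deletes keys whose coefficients sum to zero (e.g. adding x and -x), while the comprehension above keeps them with value 0. If that cancellation matters to you, a small filtering step (a sketch, not part of the original answer) restores it:
def somme_polynomes(p1, p2):
    s = {p: p1.get(p, 0) + p2.get(p, 0) for p in p1.keys() | p2.keys()}
    # drop terms whose coefficients cancelled out
    return {p: c for p, c in s.items() if c != 0}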
Exercise 3, on the other hand, is best done using nested loops aggregating products onto the sum of the keys, which is what you are already doing. The only slight enhancement I would make is to use the get method with a default to avoid the if statement:
def produit_polynomes(p1, p2):
    p3 = {}
    for key, value in p1.items():
        for k, v in p2.items():
            p3[key + k] = p3.get(key + k, 0) + value * v
    return p3


Find sequence of elements in numpy array [duplicate]

In Python or NumPy, what is the best way to find out the first occurrence of a subarray?
For example, I have
a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
What is the fastest way (run-time-wise) to find out where b occurs in a? I understand for strings this is extremely easy, but what about for a list or numpy ndarray?
Thanks a lot!
[EDITED] I prefer a numpy solution, since in my experience numpy vectorization is much faster than a Python list comprehension. Also, the array is huge, so I don't want to convert it into a string; that would be (too) long.
I'm assuming you're looking for a numpy-specific solution, rather than a simple list comprehension or for loop. One straightforward approach is to use the rolling window technique to search for windows of the appropriate size.
This approach is simple, works correctly, and is much faster than any pure Python solution. It should be sufficient for many use cases. However, it is not the most efficient approach possible, for a number of reasons. For an approach that is more complicated, but asymptotically optimal in the expected case, see the numba-based rolling hash implementation in norok2's answer.
Here's the rolling_window function:
>>> def rolling_window(a, size):
...     shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
...     strides = a.strides + (a.strides[-1],)
...     return numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
...
Then you could do something like
>>> a = numpy.arange(10)
>>> numpy.random.shuffle(a)
>>> a
array([7, 3, 6, 8, 4, 0, 9, 2, 1, 5])
>>> rolling_window(a, 3) == [8, 4, 0]
array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
To make this really useful, you'd have to reduce it along axis 1 using all:
>>> numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
array([False, False, False, True, False, False, False, False], dtype=bool)
Then you could use that however you'd use a boolean array. A simple way to get the index out:
>>> bool_indices = numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
>>> numpy.mgrid[0:len(bool_indices)][bool_indices]
array([3])
For lists you could adapt one of these rolling window iterators to use a similar approach.
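For instance, a minimal list-based window search (my sketch, not from the original answer) along those lines:
def iter_windows(seq, size):
    # yield (start index, window) pairs over a list
    for i in range(len(seq) - size + 1):
        yield i, seq[i:i + size]

def find_sub_list(a, b):
    return [i for i, w in iter_windows(a, len(b)) if w == b]

>>> find_sub_list([1, 2, 3, 4, 5, 6], [2, 3, 4])
[1]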
For very large arrays and subarrays, you could save memory like this:
>>> windows = rolling_window(a, 3)
>>> sub = [8, 4, 0]
>>> hits = numpy.ones((len(a) - len(sub) + 1,), dtype=bool)
>>> for i, x in enumerate(sub):
...     hits &= numpy.in1d(windows[:,i], [x])
...
>>> hits
array([False, False, False, True, False, False, False, False], dtype=bool)
>>> hits.nonzero()
(array([3]),)
On the other hand, this will probably be somewhat slower.
The following code should work:
[x for x in range(len(a)) if a[x:x+len(b)] == b]
This returns the list of indices at which the pattern starts (on Python 2, xrange can be used instead of range).
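For the example input this gives:
>>> a = [1, 2, 3, 4, 5, 6]
>>> b = [2, 3, 4]
>>> [x for x in range(len(a)) if a[x:x+len(b)] == b]
[1]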
(EDITED to include a deeper discussion, better code and more benchmarks)
Summary
For raw speed and efficiency, one can use a Cython or Numba accelerated version (when the input is a Python sequence or a NumPy array, respectively) of one of the classical algorithms.
The recommended approaches are:
find_kmp_cy() for Python sequences (list, tuple, etc.)
find_kmp_nb() for NumPy arrays
Other efficient approaches are find_rk_cy() and find_rk_nb(), which are more memory efficient but are not guaranteed to run in linear time.
If Cython / Numba are not available, both find_kmp() and find_rk() are again good all-around solutions for most use cases, although in the average case and for Python sequences the naïve approach, in some form, notably find_pivot(), may be faster. For NumPy arrays, find_conv() (from @Jaime's answer) outperforms any non-accelerated naïve approach.
(Full code is below, and here and there.)
Theory
This is a classical problem in computer science that goes by the name of string-searching or string matching problem.
The naive approach, based on two nested loops, has an average computational complexity of O(n + m), but a worst case of O(n m).
Over the years, a number of alternative approaches have been developed which guarantee better worst-case performance.
Of the classical algorithms, the ones that can be best suited to generic sequences (since they do not rely on an alphabet) are:
the naïve algorithm (basically consisting of two nested loops)
the Knuth–Morris–Pratt (KMP) algorithm
the Rabin-Karp (RK) algorithm
This last algorithm relies on the computation of a rolling hash for its efficiency and therefore may require some additional knowledge of the input for optimal performance.
Ultimately, it is best suited for homogeneous data, such as numeric arrays.
A notable example of numeric arrays in Python is, of course, NumPy arrays.
Remarks
The naïve algorithm, by being so simple, lends itself to different implementations with various degrees of run-time speed in Python.
The other algorithms are less flexible in what can be optimized via language tricks.
Explicit looping in Python may be a speed bottleneck and several tricks can be used to perform the looping outside of the interpreter.
Cython is especially good at speeding up explicit loops for generic Python code.
Numba is especially good at speeding up explicit loops on NumPy arrays.
This is an excellent use-case for generators, so all the code will be using those instead of regular functions.
Python Sequences (list, tuple, etc.)
Based on the Naïve Algorithm
find_loop(), find_loop_cy() and find_loop_nb() are the explicit-loop-only implementations, in pure Python, in Cython and with Numba JIT compilation, respectively. Note the forceobj=True in the Numba version, which is required because we are using Python object inputs.
def find_loop(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        found = True
        for j in range(m):
            if seq[i + j] != subseq[j]:
                found = False
                break
        if found:
            yield i
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True

def find_loop_cy(seq, subseq):
    cdef Py_ssize_t n = len(seq)
    cdef Py_ssize_t m = len(subseq)
    for i in range(n - m + 1):
        found = True
        for j in range(m):
            if seq[i + j] != subseq[j]:
                found = False
                break
        if found:
            yield i
import numba as nb

find_loop_nb = nb.jit(find_loop, forceobj=True)
find_loop_nb.__name__ = 'find_loop_nb'
find_all() replaces the inner loop with all() on a generator expression:
def find_all(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if all(seq[i + j] == subseq[j] for j in range(m)):
            yield i
find_slice() replaces the inner loop with direct comparison == after slicing []
def find_slice(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i:i + m] == subseq:
            yield i
find_mix() and find_mix2() replace the inner loop with a direct comparison == after slicing [], but include one or two additional short-circuit checks on the first (and last) character, which may be faster because indexing with an int is much faster than slicing with a slice().
def find_mix(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i] == subseq[0] and seq[i:i + m] == subseq:
            yield i

def find_mix2(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if seq[i] == subseq[0] and seq[i + m - 1] == subseq[m - 1] \
                and seq[i:i + m] == subseq:
            yield i
find_pivot() and find_pivot2() replace the outer loop with multiple .index() calls using the first item of the sub-sequence, while using slicing for the inner loop, eventually with additional short-circuiting on the last item (the first matches by construction). The multiple .index() calls are wrapped in an index_all() generator (which may be useful on its own).
def index_all(seq, item, start=0, stop=-1):
    try:
        n = len(seq)
        if n > 0:
            start %= n
            stop %= n
            i = start
            while True:
                i = seq.index(item, i)
                if i <= stop:
                    yield i
                    i += 1
                else:
                    return
        else:
            return
    except ValueError:
        pass
def find_pivot(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    for i in index_all(seq, subseq[0], 0, n - m):
        if seq[i:i + m] == subseq:
            yield i

def find_pivot2(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    for i in index_all(seq, subseq[0], 0, n - m):
        if seq[i + m - 1] == subseq[m - 1] and seq[i:i + m] == subseq:
            yield i
Based on Knuth–Morris–Pratt (KMP) Algorithm
find_kmp() is a plain Python implementation of the algorithm. Since there is no simple looping, nor places where one could use slicing with a slice(), there is not much to be done for optimization except using Cython (Numba would again require forceobj=True, which would lead to slow code).
def find_kmp(seq, subseq):
    n = len(seq)
    m = len(subseq)
    # : compute offsets
    offsets = [0] * m
    j = 1
    k = 0
    while j < m:
        if subseq[j] == subseq[k]:
            k += 1
            offsets[j] = k
            j += 1
        else:
            if k != 0:
                k = offsets[k - 1]
            else:
                offsets[j] = 0
                j += 1
    # : find matches
    i = j = 0
    while i < n:
        if seq[i] == subseq[j]:
            i += 1
            j += 1
        if j == m:
            yield i - j
            j = offsets[j - 1]
        elif i < n and seq[i] != subseq[j]:
            if j != 0:
                j = offsets[j - 1]
            else:
                i += 1
find_kmp_cy() is a Cython implementation of the algorithm where the indices use a C integer data type (Py_ssize_t), which results in much faster code.
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True

def find_kmp_cy(seq, subseq):
    cdef Py_ssize_t n = len(seq)
    cdef Py_ssize_t m = len(subseq)
    # : compute offsets
    offsets = [0] * m
    cdef Py_ssize_t j = 1
    cdef Py_ssize_t k = 0
    while j < m:
        if subseq[j] == subseq[k]:
            k += 1
            offsets[j] = k
            j += 1
        else:
            if k != 0:
                k = offsets[k - 1]
            else:
                offsets[j] = 0
                j += 1
    # : find matches
    cdef Py_ssize_t i = 0
    j = 0
    while i < n:
        if seq[i] == subseq[j]:
            i += 1
            j += 1
        if j == m:
            yield i - j
            j = offsets[j - 1]
        elif i < n and seq[i] != subseq[j]:
            if j != 0:
                j = offsets[j - 1]
            else:
                i += 1
Based on Rabin-Karp (RK) Algorithm
find_rk() is a pure Python implementation that relies on Python's hash() for the computation (and comparison) of the hash. The hash is made rolling by means of a simple sum(). The roll-over is then computed from the previous hash by subtracting the result of hash() on the just-visited item seq[i - 1] and adding the result of hash() on the newly considered item seq[i + m - 1].
def find_rk(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if seq[:m] == subseq:
        yield 0
    hash_subseq = sum(hash(x) for x in subseq)  # compute hash
    curr_hash = sum(hash(x) for x in seq[:m])  # compute hash
    for i in range(1, n - m + 1):
        curr_hash += hash(seq[i + m - 1]) - hash(seq[i - 1])  # update hash
        if hash_subseq == curr_hash and seq[i:i + m] == subseq:
            yield i
find_rk_cy() is a Cython implementation of the algorithm where the indices use the appropriate C data type, which results in much faster code. Note that hash() truncates "the return value based on the bit width of the host machine."
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True

def find_rk_cy(seq, subseq):
    cdef Py_ssize_t n = len(seq)
    cdef Py_ssize_t m = len(subseq)
    if seq[:m] == subseq:
        yield 0
    cdef Py_ssize_t hash_subseq = sum(hash(x) for x in subseq)  # compute hash
    cdef Py_ssize_t curr_hash = sum(hash(x) for x in seq[:m])  # compute hash
    cdef Py_ssize_t old_item, new_item
    for i in range(1, n - m + 1):
        old_item = hash(seq[i - 1])
        new_item = hash(seq[i + m - 1])
        curr_hash += new_item - old_item  # update hash
        if hash_subseq == curr_hash and seq[i:i + m] == subseq:
            yield i
Benchmarks
The above functions are evaluated on two inputs:
random inputs
def gen_input(n, k=2):
    return tuple(random.randint(0, k - 1) for _ in range(n))
(almost) worst inputs for the naïve algorithm
def gen_input_worst(n, k=-2):
    result = [0] * n
    result[k] = 1
    return tuple(result)
The subseq has fixed size (32).
Since there are so many alternatives, two separate groupings have been made, and some solutions with very small variations and almost identical timings have been omitted (i.e. find_mix2() and find_pivot2()).
For each group both inputs are tested.
For each benchmark the full plot and a zoom on the fastest approach is provided.
(Benchmark plots omitted here: "Naïve on Random", "Naïve on Worst", "Other on Random" and "Other on Worst", each as a full plot plus a zoom on the fastest approaches.)
(Full code is available here.)
NumPy Arrays
Based on the Naïve Algorithm
find_loop(), find_loop_cy() and find_loop_nb() are the explicit-loop-only implementations, in pure Python, in Cython and with Numba JIT compilation, respectively. The code for the first two is the same as above and hence omitted. find_loop_nb() now enjoys fast JIT compilation. The inner loop has been written in a separate function because it can then be reused by find_rk_nb() (calling Numba functions from inside other Numba functions does not incur the function-call penalty typical of Python).
@nb.jit
def _is_equal_nb(seq, subseq, m, i):
    for j in range(m):
        if seq[i + j] != subseq[j]:
            return False
    return True

@nb.jit
def find_loop_nb(seq, subseq):
    n = len(seq)
    m = len(subseq)
    for i in range(n - m + 1):
        if _is_equal_nb(seq, subseq, m, i):
            yield i
find_all() is the same as above, while find_slice(), find_mix() and find_mix2() are almost identical to the above; the only difference is that seq[i:i + m] == subseq is now the argument of np.all(): np.all(seq[i:i + m] == subseq).
find_pivot() and find_pivot2() share the same ideas as above, except that they now use np.where() instead of index_all(), and need to enclose the array equality inside an np.all() call.
def find_pivot(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    max_i = n - m
    for i in np.where(seq == subseq[0])[0]:
        if i > max_i:
            return
        elif np.all(seq[i:i + m] == subseq):
            yield i

def find_pivot2(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if m > n:
        return
    max_i = n - m
    for i in np.where(seq == subseq[0])[0]:
        if i > max_i:
            return
        elif seq[i + m - 1] == subseq[m - 1] \
                and np.all(seq[i:i + m] == subseq):
            yield i
find_rolling() expresses the looping via a rolling window, and the matching is checked with np.all(). This vectorizes all the looping at the expense of creating large temporary objects, while still substantially applying the naïve algorithm. (The approach is from @senderle's answer.)
def rolling_window(arr, size):
    shape = arr.shape[:-1] + (arr.shape[-1] - size + 1, size)
    strides = arr.strides + (arr.strides[-1],)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

def find_rolling(seq, subseq):
    bool_indices = np.all(rolling_window(seq, len(subseq)) == subseq, axis=1)
    yield from np.mgrid[0:len(bool_indices)][bool_indices]
find_rolling2() is a slightly more memory-efficient variation of the above, where the vectorization is only partial and one explicit loop (along the expected shortest dimension, the length of subseq) is kept. (The approach is also from @senderle's answer.)
def find_rolling2(seq, subseq):
    windows = rolling_window(seq, len(subseq))
    hits = np.ones((len(seq) - len(subseq) + 1,), dtype=bool)
    for i, x in enumerate(subseq):
        hits &= np.in1d(windows[:, i], [x])
    yield from hits.nonzero()[0]
Based on Knuth–Morris–Pratt (KMP) Algorithm
find_kmp() is the same as above, while find_kmp_nb() is a straightforward JIT-compilation of that.
find_kmp_nb = nb.jit(find_kmp)
find_kmp_nb.__name__ = 'find_kmp_nb'
Based on Rabin-Karp (RK) Algorithm
find_rk() is the same as the above, except that again seq[i:i + m] == subseq is enclosed in an np.all() call.
find_rk_nb() is the Numba-accelerated version of the above. It uses _is_equal_nb() defined earlier to definitively determine a match, while for the hashing it uses a Numba-accelerated sum_hash_nb() function whose definition is pretty straightforward.
@nb.jit
def sum_hash_nb(arr):
    result = 0
    for x in arr:
        result += hash(x)
    return result

@nb.jit
def find_rk_nb(seq, subseq):
    n = len(seq)
    m = len(subseq)
    if _is_equal_nb(seq, subseq, m, 0):
        yield 0
    hash_subseq = sum_hash_nb(subseq)  # compute hash
    curr_hash = sum_hash_nb(seq[:m])  # compute hash
    for i in range(1, n - m + 1):
        curr_hash += hash(seq[i + m - 1]) - hash(seq[i - 1])  # update hash
        if hash_subseq == curr_hash and _is_equal_nb(seq, subseq, m, i):
            yield i
find_conv() uses a pseudo Rabin-Karp method, where initial candidates are hashed using the np.dot() product and located on the convolution between seq and subseq with np.where(). The approach is pseudo because, while it still uses hashing to identify probable candidates, it may not be regarded as a rolling hash (it depends on the actual implementation of np.correlate()). Also, it needs to create a temporary array the size of the input. (The approach is from @Jaime's answer.)
def find_conv(seq, subseq):
    target = np.dot(subseq, subseq)
    candidates = np.where(np.correlate(seq, subseq, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(subseq))
    mask = np.all((np.take(seq, check) == subseq), axis=-1)
    yield from candidates[mask]
Benchmarks
Like before, the above functions are evaluated on two inputs:
random inputs
def gen_input(n, k=2):
    return np.random.randint(0, k, n)
(almost) worst inputs for the naïve algorithm
def gen_input_worst(n, k=-2):
    result = np.zeros(n, dtype=int)
    result[k] = 1
    return result
The subseq has fixed size (32).
These plots follow the same scheme as before, summarized below for convenience.
Since there are so many alternatives, two separate groupings have been made, and some solutions with very small variations and almost identical timings have been omitted (i.e. find_mix2() and find_pivot2()).
For each group both inputs are tested.
For each benchmark the full plot and a zoom on the fastest approach is provided.
(Benchmark plots omitted here: "Naïve on Random", "Naïve on Worst", "Other on Random" and "Other on Worst", each as a full plot plus a zoom on the fastest approaches.)
(Full code is available here.)
A convolution-based approach, which should be more memory efficient than the stride_tricks-based approach:
def find_subsequence(seq, subseq):
    target = np.dot(subseq, subseq)
    candidates = np.where(np.correlate(seq,
                                       subseq, mode='valid') == target)[0]
    # some of the candidate entries may be false positives; double check
    check = candidates[:, np.newaxis] + np.arange(len(subseq))
    mask = np.all((np.take(seq, check) == subseq), axis=-1)
    return candidates[mask]
With really big arrays it may not be possible to use a stride_tricks approach, but this one still works (note that newer NumPy requires integer sizes, hence 10**6 rather than 1e6):
haystack = np.random.randint(1000, size=10**6)
needle = np.random.randint(1000, size=100)
# Hide 10 needles in the haystack
place = np.random.randint(10**6 - 100 + 1, size=10)
for idx in place:
    haystack[idx:idx+100] = needle
In [3]: find_subsequence(haystack, needle)
Out[3]:
array([253824, 321497, 414169, 456777, 635055, 879149, 884282, 954848,
       961100, 973481], dtype=int64)
In [4]: np.all(np.sort(place) == find_subsequence(haystack, needle))
Out[4]: True
In [5]: %timeit find_subsequence(haystack, needle)
10 loops, best of 3: 79.2 ms per loop
You can call the tobytes() method (named tostring() in older NumPy) to convert an array to a byte string, and then use fast string search. This method may be faster when you have many subarrays to check. One caveat: a byte-level match is only meaningful if it starts on an element boundary, so the alignment is worth checking.
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([2, 3, 4])
print(a.tobytes().index(b.tobytes()) // a.itemsize)
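A minimal sketch (my addition, not part of the original answer) of the alignment check mentioned above:
def find_aligned(a, b):
    # scan byte-level matches until one starts on an element boundary
    hay, pat = a.tobytes(), b.tobytes()
    pos = hay.find(pat)
    while pos != -1:
        if pos % a.itemsize == 0:
            return pos // a.itemsize  # genuine element-wise match
        pos = hay.find(pat, pos + 1)
    return -1

For the example arrays above, find_aligned(a, b) returns 1.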
Another try, but I'm sure there is a more pythonic & efficient way to do that ...
def array_match(a, b):
    for i in range(0, len(a) - len(b) + 1):
        if a[i:i+len(b)] == b:
            return i
    return None

a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
print(array_match(a, b))
1
(This first answer was not in scope of the question, as cdhowie mentioned)
set(a) & set(b) == set(b)
Here is a rather straight-forward option:
def first_subarray(full_array, sub_array):
    n = len(full_array)
    k = len(sub_array)
    matches = np.argwhere([np.all(full_array[start_ix:start_ix + k] == sub_array)
                           for start_ix in range(0, n - k + 1)])
    return matches[0]
Then using the original a, b vectors we get:
a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
first_subarray(a, b)
Out[44]:
array([1], dtype=int64)
Quick comparison of three of the proposed solutions (average time over 100 iterations for randomly created vectors):
import time
import collections
import numpy as np

def function_1(seq, sub):
    # direct comparison
    seq = list(seq)
    sub = list(sub)
    return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i+len(sub)] == sub]

def function_2(seq, sub):
    # Jaime's solution
    target = np.dot(sub, sub)
    candidates = np.where(np.correlate(seq, sub, mode='valid') == target)[0]
    check = candidates[:, np.newaxis] + np.arange(len(sub))
    mask = np.all((np.take(seq, check) == sub), axis=-1)
    return candidates[mask]

def function_3(seq, sub):
    # HYRY's solution
    return seq.tobytes().index(sub.tobytes()) // seq.itemsize

# --- assess time performance
N = 100
seq = np.random.choice([0, 1, 2, 3, 4, 5, 6], 3000)
sub = np.array([1, 2, 3])
tim = collections.OrderedDict()
tim.update({function_1: 0.})
tim.update({function_2: 0.})
tim.update({function_3: 0.})
for function in tim.keys():
    for _ in range(N):
        seq = np.random.choice([0, 1, 2, 3, 4], 3000)
        sub = np.array([1, 2, 3])
        start = time.time()
        function(seq, sub)
        end = time.time()
        tim[function] += end - start

timer_dict = collections.OrderedDict()
for key, val in tim.items():
    timer_dict.update({key.__name__: val / N})
print(timer_dict)
Which would result (on my old machine) in:
OrderedDict([
('function_1', 0.0008518099784851074),
('function_2', 8.157730102539063e-05),
('function_3', 6.124973297119141e-06)
])
First, convert the lists to strings.
a = ''.join(str(i) for i in a)
b = ''.join(str(i) for i in b)
After converting to strings, you can easily find the index of the substring with a.index(b).
Cheers!!
Note, however, that this is only reliable for single-digit items: with multi-digit numbers the concatenated digits can produce false matches, and the character index no longer corresponds to the list index.
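A delimiter-based variant (my sketch, assuming the items have unambiguous string forms) fixes both problems:
def find_str(a, b):
    # wrap with delimiters so a match cannot straddle element boundaries
    a_str = ',' + ','.join(map(str, a)) + ','
    b_str = ',' + ','.join(map(str, b)) + ','
    pos = a_str.find(b_str)
    # count delimiters before the match to recover the list index
    return a_str[:pos].count(',') if pos != -1 else -1

With a = [1, 2, 3, 4, 5, 6] and b = [2, 3, 4] this returns 1.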

Rotate String to find Hamming distance equal to K

We are given two binary strings (A and B), both of length N, and an integer K.
We need to check whether there is a rotation of string B such that the Hamming distance between A and the rotated string is equal to K. In a single operation we can remove one character from the front and put it at the back.
Example: Let's say we are given the two strings A="01011" and B="01110", and K=4.
Note: the Hamming distance between binary strings is the number of bit positions in which the two corresponding bits differ.
In the above example the answer is "YES": if we rotate string B once it becomes "11100", which has a Hamming distance of 4 from A, equal to K.
Approach:
for every rotated string of B:
    check the hamming distance with A
    if hamming distance == K:
        return "YES"
return "NO"
But obviously the above approach executes in O(N^2) time (length of string × length of string). Is there a better approach to solve this? As we don't need to find the actual string, I am just wondering whether there is some better algorithm to get this answer.
Constraints:
Length of each string <= 2000
Number of test cases to run in one file <=600
First note that we can compute the Hamming distance as the sum of a[i]*(1-b[i]) + b[i]*(1-a[i]) over all i, which simplifies to a[i] + b[i] - 2*a[i]*b[i] (check: a=1, b=0 gives 1; a=b=1 gives 0). Now, the sums of a[i] and b[i] over all i can be computed once in O(n) and do not change with bit rotations, so the only interesting term is 2*a[i]*b[i]. We can compute this term efficiently for all bit rotations by noting that it is equivalent to a circular convolution of a and b, and such convolutions can be computed using the Discrete Fourier Transform in O(n log n) time.
For example in Python using numpy:
import numpy as np

def hdist(a, b):
    return sum(bool(ai) ^ bool(bi) for ai, bi in zip(a, b))

def slow_circular_hdist(a, b):
    return [hdist(a, b[i:] + b[:i]) for i in range(len(b))]

def circular_convolution(a, b):
    return np.real(np.fft.ifft(np.fft.fft(a)*np.fft.fft(b[::-1])))[::-1]

def fast_circular_hdist(a, b):
    hdist = np.sum(a) + np.sum(b) - 2*circular_convolution(a, b)
    return list(np.rint(hdist).astype(int))
Usage:
>>> a = [0, 1, 0, 1, 1]
>>> b = [0, 1, 1, 1, 0]
>>> slow_circular_hdist(a, b)
[2, 4, 2, 2, 2]
>>> fast_circular_hdist(a, b)
[2, 4, 2, 2, 2]
Speed and large correctness test:
>>> x = list((np.random.random(5000) < 0.5).astype(int))
>>> y = list((np.random.random(5000) < 0.5).astype(int))
>>> s = time.time(); slow_circular_hdist(x, y); print(time.time() - s)
6.682933807373047
>>> s = time.time(); fast_circular_hdist(x, y); print(time.time() - s)
0.008500814437866211
>>> slow_circular_hdist(x, y) == fast_circular_hdist(x, y)
True

Why are elements of my array being overwritten?

I have written a simple function in Python that aims to determine whether, given two arrays a and b, one can be obtained from the other by swapping at most one pair of elements in one of the arrays.
This is my function:
def areSimilar(a, b):
    test = 0
    for i in range(len(b)):
        for j in range(len(b)):
            b2 = b
            b2[i] = b[j]
            b2[j] = b[i]
            if a == b2:
                test = 1
    return(test == 1)
The issue is that upon inspecting b, it has changed even though I don't actually perform any calculations on b - what's going on!!??
(EDITED: to better address the second point)
There are two issues with your code:
When you do b2 = b this just creates another reference to the underlying object. If b is mutable, any change made to b2 will be reflected in b too.
When a single swap suffices there is no need to test further, but if you keep on looping the test will succeed again with i and j swapped, so the test condition is hit either never or (depending on the number of duplicates) at least twice. While this does not lead to incorrect results, it would normally be regarded as an error in the logic.
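A quick demonstration of the aliasing described in point 1:
>>> b = [1, 2, 3]
>>> b2 = b
>>> b2[0] = 99
>>> b   # b changed too: b2 and b are two names for the same list
[99, 2, 3]
>>> b2 is b
True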
To fix your code, you could just create a copy of b. Assuming that by Python arrays you actually mean Python lists one way of doing it would be to create a new list every time by replacing b2 = b with b2 = list(b). A more efficient approach is to perform the swapping on b itself (and swap back):
def are_similar(a, b):
    for i in range(len(b)):
        for j in range(len(b)):
            b[i], b[j] = b[j], b[i]
            if a == b:
                b[i], b[j] = b[j], b[i]  # swap back
                return True
            else:
                b[i], b[j] = b[j], b[i]  # swap back
    return False

print(are_similar([1, 1, 2, 3], [1, 2, 1, 3]))
# True
print(are_similar([1, 1, 2, 3], [3, 2, 1, 1]))
# False
By contrast, you can see how inefficient (while correct) the copying-based approach is:
def are_similar2(a, b):
    for i in range(len(b)):
        for j in range(len(b)):
            b2 = list(b)
            b2[i] = b[j]
            b2[j] = b[i]
            if a == b2:
                return True
    return False

print(are_similar2([1, 1, 2, 3], [1, 2, 1, 3]))
# True
print(are_similar2([1, 1, 2, 3], [3, 2, 1, 1]))
# False
with much worse timings, even on relatively small inputs:
a = [1, 1, 2, 3] + list(range(100))
b = [1, 2, 1, 3] + list(range(100))
%timeit are_similar(a, b)
# 10000 loops, best of 3: 22.9 µs per loop
%timeit are_similar2(a, b)
# 10000 loops, best of 3: 73.9 µs per loop
I would go with Sadap's code, but if you want to copy, use the following (for a flat list of numbers a shallow copy such as list(b) would suffice; deepcopy only matters for nested structures):
import copy

def areSimilar(a, b):
    test = 0
    for i in range(len(b)):
        for j in range(len(b)):
            b2 = copy.deepcopy(b)
            b2[i] = copy.deepcopy(b[j])
            b2[j] = copy.deepcopy(b[i])
            if a == b2:
                test = 1
    if test == 1:
        return True
    else:
        return False

How to use exponents under special criteria

I am new to Python and was trying to find a way to organize a specific function so I can take a list, apply special criteria to it, and then return another list.
I want to:
1) square a number if it is even
2) cube a number if it is odd
3) and then store those results in a list and return that list
Here is my code:
def square_function(x):
    if i % 2 == 0:
        x = [i ** (2)]
    else:
        y = [i ** (3)]
    func = [x, y]
    return func
I am very new to programming with python so any help you can give would be fantastic.
take a list - apply special criteria to it - and then return another list.
You're looking for the map() function
def foo(x):
    return x**2 if x % 2 == 0 else x**3

l = [1, 2, 3]
result = list(map(foo, l))
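For the sample list this gives:
print(result)
# [1, 4, 27]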
Using list comprehension:
>>> a = [1,2,3,4,5]
>>> [x ** 2 if x % 2 == 0 else x ** 3 for x in a]
[1, 4, 27, 16, 125]
I think that this could be what you are looking for:
def square_or_cube_function(x):
    result = []
    for i in x:
        if i % 2 == 0:
            result.append(i ** 2)
        else:
            result.append(i ** 3)
    return result

print(square_or_cube_function([1, 4, 5, 8]))
print(square_or_cube_function([5, 7, 16, 32]))
OUTPUT:
[1, 16, 125, 64]
[125, 343, 256, 1024]
A shorter solution could be:
def square_or_cube_function(x):
    return [i ** 2 if i % 2 == 0 else i ** 3 for i in x]

print(square_or_cube_function([1, 4, 5, 8]))
print(square_or_cube_function([5, 7, 16, 32]))
Same output.
Another list-comprehension solution, using a bit of cleverness: since x % 2 + 2 evaluates to 2 for even x and 3 for odd x, the exponent selects itself:
[x ** (x % 2 + 2) for x in a]

Randomizing two lists and maintaining order in Python 3.4

I'm basically asking the exact same question as was asked here, but for Python 3.4.0.
In 3.4.0, this code:
a = ["Spears", "Adele", "NDubz", "Nicole", "Cristina"]
b = [1, 2, 3, 4, 5]
combined = zip(a, b)
random.shuffle(combined)
a[:], b[:] = zip(*combined)
does not work. What is the correct way to do this in 3.4.0?
In Python 3, zip returns an iterator (a zip object), i.e. the equivalent of itertools.izip from Python 2.
You need to force it to materialize the list:
combined = list(zip(a, b))
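Without the list() call, random.shuffle fails because the zip iterator has no length; you would see something like:
>>> import random
>>> combined = zip(a, b)
>>> random.shuffle(combined)
Traceback (most recent call last):
  ...
TypeError: object of type 'zip' has no len()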
If memory is tight, you can write your own shuffle function to avoid the need to create the zipped list. The one from Python's standard library is not very complicated:
def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.

    Do not supply the 'int' argument.
    """
    randbelow = self._randbelow
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = randbelow(i+1) if random is None else int(random() * (i+1))
        x[i], x[j] = x[j], x[i]
Your function could be this:
def shuffle2(a, b):
    for i in reversed(range(1, len(a))):
        j = int(random.random() * (i+1))
        a[i], a[j] = a[j], a[i]
        b[i], b[j] = b[j], b[i]
To shuffle an arbitrary number of lists in unison:
def shuffle_many(*args):
    for i in reversed(range(1, len(args[0]))):
        j = int(random.random() * (i+1))
        for x in args:
            x[i], x[j] = x[j], x[i]
e.g.
>>> import random
>>> def shuffle_many(*args):
...     for i in reversed(range(1, len(args[0]))):
...         j = int(random.random() * (i+1))
...         for x in args:
...             x[i], x[j] = x[j], x[i]
...
>>> a = ["Spears", "Adele", "NDubz", "Nicole", "Cristina"]
>>> b = [1, 2, 3, 4, 5]
>>> shuffle_many(a, b)
>>> a
['Adele', 'Spears', 'Nicole', 'NDubz', 'Cristina']
>>> b
[2, 1, 4, 3, 5]
Change combined = zip(a,b) to combined = list(zip(a,b)). You need a list, not an iterator, in order to shuffle in place.
In Python 3, zip returns an iterator rather than a list, so cast it to a list before shuffling it:
combined = list(zip(a, b))
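Putting it together, the original code works in 3.4 with just that one change:
import random

a = ["Spears", "Adele", "NDubz", "Nicole", "Cristina"]
b = [1, 2, 3, 4, 5]
combined = list(zip(a, b))
random.shuffle(combined)
a[:], b[:] = zip(*combined)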
