Numpy cumsum with lower limit (vectorized) - python-3.x

I have an array and I would like to compute its cumulative sum with a lower bound imposed (lb=0), because the array contains negative elements. Is it possible to vectorize this? I tried a loop with numba.njit, but the execution is slower than pure Python. Below is an example of what I would like to get.
Example array:
a = numpy.array([1,1,-1,-1,-1,1,1])
What I get with numpy.cumsum:
[ 1 2 1 0 -1 0 1]
What I want:
[ 1 2 1 0 0 1 2]
The function with loop:
import numpy
import numba

@numba.njit
def cumsum(array, lb=0):
    result = numpy.zeros(array.size)
    result[0] = array[0]
    for k in range(1, array.size):
        result[k] = max(lb, result[k-1]+array[k])
    return result

When I execute the above, Numba is way faster than pure Python. Note that the first call to cumsum() (when decorated with @numba.njit) also does the compilation, so to be fair you should not time this first call. The second (and third, and so on) call to cumsum() will be fast.
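On the vectorization part of the question, here is a minimal NumPy-only sketch for the special case lb=0. It relies on the identity that the clamped running sum equals the plain cumulative sum minus the lowest value that cumulative sum has reached so far (capped at 0). The name cumsum_floor0 is made up for illustration, and the sketch assumes the first element is clamped too, which differs from the loop above when array[0] is negative.

```python
import numpy as np

def cumsum_floor0(a):
    """Running sum that is held at 0 whenever it would go negative."""
    c = np.cumsum(a)
    # Lowest value the plain cumulative sum has reached so far, capped at 0.
    running_min = np.minimum.accumulate(np.minimum(c, 0))
    # Subtracting it lifts the path by exactly the amount it undershot 0.
    return c - running_min

a = np.array([1, 1, -1, -1, -1, 1, 1])
print(cumsum_floor0(a))  # [1 2 1 0 0 1 2]
```

This keeps the integer dtype of the input, whereas the loop version returns floats because it starts from numpy.zeros.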

Related

Reduce operation in Spark with constant values gives a constant result irrespective of input

ser = sc.parallelize([1,2,3,4,5])
freq = ser.reduce(lambda x,y : 1+2)
print(freq)  # answer is 3
If I run a reduce operation with constant values, it just gives the sum of those two numbers, so in this case the answer is 3. I was expecting (3+3+3+3=12), since there are 5 elements and the summation would happen 4 times. I don't understand the internals of reduce here. Any help, please?
You're misunderstanding what reduce does. It does not apply an aggregation operation (which you assume to be sum for some reason) to a mapping of all elements (which you suppose is what lambda x,y : 1+2 does).
Reducing that RDD will, roughly speaking, do something like this:
call your lambda with 1, 2 -> lambda returns 3
carry 3 and call lambda with 3, 3 -> lambda returns 3
carry 3 and call lambda with 3, 4 -> lambda returns 3
carry 3 and call lambda with 3, 5 -> lambda returns 3
The reduce method returns the last value, which is 3.
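The steps above can be reproduced locally with functools.reduce, which folds a list from left to right. This is only an analogue: Spark may group the elements differently across partitions, but with a reducer that ignores its arguments the outcome is the same. The name constant_reducer and the calls list are made up for illustration.

```python
from functools import reduce

calls = []  # records every (x, y) pair the reducer is called with

def constant_reducer(x, y):
    calls.append((x, y))
    return 1 + 2  # ignores both arguments, like the lambda in the question

result = reduce(constant_reducer, [1, 2, 3, 4, 5])
print(calls)   # [(1, 2), (3, 3), (3, 4), (3, 5)]
print(result)  # 3
```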
If your intention is to compute 1 + 2 for each element in the RDD, then you need to map and then reduce, something like:
freq = ser.map(lambda x: 1 + 2).reduce(lambda a,b: a+b) #see how reduce works
#which you can rewrite as
freq = ser.map(lambda x: 1 + 2).sum()
But the result of this is 15, not 12 (as there are 5 elements). I don't know any operation that computes a mapping value for each "reduction" step and allows further reduction.
It's likely that is the wrong question to ask, but you can possibly do that by using the map & reduce option above, skipping just one element, although I strongly doubt this is intentional (because the commutative and associative operation of reduce can be called an arbitrary number of times depending on how the RDD is partitioned).
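As a plain-Python analogue (not Spark code), the difference between "one value per element" and "one value per reduction step" looks like this. It assumes a sequential reduce, which makes one call fewer than there are elements.

```python
data = [1, 2, 3, 4, 5]

# One 3 per element: five terms.
per_element = sum(1 + 2 for _ in data)

# One 3 per reduction step: skip one element, so four terms.
per_step = sum(1 + 2 for _ in data[1:])

print(per_element, per_step)  # 15 12
```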

The initializer parameter of reduce() on the sum of elements of an iterable

A query regarding this code:
from functools import reduce
def sum_even(it):
    return reduce(lambda x, y: x + y if not y % 2 else x, it, 0)
print(sum_even([1, 2, 3, 4]))
Why does omitting the third parameter of reduce() add the first odd element of the list to the result?
If you don't pass an initial element explicitly, then the first two elements of the input become the arguments to the first call to the reducer, so x would be 1 and y would be 2. Since your test only excludes odd ys, not xs, the x gets preserved, and all future xs are sums based on that initial 1. Using 0 as an explicit initial value means only 0 and (later) the accumulated total of that 0 and other even numbers is ever passed as x, so an odd number is never passed as x.
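A short sketch of the two cases, using the same reducer logic as in the question; the name add_if_even is made up for illustration.

```python
from functools import reduce

def add_if_even(x, y):
    # Only y is tested for evenness; x is carried through unchanged.
    return x + y if not y % 2 else x

values = [1, 2, 3, 4]
print(reduce(add_if_even, values))     # 7: the odd 1 seeds the accumulator
print(reduce(add_if_even, values, 0))  # 6: the accumulator starts at 0
```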
Note that this is kind of a silly way to do this. It's much simpler to build this operation from parts, using one tool to filter to even, and another to sum the surviving elements from the filtering operation. For example:
def sum_even(it):
    return sum(x for x in it if not x % 2)
is shorter, clearer, and (likely) faster than the reduce you wrote.

Output of a sum integer from a function that iteratively minimizes an array

I'm trying to build a function (minSum) to minimize the sum of an array, of various lengths and values over any number of iterations.
The function contains two arguments - the name of an array (num) and the number of steps of modification (k). For each k-step of modification, the function will retrieve an element/integer from the num array, divide it by 2, and update the array with the ceiling of the halved value in the same index position as it was retrieved. Once the k value has been reached, the function should output the sum of the array as a single integer.
For example, if my array (num) is [10, 20, 7] and I run it over 2 steps (k), the call would be minSum(num, 2).
It would divide 10 by half in kstep 0 resulting in an array of (5, 20, 7)
It would divide 20 by half in kstep 0 resulting in an array of (5, 10, 7)
It would divide 7 by half in kstep 0 resulting in an array of (5, 10, 4) (4 being the ceiling of 3.5).
The output of this would be the sum of 5, 10, 4 = 19. By increasing the k-value we should be able to reduce the output to a lower value. In any case, I'm able to use the code below to achieve my goal, with the exception of the output being a single integer (our testing system only receives the final array). Any pointers here? Thanks!
import array as ar
import math
import numpy as np
# 1. INTEGER_ARRAY num
# 2. INTEGER k (number of steps of element removal, transformation and update)
def minSum(num, k):
    arr = np.array(num)
    i = 0
    idx = 0
    while i < k:
        for element in np.nditer(arr):
            thereduced = math.ceil(element/2)
            np.put(arr, [idx], thereduced)
            if i < arr.size-1:
                idx += 1
        thesum = int((sum(arr)))
        i = i+1
    return thesum
The sum() function returns a single value, and right now your code returns just that single integer. If an array is needed instead, you could create an empty array, put the "thesum" value into it, and return that array at the end of the function.
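If a single integer is what is wanted, a minimal sketch could look like the following. It assumes that one step halves every element once (rounding up), which matches the worked example where [10, 20, 7] becomes [5, 10, 4]. The question's wording could also mean one element per step, in which case this sketch does not apply. The name halve_all_and_sum is hypothetical.

```python
import numpy as np

def halve_all_and_sum(num, passes):
    """Hypothetical helper: halve every element (rounding up) `passes` times."""
    arr = np.array(num, dtype=np.int64)
    for _ in range(passes):
        arr = (arr + 1) // 2  # integer ceiling of arr / 2
    return int(arr.sum())     # plain Python int, not an array

print(halve_all_and_sum([10, 20, 7], 1))  # 19
```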

How to use built-in `slice` to access 2-d blocks in 2-d array?

I have a 2-d numpy array, for which I would like to modify 2-d blocks (like a 3x3 sub-block on a 9x9 sudoku board). Instead of using fancy indexing, I would like to use the built-in slice. Is there a way to make this work? I am thinking that the stride argument (third arg of slice) can be used to do this somehow, but I can't quite figure it out. My attempt is below.
import numpy as np
# make sample array (dim-1)
x = np.linspace(1, 81, 81).astype(int)
i = slice(0, 3)
print(x[i])
# [1 2 3]
# make sample array (dim-2)
X = x.reshape((9, 9))
Say I wanted to access the first 3 rows and first 3 columns of X. I can do it with fancy indexing as:
print(X[:3, :3])
# [[ 1 2 3]
# [10 11 12]
# [19 20 21]]
Trying to use similar logic to the dim-1 case with slice:
j = np.array([slice(0,3), slice(0,3)]) # wrong way to access
print(X[j])
Throws the following error:
IndexError: arrays used as indices must be of integer (or boolean) type
If you subscript with X[:3, :3], then behind the curtains you pass a tuple, so (slice(3), slice(3)).
So you can construct a j with:
j = (slice(3), slice(3))
or you can obtain the a, b block with:
j = (slice(3*a, 3*a+3), slice(3*b, 3*b+3))
Here a=0 and b=1, for example, will yield the X[0:3, 3:6] part, that is, the block that contains the first three rows and the second three columns.
You can also make a tuple with a variable number of items. For example, for an n-dimensional array, you can make an n-tuple in which each item is a slice(3) object:
j = (slice(3),) * n
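A short sketch that puts these pieces together on a 9x9 array like the one in the question. The helper name block and its size parameter are made up for illustration.

```python
import numpy as np

X = np.arange(1, 82).reshape(9, 9)

def block(a, b, size=3):
    """Hypothetical helper: index tuple for the (a, b) sub-block."""
    return (slice(size * a, size * (a + 1)), slice(size * b, size * (b + 1)))

j = block(0, 1)
print(X[j])
# [[ 4  5  6]
#  [13 14 15]
#  [22 23 24]]

X[j] = 0  # assigning through the tuple of slices modifies X in place
print(X[0, 3:6])  # [0 0 0]
```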

Python infinite recursion with formula

### Run the code below and understand the error messages
### Fix the code to sum integers from 1 up to k
###
def f(k):
    return f(k-1) + k
print(f(10))
I am confused about how to fix this code while still using recursion. I keep getting these error messages:
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
Is there a simple way to fix this without using any while loops or creating more than 1 variable?
A recursion should have a termination condition, i.e. the base case. When your variable attains that value there are no more recursive function calls.
e.g. in your code,
def f(k):
    if k == 1:
        return k
    return f(k-1) + k
print(f(10))
Here we define the base case as 1, since you want the sum of the values from n down to 1. You can put any other number there, positive or negative, if you want the sum to extend down to that number. For example, if you want the sum from n down to -3, the base case would be k == -3.
Python does not optimize tail recursion. Your function f calls itself k times, so if k is a very large number Python raises a RecursionError. You can check the recursion limit with sys.getrecursionlimit() and change it with sys.setrecursionlimit(), but changing the limit is not a good idea. Instead, change the logic or pattern of your code.
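A minimal sketch of that behaviour: the recursive f works for small k, while a k far above the recursion limit raises RecursionError. The wrapper sum_to, which falls back to a non-recursive sum, is a hypothetical illustration and not part of the exercise.

```python
import sys

def f(k):
    if k == 1:
        return k
    return f(k - 1) + k

def sum_to(k):
    """Hypothetical wrapper: fall back to a non-recursive sum for large k."""
    try:
        return f(k)
    except RecursionError:
        return sum(range(1, k + 1))

print(sys.getrecursionlimit())  # commonly 1000, but it depends on the environment
print(sum_to(10))               # 55
print(sum_to(10 * sys.getrecursionlimit()))  # too deep for f, so the fallback runs
```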
Your recursion never terminates. You could try:
def f(k):
    return k if k < 2 else f(k-1) + k
print(f(10))
You are working out the sum of all numbers from 1 to 10, which is the 10th triangular number (for example, the number of circles in a triangle with 10 circles along each side).
Using the formula on OEIS gives you this as your code.
def f(k):
    return int(k*(k+1)/2)
print(f(10))
How do we know int() doesn't break this? k and k + 1 are adjacent numbers and one of them must have a factor of two so this formula will always return an integer if given an integer.
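One caveat worth sketching: the / operator produces a float, so for very large k the product k*(k+1) can exceed what a float represents exactly, and int() then returns a slightly wrong value. Since the product is always even, integer floor division gives the exact result. The name triangular is made up for illustration.

```python
def triangular(k):
    # k * (k + 1) is always even, so floor division by 2 is exact.
    return k * (k + 1) // 2

print(triangular(10))  # 55

k = 10**9 + 1
print(triangular(k))         # 500000001500000001
print(int(k * (k + 1) / 2))  # float division rounds, so this can differ
```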
