If/else statement vs Heaviside function - python-3.x

In my code I have to consider different contributions with respect to different thresholds. In particular, I have a function my_index whose output must be compared to the thresholds Z_1, Z_2 and Z_3 in order to determine the increment to the variable my_value. In the following MWE, for simplicity's sake, the function my_index is just a uniform random generator:
import numpy as np

my_len = 100000
Z_1 = 0.2
Z_2 = 0.4
Z_3 = 0.7
first = 1
second = 2
third = -0.0003
my_value = 0
for i in range(my_len):
    my_index = np.random.uniform()
    my_value += (first*np.heaviside(my_index - Z_1, 0)*np.heaviside(Z_2 - my_index, 0)
                 + second*np.heaviside(my_index - Z_3, 0)
                 + third*np.heaviside(Z_3 - my_index, 0))
    # if Z_1 < my_index < Z_2 add first
    # if my_index > Z_3 add second
    # if my_index < Z_3 add third
I have replaced the if/else's that could have been used for the thresholds with the Heaviside function. Keep in mind that, in my original code, this section has to be iterated up to 10^5 times.
My question is: does this practice make the code faster? Is calling the Heaviside function (np.heaviside) better in terms of speed than an if/else check?

In [433]: x=np.arange(-10,10)
In [434]: x
Out[434]:
array([-10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9])
A proper use of heaviside - giving x as array, not a single value:
In [436]: np.heaviside(x,.5)
Out[436]:
array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.5, 1. , 1. ,
1. , 1. , 1. , 1. , 1. , 1. , 1. ])
A list comprehension equivalent:
In [437]: [.5 if i==0 else (0 if i<0 else 1) for i in x]
Out[437]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 1, 1, 1, 1, 1, 1, 1, 1]
and making an array from that list:
In [438]: np.array([.5 if i==0 else (0 if i<0 else 1) for i in x])
Out[438]:
array([0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.5, 1. , 1. ,
1. , 1. , 1. , 1. , 1. , 1. , 1. ])
Compare the times:
In [439]: timeit np.heaviside(x,.5)
2.5 µs ± 17.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [440]: timeit np.array([.5 if i==0 else (0 if i<0 else 1) for i in x])
15.1 µs ± 25.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Iteration on a list is faster (than on an array):
In [441]: timeit np.array([.5 if i==0 else (0 if i<0 else 1) for i in x.tolist()])
6.66 µs ± 195 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and if we skip the conversion back to a list:
In [442]: timeit [.5 if i==0 else (0 if i<0 else 1) for i in x.tolist()]
2.28 µs ± 3.01 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For a much larger array, heaviside's relative advantage is even greater:
In [445]: x=np.arange(-1000,1000)
In [446]: timeit [.5 if i==0 else (0 if i<0 else 1) for i in x.tolist()]
211 µs ± 7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [447]: timeit np.heaviside(x,.5)
13 µs ± 201 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For the random number generation, taking the whole-array approach is also faster:
In [448]: timeit [np.random.uniform() for _ in range(1000)]
4.62 ms ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [449]: timeit np.random.uniform(size=1000)
4.74 µs ± 171 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I could also time the scalar use of heaviside - that is worse than the if/else in [446]:
In [450]: timeit [np.heaviside(i,.5) for i in x]
8.64 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In sum:
- use whole-array code where possible
- when using Python-level iteration, use lists and scalar methods instead

Assuming you use the standard CPython interpreter, performing a Numpy function call like np.heaviside is likely more expensive than a few basic conditionals. However, both are very inefficient. Indeed, conditionals are generally slow and could be replaced with a branchless implementation here (adding/multiplying booleans converted to integers). The most important optimization is to use vectorization, because Numpy is designed to be efficient on relatively big arrays, not on scalar values (mainly due to additional internal checks and function calls). You can generate all the random values in one big array and apply the heaviside function to it a few times. The resulting code will certainly be 2 or 3 orders of magnitude faster!
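For instance, here is a minimal vectorized sketch of the MWE from the question (same thresholds and increments as above; the whole Python loop collapses into a few whole-array operations):
import numpy as np

my_len = 100000
Z_1, Z_2, Z_3 = 0.2, 0.4, 0.7
first, second, third = 1, 2, -0.0003

# draw all the random values at once, then apply heaviside to the whole array
my_index = np.random.uniform(size=my_len)
my_value = (first * np.heaviside(my_index - Z_1, 0) * np.heaviside(Z_2 - my_index, 0)
            + second * np.heaviside(my_index - Z_3, 0)
            + third * np.heaviside(Z_3 - my_index, 0)).sum()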

Related

How to make a calculation in a pandas dataframe depending on the value of a certain column

I have this dataframe and I want to make a calculation depending on a condition, like below:
count prep result
0 10 100
10 100 100
I want to create a new column evaluated that is:
if df['count'] == 0:
    df['evaluated'] = df['result'] / df['prep']
else:
    df['evaluated'] = df['result'] / df['count']
expected result is:
count prep result evaluated
0 10 100 10
100 10 100 1
What's the best way to do it? My real dataframe has 30k rows.
You can use where or mask:
df['evaluated'] = df['result'].div(df['prep'].where(df['count'].eq(0), df['count']))
Or:
df['evaluated'] = df['result'].div(df['count'].mask(df['count'].eq(0), df['prep']))
Output (assuming there was an error in the provided input):
count prep result evaluated
0 0 10 100 10.0
1 100 10 100 1.0
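For reference, a minimal sketch that reproduces this output (assuming the corrected input, i.e. count values of 0 and 100):
import pandas as pd

df = pd.DataFrame({'count': [0, 100], 'prep': [10, 10], 'result': [100, 100]})
df['evaluated'] = df['result'].div(df['prep'].where(df['count'].eq(0), df['count']))
print(df)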
You can also use np.where from numpy to do that:
df['evaluated'] = np.where(df['count'] == 0,
                           df['result'] / df['prep'],    # where count == 0
                           df['result'] / df['count'])   # where count != 0
Performance (the differences are not really significant) over 30k rows:
>>> %timeit df['result'].div(df['prep'].where(df['count'].eq(0), df['count']))
652 µs ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['result'].div(df['count'].mask(df['count'].eq(0), df['prep']))
638 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit np.where(df['count'] == 0, df['result'] / df['prep'], df['result'] / df['count'])
462 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The most efficient way to search every element of a list in a dataframe

I have an over-1M-element dataset like d. I need to find the indexes of the elements of a dataframe like seekingframe, which has over 1500 elements, in that dataset.
import pandas as pd
d=pd.DataFrame([225,230,235,240,245,250,255,260,265,270,275,280,285,290,295,300,305,310,315,320])
seekingframe=pd.DataFrame([275,280,285,290,295,300,305,310,315,320,325,330,335,340,345,350,355,180,255,260])
I need to find every element of seekingframe in d as fast as possible. I mean, I need a final array like:
array([ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, -1, -1, -1, -1, -1, -1, -1, -1, 6, 7])
or the difference array like
[11, 12, 13, 14, 15, 16, 17, 18]
or something denoting the similarities or differences. Actually, if it is possible, I would rather drop the differing elements.
It's likely faster to use numpy. On these small arrays of unique values, np.in1d (which tests whether each element of one array is present in another and returns a boolean mask) was roughly 10x faster than pandas .isin(), and roughly 30x faster when passing assume_unique=True:
#finding similar
%timeit d[d[0].isin(seekingframe[0])].index
404 µs ± 6.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#finding difference
%timeit seekingframe[~seekingframe[0].isin(d[0])].index
458 µs ± 2.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# finding similar with numpy arrays and NOT passing `assume_unique=True`
a = d[0].to_numpy()
b = seekingframe[0].to_numpy()
%timeit np.arange(a.shape[0])[np.in1d(a, b)]
35.4 µs ± 779 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# finding similar with numpy arrays and passing `assume_unique=True`
a = d[0].to_numpy()
b = seekingframe[0].to_numpy()
%timeit np.arange(a.shape[0])[np.in1d(a, b, assume_unique=True)]
12 µs ± 337 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
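If you also need the exact result array asked for in the question (the index of each seekingframe value in d, with -1 where it is absent), one simple sketch is a plain dict lookup built from d:
import numpy as np

pos = {v: i for i, v in enumerate(d[0].tolist())}  # value -> index in d
result = np.array([pos.get(v, -1) for v in seekingframe[0].tolist()])
# array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, -1, -1, -1, -1, -1, -1, -1, -1, 6, 7])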

How can I reduce Execution time of Python code

In this code I'm calculating the difference between the square of the sum of n numbers and the sum of the squares of those n numbers.
Example: n=3, (1+2+3)^2 - (1^2+2^2+3^2) = 22
def sum_square_diff(num):
    sum1 = 0
    sum2 = 0
    for i in range(1, num+1):
        sum1 += i**2
        sum2 += i
    sum2 = sum2**2
    diff = sum2 - sum1
    return diff

if __name__ == "__main__":
    n = int(input())
    for i in range(n):
        num = int(input())
        result = sum_square_diff(num)
        print(result)
This code is correct but it takes too much time to complete execution.
In the first place, the formula that you want to compute has a closed-form representation. There is no need for any loops:
n*n*(n+1)*(n+1)/4 - n*(n+1)*(2*n+1)/6
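Wrapped in a function, for example (a small sketch; integer division // keeps the result exact, since n*(n+1) is always even and n*(n+1)*(2*n+1) is always divisible by 6):
def sum_square_diff_closed(n):
    # square of the sum minus the sum of the squares
    return (n * (n + 1) // 2) ** 2 - n * (n + 1) * (2 * n + 1) // 6

print(sum_square_diff_closed(3))  # 22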
But if you insist, you can get >3x speedup by using numpy instead of raw Python:
import numpy as np

def sum_square_diff1(num):
    x = np.arange(1, num+1)
    return x.sum()**2 - (x**2).sum()
In [7]: %timeit sum_square_diff(100)
19.6 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit sum_square_diff1(100)
5.61 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

What's the most concise way to iterate over a list by pairs in Python?

I've got the following brute-force option that allows me to iterate over points:
# [x1, y1, x2, y2, ..., xn, yn]
coords = [1, 1, 2, 2, 3, 3]
# The goal is to operate with (x, y) within for loop
for (x, y) in zip(coords[::2], coords[1::2]):
    # do something with (x, y) as a point
    ...
Is there a more concise / efficient way to do it?
(coords -> items)
Short Answer
If you want your items grouped with a specific length of 2, then
zip(items[::2], items[1::2])
is one of the best compromises in terms of speed and clarity.
If you can afford an extra line, you can get a bit more efficiency (a lot, for larger inputs) by using iterators:
it = iter(items)
zip(it, it)
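For example, applied to the coordinates from the question:
items = [1, 1, 2, 2, 3, 3]
it = iter(items)
print(list(zip(it, it)))  # [(1, 1), (2, 2), (3, 3)]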
Long Answer
(EDIT: added a method that avoids zip())
You could achieve this in a number of ways.
For convenience, I write those as functions that can be benchmarked.
Also I will leave the size of the group as a parameter n (which, in your case, is 2)
import itertools

def grouping1(items, n=2):
    return zip(*tuple(items[i::n] for i in range(n)))

def grouping2(items, n=2):
    return zip(*tuple(itertools.islice(items, i, None, n) for i in range(n)))

def grouping3(items, n=2):
    for j in range(len(items) // n):
        yield items[j * n:(j + 1) * n]

def grouping4(items, n=2):
    return zip(*([iter(items)] * n))

def grouping5(items, n=2):
    it = iter(items)
    while True:
        result = []
        for _ in range(n):
            try:
                tmp = next(it)
            except StopIteration:
                break
            else:
                result.append(tmp)
        if len(result) == n:
            yield result
        else:
            break
Benchmarking these with a relatively short list gives:
short = list(range(10))
%timeit [x for x in grouping1(short)]
# 1.33 µs ± 9.82 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping2(short)]
# 1.51 µs ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping3(short)]
# 1.14 µs ± 28.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping4(short)]
# 639 ns ± 7.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping5(short)]
# 3.37 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For medium sized inputs:
medium = list(range(1000))
%timeit [x for x in grouping1(medium)]
# 21.9 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [x for x in grouping2(medium)]
# 25.2 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [x for x in grouping3(medium)]
# 65.6 µs ± 233 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [x for x in grouping4(medium)]
# 18.3 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit [x for x in grouping5(medium)]
# 257 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For larger inputs:
large = list(range(1000000))
%timeit [x for x in grouping1(large)]
# 49.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping2(large)]
# 37.5 ms ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping3(large)]
# 84.4 ms ± 736 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping4(large)]
# 31.6 ms ± 85.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping5(large)]
# 274 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As far as efficiency goes, grouping4() seems to be the fastest, closely followed by grouping1() or grouping3() (depending on the size of the input).
In your case, grouping1() seems a good compromise between speed and clarity, unless you are willing to wrap it up in a function.
Note that grouping4() requires you to use the same iterator multiple times and:
zip(iter(items), iter(items))
would NOT work.
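For example, each of the two independent iterators starts from the beginning, so every element gets paired with itself instead of with its neighbour:
items = [1, 1, 2, 2, 3, 3]
print(list(zip(iter(items), iter(items))))
# [(1, 1), (1, 1), (2, 2), (2, 2), (3, 3), (3, 3)]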
If you want more control over uneven grouping i.e. when the len(items) is not divisible by n, you could replace zip with itertools.zip_longest() from the standard library.
Note also that grouping4() is essentially the grouper() recipe from the official itertools documentation.
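For reference, the classic form of that recipe is roughly:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # collect data into fixed-length chunks or blocks
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)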
You can use iter(object) and next(iterator, default) with a known default to leave your loop:
coords = [1, 1, 2, 2, 3, 3]
it = iter(coords)
while it:
    x = next(it, None)
    y = next(it, None)
    if x is None or y is None:
        break
    # do something with your pairs
    print(x, y)
Output:
1 1
2 2
3 3

How to perform sum pooling in PyTorch

How to perform sum pooling in PyTorch? Specifically, if we have input (N, C, W_in, H_in) and want output (N, C, W_out, H_out) using a particular kernel_size and stride, just like nn.MaxPool2d?
You could use torch.nn.AvgPool1d (or torch.nn.AvgPool2d, torch.nn.AvgPool3d), which perform mean pooling, which is proportional to sum pooling. If you really want the summed values, you can multiply the averaged output by the pooling surface.
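A minimal sketch of that idea (the names here are just for illustration):
import torch
import torch.nn as nn

x = torch.arange(16.).reshape(1, 1, 4, 4)  # (N, C, H, W)
k = 2                                      # kernel_size
avg = nn.AvgPool2d(k, stride=k)
sum_pooled = avg(x) * (k * k)              # mean pooling times the pooling surface = sum pooling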
See divisor_override in https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html#torch.nn.AvgPool2d. If you set divisor_override=1, you'll get a sum pool:
import torch
input = torch.tensor([[[1., 2., 3.], [3., 2., 1.], [3., 4., 5.]]])
sumpool = torch.nn.AvgPool2d(2, stride=1, divisor_override=1)
sumpool(input)
you'll get
tensor([[[ 8.,  8.],
         [12., 12.]]])
To expand on benjaminplanche's answer:
I need sum pooling as well and it doesn't seem to directly exist, but it is equivalent to running a conv2d with a weights parameter made of ones. I thought it would be faster to run AvgPool2d and multiply by the kernel size product. Turns out, not exactly.
Bottom line up front:
Use torch.nn.functional.avg_pool2d and its related functions and multiply by the kernel size product.
Testing in Jupyter I find:
(Overhead)
%%timeit
x = torch.rand([1,1,1000,1000])
>>> 3.49 ms ± 4.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
_=F.avg_pool2d(torch.rand([1,1,1000,1000]), [10,10])*10*10
>>> 4.99 ms ± 74.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(So 1.50 ms ± 79.0 µs) (I found the *10*10 only adds around 20 µs to the graph)
avePool = nn.AvgPool2d([10, 10], 1, 0)
%%timeit
_=avePool(torch.rand([1,1,1000,1000]))*10*10
>>> 80.9 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
(So 77.4 ms ± 1.58 ms)
y = torch.ones([1,1,10,10])
%%timeit
_=F.conv2d(torch.rand([1,1,1000,1000]), y)
>>> 14.4 ms ± 421 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(So 10.9 ms ± 426 µs)
sumPool = nn.Conv2d(1, 1, 10, 1, 0, 1, 1, False)
sumPool.weight = torch.nn.Parameter(y)
%%timeit
_=sumPool(torch.rand([1,1,1000,1000]))
>>> 7.24 ms ± 63.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
(So 3.75 ms ± 68.3 µs)
And as a sanity check.
abs_err = torch.max(torch.abs(avePool(x)*10*10 - sumPool(x)))
magnitude = torch.max(torch.max(avePool(x)*10*10, torch.max(sumPool(x))))
relative_err = abs_err/magnitude
abs_err.item(), magnitude.item(), relative_err.item()
>>> (3.814697265625e-06, 62.89910125732422, 6.064788493631568e-08)
That's probably a reasonable rounding-related error.
I do not know why the functional version is faster than making a dedicated kernel, but it looks like if you want to make a dedicated kernel, you should prefer the Conv2d version, and make the weights untrainable with sumPool.weight.requires_grad = False or with torch.no_grad(): during creation of the kernel parameters. These results may change with kernel size, so test for your own application if you need to speed up this part. Let me know if I missed something...
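For instance, a hedged sketch of such a dedicated, untrainable sum-pooling kernel (single channel, 10x10 window, following the construction above):
import torch
import torch.nn as nn

sum_pool = nn.Conv2d(1, 1, kernel_size=10, stride=1, padding=0, bias=False)
with torch.no_grad():
    sum_pool.weight.fill_(1.0)             # all-ones kernel, so each output is a window sum
sum_pool.weight.requires_grad_(False)      # keep the kernel untrainable

x = torch.rand([1, 1, 1000, 1000])
out = sum_pool(x)                          # each output value is the sum over a 10x10 window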
