Pandas speedup when working with transposed numpy matrix - python-3.x

I was trying to figure out which is faster for standardizing data: numpy or pandas, and working on the whole matrix/DataFrame or column by column. In doing so I found the strange behavior shown in the code below.
import pandas as pd
import numpy as np
def stand(df):
    res = pd.DataFrame()
    for col in df:
        res[col] = (df[col] - df[col].min()) / df[col].max()
    return res
matrix = pd.DataFrame(np.random.randint(0,174000,size=(1000000, 100)))
matrix.shape
(1000000, 100)
%timeit res = (matrix - matrix.min(axis=0))/ matrix.max(axis=0)
2.64 s ± 22.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stand(matrix)
5.32 s ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
But when starting from a "flipped" numpy matrix and transposing it to create the DataFrame
matrix = pd.DataFrame(np.random.randint(0,174000,size=(100, 1000000)).T)
matrix.shape
(1000000, 100)
%timeit res = (matrix - matrix.min(axis=0))/ matrix.max(axis=0)
2.37 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stand(matrix)
1.2 s ± 8.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The column-by-column standardization gets ~4 times faster.
This behavior persists when using .values or numpy operations, as shown below:
%timeit res = (matrix.values - matrix.min(axis=0).values)/ matrix.max(axis=0).values
2.58 s ± 417 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stand(matrix)
5.26 s ± 42.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res = np.divide(np.subtract(matrix.values, matrix.min(axis=0).values), matrix.max(axis=0).values)
2.17 s ± 7.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Flipped matrix transpose
matrix = pd.DataFrame(np.random.randint(0,174000,size=(100, 1000000)).T)
matrix.shape
(1000000, 100)
%timeit res = (matrix.values - matrix.min(axis=0).values)/ matrix.max(axis=0).values
2.2 s ± 8.82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit stand(matrix)
1.33 s ± 190 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res = np.divide(np.subtract(matrix.values, matrix.min(axis=0).values), matrix.max(axis=0).values)
2.46 s ± 166 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Can someone explain why starting from a "flipped" matrix and transposing it before creating the DataFrame changes the performance compared to starting from a non-flipped matrix?
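For reference, the memory layout of the two starting arrays can be inspected as shown below (a minimal sketch; my guess is that C- vs F-contiguity of the underlying data matters here, but that is exactly what I am asking about):
import numpy as np
import pandas as pd

a = np.random.randint(0, 174000, size=(1000000, 100))      # C-contiguous: each row is contiguous
b = np.random.randint(0, 174000, size=(100, 1000000)).T    # transposed view: each column is contiguous

print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])    # True False
print(b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])    # False True

# layout of the array actually backing each DataFrame
print(pd.DataFrame(a).values.flags)
print(pd.DataFrame(b).values.flags)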

Related

Python: Opposite number performance comparison

Why is
def opposite(number):
    number - number*2
returning a faster result than
def opposite(number):
    return -number
in Python?
Here you can see the performance difference between the two methods:
def opposite(number):
    number - number*2

def opposite2(number):
    return -number
%timeit opposite(5)
84.3 ns ± 2.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit opposite2(5)
66.5 ns ± 6.88 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
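One way to see where the time goes is to compare the bytecode of the two functions with the standard dis module (a quick sketch; exact opcode names vary between Python versions):
import dis

def opposite(number):
    number - number*2      # result is computed and then discarded; the function returns None

def opposite2(number):
    return -number

dis.dis(opposite)          # multiply, subtract, pop the result, then return None
dis.dis(opposite2)         # a single unary-negative followed by the return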

How can I reduce Execution time of Python code

In this code I'm calculating the difference between the square of the sum of the first n numbers and the sum of their squares.
Example: n=3, (1+2+3)^2 - (1^2+2^2+3^2) = 22
def sum_square_diff(num):
    sum1 = 0
    sum2 = 0
    for i in range(1, num+1):
        sum1 += i**2
        sum2 += i
    sum2 = sum2**2
    diff = sum2 - sum1
    return diff

if __name__ == "__main__":
    n = int(input())
    for i in range(n):
        num = int(input())
        result = sum_square_diff(num)
        print(result)
This code is correct but it takes too much time to complete execution.
In the first place, the formula that you want to compute has a closed-form representation. There is no need for any loops:
n*n*(n+1)*(n+1)/4 - n*(n+1)*(2*n+1)/6
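Wrapped in a function, the closed form looks like this (a minimal sketch; floor division is exact here because both terms are integers):
def sum_square_diff_closed(n):
    # (1 + 2 + ... + n)^2  -  (1^2 + 2^2 + ... + n^2)
    return n*n*(n+1)*(n+1)//4 - n*(n+1)*(2*n+1)//6

print(sum_square_diff_closed(3))   # 22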
But if you insist, you can get >3x speedup by using numpy instead of raw Python:
import numpy as np

def sum_square_diff1(num):
    x = np.arange(1, num+1)
    return x.sum()**2 - (x**2).sum()
In [7]: %timeit sum_square_diff(100)
19.6 µs ± 435 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit sum_square_diff1(100)
5.61 µs ± 26.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

cupy indexing is slow

I am trying to perform operations on a large cupy array of size 16000. I find mathematical operations such as addition to be quite fast, but indexing using boolean masks to be relatively slow. For example, the following code:
import cupy as cp
arr = cp.random.normal(0, 1, 16000)
%timeit arr * 5
%timeit arr > 0.4
%timeit arr[arr > 0.4] = 0
gives me the output:
28 µs ± 950 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
26.5 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
104 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Any reason why the final indexing is at least twice as slow? I assumed that multiplication would be slower than setting array elements.
Update: This is not true for numpy indexing. Changing the cupy array to numpy, I get:
6.71 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.42 µs ± 56.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
5.39 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In the 3rd case, cupy is composing the result via a sequence of operations: cupy_greater, cupy_copy, inclusive_scan_kernel, inclusive_scan_kernel, add_scan_blocked_sum_kernel, CUDA memcpy DtoH (perhaps to provide the number of elements that need to be set to zero), CUDA memset (perhaps to set an array to zero), and finally cupy_scatter_update_mask (to scatter the zeros to their correct locations, perhaps).
This is a considerably more complex sequence than arr*5, which seems to run a single cupy_multiply under the hood. You can probably do better with a cupy user-defined kernel:
import cupy as cp
clamp_generic = cp.ElementwiseKernel(
    'T x, T c',
    'T y',
    'y = (y > x) ? c : y',
    'clamp_generic')
arr = cp.random.normal(0, 1, 16000)
clamp_generic(0.4, 0, arr)
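If writing a raw kernel feels heavy, another option that also avoids the boolean-mask indexing path is cp.where, which builds the result in a single elementwise pass (a sketch; I have not benchmarked it against the ElementwiseKernel above):
import cupy as cp

arr = cp.random.normal(0, 1, 16000)
arr = cp.where(arr > 0.4, 0.0, arr)   # replace values above the threshold with 0.0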

parsing a panda dataframe column from a dictionary data form into new columns for each dictionary key

In Python 3 with pandas, imagine there is a dataframe df with a column x:
df = pd.DataFrame(
    [
        {'x': '{"a":"1","b":"2","c":"3"}'},
        {'x': '{"a":"2","b":"3","c":"4"}'}
    ]
)
The column x holds data that looks like a dictionary. How can I parse it into a new dataframe, so that each key becomes its own column?
The desired output dataframe is like
x,a,b,c
'{"a":"1","b":"2","c":"3"}',1,2,3
'{"a":"2","b":"3","c":"4"}',2,3,4
None of the solutions in this post seem to work in this case:
parsing a dictionary in a pandas dataframe cell into new row cells (new columns)
df1=pd.DataFrame(df.loc[:,'x'].values.tolist())
print(df1)
This results in the same dataframe; it didn't split the column into one column per key.
Any 2 cents?
Thanks!
You can also map json.loads and convert to a dataframe like this:
import json
df1 = pd.DataFrame(df['x'].map(json.loads).tolist(),index=df.index)
print(df1)
a b c
0 1 2 3
1 2 3 4
This benchmarks faster than evaluating via ast; below is the benchmark for 40K rows:
m = pd.concat([df]*20000,ignore_index=True)
%%timeit
import json
df1 = pd.DataFrame(m['x'].map(json.loads).tolist(),index=m.index)
#256 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
import ast
df1 = pd.DataFrame(m['x'].map(ast.literal_eval).tolist(),index=m.index)
#1.32 s ± 136 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
import ast
df1 = pd.DataFrame(m['x'].apply(ast.literal_eval).tolist(),index=m.index)
#1.34 s ± 71.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Because the values are string reprs of dictionaries, it is necessary to convert them to dictionaries first:
import ast, json
# performance measured on repeated sample data; with real data it may differ
m = pd.concat([df]*20000,ignore_index=True)
In [98]: %timeit pd.DataFrame([json.loads(x) for x in m['x']], index=m.index)
206 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#anky_91 solution
In [99]: %timeit pd.DataFrame(m['x'].map(json.loads).tolist(),index=m.index)
210 ms ± 11.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [100]: %timeit pd.DataFrame(m['x'].map(ast.literal_eval).tolist(),index=m.index)
903 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [101]: %timeit pd.DataFrame(m['x'].apply(ast.literal_eval).tolist(),index=m.index)
893 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(df1)
a b c
0 1 2 3
1 2 3 4
Last, to append the new columns to the original:
df = df.join(df1)
print(df)
x a b c
0 {"a":"1","b":"2","c":"3"} 1 2 3
1 {"a":"2","b":"3","c":"4"} 2 3 4
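Another option, assuming your pandas version exposes the top-level pd.json_normalize (newer releases do; older ones have it under pandas.io.json), is to normalize the parsed dicts directly:
import json
import pandas as pd

df1 = pd.json_normalize([json.loads(x) for x in df['x']])
df1.index = df.index          # keep the original index so the join lines up
print(df.join(df1))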

What's the most concise way to iterate over a list by pairs in Python?

I've got the following bruteforce option that allows me to iterate over points:
# [x1, y1, x2, y2, ..., xn, yn]
coords = [1, 1, 2, 2, 3, 3]
# The goal is to operate with (x, y) within for loop
for (x, y) in zip(coords[::2], coords[1::2]):
    ...  # do something with (x, y) as a point
Is there a more concise / efficient way to do it?
(in the following, coords from the question is renamed to items)
Short Answer
If you want your items grouped with a specific length of 2, then
zip(items[::2], items[1::2])
is one of the best compromises in terms of speed and clarity.
If you can afford an extra line, you can get a bit more efficient (a lot, for larger inputs) by using iterators:
it = iter(items)
zip(it, it)
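Applied to the coords from the question, the iterator version looks like this:
coords = [1, 1, 2, 2, 3, 3]
it = iter(coords)
for x, y in zip(it, it):   # zip pulls two consecutive items from the same iterator per step
    print(x, y)            # 1 1, then 2 2, then 3 3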
Long Answer
(EDIT: added a method that avoids zip())
You could achieve this in a number of ways.
For convenience, I write these as functions that can be benchmarked.
I also leave the group size as a parameter n (which, in your case, is 2).
import itertools

def grouping1(items, n=2):
    return zip(*tuple(items[i::n] for i in range(n)))

def grouping2(items, n=2):
    return zip(*tuple(itertools.islice(items, i, None, n) for i in range(n)))

def grouping3(items, n=2):
    for j in range(len(items) // n):
        yield items[j * n:(j + 1) * n]

def grouping4(items, n=2):
    return zip(*([iter(items)] * n))

def grouping5(items, n=2):
    it = iter(items)
    while True:
        result = []
        for _ in range(n):
            try:
                tmp = next(it)
            except StopIteration:
                break
            else:
                result.append(tmp)
        if len(result) == n:
            yield result
        else:
            break
Benchmarking these with a relatively short list gives:
short = list(range(10))
%timeit [x for x in grouping1(short)]
# 1.33 µs ± 9.82 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping2(short)]
# 1.51 µs ± 16.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping3(short)]
# 1.14 µs ± 28.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping4(short)]
# 639 ns ± 7.56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [x for x in grouping5(short)]
# 3.37 µs ± 16.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For medium sized inputs:
medium = list(range(1000))
%timeit [x for x in grouping1(medium)]
# 21.9 µs ± 466 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [x for x in grouping2(medium)]
# 25.2 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [x for x in grouping3(medium)]
# 65.6 µs ± 233 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [x for x in grouping4(medium)]
# 18.3 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit [x for x in grouping5(medium)]
# 257 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For larger inputs:
large = list(range(1000000))
%timeit [x for x in grouping1(large)]
# 49.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping2(large)]
# 37.5 ms ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping3(large)]
# 84.4 ms ± 736 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping4(large)]
# 31.6 ms ± 85.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [x for x in grouping5(large)]
# 274 ms ± 2.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As far as efficiency goes, grouping4() seems to be the fastest, closely followed by grouping1() or grouping3() (depending on the size of the input).
In your case, grouping1() seems a good compromise between speed and clarity, unless you are willing to wrap it up in a function.
Note that grouping4() requires you to use the same iterator multiple times and:
zip(iter(items), iter(items))
would NOT work.
If you want more control over uneven grouping, i.e. when len(items) is not divisible by n, you can replace zip with itertools.zip_longest() from the standard library.
Note also that grouping4() is essentially the grouper() recipe from the official itertools documentation.
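For reference, here is that recipe in its zip_longest form, which pads the last incomplete group with fillvalue instead of dropping it:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks."""
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)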
You can use iter(object) and next(iterator, default) with a known default to leave your loop:
coords = [1, 1, 2, 2, 3, 3]
it = iter(coords)
while it:  # iterators are always truthy, so the break below is what actually ends the loop
    x = next(it, None)
    y = next(it, None)
    if x is None or y is None:
        break
    # do something with your pairs
    print(x, y)
Output:
1 1
2 2
3 3
