I try to subtract the mean of each row of a matrix in numpy using broadcasting but I get an error. Any idea why?
Here is the code:
from numpy import *
X = random.rand(5, 10)
Y = X - X.mean(axis = 1)
Error:
ValueError: operands could not be broadcast together with shapes (5,10) (5,)
Thanks!
The mean method is a reduction operation, meaning it converts a 1-d collection of numbers to a single number. When you apply a reduction to an n-dimensional array along an axis, numpy collapses that dimension to the reduced value, resulting in an (n-1)-dimensional array. In your case, since X has shape (5, 10), and you performed a reduction along axis 1, you end up with an array with shape (5,):
In [8]: m = X.mean(axis=1)
In [9]: m.shape
Out[9]: (5,)
When you try to subtract this result from X, you are trying to subtract an array with shape (5,) from an array with shape (5, 10). These shapes are not compatible for broadcasting. (Take a look at the description of broadcasting in the User Guide.)
For broadcasting to work the way you want, the result of the mean operation should be an array with shape (5, 1) (to be compatible with the shape (5, 10)). In recent versions of numpy, the reduction operations, including mean, have an argument called keepdims that tells the function to not collapse the reduced dimension. Instead, a trivial dimension with length 1 is kept:
In [10]: m = X.mean(axis=1, keepdims=True)
In [11]: m.shape
Out[11]: (5, 1)
With older versions of numpy, you can use reshape to restore the collapsed dimension:
In [12]: m = X.mean(axis=1).reshape(-1, 1)
In [13]: m.shape
Out[13]: (5, 1)
So, depending on your version of numpy, you can do this:
Y = X - X.mean(axis=1, keepdims=True)
or this:
Y = X - X.mean(axis=1).reshape(-1, 1)
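As a quick sanity check, a minimal sketch (reusing the X from the question) shows that both variants agree and that the row means of the result are approximately zero:

import numpy as np

X = np.random.rand(5, 10)

# center each row with the keepdims approach
Y1 = X - X.mean(axis=1, keepdims=True)
# center each row with the reshape approach
Y2 = X - X.mean(axis=1).reshape(-1, 1)

print(np.allclose(Y1, Y2))              # True: both approaches agree
print(np.allclose(Y1.mean(axis=1), 0))  # True: row means are now ~0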
If you are looking for performance, you can also consider using np.einsum, which is often faster than np.sum or np.mean. The desired output could then be obtained like so -
X - np.einsum('ij->i',X)[:,None]/X.shape[1]
Please note that the [:,None] part plays the same role as keepdims: it keeps the number of dimensions equal to that of the input array, so the result can be broadcast against it.
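As a minimal check (reusing the same X), the einsum-based row mean matches X.mean(axis=1, keepdims=True), so both produce the same centered matrix:

import numpy as np

X = np.random.rand(5, 10)

# row sums via einsum, divided by the row length; [:, None] keeps the result 2-D
m_einsum = np.einsum('ij->i', X)[:, None] / X.shape[1]
m_mean = X.mean(axis=1, keepdims=True)

print(np.allclose(m_einsum, m_mean))          # True
print(np.allclose(X - m_einsum, X - m_mean))  # True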
Runtime tests
1) Comparing just the mean calculation -
In [47]: X = np.random.rand(500, 1000)
In [48]: %timeit X.mean(axis=1, keepdims=True)
1000 loops, best of 3: 1.5 ms per loop
In [49]: %timeit X.mean(axis=1).reshape(-1, 1)
1000 loops, best of 3: 1.52 ms per loop
In [50]: %timeit np.einsum('ij->i',X)[:,None]/X.shape[1]
1000 loops, best of 3: 832 µs per loop
2) Comparing entire calculation -
In [52]: X = np.random.rand(500, 1000)
In [53]: %timeit X - X.mean(axis=1, keepdims=True)
100 loops, best of 3: 6.56 ms per loop
In [54]: %timeit X - X.mean(axis=1).reshape(-1, 1)
100 loops, best of 3: 6.54 ms per loop
In [55]: %timeit X - np.einsum('ij->i',X)[:,None]/X.shape[1]
100 loops, best of 3: 6.18 ms per loop
Related
I am coding in PyTorch. In between the torch inference code, I add some peripheral code for my own interest. This code works fine, but it is too slow, probably because of the for loops. So, I need a parallel and fast way of doing this.
It is okay to do this with tensors, NumPy arrays, or plain Python lists.
I made a function named selective_max to find the maximum value in arrays. The problem is that I don't want the maximum over the whole array, but only over specific candidates designated by a mask array. Let me show the gist of this function (the code itself is below).
Input
x [batch_size, dim, num_points, k] : x is the original input, but it becomes [batch_size, num_points, dim, k] after x.permute(0,2,1,3).
batch_size has its usual meaning in deep learning. In every mini batch there are many points, and a single point is represented by a feature vector of length dim. For each feature element there are k potential candidates, which are the target of the max function later.
mask [batch_size, num_points, k] : This array is shaped like x without the dim axis. Each element is either 0 or 1, so I use it as a mask signal: the max operation should only consider values masked with 1.
Kindly read the code below with this explanation in mind. I use three nested for loops. Say we target a specific batch and a specific point: for that batch and point, x is a [dim, k] array and mask is a [k] array of 0s and 1s. So, I extract the non-zero indices from the [k] array and use them to pick specific elements of x, dim by dim (the 'for k in range(dim)' loop).
Toy example
Let's say we are inside the second for loop, so we now have a [dim, k] slice of x and a [k] slice of mask. For this toy example, assume k=3 and dim=4, with x = [[3,2,1],[5,6,4],[9,8,7],[12,11,10]] and mask=[0,1,1]. The output should then be [2,6,8,11], not [3, 6, 9, 12].
Previous attempt
I tried { mask.repeat(0,0,1,0) * x } (element-wise multiplication) followed by the max operation. But 0 might then become the max value, because x may contain only negative values, so this gives the wrong result.
import numpy as np
import torch

def selective_max2(x, mask):  # x : [batch_size, dim, num_points, k] , mask : [batch_size, num_points, k]
    batch_size = x.size(0)
    dim = x.size(1)
    num_points = x.size(2)
    k = x.size(3)
    device = torch.device('cuda')
    x = x.permute(0, 2, 1, 3)  # : [batch, num_points, dim, k]
    # print('permuted x dimension : ', x.size())
    x = x.detach().cpu().numpy()
    mask = mask.cpu().numpy()
    output = np.zeros((batch_size, num_points, dim))
    for i in range(batch_size):
        for j in range(num_points):
            query = np.nonzero(mask[i][j])  # among the mask entries, get the indices of the nonzero values.
            for k in range(dim):  # for each feature dimension, take the max over the masked candidates.
                # query holds the indices of the nonzero mask values, so x[i][j][k][query] selects the candidates we want.
                output[i][j][k] = np.max(x[i][j][k][query])
    output = torch.from_numpy(output).float().to(device=device)
    output = output.permute(0, 2, 1).contiguous()
    return output
Disclaimer: I've followed your toy example (while retaining generality) to write the following solution.
The first step is to expand your mask k to the shape of x (treating them both as PyTorch tensors):
k_expanded = k.expand_as(x)
Then you select the elements of x where k_expanded is 1, and view the resulting tensor as having x.shape[0] rows and as many columns as there are 1's in k (the mask), computed with (k == 1).sum(0). Up to this point, we have selected the range over which to query the maximum. Finally, you take the maximum along each row with max(1):
values, indices = x[k_expanded == 1].view(x.shape[0], (k == 1).sum(0)).max(1)
values
Out[29]: tensor([ 2, 6, 8, 11])
Benchmarks
def find_max_elements_inside_tensor_range(arr, mask, return_indices=False):
    mask_expanded = mask.expand_as(arr)
    values, indices = arr[mask_expanded == 1].view(arr.shape[0], (mask == 1).sum(0)).max(1)
    return (values, indices) if return_indices else values
I just added a third parameter in case you also want to get the indices of the maximum values.
%timeit find_max_elements_inside_tensor_range(x, k)
38.4 µs ± 534 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note: the above solution also works for tensors and masks of various shapes.
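To see this on the question's toy example, here is a minimal sketch (with dim=4, k=3, and the mask named k as in the answer):

import torch

# toy example from the question: 4 feature dimensions, 3 candidates
x = torch.tensor([[3, 2, 1],
                  [5, 6, 4],
                  [9, 8, 7],
                  [12, 11, 10]])
k = torch.tensor([0, 1, 1])  # the mask

# expand the mask to x's shape, keep only the masked columns,
# reshape to (rows, number of 1s in the mask) and take the row-wise max
k_expanded = k.expand_as(x)
values, indices = x[k_expanded == 1].view(x.shape[0], int((k == 1).sum())).max(1)

print(values)  # tensor([ 2,  6,  8, 11]), as expected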
In Python, suppose I have a 1D array C (of length c), and I want to construct a 4D array of dimensions a x b x c x d, such that C is duplicated along all other axes.
I.e. no matter what the indexes along dimensions 1, 2 and 4 are, array[i][j][k][l] = C[k].
Is there any numpy function to do that? Thanks!
For an array ar, you could use np.broadcast_to to get that higher-dimensional array as a view (hence virtually zero runtime and no memory overhead), like so -
np.broadcast_to(ar[None,None,:,None],(a,b,len(ar),d))
Sample run -
In [115]: ar = np.random.rand(10)
In [116]: a,b,d = 3,4,5
In [117]: np.broadcast_to(ar[None,None,:,None],(a,b,len(ar),d)).shape
Out[117]: (3, 4, 10, 5)
If you need the output to have its own memory space, append .copy().
Leading new axes (None) are optional. Hence, alternatively -
In [121]: np.broadcast_to(ar[:,None],(a,b,len(ar),d)).shape
Out[121]: (3, 4, 10, 5)
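As a quick check, a small sketch (reusing ar, a, b, d from above) confirms that every slice along the third axis is the original array:

import numpy as np

ar = np.random.rand(10)
a, b, d = 3, 4, 5

out = np.broadcast_to(ar[None, None, :, None], (a, b, len(ar), d))

# out[i, j, :, l] equals ar for any i, j, l
print(np.allclose(out[2, 1, :, 3], ar))  # True
print(out.shape)                         # (3, 4, 10, 5)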
I have a numpy array. I want to find the number of points which lie within an epsilon distance of each point.
My current code is (for an n*2 array, but in general I expect the array to be n*m):
epsilon = np.array([0.5, 0.5])
np.array([1 / float(np.sum(np.all(np.abs(X - x) <= epsilon, axis=1))) for x in X])
But this code might not be efficient for an array of, let's say, 1 million rows and 50 columns. Is there a better and more efficient method?
For example data
X = np.random.rand(10, 2)
you can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
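As a quick check, a minimal sketch (reusing the example X and epsilon) shows the broadcast expression matches the original list comprehension:

import numpy as np

X = np.random.rand(10, 2)
epsilon = np.array([0.5, 0.5])

# original loop-based version
loop = np.array([1 / float(np.sum(np.all(np.abs(X - x) <= epsilon, axis=1))) for x in X])

# broadcast version: the pairwise differences form an (n, n, m) array
vec = 1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)

print(np.allclose(loop, vec))  # True

Note that the intermediate (n, n, m) array grows quadratically with the number of rows, so for very large n you may need to process X in chunks.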
I have a set of points and would like to know if there is a function (for the sake of convenience and probably speed) that can calculate the area enclosed by a set of points.
for example:
x = np.arange(0,1,0.001)
y = np.sqrt(1-x**2)
points = zip(x,y)
given points the area should be approximately equal to (pi-2)/4. Maybe there is something from scipy, matplotlib, numpy, shapely, etc. to do this? I won't be encountering any negative values for either the x or y coordinates... and they will be polygons without any defined function.
EDIT:
points will most likely not be in any specified order (clockwise or counterclockwise) and may be quite complex, as they are a set of UTM coordinates from a shapefile under a set of boundaries.
The Shoelace formula can be implemented in NumPy. Assuming these vertices:
import numpy as np
x = np.arange(0,1,0.001)
y = np.sqrt(1-x**2)
We can redefine the function in numpy to find the area:
def PolyArea(x, y):
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
And getting results:
print(PolyArea(x, y))
# 0.26353377782163534
Avoiding the for loop makes this function ~50x faster than the loop-based PolygonArea:
%timeit PolyArea(x,y)
# 10000 loops, best of 3: 42 µs per loop
%timeit PolygonArea(zip(x,y))
# 100 loops, best of 3: 2.09 ms per loop.
Timing is done in Jupyter notebook.
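As an extra sanity check, a minimal sketch with a unit square (vertices in order) returns the expected area of 1.0:

import numpy as np

def PolyArea(x, y):
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# unit square, vertices listed counterclockwise
xs = np.array([0.0, 1.0, 1.0, 0.0])
ys = np.array([0.0, 0.0, 1.0, 1.0])
print(PolyArea(xs, ys))  # 1.0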
The most optimized solution that covers all possible cases would be to use a geometry package, like shapely, scikit-geometry or pygeos. All of them use C++ geometry libraries under the hood. The first one is easy to install via pip:
pip install shapely
and simple to use:
from shapely.geometry import Polygon
pgon = Polygon(zip(x, y)) # Assuming the OP's x,y coordinates
print(pgon.area)
To build it from scratch or understand how the underlying algorithm works, check the shoelace formula:
# e.g. corners = [(2.0, 1.0), (4.0, 5.0), (7.0, 8.0)]
def Area(corners):
    n = len(corners)  # number of corners
    area = 0.0
    for i in range(n):
        j = (i + 1) % n
        area += corners[i][0] * corners[j][1]
        area -= corners[j][0] * corners[i][1]
    area = abs(area) / 2.0
    return area
Note that this works for simple polygons only:
If you have a polygon with holes: calculate the area of the outer ring and subtract the areas of the inner rings (see the shapely sketch below).
If you have self-intersecting rings: you have to decompose them into simple sectors.
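A minimal sketch of the holes case with shapely (made-up coordinates): a 4x4 square with a 1x1 hole, where the hole's area is subtracted automatically:

from shapely.geometry import Polygon

outer = [(0, 0), (4, 0), (4, 4), (0, 4)]   # exterior ring
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]    # interior ring (hole)

pgon = Polygon(outer, [hole])
print(pgon.area)  # 15.0 = 16 - 1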
By analyzing Mahdi's answer, I concluded that the majority of the time was spent doing np.roll(). By removing the need for the roll, and still using numpy, I got the execution time down to 4-5 µs per loop compared to Mahdi's 41 µs (for comparison, Mahdi's function took an average of 37 µs on my machine).
def polygon_area(x, y):
    correction = x[-1] * y[0] - y[-1] * x[0]
    main_area = np.dot(x[:-1], y[1:]) - np.dot(y[:-1], x[1:])
    return 0.5 * np.abs(main_area + correction)
By calculating the correction term and then slicing the arrays, there is no need to roll or create a new array.
Benchmarks:
10000 iterations
PolyArea(x,y): 37.075µs per loop
polygon_area(x,y): 4.665µs per loop
Timing was done using the time module and time.clock()
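As a consistency check, a minimal sketch (reusing the quarter-circle points from the question) confirms that the roll-free version returns the same area as PolyArea:

import numpy as np

def PolyArea(x, y):
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def polygon_area(x, y):
    correction = x[-1] * y[0] - y[-1] * x[0]
    main_area = np.dot(x[:-1], y[1:]) - np.dot(y[:-1], x[1:])
    return 0.5 * np.abs(main_area + correction)

x = np.arange(0, 1, 0.001)
y = np.sqrt(1 - x**2)
print(np.isclose(PolyArea(x, y), polygon_area(x, y)))  # True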
maxb's answer gives good performance but can easily lead to loss of precision when coordinate values or the number of points are large. This can be mitigated with a simple coordinate shift:
def polygon_area(x, y):
    # coordinate shift
    x_ = x - x.mean()
    y_ = y - y.mean()
    # everything else is the same as maxb's code
    correction = x_[-1] * y_[0] - y_[-1] * x_[0]
    main_area = np.dot(x_[:-1], y_[1:]) - np.dot(y_[:-1], x_[1:])
    return 0.5 * np.abs(main_area + correction)
For example, a common geographic reference system is UTM, which might have (x,y) coordinates of (488685.984, 7133035.984). The product of those two values is 3485814708748.448. You can see that this single product is already at the edge of double precision: the exact product would have six decimal places, but only about three survive in a 64-bit float. Adding just a few of these products, let alone thousands, will result in loss of precision.
A simple way to mitigate this is to shift the polygon from large positive coordinates to something closer to (0,0), for example by subtracting the centroid as in the code above. This helps in two ways:
It eliminates a factor of x.mean() * y.mean() from each product
It produces a mix of positive and negative values within each dot product, which will largely cancel.
The coordinate shift does not alter the total area; it just makes the calculation more numerically stable.
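To illustrate, a small experiment (a unit square translated to UTM-sized coordinates, so the exact area is 1.0; the exact digits will vary by machine) compares the unshifted and shifted formulas:

import numpy as np

def area_noshift(x, y):
    correction = x[-1] * y[0] - y[-1] * x[0]
    main_area = np.dot(x[:-1], y[1:]) - np.dot(y[:-1], x[1:])
    return 0.5 * np.abs(main_area + correction)

def area_shifted(x, y):
    return area_noshift(x - x.mean(), y - y.mean())

# unit square moved to UTM-sized coordinates
x = np.array([0.0, 1.0, 1.0, 0.0]) + 488685.984
y = np.array([0.0, 0.0, 1.0, 1.0]) + 7133035.984

print(abs(area_noshift(x, y) - 1.0))  # typically a visible error from the large products
print(abs(area_shifted(x, y) - 1.0))  # far smaller, near machine precision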
It's faster to use shapely.geometry.Polygon rather than to calculate yourself.
from shapely.geometry import Polygon
import numpy as np
def PolyArea(x, y):
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
coords = np.random.rand(6, 2)
x, y = coords[:, 0], coords[:, 1]
With this code, running %timeit gives:
%timeit PolyArea(x,y)
46.4 µs ± 2.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit Polygon(coords).area
20.2 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
cv2.contourArea() in OpenCV gives an alternative method.
example:
import cv2
import numpy as np

points = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=np.int32)
area = cv2.contourArea(points)
print(area)  # 100.0
The argument (points, in the above example) is a numpy array of dtype int32 or float32, representing the vertices of a polygon: [[x1,y1],[x2,y2], ...]
There's an error in the code above, as it doesn't take absolute values on each iteration. The above code will always return zero. (Mathematically, it's the difference between taking the signed area, or wedge product, and the actual area; see http://en.wikipedia.org/wiki/Exterior_algebra.) Here's some alternate code.
def area(vertices):
    n = len(vertices)  # number of corners
    a = 0.0
    for i in range(n):
        j = (i + 1) % n
        a += abs(vertices[i][0] * vertices[j][1] - vertices[j][0] * vertices[i][1])
    result = a / 2.0
    return result
A bit late here, but have you considered simply using sympy?
A simple piece of code is:
from sympy import Polygon
a = Polygon((0, 0), (2, 0), (2, 2), (0, 2)).area
print(a)
I compared every solution offered here to Shapely's area method result; they had the right integer part but the decimal digits differed. Only @Trenton's solution provided the correct result.
Now, improving on @Trenton's answer to process coordinates as a list of tuples, I came up with the following:
import numpy as np

def polygon_area(coords):
    # get x and y in vectors
    x = [point[0] for point in coords]
    y = [point[1] for point in coords]
    # shift coordinates
    x_ = x - np.mean(x)
    y_ = y - np.mean(y)
    # calculate area
    correction = x_[-1] * y_[0] - y_[-1] * x_[0]
    main_area = np.dot(x_[:-1], y_[1:]) - np.dot(y_[:-1], x_[1:])
    return 0.5 * np.abs(main_area + correction)
Example output
coords = [(385495.19520441635, 6466826.196947694), (385496.1951836388, 6466826.196947694), (385496.1951836388, 6466825.196929455), (385495.19520441635, 6466825.196929455), (385495.19520441635, 6466826.196947694)]
Shapely's area method: 0.9999974610685296
@Trenton's area method: 0.9999974610685296
This is much simpler, for regular polygons:
import math

def area_polygon(n, s):
    # area of a regular polygon with n sides of length s
    return 0.25 * n * s**2 / math.tan(math.pi / n)
since the formula is ¼ n s² / tan(π/n), given the number of sides n and the length s of each side.
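For example, a regular hexagon with side length 1 gives the expected 3√3/2 ≈ 2.598:

import math

def area_polygon(n, s):
    return 0.25 * n * s**2 / math.tan(math.pi / n)

print(area_polygon(6, 1))    # 2.598076211353316
print(3 * math.sqrt(3) / 2)  # 2.598076211353316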
Based on
https://www.mathsisfun.com/geometry/area-irregular-polygons.html
def _area_(coords):
    t = 0
    for count in range(len(coords) - 1):
        y = coords[count + 1][1] + coords[count][1]
        x = coords[count + 1][0] - coords[count][0]
        z = y * x
        t += z
    return abs(t / 2.0)
a=[(5.09,5.8), (1.68,4.9), (1.48,1.38), (4.76,0.1), (7.0,2.83), (5.09,5.8)]
print(_area_(a))
The trick is that the first coordinate should also be last.
def find_int_coordinates(n: int, coords: list[list[int]]) -> float:
    rez = 0
    x, y = coords[n - 1]
    for coord in coords:
        rez += (x + coord[0]) * (y - coord[1])
        x, y = coord
    return abs(rez / 2)
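For instance, a small usage example (assuming the find_int_coordinates definition above) for a 10x10 square:

square = [[0, 0], [10, 0], [10, 10], [0, 10]]
print(find_int_coordinates(4, square))  # 100.0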
I am looking for a way to avoid the nested loops in the following snippet, where A and B are two-dimensional arrays, each of shape (m, n) with m, n being arbitrary positive integers:
import numpy as np
m, n = 5, 2
A = np.random.randint(0, 10, (m, n))
B = np.random.randint(0, 10, (m, n))
out = np.empty((n, n))
for i in range(n):
    for j in range(n):
        out[i, j] = np.sum(A[:, i] + B[:, j])
The above logic is roughly equivalent to
np.einsum('ij,ik', A, B)
with the exception that einsum computes the sum of products.
Is there a way, equivalent to einsum, that computes a sum of sums? Or do I have to write an extension for this operation?
einsum performs elementwise multiplication followed by (optional) summing. As such, it might not be applicable/needed on its own to solve our case. Read on!
Approach #1
We can leverage broadcasting by extending the arrays to 3D: A[:,:,None] has shape (m, n, 1) and B[:,None,:] has shape (m, 1, n), so adding them keeps the first axis aligned while pairing every column of A with every column of B. Finally, we sum along the first axis -
(A[:,:,None] + B[:,None,:]).sum(0)
Approach #2
Since out[i, j] = np.sum(A[:, i]) + np.sum(B[:, j]), we can simply do an outer addition of the column sums of each array -
A.sum(0)[:,None] + B.sum(0)
Approach #3
And hence, bring in einsum -
np.einsum('ij->j',A)[:,None] + np.einsum('ij->j',B)
You can also use numpy.ufunc.outer, specifically np.add.outer here, after summing along axis 0, as @Divakar mentioned in Approach #2:
In [126]: np.add.outer(A.sum(0), B.sum(0))
Out[126]:
array([[54, 67],
[43, 56]])
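As a final cross-check, a minimal sketch confirms that all of the vectorized approaches agree with the original nested loop:

import numpy as np

m, n = 5, 2
A = np.random.randint(0, 10, (m, n))
B = np.random.randint(0, 10, (m, n))

# reference: the original nested loop
ref = np.empty((n, n))
for i in range(n):
    for j in range(n):
        ref[i, j] = np.sum(A[:, i] + B[:, j])

out1 = (A[:, :, None] + B[:, None, :]).sum(0)                  # Approach #1
out2 = A.sum(0)[:, None] + B.sum(0)                            # Approach #2
out3 = np.einsum('ij->j', A)[:, None] + np.einsum('ij->j', B)  # Approach #3
out4 = np.add.outer(A.sum(0), B.sum(0))                        # np.add.outer

for out in (out1, out2, out3, out4):
    print(np.allclose(ref, out))  # True, four times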