SUM function results explanation when given two 2-d arrays
When I run the code below in the Spyder IDE, the built-in sum function and numpy.add give different results. Can anyone help me understand how the output of sum is computed when it is given two 2-D arrays as its two arguments, instead of an array and a number? Thank you.
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
print(x)
print(y)
print (x+y)
print(sum(x,y))
print(np.add(x,y))
Output
[[1. 2.]
[3. 4.]]
[[5. 6.]
[7. 8.]]
[[ 6. 8.]
[10. 12.]]
[[ 9. 12.]
[11. 14.]]
[[ 6. 8.]
[10. 12.]]
In Numpy, the + operator is defined to be element-wise addition and in fact equivalent to np.add(...).
The sum(iterable, [start]) built-in function
Sums start and the items of an iterable from left to right and returns the total. start defaults to 0.
Iterating over a 2-D array yields its rows, so if given only one matrix, sum adds the rows together, which is a column-wise summation. If given a second argument, that start value is added (element-wise, with broadcasting) to the sum. So some smaller examples might be
sum(x)
> array([4., 6.])
# i.e. [(1+3), (2+4)]
sum(x, 1)
> array([5., 7.])
# i.e. [(1+1+3), (1+2+4)]
sum(y)
> array([12., 14.])
# i.e. [(5+7), (6+8)]
sum(x, sum(y))
> array([16., 20.])
# i.e. [((5+7)+1+3), ((6+8)+2+4)]
sum(x, y)
> array([[ 9., 12.],
[11., 14.]])
# i.e. [[(5+1+3), (6+2+4)],
# [(7+1+3), (8+2+4)]]
The last sum() is performing the column-wise sum of x, and then adding the result to each element of y with a shared column. Written with Numpy, it's equivalent to
sum(x, y) == x.sum(axis=0) + y
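To make the left-to-right accumulation concrete, here is a minimal sketch that unrolls what the built-in sum does with a start value. The helper name manual_sum is hypothetical and exists only for illustration; it is not part of Python or NumPy.

```python
import numpy as np

def manual_sum(iterable, start=0):
    # Hypothetical re-implementation of the built-in sum, for illustration only.
    total = start
    for item in iterable:      # iterating over a 2-D array yields its rows
        total = total + item   # a row of shape (2,) broadcasts against a (2, 2) total
    return total

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

print(manual_sum(x, y))                               # same values as sum(x, y)
print(np.array_equal(sum(x, y), x.sum(axis=0) + y))   # True
```

Because each row of x is added to the whole of y in turn, every row of y ends up increased by the column sums of x.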
I want to compute a symbolic gradient with sympy, e.g.,
import numpy as np
import sympy as sym
x, y, z = sym.symbols("x y z", real=True)
T = sym.cos(x**2+y**2)
gradT = sym.Matrix([sym.diff(T, x), sym.diff(T,y), sym.diff(T,z)])
Now I would like to create a lambdified function from this expression:
func = sym.lambdify((x,y,z), gradT,'numpy')
To use the function I have:
gradT_exact = func(np.linspace(0,2,100), np.linspace(0,2,100), np.linspace(0,2,100))
and I receive the following warning:
<lambdifygenerated-3>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return (array([[-2*x*sin(x**2 + y**2)], [-2*y*sin(x**2 + y**2)], [0]]))
If I change T to be a function of x, y and z, it gives me no problems.
Why does it give a warning when T depends only on x and y, so that the z component is zero?
Thanks in advance!
The gradT expression:
In [84]: gradT
Out[84]:
⎡ ⎛ 2 2⎞⎤
⎢-2⋅x⋅sin⎝x + y ⎠⎥
⎢ ⎥
⎢ ⎛ 2 2⎞⎥
⎢-2⋅y⋅sin⎝x + y ⎠⎥
⎢ ⎥
⎣ 0 ⎦
and its conversion to numpy:
In [87]: print(func.__doc__)
Created with lambdify. Signature:
func(x, y, z)
Expression:
Matrix([[-2*x*sin(x**2 + y**2)], [-2*y*sin(x**2 + y**2)], [0]])
Source code:
def _lambdifygenerated(x, y, z):
return (array([[-2*x*sin(x**2 + y**2)], [-2*y*sin(x**2 + y**2)], [0]]))
If x and y are arrays, then 2 terms will reflect their dimension(s), but the last is [0]. That's why you get the ragged warning.
lambdify does a rather simple lexical translation. It does not implement any deep understanding of numpy arrays. At some level it's your responsibility to check that the numpy code looks reasonable.
With scalar inputs:
In [88]: func(1,2,3)
Out[88]:
array([[1.91784855],
[3.8356971 ],
[0. ]])
but if one input is an array:
In [90]: func(np.array([1,2]),2,3)
<lambdifygenerated-1>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return (array([[-2*x*sin(x**2 + y**2)], [-2*y*sin(x**2 + y**2)], [0]]))
Out[90]:
array([[array([ 1.91784855, -3.95743299])],
[array([ 3.8356971 , -3.95743299])],
[0]], dtype=object)
The result is object dtype containing 2 arrays, plus that [0] list.
To avoid this problem, the lambdify would have to produce a function like:
In [95]: def f(x,y,z):
    ...:     temp = 0*x*y
    ...:     return np.array([-2*x*np.sin(x**2 + y**2), -2*y*np.sin(x**2 + y**2)
    ...:                      , temp])
where temp is designed to give 0 value, but with a shape that reflects the broadcasted operations on x and y in the other terms. I think that's asking too much of lambdify.
In [96]: f(np.array([1,2]),2,3)
Out[96]:
array([[ 1.91784855, -3.95743299],
[ 3.8356971 , -3.95743299],
[ 0. , 0. ]])
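A variation on the same idea, as a minimal sketch: instead of building a zero term by hand, every component can be broadcast to a common shape before stacking. The helper names stack_components and grad_T are hypothetical, and the gradient is written out by hand in NumPy rather than generated by lambdify.

```python
import numpy as np

def stack_components(*components):
    # Hypothetical helper: broadcast all components to one shape, then stack them.
    arrays = [np.asarray(c, dtype=float) for c in components]
    return np.stack(np.broadcast_arrays(*arrays))

def grad_T(x, y, z):
    # Hand-written gradient of T = cos(x**2 + y**2); z is accepted but unused.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.sin(x**2 + y**2)
    return stack_components(-2 * x * s, -2 * y * s, 0.0)

result = grad_T(np.array([1, 2]), 2, 3)
print(result.shape)   # (3, 2)
print(result)
```

The constant 0.0 is stretched to the shape of the other two components, so the result is a regular float array and no ragged-sequence warning is raised.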
I am coding in PyTorch. In between the torch inference code, I added some peripheral code for my own interest. This code works fine, but it is too slow, probably because of the for loops. So I need a parallel, fast way of doing this.
It is okay to do this with tensors, NumPy, or plain Python arrays.
I made a function named selective_max to find the maximum value in arrays. The problem is that I don't want the maximum over the whole array, but over specific candidates designated by a mask array. Let me show the gist of this function (the code itself is shown below).
Input
x [batch_size, dim, num_points, k] : x is the original input, but it becomes [batch_size, num_points, dim, k] after x.permute(0,2,1,3).
batch_size has its usual deep learning meaning. Every mini-batch contains many points, and a single point is represented by a feature vector of length dim. For each feature element there are k potential candidates, which are the targets of the max function later.
mask [batch_size, num_points, k] : This array has the same shape as x without dim. Each element is either 0 or 1, so I use it as a mask signal: the max operation is applied only to the candidates whose mask value is 1.
Kindly see the code below with this explanation. I use 3 nested for loops. Let's say we target a specific batch and a specific point. For that batch and point, x is a [dim, k] array and mask is a [k] array consisting of 0s and 1s. So I extract the nonzero indices from the [k] array and use them to extract the specific elements of x, dim by dim ('for k in range(dim)').
Toy example
Let's say we are inside the second for loop. So we now have a [dim, k] array for x and a [k] array for the mask. For this toy example, I assume k=3 and dim=4, with x = [[3,2,1],[5,6,4],[9,8,7],[12,11,10]] and the mask k = [0,1,1]. The output should then be [2,6,8,11], not [3,6,9,12].
Previous attempt
I tried { mask.repeat(0,0,1,0) * (element-wise mul) x } followed by the max operation. But 0 might then become the max value, because all the values of x might be negative. So this would give a wrong result.
import numpy as np
import torch

def selective_max2(x, mask): # x : [batch_size, dim, num_points, k], mask : [batch_size, num_points, k]
    batch_size = x.size(0)
    dim = x.size(1)
    num_points = x.size(2)
    k = x.size(3)
    device = torch.device('cuda')
    x = x.permute(0,2,1,3) # : [batch, num_points, dim, k]
    #print('permuted x dimension : ',x.size())
    x = x.detach().cpu().numpy()
    mask = mask.cpu().numpy()
    output = np.zeros((batch_size,num_points,dim))
    for i in range(batch_size):
        for j in range(num_points):
            query = np.nonzero(mask[i][j]) # indices of the nonzero mask entries
            for k in range(dim): # for each feature element, take the max over the selected candidates
                # query holds the indices of the nonzero values, so it selects the values that we want.
                output[i][j][k] = np.max(x[i][j][k][query])
    output = torch.from_numpy(output).float().to(device=device)
    output = output.permute(0,2,1).contiguous()
    return output
Disclaimer: I've followed your toy example (however while retaining generality) to write the following solution.
The first thing is to expand your k to the shape of x (treating them both as PyTorch tensors):
k_expanded = k.expand_as(x)
Then you select the elements of x where k_expanded is 1, and view the resulting tensor with x.shape[0] rows and as many columns as there are 1's in k (the mask), which is what (k == 1).sum(0) counts. Up to this point, we have selected the range we want to query for the maximum element. Finally, max(1) takes the maximum of each row:
values, indices = x[k_expanded == 1].view(x.shape[0], (k == 1).sum(0)).max(1)
values
Out[29]: tensor([ 2, 6, 8, 11])
Benchmarks
def find_max_elements_inside_tensor_range(arr, mask, return_indices=False):
    mask_expanded = mask.expand_as(arr)
    values, indices = arr[mask_expanded == 1].view(arr.shape[0], (mask == 1).sum(0)).max(1)
    return (values, indices) if return_indices else values
I just added a third parameter in case you also want the indices of the maxima.
%timeit find_max_elements_inside_tensor_range(x, k)
38.4 µs ± 534 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note: the above solution also works for tensors and masks of other shapes, provided every row selects the same number of elements, since view needs a rectangular result.
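For the full shapes in the question, where each point can have a different number of 1s in its mask, a different technique may be simpler: replace the masked-out candidates with negative infinity and take an ordinary max over the last axis. The following is a minimal NumPy sketch of that idea, not the method from the answer above; the function name selective_max_np is hypothetical.

```python
import numpy as np

def selective_max_np(x, mask):
    # x:    [batch_size, dim, num_points, k]
    # mask: [batch_size, num_points, k], entries 0 or 1
    x = np.asarray(x, dtype=float)
    keep = np.asarray(mask)[:, None, :, :] != 0   # add a dim axis so it broadcasts against x
    masked = np.where(keep, x, -np.inf)           # masked-out candidates can never win the max
    return masked.max(axis=-1)                    # [batch_size, dim, num_points]

# Toy example from the question, lifted to batch_size = 1 and num_points = 1.
x_toy = np.array([[3, 2, 1], [5, 6, 4], [9, 8, 7], [12, 11, 10]])[None, :, None, :]
mask_toy = np.array([0, 1, 1])[None, None, :]

print(selective_max_np(x_toy, mask_toy)[0, :, 0])   # the values 2, 6, 8, 11
```

Using negative infinity instead of multiplying by the mask avoids the problem of 0 winning when all values are negative. A point whose mask is all zeros yields -inf here, whereas the loop version would raise an error on the empty selection. The same pattern should carry over to PyTorch with masked_fill and max along the last dimension.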
I made a simple function that produces a weighted average of several time series using supplied weights. It is designed to handle missing values (NaNs), which is why I am not using numpy's supplied average function.
However, when I feed it my array containing missing values, the array has its NaN values replaced by 0s! I would have assumed that this should not happen, since I am changing the name of the array inside the function and it is not a global variable. I want my X array to retain its original form, including the NaN value.
I am a relative novice using python (obviously).
Example:
X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]]) # 3 time series to be weighted together
weights = np.array([[1,1,1]]) # simple example with weights for each series as 1
def WeightedMeanNaN(Tseries, weights):
    ## calculates weighted mean
    N_Tseries = Tseries
    Weights = np.repeat(weights, len(N_Tseries), axis=0) # make a vector of weights matching size of time series
    loc = np.where(np.isnan(N_Tseries)) # get location of nans
    Weights[loc] = 0
    N_Tseries[loc] = 0
    Weights = Weights/Weights.sum(axis=1)[:,None] # normalize each row so that weights sum to 1
    WeightedAve = np.multiply(N_Tseries,Weights)
    WeightedAve = WeightedAve.sum(axis=1)
    return WeightedAve
WeightedMeanNaN(Tseries = X, weights = weights)
Out[161]: array([2. , 2. , 1.5])
In:X
Out:
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 0.]]) # no longer nan!!
Where you call
loc = np.where(np.isnan(N_Tseries)) # get location of nans
Weights[loc] = 0
N_Tseries[loc] = 0
You remove all NaNs and set them to zeros.
To reverse this you could iterate over the array and replace zeros with NaNs.
However, this would also set regular zeros to Nans.
So it turns out this mistake came from my being used to working in Matlab. Python passes arguments to a function as references to the original object, so N_Tseries = Tseries only gives the same array a second name. In contrast, Matlab behaves as if it creates copies that are discarded when the function ends.
I solved my problem by adding ".copy()" when assigning variables in the function, so that the first line in the function above becomes:
N_Tseries = Tseries.copy()
However, one thing that puzzles me is that some people have suggested that using Tseries[:] should also create a copy of Tseries rather than a pointer to the original variable. This did not work for me though. (That advice applies to Python lists; for a NumPy array, basic slicing such as Tseries[:] returns a view that shares memory with the original.)
I found this answer useful:
Python function not supposed to change a global variable
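As an alternative to copying, the function can avoid writing into its input at all. Here is a minimal sketch of that approach, assuming the same row-wise weighting as in the question; the name weighted_mean_nan is hypothetical.

```python
import numpy as np

def weighted_mean_nan(tseries, weights):
    # Hypothetical non-mutating variant: nothing is ever assigned into `tseries`.
    tseries = np.asarray(tseries, dtype=float)
    valid = ~np.isnan(tseries)
    # Broadcast the weights to the data shape and zero them where the data is NaN.
    w = np.broadcast_to(np.asarray(weights, dtype=float), tseries.shape) * valid
    values = np.where(valid, tseries, 0.0)   # new array; the input is untouched
    return (values * w).sum(axis=1) / w.sum(axis=1)

X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]])
weights = np.array([[1, 1, 1]])

print(weighted_mean_nan(X, weights))   # the values 2, 2 and 1.5
print(np.isnan(X[2, 2]))               # True: X still contains its NaN
print(np.shares_memory(X, X[:]))       # True: X[:] is a view, not a copy
```

A row that consists only of NaNs has a weight sum of zero, so its result is NaN (with a runtime warning) in this sketch.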
I have a numpy array. I want to find the number of points which lie within an epsilon distance of each point.
My current code is (for a n*2 array, but in general I expect the array to be n * m)
epsilon = np.array([0.5, 0.5])
np.array([ 1/float(np.sum(np.all(np.abs(X-x) <= epsilon, axis=1))) for x in X])
But this code might not be efficient when it comes to an array of let us say 1 million rows and 50 columns. Is there a better and more efficient method ?
For example data
X = np.random.rand(10, 2)
you can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
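The full broadcast builds an n × n × m intermediate array, which will not fit in memory for something like a million rows. One possible compromise, sketched below, is to broadcast one block of rows at a time. The function name inverse_neighbor_counts and the chunk_size parameter are assumptions made for this illustration.

```python
import numpy as np

def inverse_neighbor_counts(X, epsilon, chunk_size=1024):
    # For each row of X, count the rows within epsilon in every coordinate
    # (each point counts itself), then return the reciprocal of that count.
    X = np.asarray(X, dtype=float)
    counts = np.empty(len(X), dtype=np.int64)
    for start in range(0, len(X), chunk_size):
        block = X[start:start + chunk_size]            # shape (c, m)
        diff = np.abs(block[:, None, :] - X[None, :, :])   # shape (c, n, m)
        within = np.all(diff <= epsilon, axis=-1)      # shape (c, n)
        counts[start:start + chunk_size] = within.sum(axis=-1)
    return 1.0 / counts

X = np.random.rand(10, 2)
epsilon = np.array([0.5, 0.5])
print(inverse_neighbor_counts(X, epsilon, chunk_size=4))
```

This keeps the intermediate at chunk_size × n × m elements while still vectorizing the inner work. The total amount of computation is still quadratic in the number of rows, so for very large inputs a spatial index would likely be needed instead.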
I'm trying to figure out how to create a script which calculates the standard deviation of the values in a file. As an example, say I downloaded a CSV with a list of values in it. I want to find the SD of these values by running a Python program. We are not using numpy here!
If you allow the use of the standard library,
import math
xs = [0.5,0.7,0.3,0.2] # values (in Python 2 these must be floats, to avoid integer division)
mean = sum(xs) / len(xs) # mean
var = sum(pow(x-mean,2) for x in xs) / len(xs) # population variance (divides by n)
std = math.sqrt(var) # standard deviation
If not, you need to approximate sqrt by hand. For example, you can use binary search or Newton's Method. Here's a wikipedia page for methods of doing so
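As a minimal sketch of the Newton's Method option, the square root can be approximated with the iteration guess = (guess + value / guess) / 2. The function name newton_sqrt, the tolerance and the iteration cap are choices made for this illustration.

```python
def newton_sqrt(value, tolerance=1e-12, max_iterations=100):
    # Approximate the square root of a non-negative number with Newton's Method.
    if value < 0:
        raise ValueError("cannot take the square root of a negative number")
    if value == 0:
        return 0.0
    guess = value if value >= 1 else 1.0   # any positive starting guess converges
    for _ in range(max_iterations):
        next_guess = 0.5 * (guess + value / guess)
        if abs(next_guess - guess) <= tolerance * next_guess:
            return next_guess
        guess = next_guess
    return guess

xs = [0.5, 0.7, 0.3, 0.2]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)   # population variance
std = newton_sqrt(var)
print(std)
```

The loop stops once two successive guesses agree to within the relative tolerance, which usually takes only a handful of iterations.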
With Python 3.4 and above, the standard library includes a module called statistics, which provides the population standard deviation (pstdev), the sample standard deviation (stdev), and other functions.
Here is an example of how to use it:
import statistics
data = [1, 1, 2.5, 6.5, 7.3, 8, 9.2]
print(statistics.pstdev(data))
# 3.2159043543498815
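The statistics module has both pstdev (population, divides by n) and stdev (sample, divides by n - 1). The short sketch below computes both by hand and checks them against the module, using the same data as above.

```python
import math
import statistics

data = [1, 1, 2.5, 6.5, 7.3, 8, 9.2]
n = len(data)
mean = sum(data) / n
squared_deviations = sum((value - mean) ** 2 for value in data)

population_sd = math.sqrt(squared_deviations / n)        # matches statistics.pstdev
sample_sd = math.sqrt(squared_deviations / (n - 1))      # matches statistics.stdev

print(math.isclose(population_sd, statistics.pstdev(data)))   # True
print(math.isclose(sample_sd, statistics.stdev(data)))        # True
```

Which one is appropriate depends on whether the values are the whole population or a sample drawn from it.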
from math import sqrt
n = [11, 8, 8, 3, 4, 4, 5, 6, 6, 7, 8]
mean = sum(n)/len(n)
SUM = 0
for i in n:
    SUM += (i-mean)**2
stdeV = sqrt(SUM/(len(n)-1))
print(stdeV)
from math import sqrt
filename = r"C:\Users\mmb0368\Desktop\input.txt" # raw string, so the backslashes are kept as-is
with open(filename) as file:
    num_list = [int(line) for line in file if line.strip()] # one integer per non-empty line
mean = sum(num_list)/len(num_list)
print(mean, max(num_list), min(num_list))
# population standard deviation: square root of the mean squared deviation
snDev = sqrt(sum((x - mean)**2 for x in num_list)/len(num_list))
print(snDev)
from math import sqrt
def getAverage(mylist):
    """
    This function calculates the average of a list of numbers.
    Parameters:
    mylist (list): List of numbers
    Returns:
    float: Average of the numbers in the list
    Example:
    >>> getAverage([1,5,10])
    5.333333333333333
    """
    return sum(mylist)/len(mylist)
def getStandardDeviation(mylist):
    """
    This function calculates the standard deviation of a list of numbers.
    Parameters:
    mylist (list): List of numbers
    Returns:
    float: Standard deviation of the numbers in the list
    Example:
    >>> getStandardDeviation([1,5,10])
    4.509249752822894
    """
    average = getAverage(mylist) # compute the average once, outside the loop
    ls=[]
    for i in mylist:
        ls.append((i - average)**2)
    return sqrt( sum(ls) / (len(mylist) - 1) )
mylist = [1,5,10]
getAverage(mylist=mylist)
# 5.333333333333333
getStandardDeviation(mylist=mylist)
# 4.509249752822894
This code contains two functions, getAverage and getStandardDeviation, for calculating the average and the standard deviation of a list of numbers. getAverage takes a list of numbers and returns their average. getStandardDeviation takes a list of numbers and returns their sample standard deviation: it computes the squared difference of each number from the average, divides the sum of those squared differences by n - 1, and takes the square root. A sample list, mylist, is defined at the end, and both functions are called with it as the argument.
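Since the question is about values stored in a CSV file, here is a minimal sketch of the reading step using only the standard library. The function names read_values and sample_standard_deviation, the column parameter, and the choice to skip non-numeric rows (such as a header) are all assumptions made for this illustration.

```python
import csv
import math

def read_values(lines, column=0):
    # `lines` is any iterable of text lines, e.g. an open file object.
    values = []
    for row in csv.reader(lines):
        try:
            values.append(float(row[column]))
        except (ValueError, IndexError):
            continue   # skip headers, blank lines and malformed rows
    return values

def sample_standard_deviation(values):
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / len(values)
    squared = sum((value - mean) ** 2 for value in values)
    return math.sqrt(squared / (len(values) - 1))

# In-memory stand-in for a file; with a real file one would pass
# open(path, newline="") inside a with-statement instead.
lines = ["value", "1", "5", "10"]
print(sample_standard_deviation(read_values(lines)))
```

With the three values 1, 5 and 10 this reproduces the result of getStandardDeviation above.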