I made a simple function that produces a weighted average of several time series using supplied weights. It is designed to handle missing values (NaNs), which is why I am not using numpy's supplied average function.
However, when I feed it my array containing missing values, the array has its NaN values replaced by 0s! I assumed that since I assign the array to a new name inside the function and it is not a global variable, this should not happen. I want my X array to retain its original form, including the NaN value.
I am a relative novice using python (obviously).
Example:
X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]]) # 3 time series to be weighted together
weights = np.array([[1,1,1]]) # simple example with weights for each series as 1
def WeightedMeanNaN(Tseries, weights):
    ## calculates weighted mean
    N_Tseries = Tseries
    Weights = np.repeat(weights, len(N_Tseries), axis=0)  # make a vector of weights matching size of time series
    loc = np.where(np.isnan(N_Tseries))  # get location of nans
    Weights[loc] = 0
    N_Tseries[loc] = 0
    Weights = Weights/Weights.sum(axis=1)[:,None]  # normalize each row so that weights sum to 1
    WeightedAve = np.multiply(N_Tseries, Weights)
    WeightedAve = WeightedAve.sum(axis=1)
    return WeightedAve
WeightedMeanNaN(Tseries = X, weights = weights)
Out[161]: array([2. , 2. , 1.5])
In: X
Out:
array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 0.]])  # no longer NaN!!
Where you call
loc = np.where(np.isnan(N_Tseries))  # get location of nans
Weights[loc] = 0
N_Tseries[loc] = 0
you remove all NaNs by setting them to zeros.
To reverse this you could iterate over the array afterwards and replace the zeros with NaNs; however, that would also turn genuine zeros into NaNs.
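A safer sketch (just illustrating the idea, reusing the X and loc names from the question) is to remember where the NaNs were and restore only those positions:
import numpy as np

X = np.array([[1, 2, 3], [1, 2, 3], [1, 2, np.nan]])

loc = np.where(np.isnan(X))  # remember where the NaNs were
X[loc] = 0                   # zero them out for the computation
# ... weighted-average arithmetic goes here ...
X[loc] = np.nan              # restore the NaNs afterwards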
So it turns out this was a mistake caused by me being used to working in MATLAB. Python passes arguments to a function as references to the original objects, so mutating them inside the function mutates the caller's array. MATLAB, in contrast, effectively gives the function its own copy, which is discarded when the function ends.
I solved my problem by adding .copy() when assigning variables in the function, so that the first line of the function above becomes:
N_Tseries = Tseries.copy()
However, one thing that puzzled me is that some people suggested that Tseries[:] should also create a copy of Tseries rather than a reference to the original variable. This did not work for me, because slicing a NumPy array returns a view that shares the original data; only plain Python lists are copied by [:].
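A minimal sketch of the difference, using a small example array:
import numpy as np

a = np.array([1.0, 2.0, np.nan])

view = a[:]       # slicing a NumPy array gives a view sharing a's data
copy = a.copy()   # an independent copy

view[2] = 0       # writing through the view changes a as well
print(a)          # [1. 2. 0.]

copy[0] = 99      # writing to the copy leaves a untouched
print(a)          # [1. 2. 0.]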
I found this answer useful:
Python function not supposed to change a global variable
In PyTorch, given a tensor of size [3], how do I expand it by several dimensions to size [3, 2, 5, 5] such that the added dimensions take the corresponding values from the original tensor? For example, given the size-[3] vector [1, 2, 3], the first [2, 5, 5] block should have all values 1, the second all values 2, and the third all values 3.
In addition, how do I expand a vector of size [3, 2] to [3, 2, 5, 5]?
One way I can think of is to create a tensor of the target size filled with ones and then use einsum, but I think there should be an easier way.
You can first unsqueeze the appropriate number of singleton dimensions, then expand them to a view with the target shape using torch.Tensor.expand:
>>> x = torch.rand(3)
>>> target = [3,2,5,5]
>>> x[:, None, None, None].expand(target)
A nice workaround is to use torch.Tensor.reshape or torch.Tensor.view to perform the multiple unsqueezes in one call:
>>> x.view(-1, 1, 1, 1).expand(target)
This allows for a more general approach to handle any arbitrary target shape:
>>> x.view(len(x), *(1,)*(len(target)-1)).expand(target)
For an even more general implementation, where x can be multi-dimensional:
>>> x = torch.rand(3, 2)
# just to make sure the target shape is valid w.r.t. x
>>> assert list(x.shape) == list(target[:x.ndim])
>>> x.view(*x.shape, *(1,)*(len(target)-x.ndim)).expand(target)
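As a quick self-contained check of the multi-dimensional case (the target shape [3, 2, 5, 5] is just the one from the question):
import torch

x = torch.rand(3, 2)
target = [3, 2, 5, 5]

y = x.view(*x.shape, *(1,) * (len(target) - x.ndim)).expand(target)

print(y.shape)  # torch.Size([3, 2, 5, 5])
# every 5x5 slice repeats the corresponding scalar from x
print(torch.equal(y[1, 0], x[1, 0] * torch.ones(5, 5)))  # True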
I have the following code segment to generate random samples. The generated samples form a list, where each entry of the list is a tensor with two elements. I would like to extract the first element from all tensors in the list, and likewise the second element from all tensors in the list. How can I perform this kind of tensor slicing operation?
import torch
import pyro
import pyro.distributions as dist
num_samples = 250
# note that both covariance matrices are diagonal
mu1 = torch.tensor([0., 5.])
sig1 = torch.tensor([[2., 0.], [0., 3.]])
dist1 = dist.MultivariateNormal(mu1, sig1)
samples1 = [pyro.sample('samples1', dist1) for _ in range(num_samples)]
samples1
I'd recommend torch.cat with a list comprehension. Note that torch.cat needs inputs that are at least 1-D, so slice each tensor (t[:1], t[1:]) rather than indexing it (t[0], t[1]), which would produce 0-D tensors:
col1 = torch.cat([t[:1] for t in samples1])
col2 = torch.cat([t[1:] for t in samples1])
Docs for torch.cat: https://pytorch.org/docs/stable/generated/torch.cat.html
ALTERNATIVELY
You could turn your list of 1D tensors into a single big 2D tensor using torch.stack, then do a normal slice:
samples1_t = torch.stack(samples1)
col1 = samples1_t[:, 0] # : means all rows
col2 = samples1_t[:, 1]
Docs for torch.stack: https://pytorch.org/docs/stable/generated/torch.stack.html
I should mention that PyTorch tensors support unpacking out of the box, which means you can unpack the first axis into multiple variables without additional considerations. Here torch.stack will output a tensor of shape (rows, cols); we just need to transpose it to (cols, rows) and unpack:
>>> c1, c2 = torch.stack(samples1).T
So you get c1 and c2 shaped (rows,):
>>> c1
tensor([0.6433, 0.4667, 0.6811, 0.2006, 0.6623, 0.7033])
>>> c2
tensor([0.2963, 0.2335, 0.6803, 0.1575, 0.9420, 0.6963])
Other answers that suggest .stack() or .cat() are perfectly fine from a PyTorch perspective.
However, since the context of the question involves pyro, may I add the following:
Since you are doing IID samples
[pyro.sample('samples1', dist1) for _ in range(num_samples)]
A better way to do it with pyro is
dist1 = dist.MultivariateNormal(mu1, sig1).expand([num_samples])
This tells pyro that the distribution is batched with a batch size of num_samples. Sampling from this will produce
>>> dist1.sample()
tensor([[-0.8712, 6.6087],
[ 1.6076, -0.2939],
[ 1.4526, 6.1777],
...
[-0.0168, 7.5085],
[-1.6382, 2.1878]])
Now it's easy to solve your original question. Just slice it like
samples = dist1.sample()
samples[:, 0] # all first elements
samples[:, 1] # all second elements
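If you also want the draw recorded as a pyro sample statement rather than taken directly from the distribution, a minimal sketch (same names as above) would be:
samples = pyro.sample('samples1', dist1)  # shape (num_samples, 2), one call instead of a Python loop

col1 = samples[:, 0]  # all first elements
col2 = samples[:, 1]  # all second elements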
SUM function results explanation when given two 2-d arrays
When I run the code in the Spyder IDE, the sum function and the numpy.add function show different results. Can anyone help me understand how the sum function's output comes about when we give it two 2-D arrays for its two parameters instead of an array and a number? Thank you.
import numpy as np
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
print(x)
print(y)
print (x+y)
print(sum(x,y))
print(np.add(x,y))
Output
[[1. 2.]
 [3. 4.]]
[[5. 6.]
 [7. 8.]]
[[ 6.  8.]
 [10. 12.]]
[[ 9. 12.]
 [11. 14.]]
[[ 6.  8.]
 [10. 12.]]
In NumPy, the + operator is defined to be element-wise addition and is in fact equivalent to np.add(...).
The built-in sum(iterable, start=0) function
sums start and the items of the iterable from left to right and returns the total; start defaults to 0.
So if given only one matrix, it performs a column-wise summation (adding the rows together). If given a second argument, that argument is used as the start value and is added (element-wise, via broadcasting) to that sum. Some smaller examples might be
sum(x)
> array([4., 6.])
# i.e. [(1+3), (2+4)]
sum(x, 1)
> array([5., 7.])
# i.e. [(1+1+3), (1+2+4)]
sum(y)
> array([12., 14.])
# i.e. [(5+7), (6+8)]
sum(x, sum(y))
> array([16., 20.])
# i.e. [((5+7)+1+3), ((6+8)+2+4)]
sum(x, y)
> array([[ 9., 12.],
[11., 14.]])
# i.e. [[(5+1+3), (6+2+4)],
# [(7+1+3), (8+2+4)]]
The last sum() performs the column-wise sum of x and then adds the result to every row of y, broadcasting over the shared columns. Written with NumPy, it's equivalent to
sum(x, y) == x.sum(axis=0) + y
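A small self-contained check of that equivalence:
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# built-in sum starts from y and adds the rows of x one by one: y + x[0] + x[1]
print(np.array_equal(sum(x, y), x.sum(axis=0) + y))  # True
print(np.array_equal(x + y, np.add(x, y)))           # True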
I have the following array:
video_132.shape
Out[64]: (64, 3)
to which I would like to append a new vector of three values,
video_146[1][146][45]
such that
video_146[1][146][45].shape
Out[68]: (3,)
and
video_146[1][146][45]
Out[69]: array([217, 207, 198], dtype=uint8)
When I do the following
np.append(video_132,video_146[1][146][45])
I'm expecting to get
video_132.shape
Out[64]: (65, 3) # originally (64,3)
However, I get:
Out[67]: (195,) # 64*3+3=195
It seems that it flattens the array.
How can I do the append while preserving the 2-D structure?
For visual simplicity let's rename video_132 --> a, and video_146[1][146][45] --> b. The particular values aren't important so let's say
In [82]: a = np.zeros((64, 3))
In [83]: b = np.ones((3,))
Then we can append b to a using:
In [84]: np.concatenate([a, b[None, :]]).shape
Out[84]: (65, 3)
Since np.concatenate returns a new array, reassign its return value to a to "append" b to a:
a = np.concatenate([a, b[None, :]])
Code for append:
def append(arr, values, axis=None):
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)
Note how arr is raveled if no axis is provided
In [57]: np.append(np.ones((2,3)),2)
Out[57]: array([1., 1., 1., 1., 1., 1., 2.])
append is really aimed at simple cases like adding a scalar to a 1d array:
In [58]: np.append(np.arange(3),6)
Out[58]: array([0, 1, 2, 6])
Otherwise the behavior is hard to predict.
concatenate is the base operation (compiled) and takes a list of arrays, not just two. So we can collect many arrays (or lists) in one list and do a single concatenate at the end of a loop. And since it doesn't tweak the dimensions beforehand, it forces us to do that ourselves.
So to add a shape (3,) array to a (64, 3) one, we have to transform the (3,) into (1, 3). append requires the same dimension adjustment as concatenate if we specify the axis.
In [68]: np.append(arr,b[None,:], axis=0).shape
Out[68]: (65, 3)
In [69]: np.concatenate([arr,b[None,:]], axis=0).shape
Out[69]: (65, 3)
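For completeness, a few equivalent ways to give b the leading axis before concatenating (a sketch reusing the a and b names from above):
import numpy as np

a = np.zeros((64, 3))
b = np.ones((3,))

r1 = np.concatenate([a, b[None, :]])        # index with None / np.newaxis
r2 = np.concatenate([a, b.reshape(1, -1)])  # reshape to (1, 3)
r3 = np.vstack([a, b])                      # vstack promotes (3,) to (1, 3) itself

print(r1.shape, r2.shape, r3.shape)  # (65, 3) (65, 3) (65, 3)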
I am pre-processing a numpy array and want to feed it in as a TensorFlow Variable. I've tried following other Stack Exchange advice, but so far without success. I would like to see if I'm doing something uniquely wrong here.
import numpy as np
import tensorflow as tf

npW = np.zeros((784,10))
npW[0,0] = 20
W = tf.Variable(tf.convert_to_tensor(npW, dtype = tf.float32))
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
print("npsum", np.sum(npW))
print(tf.reduce_sum(W))
And this is the result.
npsum 20.0
Tensor("Sum:0", shape=(), dtype=float32)
I don't know why the reduced sum of the W variable remains zero. Am I missing something here?
You need to understand that TensorFlow (in graph mode) differs from traditional computing. First, you declare a computational graph. Then, you run operations through the graph.
Taking your example, you have your numpy variables:
npW = np.zeros((784,10))
npW[0,0] = 20
Next, these instructions define TensorFlow variables, i.e. nodes in the computational graph:
W = tf.Variable(tf.convert_to_tensor(npW, dtype = tf.float32))
sum = tf.reduce_sum(W)
And to be able to compute the operation, you need to run the op through the graph with a session, i.e.:
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
result = sess.run(sum)
print(result) # print 20
Another way is to call eval instead of sess.run()
print(sum.eval()) # print 20
So I tested it a bit differently and found out that the variable is getting assigned properly, but the reduce_sum function isn't working as expected. If anyone has an explanation for that, it would be much appreciated.
npW = np.zeros((2,2))
npW[0,0] = 20
W = tf.Variable(npW, dtype = tf.float32)
A= tf.constant([[20,0],[0,0]])
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Train
print("npsum", np.sum(npW))
x=tf.reduce_sum(W,0)
print(x)
print(tf.reduce_sum(A))
print(W.eval())
print(A.eval())
This had output
npsum 20.0
Tensor("Sum:0", shape=(2,), dtype=float32)
Tensor("Sum_1:0", shape=(), dtype=int32)
[[20.  0.]
 [ 0.  0.]]
[[20  0]
 [ 0  0]]
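The reduce_sum ops here behave the same way as in the answer above: printing them shows the graph nodes, not their values. Evaluating them in the session (a sketch continuing the snippet above, TF 1.x graph mode assumed) gives the expected numbers:
print(sess.run(x))                 # [20.  0.]  -- column sums of W
print(tf.reduce_sum(A).eval())     # 20
print(sess.run(tf.reduce_sum(W)))  # 20.0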