How to do numpy matmul broadcasting between two numpy tensors? - python-3.x

I have the Pauli matrices which are (2x2) and complex
II = np.identity(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
and a depolarizing_error function, which takes a normally distributed random number param generated by np.random.normal(noise_mean, noise_sd):
def depolarizing_error(param):
    XYZ = np.sqrt(param/3)*np.array([X, Y, Z])
    return np.array([np.sqrt(1-param)*II, XYZ[0], XYZ[1], XYZ[2]])
Now if I feed in a single number for param of let's say a, my function should return an output of np.array([np.sqrt(1-a)*II, a*X, a*Y, a*Z]) where a is a float and * denotes the element-wise multiplication between a and the entries of the (2x2) matrices II, X, Y, Z.
Now for vectorization purposes, I wish to feed in an array of param i.e.
param = np.array([a, b, c, ..., n]) Eqn(1)
again with all a, b, c, ..., n generated independently by np.random.normal(noise_mean, noise_sd) (I think it's doable with np.random.normal(noise_mean, noise_sd, n) or something)
such that my function now returns:
np.array([[np.sqrt(1-a)*II, a*X, a*Y, a*Z],
[np.sqrt(1-b)*II, b*X, b*Y, b*Z],
................................,
[np.sqrt(1-n)*II, n*X, n*Y, n*Z]])
I thought that passing np.random.normal(noise_mean, noise_sd, n) as param, i.e. np.array([a, b, c, ..., n]), would sort itself out and return what I want above, but my XYZ = np.sqrt(param/3)*np.array([X, Y, Z]) ended up doing an element-wise dot product instead of an element-wise multiplication. I tried using param as np.array([a, b])
and ended up with
np.array([np.dot(np.sqrt(1-[a, b]), II),
np.dot(np.sqrt([a, b]/3), X),
np.dot(np.sqrt([a, b]/3), Y),
np.dot(np.sqrt([a, b]/3), Z)])
instead. So far I've tried something like
def depolarizing_error(param):
    XYZ = np.sqrt(param/3)@np.array([X, Y, Z])
    return np.array([np.sqrt(1-param)*II, XYZ[0], XYZ[1], XYZ[2]])
thinking that the matmul operator @ would just broadcast it conveniently for me, but then I got really bogged down by the dimensions.
My motivation for all this is that I have another matrix, given by:
def random_angles(sd, seq_length):
    return np.random.normal(0, sd, (seq_length,3))
def unitary_error(params):
    e_1 = np.exp(-1j*(params[:,0]+params[:,2])/2)*np.cos(params[:,1]/2)
    e_2 = np.exp(-1j*(params[:,0]-params[:,2])/2)*np.sin(params[:,1]/2)
    return np.array([[e_1, e_2], [-e_2.conj(), e_1.conj()]],
                    dtype=complex).transpose(2,0,1)
Here seq_length equals the number of entries of param in Eqn(1); call it N = seq_length = |param|. My unitary_error function should then give me an output of
np.array([V_1, V_2, ..., V_N])
such that I can use np.matmul to vectorize the computation like this
np.array([V_1, V_2, ..., V_N])@np.array([[np.sqrt(1-a)*II, a*X, a*Y, a*Z],
                                         [np.sqrt(1-b)*II, b*X, b*Y, b*Z],
                                         ................................,
                                         [np.sqrt(1-n)*II, n*X, n*Y, n*Z]])@np.array([V_1, V_2, ..., V_N])
to finally give
np.array([[V_1@np.sqrt(1-a)*II@V_1, V_1@a*X@V_1, V_1@a*Y@V_1, V_1@a*Z@V_1],
          [V_2@np.sqrt(1-b)*II@V_2, V_2@b*X@V_2, V_2@b*Y@V_2, V_2@b*Z@V_2],
          ................................,
          [V_N@np.sqrt(1-n)*II@V_N, V_N@n*X@V_N, V_N@n*Y@V_N, V_N@n*Z@V_N]])
where @ denotes the matrix product (np.matmul) applied to each pair of (2x2) matrices.
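One way to get these shapes, offered as a sketch under assumptions rather than as an accepted answer: give the length-N parameter array three trailing singleton axes so that it broadcasts against the stacked Pauli matrices into an (N, 4, 2, 2) array, then add one singleton axis to the (N, 2, 2) stack of unitaries so that @ pairs each V_i with all four of its operators. The sketch assumes every entry of param lies in [0, 1] (a raw normal draw can be negative, which would turn the square roots into NaN), keeps the sqrt(param/3) prefactor from the function above, and uses made-up parameter and angle values. Multiplying by the same V on both sides follows the formula in the question.

```python
import numpy as np

II = np.identity(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing_error(param):
    """Return an (N, 4, 2, 2) stack of operators for N parameters in [0, 1]."""
    param = np.atleast_1d(np.asarray(param, dtype=float))   # (N,)
    scale = param[:, None, None, None]                      # (N, 1, 1, 1)
    identity_term = np.sqrt(1 - scale) * II                 # (N, 1, 2, 2)
    pauli_terms = np.sqrt(scale / 3) * np.array([X, Y, Z])  # (N, 3, 2, 2)
    return np.concatenate([identity_term, pauli_terms], axis=1)

def unitary_error(params):
    """Same construction as in the question: (N, 3) angles -> (N, 2, 2)."""
    e_1 = np.exp(-1j * (params[:, 0] + params[:, 2]) / 2) * np.cos(params[:, 1] / 2)
    e_2 = np.exp(-1j * (params[:, 0] - params[:, 2]) / 2) * np.sin(params[:, 1] / 2)
    return np.array([[e_1, e_2], [-e_2.conj(), e_1.conj()]],
                    dtype=complex).transpose(2, 0, 1)

# Illustrative values only (assumptions, not from the question).
param = np.array([0.03, 0.12, 0.07])
angles = np.array([[0.1, 0.2, 0.3], [0.0, 0.5, -0.4], [1.0, -0.2, 0.6]])

K = depolarizing_error(param)      # (3, 4, 2, 2)
V = unitary_error(angles)          # (3, 2, 2)
out = V[:, None] @ K @ V[:, None]  # (3, 1, 2, 2) broadcasts against (3, 4, 2, 2)
print(K.shape, V.shape, out.shape)
```

The key point is that `*` broadcasts element-wise over all axes, while `@` treats only the last two axes as matrices and broadcasts over the leading ones, so the leading axes only need to be made compatible.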

Related

Multiply a [3, 2, 3] by a [3, 2] tensor in pytorch (dot product along dimension)

Given the following tensors x and y with shapes [3,2,3] and [3,2], I want to multiply the tensors along the 2nd dimension. This is expected to be a kind of dot product and scaling along the axis, and should return a [3,2,3] tensor.
import torch
a = [[[0.2,0.3,0.5],[-0.5,0.02,1.0]],[[0.01,0.13,0.06],[0.35,0.12,0.0]], [[1.0,-0.3,1.0],[1.0,0.02, 0.03]] ]
b = [[1,2],[1,3],[0,2]]
x = torch.FloatTensor(a) # shape [3,2,3]
y = torch.FloatTensor(b) # shape [3,2]
The expected output :
Expected output shape should be [3,2,3]
#output = [[[0.2,0.3,0.5],[-1.0,0.04,2.0]],[[0.01,0.13,0.06],[1.05,0.36,0.0]], [[0.0,0.0,0.0],[2.0,0.04, 0.06]] ]
I have tried the two calls below, but neither gives the desired output or output shape.
torch.matmul(x,y)
torch.matmul(x,y.unsqueeze(1).shape)
What is the best way to fix this?
This is just a broadcasted multiply. You can insert a singleton (size-1) dimension at the end of y to make it a [3,2,1] tensor and then multiply by x. There are multiple ways to insert singleton dimensions.
# all equivalent
x * y.unsqueeze(2)
x * y[..., None]
x * y[:, :, None]
x * y.reshape(3, 2, 1)
You could also use torch.einsum.
torch.einsum('abc,ab->abc', x, y)
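As a quick check, the variants above can be compared against the expected output stated in the question. This is only a sanity-check sketch; the dictionary name `variants` is introduced here for illustration.

```python
import torch

x = torch.tensor([[[0.2, 0.3, 0.5], [-0.5, 0.02, 1.0]],
                  [[0.01, 0.13, 0.06], [0.35, 0.12, 0.0]],
                  [[1.0, -0.3, 1.0], [1.0, 0.02, 0.03]]])  # shape [3, 2, 3]
y = torch.tensor([[1, 2], [1, 3], [0, 2]], dtype=torch.float32)  # shape [3, 2]

# Expected output copied from the question.
expected = torch.tensor([[[0.2, 0.3, 0.5], [-1.0, 0.04, 2.0]],
                         [[0.01, 0.13, 0.06], [1.05, 0.36, 0.0]],
                         [[0.0, 0.0, 0.0], [2.0, 0.04, 0.06]]])

variants = {
    "unsqueeze": x * y.unsqueeze(2),
    "ellipsis": x * y[..., None],
    "reshape": x * y.reshape(3, 2, 1),
    "einsum": torch.einsum('abc,ab->abc', x, y),
}

for name, out in variants.items():
    # Each variant scales x[i, j, :] by the scalar y[i, j].
    assert out.shape == (3, 2, 3), name
    assert torch.allclose(out, expected), name
```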

Slicing a tensor with a dimension varying

I'm trying to slice a PyTorch tensor my_tensor of dimensions s x b x c so that the slicing along the first dimension varies according to a tensor indices of length b, to the effect of:
my_tensor[0:indices, torch.arange(0, b, dtype=torch.long), :] = something
The code above doesn't work and receives the error TypeError: tuple indices must be integers or slices, not tuple.
What I'm aiming for is, for example, if indices = torch.tensor([3, 5, 4]) then:
my_tensor[0:3, 0, :] = something
my_tensor[0:5, 1, :] = something
my_tensor[0:4, 2, :] = something
I'm hoping for a tensorized way to do this so I don't have to resort to a for loop. Also, the method needs to be compatible with TorchScript. Thanks very much.
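One loop-free possibility, sketched here as an assumption and not taken from an answer: compare a column of row positions against `indices` to build an s x b boolean mask, then select between the new value and the old tensor with `torch.where`. The helper name `fill_prefix` is hypothetical, and TorchScript compatibility has not been verified for this sketch.

```python
import torch

def fill_prefix(my_tensor, indices, value):
    """Set my_tensor[0:indices[j], j, :] to value for every column j."""
    s = my_tensor.shape[0]
    positions = torch.arange(s, device=my_tensor.device).unsqueeze(1)  # (s, 1)
    mask = positions < indices.unsqueeze(0)                            # (s, b)
    value = torch.as_tensor(value, dtype=my_tensor.dtype, device=my_tensor.device)
    # mask gets a trailing axis so it broadcasts over the c dimension.
    return torch.where(mask.unsqueeze(-1), value, my_tensor)

my_tensor = torch.zeros(6, 3, 2)
indices = torch.tensor([3, 5, 4])
out = fill_prefix(my_tensor, indices, 1.0)

# Reference result built with the explicit loop the question wants to avoid.
expected = my_tensor.clone()
for col, stop in enumerate(indices.tolist()):
    expected[:stop, col, :] = 1.0
assert torch.equal(out, expected)
```

This returns a new tensor instead of assigning in place; `value` may also be a tensor that broadcasts against the s x b x c shape.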

Meaning of grad_outputs in PyTorch's torch.autograd.grad

I am having trouble understanding the conceptual meaning of the grad_outputs option in torch.autograd.grad.
The documentation says:
grad_outputs should be a sequence of length matching output containing the “vector” in Jacobian-vector product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn’t require_grad, then the gradient can be None).
I find this description quite cryptic. What exactly do they mean by Jacobian-vector product? I know what the Jacobian is, but not sure about what product they mean here: element-wise, matrix product, something else? I can't tell from my example below.
And why is "vector" in quotes? Indeed, in the example below I get an error when grad_outputs is a vector, but not when it is a matrix.
>>> x = torch.tensor([1.,2.,3.,4.], requires_grad=True)
>>> y = torch.outer(x, x)
Why do we observe the following output; how was it computed?
>>> y
tensor([[ 1., 2., 3., 4.],
[ 2., 4., 6., 8.],
[ 3., 6., 9., 12.],
[ 4., 8., 12., 16.]], grad_fn=<MulBackward0>)
>>> torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))
(tensor([20., 20., 20., 20.]),)
However, why this error?
>>> torch.autograd.grad(y, x, grad_outputs=torch.ones_like(x))
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([4]) and output[0] has a shape of torch.Size([4, 4]).
If we take your example we have function f which takes as input x shaped (n,) and outputs y = f(x) shaped (n, n). The input is described as column vector [x_i]_i for i ∈ [1, n], and f(x) is defined as matrix [y_jk]_jk = [x_j*x_k]_jk for j, k ∈ [1, n]².
It is often useful to compute the gradient of the output with respect to the input (or sometimes w.r.t the parameters of f, there are none here). In the more general case though, we are looking to compute dL/dx and not just dy/dx, where dL/dx is the partial derivative of L, computed from y, w.r.t. x.
The computation graph looks like:
x.grad = dL/dx <------- dL/dy = y.grad
                 dy/dx
             x -------> y = x*xT
Then, via the chain rule, dL/dx is equal to dL/dy*dy/dx. Looking at the interface of torch.autograd.grad, we have the following correspondences:
outputs <-> y,
inputs <-> x, and
grad_outputs <-> dL/dy.
Looking at the shapes: dL/dx should have the same shape as x (dL/dx can be referred to as the 'gradient' of x), while dy/dx, the Jacobian matrix, would be 3-dimensional. On the other hand dL/dy, which is the incoming gradient, should have the same shape as the output, i.e., y's shape.
We want to compute dL/dx = dL/dy*dy/dx. If we look more closely, we have
dy/dx = [dy_jk/dx_i]_ijk for i, j, k ∈ [1, n]³
Therefore,
dL/dx = [dL/dx_i]_i, i ∈ [1, n]
      = [sum(dL/dy_jk * dy_jk/dx_i) over j, k ∈ [1, n]²]_i, i ∈ [1, n]
Back to your example: grad_outputs=torch.ones_like(y) means dL/dy_jk = 1 for all j, k, so for a given i ∈ [1, n], dL/dx_i = sum(dy_jk/dx_i) over j, k ∈ [1, n]². Here dy_jk/dx_i = d(x_j*x_k)/dx_i equals x_j if i = k ≠ j, x_k if i = j ≠ k, 2*x_i if i = j = k (because of the squared x_i), and 0 otherwise. Since matrix y is symmetric, every x_m with m ≠ i appears twice in the sum (once from j = i, once from k = i), and the diagonal term contributes 2*x_i. So the result comes down to 2*sum(x_m) over m ∈ [1, n].
This means dL/dx is the column vector [2*sum(x)]_i for i ∈ [1, n].
>>> 2*x.sum()*torch.ones_like(x)
tensor([20., 20., 20., 20.])
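The contraction described above can also be checked numerically. The following is a sketch that builds the full 3-dimensional Jacobian with `torch.autograd.functional.jacobian` and contracts it with `grad_outputs` by hand; the names `J`, `vjp` and `manual` are introduced here for illustration.

```python
import torch
from torch.autograd.functional import jacobian

x = torch.tensor([1., 2., 3., 4.], requires_grad=True)
y = torch.outer(x, x)

# grad_outputs plays the role of dL/dy and must have the shape of y.
grad_outputs = torch.ones_like(y)
(vjp,) = torch.autograd.grad(y, x, grad_outputs=grad_outputs)

# Full Jacobian dy_jk/dx_i, indexed [j, k, i], shape (4, 4, 4).
J = jacobian(lambda t: torch.outer(t, t), x.detach())

# dL/dx_i = sum over j, k of dL/dy_jk * dy_jk/dx_i
manual = torch.einsum('jk,jki->i', grad_outputs, J)

assert torch.allclose(vjp, manual)
print(vjp)
```

In practice autograd never materializes `J`; it computes the contracted result directly, which is why only the "vector" (here a matrix shaped like y) has to be supplied.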
Stepping back, look at this other graph example, which adds an additional operation after y:
x -------> y = x*xT --------> z = y²
If you look at the backward pass on this graph, you have:
dL/dx <------- dL/dy <-------- dL/dz
        dy/dx          dz/dy
    x -------> y = x*xT --------> z = y²
With dL/dx = dL/dy*dy/dx = dL/dz*dz/dy*dy/dx which is in practice computed in two sequential steps: dL/dy = dL/dz*dz/dy, then dL/dx = dL/dy*dy/dx.
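The two sequential steps can be reproduced by calling `torch.autograd.grad` twice and feeding the result of the first call in as `grad_outputs` of the second. This is a sketch that assumes dL/dz is all ones, i.e. L = z.sum().

```python
import torch

x = torch.tensor([1., 2., 3., 4.], requires_grad=True)
y = torch.outer(x, x)
z = y ** 2

dL_dz = torch.ones_like(z)  # assumption: L = z.sum()

# Step 1: dL/dy = dL/dz * dz/dy
(dL_dy,) = torch.autograd.grad(z, y, grad_outputs=dL_dz, retain_graph=True)
# Step 2: dL/dx = dL/dy * dy/dx, with dL/dy passed in as grad_outputs
(dL_dx_two_step,) = torch.autograd.grad(y, x, grad_outputs=dL_dy, retain_graph=True)

# Same gradient computed in a single call through the whole graph.
(dL_dx_direct,) = torch.autograd.grad(z, x, grad_outputs=dL_dz)

assert torch.allclose(dL_dx_two_step, dL_dx_direct)
print(dL_dx_direct)
```

Here L = (sum of x_i²)², so the analytic gradient is 4 * x * sum(x²), which is another way to check the result.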

Double Trapezoidal Integral in numpy

I have a two-dimensional function $f(x,y)=\exp(y-x)$. I would like to compute the double integral $\int_{0}^{10}\int_{0}^{10}f(x,y) dx dy$ using NumPy trapz. After some reading, they say I should just repeat the trapz twice but it's not working. I have tried the following
import numpy as np
def distFunc(x,y):
    f = np.exp(-x+y)
    return f
# Values in x to evaluate the integral.
x = np.linspace(.1, 10, 100)
y = np.linspace(.1, 10, 100)
list1=distFunc(x,y)
int_exp2d = np.trapz(np.trapz(list1, y, axis=0), x, axis=0)
The code always gives the error
IndexError: list assignment index out of range
I don't know how to fix this so that the code can work. I thought the inner trapz was to integrate along y first then we end by the second along x. Thank you.
You need to convert x and y to 2D arrays, which can be done conveniently in NumPy with np.meshgrid. This way, distFunc returns a 2D array that can be integrated along one axis first and then the other. As your code stands, distFunc receives two 1D arrays and returns a 1D array, so the inner np.trapz already reduces it to a scalar, and the outer np.trapz then has nothing left to integrate along axis 0.
import numpy as np
def distFunc(x,y):
    f = np.exp(-x+y)
    return f
# Values in x to evaluate the integral.
x = np.linspace(.1, 10, 100)
y = np.linspace(.1, 10, 100)
X, Y = np.meshgrid(x, y)
list1=distFunc(X, Y)
int_exp2d = np.trapz(np.trapz(list1, y, axis=0), x, axis=0)
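To confirm that the nested call integrates over both axes, the result can be compared with the closed-form value of the integral over the sampled range [0.1, 10] x [0.1, 10], which factorizes into (e^-0.1 - e^-10) * (e^10 - e^0.1). The sketch below also uses `np.trapezoid` when it is available, since NumPy 2.0 introduced that name and deprecated `np.trapz`; the names `trapezoid`, `dist_func` and `rel_error` are introduced here for illustration.

```python
import numpy as np

# NumPy 2.0 renamed trapz to trapezoid; fall back for older versions.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

def dist_func(x, y):
    return np.exp(-x + y)

x = np.linspace(0.1, 10, 100)
y = np.linspace(0.1, 10, 100)

# With the default indexing, X and Y have shape (len(y), len(x)),
# so axis 0 runs over y and axis 1 runs over x.
X, Y = np.meshgrid(x, y)
values = dist_func(X, Y)

# Inner call integrates over y (axis 0); the outer call integrates the
# remaining 1D array over x.
numeric = trapezoid(trapezoid(values, y, axis=0), x, axis=0)

exact = (np.exp(-0.1) - np.exp(-10)) * (np.exp(10) - np.exp(0.1))
rel_error = abs(numeric - exact) / exact
print(numeric, exact, rel_error)
```

Note that the grid starts at 0.1, so this approximates the integral over [0.1, 10]² and not over [0, 10]² as written in the question.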

Masking and Instance Normalization in PyTorch

Assume I have a PyTorch tensor, arranged as shape [N, C, L] where N is the batch size, C is the number of channels or features, and L is the length. In this case, if one wishes to perform instance normalization, one does something like:
N = 20
C = 100
L = 40
m = nn.InstanceNorm1d(C, affine=True)
input = torch.randn(N, C, L)
output = m(input)
This will perform a normalization along the L dimension for each of the N*C = 2000 slices of data, subtracting 2000 means, scaling by 2000 standard deviations, and re-scaling by 100 learnable weight and bias parameters (one per channel). The unspoken assumption here is that all of these values exist and are meaningful.
But I have a situation where, for the slice N=1, I would like to exclude all data after (say) L=35. For the slice N=2 (say) all the data are valid. For the slice N=3, exclude all data after L=30, etc. This mimics data which are one dimensional time sequences, having multiple features, but which are not the same length.
How can I perform an instance norm on such data, get correct statistics, and maintain differentiability/AutoGrad information in PyTorch?
Update: While maintaining GPU performance, or at least not killing it dead.
I cannot:
1. Mask with zero values, as this destroys the computed means and variances, giving erroneous results.
2. Mask with np.nan or np.inf, as PyTorch tensors do not ignore such values, but treat them as errors. They are sticky, and lead to garbage results. PyTorch currently lacks the equivalent of np.nanmean and np.nanvar.
3. Permute or transpose to an amenable arrangement of data; no such approach gives me what I need.
4. Use a pack_padded_sequence; instance normalization does not operate on that data structure, and one cannot import data into that structure as far as I know. Also, data re-arrangement would still be necessary, see 3 above.
Am I missing an approach which would give me what I need? Or perhaps am I missing a method of data re-arrangement which would allow 3 or 4 above to work?
This is an issue faced by recurrent neural networks all the time, hence the pack_padded_sequence functionality, but it isn't quite applicable here.
I don't think this is directly possible with the existing InstanceNorm1d; the easiest way is probably to implement it yourself from scratch. I did a quick implementation that should work. To make it a little more general, this module requires a boolean mask (a boolean tensor of the same size as the input) that specifies which elements should be considered when passing through the instance norm.
import torch

class MaskedInstanceNorm1d(torch.nn.Module):
    def __init__(self, num_features, eps=1e-6, momentum=0.1, affine=True, track_running_stats=False):
        super().__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        self.gamma = None
        self.beta = None
        if self.affine:
            self.gamma = torch.nn.Parameter(torch.ones((1, self.num_features, 1), requires_grad=True))
            self.beta = torch.nn.Parameter(torch.zeros((1, self.num_features, 1), requires_grad=True))
        # running statistics are buffers (not trainable), created only when tracked
        if self.track_running_stats:
            self.register_buffer("running_mean", torch.zeros((1, self.num_features, 1)))
            self.register_buffer("running_variance", torch.zeros((1, self.num_features, 1)))
        else:
            self.running_mean = None
            self.running_variance = None

    def forward(self, x, mask):
        # allocate the statistics on the same device and dtype as the input
        mean = torch.zeros((1, self.num_features, 1), dtype=x.dtype, device=x.device)
        variance = torch.ones((1, self.num_features, 1), dtype=x.dtype, device=x.device)
        # compute masked mean and variance of batch (one value per channel)
        for c in range(self.num_features):
            if mask[:, c, :].any():
                mean[0, c, 0] = x[:, c, :][mask[:, c, :]].mean()
                variance[0, c, 0] = (x[:, c, :][mask[:, c, :]] - mean[0, c, 0]).pow(2).mean()
        # update running mean and variance (outside the autograd graph)
        if self.training and self.track_running_stats:
            with torch.no_grad():
                for c in range(self.num_features):
                    if mask[:, c, :].any():
                        self.running_mean[0, c, 0] = (1-self.momentum) * self.running_mean[0, c, 0] \
                            + self.momentum * mean[0, c, 0]
                        self.running_variance[0, c, 0] = (1-self.momentum) * self.running_variance[0, c, 0] \
                            + self.momentum * variance[0, c, 0]
        # compute output
        x = (x - mean)/(self.eps + variance).sqrt()
        if self.affine:
            x = x * self.gamma + self.beta
        return x
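Note that the module above pools the statistics of each channel over the whole batch and loops over channels in Python. As an alternative sketch, not the answer's method, the per-instance statistics the question asks for can be computed without a loop by turning per-sample valid lengths into a mask and using masked sums. The function name `masked_instance_norm` and the `lengths` argument are assumptions introduced here; the affine step is omitted, and padded positions are set to zero in the output.

```python
import torch
import torch.nn.functional as F

def masked_instance_norm(x, lengths, eps=1e-5):
    """Normalize x of shape (N, C, L) over the first lengths[n] steps of each sample."""
    N, C, L = x.shape
    positions = torch.arange(L, device=x.device)
    # (N, 1, L) mask: 1 for valid time steps, 0 for padding.
    mask = (positions[None, :] < lengths[:, None]).unsqueeze(1).to(x.dtype)
    count = mask.sum(dim=-1, keepdim=True).clamp(min=1)                 # (N, 1, 1)
    mean = (x * mask).sum(dim=-1, keepdim=True) / count                 # (N, C, 1)
    var = ((x - mean) ** 2 * mask).sum(dim=-1, keepdim=True) / count    # biased variance
    return (x - mean) / torch.sqrt(var + eps) * mask

torch.manual_seed(0)
x = torch.randn(3, 4, 40, requires_grad=True)
lengths = torch.tensor([35, 40, 30])
out = masked_instance_norm(x, lengths)

# The valid part of each sample should match a plain instance norm
# applied to that sample truncated to its own length.
reference = F.instance_norm(x[2:3, :, :30])
assert torch.allclose(out[2:3, :, :30], reference, atol=1e-5)

# Gradients still flow back to the input.
out.sum().backward()
```

Because only broadcasting and reductions are used, the same code runs on the GPU without a Python loop over channels or samples.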
