I'm trying to implement doubly stochastic normalisation of an N x N x P tensor as described in Section 3.2 of Gong, CVPR 2019. This can be done easily in the N x N case using matrix operations, but I am stuck with the 3D tensor case. What I have so far is
import torch

def doubly_stochastic_normalise(E):
    """E: n x n x f"""
    E = E / torch.sum(E, dim=1, keepdim=True)  # normalised across rows
    F = E / torch.sum(E, dim=0, keepdim=True)  # normalised across cols
    E = torch.einsum('ijp,kjp->ikp', E, F)
    return E
but I'm wondering if there is a method without einsum.
In this setting, you can always fall back to torch.matmul (batched matrix multiplication, to be more precise). However, this requires you to transpose the axes. Recall the matrix multiplication of two 3D inputs; in einsum notation it gives us:
bik,bkj->bij
Notice how the k dimension gets reduced. To get to this setting, we need to transpose the inputs of the operator. In your case we have:
ijp ? kjp -> ikp
 ↓     ↓      ↑
pij @ pjk -> pik
This translates to:
>>> (E.permute(2,0,1) @ F.permute(2,1,0)).permute(1,2,0)
#    ijp ➝ pij          kjp ➝ pjk           pik ➝ ikp
You can argue your method is not only shorter but also a lot more readable; I would therefore stick with torch.einsum. The reason einsum is so useful here is that it performs the axis transpositions on the fly.
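For completeness, here is a quick sanity check, a minimal sketch with small random tensors, that the two expressions agree:
import torch

E = torch.rand(4, 4, 3)
F = torch.rand(4, 4, 3)

out_einsum = torch.einsum('ijp,kjp->ikp', E, F)
out_matmul = (E.permute(2, 0, 1) @ F.permute(2, 1, 0)).permute(1, 2, 0)

print(torch.allclose(out_einsum, out_matmul))  # True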
I have the following code segment to generate random samples. The generated samples form a list, where each entry is a tensor with two elements. I would like to extract the first element from all tensors in the list, and likewise the second element. How do I perform this kind of tensor slicing?
import torch
import pyro
import pyro.distributions as dist
num_samples = 250
# note that both covariance matrices are diagonal
mu1 = torch.tensor([0., 5.])
sig1 = torch.tensor([[2., 0.], [0., 3.]])
dist1 = dist.MultivariateNormal(mu1, sig1)
samples1 = [pyro.sample('samples1', dist1) for _ in range(num_samples)]
samples1
I'd recommend torch.cat with a list comprehension; slicing with t[:1] keeps each element as a 1-D tensor, which is what torch.cat expects:
col1 = torch.cat([t[:1] for t in samples1])
col2 = torch.cat([t[1:] for t in samples1])
Docs for torch.cat: https://pytorch.org/docs/stable/generated/torch.cat.html
ALTERNATIVELY
You could turn your list of 1D tensors into a single big 2D tensor using torch.stack, then do a normal slice:
samples1_t = torch.stack(samples1)
col1 = samples1_t[:, 0] # : means all rows
col2 = samples1_t[:, 1]
Docs for torch.stack: https://pytorch.org/docs/stable/generated/torch.stack.html
I should mention that PyTorch tensors support unpacking out of the box: you can unpack the first axis into multiple variables without additional considerations. Here torch.stack outputs a tensor of shape (rows, cols); we just need to transpose it to (cols, rows) and unpack:
>>> c1, c2 = torch.stack(samples1).T
So you get c1 and c2 shaped (rows,):
>>> c1
tensor([0.6433, 0.4667, 0.6811, 0.2006, 0.6623, 0.7033])
>>> c2
tensor([0.2963, 0.2335, 0.6803, 0.1575, 0.9420, 0.6963])
Other answers that suggest .stack() or .cat() are perfectly fine from a PyTorch perspective.
However, since the context of the question involves pyro, may I add the following:
Since you are doing IID samples
[pyro.sample('samples1', dist1) for _ in range(num_samples)]
A better way to do it with pyro is
dist1 = dist.MultivariateNormal(mu1, sig1).expand([num_samples])
This tells pyro that the distribution is batched with a batch size of num_samples. Sampling from this will produce
>>> dist1.sample()
tensor([[-0.8712, 6.6087],
[ 1.6076, -0.2939],
[ 1.4526, 6.1777],
...
[-0.0168, 7.5085],
[-1.6382, 2.1878]])
Now it's easy to solve your original question. Just slice it like this:
samples = dist1.sample()
samples[:, 0] # all first elements
samples[:, 1] # all second elements
I have the Pauli matrices, which are (2x2) and complex,
import numpy as np

II = np.identity(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
and a depolarizing_error function which takes in a normally distributed random number param, generated by np.random.normal(noise_mean, noise_sd)
def depolarizing_error(param):
    XYZ = np.sqrt(param/3)*np.array([X, Y, Z])
    return np.array([np.sqrt(1-param)*II, XYZ[0], XYZ[1], XYZ[2]])
Now if I feed in a single number a for param, my function should return an output of np.array([np.sqrt(1-a)*II, a*X, a*Y, a*Z]), where a is a float and * denotes element-wise multiplication between a and the entries of the (2x2) matrices II, X, Y, Z.
Now for vectorization purposes, I wish to feed in an array of param i.e.
param = np.array([a, b, c, ..., n]) Eqn(1)
again with all a, b, c, ..., n generated independently by np.random.normal(noise_mean, noise_sd) (I think it's doable with np.random.normal(noise_mean, noise_sd, n) or something)
such that my function now returns:
np.array([[np.sqrt(1-a)*II, a*X, a*Y, a*Z],
[np.sqrt(1-b)*II, b*X, b*Y, b*Z],
................................,
[np.sqrt(1-n)*II, n*X, n*Y, n*Z]])
I thought feeding in something like np.random.normal(noise_mean, noise_sd, n) as param, giving an output like np.array([a, b, c, ..., n]), would sort itself out and return what I want above. However, my XYZ = np.sqrt(param/3)*np.array([X, Y, Z]) ended up doing an element-wise dot product instead of element-wise multiplication. I tried using param as np.array([a, b])
and ended up with
np.array([np.dot(np.sqrt(1-[a, b]), II),
          np.dot(np.sqrt([a, b]/3), X),
          np.dot(np.sqrt([a, b]/3), Y),
          np.dot(np.sqrt([a, b]/3), Z)])
instead. So far I've tried something like
def depolarizing_error(param):
    XYZ = np.sqrt(param/3) @ np.array([X, Y, Z])
    return np.array([np.sqrt(1-param)*II, XYZ[0], XYZ[1], XYZ[2]])
thinking that the matmul operator @ would just broadcast it conveniently for me, but then I got really bogged down by the dimensions.
Now my motivation for wanting to do all this is that I have another matrix that's given by:
def random_angles(sd, seq_length):
    return np.random.normal(0, sd, (seq_length, 3))

def unitary_error(params):
    e_1 = np.exp(-1j*(params[:,0]+params[:,2])/2)*np.cos(params[:,1]/2)
    e_2 = np.exp(-1j*(params[:,0]-params[:,2])/2)*np.sin(params[:,1]/2)
    return np.array([[e_1, e_2], [-e_2.conj(), e_1.conj()]],
                    dtype=complex).transpose(2, 0, 1)
where seq_length is equal to the number of entries in the Eqn(1) param, say N = seq_length = |param|. Here my unitary_error function should give me an output of
np.array([V_1, V_2, ..., V_N])
such that I'll be able to use np.matmul as an attempt to implement vectorization like this
np.array([V_1, V_2, ..., V_N]) @ np.array([[np.sqrt(1-a)*II, a*X, a*Y, a*Z],
                                           [np.sqrt(1-b)*II, b*X, b*Y, b*Z],
                                           ................................,
                                           [np.sqrt(1-n)*II, n*X, n*Y, n*Z]]) @ np.array([V_1, V_2, ..., V_N])
to finally give
np.array([[V_1 @ np.sqrt(1-a)*II @ V_1, V_1 @ a*X @ V_1, V_1 @ a*Y @ V_1, V_1 @ a*Z @ V_1],
          [V_2 @ np.sqrt(1-b)*II @ V_2, V_2 @ b*X @ V_2, V_2 @ b*Y @ V_2, V_2 @ b*Z @ V_2],
          ................................,
          [V_N @ np.sqrt(1-n)*II @ V_N, V_N @ n*X @ V_N, V_N @ n*Y @ V_N, V_N @ n*Z @ V_N]])
where here @ denotes the matrix product, applied element-wise across the lists of matrices.
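One way to get the array described above without loops is plain NumPy broadcasting. The following is only a minimal sketch, not a definitive implementation; depolarizing_error_vec is an illustrative name, and it relies on the II, X, Y, Z (and, in the comments, the unitary_error and random_angles functions) defined in the question:
import numpy as np

def depolarizing_error_vec(param):
    """param: shape (N,) -> array of shape (N, 4, 2, 2)."""
    param = np.asarray(param, dtype=float)
    # (N, 1, 1, 1) * (3, 2, 2) broadcasts to (N, 3, 2, 2)
    xyz = np.sqrt(param / 3)[:, None, None, None] * np.array([X, Y, Z])
    # (N, 1, 1, 1) * (2, 2) broadcasts to (N, 1, 2, 2)
    first = np.sqrt(1 - param)[:, None, None, None] * II
    return np.concatenate([first, xyz], axis=1)

# The V_i sandwich can then also be broadcast:
#     Ks = depolarizing_error_vec(param)                  # (N, 4, 2, 2)
#     Vs = unitary_error(random_angles(sd, len(param)))   # (N, 2, 2)
#     out = Vs[:, None] @ Ks @ Vs[:, None]                # (N, 4, 2, 2)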
Can someone explain why applying numpy's fft and fft2 to the same 1D array yields different results?
import numpy as np

x = np.linspace(0, 2*np.pi, 10)
x = np.reshape(x, (10, 1))
x = np.sin(x)
f1 = np.fft.fft(x)
f2 = np.fft.fft2(x)
np.equal(f1,f2)
Theoretically, f1 should be equal to f2.
Answer rewritten (sorry, the first one was a bit short):
The difference is that fft and fft2 work on different axes by default: np.fft.fft transforms along the last axis of its input, while np.fft.fft2 is a Fourier transform (FT) in 2 dimensions, taken over the last two axes.
If you print(f1) in your example you can see that the output is just your input with a zero imaginary part attached. This should make you suspicious, as you are Fourier transforming a sine.
What happened is that your input has shape (10, 1), so fft transformed each length-1 row on its own. A single element corresponds to a constant function, and for that, math tells us FT(const) = const. For this reason you got the same output as input from fft. The fft2 routine you used properly.
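As a quick check of the axis point, a minimal sketch with the (10, 1) input from the question: passing axis=0 to np.fft.fft makes it agree with np.fft.fft2.
import numpy as np

x = np.sin(np.linspace(0, 2*np.pi, 10)).reshape(10, 1)
f1 = np.fft.fft(x, axis=0)   # transform along the length-10 axis
f2 = np.fft.fft2(x)
print(np.allclose(f1, f2))   # True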
Below you find your example in a modified version, which illustrates the point:
import numpy as np
x1 = np.linspace(0, 2*np.pi, 10)
x2 = np.reshape(x1, (10, 1))
x1 = np.sin(x1)
x2 = np.sin(x2)
f1 = np.fft.fft(x1)
f2 = np.fft.fft2(x2)
print('input arrays for fft and fft2:')
print(x1)
print(x2)
print('your old output of fft, which is exactly equal to the input x2')
print(np.fft.fft(x2))
print('Now we compare our results:')
for ii in range(len(x1)):
    print('f1:', f1[ii], '\tf2:', f2[ii, 0])
print('and see, they agree')
Output:
input arrays for fft and fft2:
[ 0.00000000e+00 6.42787610e-01 9.84807753e-01 8.66025404e-01
3.42020143e-01 -3.42020143e-01 -8.66025404e-01 -9.84807753e-01
-6.42787610e-01 -2.44929360e-16]
[[ 0.00000000e+00]
[ 6.42787610e-01]
[ 9.84807753e-01]
[ 8.66025404e-01]
[ 3.42020143e-01]
[ -3.42020143e-01]
[ -8.66025404e-01]
[ -9.84807753e-01]
[ -6.42787610e-01]
[ -2.44929360e-16]]
your old output of fft, which is exactly equal to the input x2
[[ 0.00000000e+00+0.j]
[ 6.42787610e-01+0.j]
[ 9.84807753e-01+0.j]
[ 8.66025404e-01+0.j]
[ 3.42020143e-01+0.j]
[ -3.42020143e-01+0.j]
[ -8.66025404e-01+0.j]
[ -9.84807753e-01+0.j]
[ -6.42787610e-01+0.j]
[ -2.44929360e-16+0.j]]
Now we compare our results:
f1: (-1.11022302463e-16+0j) f2: (-1.11022302463e-16+0j)
f1: (1.42837120544-4.39607454395j) f2: (1.42837120544-4.39607454395j)
f1: (-0.485917547994+0.668808127899j) f2: (-0.485917547994+0.668808127899j)
f1: (-0.391335729991+0.284322050566j) f2: (-0.391335729991+0.284322050566j)
f1: (-0.36913281032+0.119938520599j) f2: (-0.36913281032+0.119938520599j)
f1: (-0.363970234266+1.55431223448e-15j) f2: (-0.363970234266+1.55431223448e-15j)
f1: (-0.36913281032-0.119938520599j) f2: (-0.36913281032-0.119938520599j)
f1: (-0.391335729991-0.284322050566j) f2: (-0.391335729991-0.284322050566j)
f1: (-0.485917547994-0.668808127899j) f2: (-0.485917547994-0.668808127899j)
f1: (1.42837120544+4.39607454395j) f2: (1.42837120544+4.39607454395j)
and see, they agree
You can find some examples of fft2 here.
I was wondering if someone could have a quick look at the following code snippet and point me in a direction to find my misunderstanding in calculating the probability of a sample for each class in the model, and the related bug in my code. I tried to manually calculate the results provided by the sklearn function lm.predict_proba(X); sadly the results are different, so I made a mistake somewhere.
I think the bug is in part d) of the following code walkthrough, maybe in the math, but I cannot see where.
a) Creating and training a logistic regression model ( works fine )
lm = LogisticRegression(random_state=413, multi_class='multinomial', solver='newton-cg')
lm.fit(X, train_labels)
b) Saving coefficient and bias ( works fine )
W = lm.coef_
b = lm.intercept_
c) Using lm.predict_proba(X) ( works fine)
def reshape_single_element(x, num):
    singleElement = x[num]
    nx, ny = singleElement.shape
    return singleElement.reshape((1, nx*ny))

select_image_number = 6
X_select_image_data = reshape_single_element(train_dataset, select_image_number)
Y_probabilities = lm.predict_proba(X_select_image_data)
Y_pandas_probabilities = pd.Series(Y_probabilities[0], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
print "estimate probabilities for each class: \n", Y_pandas_probabilities, "\n"
print "all probabilities by lm.predict_proba(..) sum up to ", np.sum(Y_probabilities), "\n"
The output was:
estimate probabilities for each class:
a 0.595426
b 0.019244
c 0.001343
d 0.004033
e 0.017185
f 0.004193
g 0.160380
h 0.158245
i 0.003093
j 0.036860
dtype: float64
all probabilities by lm.predict_proba(..) sum up to 1.0
d) Manually performing the calculation done by lm.predict_proba ( no error/warning, but results are not the same )
manual_calculated_probabilities = []
for select_class_k in range(0, 10):  # a=0, b=1, c=2, ...
    z_for_class_k = np.sum(W[select_class_k] * X_select_image_data) + b[select_class_k]
    p_for_class_k = 1 / (1 + math.exp(-z_for_class_k))
    manual_calculated_probabilities.append(p_for_class_k)
print "formula: ", manual_calculated_probabilities, "\n"

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e = np.exp(x)
    dist = e / np.sum(np.exp(x), axis=0)
    return dist

abc = softmax(manual_calculated_probabilities)
print "softmax:", abc
The output was:
formula: [0.9667598370531315, 0.48453459121301334, 0.06154496922245115, 0.16456194859398865, 0.45634781280053394, 0.16999340794727547, 0.8867996361191054, 0.8854473986336552, 0.13124464656251109, 0.642913996162282]
softmax: [ 0.15329642 0.09464644 0.0620015 0.0687293 0.0920159 0.06910361 0.14151607 0.14132483 0.06647715 0.11088877]
Softmax was used because of a comment in sklearn's logistic.py on GitHub:
For a multi_class problem, if multi_class is set to be "multinomial" the softmax function is used to find the predicted probability of each class.
Note:
print "shape of X: " , X_select_image_data.shape
print "shape of W: " , W.shape
print "shape of b: " , b.shape
shape of X: (1, 784)
shape of W: (10, 784)
shape of b: (10,)
I found a very similar question here, but sadly I could not adapt it to my code so that the predictions came out the same. I tried many different combinations to calculate the variables 'z_for_class_k' and 'p_for_class_k', but sadly without success in reproducing the prediction values from 'predict_proba(X)'.
I think the problem is with
p_for_class_k = 1/ (1 + math.exp(-z_for_class_k))
1 / (1 + exp(-logit)) is a simplification that works only for binary problems.
The real equation for the multinomial case, where every class has its own logit, looks like this:
p_for_classA =
    exp(logit_classA) /
    [exp(logit_classA) + exp(logit_classB) + ... + exp(logit_classC)]
In other words, when calculating a probability for a specific class, you must incorporate ALL the weights and biases from the other classes as well into your formula.
I didn't have the data to test this out, but hopefully this points you in the right direction.
change
p_for_class_k = 1/ (1 + math.exp(-z_for_class_k))
manual_calculated_probabilities.append(p_for_class_k)
to
manual_calculated_probabilities.append(z_for_class_k)
In other words, the input for softmax should be the "z"s instead of the "p"s, in your notation. See multinomial logistic regression.
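For concreteness, a minimal sketch of that corrected calculation, assuming W, b and X_select_image_data as defined in the question (shapes (10, 784), (10,) and (1, 784)):
import numpy as np

z = np.dot(W, X_select_image_data.ravel()) + b   # one logit per class, shape (10,)
p = np.exp(z - z.max())                          # subtract the max for numerical stability
p = p / p.sum()                                  # softmax over the class logits
# p should now match lm.predict_proba(X_select_image_data)[0]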
I was able to replicate the method lr.predict_proba by doing the following:
>>> sigmoid = lambda x: 1/(1+np.exp(-x))
>>> sigmoid(lr.intercept_+np.sum(lr.coef_*X.values, axis=1))
Assuming that X is a pandas DataFrame (hence X.values) and lr is a fitted LogisticRegression object from sklearn.