Pytorch pairwise concatenation of tensors - pytorch

I'd like to compute a pairwise concatenation over a specific dimension in a batched manner.
For instance,
x = torch.tensor([[[0],[1],[2]],[[3],[4],[5]]])
x.shape = torch.Size([2, 3, 1])
I would like to get y such that y is the concatenation of all pairs of vectors across one dimension, i.e.:
y = torch.tensor([[[[0,0],[0,1],[0,2]],[[1,0],[1,1],[1,2]], [[2,0], [2,1], [2,2]]],
[[[3,3],[3,4],[3,5]],[[4,3],[4,4],[4,5]], [[5,3],[5,4],[5,5]]]])
y.shape = torch.Size([2, 3, 3, 2])
So essentially, for each x[i,:], you generate all pairs of vectors and you concatenate them on the last dimension.
Is there a straightforward way of doing that?

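This can be done without loops by indexing with the Cartesian product of torch.arange(). The trick is to index every batch element at once instead of using a for loop: the : character keeps the whole batch dimension, and the tensor of index pairs is applied along dimension 1 for all batch elements.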
x = torch.tensor([
[[0.0000, 1.0000, 2.0000],
[3.0000, 4.0000, 5.0000],
[0.0000, -1.0000, -2.0000],
[-3.0000, -4.0000, -5.0000]],
[[0.0000, 10.0000, 20.0000],
[30.0000, 40.0000, 50.0000],
[0.0000, -10.0000, -20.0000],
[-30.0000, -40.0000, -50.0000]
]
])
idx_pairs = torch.cartesian_prod(torch.arange(x.shape[1]), torch.arange(x.shape[1]))
y = x[:, idx_pairs].view(x.shape[0], x.shape[1], x.shape[1], -1)
tensor([[[[ 0., 1., 2., 0., 1., 2.],
[ 0., 1., 2., 3., 4., 5.],
[ 0., 1., 2., 0., -1., -2.],
[ 0., 1., 2., -3., -4., -5.]],
[[ 3., 4., 5., 0., 1., 2.],
[ 3., 4., 5., 3., 4., 5.],
[ 3., 4., 5., 0., -1., -2.],
[ 3., 4., 5., -3., -4., -5.]],
[[ 0., -1., -2., 0., 1., 2.],
[ 0., -1., -2., 3., 4., 5.],
[ 0., -1., -2., 0., -1., -2.],
[ 0., -1., -2., -3., -4., -5.]],
[[ -3., -4., -5., 0., 1., 2.],
[ -3., -4., -5., 3., 4., 5.],
[ -3., -4., -5., 0., -1., -2.],
[ -3., -4., -5., -3., -4., -5.]]],
[[[ 0., 10., 20., 0., 10., 20.],
[ 0., 10., 20., 30., 40., 50.],
[ 0., 10., 20., 0., -10., -20.],
[ 0., 10., 20., -30., -40., -50.]],
[[ 30., 40., 50., 0., 10., 20.],
[ 30., 40., 50., 30., 40., 50.],
[ 30., 40., 50., 0., -10., -20.],
[ 30., 40., 50., -30., -40., -50.]],
[[ 0., -10., -20., 0., 10., 20.],
[ 0., -10., -20., 30., 40., 50.],
[ 0., -10., -20., 0., -10., -20.],
[ 0., -10., -20., -30., -40., -50.]],
[[-30., -40., -50., 0., 10., 20.],
[-30., -40., -50., 30., 40., 50.],
[-30., -40., -50., 0., -10., -20.],
[-30., -40., -50., -30., -40., -50.]]]])
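As a cross-check, here is a minimal sketch of the same result built with broadcast-style views instead of an index tensor. The helper name pairwise_concat is hypothetical (not a PyTorch function); it assumes x has shape [B, N, D] and uses only unsqueeze, expand and torch.cat.

```python
import torch

def pairwise_concat(x):
    # Hypothetical helper: x has shape [B, N, D], result has shape [B, N, N, 2*D].
    b, n, d = x.shape
    left = x.unsqueeze(2).expand(b, n, n, d)   # left[b, i, j] = x[b, i]
    right = x.unsqueeze(1).expand(b, n, n, d)  # right[b, i, j] = x[b, j]
    return torch.cat([left, right], dim=-1)

# The example tensor from the question, shape [2, 3, 1].
x = torch.tensor([[[0], [1], [2]], [[3], [4], [5]]])
y = pairwise_concat(x)

# Reference result using the index-pair approach.
idx_pairs = torch.cartesian_prod(torch.arange(x.shape[1]), torch.arange(x.shape[1]))
y_ref = x[:, idx_pairs].reshape(x.shape[0], x.shape[1], x.shape[1], -1)
```

The expand calls create views without copying; only torch.cat allocates the final tensor.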

One possible way to do that would be:
all_ordered_idx_pairs = torch.cartesian_prod(torch.tensor(range(x.shape[1])),torch.tensor(range(x.shape[1])))
y = torch.stack([x[i][all_ordered_idx_pairs] for i in range(x.shape[0])])
After reshaping the tensor:
y = y.view(x.shape[0], x.shape[1], x.shape[1], -1)
you get:
y = torch.tensor([[[[0,0],[0,1],[0,2]],[[1,0],[1,1],[1,2]], [[2,0], [2,1], [2,2]]],
[[[3,3],[3,4],[3,5]],[[4,3],[4,4],[4,5]], [[5,3],[5,4],[5,5]]]])

Related

Mismatch between Scipy stat (KS-test) distribution and histogram plot of the data set

I have a dataset like this
y = array([ 25., 20., 10., 31., 30., 66., 13., 5., 9., 2., 4.,
9., 6., 26., 72., 7., 5., 18., 8., 12., 4., 7.,
114., 5., 6., 17., 39., 4., 5., 42., 63., 3., 6.,
16., 17., 4., 27., 18., 3., 7., 48., 24., 72., 21.,
12., 13., 106., 120., 5., 34., 52., 22., 2., 8., 9.,
5., 35., 4., 4., 1., 56., 1., 17., 34., 3., 5.,
17., 17., 10., 48., 9., 195., 20., 60., 5., 77., 114.,
59., 1., 1., 1., 67., 9., 4., 1., 13., 6., 46.,
40., 8., 6., 1., 2., 1., 1., 1., 7., 6., 53.,
6., 3., 4., 2., 1., 1., 5., 1., 5., 1., 7.,
1., 1.])
The histogram of this data is produced as follows:
number_of_bins = len(y)
bin_cutoffs = np.linspace(np.percentile(y,0), np.percentile(y,99),number_of_bins)
h = plt.hist(y, bins = bin_cutoffs, color='red')
I test the dataset to find the best-fitting distribution and its parameters using the SciPy KS test with the following code (taken from "How to find probability distribution and parameters for real data? (Python 3)"):
import scipy.stats as st
def get_best_distribution(data):
    dist_names = ["norm", "exponweib", "weibull_max", "weibull_min", "expon", "pareto", "genextreme", "gamma", "beta"]
    dist_results = []
    params = {}
    for dist_name in dist_names:
        dist = getattr(st, dist_name)
        param = dist.fit(data)
        params[dist_name] = param
        # Applying the Kolmogorov-Smirnov test
        D, p = st.kstest(data, dist_name, args=param)
        print("p value for "+dist_name+" = "+str(p))
        dist_results.append((dist_name, p))
    # select the best fitted distribution
    best_dist, best_p = (max(dist_results, key=lambda item: item[1]))
    # store the name of the best fit and its p value
    print("Best fitting distribution: "+str(best_dist))
    print("Best p value: "+ str(best_p))
    print("Parameters for the best fit: "+ str(params[best_dist]))
    return best_dist, best_p, params[best_dist]
The result shows that it is a genextreme distribution. The result is shown below:
('genextreme',
0.1823402997669471,
(-1.119997717132149, 5.036499415233003, 6.2122664378291175))
The fitted curve using these parameters is as follows:
From my understanding, the histogram suggests an exponential distribution, but the KS test indicates a different one. Can anyone explain why this is happening, or whether something is wrong?

Does conv_transpose2d do a convolution or a cross-correlation operator?

The nn documentation says it computes a cross-correlation; however, my results indicate it performs a convolution.
import torch
import torch.nn.functional as F
im = torch.Tensor([[0,1],[2,3]]).unsqueeze(0).unsqueeze(0)
kernel = torch.Tensor([[0,1,2],[3,4,5], [6,7,8]]).unsqueeze(0).unsqueeze(0)
op = F.conv_transpose2d(im, kernel, stride=2)
print(op)
It outputs :
tensor([[[[ 0., 0., 0., 1., 2.],
[ 0., 0., 3., 4., 5.],
[ 0., 2., 10., 10., 14.],
[ 6., 8., 19., 12., 15.],
[12., 14., 34., 21., 24.]]]])
which would be the result of a convolution. I had expected the cross-correlation result to be:
tensor([[[[ 0., 0., 8., 7., 6.],
[ 0., 0., 5., 4., 3.],
[16., 14., 38., 22., 18.],
[10., 8., 21., 12., 9.],
[ 4., 2., 6., 3., 0.]]]])
Did I misunderstand something?
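One way to see where the printed output comes from is to reproduce it by hand. The sketch below assumes the usual reading of a transposed convolution as a scatter operation: each input pixel places a copy of the unflipped kernel, scaled by the pixel value, into the output at an offset given by the stride. The variable names manual and builtin are illustrative only.

```python
import torch
import torch.nn.functional as F

im = torch.tensor([[0., 1.], [2., 3.]])
kernel = torch.tensor([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]])
stride = 2

# Output size for no padding: (in - 1) * stride + kernel_size = 5.
out_size = (im.shape[0] - 1) * stride + kernel.shape[0]
manual = torch.zeros(out_size, out_size)
for i in range(im.shape[0]):
    for j in range(im.shape[1]):
        r, c = i * stride, j * stride
        # Add the kernel as-is (no flip), scaled by the input pixel.
        manual[r:r + 3, c:c + 3] += im[i, j] * kernel

builtin = F.conv_transpose2d(im[None, None], kernel[None, None], stride=stride)[0, 0]
print(torch.allclose(manual, builtin))
```

Under this reading, the output in the question is consistent with conv_transpose2d being the transpose of the cross-correlation computed by conv2d, which can look like a convolution when viewed as a direct sliding-window operation.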

Need pytorch help doing 2D convolutions of N images with N kernels all at once

I have a torch tensor which is a stack of images. Let's say for kicks it is
im=th.arange(4*5*6,dtype=th.float32).view(4,5,6)
which is a 4x5x6 tensor, meaning four 5x6 images stacked vertically.
I want to convolve each layer with its own 2-D kernel so that
I_{out,j} = k_j*I_{in,j}, j=(1...4)
I can obviously do this with a for loop, but I'd like to take advantage of GPU acceleration and do all the convolutions at the same time. No matter what I try, I've only been able to use torch's conv2d or conv3d to produce a single output layer that is the sum of all the 2d convolutions. Or I can make 4 layers where each is the same sum of all the 2d convolutions. Here's a concrete example. Let's use im as defined above. Say that the kernel is defined by
k=th.zeros((4,3,3),dtype=th.float32)
n=-1
for i in range(2):
    for j in range(2):
        n+=1
        k[n,i,j]=1
        k[n,2,2]=1
print(k)
tensor([[[1., 0., 0.],
[0., 0., 0.],
[0., 0., 1.]],
[[0., 1., 0.],
[0., 0., 0.],
[0., 0., 1.]],
[[0., 0., 0.],
[1., 0., 0.],
[0., 0., 1.]],
[[0., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]]])
and from above, im is
tensor([[[ 0., 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10., 11.],
[ 12., 13., 14., 15., 16., 17.],
[ 18., 19., 20., 21., 22., 23.],
[ 24., 25., 26., 27., 28., 29.]],
[[ 30., 31., 32., 33., 34., 35.],
[ 36., 37., 38., 39., 40., 41.],
[ 42., 43., 44., 45., 46., 47.],
[ 48., 49., 50., 51., 52., 53.],
[ 54., 55., 56., 57., 58., 59.]],
[[ 60., 61., 62., 63., 64., 65.],
[ 66., 67., 68., 69., 70., 71.],
[ 72., 73., 74., 75., 76., 77.],
[ 78., 79., 80., 81., 82., 83.],
[ 84., 85., 86., 87., 88., 89.]],
[[ 90., 91., 92., 93., 94., 95.],
[ 96., 97., 98., 99., 100., 101.],
[102., 103., 104., 105., 106., 107.],
[108., 109., 110., 111., 112., 113.],
[114., 115., 116., 117., 118., 119.]]])
The right answer is easy if I do the for loop:
import torch.nn.functional as F
for i in range(4):
    print(F.conv2d(im[i].expand(1,1,5,6),k[i].expand(1,1,3,3)))
tensor([[[[14., 16., 18., 20.],
[26., 28., 30., 32.],
[38., 40., 42., 44.]]]])
tensor([[[[ 75., 77., 79., 81.],
[ 87., 89., 91., 93.],
[ 99., 101., 103., 105.]]]])
tensor([[[[140., 142., 144., 146.],
[152., 154., 156., 158.],
[164., 166., 168., 170.]]]])
tensor([[[[201., 203., 205., 207.],
[213., 215., 217., 219.],
[225., 227., 229., 231.]]]])
As I noted earlier, the only thing I've been able to get is one sum of those four output images (or four copies of the same summed layer):
F.conv2d(im.expand(1,4,5,6),k.expand(1,4,3,3))
tensor([[[[430., 438., 446., 454.],
[478., 486., 494., 502.],
[526., 534., 542., 550.]]]])
I'm certain that what I want to do is possible, I just haven't been able to wrap my head around it yet. Does anyone have a solution to offer?
This is pretty straightforward if you use a grouped convolution.
From the nn.Conv2d documentation:
"At groups=in_channels, each input channel is convolved with its own set of filters"
which is exactly what we want.
The shape of the weight argument to F.conv2d needs to be considered, since it changes depending on the value of groups. The first dimension of the weight should just be out_channels, which is 4 in this case. The second dimension, according to the F.conv2d docs, should be in_channels / groups, which is 1. So we can perform the operation using
F.conv2d(im.unsqueeze(0), k.unsqueeze(1), groups=4).squeeze(0)
which produces a tensor of shape [4,3,4] with values
tensor([[[ 14., 16., 18., 20.],
[ 26., 28., 30., 32.],
[ 38., 40., 42., 44.]],
[[ 75., 77., 79., 81.],
[ 87., 89., 91., 93.],
[ 99., 101., 103., 105.]],
[[140., 142., 144., 146.],
[152., 154., 156., 158.],
[164., 166., 168., 170.]],
[[201., 203., 205., 207.],
[213., 215., 217., 219.],
[225., 227., 229., 231.]]])
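As a sanity check, the following self-contained sketch rebuilds im and k from the question and compares the grouped convolution against the explicit per-image loop. The names looped and grouped are chosen here for illustration.

```python
import torch
import torch.nn.functional as F

# Rebuild the question's inputs: four 5x6 images and four 3x3 kernels.
im = torch.arange(4 * 5 * 6, dtype=torch.float32).view(4, 5, 6)
k = torch.zeros((4, 3, 3), dtype=torch.float32)
n = -1
for i in range(2):
    for j in range(2):
        n += 1
        k[n, i, j] = 1
        k[n, 2, 2] = 1

# Reference: convolve each image with its own kernel in a Python loop.
looped = torch.stack(
    [F.conv2d(im[c][None, None], k[c][None, None])[0, 0] for c in range(4)]
)

# Grouped convolution: input [1, 4, 5, 6], weight [4, 1, 3, 3], groups=4.
grouped = F.conv2d(im.unsqueeze(0), k.unsqueeze(1), groups=4).squeeze(0)
print(torch.allclose(looped, grouped))
```

The single grouped call handles all four channels in one operation, which is what allows the work to run as one kernel launch on a GPU instead of four.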

Image composition based on "pattern matrix"

I want to stitch together c RGB images in numpy into one larger image. The images are represented as numpy arrays, and I have the following constraints:
The c images are to be arranged in a grid of shape n * m = c and are stored in a dictionary. Here is an example of a dictionary containing 21 images:
c_images = { 5:numpy.array[[x,y,3]], 1:numpy.array[[x,y,3]], ... 21:numpy.array[[x,y,3]]}
I have a "pattern" matrix of size n * m = c, in which the indexes of the images 0...c-1 are scattered randomly. A randomly generated "pattern matrix" of size n = 3, m = 7, c = 21 looks like the following:
P_matrix = [[14, 3, 19, 5, 16, 18, 0],
[17, 1, 13, 7, 6, 15, 11],
[4, 9, 10, 12, 8, 20, 2 ]]
What would be the best way to use the pattern matrix to compose a larger numpy array from the c images?
Maybe np.block can help you. Not all the images have to be the same size but they need to fit together. First some example data:
import numpy as np
n, m = 3, 4
img_size = np.array([3, 3])
img_list = [np.zeros(img_size)+i for i in range(n*m)]
# [array([[0., 0., 0.],   array([[1., 1., 1.],   ... array([[11., 11., 11.],
#         [0., 0., 0.],          [1., 1., 1.],              [11., 11., 11.],
#         [0., 0., 0.]]),        [1., 1., 1.]]),            [11., 11., 11.]])]
rnd_idx = np.random.permutation(range(n*m)).reshape((n, m))
# array([[ 9, 10, 0, 4],
# [ 3, 11, 6, 5],
# [ 2, 8, 7, 1]])
Then you need to create a nested list of your images based on the given pattern; np.block does the rest for you:
img_list_nested = [[img_list[col] for col in rows] for rows in rnd_idx]
img = np.block(img_list_nested)
# array([[ 9., 9., 9., 10., 10., 10., 0., 0., 0., 4., 4., 4.],
# [ 9., 9., 9., 10., 10., 10., 0., 0., 0., 4., 4., 4.],
# [ 9., 9., 9., 10., 10., 10., 0., 0., 0., 4., 4., 4.],
# [ 3., 3., 3., 11., 11., 11., 6., 6., 6., 5., 5., 5.],
# [ 3., 3., 3., 11., 11., 11., 6., 6., 6., 5., 5., 5.],
# [ 3., 3., 3., 11., 11., 11., 6., 6., 6., 5., 5., 5.],
# [ 2., 2., 2., 8., 8., 8., 7., 7., 7., 1., 1., 1.],
# [ 2., 2., 2., 8., 8., 8., 7., 7., 7., 1., 1., 1.],
# [ 2., 2., 2., 8., 8., 8., 7., 7., 7., 1., 1., 1.]])
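Note that np.block joins the innermost lists along the last axis, which for RGB arrays of shape (height, width, 3) is the channel axis rather than the width. For the RGB dictionary described in the question, a minimal sketch using np.concatenate with explicit axes could look like the following. The tile size h, w and the constant-valued placeholder images are assumptions made only for illustration.

```python
import numpy as np

h, w = 2, 3  # assumed tile size; all tiles must share it
pattern = np.array([[14, 3, 19, 5, 16, 18, 0],
                    [17, 1, 13, 7, 6, 15, 11],
                    [4, 9, 10, 12, 8, 20, 2]])

# Placeholder RGB images keyed by index; image i is filled with the value i.
c_images = {i: np.full((h, w, 3), i, dtype=np.uint8) for i in range(pattern.size)}

# Join each pattern row side by side (axis=1 is width),
# then stack the rows on top of each other (axis=0 is height).
rows = [np.concatenate([c_images[int(i)] for i in row], axis=1) for row in pattern]
mosaic = np.concatenate(rows, axis=0)
print(mosaic.shape)
```

With real images, c_images would simply be the existing dictionary, as long as its keys match the indexes used in the pattern matrix.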

Proper projection of 3D stacked bar chart values using colors in Python

I am trying to plot the number of pending and declined applications by month.
I am having two difficulties.
1. My values on the Z axis don't match the values in the 'dz' list. For example, if you check the array 'z0', the number of pending applications is 82 for December, but in the plot the highest value against December is below 82. Similarly, Jan has 31 pending, but when plotted the value is well below it. This variation exists for all the values plotted.
2. I have many zeros in the list. I would like to give them a different color (say, transparent) to hide zero values in the plot, so that the plot is clear and I see only actual values. I am not sure how to do it inside the for loop, as I am stacking values.
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime,date
import calendar
from itertools import cycle, islice
#from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import axes3d
import numpy as np
# Set plotting style
plt.style.use('seaborn-white')
dz=[]
z0 = np.array([ 31., 23., 11., 8., 7., 6., 6., 6., 5., 4.,
3., 1., 0., 21., 13., 7., 4., 3., 3., 3.,
3., 1., 0., 0., 0., 0., 22., 11., 4., 2.,
1., 1., 0., 0., 0., 0., 0., 0., 0., 38.,
26., 16., 15., 9., 8., 6., 4., 2., 0., 0.,
0., 0., 47., 26., 21., 11., 9., 7., 6., 4.,
0., 0., 0., 0., 0., 51., 31., 17., 14., 9.,
6., 5., 0., 0., 0., 0., 0., 0., 33., 25.,
14., 4., 4., 4., 0., 0., 0., 0., 0., 0.,
0., 35., 24., 14., 9., 5., 0., 0., 0., 0.,
0., 0., 0., 0., 72., 55., 41., 20., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 50., 27., 15.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
77., 44., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 82.])
dz.append(z0)
z1 =[ 14., 5., 8., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 13., 7., 2., 1., 1., 0., 0., 0., 0.,
0., 0., 0., 0., 14., 8., 4., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 19., 3., 5., 0., 2.,
0., 0., 0., 0., 0., 0., 0., 0., 11., 13., 3.,
3., 0., 0., 0., 0., 0., 0., 0., 0., 0., 19.,
10., 3., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 13., 2., 1., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 10., 2., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 11., 1., 3., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 14., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 12., 3.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
15.]
dz.append(z1)
_zpos = z0*0
xlabels = pd.Index(['01', '02', '03', '04', '05', '06', '07', '08', '09',
'10', '11', '12'], dtype='object', name='dates')
ylabels = pd.Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug',
'Sep', 'Oct', 'Nov', 'Dec'], dtype='object',
name='Month')
x = np.arange(xlabels.shape[0])
y = np.arange(ylabels.shape[0])
x_M, y_M = np.meshgrid(x, y, copy=False)
fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(111, projection='3d')
# Making the intervals in the axes match with their respective entries
ax.w_xaxis.set_ticks(x + 0.5/2.)
ax.w_yaxis.set_ticks(y + 0.5/2.)
# Renaming the ticks as they were before
ax.w_xaxis.set_ticklabels(xlabels)
ax.w_yaxis.set_ticklabels(ylabels)
# Labeling the 3 dimensions
ax.set_xlabel('Months Taken')
ax.set_ylabel('Month created')
ax.set_zlabel('Count')
# Choosing the range of values to be extended in the set colormap
values = np.linspace(0.2, 1., x_M.ravel().shape[0])
# Selecting an appropriate colormap
colors = ['#FFC04C', '#ee2f2f', '#3e9a19',
'#599be5','#bf666f','#a235bf','#848381','#fb90d6','#fb9125']
for i in range(2):
    ax.bar3d(x_M.ravel(), y_M.ravel(), _zpos, dx=0.3, dy=0.3, dz=dz[i],
             color=colors[i])
    _zpos += dz[i]
#plt.gca().invert_xaxis()
#plt.gca().invert_yaxis()
Pending_proxy = plt.Rectangle((0, 0), 1, 1, fc="#FFC04C90")
Declined_proxy = plt.Rectangle((0, 0), 1, 1, fc="#ee2f2f90")
ax.legend([Pending_proxy,
Declined_proxy],['Pending',
'Declined',
])
plt.show()
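For the second difficulty, one possible approach is to skip zero-height bars with a boolean mask rather than recoloring them. The sketch below uses only numpy and small made-up arrays (x, y, layers are hypothetical stand-ins for x_M.ravel(), y_M.ravel() and dz); it shows how the bar positions, bottoms and heights could be filtered per layer while the running stack height is still accumulated over every position.

```python
import numpy as np

# Hypothetical small example: two stacked layers over four bar positions.
x = np.array([0, 1, 2, 3])
y = np.array([0, 0, 1, 1])
layers = [np.array([31., 0., 21., 0.]), np.array([14., 0., 0., 13.])]

bottom = np.zeros_like(layers[0])
segments = []  # one (x, y, bottom, height) tuple of arrays per layer
for heights in layers:
    mask = heights > 0  # keep only bars that actually have a height
    segments.append((x[mask], y[mask], bottom[mask], heights[mask]))
    bottom = bottom + heights  # accumulate over all positions, masked or not

for xs, ys, zs, hs in segments:
    print(xs, ys, zs, hs)
```

Each tuple in segments could then be passed to ax.bar3d as its x, y, z and dz arguments in place of the unfiltered arrays.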
