If I understood the PyTorch implementation of the Conv2d layer correctly, the padding parameter expands the convolved image with zeros on all four sides of the input. So, if we have an image of shape (6,6) and set padding = 2, stride = 2 and kernel = (5,5), the output will be an image of shape (1,1). Then, padding = 2 should pad with zeros (2 up, 2 down, 2 left and 2 right), resulting in a convolved image of shape (5,5).
However, when running the following script:
import torch
from torch import nn
x = torch.ones(1,1,6,6)
y = nn.Conv2d(in_channels=1, out_channels=1,
              kernel_size=5, stride=2,
              padding=2)(x)
I got the following outputs:
y.shape
==> torch.Size([1, 1, 3, 3]) ("So shape of convolved image = (3,3) instead of (5,5)")
y[0][0]
==> tensor([[0.1892, 0.1718, 0.2627, 0.2627, 0.4423, 0.2906],
[0.4578, 0.6136, 0.7614, 0.7614, 0.9293, 0.6835],
[0.2679, 0.5373, 0.6183, 0.6183, 0.7267, 0.5638],
[0.2679, 0.5373, 0.6183, 0.6183, 0.7267, 0.5638],
[0.2589, 0.5793, 0.5466, 0.5466, 0.4823, 0.4467],
[0.0760, 0.2057, 0.1017, 0.1017, 0.0660, 0.0411]],
grad_fn=<SelectBackward>)
Normally it should be filled with zeroes. I'm confused. Can anyone help please?
The input is padded, not the output. In your case, the conv2d layer will apply a two-pixel padding on all sides just before computing the convolution operation.
For illustration purposes (using torch.nn.functional as F),
>>> import torch.nn.functional as F
>>> weight = torch.rand(1, 1, 5, 5)
Here we apply a convolution with padding=2:
>>> x = torch.ones(1,1,6,6)
>>> F.conv2d(x, weight, stride=2, padding=2)
tensor([[[[ 5.9152, 8.8923, 6.0984],
[ 8.9397, 14.7627, 10.8613],
[ 7.2708, 12.0152, 9.0840]]]])
And here we pass no padding to the convolution but instead apply it ourselves to the input:
>>> x_padded = F.pad(x, (2,)*4)
>>> x_padded
tensor([[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]])
>>> F.conv2d(x_padded, weight, stride=2)
tensor([[[[ 5.9152, 8.8923, 6.0984],
[ 8.9397, 14.7627, 10.8613],
[ 7.2708, 12.0152, 9.0840]]]])
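As a side note, the (3, 3) output size you observed follows directly from the standard Conv2d output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1. A quick check of the shapes (a minimal sketch):
def conv_out_size(in_size, kernel, stride, padding):
    # floor((in + 2*p - k) / s) + 1, with dilation = 1
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_out_size(6, kernel=5, stride=2, padding=2))  # 3, matching torch.Size([1, 1, 3, 3])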
Suppose I have two tensors S and T defined as:
S = torch.rand((3,2,1))
T = torch.ones((3,2,1))
We can think of these as containing batches of tensors with shapes (2, 1). In this case, the batch size is 3.
I want to concatenate all possible pairings between the two batches. A single concatenation of one element from each batch produces a tensor of shape (4, 1), and there are 3*3 combinations, so ultimately the resulting tensor C must have a shape of (3, 3, 4, 1).
One solution is to do the following:
C = torch.zeros((S.shape[0], T.shape[0], 4, 1))
for i in range(S.shape[0]):
    for j in range(T.shape[0]):
        C[i, j, :, :] = torch.cat((S[i, :, :], T[j, :, :]))
But the for loop doesn't scale well to large batch sizes. Is there a PyTorch command to do this?
I don't know of any out-of-the-box command that does such an operation. However, you can pull it off in a straightforward way using a single matrix multiplication. The trick is to construct a tensor containing all pairs of batch elements, starting from the already stacked S, T tensor, and then multiplying it with a properly chosen one-hot mask tensor. In this method, keeping track of shapes and dimension sizes is essential.
The stack is given by the following (notice the reshape: we essentially flatten the batch elements from S and T into a single batch axis on ST):
>>> ST = torch.stack((S, T)).reshape(6, 2)
>>> ST
tensor([[0.7792, 0.0095],
[0.1893, 0.8159],
[0.0680, 0.7194],
[1.0000, 1.0000],
[1.0000, 1.0000],
[1.0000, 1.0000]])
# ST.shape = (6, 2)
You can retrieve all (S[i], T[j]) index pairs using range and itertools.product:
>>> from itertools import product
>>> indices = torch.tensor(list(product(range(0, 3), range(3, 6))))
>>> indices
tensor([[0, 3],
[0, 4],
[0, 5],
[1, 3],
[1, 4],
[1, 5],
[2, 3],
[2, 4],
[2, 5]])
# indices.shape = (9, 2)
From there, we construct one-hot encodings of the indices using torch.nn.functional.one_hot:
>>> from torch.nn.functional import one_hot
>>> mask = one_hot(indices).float()
>>> mask
tensor([[[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.]],
[[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.]],
[[1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1.]],
[[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.]],
[[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.]],
[[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 1.]],
[[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.]],
[[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.]],
[[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1.]]])
# mask.shape = (9, 2, 6)
Finally, we compute the matrix multiplication and reshape it to the final form:
>>> (mask @ ST).reshape(3, 3, 4, 1)
tensor([[[[0.7792],
[0.0095],
[1.0000],
[1.0000]],
[[0.7792],
[0.0095],
[1.0000],
[1.0000]],
[[0.7792],
[0.0095],
[1.0000],
[1.0000]]],
[[[0.1893],
[0.8159],
[1.0000],
[1.0000]],
[[0.1893],
[0.8159],
[1.0000],
[1.0000]],
[[0.1893],
[0.8159],
[1.0000],
[1.0000]]],
[[[0.0680],
[0.7194],
[1.0000],
[1.0000]],
[[0.0680],
[0.7194],
[1.0000],
[1.0000]],
[[0.0680],
[0.7194],
[1.0000],
[1.0000]]]])
I initially went with torch.einsum: torch.einsum('bf,pib->pif', ST, mask). But I later realized that bf,pib->pif reduces nicely to a simple torch.Tensor.matmul operation if we switch the two operands, i.e. pib,bf->pif (subscript b is reduced in the middle).
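Putting the steps above together, here is a minimal end-to-end sketch of the same approach (nothing new, just the snippets consolidated with their imports, assuming the batch size of 3 from the question):
import torch
from itertools import product
from torch.nn.functional import one_hot

S = torch.rand((3, 2, 1))
T = torch.ones((3, 2, 1))

# Flatten both batches into a single (6, 2) tensor: rows 0-2 come from S, rows 3-5 from T.
ST = torch.stack((S, T)).reshape(6, 2)

# All (i, j) index pairs, with T's rows offset by the batch size.
indices = torch.tensor(list(product(range(0, 3), range(3, 6))))  # shape (9, 2)

# One-hot mask of shape (9, 2, 6); the matmul selects the corresponding rows of ST.
mask = one_hot(indices, num_classes=6).float()

C = (mask @ ST).reshape(3, 3, 4, 1)
C then has shape (3, 3, 4, 1), and C[i, j] equals torch.cat((S[i], T[j])).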
In NumPy, something called np.meshgrid is used for this:
https://stackoverflow.com/a/35608701/3259896
So in PyTorch, it would be
torch.stack(
torch.meshgrid(x, y)
).T.reshape(-1,2)
where x and y are your two lists (1-D tensors). You can use any number of them: x, y, z, etc. You then reshape to the number of lists you used, so if you used three lists, use .reshape(-1,3); for four, use .reshape(-1,4); etc.
So for 5 tensors, use
torch.stack(
torch.meshgrid(a, b, c, d, e)
).T.reshape(-1,5)
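As a quick illustration on the index ranges from the earlier answer (a sketch; the result covers the same 3*3 combinations, though in a different order than itertools.product, and recent PyTorch versions warn unless an explicit indexing argument is passed to torch.meshgrid):
import torch

x = torch.arange(0, 3)  # e.g. indices into S
y = torch.arange(3, 6)  # e.g. indices into T

pairs = torch.stack(torch.meshgrid(x, y)).T.reshape(-1, 2)
print(pairs.shape)  # torch.Size([9, 2]) -- all 3*3 index pairs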
I am working on a project that involves the use of NDCG (normalized discounted cumulative gain), and I understand the method's underlying calculations.
So I imported ndcg_score from sklearn.metrics, and then passed a ground truth array and another array to the ndcg_score function to calculate their NDCG score. The ground truth array has the values [5, 4, 3, 2, 1] while the other array has the values [5, 4, 3, 2, 0], so only the last element differs between these 2 arrays.
from numpy import array
from sklearn.metrics import ndcg_score
user_ndcg = ndcg_score(array([[5, 4, 3, 2, 1]]), array([[5, 4, 3, 2, 0]]))
I was expecting the result to be around 0.96233 (9.88507/10.27192). However, user_ndcg actually returned 1.0, which surprised me. Initially I thought this was due to rounding, but that is not the case, because when I experimented with another pair of arrays, ndcg_score(array([[5, 4, 3, 2, 1]]), array([[5, 4, 0, 2, 0]])), it correctly returned 0.98898.
Does anyone know whether this could be a bug with the sklearn ndcg_score function, or whether I was doing something wrong with my code?
I am assuming you are trying to predict six different classes for this problem (0, 1, 2, 3, 4 and 5). If you want to evaluate the NDCG for five different observations, you have to pass the function two arrays of shape (5, 6) each.
That is, you have to transform your ground truth and predictions into arrays of five rows, with six columns per row.
import numpy as np
from sklearn.metrics import ndcg_score

# Current form of ground truth and predictions
y_true = [5, 4, 3, 2, 1]
y_pred = [5, 4, 3, 2, 0]
# Transform ground truth to ndarray
y_true_nd = np.zeros(shape=(5, 6))
y_true_nd[np.arange(5), y_true] = 1
# Transform predictions to ndarray
y_pred_nd = np.zeros(shape=(5, 6))
y_pred_nd[np.arange(5), y_pred] = 1
# Calculate ndcg score
ndcg_score(y_true_nd, y_pred_nd)
> 0.8921866522394966
Here's what y_true_nd and y_pred_nd look like:
y_true_nd
array([[0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.]])
y_pred_nd
array([[0., 0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[1., 0., 0., 0., 0., 0.]])
Keras offers a couple of helper functions to process text:
texts_to_sequences and texts_to_matrix
It seems that most people use texts_to_sequences, but it is unclear to me why one is picked over the other and under what conditions you might want to use texts_to_matrix.
texts_to_matrix is easy to understand. It converts texts to a matrix whose columns refer to words and whose cells carry the number of occurrences or the presence of each word. Such a design is useful for the direct application of ML algorithms (logistic regression, decision tree, etc.).
texts_to_sequences creates lists that are collections of integers representing words. Certain layers, like Keras embeddings, require this format for preprocessing.
Consider the example below.
import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer

txt = ['Python is great and useful', 'Python is easy to learn', 'Python is easy to implement']
txt = pd.Series(txt)
tok = Tokenizer(num_words=10)
tok.fit_on_texts(txt)
mat_texts = tok.texts_to_matrix(txt, mode='count')
mat_texts
Output:
array([[0., 1., 1., 0., 0., 1., 1., 1., 0., 0.],
[0., 1., 1., 1., 1., 0., 0., 0., 1., 0.],
[0., 1., 1., 1., 1., 0., 0., 0., 0., 1.]])
tok.get_config()['word_index']
Output:
'{"python": 1, "is": 2, "easy": 3, "to": 4, "great": 5, "and": 6, "useful": 7, "learn": 8, "implement": 9}'
mat_texts_seq = tok.texts_to_sequences(txt)
mat_texts_seq
Output:
[[1, 2, 5, 6, 7], [1, 2, 3, 4, 8], [1, 2, 3, 4, 9]]
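For example, the sequences above can be padded to a fixed length and fed into an embedding layer, which is the typical next step. A minimal sketch (assuming a TensorFlow/Keras setup; the layer sizes are arbitrary and just for illustration):
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding

# Pad the integer sequences to a fixed length of 5 tokens.
padded = pad_sequences(mat_texts_seq, maxlen=5)

# Map the 10-word vocabulary to 8-dimensional vectors.
emb = Embedding(input_dim=10, output_dim=8)
vectors = emb(padded)
print(vectors.shape)  # (3, 5, 8)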
Let us suppose I have a CNN whose first 2 layers are:
inp_conv = Conv2D(in_channels=1,out_channels=6,kernel_size=(3,3))
Please correct me if I am wrong, but I think this line of code can be thought of as follows:
there is a single grayscale image coming in as input, and we use 6 different kernels of the same size (3,3) to make 6 different feature maps from that single image.
And if I have a second Conv2D layer just after the first one, as
second_conv_connected_to_inp_conv = Conv2D(in_channels=6,out_channels=12,kernel_size=(3,3))
What does this mean in terms of out_channels? Will there be 12 new feature maps for each of the 6 feature maps coming out of the first layer, or will there be a total of 12 feature maps from the 6 incoming feature maps?
To go from 6 channels to 12 channels in your second convolution layer, we take 12 filters of size 6x3x3. Each 6x3x3 filter gives a single channel as output when the dot product is performed. Since we are taking 12 of those 6x3x3 filters, we get exactly 12 channels as output. For more information check this link:
https://cs231n.github.io/convolutional-networks/#conv
Edit: Think of it this way: we have a 6-channel input, i.e. HxWx6, where H is the height and W is the width of the image. Since there are 6 channels, each filter consists of 6 3x3 kernels (assuming a kernel size of 3). After the dot product we again get 6 channels, but now we add up the resulting 6 channels to get a single output channel. This operation is performed 12 times to get 12 channels.
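You can confirm this by inspecting the weight shape of such a layer (a quick sketch):
from torch import nn

conv = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=(3, 3))
print(conv.weight.shape)  # torch.Size([12, 6, 3, 3]): 12 filters, each spanning all 6 input channels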
For each out_channel, you have a set of kernels for each in_channel.
Equivalently, each out_channel has an in_channel x height x width kernel:
from torch import nn

for i in nn.Conv2d(in_channels=2, out_channels=3, kernel_size=(4, 5)).parameters():
    print(i)
Output:
Parameter containing:
tensor([[[[-0.0012, 0.0848, -0.1301, -0.1164, -0.0609],
[ 0.0424, -0.0031, 0.1254, -0.0140, 0.0418],
[-0.0478, -0.0311, -0.1511, -0.1047, -0.0652],
[ 0.0059, 0.0625, 0.0949, -0.1072, -0.0689]],
[[ 0.0574, 0.1313, -0.0325, 0.1183, -0.0255],
[ 0.0167, 0.1432, -0.1467, -0.0995, -0.0400],
[-0.0616, 0.1366, -0.1025, -0.0728, -0.1105],
[-0.1481, -0.0923, 0.1359, 0.0706, 0.0766]]],
[[[ 0.0083, -0.0811, 0.0268, -0.1476, -0.1142],
[-0.0815, 0.0998, 0.0927, -0.0701, -0.0057],
[ 0.1011, 0.1572, 0.0628, 0.0214, 0.1060],
[-0.0931, 0.0295, -0.1226, -0.1096, -0.0817]],
[[ 0.0715, 0.0636, -0.0937, 0.0478, 0.0868],
[-0.0200, 0.0060, 0.0366, 0.0981, 0.1518],
[-0.1218, -0.0579, 0.0621, 0.1310, 0.1376],
[ 0.1395, 0.0315, -0.1375, 0.0145, -0.0989]]],
[[[-0.1474, 0.1405, 0.1202, -0.1577, 0.0296],
[-0.0266, -0.0260, -0.0724, 0.0608, -0.0937],
[ 0.0580, 0.0800, 0.1132, 0.0591, -0.1565],
[-0.1026, 0.0789, 0.0331, -0.1233, -0.0910]],
[[ 0.1487, 0.1065, -0.0689, -0.0398, -0.1506],
[-0.0028, -0.1191, -0.1220, -0.0087, 0.0237],
[-0.0648, 0.0938, -0.0962, 0.1435, 0.1084],
[-0.1333, -0.0394, 0.0071, 0.0231, 0.0375]]]], requires_grad=True)
Parameter containing:
tensor([ 0.0620, 0.0095, -0.0771], requires_grad=True)
A more detailed example, going from a 1-channel input through 2-channel and 4-channel convolutions:
import torch
from torch import nn

torch.manual_seed(0)

input0 = torch.randint(-1, 1, (1, 1, 8, 8)).type(torch.FloatTensor)
print('input0:', input0.size())
print(input0.data)

layer0 = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=2, stride=2, padding=0, bias=False)
print('\nlayer0:')
for i in layer0.parameters():
    print(i.size())
    i.data = torch.randint(-1, 1, i.size()).type(torch.FloatTensor)
    print(i.data)

output0 = layer0(input0)
print('\noutput0:', output0.size())
print(output0.data)

print('\nlayer1:')
layer1 = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=2, stride=2, padding=0, bias=False)
for i in layer1.parameters():
    print(i.size())
    i.data = torch.randint(-1, 1, i.size()).type(torch.FloatTensor)
    print(i.data)

output1 = layer1(output0)
print('\noutput1:', output1.size())
print(output1.data)
output:
input0: torch.Size([1, 1, 8, 8])
tensor([[[[-1., 0., 0., -1., 0., 0., 0., 0.],
[ 0., 0., 0., -1., -1., 0., -1., -1.],
[-1., -1., -1., 0., -1., 0., 0., -1.],
[-1., 0., 0., 0., 0., -1., 0., -1.],
[ 0., -1., 0., 0., -1., 0., 0., -1.],
[-1., 0., -1., 0., 0., 0., 0., 0.],
[-1., 0., -1., 0., 0., 0., 0., -1.],
[ 0., -1., -1., 0., 0., -1., 0., -1.]]]])
layer0:
torch.Size([2, 1, 2, 2])
tensor([[[[-1., -1.],
[-1., 0.]]],
[[[ 0., -1.],
[ 0., -1.]]]])
output0: torch.Size([1, 2, 4, 4])
tensor([[[[1., 1., 1., 1.],
[3., 1., 1., 1.],
[2., 1., 1., 1.],
[1., 2., 0., 1.]],
[[0., 2., 0., 1.],
[1., 0., 1., 2.],
[1., 0., 0., 1.],
[1., 0., 1., 2.]]]])
layer1:
torch.Size([4, 2, 2, 2])
tensor([[[[-1., -1.],
[-1., -1.]],
[[ 0., -1.],
[ 0., -1.]]],
[[[ 0., 0.],
[ 0., 0.]],
[[ 0., -1.],
[ 0., 0.]]],
[[[ 0., 0.],
[-1., 0.]],
[[ 0., -1.],
[-1., 0.]]],
[[[-1., -1.],
[-1., -1.]],
[[ 0., 0.],
[-1., -1.]]]])
output1: torch.Size([1, 4, 2, 2])
tensor([[[[-8., -7.],
[-6., -6.]],
[[-2., -1.],
[ 0., -1.]],
[[-6., -3.],
[-2., -2.]],
[[-7., -7.],
[-7., -6.]]]])
Breaking down the linear algebra:
import numpy as np

np.sum(
    # kernel for layer1, in_channel 0, out_channel 0,
    # multiplied elementwise by output0, channel 0, top-left corner
    (np.array([[-1., -1.],
               [-1., -1.]]) *
     np.array([[1., 1.],
               [3., 1.]])) +
    # kernel for layer1, in_channel 1, out_channel 0,
    # multiplied elementwise by output0, channel 1, top-left corner
    (np.array([[ 0., -1.],
               [ 0., -1.]]) *
     np.array([[0., 2.],
               [1., 0.]]))
)
This will be equal to output1, channel 0, top left corner:
-8.0
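The same entry can be double-checked directly in PyTorch (a small sketch reusing output0 and layer1 from the example above):
# Top-left 2x2 patch across both channels of output0,
# multiplied elementwise by layer1's first filter (out_channel 0) and summed.
patch = output0.data[0, :, 0:2, 0:2]   # shape (2, 2, 2)
w = layer1.weight.data[0]              # shape (2, 2, 2)
print((w * patch).sum())               # tensor(-8.)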