Expand the tensor by several dimensions - pytorch

In PyTorch, given a tensor of size=[3], how can I expand it by several dimensions to size=[3,2,5,5] so that the added dimensions repeat the corresponding values from the original tensor? For example, with vector=[1,2,3] of size [3], the first tensor of size [2,5,5] should have all values 1, the second all values 2, and the third all values 3.
In addition, how to expand the vector of size [3,2] to [3,2,5,5]?
One way I can think of is to create a tensor of the target size with ones_like and then use einsum, but I think there should be an easier way.

You can first unsqueeze the appropriate number of singleton dimensions, then expand to a view at the target shape with torch.Tensor.expand:
>>> x = torch.rand(3)
>>> target = [3,2,5,5]
>>> x[:, None, None, None].expand(target)
A nice workaround is to use torch.Tensor.reshape or torch.Tensor.view to perform the multiple unsqueezes at once:
>>> x.view(-1, 1, 1, 1).expand(target)
This allows for a more general approach to handle any arbitrary target shape:
>>> x.view(len(x), *(1,)*(len(target)-1)).expand(target)
For an even more general implementation, where x can be multi-dimensional:
>>> x = torch.rand(3, 2)
# just to make sure the target shape is valid w.r.t. x
>>> assert list(x.shape) == list(target[:x.ndim])
>>> x.view(*x.shape, *(1,)*(len(target)-x.ndim)).expand(target)
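The steps above can be wrapped into a small function. This is a minimal sketch, not a library API: the name expand_trailing is hypothetical, and it uses reshape rather than view so that non-contiguous inputs are also accepted.

```python
import torch

def expand_trailing(x, target):
    """Broadcast x to `target` by appending singleton dims (hypothetical helper)."""
    target = list(target)
    # The leading dims of the target must match x exactly.
    if list(x.shape) != target[:x.ndim]:
        raise ValueError(f"shape {list(x.shape)} is not a prefix of {target}")
    # Append one singleton dim per missing axis, then broadcast without copying.
    return x.reshape(*x.shape, *(1,) * (len(target) - x.ndim)).expand(target)

x = torch.tensor([1., 2., 3.])
y = expand_trailing(x, [3, 2, 5, 5])
print(y.shape)        # torch.Size([3, 2, 5, 5])
print(y[1].unique())  # tensor([2.])

z = expand_trailing(torch.rand(3, 2), [3, 2, 5, 5])
print(z.shape)        # torch.Size([3, 2, 5, 5])
```

Note that expand returns a view in which the broadcast dimensions share memory, so call .clone() on the result before writing to it in place.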

Related

Slicing a tensor with a dimension varying

I'm trying to slice a PyTorch tensor my_tensor of dimensions s x b x c so that the slicing along the first dimension varies according to a tensor indices of length b, to the effect of:
my_tensor[0:indices, torch.arange(0, b, dtype=torch.long), :] = something
The code above doesn't work and receives the error TypeError: tuple indices must be integers or slices, not tuple.
What I'm aiming for is, for example, if indices = torch.tensor([3, 5, 4]) then:
my_tensor[0:3, 0, :] = something
my_tensor[0:5, 1, :] = something
my_tensor[0:4, 2, :] = something
I'm hoping for a tensorized way to do this so I don't have to resort to a for loop. Also, the method needs to be compatible with TorchScript. Thanks very much.
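One loop-free possibility is to turn the per-column slice lengths into a boolean mask and assign through it. The sketch below is only an illustration with made-up sizes and a scalar in place of something; whether it scripts cleanly under TorchScript has not been verified here.

```python
import torch

s, b, c = 6, 3, 2                      # illustrative sizes (assumed)
my_tensor = torch.zeros(s, b, c)
indices = torch.tensor([3, 5, 4])      # one slice length per column

# mask[i, j] is True when row i lies inside the slice 0:indices[j].
mask = torch.arange(s).unsqueeze(1) < indices.unsqueeze(0)   # shape (s, b)

# A boolean mask over the first two dims selects whole length-c vectors.
my_tensor[mask] = 1.0

print(my_tensor[:, 0, 0])  # tensor([1., 1., 1., 0., 0., 0.])
```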

How to convert signal data set of 400 samples with 5000 data points into a tensor of [400, 1, 5000] in pytorch?

I have 400 sensor recordings, each with a length of 5000. I want to convert them into a tensor of shape [400,1,5000], i.e. [batch_size, input_channels, signal_length], to train a 1D CNN with PyTorch's nn.Conv1d.
This operation is often referred to as unsqueezing a dimension. There are multiple ways of achieving this, either with an explicit reshape, or with slicing tricks.
Using torch.Tensor.unsqueeze, either out-of-place:
>>> x.unsqueeze(dim=1) # won't affect x
Or in-place with torch.Tensor.unsqueeze_:
>>> x.unsqueeze_(dim=1) # will mutate x
Using indexing:
>>> x[:, None] # will insert a singleton at dim=1
Reshaping the tensor with torch.Tensor.reshape:
>>> x.reshape(len(x), 1, -1)
This is not the recommended method as it doesn't generalize. In my opinion, you should not use reshape or view if you are not actually reshaping the tensor.
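To check that the variants above agree, here is a short sketch that uses random data as a stand-in for the 400 recordings. The Conv1d layer sizes are arbitrary choices for illustration.

```python
import torch

signals = torch.randn(400, 5000)       # stand-in for the sensor recordings

a = signals.unsqueeze(1)               # explicit unsqueeze
b = signals[:, None]                   # indexing with None
c = signals.reshape(len(signals), 1, -1)  # explicit reshape

print(a.shape)                         # torch.Size([400, 1, 5000])
print(torch.equal(a, b), torch.equal(a, c))  # True True

# The result has the [batch_size, input_channels, signal_length] layout Conv1d expects.
conv = torch.nn.Conv1d(in_channels=1, out_channels=4, kernel_size=5)
out = conv(a)
print(out.shape)                       # torch.Size([400, 4, 4996])
```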

slice Pytorch tensors which are saved in a list

I have the following code segment to generate random samples. The generated samples are stored in a list, where each entry is a tensor with two elements. I would like to extract the first element from all tensors in the list, and the second element from all tensors in the list as well. How can I perform this kind of slicing?
import torch
import pyro
import pyro.distributions as dist
num_samples = 250
# note that both covariance matrices are diagonal
mu1 = torch.tensor([0., 5.])
sig1 = torch.tensor([[2., 0.], [0., 3.]])
dist1 = dist.MultivariateNormal(mu1, sig1)
samples1 = [pyro.sample('samples1', dist1) for _ in range(num_samples)]
samples1
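I'd recommend torch.cat with a list comprehension. Note that t[0] is a zero-dimensional tensor, which torch.cat cannot concatenate, so slice with t[0:1] to keep each piece one-dimensional:
col1 = torch.cat([t[0:1] for t in samples1])
col2 = torch.cat([t[1:2] for t in samples1])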
Docs for torch.cat: https://pytorch.org/docs/stable/generated/torch.cat.html
ALTERNATIVELY
You could turn your list of 1D tensors into a single big 2D tensor using torch.stack, then do a normal slice:
samples1_t = torch.stack(samples1)
col1 = samples1_t[:, 0] # : means all rows
col2 = samples1_t[:, 1]
Docs for torch.stack: https://pytorch.org/docs/stable/generated/torch.stack.html
I should mention that PyTorch tensors support unpacking out of the box: you can unpack the first axis into multiple variables without any extra work. Here torch.stack outputs a tensor of shape (rows, cols), so we just need to transpose it to (cols, rows) and unpack:
>>> c1, c2 = torch.stack(samples1).T
So you get c1 and c2 shaped (rows,):
>>> c1
tensor([0.6433, 0.4667, 0.6811, 0.2006, 0.6623, 0.7033])
>>> c2
tensor([0.2963, 0.2335, 0.6803, 0.1575, 0.9420, 0.6963])
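The stacking, slicing, and unpacking approaches can be compared without pyro. In this sketch the list of random two-element tensors is an assumed stand-in for samples1, and torch.Tensor.unbind is shown as one more equivalent option.

```python
import torch

torch.manual_seed(0)
samples = [torch.randn(2) for _ in range(6)]   # stand-in for the sampled list

stacked = torch.stack(samples)                 # shape (6, 2)

col1, col2 = stacked[:, 0], stacked[:, 1]      # plain slicing
c1, c2 = stacked.T                             # unpack the transposed tensor
u1, u2 = stacked.unbind(dim=1)                 # split along the column axis

print(col1.shape)                              # torch.Size([6])
print(torch.equal(col1, c1), torch.equal(col2, u2))  # True True
```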
Other answers that suggest .stack() or .cat() are perfectly fine from PyTorch perspective.
However, since the context of the question involves pyro, may I add the following:
Since you are doing IID samples
[pyro.sample('samples1', dist1) for _ in range(num_samples)]
A better way to do it with pyro is
dist1 = dist.MultivariateNormal(mu1, sig1).expand([num_samples])
This tells pyro that the distribution is batched with a batch size of num_samples. Sampling from this will produce
>>> dist1.sample()
tensor([[-0.8712, 6.6087],
[ 1.6076, -0.2939],
[ 1.4526, 6.1777],
...
[-0.0168, 7.5085],
[-1.6382, 2.1878]])
Now it's easy to solve your original question. Just slice it like this:
samples = dist1.sample()
samples[:, 0] # all first elements
samples[:, 1] # all second elements
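The same batching idea can be tried with plain PyTorch. The sketch below swaps in torch.distributions.MultivariateNormal in place of the pyro distribution (an assumption made so the example runs without pyro installed); the parameters are the ones from the question.

```python
import torch
from torch.distributions import MultivariateNormal

num_samples = 250
mu1 = torch.tensor([0., 5.])
sig1 = torch.tensor([[2., 0.], [0., 3.]])

# Expand the distribution to a batch of num_samples independent copies.
batched = MultivariateNormal(mu1, sig1).expand([num_samples])
print(batched.batch_shape, batched.event_shape)  # torch.Size([250]) torch.Size([2])

samples = batched.sample()                       # one draw per batch element
print(samples.shape)                             # torch.Size([250, 2])

first = samples[:, 0]    # all first elements
second = samples[:, 1]   # all second elements
```

Calling MultivariateNormal(mu1, sig1).sample((num_samples,)) on the unexpanded distribution gives a tensor of the same shape.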

Using a subset of classes in ImageNet

I'm aware that subsets of ImageNet exist, however they don't fulfill my requirement. I want 50 classes at their native ImageNet resolutions.
To this end, I used torch.utils.data.dataset.Subset to select specific classes from ImageNet. However, it turns out, class labels/indices must be greater than 0 and less than num_classes.
Since ImageNet contains 1000 classes, the idx of my selected classes quickly goes over 50. How can I reassign the class indices and do so in a way that allows for evaluation later down the road as well?
Is there a more elegant way to select a subset?
I am not sure I understood your conclusions about labels being greater than zero and less than num_classes. The torch.utils.data.Subset helper takes in a torch.utils.data.Dataset and a sequence of indices; these are the indices of the data points from the Dataset that you would like to keep in the subset. They have nothing to do with the classes those data points belong to.
Here's how I would approach this:
Load your dataset through torchvision.datasets (custom datasets would work the same way). Here I will demonstrate it with FashionMNIST since ImageNet's data is not made available directly through torchvision's API.
>>> ds = torchvision.datasets.FashionMNIST('.')
>>> len(ds)
60000
Define the classes you want to select for the subset dataset. And retrieve all indices from the main dataset which correspond to these classes:
>>> targets = [1, 3, 5, 9]
>>> indices = [i for i, label in enumerate(ds.targets) if label in targets]
You have your subset:
>>> ds_subset = torch.utils.data.Subset(ds, indices)
>>> len(ds_subset)
24000
At this point, you can use a dictionary built from targets to remap the old labels to new contiguous ones:
>>> remap = {x: i for i, x in enumerate(targets)}
>>> remap
{1: 0, 3: 1, 5: 2, 9: 3}
For example:
>>> x, y = ds_subset[10]
>>> y, remap[y] # old_label, new_label
(1, 0)
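To have the new labels returned directly by the dataset, the subset and the label mapping can be wrapped together. This is a rough sketch: the RemappedSubset class is hypothetical, and a small synthetic TensorDataset stands in for ImageNet or FashionMNIST.

```python
import torch
from torch.utils.data import Dataset, Subset, TensorDataset

class RemappedSubset(Dataset):
    """Hypothetical wrapper: keeps the chosen classes and relabels them 0..k-1."""

    def __init__(self, base, labels, keep):
        # Map each kept original label to a new contiguous label.
        self.remap = {old: new for new, old in enumerate(keep)}
        indices = [i for i, y in enumerate(labels) if int(y) in self.remap]
        self.subset = Subset(base, indices)

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, i):
        x, y = self.subset[i]
        return x, self.remap[int(y)]

# Synthetic stand-in: 100 items spread evenly over 10 classes.
labels = torch.arange(100) % 10
base = TensorDataset(torch.randn(100, 4), labels)

ds = RemappedSubset(base, labels, keep=[1, 3, 5, 9])
print(len(ds))        # 40
print(ds[0][1])       # 0  (original label 1)

# Invert the mapping to recover original class ids at evaluation time.
inverse = {new: old for old, new in ds.remap.items()}
print(inverse[3])     # 9
```

Keeping the inverse dictionary around is what makes later evaluation against the original class ids possible.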

How can I extract nonzero values from tensor in keras

I'm trying to manipulate some data, in Python, inside a custom loss function in Tensorflow.keras
Consider the following example:
b = tf.constant([[0, 3, 1], [0, 5, 2]])
I would like to erase the zero column, or to extract the non-zero ones, such that the final result would be a tensor
[[3,1], [5,2]]
I tried with tf.where, using a mask, but it does not maintain the shape; it just returns a 1D tensor with the non-zero values.
Furthermore, I need this to work for an arbitrary number of rows; the only thing fixed is the number of columns.
This selects all columns whose sum is > 0 (which assumes the entries are non-negative):
tf.transpose(tf.gather_nd(tf.transpose(b), tf.where(tf.reduce_sum(b, axis=0)>0)))
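An alternative sketch, swapping in tf.boolean_mask for the transpose and gather_nd combination, keeps every column that contains at least one non-zero entry. Unlike the column-sum test, it does not rely on the entries being non-negative.

```python
import tensorflow as tf

b = tf.constant([[0, 3, 1], [0, 5, 2]])

# keep[j] is True when column j has at least one non-zero entry.
keep = tf.reduce_any(tf.not_equal(b, 0), axis=0)   # shape (3,)

# Apply the mask along the column axis; the row count can be arbitrary.
result = tf.boolean_mask(b, keep, axis=1)

print(result.numpy().tolist())  # [[3, 1], [5, 2]]
```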
