Concatenating multiple NumPy arrays in dictionaries - python-3.x

I have four dictionaries, and each value for each key is a 1D numpy array. I want to join all of those numpy arrays into one. For example:
first_dictionary = {'feature1': array([0., 0., 1., 0.]),
                    'feature2': array([0., 1., 0., 0.]),
                    'feature3': array([1., 0., 0., 0., 0., 0.])}
second_dictionary = {'feature4': array([0.]),
                     'feature5': array([0., 0.]),
                     'feature6': array([0.023]),
                     'feature7': array([0.009]),
                     'feature8': array([0.])}
third_dictionary = {'feature9': array([0., 0., 0., 912., 0., 0., 0.]),
                    'feature10': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])}
The resulting final numpy array should look like:
array([0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
       0., 0., 0., 0.023, 0.009, 0., 0., 0., 0., 912., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
As an example I've made these dictionaries smaller, but I have up to 50 keys in each of them. So basically I want to join all of the numpy arrays in my dictionaries. How can I achieve this? Insights will be appreciated.

You could just do:
import numpy as np

output = []
for dictionary in [first_dictionary, second_dictionary, third_dictionary]:
    for key, value in dictionary.items():
        output += list(value)
output_array = np.array(output)
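A more direct sketch of the same idea, assuming the dictionary names above: np.concatenate can join the arrays in a single call instead of going through Python lists. In Python 3.7+ dict values keep insertion order, so the features concatenate in the order they were added.
import numpy as np

# Chain the values of all three dictionaries and concatenate once
output_array = np.concatenate(
    [value for d in (first_dictionary, second_dictionary, third_dictionary)
     for value in d.values()]
)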

Related

How padding works in PyTorch

If I understood the PyTorch implementation of the Conv2d layer correctly, the padding parameter expands the shape of the convolved image with zeros on all four sides of the input. So, if we have an image of shape (6,6) and set padding = 2, stride = 2 and kernel = (5,5), the output will be an image of shape (1,1); padding = 2 should then pad with zeroes (2 up, 2 down, 2 left and 2 right), resulting in a convolved image of shape (5,5).
However, when running the following script:
import torch
from torch import nn

x = torch.ones(1, 1, 6, 6)
y = nn.Conv2d(in_channels=1, out_channels=1,
              kernel_size=5, stride=2,
              padding=2)(x)
I got the following outputs:
y.shape
==> torch.Size([1, 1, 3, 3]) ("So shape of convolved image = (3,3) instead of (5,5)")
y[0][0]
==> tensor([[0.1892, 0.1718, 0.2627, 0.2627, 0.4423, 0.2906],
            [0.4578, 0.6136, 0.7614, 0.7614, 0.9293, 0.6835],
            [0.2679, 0.5373, 0.6183, 0.6183, 0.7267, 0.5638],
            [0.2679, 0.5373, 0.6183, 0.6183, 0.7267, 0.5638],
            [0.2589, 0.5793, 0.5466, 0.5466, 0.4823, 0.4467],
            [0.0760, 0.2057, 0.1017, 0.1017, 0.0660, 0.0411]],
           grad_fn=<SelectBackward>)
Normally it should be filled with zeroes. I'm confused. Can anyone help please?
The input is padded, not the output. In your case, the conv2d layer will apply a two-pixel padding on all sides just before computing the convolution operation.
For illustration purposes:
>>> import torch.nn.functional as F
>>> weight = torch.rand(1, 1, 5, 5)
Here we apply a convolution with padding=2:
>>> x = torch.ones(1, 1, 6, 6)
>>> F.conv2d(x, weight, stride=2, padding=2)
tensor([[[[ 5.9152,  8.8923,  6.0984],
          [ 8.9397, 14.7627, 10.8613],
          [ 7.2708, 12.0152,  9.0840]]]])
Now we pass no padding to the convolution and instead apply it ourselves to the input:
>>> x_padded = F.pad(x, (2,)*4)
>>> x_padded
tensor([[[[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
          [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
          [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
          [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
          [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
          [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
          [0., 0., 1., 1., 1., 1., 1., 1., 0., 0.],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]]])
>>> F.conv2d(x_padded, weight, stride=2)
tensor([[[[ 5.9152,  8.8923,  6.0984],
          [ 8.9397, 14.7627, 10.8613],
          [ 7.2708, 12.0152,  9.0840]]]])

Batched index_fill in PyTorch

I have an index tensor of size (2, 3):
>>> index = torch.empty(6).random_(0, 8).view(2, 3)
>>> index
tensor([[6., 3., 2.],
        [3., 4., 7.]])
And a value tensor of size (2, 8):
>>> value = torch.zeros(2, 8)
>>> value
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]])
I want to set the elements of value to 1 at the positions given by index along dim=-1. The output should look like:
>>> output
tensor([[0., 0., 1., 1., 0., 0., 1., 0.],
        [0., 0., 0., 1., 1., 0., 0., 1.]])
I tried value[range(2), index] = 1 but it triggers an error. I also tried torch.index_fill but it doesn't accept batched indices. torch.scatter requires creating an extra tensor of size 2*8 full of ones, which consumes unnecessary memory and time.
You can actually use torch.Tensor.scatter_ by setting the value (int) option instead of the src option (Tensor).
>>> value.scatter_(dim=-1, index=index.long(), value=1)
>>> value
tensor([[0., 0., 1., 1., 0., 0., 1., 0.],
        [0., 0., 0., 1., 1., 0., 0., 1.]])
Make sure the index is of type int64 though.
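For what it's worth, the advanced-indexing attempt from the question can also be made to work. It fails with range(2) because a shape-(2,) row index does not broadcast against the (2, 3) index tensor, but a (2, 1) column of row indices does. A small sketch:
import torch

index = torch.tensor([[6, 3, 2],
                      [3, 4, 7]])
value = torch.zeros(2, 8)

# Row indices of shape (2, 1) broadcast against the (2, 3) column indices
rows = torch.arange(value.size(0)).unsqueeze(1)
value[rows, index] = 1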

PyTorch loss function for a regression model with a vector of values

I'm training a CNN architecture to solve a regression problem using PyTorch where my output is a tensor of 25 values. The input/target tensor could be either all zeros or a gaussian distribution with a sigma value of 2. An example of a 4-sample batch:
[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534 ],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
My question is how to design a loss function so that the model can effectively learn the regression output of 25 values.
I have tried two losses, torch.nn.MSELoss() and torch.nn.MSELoss() - torch.nn.CosineSimilarity(). They sort of work. However, the network sometimes has difficulty converging, especially when there are a lot of all-zero samples, which leads it to output a vector of 25 small values.
Is there any other loss we could try?
Your values do not seem widely different in scale, so an MSELoss seems like it would work fine. Your model could be collapsing because of the many zeros in your target.
You can always try torch.nn.L1Loss() (though I would not expect it to be much better than torch.nn.MSELoss()).
I suggest that you instead try to predict the gaussian mean mu, and later re-create the gaussian for each sample if you really need it.
You have two alternatives if you choose to try this method.
Alt 1
A good alternative is to encode your target to look like a classification target: each 25-element vector becomes the single index where the original target == 1 (so the possible classes are 0, 1, 2, ..., 24). Samples that contain only zeroes are then assigned to an extra class, 25. So your target:
[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534 ],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
becomes
[4,
10,
20,
25]
If you do this, then you can try the common torch.nn.CrossEntropyLoss().
I do not know what your dataloader looks like, but given a single sample in your original format, you can convert it to the proposed format with:
def encode(tensor):
    # All-zero samples map to the extra class (25)
    if tensor.sum() == 0:
        return torch.tensor(len(tensor))
    return torch.argmax(tensor)
and back to a gaussian with:
def decode(value):
    n_values = 25
    zero = torch.zeros(n_values)
    if value == n_values:
        return zero
    # Create a gaussian around the value
    std = 2
    n = torch.arange(n_values, dtype=torch.float) - int(value)
    sig = 2 * std**2
    gauss = torch.exp(-n**2 / sig)
    # Only keep the 13 values within +/-6 of the peak
    start_ix = max(int(value) - 6, 0)
    end_ix = min(int(value) + 7, n_values)
    zero[start_ix:end_ix] = gauss[start_ix:end_ix]
    return zero
(Note: I have not tried these with batches, only single samples.)
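To make the shapes concrete, here is a minimal, self-contained sketch of how the encoded targets would feed into CrossEntropyLoss; the batch size and the logits are random stand-ins, and the model is assumed to output 26 logits per sample (classes 0-24 plus the all-zero class 25):
import torch

# Hypothetical batch of 8 targets already encoded to class indices 0..25
targets = torch.randint(0, 26, (8,))
# Stand-in for the model output: 26 logits per sample
logits = torch.randn(8, 26, requires_grad=True)
loss = torch.nn.CrossEntropyLoss()(logits, targets)
loss.backward()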
Alt 2
The second option is to keep a regression target, but only for the argmax position (mu), rescaled to a nicer value in the range 0-1, together with a separate neuron that outputs a "mask value" (also 0-1). Then your batch of:
[[0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534, 0.043937, 0.011109, 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.13534, 0.32465, 0.60653, 0.8825, 1.0000, 0.88250,0.60653, 0.32465, 0.13534 ],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]
becomes
# [Mask, mu]
[
[1, 0.1666], # True, 4/24
[1, 0.4166], # True, 10/24
[1, 0.8333], # True, 20/24
[0, 0] # False, undefined
]
If you are using this setup, then you should be able to use an MSELoss with a small modification:
def custom_loss(input, target):
    # Assume target and input are of shape [Batch, 2] = [mask, mu]
    mask = target[..., 0]
    mask_loss = torch.nn.functional.mse_loss(input[..., 0], target[..., 0])
    mu_loss = torch.nn.functional.mse_loss(mask * input[..., 1], mask * target[..., 1])
    return (mask_loss + mu_loss) / 2
This loss only looks at the 2nd value (mu) if the mask of the target is 1; otherwise it only tries to optimize for the correct mask.
To encode to this format you would use:
def encode(tensor):
    n_values = 25
    if tensor.sum() == 0:
        return torch.tensor([0., 0.])
    mu = float(torch.argmax(tensor)) / (n_values - 1)
    return torch.tensor([1., mu])
and to decode:
def decode(tensor):
    n_values = 25
    # Parse values
    mask, value = tensor
    mask = torch.round(mask)
    value = int(torch.round((n_values - 1) * value))
    zero = torch.zeros(n_values)
    if mask == 0:
        return zero
    # Create a gaussian around the value
    std = 2
    n = torch.arange(n_values, dtype=torch.float) - value
    sig = 2 * std**2
    gauss = torch.exp(-n**2 / sig)
    # Only keep the 13 values within +/-6 of the peak
    start_ix = max(value - 6, 0)
    end_ix = min(value + 7, n_values)
    zero[start_ix:end_ix] = gauss[start_ix:end_ix]
    return zero
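A minimal, self-contained sketch of this loss in action, with a random stand-in for the model output (all names are illustrative only):
import torch

# Hypothetical [mask, mu] targets for a batch of 4, as produced by encode()
target = torch.tensor([[1., 0.1666],
                       [1., 0.4166],
                       [1., 0.8333],
                       [0., 0.0000]])
# Stand-in for a model with two output neurons (mask, mu)
pred = torch.rand(4, 2, requires_grad=True)
loss = custom_loss(pred, target)
loss.backward()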

How to build a seq2seq model for ASR, using mfcc vectors and corresponding word embedding vectors of the transcripts as the input and output data?

I am trying to build a voice-to-text model without using existing speech recognition libraries. I am using the Common Voice dataset from Mozilla. I have done the data preprocessing: I extracted MFCC features from the input audio files and used word embeddings to get vectors for the transcripts.
mfcc_X_train : mfcc vectors from audio files
array([[-2.59124781e+02, 1.13265526e+02, 1.30979551e+01, ...,
-2.79187146e+00, 1.82840353e+00, -8.83761218e-01],
[-4.37804550e+02, 1.09338910e+02, 1.27755069e+01, ...,
2.80325980e-02, -3.02936100e+00, -4.85614372e+00],
[-4.20299606e+02, 5.03662679e+01, 5.93071849e+00, ...,
2.72814692e+00, -1.02385068e+01, -1.51062112e+00],
...,
[-3.91306660e+02, 5.17953868e+01, 1.03543497e+01, ...,
-4.19143153e+00, -8.23613404e+00, -6.86574230e+00],
[-3.62376932e+02, 6.76604652e+01, 1.77715018e+01, ...,
-8.71072342e-01, -4.66138009e+00, -4.56961645e+00],
[-3.86323644e+02, 1.14792009e+02, -1.33781946e+01, ...,
-1.60223182e-01, -7.69392168e+00, -3.41955318e+00]])
y_train : one-hot representations of the embedding vectors
array([[[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 1., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
...,
I am stuck on building a seq2seq model for this. Can anyone help with how to build a seq2seq model for this use case?
You can try to build a model like the one in the Neural Machine Translation with Attention tutorial. The MFCC vectors play the same role as the word embeddings of the encoder input, so set them as the encoder input when fitting the model.
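A minimal, untested encoder-decoder sketch of that idea in PyTorch (the tutorial itself uses attention, which is omitted here; n_mfcc=13, vocab_size and the teacher-forced token input are all assumptions, not something from the question):
import torch
from torch import nn

class Seq2Seq(nn.Module):
    def __init__(self, n_mfcc=13, vocab_size=10000, hidden=256):
        super().__init__()
        # The encoder consumes MFCC frames directly, as if they were embeddings
        self.encoder = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, mfcc, tokens):
        # mfcc: [batch, frames, n_mfcc]; tokens: [batch, seq] (teacher forcing)
        _, h = self.encoder(mfcc)                # h: [1, batch, hidden]
        dec_out, _ = self.decoder(self.embed(tokens), h)
        return self.out(dec_out)                 # logits: [batch, seq, vocab_size]

# Shape check with random stand-in data
model = Seq2Seq()
logits = model(torch.randn(2, 100, 13), torch.randint(0, 10000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 10000])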

How to keep using values from a list until the diagonal of a matrix is full using itertools

So I am trying to use a smaller list to populate the diagonal of a larger matrix. I thought using the cycle function in itertools would make this an easy task but I can't seem to get it to work. Here is what I tried:
a = np.zeros((10, 10))
b = [1, 2, 3, 4, 5]
for i in range(len(a.shape[0])):
    a[i, i] = list(itertools.cycle(b))
but this makes it endlessly iterate. I am hoping that it will stop once the diagonal has been filled. Other options that are more pythonic are greatly appreciated!
You mean to use itertools.cycle, not repeat. The latter repeats its element (the whole list); good luck assigning that to a single cell, especially if you force iteration over it (since it runs forever).
I'd create a cycle object once, outside the loop, and assign values to the diagonal by iterating over it manually with next() (the only proper way to consume a cycle). Also note that your loop range was wrong: a.shape[0] is already a dimension, so there is no need for len.
import itertools
import numpy as np

a = np.zeros((10, 10))
b = [1, 2, 3, 4, 5]
iterator = itertools.cycle(b)
for i in range(a.shape[0]):
    a[i, i] = next(iterator)
result:
>>> a
array([[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 3., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 4., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 5., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 3., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 4., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 5.]])
As they loop forever, cycle and repeat should not be used in a context that forces full iteration (repeat does take an optional times argument to limit the repetitions, though).
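A vectorized alternative that skips the Python loop entirely, as a sketch: np.fill_diagonal repeats an array-like value as needed to fill the whole diagonal, so the cycling comes for free (np.resize(b, a.shape[0]) would make the tiling explicit if you prefer).
import numpy as np

a = np.zeros((10, 10))
b = [1, 2, 3, 4, 5]

# fill_diagonal repeats the array-like value until the diagonal is full
np.fill_diagonal(a, b)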
