How to `dot` weights to batch data in PyTorch?

I have batch data and want to apply dot() to it, where W is a trainable parameter.
How do I compute the dot product between the batch data and the weights?
hid_dim = 32
data = torch.randn(10, 2, 3, hid_dim)
data = data.view(10, 2*3, hid_dim)
W = torch.randn(hid_dim) # assume trainable parameters via nn.Parameter
result = torch.bmm(data, W).squeeze() # error, want (N, 6)
result = result.view(10, 2, 3)
Update
How about this one?
hid_dim = 32
data = torch.randn(10, 2, 3, hid_dim)
data = data.view(10, 2*3, hid_dim)
W = torch.randn(hid_dim, 1) # assume trainable parameters via nn.Parameter
W = W.unsqueeze(0).expand(10, hid_dim, 1)
result = torch.bmm(data, W).squeeze() # error, want (N, 6)
result = result.view(10, 2, 3)

Expand W tensor to match the shape of data tensor. The following should work.
hid_dim = 32
data = torch.randn(10, 2, 3, hid_dim)
data = data.view(10, 2*3, hid_dim)
W = torch.randn(hid_dim)
W = W.unsqueeze(0).unsqueeze(0).expand(*data.size())
result = torch.sum(data * W, 2)
result = result.view(10, 2, 3)
Edit: Your updated code is correct. Since you convert W to shape Bxhid_dimx1 and your data has shape Bxdxhid_dim, the batch matrix multiplication yields Bxdx1, which is essentially the dot product between the W parameter and every row vector in data (dxhid_dim).
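For completeness, the same per-row dot product can also be written without expanding W at all, since torch.matmul (and torch.einsum) broadcast the matrix-vector product over the leading batch dimensions. A minimal sketch with the shapes from the question:
import torch
import torch.nn as nn

hid_dim = 32
data = torch.randn(10, 2, 3, hid_dim)
W = nn.Parameter(torch.randn(hid_dim))

# matmul broadcasts the matrix-vector product over the leading batch dims
result = torch.matmul(data, W)                    # shape: (10, 2, 3)
# equivalent einsum formulation
result_einsum = torch.einsum('bijh,h->bij', data, W)
print(torch.allclose(result, result_einsum))      # True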

Related

(PyTorch) Why are the conv2d results all different? Their data type is all integer, no float

I've tried and compared three different methods for convolution computation with a custom kernel in Pytorch. Their results are different but I don't understand why that is.
Setup code:
import torch
import torch.nn.functional as F
inp = torch.arange(3*500*700).reshape(1,3,500,700).to(dtype=torch.float32)
wgt = torch.ones((1,3,3,3)).to(dtype=torch.float32)
stride = 1
padding = 0
h = inp.shape[2] - wgt.shape[2] + 1
w = inp.shape[3] - wgt.shape[3] + 1
Method 1
out1 = torch.zeros((1,h,w)).to(dtype=torch.float32)
for o in range(1):
    for i in range(3):
        for j in range(h):
            for k in range(w):
                out1[o,j,k] = out1[o,j,k] + (inp[0, i, j*stride:j*stride+3, k*stride:k*stride+3] * wgt[0,i]).sum()
out1 = out1.to(dtype=torch.int)
Method 2
inp_unf = F.unfold(inp, (3,3))
out_unf = inp_unf.transpose(1,2).matmul(wgt.view(1,-1).t()).transpose(1,2)
out2 = F.fold(out_unf, (h,w), (1,1))
out2 = out2.to(dtype=torch.int)
Method 3
out3 = F.conv2d(inp, wgt, bias=None, stride=1, padding=0)
out3 = out3.to(dtype=torch.int)
And here are the results comparison:
>>> h*w
347604
>>> (out1==out2).sum().item()
327338
>>> (out2 == out3).sum().item()
344026
>>> (out1 == out3).sum().item()
330797
>>> out1.shape
(1, 498, 698)
>>> out2.shape
(1, 1, 498, 698)
>>> out3.shape
(1, 1, 498, 698)
Their data types are all int, so I expected floating point not to affect the result. When I use a square input, such as h=500 and w=500, all three results match. But not for non-square inputs, such as the one above with h=500 and w=700. Any insight?
All three results are cast to integer data types, but keep in mind their computation is done in float32, so tiny floating-point differences can flip the truncated integer values. It is often preferred to check equality between two tensors using torch.isclose:
>>> torch.isclose(out1, out2).float().mean()
tensor(1.)
>>> torch.isclose(out2, out3).float().mean()
tensor(1.)
>>> torch.isclose(out1, out3).float().mean()
tensor(1.)
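To see why the integer casts can disagree while the float values are essentially equal, here is a small illustrative sketch (the values are made up): two float32 numbers that isclose treats as equal can still truncate to different integers.
import torch

a = torch.tensor([99.9999], dtype=torch.float32)
b = torch.tensor([100.0001], dtype=torch.float32)

print(torch.isclose(a, b))               # tensor([True]) -- within default tolerances
print(a.to(torch.int), b.to(torch.int))  # tensor([99], dtype=torch.int32) tensor([100], dtype=torch.int32)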

multiplying each element of a matrix by a vector (or array)

Say I have an input array of size (64,100)
t = torch.randn((64,100))
Now say I want to multiply each of the 6400 elements of t with 6400 separate vectors each of size 256 to produce a tensor of size [64, 100, 256]. This is what I am doing currently -
import copy
import torch
import torch.nn as nn

def clones(module, N):
    "Produce N identical layers."
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

linears = clones(nn.Linear(1, 256, bias=False), 6400)
idx = 0
t_final = []
for i in range(64):
    t_bs = []
    for j in range(100):
        t1 = t[i, j] * linears[idx].weight.view(-1)
        idx += 1
        t_bs.append(t1)
    t_bs = torch.cat(t_bs).view(1, 100, 256)
    t_final.append(t_bs)
t_final = torch.cat(t_final)
print(t_final.shape)
Output: torch.Size([64, 100, 256])
Is there a faster and cleaner way of doing the same thing? I tried torch.matmul and torch.dot but couldn't do any better.
It seems broadcasting is what you are looking for.
t = torch.randn((64,100)).view(6400, 1)
weights = torch.randn((6400, 256))
output = (t * weights).view(64, 100, 256)
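If you prefer a named-axes formulation, the same broadcasted product can also be written with torch.einsum. A small sketch with the shapes from the question (the weights here are random, just for illustration):
import torch

t = torch.randn(64, 100)
W = torch.randn(64, 100, 256)              # one 256-vector per element of t

out_broadcast = t.unsqueeze(-1) * W        # (64, 100, 1) * (64, 100, 256)
out_einsum = torch.einsum('ij,ijk->ijk', t, W)

print(out_einsum.shape)                              # torch.Size([64, 100, 256])
print(torch.allclose(out_broadcast, out_einsum))     # True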
You don't actually need to clone your linear layer if you really want to multiply tensor t by the same linear-layer weight 6400 times. Instead you can do the following:
t = torch.randn((64,100)).unsqueeze(-1)
w = torch.rand((256)).view(1,1,256).repeat(64, 100, 1)
#or
w = torch.stack(6400*[torch.rand((256))]).view(64,100,256)
result = t*w # shape: [64, 100, 256]
However, if you want to keep the same structure you currently have, you can do something like the following:
t = torch.randn((64,100)).unsqueeze(-1)
w = torch.stack([linears[i].weight for i in range(len(linears))]).view(64,100,256)
result = t*w # shape: [64, 100, 256]

How to append a 2D numpy array to form a 3D array?

I have a 2D NumPy array of shape (20,87). I want to append multiple arrays of this type such that after appending, say, 100 of them my final shape would be (100,20,87).
I tried normal appending by using numpy.append() but it did not work.
Edit:
I have attached the link to my colab notebook.
This is my code:
audio_sr = []
x_train = []
y_train = []
x_test = []
y_test = []
for data in path:
    ipd.Audio(data)
    i = 0
    label = 0
    while i <= 120:
        audio, sr = librosa.load(data, offset=i, duration=2)
        audio_sr.append((audio, sr))
        i += 2
        label += 1

all_mel_db = []
for audio, sr in audio_sr:
    mel_spec = librosa.feature.melspectrogram(audio, sr=sr, n_mels=20)
    mel_db = librosa.power_to_db(mel_spec, ref=np.max)
    all_mel_db.append(mel_db)

x_train = []
x_test = np.array(x_test)
x_train = np.dstack(all_mel_db).reshape((len(all_mel_db), 20, 87))
And I got this error
ValueError: all the input array dimensions for the concatenation axis
must match exactly, but along dimension 1, the array at index 0 has
size 87 and the array at index 121 has size 66
on using np.append() after reshaping, I got
for mel_db in all_mel_db:
    mel_db = np.reshape(1, mel_db.shape[0], mel_db.shape[1])
    x_train = np.append(x_train, mel_db)
Error:
Non-string object detected for the array ordering. Please pass in 'C',
'F', 'A', or 'K' instead
And on simply converting to a NumPy array:
x_train = np.array(all_mel_db)
I got:
ValueError: could not broadcast input array from shape (20,87) into
shape (20)
You can use numpy.dstack to produce output of shape (20, 87, N):
a = np.random.rand(20, 87)
b = np.random.rand(20, 87)
c = np.random.rand(20, 87)
d = np.random.rand(20, 87)
collection = np.dstack([a, b, c, d])
collection.shape # out: (20, 87, 4)
Note that a plain reshape to (4, 20, 87) would scramble the contents; move the stacking axis to the front instead:
collection = np.dstack([a, b, c, d]).transpose(2, 0, 1)
collection.shape # out: (4, 20, 87)
Cheers.
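If you don't need the dstack intermediate at all, np.stack along a new leading axis gives the (N, 20, 87) shape directly. A minimal sketch with random arrays:
import numpy as np

arrays = [np.random.rand(20, 87) for _ in range(100)]
stacked = np.stack(arrays, axis=0)   # inserts a new axis at position 0
print(stacked.shape)                 # (100, 20, 87)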
You need to make all your audio segment features the same size, and then you can stack them vertically. For example, if your files are in pkl format:
max_pad_len = 87  # max length of the features

def ft_extraction():
    for item in dirs:
        x, sr = librosa.load(path + item)
        mfccs = librosa.feature.mfcc(x, sr=sr)
        pad_width = max_pad_len - mfccs.shape[1]  # e.g. 80 - 15
        mfccs = np.pad(mfccs, pad_width=([0,0],[0,pad_width]), mode='constant', constant_values=0)
        #print (mfccs)
        pickle.dump(mfccs, open(path + item + '.pkl', 'wb'))

ft_extraction()
And then stack them;
# vertically stack all feature files
audio_all_3d = []
for file in os.listdir(newpath):
    if file.endswith('.pkl'):
        myfile = open(newpath + file, "rb")
        audio_all_3d.append(pickle.load(myfile))
        myfile.close()
aud_ft = np.stack(audio_all_3d)
You will get a 3-dimensional array of shape
(100, 20, 87)
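For instance, padding a single (20, 66) spectrogram (the size mentioned in the error) up to width 87 would look like this sketch:
import numpy as np

mel = np.random.rand(20, 66)                 # a clip shorter than the rest
pad = 87 - mel.shape[1]
mel_padded = np.pad(mel, pad_width=((0, 0), (0, pad)), mode='constant', constant_values=0)
print(mel_padded.shape)                      # (20, 87)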

Loop over tensor dimension 0 (NoneType) with second tensor values

I have a tensor a, and I'd like to loop over its rows and index values based on another tensor l, i.e. l gives the length of the slice I need from each row.
sess = tf.InteractiveSession()
a = tf.constant(np.random.rand(3,4)) # shape=(3,4)
a.eval()
Out:
array([[0.35879311, 0.35347166, 0.31525201, 0.24089784],
[0.47296348, 0.96773956, 0.61336239, 0.6093023 ],
[0.42492552, 0.2556728 , 0.86135674, 0.86679779]])
l = tf.constant(np.array([3,2,4])) # shape=(3,)
l.eval()
Out:
array([3, 2, 4])
Expected output:
[array([0.35879311, 0.35347166, 0.31525201]),
array([0.47296348, 0.96773956]),
array([0.42492552, 0.2556728 , 0.86135674, 0.86679779])]
The tricky part is the fact that a could have None as first dimension since it's what is usually defined as batch size through placeholder.
I cannot just use a mask and condition as below, since I need to compute the variance of each row individually.
condition = tf.sequence_mask(l, tf.reduce_max(l))
a_true = tf.boolean_mask(a, condition)
a_true
Out:
array([0.35879311, 0.35347166, 0.31525201, 0.47296348, 0.96773956,
0.42492552, 0.2556728 , 0.86135674, 0.86679779])
I also tried to use tf.map_fn but can't get it to work.
elems = (a, l)
tf.map_fn(lambda x: x[0][:x[1]], elems)
Any help will be highly appreciated!
A TensorArray object can store tensors of different shapes. However, it is still not that simple. Take a look at this example, which does what you want using tf.while_loop() with tf.TensorArray and the tf.slice() function:
import tensorflow as tf
import numpy as np

batch_data = np.array([[0.35879311, 0.35347166, 0.31525201, 0.24089784],
                       [0.47296348, 0.96773956, 0.61336239, 0.6093023 ],
                       [0.42492552, 0.2556728 , 0.86135674, 0.86679779]])
batch_idx = np.array([3, 2, 4]).reshape(-1, 1)

x = tf.placeholder(tf.float32, shape=(None, 4))
idx = tf.placeholder(tf.int32, shape=(None, 1))

n_items = tf.shape(x)[0]
init_ary = tf.TensorArray(dtype=tf.float32,
                          size=n_items,
                          infer_shape=False)

def _first_n(i, ta):
    ta = ta.write(i, tf.slice(input_=x[i],
                              begin=tf.convert_to_tensor([0], tf.int32),
                              size=idx[i]))
    return i + 1, ta

_, first_n = tf.while_loop(lambda i, ta: i < n_items,
                           _first_n,
                           [0, init_ary])

first_n = [first_n.read(i)                       # <-- extracts the tensors
           for i in range(batch_data.shape[0])]  #     that you're looking for

with tf.Session() as sess:
    res = sess.run(first_n, feed_dict={x: batch_data, idx: batch_idx})
    print(res)
# [array([0.3587931 , 0.35347167, 0.315252  ], dtype=float32),
#  array([0.47296348, 0.9677396 ], dtype=float32),
#  array([0.4249255 , 0.2556728 , 0.86135674, 0.8667978 ], dtype=float32)]
Note
We still had to use batch_size to extract the elements one by one from the first_n TensorArray with the read() method. We can't use any other method that returns a Tensor, because the rows have different sizes (except the TensorArray.concat method, but that returns all elements stacked along one dimension).
If the TensorArray has fewer elements than the index you pass to TensorArray.read(index), you will get an InvalidArgumentError.
You can't use tf.map_fn because it returns a tensor whose elements must all have the same shape.
The task is simpler if you only need to compute the variance of the first n elements of each row (without actually gathering elements of different sizes together). In this case we can directly compute the variance of the sliced tensor, write it to the TensorArray, and then stack it into a tensor:
n_items = tf.shape(x)[0]
init_ary = tf.TensorArray(dtype=tf.float32,
                          size=n_items,
                          infer_shape=False)

def _variances(i, ta, begin=tf.convert_to_tensor([0], tf.int32)):
    mean, varian = tf.nn.moments(
        tf.slice(input_=x[i], begin=begin, size=idx[i]),
        axes=[0])                 # <-- compute variance
    ta = ta.write(i, varian)      # <-- write variance of each row to `TensorArray`
    return i + 1, ta

_, variances = tf.while_loop(lambda i, ta: i < n_items,
                             _variances,
                             [0, init_ary])
variances = variances.stack()     # <-- read from `TensorArray` to `Tensor`

with tf.Session() as sess:
    res = sess.run(variances, feed_dict={x: batch_data, idx: batch_idx})
    print(res)  # [0.0003761  0.06120085 0.07217039]
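If you only ever need the per-row variance, a loop-free alternative is to mask the tail of each row and compute the moments by hand. This is just a sketch, reusing the x and idx placeholders defined above:
lengths = tf.reshape(idx, [-1])                                   # valid elements per row
mask = tf.sequence_mask(lengths, maxlen=tf.shape(x)[1], dtype=tf.float32)
n = tf.cast(lengths, tf.float32)

mean = tf.reduce_sum(x * mask, axis=1) / n
var = tf.reduce_sum(tf.square(x - mean[:, None]) * mask, axis=1) / n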

how to batch a variable length spectogram in tensorflow

I have to train a denoising autoencoder, and I need to batch 5-frame noisy power spectra with 1-frame clean power spectra, but I don't know how to batch the spectrograms since my data are all of variable length along the time axis.
def parse_line(noise_file, clean_file):
    noise_binary = tf.read_file(noise_file)
    noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
    noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256, fft_length=512)
    noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
    noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum, frame_length=5, frame_step=1, axis=1))
    clean_binary = tf.read_file(clean_file)
    clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
    clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256, fft_length=512)
    clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
    clean_data = tf.squeeze(clean_powerspectrum)[:-4]
    return noise_data, clean_data
My tf.data pipeline is shown below:
shuffle_batch = 10
batch_size = 10
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist,clean_datalist))
dataset = dataset.shuffle(shuffle_batch) # shuffle number of files perbatch
dataset = dataset.map(parse_line,num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
dataset = dataset.make_one_shot_iterator()
next_element = dataset.get_next()
This is the error that shows up:
InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
When I change the batch_size to 1 it works and I get one example. How can I batch this variable-length data, or even concatenate all of it along the first dimension, e.g. [443,5,257] and [280,5,257] into [723,5,257]?
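One way to get the "[443,5,257] plus [280,5,257] into [723,5,257]" behaviour is to unbatch each file's frames with flat_map before batching, so that every dataset element is a single 5-frame window. A rough sketch on top of the pipeline above (it assumes noise_data and clean_data from parse_line have the same number of frames, which the [:-4] slice is meant to ensure):
def split_frames(noise_data, clean_data):
    # noise_data: (num_frames, 5, 257), clean_data: (num_frames, 257)
    return tf.data.Dataset.from_tensor_slices((noise_data, clean_data))

dataset = tf.data.Dataset.from_tensor_slices((noise_datalist, clean_datalist))
dataset = dataset.shuffle(shuffle_batch)
dataset = dataset.map(parse_line, num_parallel_calls=8)
dataset = dataset.flat_map(split_frames)   # concatenate all files' windows along the first axis
dataset = dataset.batch(batch_size)        # now every element has the same shape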