Stack expects each tensor to be equal size - pytorch

I am following the PyTorch tutorial on speech command recognition and trying to implement my own recognizer for 22 sentences in German. In the tutorial they use padding for the audio tensors, but for the labels they use only torch.stack. Because of that, I get an error as soon as I start training the network:
RuntimeError: stack expects each tensor to be equal size, but got [456] at entry 0 and [470] at entry 1.
I understand what this says, but since I am new to PyTorch I unfortunately can't implement a padding function for sentences from scratch. Therefore I would be happy if you could give me some hints and tips.
Here is the code for the collate_fn and pad_sequence functions:
def pad_sequence(batch):
    # Make all tensors in a batch the same length by padding with zeros
    batch = [item.t() for item in batch]
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    return batch.permute(0, 2, 1)
def collate_fn(batch):
    # A data tuple has the form:
    # waveform, label
    tensors, targets = [], []
    # Gather in lists, and encode labels as indices
    for waveform, label in batch:
        tensors += [waveform]
        targets += [label]
    # Group the list of tensors into a batched tensor
    tensors = pad_sequence(tensors)
    targets = torch.stack(targets)
    return tensors, targets

As I started working directly with pad_sequence, I understood how simple it is. In my case I only needed to pass a batch of encoded sentences (1-D tensors), which PyTorch automatically compares and extends to the length of the longest one in the batch.
My code looks now like this:
def pad_AudioSequence(batch):
    # Make all tensors in a batch the same length by padding with zeros
    batch = [item.t() for item in batch]
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    return batch.permute(0, 2, 1)
def pad_TextSequence(batch):
    return torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)
def collate_fn(batch):
    # A data tuple has the form:
    # waveform, label
    tensors, targets = [], []
    # Gather in lists, and encode labels as indices
    for waveform, label in batch:
        tensors += [waveform]
        targets += [label]
    # Group the list of tensors into a batched tensor
    tensors = pad_AudioSequence(tensors)
    targets = pad_TextSequence(targets)
    return tensors, targets
For those who still don't understand how that works, here is a little example:
encDecClass2 = dummyEncoderDecoder()
sent1 = audioWorkerClass.sentences[4] # wie viel Prozent hat der Akku noch?
sent2 = audioWorkerClass.sentences[5] # Wie spät ist es?
sent3 = audioWorkerClass.sentences[6] # Mach einen Timer für 5 Sekunden.
# encode sentences into tensor of numbers, representing words, using my own enc-dec class
sent1 = encDecClass2.encode(sent1) # tensor([11, 94, 21, 94, 22, 94, 23, 94, 24, 94, 25, 94, 26, 94, 15, 94])
sent2 = encDecClass2.encode(sent2) # tensor([27, 94, 28, 94, 12, 94, 29, 94, 15, 94])
sent3 = encDecClass2.encode(sent3) # tensor([30, 94, 31, 94, 32, 94, 33, 94, 34, 94, 35, 94, 19, 94])
print(sent1.shape) # torch.Size([16])
print(sent2.shape) # torch.Size([10])
print(sent3.shape) # torch.Size([14])
batch = []
# add sentences to the batch as separate arrays
batch +=[sent1]
batch +=[sent2]
batch +=[sent3]
output = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)
print(f"{output}\n{output.shape}")
#############################################################################
# output:
# tensor([[11, 94, 21, 94, 22, 94, 23, 94, 24, 94, 25, 94, 26, 94, 15, 94],
#         [27, 94, 28, 94, 12, 94, 29, 94, 15, 94,  0,  0,  0,  0,  0,  0],
#         [30, 94, 31, 94, 32, 94, 33, 94, 34, 94, 35, 94, 19, 94,  0,  0]])
# torch.Size([3, 16])
#############################################################################
As you can see, all tensors were padded to the maximum length among the three and filled up with zeros. The shape of the output is 3x16 because we had three sentences and the longest sequence in the batch had length 16.
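To tie this together, here is a minimal sketch of how the collate function plugs into a DataLoader (train_dataset is a placeholder for your dataset of (waveform, encoded_label) pairs):

from torch.utils.data import DataLoader

train_loader = DataLoader(
    train_dataset,          # hypothetical dataset yielding (waveform, encoded_label)
    batch_size=8,
    shuffle=True,
    collate_fn=collate_fn,  # pads waveforms and labels per batch
)

for waveforms, labels in train_loader:
    pass  # waveforms: [batch, channels, max_time], labels: [batch, max_label_len]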

PyTorch transformation for just a certain batch

Hi, is there any method to apply a transformation to just a certain batch? I want to apply a transformation to only the last batch in every epoch.
Here is what I tried:
import torch

class test(torch.utils.data.Dataset):
    def __init__(self):
        self.source = [i for i in range(10)]

    def __len__(self):
        return len(self.source)

    def __getitem__(self, idx):
        print(idx)
        return self.source[idx]

ds = test()
dl = torch.utils.data.DataLoader(dataset=ds, batch_size=3,
                                 shuffle=False, num_workers=5)

for i in dl:
    print(i)
I thought that if I could get the idx number, it would be possible to apply the transformation to certain batches.
However, when using num_workers the output is:
0
1
2
3
964
57
8
tensor([0, 1, 2])
tensor([3, 4, 5])
tensor([6, 7, 8])
tensor([9])
which is not what I expected.
Without num_workers:
0
1
2
tensor([0, 1, 2])
3
4
5
tensor([3, 4, 5])
6
7
8
tensor([6, 7, 8])
9
tensor([9])
So my questions are:
Why does idx behave like this with num_workers?
How can I apply a transform to certain batches (or a certain idx)?
When you have num_workers > 1, you have multiple subprocesses doing data loading in parallel. So what is likely happening is that there is a race condition for the print step, and the order you see in the output depends on which subprocess goes first each time.
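If you want to see which worker loaded which index, you can query torch.utils.data.get_worker_info() inside __getitem__; a small sketch based on the dataset above:

def __getitem__(self, idx):
    info = torch.utils.data.get_worker_info()
    worker_id = info.id if info is not None else "main"  # None in the main process
    print(f"idx {idx} loaded by worker {worker_id}")
    return self.source[idx]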
For most transforms, you can apply them on a specific batch simply by calling the transform after the batch has been loaded. To do this just for the last batch, you could do something like:
for batch_idx, batch_data in enumerate(dl):
    # check if this batch is the last batch
    if (batch_idx + 1) * dl.batch_size >= len(ds):
        batch_data = transform(batch_data)
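Equivalently, since len(dl) gives the number of batches per epoch, you can check the batch index directly (transform being whatever transformation you want to apply):

for batch_idx, batch_data in enumerate(dl):
    if batch_idx == len(dl) - 1:  # last batch of the epoch
        batch_data = transform(batch_data)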
I found that the batches themselves still come out in order:
class test_dataset(torch.utils.data.Dataset):
    def __init__(self):
        self.a = [i for i in range(100)]

    def __len__(self):
        return len(self.a)

    def __getitem__(self, idx):
        a = torch.tensor(self.a[idx])
        # print(idx)
        return idx

a = torch.utils.data.DataLoader(
    test_dataset(), batch_size=10, shuffle=False,
    num_workers=10, pin_memory=True)

for i in a:
    print(i)
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tensor([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
tensor([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
tensor([30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
tensor([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
tensor([50, 51, 52, 53, 54, 55, 56, 57, 58, 59])
tensor([60, 61, 62, 63, 64, 65, 66, 67, 68, 69])
tensor([70, 71, 72, 73, 74, 75, 76, 77, 78, 79])
tensor([80, 81, 82, 83, 84, 85, 86, 87, 88, 89])
tensor([90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

How to display the name and grade of the student in a dictionary who has the highest grade

I'm brand new to programming and I'm stuck on a practice exercise.
EDIT: the exact error:
avgDict[k] = max(sum(v) / float(len(v)))
TypeError: 'float' object is not iterable
If I remove max, it prints every student's average.
# student_grades contains scores (out of 100) for 5 assignments
diction = {
    'Andrew': [56, 79, 90, 22, 50],
    'Colin': [88, 62, 68, 75, 78],
    'Alan': [95, 88, 92, 85, 85],
    'Mary': [76, 88, 85, 82, 90],
    'Tricia': [99, 92, 95, 89, 99]
}

def averageGrades(diction):
    avgDict = {}
    for k, v in diction.items():
        avgDict[k] = max(sum(v) / float(len(v)))
    return avgDict
What your code is doing right now is iterating through each key-value pair of the dictionary (where the key is the student's name and the value is the list of the student's grades) and calculating the average for each student. The way you are using max, it is trying to find the max of a single student's average. That is why you get the error: max expects either an iterable or multiple arguments, and a float (the value produced by sum(v) / float(len(v))) is not an iterable. You should instead compute all of the averages first, and then find the max value in the dictionary of averages:
diction = {
    'Andrew': [56, 79, 90, 22, 50],
    'Colin': [88, 62, 68, 75, 78],
    'Alan': [95, 88, 92, 85, 85],
    'Mary': [76, 88, 85, 82, 90],
    'Tricia': [99, 92, 95, 89, 99]
}

def averageGrades(diction):
    avgDict = {}
    for k, v in diction.items():
        avgDict[k] = sum(v) / len(v)
    # find the pair which has the highest value
    return max(avgDict.items(), key=lambda i: i[1])

print(averageGrades(diction))  # ('Tricia', 94.8)
Side note: in Python 3, / performs true division by default (e.g. 7 / 2 == 3.5, while 7 // 2 == 3), so casting len(v) to a float is unnecessary.
Alternatively, if you don't need to create the avgDict variable, you can just determine the max directly without the intermediate variable:
def averageGrades(diction):
    return max([(k, sum(v) / len(v)) for k, v in diction.items()], key=lambda i: i[1])

print(averageGrades(diction))  # ('Tricia', 94.8)

flatten array of arrays into a vector of arrays

I'm generating matrix representations of images of size height*width, and I need to transform them into a vector of pixels. To generate the images, I'm using the following instruction:
np.array([[np.random.randint(0, 255, 3) for dummy_row in range(height)] for dummy_col in range(width)])
e.g., (2x2) image
array([[[132, 235,  40],
        [234,   1, 160]],

       [[ 69, 108, 218],
        [198, 179, 165]]])
I tried to use flatten(), but it does not create a one-dimensional array of pixels; it piles all the values together:
array([132, 235, 40, 234, 1, 160, 69, 108, 218, 198, 179, 165])
when what I require is
array([[132, 235, 40], [234, 1, 160], [69, 108, 218], [198, 179, 165]])
Is there a built-in function to get this output?
Just use:
arr.reshape(-1, n_channels)
or similar (where arr is the NumPy array containing the image data).
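For example, with the 2x2 RGB image from the question (3 channels):

import numpy as np

height, width = 2, 2
img = np.array([[np.random.randint(0, 255, 3) for dummy_row in range(height)]
                for dummy_col in range(width)])
pixels = img.reshape(-1, 3)  # one row per pixel, 3 channels each
print(pixels.shape)          # (4, 3)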

How to convert mmod_rectangles to rectangles via Dlib?

This code uses the following dlib detectors:
dlib.get_frontal_face_detector()
dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('res/model.dat')
# detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')

cap = cv.VideoCapture(0)
while True:
    _, frame = cap.read(0)
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    dets = detector(gray, 0)
    print(dets)
    for det in dets:
        landmarks = shape_to_np(predictor(gray, det))
    cv.imshow('test', frame)
    if cv.waitKey(1) == ord('q'):
        break
When the CNN detector is used, dets looks like:
mmod_rectangles[[(258, 254) (422, 417)]]
And an exception is thrown in the predictor line:
TypeError: __call__(): incompatible function arguments. The following
argument types are supported:
1. (self: dlib.shape_predictor, image: array, box: dlib.rectangle) -> dlib.full_object_detection
Invoked with: <dlib.shape_predictor object at 0x7f37a12ba9d0>,
array([[71, 69, 70, ..., 71, 70, 73],
[71, 72, 71, ..., 72, 72, 75],
[71, 70, 71, ..., 72, 72, 73],
...,
[27, 27, 27, ..., 75, 71, 68],
[27, 27, 27, ..., 74, 71, 71],
[24, 25, 27, ..., 73, 71, 70]], dtype=uint8), <dlib.mmod_rectangle object at 0x7f37819beea0>
But when get_frontal_face_detector is used, dets looks like:
rectangles[[(273, 234) (453, 413)]]
And the code works correctly.
Try performing:
faceRect = det.rect
landmarks = shape_to_np(predictor(gray, faceRect))
Perhaps it is a version problem; if so, try:
faceRect = det[0].rect
landmarks = shape_to_np(predictor(gray, faceRect))
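For context: the CNN detector returns mmod_rectangles, where each element wraps a plain dlib.rectangle in its .rect attribute (and also carries a .confidence score). A small sketch of a loop that works with either detector, assuming the rest of the original code:

dets = detector(gray, 0)
for det in dets:
    # mmod_rectangle exposes the underlying dlib.rectangle via .rect;
    # the frontal face detector already returns plain rectangles
    rect = det.rect if hasattr(det, 'rect') else det
    landmarks = shape_to_np(predictor(gray, rect))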

How to break the repeating-key XOR challenge using a single-byte XOR cipher

This question is about challenge 6 in set 1 of "the cryptopals crypto challenges".
The challenge is:
There's a file here. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
After that there's a description of the steps to decrypt the file; there are 8 steps in total. You can find them on the site.
I have been trying to solve this challenge for a while and I am struggling with the final two steps, even though I've solved challenge 3, which contains the solution for these steps.
Note: it is, of course, possible that there is a mistake in the first 6 steps, but they seem to work well judging by the printout after every step.
My code:
Written in Python 3.6.
In order not to deal with web requests, since that is not the purpose of this challenge, I just copied the content of the file into a string at the beginning. You can do the same before running the code.
import base64

# Decode the file from base64 back to raw bytes
file = base64.b64decode("""HUIfTQsP...JwwRTWM=""")
print(file)
print()

# Step 1 - guess key size
KEYSIZE = 4
# Step 2 - find the Hamming distance - the number of differing bits
def hamming2(s1, s2):
    """Calculate the Hamming distance between two bit strings"""
    assert len(s1) == len(s2)
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def distance(a, b):  # Hamming distance
    calc = 0
    for ca, cb in [(a[i], b[i]) for i in range(len(a))]:
        bina = '{:08b}'.format(int(ca))
        binb = '{:08b}'.format(int(cb))
        calc += hamming2(bina, binb)
    return calc
# Test step 2
print("distance: 'this is a test' and 'wokka wokka!!!' =", distance([ord(c) for c in "this is a test"], [ord(c) for c in "wokka wokka!!!"])) # 37 - Working
print()
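# Aside - a sketch of an equivalent, more direct Hamming distance (not used
# below): XOR each pair of bytes and count the set bits in the result.
def hamming_bytes(a, b):
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

print(hamming_bytes(b"this is a test", b"wokka wokka!!!"))  # 37, same result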
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    # take the first KEYSIZE worth of bytes and the second KEYSIZE worth of bytes -
    # file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE] -
    # and find the edit distance between them.
    # Normalize this result by dividing by KEYSIZE
    key_sizes.append((distance(file[0:KEYSIZE], file[KEYSIZE:2*KEYSIZE]) / KEYSIZE, KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
# Step 4
for val, key in key_sizes:
    print(key, ":", val)
KEYSIZE = key_sizes[0][1]
print()
# Step 5 + 6
# Each line is a list of all the bytes in that index
splited_file = [[] for i in range(KEYSIZE)]
counter = 0
for char in file:
    splited_file[counter].append(char)
    counter += 1
    counter %= KEYSIZE
for line in splited_file:
    print(line)
print()
# Step 7
# Code from another challenge.
# Takes a sequence of bytes and a single byte value
# and does a single-byte XOR over it
def single_char_string(a, b):
    final = ""
    for c in a:
        final += chr(c ^ b)
    return final

# Go over all 256 possible bytes and rank the results after the XOR
# by their number of spaces
def find_single_byte(in_string):
    helper_list = []
    for num in range(256):
        helper_list.append((single_char_string(in_string, num), num))
    helper_list.sort(key=lambda a: a[0].count(' '), reverse=True)
    return helper_list[0]
# Step 8
final_key = ""
key_list = []
for line in splited_file:
    result = find_single_byte(line)
    print(result)
    final_key += chr(result[1])
    key_list.append(result[1])
print(final_key)
print(key_list)
Output:
b'\x1dB\x1fM\x0b\x0f\x02\x1fO\x13N<\x1aie\x1fI...\x08VA;R\x1d\x06\x06TT\x0e\x10N\x05\x16I\x1e\x10\'\x0c\x11Mc'
distance: 'this is a test' and 'wokka wokka!!!' = 37
5 : 1.2
3 : 2.0
2 : 2.5
.
.
.
26 : 3.5
28 : 3.5357142857142856
9 : 3.5555555555555554
22 : 3.727272727272727
6 : 4.0
[29, 15, 78, 31, 19, 27, 0, 32, ... 17, 26, 78, 38, 28, 2, 1, 65, 6, 78, 16, 99]
[66, 2, 60, 73, 1, 1, 30, 3, 13, ... 26, 14, 0, 26, 79, 99, 8, 79, 11, 4, 82, 59, 84, 5, 39]
[31, 31, 19, 26, 79, 47, 17, 28, ... 71, 89, 12, 1, 16, 45, 78, 3, 120, 11, 42, 82, 84, 22, 12]
[77, 79, 105, 14, 7, 69, 73, 29, 101, ... 54, 70, 78, 55, 7, 79, 31, 88, 10, 69, 65, 8, 29, 14, 73, 17]
[11, 19, 101, 78, 78, 54, 100, 67, 82, ... 1, 76, 26, 1, 2, 73, 21, 72, 73, 49, 27, 86, 6, 16, 30, 77]
('=/n?3; \x00\x13&-,>1...r1:n\x06<"!a&n0C', 32)
('b"\x1ci!!>ts es(ogg ...5i<% tc:. :oC(o+$r\x1bt%\x07', 32)
('??:<+6!=ngm2i4\x0byD...&h9&2:-)sm.a)u\x06&=\x0ct&~n +=&*4X:<(3:o\x0f1<mE gy,!0\rn#X+\nrt6,', 32)
('moI.\'ei=Et\'\x1c:l ...6k=\x1b m~t*\x155\x1ei+=+ts/e*9$sgl0\'\x02\x16fn\x17\'o?x*ea(=.i1', 32)
('+3Enn\x16Dcr<$,)\x01...i5\x01,hi\x11;v&0>m', 32)
[32, 32, 32, 32, 32]
Notice that when the key is printed as a string you cannot see it, but there are 5 characters in there.
This is not the correct answer: as you can see in the fourth part, the results after the XOR do not look like words. The problem is probably in the last two functions, but I couldn't figure it out.
I've also tried some other key lengths, and that does not seem to be the problem.
So what I'm asking is not for you to fix my code; I want to solve this challenge by myself :). I would like you to tell me: where am I wrong? Why? And how should I continue?
Thank you for your help.
After a lot of thinking and checking, the conclusion was that the problem is in step 3. The result was not good enough because I looked only at the first two blocks.
I fixed the code so that it calculates the KEYSIZE according to all of the blocks.
The code of step 3 now looks like this:
# Step 3
key_sizes = []
# For each key size
for KEYSIZE in range(2, 41):
    running_sum = []
    for i in range(0, int(len(file) / KEYSIZE) - 1):
        running_sum.append(distance(file[i * KEYSIZE:(i + 1) * KEYSIZE],
                                    file[(i + 1) * KEYSIZE:(i + 2) * KEYSIZE]) / KEYSIZE)
    key_sizes.append((sum(running_sum) / len(running_sum), KEYSIZE))
key_sizes.sort(key=lambda a: a[0])
Thanks to anyone who tried to help.
