How to use CTC Loss Seq2Seq correctly? - pytorch

I am trying to build an ASR model myself and learn how to use CTC loss.
I ran a test and observed the following:
ctc_loss = nn.CTCLoss(blank=95)
output: tensor([[63, 8, 1, 38, 29, 14, 41, 71, 14, 29, 45, 41, 3]]): torch.Size([1, 13]); output_size: tensor([13])
input1: torch.Size([167, 1, 96]); input1_size: tensor([167])
After applying argmax to this input (= the predicted phonemes)
torch.argmax(input1, dim=2)
I get a series of symbols:
tensor([[63, 63, 63, 63, 63, 63, 95, 95, 63, 63, 95, 95, 8, 8, 8, 95, 8, 95,
8, 8, 95, 95, 95, 1, 1, 95, 1, 95, 1, 1, 95, 95, 38, 95, 95, 38,
38, 38, 38, 38, 29, 29, 29, 29, 29, 29, 29, 95, 29, 29, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 14, 95, 14, 95, 95, 95, 95, 14, 95, 14, 41, 41,
41, 95, 41, 41, 41, 41, 41, 41, 71, 71, 71, 95, 71, 71, 71, 71, 71, 95,
95, 14, 14, 95, 14, 14, 95, 14, 14, 95, 29, 29, 95, 29, 29, 29, 29, 29,
29, 29, 45, 95, 95, 45, 45, 95, 45, 45, 45, 45, 41, 95, 41, 41, 95, 95,
95, 41, 41, 41, 3, 3, 3, 3, 3, 95, 3, 3, 3, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95]])
and the following loss value:
ctc_loss(input1, output, input1_size, output_size)
# Returns 222.8446
With a different input:
input2: torch.Size([167, 1, 96]) input2_size: tensor([167])
torch.argmax(input2, dim=2)
the prediction is just a sequence of blank symbols.
tensor([[95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,
95, 95, 95, 95, 95]])
However, the loss value with the same desired output is much lower.
ctc_loss(input2, output, input2_size, output_size)
# Returns 3.7955
I don't understand why the loss for input1 is higher than the loss for input2, even though input1 is the better prediction. Can someone explain that?

The CTC loss does not operate on the argmax predictions but on the entire output distribution. It is the negative log of the total probability of all alignments (frame-level sequences containing blanks and repeated symbols) that collapse to the desired output. Because the output symbols can be interleaved with blank symbols in exponentially many ways, it is, in theory, possible that the total probability of the correct alignments is high, so the loss is low, while the single most probable frame-by-frame sequence is still all blanks.
In practice, this is quite rare, so I guess there might be a problem somewhere else. CTCLoss as implemented in PyTorch expects log probabilities as input, which you get, e.g., by applying the log_softmax function. Other kinds of input (raw logits or plain softmax probabilities, for example) might lead to strange results such as the one you observe.
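To make the point about the full distribution concrete, here is a minimal pure-Python sketch of the CTC forward computation on a toy example. It is not PyTorch's implementation: the function name ctc_nll, the two-symbol alphabet, and the probabilities are all made up for illustration, and it takes plain probabilities rather than log probabilities.

```python
import math

def ctc_nll(probs, target, blank):
    """Negative log of the summed probability of every alignment that
    collapses to `target`. `probs` is a list of per-frame distributions."""
    # Extended target: a blank before, between, and after the symbols.
    ext = [blank]
    for symbol in target:
        ext += [symbol, blank]
    size = len(ext)

    # alpha[s]: probability of all alignment prefixes ending in ext[s].
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new[s] = total * frame[ext[s]]
        alpha = new

    # Valid alignments end on the last symbol or on the trailing blank.
    p = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(p) if p > 0 else math.inf

# Toy setup (hypothetical): symbol 0 is "a", symbol 1 is the blank.
probs = [[0.4, 0.6]] * 3          # blank is the argmax in every frame
greedy = [max(range(2), key=lambda k: frame[k]) for frame in probs]
loss = ctc_nll(probs, target=[0], blank=1)

print(greedy)  # [1, 1, 1] -> the argmax path is all blanks
print(loss)    # about 0.374, i.e. the target has total probability 0.688
```

Here the all-blank path has probability 0.216, yet the six alignments that collapse to "a" together carry 0.688, so the loss is low even though greedy decoding outputs nothing.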

Related

How to divide data in Excel into 4 columns whose sums are each almost equal to 1/4 of the sum of all values

I have this set of numbers.
74, 15, 60, 5, 61, 56, 4, 23, 47, 66, 20, 54, 39, 9, 34, 37, 45, 93, 85, 79, 4, 76, 85, 51, 78, 60, 95, 50, 79, 62, 21, 75, 18, 5, 79, 46, 76, 92, 11, 100, 51, 39, 80, 92, 95, 20, 62, 1, 22, 69, 65, 45, 34, 42, 40, 8, 29, 82, 38, 9, 100, 78, 22, 11, 57, 71, 38, 35, 37, 32, 19, 58, 91, 90, 91, 26, 38, 85, 96, 3, 80, 18, 32, 74, 97, 60, 65, 85, 92, 38, 12, 31, 37, 76, 84, 9, 17, 33, 20, 19
I would like to divide this set of numbers into 4 parts/columns in Excel so that the sum of each column is as close as possible to 1/4 of the total of all numbers.
The total of all numbers is 5049; 1/4 of that is 1262.25.
Is this the sort of thing you're after?
All those numbers, sorted numerically, then added to columns, with the column totals almost matching a quarter of the overall total...
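Outside Excel, the "sort, then deal into columns" idea can be sketched as a greedy heuristic: sort in descending order and always add the next number to the column with the smallest running total. This is an assumption about how such columns could be filled, not necessarily the method used above, and it gives a close split rather than a guaranteed optimal one. The helper name split_into_columns is hypothetical.

```python
def split_into_columns(values, k=4):
    """Greedy split: largest values first, each one goes to the column
    whose running total is currently the smallest."""
    columns = [[] for _ in range(k)]
    totals = [0] * k
    for value in sorted(values, reverse=True):
        i = totals.index(min(totals))   # first column with the smallest total
        columns[i].append(value)
        totals[i] += value
    return columns, totals

# Small made-up sample; the same call works for the 100 numbers in the question.
columns, totals = split_into_columns([8, 7, 6, 5, 4, 3, 2, 1], k=4)
print(columns)  # [[8, 1], [7, 2], [6, 3], [5, 4]]
print(totals)   # [9, 9, 9, 9]
```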

How to convert payload to human readable form

I have been writing a program using fbchat and found an interesting function that appealed to me:
import fbchat
class listen(fbchat.Client):
    # Called when someone unsends a message
    def onMessageUnsent(
        self,
        mid=None,
        author_id=None,
        thread_id=None,
        thread_type=None,
        ts=None,
        msg=None,
    ):
        print(msg)
client = listen('','',session_cookies=cookies)
client.listen()
and it gives the following output, but how do I convert it to human readable form?
{'payload': [123, 34, 100, 101, 108, 116, 97, 115, 34, 58, 91, 123, 34, 100, 101, 108, 116, 97,
82, 101, 99, 97, 108, 108, 77, 101, 115, 115, 97, 103, 101, 68, 97, 116, 97, 34, 58, 123, 34,
116, 104, 114, 101, 97, 100, 75, 101, 121, 34, 58, 123, 34, 111, 116, 104, 101, 114, 85, 115,
101, 114, 70, 98, 73, 100, 34, 58, 49, 48, 48, 48, 52, 52, 53, 55, 50, 49, 57, 50, 57, 48, 54,
125, 44, 34, 109, 101, 115, 115, 97, 103, 101, 73, 68, 34, 58, 34, 109, 105, 100, 46, 36, 99,
65, 65, 66, 97, 95, 88, 69, 118, 56, 73, 112, 55, 121, 77, 120, 76, 114, 86, 49, 109, 100, 87,
85, 49, 112, 48, 70, 108, 34, 44, 34, 100, 101, 108, 101, 116, 105, 111, 110, 84, 105, 109,
101, 115, 116, 97, 109, 112, 34, 58, 49, 54, 48, 52, 54, 51, 55, 48, 57, 54, 53, 54, 56, 44, 34,
115, 101, 110, 100, 101, 114, 73, 68, 34, 58, 49, 48, 48, 48, 52, 52, 53, 55, 50, 49, 57, 50,
57, 48, 54, 44, 34, 109, 101, 115, 115, 97, 103, 101, 84, 105, 109, 101, 115, 116, 97, 109, 112,
34, 58, 48, 125, 125, 93, 125], 'class': 'ClientPayload'}
What does it even mean?
This is neither base64 nor hexadecimal.
It's a list of ASCII codes. Try this:
"".join(map(chr, msg["payload"]))
The result is:
'{"deltas":[{"deltaRecallMessageData":{"threadKey":{"otherUserFbId":100044572192906},"messageID":"mid.$cAABa_XEv8Ip7yMxLrV1mdWU1p0Fl","deletionTimestamp":1604637096568,"senderID":100044572192906,"messageTimestamp":0}}]}'
which looks like a JSON string you can parse using json.loads(...), for example:
import json
import pprint
# Fetch msg here using the code in the question body
json_string = "".join(map(chr, msg["payload"]))
d = json.loads(json_string)
pprint.pprint(d)
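As a variant, and assuming the payload holds UTF-8 encoded bytes (likely for JSON, but an assumption here), you can build a bytes object and decode it. Unlike chr, this also handles non-ASCII characters that are stored as multi-byte sequences. decode_payload is a hypothetical helper and the sample list is made up.

```python
import json

def decode_payload(payload):
    """Turn a list of byte values into a parsed JSON object."""
    text = bytes(payload).decode("utf-8")   # assumes UTF-8 encoded bytes
    return json.loads(text)

# Made-up sample: the bytes of '{"a":"é"}', where "é" takes two bytes (195, 169).
sample = [123, 34, 97, 34, 58, 34, 195, 169, 34, 125]
result = decode_payload(sample)
print(result)  # {'a': 'é'}
```

With the code from the question this would be called as decode_payload(msg["payload"]).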

I have a list of integers called numbers. How do I print every number that is greater than 90 in Python?

Here's my code:
numbers = [76, 83, 16, 69, 52, 78, 10, 77, 45, 52, 32, 17, 58, 54, 79, 72, 55, 50, 81, 74, 45, 33, 38, 10, 40, 44, 70, 81, 79, 28, 83, 41, 14, 16, 27, 38, 20, 84, 24, 50, 59, 71, 1, 13, 56, 91, 29, 54, 65, 23, 60, 57, 13, 39, 58, 94, 94, 42, 46, 58, 59, 29, 69, 60, 83, 9, 83, 5, 64, 70, 55, 89, 67, 89, 70, 8, 90, 17, 48, 17, 94, 18, 98, 72, 96, 26, 13, 7, 58, 67, 38, 48, 43, 98, 65, 8, 74, 44, 92]
while numbers>=90:
    print(numbers)
Here is the output:
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    while numbers>=90:
TypeError: '>=' not supported between instances of 'list' and 'int'
numbers = [76, 83, 16, 69, 52, 78, 10, 77, 45, 52, 32, 17, 58, 54, 79, 72, 55, 50, 81, 74, 45, 33, 38, 10, 40, 44, 70, 81, 79, 28, 83, 41, 14, 16, 27, 38, 20, 84, 24, 50, 59, 71, 1, 13, 56, 91, 29, 54, 65, 23, 60, 57, 13, 39, 58, 94, 94, 42, 46, 58, 59, 29, 69, 60, 83, 9, 83, 5, 64, 70, 55, 89, 67, 89, 70, 8, 90, 17, 48, 17, 94, 18, 98, 72, 96, 26, 13, 7, 58, 67, 38, 48, 43, 98, 65, 8, 74, 44, 92]
for number in numbers:
    if number >= 90:
        print(number)
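The same filter can be written as a list comprehension if you want to keep the matching values rather than just print them. This is a small sketch on a made-up sample list; at_least is a hypothetical helper name. Note that >= also includes 90 itself, while "greater than 90" in the strict sense needs >.

```python
def at_least(numbers, threshold):
    """Return the numbers that are greater than or equal to threshold."""
    return [n for n in numbers if n >= threshold]

sample = [76, 90, 91, 13, 94, 98]        # made-up sample values
inclusive = at_least(sample, 90)
strict = [n for n in sample if n > 90]   # strictly greater than 90

print(inclusive)  # [90, 91, 94, 98]
print(strict)     # [91, 94, 98]
```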

Converting a .mat file to cv image

I have a .mat file and want to convert it into an OpenCV image format so that I can use it for a CNN model.
I am trying to obtain an RGB (or other color) image, not a grayscale one.
I tried the following, but I get a grayscale image, even though plotting the actual .mat data with matplotlib does not look grayscale. Also, the .mat file has a px_spacing array in addition to the image array; I am not sure how this is helpful.
def mat_to_image(mat_image):
    f = loadmat(mat_image,appendmat=True)
    image = np.array(f.get('I')).astype(np.float32)
    mean = image.mean()
    std= image.std()
    print(mean, std)
    hi = np.max(image)
    lo = np.min(image)
    image = (((image - lo)/(hi-lo))*255).astype(np.uint8)
    im = Image.fromarray(image,mode='RGB')
    return im
images=mat_to_image(dir/filename)
cv_img = cv2.cvtColor(np.array(images), cv2.COLOR_GRAY2RGB)
Plotting the .mat data directly with matplotlib shows a colored (non-grayscale) image:
imgplot= plt.imshow(loadmat(img,appendmat=True).get('I'))
plt.show()
Here is what the .mat file contains, as shown by print(loadmat('filename')):
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Sep 9 11:32:54 2019',
'__version__': '1.0',
'__globals__': [],
'I': array([
[ 81, 75, 74, 75, -11, 14, 49, 37, 29, -24, -183, -349, -581, -740],
[ 51, 33, 67, 36, 1, 42, 30, 49, 47, 42, 14, -85, -465, -727],
[ 23, 31, 36, 20, 54, 70, 44, 56, 56, 79, 62, 19, -204, -595],
[ 7, 12, 36, 47, 59, 68, 74, 56, 59, 100, 74, 34, -3, -353],
[ 23, 19, 51, 87, 86, 79, 91, 76, 96, 95, 52, 51, 74, -141],
[ 18, 51, 54, 97, 93, 94, 98, 83, 119, 71, 36, 69, 50, -16],
[ -10, 5, 53, 92, 69, 87, 103, 114, 118, 77, 51, 68, 30, 0],
[ -24, 11, 74, 80, 49, 68, 106, 129, 107, 63, 57, 70, 39, -1],
[ -45, 43, 83, 69, 43, 64, 98, 108, 90, 35, 27, 55, 31, -13],
[ -9, 32, 83, 78, 66, 106, 89, 85, 58, 43, 31, 39, 28, 7],
[ 45, 35, 76, 45, 51, 84, 55, 66, 49, 41, 39, 28, 13, -7],
[ 85, 67, 61, 45, 69, 53, 23, 32, 31, -12, -34, -182, -376, -425],
[ 136, 93, 71, 54, 30, 39, 17, -21, -29, -43, -101, -514, -792, -816]
], dtype=int16),
'px_spacing': array([[0.78125]])}

Showing a list empty despite performing operations on it

I need to plot only the variations that occurred in October 2012, so I am collecting the positions of the rows for that month in order to use them in xlim for plotting.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
poll_df=pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv')
row_in=0
xlimit=[]
poll_df=poll_df[poll_df['Start Date'].str[:7] == '2012-10']
for date in poll_df['Start Date']:
    if date[0:7] == '2012-10':
        xlimit.append(row_in)
        row_in += 1
    else:
        row_in+=1
print(min(xlimit))
print(max(xlimit))
But I don't understand why xlimit comes out empty despite the operations performed on it.
With a download of that URL I can load it with np.genfromtxt:
In [232]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encoding=None)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
Line #77 (got 13 columns instead of 17)
Line #238 (got 13 columns instead of 17)
Line #460 (got 18 columns instead of 17)
Line #488 (got 18 columns instead of 17)
Line #493 (got 13 columns instead of 17)
Line #507 (got 18 columns instead of 17)
Line #515 (got 18 columns instead of 17)
Line #538 (got 18 columns instead of 17)
Line #550 (got 18 columns instead of 17)
#!/usr/bin/python3
It's not quite as forgiving as pandas when dealing with shorter/longer length lines.
In [233]: data.shape
Out[233]: (577,)
In [234]: data.dtype
Out[234]: dtype([('Pollster', '<U56'), ('Start_Date', '<U10'), ('End_Date', '<U10'), ('Entry_DateTime_ET', '<U20'), ('Number_of_Observations', '<i8'), ('Population', '<U26'), ('Mode', '<U15'), ('Obama', '<f8'), ('Romney', '<f8'), ('Undecided', '<f8'), ('Other', '<f8'), ('Pollster_URL', '<U113'), ('Source_URL', '<U189'), ('Partisan', '<U11'), ('Affiliation', '<U5'), ('Question_Text', '?'), ('Question_Iteration', '<i8')])
The Start_Date field looks like:
In [235]: data['Start_Date'][:10]
Out[235]:
array(['2012-11-04', '2012-11-03', '2012-11-03', '2012-11-03',
'2012-11-03', '2012-11-03', '2012-11-03', '2012-11-01',
'2012-11-02', '2012-11-02'], dtype='<U10')
I can search it with where. I'm using astype to restrict the field to 7 characters.
In [236]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[236]:
array([18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
I can use usecols to get around the variable line lengths - assuming the 'bad' lines just differ in the latter fields.
In [237]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encoding=None,usecols=range(10))
In [238]: data.shape
Out[238]: (586,)
In [239]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[239]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
I can get the same list with an iterative search like yours:
In [244]: alist = []
In [245]: for i,date in enumerate(data['Start_Date']):
     ...:     if date[:7] == '2012-10':
     ...:         alist.append(i)
     ...:
In [246]: len(alist)
Out[246]: 82
In [247]: np.array(alist)
Out[247]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
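For comparison, the same iterative search can be sketched with only the standard library. csv.DictReader tolerates rows that are shorter or longer than the header, which is the irregularity genfromtxt complained about. The inline sample data and the helper name rows_in_month are made up for illustration; the real file has more columns.

```python
import csv
import io

# Made-up sample with one row that is too long and one that is too short.
SAMPLE = """Pollster,Start Date,End Date,Obama,Romney
A,2012-11-04,2012-11-05,49,48
B,2012-10-30,2012-10-31,47,47,extra
C,2012-10-28,2012-10-29,48
D,2012-09-15,2012-09-17,46,45
"""

def rows_in_month(lines, month, column="Start Date"):
    """Return the positions of the rows whose date starts with `month`."""
    reader = csv.DictReader(lines)
    return [i for i, row in enumerate(reader)
            if (row.get(column) or "").startswith(month)]

october = rows_in_month(io.StringIO(SAMPLE), "2012-10")
print(october)  # [1, 2]
if october:     # min() and max() raise ValueError on an empty list
    print(min(october), max(october))  # 1 2
```

For a downloaded file, pass an open file object (opened with newline='') instead of the io.StringIO wrapper.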

Resources