Description text in speech recognition - keras

I'm trying to build my own speech recognition network. I understood how to pre-process audio. But I can't figure out the pre-processing of the text.
I have a alphabet:
alphabet = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14,'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
And I encode each letter of the sentence into a number (27 is a space):
array([list([27, 23, 8, 5, 14, 27, 8, 5, 27, 19, 16, 5, 1, 11, 19, 27, 9, 14, 27, 15, 21, 18, 27, 12, 1, 14, 7, 21, 1, 7, 5, 27, 9, 27, 3, 1, 14, 27, 9, 14, 20, 5, 18, 16, 18, 5, 20, 27, 23, 8, 1, 20, 27, 8, 5, 27, 8, 1, 19, 27, 19, 1, 9, 4, 27]),
list([27, 19, 15, 27, 14, 15, 23, 27, 9, 27, 6, 5, 1, 18, 27, 14, 15, 20, 8, 9, 14, 7, 27, 2, 5, 3, 1, 21, 19, 5, 27, 9, 20, 27, 23, 1, 19, 27, 20, 8, 15, 19, 5, 27, 15, 13, 5, 14, 19, 27, 20, 8, 1, 20, 27, 2, 18, 15, 21, 7, 8, 20, 27, 25, 15, 21, 27, 20, 15, 27, 13, 5, 27]),
list([27, 14, 9, 7, 8, 20, 27, 6, 5, 12, 12, 27, 1, 14, 4, 27, 1, 14, 27, 1, 19, 19, 15, 18, 20, 13, 5, 14, 20, 27, 15, 6, 27, 6, 9, 7, 8, 20, 9, 14, 7, 27, 13, 5, 14, 27, 1, 14, 4, 27, 13, 5, 18, 3, 8, 1, 14, 20, 19, 27, 5, 14, 20, 5, 18, 5, 4, 27, 1, 14, 4, 27, 5, 24, 9, 20, 5, 4, 27, 20, 8, 5, 27, 20, 5, 14, 20, 27]),
list([27, 9, 27, 8, 5, 1, 18, 4, 27, 1, 27, 6, 1, 9, 14, 20, 27, 13, 15, 22, 5, 13, 5, 14, 20, 27, 21, 14, 4, 5, 18, 27, 13, 25, 27, 6, 5, 5, 20, 27]),
list([27, 25, 15, 21, 27, 3, 1, 13, 5, 27, 19, 15, 27, 20, 8, 1, 20, 27, 25, 15, 21, 27, 3, 15, 21, 12, 4, 27, 12, 5, 1, 18, 14, 27, 1, 2, 15, 21, 20, 27, 25, 15, 21, 18, 27, 4, 18, 5, 1, 13, 19, 27, 19, 1, 9, 4, 27, 20, 8, 5, 27, 15, 12, 4, 27, 23, 15, 13, 1, 14, 27])],
dtype=object)
Here are 5 sentences.
I just create one network layer and try to transfer this data there in order to get a number corresponding to the letter.
model = Sequential()
model.add(Dense(27, input_shape=(20,), activation='softmax'))
model.compile(loss='mean_squared_error',optimizer='Adam', metrics=['accuracy'])
for X, y in batch(X_train, y_train, 5):
model.train_on_batch(X, y)
batch() just breaks X_train, y_train into batch.
5 is size of batch.
But when I try to start the network I get an error
Error when checking target: expected dense_25 to have shape (27,) but got array with shape (1,)
UPD:
I'm using MFCC for X
audio, sr = librosa.load(pathTrain+"\\"+str(file), mono=True, sr=None)
fileMFCC = librosa.feature.mfcc(audio)
mean_scale = np.mean(fileMFCC, axis=0)
std_scale = np.std(fileMFCC, axis=0)
fileMFCC = (fileMFCC - mean_scale[np.newaxis, :]) / std_scale[np.newaxis, :]
X is
[array([[-4.35889894, -4.35889894, -4.35455134, ..., -3.95851777,
-3.99308173, -4.05261022],
[ 0.22941573, 0.22941573, 0.31913073, ..., 1.87189324,
1.7987301 , 1.66804349],
[ 0.22941573, 0.22941573, 0.31165866, ..., -0.27962786,
-0.19009062, -0.13788484],
...,
[ 0.22941573, 0.22941573, 0.18657944, ..., 0.14699792,
0.12751924, 0.16724807],
[ 0.22941573, 0.22941573, 0.18478513, ..., 0.00674492,
-0.04570105, 0.01231168],
[ 0.22941573, 0.22941573, 0.18232521, ..., 0.2571599 ,
0.22477036, 0.09153304]])
etc.

Related

In pytorch, torch.unique is returning repititions

I have this 2-D tensor:
tmp = torch.tensor([[ 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5,
5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11,
11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17,
17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23,
23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29,
29, 29, 30, 30, 30, 31, 31, 31, 31],
[ 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5,
5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11,
11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 0, 16, 16, 17,
17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23,
23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29,
29, 29, 30, 30, 30, 31, 31, 31, 31]])
So there is 0 in the 50th column of row 2. When I apply torch.unique along
dim=1, I get:
a,c = torch.unique(tmp,dim=1,return_counts=True)
a
tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]])
It can be seen that the second row of the output has two 0s and the first row has two 16s. Am I doing something wrong here or this is suspicious?
It is because you specified dim=1. PyTorch is thus checking for unique pairs (which it correctly does). Like (0, 0), (1, 1), (16, 0): these are the unique pairs that it generated. In general the pair (temp[0,i], temp[1,i]) is unique for all i.
If you want all the elements to be unique, just throw away the dim: torch.unique(tmp).
If you need to maintain the two list structure, the output cannot be stored as a single tensor because their sizes might not match. You can do something like output1 = torch.unique(tmp[0]) and output2 = torch.unique(tmp[1]).

How to update values in list of dictionaries

I want to update the value of a key in dictionary. This is a snippet of a list that contains over 300 dictionaries
chats = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
script: I am getting an error on that script. I am looping to know if this dict is already in so that I can update the duration.
chat_list = list()
for chat in chats:
hour = chat.get('hour')
operator = chat.get("operator")
if len(chat_list) == 0:
chat_list.append(chat)
else:
found = False
for i in chat_list:
hour2 = chat.get('hour')
operator2 = chat.get("operator")
if (hour2 == hour) and (operator == operator2):
found = True
#concat both dictionary
i['duration'] = i.get('duration') + chat.get("duration")
if found == True:
found = False
else:
chat_list.append(chat)
My expected output is
chat_list = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
or
df = pd.DataFrame(chat_list)
df['duration'] = df['duration'].apply(lambda x: list(set(x)))
To be honest, I didn't tested your algorithm. Instead I took it as a small challenge and I wrote the following algorithm which doesn't need to copy chats in to a new list.
It finds the first occurrence of "similar" chat and concat the duration arrays. Then it deletes the "duplicated" chat. Further explanation in the code itself:
chats = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
index = 0
while index < len(chats) - 1:
chat = chats[index]
# detect if there is another "similar" chat in the list (before this one)
first_index = next(
i for i, first_chat in enumerate(chats)
if chat.get('hour') == first_chat.get('hour') and chat.get('operator') == first_chat.get('operator')
)
# if the first index found is not this one:
# - concat `duration` arrays
# - delete this (duplicated) chat
if index != first_index:
chats[first_index]['duration'] += chat['duration']
del chats[index]
# otherwise continue and increment the index
else:
index += 1
print(chats)

Length of passed values is 1, index implies 260

output of predicted_classes
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 4, 4, 2, 4, 4, 4, 4, 5, 4, 4, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 7, 7, 7, 7, 7, 7, 7, 13, 7, 7, 8, 11, 8, 8, 8,
11, 8, 11, 11, 8, 11, 9, 9, 9, 9, 9, 9, 9, 9, 8, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 8, 11, 11, 11,
11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 3, 13, 3,
3, 13, 13, 13, 14, 14, 14, 14, 14, 14, 2, 14, 14, 14, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 20, 16, 16,
17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18,
18, 18, 18, 19, 19, 19, 19, 8, 19, 19, 19, 19, 19, 20, 20, 20, 20,
20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22,
22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23,
23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25,
25, 25, 25, 25, 25])
output of y_true
0 0
1 0
2 0
3 0
4 0
..
255 25
256 25
257 25
258 25
259 25
Name: label, Length: 260, dtype: int64
I want to get the indices with this code, and getting this value error.
predicted_classes = model.predict_classes(X_test)
y_true = data_test.iloc[:, 0]
correct = np.nonzero(predicted_classes==y_true)[0]
incorrect = np.nonzero(predicted_classes!=y_true)[0]
trace of error
ValueError Traceback (most recent call last)
in
4 #get the indices to be plotted
5 y_true = data_test.iloc[:, 0]
----> 6 correct = np.nonzero(predicted_classes!=y_true)[0]
7 incorrect = np.nonzero(predicted_classes==y_true)[0]
in nonzero(*args, **kwargs)
//anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py in nonzero(a)
1894
1895 """
-> 1896 return _wrapfunc(a, 'nonzero')
1897
1898
//anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
56 bound = getattr(obj, method, None)
57 if bound is None:
---> 58 return _wrapit(obj, method, *args, **kwds)
59
60 try:
//anaconda3/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
49 if not isinstance(result, mu.ndarray):
50 result = asarray(result)
---> 51 result = wrap(result)
52 return result
53
//anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __array_wrap__(self, result, context)
1916 return result
1917 d = self._construct_axes_dict(self._AXIS_ORDERS, copy=False)
-> 1918 return self._constructor(result, **d).__finalize__(self)
1919
1920 # ideally we would define this to avoid the getattr checks, but
//anaconda3/lib/python3.7/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
290 if len(index) != len(data):
291 raise ValueError(
--> 292 f"Length of passed values is {len(data)}, "
293 f"index implies {len(index)}."
294 )
ValueError: Length of passed values is 1, index implies 260.
Please let me know where I am going wrong.
A quick search reveals that an old version of the documentation advises to use .to_numpy().nonzero() as a replacement for Series.nonzero().

remove element from the list then display list without element that remove

I have this nested list:
a = [[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 9, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 4, 8, 14, 18, 23, 36],
[1, 2, 5, 9, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 3, 7, 12, 17, 36],
[1, 2, 4, 8, 14, 19, 23, 36],
[1, 2, 5, 10, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 10, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 33, 34, 35,36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 35, 36]]
I need to choose max length of sublist in nested list, than compare item of sublist with nested list. If item in sublist equal then same item in nested list remove and in final print nested list without this item.
I hope I understand your question correctly.
You want input to be:
a = [[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 9, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 4, 8, 14, 18, 23, 36],
[1, 2, 5, 9, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 3, 7, 12, 17, 36],
[1, 2, 4, 8, 14, 19, 23, 36],
[1, 2, 5, 10, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 10, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 33, 34, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 35, 36]]
We are removing
[1, 3, 6, 11, 16, 22, 25, 29, 31, 32, 33, 34, 35, 36]
and
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 34, 35, 36]
since they are of the same length.
The output should be:
a = [[1, 2, 5, 9, 15, 20, 24, 26, 30, 36],
[1, 2, 4, 8, 14, 18, 23, 36],
[1, 2, 5, 9, 15, 20, 24, 27, 30, 36],
[1, 3, 7, 12, 17, 36],
[1, 2, 4, 8, 14, 19, 23, 36],
[1, 2, 5, 10, 15, 20, 24, 26, 30, 36],
[1, 2, 5, 10, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 33, 34, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 35, 36]]
with the previous lists removed.
Your question was not worded clearly, but I hope this is what you wanted. Here is the code:
# assume a is not empty
d = {} # list of the max length -> number of occurrences in 2d array
# find the length of the longest list
maxLen = len(a[0])
for l in a:
if len(l) > maxLen:
maxLen = len(l)
# add lists of the same max length and their count to the dictionary
for l in a:
if len(l) == maxLen:
#convert list to string because python does not support list being key of a dictionary
l_string = str(l)
if l_string in d:
d[l_string] += 1
else:
d[l_string] = 1
# remove
for l_string in d:
while d[l_string] > 0:
# convert string back to list and remove
a.remove(eval(l_string))
d[l_string] -= 1
# test result if you want
for row in a:
print(row)

Python 3.6 / my function changes my global variables

I am trying really hard for my function not to mess my global 'b' value because I wish to re-use that list in another similar function using sets but it seems that even not using the same name (y), they (b and y) are still bound together...
most of the print lines are for debugging only as I was not understanding what was happening.
What am I doing wrong?
import random
a = random.sample(range(1,25),8)
b = random.sample(range(1,25),11)
a.sort()
b.sort()
def list_rdup(x,y):
print('Loop remove duplicates:')
print('x:',x)
print('y:',y)
for i in x:
y.append(i)
y.sort()
print('y modified:',y)
c = []
for i in y:
if i in c:
pass
else:
c.append(i)
return c
print('a:',a)
print('b:',b)
print(list_rdup(a,b))
print('a:',a)
print('b:',b)
Output: we see a and b in their original state.. then I run the function and
print a and b again to show that b was modified in the process...
a: [1, 6, 10, 11, 12, 13, 17, 22]
b: [1, 2, 3, 7, 13, 16, 17, 19, 20, 21, 24]
Loop remove duplicates:
x: [1, 6, 10, 11, 12, 13, 17, 22]
y: [1, 2, 3, 7, 13, 16, 17, 19, 20, 21, 24]
y modified: [1, 1, 2, 3, 6, 7, 10, 11, 12, 13, 13, 16, 17, 17, 19, 20, 21, 22, 24]
[1, 2, 3, 6, 7, 10, 11, 12, 13, 16, 17, 19, 20, 21, 22, 24]
a: [1, 6, 10, 11, 12, 13, 17, 22]
b: [1, 1, 2, 3, 6, 7, 10, 11, 12, 13, 13, 16, 17, 17, 19, 20, 21, 22, 24]
The call list_rdup(a,b) simply passes the reference of a and b which are stored in x and y. So, any change in x and y will change a and b. If you do not want a and b to change make a copy by using b_copy = b.copy().
To avoid altering the elements, copy the array.
import random
a=[1, 6, 10, 11, 12, 13, 17, 22]
b=[1, 2, 3, 7, 13, 16, 17, 19, 20, 21, 24]
#a.sort()
#b.sort()
def list_rdup(x,y):
print('Loop remove duplicates:')
print('x:',x)
print('y:',y)
for i in x:
y.append(i)
y.sort()
print('y modified:',y)
c = []
for i in y:
if i in c:
pass
else:
c.append(i)
return c
print('a:',a)
print('b:',b)
print(list_rdup(a[:], b[:]))
print('a:',a)
print('b:',b)
a: [1, 6, 10, 11, 12, 13, 17, 22]
b: [1, 2, 3, 7, 13, 16, 17, 19, 20, 21, 24]
Loop remove duplicates:
x: [1, 6, 10, 11, 12, 13, 17, 22]
y: [1, 2, 3, 7, 13, 16, 17, 19, 20, 21, 24]
y modified: [1, 1, 2, 3, 6, 7, 10, 11, 12, 13, 13, 16, 17, 17, 19, 20, 21, 22, 24]
[1, 2, 3, 6, 7, 10, 11, 12, 13, 16, 17, 19, 20, 21, 22, 24]
a: [1, 6, 10, 11, 12, 13, 17, 22]
b: [1, 2, 3, 7, 13, 16, 17, 19, 20, 21, 24]
Also consider deep copying if you have non-primitive values in the array.

Resources