How do I replace every nth instance in an ndarray? - python-3.x

I have a numpy array atoms.numbers which looks like:
array([27, 27, 27, 27, 27, 27, 57, 57, 57, 57, 57, 57, 57, 57, 27, 27, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 27, 27, 27, 27, 27, 27, 57, 57, 57, 57, 57,
57, 57, 57, 27, 27, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8])
I can replace all of the same instance such as every '57' in the array using:
atoms.numbers[atoms.numbers==57]=38
which gives:
array([27, 27, 27, 27, 27, 27, 38, 38, 38, 38, 38, 38, 38, 38, 27, 27, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 27, 27, 27, 27, 27, 27, 38, 38, 38, 38, 38,
38, 38, 38, 27, 27, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8])
I would like to be able to replace every nth instance in the array. I have tried:
n=5
atoms.numbers[atoms.numbers==57][::n]=38
Which does not work.

Use np.where to find the indexes of the items of interest. Find every n'th index. Update the items:
locations = np.where(numbers == 57)[0]
numbers[locations[::n]] = 38

Related

In pytorch, torch.unique is returning repititions

I have this 2-D tensor:
tmp = torch.tensor([[ 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5,
5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11,
11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 16, 16, 16, 17,
17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23,
23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29,
29, 29, 30, 30, 30, 31, 31, 31, 31],
[ 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5,
5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11,
11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15, 0, 16, 16, 17,
17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23,
23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29,
29, 29, 30, 30, 30, 31, 31, 31, 31]])
So there is 0 in the 50th column of row 2. When I apply torch.unique along
dim=1, I get:
a,c = torch.unique(tmp,dim=1,return_counts=True)
a
tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]])
It can be seen that the second row of the output has two 0s and the first row has two 16s. Am I doing something wrong here or this is suspicious?
It is because you specified dim=1. PyTorch is thus checking for unique pairs (which it correctly does). Like (0, 0), (1, 1), (16, 0): these are the unique pairs that it generated. In general the pair (temp[0,i], temp[1,i]) is unique for all i.
If you want all the elements to be unique, just throw away the dim: torch.unique(tmp).
If you need to maintain the two list structure, the output cannot be stored as a single tensor because their sizes might not match. You can do something like output1 = torch.unique(tmp[0]) and output2 = torch.unique(tmp[1]).

How to update values in list of dictionaries

I want to update the value of a key in dictionary. This is a snippet of a list that contains over 300 dictionaries
chats = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
script: I am getting an error on that script. I am looping to know if this dict is already in so that I can update the duration.
chat_list = list()
for chat in chats:
hour = chat.get('hour')
operator = chat.get("operator")
if len(chat_list) == 0:
chat_list.append(chat)
else:
found = False
for i in chat_list:
hour2 = chat.get('hour')
operator2 = chat.get("operator")
if (hour2 == hour) and (operator == operator2):
found = True
#concat both dictionary
i['duration'] = i.get('duration') + chat.get("duration")
if found == True:
found = False
else:
chat_list.append(chat)
My expected output is
chat_list = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
or
df = pd.DataFrame(chat_list)
df['duration'] = df['duration'].apply(lambda x: list(set(x)))
To be honest, I didn't tested your algorithm. Instead I took it as a small challenge and I wrote the following algorithm which doesn't need to copy chats in to a new list.
It finds the first occurrence of "similar" chat and concat the duration arrays. Then it deletes the "duplicated" chat. Further explanation in the code itself:
chats = [
{'hour': 10, 'operator': 'john_doe', 'duration': [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'john_doe', 'duration': [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 11, 'operator': 'john_doe', 'duration': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], 'date': '2019-09-09'},
{'hour': 10, 'operator': 'joseph_doe', 'duration': [5, 6, 7, 8, 9], 'date': '2019-09-09'}
]
index = 0
while index < len(chats) - 1:
chat = chats[index]
# detect if there is another "similar" chat in the list (before this one)
first_index = next(
i for i, first_chat in enumerate(chats)
if chat.get('hour') == first_chat.get('hour') and chat.get('operator') == first_chat.get('operator')
)
# if the first index found is not this one:
# - concat `duration` arrays
# - delete this (duplicated) chat
if index != first_index:
chats[first_index]['duration'] += chat['duration']
del chats[index]
# otherwise continue and increment the index
else:
index += 1
print(chats)

remove element from the list then display list without element that remove

I have this nested list:
a = [[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 9, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 4, 8, 14, 18, 23, 36],
[1, 2, 5, 9, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 3, 7, 12, 17, 36],
[1, 2, 4, 8, 14, 19, 23, 36],
[1, 2, 5, 10, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 10, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 33, 34, 35,36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 35, 36]]
I need to choose max length of sublist in nested list, than compare item of sublist with nested list. If item in sublist equal then same item in nested list remove and in final print nested list without this item.
I hope I understand your question correctly.
You want input to be:
a = [[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 9, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 4, 8, 14, 18, 23, 36],
[1, 2, 5, 9, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 28, 31, 32, 33, 34, 35, 36],
[1, 3, 7, 12, 17, 36],
[1, 2, 4, 8, 14, 19, 23, 36],
[1, 2, 5, 10, 15, 20, 24, 26, 30, 36],
[1, 3, 6, 11, 16, 22, 25, 29, 31, 32, 33, 34, 35, 36],
[1, 2, 5, 10, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 33, 34, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 35, 36]]
We are removing
[1, 3, 6, 11, 16, 22, 25, 29, 31, 32, 33, 34, 35, 36]
and
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 34, 35, 36]
since they are of the same length.
The output should be:
a = [[1, 2, 5, 9, 15, 20, 24, 26, 30, 36],
[1, 2, 4, 8, 14, 18, 23, 36],
[1, 2, 5, 9, 15, 20, 24, 27, 30, 36],
[1, 3, 7, 12, 17, 36],
[1, 2, 4, 8, 14, 19, 23, 36],
[1, 2, 5, 10, 15, 20, 24, 26, 30, 36],
[1, 2, 5, 10, 15, 20, 24, 27, 30, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 32, 33, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 28, 31, 33, 34, 35, 36],
[1, 3, 6, 11, 16, 21, 25, 29, 31, 32, 33, 35, 36]]
with the previous lists removed.
Your question was not worded clearly, but I hope this is what you wanted. Here is the code:
# assume a is not empty
d = {} # list of the max length -> number of occurrences in 2d array
# find the length of the longest list
maxLen = len(a[0])
for l in a:
if len(l) > maxLen:
maxLen = len(l)
# add lists of the same max length and their count to the dictionary
for l in a:
if len(l) == maxLen:
#convert list to string because python does not support list being key of a dictionary
l_string = str(l)
if l_string in d:
d[l_string] += 1
else:
d[l_string] = 1
# remove
for l_string in d:
while d[l_string] > 0:
# convert string back to list and remove
a.remove(eval(l_string))
d[l_string] -= 1
# test result if you want
for row in a:
print(row)

Description text in speech recognition

I'm trying to build my own speech recognition network. I understood how to pre-process audio. But I can't figure out the pre-processing of the text.
I have a alphabet:
alphabet = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14,'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
And I encode each letter of the sentence into a number (27 is a space):
array([list([27, 23, 8, 5, 14, 27, 8, 5, 27, 19, 16, 5, 1, 11, 19, 27, 9, 14, 27, 15, 21, 18, 27, 12, 1, 14, 7, 21, 1, 7, 5, 27, 9, 27, 3, 1, 14, 27, 9, 14, 20, 5, 18, 16, 18, 5, 20, 27, 23, 8, 1, 20, 27, 8, 5, 27, 8, 1, 19, 27, 19, 1, 9, 4, 27]),
list([27, 19, 15, 27, 14, 15, 23, 27, 9, 27, 6, 5, 1, 18, 27, 14, 15, 20, 8, 9, 14, 7, 27, 2, 5, 3, 1, 21, 19, 5, 27, 9, 20, 27, 23, 1, 19, 27, 20, 8, 15, 19, 5, 27, 15, 13, 5, 14, 19, 27, 20, 8, 1, 20, 27, 2, 18, 15, 21, 7, 8, 20, 27, 25, 15, 21, 27, 20, 15, 27, 13, 5, 27]),
list([27, 14, 9, 7, 8, 20, 27, 6, 5, 12, 12, 27, 1, 14, 4, 27, 1, 14, 27, 1, 19, 19, 15, 18, 20, 13, 5, 14, 20, 27, 15, 6, 27, 6, 9, 7, 8, 20, 9, 14, 7, 27, 13, 5, 14, 27, 1, 14, 4, 27, 13, 5, 18, 3, 8, 1, 14, 20, 19, 27, 5, 14, 20, 5, 18, 5, 4, 27, 1, 14, 4, 27, 5, 24, 9, 20, 5, 4, 27, 20, 8, 5, 27, 20, 5, 14, 20, 27]),
list([27, 9, 27, 8, 5, 1, 18, 4, 27, 1, 27, 6, 1, 9, 14, 20, 27, 13, 15, 22, 5, 13, 5, 14, 20, 27, 21, 14, 4, 5, 18, 27, 13, 25, 27, 6, 5, 5, 20, 27]),
list([27, 25, 15, 21, 27, 3, 1, 13, 5, 27, 19, 15, 27, 20, 8, 1, 20, 27, 25, 15, 21, 27, 3, 15, 21, 12, 4, 27, 12, 5, 1, 18, 14, 27, 1, 2, 15, 21, 20, 27, 25, 15, 21, 18, 27, 4, 18, 5, 1, 13, 19, 27, 19, 1, 9, 4, 27, 20, 8, 5, 27, 15, 12, 4, 27, 23, 15, 13, 1, 14, 27])],
dtype=object)
Here are 5 sentences.
I just create one network layer and try to transfer this data there in order to get a number corresponding to the letter.
model = Sequential()
model.add(Dense(27, input_shape=(20,), activation='softmax'))
model.compile(loss='mean_squared_error',optimizer='Adam', metrics=['accuracy'])
for X, y in batch(X_train, y_train, 5):
model.train_on_batch(X, y)
batch() just breaks X_train, y_train into batch.
5 is size of batch.
But when I try to start the network I get an error
Error when checking target: expected dense_25 to have shape (27,) but got array with shape (1,)
UPD:
I'm using MFCC for X
audio, sr = librosa.load(pathTrain+"\\"+str(file), mono=True, sr=None)
fileMFCC = librosa.feature.mfcc(audio)
mean_scale = np.mean(fileMFCC, axis=0)
std_scale = np.std(fileMFCC, axis=0)
fileMFCC = (fileMFCC - mean_scale[np.newaxis, :]) / std_scale[np.newaxis, :]
X is
[array([[-4.35889894, -4.35889894, -4.35455134, ..., -3.95851777,
-3.99308173, -4.05261022],
[ 0.22941573, 0.22941573, 0.31913073, ..., 1.87189324,
1.7987301 , 1.66804349],
[ 0.22941573, 0.22941573, 0.31165866, ..., -0.27962786,
-0.19009062, -0.13788484],
...,
[ 0.22941573, 0.22941573, 0.18657944, ..., 0.14699792,
0.12751924, 0.16724807],
[ 0.22941573, 0.22941573, 0.18478513, ..., 0.00674492,
-0.04570105, 0.01231168],
[ 0.22941573, 0.22941573, 0.18232521, ..., 0.2571599 ,
0.22477036, 0.09153304]])
etc.

How to simplify my code by using a loop

I would like to obtain the subtotal (average, min, max...) of a group of data. I have achieve the goal using the code below. How can I use loop to simplify it? Many thanks!
Sub AddSubs()
Worksheets("Summary (3)").Activate
'http://msdn.microsoft.com/en-us/library/office/ff838166(v=office.15).aspx
Selection.Subtotal GroupBy:=14, Function:=xlAverage, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
Worksheets("Summary (3)").Activate
Selection.Subtotal GroupBy:=14, Function:=xlStDev, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
Worksheets("Summary (3)").Activate
Selection.Subtotal GroupBy:=14, Function:=xlMin, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
Worksheets("Summary (3)").Activate
Selection.Subtotal GroupBy:=14, Function:=xlMax, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
Worksheets("Summary (3)").Activate
Selection.Subtotal GroupBy:=14, Function:=xlCount, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
End Sub
Further to my comment. This is one way of simplifying your code
Sub AddSubs()
Worksheets("Summary (3)").Activate
Dim constList As Collection
Set constList = New Collection
constList.Add (xlAverage)
constList.Add (xlStDev)
constList.Add (xlMin)
constList.Add (xlMax)
constList.Add (xlCount)
Dim cnst
For Each cnst In constList
Selection.Subtotal GroupBy:=14, Function:=cnst, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
Next
End Sub
or even simpler (per #simocos hint)
Sub Main()
Dim cnst
For Each cnst In Array(xlAverage, xlStDev, xlMin, xlMax, xlCount)
Selection.Subtotal GroupBy:=14, Function:=cnst, SummaryBelowData:=False, Replace:=False, PageBreaks:=True, TotalList:=Array(3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
Next
End Sub

Resources