How to find similar events in a signal in Python - python-3.x

I have a signal and I want to find similar events that happen multiple times in it. For example, in the picture below we can see three events.
I created it using the values below:
import matplotlib.pyplot as plt
y = [
0, 1, 2,
.3, .4, .4, .6, .5, .4, .3, .4, .5, .6, .7, .6, .5, .4, .3, .5, .7, .8, .6, .3, .4, .5, .6, .5, .4, .3, #1
.3, .3, .3, .3, .3, 5, 4,
0, 1, 3, 4, 8, 9, 13, 10, 7, 4, 6, 3, 4, 3, #2
.3, 4, 4.4, 4.3, 3, 3.4, 3.2, 4, 3.8, 4, 6, 6, 5, 4, 1,
.3, .4, .5, .6, .5, .4, .3, .4, .5, .6, .7, .6, .3, .4, .3, .5, .7, .8, .6, .3, .4, .6, .6, .5, .4, .3, # 1
0, 1, 3, 4, 6, 9, 13.5, 9.5, 7, 4, 6, 3, 4, 3, #2
.3, .4, .5, .4, .5, .4, .3, .4, .5, .6, .7, .6, .5, .4, .3, .5, .7, .8, .6, .3, .4, .5, .6, .5, .4, .3, # 1
0, 1, 3, 4, 8, 9, 14, 10, 7, 4, 6, 3, 4, 3, #2
.3, 2, 1, 1,
2, 3, 4, 4.5, 4, 3, 2, 3, 4, 4, 4, 3, 2, #3
1,2,2,1,1,
2, 3, 4, 4, 4, 3, 2, 3, 4, 5, 4, 3, 2, #3
1,2,3, .2, .1, 0
]
plt.plot(y)
plt.show()
(The question showed the first, second, and third events in separate plots.)
The first event happens 3 times, the second 3 times, and the third 2 times. They are almost identical; in a real situation the similarity will be somewhat lower than shown in the picture above.
Now I want to find:
1. similar types of events
2. how similar they are
3. where they happened.
For example, the first event occurs again at:
timestamp | similarity
04 - 22 | 100%
60 - 90 | 95%
110 - 130 | 96%
I want to do this in Python. How can I do it? Are there any signal-processing libraries for this kind of task?

Using NumPy you could differentiate the signal with numpy.diff, which, given a suitable threshold, gives you a clue where your events happen. Once you have those events, you can use numpy.correlate between each event and the input signal to find the most likely places where it occurs.
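As a rough sketch of that correlation step (my own toy example, not code from the answer): cross-correlating a mean-removed template against the signal produces a peak wherever the event best aligns.

```python
import numpy as np

# Toy example: a signal containing two copies of the same "event" template.
# (Illustrative only; in practice the template would come from your detection step.)
template = np.array([0.0, 1.0, 3.0, 4.0, 8.0, 9.0, 13.0, 10.0, 7.0, 4.0])
signal = np.concatenate([np.zeros(5), template, np.zeros(8), template, np.zeros(5)])

# Remove the means before correlating so flat regions don't inflate the score;
# peaks in `corr` mark the most likely start positions of the event.
corr = np.correlate(signal - signal.mean(), template - template.mean(), mode="valid")
best = int(np.argmax(corr))
print(best)  # start index of the best-aligned occurrence
```

Every offset where `corr` peaks is a candidate occurrence; here `best` lands on the first copy of the template.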
Follow-up
If you want to completely automate this for an arbitrary signal, there's going to have to be some machine learning in there somewhere. I won't give you a machine-learning algorithm outright. I will, however, give you pointers as to which parameters could be factors in your algorithm.
Take a look at the following:
import matplotlib.pyplot as plt
import numpy as np
y = [0, 1, 2,
.3, .4, .4, .6, .5, .4, .3, .4, .5, .6, .7, .6, .5, .4, .3, .5, .7, .8, .6, .3, .4, .5, .6, .5, .4, .3,
.3, .3, .3, .3, .3, 5, 4,
0, 1, 3, 4, 8, 9, 13, 10, 7, 4, 6, 3, 4, 3,
.3, 4, 4.4, 4.3, 3, 3.4, 3.2, 4, 3.8, 4, 6, 6, 5, 4, 1,
.3, .4, .5, .6, .5, .4, .3, .4, .5, .6, .7, .6, .3, .4, .3, .5, .7, .8, .6, .3, .4, .6, .6, .5, .4, .3,
0, 1, 3, 4, 6, 9, 13.5, 9.5, 7, 4, 6, 3, 4, 3,
.3, .4, .5, .4, .5, .4, .3, .4, .5, .6, .7, .6, .5, .4, .3, .5, .7, .8, .6, .3, .4, .5, .6, .5, .4, .3,
0, 1, 3, 4, 8, 9, 14, 10, 7, 4, 6, 3, 4, 3,
.3, 2, 1, 1,
2, 3, 4, 4.5, 4, 3, 2, 3, 4, 4, 4, 3, 2,
1, 2, 2, 1, 1,
2, 3, 4, 4, 4, 3, 2, 3, 4, 5, 4, 3, 2,
1, 2, 3, .2, .1, 0
]
x = 5                            # mean-convolution window width
v = np.full(x, 1/x)              # convolution vector
y = np.convolve(y, v)            # smooth the signal by averaging over x samples

diff_y = np.abs(np.diff(y))      # first derivative: large values mark large variations
plt.figure()
plt.plot(diff_y)
plt.show()

diff_y_filtered = diff_y.copy()  # copy, so thresholding doesn't overwrite diff_y
diff_y_filtered[diff_y < 0.20 * np.nanmax(diff_y)] = 0  # cut the small peaks; the 0.20 threshold is tweakable
plt.figure()
plt.plot(y)
plt.plot(diff_y_filtered)
plt.show()

diff_y_peaks = np.diff(diff_y_filtered)  # second derivative
diff_y_peaks = np.convolve(diff_y_peaks, [-1/8, -1/4, -1/2, 1.75, -1/2, -1/4, -1/8])  # you may want to tweak this filter
plt.figure()
plt.plot(diff_y_peaks)
plt.show()

diff_y_peaks[diff_y_peaks < 0.5] = 0           # another parameter to tweak
diff_y_peaks[diff_y_peaks > 0] = np.nanmax(y)  # scale the markers to make things pretty
plt.figure()
plt.stem(diff_y_peaks)
plt.plot(y)
plt.show()
First and second derivatives come to mind because, I presume, you'd be looking for large variations in the signal. Even if the events themselves contain large variations, that shouldn't be a problem, for a reason I'll point out later on.
I first smoothed the signal to avoid large variations within events. Then I applied the first derivative to find large variations, and cut the small peaks with a threshold. That threshold's value could be a parameter in your model.
I then differentiate a second time and convolve again. You can tweak that filter to give better results: some filters find more possible cutoff points, some fewer. It's not a big deal if you get many cutoff points next to each other, because you can train your model to reject event lengths that are too short, or you can specify directly that, say, 2 cutoff points 5 samples apart don't constitute an event.
Personally, I believe you might want to look for smaller events contained within a bigger event; that will probably get you the best results.
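The answer above covers detection but not the "how similar" part of the question. One simple option (my own suggestion, not from the answer): resample each detected segment to a common length and use the Pearson correlation coefficient as a similarity score.

```python
import numpy as np

def similarity(seg_a, seg_b, n=50):
    """Pearson correlation of two segments resampled to a common length n."""
    xa = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(seg_a)), seg_a)
    xb = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(seg_b)), seg_b)
    return float(np.corrcoef(xa, xb)[0, 1])

# Two occurrences of the "second event" from the question, slightly different:
event_a = [0, 1, 3, 4, 8, 9, 13, 10, 7, 4, 6, 3, 4, 3]
event_b = [0, 1, 3, 4, 6, 9, 13.5, 9.5, 7, 4, 6, 3, 4, 3]
print(round(similarity(event_a, event_b), 2))
```

A score near 1 means near-identical shapes; you could report it as a percentage like the table in the question. Correlation ignores absolute amplitude, which may or may not be what you want.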

Related

How to add legends on gauge chart using plotly.graph_object

I have created a gauge chart, but I want to show the meaning of the labels, i.e. 0 = low and 5 = high. That means I need two labels (low on the left and high on the right).
Here is what my graph looks like:
code:
import plotly.graph_objects as go

fig = go.Figure(go.Indicator(
    domain = {'x': [0, 1], 'y': [0, 1]},
    value = 4.3,
    mode = "gauge+number+delta",
    title = {'text': "General satisfaction"},
    delta = {'reference': 2.5},
    gauge = {'axis': {'range': [None, 5], 'tickwidth': 1, 'tickcolor': "black"},
             'bar': {'color': "MidnightBlue"},
             'steps': [
                 {'range': [0, 1], 'color': "DarkTurquoise"},
                 {'range': [1, 2], 'color': "MediumTurquoise"},
                 {'range': [2, 3], 'color': "Turquoise"},
                 {'range': [3, 4], 'color': "PaleTurquoise"},
                 {'range': [4, 5], 'color': "lightcyan"}],
             'threshold': {'line': {'color': "brown", 'width': 4}, 'thickness': 0.75, 'value': 4.8}}))
fig.show()
Is there any parameter that can help me in this case?
You can relabel the gauge axis with the Indicator's tickmode, tickvals and ticktext settings, demonstrated below:
fig.update_traces(
    gauge={
        "axis": {
            "tickmode": "array",
            "tickvals": list(range(6)),
            "ticktext": ["0 - low" if i == 0 else "5 - high" if i == 5 else i for i in range(6)],
        }
    }
)

How to change the float 5.0 to the integer 5 in a list of lists

I have the following list
X=[[[0, 'rating'], [1, 4.0], [2, 5.0], [1, 5.0], [0, 4.0], [8, 5.0], [3, 2.0], [5, 5.0], [4, 3.0], [2, 5.0]]]
y=[1, 1, 1, 1, 1, 0, 1, 0, 1, 1]
I want to fit it with sklearn.linear_model in order to classify and measure the accuracy on the training data.
By using the following code
classifier = Perceptron(tol=1e-5, random_state=0)
classifier.fit(X,y)
I got this error: ValueError: could not convert string to float: 'rating'
I guess the problem is the float 5.0, but how can I simply change it? I tried with [[int(x) for x in x[1]]]
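No answer was recorded for this question, so here is a possible sketch (my own, not from the thread). The ValueError actually comes from the string 'rating', not from the float 5.0, so one option is to unwrap the outer list, drop the non-numeric pair together with its label, and cast the remaining floats to int:

```python
X = [[[0, 'rating'], [1, 4.0], [2, 5.0], [1, 5.0], [0, 4.0],
      [8, 5.0], [3, 2.0], [5, 5.0], [4, 3.0], [2, 5.0]]]
y = [1, 1, 1, 1, 1, 0, 1, 0, 1, 1]

pairs = X[0]  # unwrap the outer list so each row is one (feature, rating) pair

# Keep only fully numeric rows, dropping the stray 'rating' header and its label.
clean = [(row, label) for row, label in zip(pairs, y)
         if all(isinstance(v, (int, float)) for v in row)]
X_clean = [[int(a), int(b)] for (a, b), _ in clean]
y_clean = [label for _, label in clean]

print(X_clean)
print(y_clean)
```

X_clean and y_clean can then be passed to Perceptron(tol=1e-5, random_state=0).fit; whether the int cast is even needed depends on the model (sklearn accepts floats), so the real fix is removing the 'rating' row.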

Adding unique titles to subplots in matplotlib using a for loop

I have 8 subplots which I have created through a for loop, I've managed to add axis labels which was fine because these were identical across all charts. However my titles are different. Can anyone please tell me how to amend my for loop so it iterates through the titles? I have tried ax.set_title(titles) but this just lumps them all together rather than iterating through each one. I've also tried making the titles individual lists within a list but this didn't work either. The data relate to the Insanity Fit Test which is carried out 5 times throughout the Insanity program (in case you were wondering what the x axis was showing). My code is below:
fit_list = [[78.0, 94.0, 108.0, 117.0, 124.0], [40.0, 46.0, 48.0, 50.0, 50.0], [70.0, 90.0, 103.0, 100.0, 111.0],
[37.0, 38.0, 44.0, 55.0, 72.0], [5.0, 9.0, 8.0, 9.0, 9.0], [11.0, 15.0, 17.0, 18.0, 21.0],
[24.0, 30.0, 32.0, 34.0, 36.0], [35.0, 44.0, 50.0, 53.0, 64.0]]
x = [[1, 2, 3, 4, 5,], [1, 2, 3, 4, 5,], [1, 2, 3, 4, 5,], [1, 2, 3, 4, 5,],
[1, 2, 3, 4, 5,], [1, 2, 3, 4, 5,], [1, 2, 3, 4, 5,], [1, 2, 3, 4, 5,]]
y = fit_list
titles = ['Switch kicks', 'Power jacks', 'Power knees', 'Power jumps', 'Globe jumps', 'Suicide jumps',
'Push up jacks', 'Low plank obliques']
fig, axes = plt.subplots(nrows = 2, ncols = 4, figsize = (20, 10))
for exercise, ax in enumerate(axes.flatten()):
    ax.bar(x[exercise], y[exercise], color = 'red')
    ax.set_xlabel('Fit test event')
    ax.set_ylabel('Number of reps')
plt.tight_layout()
Any help would be greatly appreciated. Thank you in advance.
One option could be the following:
for t, exercise in zip(titles, range(8)):
    ax = axes.ravel()[exercise]
    ax.bar(x[exercise], y[exercise], color = 'red')
    ax.set_xlabel('Fit test event')
    ax.set_ylabel('Number of reps')
    ax.set_title(t)

Sorting the following probability distributions from lowest to highest entropy

I am trying to order 4 dictionaries in a list from lowest to highest entropy, and I am getting an invalid-syntax error (I am new to bioinformatics).
I have tried inline sorting
lists = sorted(list_dct.items, key=lambda k: k['name'])
list_dct = [{'name': 0.5, 0, 0, 0.5},
{'name' : 0.25, 0.25, 0.25, 0.25},
{'name' : 0, 0, 0, 1},
{'name' : 0.25, 0, 0.5, 0.25}]
print(lists)
I am getting an invalid syntax message... I should get the lists sorted by row lowest to row highest
You need to construct your dictionaries correctly. I've chosen to make the values a list. Then sort them with a list comprehension:
list_dct = [{'name': [0.5, 0, 0, 0.5]},
{'name' : [0.25, 0.25, 0.25, 0.25]},
{'name' : [0, 0, 0, 1]},
{'name' : [0.25, 0, 0.5, 0.25]}]
sorted([ d.get('name') for d in list_dct ])
1.) Define list_dct before the sorted() function, otherwise it's syntax error
2.) You want to sort whole list_dct, not list_dct.items()
3.) Make a custom key= sorting function where, from each item being sorted, we select the 'name' key.
list_dct = [{'name': [0.5, 0, 0, 0.5]},
{'name' : [0.25, 0.25, 0.25, 0.25]},
{'name' : [0, 0, 0, 1]},
{'name' : [0.25, 0, 0.5, 0.25]}]
lists = sorted(list_dct, key=lambda k: k['name'])
from pprint import pprint
pprint(lists)
Prints:
[{'name': [0, 0, 0, 1]},
{'name': [0.25, 0, 0.5, 0.25]},
{'name': [0.25, 0.25, 0.25, 0.25]},
{'name': [0.5, 0, 0, 0.5]}]
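Note that both answers sort the lists lexicographically. If you actually want the distributions ordered by Shannon entropy, as the title asks, a small sketch (my own addition, not from the answers):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution (0 * log 0 taken as 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

list_dct = [{'name': [0.5, 0, 0, 0.5]},
            {'name': [0.25, 0.25, 0.25, 0.25]},
            {'name': [0, 0, 0, 1]},
            {'name': [0.25, 0, 0.5, 0.25]}]

by_entropy = sorted(list_dct, key=lambda d: entropy(d['name']))
for d in by_entropy:
    print(d['name'], round(entropy(d['name']), 2))
```

This puts the deterministic distribution [0, 0, 0, 1] first (entropy 0) and the uniform one last (entropy 2), which is the ordering the question seems to be after.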

Using dijkstra_path function in networkx library

I'm using the networkx library to find the shortest path between two nodes with Dijkstra's algorithm, as follows:
import networkx as nx
A = [[0, 100, 0, 0, 40, 0],
     [100, 0, 20, 0, 0, 70],
     [0, 20, 0, 80, 50, 0],
     [0, 0, 80, 0, 0, 30],
     [40, 0, 50, 0, 0, 60],
     [0, 70, 0, 30, 60, 0]]
print(nx.dijkstra_path(A, 0, 4))
In the code above I'm passing the matrix directly, but the library requires the graph to be created as follows:
G = nx.Graph()
G = nx.add_node(<node>)
G.add_edge(<node 1>, <node 2>)
It is very time-consuming to build the graph with the above commands. Is there any way to pass a weighted matrix as input to the dijkstra_path function?
First you need to convert your adjacency matrix to a numpy array with np.array.
Then you can simply create your graph with from_numpy_matrix.
import networkx as nx
import numpy as np
A = [[0, 100, 0, 0, 40, 0],
     [100, 0, 20, 0, 0, 70],
     [0, 20, 0, 80, 50, 0],
     [0, 0, 80, 0, 0, 30],
     [40, 0, 50, 0, 0, 60],
     [0, 70, 0, 30, 60, 0]]
a = np.array(A)
G = nx.from_numpy_matrix(a)
print(nx.dijkstra_path(G, 0, 4))
Output:
[0, 4]
Side note: you can check the graph edges with the following code.
for edge in G.edges(data=True):
    print(edge)
Output:
(0, 1, {'weight': 100})
(0, 4, {'weight': 40})
(1, 2, {'weight': 20})
(1, 5, {'weight': 70})
(2, 3, {'weight': 80})
(2, 4, {'weight': 50})
(3, 5, {'weight': 30})
(4, 5, {'weight': 60})
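A version note, as an aside (not part of the original answer): from_numpy_matrix was removed in NetworkX 3.0, so on current releases the equivalent call is from_numpy_array:

```python
import networkx as nx
import numpy as np

A = np.array([[0, 100, 0, 0, 40, 0],
              [100, 0, 20, 0, 0, 70],
              [0, 20, 0, 80, 50, 0],
              [0, 0, 80, 0, 0, 30],
              [40, 0, 50, 0, 0, 60],
              [0, 70, 0, 30, 60, 0]])

G = nx.from_numpy_array(A)  # replacement for from_numpy_matrix in NetworkX >= 3.0
print(nx.dijkstra_path(G, 0, 4))
```

The rest of the answer works unchanged; zero entries are still treated as "no edge" and nonzero entries become 'weight' attributes.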
