Find start/stop location of sharp cumulative events - python-3.x

Here is an example set of data:
EDIT: Included some more data.
x = [0, 5, 6,15, 20, 40, 73,
100,101,102,103,104,105,106,108,111,115,
116,117,118,119,120,123,124,125,126,127,
128,129,130,131, 150,161,170, 183, 194,
210, 234, 257, 271,272,273,274, 275,276,
277,278,279,280,281,282,283,284,285,287,
288,291,292,293,294,295,296,297,298,300,301,
302,303,304,305,306,307,308,309,310,311,
340, 351, 358, 360, 380, 390, 400,401,
402,403, 404, 405, 408, 409, 413, 420,
425,426,427,428,429,430,431,432,433,434,435,
436, 440, 450, 455]
y = np.arange(1, len(x)+1)
Here is what the data looks like; each sharp increase has the potential to be longer. The last sharp increase also contains a pause, but I would like it to still be considered one set of data. Black dots are the gradient.
I am attempting to find the start/end x-values for each sharp increase in cumulative counts. So the output should be an array of indexes, like what Riley has done.
A vectorized method would be ideal to help with any time constraints to quickly go through data. Here is rough outline of what has been done so far within a pandas dataframe.
Shift the "x-data" and take a difference
See if sequential differences are below a threshold to create logic array
Do a rolling sum on the logic array so that Trues continue to add to the count
Find when rolling sum exceeds another threshold
Compare with previous value to ensure it is increasing/decreasing for start/stop times
Add times to index list
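The outline above could be sketched roughly as follows. This is only an illustration under assumptions, not the original 50-line code: the function name find_bursts and the parameters max_step, window and min_hits are hypothetical, and the small x list is made-up test data.

```python
import numpy as np
import pandas as pd

def find_bursts(x, max_step=3, window=3, min_hits=3):
    """Hypothetical helper: return (start, stop) index arrays of dense runs."""
    s = pd.Series(x)
    # Steps 1-2: difference of consecutive x-values, compared to a threshold.
    # The first difference is NaN, which compares as False.
    close = s.diff().le(max_step)
    # Step 3: rolling sum over the boolean mask (NaN until the window fills).
    hits = close.astype(int).rolling(window).sum()
    # Step 4: the run is "active" once enough close steps fall in the window.
    active = hits.ge(min_hits)
    # Step 5: compare with the previous value to find rising/falling edges.
    prev = active.shift(fill_value=False)
    starts = np.flatnonzero((active & ~prev).to_numpy())
    stops = np.flatnonzero((~active & prev).to_numpy())
    # Step 6: the edges are the start/stop indices.
    return starts, stops

# Made-up data: one dense run between x=20 and x=25.
x_demo = [0, 10, 20, 21, 22, 23, 24, 25, 40, 50]
starts, stops = find_bursts(x_demo)
```

Note that the rising edge is reported only once the window has filled, so the start index lags the true start of the run by roughly the window length.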
It seems a little finicky on some of the rolling averages and isn't as quick as I would like. Multiplying some of these large arrays with logic arrays seems to take a good amount of time.
EDIT: Here is the code Riley has provided, which offers an excellent start. It is also only a couple of lines of code, whereas my method above was almost 50 or so.
rate_threshold = 0.25
min_consecutive = 8
above_rate = np.gradient(y,x) >= rate_threshold
sequence_diff = np.diff(np.lib.stride_tricks.sliding_window_view(above_rate, window_shape=min_consecutive).all(axis=1).astype(int))
intervals = np.hstack([np.argwhere(sequence_diff==1) + 1, np.argwhere(sequence_diff==-1) + min_consecutive-1])
The new issue comes from the final sharp increase of data. Two sets of start/end points are returned, where the desired output would be just one.
My initial thought is to include some kind of averaging routine with the sliding window to account for these drops in the gradient, so the end is not so hard set.

Not sure what your desired output would look like, so let's start with this, verify it does what you want it to, then go from there.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
rate_threshold = 1
min_consecutive = 5
above_rate = np.diff(y)/np.diff(x) >= rate_threshold
sequence_diff = np.diff(sliding_window_view(above_rate, window_shape=min_consecutive).all(axis=1).astype(int))
intervals = np.hstack([np.argwhere(sequence_diff==1) + 1, np.argwhere(sequence_diff==-1) + min_consecutive-1])
intervals is a 2d numpy array of indices whose 2 columns are first index, and last index in a sequence (of length min_consecutive) of rates above the threshold
array([[ 7, 12],
[16, 20],
[22, 29],
[39, 52],
[56, 62],
[64, 74]], dtype=int64)
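One possible way to treat a short pause as part of a single increase is to merge intervals that are separated by only a few indices. The sketch below is an assumption about what "one set of data" should mean, not part of the code above: merge_close_intervals and max_gap are hypothetical names, and the input is assumed to be sorted and non-overlapping, as in the output shown.

```python
import numpy as np

def merge_close_intervals(intervals, max_gap):
    """Hypothetical helper: merge rows whose gap to the previous row is <= max_gap."""
    intervals = np.asarray(intervals)
    if len(intervals) == 0:
        return intervals.reshape(0, 2)
    starts, ends = intervals[:, 0], intervals[:, 1]
    # True where the gap to the previous interval is too large to bridge.
    breaks = (starts[1:] - ends[:-1]) > max_gap
    # Keep a start if it opens a new group, and an end if it closes one.
    keep_start = np.concatenate([[True], breaks])
    keep_end = np.concatenate([breaks, [True]])
    return np.column_stack([starts[keep_start], ends[keep_end]])

intervals = np.array([[7, 12], [16, 20], [22, 29], [39, 52], [56, 62], [64, 74]])
merged = merge_close_intervals(intervals, max_gap=4)
```

Here max_gap is measured in index units; a suitable value depends on how long a pause should still count as the same event.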

Related

Python3 bisect_left: return value does not match the Python documentation description

My code is as follows:
import bisect
a = [186, 186, 150, 200, 160, 130, 197, 200]
print("left",bisect.bisect_left(a,150))
The return value is: 0
But as specified in the document of Python 3.9:
If x is already present in a, the insertion point will be before (to the left of) any existing entries.
150 exists in the list "a", so the return value should be 1 (i.e., a.index(150) - 1), but the actual returned value is 0 .
Would you please explain the reason?
The bisect module and generally the underlying binary search algorithm is made for sorted data. For unsorted data, the result is effectively arbitrary.
For the bisect_left algorithm, sortedness means the algorithm does not have to check for equality: in a sequence a, the insertion position i "to the left" of any existing x is the one where a[i - 1] < x and x <= a[i]. This is because sortedness enforces a[j] <= a[j+1].
As such, technically the insertion point will be before (to the left of) any existing entries equal to or larger than x. Sortedness guarantees that this is before any existing entries of x.
For the sequence [186, 186, 150, 200, 160, 130, 197, 200] and x=150, the insertion point is 0 because:
The list is initially bisected into [186, 186, 150, 200] and [160, ...].
The head of the right bisect is equal to or larger than x; assuming sortedness, there cannot be a value smaller than x in it.
All values in the left bisect are equal to or larger than x; assuming sortedness, the insertion point must be before all of them.
The only point before all values of the left bisection is 0.
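To make the steps above concrete, here is a small sketch that re-implements the usual binary search loop and records every element it inspects. The function bisect_left_trace is a hypothetical illustration, not the actual source of the bisect module, although it is checked against bisect.bisect_left.

```python
import bisect

def bisect_left_trace(a, x):
    """Hypothetical mirror of a bisect_left loop that records each probe."""
    lo, hi = 0, len(a)
    probes = []
    while lo < hi:
        mid = (lo + hi) // 2
        probes.append((mid, a[mid]))
        if a[mid] < x:
            lo = mid + 1   # everything up to mid is assumed smaller than x
        else:
            hi = mid       # a[mid] >= x: keep searching to the left
    return lo, probes

a = [186, 186, 150, 200, 160, 130, 197, 200]
pos, probes = bisect_left_trace(a, 150)
sorted_pos = bisect.bisect_left(sorted(a), 150)
```

The search even probes a[2] == 150 along the way, but since it never tests for equality it simply continues to the left. On the sorted list the result is the index just before the existing 150.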

How to delete the borders and add the name in the bar itself

I would like to remove the border lines shown in the picture and also put each bar's value next to the bar it belongs to. How can I do it?
The values are from a data set taken from Kaggle.
Here is some code to help you get the requested layout.
The states and the numbers are from Wikipedia.
import matplotlib.pyplot as plt
states = ['Acre', 'Alagoas', 'Amazonas', 'Amapá', 'Bahia', 'Ceará', 'Federal District',
'Espírito Santo', 'Goiás', 'Maranhão', 'Minas Gerais', 'Mato Grosso do Sul',
'Mato Grosso', 'Pará', 'Paraíba', 'Pernambuco', 'Piauí', 'Paraná', 'Rio de Janeiro',
'Rio Grande do Norte', 'Rondônia', 'Roraima', 'Rio Grande do Sul', 'Santa Catarina',
'Sergipe', 'São Paulo', 'Tocantins']
fires = [2918, 73, 7625, 24, 2383, 327, 68, 229, 1786, 5596, 2919, 451, 15476, 10747, 81, 132,
2818, 181, 396, 68, 6441, 4608, 2029, 1107, 62, 1616, 6436]
fires, states = zip(*sorted(zip(fires, states))) #sort both arrays on number of fires
fires = fires[-15:] # limit to the 15 highest numbers
states = states[-15:]
fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(states, fires, color="#08519c")
plt.box(False) # remove the complete box around the plot
plt.xticks([]) # remove all the ticks on the x-axis
ax.yaxis.set_ticks_position('none') # removes the tick marks on the y-axis but leaves the text
for i, v in enumerate(fires):
ax.text(v + 180, i, f'{v:,}'.replace(',', '.'), color='#08519c', fontweight='normal', ha='left', va='center')
plt.subplots_adjust(left=0.22) # more space to read the names
plt.title('Wildfires Brazil 2019', fontsize=20, y=0.98) # title larger and a bit lower
plt.show()
PS: about
for i, v in enumerate(fires):
ax.text(v + 180, i, f'{v:,}'.replace(',', '.'), color='#08519c', fontweight='normal', ha='left', va='center')
This has v going through each element of fires, one by one; i is the index for which fires[i] == v. ax.text(x, y, 'some text') puts a text at a certain position, where x and y are measured in the same units as those marked on the axes (that's why the axes are shown by default). When the axis labels are text instead of numbers, they are numbered internally 0, 1, 2, 3, ... . So x=v + 180 is the x-position where a number of fires equal to v + 180 would be, and y=i is just the position of label number i.

Risk scoring in python

I have a metric to detect fraud, say calls, transfer rate, aux time, and so on.
I have grouped them into bins based on quartiles and now I have to give ratings from 1 to 5 based on bins. For example: calls > 150 assign ranking as 1, calls <=150 and >=300 as 2 and so on. Likewise for all the metrics.
I tried the following:
np.where(x.Calls<=125.8,1,
np.where(x.Calls>=153.2 & x.Calls<=190.0,2,np.where(x.Calls>=190.0 & x.Calls<=235.0,3,np.where(x.Calls>=235.0 & x.Calls<=304.4,4,np.where(x.Calls>=304.4,5,0))))
Error:
File "<ipython-input-32-41fe2292e308>", line 2
np.where(x.Calls>=153.2 & x.Calls<=190.0,2,np.where(x.Calls>=190.0 &
x.Calls<=235.0,3,np.where(x.Calls>=235.0 &
x.Calls<=304.4,4,np.where(x.Calls>=304.4,5,0))))
^ SyntaxError: unexpected EOF while parsing
I want the code to take the value ranges from the computed quartiles and assign the ratings on its own.
Your specific error indicates that you have left some parentheses open.
But you're getting this error because the nested np.where approach is really hard to implement (and therefore debug and maintain). So it's worth thinking about other ways.
The rules you want to implement aren't totally clear to me, but I think np.digitize might help you make progress. It 'quantizes' your data: you give it an array-like of bins, and it returns the bin each value of an array appears in. It works like this:
>>> import numpy as np
>>> a = np.array([55, 99, 65, 121, 189, 205, 211, 304, 999])
>>> bins = [100, 200, 300]
>>> np.digitize(a, bins=bins)
array([0, 0, 0, 1, 1, 2, 2, 3, 3])
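As a rough sketch of how this could map onto ratings from 1 to 5: the helper rate_by_bins is hypothetical, the bin edges are taken from the thresholds in the question (ignoring the gap between 125.8 and 153.2), and the sample calls values are made up. Whether these boundaries match the intended rules is an assumption.

```python
import numpy as np

def rate_by_bins(values, edges):
    """Hypothetical helper: rating 1 for values <= edges[0], up to len(edges) + 1."""
    # right=True makes each bin include its upper edge: edges[i-1] < v <= edges[i].
    return np.digitize(values, bins=edges, right=True) + 1

# Made-up sample values; edges assumed from the thresholds in the question.
calls = np.array([100.0, 125.8, 160.0, 200.0, 250.0, 304.4, 500.0])
edges = [125.8, 190.0, 235.0, 304.4]
ratings = rate_by_bins(calls, edges)

# Edges can also be derived from the data itself, e.g. from quantiles.
auto_edges = np.quantile(calls, [0.2, 0.4, 0.6, 0.8])
auto_ratings = rate_by_bins(calls, auto_edges)
```

The same function can be applied to each metric with its own list of edges, which avoids nesting np.where calls.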

python count number of times pandas series value goes above value X

I see there are various Python/pandas ways to count the number of occurrences.
However, is there a way to count the number of times the series value goes above $X? It shouldn't double-count while the series value stays above $X. It should count only when it goes above it.
For example, in the data series below, I would expect the count = 2, and not 4; given X = $200.
list_a = [100, 150, 210, 185, 166, 205, 230, 222, 195, 190, 160]
df = pd.DataFrame(list_a, columns = ['mtm_level'])
Check with cumsum of a Boolean Series + where + nunique. Here "above" is not inclusive of X
X = 200
df['mtm_level'].le(X).cumsum().where(df['mtm_level'].gt(X)).nunique()
#2
X = 1000
df['mtm_level'].le(X).cumsum().where(df['mtm_level'].gt(X)).nunique()
#0
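To see why this works, it may help to look at the intermediate steps. The sketch below splits the one-liner into named pieces (the variable names are only illustrative) and adds an alternative that counts rising edges directly with shift; the alternative is a suggestion, not part of the answer above.

```python
import pandas as pd

list_a = [100, 150, 210, 185, 166, 205, 230, 222, 195, 190, 160]
df = pd.DataFrame(list_a, columns=['mtm_level'])
X = 200

above = df['mtm_level'].gt(X)
# Every value at or below X bumps the counter, so each uninterrupted
# run above X shares a single counter value.
groups = df['mtm_level'].le(X).cumsum()
# Keep the counter only where the series is above X (NaN elsewhere).
labelled = groups.where(above)
count_nunique = labelled.nunique()

# Alternative: count positions that are above X while the previous one was not.
crossings = above & ~above.shift(fill_value=False)
count_edges = int(crossings.sum())
```

Both counts treat a series that starts above X as one crossing.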

pygames / drawing circle that bounces off walls

b1 = {'rect':pygame.Rect(300, 80, 50, 100), 'color':RED, 'dir':UPRIGHT}
b2 = {'rect':pygame.Rect(200, 200, 20, 20), 'color':GREEN, 'dir':UPLEFT}
b3 = {'rect':pygame.Rect(100, 150, 60, 60), 'color':BLUE, 'dir':DOWNLEFT}
b4 = {'rect':pygame.draw.circle((300, 50), 20, 0,), 'color':PURPLE, 'dir':DOWNRIGHT}
blocks = [b1, b2, b3]
# draw the block onto the surface
pygame.draw.rect(windowSurface, b['color'], b['rect'])
pygame.draw.circle(windowSurface, b['color'], (300, 50), 20, 0)
Brand new to pygames (programming in general). This is one of our first assignments, to edit a simple program that bounces squares off the walls. We are to add a circle but I'm not able to figure out how to fit it into the pre-existing dict structure for the rectangles. I'm getting the following error, what am I not seeing:
Traceback (most recent call last):
File "C:\Users\CA115\Downloads\animation1.py", line 32, in <module>
b4 = {'rect':pygame.draw.circle((300, 50), 20, 0,), 'color':PURPLE, 'dir':DOWNRIGHT}
TypeError: function takes at least 4 arguments (3 given)
A quick look at Pygame's documentation shows that you are providing fewer arguments than expected. In general, that's all Python is telling you when it says "TypeError: function takes at least 4 arguments (3 given)".
A little more detail: In line 32, as your error trace specifies, what causes the problem is the dictionary entry 'rect':pygame.draw.circle((300, 50), 20, 0,). From the documentation I linked, you should be able to figure out what went wrong here.
Another hint. Since you said that you are new to programming in general, you should note that the parameters for pygame.Rect and pygame.draw.circle have a significant difference: Rect takes in all primitive Python values ("numbers") while circle needs a particular object.
