What average precision is the plot_precision_recall_curve() function plotting? - scikit-learn

After using plot_precision_recall_curve() from scikit-learn, I was wondering which average precision this function uses. In the docs, this is what I find for a binary target:
# %%
# Compute the average precision score
# ...................................
from sklearn.metrics import average_precision_score
average_precision = average_precision_score(y_test, y_score)
print('Average precision-recall score: {0:0.2f}'.format(
average_precision))
This is my code and data:
import numpy as np
from sklearn import svm
from sklearn.metrics import average_precision_score

clf_4 = svm.SVC()
clf_4.fit(X_train, y_train)
y_clf_4 = clf_4.predict(X_test)
y1_test = np.array([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1])
y1_clf4 = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1])
average_precision_5 = average_precision_score(y1_test, y1_clf4)
average_precision_5
Out: 0.5625
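To see where 0.5625 comes from, here is a minimal NumPy sketch of the step-wise definition that scikit-learn documents for average_precision_score, AP = sum_n (R_n - R_{n-1}) * P_n, evaluated once per distinct score threshold. The helper name step_average_precision is hypothetical. With hard 0/1 predictions there are only two thresholds, so the "curve" has just two points, (R=0.5, P=0.625) and (R=1, P=0.5), which gives 0.5 * 0.625 + 0.5 * 0.5 = 0.5625.

```python
import numpy as np

def step_average_precision(y_true, y_score):
    """Hypothetical helper: AP = sum_n (R_n - R_{n-1}) * P_n over distinct thresholds."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(y_score, kind="mergesort")[::-1]  # highest score first
    y_sorted = y_true[order]
    s_sorted = y_score[order]
    # Keep only the last position of each run of tied scores: one point per threshold
    last_of_tie = np.append(np.diff(s_sorted) != 0, True)
    tp = np.cumsum(y_sorted)[last_of_tie]
    fp = np.cumsum(1 - y_sorted)[last_of_tie]
    precision = tp / (tp + fp)
    recall = tp / y_true.sum()
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

y1_test = np.array([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1])
y1_clf4 = np.array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1])

print(step_average_precision(y1_test, y1_clf4))  # 0.5625: only two thresholds
# With continuous scores every distinct value is its own threshold
print(round(step_average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 4))  # 0.8333
```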
Now I call plot_precision_recall_curve with this X_test (the same as above):
X_test= np.array([[0.01167537, 0.04676259, 0.02145552, 0.015625 , 0. ,
0. , 0. , 0.5 , 0.01020408, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00478415, 0.01258993, 0.06759886, 0.09375 , 0. ,
0. , 0. , 0.43421053, 0. , 1. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.01503446, 0.04136691, 0.02600806, 0.015625 , 0. ,
0. , 1. , 0.13157895, 0.02721088, 0. ,
0. , 0. , 0. , 0. , 0. ,
1. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.017396 , 0.04856115, 0.07737383, 0.046875 , 0. ,
0. , 0. , 0.44736842, 0.04421769, 0. ,
0. , 1. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
1. , 0. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.0072882 , 0.01079137, 0.07866155, 0.078125 , 1. ,
0. , 0. , 0.63157895, 0. , 1. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00733909, 0.0323741 , 0.0487578 , 0.046875 , 0. ,
0. , 0. , 0.44736842, 0.02040816, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0. , 0. , 1. ,
0. , 1. , 0. , 0. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. ],
[0.02579371, 0.11151079, 0.03639438, 0.0625 , 0. ,
0. , 0. , 0.53947368, 0.02380952, 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00203581, 0.03417266, 0.12611863, 0.125 , 0. ,
0. , 0. , 0.05263158, 0.00680272, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00527275, 0.03057554, 0.0344563 , 0.03125 , 0. ,
0. , 1. , 0.09210526, 0.00680272, 1. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00590385, 0.02158273, 0.05135926, 0.046875 , 0. ,
0. , 0. , 0.43421053, 0.00340136, 1. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.01910608, 0.16366906, 0.05917014, 0.03125 , 1. ,
0. , 1. , 0.28947368, 0.12244898, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 1. , 0. , 0. , 1. ,
0. , 0. , 1. , 0. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. ],
[0.12737045, 0.13669065, 0.07280827, 0.078125 , 1. ,
0. , 0. , 0.46052632, 0.07823129, 0. ,
0. , 1. , 0. , 0. , 0. ,
0. , 1. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
1. , 1. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. ],
[0.0537861 , 0.17446043, 0.14109651, 0.078125 , 0. ,
0. , 0. , 0.32894737, 0.08843537, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.01027066, 0.05755396, 0.06110172, 0.078125 , 1. ,
0. , 0. , 0.30263158, 0.01360544, 1. ,
0. , 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.0085504 , 0.01978417, 0.03185484, 0.03125 , 1. ,
1. , 0. , 0.51315789, 0.00340136, 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. , 1. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. ],
[0.02224122, 0.05215827, 0.06370968, 0.0625 , 0. ,
0. , 0. , 0.47368421, 0.04081633, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 1. ,
0. , 0. , 0. , 1. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00896774, 0.05035971, 0.00974896, 0.015625 , 0. ,
0. , 0. , 0.5 , 0.02721088, 0. ,
0. , 0. , 0. , 0. , 0. ,
1. , 0. , 0. , 0. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.03302084, 0.07014388, 0.00779787, 0.015625 , 1. ,
1. , 0. , 0.25 , 0.03741497, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00630083, 0.06115108, 0.01495838, 0. , 0. ,
0. , 0. , 0.10526316, 0.00340136, 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 1. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ],
[0.00951741, 0.03776978, 0.13261576, 0.140625 , 1. ,
1. , 0. , 0.47368421, 0.0170068 , 1. ,
0. , 1. , 0. , 0. , 0. ,
0. , 1. , 0. , 0. , 0. ,
0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 1. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. ]])
Now I plot the curve with plot_precision_recall_curve and put average_precision_5 in the title, and the two values differ:
from sklearn.metrics import plot_precision_recall_curve
disp = plot_precision_recall_curve(clf_4, X_test, y1_test)
disp.ax_.set_title(f'2-class Precision-Recall curve: {average_precision_5}')
So where does the difference come from?

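The y_score parameter of average_precision_score needs probability estimates or another continuous score (such as the output of decision_function), not hard class labels. plot_precision_recall_curve computes its average precision from such a score: by default it uses predict_proba if the estimator has it and decision_function otherwise, and an SVC() created without probability=True only provides decision_function. Your average_precision_5 was computed from the 0/1 output of predict(), so it is not the value the plot reports, and it is not a meaningful summary of the precision-recall curve.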
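A minimal sketch of the fix, assuming synthetic stand-in data from make_classification because the question's X_train and y_train are not shown. It contrasts the average precision computed from predict() labels with the one computed from decision_function() scores; the latter is the kind of score the precision-recall plot uses for an SVC without probability estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: the question's X_train / y_train are not shown.
X, y = make_classification(n_samples=200, n_features=36, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC().fit(X_train, y_train)  # probability=False by default: no predict_proba

# Hard 0/1 labels (what the question passed): a two-point "curve".
ap_from_labels = average_precision_score(y_test, clf.predict(X_test))

# Continuous scores: the input the precision-recall plot uses for this SVC.
scores = clf.decision_function(X_test)
ap_from_scores = average_precision_score(y_test, scores)

print(f"AP from predict():           {ap_from_labels:.3f}")
print(f"AP from decision_function(): {ap_from_scores:.3f}")
```

In scikit-learn 1.0 and later, plot_precision_recall_curve is deprecated (it was removed in 1.2) in favour of PrecisionRecallDisplay.from_estimator, which picks the score the same way (predict_proba, then decision_function).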

Related

Get index value from array with condition

I have a NumPy array like this:
a = [ [0. 0. 1. 0.]
[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 1. 0.]
]
I want to get the indices of all rows whose item in the 3rd column equals 1:
a[:,2:2+1]==1
In this case the result would be
index = [0, 3, 4]
Is there a function I can use for that?
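import numpy as np
a = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]])
# a[:, 2:3] keeps a 2-D column, so np.where returns (row indices, column indices)
index, value_first_at_index = np.where(a[:, 2:3] == 1)
print(index)  # [0 3 4]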
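A slightly simpler variant, sketched with the question's five-row array: selecting the column with a[:, 2] gives a 1-D array, so np.flatnonzero (or np.where(...)[0]) returns only the row indices, with no unused column-index array to discard.

```python
import numpy as np

a = np.array([[0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0]])

# a[:, 2] is the 1-D third column, so a single index array comes back
index = np.flatnonzero(a[:, 2] == 1)
print(index)  # [0 3 4]
```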

Replace python numpy matrix value based on a condition, without using a for loop

I have a symmetric NumPy matrix, for example:
matrix([[0. , 0.125, 0.75 , 0. , 0. ],
[0.125, 0. , 0. , 0. , 0. ],
[0.75 , 0. , 0. , 0. , 0.375],
[0. , 0. , 0. , 0. , 1.2 ],
[0. , 0. , 0.375, 1.2 , 0. ]])
If a value in the array is greater than zero, is it possible to replace it with the product of the sum of its row and the sum of its column? For example, the 0.125 at position (0, 1) would become 0.109375, since row_sum * col_sum = (0.125 + 0.75) * 0.125 = 0.109375.
I know it can be done with a for loop, but is it possible with standard NumPy? I want to avoid for loops.
Declaring the given matrix
import numpy as np
arr=np.array([[0. , 0.125, 0.75 , 0. , 0. ],
[0.125, 0. , 0. , 0. , 0. ],
[0.75 , 0. , 0. , 0. , 0.375],
[0. , 0. , 0. , 0. , 1.2 ],
[0. , 0. , 0.375, 1.2 , 0. ]])
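Using a list comprehension and np.argwhere to get the indices that satisfy the condition:
def replace(x, y, arr=arr, column_sums=arr.sum(axis=0), row_sum=arr.sum(axis=1)):
    # The sums are default arguments, so they are computed once from the original matrix
    arr[x][y] = row_sum[x] * column_sums[y]
_ = [replace(x, y) for x, y in np.argwhere(arr > 0)]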
The output:
array([[0. , 0.109375, 0.984375, 0. , 0. ],
[0.109375, 0. , 0. , 0. , 0. ],
[0.984375, 0. , 0. , 0. , 1.771875],
[0. , 0. , 0. , 0. , 1.89 ],
[0. , 0. , 1.771875, 1.89 , 0. ]])
Note that the code could be optimized further; it is laid out this way for readability.
What about using NumPy's boolean indexing instead?
arr[arr > 0] = x
Here x holds the replacement values, one per selected element.
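One fully vectorized way to build those replacement values, sketched here as a swapped-in technique rather than either answerer's code: np.outer of the row sums and column sums gives row_sum[i] * col_sum[j] for every position at once, and the boolean mask then picks out the positive entries.

```python
import numpy as np

arr = np.array([[0.   , 0.125, 0.75 , 0.   , 0.   ],
                [0.125, 0.   , 0.   , 0.   , 0.   ],
                [0.75 , 0.   , 0.   , 0.   , 0.375],
                [0.   , 0.   , 0.   , 0.   , 1.2  ],
                [0.   , 0.   , 0.375, 1.2  , 0.   ]])

# products[i, j] = (sum of row i) * (sum of column j), computed before any change
products = np.outer(arr.sum(axis=1), arr.sum(axis=0))

mask = arr > 0
result = arr.copy()
result[mask] = products[mask]  # boolean indexing: one value per positive entry
# Equivalent one-liner: result = np.where(arr > 0, products, arr)
print(result)
```

No Python-level loop runs here, and the sums are taken from the unmodified matrix, so the order in which entries are replaced does not matter.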

Why the '\n' symbol is in numpy empty array?

While debugging, I see this:
It is just the creation of an empty array... Why is there a '\n'? How can I create the array without it?
Program to create a numpy array of zeros
# Python 3.x
import numpy as np
y1 = np.zeros(100)
print(y1)
print("Shape of y1")
print(y1.shape)
I redirected the output of this program into a file, tester.txt:
python3 numpynewline.py >> tester.txt
The output contains newlines for display, but the shape of the array is not affected by them:
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Shape of y1
(100,)
Just 100 elements
The newlines are only part of the printed representation; there is no actual '\n' in the array. You are probably seeing it because a terminal or debugger shows the array's string form. NumPy does not store a '\n' character when it creates an array.
Tested on Ubuntu 17.04 with Python 3.6.
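A small sketch that makes the point concrete: the '\n' characters live in the string produced for display, which NumPy wraps at its print line width (75 characters by default), not in the array. Passing a large max_line_width to np.array2string (the value 10_000 is arbitrary) produces a single-line string.

```python
import numpy as np

y1 = np.zeros(100)
text = str(y1)  # display string, wrapped at NumPy's default line width
print("\n" in text)       # True: the line breaks are in the string
print(y1.shape, y1.size)  # (100,) 100: the array is just 100 float zeros

# A wide enough line width prints everything on one line
one_line = np.array2string(y1, max_line_width=10_000)
print("\n" in one_line)   # False
```

np.set_printoptions(linewidth=...) changes the same setting globally instead of per call.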

Normalizing sparse.csc_matrix by its diagonals

I have a scipy.sparse.csc_matrix with dtype = np.int32. I want to efficiently divide each column (or row, whichever is faster for a csc_matrix) of the matrix by the diagonal element in that column, so that mnew[:,i] = m[:,i]/m[i,i]. Note that I need to convert my matrix to np.double (since the elements of mnew will be in [0,1]), and since the matrix is massive and very sparse, I wonder whether I can do it efficiently, without a for loop and without ever going dense.
Make a sparse matrix:
In [379]: M = sparse.random(5,5,.2, format='csr')
In [380]: M
Out[380]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [381]: M.diagonal()
Out[381]: array([ 0., 0., 0., 0., 0.])
Too many 0s on the diagonal; let's add a nonzero diagonal:
In [382]: D=sparse.dia_matrix((np.random.rand(5),0),shape=(5,5))
In [383]: D
Out[383]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements (1 diagonals) in DIAgonal format>
In [384]: M1 = M+D
In [385]: M1
Out[385]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 10 stored elements in Compressed Sparse Row format>
In [387]: M1.A
Out[387]:
array([[ 0.35786668, 0.81754484, 0. , 0. , 0. ],
[ 0. , 0.41928992, 0. , 0.01371273, 0. ],
[ 0. , 0. , 0.4685924 , 0. , 0.35724102],
[ 0. , 0. , 0.77591294, 0.95008721, 0.16917791],
[ 0. , 0. , 0. , 0. , 0.16659141]])
Now it's trivial to divide each column by its diagonal element (the 1-D diagonal broadcasts across the rows):
In [388]: M1/M1.diagonal()
Out[388]:
matrix([[ 1. , 1.94983185, 0. , 0. , 0. ],
[ 0. , 1. , 0. , 0.01443313, 0. ],
[ 0. , 0. , 1. , 0. , 2.1444144 ],
[ 0. , 0. , 1.65583764, 1. , 1.01552603],
[ 0. , 0. , 0. , 0. , 1. ]])
Or divide the rows (broadcasting against a column vector):
In [391]: M1/M1.diagonal()[:,None]
Oops, these results are dense; let's make the diagonal sparse:
In [408]: md = sparse.csr_matrix(1/M1.diagonal()) # do the inverse here
In [409]: md
Out[409]:
<1x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [410]: M.multiply(md)
Out[410]:
<5x5 sparse matrix of type '<class 'numpy.float64'>'
with 5 stored elements in Compressed Sparse Row format>
In [411]: M.multiply(md).A
Out[411]:
array([[ 0. , 1.94983185, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0.01443313, 0. ],
[ 0. , 0. , 0. , 0. , 2.1444144 ],
[ 0. , 0. , 1.65583764, 0. , 1.01552603],
[ 0. , 0. , 0. , 0. , 0. ]])
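Because md is a 1x5 row, M.multiply(md) scales each column j by 1/M1[j, j], i.e. this is the column version (compare the off-diagonal values with Out[388]). For the row version, multiply by the transposed (column) vector instead: M.multiply(md.T).
See also the question "Division of sparse matrix": it is similar, except that it divides by the row sums instead of the diagonal, and it deals a bit more with the potential divide-by-zero issue.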
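Putting the question's requirements together (int32 input, float64 output, column i divided by m[i, i], never going dense), a minimal sketch, assuming every diagonal entry is nonzero: right-multiplying by a sparse diagonal matrix scales each column, and only the length-n diagonal is ever held as a dense array. The function name normalize_columns_by_diagonal is hypothetical.

```python
import numpy as np
from scipy import sparse

def normalize_columns_by_diagonal(m):
    """Hypothetical helper: return m[:, i] / m[i, i] for every column i, staying sparse.

    Assumes every diagonal entry is nonzero (a zero would produce inf/nan).
    """
    m = sparse.csc_matrix(m).astype(np.float64)  # int32 -> float64 copy
    inv_diag = 1.0 / m.diagonal()                # dense 1-D array of length n only
    # Right-multiplying by a diagonal matrix scales column i by inv_diag[i]
    return (m @ sparse.diags(inv_diag)).tocsc()

m = sparse.csc_matrix(np.array([[4, 0, 3],
                                [2, 5, 0],
                                [0, 1, 6]], dtype=np.int32))
mnew = normalize_columns_by_diagonal(m)
print(mnew.toarray())
```

Left-multiplying instead, sparse.diags(inv_diag) @ m, would scale the rows.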

2d Kalman filter with acceleration

I would like to implement a Kalman filter in Python for some tracking software I'm working on. It is a 2D coordinate system using a single state vector x for the position, velocity and acceleration of the x and y coordinates.
I am using the following update and predict method:
import numpy as np

def kalman(x, P, Z, F, H, R, Q):  # function name and signature are illustrative
    # UPDATE
    y = Z - (H * x)                    # measurement residual
    S = H * P * H.T + R                # residual covariance
    K = P * H.T * S.I                  # Kalman gain
    x = x + K*y
    I = np.matrix(np.eye(F.shape[0]))  # identity matrix
    P = (I - K*H)*P
    # PREDICT x, P based on motion
    x = F*x
    P = F*P*F.T + Q
    return x, P
Using other resources, I have come up with the following state transition matrix and measurement matrix, but I am not sure how correct they are:
F = np.matrix('''
1. 0. 1. 0. 0.5 0.;
0. 1. 0. 1. 0. 0.5;
0. 0. 1. 0. 1. 0.;
0. 0. 0. 1. 0. 1.;
0. 0. 0. 0. 1. 0.;
0. 0. 0. 0. 0. 1.
''')
H = np.matrix('''
1. 0. 0. 0. 0. 0.;
0. 1. 0. 0. 0. 0.;
0. 0. 0. 0. 1. 0.;
0. 0. 0. 0. 0. 1.''')
Basically, it doesn't work: it makes my tracked path more jittery, and I have no clue where I'm going wrong. Can anyone help?
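The F above encodes constant-acceleration motion with a time step of 1, with the state ordered as [x, y, vx, vy, ax, ay]: position advances by v*dt + 0.5*a*dt**2 and velocity by a*dt. As a sketch for checking that matrix, the hypothetical helper below builds F for an arbitrary dt (dt is an assumption; the question's matrix fixes it at 1) and confirms that it reproduces the question's F when dt = 1. Note also that, with this state ordering, the question's H selects x, y, ax and ay, so each measurement Z is expected to contain those four quantities in that order.

```python
import numpy as np

def constant_acceleration_F(dt):
    """Hypothetical helper: transition matrix for the state [x, y, vx, vy, ax, ay].

    Position: p + v*dt + 0.5*a*dt**2; velocity: v + a*dt; acceleration: unchanged.
    """
    I2 = np.eye(2)
    Z2 = np.zeros((2, 2))
    return np.block([
        [I2, dt * I2, 0.5 * dt**2 * I2],
        [Z2, I2,      dt * I2],
        [Z2, Z2,      I2],
    ])

# The question's F, copied as a plain array
F_question = np.array([
    [1., 0., 1., 0., 0.5, 0. ],
    [0., 1., 0., 1., 0.,  0.5],
    [0., 0., 1., 0., 1.,  0. ],
    [0., 0., 0., 1., 0.,  1. ],
    [0., 0., 0., 0., 1.,  0. ],
    [0., 0., 0., 0., 0.,  1. ],
])

print(np.allclose(constant_acceleration_F(1.0), F_question))  # True
```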
