scikit-learn roc_curve: why does it sometimes return a threshold value of 2? - scikit-learn

Correct me if I'm wrong: the "thresholds" returned by scikit-learn's roc_curve should be an array of numbers in [0, 1]. However, it sometimes gives me an array whose first element is close to 2. Is this a bug, or did I do something wrong? Thanks.
In [1]: import numpy as np
In [2]: from sklearn.metrics import roc_curve
In [3]: np.random.seed(11)
In [4]: aa = np.random.choice([True, False],100)
In [5]: bb = np.random.uniform(0,1,100)
In [6]: fpr,tpr,thresholds = roc_curve(aa,bb)
In [7]: thresholds
Out[7]:
array([ 1.97396826, 0.97396826, 0.9711752 , 0.95996265, 0.95744405,
0.94983331, 0.93290463, 0.93241372, 0.93214862, 0.93076592,
0.92960511, 0.92245024, 0.91179548, 0.91112166, 0.87529458,
0.84493853, 0.84068543, 0.83303741, 0.82565223, 0.81096657,
0.80656679, 0.79387241, 0.77054807, 0.76763223, 0.7644911 ,
0.75964947, 0.73995152, 0.73825262, 0.73466772, 0.73421299,
0.73282534, 0.72391126, 0.71296292, 0.70930102, 0.70116428,
0.69606617, 0.65869235, 0.65670881, 0.65261474, 0.6487222 ,
0.64805644, 0.64221486, 0.62699782, 0.62522484, 0.62283401,
0.61601839, 0.611632 , 0.59548669, 0.57555854, 0.56828967,
0.55652111, 0.55063947, 0.53885029, 0.53369398, 0.52157349,
0.51900774, 0.50547317, 0.49749635, 0.493913 , 0.46154029,
0.45275916, 0.44777116, 0.43822067, 0.43795921, 0.43624093,
0.42039077, 0.41866343, 0.41550367, 0.40032843, 0.36761763,
0.36642721, 0.36567017, 0.36148354, 0.35843793, 0.34371331,
0.33436415, 0.33408289, 0.33387442, 0.31887024, 0.31818719,
0.31367915, 0.30216469, 0.30097917, 0.29995201, 0.28604467,
0.26930354, 0.2383461 , 0.22803687, 0.21800338, 0.19301808,
0.16902881, 0.1688173 , 0.14491946, 0.13648451, 0.12704826,
0.09141459, 0.08569481, 0.07500199, 0.06288762, 0.02073298,
0.01934336])

Most of the time these thresholds are not used, for example when calculating the area under the curve or plotting the true positive rate against the false positive rate.
Yet to plot a reasonable-looking curve, one needs a threshold that no data point reaches, so that the curve starts at (0, 0). Since scikit-learn's roc_curve does not require the scores to be normalised probabilities (any score is fine), setting this point's threshold to 1 isn't sufficient; setting it to inf is sensible, but coders often expect finite data (and it's possible the implementation also works for integer thresholds). Instead the implementation uses max(score) + epsilon, where epsilon = 1. This may be cosmetically deficient, but you haven't given any reason why it's a problem!

From the documentation:
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute
fpr and tpr. thresholds[0] represents no instances being predicted
and is arbitrarily set to max(y_score) + 1.
So the first element of thresholds is close to 2 because it is max(y_score) + 1, in your case thresholds[1] + 1.
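The documented rule can be illustrated with a simplified sketch. This is not scikit-learn's implementation: `sketch_thresholds` and `predicted_positive` are hypothetical helpers that only mimic the documented behaviour (distinct scores in decreasing order, with max(y_score) + 1 prepended), ignoring any pruning of intermediate thresholds.

```python
def sketch_thresholds(y_score):
    """Return decreasing thresholds with a sentinel above every score.

    Hypothetical helper mirroring the documented rule
    thresholds[0] = max(y_score) + 1; not scikit-learn's actual code.
    """
    distinct = sorted(set(y_score), reverse=True)
    return [distinct[0] + 1] + distinct


def predicted_positive(y_score, threshold):
    """Count scores classified as positive (score >= threshold)."""
    return sum(score >= threshold for score in y_score)


scores = [0.97396826, 0.5, 0.5, 0.01934336]
thresholds = sketch_thresholds(scores)
counts = [predicted_positive(scores, t) for t in thresholds]
print(thresholds)
print(counts)  # the sentinel threshold selects no instance at all
```

The first count is 0, which is what anchors the ROC curve at (0, 0). More recent scikit-learn releases appear to document this first threshold as np.inf instead, so the exact value you see may depend on the installed version.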

This seems like a bug to me: in roc_curve(aa, bb), 1 is added to the first threshold. You could open an issue at https://github.com/scikit-learn/scikit-learn/issues

Related

detect highest peaks automatically from noisy data python

Is there any way to detect the highest peaks using a Python library without setting any parameters? I'm developing a user interface and I want the algorithm to be able to detect the highest peaks automatically...
I want it to be able to detect the peaks in the picture below:
graph here
Data looks like this:
8.60291e-07
-1.5491e-06
5.64568e-07
-9.51195e-07
1.07203e-06
4.6521e-07
6.43967e-07
-9.86092e-07
-9.82323e-07
6.38977e-07
-1.93884e-06
-2.98309e-08
1.33543e-06
1.05064e-06
1.17332e-06
-1.53549e-07
-8.9357e-07
1.59176e-06
-2.17331e-06
1.46756e-06
5.63301e-07
-8.77556e-07
7.47681e-09
-8.30101e-07
-3.6647e-07
5.27046e-07
-1.94983e-06
1.89018e-07
1.22533e-06
8.00735e-07
-8.51166e-07
1.13437e-06
-2.75787e-07
1.79601e-06
-1.67875e-06
1.13529e-06
-1.29865e-06
9.9688e-07
-9.34486e-07
8.89931e-07
-3.88634e-07
1.15124e-06
-4.23569e-07
-1.8029e-07
1.20537e-07
4.10736e-07
-9.99077e-07
-3.62984e-07
2.97916e-06
-1.95828e-06
-1.07398e-06
2.422e-06
-6.33202e-07
-1.36953e-06
1.6694e-06
-4.71764e-07
3.98849e-07
-1.0071e-06
-9.72984e-07
8.13553e-07
2.64193e-06
-3.12365e-06
1.34049e-06
-1.30419e-06
1.48369e-07
1.26033e-06
-2.59872e-07
4.28284e-07
-6.44356e-07
2.99934e-07
8.34335e-07
3.53226e-07
-7.08252e-07
4.1243e-07
2.41525e-06
-8.92159e-07
8.82339e-08
4.31945e-06
3.75152e-06
1.091e-06
3.8204e-06
-1.21356e-06
3.35564e-06
-1.06234e-06
-5.99808e-07
2.18155e-06
5.90652e-07
-1.36728e-06
-4.97017e-07
-7.77283e-08
8.68263e-07
4.37645e-07
-1.26514e-06
2.26413e-06
-8.52966e-07
-7.35596e-07
4.11911e-07
1.7585e-06
-inf
1.10779e-08
-1.49507e-06
9.87305e-07
-3.85296e-06
4.31265e-06
-9.89227e-07
-1.33537e-06
4.1713e-07
1.89362e-07
3.21968e-07
6.80237e-08
2.31636e-07
-2.98523e-07
7.99133e-07
7.36305e-07
6.39862e-07
-1.11932e-06
-1.57262e-06
1.86305e-06
-3.63716e-07
3.83865e-07
-5.23293e-07
1.31812e-06
-1.23608e-06
2.54684e-06
-3.99796e-06
2.90441e-06
-5.20203e-07
1.36295e-06
-1.89317e-06
1.22366e-06
-1.10373e-06
2.71276e-06
9.48181e-07
7.70881e-06
5.17066e-06
6.21254e-06
1.3513e-05
1.47878e-05
8.78543e-06
1.61819e-05
1.68438e-05
1.16082e-05
5.74059e-06
4.92458e-06
1.11884e-06
-1.07419e-06
-1.28517e-06
-2.70949e-06
1.65662e-06
1.42964e-06
3.40604e-06
-5.82825e-07
1.98288e-06
1.42819e-06
1.65517e-06
4.42749e-07
-1.95609e-06
-2.1756e-07
1.69164e-06
8.7204e-08
-5.35324e-07
7.43546e-07
-1.08687e-06
2.07289e-06
2.18529e-06
-2.8161e-06
1.88821e-06
4.07272e-07
1.063e-06
8.47244e-07
1.53879e-06
-9.0799e-07
-1.26709e-07
2.40044e-06
-9.48166e-07
1.41788e-06
3.67615e-07
-1.29199e-06
3.868e-06
9.54654e-06
2.51951e-05
2.2769e-05
7.21716e-06
1.36545e-06
-1.32681e-06
-3.09641e-06
4.90417e-07
2.99335e-06
1.578e-06
6.0025e-07
2.90656e-06
-2.08258e-06
-1.54214e-06
2.19757e-07
3.74982e-06
-1.76944e-06
2.15018e-06
-1.01935e-06
4.37469e-07
1.39078e-06
6.39587e-07
-1.7807e-06
-6.16455e-09
1.61557e-06
1.59644e-06
-2.35217e-06
5.29449e-07
1.9169e-06
-7.54822e-07
2.00342e-06
-3.28452e-06
3.91663e-06
1.66016e-08
-2.65897e-06
-1.4064e-06
4.67987e-07
1.67786e-06
4.69543e-07
-8.90106e-07
-1.4584e-06
1.37915e-06
1.98483e-06
-2.3735e-06
4.45618e-07
1.91504e-06
1.09653e-06
-8.00873e-07
1.32321e-06
2.04846e-06
-1.50656e-06
7.23816e-07
2.06049e-06
-2.43918e-06
1.64417e-06
2.65411e-07
-2.66107e-06
-8.01788e-07
2.05121e-06
-1.74988e-06
1.83594e-06
-8.14026e-07
-2.69342e-06
1.81152e-06
1.11664e-07
-4.21863e-06
-7.20551e-06
-5.92407e-07
-1.44629e-06
-2.08136e-06
2.86105e-06
3.77911e-06
-1.91898e-06
1.41742e-06
2.67914e-07
-8.55835e-07
-9.8584e-07
-2.74115e-06
3.39044e-06
1.39639e-06
-2.4964e-06
8.2486e-07
2.02432e-06
1.65793e-06
-1.43094e-06
-3.36807e-06
-8.96515e-07
5.31323e-06
-8.27209e-07
-1.39221e-06
-3.3754e-06
2.12372e-06
3.08218e-06
-1.42947e-06
-2.36777e-06
3.86218e-06
2.29327e-06
-3.3941e-06
-1.67291e-06
2.63828e-06
2.21008e-07
7.07794e-07
1.8172e-06
-2.00082e-06
1.80664e-06
6.69739e-07
-3.95395e-06
1.92148e-06
-1.07187e-06
-4.04938e-07
-1.76553e-06
2.7099e-06
1.30768e-06
1.41812e-06
-1.55518e-07
-3.78302e-06
4.00137e-06
-8.38623e-07
4.54651e-07
1.00027e-06
1.32196e-06
-2.62717e-06
1.67865e-06
-6.99249e-07
2.8837e-06
-1.00516e-06
-3.68011e-06
1.61847e-06
1.90887e-06
1.59641e-06
4.16779e-07
-1.35245e-06
1.65717e-06
-2.92667e-06
3.6203e-07
2.53528e-06
-2.0578e-07
-3.41919e-07
-1.42154e-06
-2.33322e-06
3.07175e-06
-2.69165e-08
-8.21045e-07
2.3175e-06
-7.22992e-07
1.49069e-06
8.75488e-07
-2.02676e-06
-2.81158e-07
3.6004e-06
-3.94708e-06
4.72983e-06
-1.38873e-06
-6.92139e-08
-1.4678e-06
1.04251e-06
-2.06625e-06
3.10406e-06
-8.13873e-07
7.23694e-07
-9.78912e-07
-8.65967e-07
7.37335e-07
1.52563e-06
-2.33591e-06
1.78265e-06
9.58435e-07
-5.22064e-07
-2.29736e-07
-4.26996e-06
-6.61411e-06
1.14789e-06
-4.32697e-06
-5.32779e-06
2.12241e-06
-1.40726e-06
1.76086e-07
-3.77194e-06
-2.71326e-06
-9.49402e-08
1.70807e-07
-2.495e-06
4.22324e-06
-3.62476e-06
-9.56055e-07
7.16583e-07
3.01447e-06
-1.41229e-06
-1.67694e-06
7.61627e-07
3.55881e-06
2.31015e-06
-9.50378e-07
4.45251e-08
-1.94791e-06
2.27081e-06
-3.34717e-06
3.05688e-06
4.57062e-07
3.87326e-06
-2.39215e-06
-3.52682e-06
-2.05212e-06
5.26495e-06
-3.28613e-07
-5.76569e-07
-7.46338e-07
5.98795e-06
8.80493e-07
-4.82965e-06
2.56839e-06
-1.58792e-06
-2.2294e-06
1.83841e-06
2.65482e-06
-3.10474e-06
-3.46741e-07
2.45557e-06
2.01328e-06
-3.92606e-06
inf
-8.11737e-07
5.72174e-07
1.57245e-06
8.02612e-09
-2.901e-06
1.22079e-06
-6.31714e-07
3.06241e-06
1.20059e-06
-1.80344e-06
4.90784e-07
3.74243e-06
-2.94342e-07
-3.45764e-08
-3.42099e-06
-1.43695e-06
5.91064e-07
3.47308e-06
3.78232e-06
4.01093e-07
-1.58435e-06
-3.47375e-06
1.34943e-06
1.11768e-06
1.95212e-06
-8.28033e-07
1.53705e-06
6.38031e-07
-1.84702e-06
1.34689e-06
-6.98669e-07
1.81653e-06
-2.42355e-06
-1.35257e-06
3.04367e-06
-1.21976e-06
1.61896e-06
-2.69528e-06
1.84601e-06
6.45447e-08
-4.94263e-07
3.47568e-06
-2.00531e-06
3.56693e-06
-3.19446e-06
2.72141e-06
-1.39059e-06
2.20032e-06
-1.76819e-06
2.32727e-07
-3.47382e-07
2.11823e-07
-5.22614e-07
2.69846e-06
-1.47983e-06
2.14554e-06
-6.27594e-07
-8.8501e-10
7.89124e-07
-2.8653e-07
8.30902e-07
-2.12857e-06
-1.90887e-07
1.07593e-06
1.40781e-06
2.41641e-06
-4.52689e-06
2.37207e-06
-2.19479e-06
1.65131e-06
1.2706e-06
-2.18387e-06
-1.72821e-07
5.41687e-07
7.2879e-07
7.56927e-07
1.57739e-06
-3.79395e-07
-1.02887e-06
-1.20987e-06
1.43066e-06
8.96301e-08
5.09766e-07
-2.8812e-06
-2.35944e-06
2.25912e-06
-2.78967e-06
-4.69913e-06
1.60822e-06
6.9342e-07
4.6225e-07
-1.33276e-06
-3.59033e-06
1.11206e-06
1.83521e-06
2.39163e-06
2.3468e-08
5.91431e-07
-8.80249e-07
-2.77405e-08
-1.13184e-06
-1.28036e-06
1.66229e-06
2.81784e-06
-2.97589e-06
8.73413e-08
1.06439e-06
2.39075e-06
-2.76974e-06
1.20862e-06
-5.12817e-07
-5.19104e-07
4.51324e-07
-4.7168e-07
2.35608e-06
5.46906e-07
-1.66748e-06
5.85236e-07
6.42944e-07
2.43164e-07
4.01031e-07
-1.93646e-06
2.07416e-06
-1.16116e-06
4.27155e-07
5.2951e-07
9.09149e-07
-8.71887e-08
-1.5564e-09
1.07266e-06
-9.49402e-08
2.04016e-06
-6.38123e-07
-1.94241e-06
-5.17294e-06
-2.18622e-06
-8.26703e-06
2.54364e-06
4.32614e-06
8.3847e-07
-2.85309e-06
2.72345e-06
-3.42752e-06
-1.36871e-07
2.23346e-06
5.26825e-07
1.3566e-06
-2.17111e-06
2.1463e-07
2.06479e-06
1.76929e-06
-1.2655e-06
-1.3797e-06
3.10706e-06
-4.72189e-06
4.38138e-06
6.41815e-07
-3.25623e-08
-4.93707e-06
5.05743e-06
5.17578e-07
-5.30524e-06
3.62463e-06
5.68909e-07
1.16226e-06
1.10843e-06
-5.00854e-07
9.48761e-07
-2.18701e-06
-3.57635e-07
4.26709e-06
-1.50836e-06
-5.84412e-06
3.5054e-06
3.94019e-06
-4.7623e-06
2.05856e-06
-2.22992e-07
1.64969e-06
2.64694e-06
-8.49487e-07
-3.63562e-06
1.0386e-06
1.69461e-06
-2.05798e-06
3.60349e-06
3.42651e-07
-1.46686e-06
1.19949e-06
-1.60519e-06
2.37793e-07
6.12366e-07
-1.54669e-06
1.43668e-06
1.87009e-06
-2.22626e-06
2.15155e-06
-3.10571e-06
2.05188e-06
-4.40002e-07
2.06683e-06
-1.11362e-06
5.96924e-07
-2.64471e-06
2.4892e-06
1.13083e-06
-3.23181e-07
5.10651e-07
2.73499e-07
-1.24899e-06
1.40564e-06
-9.3158e-07
1.45947e-06
3.70544e-07
-1.62628e-06
-1.70215e-06
1.72098e-06
8.19031e-07
-5.57709e-07
1.10107e-06
-2.81845e-06
1.57654e-07
3.30716e-06
-9.75403e-07
1.73126e-07
1.30447e-06
7.64771e-08
-6.65344e-07
-1.4346e-06
5.03171e-06
-2.84576e-06
2.3212e-06
-2.73373e-06
2.16675e-08
2.24026e-06
-4.11682e-08
-3.36642e-06
1.78775e-06
1.28174e-08
-9.32068e-07
2.97177e-06
-1.05338e-06
9.42505e-07
2.02362e-07
-1.81326e-06
2.16995e-06
2.83722e-07
-1.2648e-06
9.21814e-07
-8.9447e-07
-1.61597e-06
3.5036e-06
-6.79626e-08
1.52823e-06
-2.98682e-06
5.57404e-07
9.5166e-07
7.10419e-07
-1.28528e-06
-3.76038e-07
-1.03845e-06
2.96631e-06
-1.18356e-06
-2.77313e-07
3.24149e-06
-1.85455e-06
-1.27747e-07
3.6264e-07
4.66431e-07
-1.54443e-06
1.38437e-06
-1.53119e-06
7.4231e-07
-1.2388e-06
1.99774e-06
1.15799e-06
1.39478e-06
-2.93527e-06
-2.03012e-06
2.46667e-06
2.16751e-06
-2.50354e-06
3.95905e-07
5.74371e-07
1.33575e-07
-3.98315e-07
4.93927e-07
-5.23987e-07
-1.74713e-07
6.49384e-07
-7.16766e-07
2.35733e-06
-4.91333e-08
-1.88138e-06
1.74722e-06
4.03503e-07
3.5965e-07
1.44836e-07]
The task you are describing can be treated as anomaly/outlier detection.
One approach is to apply a z-score transformation and treat every value whose z-score exceeds a certain threshold as an outlier. Because there is no universal definition of an outlier, no method can detect such peaks without some parameter (here, the threshold).
A possible implementation:
import numpy as np
def detect_outliers(data):
    outliers = []
    d_mean = np.mean(data)
    d_std = np.std(data)
    threshold = 3  # this defines what you would consider a peak (outlier)
    for point in data:
        z_score = (point - d_mean) / d_std
        if np.abs(z_score) > threshold:
            outliers.append(point)
    return outliers
# create normal data
data = np.random.normal(size=100)
# create outliers
outliers = np.random.normal(100, size=3)
# combine normal data and outliers
full_data = data.tolist() + outliers.tolist()
# print outliers
print(detect_outliers(full_data))
If you only want to detect upward peaks, remove the np.abs call from the code.
This code snippet is based on a Medium post, which also provides another way of detecting outliers.
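The data in the question contains inf and -inf entries, and a single one of them is enough to make the mean and standard deviation unusable. Below is a minimal standard-library sketch of the same z-score idea that skips non-finite samples and reports positions as well as values. The name `detect_peaks`, the default threshold, and the toy `signal` are all illustrative assumptions, not part of the original answer.

```python
import math
import statistics


def detect_peaks(values, threshold=3.0):
    """Return (index, value) pairs whose z-score exceeds `threshold`.

    Non-finite samples (inf, -inf, nan) are skipped, because a single
    one of them would make the mean and standard deviation meaningless.
    """
    finite = [v for v in values if math.isfinite(v)]
    mean = statistics.fmean(finite)
    std = statistics.pstdev(finite)  # population std, like np.std
    return [
        (i, v)
        for i, v in enumerate(values)
        if math.isfinite(v) and (v - mean) / std > threshold
    ]


# Toy noise with some -inf samples, plus one hypothetical peak.
signal = [1e-6, -2e-6, 5e-7, float("-inf"), -1e-6, 2e-6, 0.0, -5e-7] * 5
signal[17] = 2.5e-5
peaks = detect_peaks(signal)
print(peaks)
```

Only upward peaks are reported here because the z-score is compared without taking its absolute value. The threshold is still a parameter that has to be chosen.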

How does sklearn.linear_model.LinearRegression work with insufficient data?

To solve a 5 parameter model, I need at least 5 data points to get a unique solution. For x and y data below:
import numpy as np
from sklearn.linear_model import LinearRegression
x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011,
0.31648635, -0.15957368],
[-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752,
0.32080956, -0.82986857],
[ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158,
0.56301755, 1.3958912 ],
[ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894,
1.38743513, 3.0679506 ],
[-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373,
1.63213241, 2.4235483 ],
[ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 ,
1.91907482, 3.70126468]])
y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647,
1.11315445])
I used only 6 data points to fit an 8-parameter model (7 slopes and 1 intercept).
lr = LinearRegression().fit(x, y)
print(lr.coef_)
array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768,
-0.36962192, 0.99128474])
print(lr.intercept_)
0.2978781587718828
Clearly, it's using some kind of assignment to reduce the degrees of freedom. I tried to look into the source code but couldn't find anything about that. What method does it use to find the parameters of an underspecified model?
You don't need to reduce the degrees of freedom; it simply finds a solution to the least-squares problem min sum_i (dot(beta, x_i) + beta_0 - y_i)**2. For example, in the non-sparse case it uses the linalg.lstsq function from scipy. The default solver for this optimization problem is the gelsd LAPACK driver. If
A = np.concatenate((np.ones((len(x), 1)), x), axis=1)
is the augmented array with ones as its first column, then your solution is given by
beta = np.linalg.pinv(A.T @ A) @ A.T @ y
where we use the pseudoinverse precisely because the matrix may not be of full rank. When the problem is underdetermined, as here, this picks the solution with the smallest Euclidean norm among all those that minimize the squared error. Of course, the solver doesn't actually evaluate this formula but uses the singular value decomposition of A to compute the same result.
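The minimum-norm property can be checked numerically. The sketch below uses NumPy only and random illustrative data of the same shape as in the question (6 samples, 7 features); it compares `np.linalg.lstsq` with the pseudoinverse and then builds a second exact solution with a larger norm. Note that LinearRegression may handle the intercept differently (for example by centering the data first), so its coefficients need not match this augmented-matrix formulation exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 7))   # 6 samples, 7 features (illustrative data)
y = rng.normal(size=6)

# Augment with a column of ones so the intercept is part of the solution.
A = np.column_stack([np.ones(len(X)), X])

beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
beta_pinv = np.linalg.pinv(A) @ y

# Adding any null-space component of A gives another exact solution.
null_projector = np.eye(A.shape[1]) - np.linalg.pinv(A) @ A
other = beta_pinv + null_projector @ np.ones(A.shape[1])

print(np.allclose(beta_lstsq, beta_pinv))   # same minimum-norm solution
print(np.allclose(A @ beta_pinv, y), np.allclose(A @ other, y))
print(np.linalg.norm(beta_pinv), np.linalg.norm(other))
```

Both `beta_pinv` and `other` reproduce y exactly, which shows that the fit is not unique; the least-squares routine simply returns the shortest of the candidate vectors.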

Obtaining hyperpolarization depth from electrophysiological graph

I am working on electrophysiological data which is in .abf format.
I want to obtain the hyperpolarization depth as indicated in the figure above. This is what I have done so far:
import matplotlib.pyplot as plt
import pyabf
import pandas as pd
abf = pyabf.ABF("test.abf")
abf.setSweep(10) # I can access a given sweep. Here sweep 10
df = pd.DataFrame({'time': abf.sweepX, 'current':abf.sweepY})
df1 = df.loc[15650:15800]
df1.plot(x='time', y='current')
I am thinking of using the change in the derivative to find the first point of interest (x1, y1) and then the lowest point (x2, y2), but it looks complex. I would appreciate it if someone could give a hint or a procedure.
The dataset is as follows:
time current
0.7825 -63.323975
0.78255 -63.171387
0.7826 -62.89673
0.78265 -62.713623
0.7827 -62.469482
0.78275 -62.37793
0.7828 -62.10327
0.78285 -61.950684
0.7829 -61.76758
0.78295 -61.584473
0.783 -61.401367
0.78305 -61.24878
0.7831 -61.035156
0.78315 -60.85205
0.7832 -60.72998
0.78325 -60.516357
0.7833 -60.455322
0.78335 -60.2417
0.7834 -60.08911
0.78345 -59.96704
0.7835 -59.814453
0.78355 -59.661865
0.7836 -59.509277
0.78365 -59.417725
0.7837 -59.23462
0.78375 -59.11255
0.7838 -58.95996
0.78385 -58.86841
0.7839 -58.685303
0.78395 -58.59375
0.784 -58.441162
0.78405 -58.34961
0.7841 -58.19702
0.78415 -58.044434
0.7842 -57.922363
0.78425 -57.769775
0.7843 -57.678223
0.78435 -57.434082
0.7844 -57.34253
0.78445 -56.9458
0.7845 -56.274414
0.78455 -54.96216
0.7846 -53.253174
0.78465 -51.208496
0.7847 -48.950195
0.78475 -46.325684
0.7848 -43.09082
0.78485 -38.42163
0.7849 -31.036377
0.78495 -22.033691
0.785 -13.397217
0.78505 -6.072998
0.7851 -0.61035156
0.78515 2.7160645
0.7852 3.9367676
0.78525 3.4179688
0.7853 1.3427734
0.78535 -1.4953613
0.7854 -5.0964355
0.78545 -9.185791
0.7855 -13.641357
0.78555 -18.249512
0.7856 -23.132324
0.78565 -27.98462
0.7857 -32.714844
0.78575 -37.261963
0.7858 -41.47339
0.78585 -45.22705
0.7859 -48.553467
0.78595 -51.54419
0.786 -53.985596
0.78605 -56.18286
0.7861 -58.013916
0.78615 -59.539795
0.7862 -60.760498
0.78625 -61.88965
0.7863 -62.652588
0.78635 -63.323975
0.7864 -63.934326
0.78645 -64.2395
0.7865 -64.60571
0.78655 -64.78882
0.7866 -65.00244
0.78665 -64.971924
0.7867 -65.093994
0.78675 -65.03296
0.7868 -64.971924
0.78685 -64.819336
0.7869 -64.78882
0.78695 -64.66675
0.787 -64.48364
0.78705 -64.42261
0.7871 -64.2395
0.78715 -64.11743
0.7872 -63.964844
0.78725 -63.842773
0.7873 -63.659668
0.78735 -63.568115
0.7874 -63.446045
0.78745 -63.26294
0.7875 -63.171387
0.78755 -62.98828
0.7876 -62.89673
0.78765 -62.74414
0.7877 -62.713623
0.78775 -62.530518
0.7878 -62.438965
0.78785 -62.37793
0.7879 -62.25586
0.78795 -62.164307
0.788 -62.042236
0.78805 -62.01172
0.7881 -61.88965
0.78815 -61.88965
0.7882 -61.73706
0.78825 -61.706543
0.7883 -61.645508
0.78835 -61.61499
0.7884 -61.523438
0.78845 -61.462402
0.7885 -61.431885
0.78855 -61.340332
0.7886 -61.37085
0.78865 -61.279297
0.7887 -61.279297
0.78875 -61.157227
0.7888 -61.187744
0.78885 -61.09619
0.7889 -61.157227
0.78895 -61.12671
0.789 -61.09619
0.78905 -61.12671
0.7891 -61.00464
0.78915 -61.00464
0.7892 -60.97412
0.78925 -60.97412
0.7893 -60.943604
0.78935 -61.00464
0.7894 -60.913086
0.78945 -60.97412
0.7895 -60.943604
0.78955 -60.913086
0.7896 -60.943604
0.78965 -60.85205
0.7897 -60.85205
0.78975 -60.821533
0.7898 -60.88257
0.78985 -60.88257
0.7899 -60.913086
0.78995 -60.88257
0.79 -60.913086
We can plot the difference in current between consecutive points (which, since the times are evenly spaced, is the derivative up to a constant factor). The first chart shows the actual diffs. Based on this we can set some threshold, such as 0.3, and apply it to filter the main DataFrame. The filtered values are shown in orange on the second chart:
# assumes df is indexed by time and has a single 'current' column,
# e.g. df = df1.set_index('time')
fig, ax = plt.subplots(2, figsize=(8, 8))
# plot derivative
df['current'].diff().plot(ax=ax[0])
# keep only the points where the current changes quickly
threshold = 0.3
df['filtered'] = df['current'].where(df['current'].diff().abs() > threshold)
df.plot(ax=ax[1])
# add spans
x = df['filtered'].dropna()
ax[1].axhspan(x.iloc[0], x.iloc[-1], alpha=0.3, edgecolor='skyblue', facecolor="none", hatch='////')
ax[1].axvspan(x.index.min(), x.index.max(), alpha=0.3, edgecolor='orange', facecolor="none", hatch='\\\\')
Output:
If you're interested in range values, you can dropna values in the filtered subset and find min and max from the index:
print('min', df['filtered'].dropna().index.min())
print('max', df['filtered'].dropna().index.max())
Output:
min 0.78445
max 0.7865
For the value of the gap you can use:
abs(df['filtered'].dropna().iloc[-1] - df['filtered'].dropna().iloc[0])
Output:
7.6599100000000035
Note: We can alternatively also get left edges of these spans as points where diff in the point is lower than the threshold and diff in the next point is higher than the threshold, and similarly for the right edges. This would also work in case we have multiple peaks:
threshold = 0.3
x = df['current'].diff().abs()
spanA = df.loc[(x < threshold) & (x.shift(-1) >= threshold)]
spanB = df.loc[(x >= threshold) & (x.shift(-1) < threshold)]
print(spanA)
current
time
0.7844 -57.34253
print(spanB)
current
time
0.7865 -64.60571
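The edge-detection idea from the note can also be sketched without pandas. The helper `find_spans` and the toy `trace` below are hypothetical and only illustrate the logic for a recording with more than one spike: a span starts at the last quiet sample before a run of large differences and ends at the last sample of that run.

```python
def find_spans(values, threshold):
    """Return (start, end) index pairs of runs where |diff| >= threshold.

    `start` is the last quiet sample before the run (left edge) and
    `end` is the last sample of the run (right edge).
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    spans, start = [], None
    for i, d in enumerate(diffs):
        if d >= threshold and start is None:
            start = i              # sample i is the last one before the jump
        elif d < threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:          # run reaches the end of the trace
        spans.append((start, len(values) - 1))
    return spans


# Toy trace with two spikes (made-up values, not the recording above).
trace = [-60.0, -60.1, -60.0, -40.0, 0.0, -40.0, -65.0,
         -64.9, -64.8, -30.0, 5.0, -66.0, -65.9]
spans = find_spans(trace, threshold=0.3)
print(spans)
for start, end in spans:
    # drop from the pre-spike level to the level right after the spike
    print(trace[start] - trace[end])
```

The printed differences are a rough stand-in for the depth measurement; on real data the threshold would need to be tuned to the noise level, just as in the pandas version.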

How to record the value of a variable within odeint?

I would like to know whether there is a way to record the value of a specific variable inside the function being integrated, without printing it from within the function definition. Because of the predictor-corrector algorithm, printing often yields more or fewer values than the final vector returned by the solver.
For example, take this code:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt
def essai(y, t):
    a = y[0]
    c1 = a
    a = c1 / a**2
    return [a]
# Solving
essai0 = [10]
t = np.linspace(0, 2000, 10)
y = odeint(essai, essai0, t)
a = y[:, 0]
# Graphs
fig, ax = plt.subplots()
ax.plot(t, a, 'k--', label='a')
legend = ax.legend(loc='lower right', shadow=True, fontsize='x-large')
legend.get_frame().set_facecolor('#FFFCCC') # 00FFCC
plt.xlabel('x')
plt.ylabel('y')
plt.title('y vs x')
plt.show()
I would like to record the values of c1, which depends on a. What should I do?
If I print it, I get (because of the predictor-corrector algorithm):
10.0
10.001203411814794
10.00120326701222
10.002406534059283
10.00240638930896
10.031168251789499
10.03116843523562
10.059847893733858
10.059848247411573
10.088446178306066
10.088446526968276
10.178981333917179
10.1789826635142
10.26872274187664
10.268720875457465
10.251795853148066
10.251794757670828
10.324093402400061
10.324093338929458
10.395889284010963
10.395889126663482
10.467192620394076
10.467192470562162
10.60836217080531
10.608361512785885
10.747675991273601
10.747676529983982
10.885208084361661
10.88520861500753
11.021024408838219
11.021024559158226
11.15518691385528
11.15518704871583
11.389028983440005
11.389029612664437
11.618166387462095
11.618166372845774
11.842871925632974
11.842870666797078
12.063390475531826
12.0633901508557
12.279950446401756
12.279950250452782
12.492757035192547
12.492756877414479
12.790475076345272
12.79047467718475
13.081418818481728
13.081418595295522
13.366029970579808
13.366030900758636
13.644707388512776
13.644707798536366
13.917805722870085
13.917805853240296
14.185647189512732
14.185647276304193
14.448524340486092
14.44852440612534
14.849045554474056
14.849045812160185
15.239043242348172
15.239044113472564
15.619306858637934
15.619307570817467
15.990530200625596
15.990530706701604
16.353328829257094
16.35332918566708
16.70825155213741
16.708251810028536
17.055790075751844
17.055790265472186
17.52054793291328
17.520548366986496
17.97329155702487
17.97329263337524
18.414908470097206
18.41490919183692
18.84617978510828
18.846180323693773
19.26780035288661
19.26780072790131
19.68039039537204
19.680390669145883
20.084506483562638
20.084506685872917
20.63204921728682
20.632049705019547
21.165431430483114
21.16543268212929
21.685699626883885
21.685700483180575
22.193774842932424
22.193775478119036
22.69047628806277
22.69047673120133
23.176535191516802
23.1765355148269
23.652607704971896
23.652607943862492
24.296731084127696
24.296731656936466
24.92421316694978
24.924214631653445
25.536282592848192
25.536283593100098
26.134020839947766
26.134021582629195
26.718389929663125
26.718390447872228
27.290248649274574
27.290249027491374
27.8503676838429
27.85036796338048
28.60821935477876
28.608220025227006
29.346505899333515
29.346507613905608
30.066670806260635
30.066671977520553
30.769984796557875
30.769985666417984
31.457578314647648
31.457578921761066
32.13046057231114
32.13046101551341
32.78953730742519
32.789537635058444
33.68118868621462
33.68118947182226
34.54983545122736
34.549837459883506
35.39717380841791
35.397175180698845
36.22469707822626
36.224698097642104
37.033733817898586
37.03373452954837
37.82547018189015
37.82547070150822
38.60097077071101
38.60097115490064
39.65004988104156
39.650050802111195
40.67207751401193
40.67207986867377
41.669047220267416
41.66904882908885
42.64271422854618
42.64271542393563
43.594640193459966
43.59464102811222
44.52621945824691
44.52622006777859
45.43870353935591
45.438703990091476
46.67300975177773
46.673010832232926
47.87550305124021
47.87550581301012
49.04852683447106
49.04852872160157
50.194144483083306
50.19414588551954
51.3141919066777
51.31419288605143
52.41030839692969
52.41030911225109
53.48396538985435
53.483965918885744
54.9362075454971
54.93620881348237
56.35103457439806
56.35103781516747
57.73120149400896
57.731203708595864
59.07913425147381
59.07913589751868
60.39699143853227
60.39699258818307
61.68670054765226
61.68670138744394
62.94999176730058
62.949992388453296
64.65865496068966
64.65865644932029
This is many more values than I would expect with t = np.linspace(0, 2000, 10), which samples the time interval at only 10 points.
I have thought about this problem for a long time without finding a really good way to do it, and I would be delighted to know how to get around it.
There is no relation between the evaluation points of the ODE function in the internal solver steps and the requested sample points of the solution for the output. Moreover, the evaluation points can deviate from the solution trajectory with some error of an order lower than the order of the integration method.
The easiest way to do what you want in a structured fashion is to define c1 as a separate function and then call it on the results:
def c1_func(y):
    return y[0]
def essai(y, t):
    a = y[0]
    c1 = c1_func(y)
    a = c1 / a**2
    return [a]
...
y = odeint(essai, essai0, t)
c1_val = c1_func(y.T)
plt.plot(t, c1_val)
or something similar.
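To see why logging inside the right-hand side gives a different number of values than the returned solution, here is a self-contained sketch. It deliberately swaps odeint for a hand-written fixed-step Runge-Kutta integrator (`rk4`, a hypothetical helper), so the evaluation pattern is easy to count; odeint's adaptive solver behaves differently in detail, but the mismatch has the same origin.

```python
import math


def c1_func(a):
    """Auxiliary quantity of interest; here simply the state itself."""
    return a


def rhs(a, t, log):
    """Right-hand side da/dt = c1 / a**2; logs every evaluation of c1."""
    c1 = c1_func(a)
    log.append(c1)
    return c1 / a**2


def rk4(a0, t_end, n_steps, log):
    """Fixed-step classical Runge-Kutta; returns the state after each step."""
    h = t_end / n_steps
    a, t, out = a0, 0.0, [a0]
    for _ in range(n_steps):
        k1 = rhs(a, t, log)
        k2 = rhs(a + 0.5 * h * k1, t + 0.5 * h, log)
        k3 = rhs(a + 0.5 * h * k2, t + 0.5 * h, log)
        k4 = rhs(a + h * k3, t + h, log)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        out.append(a)
    return out


log = []
states = rk4(10.0, 2000.0, 2000, log)
c1_values = [c1_func(a) for a in states]  # recomputed on the accepted solution

print(len(log), len(c1_values))  # four logged evaluations per step
print(abs(states[-1] - math.sqrt(100 + 2 * 2000.0)))  # error vs exact sqrt(a0**2 + 2t)
```

The logged list contains intermediate stage values that are not points of the solution, whereas `c1_values` has exactly one entry per output time, which is why recomputing c1 afterwards is the cleaner approach.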

PACF function in statsmodels.tsa.stattools gives numbers greater than 1 when using ywunbiased?

I have a dataframe which is of length 177 and I want to calculate and plot the partial auto-correlation function (PACF).
I have the data imported etc and I do:
from statsmodels.tsa.stattools import pacf
ys = pacf(data[key][array].diff(1).dropna(), alpha=0.05, nlags=176, method="ywunbiased")
xs = range(lags+1)
plt.figure()
plt.scatter(xs,ys[0])
plt.grid()
plt.vlines(xs, 0, ys[0])
plt.plot(ys[1])
The method used results in numbers greater than 1 for very long lags (around 90), which is incorrect, and I get
RuntimeWarning: invalid value encountered in sqrt
return rho, np.sqrt(sigmasq)
but since I can't see the source code I don't know what this means.
To be honest, when I search for PACF, all the examples only carry out PACF up to 40 lags or 60 or so and they never have any significant PACF after lag=2 and so I couldn't compare to other examples either.
But when I use:
method="ols"
# or
method="ywmle"
the numbers are correct. So it must be the algorithm they use to solve it.
I tried importing inspect and using getsource, but it's useless: it just shows that it uses another package, and I can't find that.
If you know where the problem arises from, I would really appreciate the help.
For your reference, the values for data[key][array] are:
[1131.130005, 1144.939941, 1126.209961, 1107.300049, 1120.680054, 1140.839966, 1101.719971, 1104.23999, 1114.579956, 1130.199951, 1173.819946, 1211.920044, 1181.27002, 1203.599976, 1180.589966, 1156.849976, 1191.5, 1191.329956, 1234.180054, 1220.329956, 1228.810059, 1207.01001, 1249.47998, 1248.290039, 1280.079956, 1280.660034, 1294.869995, 1310.609985, 1270.089966, 1270.199951, 1276.660034, 1303.819946, 1335.849976, 1377.939941, 1400.630005, 1418.300049, 1438.23999, 1406.819946, 1420.859985, 1482.369995, 1530.619995, 1503.349976, 1455.27002, 1473.98999, 1526.75, 1549.380005, 1481.140015, 1468.359985, 1378.550049, 1330.630005, 1322.699951, 1385.589966, 1400.380005, 1280.0, 1267.380005, 1282.829956, 1166.359985, 968.75, 896.23999, 903.25, 825.880005, 735.090027, 797.869995, 872.8099980000001, 919.1400150000001, 919.320007, 987.4799800000001, 1020.6199949999999, 1057.079956, 1036.189941, 1095.630005, 1115.099976, 1073.869995, 1104.48999, 1169.430054, 1186.689941, 1089.410034, 1030.709961, 1101.599976, 1049.329956, 1141.199951, 1183.26001, 1180.550049, 1257.640015, 1286.119995, 1327.219971, 1325.829956, 1363.609985, 1345.199951, 1320.640015, 1292.280029, 1218.890015, 1131.420044, 1253.300049, 1246.959961, 1257.599976, 1312.410034, 1365.680054, 1408.469971, 1397.910034, 1310.329956, 1362.160034, 1379.319946, 1406.579956, 1440.670044, 1412.160034, 1416.180054, 1426.189941, 1498.109985, 1514.680054, 1569.189941, 1597.569946, 1630.73999, 1606.280029, 1685.72998, 1632.969971, 1681.550049, 1756.540039, 1805.810059, 1848.359985, 1782.589966, 1859.449951, 1872.339966, 1883.949951, 1923.569946, 1960.22998, 1930.6700440000002, 2003.369995, 1972.290039, 2018.050049, 2067.560059, 2058.899902, 1994.9899899999998, 2104.5, 2067.889893, 2085.51001, 2107.389893, 2063.110107, 2103.840088, 1972.180054, 1920.030029, 2079.360107, 2080.409912, 2043.939941, 1940.2399899999998, 1932.22998, 2059.73999, 2065.300049, 2096.949951, 2098.860107, 2173.600098, 2170.949951, 2168.27002, 2126.149902, 
2198.810059, 2238.830078, 2278.8701170000004, 2363.639893, 2362.719971, 2384.199951, 2411.800049, 2423.409912, 2470.300049, 2471.649902, 2519.360107, 2575.26001, 2584.840088, 2673.610107, 2823.810059, 2713.830078, 2640.8701170000004, 2648.050049, 2705.27002, 2718.3701170000004, 2816.290039, 2901.52002, 2913.97998]
Your time series is pretty clearly not stationary, so the Yule-Walker assumptions are violated.
More generally, the PACF is usually appropriate for stationary time series. You might difference your data first, before considering the partial autocorrelations.
