I ran this code in Python:
import statsmodels.stats.power as smp  # assumed import behind the smp alias

power_analysis = smp.TTestIndPower()
power_analysis.solve_power(
    nobs1=143293,
    power=0.8,
    alpha=0.05,
    ratio=1.0 * 143398 / 143293,
    alternative='two-sided'
)
The output is 0.01046481401112572.
Is the output the effect size? Can the effect size be calculated even without the standard deviations and means of samples 1 and 2?
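For what it's worth, a minimal round-trip sketch (assuming statsmodels is installed): solve_power solves for whichever of effect_size, nobs1, alpha, and power is left unspecified, so with nobs1, power, and alpha given, it returns the minimum detectable standardized effect size (Cohen's d), which by definition needs no raw means or standard deviations. Plugging the returned value back in and leaving power unspecified should recover 0.8:

import statsmodels.stats.power as smp

analysis = smp.TTestIndPower()

# solve_power solves for the one parameter that is omitted; here power
# is omitted, so the effect size from the question should give back ~0.8.
recovered_power = analysis.solve_power(
    effect_size=0.01046481401112572,
    nobs1=143293,
    alpha=0.05,
    ratio=1.0 * 143398 / 143293,
    alternative='two-sided'
)
print(recovered_power)  # expected to be approximately 0.8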
Is there any way to detect the highest peaks using a Python library without setting any parameters? I'm developing a user interface and I want the algorithm to be able to detect the highest peaks automatically...
I want it to be able to detect these peaks in the picture below:
[graph here]
Data looks like this:
8.60291e-07
-1.5491e-06
5.64568e-07
-9.51195e-07
1.07203e-06
4.6521e-07
6.43967e-07
-9.86092e-07
-9.82323e-07
6.38977e-07
-1.93884e-06
-2.98309e-08
1.33543e-06
1.05064e-06
1.17332e-06
-1.53549e-07
-8.9357e-07
1.59176e-06
-2.17331e-06
1.46756e-06
5.63301e-07
-8.77556e-07
7.47681e-09
-8.30101e-07
-3.6647e-07
5.27046e-07
-1.94983e-06
1.89018e-07
1.22533e-06
8.00735e-07
-8.51166e-07
1.13437e-06
-2.75787e-07
1.79601e-06
-1.67875e-06
1.13529e-06
-1.29865e-06
9.9688e-07
-9.34486e-07
8.89931e-07
-3.88634e-07
1.15124e-06
-4.23569e-07
-1.8029e-07
1.20537e-07
4.10736e-07
-9.99077e-07
-3.62984e-07
2.97916e-06
-1.95828e-06
-1.07398e-06
2.422e-06
-6.33202e-07
-1.36953e-06
1.6694e-06
-4.71764e-07
3.98849e-07
-1.0071e-06
-9.72984e-07
8.13553e-07
2.64193e-06
-3.12365e-06
1.34049e-06
-1.30419e-06
1.48369e-07
1.26033e-06
-2.59872e-07
4.28284e-07
-6.44356e-07
2.99934e-07
8.34335e-07
3.53226e-07
-7.08252e-07
4.1243e-07
2.41525e-06
-8.92159e-07
8.82339e-08
4.31945e-06
3.75152e-06
1.091e-06
3.8204e-06
-1.21356e-06
3.35564e-06
-1.06234e-06
-5.99808e-07
2.18155e-06
5.90652e-07
-1.36728e-06
-4.97017e-07
-7.77283e-08
8.68263e-07
4.37645e-07
-1.26514e-06
2.26413e-06
-8.52966e-07
-7.35596e-07
4.11911e-07
1.7585e-06
-inf
1.10779e-08
-1.49507e-06
9.87305e-07
-3.85296e-06
4.31265e-06
-9.89227e-07
-1.33537e-06
4.1713e-07
1.89362e-07
3.21968e-07
6.80237e-08
2.31636e-07
-2.98523e-07
7.99133e-07
7.36305e-07
6.39862e-07
-1.11932e-06
-1.57262e-06
1.86305e-06
-3.63716e-07
3.83865e-07
-5.23293e-07
1.31812e-06
-1.23608e-06
2.54684e-06
-3.99796e-06
2.90441e-06
-5.20203e-07
1.36295e-06
-1.89317e-06
1.22366e-06
-1.10373e-06
2.71276e-06
9.48181e-07
7.70881e-06
5.17066e-06
6.21254e-06
1.3513e-05
1.47878e-05
8.78543e-06
1.61819e-05
1.68438e-05
1.16082e-05
5.74059e-06
4.92458e-06
1.11884e-06
-1.07419e-06
-1.28517e-06
-2.70949e-06
1.65662e-06
1.42964e-06
3.40604e-06
-5.82825e-07
1.98288e-06
1.42819e-06
1.65517e-06
4.42749e-07
-1.95609e-06
-2.1756e-07
1.69164e-06
8.7204e-08
-5.35324e-07
7.43546e-07
-1.08687e-06
2.07289e-06
2.18529e-06
-2.8161e-06
1.88821e-06
4.07272e-07
1.063e-06
8.47244e-07
1.53879e-06
-9.0799e-07
-1.26709e-07
2.40044e-06
-9.48166e-07
1.41788e-06
3.67615e-07
-1.29199e-06
3.868e-06
9.54654e-06
2.51951e-05
2.2769e-05
7.21716e-06
1.36545e-06
-1.32681e-06
-3.09641e-06
4.90417e-07
2.99335e-06
1.578e-06
6.0025e-07
2.90656e-06
-2.08258e-06
-1.54214e-06
2.19757e-07
3.74982e-06
-1.76944e-06
2.15018e-06
-1.01935e-06
4.37469e-07
1.39078e-06
6.39587e-07
-1.7807e-06
-6.16455e-09
1.61557e-06
1.59644e-06
-2.35217e-06
5.29449e-07
1.9169e-06
-7.54822e-07
2.00342e-06
-3.28452e-06
3.91663e-06
1.66016e-08
-2.65897e-06
-1.4064e-06
4.67987e-07
1.67786e-06
4.69543e-07
-8.90106e-07
-1.4584e-06
1.37915e-06
1.98483e-06
-2.3735e-06
4.45618e-07
1.91504e-06
1.09653e-06
-8.00873e-07
1.32321e-06
2.04846e-06
-1.50656e-06
7.23816e-07
2.06049e-06
-2.43918e-06
1.64417e-06
2.65411e-07
-2.66107e-06
-8.01788e-07
2.05121e-06
-1.74988e-06
1.83594e-06
-8.14026e-07
-2.69342e-06
1.81152e-06
1.11664e-07
-4.21863e-06
-7.20551e-06
-5.92407e-07
-1.44629e-06
-2.08136e-06
2.86105e-06
3.77911e-06
-1.91898e-06
1.41742e-06
2.67914e-07
-8.55835e-07
-9.8584e-07
-2.74115e-06
3.39044e-06
1.39639e-06
-2.4964e-06
8.2486e-07
2.02432e-06
1.65793e-06
-1.43094e-06
-3.36807e-06
-8.96515e-07
5.31323e-06
-8.27209e-07
-1.39221e-06
-3.3754e-06
2.12372e-06
3.08218e-06
-1.42947e-06
-2.36777e-06
3.86218e-06
2.29327e-06
-3.3941e-06
-1.67291e-06
2.63828e-06
2.21008e-07
7.07794e-07
1.8172e-06
-2.00082e-06
1.80664e-06
6.69739e-07
-3.95395e-06
1.92148e-06
-1.07187e-06
-4.04938e-07
-1.76553e-06
2.7099e-06
1.30768e-06
1.41812e-06
-1.55518e-07
-3.78302e-06
4.00137e-06
-8.38623e-07
4.54651e-07
1.00027e-06
1.32196e-06
-2.62717e-06
1.67865e-06
-6.99249e-07
2.8837e-06
-1.00516e-06
-3.68011e-06
1.61847e-06
1.90887e-06
1.59641e-06
4.16779e-07
-1.35245e-06
1.65717e-06
-2.92667e-06
3.6203e-07
2.53528e-06
-2.0578e-07
-3.41919e-07
-1.42154e-06
-2.33322e-06
3.07175e-06
-2.69165e-08
-8.21045e-07
2.3175e-06
-7.22992e-07
1.49069e-06
8.75488e-07
-2.02676e-06
-2.81158e-07
3.6004e-06
-3.94708e-06
4.72983e-06
-1.38873e-06
-6.92139e-08
-1.4678e-06
1.04251e-06
-2.06625e-06
3.10406e-06
-8.13873e-07
7.23694e-07
-9.78912e-07
-8.65967e-07
7.37335e-07
1.52563e-06
-2.33591e-06
1.78265e-06
9.58435e-07
-5.22064e-07
-2.29736e-07
-4.26996e-06
-6.61411e-06
1.14789e-06
-4.32697e-06
-5.32779e-06
2.12241e-06
-1.40726e-06
1.76086e-07
-3.77194e-06
-2.71326e-06
-9.49402e-08
1.70807e-07
-2.495e-06
4.22324e-06
-3.62476e-06
-9.56055e-07
7.16583e-07
3.01447e-06
-1.41229e-06
-1.67694e-06
7.61627e-07
3.55881e-06
2.31015e-06
-9.50378e-07
4.45251e-08
-1.94791e-06
2.27081e-06
-3.34717e-06
3.05688e-06
4.57062e-07
3.87326e-06
-2.39215e-06
-3.52682e-06
-2.05212e-06
5.26495e-06
-3.28613e-07
-5.76569e-07
-7.46338e-07
5.98795e-06
8.80493e-07
-4.82965e-06
2.56839e-06
-1.58792e-06
-2.2294e-06
1.83841e-06
2.65482e-06
-3.10474e-06
-3.46741e-07
2.45557e-06
2.01328e-06
-3.92606e-06
inf
-8.11737e-07
5.72174e-07
1.57245e-06
8.02612e-09
-2.901e-06
1.22079e-06
-6.31714e-07
3.06241e-06
1.20059e-06
-1.80344e-06
4.90784e-07
3.74243e-06
-2.94342e-07
-3.45764e-08
-3.42099e-06
-1.43695e-06
5.91064e-07
3.47308e-06
3.78232e-06
4.01093e-07
-1.58435e-06
-3.47375e-06
1.34943e-06
1.11768e-06
1.95212e-06
-8.28033e-07
1.53705e-06
6.38031e-07
-1.84702e-06
1.34689e-06
-6.98669e-07
1.81653e-06
-2.42355e-06
-1.35257e-06
3.04367e-06
-1.21976e-06
1.61896e-06
-2.69528e-06
1.84601e-06
6.45447e-08
-4.94263e-07
3.47568e-06
-2.00531e-06
3.56693e-06
-3.19446e-06
2.72141e-06
-1.39059e-06
2.20032e-06
-1.76819e-06
2.32727e-07
-3.47382e-07
2.11823e-07
-5.22614e-07
2.69846e-06
-1.47983e-06
2.14554e-06
-6.27594e-07
-8.8501e-10
7.89124e-07
-2.8653e-07
8.30902e-07
-2.12857e-06
-1.90887e-07
1.07593e-06
1.40781e-06
2.41641e-06
-4.52689e-06
2.37207e-06
-2.19479e-06
1.65131e-06
1.2706e-06
-2.18387e-06
-1.72821e-07
5.41687e-07
7.2879e-07
7.56927e-07
1.57739e-06
-3.79395e-07
-1.02887e-06
-1.20987e-06
1.43066e-06
8.96301e-08
5.09766e-07
-2.8812e-06
-2.35944e-06
2.25912e-06
-2.78967e-06
-4.69913e-06
1.60822e-06
6.9342e-07
4.6225e-07
-1.33276e-06
-3.59033e-06
1.11206e-06
1.83521e-06
2.39163e-06
2.3468e-08
5.91431e-07
-8.80249e-07
-2.77405e-08
-1.13184e-06
-1.28036e-06
1.66229e-06
2.81784e-06
-2.97589e-06
8.73413e-08
1.06439e-06
2.39075e-06
-2.76974e-06
1.20862e-06
-5.12817e-07
-5.19104e-07
4.51324e-07
-4.7168e-07
2.35608e-06
5.46906e-07
-1.66748e-06
5.85236e-07
6.42944e-07
2.43164e-07
4.01031e-07
-1.93646e-06
2.07416e-06
-1.16116e-06
4.27155e-07
5.2951e-07
9.09149e-07
-8.71887e-08
-1.5564e-09
1.07266e-06
-9.49402e-08
2.04016e-06
-6.38123e-07
-1.94241e-06
-5.17294e-06
-2.18622e-06
-8.26703e-06
2.54364e-06
4.32614e-06
8.3847e-07
-2.85309e-06
2.72345e-06
-3.42752e-06
-1.36871e-07
2.23346e-06
5.26825e-07
1.3566e-06
-2.17111e-06
2.1463e-07
2.06479e-06
1.76929e-06
-1.2655e-06
-1.3797e-06
3.10706e-06
-4.72189e-06
4.38138e-06
6.41815e-07
-3.25623e-08
-4.93707e-06
5.05743e-06
5.17578e-07
-5.30524e-06
3.62463e-06
5.68909e-07
1.16226e-06
1.10843e-06
-5.00854e-07
9.48761e-07
-2.18701e-06
-3.57635e-07
4.26709e-06
-1.50836e-06
-5.84412e-06
3.5054e-06
3.94019e-06
-4.7623e-06
2.05856e-06
-2.22992e-07
1.64969e-06
2.64694e-06
-8.49487e-07
-3.63562e-06
1.0386e-06
1.69461e-06
-2.05798e-06
3.60349e-06
3.42651e-07
-1.46686e-06
1.19949e-06
-1.60519e-06
2.37793e-07
6.12366e-07
-1.54669e-06
1.43668e-06
1.87009e-06
-2.22626e-06
2.15155e-06
-3.10571e-06
2.05188e-06
-4.40002e-07
2.06683e-06
-1.11362e-06
5.96924e-07
-2.64471e-06
2.4892e-06
1.13083e-06
-3.23181e-07
5.10651e-07
2.73499e-07
-1.24899e-06
1.40564e-06
-9.3158e-07
1.45947e-06
3.70544e-07
-1.62628e-06
-1.70215e-06
1.72098e-06
8.19031e-07
-5.57709e-07
1.10107e-06
-2.81845e-06
1.57654e-07
3.30716e-06
-9.75403e-07
1.73126e-07
1.30447e-06
7.64771e-08
-6.65344e-07
-1.4346e-06
5.03171e-06
-2.84576e-06
2.3212e-06
-2.73373e-06
2.16675e-08
2.24026e-06
-4.11682e-08
-3.36642e-06
1.78775e-06
1.28174e-08
-9.32068e-07
2.97177e-06
-1.05338e-06
9.42505e-07
2.02362e-07
-1.81326e-06
2.16995e-06
2.83722e-07
-1.2648e-06
9.21814e-07
-8.9447e-07
-1.61597e-06
3.5036e-06
-6.79626e-08
1.52823e-06
-2.98682e-06
5.57404e-07
9.5166e-07
7.10419e-07
-1.28528e-06
-3.76038e-07
-1.03845e-06
2.96631e-06
-1.18356e-06
-2.77313e-07
3.24149e-06
-1.85455e-06
-1.27747e-07
3.6264e-07
4.66431e-07
-1.54443e-06
1.38437e-06
-1.53119e-06
7.4231e-07
-1.2388e-06
1.99774e-06
1.15799e-06
1.39478e-06
-2.93527e-06
-2.03012e-06
2.46667e-06
2.16751e-06
-2.50354e-06
3.95905e-07
5.74371e-07
1.33575e-07
-3.98315e-07
4.93927e-07
-5.23987e-07
-1.74713e-07
6.49384e-07
-7.16766e-07
2.35733e-06
-4.91333e-08
-1.88138e-06
1.74722e-06
4.03503e-07
3.5965e-07
1.44836e-07
The task you are describing could be treated as anomaly/outlier detection.
One possible solution is to apply a z-score transformation and treat every value with a z-score above a certain threshold as an outlier. Because there is no universal definition of an outlier, such peaks cannot be detected without setting at least one parameter (the threshold).
One possible solution could be:
import numpy as np

def detect_outliers(data):
    outliers = []
    d_mean = np.mean(data)
    d_std = np.std(data)
    threshold = 3  # this defines what you would consider a peak (outlier)
    for point in data:
        z_score = (point - d_mean) / d_std
        if np.abs(z_score) > threshold:
            outliers.append(point)
    return outliers
# create normal data
data = np.random.normal(size=100)
# create outliers
outliers = np.random.normal(100, size=3)
# combine normal data and outliers
full_data = data.tolist() + outliers.tolist()
# print outliers
print(detect_outliers(full_data))
If you only want to detect peaks (not troughs), remove the np.abs call from the code.
This code snippet is based on a Medium post, which also describes another way of detecting outliers.
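As an alternative sketch (not from the Medium post): scipy.signal.find_peaks locates local maxima, but it too needs at least one tuning parameter such as prominence to separate real peaks from noise, which matches the point above. The prominence rule of thumb below (a multiple of the standard deviation) is an assumption you would adjust for your data:

import numpy as np
from scipy.signal import find_peaks

# Made-up stand-in for the data above: a noisy baseline with a few
# injected peaks at hypothetical positions.
rng = np.random.default_rng(0)
signal = rng.normal(scale=1e-6, size=500)
signal[[120, 260, 400]] += 2e-5

# The real data contains inf/-inf entries; replace them before detection.
signal = np.nan_to_num(signal, posinf=0.0, neginf=0.0)

# prominence is the assumed, data-dependent threshold; here a rough
# automatic choice of five times the noise level (standard deviation).
peaks, properties = find_peaks(signal, prominence=5 * np.std(signal))
print(peaks)  # indices of the detected peaks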
Correct me if I'm wrong: the "thresholds" returned by scikit-learn's roc_curve should be an array of numbers in [0, 1]. However, it sometimes gives me an array whose first number is close to "2". Is it a bug, or did I do something wrong? Thanks.
In [1]: import numpy as np
In [2]: from sklearn.metrics import roc_curve
In [3]: np.random.seed(11)
In [4]: aa = np.random.choice([True, False],100)
In [5]: bb = np.random.uniform(0,1,100)
In [6]: fpr,tpr,thresholds = roc_curve(aa,bb)
In [7]: thresholds
Out[7]:
array([ 1.97396826, 0.97396826, 0.9711752 , 0.95996265, 0.95744405,
0.94983331, 0.93290463, 0.93241372, 0.93214862, 0.93076592,
0.92960511, 0.92245024, 0.91179548, 0.91112166, 0.87529458,
0.84493853, 0.84068543, 0.83303741, 0.82565223, 0.81096657,
0.80656679, 0.79387241, 0.77054807, 0.76763223, 0.7644911 ,
0.75964947, 0.73995152, 0.73825262, 0.73466772, 0.73421299,
0.73282534, 0.72391126, 0.71296292, 0.70930102, 0.70116428,
0.69606617, 0.65869235, 0.65670881, 0.65261474, 0.6487222 ,
0.64805644, 0.64221486, 0.62699782, 0.62522484, 0.62283401,
0.61601839, 0.611632 , 0.59548669, 0.57555854, 0.56828967,
0.55652111, 0.55063947, 0.53885029, 0.53369398, 0.52157349,
0.51900774, 0.50547317, 0.49749635, 0.493913 , 0.46154029,
0.45275916, 0.44777116, 0.43822067, 0.43795921, 0.43624093,
0.42039077, 0.41866343, 0.41550367, 0.40032843, 0.36761763,
0.36642721, 0.36567017, 0.36148354, 0.35843793, 0.34371331,
0.33436415, 0.33408289, 0.33387442, 0.31887024, 0.31818719,
0.31367915, 0.30216469, 0.30097917, 0.29995201, 0.28604467,
0.26930354, 0.2383461 , 0.22803687, 0.21800338, 0.19301808,
0.16902881, 0.1688173 , 0.14491946, 0.13648451, 0.12704826,
0.09141459, 0.08569481, 0.07500199, 0.06288762, 0.02073298,
0.01934336])
Most of the time these thresholds are not used, for example when calculating the area under the curve or plotting the false positive rate against the true positive rate.
Yet to plot what looks like a reasonable curve, one needs a threshold that incorporates zero data points, so that the curve starts at (0, 0). Since scikit-learn's ROC curve function does not require normalised probabilities as thresholds (any score is fine), setting this point's threshold to 1 isn't sufficient; setting it to inf would be sensible, but coders often expect finite data (and the implementation should also work for integer thresholds). Instead, the implementation uses max(score) + epsilon, where epsilon = 1. This may be cosmetically deficient, but you haven't given any reason why it's a problem!
From the documentation:
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute
fpr and tpr. thresholds[0] represents no instances being predicted
and is arbitrarily set to max(y_score) + 1.
So the first element of thresholds is close to 2 because it is max(y_score) + 1; in your case, that is thresholds[1] + 1.
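To see the sentinel concretely, a small sketch (which value is used depends on the scikit-learn version: older releases used max(y_score) + 1, as quoted above; newer ones use np.inf):

import numpy as np
from sklearn.metrics import roc_curve

np.random.seed(11)
y_true = np.random.choice([True, False], 100)
y_score = np.random.uniform(0, 1, 100)

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# The first threshold sits above every score so that the curve starts at
# (fpr, tpr) = (0, 0), i.e. no instance is predicted positive.
print(thresholds[0], y_score.max() + 1)  # equal in the version quoted above
print(fpr[0], tpr[0])  # 0.0 0.0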
This seems like a bug to me: in roc_curve(aa, bb), 1 is added to the first threshold. You should create an issue here: https://github.com/scikit-learn/scikit-learn/issues
I am doing some classification task with Support Vector Machines (SVM).
I am using libSVM (with Matlab support) to predict probability estimates matrix. However, the libSVM displays message that;
Model does not support probabiliy estimates
Below is my sample code (train_label contains the labels for the training data and test_label contains the labels for the test data):
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1);
[y,accuracy,prob_estimates]=svmpredict(test_label,test_data,model,'-b 1');
Can someone tell me if there is something wrong with the way I am doing it? Any help/suggestion will be appreciated.
Don't know about the Matlab implementation, but usually you have to set this option:
-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
I am using libsvm in the same way without any problem.
In your code, only a closing ' is missing in the following line:
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1);
It should be:
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1');
I had the same problem: the model didn't have ProbA and ProbB in it.
Before, it was like this and gave the error:
linear_model = svmtrain(trainClass, trainData, ['-t 0', cmd]);
Then I changed it to this and the error disappeared :) - I removed cmd and put in exact values:
linear_model = svmtrain(trainClass, trainData, ['-t 0 -c 1 -g 0.125 -b 1']);
If it still gives an error, try changing the c and g parameters.
Hope this helps.
It is because your model does not support probability estimates.
You should use the '-b 1' option during both training and testing.
See also: https://stackoverflow.com/a/43509667/7893127
You may have just trained the model with default parameters.
Try using '-b 1' when you are training and testing in your program.
In C:\setup\python36\Lib\site-packages\svm.py, the default value of self.probability is 0. You can set it to 1.
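For completeness, the same '-b 1' pairing in libsvm's bundled Python interface, as a sketch with made-up data (depending on how libsvm was installed, the import may instead be from libsvm.svmutil):

import random
from svmutil import svm_train, svm_predict

random.seed(0)
# Made-up two-cluster data, large enough for the internal 5-fold
# calibration that '-b 1' triggers during training.
y = [1] * 10 + [-1] * 10
x = ([{1: random.gauss(1, 0.1), 2: random.gauss(1, 0.1)} for _ in range(10)]
     + [{1: random.gauss(-1, 0.1), 2: random.gauss(-1, 0.1)} for _ in range(10)])

# '-b 1' must be passed at training time, otherwise the model cannot
# produce probability estimates...
model = svm_train(y, x, '-t 2 -g 0.01 -c 0.7 -b 1')

# ...and again at prediction time to request the probability matrix.
labels, accuracy, prob_estimates = svm_predict(y, x, model, '-b 1')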
I am working on an NLP classification project using the Naive Bayes classifier in Weka. I intend to use semi-supervised machine learning, hence I am working with unlabeled data. When I test the model obtained from my labeled training data on an independent set of unlabeled test data, Weka ignores all the unlabeled instances. Can anybody please guide me on how to solve this? Someone has already asked this question here before, but there wasn't an appropriate solution provided. Here is a sample test file:
@relation referents
@attribute feature1 NUMERIC
@attribute feature2 NUMERIC
@attribute feature3 NUMERIC
@attribute feature4 NUMERIC
@attribute class {1, -1}
@data
1, 7, 1, 0, ?
1, 5, 1, 0, ?
-1, 1, 1, 0, ?
1, 1, 1, 1, ?
-1, 1, 1, 1, ?
The problem is that when you specify a training set with -t train.arff and a test set with -T test.arff, the mode of operation is to calculate the performance of the model based on the test set. But you can't calculate a performance of any kind without knowing the actual class. Without the actual class, how will you know if your prediction is right or wrong?
I used the data you gave as train.arff and as test.arff with arbitrary class labels assigned by me. The relevant output lines are:
=== Error on training data ===
Correctly Classified Instances 4 80 %
Incorrectly Classified Instances 1 20 %
Kappa statistic 0.6154
Mean absolute error 0.2429
Root mean squared error 0.4016
Relative absolute error 50.0043 %
Root relative squared error 81.8358 %
Total Number of Instances 5
=== Confusion Matrix ===
a b <-- classified as
2 1 | a = 1
0 2 | b = -1
and
=== Error on test data ===
Total Number of Instances 0
Ignored Class Unknown Instances 5
=== Confusion Matrix ===
a b <-- classified as
0 0 | a = 1
0 0 | b = -1
Weka can give you those statistics for the training set, because it knows the actual class labels and the predicted ones (applying the model on the training set). For the test set, it can't get any information about the performance, because it doesn't know about the true class labels.
What you might want to do is:
java -cp weka.jar weka.classifiers.bayes.NaiveBayes -t train.arff -T test.arff -p 1-4
which in my case would give you:
=== Predictions on test data ===
inst# actual predicted error prediction (feature1,feature2,feature3,feature4)
1 1:? 1:1 1 (1,7,1,0)
2 1:? 1:1 1 (1,5,1,0)
3 1:? 2:-1 0.786 (-1,1,1,0)
4 1:? 2:-1 0.861 (1,1,1,1)
5 1:? 2:-1 0.861 (-1,1,1,1)
So you can get the predictions, but you can't get a performance measure, because you have unlabeled test data.
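The same distinction in a quick scikit-learn sketch (an analogy, not Weka; the training labels are made up, mirroring the arbitrary labels assigned above): predicting on unlabeled data works, but any performance metric needs the true labels.

import numpy as np
from sklearn.naive_bayes import GaussianNB

# The five feature vectors from the question, with made-up training labels.
X_train = np.array([[1, 7, 1, 0], [1, 5, 1, 0], [-1, 1, 1, 0],
                    [1, 1, 1, 1], [-1, 1, 1, 1]])
y_train = np.array([1, 1, -1, -1, -1])

model = GaussianNB().fit(X_train, y_train)

# Predictions need no labels for the test set...
X_test = np.array([[1, 6, 1, 0], [-1, 1, 1, 1]])
print(model.predict(X_test))

# ...but accuracy, a confusion matrix, etc. would need y_test, which is
# exactly what the '?' class values say is unknown.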