How to convert scaled data into original data in SVM - svm

I am new to SVM. I am using Support Vector Regression for forecasting in LibSVM. For LibSVM, I firstly scaled [0,1] my training and testing set in the same range and then select the optimal parameters. After I run svm-train and svm-predict, I got the predicted values for testing set in a scaled format.How do i get the original values back for output? Which conversion I need to use.
svm-scale -l 0 -u 1 -s range uni.train> unitrain.scale
svm-scale -r range uni.test > unitest.scale
svm-train -c 32768 -g 0.03125 unitrain.scale
svm-predict unitest.scale unitrain.scale.model uni.txt
Accuracy = 87.5% (266/304) (classification)
svm-train -s 3 -c 32768 -g 0.03125 unitrain.scale
svm-predict unitest.scale unitrain.scale.model uni.txt
Mean squared error = 0.558619 (regression)
Squared correlation coefficient = 0.514304 (regression)
After this, the output is uni.txt i.e. 0.619153 -1.10326 -0.94956 ........ -1.00426 -0.311161 -0.725936
How do i get the original values for output in uni.txt, these are in scaled format?

Related

How to calculate effect size usisng statsmodels.stats.power

I ran this code in python
power_analysis = smp.TTestIndPower()
power_analysis.solve_power(
nobs1=143293,
power=0.8,
alpha=0.05,
ratio=1.0*143398/143293,
alternative='two-sided'
)
The output is = 0.01046481401112572
Is the output effect size? Can the effect size be calculated even without the standard deviation and mean of sample 1, 2?

detect highest peaks automatically from noisy data python

Is there any way to detect the highest peaks using a python library without setting any parameter?. I'm developing a user interface and I want the algorithm to be able to detect highest peaks automatically...
I want it to be able to detect these peaks in picture below:
graph here
Data looks like this:
8.60291e-07
-1.5491e-06
5.64568e-07
-9.51195e-07
1.07203e-06
4.6521e-07
6.43967e-07
-9.86092e-07
-9.82323e-07
6.38977e-07
-1.93884e-06
-2.98309e-08
1.33543e-06
1.05064e-06
1.17332e-06
-1.53549e-07
-8.9357e-07
1.59176e-06
-2.17331e-06
1.46756e-06
5.63301e-07
-8.77556e-07
7.47681e-09
-8.30101e-07
-3.6647e-07
5.27046e-07
-1.94983e-06
1.89018e-07
1.22533e-06
8.00735e-07
-8.51166e-07
1.13437e-06
-2.75787e-07
1.79601e-06
-1.67875e-06
1.13529e-06
-1.29865e-06
9.9688e-07
-9.34486e-07
8.89931e-07
-3.88634e-07
1.15124e-06
-4.23569e-07
-1.8029e-07
1.20537e-07
4.10736e-07
-9.99077e-07
-3.62984e-07
2.97916e-06
-1.95828e-06
-1.07398e-06
2.422e-06
-6.33202e-07
-1.36953e-06
1.6694e-06
-4.71764e-07
3.98849e-07
-1.0071e-06
-9.72984e-07
8.13553e-07
2.64193e-06
-3.12365e-06
1.34049e-06
-1.30419e-06
1.48369e-07
1.26033e-06
-2.59872e-07
4.28284e-07
-6.44356e-07
2.99934e-07
8.34335e-07
3.53226e-07
-7.08252e-07
4.1243e-07
2.41525e-06
-8.92159e-07
8.82339e-08
4.31945e-06
3.75152e-06
1.091e-06
3.8204e-06
-1.21356e-06
3.35564e-06
-1.06234e-06
-5.99808e-07
2.18155e-06
5.90652e-07
-1.36728e-06
-4.97017e-07
-7.77283e-08
8.68263e-07
4.37645e-07
-1.26514e-06
2.26413e-06
-8.52966e-07
-7.35596e-07
4.11911e-07
1.7585e-06
-inf
1.10779e-08
-1.49507e-06
9.87305e-07
-3.85296e-06
4.31265e-06
-9.89227e-07
-1.33537e-06
4.1713e-07
1.89362e-07
3.21968e-07
6.80237e-08
2.31636e-07
-2.98523e-07
7.99133e-07
7.36305e-07
6.39862e-07
-1.11932e-06
-1.57262e-06
1.86305e-06
-3.63716e-07
3.83865e-07
-5.23293e-07
1.31812e-06
-1.23608e-06
2.54684e-06
-3.99796e-06
2.90441e-06
-5.20203e-07
1.36295e-06
-1.89317e-06
1.22366e-06
-1.10373e-06
2.71276e-06
9.48181e-07
7.70881e-06
5.17066e-06
6.21254e-06
1.3513e-05
1.47878e-05
8.78543e-06
1.61819e-05
1.68438e-05
1.16082e-05
5.74059e-06
4.92458e-06
1.11884e-06
-1.07419e-06
-1.28517e-06
-2.70949e-06
1.65662e-06
1.42964e-06
3.40604e-06
-5.82825e-07
1.98288e-06
1.42819e-06
1.65517e-06
4.42749e-07
-1.95609e-06
-2.1756e-07
1.69164e-06
8.7204e-08
-5.35324e-07
7.43546e-07
-1.08687e-06
2.07289e-06
2.18529e-06
-2.8161e-06
1.88821e-06
4.07272e-07
1.063e-06
8.47244e-07
1.53879e-06
-9.0799e-07
-1.26709e-07
2.40044e-06
-9.48166e-07
1.41788e-06
3.67615e-07
-1.29199e-06
3.868e-06
9.54654e-06
2.51951e-05
2.2769e-05
7.21716e-06
1.36545e-06
-1.32681e-06
-3.09641e-06
4.90417e-07
2.99335e-06
1.578e-06
6.0025e-07
2.90656e-06
-2.08258e-06
-1.54214e-06
2.19757e-07
3.74982e-06
-1.76944e-06
2.15018e-06
-1.01935e-06
4.37469e-07
1.39078e-06
6.39587e-07
-1.7807e-06
-6.16455e-09
1.61557e-06
1.59644e-06
-2.35217e-06
5.29449e-07
1.9169e-06
-7.54822e-07
2.00342e-06
-3.28452e-06
3.91663e-06
1.66016e-08
-2.65897e-06
-1.4064e-06
4.67987e-07
1.67786e-06
4.69543e-07
-8.90106e-07
-1.4584e-06
1.37915e-06
1.98483e-06
-2.3735e-06
4.45618e-07
1.91504e-06
1.09653e-06
-8.00873e-07
1.32321e-06
2.04846e-06
-1.50656e-06
7.23816e-07
2.06049e-06
-2.43918e-06
1.64417e-06
2.65411e-07
-2.66107e-06
-8.01788e-07
2.05121e-06
-1.74988e-06
1.83594e-06
-8.14026e-07
-2.69342e-06
1.81152e-06
1.11664e-07
-4.21863e-06
-7.20551e-06
-5.92407e-07
-1.44629e-06
-2.08136e-06
2.86105e-06
3.77911e-06
-1.91898e-06
1.41742e-06
2.67914e-07
-8.55835e-07
-9.8584e-07
-2.74115e-06
3.39044e-06
1.39639e-06
-2.4964e-06
8.2486e-07
2.02432e-06
1.65793e-06
-1.43094e-06
-3.36807e-06
-8.96515e-07
5.31323e-06
-8.27209e-07
-1.39221e-06
-3.3754e-06
2.12372e-06
3.08218e-06
-1.42947e-06
-2.36777e-06
3.86218e-06
2.29327e-06
-3.3941e-06
-1.67291e-06
2.63828e-06
2.21008e-07
7.07794e-07
1.8172e-06
-2.00082e-06
1.80664e-06
6.69739e-07
-3.95395e-06
1.92148e-06
-1.07187e-06
-4.04938e-07
-1.76553e-06
2.7099e-06
1.30768e-06
1.41812e-06
-1.55518e-07
-3.78302e-06
4.00137e-06
-8.38623e-07
4.54651e-07
1.00027e-06
1.32196e-06
-2.62717e-06
1.67865e-06
-6.99249e-07
2.8837e-06
-1.00516e-06
-3.68011e-06
1.61847e-06
1.90887e-06
1.59641e-06
4.16779e-07
-1.35245e-06
1.65717e-06
-2.92667e-06
3.6203e-07
2.53528e-06
-2.0578e-07
-3.41919e-07
-1.42154e-06
-2.33322e-06
3.07175e-06
-2.69165e-08
-8.21045e-07
2.3175e-06
-7.22992e-07
1.49069e-06
8.75488e-07
-2.02676e-06
-2.81158e-07
3.6004e-06
-3.94708e-06
4.72983e-06
-1.38873e-06
-6.92139e-08
-1.4678e-06
1.04251e-06
-2.06625e-06
3.10406e-06
-8.13873e-07
7.23694e-07
-9.78912e-07
-8.65967e-07
7.37335e-07
1.52563e-06
-2.33591e-06
1.78265e-06
9.58435e-07
-5.22064e-07
-2.29736e-07
-4.26996e-06
-6.61411e-06
1.14789e-06
-4.32697e-06
-5.32779e-06
2.12241e-06
-1.40726e-06
1.76086e-07
-3.77194e-06
-2.71326e-06
-9.49402e-08
1.70807e-07
-2.495e-06
4.22324e-06
-3.62476e-06
-9.56055e-07
7.16583e-07
3.01447e-06
-1.41229e-06
-1.67694e-06
7.61627e-07
3.55881e-06
2.31015e-06
-9.50378e-07
4.45251e-08
-1.94791e-06
2.27081e-06
-3.34717e-06
3.05688e-06
4.57062e-07
3.87326e-06
-2.39215e-06
-3.52682e-06
-2.05212e-06
5.26495e-06
-3.28613e-07
-5.76569e-07
-7.46338e-07
5.98795e-06
8.80493e-07
-4.82965e-06
2.56839e-06
-1.58792e-06
-2.2294e-06
1.83841e-06
2.65482e-06
-3.10474e-06
-3.46741e-07
2.45557e-06
2.01328e-06
-3.92606e-06
inf
-8.11737e-07
5.72174e-07
1.57245e-06
8.02612e-09
-2.901e-06
1.22079e-06
-6.31714e-07
3.06241e-06
1.20059e-06
-1.80344e-06
4.90784e-07
3.74243e-06
-2.94342e-07
-3.45764e-08
-3.42099e-06
-1.43695e-06
5.91064e-07
3.47308e-06
3.78232e-06
4.01093e-07
-1.58435e-06
-3.47375e-06
1.34943e-06
1.11768e-06
1.95212e-06
-8.28033e-07
1.53705e-06
6.38031e-07
-1.84702e-06
1.34689e-06
-6.98669e-07
1.81653e-06
-2.42355e-06
-1.35257e-06
3.04367e-06
-1.21976e-06
1.61896e-06
-2.69528e-06
1.84601e-06
6.45447e-08
-4.94263e-07
3.47568e-06
-2.00531e-06
3.56693e-06
-3.19446e-06
2.72141e-06
-1.39059e-06
2.20032e-06
-1.76819e-06
2.32727e-07
-3.47382e-07
2.11823e-07
-5.22614e-07
2.69846e-06
-1.47983e-06
2.14554e-06
-6.27594e-07
-8.8501e-10
7.89124e-07
-2.8653e-07
8.30902e-07
-2.12857e-06
-1.90887e-07
1.07593e-06
1.40781e-06
2.41641e-06
-4.52689e-06
2.37207e-06
-2.19479e-06
1.65131e-06
1.2706e-06
-2.18387e-06
-1.72821e-07
5.41687e-07
7.2879e-07
7.56927e-07
1.57739e-06
-3.79395e-07
-1.02887e-06
-1.20987e-06
1.43066e-06
8.96301e-08
5.09766e-07
-2.8812e-06
-2.35944e-06
2.25912e-06
-2.78967e-06
-4.69913e-06
1.60822e-06
6.9342e-07
4.6225e-07
-1.33276e-06
-3.59033e-06
1.11206e-06
1.83521e-06
2.39163e-06
2.3468e-08
5.91431e-07
-8.80249e-07
-2.77405e-08
-1.13184e-06
-1.28036e-06
1.66229e-06
2.81784e-06
-2.97589e-06
8.73413e-08
1.06439e-06
2.39075e-06
-2.76974e-06
1.20862e-06
-5.12817e-07
-5.19104e-07
4.51324e-07
-4.7168e-07
2.35608e-06
5.46906e-07
-1.66748e-06
5.85236e-07
6.42944e-07
2.43164e-07
4.01031e-07
-1.93646e-06
2.07416e-06
-1.16116e-06
4.27155e-07
5.2951e-07
9.09149e-07
-8.71887e-08
-1.5564e-09
1.07266e-06
-9.49402e-08
2.04016e-06
-6.38123e-07
-1.94241e-06
-5.17294e-06
-2.18622e-06
-8.26703e-06
2.54364e-06
4.32614e-06
8.3847e-07
-2.85309e-06
2.72345e-06
-3.42752e-06
-1.36871e-07
2.23346e-06
5.26825e-07
1.3566e-06
-2.17111e-06
2.1463e-07
2.06479e-06
1.76929e-06
-1.2655e-06
-1.3797e-06
3.10706e-06
-4.72189e-06
4.38138e-06
6.41815e-07
-3.25623e-08
-4.93707e-06
5.05743e-06
5.17578e-07
-5.30524e-06
3.62463e-06
5.68909e-07
1.16226e-06
1.10843e-06
-5.00854e-07
9.48761e-07
-2.18701e-06
-3.57635e-07
4.26709e-06
-1.50836e-06
-5.84412e-06
3.5054e-06
3.94019e-06
-4.7623e-06
2.05856e-06
-2.22992e-07
1.64969e-06
2.64694e-06
-8.49487e-07
-3.63562e-06
1.0386e-06
1.69461e-06
-2.05798e-06
3.60349e-06
3.42651e-07
-1.46686e-06
1.19949e-06
-1.60519e-06
2.37793e-07
6.12366e-07
-1.54669e-06
1.43668e-06
1.87009e-06
-2.22626e-06
2.15155e-06
-3.10571e-06
2.05188e-06
-4.40002e-07
2.06683e-06
-1.11362e-06
5.96924e-07
-2.64471e-06
2.4892e-06
1.13083e-06
-3.23181e-07
5.10651e-07
2.73499e-07
-1.24899e-06
1.40564e-06
-9.3158e-07
1.45947e-06
3.70544e-07
-1.62628e-06
-1.70215e-06
1.72098e-06
8.19031e-07
-5.57709e-07
1.10107e-06
-2.81845e-06
1.57654e-07
3.30716e-06
-9.75403e-07
1.73126e-07
1.30447e-06
7.64771e-08
-6.65344e-07
-1.4346e-06
5.03171e-06
-2.84576e-06
2.3212e-06
-2.73373e-06
2.16675e-08
2.24026e-06
-4.11682e-08
-3.36642e-06
1.78775e-06
1.28174e-08
-9.32068e-07
2.97177e-06
-1.05338e-06
9.42505e-07
2.02362e-07
-1.81326e-06
2.16995e-06
2.83722e-07
-1.2648e-06
9.21814e-07
-8.9447e-07
-1.61597e-06
3.5036e-06
-6.79626e-08
1.52823e-06
-2.98682e-06
5.57404e-07
9.5166e-07
7.10419e-07
-1.28528e-06
-3.76038e-07
-1.03845e-06
2.96631e-06
-1.18356e-06
-2.77313e-07
3.24149e-06
-1.85455e-06
-1.27747e-07
3.6264e-07
4.66431e-07
-1.54443e-06
1.38437e-06
-1.53119e-06
7.4231e-07
-1.2388e-06
1.99774e-06
1.15799e-06
1.39478e-06
-2.93527e-06
-2.03012e-06
2.46667e-06
2.16751e-06
-2.50354e-06
3.95905e-07
5.74371e-07
1.33575e-07
-3.98315e-07
4.93927e-07
-5.23987e-07
-1.74713e-07
6.49384e-07
-7.16766e-07
2.35733e-06
-4.91333e-08
-1.88138e-06
1.74722e-06
4.03503e-07
3.5965e-07
1.44836e-07]
The task you are describing could be treated like anomaly/outlier detection.
One possible solution is to use a Z-score transformation and treat every value with a z score above a certain threshold as an outlier. Because there is no clear definition of an outlier it won't be able to detect such peaks without setting any parameters (threshold).
One possible solution could be:
import numpy as np
def detect_outliers(data):
outliers = []
d_mean = np.mean(data)
d_std = np.std(data)
threshold = 3 # this defines what you would consider a peak (outlier)
for point in data:
z_score = (point - d_mean)/d_std
if np.abs(z_score) > threshold:
outliers.append(point)
return outliers
# create normal data
data = np.random.normal(size=100)
# create outliers
outliers = np.random.normal(100, size=3)
# combine normal data and outliers
full_data = data.tolist() + outliers.tolist()
# print outliers
print(detect_outliers(full_data))
If you only want to detect peaks, remove the np.abs function call from the code.
This code snippet is based on a Medium Post, which also provides another way of detecting outliers.

scikit-learn roc_curve: why does it return a threshold value = 2 some time?

Correct me if I'm wrong: the "thresholds" returned by scikit-learn's roc_curve should be an array of numbers that are in [0,1]. However, it sometimes gives me an array with the first number close to "2". Is it a bug or I did sth wrong? Thanks.
In [1]: import numpy as np
In [2]: from sklearn.metrics import roc_curve
In [3]: np.random.seed(11)
In [4]: aa = np.random.choice([True, False],100)
In [5]: bb = np.random.uniform(0,1,100)
In [6]: fpr,tpr,thresholds = roc_curve(aa,bb)
In [7]: thresholds
Out[7]:
array([ 1.97396826, 0.97396826, 0.9711752 , 0.95996265, 0.95744405,
0.94983331, 0.93290463, 0.93241372, 0.93214862, 0.93076592,
0.92960511, 0.92245024, 0.91179548, 0.91112166, 0.87529458,
0.84493853, 0.84068543, 0.83303741, 0.82565223, 0.81096657,
0.80656679, 0.79387241, 0.77054807, 0.76763223, 0.7644911 ,
0.75964947, 0.73995152, 0.73825262, 0.73466772, 0.73421299,
0.73282534, 0.72391126, 0.71296292, 0.70930102, 0.70116428,
0.69606617, 0.65869235, 0.65670881, 0.65261474, 0.6487222 ,
0.64805644, 0.64221486, 0.62699782, 0.62522484, 0.62283401,
0.61601839, 0.611632 , 0.59548669, 0.57555854, 0.56828967,
0.55652111, 0.55063947, 0.53885029, 0.53369398, 0.52157349,
0.51900774, 0.50547317, 0.49749635, 0.493913 , 0.46154029,
0.45275916, 0.44777116, 0.43822067, 0.43795921, 0.43624093,
0.42039077, 0.41866343, 0.41550367, 0.40032843, 0.36761763,
0.36642721, 0.36567017, 0.36148354, 0.35843793, 0.34371331,
0.33436415, 0.33408289, 0.33387442, 0.31887024, 0.31818719,
0.31367915, 0.30216469, 0.30097917, 0.29995201, 0.28604467,
0.26930354, 0.2383461 , 0.22803687, 0.21800338, 0.19301808,
0.16902881, 0.1688173 , 0.14491946, 0.13648451, 0.12704826,
0.09141459, 0.08569481, 0.07500199, 0.06288762, 0.02073298,
0.01934336])
Most of the time these thresholds are not used, for example in calculating the area under the curve, or plotting the False Positive Rate against the True Positive Rate.
Yet to plot what looks like a reasonable curve, one needs to have a threshold that incorporates 0 data points. Since Scikit-Learn's ROC curve function need not have normalised probabilities for thresholds (any score is fine), setting this point's threshold to 1 isn't sufficient; setting it to inf is sensible but coders often expect finite data (and it's possible the implementation also works for integer thresholds). Instead the implementation uses max(score) + epsilon where epsilon = 1. This may be cosmetically deficient, but you haven't given any reason why it's a problem!
From the documentation:
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute
fpr and tpr. thresholds[0] represents no instances being predicted
and is arbitrarily set to max(y_score) + 1.
So the first element of thresholds is close to 2 because it is max(y_score) + 1, in your case thresholds[1] + 1.
this seems like a bug to me - in roc_curve(aa,bb), 1 is added to the first threshold. You should create an issue here https://github.com/scikit-learn/scikit-learn/issues

SVM Model does not support probability estimation?

I am doing some classification task with Support Vector Machines (SVM).
I am using libSVM (with Matlab support) to predict probability estimates matrix. However, the libSVM displays message that;
Model does not support probabiliy estimates
Below is my sample code;
(train_label contains labels for training data and test_label contains label for test data)
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1);
[y,accuracy,prob_estimates]=svmpredict(test_label,test_data,model,'-b 1');
Can someone tell me if there is something wrong with the way I am doing it? Any help/suggestion will be appreciated.
Don't know about the Matlab implementation, but usually you have to set this option:
-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
I am using libsvm in the same way without any problem.
In your code only a ' is missing in the following line
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1);
It should be
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1');
I had the same problem, model hasn't got ProbA and ProbB in it.
Before it was like this and giving error:
linear_model = svmtrain(trainClass, trainData, ['-t 0', cmd]);
Then I changed it to this, error dissappared:) - removed cmd and put exact values
linear_model = svmtrain(trainClass, trainData, ['-t 0 -c 1 -g 0.125 -b 1']);
if still gives error try to change c and g parameters.
Hope this helps.
It is because your model does not support probabiliy estimates.
You should use '-b 1' option both at training and testing process.
See also: https://stackoverflow.com/a/43509667/7893127
You may just train the model with default parameter.
Try to use '-b 1' when you are training and testing programe.
C:\setup\python36\Lib\site-packages\svm.py default value of self.probability is 0. You can set it 1.

Weka ignoring unlabeled data

I am working on an NLP classification project using Naive Bayes classifier in Weka. I intend to use semi-supervised machine learning, hence working with unlabeled data. When I test the model obtained from my labeled training data on an independent set of unlabeled test data, Weka ignores all the unlabeled instances. Can anybody please guide me how to solve this? Someone has already asked this question here before but there wasn't any appropriate solution provided. Here is a sample test file:
#relation referents
#attribute feature1 NUMERIC
#attribute feature2 NUMERIC
#attribute feature3 NUMERIC
#attribute feature4 NUMERIC
#attribute class{1 -1}
#data
1, 7, 1, 0, ?
1, 5, 1, 0, ?
-1, 1, 1, 0, ?
1, 1, 1, 1, ?
-1, 1, 1, 1, ?
The problem is that when you specify a training set -t train.arff and a test set test.arff, the mode of operation is to calculate the performance of the model based on the test set. But you can't calculate a performance of any kind without knowing the actual class. Without the actual class, how will you know if your prediction if right or wrong?
I used the data you gave as train.arff and as test.arff with arbitrary class labels assigned by me. The relevant output lines are:
=== Error on training data ===
Correctly Classified Instances 4 80 %
Incorrectly Classified Instances 1 20 %
Kappa statistic 0.6154
Mean absolute error 0.2429
Root mean squared error 0.4016
Relative absolute error 50.0043 %
Root relative squared error 81.8358 %
Total Number of Instances 5
=== Confusion Matrix ===
a b <-- classified as
2 1 | a = 1
0 2 | b = -1
and
=== Error on test data ===
Total Number of Instances 0
Ignored Class Unknown Instances 5
=== Confusion Matrix ===
a b <-- classified as
0 0 | a = 1
0 0 | b = -1
Weka can give you those statistics for the training set, because it knows the actual class labels and the predicted ones (applying the model on the training set). For the test set, it can't get any information about the performance, because it doesn't know about the true class labels.
What you might want to do is:
java -cp weka.jar weka.classifiers.bayes.NaiveBayes -t train.arff -T test.arff -p 1-4
which in my case would give you:
=== Predictions on test data ===
inst# actual predicted error prediction (feature1,feature2,feature3,feature4)
1 1:? 1:1 1 (1,7,1,0)
2 1:? 1:1 1 (1,5,1,0)
3 1:? 2:-1 0.786 (-1,1,1,0)
4 1:? 2:-1 0.861 (1,1,1,1)
5 1:? 2:-1 0.861 (-1,1,1,1)
So, you can get the predictions, but you can't get a performance, because you have unlabeled test data.

Resources