OpenCV Training Custom Haar Cascade - python-3.x

I am trying to train a Haar cascade on my face. I have everything set up, including the positives, the negatives, the vec file, etc., but when I run opencv_traincascade, it gives me a "terminate called after throwing an instance of 'std::bad_alloc'" error. Then I added -nonsym -mem 512 to my arguments and it gave me this error: terminate called after throwing an instance of 'std::logic_error'.
Here is the command I am running:
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt\
> -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1000\
> -numNeg 600 -w 80 -h 40 -mode ALL -precalcValBufSize 1024\
> -precalcIdxBufSize 1024\
> -nonsym\
> -mem 512\
Any help would be greatly appreciated!

You have to get rid of -nonsym -mem 512 and just keep -mode ALL. So the new command looks like this:
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt\
> -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1000\
> -numNeg 600 -w 80 -h 40 -mode ALL -precalcValBufSize 1024\
> -precalcIdxBufSize 1024
The -nonsym and -mem options don't actually exist in opencv_traincascade; they belong to the older opencv_haartraining tool.
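Once training finishes, opencv_traincascade writes cascade.xml into the -data directory ("classifier" here). A minimal Python sketch for trying the result (the image file names are placeholders):
import cv2

# Load the trained cascade from the -data directory used during training
cascade = cv2.CascadeClassifier("classifier/cascade.xml")

# "test.jpg" is a placeholder input image
img = cv2.imread("test.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) rectangles for each detection
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("out.jpg", img)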

Related

How to suppress "jpegtopnm: WRITING PPM FILE" etc. within the output of jpegtopnm [Solved]

I want to see the sizes of images within a directory. For this purpose I do
$ for file in *.jpg; do jpegtopnm $file | pnmfile; done
Then I can see
jpegtopnm: WRITING PPM FILE
stdin: PPM raw, 960 by 1280 maxval 255
jpegtopnm: WRITING PPM FILE
stdin: PPM raw, 960 by 1280 maxval 255
jpegtopnm: WRITING PPM FILE
stdin: PPM raw, 1200 by 1600 maxval 255
and so on.
I would like to see
960 by 1280
960 by 1280
1200 by 1600
.............
How can one do this?
Answer
The command jpegtopnm is part of netpbm, a package of graphics manipulation programs and libraries:
$ apt-file -l find pnmfile
netpbm
Then we must read "man netpbm":
-quiet Suppress all informational messages
Thus we solved our problem:
$ for file in *.jpg; do jpegtopnm $file -quiet | pnmfile | cut -c 16-28; done
4000 by 3000
2592 by 1944
4000 by 3000
............
About "cut -c 16-28".
This is a filter that selects characters from 16 to 28 in a string
"stdin: PPM raw, 960 by 1280 maxval 255".
If you have at your directory images with different sizes such as 4000x5000, 300x400, 2x3, 40x67 etc it won't work properly. For that reason you have to use more complicated way. It is a "cut" filter by fields(-f). The field separator will be a space character(-d ' ').
$ for file in *.jpg; do jpegtopnm $file -quiet | pnmfile | cut -d ' ' -f 3-5; done
700 by 900
65 by 40
2 by 3
7000 by 9000
4000 by 3000
............
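If the field positions turn out to differ on your system (pnmfile's exact spacing can vary), a more position-independent variant is to let awk find the "by" token and print its neighbours; a sketch under that assumption:
$ for file in *.jpg; do jpegtopnm $file -quiet | pnmfile | awk '{ for (i = 2; i < NF; i++) if ($i == "by") print $(i-1), $i, $(i+1) }'; done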

Best hyperparameters from GridSearch give inverted results compared to the actual SMOreg prediction results

I am a beginner in Machine Learning, and I have a question regarding the results of Grid Search to find the best hyperparameters of C and gamma for SMOreg.
I am doing Grid Search to find the best C and gamma that will give me the highest correlation. I am using GridSearch as shown below:
weka.classifiers.meta.GridSearch -E CC -y-property kernel.gamma -y-min -3.0 -y-max 3.0 -y-step 1.0 -y-base 10.0 -y-expression pow(BASE,I) -x-property C -x-min -3.0 -x-max 3.0 -x-step 1.0 -x-base 10.0 -x-expression pow(BASE,I) -sample-size 100.0 -traversal ROW-WISE -log-file "D:\Program Files\Weka-3-9-4" -num-slots 1 -S 1 -W weka.classifiers.functions.SMOreg -- -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 0.01"
As shown above, I set the evaluation to CC (correlation coefficient), the X property to C and the Y property to kernel.gamma, with both the C and gamma parameters ranging over 10^-3, 10^-2, 10^-1, 1, 10, 100, and 1000. It should be noted that I use SMOreg as the base classifier and RBFKernel as the kernel function. Furthermore, I set the evaluation metrics to Correlation, MAE, RMSE, and MAPE.
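(For reference, pow(BASE,I) with BASE = 10 and I stepping from -3.0 to 3.0 expands to exactly that grid; a quick Python check:)
>>> [10.0 ** i for i in range(-3, 4)]
[0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]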
The hyperparameter output of the GridSearch is shown below:
weka.classifiers.meta.GridSearch:
Classifier: weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 10.0"
X property: C
Y property: kernel.gamma
Evaluation: Correlation coefficient
Coordinates: [0.0, 1.0]
Values: 1.0 (X coordinate), 10.0 (Y coordinate)
The meaning of this result is that the best value for C is 1 and the best gamma is 10.
At the end of the GridSearch result, the following summary data is shown:
=== Summary ===
Correlation coefficient 0.945
Mean absolute percentage error 0.1434
Mean absolute error 3.8465
Root mean squared error 5.5417
Total Number of Instances 309
This implies that, using C=1 and gamma=10, the prediction result should be the one shown in the summary above.
However, when I am doing an SMOreg with the same dataset using the best parameters acquired from the GridSearch (C=1 and gamma=10), I get different results as shown below:
The classifier:
weka.classifiers.functions.SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 10.0"
The output:
=== Summary ===
Correlation coefficient 0.9417
Mean absolute percentage error 0.1381
Mean absolute error 3.7672
Root mean squared error 5.7114
Total Number of Instances 309
The funny thing is, if I swap the best hyperparameters (C=10, gamma=1), I get exactly the same result as the GridSearch prediction:
The classifier:
weka.classifiers.functions.SMOreg -C 10.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W 1" -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 1.0"
The output:
=== Summary ===
Correlation coefficient 0.945
Mean absolute percentage error 0.1434
Mean absolute error 3.8465
Root mean squared error 5.5417
Total Number of Instances 309
Does anyone know why this happens?
One more thing: in the GridSearch, the maximum search space for both C and gamma is 1000. However, I tried C=100 and gamma=1 and it gave a higher correlation coefficient than the supposedly best hyperparameters according to GridSearch (C=1 and gamma=10). Why didn't GridSearch give the best result?
Thank you.

Weka command line attributes arguments

On the command line, I'm able to get this rolling with no problem:
java weka.Run weka.classifiers.timeseries.WekaForecaster -W
"weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H 20 " -t "C:\MyFile.arff" -F DirectionNumeric -L 1 -M 3 -prime 3 -horizon 6 -holdout 100 -G TradeDay -dayofweek -weekend -future
But once I try to add the skip list, I start to get errors saying a date is not in the skip list even though it is in fact on it:
java weka.Run weka.classifiers.timeseries.WekaForecaster -W "weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H 20 " -t "C:\MyFile.arff" -F DirectionNumeric -L 1 -M 3 -prime 3 -horizon 6 -holdout 100 -G TradeDay -dayofweek -weekend -future -skip "2014-06-07#yyyy-MM-dd, 2014-06-12"
Does anybody know how to get this working? Weka is low on documentation as far as I know.
Thanks in advance!
Forget it, I got it: the problem was that the 's' must be a capital letter:
-Skip
instead of
-skip.
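For completeness, the full command would then presumably look like this (identical to the one above, with only -skip changed to -Skip):
java weka.Run weka.classifiers.timeseries.WekaForecaster -W "weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 500 -V 0 -S 0 -E 20 -H 20 " -t "C:\MyFile.arff" -F DirectionNumeric -L 1 -M 3 -prime 3 -horizon 6 -holdout 100 -G TradeDay -dayofweek -weekend -future -Skip "2014-06-07#yyyy-MM-dd, 2014-06-12"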

Weka, SVM technique output

I am working with SVM in Weka.
I have some data and I am trying the SVM technique (with different values of C) to analyze it. But the output has totally confused me, so I am hoping for some help.
This is the output for a polynomial kernel of degree 1:
Scheme:weka.classifiers.meta.CVParameterSelection -P "C 0.001 10.0 5.0" -X 10 -S 1 -W weka.classifiers.functions.SMO -- -C 0.7 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
Classifier Options: -C 7.5003 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
=== Summary ===
Correctly Classified Instances 83 79.0476 %
Incorrectly Classified Instances 22 20.9524 %
Kappa statistic 0.6555
Mean absolute error 0.0487
Root mean squared error 0.1549
Relative absolute error 91.5633 %
Root relative squared error 100.2828 %
Total Number of Instances 105
This is the output for a polynomial kernel of degree 2:
Scheme:weka.classifiers.meta.CVParameterSelection -P "C 0.001 10.0 5.0" -X 10 -S 1 -W weka.classifiers.functions.SMO -- -C 0.7 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 2.0"
Classifier Options: -C 2.5008 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 2.0"
=== Summary ===
Correctly Classified Instances 87 82.8571 %
Incorrectly Classified Instances 18 17.1429 %
Kappa statistic 0.7236
Mean absolute error 0.0486
Root mean squared error 0.1547
Relative absolute error 91.4748 %
Root relative squared error 100.1845 %
Total Number of Instances 105
This is the output for a gaussian kernel and gamma value of 1.0:
Scheme:weka.classifiers.meta.CVParameterSelection -P "C 0.001 10.0 5.0" -X 10 -S 1 -W weka.classifiers.functions.SMO -- -C 0.7 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 1.0"
Classifier Options: -C 2.5008 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 1.0"
=== Summary ===
Correctly Classified Instances 87 82.8571 %
Incorrectly Classified Instances 18 17.1429 %
Kappa statistic 0.721
Mean absolute error 0.0486
Root mean squared error 0.1547
Relative absolute error 91.4571 %
Root relative squared error 100.1651 %
Total Number of Instances 105
These are my questions:
Why do the classifiers at “Classifier Options: -C xxxxx” have different values of C? I think it is related to the values of C that I am checking with “C 0.001 10.0 5.0”, but I may be drawing the wrong conclusions. I thought that with “C 0.001 10.0 5.0” it tries the values 0.001, 0.01, 0.1, 1.0 and 10.0. If that is true, why are the values of C 7.5003 and 2.5008?
The percentage of Correctly Classified Instances is very similar across all of the runs, and I don’t understand why. What does it mean? If it were 12% I would think SVM was not a proper technique (the data wasn't linearly separable), but with 80% (and little or no variation after changing the parameters) I don’t know what to think.
If I check the data in Weka's “Visualize” tab, it doesn’t seem linearly separable, but since SVM maps the data into a larger number of dimensions, I don’t think conclusions can be drawn from the visualization.
1) Yes, you use CVParameterSelection, which tries different parameters, in your case from 0.001 to 10 in 5 steps. The steps are 0.001 + k * (10 - 0.001)/4, because your first step is already defined as 0.001. If you round the following values (for k = 0/1/2/3/4) you see that they fit.
Step 1) 0.001
Step 2) 2.50075
Step 3) 5.0005
Step 4) 7.50025
Step 5) 10.0
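(A quick Python check of that formula, under the same min/max/step assumptions:)
>>> [round(0.001 + k * (10.0 - 0.001) / 4, 5) for k in range(5)]
[0.001, 2.50075, 5.0005, 7.50025, 10.0]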
2) If you had 12% accuracy, something would be really strange. Random classification (assuming two balanced classes) would result in about 50%. I cannot tell you whether 82% is a good result because I don't know your data.
3) On your question about linear separability: that is exactly why the SVM is so good. With the right kernel, an SVM transforms your data into a higher-dimensional feature space to get rid of the non-separability. An RBF kernel can transform your data even into an infinite-dimensional feature space, which is why a perfect linear separation is always possible there. At this point overfitting can occur; to avoid that and reach a good generalisation you have the complexity parameter C.
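(For concreteness, the RBF kernel referred to here has the usual form k(x, y) = exp(-gamma * ||x - y||^2); a minimal Python sketch, assuming NumPy is available:)
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))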
You can read more about this last point in a good Stack Exchange post:
https://stats.stackexchange.com/questions/80398/how-can-svm-find-an-infinite-feature-space-where-linear-separation-is-always-p

Setting a cron job for python script

I want to run the following command as a cron job:
python test.py -sau 0 -bg 200000 -t mcs3245 > g2g.log
I have set up a cron job like below:
5 0 * * * /local/mnt/workspace/username/scripts/python test.py -sau 0 -bg 200000 -t mcs3245 > g2g.log
but I am getting the following error:
/bin/sh: /local/mnt/workspace/username/scripts/python: No such file or directory
Can anyone help with what is wrong and how to set this up?
Unless /local/mnt/workspace/username/scripts/ is the directory of your Python installation, the shell cannot find a "python" executable there (hence the "No such file or directory" error), so I would suggest something like this:
5 0 * * * /usr/bin/python /path/to/script/test.py -sau 0 -bg 200000 -t mcs3245 > g2g.log
If you want to execute the script as user USERNAME (this six-field form works in the system crontab, e.g. /etc/crontab, not in a per-user crontab):
5 0 * * * USERNAME /usr/bin/python /path/to/script/test.py -sau 0 -bg 200000 -t mcs3245 > g2g.log
Found that last one here on superuser.com.
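Note also that since "> g2g.log" uses a relative path, the log will land in whatever working directory cron starts the job in (usually your home directory). Changing into the script's directory first, and capturing stderr too, makes the output location explicit; a sketch with a placeholder path:
5 0 * * * cd /path/to/script && /usr/bin/python test.py -sau 0 -bg 200000 -t mcs3245 > g2g.log 2>&1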
