In the face recognition training part, a trainner.yml file is generated. Is it useful for finding the confusion matrix? - confusion-matrix

I was working on a face recognition code and wanted to find the performance metrics (accuracy, precision, etc.) of my algorithm. I've used the Haar cascade algorithm and the LBPH face recognizer. When I searched for it on the net, I could only find sources where already existing datasets are taken and the parameters are computed. I want to use the data obtained from training my own model (trained from the images folder). A file named "trainner.yml" is generated automatically after running the script.
Is the data from the trainner.yml file my dataset? What is my dataset now, and how can I find the confusion matrix?
Thanks
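For reference, trainner.yml is the trained LBPH model that the recognizer writes out, not the dataset; the dataset is the images folder used for training. Below is a minimal sketch of how a confusion matrix could be computed from a held-out test folder, assuming OpenCV's contrib module (cv2.face) and scikit-learn; the test folder and the filename-based label scheme are illustrative assumptions, not part of the original code:

    # Sketch: evaluate a trained LBPH recognizer on a held-out test folder.
    # "test_images" and the "user.<label>.<index>.jpg" naming are assumptions.
    import os
    import cv2
    from sklearn.metrics import confusion_matrix, classification_report

    recognizer = cv2.face.LBPHFaceRecognizer_create()
    recognizer.read("trainner.yml")  # the trained model, not the dataset

    y_true, y_pred = [], []
    for fname in os.listdir("test_images"):
        true_label = int(fname.split(".")[1])    # e.g. "user.1.3.jpg" -> 1
        img = cv2.imread(os.path.join("test_images", fname),
                         cv2.IMREAD_GRAYSCALE)
        pred_label, _distance = recognizer.predict(img)
        y_true.append(true_label)
        y_pred.append(pred_label)

    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred))  # precision, recall, accuracy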

Related

Can I combine the results from model.fit that are saved in .h5 files?

So I am using the Keras module to create a facial recognition program, but I have hit one problem: my computer can't compute all the answers at once, so I reduce the data to a smaller amount and train on each part until it is close to 100% accuracy. My model is repeatedly trained with different data, e.g. it is trained on one set (happy face, sad face and confused face) and then on another set (angry face, lonely face and amazed face). The two datasets are run at different times, but each run produces an .h5 file with the data it has collected. How can I combine these two or more files into one single file? I am guessing that the model may have to be retrained with the .h5 files and then produce a single .h5 file, but I do not know. Does anyone know how to combine two trained models saved in .h5 files?
The code below shows where I train and save the model before changing the data and rerunning the code.
model.fit(train_generator, steps_per_epoch=int(train / batch_size), epochs=epochs, callbacks=[checkpoint, lr_scheduler])
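For what it's worth, a sketch of one common approach rather than a definitive answer: .h5 weight files generally can't be merged after the fact, but the previously saved model can be reloaded and training continued on the next dataset, so a single file accumulates both runs. The snippet reuses the question's variable names (train, batch_size, epochs, checkpoint, lr_scheduler); faces_part1.h5 and train_generator2 are hypothetical, and it assumes the output layer was defined with all six emotion classes from the start:

    # Sketch: continue training the saved model rather than merging .h5 files.
    from tensorflow.keras.models import load_model

    model = load_model("faces_part1.h5")   # hypothetical file from the first run

    # train_generator2 would serve the second set of classes
    # (angry, lonely, amazed) with the same input shape and label indices.
    model.fit(train_generator2,
              steps_per_epoch=int(train / batch_size),
              epochs=epochs,
              callbacks=[checkpoint, lr_scheduler])

    model.save("faces_combined.h5")        # one file covering both runs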

How to train a CNN on the LFW dataset?

I want to train a facial recognition CNN from scratch. I can write a Keras Sequential() model following popular architectures and copying their networks.
I wish to use the LFW dataset; however, I am confused about the technical methodology. Do I have to crop each face to a tight-fitting box? That seems impractical, as the dataset has 13,000+ faces.
Lastly, I know it's a stupid question, but do I just preprocess the images (of course) and then fit the model to them? What's the exact procedure?
Your question is very open-ended. Before preprocessing and fitting the model, you need to understand object detection. Once you understand what object detection is, you will have the answer to your first question: whether you are required to manually crop every one of the 13,000 images. The answer is no. However, you will have to draw bounding boxes around faces and assign labels to images if these are not available in the training data.
Your second question is very vague. What do you mean by exact procedure? Is it the steps you need to follow, or how to do the preprocessing and fitting of the model in Python or another language? There are lots of references available on the internet about preprocessing and model training for every specific problem. There are no universal steps that can be applied to any problem.
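To make the first point concrete: for plain face classification, the bounding-box stage can often be skipped on LFW, because scikit-learn's fetch_lfw_people serves the faces already roughly centred and cropped. A minimal sketch, assuming Keras and scikit-learn are installed; the small architecture and the hyperparameters are arbitrary illustrations, not a recommended design:

    import numpy as np
    from sklearn.datasets import fetch_lfw_people
    from sklearn.model_selection import train_test_split
    from tensorflow.keras import layers, models

    # Grayscale faces, pre-cropped by the loader; keep only frequent people.
    lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.5)
    X = lfw.images[..., np.newaxis]      # (n, h, w, 1)
    X = X / X.max()                      # defensively scale pixels to [0, 1]
    y = lfw.target
    n_classes = len(lfw.target_names)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=X.shape[1:]),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_tr, y_tr, epochs=10, validation_data=(X_te, y_te))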

Not getting proper results in model training when using Azure Machine Learning Studio with the Two-Class Bayes Point Machine algorithm

We are using Azure Machine Learning Studio to build a trained model, and for that we have used the Two-Class Bayes Point Machine algorithm.
As sample data, we have imported a .CSV file that contains columns such as Tweets and Label.
After deploying the web service, we got improper output.
We want our algorithm to predict the Label as 0 or 1 on the basis of the different types of tweets that are already stored in the dataset.
When testing it with tweets that are in the dataset, it gives proper results, but the problem occurs when testing it with other tweets (ones that are not in the dataset).
You can view our experiment over here:
Experiment
Are you planning to do binary classification based on the textual data in the tweets? If so, you should try feature hashing before the classification.
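In Azure ML Studio this means placing the Feature Hashing module between the dataset and the learner. For intuition, a rough scikit-learn equivalent of the same idea is sketched below; LogisticRegression stands in for the Bayes Point Machine (which is Azure-specific), and the toy tweets are invented:

    # Sketch: feature hashing turns raw tweet text into fixed-width numeric
    # features that a linear classifier can consume.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import LogisticRegression  # stand-in learner

    tweets = ["great product, loved it", "worst service ever"]  # toy data
    labels = [1, 0]

    vectorizer = HashingVectorizer(n_features=2**12, alternate_sign=False)
    X = vectorizer.transform(tweets)

    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(vectorizer.transform(["amazing, would buy again"])))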

Query on Machine Learning with Scikit-Learn data uploading

I have been trying to develop a machine-learning-based image classification system using Scikit-Learn; what I am trying to do is multi-class classification. The biggest problem I am facing with Scikit-Learn is how to load the data. Then I came across one of the examples, face_recognition.py, which uses fetch_lfw_people to fetch data from the internet. I could see this example actually does multi-class classification. I was trying to find some documentation on the example but was unable to. I have some questions: what does fetch_lfw_people do? What does this function load into lfw_people? Also, what I saw in the data folder was some text files; is the code reading the text files? My main intention is to load my own set of image data, but I am unable to do it with fetch_lfw_people: if I change the path to my image folder via data_home and funneled=False, I get errors. I hope I get some answers here.
First things first: you can't directly give images as input to your classifier. You have to extract some features from your images, or you can load each image using OpenCV and use the resulting numpy array as input to your classifier.
I would suggest you read some basics of image classification, like how you can train your classifier.
Coming to your question about the fetch_lfw_people function: it downloads the Labeled Faces in the Wild dataset and loads the images into numpy arrays, which is why pointing data_home at your own image folder fails. If you are training from your own images, you have to convert your image data to numerical features yourself.
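A minimal sketch of the OpenCV route, assuming the images are sorted into one sub-folder per class (my_images/class_a/*.jpg and so on); the folder layout, image size and SVC choice are illustrative assumptions:

    # Sketch: load your own images into numpy arrays for a sklearn classifier.
    import os
    import cv2
    import numpy as np
    from sklearn.svm import SVC

    root = "my_images"                         # hypothetical data folder
    X, y = [], []
    for label, class_dir in enumerate(sorted(os.listdir(root))):
        for fname in os.listdir(os.path.join(root, class_dir)):
            img = cv2.imread(os.path.join(root, class_dir, fname),
                             cv2.IMREAD_GRAYSCALE)
            img = cv2.resize(img, (64, 64))    # uniform size
            X.append(img.flatten())            # raw pixels as features
            y.append(label)

    clf = SVC(kernel="linear").fit(np.array(X), np.array(y))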

Timeline Detection

I am trying to do a timeline detection problem using text classification. As a newbie, I am confused as to how I can go about this. Is this a classification problem? That is, can I use the years (timelines) as outcomes and solve this as a classification problem?
You should be able to solve this as a classification problem as you suggest. An option could be to find or build a corpus consisting of texts tagged with the period in which they're set, and train a classification algorithm on this data set.
Another option could be to train a word space model on such a data set, and generate vectors for different periods of time (e.g. the 50s, 60s etc.). You could then create a document vector for the text you wish to classify, and find which of these time vectors yields the best match.
Might not work, but it could be interesting to see what results you get.
Hope this helps!
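A minimal sketch of the first option, assuming scikit-learn; the texts and decade labels are invented placeholders for a real period-tagged corpus:

    # Sketch: decades as class labels for a text classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "rock and roll records and drive-in movies",
        "dial-up modems and grunge bands",
        "smartphones and social media feeds",
    ]
    decades = ["1950s", "1990s", "2010s"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
    clf.fit(texts, decades)
    print(clf.predict(["streaming video on a phone"]))  # likely "2010s"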
