What should be the target in this deep learning image classification problem? - python-3.x

I am doing an image classification project using a CNN in Keras. I have a dataset of about 900 photos of about 70 people. Each person has multiple photos taken at different ages.
My goal is to predict the correct ID of a person when any one of his photos is given as input.
Here is a glimpse of the data.
My questions are:
1) What should be my target column? Is the target 'AGE' or 'ID'?
2) Do I need to one-hot-encode the target column? For example, if I use ID as my target, do I have to one-hot-encode the ID column?
3) If I use ID as my target, does one-hot-encoding mean I will have 70 classes?
4) I need information about the output layer. My goal is to find whether a photo belongs to a given ID or not, so what should the output layer be? Shall I use softmax with 70 outputs?
5) Another question about the output layer: can I use a softmax with 70 outputs and then feed it into a sigmoid layer with a single output?

You are going to identify the same person using images taken at different ages. For example, say the dataset has 100 different images of Khan and you train a model on them. Now, when you provide the 101st image of Khan, the model will recognize him. So your target column should be ID.
Yes, there are 70 classes, and after one-hot encoding the ID column you get a 900x70 target matrix.
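As a minimal sketch of the encoding step (this assumes the IDs have already been mapped to integers 0..69, and uses Keras' to_categorical):

import numpy as np
from tensorflow.keras.utils import to_categorical

# ids: 900 integer labels in the range 0..69, one per photo
ids = np.random.randint(0, 70, size=900)   # placeholder for the real ID column
y = to_categorical(ids, num_classes=70)    # shape: (900, 70)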
It should be a softmax layer, because a sigmoid layer is used for binary or multi-label problems. As you have to distinguish 70 different people from each other, you need a softmax output layer with 70 units.
I don't think so; that way your model would not be capable of telling which person the test image belongs to.
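For reference, a minimal sketch of such a network in Keras (the layer sizes and the 128x128x3 input shape are placeholders, not taken from the question):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(70, activation="softmax"),   # one output per person ID
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # matches the one-hot targets
              metrics=["accuracy"])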

Related

Darknet Yolov3 - Custom training on pre-trained model

The Darknet YOLOv3 model has a coco.names file for labels, which includes 80 classes.
Now I want to train a custom model with two labels only, where one label is already in coco.names and the other is not.
For example, I want to train a model to detect cell phones and DSLR cameras; the cell phone class already exists in coco.names, whereas the DSLR camera is not in its labels file.
Can I train a custom model with the two classes cell phone and DSLR camera, give it only DSLR camera data for training, and have it predict both DSLR camera and cell phone? Or should I train with data for both cell phone and DSLR images? Or is there another way?
I am a bit new to ML, so any help would be great. Thanks.
So you want to fine-tune a pre-trained model.
You should think of classes as just a set of end nodes of a network; the labels (phone, camera) are only a naming convention for them, there to give us visual guidance.
These nodes are fully connected (with associated weights) to the previous layer of the network, and the total number of these connections varies with the number of end nodes (classes) you have.
With the fully trained model, you can't just keep the nodes you want, take out the rest, and add a few more, because the previous layer (and the full network) was trained to give estimates/predictions for a particular set of final nodes.
So basically you need to fully reset the last layer (the head) and reinitialize it with the desired number of classes. The idea is that you take advantage of the previous training effort on a broader dataset and fine-tune the model on your own data.
Short answer: you need data for both classes, and you need to change the model to accept 2 classes only.
To configure that specific model for the new number of classes and data, I believe you can find some guidance and instructions here.
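As an illustration of the head-reset idea, here is the generic Keras transfer-learning pattern; note this is not the Darknet/YOLO-specific procedure (there you would edit the .cfg and .names files), and the MobileNetV2 backbone is only an example:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Pre-trained backbone with its original classification head removed
backbone = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False  # keep the broader-dataset weights frozen

# New head with only the 2 desired classes (cell phone, DSLR camera)
model = models.Sequential([
    backbone,
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])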

Creating input data for BERT modelling - multiclass text classification

I'm trying to build a Keras model to classify text into 45 different classes. I'm a little confused about preparing my data in the input format required by Google's BERT model.
Some blog posts feed the data as a tf dataset with input_ids, segment ids, and mask ids, as in this guide, but some only go with input_ids and masks, as in this guide.
Also, the second guide notes that the segment mask and attention mask inputs are optional.
Can anyone explain whether or not those two are required for a multiclass classification task?
If it helps, each row of my data can consist of any number of sentences within a reasonably sized paragraph. I want to be able to classify each paragraph/input into a single label.
I can't seem to find many guides/blogs about using BERT with Keras (TensorFlow 2) for a multiclass problem; indeed, many of them are for multi-label problems.
I guess it is too late to answer, but I had the same question. I went through the Hugging Face code and found that if attention_mask and token_type_ids (segment ids) are None, then by default the model attends to all tokens and all segments are given id 0.
If you want to check it out, you can find the code here.
Let me know if this clarifies it, or if you think otherwise.
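As a small sketch of what the tokenizer produces (assuming the Hugging Face transformers library and bert-base-uncased; for a single-segment multiclass task you can pass only input_ids and attention_mask and let token_type_ids take its default):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer(
    "An example paragraph to classify into one of 45 classes.",
    padding="max_length",
    truncation=True,
    max_length=128,
    return_tensors="tf",
)
# enc["input_ids"], enc["attention_mask"] and enc["token_type_ids"] are all returned;
# for a single-segment classification input the token_type_ids are all zeros anyway.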

How to implement multi state LSTM RNN in keras

I have 1000 distinct users, and the dataset consists of the activities of these users over the past year. There are over 300K records in total. The inputs for the LSTM RNN are the feature vectors corresponding to these users. The user is also included as a feature because behavior varies from person to person. The network should learn the behavior of each user and be able to predict the next behavior from that user's past information.
How do I maintain separate hidden states for each user within an LSTM RNN?
Following blog post is similar to my problem:
https://towardsdatascience.com/multi-state-lstms-for-categorical-features-66cc974df1dc
Update
My dataset looks like:
[dataset screenshot]
I transformed my dataset into a 3D NumPy array and reshaped it to (no. of records, timesteps, n_features).
The questions are:
1) Is it necessary to encode the "user" attribute?
2) What is the correct batch size for this problem? Is it batch = 1000 (the number of distinct users)?
3) Do I need to include each user's data in each batch input to the model?
Or else, please suggest the correct implementation for this problem.
This is just automatic; you don't need to do anything.
The LSTM layer will certainly have a state matrix the size of your batch of users, i.e. one hidden state per sequence in the batch. (Otherwise it wouldn't be useful.)
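A minimal sketch of what this looks like in Keras (the timesteps and n_features values below are placeholders, not taken from the question; each row of the 3D input batch is one user's sequence, and Keras keeps a separate hidden state for each row while processing the batch):

import numpy as np
from tensorflow.keras import layers, models

timesteps, n_features = 30, 12           # placeholder values
model = models.Sequential([
    layers.LSTM(64, input_shape=(timesteps, n_features)),
    layers.Dense(n_features),            # predict the next behavior vector
])
model.compile(optimizer="adam", loss="mse")

# x has shape (no. of records, timesteps, n_features); each record is one user's window
x = np.random.rand(1000, timesteps, n_features)
y = np.random.rand(1000, n_features)
model.fit(x, y, batch_size=32, epochs=1)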

How to use CNN model to detect object recognized by YOLO

Let me start by saying that I have 2 pre-trained models (in HDF5 files):
The first is a YOLO-based model, trained on dataset A, which is used to locate humans in any image (note that a training image for this model may contain many people).
The second is a CNN model which is used to detect the gender of a person (male or female) from an image that contains only 1 person.
Suppose that I only want to use these 2 models and do not want to re-train or modify anything. How could I locate the female persons in a picture from dataset A?
A possible solution that I think could work:
First, use the first model to detect, i.e. create bounding boxes around, the persons in the image.
Crop each bounding box into a separate image and feed those crops to the second model to see whether each person is female or male.
However, this solution is slow. Is there any way to speed it up, or to perform this task differently?
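A rough sketch of the crop-and-classify pipeline described above, with one speed-up: run the gender CNN on all crops as a single batch instead of one at a time. The names and conventions here (gender_model, the (x1, y1, x2, y2) box format, the 64x64 crop size, and index 0 meaning "female") are assumptions for illustration only, not taken from the question:

import numpy as np
import cv2

def find_females(image, person_boxes, gender_model, crop_size=(64, 64)):
    """person_boxes: list of (x1, y1, x2, y2) boxes returned by the YOLO person detector."""
    crops = []
    for (x1, y1, x2, y2) in person_boxes:
        crop = cv2.resize(image[y1:y2, x1:x2], crop_size) / 255.0
        crops.append(crop)
    if not crops:
        return []
    # One batched forward pass through the gender CNN instead of many single-image calls
    probs = gender_model.predict(np.stack(crops))
    is_female = probs[:, 0] > 0.5          # assumed: column 0 is the "female" probability
    return [box for box, fem in zip(person_boxes, is_female) if fem]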

How to extract specific features of same person using different image?

The aim of my project is to extract specific facial features on a mobile phone. This is a face-verification application: given two different images of the same person, the extracted features should be as close as possible.
Right now, I use the pretrained model and weights of the VGGFace team as a feature extractor; you can download the model here. However, when I extracted features with the model, the result was not good enough. I describe what I did and what I want below:
I extract features from Emma Watson's images: image_1 returns feature_1, image_2 returns feature_2, and so on (vector length = 2048). If feature[i] > 0.0, I convert it to 1:
for i in range(0, 2048):
    if feature1[0][i] > 0.0:
        feature1[0][i] = 1
Then, I compare the two feature vectors using the Hamming distance. The Hamming distance is just a naive way to compare; in the real project, I will quantize those features before comparing. However, the distance between 2 images of Emma is still large, even though I use 2 neutral facial-expression images (the same emotion; different emotion types give even worse results).
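For reference, with the binarized vectors the Hamming distance is just the count of mismatching positions (the random vectors below are placeholders standing in for the binarized features above):

import numpy as np

feature1 = np.random.randint(0, 2, size=(1, 2048))   # placeholder binarized vectors
feature2 = np.random.randint(0, 2, size=(1, 2048))
hamming_distance = int(np.sum(feature1[0] != feature2[0]))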
My question is: how could I train the model to extract the features of a target user? Imagine Emma is the target user, and her phone only needs to extract her features. When someone tries to unlock Emma's phone, the phone extracts this person's face and compares it with Emma's saved features. Note that I don't want to train a model to classify 2 classes, Emma and not-Emma; what I need is to compare the extracted features.
To sum up: if we compare features from different images of the same person, the distance (difference) should be small; if we compare features from images of different people, the distance should be large.
Thank you so much.
I'd do the following. We want to compute the features from a deep layer of a ConvNet and ultimately compare new images with a base image. Let's say this deep layer gives you the feature vector f.
Now, create a dataset of pairs of images with a label y: say y = 1 if both images are of the same person as the base image, and y = 0 if they are different. Then calculate the element-wise absolute difference of the two feature vectors and feed it into a logistic regression unit to get your prediction: y_hat = sigmoid(np.dot(W, np.abs(f1 - f2)) + b).
To do this you build a "Siamese" network: two copies of the same ConvNet, one producing f1 for one image of the pair and the other producing f2 for the other image. Siamese networks must have exactly the same weights at all times, so you need to ensure the two branches stay identical. As you train this new network, you should get the desired results.
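A minimal sketch of such a Siamese setup in Keras (base_cnn stands for your feature extractor, e.g. the VGGFace model truncated at the 2048-dimensional layer; the input shape and names are placeholders):

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_siamese(base_cnn, input_shape=(224, 224, 3)):
    img_a = layers.Input(shape=input_shape)
    img_b = layers.Input(shape=input_shape)
    f1 = base_cnn(img_a)   # the same base_cnn instance is applied to both inputs,
    f2 = base_cnn(img_b)   # so the two branches share weights by construction
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([f1, f2])
    y_hat = layers.Dense(1, activation="sigmoid")(diff)   # logistic regression on |f1 - f2|
    return Model(inputs=[img_a, img_b], outputs=y_hat)

# siamese = build_siamese(base_cnn)
# siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])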
