I'm searching for an object dataset for PhD purposes; I'm working on object recognition.
The dataset should contain, for each object, a large number of images of the whole object as well as other images containing several attributes (features/parts) of the same object.
(example: dog + head + tail + paws)
Any suggestions, please?
I'm familiar with SBERT and its pre-trained models, and they are amazing! But at the same time, I want to understand how the results are calculated, and I can't find anything more specific on their website.
For example, I have a document and I want to find other documents that are similar to it. I used 2 documents containing 200-250 words each (I changed model.max_seq_length to 350 so the model can handle longer texts), and in the end the cosine similarity is 0.79. Is that all we can see? Is there a way to extract the main phrases/keywords that made the model return this high similarity value?
Thanks in advance!
Have you tried a simple word-count comparison between the two documents and other random documents? Or TF-IDF, if the two documents are part of a bigger corpus?
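For the TF-IDF route, a minimal sketch with scikit-learn (doc_a and doc_b are placeholders for your two texts; with a bigger corpus you would fit the vectorizer on all documents instead):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

doc_a = "text of the first document ..."
doc_b = "text of the second document ..."

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([doc_a, doc_b])  # 2 x vocab_size sparse matrix

# Purely lexical similarity baseline to compare against SBERT's 0.79
print(cosine_similarity(tfidf[0], tfidf[1])[0][0])

# Terms weighted highly in BOTH documents hint at which words drive the overlap
weights = tfidf.toarray()
shared = np.minimum(weights[0], weights[1])
top = np.argsort(shared)[::-1][:10]
print(vectorizer.get_feature_names_out()[top])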
Another thing you can do is to look inside the stored_embeddings matrix (see the code below), in which SBERT encodes your sentences (e.g. for 20,000 documents you'll get a 20,000 x 384 matrix), after having saved it into a pickle file like:
from sentence_transformers import SentenceTransformer
import pickle

# 'sentences' is your list of documents; pick the model you actually use (a 384-dimensional one here)
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(sentences)

# save sentences and embeddings to disk
with open('embeddings.pkl', "wb") as fOut:
    pickle.dump({'sentences': sentences, 'embeddings': embeddings}, fOut, protocol=pickle.HIGHEST_PROTOCOL)

# load them back later
with open('embeddings.pkl', "rb") as fIn:
    stored_data = pickle.load(fIn)
    stored_embeddings = stored_data['embeddings']
The stored_embeddings variable can be handled as a NumPy matrix and can therefore be indexed to access single elements. You can look at the values of the individual 384 dimensions (going column by column; for a big matrix I suggest not enumerating row by row, it will take forever) and compare the values that the two documents take in one specific dimension. You can see which dimension has the highest values or variance, for example.
I'm not saying that what you find will be interpretable, but at least you can try and see what you get.
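A minimal sketch of this kind of inspection, assuming the stored_embeddings matrix from the snippet above and that rows 0 and 1 are the two documents you compared:

import numpy as np

emb = np.asarray(stored_embeddings)   # shape: (num_documents, 384)
doc_a, doc_b = emb[0], emb[1]         # the two documents you compared

# Per-dimension contribution to the dot product: large values point at the
# dimensions that push the cosine similarity up the most.
contrib = doc_a * doc_b
top_dims = np.argsort(contrib)[::-1][:10]
print(top_dims, contrib[top_dims])

# Dimensions with the highest variance across the whole corpus
print(np.argsort(emb.var(axis=0))[::-1][:10])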
I'm currently dealing with a classification task on a CT dataset. In CT datasets, multiple slices belong to one single patient. While setting up my dataset, I arrange my data as follows:
dataset/0/patient_1/1.png,2.png...
dataset/0/patient_2/1.png,2.png...
Is there a way to get my network to classify by patient instead of by slice?
Thank you.
Each slice is a 2D image, while for each patient you have a 3D volume of CT voxels.
If you want to work per patient rather than per slice, you'll need to organize your data loading to output batches of 3D volumes (of shape batch x channel x depth x height x width) and make your model process 3D information (e.g., using Conv3D instead of Conv2D).
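A minimal PyTorch sketch of that idea (layer sizes and the fixed number of slices per patient are illustrative assumptions, not a recommended architecture):

import torch
import torch.nn as nn

# Input is one stacked CT volume per patient: (batch, channel, depth, height, width)
class Patient3DNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # collapse depth/height/width
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (B, 1, D, H, W)
        x = self.features(x).flatten(1)
        return self.classifier(x)

# e.g. 4 patients, 1 channel, 64 slices of 128x128 each -> one prediction per patient
volumes = torch.randn(4, 1, 64, 128, 128)
logits = Patient3DNet()(volumes)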
I am trying to run the official script for video classification.
I want to tweak some functions, and running through all the examples would cost me too much time.
I wonder how I can slice the Kinetics training dataset used in that script.
This is the code I added just before the line
train_sampler = RandomClipSampler(dataset.video_clips, args.clips_per_video)
in the script (let's say I just want to run 100 examples):
tr_split_len = 100
dataset = torch.utils.data.random_split(dataset, [tr_split_len, len(dataset) - tr_split_len])[0]
Then, when the script hits
train_sampler = RandomClipSampler(dataset.video_clips, args.clips_per_video)
it raises the error:
AttributeError: 'Subset' object has no attribute 'video_clips'
So the type of dataset changes from torchvision.datasets.kinetics.Kinetics400 to torch.utils.data.dataset.Subset, which I understand. But how can I achieve this (hopefully not by using break inside the dataloader loop)?
Thanks.
It seems that torchvision.datasets.kinetics.Kinetics400 internally uses an object of the VideoClips class to store the information about the clips. It is stored in the member variable Kinetics400().video_clips.
The VideoClips class has a method called subset that takes a list of indices and returns a new VideoClips object containing only the clips at the specified indices. You can then just replace the old VideoClips object with the new one in your dataset.
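A minimal sketch of that idea, assuming dataset is the Kinetics400 object built by the script and that you want to keep the first 100 clips (this replaces the torch.utils.data.random_split call; exact attribute names can vary between torchvision versions):

tr_split_len = 100

# Replace the full VideoClips object with a subset of itself; dataset remains a
# Kinetics400 instance, so dataset.video_clips still exists afterwards.
dataset.video_clips = dataset.video_clips.subset(list(range(tr_split_len)))

train_sampler = RandomClipSampler(dataset.video_clips, args.clips_per_video)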
Let me start by saying that I have 2 pre-trained models (in hdf5 files):
The first model is a YOLO-based model, trained on dataset A, which is used to locate humans in images (note that a training image for this model may contain many people).
The second model is a CNN model which is used to detect the gender of a person (male or female) from an image that contains only 1 person.
Suppose that I only want to use these 2 models and do not want to re-train or modify anything in the datasets. How could I locate the female persons in a picture from dataset A?
A possible solution that I think could work (a rough sketch is shown below):
First, use the first model to detect people, i.e. to create bounding boxes around the persons in the image.
Then crop the bounding boxes into separate images and feed those images to the second model to see whether each person is female or male.
However, this solution is slow. Is there any way to speed it up, or to perform this task differently?
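A rough sketch of that two-stage pipeline, with all person crops classified in a single batch rather than one call per crop. The function and attribute names here are placeholders (I don't know the exact interfaces of the two saved models), cv2 is only assumed for resizing, and the preprocessing and class index must match what the gender CNN actually expects:

import numpy as np
import cv2  # assumed available just for resizing the crops

FEMALE_CLASS = 1  # hypothetical index of the "female" output of the gender CNN

def find_females(image, person_boxes, gender_model, input_size=(64, 128)):
    """person_boxes: list of (x1, y1, x2, y2) pixel boxes from the YOLO model."""
    crops = [cv2.resize(image[y1:y2, x1:x2], input_size)
             for (x1, y1, x2, y2) in person_boxes]
    if not crops:
        return []
    # One batched forward pass through the gender CNN instead of one call per crop;
    # adjust the scaling/normalization to whatever the model was trained with.
    preds = gender_model.predict(np.stack(crops) / 255.0)
    return [box for box, p in zip(person_boxes, preds) if np.argmax(p) == FEMALE_CLASS]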
(YOLO - object detection)
If there are two dogs in an image and I labeled and trained on only one of them across all the images in the training set, will the other, unlabeled dogs in the training set affect the training process and end up being considered part of the background?
I am asking especially about YOLO darknet object detection.
It seems so, because after 3000 batches it doesn't detect anything.
So the question is: should I label all the objects (like all dogs in the whole training set), or doesn't it matter because YOLO will take its features only from the labeled ones and ignore the background?
Yes, it is important that all the objects you want to find are marked in the training images. You teach the network to find objects where they are, and not to find objects where none exist.
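For darknet-style training data that means every dog in an image needs its own line in that image's label file; a purely illustrative sketch (file name, coordinates and class id are made up):

# Illustrative only: a darknet label file lists EVERY object in the image,
# one "<class_id> <x_center> <y_center> <width> <height>" line (normalized to 0-1) per object.
with open("img_001.txt", "w") as f:
    f.write("0 0.31 0.62 0.20 0.25\n")  # first dog
    f.write("0 0.74 0.58 0.18 0.22\n")  # second dog, which must also be labeled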
The YOLO CNN tries to solve 3 problems:
mark with a rectangle the objects for which YOLO was trained - positive error on the last layer
don't mark one object as another object - negative error on the last layer
don't mark any objects in the background - negative error on the last layer
I.e. YOLO looks for the differences that make the first dog count as an object while the second counts as background. If you want to find any dogs but you label only some of them, and the labeled dogs are not statistically different from the unlabeled ones, then the detection accuracy will be extremely low, because abs(positive_error) ~= abs(negative_error) and the result of training is sum(positive_errors) + sum(negative_errors) ~= 0. It is a contradictory task - you want, at the same time, to find a dog and not to find the dog.
But if the labeled dogs are statistically different from the unlabeled ones, for example if you label bulldogs and leave labradors unlabeled, then the YOLO network will be trained to distinguish one from the other.
"It seems so, because after 3000 batches it doesn't detect anything."
That is not enough; YOLO requires 10,000 - 40,000 iterations.