After model conversion, the model details are missing - HoloLens

I use the Azure Object Anchors service to convert a geometry model in OBJ format. The model is a human head. Before conversion, the nose, ears, and eyes are clearly present in the model, but after conversion all of these details are missing and what I obtain is a very coarse head model. Because the details are missing, the detection result is not accurate. I would like to know how to preserve the details when I convert the model.
Thanks.
YL

For this one, it is likely that the object is simply too small. A human head is much smaller than the 1-meter threshold, and the service requires larger objects per the documentation here:
https://learn.microsoft.com/en-us/azure/object-anchors/overview#asset-requirements
"Each dimension of an asset should be between 1 meter to 10 meters, and the file size should be less than 150 MB."

Related

How to use a trained deep learning model at different resolutions?

I have trained a model for an image segmentation task on 320x240x3 images using TensorFlow 2.x. I am wondering if there is a way to use the same model, or tweak it, to make it work at different resolutions?
I need to use a model trained at 320x240 on Full HD (1920x1080) and SD (1280x720) images, but since my GPU memory is not sufficient to train my architecture at those resolutions, I trained it on 320x240 images.
I am looking for a scalable solution that works at all of these resolutions. Any suggestions?
The answer to your question is no: you cannot take a model trained at a particular resolution and use it directly at a different resolution; in essence, this is why we train models at different resolutions, to check the performance and possibly improve it.
The suggestion below omits one crucial aspect: depending on the task at hand, increasing the resolution can considerably improve the results in object detection and image segmentation, particularly if you have small objects.
The only solution to your problem, given the GPU memory constraint, is to split the initial image into smaller parts (tiles), run the model per part (say 320x240), and then reconstruct the full-resolution result from the per-tile predictions, as in the sketch below; otherwise, there is no option other than increasing GPU memory so that you can train at higher resolutions.
P.S.: I only understood your question after reading it a couple of times; I suggest that you clarify the details regarding the resolutions a bit.
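To make the tiling idea concrete, here is a minimal Python sketch. The model variable, tile size, and the assumption that the image dimensions are exact multiples of the tile size are mine for illustration, not part of the asker's setup:

import numpy as np

TILE_H, TILE_W = 240, 320  # resolution the model was trained on

def predict_tiled(model, image):
    # Split a large image into training-resolution tiles, run the
    # segmentation model on each tile, and stitch the masks back together.
    # Assumes image height/width are exact multiples of the tile size
    # (pad the image first otherwise). No tile overlap or blending is used.
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=np.int64)
    for y in range(0, h, TILE_H):
        for x in range(0, w, TILE_W):
            tile = image[y:y + TILE_H, x:x + TILE_W]
            pred = model.predict(tile[np.newaxis])[0]   # (TILE_H, TILE_W, n_classes)
            mask[y:y + TILE_H, x:x + TILE_W] = pred.argmax(axis=-1)
    return mask

Overlapping the tiles and averaging the predictions at the seams usually reduces border artifacts, at the cost of more inference passes.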
Yes, you can train on high-resolution images, but a smaller resolution is easier to train on and makes it easier for the model to find the features of the image. Training at a small resolution saves time and makes the model faster, since there are far fewer pixels to process (and, depending on the architecture, fewer parameters). HD images contain a large number of pixels, so training on them makes both training and the model slower and can make it harder for the model to pick out features. So, in most cases, it is advisable to use a lower resolution rather than a higher one.

How to handle shared data between samples and batches in Keras

I'm using Keras for time series prediction and I want to create a model based on the self-attention mechanism that does not use any RNNs. For each sample we look at the last x timesteps of samples to predict the next sample.
In other words, I want to feed the network (num_batches, num_samples, timesteps, features) and get (num_batches, predictions).
There is one problem with this: there is a lot of unnecessary duplication of data, since sample n has essentially the same timesteps and features as sample n+1, only shifted one position to the left.
How would you handle this, assuming your dataset is very large?
I am not very familiar with this, but if your issue is "I have too much replicated data", I think you can solve it by devising a generator for your data and passing the generator to the Keras/TensorFlow fit function (the TensorFlow API documentation states that fit accepts generators as input).
If your question is about the logic behind the model, I do not see the issue. It is as if you have a sliding window: for each window you predict one value, and then you move the window by a certain amount (in your case, one). Could you elaborate a little more on your concern?
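A minimal sketch of such a generator, assuming a Keras Sequence over a single 2-D array of shape (num_samples, features); the class name, window length, and batch size are illustrative:

import numpy as np
import tensorflow as tf

class SlidingWindowSequence(tf.keras.utils.Sequence):
    # Builds (batch, timesteps, features) windows on the fly from one
    # underlying array, so the shifted copies are never materialised.
    def __init__(self, series, targets, timesteps, batch_size):
        self.series = series            # shape: (num_samples, features)
        self.targets = targets          # shape: (num_samples,)
        self.timesteps = timesteps
        self.batch_size = batch_size
        self.indices = np.arange(timesteps, len(series))

    def __len__(self):
        return int(np.ceil(len(self.indices) / self.batch_size))

    def __getitem__(self, idx):
        batch_idx = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        x = np.stack([self.series[i - self.timesteps:i] for i in batch_idx])
        y = self.targets[batch_idx]
        return x, y

# model.fit(SlidingWindowSequence(series, targets, timesteps=32, batch_size=64))

If I remember correctly, recent TensorFlow releases also ship a built-in helper, tf.keras.preprocessing.timeseries_dataset_from_array, which does essentially the same windowing without duplicating the data.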

How many images (minimum) should there be in each class for training YOLO?

I am trying to implement YOLOv2 on my custom dataset. Is there any minimum number of images required for each class?
There is no minimum number of images per class for training. Of course, the fewer images you have, the more slowly the model will converge and the lower the accuracy will be.
What matters, according to Alexey (maintainer of the popular darknet fork and creator of YOLOv4) in his notes on how to improve object detection, is:
"For each object which you want to detect - there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train 2000*classes iterations or more."
https://github.com/AlexeyAB/darknet
So I think you should have a minimum of 2000 images per class if you want optimal accuracy, but 1000 per class is not bad either. Even with a few hundred images per class you can still get a decent (though not optimal) result. Just collect as many images as you can.
It depends.
There is an objective minimum of one image per class. That may work with some accuracy, in principle, if using data-augmentation strategies and fine-tuning a pretrained YOLO network.
The objective reality, however, is that you may need as many as 1000 images per class, depending on your problem.
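Regarding the data-augmentation strategies mentioned above, here is a minimal Python sketch (using Pillow; the function name and the chosen transforms are just an illustration) for stretching a small image set. Note that for detection the bounding-box labels must be transformed along with the images, which is omitted here:

import random
from PIL import Image, ImageEnhance, ImageOps

def augment(img_path, n_copies=5):
    # Produce a few horizontally flipped / brightness-jittered copies
    # of one training image. Bounding-box labels are NOT handled here.
    img = Image.open(img_path)
    copies = []
    for _ in range(n_copies):
        aug = ImageOps.mirror(img) if random.random() < 0.5 else img.copy()
        aug = ImageEnhance.Brightness(aug).enhance(random.uniform(0.7, 1.3))
        copies.append(aug)
    return copies

Darknet itself also applies some jitter on the fly during training (the saturation, exposure, and hue settings in the cfg), so generating offline copies like this is mainly useful for quick experiments.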

CNN multi-class network

What approach should I take when I want my multi-class CNN to output something like [0.1, 0.1] when an image doesn't belong to any class? Using softmax with categorical_crossentropy for multi-class gives me an output that sums to 1, so that is still not what I want.
I'm new to neural networks, so sorry for the silly question, and thanks in advance for any help.
I think you should look into Bayesian learning. First, let's talk about uncertainty.
For example, given several pictures of dog breeds as training data—when a user uploads a photo of his dog—the hypothetical website should return a prediction with rather high confidence. But what should happen if a user uploads a photo of a cat and asks the website to decide on a dog breed?
The above is an example of out of distribution test data. The model has been trained on photos of dogs of different breeds, and has (hopefully) learnt to distinguish between them well. But the model has never seen a cat before, and a photo of a cat would lie outside of the data distribution the model was trained on. This illustrative example can be extended to more serious settings, such as MRI scans with structures a diagnostics system has never observed before, or scenes an autonomous car steering system has never been trained on.
A possible desired behaviour of a model in such cases would be to return a prediction (attempting to extrapolate far away from our observed data), but return an answer with the added information that the point lies outside of the data distribution. We want our model to possess some quantity conveying a high level of uncertainty with such inputs (alternatively, conveying low confidence).
Then, I think you could briefly read this paper, where the authors also apply the approach to a classification task and produce uncertainty estimates for the classes (dog, cat, ...). From there, you can extend the idea to your own application, and I think you will find what you want.
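One common, practical way to get such an uncertainty signal in Keras is Monte Carlo dropout: keep dropout active at prediction time and look at the spread of the sampled outputs. A minimal sketch follows; the architecture and layer sizes are purely illustrative, not from the question:

import numpy as np
import tensorflow as tf

# Toy two-class classifier with a dropout layer; illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),
])

def mc_dropout_predict(model, x, n_samples=30):
    # Run the model with dropout still active (training=True) and
    # return the mean softmax output and its standard deviation.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

# A large standard deviation (or high predictive entropy) for an input
# suggests it lies outside the training distribution, even though each
# sampled softmax output still sums to 1.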

SVM: Adding Clinical Features To Feature Vector Extracted From Image

I'm using SVM to classify clinical images of patients belonging to two different groups (patients vs. controls). I use PCA to extract a vector of features from each image, but I'd like to add other clinical information (for example, the output value of a clinical exam) in order to include it in the classification process.
Is there a way to do this?
I didn't find exhaustive suggestions in the literature.
Thanks in advance.
You could just append the new information to the end of each sample's feature vector. Another approach you could try is to add two classifiers: one trained on the additional clinical information, and a third classifier that takes the outputs of the other two as input to produce the final prediction.
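A minimal sketch of the first option (appending the clinical values to each feature vector) with scikit-learn; the file names, array shapes, and the RBF kernel are assumptions for illustration:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical inputs: pca_features is (n_patients, n_components),
# clinical is (n_patients, n_clinical_measurements).
pca_features = np.load("pca_features.npy")      # assumed file name
clinical = np.load("clinical_values.npy")       # assumed file name
labels = np.load("labels.npy")                  # 0 = control, 1 = patient

# Scale each block so the clinical values live in a range comparable
# to the PCA components, then append them column-wise.
X = np.hstack([
    StandardScaler().fit_transform(pca_features),
    StandardScaler().fit_transform(clinical),
])

clf = SVC(kernel="rbf").fit(X, labels)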
The question is pretty old, but I'll post my answer anyway.
If you scale your values, make sure the new values are scaled to a range similar to that of the values in your PCA vector.
If your PCA feature vectors have a constant length, you simply continue numbering the extra features from length+1, e.g. for SVM (libsvm) input:
1 1:<PCAval1> ... N:<PCAvalN> N+1:<Clinical exam value 1> ...
I ran a test adding such extra features for cell recognition and the accuracy improved.
This guide describes how to use enumerated features.
P.S.:
In my test I isolated and squeezed cells from a microscope image into a 16x16 matrix. Each pixel of this matrix was a feature, giving 256 features. Additionally, I added a few features such as the original size, moments, etc.
