(Marginal) density estimation over a (PyTorch) dataset

Online, there are many density estimation methods that work with PyTorch. However, many of them only do conditional density estimation, or require the data to be low-dimensional or in a specific form.
I am wondering whether anyone knows of marginal density estimation methods that work with high-dimensional data such as images; bonus if it already works with a Dataset class like TensorDataset!
I skimmed through all the repositories listed here and the other public implementations I was able to find. Most do conditional density estimation or are only demonstrated on very low-dimensional data.

Related

Simple linear regression vs multiple linear regression: feature scaling

I read somewhere that when there are multiple features (multiple linear regression), no feature scaling is needed because the coefficients take care of it.
But for a single feature (simple linear regression), feature scaling is needed.
Is this how Python's scikit-learn works, or did I read something wrong?
I need an answer from someone who has tested simple linear regression both with and without feature scaling.
Scaling is used when we want to bring the features into a particular range. Some algorithms are sensitive to feature magnitudes or to outliers, so scaling the features is recommended for them; distance-based algorithms in particular need scaled features. Whether to scale depends on the data and the algorithm, not on whether the model is simple or multiple linear regression. Sometimes feature scaling is not recommended, because it shifts the data points from their original range to a standardized one, which can affect the modelling.
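For ordinary least squares specifically, this can be checked directly: the fitted predictions should not depend on feature scaling, with one feature or with several, because the coefficients rescale to compensate. A minimal sketch, assuming synthetic data (the data and the helper `max_prediction_gap` are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): two features on very different scales.
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 200)


def max_prediction_gap(X, y):
    """Largest absolute difference between predictions with and without scaling."""
    raw = LinearRegression().fit(X, y).predict(X)
    X_scaled = StandardScaler().fit_transform(X)
    scaled = LinearRegression().fit(X_scaled, y).predict(X_scaled)
    return np.max(np.abs(raw - scaled))


gap_simple = max_prediction_gap(X[:, [1]], y)  # simple regression: one feature
gap_multiple = max_prediction_gap(X, y)        # multiple regression: two features
print(gap_simple, gap_multiple)
```

Both gaps come out at the level of floating-point noise. This only covers unregularized least squares; penalized models such as Ridge or Lasso penalize coefficient size, so feature scale does matter there.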

Meaning of weighted metrics in scikit-learn: does the bigger class or the smaller class get more weight?

I am dealing with an imbalanced dataset and tried to handle it with the validation metric.
In the scikit-learn documentation I found the following for weighted:
Calculate metrics for each label, and find their average weighted by
support (the number of true instances for each label). This alters
‘macro’ to account for label imbalance; it can result in an F-score
that is not between precision and recall.
Does calculating the average weighted by support mean that classes with more samples are weighted higher than classes with fewer samples, or, as seems more logical to me, that smaller classes are weighted more than bigger ones?
I couldn't find anything about it in the documentation and wanted to make sure I am choosing the right metric.
Thanks!
Short answer: weighted by support means the higher the support, the higher the weight. In other words, the more samples a class has, the more its score counts toward the average.
That being said, please be aware that you have not "handled" the class imbalance just by choosing another calculation method for your metric. I believe the different averaging methods are meant to offer you another perspective on your model's performance.
Typically, models perform much better on the majority class. Using a weighted metric will overemphasize this. But the model still has the same, probably quite poor, performance on the minority class(es). If they happen to be the important one(s), you might end up just fooling yourself.
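To see the weighting concretely, here is a small sketch with made-up labels (8 samples of class 0, 2 of class 1). It recomputes the weighted F1 by hand from the per-class scores and the support, and compares it with the macro average:

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up imbalanced labels: 8 samples of class 0, 2 samples of class 1.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])

per_class = f1_score(y_true, y_pred, average=None)  # one F1 score per label
support = np.bincount(y_true)                       # number of true instances per label

# Average of the per-class scores, weighted by support.
manual_weighted = np.average(per_class, weights=support)
weighted = f1_score(y_true, y_pred, average="weighted")
macro = f1_score(y_true, y_pred, average="macro")

print(per_class, support, manual_weighted, weighted, macro)
```

Here the per-class F1 scores are 0.875 (majority) and 0.5 (minority). The weighted average is 0.8, pulled toward the majority class, while the macro average is 0.6875 because it treats both classes equally.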

Impact of the label distribution in deep learning for a regression problem

I am trying to train a CNN model for a regression problem. After that, I categorize the predicted labels into 4 classes and check some accuracy metrics. In the confusion matrix, the accuracy of classes 2 and 3 is around 54%, while the accuracy of classes 1 and 4 is above 90%. The labels lie between 0 and 100, and the classes are 1: 0-45, 2: 45-60, 3: 60-70, 4: 70-100. I do not know where the problem comes from. Is it because of the distribution of labels in the training set, and what is the solution? Regards.
I attached the plot in the following link.
Training set target distribution
It's not a good idea to create classes that way. By giving some classes a narrower window of values (class 2 covers 15 values while class 1 covers 45), you make it intrinsically more difficult for your model to predict class 2, and the best thing the model can learn during training is to avoid class 2 as much as possible.
You can confirm this by looking at the false negatives for classes 2 and 3; if there are too many, this may be the cause.
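The false-negative check can be sketched as follows. The arrays `y_true` and `y_pred` are made-up stand-ins for the real labels and CNN outputs (uniform labels plus Gaussian noise is purely an assumption); only the bin edges come from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-ins for the real labels and CNN outputs (values in 0-100).
y_true = rng.uniform(0, 100, 1000)
y_pred = np.clip(y_true + rng.normal(0, 8, 1000), 0, 100)

# Bin edges from the question: 1: 0-45, 2: 45-60, 3: 60-70, 4: 70-100.
edges = [45, 60, 70]
true_cls = np.digitize(y_true, edges)  # indices 0..3 stand for classes 1..4
pred_cls = np.digitize(y_pred, edges)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = np.zeros((4, 4), dtype=int)
np.add.at(cm, (true_cls, pred_cls), 1)

# False negatives: samples of a true class that were assigned to another class.
false_negatives = cm.sum(axis=1) - np.diag(cm)
recall = np.diag(cm) / cm.sum(axis=1)

for k in range(4):
    print(f"class {k + 1}: FN={false_negatives[k]}, recall={recall[k]:.2f}")
```

In this simulation the regression error is the same everywhere, yet the narrow classes 2 and 3 end up with clearly lower recall than the wide classes 1 and 4, simply because a prediction near a narrow bin crosses its edges more easily.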
The best thing to do would be to split your output space into equal portions and trust your model to learn which classes are less frequent, without trying to force that proportion yourself.
If you still don't get good results, it means you have to improve your model in other ways; using data augmentation to get a more uniform distribution of training samples may help.
If this doesn't sound convincing to you, have a look at this paper:
https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-vehicle-in-a-neural-network.pdf
In end-to-end models for autonomous driving, neural networks have to predict classes indicating the steering angle. The distribution of these values is highly imbalanced as most of the time the car is going straight. Despite this, the best models do not discriminate against some classes to adapt to data distribution.
Good luck!

Which machine learning model works well when the features of one class overlap with those of another class?

With respect to this training data set plot, which machine learning model suits it well? Most of the column values of one class overlap with those of the other class.
Just a friendly reminder: Stack Overflow is a platform for questions concerning programming. Your question is off topic.
Looking at the scatter plot, I would say that with logistic regression you can already achieve some results. Clearly your data are not perfectly separable in the two-dimensional space, so you will have a nonzero error by design.
For better results you have some options:
1) Design better features. The fact that the samples of the two classes overlap is a sign that the discriminative power of the features is limited. You could look for better measurements that characterize your samples.
2) Use an SVM with a kernel that maps your problem into a higher-dimensional space. Samples that are not separable in the two-dimensional space may be easily separable in a high-dimensional one. The SVM kernel (e.g. polynomial, Gaussian, ...) implicitly maps your points into a higher-dimensional space and separates the data there.
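As a rough illustration of option 2, the sketch below compares logistic regression with an RBF-kernel SVM in scikit-learn. The data is synthetic (`make_circles`), chosen on the assumption that the classes are split by a nonlinear boundary; the question's real data is not available, and a kernel will not help if the overlap is pure noise.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data with a circular boundary (an assumption, not the
# data from the question).
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear decision boundary in the original 2D space.
linear_acc = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# Gaussian (RBF) kernel: linear boundary in an implicit higher-dimensional space.
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"logistic regression: {linear_acc:.2f}, RBF SVM: {rbf_acc:.2f}")
```

On this kind of data the linear model stays close to chance, while the kernel SVM separates the classes well.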

Linear SVM vs nonlinear SVM on high-dimensional data

I am working on a project where I use the Spark MLlib linear SVM (with L2 regularization) to classify some data. I have about 200 positive observations and 150 (generated) negative observations, each with 744 features, which represent the level of activity of a person in different regions of a house.
I have run some tests and the "areaUnderROC" metric was 0.991, so the model seems to be quite good at classifying the data that I provide to it.
I did some research and found that a linear SVM works well on high-dimensional data, but the problem is that I don't understand how something linear can divide my data so well.
I think in 2D, and maybe this is the problem, but looking at the image below, I am 90% sure that my data looks more like a nonlinear problem.
So is it normal that I get good results on the tests? Am I doing something wrong? Should I change the approach?
I think your question is: "Why can a linear SVM classify my high-dimensional data well even though the data should be nonlinear?"
Some data sets look nonlinear in low dimensions, like your example image on the right, but it is hard to say that a data set is definitely nonlinear in high dimensions, because a problem that is nonlinear in nD may be linear in (n+1)D space. So I don't know why you are 90% sure your data set is nonlinear even though it is high-dimensional.
In the end, I think it is normal that you get good results on the test samples: it indicates that your data set is linear or nearly linear in high dimensions, otherwise the model would not work so well. Cross-validation could help you confirm whether your approach is suitable.
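A cross-validation check along these lines might look like the sketch below. It uses scikit-learn rather than Spark MLlib, and a synthetic data set with the same shape as in the question (350 samples, 744 features); both are assumptions for illustration, so the scores say nothing about the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in with the question's shape: 350 samples, 744 features,
# roughly 150 samples in class 0 and 200 in class 1.
X, y = make_classification(n_samples=350, n_features=744, n_informative=20,
                           weights=[150 / 350], random_state=0)

# Scaling is fitted inside each fold, so no information leaks from the held-out part.
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

# 5-fold cross-validated area under the ROC curve.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(scores.mean(), scores.std())
```

If the scores stay high and vary little across folds, the single test result is less likely to be a lucky split. With generated negative samples it is also worth checking that the model is not just learning how they were generated.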
