I am trying to experiment with different values for the arguments of MCMC.sample in pymc.
I looked at help pages for MCMC.sample and I found:
tune_interval : int
Step methods will be tuned at intervals of this many iterations, default 1000
What is meant by "tuning of step methods"? I don't know whether keeping this number high or low will give me better results.
Tuning is an adaptive procedure for optimizing the variance of the proposal distribution with the Metropolis sampler. You definitely want to tune. I don't change my tuning interval at all, but there are scenarios where it might help, I suppose.
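For concreteness, here is a minimal sketch of where tune_interval fits, assuming the classic PyMC 2.x API (MCMC.sample with iter/burn/thin keywords); the toy model and the particular values are only illustrative:

import numpy as np
import pymc as pm  # PyMC 2.x

# Toy model: infer the mean of some observed data.
data = np.random.normal(loc=5.0, scale=2.0, size=100)
mu = pm.Normal('mu', mu=0.0, tau=1e-4)                              # vague prior on the mean
obs = pm.Normal('obs', mu=mu, tau=0.25, value=data, observed=True)

M = pm.MCMC([mu, obs])
# The Metropolis step method's proposal variance is re-tuned every
# tune_interval iterations, based on the recent acceptance rate.
M.sample(iter=20000, burn=5000, thin=2, tune_interval=1000)

With the default step methods the default interval is usually fine: a very small interval re-tunes on noisy acceptance-rate estimates, while a very large one adapts slowly.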
I am currently writing a program which uses importance sampling. All is going well so far; however, I have two minor queries that I would like to ask about.
What is the most basic job that we would expect importance sampling to do?
For an arbitrarily large number of samples, what is the difference between a result with importance sampling and one without it?
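For what it's worth, the most basic job importance sampling does is estimate an expectation under a target distribution using samples drawn from a different proposal, with weights correcting for the mismatch. A minimal sketch (the target, proposal, and test function are purely illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Target: standard normal.  Proposal: a wider normal that is easy to sample.
target = stats.norm(loc=0.0, scale=1.0)
proposal = stats.norm(loc=0.0, scale=3.0)

def f(x):
    return x ** 2                      # we want E_target[f(X)] = 1

n = 100_000
x = proposal.rvs(size=n, random_state=rng)
w = target.pdf(x) / proposal.pdf(x)    # importance weights

# Self-normalised importance sampling estimate of E_target[f(X)].
estimate = np.sum(w * f(x)) / np.sum(w)
print(estimate)                        # close to 1

On the second question: as the number of samples grows, both the plain Monte Carlo estimate (sampling from the target directly) and the importance-sampling estimate converge to the same expectation, provided the proposal covers the target's support; what changes is which distribution you draw from and how quickly the variance shrinks.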
I am trying to implement A/B testing (online validation) for an ML model that has a highly imbalanced positive event rate. For example, the model detects spam and only 1 out of 1000 samples is spam, or the baseline click-through rate is very low (<0.1%).
I know one issue is that I will need very large samples in each of the control and treatment cohorts. Are there other issues I need to be aware of? Will the statistical properties break down? What are the ways to counter them?
Thanks.
You can use a calculator like the one here to get a sense of the volumes needed. How much of a difference are you expecting? E.g. detecting a 1% improvement that's statistically significant requires far more samples than looking to detect a 30% improvement.
https://www.statsig.com/calculator
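If you would rather compute the required volume in code than with the calculator, a rough power calculation for a two-proportion test looks like the sketch below; statsmodels is my assumption here, and the baseline rate, lift, alpha, and power are just example values:

from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.001                     # 0.1% baseline positive rate
lift = 0.30                          # hoping to detect a 30% relative lift
treatment = baseline * (1 + lift)

# Cohen's h effect size for two proportions.
effect = proportion_effectsize(treatment, baseline)

# Required sample size per cohort for 80% power at alpha = 0.05.
n_per_group = NormalIndPower().solve_power(effect_size=effect,
                                           alpha=0.05,
                                           power=0.80,
                                           alternative='two-sided')
print(round(n_per_group))

With a 0.1% baseline, even a 30% relative lift works out to roughly 10^5 users per cohort, which is usually the dominant practical issue.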
What are the tips & tricks to improve our auc_roc_score?
Example:
Is balanced data required?
Is recall more important than precision?
Is oversampling usually better than undersampling?
Thanks again!
This is highly dependent on the type of data in use and the type of model it is trained on, along with the hyperparameters currently used. The ROC AUC is just the outcome of how well the data preprocessing and model building were done, so you must look into ways of improving your model. Also, precision and recall are each useful in their own way; it depends on the scenario.
Precision answers the question:
What proportion of the predicted positive labels was actually correct?
whereas recall answers:
What proportion of the actually positive labels was identified correctly?
Thus you would want higher recall when identifying COVID-19 patients, whereas you would prefer higher precision when the cost of acting on a prediction is high and the cost of not acting is low.
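To make the two definitions concrete, here is a small sketch using scikit-learn's metrics on made-up labels (the arrays are purely illustrative):

from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Made-up ground truth and model outputs for illustration.
y_true  = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred  = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]                       # hard 0/1 predictions
y_score = [0.1, 0.2, 0.6, 0.3, 0.8, 0.4, 0.9, 0.2, 0.7, 0.1]   # predicted probabilities

print(precision_score(y_true, y_pred))   # correct positives / all predicted positives
print(recall_score(y_true, y_pred))      # correct positives / all actual positives
print(roc_auc_score(y_true, y_score))    # ranking quality across all thresholds

Note that the ROC AUC is computed from the scores, not from the hard predictions, which is one reason improving it comes down to the model and features rather than the decision threshold.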
Also, what exactly counts as balanced data varies from situation to situation.
So if you could specify the kind of model, problem, and data in use, we may be able to help you more.
Please also share your model's code.
I'm trying to find confidence intervals for the means of various variables in a database using SPSS, and I've run into a spot of trouble.
The data is weighted, because each person surveyed represents a different portion of the overall population. For example, one young man in our sample might represent 28000 young men in the general population. The problem is that SPSS seems to treat each of that young man's database entries as 28000 measurements when they actually represent just one, which makes SPSS think we have far more data than we actually do. As a result, SPSS is giving extremely low standard error estimates and extremely narrow confidence intervals.
I've tried fixing this by dividing every weight value by the mean weight. This gives plausible figures and an average weight of 1, but I'm not sure the resulting numbers are actually correct.
Is my approach sound? If not, what should I try?
I've been using the Explore command to find mean and standard error (among other things), in case it matters.
You do need to scale weights to the actual sample size, but only the procedures in the Complex Samples option are designed to account for sampling weights properly. The regular weight variable in SPSS Statistics is treated as a frequency weight.
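For a rough sanity check outside SPSS, you can normalise the weights to average 1 and approximate the standard error of a weighted mean via Kish's effective sample size; this is a simplified Python sketch with made-up data, not a substitute for the Complex Samples procedures:

import numpy as np

# Made-up survey values and population weights (e.g. one person ~ 28000 people).
y = np.array([12.0, 15.0, 9.0, 20.0, 14.0])
w = np.array([28000.0, 15000.0, 40000.0, 22000.0, 31000.0])

w = w / w.mean()                          # normalise so the weights average 1
mean = np.average(y, weights=w)

# Kish's effective sample size: (sum w)^2 / sum(w^2).
n_eff = w.sum() ** 2 / np.sum(w ** 2)

# Weighted variance and an approximate standard error of the mean.
var = np.average((y - mean) ** 2, weights=w)
se = np.sqrt(var / n_eff)
print(mean, n_eff, se)

This ignores design features such as strata and clusters, which Complex Samples handles properly, so treat it only as a plausibility check on the numbers SPSS gives you.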
The problem is as follows. When I do support vector machine training, suppose I have already performed cross-validation on 10000 training points with a Gaussian kernel and have obtained the best parameters C and \sigma. Now I have another 40000 new training points, and since I don't want to waste time on cross-validation, I stick with the original C and \sigma obtained from the first 10000 points and train on the entire 50000 points with those parameters. Is there any potentially major problem with this? It seems that for C and \sigma in some range the final test error wouldn't be that bad, so the above process seems okay.
There is one major pitfall with such an approach: both C and sigma are data-dependent. In particular, it can be shown that the optimal C strongly depends on the size of the training set. So once you make your training data 5 times bigger, even if it brings no "new" knowledge, you should still find a new C to get the exact same model as before. You can do such a procedure, but keep in mind that the best parameters for the smaller training set do not have to be the best for the bigger one (even though they sometimes still are).
To see the picture better: if this procedure were fully "ok", then why not fit C on even smaller data? 5 times smaller? 25 times smaller? Maybe on a single point per class? 10000 points may seem like "a lot", but it depends on the problem considered. In many real-life domains this is just a "regular" (biology) or even a "very small" (finance) dataset, so you won't be sure whether your procedure is fine for this particular problem until you test it.
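As a sketch of what re-tuning C and sigma on the enlarged set could look like with scikit-learn, where gamma = 1/(2*sigma^2) plays the role of sigma for the RBF kernel; the data and the parameter grid below are only illustrative:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Small stand-in for the enlarged training set (50000 points in the question).
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Re-run the search for C and gamma on the new, larger data.
grid = GridSearchCV(SVC(kernel='rbf'),
                    param_grid={'C': [0.1, 1, 10, 100],
                                'gamma': [1e-3, 1e-2, 1e-1]},
                    cv=3, n_jobs=-1)
grid.fit(X, y)

print(grid.best_params_)
best_model = grid.best_estimator_     # already refit on all of X with the best C, gamma

You can centre the grid around the C and sigma found on the 10000-point subset to keep the search cheap, but letting the search confirm (or move) them on the full data addresses the size dependence described above.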