What kind of Bonferroni adjustment to apply

I am currently working on sign predictability of certain non-linear models for various financial assets.
The following table shows the percentage of correct sign forecasts for 5 financial assets generated using 9 different models, where *, ** and *** indicate significance at the 10%, 5% and 1% levels, respectively, for two-tailed binomial tests with H0: p=0.5.
Now, my question is, should I or shouldn't I apply a Bonferroni adjustment?
If so, should I adjust for the number of models, number of assets or both?
I am in doubt because the columns are highly correlated: if one model performs well on one asset, it tends to perform well on the others as well.
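To make the three candidate adjustments concrete, here is a minimal sketch of the exact two-tailed binomial test against H0: p=0.5 and the Bonferroni thresholds for 9 models, 5 assets, or all 45 cells. The forecast counts (290 correct signs out of 500) and the helper names are hypothetical, chosen only for illustration. Bonferroni stays valid under dependence between tests, but it tends to be conservative when the tests are positively correlated, as described above.

```python
from math import comb


def binom_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value for H0: p = 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided
    p-value is twice the smaller tail probability, capped at 1.
    """
    k = min(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2.0 * tail)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests


n_models, n_assets = 9, 5   # dimensions of the table described in the question
alpha = 0.05

# Three candidate "families" of tests, depending on what is adjusted for.
thresholds = {
    "number of models (9 tests)": bonferroni_alpha(alpha, n_models),
    "number of assets (5 tests)": bonferroni_alpha(alpha, n_assets),
    "both (45 tests)": bonferroni_alpha(alpha, n_models * n_assets),
}

# Hypothetical cell: 290 correct sign forecasts out of 500.
p_value = binom_two_sided_p(successes=290, trials=500)

for family, threshold in thresholds.items():
    print(f"{family}: threshold={threshold:.5f}, significant={p_value <= threshold}")
```

Which family is the right one depends on the claim being made: a statement about a single asset across all models concerns 9 tests, while a statement about the whole table concerns 45.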

Related

2*2 repeated measures ANOVA with an unbalanced number of observations

Sorry for this very basic question. I've trawled through previous pages and cannot quite find a case that corresponds to our situation.
320 individuals rated two types of films. The rating was provided on a 1-11 scale. There are many films of each type. In short, the DV is a continuous variable.
20 individuals have a particular disease that we now consider of interest. We would like to examine the effect of the disease on the rating.
We conducted a 2-way repeated measures ANOVA, using 'situation type' as a within-subject factor and 'disease status' as a between-subject factor, using SPSS. The design is obviously unbalanced, with more observations in the healthy group. The data appeared to be normally distributed. Levene's test suggested equality of variance. Does that mean it is appropriate to use ANOVA for this analysis?

Testing significance of Strauss parameters in mppm model

I have a follow-up question from my previous post.
Upon creating mppm models like these:
Str <- hyperframe(str=with(simba, Strauss(mean(nndist(Points)))))
fit0 <- mppm(Points ~ group, simba)
fit1 <- mppm(Points ~ group, simba, interaction=Str,
             iformula = ~str + str:id)
Using anova.mppm to run a likelihood ratio test shows that the interaction is highly significant as a whole, but I would also like to:
test whether each individual id shows significant regularity.
test whether some groups of ids show significantly stronger inhibition than other groups, for example, whether ids 1-7 are significantly more regular than ids 8-10.
perform pairwise comparisons of regularity between different ids.
I am aware I could build separate ppm models for each id to test for significant regularity in each id, but I am not sure this is the best approach. Also, I do not think the "summary output" with the p-values for each Strauss interaction parameter can be used for pairwise comparisons other than to the reference level.
Any advice is greatly appreciated.
Thank you!
First let me explain that, for Gibbs models, the likelihood is intractable, so anova.mppm performs the adjusted composite likelihood ratio test, not the likelihood ratio test. However, you can essentially treat this as if it were the likelihood ratio test based on deviance differences.
whether each individual id shows significant regularity
I am aware I could build separate ppm models for each id to test for significant regularity in each id, but I am not sure this is the best approach.
This is appropriate. Use ppm to fit a Strauss model to an individual point pattern, and use anova.ppm to test whether the Strauss interaction is statistically significant.
whether some groups of ids show significantly stronger inhibition than other groups, for example, whether ids 1-7 are significantly more regular than ids 8-10.
Introduce a new categorical variable (factor) f, say, that separates the two groups that you want to compare. In your model, add the term f:str to the interaction formula; this gives you the alternative hypothesis. The null and alternative models are identical except that the alternative includes the term f:str in the interaction formula. Now apply anova.mppm. Like all analyses of variance, this performs a two-sided test. For the one-sided test, inspect the sign of the coefficient of f:str in the fitted alternative model. If it has the sign that you wanted, report it as significant at the same p-value. Otherwise, report it as non-significant.
perform pairwise comparisons of regularity between different ids.
This is not yet supported (in theory or in software).
[Congratulations, you have reached the boundary of existing methodology!]

Assumptions of the Chi-square test of independence

I want to use the chi-square test of independence to test the following two variables: student knowledge vs. course attendance.
The null hypothesis is: student knowledge and course attendance (X and Y) are independent.
Members in each student knowledge group: low (12), average (29), high (9).
The results show that there are two degrees of freedom, the chi-square statistic is 0.20, and the p-value is 0.90, so we cannot reject the null hypothesis. I added an image of my test.
I have some doubts regarding the following two issues: the student knowledge groups have unequal numbers of participants, and the number of participating students in each course is fewer than 10.
My question is: is this test suitable for my data?
If this test cannot be used for my data, which statistical test should I use instead?
Welcome to Stack Exchange. Using the chi-square test for independence can be an issue with small cell sizes (i.e. G3, course Y, which has a cell count of 2). This is because the test uses the chi-square distribution as an approximation to the distribution of the test statistic, and that approximation is poor when cell counts are small.
I would recommend Fisher's Exact Test. It's usually designated as a tool for small sample sizes, but it is still effective for large samples.
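As a rough illustration of the small-cell concern, the sketch below computes the expected counts under independence and flags cells below 5, following the common rule of thumb (which is stated in terms of expected rather than observed counts). The 3x2 table is hypothetical: only the row totals (12, 29, 9) come from the question, and the split across the two columns is invented because the original image is not available.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def small_expected_cells(table, threshold=5.0):
    """Return (row, column, expected) for every cell whose expected count is below threshold."""
    expected = expected_counts(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


# Hypothetical counts; rows are knowledge groups (low, average, high)
# with the row totals 12, 29 and 9 given in the question.
observed = [
    [9, 3],
    [22, 7],
    [7, 2],
]

flagged = small_expected_cells(observed)
share = len(flagged) / (len(observed) * len(observed[0]))

for i, j, e in flagged:
    print(f"cell ({i}, {j}): expected count {e:.2f}")
print(f"share of cells with expected count below 5: {share:.0%}")
```

When a sizeable share of cells is flagged like this, the chi-square approximation is usually considered unreliable, which is the situation in which an exact test is recommended above.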

Statistical method to compare urban vs rural matched siblings

I am writing a study protocol for my master's thesis. The study seeks to compare the rates of non-communicable diseases (NCDs) and their risk factors, and to determine the effects of rural-to-urban migration. Sibling pairs will be identified from a rural area. One of the siblings should have participated in the rural NCD survey which is currently ongoing in the area. The other sibling should have left the area and reported moving to a city. Data will be collected by completing a questionnaire on demographics, family history, medical history, diet, alcohol consumption, smoking and physical activity. This will be done for both the rural and urban sibling, with data on the amount of time spent in urban areas fur
The outcomes, which are binary (whether one has a condition or not), are: 1. diabetic, 2. hypertensive, 3. obese.
What statistical method can I use to compare the outcomes (stated above) between the two groups, considering that the siblings were matched (one urban sibling for every rural sibling)?
What statistical methods can also be used to explore associations between the amount of time spent in urban residence and the outcomes?
Given that your main aim is to compare the frequencies of two nominal distributions, a chi-square test seems to be the method of choice with regard to your first question. However, it should be mentioned that a chi-square test is in some sense "the smallest" test for detecting differences between samples. If you are studying medicine (or a related field), a chi-square test is fine because it is also frequently applied by researchers in this field. If you are studying psychology or sociology (or a related field), I'd advise highlighting the limitations of the test in the discussion section, since it mostly tests your distributions against the distributions expected by chance.
Regarding your second question, a logistic regression would be applicable since it allows binary variables both as independent variables (predictors) and as the dependent variable. However, if you have other interval-scaled variables (e.g. age, weight etc.) you could also use t-tests or ANOVAs to investigate differences in these variables with respect to the presence of specific diseases (i.e. is diabetic or not).
Overall, this matter strongly depends on what you mean by "association". Classically, "association" refers to correlations or linear regression (for which you need interval scaled variables on "both sides") but given your data structure, the aforementioned methods possess a better fit.
How you actually calculate these tests depends on the statistics software used.

Obtaining the Standard Error of Weighted Data in SPSS

I'm trying to find confidence intervals for the means of various variables in a database using SPSS, and I've run into a spot of trouble.
The data is weighted, because each of the people who was surveyed represents a different portion of the overall population. For example, one young man in our sample might represent 28000 young men in the general population. The problem is that SPSS seems to think that the young man's database entries each represent 28000 measurements when they actually just represent one, and this makes SPSS think we have much more data than we actually do. As a result SPSS is giving very very low standard error estimates and very very narrow confidence intervals.
I've tried fixing this by dividing every weight value by the mean weight. This gives plausible figures and an average weight of 1, but I'm not sure the resulting numbers are actually correct.
Is my approach sound? If not, what should I try?
I've been using the Explore command to find mean and standard error (among other things), in case it matters.
You do need to scale weights to the actual sample size, but only the procedures in the Complex Samples option are designed to account for sampling weights properly. The regular weight variable in Statistics is treated as a frequency weight.
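The rescaling described in the question (dividing every weight by the mean weight) can be sketched as follows. The values and weights are hypothetical. The sketch only shows that the rescaled weights sum to the actual sample size and leave the weighted mean unchanged. It does not by itself produce design-based standard errors, which, as noted above, need the Complex Samples procedures.

```python
def normalize_weights(weights):
    """Divide each weight by the mean weight so the weights sum to the sample size."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


def weighted_mean(values, weights):
    """Weighted mean; invariant to multiplying all weights by a constant."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical survey: four respondents, each representing a different
# number of people in the population.
values = [3.0, 5.0, 4.0, 6.0]
weights = [28000.0, 12000.0, 20000.0, 40000.0]

scaled = normalize_weights(weights)

print("sum of raw weights:   ", sum(weights))   # population size represented
print("sum of scaled weights:", sum(scaled))    # equals the number of respondents
print("weighted mean (raw):   ", weighted_mean(values, weights))
print("weighted mean (scaled):", weighted_mean(values, scaled))
```

Because the weighted mean is unaffected by the scale of the weights, the rescaling changes only the apparent sample size that a frequency-weighted procedure sees.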
