I use two methods, each of them yielding a p-value:
p1
p2
Given that these methods are different and complementary, I would combine their p-values in a geometric mean:
p=sqrt( p1 * p2 )
although I am pretty sure there are much more rigorous ways to achieve that combination, I aim first at an intuitive and easy way.
Nevertheless, does such a mean of p-values make sense?
I precise, that I am interested in the ranking of events relative to their p-values, more than in their absolute values.
Cheers,
Xavier
I think you can answer this for yourself if you think it through. A p-value is p(data | something) where something is some combination of model, parameters, and hypothesis. What is sqrt(p(data | something) p(data | something else)) ? Is there a general law of probability which will enable you to derive some other expression from that?
If the methods are very independent, then the Fisher's method offers a simple way to combine p values:
http://mikelove.wordpress.com/2012/03/12/combining-p-values-fishers-method-sum-of-p-values-binomial/
which is somehow related to the geometrical mean.
Related
I'm working on a simple project in which I'm trying to describe the relationship between two positively correlated variables and determine if that relationship is changing over time, and if so, to what degree. I feel like this is something people probably do pretty often, but maybe I'm just not using the correct terminology because google isn't helping me very much.
I've plotted the variables on a scatter plot and know how to determine the correlation coefficient and plot a linear regression. I thought this may be a good first step because the linear regression tells me what I can expect y to be for a given x value. This means I can quantify how "far away" each data point is from the regression line (I think this is called the squared error?). Now I'd like to see what the error looks like for each data point over time. For example, if I have 100 data points and the most recent 20 are much farther away from where the regression line/function says it should be, maybe I could say that the relationship between the variables is showing signs of changing? Does that make any sense at all or am I way off base?
I have a suspicion that there is a much simpler way to do this and/or that I'm going about it in the wrong way. I'd appreciate any guidance you can offer!
I can suggest two strands of literature that study changing relationships over time. Typing these names into google should provide you with a large number of references so I'll stick to more concise descriptions.
(1) Structural break modelling. As the name suggest, this assumes that there has been a sudden change in parameters (e.g. a correlation coefficient). This is applicable if there has been a policy change, change in measurement device, etc. The estimation approach is indeed very close to the procedure you suggest. Namely, you would estimate the squared error (or some other measure of fit) on the full sample and the two sub-samples (before and after break). If the gains in fit are large when dividing the sample, then you would favour the model with the break and use different coefficients before and after the structural change.
(2) Time-varying coefficient models. This approach is more subtle as coefficients will now evolve more slowly over time. These changes can originate from the time evolution of some observed variables or they can be modeled through some unobserved latent process. In the latter case the estimation typically involves the use of state-space models (and thus the Kalman filter or some more advanced filtering techniques).
I hope this helps!
I have a two 3D variables for a each time step (so I have N 3d matrix var(Nx,Ny,Nz), for each variables). I want to construct the two point statistics but I guess I'm doing something wrong.
Two-point statistics formula, where x_r is the reference point and x is the independent variable
I know that the theoretical formulation of a two-point cross correlation is the one written above.
Let's for sake of simplicity ignore the normalization, so I'm focusing on the numerator, that is the part I'm struggling with.
So, my two variables are two 3D matrix, with the following notation phi(x,y,z) = phi(i,j,k), same for psi.
My aim is to compute a 3d correlation, so given a certain reference point Reference_Point = (xr,yr,zr), but I guess I'm doing something wrong. I'm trying that on MATLAB, but my results are not accurate, and by doing some researches online it does seem that I should do convolutions or fft, but I don't find any theoretical framework that explains how to do that and why the formulation above in practices should be implemented by the use of a conv or fft. Moreover I would like to implement my cross-correlation in the spatial domain and not in the frequency domain, and with the convolution I don't understand how to choose the reference point.
Thank you so much in advance for reply
Yesterday, I posted a question about general concept of SVM Primal Form Implementation:
Support Vector Machine Primal Form Implementation
and "lejlot" helped me out to understand that what I am solving is a QP problem.
But I still don't understand how my objective function can be expressed as QP problem
(http://en.wikipedia.org/wiki/Support_vector_machine#Primal_form)
Also I don't understand how QP and Quasi-Newton method are related
All I know is Quasi-Newton method will SOLVE my QP problem which supposedly formulated from
my objective function (which I don't see the connection)
Can anyone walk me through this please??
For SVM's, the goal is to find a classifier. This problem can be expressed in terms of a function that you are trying to minimize.
Let's first consider the Newton iteration. Newton iteration is a numerical method to find a solution to a problem of the form f(x) = 0.
Instead of solving it analytically we can solve it numerically by the follwing iteration:
x^k+1 = x^k - DF(x)^-1 * F(x)
Here x^k+1 is the k+1th iterate, DF(x)^-1 is the inverse of the Jacobian of F(x) and x is the kth x in the iteration.
This update runs as long as we make progress in terms of step size (delta x) or if our function value approaches 0 to a good degree. The termination criteria can be chosen accordingly.
Now consider solving the problem f'(x)=0. If we formulate the Newton iteration for that, we get
x^k+1 = x - HF(x)^-1 * DF(x)
Where HF(x)^-1 is the inverse of the Hessian matrix and DF(x) the gradient of the function F. Note that we are talking about n-dimensional Analysis and can not just take the quotient. We have to take the inverse of the matrix.
Now we are facing some problems: In each step, we have to calculate the Hessian matrix for the updated x, which is very inefficient. We also have to solve a system of linear equations, namely y = HF(x)^-1 * DF(x) or HF(x)*y = DF(x).
So instead of computing the Hessian in every iteration, we start off with an initial guess of the Hessian (maybe the identity matrix) and perform rank one updates after each iterate. For the exact formulas have a look here.
So how does this link to SVM's?
When you look at the function you are trying to minimize, you can formulate a primal problem, which you can the reformulate as a Dual Lagrangian problem which is convex and can be solved numerically. It is all well documented in the article so I will not try to express the formulas in a less good quality.
But the idea is the following: If you have a dual problem, you can solve it numerically. There are multiple solvers available. In the link you posted, they recommend coordinate descent, which solves the optimization problem for one coordinate at a time. Or you can use subgradient descent. Another method is to use L-BFGS. It is really well explained in this paper.
Another popular algorithm for solving problems like that is ADMM (alternating direction method of multipliers). In order to use ADMM you would have to reformulate the given problem into an equal problem that would give the same solution, but has the correct format for ADMM. For that I suggest reading Boyds script on ADMM.
In general: First, understand the function you are trying to minimize and then choose the numerical method that is most suited. In this case, subgradient descent and coordinate descent are most suited, as stated in the Wikipedia link.
Let's say, I have two random variables,x and y, both of them have n observations. I've used a forecasting method to estimate xn+1 and yn+1, and I also got the standard error for both xn+1 and yn+1. So my question is that what the formula would be if I want to know the standard error of xn+1 + yn+1, xn+1 - yn+1, (xn+1)*(yn+1) and (xn+1)/(yn+1), so that I can calculate the prediction interval for the 4 combinations. Any thought would be much appreciated. Thanks.
Well, the general topic you need to look at is called "change of variables" in mathematical statistics.
The density function for a sum of random variables is the convolution of the individual densities (but only if the variables are independent). Likewise for the difference. In special cases, that convolution is easy to find. For example, for Gaussian variables the density of the sum is also a Gaussian.
For product and quotient, there aren't any simple results, except in special cases. For those, you might as well compute the result directly, maybe by sampling or other numerical methods.
If your variables x and y are not independent, that complicates the situation. But even then, I think sampling is straightforward.
in the J programming language,
-: i. 5
the above function computes the halves of all integers in [0,4]. Now let's say I'd like to re-write the -: function, just for the fun of it. My best guess so far was
]&%.2
but that doesn't seem to cut it. How do you do it?
%&2 NB. divide by two
0.5&* NB. multiply by one half
Note that ] % 2: would also work, but to ensure proper grammar you would either want to use that as the definition of a name, or you would want to put the expression in parenthesis.
I saw you were using %. probably because you were dividing a matrix and thought you needed to do a "matrix divide".
The matrix divide and matrix inverse they are talking about there is for matrix algebra, where you have a list of, well, essentially polynomials, and you want to do transformations on the polynomials all at once, so as to solve the equations. One of the things you can do really easily in J is matrix algebra, there are builtins for matrix divide and for inverting a matrix (as you have seen) and in the phrases section, there are short phrases for doing all of the typical matrix transformations. Taking the determinant, for example.
But when you are simply dividing a vector by a scalar to get a vector, or you are dividing a matrix by the corresponding elements of another matrix, well, that is just the % division symbol.
If you want to try and understand this, look at euler problem 101 (http://projecteuler.net/problem=101) and then google curve fitting on the Jsoftware.com site. Creating the matrixes from the observations, and the basic matrixes as shown allow you to solve for ax^2+bx+c = y where you have x and y and you want to determine a, b, and c. Just remember to use extended arithmetic for everything, as the resultant equations are very good but not perfect unless you do, and to solve the equation you need perfect equations.
Just a thought, unless you want to play with Matrix Algebra, you might not care.