Testing a hypothesis for significance - statistics

How can I test the hypothesis that the execution time of an
algorithm is not exponential with respect to the size of the data?
For example, I have the sample:
[n time(s)] = {[02 0.36], [03 1.15], [04 2.66], [05 5.48], [06 6.54], [07 11.22], [08 12.87], [09 16.94], [10 17.59]}
where n is the size of the data. I want to show, with statistical significance, that the time does not grow exponentially with respect to the data.
What should the hypotheses H0 and H1 be?
Should I use ANOVA or an F-test? How do I apply it?
Thanks.

Note: this should be a comment, not an answer, but it got too long.
You probably need to learn a bit more about the rationale behind hypothesis testing. I suggest that you start with some online material such as this: http://stattrek.com/hypothesis-test/hypothesis-testing.aspx, but you may also need to look at some books on statistics. Your question, as it is now, cannot be answered because you can never "use statistics to prove something". Statistics will only tell you what is probable. So you cannot prove that the execution time does not grow exponentially. From your sample data, it really does not look exponential. As a matter of fact, it looks linear, so the growth is probably linear.
The R code for plotting the data together with a fitted linear model is:
> n <- 2:10
> time <-c(0.36, 1.15, 2.66, 5.48, 6.54, 11.22, 12.87, 16.94, 17.59)
> model.linear <- lm(time ~n) # LM = Linear Model, time ~ a*n + b
> plot(time ~ n)
> lines(predict(model.linear)~n, col=2)
Do you need statistics to show that this linear model is a good fit? I hope you don't.
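One rough way to compare the two growth models on the sample is sketched below, in Python with NumPy rather than R (a swap made only for this illustration). It fits a straight line to time, fits a straight line to log(time) (which corresponds to an exponential model), and compares the residual sums of squares on the original scale. The names `rss`, `rss_linear` and `rss_exponential` are made up for this sketch, and the comparison is descriptive, not a formal hypothesis test.

```python
import numpy as np

n = np.arange(2, 11)
times = np.array([0.36, 1.15, 2.66, 5.48, 6.54, 11.22, 12.87, 16.94, 17.59])

def rss(observed, predicted):
    """Residual sum of squares between observations and model predictions."""
    return float(np.sum((observed - predicted) ** 2))

# Linear model: time ~ a*n + b
a_lin, b_lin = np.polyfit(n, times, 1)
rss_linear = rss(times, a_lin * n + b_lin)

# Exponential model: time ~ exp(a*n + b), fitted as a line through log(time)
a_exp, b_exp = np.polyfit(n, np.log(times), 1)
rss_exponential = rss(times, np.exp(a_exp * n + b_exp))

print(f"linear RSS = {rss_linear:.2f}, exponential RSS = {rss_exponential:.2f}")
```

A much smaller residual for the linear model supports the visual impression, but it only says which of the two curves describes these nine points better.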


Am I reading the notation of a Heaviside step function correctly for the leaky integrate-fire-or-burst neuron model?

As a passion project I'm recreating a neuronal model from XJ Wang's lab at NYU. The paper is Wei, W., & Wang, X. J. (2016). Inhibitory control in the cortico-basal ganglia-thalamocortical loop: complex regulation and interplay with memory and decision processes. Neuron, 92(5), 1093-1105.
The main problem I'm having is interpreting the equation for calculating the differential of the neuron's membrane voltage. They have included a bursting neuronal model for cells in the basal ganglia and subthalamic nucleus. The differential equation for membrane voltage in these regions incorporates a hyperpolarization rebound which results in bursts and tonic spiking. The equation is on page 2 of a prior paper which uses basically the exact same model. I have linked to the paper below and I have provided an image link to the exact passage as well.
http://www.cns.nyu.edu/wanglab/publications/pdf/wei.jns2015.pdf
This is the equation I'm having trouble reading. Don't worry about Isyn; it's the input current from the synapses.
The equation is taken from this paper: https://www.physiology.org/doi/pdf/10.1152/jn.2000.83.1.588
Obviously the equation will need to be discretized so I can run it with numpy, but I'll ignore that for now as it will be relatively easy to do. The middle term with all the H's is what's giving me trouble. As I understand it, I should be running code which does the following:
gt * h * H(V-Vh) * (V-Vt)
Where H(V-Vh) is the Heaviside step function, V is the membrane voltage at the prior timestep, Vh = -60mV and Vt = 120mV. gt is a conductance efficacy constant in nanosiemens. I think the correct way to interpret this for Python is...
gt * h * heavyside(-60, 0.5)*(V-120)
But I'm not 100% sure I'm reading the notation correctly. Could someone please confirm I've read it as it is intended?
Secondly h is the deactivation term which gives rise to bursting as described in the final paragraph on page 2 of Smith et al., 2000 (the second pdf I've linked to). I understand the differential equations that govern the evolution of h well enough but what is the value of h? In Smith et al. 2000 the authors say that h relaxes to zero with a time constant of 20ms and it relaxes to unity with a time constant of 100ms. What value is h relaxing from and what does it mean to relax to unity?
For you, x1 (the first argument of numpy.heaviside) is V-Vh; you are comparing that difference to zero. You might try writing your own version of the Heaviside function to deepen understanding, and then move back to the numpy version if you need it for speed or compatibility. The wordy pseudo-code version would be something like,
if (V<Vh): return(0); else: return(1);
You could probably just write (V>=Vh) in your code as Python will treat the boolean as 1 if true and 0 if false.
This ignores the special case V==Vh in the complete definition of Heaviside (the second argument of numpy.heaviside sets the value returned there), but for most practical work with real values (even discretized in a computer) that case is unlikely to be worth worrying about, and you could easily add it in.
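The two readings above can be put side by side in a short Python sketch. It assumes Vh = -60 mV as in the question; the function names (`heaviside_step`, `heaviside_bool`, `t_type_term`) are invented for this illustration, and `t_type_term` simply transcribes the expression gt * h * H(V-Vh) * (V-Vt) from the question without checking it against the paper.

```python
import numpy as np

V_H = -60.0  # mV, the threshold Vh quoted in the question

def heaviside_step(v, v_h=V_H, at_zero=1.0):
    """H(v - v_h) via numpy: 0 below v_h, 1 above, at_zero exactly at v_h."""
    return np.heaviside(np.asarray(v, dtype=float) - v_h, at_zero)

def heaviside_bool(v, v_h=V_H):
    """Hand-written equivalent: the boolean (v >= v_h) cast to 0.0 or 1.0."""
    return (np.asarray(v, dtype=float) >= v_h).astype(float)

def t_type_term(v, h, g_t, v_t):
    """The middle term as written in the question: gt * h * H(V-Vh) * (V-Vt)."""
    return g_t * h * heaviside_step(v) * (np.asarray(v, dtype=float) - v_t)
```

Note that the voltage goes into the first argument as the difference V - Vh; the second argument only matters when that difference is exactly zero.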

monitor progress in a Monte Carlo algorithm

I am making a Monte Carlo algorithm and I want to display its progress in a progress bar (values from 0% to 100%)
My initial thought is to compare the standard deviation generated by the algorithm with the solution tolerance specified.
like
progress = 100 * specified_tolerance / standard_deviation
However I wonder if there is something better, or if my approach has some pitfall.
[EDIT]
A sample picture of the simulations I'm making:
Thanks
Well, the problem with your solution, I think, is that the standard deviation goes down as the inverse square root of N (the number of events generated). Assuming that N is proportional to simulation time, your scale would behave like
progress = C * sqrt(t)
which is quite unnatural, if you ask me.
I would redo the scale to be linear, which means working with sigma squared (the variance).
UPDATE
Thinking about it, I would do both. Typically you have a green/blue bar and a number displaying the percentage on top of it. I would separate the two indicators:
make the progress bar linear (based on the variance), but base the percent display on the standard deviation, so that it grows like sqrt(t). It would look a bit weird, but might be the best of both worlds.
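A minimal Python sketch of the two indicators, assuming `std_dev` is the current standard deviation of the Monte Carlo estimate and `tolerance` is the target value for it. The function names are hypothetical, and both values are capped once the tolerance is reached.

```python
def bar_fraction(tolerance, std_dev):
    """Progress bar value in [0, 1], based on the variance ratio.

    Since std_dev shrinks like 1/sqrt(N), (tolerance/std_dev)^2 grows
    roughly linearly with the number of samples N.
    """
    if std_dev <= 0.0:
        return 1.0
    return min(1.0, (tolerance / std_dev) ** 2)

def label_percent(tolerance, std_dev):
    """Percent label based on the standard deviation ratio (grows like sqrt(N))."""
    if std_dev <= 0.0:
        return 100.0
    return min(100.0, 100.0 * (tolerance / std_dev))
```

For example, with a tolerance of 0.1 and a current standard deviation of 0.2, the bar would be a quarter full while the label would read 50%.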

Integrating Power pdf to get energy pdf?

I'm trying to work out how to solve what seems like a simple problem, but I can't convince myself of the correct method.
I have time-series data that represents the pdf of a Power output (P), varying over time, also the cdf and quantile functions - f(P,t), F(P,t) and q(p,t). I need to find the pdf, cdf and quantile function for the Energy in a given time interval [t1,t2] from this data - say e(), E(), and qe().
Clearly energy is the integral of the power over [t1,t2], but how do I best calculate e, E and qe ?
My best guess is that since q(p,t) is a power, I should generate qe by integrating q over the time interval, and then calculate the other distributions from that.
Is it as simple as that, or do I need to get to grips with stochastic calculus ?
Additional details for clarification
The data we're getting is a time-series of 'black-box' forecasts for f(P), F(P),q(P) for each time t, where P is the instantaneous power and there will be around 100 forecasts for the interval I'd like to get the e(P) for. By 'Black-box' I mean that there will be a function I can call to evaluate f,F,q for P, but I don't know the underlying distribution.
The black-box functions are almost certainly interpolating output data from the model that produces the power forecasts, but we don't have access to that. I would guess that it won't be anything straightforward, since it comes from a chain of non-linear transformations. It's actually wind farm production forecasts: the wind speeds may be normally distributed, but multiple terrain and turbine transformations will change that.
Further clarification
(I've edited the original text to remove confusing variable names in the energy distribution functions.)
The forecasts will be provided as follows:
The interval [t1,t2] that we need e, E and qe for is sub-divided into 100 (say) sub-intervals k=1...100. For each k we are given a distinct f(P), call them f_k(P). We need to calculate the energy distributions for the interval from this set of f_k(P).
Thanks for the clarification. From what I can tell, you don't have enough information to solve this problem properly. Specifically, you need to have some estimate of the dependence of power from one time step to the next. The longer the time step, the less the dependence; if the steps are long enough, power might be approximately independent from one step to the next, which would be good news because that would simplify the analysis quite a bit. So, how long are the time steps? An hour? A minute? A day?
If the time steps are long enough to be independent, the distribution of energy is the distribution of a sum of 100 independent variables, which will be very nearly normally distributed by the central limit theorem. It's easy to work out the mean and variance of the total energy in this case: the means add, and the variances add.
Otherwise, the distribution will be some more complicated result. My guess is that the variance as estimated by the independent-steps approach will be off: if power in consecutive steps is positively correlated, the covariance terms add to the variance of the sum, so the actual variance would be somewhat larger.
From what you say, you don't have any information about temporal dependence. Maybe you can find or derive from some other source or sources an estimate of the autocorrelation function -- I wouldn't be surprised if that question has already been studied for wind power. I also wouldn't be surprised if a general version of this problem has already been studied -- perhaps you can search for something like "distribution of a sum of autocorrelated variables." You might get some interest in that question on stats.stackexchange.com.
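The independent-steps case can be sketched in Python as follows. This is only an illustration under strong assumptions: the steps are independent, each step has the same length `dt_hours`, and the mean and variance of power in each step have already been computed from the black-box f_k(P). The function name and its arguments are hypothetical.

```python
from statistics import NormalDist

def energy_distribution(step_means, step_variances, dt_hours):
    """Normal approximation to total energy over equally long, independent steps.

    Energy is the sum of P_k * dt, so the mean is dt * sum(means) and,
    under independence, the variance is dt^2 * sum(variances).
    """
    mean_energy = dt_hours * sum(step_means)
    var_energy = dt_hours ** 2 * sum(step_variances)
    return NormalDist(mu=mean_energy, sigma=var_energy ** 0.5)

# Example: 100 half-hour steps, each with mean power 10 and variance 4
dist = energy_distribution([10.0] * 100, [4.0] * 100, dt_hours=0.5)
# dist.pdf, dist.cdf and dist.inv_cdf then play the roles of e, E and qe
```

If the steps are correlated, the variance line is the part that changes, because covariance terms between steps would have to be added.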

How do I prove that my derived equation and the Monte Carlo simulation are equivalent?

I have derived and implemented an equation for an expected value.
To show that my code is free of errors, I have run the Monte Carlo
computation a number of times to show that it converges to the same
value as the equation that I derived.
As I have the data now, how can I visualize this?
Is this even the correct test to do?
Can I give a measure of how sure I am that the results are correct?
It's not clear what you mean by visualising the data, but here are some ideas.
If your Monte Carlo simulation is correct, then the Monte Carlo estimator for your quantity is just the mean of the samples. The variance of your estimator (how far away from the 'correct' value the average value will be) will scale inversely proportional to the number of samples you take: so long as you take enough, you'll get arbitrarily close to the correct answer. So, use a moderate (1000 should suffice if it's univariate) number of samples, and look at the average. If this doesn't agree with your theoretical expectation, then you have an error somewhere, in one of your estimates.
You can also use a histogram of your samples, again if they're one-dimensional. The distribution of samples in the histogram should match the theoretical distribution you're taking the expectation of.
If you know the variance in the same way as you know the expectation, you can also look at the sample variance (the mean squared difference between the sample and the expectation), and check that this matches as well.
EDIT: to put something more 'formal' in the answer!
If M(x) is your Monte Carlo estimator for E[X], then as n -> inf, abs(M(x) - E[X]) -> 0. The variance of M(x) is inversely proportional to n, but exactly what it is will depend on what M is an estimator for. You could construct a specific test for this based on the mean and variance of your samples to see that what you've done makes sense. Every 100 iterations, you could compute the mean of your samples, and take the difference between this and your theoretical E[X]. If this shrinks overall as n grows (it fluctuates, so it need not decrease at every checkpoint), you're probably error free. If not, you have issues either in your theoretical estimate or your Monte Carlo estimator.
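The checkpoint idea can be sketched in Python with NumPy. The uniform(0, 1) samples with E[X] = 0.5 are a stand-in chosen for this illustration, not the asker's simulation, and `running_errors` is a made-up helper name.

```python
import numpy as np

def running_errors(samples, expected, every=100):
    """Absolute error of the running mean, evaluated every `every` samples."""
    samples = np.asarray(samples, dtype=float)
    counts = np.arange(every, samples.size + 1, every)
    running_means = np.cumsum(samples)[counts - 1] / counts
    return counts, np.abs(running_means - expected)

# Stand-in experiment: uniform(0, 1) draws, whose expectation is 0.5
rng = np.random.default_rng(0)
draws = rng.uniform(0.0, 1.0, size=10_000)
counts, errors = running_errors(draws, expected=0.5)
```

Plotting `errors` against `counts` on log-log axes is one way to visualize the convergence; a correct estimator should show an overall downward trend with slope near -1/2.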
Why not just do a simple t-test? From your theoretical equation, you have the true mean mu_0, and your simulator has a mean mu_1. Note that we can't calculate mu_1; we can only estimate it using the sample mean. So our hypotheses are:
H_0: mu_0 = mu_1 and H_1: mu_0 does not equal mu_1
The test statistic is the usual one-sample test statistic, i.e.
T = (mu_0 - x)/(s/sqrt(n))
where
mu_0 is the value from your equation
x is the average from your simulator
s is the standard deviation
n is the number of values used to calculate the mean.
In your case, n is going to be large, so this is equivalent to a Normal test. We reject H_0 when T falls outside (-3, 3). This corresponds to a two-sided p-value of about 0.003, which is below 0.01.
A couple of comments:
You can't "prove" that the means are equal.
You mentioned that you want to test a number of values. One possible solution is to implement a Bonferroni-type correction. Basically, you reduce your significance threshold to alpha/N, where alpha is the threshold for a single test and N is the number of tests you are running.
Make your sample size as large as possible. Since we don't have any idea about the variability in your Monte Carlo simulation it's impossible to say use n=....
The statement that T outside (-3, 3) gives a p-value < 0.01 just comes from the Normal distribution.
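A small Python sketch of the test statistic above, using only the standard library. It uses the Normal approximation for the p-value, which assumes n is large as stated; `one_sample_t` is a name invented for this example.

```python
import math
import statistics

def one_sample_t(samples, mu_0):
    """Return (T, two-sided p-value) for H_0: the simulator mean equals mu_0.

    T = (mu_0 - x) / (s / sqrt(n)); the p-value uses the Normal
    approximation, which is reasonable only for large n.
    """
    n = len(samples)
    x_bar = statistics.fmean(samples)
    s = statistics.stdev(samples)  # sample standard deviation
    t = (mu_0 - x_bar) / (s / math.sqrt(n))
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p
```

With several tests and a Bonferroni-type correction, each p-value returned here would be compared against the reduced threshold instead of the single-test one.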

How do I efficiently estimate a probability based on a small amount of evidence?

I've been trying to find an answer to this for months (to be used in a machine learning application), it doesn't seem like it should be a terribly hard problem, but I'm a software engineer, and math was never one of my strengths.
Here is the scenario:
I have a (possibly) unevenly weighted coin and I want to figure out the probability of it coming up heads. I know that coins from the same box that this one came from have an average probability of p, and I also know the standard deviation of these probabilities (call it s).
(If other summary properties of the probabilities of other coins aside from their mean and stddev would be useful, I can probably get them too.)
I toss the coin n times, and it comes up heads h times.
The naive approach is that the probability is just h/n - but if n is small this is unlikely to be accurate.
Is there a computationally efficient way (ie. doesn't involve very very large or very very small numbers) to take p and s into consideration to come up with a more accurate probability estimate, even when n is small?
I'd appreciate it if any answers could use pseudocode rather than mathematical notation since I find most mathematical notation to be impenetrable ;-)
Other answers:
There are some other answers on SO that are similar, but the answers provided are unsatisfactory. For example this is not computationally efficient because it quickly involves numbers way smaller than can be represented even in double-precision floats. And this one turned out to be incorrect.
Unfortunately you can't do machine learning without knowing some basic math---it's like asking somebody for help in programming but not wanting to know about "variables" , "subroutines" and all that if-then stuff.
The better way to do this is called Bayesian integration, but there is a simpler approximation called "maximum a posteriori" (MAP). It's pretty much like the usual thinking except you can put in the prior distribution.
Fancy words, but you may ask, well, where did the h/(h+t) formula come from? Of course it's obvious, but it turns out that it is the answer that you get when you have "no prior". And the method below is the next level of sophistication up, when you add a prior. Going to Bayesian integration would be the next one, but that's harder and perhaps unnecessary.
As I understand it the problem is two fold: first you draw a coin from the bag of coins. This coin has a "headsiness" called theta, so that it gives a head theta fraction of the flips. But the theta for this coin comes from the master distribution which I guess I assume is Gaussian with mean P and standard deviation S.
What you do next is to write down the total unnormalized probability (called likelihood) of seeing the whole shebang, all the data: (h heads, t tails)
L = (theta)^h * (1-theta)^t * Gaussian(theta; P, S).
Gaussian(theta; P, S) = exp( -(theta-P)^2/(2*S^2) ) / sqrt(2*Pi*S^2)
This is the meaning of "first draw 1 value of theta from the Gaussian" and then draw h heads and t tails from a coin using that theta.
The MAP principle says, if you don't know theta, find the value which maximizes L given the data that you do know. You do that with calculus. The trick to make it easy is that you take logarithms first. Define LL = log(L). Wherever L is maximized, then LL will be too.
so
LL = h*log(theta) + t*log(1-theta) - (theta-P)^2 / (2*S^2) - 1/2 * log(2*pi*S^2)
By calculus to look for extrema you find the value of theta such that dLL/dtheta = 0.
Since the last term with the log has no theta in it you can ignore it.
dLL/dtheta = (h/theta) + (P-theta)/S^2 - (t/(1-theta)) = 0.
If you can solve this equation for theta you will get an answer, the MAP estimate for theta given the number of heads h and the number of tails t.
If you want a fast approximation, try doing one step of Newton's method, where you start with your proposed theta at the obvious (called maximum likelihood) estimate of theta = h/(h+t).
And where does that 'obvious' estimate come from? If you do the stuff above but don't put in the Gaussian prior: h/theta - t/(1-theta) = 0 you'll come up with theta = h/(h+t).
If your prior probabilities are really small, as is often the case, instead of near 0.5, then a Gaussian prior on theta is probably inappropriate, as it predicts some weight with negative probabilities, clearly wrong. More appropriate is a Gaussian prior on log theta ('lognormal distribution'). Plug it in the same way and work through the calculus.
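Since pseudocode was requested, here is a Python sketch that solves dLL/dtheta = 0 for the Gaussian-prior case. It swaps the single Newton step suggested above for plain bisection, because the left-hand side is strictly decreasing in theta on (0, 1), so bisection cannot overshoot and also copes with h = 0 or t = 0. The function name, `eps` and `iterations` are choices made for this illustration.

```python
def map_estimate(h, t, p, s, eps=1e-9, iterations=100):
    """MAP estimate of theta for h heads, t tails and a Gaussian(p, s) prior.

    Finds the root of h/theta + (p - theta)/s^2 - t/(1 - theta) by bisection.
    Only small sums and ratios are involved, never tiny products.
    """
    def score(theta):
        return h / theta + (p - theta) / s ** 2 - t / (1.0 - theta)

    lo, hi = eps, 1.0 - eps
    if score(lo) <= 0.0:   # maximum sits at (or below) the lower edge
        return lo
    if score(hi) >= 0.0:   # maximum sits at (or above) the upper edge
        return hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `map_estimate(3, 0, 0.5, 0.1)` lands a little above 0.55, between the prior mean 0.5 and the naive estimate of 1.0.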
You can use p as a prior on your estimated probability. This is basically the same as doing pseudocount smoothing. I.e., use
(h + c * p) / (n + c)
as your estimate. When h and n are large compared to c, this approaches h / n. When h and n are small compared to c, it is approximately c * p / c = p. The choice of c is up to you. You can base it on s, but in the end you have to decide how small is too small.
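One possible way to base c on s, sketched in Python: if the box of coins is assumed to follow a Beta distribution with mean p and standard deviation s (an assumption added here, not something stated in the question), then matching moments gives c = p*(1-p)/s^2 - 1, and the pseudocount formula is the posterior mean. Both function names are hypothetical.

```python
def pseudocount_from_std(p, s):
    """Pseudocount c for a Beta prior with mean p and standard deviation s."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if s <= 0.0 or s * s >= p * (1.0 - p):
        raise ValueError("s must satisfy 0 < s^2 < p*(1-p)")
    return p * (1.0 - p) / (s * s) - 1.0

def smoothed_estimate(h, n, p, c):
    """Pseudocount-smoothed probability of heads: (h + c*p) / (n + c)."""
    return (h + c * p) / (n + c)
```

For instance, p = 0.5 and s = 0.1 give c = 24, so 3 heads in 3 tosses would be estimated as (3 + 12) / (3 + 24), about 0.56, instead of 1.0.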
You don't have nearly enough info in this question.
How many coins are in the box? If it's two, then in some scenarios (for example one coin is always heads, the other always tails) knowing p and s would be useful. If it's more than a few, and especially if only some of the coins are weighted, and only slightly, then it is not useful.
What is a small n? 2? 5? 10? 100? What is the probability of a weighted coin coming up heads/tail? 100/0, 60/40, 50.00001/49.99999? How is the weighting distributed? Is every coin one of 2 possible weightings? Do they follow a bell curve? etc.
It boils down to this: the differences between a weighted/unweighted coin, the distribution of weighted coins, and the number of coins in your box will all decide what n has to be for you to solve this with high confidence.
The name for what you're trying to do is a Bernoulli trial. Knowing the name should be helpful in finding better resources.
Response to comment:
If you have differences in p that small, you are going to have to do a lot of trials and there's no getting around it.
Assuming a uniform distribution of bias, p will still be 0.5 and all standard deviation will tell you is that at least some of the coins have a minor bias.
How many tosses, again, will be determined under these circumstances by the weighting of the coins. Even with 500 tosses, you won't get a strong confidence (about 2/3) detecting a .51/.49 split.
In general, what you are looking for is Maximum Likelihood Estimation. Wolfram Demonstration Project has an illustration of estimating the probability of a coin landing heads, given a sample of tosses.
Well I'm no math man, but I think the simple Bayesian approach is intuitive and broadly applicable enough to put a little thought into it. Others above have already suggested this, but perhaps if you're like me you would prefer more verbosity.
In this lingo, you have a set of mutually-exclusive hypotheses, H, and some data D, and you want to find the (posterior) probabilities that each hypothesis Hi is correct given the data. Presumably you would choose the hypothesis that had the largest posterior probability (the MAP as noted above), if you had to choose one. As Matt notes above, what distinguishes the Bayesian approach from only maximum likelihood (finding the H that maximizes Pr(D|H)) is that you also have some PRIOR info regarding which hypotheses are most likely, and you want to incorporate these priors.
So you have from basic probability Pr(H|D) = Pr(D|H)*Pr(H)/Pr(D). You can estimate these Pr(H|D) numerically by creating a series of discrete probabilities Hi for each hypothesis you wish to test, e.g. [0.0, 0.05, 0.1 ... 0.95, 1.0], and then determining your prior Pr(Hi) for each Hi -- above it is assumed you have a normal distribution of priors, and if that is acceptable you could use the mean and stdev to get each Pr(Hi) -- or use another distribution if you prefer. With coin tosses the Pr(D|Hi) is of course determined by the binomial, using the observed number of successes in n trials and the particular Hi being tested. The denominator Pr(D) may seem daunting, but we assume that we have covered all the bases with our hypotheses, so that Pr(D) is the summation of Pr(D|Hi)*Pr(Hi) over all Hi.
Very simple if you think about it a bit, and maybe not so if you think about it a bit more.
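The grid recipe just described can be sketched in Python. It assumes a Gaussian-shaped prior over the grid with the box's mean and standard deviation, and it drops the binomial coefficient because that cancels when normalising; `grid_posterior` and its `steps` parameter are names made up for this example.

```python
import math

def grid_posterior(h, n, prior_mean, prior_std, steps=101):
    """Posterior Pr(Hi | D) over a grid of candidate heads-probabilities Hi."""
    thetas = [i / (steps - 1) for i in range(steps)]
    weights = []
    for theta in thetas:
        # Unnormalised Gaussian-shaped prior Pr(Hi)
        prior = math.exp(-((theta - prior_mean) ** 2) / (2.0 * prior_std ** 2))
        # Binomial likelihood Pr(D | Hi) without the constant coefficient
        likelihood = theta ** h * (1.0 - theta) ** (n - h)
        weights.append(prior * likelihood)
    total = sum(weights)  # proportional to Pr(D)
    posterior = [w / total for w in weights]
    return thetas, posterior

thetas, posterior = grid_posterior(h=5, n=10, prior_mean=0.5, prior_std=0.1)
map_theta = thetas[posterior.index(max(posterior))]
posterior_mean = sum(t * w for t, w in zip(thetas, posterior))
```

For very large n the products theta^h * (1-theta)^(n-h) can underflow; summing logarithms and subtracting the largest log-weight before exponentiating avoids that.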
