Standard approximation scheme of delay equations? - delay

I would like to program an approximation method for delay differential equations, somewhat of a niche topic but I'd like to try it. However, the standard Euler's or other Runge-Kutta methods don't necessarily conform to this. How can I accurately approximate solutions to delay equations?

For an ODE, the data to pass to the solver are the ODE function f(t,y), the initial point y(t_0)=y_0 and the end t_f of the integration interval.
For a DDE solver, the additional data needed to drive the evaluation of the DDE are the delays td[0..s], and the history function h(t) which also assumes the role of the initial values. The DDE "right-side" function f(t,y,yd) itself takes as inputs the current state y and the state vectors yd[i]=y(t-td[i]) at the delayed times.
To implement the solver you can take any method where you have an interpolation procedure, aka "dense output", of the same order as the method. This procedure or interpolator object contains the data from the previous integration steps and defaults to the history function h(t) for times before the start of the integration interval. Then during the solver stages this interpolation is used to compute the yd values, and after each integration step the interpolation data is updated to contain the new data.
Everything else proceeds as usual. The only restriction is that the time steps must be smaller than the smallest delay, so that every delayed state lies inside the data the interpolator already holds.
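As a rough illustration of the recipe above, here is a minimal sketch for a scalar DDE with constant delays. It pairs the second-order Heun method with linear interpolation of the stored steps, which is also second-order accurate. The name solve_dde_heun and its arguments are made up for this example, and the fixed step size is an assumption; a production solver would use adaptive steps and a higher-order dense output.

```python
from bisect import bisect_right

def solve_dde_heun(f, history, delays, t0, tf, n_steps):
    """Integrate y'(t) = f(t, y(t), [y(t - d) for d in delays]) for scalar y."""
    h = (tf - t0) / n_steps
    if h > min(delays):
        raise ValueError("step size must not exceed the smallest delay")
    ts, ys = [t0], [history(t0)]

    def past(t):
        # Interpolator: history function before t0, stored steps afterwards.
        if t <= t0:
            return history(t)
        if t >= ts[-1]:
            return ys[-1]
        i = bisect_right(ts, t)  # ts[i - 1] <= t < ts[i]
        w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
        return ys[i - 1] + w * (ys[i] - ys[i - 1])

    for n in range(n_steps):
        t, y = ts[-1], ys[-1]
        # Both Heun stages read the delayed states from the interpolator.
        k1 = f(t, y, [past(t - d) for d in delays])
        k2 = f(t + h, y + h * k1, [past(t + h - d) for d in delays])
        # Update the interpolation data only after the step is complete.
        ts.append(t0 + (n + 1) * h)
        ys.append(y + 0.5 * h * (k1 + k2))
    return ts, ys

# Test problem: y'(t) = -y(t - 1) with history h(t) = 1 for t <= 0.
ts, ys = solve_dde_heun(lambda t, y, yd: -yd[0], lambda t: 1.0, [1.0], 0.0, 2.0, 200)
```

For this test problem the exact solution is y(t) = 1 - t on [0, 1] and y(2) = -1/2, which gives a quick sanity check on the final entry of ys.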


Fit an exponential function to time-series data

I have to fit the following exponential function to time-series data.
$C(t) = C_{\infty} \left(1-\exp\left(-\frac{t}{\tau}\right)\right)$
I want to compute the time scale $\tau$ on which $C(t)$ approaches $C_{\infty}$. I would like suggestions on how $\tau$ can be computed. I found an example here that uses curve fitting, but I am not sure how to use curve_fit from scipy to set up the problem described above.
One cannot expect a good fit along the whole curve with the function you have chosen.
This is because at t=0 the function returns C=0 while the data value is C=2.5, which is a large gap given the order of magnitude of the data.
Nevertheless, one can try to fit this function for a rough result. A non-linear regression is necessary: this is the usual approach with available software, and it is the recommended method for academic exercises.
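A minimal sketch of that non-linear regression with scipy.optimize.curve_fit might look like the following. The data here are synthetic (generated with C_inf = 5 and tau = 2 plus noise) because the original time series is not available, and the starting guesses are just one reasonable heuristic: since C(tau) is about 63% of C_inf, the first time the data cross 63% of their maximum is a usable initial tau.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_exp(t, c_inf, tau):
    """C(t) = C_inf * (1 - exp(-t / tau))."""
    return c_inf * (1.0 - np.exp(-t / tau))

# Synthetic stand-in for the real time series (true C_inf = 5, tau = 2).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
c = saturating_exp(t, 5.0, 2.0) + rng.normal(0.0, 0.05, t.size)

# Starting guesses: plateau from the largest value, tau from the 63% crossing.
c_inf0 = c.max()
tau0 = t[np.argmax(c >= 0.632 * c_inf0)]

popt, pcov = curve_fit(saturating_exp, t, c, p0=[c_inf0, tau0])
c_inf_hat, tau_hat = popt
tau_err = np.sqrt(pcov[1, 1])  # one-sigma uncertainty on tau
```

With real data, replace t and c with the measured arrays; if the data do not start near zero, the fit will be rough for the reasons given above.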
Alternatively, and more simply, a linear regression can be used, thanks to a non-conventional method explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales .
The result is shown below.
For a better fit, one has to take into account the almost constant value of the data in the neighborhood of t=0. A function made of two logistic functions would be recommended, but the calculation is more complicated.
IN ADDITION, AFTER THE OP CHANGED THE DATA:
The change of data makes the above answer out of date.
In fact, artificially shifting the origin of the y-scale so that y=0 at t=0 changes nothing. The slope of the chosen function at t=0 is far from zero, while the slope of the data curve there is almost 0. The two remain incompatible.
The chosen function y=C*(1-exp(-t/tau)) definitely cannot fit the data correctly (neither the preceding data nor the new data).
As already pointed out, for a better fit one has to take into account the almost constant value of the data in the neighborhood of t=0. A function made of two logistic functions would be recommended, but the calculation is more complicated.

Maximum log-likelihood from data histogram not data directly

I have a complicated theoretical Probability Density Function (PDF) that I define in mathematica and that depends on some parameters that I need to estimate from comparison with real data. From a big simulation done on a cluster and not my laptop I have acquired a lot of events (over 10^9).
The way I understand things, given that I know what the PDF is, I 'just' need to sum the log-probabilities of those events for a given set of parameters and maximise this quantity by adjusting the parameters.
However, given the number of events, I would rather work with something less computationally expensive, for example something easily generated like a histogram of my data. But then how would my log-likelihood estimator work?
Thanks a lot for your answers!
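The thread does not include an answer, but one common approach (offered here only as a sketch) is a binned, or multinomial, likelihood: integrate the model PDF over each histogram bin to get a bin probability p_i, then maximise the sum of n_i * log(p_i) over the bin counts n_i. The example below assumes the bin integrals are available through a CDF and uses a toy exponential model, not the Mathematica PDF from the question; all names are hypothetical.

```python
import math
import random

def binned_log_likelihood(counts, edges, cdf, params):
    """Multinomial log-likelihood of histogram counts, up to a constant."""
    total = 0.0
    for n, a, b in zip(counts, edges[:-1], edges[1:]):
        p = cdf(b, params) - cdf(a, params)  # model probability of bin [a, b)
        if n > 0:
            total += n * math.log(p)
    return total

def exp_cdf(x, rate):
    # CDF of the toy exponential model; exp(-inf) evaluates to 0.0.
    return 1.0 - math.exp(-rate * x)

# Toy data: exponential events with true rate 2, histogrammed once.
random.seed(1)
edges = [0.1 * i for i in range(41)] + [math.inf]  # last bin catches overflow
counts = [0] * (len(edges) - 1)
for _ in range(20000):
    x = random.expovariate(2.0)
    counts[min(int(x / 0.1), len(counts) - 1)] += 1

# Crude grid search over the rate; a real fit would use a proper optimizer.
grid = [0.5 + 0.01 * i for i in range(351)]
rate_hat = max(grid, key=lambda r: binned_log_likelihood(counts, edges, exp_cdf, r))
```

The cost of each likelihood evaluation now scales with the number of bins rather than the number of events, at the price of losing the information inside each bin.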

Excel - How to split data into train and test sets that are equally distributed

I've got a data set (in Excel) that I'm going to import into SAS to undertake some modelling.
I've got a method for randomly splitting my excel dataset (using the =RAND() function), but is there a way (at the splitting stage) to ensure the distribution of the samples is even (other than to keep randomly splitting and testing the distribution until it becomes acceptable)?
Otherwise, if this is best performed in SAS, what is the most efficient approach for testing the sample randomness?
The dataset contains 35 variables, with a mixture of binary, continuous and categorical variables.
In SAS, you can just use proc surveyselect to do this.
proc surveyselect data=sashelp.cars out=cars_out outall samprate=0.7;
run;
data train test;
set cars_out;
/* Selected = 1 marks the 70% sample, which becomes the training set */
if selected then output train;
else output test;
run;
If there is a particular variable[s] you want to make sure the Train and Test sets are balanced on, you can use either strata or control depending on exactly what sort of thing you're talking about. control will simply make an approximate attempt to even things by the control variables (it sorts by the control variable, then pulls every 3rd or whatever, so you get a sort of approximate balance; if you have 2+ control variables it snake-sorts, Asc. then Desc. etc. inside, but that reduces randomness).
If you use strata, it guarantees you the sample rate inside the strata - so if you did:
proc sort data=sashelp.cars out=cars;
by origin;
run;
proc surveyselect data=cars out=cars_out outall samprate=0.7;
strata origin;
run;
(and the final splitting data step is the same) then you'd get 70% of each separate origin pulled (which would end up being 70% of the total, of course).
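For readers without SAS, the same idea (a fixed sampling rate inside every stratum) can be sketched in plain Python. This is not what proc surveyselect does internally, only an illustration of the outcome; the function name, the toy rows, and the choice to treat the sampled 70% as the training set are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, rate=0.7, seed=0):
    """Return (train, test) with about `rate` of every stratum in train."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)                # random order within the stratum
        cut = round(rate * len(members))    # per-stratum sample size
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical rows (origin, id), mimicking the Origin variable of sashelp.cars.
rows = ([("Asia", i) for i in range(100)]
        + [("Europe", i) for i in range(50)]
        + [("USA", i) for i in range(150)])
train, test = stratified_split(rows, key=lambda r: r[0])
```

Each origin contributes 70% of its own rows to train, so the overall rate is also 70%.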
Which you do depends on what you care about it being balanced by. The more things you do this with, the less balanced it is with everything else, so be cautious; it may be that a simple random sample is the best, especially if you have a good enough N.
If you don't have enough N, then you can use bootstrapping techniques, meaning you take a sample WITH replacement from that 70% and take maybe 100 of those samples, each with a higher N than your original. Then you do your test or whatever on each sample selected, and the variation in those results tells you how you're doing even if your N is not enough to do it in one pass.
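A bare-bones sketch of the bootstrap idea follows. It draws resamples of the same size as the original, which is the conventional choice (the paragraph above suggests larger resamples), and it uses the mean as a placeholder for whatever test or model fit you actually care about. The function name and data are made up for the example.

```python
import random
import statistics

def bootstrap_stat(sample, stat, n_boot=100, seed=0):
    """Evaluate `stat` on n_boot resamples drawn WITH replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Hypothetical small training sample.
sample = [float(v) for v in range(1, 41)]
boot_means = bootstrap_stat(sample, statistics.fmean)

# The spread of the bootstrap results indicates how stable the statistic is.
spread = statistics.stdev(boot_means)
```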
This answer has nothing to do with Excel, but with sampling strategy.
First we must construct a criterion for deciding that the sample's measures are "close enough" to those of the complete dataset.
Say we are interested in the mean and the standard deviation, and that the complete population is a set of 10,000 values in column A:
Calculate the mean and standard deviation of the complete dataset.
Devise a "close enough" criterion for each measure.
Pick, say, 500 samples.
Calculate the measures for the sample.
If the measures are "close enough" we are done; otherwise pick another 500.
We need to be careful that the criteria are not too tight; otherwise we may loop forever.
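The procedure above can be sketched in a few lines. The tolerances, the population, and the function name are all hypothetical, and the max_tries cap is one simple way to avoid the endless loop mentioned above.

```python
import random
import statistics

def draw_representative_sample(population, size, tol_mean, tol_sd,
                               max_tries=1000, seed=0):
    """Redraw random samples until mean and sd are close to the population's."""
    rng = random.Random(seed)
    pop_mean = statistics.fmean(population)
    pop_sd = statistics.pstdev(population)
    for attempt in range(1, max_tries + 1):
        sample = rng.sample(population, size)  # without replacement
        if (abs(statistics.fmean(sample) - pop_mean) <= tol_mean
                and abs(statistics.stdev(sample) - pop_sd) <= tol_sd):
            return sample, attempt
    raise RuntimeError("no acceptable sample found; loosen the tolerances")

# Hypothetical population of 10,000 values standing in for column A.
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(10000)]
sample, tries = draw_representative_sample(population, 500,
                                           tol_mean=0.5, tol_sd=0.5)
```

Note that rejecting samples until they pass makes the result less random than a single draw, so the tolerances should be chosen with that trade-off in mind.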

How to interpret some syntax (n.adapt, update..) in jags?

I feel very confused with the following syntax in jags, for example,
n.iter=100,000
thin=100
n.adapt=100
update(model,1000,progress.bar = "none")
Currently I think
n.adapt=100 means you set the first 100 draws as burn-in,
n.iter=100,000 means the MCMC chain has 100,000 iterations including the burn-in,
I have checked the explanation for this question a lot of time but still not sure whether my interpretation about n.iter and n.adapt is correct and how to understand update() and thinning.
Could anyone explain to me?
This answer is based on the package rjags, which takes an n.adapt argument. First I will discuss the meanings of adaptation, burn-in, and thinning, and then I will discuss the syntax (I sense that you are well aware of the meaning of burn-in and thinning, but not of adaptation; a full explanation may make this answer more useful to future readers).
Burn-in
As you probably understand from introductions to MCMC sampling, some number of iterations from the MCMC chain must be discarded as burn-in. This is because prior to fitting the model, you don't know whether you have initialized the MCMC chain within the characteristic set, the region of reasonable posterior probability. Chains initialized outside this region take a finite (sometimes large) number of iterations to find the region and begin exploring it. MCMC samples from this period of exploration are not random draws from the posterior distribution. Therefore, it is standard to discard the first portion of each MCMC chain as "burn-in". There are several post-hoc techniques to determine how much of the chain must be discarded.
Thinning
A separate problem arises because in all but the simplest models, MCMC sampling algorithms produce chains in which successive draws are substantially autocorrelated. Thus, summarizing the posterior based on all iterations of the MCMC chain (post burn-in) may be inadvisable, as the effective posterior sample size can be much smaller than the analyst realizes (note that STAN's implementation of Hamiltonian Monte-Carlo sampling dramatically reduces this problem in some situations). Therefore, it is standard to make inference on "thinned" chains where only a fraction of the MCMC iterations are used in inference (e.g. only every fifth, tenth, or hundredth iteration, depending on the severity of the autocorrelation).
Adaptation
The MCMC samplers that JAGS uses to sample the posterior are governed by tunable parameters that affect their precise behavior. Proper tuning of these parameters can produce gains in the speed or de-correlation of the sampling. JAGS contains machinery to tune these parameters automatically, and does so as it draws posterior samples. This process is called adaptation, but it is non-Markovian; the resulting samples do not constitute a Markov chain. Therefore, burn-in must be performed separately after adaptation. It is incorrect to substitute the adaptation period for the burn-in. However, sometimes only relatively short burn-in is necessary post-adaptation.
Syntax
Let's look at a highly specific example (the code in the OP doesn't actually show where parameters like n.adapt or thin get used). We'll ask rjags to fit the model in such a way that each step will be clear.
n.chains = 3
n.adapt = 1000
n.burn = 10000
n.iter = 20000
thin = 50
my.model <- jags.model("mymodel.txt", data=X, inits=Y, n.chains=n.chains, n.adapt=n.adapt) # X is a named list holding the data, Y is a list or function giving initial values
update(my.model, n.burn)
my.samples <- coda.samples(my.model, params, n.iter=n.iter, thin=thin) # params is a character vector naming the parameters to monitor (i.e. we want posterior inference on these parameters)
jags.model() builds the directed acyclic graph and then performs the adaptation phase for a number of iterations given by n.adapt.
update() performs the burn-in on each chain by running the MCMC for n.burn iterations without saving any of the posterior samples (skip this step if you want to examine the full chains and discard a burn-in period post-hoc).
coda.samples() (from the rjags package; it returns an mcmc.list object that the coda package understands) runs each MCMC chain for the number of iterations specified by n.iter, but it does not save every iteration. Instead, it saves only every nth iteration, where n is given by thin. Again, if you want to determine your thinning interval post-hoc, there is no need to thin at this stage. One advantage of thinning at this stage is that the syntax makes it simple to do so; you don't have to understand the structure of the MCMC object returned by coda.samples() and thin it yourself. The bigger advantage to thinning at this stage is realized if n.iter is very large. For example, if autocorrelation is really bad, you might run 2 million iterations and save only every thousandth (thin=1000). If you didn't thin at this stage, you (and your RAM) would need to manipulate an object with three chains of two million numbers each for every monitored parameter. But by thinning as you go, the final object has only two thousand numbers per parameter in each chain.
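To make the bookkeeping concrete, here is a small Python stand-in (not rjags, and not a real sampler): an AR(1) process plays the role of one autocorrelated chain, using the same n.burn, n.iter and thin values as above. It shows that 20000 iterations thinned by 50 leave 400 saved draws per chain, and that thinning reduces the lag-1 autocorrelation. The AR(1) coefficient of 0.9 and the starting value are arbitrary assumptions.

```python
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den

# Stand-in for one MCMC chain: an AR(1) process with strong autocorrelation.
random.seed(0)
n_burn, n_iter, thin = 10000, 20000, 50
chain, value = [], 10.0  # deliberately poor starting value
for _ in range(n_burn + n_iter):
    value = 0.9 * value + random.gauss(0.0, 1.0)
    chain.append(value)

kept = chain[n_burn:]    # discard the burn-in iterations
thinned = kept[::thin]   # keep every 50th draw

rho_full = lag1_autocorr(kept)
rho_thin = lag1_autocorr(thinned)
```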

Integrating Power pdf to get energy pdf?

I'm trying to work out how to solve what seems like a simple problem, but I can't convince myself of the correct method.
I have time-series data that represents the pdf of a Power output (P), varying over time, also the cdf and quantile functions - f(P,t), F(P,t) and q(p,t). I need to find the pdf, cdf and quantile function for the Energy in a given time interval [t1,t2] from this data - say e(), E(), and qe().
Clearly energy is the integral of the power over [t1,t2], but how do I best calculate e, E and qe ?
My best guess is that since q(p,t) is a power, I should generate qe by integrating q over the time interval, and then calculate the other distributions from that.
Is it as simple as that, or do I need to get to grips with stochastic calculus ?
Additional details for clarification
The data we're getting is a time-series of 'black-box' forecasts for f(P), F(P),q(P) for each time t, where P is the instantaneous power and there will be around 100 forecasts for the interval I'd like to get the e(P) for. By 'Black-box' I mean that there will be a function I can call to evaluate f,F,q for P, but I don't know the underlying distribution.
The black-box functions are almost certainly interpolating output data from the model that produces the power forecasts, but we don't have access to that. I would guess that it won't be anything straightforward, since it comes from a chain of non-linear transformations. It's actually wind farm production forecasts: the wind speeds may be normally distributed, but multiple terrain and turbine transformations will change that.
Further clarification
(I've edited the original text to remove confusing variable names in the energy distribution functions.)
The forecasts will be provided as follows:
The interval [t1,t2] that we need e, E and qe for is sub-divided into 100 (say) sub-intervals k=1...100. For each k we are given a distinct f(P), call them f_k(P). We need to calculate the energy distributions for the interval from this set of f_k(P).
Thanks for the clarification. From what I can tell, you don't have enough information to solve this problem properly. Specifically, you need to have some estimate of the dependence of power from one time step to the next. The longer the time step, the less the dependence; if the steps are long enough, power might be approximately independent from one step to the next, which would be good news because that would simplify the analysis quite a bit. So, how long are the time steps? An hour? A minute? A day?
If the time steps are long enough to be independent, the total energy is a sum of 100 independent variables, which will be very nearly normally distributed by the central limit theorem. It's easy to work out the mean and variance of the total energy in this case.
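Under that independence assumption, and taking the energy as the sum of P_k times the sub-interval length dt, the mean is dt times the sum of the per-step means and the variance is dt squared times the sum of the per-step variances. A minimal sketch, with made-up numbers and units (the per-step means and standard deviations would in practice be computed from each f_k):

```python
import math
from statistics import NormalDist

def energy_normal_approx(means, sds, dt):
    """Normal approximation to E = sum_k P_k * dt for independent P_k."""
    mean_e = dt * sum(means)
    var_e = dt ** 2 * sum(s ** 2 for s in sds)
    return NormalDist(mean_e, math.sqrt(var_e))

# Hypothetical forecasts: 100 sub-intervals of 0.1 h, each 10 MW +/- 2 MW.
means = [10.0] * 100
sds = [2.0] * 100
energy = energy_normal_approx(means, sds, dt=0.1)

e_pdf = energy.pdf          # e(x)
e_cdf = energy.cdf          # E(x)
e_quantile = energy.inv_cdf # qe(p)
median_energy = e_quantile(0.5)
```

This only holds if the steps really are independent; with dependence between steps, the variance needs the covariance terms as well.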
Otherwise, the distribution will be something more complicated. If power is positively correlated from one step to the next, as seems likely for wind power, the variance estimated by the independent-steps approach will be too small, because the positive covariance terms add to the variance of the sum.
From what you say, you don't have any information about temporal dependence. Maybe you can find or derive an estimate of the autocorrelation function from some other source or sources; I wouldn't be surprised if that question has already been studied for wind power. I also wouldn't be surprised if a general version of this problem has already been studied; perhaps you can search for something like "distribution of a sum of autocorrelated variables." You might get some interest in that question on stats.stackexchange.com.
