fmi2: What is the unit of the input parameter "tolerance" in the API fmi2SetupExperiment?

I am implementing the slave for FMI 2.0, using the API

fmi2SetupExperiment(fmi2Component c,
                    fmi2Boolean toleranceDefined,
                    fmi2Real tolerance,
                    fmi2Real startTime,
                    fmi2Boolean stopTimeDefined,
                    fmi2Real stopTime)
I understand that the tolerance parameter is used for the error estimation during the simulation.
I would like to know the unit / value form of the tolerance parameter. For example, if the tolerance is 5%, what would be the value of tolerance?
Will it be 5, or 1.05, or some other form?

The FMI 2.0 standard talks of "relative tolerance" on page 22.
This is not rigorously defined there, but it corresponds to the relative tolerance values that are passed to a numerical solver.
Many FMI importing tools use, for example, the SUNDIALS solvers.
The relative tolerances are explained in their documentation: https://computation.llnl.gov/projects/sundials/faq#cvode_tols .
So in your example I would expect 0.05 to be the right value.

The FMI Specification 2.0 states that usually a relative tolerance is used, which does not have a unit (% is not a unit; it merely stands for ×10^-2).
So to pass a value of 5% as the tolerance, you will most likely have to pass 0.05.
The following is quoted from the FMI Specification 2.0:
Arguments toleranceDefined and tolerance depend on the FMU type:
fmuType = fmi2ModelExchange:
If toleranceDefined = fmi2True then the model is called with a numerical integration scheme where the step size is controlled by using tolerance for error estimation (usually as relative tolerance).
In such a case, all numerical algorithms used inside the model (for example to solve non-linear algebraic equations) should also operate with an error estimation of an appropriate smaller relative tolerance.
fmuType = fmi2CoSimulation:
If toleranceDefined = fmi2True then the communication interval of the slave is controlled by error estimation.
In case the slave utilizes a numerical integrator with variable step size and error estimation, it is suggested to use tolerance for the error estimation of the internal integrator (usually as relative tolerance).
An FMU for Co-Simulation might ignore this argument.
If you want to know exactly how this parameter is implemented, you have to ask the creator of your FMU - or look inside it yourself if you can.
If you cannot look inside your FMU and the creator cannot tell you what it does internally, just change the value and compare the results and the run time.
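For illustration, here is a minimal sketch of how an importing tool passes this value, using the FMPy library in Python (my choice of importer for the example; 'model.fmu' is a hypothetical file). As far as I know, FMPy forwards relative_tolerance to fmi2SetupExperiment with toleranceDefined = fmi2True:

from fmpy import simulate_fmu

# A 5% tolerance is passed as the dimensionless fraction 0.05 -
# not as 5 and not as 1.05.
result = simulate_fmu(
    'model.fmu',              # hypothetical FMU file
    start_time=0.0,
    stop_time=10.0,
    relative_tolerance=0.05,  # 5% expressed as a fraction
)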

Related

Why is Standard Deviation the square of the difference of an observation from the mean?

I am learning statistics, and have some basic yet core questions on SD:
s = the sample (the list of observations)
n = total number of observations
xi = ith observation
μ = arithmetic mean of all observations
σ = the usual definition of SD, i.e. ((1/(n-1))*sum([(xi-μ)**2 for xi in s]))**(1/2) in Python lingo
f = frequency of an observation value
I do understand that (1/n)*sum([xi-μ for xi in s]) would be useless (= 0), but would not (1/n)*sum([abs(xi-μ) for xi in s]) have been a measure of variation?
Why stop at power of 1 or 2? Would ((1/(n-1))*sum([abs(xi-μ)**3 for xi in s]))**(1/3) or ((1/(n-1))*sum([(xi-μ)**4 for xi in s]))**(1/4) and so on have made any sense?
My notion of squaring is that it 'amplifies' the measure of variation from the arithmetic mean while the simple absolute difference is somewhat a linear scale notionally. Would it not amplify it even more if I cubed it (and made absolute value of course) or quad it?
I do agree computationally cubes and quads would have been more expensive. But with the same argument, the absolute values would have been less expensive... So why squares?
Why is the Normal Distribution like it is, i.e. f = (1/(σ*math.sqrt(2*pi)))*math.e**((-1/2)*((xi-μ)/σ)**2)?
What impact would it have on the normal distribution formula above if I calculated SD as described in (1) and (2) above?
Is it only a matter of our 'getting used to the squares', it could well have been linear, cubed or quad, and we would have trained our minds likewise?
(I may not have been 100% accurate in my number of opening and closing brackets above, but you will get the idea.)
So, if you are looking for an index of dispersion, you actually don't have to use the standard deviation. You can indeed report the mean absolute deviation, the summary statistic you suggested. You merely need to be aware of how each summary statistic behaves; for example, the SD assigns more weight to outlying observations. You should also consider how each one can be interpreted. For example, with a normal distribution, we know how much of the distribution lies within ±2 SD of the mean. For some discussion of the mean absolute deviation (and other measures of average absolute deviation, such as the median absolute deviation) and their uses, see here.
Beyond its use as a measure of spread though, SD is related to variance and this is related to some of the other reasons it's popular, because the variance has some nice mathematical properties. A mathematician or statistician would be able to provide a more informed answer here, but squared difference is a smooth function and is differentiable everywhere, allowing one to analytically identify a minimum, which helps when fitting functions to data using least squares estimation. For more detail and for a comparison with least absolute deviations see here. Another major area where variance shines is that it can be easily decomposed and summed, which is useful for example in ANOVA and regression models generally. See here for a discussion.
As to your questions about raising to higher powers, they actually do have uses in statistics! In general, the mean (the first moment), the variance (the second moment, related to the standard deviation), skewness (related to the third power) and kurtosis (related to the fourth power) are all related to the moments of a distribution. Taking differences raised to those powers and standardizing them provides useful information about the shape of a distribution. The video I linked provides some easy intuition.
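To make these quantities concrete, here is a small sketch in Python (numpy/scipy, made-up data) computing the dispersion measures and standardized moments mentioned above:

import numpy as np
from scipy import stats

x = np.array([2.1, 3.4, 1.8, 4.6, 3.0, 2.7, 5.2, 3.9])  # made-up sample

mu = x.mean()
sd = x.std(ddof=1)             # standard deviation (second moment, with Bessel's correction)
mad = np.mean(np.abs(x - mu))  # mean absolute deviation, the alternative you suggested
skew = stats.skew(x)           # standardized third moment
kurt = stats.kurtosis(x)       # standardized fourth moment (excess kurtosis)

print(sd, mad, skew, kurt)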
For some other answers and a larger discussion of why SD is so popular, see here.
Regarding the relationship of sigma and the normal distribution, sigma is simply a parameter that stretches the standard normal distribution, just like the mean changes its location. This is simply a result of the way the standard normal distribution (a normal distribution with mean = 0 and SD = variance = 1) is mathematically defined, and note that all normal distributions can be derived from the standard normal distribution. This answer illustrates this. Now, you can parameterize a normal distribution in other ways as well, but I believe you do need to provide sigma, whether as the SD or as the precision. I don't think you can even parameterize a normal distribution using just the mean and the mean absolute difference. Now, a deeper question is why normal distributions are so incredibly useful in representing widely different phenomena and crop up everywhere. I think this is related to the Central Limit Theorem, but I do not understand the proofs of the theorem well enough to comment further.
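As a quick numerical check of the stretching argument (a sketch with arbitrary mu and sigma), every normal density equals the standard normal density of the standardized value, divided by sigma:

import numpy as np
from scipy import stats

z = np.linspace(-4, 4, 9)
mu, sigma = 2.0, 0.5  # arbitrary example parameters

# f(x) = phi((x - mu)/sigma) / sigma, where phi is the standard normal density
lhs = stats.norm.pdf(mu + sigma * z, loc=mu, scale=sigma)
rhs = stats.norm.pdf(z) / sigma
print(np.allclose(lhs, rhs))  # True: sigma stretches, mu shifts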

What is the default tolerance for ETRS89 in geospatial operations if the user does not specify any?

In MarkLogic I'm using cts.geospatialRegionQuery to search for documents that contain (an indexed) geometry that intersects with the geometry I search with.
The geospatial region index uses etrs89/double as coordinate system. All geometries in the data have 9 decimal places.
According to the MarkLogic Geospatial Search Applications documentation:
[...] geospatial queries against single precision indexes are accurate to within 1 meter for geodetic coordinate systems.
I would, therefore, expect my queries to have sub-meter accuracy. However, I get search results from cts.geospatialRegionQuery containing geometries up to ~5 meters away from my search geometry. As far as I can see, the only reason for this could be the tolerance option, which I am not specifying yet, so the default is used.
The documentation mentions that
If you do not explicitly set tolerance, MarkLogic uses the default tolerance appropriate for the coordinate system.
To ensure accuracy, MarkLogic enforces a minimum tolerance for each coordinate system.
This brings us to the actual question:
What is the default (and minimum) tolerance for the etrs89 coordinate system in MarkLogic?
EDIT:
I looked further into the issue with help from MarkLogic Support and found the cause of the low accuracy of my geospatial queries.
Before using cts.geospatialRegionQuery I parsed the search geometry with geo.parseWkt. This function does not allow you to explicitly set the coordinate system to be used and therefore uses the coordinate system set in the AppServer settings. By default this is single-precision wgs84. This led to a loss of 2-3 digits of precision on my search geometry.
After setting the coordinate system to etrs89/double in the AppServer settings, geo.parseWkt didn't reduce the precision of the search geometry anymore and my geospatial queries had the expected 5 mm accuracy.
The default tolerance for the WGS84 and ETRS89 coordinate systems is 0.5 cm for double precision and 5 m for single precision.
Closing the loop on this issue using feedback provided by MarkLogic support:
When setting up the query, geo.parseWkt was used to create the POINT, and as this function does not take a coordinate system or precision as options, the result was truncated to 8 significant digits by default. At the latitude they were working at, this reduced the precision from 0.5 cm to 5 m, leading to the observed results.
geo.parseWkt("POINT(5.176605744 52.045696539)");
Results in:
POINT(5.1766057 52.045697)
When using JavaScript, the solution is to set the correct coordinate system in the AppServer; see https://docs.marklogic.com/guide/search-dev/geospatial#id_77035 and the following example (written in XQuery):
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";
let $config := admin:get-configuration()
let $groupid := admin:group-get-id($config, "Default")
return admin:save-configuration(
  admin:appserver-set-coordinate-system(
    $config,
    admin:appserver-get-id($config, $groupid, "App-Services"),
    "etrs89/double"))
Once this was done, the POINT created using geo.parseWkt had the correct level of precision.
With XQuery you can declare the coordinate system directly in the query:
declare option xdmp:coordinate-system "etrs89/double";

Black-box combinatorial optimization over permutations

I am solving general black-box optimization problems like:
x*: f(x) -> min, where x are permutations of length N (N = 50, for example, so brute-force search is not possible). The objective function f(x) is represented by stand-alone computer code, and x represents the configuration of a complex system whose response is simulated by f(x).
I have learned that in this case I can use many heuristic methods. But most of these methods always use some kind of local search, which requires a suitable distance metric on the search space (the space of permutations x in my case). By a suitable distance metric I mean one that fulfills the "locality" property, i.e. a small change of the permutation x produces a small change of the objective function f(x). In my case no suitable distance metric with this property is known, so any kind of local search is nearly random search.
I have a few questions:
Are there any heuristic black-box combinatorial optimization methods available that do not use local search and/or any distance metric on the search space? I need to overcome the low "locality" of the problem, or simply the fact that no suitable distance metric on the search space is known.
Is the "locality" property really so restrictive in combinatorial optimization in general? Maybe I am missing something, but most real-world black-box combinatorial problems have low or very low "locality" because the common permutation distance metrics (Hamming, Kendall, etc.) are not suitable metrics in general.
Is there any general method for finding a distance metric on the search space that satisfies, at least approximately, the "locality" property?
Additional remarks:
In reality, the black-box function f(x) is realized by stand-alone deterministic simulation code, where x plays the role of a discrete configuration of the simulated physical system. So the function f(x) definitely has well-defined properties, but these properties are so complicated that it is not possible to exploit them directly.
Because of the above-mentioned complicated internal properties of f(x), it is not possible to find a proper distance metric d(x,x') on the search space that fulfills "locality" (similar x and x', in the sense of some distance metric, producing similar responses f(x) and f(x')).
So, finally, I am looking for optimization heuristics that are able to find suitable sub-optimal solutions using only the information available from the properties of f(x) in fitness space, like EDAs (Estimation of Distribution Algorithms), for example.
The main point of this question is: what types of optimization heuristics are suitable for solving this kind of problem?
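For illustration, here is a minimal sketch of one such distribution-based method: a simple position-frequency EDA over permutations. This is illustrative code, not a reference implementation; the population size, elite fraction, learning rate and the toy objective are all arbitrary assumptions.

import numpy as np

def eda_permutation(f, n, pop_size=100, elite_frac=0.2, iters=200, alpha=0.3, seed=0):
    # P[i, j] is the current probability of placing item j at position i.
    rng = np.random.default_rng(seed)
    P = np.full((n, n), 1.0 / n)
    best_x, best_val = None, np.inf
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(iters):
        pop = []
        for _ in range(pop_size):
            perm = np.empty(n, dtype=int)
            avail = np.ones(n, dtype=bool)
            for i in range(n):
                w = P[i] * avail          # mask out already-used items
                j = rng.choice(n, p=w / w.sum())
                perm[i] = j
                avail[j] = False
            pop.append(perm)
        vals = np.array([f(x) for x in pop])
        order = np.argsort(vals)
        if vals[order[0]] < best_val:
            best_val, best_x = vals[order[0]], pop[order[0]].copy()
        # Re-estimate the distribution from the elite samples.
        freq = np.zeros((n, n))
        for k in order[:n_elite]:
            freq[np.arange(n), pop[k]] += 1.0
        P = (1 - alpha) * P + alpha * freq / n_elite
    return best_x, best_val

# Toy usage: minimize the number of misplaced items relative to a hidden target.
target = np.random.default_rng(1).permutation(20)
best_x, best_val = eda_permutation(lambda x: int(np.sum(x != target)), n=20)
print(best_val)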

Bayesian t-test assumptions

Good afternoon,
I know that the traditional independent t-test assumes homoscedasticity (i.e., equal variances across groups) and normality of the residuals.
They are usually checked using Levene's test for homogeneity of variances, and the Shapiro-Wilk test and Q-Q plots for the normality assumption.
Which statistical assumptions do I have to check with the Bayesian independent t-test? How may I check them in R with coda and rjags?
For whichever test you want to run, find the formula and plug in the posterior draws of the parameters you have, such as the variance parameter and any regression coefficients the formula requires. Iterating the formula over the posterior draws will give you a range of values for the test statistic, from which you can take the mean to get an average value and the SD to get a standard deviation (an uncertainty estimate).
And boom, you're done.
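For instance, a minimal sketch of that recipe for a two-sample t statistic (the posterior draws below are simulated stand-ins; in practice you would extract them from your coda/rjags MCMC output):

import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = 30                                 # assumed group sizes
mu1 = rng.normal(0.0, 0.1, 4000)             # stand-in posterior draws: mean of group 1
mu2 = rng.normal(0.3, 0.1, 4000)             # stand-in posterior draws: mean of group 2
sigma = np.abs(rng.normal(1.0, 0.05, 4000))  # stand-in posterior draws: common SD

# Plug each draw into the two-sample t formula (equal variances assumed).
t_draws = (mu1 - mu2) / (sigma * np.sqrt(1.0 / n1 + 1.0 / n2))

print(t_draws.mean(), t_draws.std())         # average statistic and its uncertainty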
There might be non-parametric Bayesian t-tests, but commonly Bayesian t-tests are parametric, and as such they assume equality of the relevant population variances. You can check that assumption with Levene's test on your data (though do not think it is in any way a dependable test; remember that it relies on a p-value), using an ordinary t-test setup from any software package you are comfortable with, and then run the Bayesian t-test. But remember that a Bayesian t-test requires a conventional model of the observations (the likelihood) and an appropriate prior for the parameter of interest.
It is highly recommended that t-tests be re-parameterized in terms of effect sizes (especially standardized mean difference effect sizes). That is, you focus on the Bayesian estimation of the effect size arising from the t-test, not on other parameters in the t-test. If you opt to estimate the effect size from a t-test, a very easy to use, free, online Bayesian t-test software is THIS ONE HERE (probably one of the most user-friendly packages available; note that this software uses a Cauchy prior for the effect size arising from any type of t-test).
Finally, since you want to do a Bayesian t-test, I would suggest focusing your attention on picking an appropriate/defensible/meaningful prior rather than on Levene's test. No test can really show whether the sample data came from two populations (in your case) with equal variances unless data is plentiful. Note that the question whether the sample data came from populations with equal variances is itself an inferential (Bayesian or non-Bayesian) question.

Create CDF for Anderson-Darling test for the Octave Forge Statistics package function

I am using Octave and I would like to use the anderson_darling_test from the Octave Forge Statistics package to test if two vectors of data are drawn from the same statistical distribution. Furthermore, the reference distribution is unlikely to be normal. This reference distribution will be the known distribution, as described in the help for the above function: "If you are selecting from a known distribution, convert your values into CDF values for the distribution and use 'uniform'."
My question therefore is: how would I convert my data values into CDF values for the reference distribution?
Some background information for the problem: I have a vector of raw data values from which I extract the cyclic component (this will be the reference distribution); I then wish to compare this cyclic component with the raw data itself to see if the raw data is essentially cyclic in nature. If the null hypothesis that the two are the same can be rejected, I will then know that most of the movement in the raw data is not due to cyclic influences but is due to either trend or just noise.
If your data have a specific distribution, for instance Beta(3,3), then
p = betacdf(x, 3, 3)
will be uniformly distributed, by the definition of a CDF. If you want to transform it to a normal, you can just call the inverse CDF function
x = norminv(p, 0, 1)
on the uniform p. Once transformed, use your favorite test. I'm not sure I understand your data, but you might consider using a Kolmogorov-Smirnov test instead, which is a nonparametric test of distributional equality.
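A minimal sketch of that workflow, written in Python with scipy for illustration (assuming a Beta(3,3) reference; substitute the CDF of your actual reference distribution):

import numpy as np
from scipy import stats

# Illustrative data assumed to follow a Beta(3, 3) reference distribution.
x = stats.beta.rvs(3, 3, size=200, random_state=0)

# Probability integral transform: the CDF values are uniform
# exactly when the reference distribution is correct.
p = stats.beta.cdf(x, 3, 3)

# Test the transformed values against the uniform distribution.
print(stats.kstest(p, 'uniform'))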
Your approach is misguided in multiple ways. Several points:
The Anderson-Darling test implemented in Octave Forge is a one-sample test: it requires one vector of data and a reference distribution. The distribution should be known, not estimated from the data. While you quote the help file correctly about using a CDF and the "uniform" option for a distribution that is not built in, you are ignoring the next sentence of the same help file:
Do not use "uniform" if the distribution parameters are estimated from the data itself, as this sharply biases the A^2 statistic toward smaller values.
So, don't do it.
Even if you found or wrote a function implementing a proper two-sample Anderson-Darling or Kolmogorov-Smirnov test, you would still be left with a couple of problems:
Your samples (the data and the cyclic part estimated from the data) are not independent, and these tests assume independence.
Given your description, I assume there is some sort of time predictor involved. So even if the distributions would coincide, that does not mean they coincide at the same time-points, because comparing distributions collapses over the time.
The distribution of cyclic trend + error would not be expected to be the same as the distribution of the cyclic trend alone. Suppose the trend is sin(t). Then it will never go above 1. Now add a normally distributed random error term with standard deviation 0.1 (small, so that the trend is dominant). Obviously you could get values well above 1, as the short simulation below illustrates.
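A quick simulation of this point, with an arbitrary noise level of 0.1:

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
trend = np.sin(t)                              # cyclic component, never above 1
observed = trend + rng.normal(0, 0.1, t.size)  # trend plus a small error term

print(trend.max(), observed.max())             # the observed series exceeds 1; the trend cannot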
We do not have enough information to figure out the proper thing to do, and it is not really a programming question anyway. Look up time series theory - separating cyclic components is a major topic there. But many reasonable analyses will probably be based on the residuals: (observed value - predicted from cyclic component). You will still have to be careful about auto-correlation and other complexities, but at least it will be a move in the right direction.
