What is the default tolerance for ETRS89 in geospatial operations if the user does not specify any?

In MarkLogic I'm using cts.geospatialRegionQuery to search for documents that contain an (indexed) geometry that intersects the geometry I search with.
The geospatial region index uses etrs89/double as its coordinate system. All geometries in the data have 9 decimal places.
According to the MarkLogic Geospatial Search Applications documentation:
[...] geospatial queries against single precision indexes are accurate to within 1 meter for geodetic coordinate systems.
I would therefore expect my queries to have sub-meter accuracy. However, cts.geospatialRegionQuery returns results containing geometries up to ~5 meters away from my search geometry. As far as I can see, the only explanation is the tolerance option, which I'm not specifying yet, so the default is being used.
The documentation mentions that
If you do not explicitly set tolerance, MarkLogic uses the default tolerance appropriate for the coordinate system.
To ensure accuracy, MarkLogic enforces a minimum tolerance for each coordinate system.
This brings us to the actual question:
What is the default (and minimum) tolerance for the etrs89 coordinate system in MarkLogic?
EDIT:
I looked further into the issue with help from MarkLogic Support and found the cause of the low accuracy of my geospatial queries.
Before using cts.geospatialRegionQuery, I parsed the search geometry with geo.parseWkt. This function does not let you set the coordinate system explicitly and therefore uses the coordinate system configured in the AppServer settings, which by default is single-precision wgs84. This led to a loss of 2-3 digits on my search geometry.
After setting the coordinate system to etrs89/double in the AppServer settings, geo.parseWkt no longer reduced the precision of the search geometry, and my geospatial queries had the expected 5 mm accuracy.

The default tolerance for the WGS84 and ETRS89 coordinate systems is 0.5 cm for double precision and 5 meters for single precision.

Closing the loop on this issue using feedback provided by MarkLogic support:
When setting up the query, geo.parseWkt was used to create the POINT, and since this function does not take a coordinate system or precision as options, the result was truncated to 8 significant digits by default. At the latitude they were working at, this reduced the precision from 0.5 cm to 5 m, leading to the observed results.
geo.parseWkt("POINT(5.176605744 52.045696539)");
Results in:
POINT(5.1766057 52.045697)
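For illustration, the observed truncation is consistent with rounding to 8 significant decimal digits. A quick Python check (to_sig is a hypothetical helper written for this illustration, not a MarkLogic API):

def to_sig(x, digits=8):
    # Round x to the given number of significant decimal digits
    return float(f"{x:.{digits}g}")

print(to_sig(5.176605744))   # 5.1766057
print(to_sig(52.045696539))  # 52.045697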
When using JavaScript, the solution is to set the correct coordinate system on the App Server; see https://docs.marklogic.com/guide/search-dev/geospatial#id_77035 and the following example (written in XQuery):
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $groupid := admin:group-get-id($config, "Default")
return admin:save-configuration(
  admin:appserver-set-coordinate-system(
    $config,
    admin:appserver-get-id($config, $groupid, "App-Services"),
    "etrs89/double"))
Once this was done, the POINT created using geo.parseWkt had the correct level of precision.
With XQuery you can declare the coordinate system directly in the query:
declare option xdmp:coordinate-system "etrs89/double";

Related

Difference between distance() and geo_distance() in arangodb

What is the difference between the ArangoDB functions DISTANCE() and GEO_DISTANCE()? I know both of them calculate distance using the haversine formula.
The two functions serve different purposes:
DISTANCE(latitude1, longitude1, latitude2, longitude2) → distance
The value is computed using the haversine formula, which is based on a spherical Earth model. It’s fast to compute and is accurate to around 0.3%, which is sufficient for most use cases such as location-aware services.
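For reference, here is a minimal Python sketch of the haversine formula that DISTANCE() is based on (the 6371 km mean Earth radius is an assumption; ArangoDB may use a slightly different constant):

import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    # Great-circle distance in meters on a spherical Earth model
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

print(haversine_m(52.5200, 13.4050, 48.8566, 2.3522))  # Berlin -> Paris, ~878 km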
GEO_DISTANCE(geoJsonA, geoJsonB, ellipsoid) → distance
Returns the distance between two GeoJSON objects, measured from the centroid of each shape. For a list of supported types, see the geo index page. (Ref: https://www.arangodb.com/docs/3.8/aql/functions-geo.html#geo-index-functions)
These GeoJSON objects can be any of GEO_LINESTRING, GEO_MULTILINESTRING, GEO_MULTIPOINT, GEO_POINT, GEO_POLYGON and GEO_MULTIPOLYGON (see the second reference below).
Reference:
https://www.arangodb.com/docs/3.8/aql/functions-geo.html#geo-utility-functions
https://www.arangodb.com/docs/3.8/aql/functions-geo.html#geojson-constructors

fmi2: What is the unit of input parameter tolerance in the API "fmi2SetupExperiment"

I am implementing the slave for FMI 2.0. For the API

fmi2SetupExperiment(fmi2Component c,
                    fmi2Boolean toleranceDefined,
                    fmi2Real tolerance,
                    fmi2Real startTime,
                    fmi2Boolean stopTimeDefined,
                    fmi2Real stopTime)
I understand that the tolerance parameter is used for the error estimation during the simulation.
I would like to know the unit/value form of the tolerance parameter. For example, if the tolerance is 5%, what would be the value of tolerance?
Will it be 5 or 1.05 or some other form?
The FMI 2.0 standard talks of a "relative tolerance" on page 22.
This is not rigorously defined there, but it corresponds to the relative tolerance values that are passed to a numerical solver.
Many FMI importing tools use, for example, the Sundials solvers.
The relative tolerances are explained there: https://computation.llnl.gov/projects/sundials/faq#cvode_tols .
So in your example I would expect 0.05 to be the right value.
The FMI Specification 2.0 states that usually a relative tolerance is used, which does not have a unit (% is not a unit; it merely stands for ×10^-2).
So most likely, to pass a value of 5% as tolerance, you will have to pass 0.05.
The following is quoted from the FMI Specification 2.0:
Arguments toleranceDefined and tolerance depend on the FMU type:
fmuType = fmi2ModelExchange:
If toleranceDefined = fmi2True, then the model is called with a numerical integration scheme where the step size is controlled by using tolerance for error estimation (usually as relative tolerance). In such a case, all numerical algorithms used inside the model (for example, to solve non-linear algebraic equations) should also operate with an error estimation of an appropriate smaller relative tolerance.
fmuType = fmi2CoSimulation:
If toleranceDefined = fmi2True, then the communication interval of the slave is controlled by error estimation. In case the slave utilizes a numerical integrator with variable step size and error estimation, it is suggested to use tolerance for the error estimation of the internal integrator (usually as relative tolerance). An FMU for Co-Simulation might ignore this argument.
If you want to know exactly how this parameter is implemented, you have to ask the creator of your FMU, or look inside it yourself if you can.
If you cannot look inside your FMU and the creator cannot tell you what it does internally, just change the value and compare the results and the run time.
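As a concrete illustration, here is a hedged Python/ctypes sketch of passing a 5% relative tolerance to an FMU's fmi2SetupExperiment (the shared-library path and the component handle are assumptions; a real session would first obtain the component from fmi2Instantiate):

import ctypes

fmu = ctypes.CDLL("./model.so")  # assumed path to the FMU's binary
fmu.fmi2SetupExperiment.restype = ctypes.c_int  # fmi2Status

component = ctypes.c_void_p()  # placeholder; normally returned by fmi2Instantiate

tolerance = 5 / 100.0  # 5% expressed as a unitless relative tolerance: 0.05
status = fmu.fmi2SetupExperiment(
    component,
    ctypes.c_int(1),         # toleranceDefined = fmi2True
    ctypes.c_double(tolerance),
    ctypes.c_double(0.0),    # startTime
    ctypes.c_int(1),         # stopTimeDefined = fmi2True
    ctypes.c_double(10.0))   # stopTime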

MariaDB: geography

I need to check whether the distance between two geographic points is less than N km. I'm trying to execute this query:
select st_distance(
  ST_GeomFromText('point(45.764043 4.835658999999964)', 4326),
  ST_GeomFromText('point(45.750371 5.053963)', 4326)
) < :n
But it doesn't work because:
So far the SRID property is just a dummy in MySQL, it is stored as part of a geometries meta data but all actual calculations ignore it and calculations are done assuming Euclidean (planar) geometry.
(https://mariadb.com/kb/en/mariadb/st_transform-missing/)
My goal is to convert this distance to a metric distance, or to convert N to degrees.
How can I do it?
Or maybe you know a better solution?
P.S. I need a solution based on spatial methods (or something even better for performance).
I don't think the "distance" function is available (yet) in SPATIAL. There is a regular FUNCTION in https://mariadb.com/kb/en/latitudelongitude-indexing/ that does the work. However, the arguments and output are scaled lat/lng (10000*degrees). The code could be altered to avoid the scaling, but the scaling is needed in the context of that blog page.
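As a rough workaround while ST_Transform is missing, you can pre-filter with a bounding box by converting N km to degrees before querying. A hedged Python sketch (the 111.32 km-per-degree figure is a spherical approximation that ignores the ellipsoid):

import math

def km_to_deg_box(lat, lon, n_km):
    # Approximate bounding box of n_km around (lat, lon), in degrees
    dlat = n_km / 111.32                                   # ~km per degree of latitude
    dlon = n_km / (111.32 * math.cos(math.radians(lat)))   # longitude degrees shrink with latitude
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon

print(km_to_deg_box(45.764043, 4.835659, 10))  # bounds for a 10 km search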

Text Documents Clustering - Non Uniform Clusters

I have been trying to cluster a set of text documents. I have a sparse TF-IDF matrix with around 10k documents (a subset of a large dataset), and I try to run the scikit-learn k-means algorithm with different numbers of clusters (10, 50, 100). All other parameters are default values.
I get very strange behavior: no matter how many clusters I specify, or even if I change the number of iterations, there is always one cluster that contains most of the documents and many clusters that have just one document in them. This is highly non-uniform behavior.
Does anyone know what kind of problem I am running into?
Here are the possible things that might be going "wrong":
Your k-means cluster initialization points may be chosen as the same set of points in each run. I recommend using 'random' for the init parameter of k-means (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html). If that doesn't work, then supply k-means with your own set of random initial cluster centers. Remember to seed your random generator; Python's random module (https://docs.python.org/2/library/random.html) uses the current date and time by default.
Your distance function, i.e. Euclidean distance, might be the culprit. This is less likely, but it is always good to run k-means using cosine similarity, especially when you are using it for document similarity. scikit-learn doesn't have this functionality at present, but you should look here: Is it possible to specify your own distance function using scikit-learn K-Means Clustering?
These two combined should give you good clusters; a sketch follows below.
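A minimal sketch of both suggestions, assuming scikit-learn (L2-normalizing the TF-IDF rows makes Euclidean k-means behave like cosine/spherical k-means; the 20 newsgroups corpus is a stand-in for your own documents):

from collections import Counter
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

# Stand-in corpus; substitute your own ~10k documents
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data

X = TfidfVectorizer(stop_words="english", max_df=0.5, min_df=2).fit_transform(docs)
X = normalize(X)  # L2-normalize rows so Euclidean distance tracks cosine similarity

km = KMeans(n_clusters=50, init="random", n_init=10)
labels = km.fit_predict(X)
print(Counter(labels).most_common(5))  # check whether one cluster still dominates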
With the help of the above answers and comments, I noticed there was a problem with outliers and noise in the original space. For this, we should use a dimensionality reduction method that eliminates the unwanted noise in the data. I tried random projections first, but they failed to work with text data: the problem was still not solved.
Then, using Truncated Singular Value Decomposition, I was able to get perfectly uniform clusters. Hence, Truncated SVD is the way to go with textual data, in my opinion (see the sketch below).
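Continuing from the sketch above, a hedged example of the Truncated SVD (LSA) step before clustering (the 100-component choice is an assumption to tune for your corpus):

from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Reduce the sparse TF-IDF matrix X from the previous sketch to 100 dense dimensions
lsa = make_pipeline(TruncatedSVD(n_components=100), Normalizer(copy=False))
X_reduced = lsa.fit_transform(X)

labels = KMeans(n_clusters=50, init="random", n_init=10).fit_predict(X_reduced)
print(Counter(labels).most_common(5))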

Getting data based on location and a specified radius

Scenario: I have a large dataset, with each entry containing a location (x,y coordinates).
I want to be able to request every entry from this dataset that is within 100 m of a given location and have it returned as an array.
How does one go about implementing something like this? Are there any recommended patterns or frameworks? I've previously only worked with relational or simple key-value type data.
The data structure that solves this problem efficiently is a k-d tree. There are many implementations available, including a node.js module.
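For example, a minimal sketch using SciPy's cKDTree (assuming the points are already planar x,y in meters; geographic lat/lon would need projecting to a metric plane first):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 10_000, size=(100_000, 2))  # stand-in planar x,y coordinates in meters

tree = cKDTree(points)
center = np.array([5_000.0, 5_000.0])

idx = tree.query_ball_point(center, r=100.0)  # indices of all entries within 100 m
nearby = points[idx]
print(len(nearby), "points within 100 m")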
Put your dataset into PostgreSQL and use an R-tree index. You can then do a bounding-box query to get all points within ±100 miles of any location, then calculate the radial distance and accept only the points within 100 miles. You can roll your own schema and queries or use PostGIS.
Unlike R-trees, k-d trees are not inherently balanced, so depending on how a k-d tree is built you can get inconsistent performance due to unbalanced trees and long search paths.
