Document DB Location Query - geospatial

My data looks something like this:
{
  "id": "a06b42cf-d130-459a-8c89-dab77966747c",
  "propertyBag": {
    "Fixed": {
      "address": {
        "locationName": "",
        "addressLine1": "1 Microsoft Way",
        "addressLine2": null,
        "city": "Redmond",
        "postalCode": "98052",
        "subDivision": null,
        "state": "WA",
        "country": "USA",
        "location": {
          "type": "Point",
          "coordinates": [
            47.640049,
            -122.129797
          ]
        }
      }
    }
  }
}
Now when I try to run a query like this:
SELECT * FROM V v
WHERE ST_DISTANCE(v.propertyBag.Fixed.address.location, {
  "type": "Point",
  "coordinates": [47.36, -122.19]
}) < 100 * 1000
The results are always empty. Can someone please let me know what may be wrong?

I suspect that you just have the longitude and latitude transposed, because if I change the document to:
"location": {
"type": "Point",
"coordinates": [-122.129797, 47.640049]
}
And I run this query:
SELECT
  ST_DISTANCE(v.propertyBag.Fixed.address.location, {
    "type": "Point",
    "coordinates": [-122.19, 47.36]
  })
FROM v
I get a result, but if I run it the way you show it, I get no results.
In GeoJSON, points are specified as [longitude, latitude] so that they match the usual expectation of x being east-west and y being north-south. Unfortunately, this is the opposite of the traditional way of writing geographic coordinates.
Also, -122 is not a valid value for latitude: the range for latitude is -90 to +90, while longitude is specified from -180 to +180.
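For reference, once both the stored point and the query point are written in [longitude, latitude] order, your original filter would simply become the same query with the coordinates swapped (a sketch, not something I ran against your data):
SELECT * FROM V v
WHERE ST_DISTANCE(v.propertyBag.Fixed.address.location, {
  "type": "Point",
  "coordinates": [-122.19, 47.36]
}) < 100 * 1000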
If your database is already populated and you don't feel like migrating it, then you could use a user-defined function (UDF) to fix it at query time, but I would strongly recommend doing the migration over this approach: geospatial indexes won't work with the data as you have it now, so your queries will be much slower as a result.
Again, I don't recommend this unless a geo index is not important to you, but here is a swapXY UDF that will do the swap:
function(point) {
  // Returns a copy of the point with its two coordinates swapped.
  return {
    type: "Point",
    coordinates: [point.coordinates[1], point.coordinates[0]]
  };
}
You use it in a query like this:
SELECT * FROM v
WHERE
  ST_DISTANCE(
    udf.swapXY(v.propertyBag.Fixed.address.location),
    udf.swapXY({
      "type": "Point",
      "coordinates": [47.36, -122.19]
    })
  ) < 100 * 1000
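If you do decide to migrate instead, a one-off script can swap the stored coordinates in place. Here is a minimal sketch using the @azure/cosmos Node SDK; the endpoint, key, and database/container names are placeholders, and it assumes the container is partitioned on /id (adjust the partition key argument if yours differs):
const { CosmosClient } = require("@azure/cosmos");

async function swapAllCoordinates() {
  const client = new CosmosClient({
    endpoint: "https://<account>.documents.azure.com", // placeholder
    key: "<key>"                                       // placeholder
  });
  const container = client.database("<database>").container("<container>");

  // Read every document that actually has a location point.
  const { resources: docs } = await container.items
    .query("SELECT * FROM c WHERE IS_DEFINED(c.propertyBag.Fixed.address.location)")
    .fetchAll();

  for (const doc of docs) {
    const loc = doc.propertyBag.Fixed.address.location;
    // Swap [latitude, longitude] into GeoJSON's [longitude, latitude] order.
    loc.coordinates = [loc.coordinates[1], loc.coordinates[0]];
    // Second argument is the partition key value; /id is assumed here.
    await container.item(doc.id, doc.id).replace(doc);
  }
}

swapAllCoordinates().catch(console.error);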

Related

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to write queries in Cosmos DB, as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far, but apparently I don't understand very well how they work.
In a structure such as the one below, how do I query for the city with the largest population across all states, using the Data Explorer in Azure:
{
  "id": 1,
  "states": [
    {
      "name": "New York",
      "cities": [
        {
          "name": "New York",
          "population": 8500000
        },
        {
          "name": "Hempstead",
          "population": 750000
        },
        {
          "name": "Brookhaven",
          "population": 500000
        }
      ]
    },
    {
      "name": "California",
      "cities": [
        {
          "name": "Los Angeles",
          "population": 4000000
        },
        {
          "name": "San Diego",
          "population": 1400000
        },
        {
          "name": "San Jose",
          "population": 1000000
        }
      ]
    }
  ]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states) {
  // Sorts states descending by the population of their single remaining city.
  function compareStates(stateA, stateB) {
    return stateB.cities[0].population - stateA.cities[0].population;
  }
  // Reduce each state to just its most populous city.
  var onlyWithOneCity = states.map(s => {
    var maxPop = Math.max.apply(Math, s.cities.map(o => o.population));
    return {
      name: s.name,
      cities: s.cities.filter(x => x.population === maxPop)
    };
  });
  return onlyWithOneCity.sort(compareStates)[0];
}
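For the sample document in the question, udf.OnlyMaxPop(c.states) should then evaluate to something like this (an illustration worked out from the data above, not verified against a live collection):
{
  "name": "New York",
  "cities": [
    { "name": "New York", "population": 8500000 }
  ]
}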
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.

MongoDB / Geojson $geointersects problems

I've been trying for 8 hours now to deal with a MongoDB GeoJSON $geoIntersects issue:
It works very well when my polygon is a square or a rectangle, but I'm unable to get any results from a $geoIntersects request when my polygon has crossing edges, like this example:
(Image from https://geoman.io/geojson-editor)
The polygon data looks like this :
{
  "type": "Feature",
  "properties": {
    "shape": "Polygon"
  },
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [-1.565584, 47.226352],
        [-1.564704, 47.226927],
        [-1.564093, 47.225616],
        [-1.563138, 47.226374],
        [-1.565584, 47.226352]
      ]
    ]
  },
  "id": "dda54a42-090b-46ea-9dd0-fdda6d240f90"
}
For this example, I need to know whether the polygon includes my point coordinates.
This is my simple query:
db.geojsondatas.find({
  geometry: {
    $geoIntersects: {
      $geometry: {
        type: "Point",
        coordinates: [ -1.555638, 47.216245 ]
      }
    }
  }
});
Does anyone know if there is a way to do this?
Thanks in advance.
Maybe try the JTS Topology Suite and investigate your polygon.
I don't think $geoIntersects supports self-intersecting polygons.
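If you want to stay in JavaScript, one hedged option (my suggestion, not something the answer above covers) is to use the Turf.js library to detect the self-intersection and split the polygon into valid pieces before storing it, so that 2dsphere queries such as $geoIntersects behave as expected:
const turf = require("@turf/turf");

const feature = {
  type: "Feature",
  properties: { shape: "Polygon" },
  geometry: {
    type: "Polygon",
    coordinates: [[
      [-1.565584, 47.226352],
      [-1.564704, 47.226927],
      [-1.564093, 47.225616],
      [-1.563138, 47.226374],
      [-1.565584, 47.226352]
    ]]
  }
};

// Any points returned by kinks() are self-intersections.
if (turf.kinks(feature).features.length > 0) {
  // Split the invalid polygon into simple, valid polygons.
  const pieces = turf.unkinkPolygon(feature).features;
  // Store the pieces instead of the original, e.g. db.geojsondatas.insertMany(pieces)
  console.log("split into " + pieces.length + " valid polygons");
}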

Azure Cosmos DB geospatial lookup consuming too high RU

I have a single Azure Cosmos DB collection I am querying against, hoping to use the geospatial index for efficient queries. The problem I'm encountering is that the RU consumption seems inefficient.
The collection has only 50k 1 KB documents in it, but a query using ST_DISTANCE that returns a single document consumes >900 RUs.
I've seen the RUs scale linearly with the number of documents in the collection. It would seem indexing should prevent this behavior.
Example Query (950 RUs):
SELECT * FROM c where ST_DISTANCE(c.location, { 'type': 'Point', 'coordinates': [34.69, -1.91] }) < 500
Example document:
[
  {
    "id": "1504891036",
    "name": "Oujda",
    "location": {
      "type": "Point",
      "coordinates": [
        34.69,
        -1.91
      ]
    },
    "population": 409391,
    "country": "Morocco",
    "country.iso2": "MA",
    "country.iso3": "MAR"
  }
]
I've not modified the default indexing policy, which seems to cover spatial indexing:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        },
        {
          "kind": "Spatial",
          "dataType": "Point"
        }
      ]
    }
  ],
  "excludedPaths": []
}
I determined the problem: I had transposed the longitude and latitude coordinates from the order prescribed by GeoJSON.
Cosmos is expecting:
"location": {
"type": "Point",
"coordinates": [
<#lon>,
<#lat>
]
I had assumed, incorrectly, that it was lat/lon. Therefore many of my latitudes were outside the required -90 to 90 range, since longitude can range from -180 to 180. After re-creating my ~50k documents, RU charges for coordinate-based lookups are consistently <10 RUs.
Before the fix (all docs had transposed lat/lon coordinates, many outside the -90/90 bounds and therefore invalid):
SELECT * FROM c where ST_DISTANCE(c.location, { 'type': 'Point', 'coordinates': [34.69, -1.91] }) < 500
940 RUs, 1 document returned
After the fix (all docs re-created with the coordinates in the [lon, lat] order required by the GeoJSON spec):
SELECT * FROM c where ST_DISTANCE(c.location, { 'type': 'Point', 'coordinates': [-1.91, 34.69] }) < 500
6 RUs, 1 document returned
The initial issue was confirmed/diagnosed with the following query:
SELECT ST_ISVALIDDETAILED(c.location) FROM c where c.name = "Kansas City"
Error: "Latitude values must be between -90 and 90 degrees."

Date Between Query in Cosmos DB

I am building a simple event store in Cosmos DB that has documents structured something like this:
{
  "id": "e4c2bbd0-2885-4fb5-bcca-90436f79f155",
  "entityType": "contact",
  "history": [
    {
      "startDate": 1504656000,
      "endDate": 1504656000,
      "Name": "John"
    },
    {
      "startDate": 1504828800,
      "endDate": 1504828800,
      "Name": "Jon"
    }
  ]
}
This might not be the most efficient way to store it, but this is what I am starting with. I want to be able to query all contact documents out of the database for a certain period of time. The startDate and endDate represent the time the record was valid. The history currently contains the entire history of the record, which probably could be improved.
I have tried creating a query like this:
SELECT c.entityType, c.id,history.Name, history.startDate FROM c
JOIN history in c.history
where
c.entityType = "contact" AND
(history.StartDate <= 1504656001
AND history.EndDate >= 1504656001)
This query should return the state of the contact for 9/7/2017, but instead it returns every history entry. I have played with several options, but I am not sure what I am missing.
I have also tried setting the index (maybe that is the issue?), so I have included the indexing policy here:
{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "String",
          "precision": -1
        },
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        }
      ]
    }
  ],
  "excludedPaths": []
}
What am I missing? Is the index correct? Is my query correct for a date between query?
You have two issues. One is addressed by Matias in a comment.
Second, your condition is history.StartDate <= 1504656001 AND history.EndDate >= 1504656001.
Play with the range, e.g. history.StartDate >= 1504656001 AND history.EndDate <= 1504656111.
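For illustration only (a sketch, not part of the original answer), if the first issue is the property-name casing (your documents store startDate/endDate while the query uses StartDate/EndDate, and property paths in Cosmos DB SQL are case-sensitive), a corrected range query could look like:
SELECT c.entityType, c.id, history.Name, history.startDate
FROM c
JOIN history IN c.history
WHERE c.entityType = "contact"
AND history.startDate >= 1504656000
AND history.endDate <= 1504828800
Adjust the bounds to the window you actually want; for "the state of the contact at time t" you would instead keep startDate <= t AND endDate >= t, with the corrected casing.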

How to search through data with arbitrary amount of fields?

I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.
The created form is used to:
register a new member for the event;
search through registered members.
What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch a good fit for this task?
I wrote a post about how to index arbitrary data into Elasticsearch and then search it by specific fields and values, all without blowing up your index mapping.
The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/
In short, you will need to do the following steps to get what you want:
Create a special index described in the post (a hedged mapping sketch is shown right after this list).
Flatten the data you want to index using the flattenData function:
https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
Create a document with the original and flattened data and index it into Elasticsearch:
{
  "data": { ... },
  "flatData": [ ... ]
}
Optional: use Elasticsearch aggregations to find which fields and types have been indexed.
Execute queries on the flatData object to find what you need.
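As a rough illustration of the first step, a mapping along the following lines could be used for the special index. This is a hedged sketch inferred from the flattened documents and the query shown below rather than the authoritative mapping from the post; flatData is a nested field, data holds the original document unindexed, and further value_<type> fields (e.g. value_boolean, value_double) can be added in the same way:
{
  "mappings": {
    "properties": {
      "data": { "type": "object", "enabled": false },
      "flatData": {
        "type": "nested",
        "properties": {
          "key": { "type": "keyword" },
          "type": { "type": "keyword" },
          "key_type": { "type": "keyword" },
          "value_string": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword" } }
          },
          "value_long": { "type": "long" }
        }
      }
    }
  }
}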
Example
Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:
name string
age long
sex long - 0 for male, 1 for female
In addition to this data, the related event probably has some sort of ID; let's call it eventId. So the final document could look like this:
{
  "eventId": "2T73ZT1R463DJNWE36IA8FEN",
  "name": "Bob",
  "age": 22,
  "sex": 0
}
Now, before we index this document, we will flatten it using the flattenData function:
flattenData(document);
This will produce the following array:
[
  {
    "key": "eventId",
    "type": "string",
    "key_type": "eventId.string",
    "value_string": "2T73ZT1R463DJNWE36IA8FEN"
  },
  {
    "key": "name",
    "type": "string",
    "key_type": "name.string",
    "value_string": "Bob"
  },
  {
    "key": "age",
    "type": "long",
    "key_type": "age.long",
    "value_long": 22
  },
  {
    "key": "sex",
    "type": "long",
    "key_type": "sex.long",
    "value_long": 0
  }
]
Then we will wrap this data in a document, as shown before, and index it.
Later, a second event moderator creates another form that has a new field, a field with the same name and type, and a field with the same name but a different type:
name string
city string
sex string - "male" or "female"
This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".
Let's try to flatten the data submitted by this form:
flattenData({
  "eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
  "name": "Alice",
  "city": "New York",
  "sex": "female"
});
This will produce the following data:
[
  {
    "key": "eventId",
    "type": "string",
    "key_type": "eventId.string",
    "value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
  },
  {
    "key": "name",
    "type": "string",
    "key_type": "name.string",
    "value_string": "Alice"
  },
  {
    "key": "city",
    "type": "string",
    "key_type": "city.string",
    "value_string": "New York"
  },
  {
    "key": "sex",
    "type": "string",
    "key_type": "sex.string",
    "value_string": "female"
  }
]
Then, after wrapping the flattened data in a document and indexing it into Elasticsearch, we can execute complicated queries.
For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "flatData",
            "query": {
              "bool": {
                "must": [
                  {"term": {"flatData.key": "eventId"}},
                  {"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "flatData",
            "query": {
              "bool": {
                "must": [
                  {"term": {"flatData.key": "name"}},
                  {"match": {"flatData.value_string": "bob"}}
                ]
              }
            }
          }
        }
      ]
    }
  }
}
Elasticsearch automatically detects the type of field content in order to index it correctly, even if the mapping hasn't been defined previously. So yes, Elasticsearch suits these cases well.
However, you may want to fine-tune this behavior, or maybe the default mapping applied by Elasticsearch doesn't correspond to what you need: in that case, take a look at the default mapping or, for even further control, the dynamic templates feature.
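For example, a minimal dynamic template (a sketch unrelated to the question's specific data) that maps every dynamically added string field as a keyword would look like this:
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}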
If you let your end users decide the keys you store things in, you'll have an ever-growing mapping and cluster state, which is problematic.
This case and a suggested solution are covered in this article on common problems with Elasticsearch.
Essentially, you want everything that can possibly be user-defined to be a value. Using nested documents, you can have a key field and differently mapped value fields to achieve pretty much the same thing.
