MarkLogic GeoJSON search - geospatial

MarkLogic newbie here.
How can I search for a lat/long point that falls within the (multi)polygon below and return the matching document?
I have thousands of GeoJSON documents, and within them I have MultiPolygon geometries like this:
"geometry":{
"type":"MultiPolygon",
"coordinates":[
[
[
[
116.761454004,
-20.633334001
],
[
116.762183383,
-20.633777484
],
...
[
116.761248983,
-20.6337970009999
],
[
116.761454004,
-20.633334001
]
]
]
]
}
Thanks!

You could use a geospatial region query to match regions. You need to have a geospatial region index configured to do so. You can look at the query documentation here:
http://docs.marklogic.com/cts:geospatial-region-query
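For illustration, a minimal sketch in XQuery of what that could look like, assuming a geospatial region path index has been configured on the /geometry node with the wgs84 coordinate system (the path, options, collection, and point below are placeholders, not taken from the question):

xquery version "1.0-ml";

(: Hypothetical index reference; must match your configured region index :)
let $region-ref := cts:geospatial-region-path-reference("/geometry", "coordinate-system=wgs84")
(: cts:point takes latitude first, then longitude :)
let $point := cts:point(-20.6335, 116.7615)
return
  cts:search(
    fn:collection(),
    (: "contains" matches documents whose indexed region contains the query point :)
    cts:geospatial-region-query($region-ref, "contains", $point)
  )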

Related

How to perform proximity search on a nested array of polygons in mongodb?

I have a collection with a nested array of polygons, stored in the format given below:
_id: ObjectId("...."),
"attributes": {
  "zones": [
    {
      "zoneName": "...",
      "zoneLocs": [
        {
          "type": "Polygon",
          "coordinates": [[...]]
        }
        ....
      ]
    }
    ....
  ]
}
I want to perform a GeoJSON search on this collection, where I pass a point and a max-distance value, and pull all the documents that have a polygon within the enclosed area.
I tried to do this using $elemMatch, $geoWithin and $center. My final query looked like this, but it did not fetch any results, even though there were polygons within the enclosing area:
{
  "attributes.zones": {
    "$elemMatch": {
      "zoneLocs": {
        "$elemMatch": {
          "$geoWithin": {
            "$center": [
              [ -104.84127910390623, 39.820705065855044 ],
              100000
            ]
          }
        }
      }
    }
  }
}
I have created a 2dsphere index on the path 'attributes.zones.zoneLocs', but no luck so far. Any help would be greatly appreciated.
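One thing that may be worth checking (an assumption, not a verified fix): $center is defined for legacy coordinate pairs and does not match GeoJSON objects, so with GeoJSON polygons and a 2dsphere index you would normally use $centerSphere instead, with the radius expressed in radians (distance divided by the Earth's radius, roughly 6378100 metres). A sketch of the same query with that change:

{
  "attributes.zones": {
    "$elemMatch": {
      "zoneLocs": {
        "$elemMatch": {
          "$geoWithin": {
            "$centerSphere": [
              [ -104.84127910390623, 39.820705065855044 ],
              100000 / 6378100   // radius in radians: 100 km / Earth radius
            ]
          }
        }
      }
    }
  }
}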

MongoError: Edge locations in degrees

I want to get the points within a polygon, and I am using the query below to get the data:
loc: {
  $geoWithin: {
    $geometry: {
      type: 'Polygon',
      coordinates: [
        [
          [ -117.83736, 33.73838 ],
          [ -117.83588, 33.73837 ],
          [ -117.83369, 33.73839 ]...
        ]
      ]
    }
  }
}
But I am getting an error like this:
Edges 1 and 3 cross. Edge locations in degrees:
How can I solve this?
As I stated in the comments, your coordinates list is an invalid polygon: at least two of its edges cross each other. You need to validate each edge against every other edge using a simple segment-intersection check, as described here.
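For illustration, a rough sketch of that pairwise check in plain JavaScript (it assumes a closed ring of [lng, lat] positions and skips collinear edge cases):

// Orientation of the triplet (p, q, r): positive = counter-clockwise, negative = clockwise, 0 = collinear
function orient(p, q, r) {
  return Math.sign((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]));
}

// True if segments a-b and c-d cross each other
function segmentsCross(a, b, c, d) {
  return orient(a, b, c) !== orient(a, b, d) &&
         orient(c, d, a) !== orient(c, d, b);
}

// ring: array of positions where the last position repeats the first
function isSimpleRing(ring) {
  var n = ring.length - 1;                   // number of edges
  for (var i = 0; i < n; i++) {
    for (var j = i + 2; j < n; j++) {
      if (i === 0 && j === n - 1) continue;  // first and last edges share the closing point
      if (segmentsCross(ring[i], ring[i + 1], ring[j], ring[j + 1])) return false;
    }
  }
  return true;
}

A ring that fails this check is the kind of polygon MongoDB rejects with the "Edges ... cross" error.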

Find paths from a graph and then count how many times a path occurs in Azure Cosmos DB using Gremlin

I am storing clickstream events in a graph database using the structure below.
A user performs multiple events, and each event has an edge pointing to the previous event:
Vertices are 'user' and 'event'
Edges are 'performed' and 'previous'
Each event has a property named referer.
For example, if a user views the page www.foobar.com/aaa,
then there will be a page view event with referer: www.foobar.com/aaa.
Now I want to find the possible paths from the homepage, along with a count for each.
Using the Gremlin query below I am able to find the possible paths, but I am not able to group them to get those counts:
g.V().hasLabel('event').has('referer','https://www.foobar.com/').in('previous').in('previous').path().by('referer')
Output:
[
  {
    "labels": [ [], [], [] ],
    "objects": [
      "https://www.foobar.com/",
      "https://www.foobar.com/aaa",
      "https://www.foobar.com/bbb"
    ]
  },
  {
    "labels": [ [], [], [] ],
    "objects": [
      "https://www.foobar.com/",
      "https://www.foobar.com/aaa",
      "https://www.foobar.com/bbb"
    ]
  },
  {
    "labels": [ [], [], [] ],
    "objects": [
      "https://www.foobar.com/",
      "https://www.foobar.com/ccc",
      "https://www.foobar.com/ddd"
    ]
  }
]
I want an output like this:
[[
"https://www.foobar.com/",
"https://www.foobar.com/aaa",
"https://www.foobar.com/bbb"
]:2,
[
"https://www.foobar.com/",
"https://www.foobar.com/ccc",
"https://www.foobar.com/ddd"
]:1]
Since I am using the Azure Cosmos DB graph API, only these Gremlin operators are available:
https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin-support
Thanks
You can apply groupCount to a path using a syntax such as this:
groupCount().by(path().by('referer'))
So you could rewrite your query as:
g.V().hasLabel('event').
has('referer','https://www.foobar.com/').
in('previous').
in('previous').
groupCount().by(path().by('referer'))
Hope this helps,
Cheers
Kelvin

limit in _source in elasticsearch

This is my source from ES:
"_source": {
"queryHash": "query412236215",
"id": "query412236215",
"content": {
"columns": [
{
"name": "Catalog",
"type": "varchar(10)",
"typeSignature": {
"rawType": "varchar",
"typeArguments": [],
"literalArguments": [],
"arguments": [
{
"kind": "LONG_LITERAL",
"value": 10
}
]
}
}
],
"data": [
[
"apm"
],
[
"postgresql"
],
[
"rest"
],
[
"system"
],
[
"tpch"
]
],
"query_string": "show catalogs",
"execution_time": 1979
},
"createdOn": "1514269074289"
}
How can I get the first n records inside _source.data?
Let's say _source.data has 100 records and I want only 10 at a time. Also, is it possible to set an offset to get the next 10 records?
Thanks
Take a look at scripting. As far as I know there isn't any built-in solution because Elasticsearch is primarily built for searching and filtering with a document store only as a secondary concern.
First, the order in _source is stable, so it's not totally impossible:
When you get a document back from Elasticsearch, any arrays will be in the same order as when you indexed the document. The _source field that you get back contains exactly the same JSON document that you indexed.
However, arrays are indexed—made searchable—as multivalue fields, which are unordered. At search time, you can’t refer to "the first element" or "the last element." Rather, think of an array as a bag of values.
However, source filtering doesn't cover this, so you're out of luck with arrays.
Also inner hits won't help you. They do have options for sort, size, and from, but those will only return the matched subdocuments and I assume you want to page freely through all of them.
So your final hope is scripting, where you can build whatever you want. But this is probably not what you want:
Do you really need paging here? Results are transferred in a compressed fashion, so the overhead of paging is probably much larger than transferring the data in one go.
If you do need paging, because your array is huge, you probably want to restructure your documents.
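If scripting does turn out to fit your case, a rough, untested sketch using script_fields and Painless to slice the array server-side could look like this (the index name is a placeholder; the field path follows the document above):

GET my-index/_search
{
  "_source": false,
  "script_fields": {
    "data_page": {
      "script": {
        "lang": "painless",
        "source": "def data = params._source.content.data; int from = params.from; int to = (int) Math.min(from + params.size, data.size()); return data.subList(from, to);",
        "params": { "from": 0, "size": 10 }
      }
    }
  }
}

Paging would then just be a matter of changing from and size, though as noted above, pulling the array once and slicing on the client is probably cheaper.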

Separate output values from a single grok query?

I've been capturing web logs using Logstash, and specifically I'm trying to capture web URLs while also splitting them into their parts.
If I take an example log entry URL:
"GET https://www.stackoverflow.com:443/some/link/here.html HTTP/1.1"
I use this grok pattern:
\"(?:%{NOTSPACE:http_method}|-)(?:%{SPACE}http://)?(?:%{SPACE}https://)?(%{NOTSPACE:http_site}:)?(?:%{NUMBER:http_site_port:int})?(?:%{GREEDYDATA:http_site_url})? (?:%{WORD:http_type|-}/)?(?:%{NOTSPACE:http_version:float})?(?:%{SPACE})?\"
I get this:
{
  "http_method": [[ "GET" ]],
  "SPACE": [[ " ", null, "" ]],
  "http_site": [[ "www.stackoverflow.com" ]],
  "BASE10NUM": [[ "443" ]],
  "http_site_url": [[ "/some/link/here.html" ]],
  "http_type": [[ "HTTP" ]]
}
The trouble is, I'm trying to ALSO capture the entire URL:
https://www.stackoverflow.com:443/some/link/here.html
So in total, I'm seeking 4 separate outputs:
http_site_complete https://www.stackoverflow.com:443/some/link/here.html
http_site www.stackoverflow.com
http_site_port 443
http_site_url /some/link/here.html
Is there some way to do this?
First, look at the built-in patterns for dealing with URLs. Putting something like URIHOST in your pattern will be easier to read and maintain than a bunch of WORDs or NOTSPACEs.
Second, once you have lots of little fields, you can always use Logstash's filters to manipulate them. You could use:
mutate {
  add_field => { "http_site_complete" => "%{http_site}:%{http_site_port}%{http_site_url}" }
}
Or you could get fancy with your regexp and use a named group:
(?<total>%{WORD:wordOne} %{WORD:wordTwo} %{WORD:wordThree})
which would individually capture three fields and make one more field from the whole string.
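Applied to the URL in the question, such a named group around the built-in URI patterns might look roughly like this (untested; it assumes the scheme and port are always present, as in the sample line, and the field names are just illustrative):

\"%{WORD:http_method} (?<http_site_complete>https?://%{IPORHOST:http_site}:%{POSINT:http_site_port:int}%{URIPATHPARAM:http_site_url}) HTTP/%{NUMBER:http_version:float}\"

This captures the whole URL into http_site_complete while still splitting out http_site, http_site_port, and http_site_url.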
