How to get distance in Solr 4 geospatial search?

I'm playing with the new Solr 4 geospatial search. Following the example from http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4, I'm trying to get results like so:
http://localhost:8983/solr/allopenhours/select?
&q=foobar
&fq=geo:%22Intersects(Circle(54.729696,-98.525391%20d=0.08992))%22
&q={!%20score=distance}
&fl=*,score
But it doesn't work. How can I get the distance and score fields in the result set?

According to the reference guide (Spatial Search - Returning the distance), you can edit your fl (field list) parameter to do one of the following:
&fl=*,score,geodist()
&fl=*,score,_dist_:geodist() - this one returns the distance under the alias _dist_

The answer Paige gave is correct. However, depending on the query, the following error may be shown:
Error parsing fieldname: geodist - not enough parameters:[]
geodist needs an sfield (the field that holds the location in the document) and a pt (the center point of the circle). If it can't find either of these, it throws the error shown above.
Either add these two parameters to the URL:
&pt=52.373,4.899&sfield=store&fl=_dist_:geodist()
Or pass the three arguments (the sfield, plus lat and lon) directly to the geodist() function call:
&fl=_dist_:geodist(store,52.373,4.899)
Note that in the first case, if you have additional geo functions (like geofilt) in your query, the pt and sfield are used for those as well (unless overridden locally).
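Putting the pieces together, a complete request might look something like this (just a sketch, assuming a LatLonType location field named store as in the examples above; adjust sfield, pt and d to your own schema and data):
http://localhost:8983/solr/allopenhours/select?
&q=*:*
&fq={!geofilt}
&sfield=store
&pt=52.373,4.899
&d=5
&fl=*,score,_dist_:geodist()
&sort=geodist()%20asc
Here d is the geofilt radius in kilometers, and the sort clause orders results by distance from pt.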

Related

How to visualize a count of all values in an array field in Kibana

I am having trouble creating a particular type of visualization in Kibana. My events in Kibana are statistics on communications between two IP addresses. Two of the fields are lists of ports used by each IP address. An example of the fields would be:
ip1 = 192.168.101.2
ip2 = 192.168.101.3
ip2Ports = 80,443
ip1Ports = 80,57000,0
I would like to have a top count of all the values, such as:
port    count
80      2
57000   1
443     1
I have been able to parse ip2Ports into ip2Ports_List.column1, ip2Ports_List.column2, etc., but I can only choose one term with a terms aggregation in the visualization. I can split the chart, but that leads to separate counts for each field. If I go by the original ip2Ports field, it is just aggregated as the string, such as "80,443".
Is it even possible to create a top count visualization of fields with multiple values? If so, how would I do so? If not, is there a way to restructure my data so I can do it? Thank you!
My issue stemmed from the format of the values being sent in by Logstash. I had thought that the 'ip2Ports_List.column1' format, which was a result of using the csv filter, was part of an array. It wasn't. After analyzing it, 'ip2Ports_List.column1' didn't seem to be much different from any other new field.
Elastic needed an array to give me the visualization I wanted. I wasn't sure what the best way to produce it was, so I just ended up using the ruby filter. This is what the code ended up looking like:
ruby {
  # Split the comma-separated 'portsIp' string into a Ruby array and
  # store it in a new field so Elasticsearch indexes the ports individually.
  code => "fields = event.get('portsIp').split(',')
           event.set('portsIpArray', fields)"
}
Where 'portsIp' looked something like "80,443". Splitting it produced a Ruby array, which I set as the value of a new event field, 'portsIpArray'.
From there, when I tried to visualize the 'portsIpArray' field, it looked exactly how I wanted it to: each port was treated as a separate value while still being associated with the same event.
Extra:
Something else I discovered: if you're writing your code like I was, directly in the Logstash conf file, Logstash doesn't like double quotes inside the double-quoted code string. In hindsight it makes sense, but it doesn't give a clear error, so it's difficult to figure out.
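One simple workaround (a sketch using the same portsIp field as above) is to single-quote the Logstash code string, so double quotes can be used inside it without escaping:
ruby {
  # Same split as before, but the Logstash string is single-quoted
  # so the Ruby code can use double quotes internally.
  code => 'fields = event.get("portsIp").split(",")
           event.set("portsIpArray", fields)'
}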

Solr spatial search , polygon intersect error

I am trying spatial search in Solr 5.0. My system is up and running, but sometimes I come across this error. I have googled around and found no explanation, so can anyone please help me with this?
My error:
because: com.spatial4j.core.exception.InvalidShapeException: Self-intersection at or near point
(13.143009111281323, 80.07316589355469, NaN)","code":400}})
My query:
Intersects(POLYGON((13.142340452070176 80.07522583007812,13.141003126359843 80.079345703125
,13.141003126359843 80.08621215820312,13.1383284530778 80.09445190429688,13.1383284530778 80.101318359375
,13.136991105507466 80.11367797851562,13.131641642380112 80.14114379882812,13.128966867118496 80.15350341796875
,13.126292062732247 80.16311645507812,13.124954649619115 80.17135620117188,13.111580118251648 80.17135620117188
,13.099542418228534 80.15899658203125,13.095529720741494 80.10543823242188,13.09820485966459 80.10543823242188
,13.099542418228534 80.10406494140625,13.102217513557752 80.10406494140625,13.103555050321653 80.10269165039062
,13.106230102044576 80.101318359375,13.114255082724767 80.101318359375,13.1155925540513 80.09994506835938
,13.118267474880913 80.09857177734375,13.119604924382593 80.09857177734375,13.124954649619115 80.0958251953125
,13.127629468565612 80.0958251953125,13.130304258390225 80.09445190429688,13.131641642380112 80.09445190429688
,13.134316388511614 80.09307861328125,13.136991105507466 80.09033203125,13.139665793362148 80.0848388671875
,13.141003126359843 80.08346557617188,13.141003126359843 80.08209228515625,13.142340452070176 80.08071899414062
,13.142340452070176 80.0738525390625,13.14367777049247 80.07247924804688,13.14367777049247 80.07110595703125
,13.142340452070176 80.07522583007812)))distErrPct=0
Well, let's take a look at the polygon. Plotted, it looks reasonable, but it's pretty busy in the lower right corner; zoom in there and you can see the self-intersection. Self-intersecting polygons aren't acceptable (see Solr Spatial search with self-intersecting polygons for more information on that).
It looks to me like you're trying to be more precise than your application is capable of.
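That said, if cleaning the geometry upstream isn't an option, the JTS-backed RPT field type in recent Solr releases can be asked to repair invalid shapes rather than reject them. A rough sketch of such a field type definition (the name and attribute values here are illustrative, not taken from your schema; check the exact attributes against your Solr version's reference guide):
<fieldType name="location_rpt"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           validationRule="repairBuffer0"
           geo="true" />
validationRule="repairBuffer0" applies a zero-width buffer to try to fix self-intersections; simplifying the polygon at the source is still the more reliable fix.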

Searching closest venues using ll and radius not working properly

I know there are a lot of questions about this, but I've reached the point where there is nothing left to do but ask whether somebody else has a solution.
Using the Foursquare API explorer to test my query, I can't seem to obtain accurate, or even good, results for the data I need.
It is quite simple: I need to obtain the closest venue to a set of coordinates. I don't mind getting no results if nothing is found nearby.
Reading the API documentation (https://developer.foursquare.com/docs/venues/venues), I concluded that I need search rather than explore, because I don't want suggestions of recommended venues (and the results when I tested it proved that it wasn't what I was expecting).
So, using the search API, I want to find places (ideally the single closest place, but several would do) close to these coordinates:
ll=37.424782,-122.162989
Since I want places close by, I add
radius=51
and I don't really want many results
limit=2
from the documentation I see that radius is
Only valid for requests with intent=browse, or requests with
intent=checkin and categoryId or query
so, I use
intent=browse
which makes my final query:
venues/search?intent=browse&ll=37.424782,-122.162989&radius=51&limit=2
Query Result:
https://developer.foursquare.com/docs/explore#req=venues/search%3Fintent%3Dbrowse%26ll%3D37.424782,-122.162989%26radius%3D51%26limit%3D2
Here we can see that the first result is well outside the radius (distance: 135); the second result, however, is fine (distance: 50).
What am I doing wrong to get these results? If I increase the limit, all I get is more results that are also outside the radius. I could iterate through them and find the one with the smallest distance, but I have no guarantee that the closest result will be in the top X I limit to; and even if I had that guarantee, it would be a tiresome solution to an apparently simple question.
Thanks for the help...
Marc
EDIT:
I managed to make the query perform as I intended, but I had to add all of the parent categories from:
https://developer.foursquare.com/categorytree
categoryId=
4d4b7104d754a06370d81259, Arts & Entertainment
4d4b7105d754a06372d81259, College & University
4d4b7105d754a06373d81259, Event
4d4b7105d754a06374d81259, Food
4d4b7105d754a06376d81259, Nightlife Spot
4d4b7105d754a06377d81259, Outdoors & Recreation
4d4b7105d754a06375d81259, Professional & Other Places
4e67e38e036454776db1fb3a, Residence
4d4b7105d754a06378d81259, Shop & Service
4d4b7105d754a06379d81259 Travel & Transport
making my query into:
venues/search?
intent=checkin&ll=37.424782,-122.162989&radius=60&categoryId=4d4b7104d754a06370d81259,4d4b7105d754a06372d81259,4d4b7105d754a06373d81259,4d4b7105d754a06374d81259,4d4b7105d754a06376d81259,4d4b7105d754a06377d81259,4d4b7105d754a06375d81259,4e67e38e036454776db1fb3a,4d4b7105d754a06378d81259,4d4b7105d754a06379d81259
Query Result:
https://developer.foursquare.com/docs/explore#req=venues/search%3Fintent%3Dcheckin%26ll%3D37.424782,-122.162989%26radius%3D60%26categoryId%3D4d4b7104d754a06370d81259,4d4b7105d754a06372d81259,4d4b7105d754a06373d81259,4d4b7105d754a06374d81259,4d4b7105d754a06376d81259,4d4b7105d754a06377d81259,4d4b7105d754a06375d81259,4e67e38e036454776db1fb3a,4d4b7105d754a06378d81259,4d4b7105d754a06379d81259
It still returns results outside of my radius, but it's an acceptable error margin. It is weird, however.
Although this question is old, I'm responding for others. I was working on something similar recently, and what I learned was that in order to use radius you also need to use the query parameter. What I did was use the star character '*', and it worked for me. I have to say, though, that the limit of 50 results is something I haven't solved yet; I'm working on it at the moment.
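In other words, taking the query from the question and adding the wildcard query parameter, something like this worked for me (a sketch; authentication parameters omitted):
venues/search?intent=browse&ll=37.424782,-122.162989&radius=51&limit=2&query=*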

Using indexed types for ElasticSearch in Titan

I currently have a VM running Titan over a local Cassandra backend and would like the ability to use ElasticSearch to index strings using CONTAINS matches and regular expressions. Here's what I have so far:
After titan.sh is run, a Groovy script is used to load in the data from separate vertex and edge files. The first stage of this script loads the graph from Titan and sets up the ES properties:
config.setProperty("storage.backend","cassandra")
config.setProperty("storage.hostname","127.0.0.1")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","db/es")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
The second part of the script sets up the indexed types:
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make();
The third part loads in the data from the CSV files; this has been tested and works fine.
My problem is, I don't seem to be able to use the ElasticSearch functions when I do a Gremlin query. For example:
g.E.has("property",CONTAINS,"test")
returns 0 results, even though I know this field contains the string "test" for that property at least once. Weirder still, when I change CONTAINS to something that isn't recognised by ElasticSearch, I get a "no such property" error. I can also perform exact string matches and any numerical comparisons, including greater than or less than; however, I suspect the default indexing method is being used instead of ElasticSearch in those cases.
Given the lack of errors when I try to run a more advanced ES query, I am at a loss as to what is causing the problem here. Is there anything I may have missed?
Thanks,
Adam
I'm not quite sure what's going wrong in your code. From your description everything looks fine. Can you try the following script (just paste it into your Gremlin REPL):
config = new BaseConfiguration()
config.setProperty("storage.backend","inmemory")
config.setProperty("storage.index.elastic.backend","elasticsearch")
config.setProperty("storage.index.elastic.directory","/tmp/es-so")
config.setProperty("storage.index.elastic.client-only","false")
config.setProperty("storage.index.elastic.local-mode","true")
g = TitanFactory.open(config)
g.makeKey("name").dataType(String.class).make()
g.makeKey("property").dataType(String.class).indexed("elastic",Edge.class).make()
g.makeLabel("knows").make()
g.commit()
alice = g.addVertex(["name":"alice"])
bob = g.addVertex(["name":"bob"])
alice.addEdge("knows", bob, ["property":"foo test bar"])
g.commit()
// test queries
g.E.has("property",CONTAINS,"test")
g.query().has("property",CONTAINS,"test").edges()
The last 2 lines should return something like e[1t-4-1w][4-knows-8]. If that works and you still can't figure out what's wrong in your code, it would be good if you could share your full code (e.g. on GitHub or in a Gist).
Cheers,
Daniel

What spatial SRID is this? (trying to convert a .shp file to WSG84)

I'm trying to import some Shapefile mapping data into Sql2008. Before I do that, I need to convert it to WGS84 / SRID 4326, because all my existing data is in this format.
This is the source file info:
GEOGCS["GCS_GDA_1994",DATUM["D_GDA_1994",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
I've tried googling for this and haven't had too much luck.
Secondly, I've tried checking the spatial_reference_systems table and I can't see it in there:
SELECT * FROM sys.spatial_reference_systems
So, can anyone help me? I can't convert it to SRID 4326 if I don't know its current SRID.
UPDATE 1
I found this page, which explains the technical specs of GDA 1994, but it doesn't hint at any SRID number.
UPDATE 2
This search result page also has some interesting results. From there, if you click on the SR-ORG:6643: Australia Albers Equal Area Conic link, it explains that datum, and it's pretty much identical to the one I'm searching for. That would suggest the SRID is 6643.
So is that the answer?
Using FME as my reference, this (GDA94) maps to EPSG:4283, which means that you need to use SRID 4283 (assuming that you're using EPSG-compliant SRID values).
Using this link, GDA94 can be mapped to SRID = 4283, covering the Australian continent. If one knows, for example, that the data is in Western Australia, it may be better to use SRID = 28350 and preserve greater accuracy.
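For example, once you know the SRID, the WKT can be loaded directly with that SRID in SQL Server 2008 (a minimal sketch with made-up coordinates; note that WKT uses longitude-latitude order):
DECLARE @g geography = geography::STGeomFromText('POINT(151.2093 -33.8688)', 4283);
SELECT @g.STSrid;  -- returns 4283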
