I have a set of lat/longs stored in a db. I want to query the db and return documents that are within range of another lat/long. I know how to determine the distance between two points, but I don't want to have to do that for every entry in the db. What is the best way to achieve this?
Thanks very much.
Perhaps you could use Geospatial Indexing to achieve this...
If that's no good, I actually built a node.js addon to perform nearest neighbor searches called node-kdtree. It could be used to find the closest n points, and is fairly quick since it is just a wrapper to an underlying C library. But it sounds like it would be a poor choice for your needs because you would have to pull all of your data out of the DB first in order to process it. With the limited information I have, I suggest that you try using the built-in functionality of mongodb first.
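As a rough illustration of the built-in approach, here is a minimal sketch of the filter document for a MongoDB geospatial query. It assumes each document stores a GeoJSON point in a field called `location` (a hypothetical name) with a `2dsphere` index on it; `buildNearQuery` and the `places` collection are illustrative, not part of any driver.

```javascript
// Sketch only: builds the filter document for a MongoDB $near query.
// Assumes a GeoJSON point in a field named `location` (hypothetical) with a
// 2dsphere index, e.g. created in the shell with:
//   db.places.createIndex({ location: "2dsphere" })
function buildNearQuery(lat, lng, maxMeters) {
  return {
    location: {
      $near: {
        // GeoJSON order is [longitude, latitude]
        $geometry: { type: 'Point', coordinates: [lng, lat] },
        $maxDistance: maxMeters // meters when used with a 2dsphere index
      }
    }
  };
}

// Usage with a driver collection object (assumed to exist):
//   collection.find(buildNearQuery(40.71, -74.01, 5000))
var query = buildNearQuery(40.71, -74.01, 5000);
```

The database then uses the index to find nearby documents, so no distance has to be computed for every entry on the client.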
Rather simple question here. Using CloudSearch, how do I find an object that does NOT have a certain key/property defined.
e.g. I have been storing Car objects all along without indexing their price. Now I have begun indexing Car objects with their msrp... how do I find the Car objects stored without any indexed price?
(and price:null)
(and price:undefined)
and other similar 'falsy' statements and their stringified permutations all do not work.
I am using AWS sdk in Node.js.
TIA!
Niko
The option that will work without any reindexing is a range search like
(NOT (range field=price [0,}))
which matches cars with a price that is not between 0 and infinity (eg ones with no price). See this answer for a discussion of other options.
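Since the question mentions the AWS SDK in Node.js, here is a minimal sketch of how that expression might be packaged as search parameters. The helper name `buildMissingFieldSearch` is made up for illustration, and the endpoint and callback in the comment are placeholders.

```javascript
// Sketch only: request parameters for a structured CloudSearch query that
// matches documents with no value in a numeric field.
function buildMissingFieldSearch(fieldName) {
  return {
    // Same expression as in the answer above, with the field name filled in.
    query: '(NOT (range field=' + fieldName + ' [0,}))',
    queryParser: 'structured'
  };
}

var params = buildMissingFieldSearch('price');
// With the AWS SDK for JavaScript (v2), these would be passed to
// CloudSearchDomain#search, e.g.:
//   new AWS.CloudSearchDomain({ endpoint: SEARCH_ENDPOINT }).search(params, callback)
// where SEARCH_ENDPOINT and callback are placeholders.
```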
Side note: I get the impression that you may be using CloudSearch to store your data. If so, I would consider using a datastore (which is designed to store data) rather than CloudSearch (which is a search engine). For one, it'll make this sort of query much easier.
I have several "Location documents" in my CouchDB, each with longitude and latitude fields. How can I find all location documents whose distance to a given latitude and longitude is less than a given distance?
There is a way to achieve this using vanilla CouchDB, but it's a bit tricky.
You can use the fact that two map-like functions can be applied during one request. The second one can be implemented as a list function.
Lists are not very efficient computationally, and unlike views they can't cache results. But they have one unique feature: you can pass several arguments into a list. Moreover, one of your arguments can be, for example, JS code that is eval-ed inside the list function (risky!).
So the entire scheme looks like this:
Make a view that performs a coarse search
Make a list that receives custom params and refines the data set
Make a client-side API to ease querying this chain.
I can't provide exact code for your particular case since many details are not clear, but it seems the coarse search must group results into somehow linearly enumerated squares, and the list then performs the more precise calculations.
Please note that this scheme might be inefficient for large datasets since it's computationally hungry.
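The coarse-then-refine idea can be sketched in plain JavaScript as follows. This is only an illustration under assumptions: the grid cell size, the `{lat, lon}` point shape, and the function names `cellKey`, `distanceKm` and `refine` are all made up here, and haversine is just one possible distance formula. In CouchDB the first function would sit in a view's map function and the filter in a list function.

```javascript
// Coarse step: map a coordinate to the integer grid cell containing it.
// In CouchDB a view's map function could emit this as the key.
function cellKey(lat, lon, cellSizeDeg) {
  return [Math.floor(lat / cellSizeDeg), Math.floor(lon / cellSizeDeg)];
}

// Haversine great-circle distance in km between two {lat, lon} points.
function distanceKm(a, b) {
  var rad = Math.PI / 180;
  var dLat = (b.lat - a.lat) * rad;
  var dLon = (b.lon - a.lon) * rad;
  var h = Math.pow(Math.sin(dLat / 2), 2) +
          Math.cos(a.lat * rad) * Math.cos(b.lat * rad) *
          Math.pow(Math.sin(dLon / 2), 2);
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// Refine step: keep only candidates that are really within maxKm.
// A list function could apply the same filter to the rows of the view.
function refine(candidates, center, maxKm) {
  return candidates.filter(function (p) {
    return distanceKm(center, p) <= maxKm;
  });
}
```

Note that a search circle can overlap several cells, so the client would have to query the cell of the center point plus its neighbours before refining.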
Vanilla CouchDB isn't really built for geospatial queries.
Your best bet is to either use GeoCouch, CouchDB-Lucene or something similar.
Failing that, you could emit a Geohash from your map function, and do range queries over those.
Caveats apply. Queries around Geohash "fault lines" (equator, poles, longitude 180, etc.) can give too many or too few results.
There are multiple JavaScript libraries that can help convert to/from Geohash, as well as help with some of those caveats.
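For reference, a minimal sketch of the standard Geohash encoding is shown below; a library would normally be preferable. The function name `encodeGeohash` and the `latitude`/`longitude` field names in the comment are assumptions for illustration.

```javascript
// Sketch only: standard Geohash encoding (interleaved longitude/latitude
// bits, 5 bits per base-32 character).
var BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

function encodeGeohash(lat, lon, precision) {
  var latRange = [-90, 90], lonRange = [-180, 180];
  var hash = '', bits = 0, ch = 0, evenBit = true;
  while (hash.length < precision) {
    // Even bits halve the longitude range, odd bits the latitude range.
    var range = evenBit ? lonRange : latRange;
    var value = evenBit ? lon : lat;
    var mid = (range[0] + range[1]) / 2;
    if (value >= mid) { ch = (ch << 1) | 1; range[0] = mid; }
    else { ch = ch << 1; range[1] = mid; }
    evenBit = !evenBit;
    if (++bits === 5) { hash += BASE32.charAt(ch); bits = 0; ch = 0; }
  }
  return hash;
}

// Inside a CouchDB map function this could be used as, for example:
//   emit(encodeGeohash(doc.latitude, doc.longitude, 6), null);
var hash = encodeGeohash(57.64911, 10.40744, 3); // 'u4p'
```

Nearby points usually share a hash prefix, so a query with `startkey` set to the prefix and `endkey` set to the prefix followed by a high character covers one cell, subject to the caveats above.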
CouchDB is not built for dynamic queries, so there is no good/fast way of implementing it in vanilla CouchDB.
If you know beforehand which locations you want to calculate the distance from, you could create a view for each location and call it with the parameters ?startkey=0&endkey=max_distance
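function(doc) {
  // Haversine great-circle distance in km between two {lat, lon} points (one possible formula).
  function distance(a, b) {
    var rad = Math.PI / 180, dLat = (b.lat - a.lat) * rad, dLon = (b.lon - a.lon) * rad;
    var h = Math.pow(Math.sin(dLat / 2), 2) +
            Math.cos(a.lat * rad) * Math.cos(b.lat * rad) * Math.pow(Math.sin(dLon / 2), 2);
    return 2 * 6371 * Math.asin(Math.sqrt(h));
  }
  var NY = {lat: 40.71, lon: -74.01}; // New York City; west longitudes are negative
  emit(distance(NY, {lat: doc.latitude, lon: doc.longitude}), doc._id); // field names as in the question
}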
If you do not know the locations beforehand you could solve it by using a temporary view, but I would strongly advise against it since it's slow and should only be used for testing.
There are questions like this on here, but no answers.
I need to implement a feature where the two types of nodes (labelled :Hashtags and :Statements) in my Neo4j 2.0 database can be searched by the users from my Node.js app.
So that means the users enter something they need into a search field, click search, and get the results. A better scenario is that it's more responsive and finds possible matches on the fly.
How would you implement that?
I have some ideas, but unsure about which one to go for:
Each time the user makes a search, make this kind of Cypher query (not very efficient to query the database so much, I guess, and won't work for responsive results suggestions):
MATCH (h:Hashtag{name:"user_query"}), (s:Statement{name:"user_query"}) RETURN h,s;
Install something like Elasticsearch and let it handle the search (this is what the guys from Linkurio.us have done)
In the first option the .name property of those labeled nodes is, of course, indexed.
The second option seems to be more robust, but I really would like to avoid having to install extra software and having this kind of dependencies.
Maybe you know of a better solution?
Thank you!
I don't understand why the first option would not be responsive.
After all, Neo4j's default indexing uses Lucene, the same as Elasticsearch.
And with an index (or unique constraint) the lookup should be instant.
Did you actually test the performance? (Make sure to use parameters for the actual value)
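To illustrate the point about parameters, here is a minimal sketch of a request body for Neo4j's transactional HTTP endpoint. The label names in `SEARCHABLE_LABELS` and the helper `buildSearchPayload` are assumptions for illustration; how the body is sent (HTTP client, driver) is left out.

```javascript
// Sketch only: request body for Neo4j's transactional HTTP endpoint
// (POST /db/data/transaction/commit). The user's text travels as a
// parameter, never concatenated into the Cypher string.
var SEARCHABLE_LABELS = ['Hashtag', 'Statement']; // assumed label names

function buildSearchPayload(userQuery) {
  return {
    statements: SEARCHABLE_LABELS.map(function (label) {
      return {
        // Labels cannot be parameterized, so they come from the fixed list
        // above; {name} is the Neo4j 2.x parameter placeholder syntax.
        statement: 'MATCH (n:' + label + ') WHERE n.name = {name} RETURN n',
        parameters: { name: userQuery }
      };
    })
  };
}

var payload = buildSearchPayload('user_query');
```

Using one statement per label avoids the cartesian product of a combined MATCH, which returns nothing when either label has no match.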
Is it possible to transform the returned data from a Find query in MongoDB?
As an example, I have a first and last field to store a user's first and last name. In certain queries, I wish to return the first name and last initial only (e.g. 'Joe Smith' returned as 'Joe S'). In MySQL a SUBSTRING() function could be used on the field in the SELECT statement.
Are there data transformations or string functions in Mongo like there are in SQL? If so can you please provide an example of usage. If not, is there a proposed method of transforming the data aside from looping through the returned object?
It is possible to do just about anything server-side with MongoDB. The reason you will usually hear "no" is that you sacrifice too much speed for it to make sense under ordinary circumstances. One of the main forces behind PyMongo, Mike Dirolf with 10gen, has a good blog post on using server-side JavaScript with PyMongo here: http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html. His example stores a JavaScript function that returns the sum of two fields, but you could easily modify it to return the first letter of your user name field. The gist would be something like:
db.system_js.first_letter = "function (x) { return x.charAt(0); }"
Understand first, though, that MongoDB is made to be really good at retrieving your data, not really good at processing it. The recommendation (see for example 50 Tips and Tricks for MongoDB Developers by Kristina Chodorow, O'Reilly) is to do what Andrew tersely alluded to doing above: make a first letter column and return that instead. Any processing can be more efficiently done in the application.
But if you feel that even querying for the fullname before returning fullname[0] from your 'view' is too much of a security risk, you don't need to do everything the fastest possible way. I'd avoided map-reduce in MongoDB for a while because of all the public concerns about speed. Then I ran my first map-reduce and twiddled my thumbs for .1 seconds as it processed 80,000 10k documents. I realize that in the scheme of things, that's tiny. But it illustrates that just because it's bad for a massive website to take a performance hit on some server-side processing doesn't mean it would matter to you. In my case, I imagine it would take me slightly longer to migrate to Hadoop than to just eat that .1 seconds every now and then. Good luck with your site.
The question you should ask yourself is why you need that data. If you need it for display purposes, do that in your view code. If you need it for query purposes, then do as Andrew suggested, and store it as an extra field on the object. Mongo doesn't provide server-side transformations (usually, and where it does, you usually don't want to use them); the answer is usually to not treat your data as you would in a relational DB, but to use the more flexible nature of the data store to pre-bake your data into the formats that you're going to be using.
If you can provide more information on how this data should be used, then we might be able to answer a little more usefully.
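For the display case, a minimal sketch of doing the transformation in application code might look like this. The field names `first` and `last` follow the question; `displayName` is a made-up helper.

```javascript
// Sketch only: format "first name + last initial" in application code,
// after the documents have been fetched. Field names `first` and `last`
// follow the question.
function displayName(user) {
  var initial = user.last ? ' ' + user.last.charAt(0) : '';
  return user.first + initial;
}

var users = [{ first: 'Joe', last: 'Smith' }, { first: 'Ann' }];
var names = users.map(displayName); // ['Joe S', 'Ann']
```

If the value is needed for querying instead, the same function could be run once at write time and its result stored as an extra field.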
Does anybody know if Riak Search has the ability to generate excerpts with highlight points in them, similar to what Lucene does?
Riak Search doesn't expose this functionality out of the box, but with a little work you can create a rough approximation.
Riak Search allows you to feed search results into a MapReduce job. If you do this, then your Map or Reduce function will also get a list of token positions in the document that matched the query (this is exposed as keydata, http://www.basho.com/search.php?q=keydata). Using these positions, you can write code to mark up the document or excerpt portions of text.
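As a rough sketch of the markup step, the function below builds a highlighted excerpt from a list of match positions. It is written under a loud assumption: that the positions are zero-based word indexes into the text. The actual shape of the keydata should be checked before relying on this, and `highlightExcerpt` is a made-up name.

```javascript
// Sketch only. ASSUMPTION: `positions` is an array of zero-based word
// indexes of the matching tokens; the real shape of Riak Search keydata
// may differ and should be checked before use.
function highlightExcerpt(text, positions, context) {
  var words = text.split(/\s+/);
  if (positions.length === 0) { return ''; }
  var first = Math.min.apply(null, positions);
  var last = Math.max.apply(null, positions);
  // Excerpt spans from `context` words before the first match to
  // `context` words after the last one.
  var start = Math.max(0, first - context);
  var end = Math.min(words.length, last + context + 1);
  return words.slice(start, end).map(function (word, i) {
    return positions.indexOf(start + i) !== -1 ? '<em>' + word + '</em>' : word;
  }).join(' ');
}

var excerpt = highlightExcerpt('the quick brown fox jumps over the lazy dog', [3], 2);
// excerpt: 'quick brown <em>fox</em> jumps over'
```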
I think this functionality will hardly ever be implemented in Riak, since its philosophy implies that it doesn't care about what exactly is stored in the values and therefore does not process them in any meaningful way, except for providing some metadata like indices.