CouchDB query for more dynamic values

CouchDB query for more dynamic values - couchdb

I have more "Location documents" in my couchdb with longitude and latitude fields. How to find all location documents in database which distance to provided latitude and longitude is less than provided distance.

There is a way how to achieve it using vanilla CouchDB, but it‘s bit tricky.
You can use the fact you can apply two map functions during one request. Second map function can be created using list mechanics.
Lists are not very efficient from computational side, they can‘t cache results as views. But they have one unique feature – you can pass several arguments into list. Moreover, one of your arguments can be, for example, JS code, that is eval-ed inside list function (risky!).
So entire scheme looks like this:
Make view, that performs coarse search
Make list, that receives custom params and refines data set
Make client-side API to ease up querying this chain.
Can‘t provide exact code for your particular case, many details are not clear, but it seems that coarse search must group results to somehow linearly enumerated squares, and list perform more precise calculations.
Please note, that scheme might be inefficient for large datasets since it‘s computationally hungry.

Vanilla CouchDB isn't really built for geospacial queries.
Your best bet is to either use GeoCouch, CouchDB-Lucene or something similar.
Failing that, you could emit a Geohash from your map function, and do range queries over those.
Caveats apply. Queries around Geohash "fault lines" (equator, poles, longitude 180, etc) can give too many or too little results.
There are multiple JavaScript libraries that can help convert to/from Geohash, as well as help with some of those caveats.

CouchDB is not built for dynamic queries, so there is no good/fast way of implementing it in vanilla couchDB.
If you know beforehand which locations you want to calculate the distance from you could create a view for each location and call it with parameters ?startkey=0&endkey=max_distance
function(doc) {
function distance(...){ /* your function for calculating distance */ }
var NY = {lat:40,lon:73}
emit( distance(NY,doc), doc._id);
}
If you do not know the locations beforehand you could solve it by using a temporary view, but I would strongly advise against it since it's slow and should only be used for testing.

Related

CloudSearch, How to find documents where a key is not defined?

Rather simple question here. Using CloudSearch, how do I find an object that does NOT have a certain key/property defined.
eg. I have been storing Car objects all along without indexing their price. Now I have began indexing Car objects with their msrp... how do I find the Car objects stored without any indexed price?
(and price:null)
(and price:undefined)
and other similar 'falsy' statements and their stringified permutations all do not work.
I am using AWS sdk in Node.js.
TIA!
Niko

The option that will work without any reindexing is a range search like
(NOT (range field=price [0,}))
which matches cars with a price that is not between 0 and infinity (eg ones with no price). See this answer for a discussion of other options.
Side note: I get the impression that you may be using CloudSearch to store your data. If so, I would consider using a datastore (which are designed to store data) rather than CloudSearch (which is a search engine). For one, it'll make this sort of query much easier.

Transform MongoDB Data on Find

Is it possible to transform the returned data from a Find query in MongoDB?
As an example, I have a first and last field to store a user's first and last name. In certain queries, I wish to return the first name and last initial only (e.g. 'Joe Smith' returned as 'Joe S'). In MySQL a SUBSTRING() function could be used on the field in the SELECT statement.
Are there data transformations or string functions in Mongo like there are in SQL? If so can you please provide an example of usage. If not, is there a proposed method of transforming the data aside from looping through the returned object?

It is possible to do just about anything server-side with mongodb. The reason you will usually hear "no" is you sacrifice too much speed for it to make sense under ordinary circumstances. One of the main forces behind PyMongo, Mike Dirolf with 10gen, has a good blog post on using server-side javascript with pymongo here: http://dirolf.com/2010/04/05/stored-javascript-in-mongodb-and-pymongo.html. His example is for storing a javascript function to return the sum of two fields. But you could easily modify to return the first letter of your user name field. The gist would be something like:
db.system_js.first_letter = "function (x) { return x.charAt(0); }"
Understand first, though, that mongodb is made to be really good at retrieving your data, not really good at processing it. The recommendation (see for example 50 tips and tricks for mongodb developers from Kristina Chodorow by Oreilly) is to do what Andrew tersely alluded to doing above: make a first letter column and return that instead. Any processing can be more efficiently done in the application.
But if you feel that even querying for the fullname before returning fullname[0] from your 'view' is too much of a security risk, you don't need to do everything the fastest possible way. I'd avoided map-reduce in mongodb for awhile because of all the public concerns about speed. Then I ran my first map reduce and twiddled my thumbs for .1 seconds as it processed 80,000 10k documents. I realize in the scheme of things, that's tiny. But it illustrates that just because it's bad for a massive website to take a performance hit on some server side processing, doesn't mean it would matter to you. In my case, I imagine it would take me slightly longer to migrate to Hadoop than to just eat that .1 seconds every now and then. Good luck with your site

The question you should ask yourself is why you need that data. If you need it for display purposes, do that in your view code. If you need it for query purposes, then do as Andrew suggested, and store it as an extra field on the object. Mongo doesn't provide server-side transformations (usually, and where it does, you usually don't want to use them); the answer is usually to not treat your data as you would in a relational DB, but to use the more flexible nature of the data store to pre-bake your data into the formats that you're going to be using.
If you can provide more information on how this data should be used, then we might be able to answer a little more usefully.

Best way query a database for nearby lat/longs?

I have a set or lat/longs stored in a db. I want to query the db and return documents that are within range of another lat/long. I know how to determine the distance between two sets but I don't want to have to do that for every entry in the db. What is the best way to achieve this?
Thanks very much.

Perhaps you could use Geospatial Indexing to achieve this...
If that's no good, I actually built a node.js addon to perform nearest neighbor searches called node-kdtree. It could be used to find the closest n points, and is fairly quick since it is just a wrapper to an underlying C library. But it sounds like it would be a poor choice for your needs because you would have to pull all of your data out of the DB first in order to process it. With the limited information I have, I suggest that you try using the built-in functionality of mongodb first.

CouchDB map/reduce by any document property at runtime?

I come from a SQL world where lookups are done by several object properties (published = TRUE or user_id = X) and there are no joins anywhere (because of the 1:1 cache layer). It seems that a document database would be a good fit for my data.
I am trying to figure-out if there is a way to pass one (or more) object properties to a CouchDB map/reduce function to find matching documents in a database without creating dozens of views for each document type.
Is it possible to pass the desired document property key(s) to match at run-time to CouchDB and have it return the objects that match (or the count of object that match for pagination)?
For example, on one page I want all posts with a doc.user_id of X that are doc.published. On another page I might want all documents with doc.tags[] with the tag "sport".

You could build a view that iterates over the keys in the document, and emits a key of [propertyName, propertyValue] - that way you're building a single index with EVERYTHING prop/value in it. Would be massive, no idea how performance would be to build, and disk usage (probably bad).
Map function would look something like:
// note - totally untested, my CouchDB fu is rusty
function(doc) {
for(prop in doc) {
emit([prop, doc[prop]], null);
}
}
Works for the basic case of simple properties, and can be extended to be smart about arrays, and emit a prop/value pair for each item in the array. That would let you handle the tags.
To query on it, set [prop] as your query key on the view.

Basically, no.
The key difference between something like Couch and a SQL DB is that the only way to query in CouchDB is essentially through the views/indexes. Indexes in SQL are optional. They exist (mostly) to boost performance. For example, if you have a small DB, your app will run just fine on SQL with 0 indexes. (Might be some issue with unique constraints, but that's a detail.)
The overall point being is that part of the query processor in a SQL database includes other methods of data access beyond simply indexes, notably table scans, merge joins, etc.
Couch has no query processor. It has views (defined by JS) used to define B-Tree indexes.
And, that's it. That's the hammer of Couch. It's a good hammer. It's been lasting the data processing world for basically 40 years.
Indexes are somewhat expensive to create in Couch (based on data volume) which is why "temporary views" are frowned upon. And they have a cost in maintenance as well, so views need to be a conscious design element in your database. At the same time, they're a bit more powerful than normal SQL indexes as well.
You can readily add your own query processing on top of Couch, but that will be more work for you. You can create a few select views, on your most popular or selective criteria, and then filter the resulting documents by other criteria in your own code. Yes, you have to do it, so you have to question whether the effort involved is worth more than whatever benefits you feel Couch is offering your (HTTP API, replication, safe, always consistent datastore, etc.) over a SQL solution.

I ran into a similar issue like this, and built a quick workaround using CouchDB-Python (which is a great library). It's not a pretty solution (goes against the principles of CouchDB), but it works.
CouchDB-Python gives you the function "Query", which allows you to "execute an ad-hoc temporary view against the database". You can read about it here
What I have is that I store the javascript function as a string in python, and the concatenate it with variable names that I define in Python.
In some_function.py
variable = value
# Map function (in javascript)
map_fn = """function(doc) {
<javascript code>
var survey_match = """ + variable + """;
<javascript code>
"""
# Iterates through rows
for row in db.query(map_fn):
<python code>
It sure isn't pretty, and probably breaks a bunch of CouchDB philosophies, but it works.
D

riak search result excerpt

Doeas anybody know if riaksearch has the ability to generate excerpt with highlight points in it similar to lucene does?

Riak Search doesn't expose this functionality out of the box, but with a little work you can create a rough approximation.
Riak Search allows you to feed search results into a MapReduce job. If you do this, then your Map or Reduce function will also get a list of token positions in the document that matched the query (this is exposed as keydata, http://www.basho.com/search.php?q=keydata). Using these positions, you can write code to mark up the document or excerpt portions of text.

I think this functionality will hardly ever be implemented in Riak since it's philisophy implies that it doesn't care about what exactly is stored in the values and therefore does not process them in any meaningful way except providing some metadata like indices.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string