Handling an empty result from map in a reduce - CouchDB

In CouchDB, does reduce still get called if the map result is empty? If so, are both keys and values empty?
My use case (and hopefully there's a better way to do this):
I send a query to my cluster, and I require both the list of items and the count of items returned (which the map doesn't seem to provide; it only gives me the total count of the view, not the count of the filtered view result). I then call reduce to get the count in a separate query.
Sometimes the view result is empty, which makes reduce return null. I could look for this null, but I doubt this is the correct approach in the CouchDB world.
Edit: turns out the ORM I'm using does support a way to do it.

The reduce function isn't called when there are no rows.
The easiest way to achieve your goal is to just do the map, and back in your code retrieve the length of the rows array that is returned from CouchDB.
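For example, in Python with the requests library (the database and view names here are made up):

import requests

# Query the view as a plain map (reduce=false is harmless if the view
# has no reduce function) and count the rows client-side.
resp = requests.get(
    "http://localhost:5984/mydb/_design/app/_view/items",
    params={"reduce": "false"},
)
resp.raise_for_status()
rows = resp.json()["rows"]

items = [row["value"] for row in rows]
count = len(rows)  # the filtered count; total_rows is the whole view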

The reduce function being called on an empty map result was actually a bug that I helped fix many months ago. I believe it was patched in 1.2. If you are up for using 1.1, this bug may still exist and be usable.

Related

An alternative for getAllEntriesByKey?

I want to build a type-ahead function, but I need an alternative to the getAllEntriesByKey method because the initial data collection seems to be too large for acceptable performance.
I would rather use the getEntryByKey method and then read the next X documents in the view.
Is something like that possible: jump to a position in a view (matching a specified query) and collect the next X documents?
For now I have written most of it in SSJS.
You can use a combination of NotesView.GetEntryByKey and NotesView.CreateViewNavFrom. Note, however, that this accesses the view twice, so I do not know whether you gain much performance here.
An example (LotusScript) can be found here:
http://lpar.ath0.com/2011/09/19/notesviewentrycollection-vs-notesviewnavigator/
The LotusScript can easily be transformed into SSJS. I have used something similar before; I can write a blog post about it.

Pagination with Redis sets vs sorted sets

We are storing users and friends (relationships) in Redis sets.
This is probably easy but we can't figure out how to get back results when paginating.
Example: when showing a logged-in user's friends, we need the first 20 results, then on the following click the next 20 results, and so on. We don't really care about the order, provided we don't get repeated data across subsequent queries.
We prefer plain sets over sorted sets, since sets let us use cheap SINTER for other queries.
What would the recommended approach be? Storing them both as sets and as sorted sets sounds a bit redundant.
You can paginate through a Set using SSCAN, but note that it can return the same element more than once. Alternatively, Sorted Sets are the best fit for this kind of task. Lastly, Lists can also work, but LRANGE is an expensive operation for large offsets.
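A quick sketch of both options in Python with redis-py (key names are made up):

import redis

r = redis.Redis(decode_responses=True)

# Plain set + SSCAN: the returned cursor is the "next page" token.
# COUNT is only a hint, and members can repeat if the set changes
# between calls.
cursor, batch = r.sscan("user:42:friends", cursor=0, count=20)
cursor, batch2 = r.sscan("user:42:friends", cursor=cursor, count=20)

# Sorted set + ZRANGE: score by e.g. time added; index ranges give
# stable, duplicate-free pages.
r.zadd("user:42:friends.z", {"alice": 1, "bob": 2, "carol": 3})
page1 = r.zrange("user:42:friends.z", 0, 19)   # first 20
page2 = r.zrange("user:42:friends.z", 20, 39)  # next 20

Keeping both structures does mean double writes, but it lets you keep the cheap SINTER queries on the plain sets while paginating off the sorted ones.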

How to insert an auto increment / sequential number value in CouchDB documents?

I'm currently playing with CouchDB a bit and have the following scenario:
I'm implementing an issue tracker. The requirement is that each issue document has (besides its document _id) a unique sequential number, in order to refer to it in a more convenient way.
My first approach was to have a view which simply returns the count of unique issue documents currently stored. Increment that value on the client side by 1, assign it to my new issue and insert that.
That turned out to be a bad idea when inserting multiple issues with Ajax calls, or when multiple clients add issues at the same time. In the latter case it wouldn't even be possible without communication between the clients.
Ideally I want the sequential number to be generated on couch, which is afaik not possible due to conflicting states in distributed systems.
Is there any good pattern one could use (maybe on the client side) to approach this? I feel like this is a standard kind of use case (thinking of invoice numbers, etc).
Thanks in advance!
You could use a separate document that is otherwise empty, so it only consists of an _id and a _rev. The _rev prefix is always an incrementing integer, so you could use it as your auto-incrementing number.
Just make a POST to that document; this will increase the _rev and return it. Then you can use the generated value for your purpose.
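A rough sketch of that trick in Python with requests (the database and document names are made up). The update has to carry the current _rev, so concurrent callers get a 409 conflict and must retry; replication between clusters could still break uniqueness:

import requests

DB = "http://localhost:5984/issues"

def next_sequence():
    # Read the current _rev of the otherwise empty counter document.
    doc = requests.get(DB + "/counter").json()
    # Write it back unchanged; CouchDB bumps the _rev on success.
    resp = requests.put(DB + "/counter",
                        json={"_id": "counter", "_rev": doc["_rev"]})
    resp.raise_for_status()  # a 409 here means someone else won; retry
    new_rev = resp.json()["rev"]
    return int(new_rev.split("-")[0])  # "42-abc..." -> 42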
Alternative way:
Create a separate document consisting of a value and a lock flag. Then execute something like "IF lock == true THEN return ELSE set lock = true AND increase value by 1", then do a GET to retrieve the new value, and finally set lock = false.
I agree with you that using a view that gives you a document count is not a great idea, and it is the reason that CouchDB uses UUIDs instead.
I'm not aware of a sequential id feature in CouchDB, but I think it's quite easy to write. I'd consider either:
An RPC (e.g. with RabbitMQ) call to a single service to avoid concurrency issues. You can then store the latest number in a dedicated document on a specific non-distributed CouchDB, or somewhere else. This may not scale particularly well, but you'll be writing a heck of an issue-tracking system before that becomes a problem.
If you can tolerate missing numbers, set the UUID algorithm on your Couch to sequential and you are at least good until the first buffer overflow. See more info at: http://couchdb.readthedocs.org/en/latest/config/misc.html#uuids-configuration
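For reference, per the linked docs this is a one-line change in the server's local.ini:

[uuids]
algorithm = sequential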

Trying to return paged results with Riak using continuation hash - NoSQL pagination

Edit: I added an answer with a more generic approach for NoSQL situations.
I am working on a project using Riak (with LevelDB).
Using the REST API that Riak offers, I am able to get data based on indexes and a range, which returns the results sorted alpha-num by the index, and a continuation hash.
Example call:
http://server/buckets/bucketname/index/someindex_int/333333333/555555555?max_results=10&return_terms=true&continuation=somehashhere
Example results:
{
  "results": [
    { "about_river": "12312" },
    { "balloon_tall": "45345" },
    { "basket_written": "23434523" }
  ],
  "continuation": "g2987392479789879087987asdfasdf="
}
I am also making a separate call, without max_results and return_terms, to get a count of the docs in the set. Since I know the number of docs per set and the total number of docs, I can easily compute the number of "pages".
While I am able to make a call for each set of documents based on the hash, then receive a next hash with the results set, I am looking for a way to predict the hashes, therefore pre-populate the client with pagination links.
Is this possible? Are the hashes dynamic based on the index/range info or are they some random value generated by the node your data is returned from?
A coworker has mentioned that the hashes are based on what node you are hitting in the cluster, but I am unable to find documentation on this.
Secondly, the idea was brought up to cycle through the entire set in the background to get the hashes. That would work, but seems pretty expensive.
I am brand new to Riak and any advice here would be great. I am not able to find any good examples of pagination with Riak. The one that did exist is gone from the internet as far as I can tell.
No, the continuation is not "predictable", nor is what your co-worker said correct.
Unfortunately there is no way to know the total number of objects in the range specified except for querying the range without the max_results parameter as you are doing (outside of a 1:1 relation between index key and object key, obviously).
The other answer was the answer I needed, but with some help from CodingHorror, I came up with the answer I wanted.
No pagination. Without pagination, only ever getting the hash for the next result set is no problem; in fact, it's ideal for my use case. Just merge each next set onto your existing set(s), but don't let it go on forever.
My inspiration: http://blog.codinghorror.com/the-end-of-pagination/
Thanks, Jeff Atwood!
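A sketch of that "load more" pattern in Python with requests, reusing the 2i REST call from the question (host, bucket, and index names are placeholders):

import requests

BASE = ("http://server:8098/buckets/bucketname/index/"
        "someindex_int/333333333/555555555")

def load_more(continuation=None, page_size=10):
    # Fetch the next batch, passing the continuation returned by the
    # previous call (None for the first batch).
    params = {"max_results": page_size, "return_terms": "true"}
    if continuation:
        params["continuation"] = continuation
    body = requests.get(BASE, params=params).json()
    return body["results"], body.get("continuation")  # None on last page

# Endless-scroll style: each user action appends the next batch.
results, cont = load_more()
if cont:
    more, cont = load_more(cont)
    results.extend(more)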
Isn't the number of results already in the response? Something like:
RiakFuture<SearchOperation.Response, BinaryValue> searchResult = client.executeAsync(searchOp);
searchResult.await();
com.basho.riak.client.core.operations.SearchOperation.Response response = searchResult.get();
logger.debug("number of results {} ", response.numResults());

How do I sort Lucene results by field value using a HitCollector?

I'm using the following code to execute a query in Lucene.Net
var collector = new GroupingHitCollector(searcher.GetIndexReader());
searcher.Search(myQuery, collector);
resultsCount = collector.Hits.Count;
How do I sort these search results based on a field?
Update
Thanks for your answer. I had tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument value. Please suggest a valid value to pass.
The search.Searcher.search method will accept a search.Sort parameter, which can be constructed as simply as:
new Sort("my_sort_field")
However, there are some limitations on which fields can be sorted on - they need to be indexed but not tokenized, and the values convertible to Strings, Floats or Integers.
Lucene in Action covers all of the details, as well as sorting by multiple fields and so on.
What you're looking for is probably TopFieldDocCollector. Use it instead of the GroupingHitCollector (what is that?), or inside it.
Comment on this if you need more info. I'll be happy to help.
In the original (Java) version of Lucene, there is no hard restriction on the size of the TopFieldDocCollector results. Any number greater than zero is accepted. Although memory constraints and performance degradation create a practical limit that depends on your environment, 5000 hits is trivial and shouldn't pose a problem outside of a mobile device.
Perhaps in porting Lucene, TopFieldDocCollector was modified to use something other than Lucene's "heap" implementation (called PriorityQueue, extended by FieldSortedHitQueue), something that imposes an unreasonably small limit on the result size. If so, you might want to look at the source code for TopFieldDocCollector and implement your own similar hit collector using a better heap implementation.
I have to ask, however: why are you trying to collect 5000 results? No user of an interactive application is going to want to see that many. I figure that users willing to look at 200 results are rare, but double it to 400 just as a factor of safety. Depending on the application, limiting the result size can also hamper malicious screen scrapers and mitigate denial-of-service attacks.
The constructor for Sort that accepts only the string field name has been deprecated. Now you have to create a Sort object and pass it in as the last parameter of searcher.Search():
/* Sort by a field of type long called "size" from greatest to smallest
   (signified by passing true for the last "reverse" parameter). */
Sort sorter = new Sort(new SortField("size", SortField.Type.LONG, true));
searcher.Search(myQuery, collector, sorter);
