Getting a list of documents with a maximum field value from CouchDB view

Getting a list of documents with a maximum field value from CouchDB view - couchdb

Let's say I have blog entries like these in my CouchDB database:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Mary", "postdate":"20110411", "subject":"and this", "message":"blah-blah"}
{"name":"Joe", "postdate":"20110410", "subject":"And other thing", "message":"yada-yada"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}
It's pretty easy to get a list of all posts. But how do I get a list of latest posts from all the users?
Like that:
{"name":"Mary", "postdate":"20110412", "subject":"this", "message":"blah"}
{"name":"Joe", "postdate":"20110411", "subject":"that", "message":"yadda"}
{"name":"Jane", "postdate":"20110409", "subject":"Serious stuff", "message":"Not really"}

Try with this map function:
function(doc) {
if (doc.postdate && doc.name) {
emit([doc.name, doc.postdate], 1);
}
}
and the following reduce function:
function(keys, values, rereduce) {
var max = 0,
ks = rereduce ? values : keys;
for (var i = 1, l = ks.length; i < l; ++i) {
if (ks[max][0][1] < ks[i][0][1]) max = i;
}
return ks[max];
}
and query it with group_level=1. It gives you the _id of the posts, then you can retrieve them all with a single query with the keys parameter or using a POST.
I am not sure if this is the best approach, but it seems to work.
UPDATE: fixed map to handle rereduce correctly.

You're going to emit the postdate as the key because keys are sorted. For example, this is what your map function will look like...
function(doc) {
if(doc.postdate) {
emit(doc.postdate, doc);
}
}
That will give you all the docs sorted ascending by postdate. If you want descending then query with ?descending=true
Cheers.

Related

Distinct values in Azure Search Suggestions?

I am offloading my search feature on a relational database to Azure Search. My Products tables contains columns like serialNumber, PartNumber etc.. (there can be multiple serialNumbers with the same partNumber).
I want to create a suggestor that can autocomplete partNumbers. But in my scenario I am getting a lot of duplicates in the suggestions because the partNumber match was found in multiple entries.
How can I solve this problem ?

The Suggest API suggests documents, not queries. If you repeat the partNumber information for each serialNumber in your index and then suggest based on partNumber, you will get a result for each matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search will eliminate duplicates within the same document, but not across documents. You will have to do that on the client side, or build a secondary index of partNumbers just for suggestions.
See this forum thread for a more in-depth discussion.
Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.

I'm facing this problem myself. My solution does not involve a new index (this will only get messy and cost us money).
My take on this is a while-loop adding 'UserIdentity' (in your case, 'partNumber') to a filter, and re-search until my take/top-limit is met or no more suggestions exists:
public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
var suggestions = new List<MachineSuggestionDTO>();
var sp = new SuggestParameters
{
UseFuzzyMatching = true,
Top = 100 // Get maximum result for a chance to reduce search calls.
};
// Add searchfields if set
if (searchFields != null && searchFields.Count() != 0)
{
sp.SearchFields = searchFields;
}
// Loop until you get the desired ammount of suggestions, or if under desired ammount, the maximum.
while (suggestions.Count < take)
{
if (!await DistinctSuggestMachineUser(searchText, take, searchFields, suggestions, indexClientMachine, sp))
{
// If no more suggestions is found, we break the while-loop
break;
}
}
// Since the list might me bigger then the take, we return a narrowed list
return suggestions.Take(take).ToList();
}
private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, string[] searchFields, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);
if(response.Results.Count > 0){
// Fix filter if search is triggered once more
if (!string.IsNullOrEmpty(sp.Filter))
{
sp.Filter += " and ";
}
foreach (var result in response.Results.DistinctBy(r => new { r.Document.UserIdentity, r.Document.UserName, r.Document.UserCode}).Take(take))
{
var d = result.Document;
suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });
// Add found UserIdentity to filter
sp.Filter += $"UserIdentity ne '{d.UserIdentity}' and ";
}
// Remove end of filter if it is run once more
if (sp.Filter.EndsWith(" and "))
{
sp.Filter = sp.Filter.Substring(0, sp.Filter.LastIndexOf(" and ", StringComparison.Ordinal));
}
}
// Returns false if no more suggestions is found
return response.Results.Count > 0;
}

public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
SuggestParameters sp = new SuggestParameters()
{
UseFuzzyMatching = fuzzy,
Top = 100
};
if (highlights)
{
sp.HighlightPreTag = "<em>";
sp.HighlightPostTag = "</em>";
}
var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);
// Convert the suggest query results to a list that can be displayed in the client.
return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}
After getting top 100 and using distinct it works for me.

You can use the Autocomplete API for that where does the grouping by default. However, if you need more fields together with the result, like, the partNo plus description it doesn't support it. The partNo will be distinct though.

mongodb : how to get documents that contains nested field data

suppose my data is
db.posts.save({postid:1,postdata:"hi am ",comments:["nice","whats bro"]});
so in this case how to iterate comments
means
cursor=selct * from posts;
cursor 1=selct * from comments where postid=:cursor.postid
for (i in cursor)
for(j in cursor1)

One way to iterate over the comments would be as follows:
db.posts.find().forEach(
function(doc) {
// iterate over the comments in your preferred javascript way
var i;
for (i=0; i<doc.comments.length; ++i) {
// Do whatever you want with doc.comments[i] here
}
}
)

Running couchdb view with group=false (SELECT DISTINCT type behavior)

I was following this guide on couchdb http://guide.couchdb.org/draft/cookbook.html#unique in order to return a distinct list from a view.
My map function looks like:
function(doc) {
if(doc.PartnerName !=null) {
emit(doc.PartnerName, null);
}
}
And, I have a reduce function:
function(keys, values) {
return true;
}
When I run this by hitting:
/dbName/_design/Partners/_view/my-view-name
I get this back:
{"rows":[
{"key":null,"value":true}
]}
If I add ?reduce=false to the end, I get back sort of the desired result:
{
"total_rows":11,"offset":0,
"rows":[
{"id":"a","key":"PARTNER_ONE","value":null},
{"id":"b","key":"PARTNER_ONE","value":null},
{"id":"c","key":"PARTNER_ONE","value":null},
{"id":"d","key":"PARTNER_ONE","value":null},
{"id":"e","key":"PARTNER_ONE","value":null},
{"id":"f","key":"PARTNER_ONE","value":null},
{"id":"g","key":"PARTNER_TWO","value":null},
{"id":"h","key":"PARTNER_TWO","value":null},
{"id":"i","key":"PARTNER_TWO","value":null},
{"id":"j","key":"PARTNER_THREE","value":null},
{"id":"k","key":"PARTNER_FOUR","value":null}
]}
However, I'm ideally trying to get a distinct list, so in the above example, it'd be PARTNER_ONE, PARTNER_TWO, PARTNER_THREE, PARTNER_FOUR

I think you are missing the group=true parameter. Try to query
/dbName/_design/Partners/_view/my-view-name?group=true
and see if that gives you the correct result.

Map/Reduce differences between Couchbase & CloudAnt

I've been playing around with Couchbase Server and now just tried replicating my local db to Cloudant, but am getting conflicting results for my map/reduce function pair to build a set of unique tags with their associated projects...
// map.js
function(doc) {
if (doc.tags) {
for(var t in doc.tags) {
emit(doc.tags[t], doc._id);
}
}
}
// reduce.js
function(key,values,rereduce) {
if (!rereduce) {
var res=[];
for(var v in values) {
res.push(values[v]);
}
return res;
} else {
return values.length;
}
}
In Cloudbase server this returns JSON like:
{"rows":[
{"key":"3d","value":["project1","project3","project8","project10"]},
{"key":"agents","value":["project2"]},
{"key":"fabrication","value":["project3","project5"]}
]}
That's exactly what I wanted & expected. However, the same query on the Cloudant replica, returns this:
{"rows":[
{"key":"3d","value":4},
{"key":"agents","value":1},
{"key":"fabrication","value":2}
]}
So it somehow only returns the length of the value array... Highly confusing & am grateful for any insights by some M&R ninjas... ;)

It looks like this is exactly the behavior you would expect given your reduce function. The key part is this:
else {
return values.length;
}
In Cloudant, rereduce is always called (since the reduce needs to span over multiple shards.) In this case, rereduce calls values.length, which will only return the length of the array.

I prefer to reduce/re-reduce implicitly rather than depending on the rereduce parameter.
function(doc) { // map
if (doc.tags) {
for(var t in doc.tags) {
emit(doc.tags[t], {id:doc._id, tag:doc.tags[t]});
}
}
}
Then reduce checks whether it is accumulating document ids from the identical tag, or whether it is just counting different tags.
function(keys, vals, rereduce) {
var initial_tag = vals[0].tag;
return vals.reduce(function(state, val) {
if(initial_tag && val.tag === initial_tag) {
// Accumulate ids which produced this tag.
var ids = state.ids;
if(!ids)
ids = [ state.id ]; // Build initial list from the state's id.
return { tag: val.tag,
, ids: ids.concat([val.id])
};
} else {
var state_count = state.ids ? state.ids.length : state;
var val_count = val.ids ? val.ids.length : val;
return state_count + val_count;
}
})
}
(I didn't test this code, but you get the idea. As long as the tag value is the same, it doesn't matter whether it's a reduce or rereduce. Once different tags start reducing together, it detects that because the tag value will change. So at that point just start accumulating.
I have used this trick before, although IMO it's rarely worth it.
Also in your specific case, this is a dangerous reduce function. You are building a wide list to see all the docs that have a tag. CouchDB likes tall lists, not fat lists. If you want to see all the docs that have a tag, you could map them.
for(var a = 0; a < doc.tags.length; a++) {
emit(doc.tags[a], doc._id);
}
Now you can query /db/_design/app/_view/docs_by_tag?key="3d" and you should get
{"total_rows":287,"offset":30,"rows":[
{"id":"project1","key":"3d","value":"project1"}
{"id":"project3","key":"3d","value":"project3"}
{"id":"project8","key":"3d","value":"project8"}
{"id":"project10","key":"3d","value":"project10"}
]}

Select top/latest 10 in couchdb?

How would I execute a query equivalent to "select top 10" in couch db?
For example I have a "schema" like so:
title body modified
and I want to select the last 10 modified documents.
As an added bonus if anyone can come up with a way to do the same only per category. So for:
title category body modified
return a list of latest 10 documents in each category.
I am just wondering if such a query is possible in couchdb.

To get the first 10 documents from your db you can use the limit query option.
E.g. calling
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?limit=10
You get the first 10 documents.
View rows are sorted by the key; adding descending=true in the querystring will reverse their order. You can also emit only the documents you are interested using again the querystring to select the keys you are interested.
So in your view you write your map function like:
function(doc) {
emit([doc.category, doc.modified], doc);
}
And you query it like this:
http://localhost:5984/yourdb/_design/design_doc/_view/view_name?startkey=["youcategory"]&endkey=["youcategory", date_in_the_future]&limit=10&descending=true

here is what you need to do.
Map function
function(doc)
{
if (doc.category)
{
emit(['category', doc.category], doc.modified);
}
}
then you need a list function that groups them, you might be temped to abuse a reduce and do this, but it will probably throw errors because of not reducing fast enough with large sets of data.
function(head, req)
{
% this sort function assumes that modifed is a number
% and it sorts in descending order
function sortCategory(a,b) { b.value - a.value; }
var categories = {};
var category;
var id;
var row;
while (row = getRow())
{
if (!categories[row.key[0]])
{
categories[row.key[0]] = [];
}
categories[row.key[0]].push(row);
}
for (var cat in categories)
{
categories[cat].sort(sortCategory);
categories[cat] = categories[cat].slice(0,10);
}
send(toJSON(categories));
}
you can get all categories top 10 now with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories
and get the docs with
http://localhost:5984/database/_design/doc/_list/top_ten/by_categories?include_docs=true
now you can query this with a multiple range POST and limit which categories
curl -X POST http://localhost:5984/database/_design/doc/_list/top_ten/by_categories -d '{"keys":[["category1"],["category2",["category3"]]}'
you could also not hard code the 10 and pass the number in through the req variable.
Here is some more View/List trickery.

slight correction. it was not sorting untill I added the "return" keyword in your sortCategory function. It should be like this:
function sortCategory(a,b) { return b.value - a.value; }

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Getting a list of documents with a maximum field value from CouchDB view - couchdb

Related

Distinct values in Azure Search Suggestions?

mongodb : how to get documents that contains nested field data

Running couchdb view with group=false (SELECT DISTINCT type behavior)

Map/Reduce differences between Couchbase & CloudAnt

Select top/latest 10 in couchdb?

Categories

Resources