How to get from CouchDB only certain fields of certain documents for a single request? - couchdb

For example I have a thousands of documents with same structure, for example:
{
"key_1":"value_1",
"key_2":"value_2",
"key_3":"value_3",
...
...
}
And I need to get, let's say key_1, key_3 and key_23 from some set of documents with known IDs, for example, I need to process only 5 documents while my DB contains several thousands. Each time I have a different set of keys and document IDs. Is it possible to get that information for a one request?

You can use a list function (see: this, this, and this).
Since you know the ids, you can then query _all_docs with the list function:
POST /{db}/_design/{ddoc}/_list/{func}/_all_docs?include_docs=true&columns=["key_1","key_2","key_3"]
Accept: application/json
Content-Length: {whatever}
{
"keys": [
"docid002",
"docid005"
]
}
The list function needs to look at documents, and send the appropriate JSON for each one. Not tested:
(function (head, req) {
send('{"total_rows":' + head.total_rows + ',"offset":' + head.offset + ',"rows":[');
var columns = JSON.parse(req.query.columns);
var delim = '';
var row;
while (row = getRow()) {
var doc = {};
for (var k in columns) {
doc[k] = row.doc[k];
}
row.doc = doc;
send(delim + toJSON(row));
delim = ',';
}
send(']}');
})
Whether this is a good idea, I'm not sure. If your documents are big, and bandwidth savings important, it might.

Yes, that’s possible. Your question can be broken up into two distinct problems:
Getting only a part of the document (in your example: key_1, key_3 and key_23). This can be done using a view. A view is saved into a design document. See the wiki for more info on how to create views.
Retrieving only certain documents, which are defined by their ID. When querying views, you cannot only specify a single ID (or rather key), but also an array of keys, which is what you would need here. Again, see the section on querying views in the wiki for explanations and examples.

Even though you only need a subset of values from a document, you may find that the system as a whole performs better if you just ask for the entire document then select the values you need from that result.
To only get the specific key value pairs you need to create a view that has view entries with a multipart key consisting of the doc id and doc item name, with value of the corresponding doc item.
So your map function would look something like:
function(doc){
for(var i = 1; i < doc.keysInDoc; i++){
var k = "key_"+i;
emit([doc._id, k], doc.[k]);
}
}
You can then use multi key lookup with each key being of the form ["docid12345", "key_1"], ["docid56789", "key_23"], etc.
So a query like:
http://host:5984/db/_design/design/_view/view?&keys=[["docid002","key_8"],["docid005","key_7"]]
will return
{"total_rows":84,"offset":67,"rows":[
{"id":"docid002","key":["docid002","key_8"],"value":"value d2_k8"},
{"id":"docid005","key":["docid005","key_12"],"value":"value d5_k12"}
]}

Related

Select parts of a nlobjSearchResult

I have a large nlobjSearchResultSet object with over 18,000 "results".
Each result is a pricing record for a customer. There may be multiple records for a single customer.
As 18000+ records is costly in governance points to do mass changes, I'm migrating to a parent (customer) record and child records (items) so I can make changes to the item pricing as a sublist.
As part of this migration, is there a simple command to select only the nlapiSearchResult objects within the big object, which match certain criteria (ie. the customer id).
This would allow me to migrate the data with only the one search, then only subsequent create/saves of the new record format.
IN a related manner, is there a simple function call to return to number of records contained in a given netsuite record? For % progress context.
Thanks in advance.
you can actually get the number of rows by running the search with an added aggregate column. A generic way to do this for a saved search that doesn't have any aggregate columns is shown below:
var res = nlapiSearchRecord('salesorder', 'customsearch29', null,
[new nlobjSearchColumn('formulanumeric', null, 'sum').setFormula('1')]);
var rowCount = res[0].getValue('formulanumeric', null, 'sum');
console.log(rowCount);
To get the total number of records, the only way is do a saved search, an ideal way to do such search is using nlobjSearch
Below is a sample code for getting infinite search Results and number of records
var search = nlapiLoadSearch(null, SAVED_SEARCH_ID).runSearch();
var res = [],
currentRes;
var i = 0;
while(i % 1000 === 0){
currentRes = (search.getResults(i, i+1000) || []);
res = res.concat(currentRes );
i = i + currentRes.length;
}
res.length or i will give you the total number of records and res will give you all the results.

How to retrieve all documents in couchdb database without causing out of memory

I have a coucdb database which contains about 200000 tweets, keys are tweet ID. I have a query which needs to retrieve all documents to look for some information. I'm using lightcouch to work with couchdb in a java web app. If I create a dbClient like this:
List<JsonObject>tweets = dbClient.view("_all_docs").query(JsonObject.class);
and then loop through tweets, for each JsonObject in tweets, use
JsonObject tweetJson = dbClient.find(JsonObject.class, tweet.get("id").toString().replaceAll("\"", ""));
to retrieve each tweet one by one it took extremely long time for 200000 documents. If I load all documents in one single query using includeDocs(true)
List<JsonObject>allTweets = dbClient.view("_all_docs").includeDocs(true).query(JsonObject.class);
it caused outofmemory exception since the number of documents are too large. So how can i deal with this problem? I'm thinking about using limit(5000) to retrieve 5000 documents for each time and loop through whole database, but I don't know how to write the loop to continue to retrieve the next 5000 after the first 5000 docs. One possible solution is using startKey and endKey but I'm confused how to use them when the key is tweet ID.
Use queryPage but make sure to use a String as the Key
See: https://github.com/lightcouch/LightCouch/issues/26#event-122327174
0.1.6 still seems to show this behaviour.
A workaround that I found for this goes something like this:
changes = DbClient.changes()
.since(null) // or... since(since) if you want an offset
.includeDocs(true);
int size = 1;
getCursor("0");
while (size > 0 ) {
ChangesResult resultSet = changes.limit(40000).getChanges();
List<ChangesResult.Row> rowList = resultSet.getResults();
for (ChangesResult.Row feed: rowList) {
<instantiate your object via gson>
.
.
.
}
getCursor(resultSet.getLastSeq());
size = rowList.size();
}

How to search and sort with CouchDB in one map function

I'm stumbling a bit with my CouchDB knowledge.
I have a database of content that is tagged with an array of tags and has a created date.
I want to create a view that pulls a limited number of newest stories tagged with a specific tag.
For example, the newest 6 stories tagged "Business."
Ran across this question, which seems to get me almost to where I need to go, but I'm missing one key element, which I think is how to craft the query string to sort by one key while searching by the other.
Here's my map function.
function(doc) {
if (doc.published == "yes" && doc.type == "news") {
for (var i = 0; i < doc.tags.length; i++) {
if (doc.tags[i]) {
emit([doc.created, doc.tags[i]], doc);
}
}
}
}
So how do I query that view for a all documents tagged "Business" that are the newest documents based on created.
The created attribute is a date sortable format.
First, I would switch the order of your emit:
emit([doc.tags[i], doc.created]);
(leave out doc as well, you can just add include_docs=true to get the entire document, and your view won't take up so much disk-space in the process)
Now you can query for the all the stories tagged as "Business" by using the following querystring:
startkey=["Business"]&endkey=["Business",{}]
You'll get all the documents with the tag business, and they'll be sorted by date.
This takes advantage of view collation, which basically is the rules governing how indexes are sorted/queried. For complex keys like this, the sorting is done for each item of the array separately. (ie. the first key is sorted first, the second key is sorted second, etc) This is why the order matters, as you must always move from left to right when querying a view index.
If you want the 6 most recent, your querystring will need to change:
descending=true&limit=6&endkey=["Business"]&startkey=["Business",{}]
NOTICE You need to swap the startkey/endkey values, due to how the descending parameter works. See the View reference page on the wiki for further explanation.
OK, I think I figured this out, but I'm not quite certain I fully understand it.
I found this story about complex keys and searching and sorting.
My map function looks like this:
function(doc) {
if (doc.published == "yes" && doc.type == "news") {
for (var i = 0; i < doc.tags.length; i++) {
if (doc.tags[i]) {
emit([doc.tags[i], doc.created], doc);
}
}
}
}
And to query and sort using it, the query looks like this.
http://localhost:5984/database/_design/story/_view/tagged?limit=10&startkey=["Business"]&endkey=["Business",{}]&descending=false
I'm getting the results I want, but I'm not entirely certain I understand it all.

Searching required data in couchdb

I have documents like,
{_id:1,
name:"john"
}
{_id:2,
name:"john boss"
}
{_id:3,
name:"jim"
}
I have to search the data where ever john is stored in documents. Suppose, if i search "john" the documents should get _id:1 & _id:2 related data. Please guide me to get the result.
I appreciate if any one provide the solutions.
I suggest a CouchDB view to show you all "words" from the "name" field.
function(doc) {
// map function: _design/example/_view/names
if(!doc.name) // Optionally do more testing for doc type, etc. here.
return
// Emit one row per word in the name field (first name, last name, etc.).
var words = doc.name.split(/\s+/)
for(var i = 0; i < words.length; i++)
emit(words[i].toLowerCase(), doc._id)
}
Now if you query /db/_design/example/_view/names?key="john", you will get two rows: one for doc id 1, and another for id 2. I also added a conversion to lower case, so searching for "john" will match people named "John".
Duplicates are possible: the same doc ID listed multiple times, e.g. for {"name":"John John"}; however you are guaranteed that all duplicate rows will be adjacent.
You can also add ?include_docs=true to your request to get the full document for each row.

What is the best way to retrieve distinct / unique values using SPQuery?

I have a list that looks like:
Movie Year
----- ----
Fight Club 1999
The Matrix 1999
Pulp Fiction 1994
Using CAML and the SPQuery object I need to get a distinct list of items from the Year column which will populate a drop down control.
Searching around there doesn't appear to be a way of doing this within the CAML query. I'm wondering how people have gone about achieving this?
Another way to do this is to use DataView.ToTable-Method - its first parameter is the one that makes the list distinct.
SPList movies = SPContext.Current.Web.Lists["Movies"];
SPQuery query = new SPQuery();
query.Query = "<OrderBy><FieldRef Name='Year' /></OrderBy>";
DataTable tempTbl = movies.GetItems(query).GetDataTable();
DataView v = new DataView(tempTbl);
String[] columns = {"Year"};
DataTable tbl = v.ToTable(true, columns);
You can then proceed using the DataTable tbl.
If you want to bind the distinct results to a DataSource of for example a Repeater and retain the actual item via the ItemDataBound events' e.Item.DataItem method, the DataTable way is not going to work. Instead, and besides also when not wanting to bind it to a DataSource, you could also use Linq to define the distinct values.
// Retrieve the list. NEVER use the Web.Lists["Movies"] option as in the other examples as this will enumerate every list in your SPWeb and may cause serious performance issues
var list = SPContext.Current.Web.Lists.TryGetList("Movies");
// Make sure the list was successfully retrieved
if(list == null) return;
// Retrieve all items in the list
var items = list.GetItems();
// Filter the items in the results to only retain distinct items in an 2D array
var distinctItems = (from SPListItem item in items select item["Year"]).Distinct().ToArray()
// Bind results to the repeater
Repeater.DataSource = distinctItems;
Repeater.DataBind();
Remember that since there is no CAML support for distinct queries, each sample provided on this page will retrieve ALL items from the SPList. This may be fine for smaller lists, but for lists with thousands of listitems, this will seriously be a performance killer. Unfortunately there is no more optimized way of achieving the same.
There is no DISTINCT in CAML to populate your dropdown try using something like:
foreach (SPListItem listItem in listItems)
{
if ( null == ddlYear.Items.FindByText(listItem["Year"].ToString()) )
{
ListItem ThisItem = new ListItem();
ThisItem.Text = listItem["Year"].ToString();
ThisItem.Value = listItem["Year"].ToString();
ddlYear.Items.Add(ThisItem);
}
}
Assumes your dropdown is called ddlYear.
Can you switch from SPQuery to SPSiteDataQuery? You should be able to, without any problems.
After that, you can use standard ado.net behaviour:
SPSiteDataQuery query = new SPSiteDataQuery();
/// ... populate your query here. Make sure you add Year to the ViewFields.
DataTable table = SPContext.Current.Web.GetSiteData(query);
//create a new dataview for our table
DataView view = new DataView(table);
//and finally create a new datatable with unique values on the columns specified
DataTable tableUnique = view.ToTable(true, "Year");
After coming across post after post about how this was impossible, I've finally found a way. This has been tested in SharePoint Online. Here's a function that will get you all unique values for a column. It just requires you to pass in the list Id, View Id, internal list name, and a callback function.
function getUniqueColumnValues(listid, viewid, column, _callback){
var uniqueVals = [];
$.ajax({
url: _spPageContextInfo.webAbsoluteUrl + "/_layouts/15/filter.aspx?ListId={" + listid + "}&FieldInternalName=" + column + "&ViewId={" + viewid + "}&FilterOnly=1&Filter=1",
method: "GET",
headers: { "Accept": "application/json; odata=verbose" }
}).then(function(response) {
$(response).find('OPTION').each(function(a,b){
if ($(b)[0].value) {
uniqueVals.push($(b)[0].value);
}
});
_callback(true,uniqueVals);
},function(){
_callback(false,"Error retrieving unique column values");
});
}
I was considering this problem earlier today, and the best solution I could think of uses the following algorithm (sorry, no code at the moment):
L is a list of known values (starts populated with the static Choice options when querying fill-in options, for example)
X is approximately the number of possible options
1. Create a query that excludes the items in L
1. Use the query to fetch X items from list (ordered as randomly as possible)
2. Add unique items to L
3. Repeat 1 - 3 until number of fetched items < X
This would reduce the total number of items returned significantly, at the cost of making more queries.
It doesn't much matter if X is entirely accurate, but the randomness is quite important. Essentially the first query is likely to include the most common options, so the second query will exclude these and is likely to include the next most common options and so on through the iterations.
In the best case, the first query includes all the options, then the second query will be empty. (X items retrieved in total, over 2 queries)
In the worst case (e.g. the query is ordered by the options we're looking for, and there are more than X items with each option) we'll make as many queries as there are options. Returning approximately X * X items in total.

Resources