Batch update/save documents in Truevault? - truevault

Is there a way to batch save/update documents in Truevault? The current documentation only shows how to save/update individual documents.

Currently TrueVault does not have a way to update multiple Documents in one API call. We are looking to add this functionality in the future.
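In the meantime, you would have to loop and update one Document per request. A minimal sketch in Python, where update_document is a hypothetical wrapper around TrueVault's single-document update call (it is not part of any official SDK, just a placeholder):

def batch_update(vault_id, docs):
    # docs: mapping of document_id -> new document body.
    results = {}
    for doc_id, doc in docs.items():
        # One API call per Document, since no batch endpoint exists yet.
        # update_document is a placeholder for your own HTTP wrapper.
        results[doc_id] = update_document(vault_id, doc_id, doc)
    return results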

Related

How to update documents with map/reduce functions?

I am searching for a way to update and persist documents from within the reduce function of a custom design document. Is it possible? How can I do that?
No, it is not possible. Reduce functions only aggregate the values emitted by the map step; they have no way to write documents back to the database.

Find the tables and fields in a saved search for Records Browser Item

The issue we are having is trying to map/relate the fields to the different tables from the result of a saved search created on Records Browser Item (http://www.netsuite.com/help/helpcen...cord/item.html).
We have a retail inventory management system with many modules, so the attempt to relate our columns to NetSuite's has been going on for a while without any conclusion.
The approach we are trying is to run SuiteScript in the debugger and view the dataset. We were successful with searches returning a relatively small volume of data. But since the limit is 10,000 rows, we are stuck on a Search on Item that returns 1 million records. The search returns this volume of data when we add all the search columns. The problem is that the process of adding/removing individual columns is laborious, and even with one column it returns more than 10,000 rows, so it becomes impossible to fetch the data and complete the mapping process.
So I would like to know if there is any way we can only see the schema and their relationships for a saved search?
Thanks.
In SuiteScript 1.0, this can be achieved by a scheduled script that creates multiple CSV files from a saved search (SuiteAnswers article 36206). You'll have to get around the search limit (SuiteAnswers article 33496) AND the governance limit (SuiteAnswers article 23406). If you make the file Available Without Login, you should be able to retrieve the CSV with an HTTP GET request without credentials. However, that will make the data potentially viewable by anyone who knows the URL--a security concern that you will have to consider.
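For the retrieval step, a Python sketch of the unauthenticated GET; the URL below is only a placeholder for the real File Cabinet URL of the generated CSV:

import requests

# Placeholder URL; substitute the actual File Cabinet URL of the CSV
# generated by the scheduled script and marked "Available Without Login".
csv_url = 'https://system.netsuite.com/core/media/media.nl?id=1234&c=ACCT&h=abc123'

resp = requests.get(csv_url)  # no credentials needed for a public file
resp.raise_for_status()
with open('item_search.csv', 'wb') as fh:
    fh.write(resp.content)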
In SuiteScript 2.0, this can probably be achieved with a Map/Reduce script (SuiteAnswers article 43795). This may be a better way to optimize the script, but I have not tested it myself in SuiteScript 2.0.

Applying "tag" to millions of documents, using bulk/update methods

We have about 55,000,000 documents in our ElasticSearch instance. We have a CSV file with user_ids; the biggest CSV has 9M entries. Our documents have user_id as the key, so this is convenient.
I am posting the question because I want to discuss and find the best option to get this done, as there are different ways to address this problem. We need to add the new "label" to a user's document if it doesn't have it yet, e.g. tagging the user with "stackoverflow" or "github".
There is the classic partial update endpoint. This sounds very slow, as we would need to iterate over 9M user_ids and issue an API call for each of them.
There is the bulk request, which gives better performance, but it is limited to roughly 1,000-5,000 documents per call, and knowing when a batch is too large is something we would have to learn as we go.
Then there is the official open issue for /update_by_query endpoint which has lots of traffic, but no confirmation it was implemented in the standard release.
On this open issue there is a mention of an update_by_query plugin, which should provide better handling, but there are old and open issues where users complain of performance problems and memory issues.
I am not sure if it's doable in ES, but I thought I could load all the CSV entries into a separate index, somehow join the two indexes, and apply a script that adds the tag if it doesn't exist yet.
So the question remains: what's the best way to do this? And if some of you have done this in the past, please share your numbers/performance and what you would do differently this time.
While waiting for update by query support, I have opted for:
Use the scan/scroll API to loop over the document IDs you want to tag (related answer).
Use the bulk API to perform partial updates to set the tag on every matching doc.
Additionally, I store the tag data (your CSV) in a separate doc type, and query from that to tag all new docs as they are created, i.e. so I don't have to first index and then update (a sketch of this follows the snippet below).
Python snippet to illustrate the approach:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # client pointed at your cluster

def actiongen():
    # Scan over every matching document, fetching only the IDs.
    docs = helpers.scan(es, query=myquery, index=myindex, fields=['_id'])
    for doc in docs:
        # Emit one partial-update action per document to set the tags field.
        yield {
            '_op_type': 'update',
            '_index': doc['_index'],
            '_type': doc['_type'],
            '_id': doc['_id'],
            'doc': {'tags': tags},
        }

# myquery, myindex, tags and args are defined elsewhere in the script.
helpers.bulk(es, actiongen(), index=args.index, stats_only=True)
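And for the index-time tagging mentioned above, a rough sketch; the index, type, and field names here are assumptions rather than anything from the original setup:

def index_user(es, user_doc):
    # Look up any pre-loaded tags for this user before indexing,
    # so new documents never need a follow-up update.
    hits = es.search(index='tag-data', doc_type='tag',
                     body={'query': {'term': {'user_id': user_doc['user_id']}}})
    tags = [h['_source']['tag'] for h in hits['hits']['hits']]
    if tags:
        user_doc['tags'] = tags
    es.index(index='users', doc_type='user',
             id=user_doc['user_id'], body=user_doc)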
Using the aforementioned update-by-query plugin, you would simply call:
curl -XPOST localhost:9200/index/type/_update_by_query -d '{
"query": {"filtered": {"filter":{
"not": {"term": {"tag": "github"}}
}}},
"script": "ctx._source.label = \"github\""
}'
The update-by-query plugin only accepts a script, not partial documents.
As for performance and memory issues, I guess the best thing is to give it a try.
I'd go with the bulk API with the caveat that you should try to update each document the minimal number of times. Updates are just atomic deletes and adds and leave behind the deleted document as a tombstone until it can be merged out.
Sending a groovy script to execute the update probably makes the most sense here so you don't have to fetch the document first.
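If you go that route, the bulk helper from the earlier answer can carry a script instead of a partial document. A sketch assuming ES 1.x with dynamic Groovy scripting enabled, and assuming the documents already hold their tags in a tags array:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def scripted_actions(doc_ids, index, doc_type, tag):
    # One scripted update per document; the script appends the tag
    # server-side, so the source never has to be fetched first.
    for doc_id in doc_ids:
        yield {
            '_op_type': 'update',
            '_index': index,
            '_type': doc_type,
            '_id': doc_id,
            'script': 'if (!ctx._source.tags.contains(tag)) { ctx._source.tags += tag }',
            'params': {'tag': tag},
        }

# user_ids: the IDs read from the CSV.
helpers.bulk(es, scripted_actions(user_ids, 'users', 'user', 'github'),
             stats_only=True)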
Could you create a Parent/Child relationship, whereby you add a 'tags' type which references your 'posts' type as its parent? This way you wouldn't need to perform a full reindex of your data; simply index each of the appropriate tags against the appropriate post ID.
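Roughly, assuming ES 1.x and the type names from that suggestion (the parent type must already exist, and _parent can only be set when the child type is created):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Create the child type with its _parent pointing at the existing 'posts' type.
es.indices.put_mapping(index='myindex', doc_type='tags',
                       body={'tags': {'_parent': {'type': 'posts'}}})

# Attach a tag to a post without touching the post document itself.
es.index(index='myindex', doc_type='tags',
         body={'name': 'github'}, parent='42')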
A very old thread. I landed here through the GitHub page for "update by query" to see if it was implemented in 2.0, but unfortunately it was not. Thanks to the plugin from Teka, small updates are very much doable from Sense, but our use case was to update millions of documents daily based on certain complex queries. In the end, we moved to the es-hadoop connector. The infrastructure is a big overhead here, but parallelizing the fetching/updating/inserting of documents through Spark helped us anyway. If anyone has discovered :) any other suggestion in the past year, I would love to hear it.

CouchDB, how to get document changes only

Using /_changes?filter=_design I can get all the changes for design documents.
How do I get all the changes for documents only?
Is there such a thing as /_changes?filter=_docs_only?
There is no built in filter for this. You will need to write your own filter function (http://couchdb.readthedocs.org/en/latest/couchapp/ddocs.html#filterfun) that excludes design documents (check the doc's _id for "_design/", etc.) from the feed. You then reference this filter function when you query the changes feed (http://couchdb.readthedocs.org/en/latest/api/database/changes.html?highlight=changes). However, most applications don't run into this too often since design documents are typically only updated when there is an application change.
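A small Python sketch of that setup; the database URL, design doc name, and filter name are all assumptions:

import requests

db = 'http://localhost:5984/mydb'

# Design document holding a filter that drops design docs from the feed.
requests.put(db + '/_design/app', json={
    'filters': {
        'docs_only': "function(doc, req) { return doc._id.indexOf('_design/') !== 0; }"
    }
}).raise_for_status()

# Query the changes feed through that filter.
changes = requests.get(db + '/_changes',
                       params={'filter': 'app/docs_only'}).json()
print([row['id'] for row in changes['results']])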
It would probably be more efficient to implement this filter on the client side instead of streaming all your changes through the couchjs process (always inefficient). As your application loops through the changes, simply check whether each one is a design doc and skip it.
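For the client-side variant, the check is just a prefix test on each change's id (handle_change below is a hypothetical callback, and the database URL is assumed):

import requests

db = 'http://localhost:5984/mydb'

for row in requests.get(db + '/_changes').json()['results']:
    if row['id'].startswith('_design/'):
        continue  # skip design documents on the client
    handle_change(row)  # hypothetical handler for ordinary doc changes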
Cheers.

CouchDB bulk update based on document values

I would like to update all documents where doc.type = "article".
From what I understand, _bulk_docs works on all documents. To narrow down the affected docs, one can use a key value/range.
This is not ideal, because I have different types of documents in the database. I hoped I could update all documents returned by a view, but it seems that is not possible (please correct me if I'm wrong).
The only solution I can think of is prefixing all keys with the document type, but is that a reasonable approach?
There is no way of doing this in CouchDB. Moreover, there is not much sense in doing it, since in CouchDB you can only update a whole document, not just some of its properties. So if it were possible to achieve what you want, it would make all the documents identical.
You could
fetch all documents where doc.type == "article" -- you'd probably use a view for this
make all modifications locally
upload all documents using _bulk_docs
If the number of documents matching your criteria is too large to fit in a single request, you'd have to make multiple requests to _bulk_docs. Also, doing this could introduce conflicts that you'd have to resolve afterwards.
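A minimal Python sketch of that fetch/modify/upload loop, assuming a view that emits documents with doc.type === 'article'; the database URL, view name, and the actual modification are placeholders:

import requests

db = 'http://localhost:5984/mydb'

# View whose map function emits only docs with doc.type === 'article'.
rows = requests.get(db + '/_design/articles/_view/by_type',
                    params={'include_docs': 'true'}).json()['rows']

docs = []
for row in rows:
    doc = row['doc']            # full document, including _id and _rev
    doc['reviewed'] = True      # whatever change you actually need
    docs.append(doc)

# Upload all modified documents in one request; check the per-doc
# results for conflicts and retry those separately.
resp = requests.post(db + '/_bulk_docs', json={'docs': docs})
resp.raise_for_status()
print(resp.json())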
