How to use a Cosmos DB trigger to merge documents

I have a project that uses Azure Stream Analytics (ASA) to save messages from an Event Hub to Cosmos DB.
However, since a Cosmos DB upsert overwrites the existing document, I don't want to lose any existing data elements.
The idea is to use a Cosmos DB pre-trigger that finds the current document with the same id and merges it into the new document before saving. The trigger looks like this:
function mergeData() {
    var context = getContext();
    var container = context.getCollection();
    var request = context.getRequest();

    // Item about to be written
    var newItem = request.getBody();

    // Query for the latest existing document with the same id
    var filterQuery = "SELECT * FROM root c WHERE c.id = \"" + newItem.id + "\" ORDER BY c._ts DESC";
    var accepted = container.queryDocuments(container.getSelfLink(), filterQuery, queryOldRecords);
    if (!accepted) throw new Error("Unable to query for older documents, abort");

    function queryOldRecords(err, items, responseOptions) {
        if (err) throw new Error("Error: " + err.message);
        if (items.length > 0) {
            // Merge logic here: carry over fields from the existing document
            newItem.name = items[0].name;
            request.setBody(newItem);
        }
    }
}
The trigger is registered as a pre-trigger for the replace operation on the Cosmos DB container.
However, the trigger does not work: the ASA upsert keeps overwriting the document and the previous data is lost.
(I have not yet tested with code that inserts records directly into Cosmos DB.)
Here are my questions:
Will the trigger be executed automatically when the ASA upsert occurs?
If not, do I need to configure ASA to invoke the trigger, and how?
How do I debug a Cosmos DB trigger in general, and in this case specifically?
Is there another way to achieve this?
Thanks for any help in advance.

From this link: ASA documentdb output
With compatibility level 1.0, Stream Analytics performs this update as a PATCH operation, so it enables partial updates to the document. Stream Analytics adds new properties or replaces an existing property incrementally. However, changes in the values of array properties in your JSON document result in overwriting the entire array. That is, the array isn't merged.
With 1.2, upsert behavior is modified to insert or replace the document.
I switched ASA to compatibility level 1.0, but then it does not even connect to Cosmos DB.
With level 1.1, however, the update works as a PATCH operation: JSON elements that are missing from the new document are not removed, and elements whose values differ are updated.
Level 1.2 (the default) just performs the upsert without the PATCH behavior.
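For what it's worth, Cosmos DB pre-triggers never fire automatically; they only run when a request explicitly names them, which is why a trigger registered on the container has no effect on ASA's writes. A minimal sketch of invoking the trigger from your own code with the @azure/cosmos Node SDK (the database and container names are placeholders):

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = client.database("mydb").container("mycontainer");

async function upsertWithMerge(item) {
    // The mergeData pre-trigger runs only because it is listed here;
    // writes that omit preTriggerInclude (such as ASA's) bypass it.
    await container.items.upsert(item, { preTriggerInclude: ["mergeData"] });
}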

Related

Azure Cosmos DB: Unique index constraint violation using UpsertDocumentAsync

I have defined a unique key policy on my Azure Cosmos DB container for the field UniqueName.
The function below is called on a timer.
I'm attempting to upsert documents in Azure Cosmos DB using the Azure Functions bindings, like so:
public async Task ManageItems(
    // records is the caller's collection of record models (type name illustrative)
    [ActivityTrigger] IReadOnlyList<RecordItem> records,
    [CosmosDB(
        databaseName: "mydatabase",
        collectionName: "items",
        ConnectionStringSetting = "CosmosDbConnectionString")] DocumentClient client,
    ILogger log)
{
    var collectionUri = UriFactory.CreateDocumentCollectionUri("mydatabase", "items");
    foreach (var record in records)
    {
        log.LogDebug($"Upserting itemNumber={record.UniqueName}");
        await client.UpsertDocumentAsync(collectionUri, record);
    }
}
During the first execution against a blank "items" container, the upsert for each record works splendidly, inserting each record as a separate document.
However, when I run the same data through again, now expecting an "update" rather than an "insert", I get an exception:
Unique index constraint violation after UpsertDocumentAsync method runs.
What am I missing here?
To my understanding, an upsert is either an update or an insert, depending on whether the object already exists, as determined by its unique identifier.
The check of whether the outgoing object's unique id matches an existing document's unique id is supposed to happen at the Cosmos DB container level.
What I expect is for the call to notice that a document with that unique id already exists and perform an update, not throw an exception. I would expect an exception only if the method were insert-only.
This issue was fixed by specifying an explicit "id" field in the class the "records" came from.
The "id" was set to the unique "recordNumber" value that I wanted to use as the unique value.
For good measure, I set disableAutomaticIdGeneration to true in the UpsertDocumentAsync call:
UpsertDocumentAsync(collectionUri, record, disableAutomaticIdGeneration: true);
No more unique index violations, and no duplicates either.
Worth noting the solution is similar to this one: How can I insert/update data in CosmosDB in an Azure function

Azure Cosmos DB - Update existing documents with an additional field

Existing Cosmos DB documents need to be altered/updated with a new property, and existing documents in other collections need to be updated with the same new property along with its value.
Is there any recommended way or tool to update existing documents in Cosmos DB, or is writing a custom C# application/PowerShell script using the Cosmos DB SDK the only option?
Example:
Existing user document
{
    "id": "user1#mail.com",
    "name": "abc",
    "country": "xyz"
}
Updated user document
{
    "id": "user1#mail.com",
    "name": "abc",
    "country": "xyz",
    "guid": "4334fdfsfewr" // new field
}
Existing order document of the user
{
    "id": "user1#mail.com",
    "user": "user1#mail.com",
    "date": "09/28/2020",
    "amt": "$45"
}
Updated order document of the user
{
    "id": "user1#mail.com",
    "user": "user1#mail.com",
    "userid": "4334fdfsfewr", // new field, with the same value as in the user model
    "date": "09/28/2020",
    "amt": "$45"
}
I'd probably go with:
Update the user documents through a script.
Have an Azure Function with a Cosmos DB trigger that listens to changes on the user documents and updates the orders appropriately.
[UPDATE]
Use whatever type of script you feel best with: PowerShell, C#, Azure Functions...
Now, what do you mean by the documents needing to be altered with the new property "at the same time"? I'm not sure that's possible in any way. If you want such an effect, then I guess your best bet is:
Create a new collection/container for users.
Have an Azure Function that listens to the change feed of your existing users container (so, with the StartFromBeginning option).
Update your documents to have the new field and store them in the newly created container.
Once done, switch your application to use the new container.
It's your choice how you change the other collections (orders): using the change feed and Azure Functions from either the old or the new users container.
PS.
Yes, whatever flow I'd go with, it would still be Azure Functions with a Cosmos DB trigger; a rough sketch follows.
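A rough sketch of that change-feed function in JavaScript (assuming a Functions app whose function.json binds a cosmosDBTrigger to the users container; the database, container and field names are placeholders, and the orders container is assumed to be partitioned on /user):

// index.js of a Cosmos DB-triggered Azure Function.
// function.json is assumed to define a "cosmosDBTrigger" binding on the users
// container with startFromBeginning and createLeaseCollectionIfNotExists enabled.
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const orders = client.database("mydatabase").container("orders");

module.exports = async function (context, documents) {
    for (const user of documents || []) {
        // Find the user's orders and copy the new field onto each of them.
        const { resources } = await orders.items
            .query({
                query: "SELECT * FROM c WHERE c.user = @user",
                parameters: [{ name: "@user", value: user.id }],
            })
            .fetchAll();

        for (const order of resources) {
            order.userid = user.guid; // the new field from the user document
            await orders.item(order.id, order.user).replace(order);
        }
    }
};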
I have added a solution for .NET Core API 3.0 or higher:
// You can apply any filter to the query
var result = _containers.GetItemLinqQueryable<MessageNoteModel>()
    .Where(d => d.id == Id && d.work_id.ToLower() == workId.ToLower())
    .ToFeedIterator();

if (result.HasMoreResults)
{
    var existingDocuments = result.ReadNextAsync().Result?.ToList();
    existingDocuments.ForEach(document =>
    {
        // Create the partition key of the document
        var partitionKey = new PartitionKey(document?.work_id);
        document.IsConversation = true;
        // Insert/update the message in the Cosmos DB collection
        _containers.Twistle.ReplaceItemAsync(document, document.document_id, partitionKey);
    });
}
We had the same issue of updating the Cosmos DB schema for existing documents. We were able to achieve this through a custom JsonSerializer.
We created CosmosJsonDotNetSerializer inspired from Cosmos DB SDK. CosmosJsonDotNetSerializer exposes the FromStream method that allows us to deal with raw JSON. You can update the FromStream method to update document schema to your latest version. Here is the pseudo-code:
public override T FromStream<T>(Stream stream)
{
    using (stream)
    {
        if (typeof(Stream).IsAssignableFrom(typeof(T)))
        {
            return (T)(object)stream;
        }

        using (var sr = new StreamReader(stream))
        using (var jsonTextReader = new JsonTextReader(sr))
        {
            var jsonSerializer = GetSerializer();
            return UpdateSchemaVersionToCurrent<T>(jsonSerializer.Deserialize<JObject>(jsonTextReader));
        }
    }
}

private T UpdateSchemaVersionToCurrent<T>(JObject jObject)
{
    // Add logic to update the JObject to the latest version, for example:
    jObject["guid"] = Guid.NewGuid().ToString();
    return jObject.ToObject<T>();
}
You can set Serializer to CosmosJsonDotNetSerializer in CosmosClientOptions when creating the CosmosClient.
var cosmosClient = new CosmosClient("<cosmosDBConnectionString>",
    new CosmosClientOptions
    {
        Serializer = new CosmosJsonDotNetSerializer()
    });
This way, you always deal with the latest Cosmos document throughout the code, and when you save the entity back to Cosmos, it is persisted with the latest schema version.
You can take this further by running the schema migration as a separate process, for example inside an Azure Function, where you load old documents, convert them to the latest version and save them back to Cosmos.
I also wrote a post on Cosmos document schema update that explains this in detail.
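If a one-off migration script is preferable to the serializer approach, a minimal sketch of the "update the user documents through a script" step with the @azure/cosmos Node SDK might look like this (the database, container and field names are placeholders, the container is assumed to be partitioned on /id, and the same idea works equally well with the .NET SDK or PowerShell):

const crypto = require("crypto"); // Node 14.17+ for crypto.randomUUID
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const users = client.database("mydatabase").container("users");

async function backfillGuid() {
    // Only touch documents that do not have the new field yet.
    const { resources } = await users.items
        .query("SELECT * FROM c WHERE NOT IS_DEFINED(c.guid)")
        .fetchAll();

    for (const doc of resources) {
        doc.guid = crypto.randomUUID(); // the new field
        // Adjust the partition key if the container is not partitioned on /id.
        await users.item(doc.id, doc.id).replace(doc);
    }
}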

Sequelize upsert with conditions on previous data

I have a server that accepts RESTful-style requests. For PUT of a certain class of objects, I want to either insert a new record with the request body data, or update a previous record. For that, I'm using Sequelize upsert.
app.put('/someModel/:id', (req, res, next) => {
    const data = JSON.parse(JSON.stringify(req.body));
    data.id = req.params.id;
    models.SomeModel.upsert(data).then(() => {
        return models.SomeModel.findByPk(req.params.id);
    }).then(res.send.bind(res)).catch(next);
});
On update, I only want to update the database record if it hasn't been changed in the meantime. That is, if Client A fetches a record, Client B then modifies that record, and Client A subsequently tries to update it, the request should fail, since the record was updated by Client B between Client A's fetch and update. (On the HTTP side, I'm using the If-Unmodified-Since request header.)
Basically, I need a way to:
UPDATE table
SET
field1="value1",
field2="value2"
WHERE
id=1234 AND
updatedAt="2019-04-16 17:41:10";
That way, if the record has been updated previously, this query won't modify data.
How can I use Sequelize upsert to generate SQL like this?
At least for MySQL, that wouldn't be valid - an upsert results in:
INSERT INTO table (field1) VALUES ('value1') ON DUPLICATE KEY UPDATE field1 = VALUES(field1);
The syntax doesn't allow for a WHERE clause. Another DBMS, maybe?
With raw sql, you could fake it with the IF function:
INSERT INTO table (field1) VALUES ('value1') ON DUPLICATE KEY UPDATE
field1 = IF(updatedAt = "2019-04-16 17:41:10", values(field1), field1);
but it's kind of ugly.
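An alternative, assuming you're willing to handle the update path separately from upsert, is Model.update with the expected updatedAt in the WHERE clause, then checking the affected-row count (a sketch using the question's model; the 412 response is just one way to report the conflict):

app.put('/someModel/:id', async (req, res, next) => {
    try {
        const data = JSON.parse(JSON.stringify(req.body));
        const ifUnmodifiedSince = new Date(req.get('If-Unmodified-Since'));

        const [affected] = await models.SomeModel.update(data, {
            // Only update if the row still carries the timestamp the client last saw.
            where: { id: req.params.id, updatedAt: ifUnmodifiedSince },
        });

        if (affected === 0) {
            // Row missing or modified in the meantime: report a precondition failure.
            return res.status(412).end();
        }
        res.send(await models.SomeModel.findByPk(req.params.id));
    } catch (err) {
        next(err);
    }
});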

Use cloudant/couchDB update handlers to write records into another database

I am using IBM Cloudant's update handlers to add a timestamp to documents when they are created or updated. I am able to use the following function to add the timestamp to documents in the update handler's own database.
function(doc, req) {
    if (!doc) {
        doc = { _id: req.uuid };
    }
    var body = JSON.parse(req.body);
    for (var key in body) {
        doc[key] = body[key];
    }
    doc.timestamp = +new Date();
    return [doc, JSON.stringify(doc)];
}
However, I would like to keep all the history in another database (say, a HISTORY database). How can I insert a document from the current database's update handler into another database? Thank you.
One potential solution might be to set up continuous replication and define the update handler on the target database. The replication source database would be your HISTORY database containing the original documents, and the target database would store the time-stamped documents.
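A sketch of starting that continuous replication by writing a document to the _replicator database (the account URL, credentials and database names are placeholders; this covers only the replication piece, not the update handler itself):

// Node 18+ (global fetch). Creating a document in _replicator makes the server
// run the replication continuously until that document is deleted.
const accountUrl = "https://ACCOUNT.cloudant.com";
const auth = "Basic " + Buffer.from("USER:PASSWORD").toString("base64");

async function startContinuousReplication(sourceDb, targetDb) {
    const res = await fetch(`${accountUrl}/_replicator`, {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: auth },
        body: JSON.stringify({
            source: `${accountUrl}/${sourceDb}`,
            target: `${accountUrl}/${targetDb}`,
            continuous: true,
        }),
    });
    return res.json();
}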

Inserting records without failing on duplicate

I'm inserting a lot of documents in bulk with the latest Node.js native driver (2.0).
My collection has a unique index on the URL field, and I'm bound to get duplicates among the thousands of lines I insert. Is there a way to keep MongoDB from erroring out when it encounters a duplicate?
Right now I'm batching records 1000 at a time and using insertMany. I've tried various things, including adding {continueOnError: true}. I tried inserting my records one by one, but it's just too slow; I have thousands of workers in a queue and can't really afford the delay.
Collection definition :
self.prods = db.collection('products');
self.prods.ensureIndex({url:1},{unique:true}, function() {});
Insert :
MongoProcessor.prototype._batchInsert = function(coll, items) {
    var self = this;
    if (items.length > 0) {
        var batch = [];
        var l = items.length;
        for (var i = 0; i < 999; i++) {
            if (i < l) {
                batch.push(items.shift());
            }
            if (i === 998) {
                coll.insertMany(batch, { continueOnError: true }, function(err, res) {
                    if (err) console.log(err);
                    if (res) console.log('Inserted products: ' + res.insertedCount + ' / ' + batch.length);
                    self._batchInsert(coll, items);
                });
            }
        }
    } else {
        self._terminate();
    }
};
I was thinking of dropping the index before the insert and then reindexing with dropDups, but that seems a bit hacky; my workers are clustered and I have no idea what would happen if they tried to insert records while another process is reindexing... Does anyone have a better idea?
Edit:
I forgot to mention one thing. The items I insert have a 'processed' field which is set to 'false'. However, the items already in the db may have been processed, so that field can be 'true'. Therefore I can't upsert... Or can I mark a field to be left untouched by the upsert?
The 2.6 Bulk API is what you're looking for; it requires MongoDB 2.6+* and node driver 1.4+.
There are 2 types of bulk operations:
Ordered bulk operations. These execute all the operations in order and error out on the first write error.
Unordered bulk operations. These execute all the operations in parallel and aggregate all the errors. Unordered bulk operations do not guarantee order of execution.
So in your case, unordered is what you want. The previous link provides an example:
MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Get the collection
    var col = db.collection('batch_write_ordered_ops');
    // Initialize the unordered batch
    var batch = col.initializeUnorderedBulkOp();
    // Add some operations to be executed
    batch.insert({a: 1});
    batch.find({a: 1}).updateOne({$set: {b: 1}});
    batch.find({a: 2}).upsert().updateOne({$set: {b: 2}});
    batch.insert({a: 3});
    batch.find({a: 3}).remove({a: 3});
    // Execute the operations
    batch.execute(function(err, result) {
        console.dir(err);
        console.dir(result);
        db.close();
    });
});
*The docs do state that: "for older servers than 2.6 the API will downconvert the operations. However it’s not possible to downconvert 100% so there might be slight edge cases where it cannot correctly report the right numbers."
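To address the 'processed' concern from the edit as well: unordered bulk upserts with $setOnInsert insert new documents but leave matching documents (including their processed flag) completely untouched, so duplicates simply match instead of raising an error. A sketch along the lines of the question's code (field names follow the question; batching into chunks is omitted for brevity):

MongoProcessor.prototype._batchUpsert = function(coll, items, done) {
    var bulk = coll.initializeUnorderedBulkOp();
    items.forEach(function(item) {
        // $setOnInsert is applied only when the upsert inserts a new document;
        // documents already present (matched by url) are left exactly as they are.
        bulk.find({ url: item.url }).upsert().updateOne({ $setOnInsert: item });
    });
    bulk.execute(function(err, result) {
        if (err) console.log(err);
        if (result) console.log('Newly inserted: ' + result.nUpserted + ' / ' + items.length);
        done(err, result);
    });
};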
