Azure Cosmos DB: Unique index constraint violation using UpsertDocumentAsync

I have defined a unique key policy on my Azure Cosmos DB container for the field UniqueName.
The function below is called on a timer.
I'm attempting to upsert documents into Azure Cosmos DB using the Azure Functions bindings, like so:
public async Task ManageItems([ActivityTrigger] List<Record> records, // "Record" stands in for the actual input class
    [CosmosDB(
        databaseName: "mydatabase",
        collectionName: "items",
        ConnectionStringSetting = "CosmosDbConnectionString")] DocumentClient client,
    ILogger log)
{
    var collectionUri = UriFactory.CreateDocumentCollectionUri("mydatabase", "items");
    foreach (var record in records)
    {
        log.LogDebug($"Upserting itemNumber={record.UniqueName}");
        await client.UpsertDocumentAsync(collectionUri, record);
    }
}
During the first execution against a blank "items" container, the upsert works splendidly, inserting each record as a separate document.
However, when I run the same data again, now expecting an "update" rather than an "insert", I get an exception:
Unique index constraint violation after UpsertDocumentAsync method runs.
What am I missing here?
To my understanding, an upsert is either an update or an insert, depending on whether the object already exists, based on its unique identifier.
The check of whether the outgoing object's unique id matches an existing document's unique id is supposed to happen at the Cosmos DB container level.
What I expect is that the call notices a document with that unique id already exists and performs an update rather than throwing an exception. I would only expect an exception if the method were insert-only.

This issue was fixed by specifying an explicit "id" field in the class the "records" came from.
The "id" was set to the unique "recordNumber" value I wanted to use as the unique identifier.
Without an explicit id, the SDK generates a new GUID id on every upsert, so the second run attempted fresh inserts whose UniqueName values collided with the unique key policy.
For good measure I also set disableAutomaticIdGeneration to true in the UpsertDocumentAsync call:
await client.UpsertDocumentAsync(collectionUri, record, disableAutomaticIdGeneration: true);
No more unique index violations, and no duplicates either.
Worth noting the solution is similar to this one: How can I insert/update data in CosmosDB in an Azure function

Related

How to use Cosmos DB trigger to merge document

I have a project that uses Azure Stream Analytics (ASA) to save messages from an Event Hub to Cosmos DB.
But since the Cosmos upsert overwrites the existing document, I don't want to lose any existing data elements.
The idea is to use a Cosmos DB pre-trigger that finds the current document with the same id and merges it into the new document before saving. The trigger looks like this:
function mergeData() {
    var context = getContext();
    var container = context.getCollection();
    var request = context.getRequest();

    // item to be created
    var newItem = request.getBody();

    // query for the latest existing record with the same id
    var filterQuery = 'SELECT * FROM root c WHERE c.id = "' + newItem.id + '" ORDER BY c._ts DESC';
    var accepted = container.queryDocuments(container.getSelfLink(), filterQuery, queryOldRecords);
    if (!accepted) throw "Unable to query for older documents, abort";

    function queryOldRecords(err, items, responseOptions) {
        if (err) throw new Error("Error: " + err.message);
        if (items.length > 0) {
            // merge logic here: carry the old value over to the new document
            newItem.name = items[0].name;
            request.setBody(newItem);
        }
    }
}
The trigger is registered as a pre-trigger for the replace operation on the Cosmos DB container.
However, the trigger does not seem to run: the ASA upsert keeps overwriting the document, and previous data is lost.
(I have not yet tested with code that inserts records into Cosmos DB directly.)
Here are the questions:
Will the trigger be executed automatically when the ASA upsert occurs?
If not, do I need to configure ASA to include the trigger, and how?
How do I debug a Cosmos DB trigger in general, and in this case in particular?
Is there another way to achieve this?
Thanks for any help in advance.
From this link: ASA documentdb output
With compatibility level 1.0, Stream Analytics performs this update as a PATCH operation, so it enables partial updates to the document. Stream Analytics adds new properties or replaces an existing property incrementally. However, changes in the values of array properties in your JSON document result in overwriting the entire array. That is, the array isn't merged.
With 1.2, upsert behavior is modified to insert or replace the document.
I switched ASA to compatibility level 1.0, but then it would not even connect to Cosmos DB.
With level 1.1, the update works as a PATCH operation: a JSON element is not removed if it is absent from the new document, and it is updated if its value differs.
Level 1.2 (the default) just does the upsert without the PATCH behavior.
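To answer the first two questions directly: Cosmos DB triggers never fire automatically; they must be named in the request options of each operation, and ASA's Cosmos DB output does not expose a way to specify triggers, which is why the pre-trigger never runs on the ASA upserts. If you write to the container yourself, a minimal sketch with the @azure/cosmos Node SDK (the database, container, and trigger names here are placeholders) looks like this:

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION_STRING);
const container = client.database("mydb").container("items");

async function upsertWithMerge(item) {
    // Pre-triggers only run when explicitly listed per request; nothing
    // invokes them automatically, which is why ASA upserts bypass mergeData.
    await container.items.upsert(item, { preTriggerInclude: ["mergeData"] });
}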

How to reference new DynamoDB record in AWS Lambda function

Is it possible to reference a newly created DynamoDB record in AWS Lambda? For example, retrieving and using the ID of the newly created record. Hoping this is possible without a query to retrieve the new record from DynamoDB.
const docClient = new AWS.DynamoDB.DocumentClient();
... // Omitting the rest of the code in this example
const params = {
    TableName: 'ExampleTableName',
    Item: {
        id: uuid.v1()
    }
};

try {
    await docClient.put(params).promise();
} catch (err) {
    return err;
}
// Reference newly created record to retrieve the ID.
You can achieve this using the ReturnValues parameter.
ReturnValues returns the item's attribute values from the same operation.
But I am afraid that to achieve your purpose you need to use a different API: UpdateItem.
From the Docs
Use ReturnValues if you want to get the item attributes as they appear before or after they are updated. For UpdateItem, the valid values are:
NONE - If ReturnValues is not specified, or if its value is NONE, then nothing is returned. (This setting is the default for ReturnValues.)
ALL_OLD - Returns all of the attributes of the item, as they appeared before the UpdateItem operation.
UPDATED_OLD - Returns only the updated attributes, as they appeared before the UpdateItem operation.
ALL_NEW - Returns all of the attributes of the item, as they appear after the UpdateItem operation.
UPDATED_NEW - Returns only the updated attributes, as they appear after the UpdateItem operation.
There is no additional cost associated with requesting a return value aside from the small network and processing overhead of receiving a larger response. No read capacity units are consumed.
The values returned are strongly consistent.
Why can't ReturnValues on PutItem solve this? See https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html#API_PutItem_RequestSyntax: PutItem only accepts NONE and ALL_OLD, which won't return the newly written item.
You can store the returned values in a variable and add custom logic to proceed further. :)
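Two things worth noting. First, in the code above the id is generated client-side with uuid.v1(), so it is already available in params.Item.id before the put; no read-back is needed for the ID itself. Second, if you do want the full item echoed back by the write, a minimal sketch using UpdateItem with ReturnValues: 'ALL_NEW' (the table and attribute names here are assumptions) could look like this:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Sketch only: updates one attribute and receives the complete item as it
// exists after the update, avoiding a second read.
async function updateAndReturn(id) {
    const params = {
        TableName: 'ExampleTableName',                  // assumed table name
        Key: { id },
        UpdateExpression: 'SET #s = :s',
        ExpressionAttributeNames: { '#s': 'status' },   // hypothetical attribute
        ExpressionAttributeValues: { ':s': 'ACTIVE' },
        ReturnValues: 'ALL_NEW',                        // full post-update item
    };
    const { Attributes } = await docClient.update(params).promise();
    return Attributes;                                  // includes id and all other attributes
}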

Azure CosmosDB/Nodejs - Entity with the specified id does not exist in the system

I am trying to delete and update records in Cosmos DB using my GraphQL/Node.js code and getting the error "Entity with the specified id does not exist in the system". Here is my code:
deleteRecord: async (root, id) => {
    const { resource: result } = await container.item(id.id, key).delete();
    console.log(`Deleted item with id: ${id.id}`);
},
Somehow the code below is not able to find the record; even container.item(id.id, key).read() doesn't work.
await container.item(id.id, key)
But if I try to find the record using a query spec, it works:
await container.items.query('SELECT * from c where c.id = "' + id.id + '"').fetchNext()
FYI - I am able to fetch all records and create new items, so connecting to the DB and reading/writing is not the issue.
What else can it be? Any pointers related to this would be helpful.
Thanks in advance.
It seems you are passing the wrong key to item(id, key). According to the Note in this documentation:
In both the "update" and "delete" methods, the item has to be selected
from the database by calling container.item(). The two parameters
passed in are the id of the item and the item's partition key. In this
case, the partition key is the value of the "category" field.
So you need to pass the value of your partition key, not your partition key path.
For example, if you have a document like the one below and your partition key is /category, you need to use await container.item("xxxxxx", "movie").
{
    "id": "xxxxxx",
    "category": "movie"
}
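Applied to the deleteRecord resolver from the question, a minimal sketch (assuming the container's partition key path is /category and the target document's category is "movie") would be:

deleteRecord: async (root, id) => {
    // The second argument must be the partition key VALUE of the target
    // document ("movie" here), not the partition key path ("/category").
    await container.item(id.id, 'movie').delete();
    console.log(`Deleted item with id: ${id.id}`);
},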

Google Datastore can't update an entity

I'm having issues retrieving an entity from Google Datastore. Here's my code:
async function pushTaskIdToCurrentSession(taskId) {
    console.log(`Attempting to add ${taskId} to current Session: ${cloudDataStoreCurrentSession}`);
    const transaction = datastore.transaction();
    const taskKey = datastore.key(['Session', cloudDataStoreCurrentSession]);
    try {
        await transaction.run();
        const [task] = await transaction.get(taskKey);
        let sessionTasks = task.session_tasks;
        sessionTasks.push(taskId);
        task.session_tasks = sessionTasks;
        transaction.save({
            key: taskKey,
            data: task,
        });
        await transaction.commit();
        console.log(`Task ${taskId} added to current Session successfully.`);
    } catch (err) {
        console.error('ERROR:', err);
        await transaction.rollback();
    }
}
taskId is a string id of another entity that I want to store in an array of a property called session_tasks.
But it doesn't get that far. After this line:
const [task] = await transaction.get(taskKey);
The error is that task is undefined:
ERROR: TypeError: Cannot read property 'session_tasks' of undefined
at pushTaskIdToCurrentSession
Anything immediately obvious from this code?
UPDATE:
Using this instead:
const task = await transaction.get(taskKey).catch(console.error);
This gets me a task object, but it seems to create a new entity in Datastore.
I also get this error:
(node:19936) UnhandledPromiseRejectionWarning: Error: Unsupported field value, undefined, was provided.
at Object.encodeValue (/Users/.../node_modules/@google-cloud/datastore/build/src/entity.js:387:15)
This suggests the array is unsupported?
The issue here is that Datastore supports two kinds of IDs:
IDs shown as name= are custom names, and they are treated as strings.
IDs shown as id= are auto-generated numeric IDs, and they are treated as integers.
When you tried to update the value in Datastore, cloudDataStoreCurrentSession was treated as a string. Since Datastore couldn't find an existing entity key with that custom name, it created one, adding name= to mark it as a custom name. You therefore have to pass cloudDataStoreCurrentSession as an integer to save the data properly.
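A minimal sketch of the fix, assuming the Session entity was created with an auto-generated numeric ID (the @google-cloud/datastore client exposes datastore.int() for exactly this):

const { Datastore } = require('@google-cloud/datastore');
const datastore = new Datastore();

const cloudDataStoreCurrentSession = '5639456635748352'; // hypothetical numeric ID held as a string

// Wrapping the value with datastore.int() addresses the auto-generated
// id= key instead of silently creating a new entity under a name= key.
const taskKey = datastore.key([
    'Session',
    datastore.int(cloudDataStoreCurrentSession),
]);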
If I understand correctly, you are trying to load an array list of strings from Datastore for a specific entity kind and entity key, add one more task, and then update the value in Datastore for that same entity kind and key.
I have created the same scenario and done a bit of coding myself. In this GitHub code you will find my example, which does the following:
Goes to the Datastore entity kind Session.
Retrieves all the data for entity key id=5639456635748352 (for example).
Gets the array list from the key session_tasks.
Adds the new task passed in through the function's arguments.
Performs the transaction to Datastore and updates the values.
All steps are logged in the code and there are plenty of comments explaining exactly how it works. There are also two examples of currentSessionID: one for custom names and one for automatically generated IDs. You can test the code to understand how it works and modify it to your needs.

In a Cloud Function, how can I join from another collection to get data?

I am using a Cloud Function to send notifications to mobile devices. I have two collections in Firestore, clientDetail and clientPersonalDetail. The clientID is the same in both collections, but the date is stored in clientDetail and the name in clientPersonalDetail.
Take a look:
ClientDetail         -- startDate
                     -- clientID
                     .......
ClientPersonalDetail -- name
                     -- clientID
                     .........
Here is My full Code:
exports.sendDailyNotifications = functions.https.onRequest((request, response) => {
    var getApplicants = getApplicantList();
    console.log('getApplicants', getApplicants);
    cors(request, response, () => {
        admin
            .firestore()
            .collection("clientDetails")
            //.where("clientID", "==", "wOqkjYYz3t7qQzHJ1kgu")
            .get()
            .then(querySnapshot => {
                const promises = [];
                querySnapshot.forEach(doc => {
                    let clientObject = {};
                    clientObject.clientID = doc.data().clientID;
                    clientObject.monthlyInstallment = doc.data().monthlyInstallment;
                    promises.push(clientObject);
                });
                return Promise.all(promises);
            }) // below: code for notification
            .then(results => {
                response.send(results);
                results.forEach(user => {
                    //sendNotification(user);
                });
                return "";
            })
            .catch(error => {
                console.log(error);
                response.status(500).send(error);
            });
    });
});
The function above produces an object like this:
{clientID: xxxxxxxxx, startDate: 23/1/2019}
But I need the name, not just the clientID, to show in the notification, so I'll have to join to the clientPersonalDetail collection to get the name using the clientID.
What should I do?
How can I create another function that solely returns the name when passed a clientID as an argument, and wait until it returns the name?
Can anybody please help?
But I need the name, not just the clientID, to show in the notification, so I'll have to join to the clientPersonalDetail collection to get the name using the clientID. What should I do?
Unfortunately, there is no JOIN clause in Firestore. Queries in Firestore are shallow: they only return documents from the collection the query is run against. There is no way to get documents from two top-level collections in a single query; a query may only use properties of documents in a single collection.
How can I create another function that solely returns the name when passed a clientID as an argument, and wait until it returns the name?
The simplest solution I can think of is to first query the database to get the clientID. Once you have this id, make another database call (inside the callback) to get the corresponding name, as sketched below.
Another solution would be to add the name of the user as a new property under ClientDetail so you only have to query the database once. This practice is called denormalization, and it is common with Firebase. If you are new to NoSQL databases, I recommend the video Denormalization is normal with the Firebase Database for a better understanding. It is about the Firebase Realtime Database, but the same rules apply to Cloud Firestore.
Also, when you duplicate data, there is one thing to keep in mind: in the same way you add the data, you need to maintain it. In other words, if you want to update/delete an item, you need to do it in every place it exists.
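A minimal sketch of that first approach, assuming the firebase-admin admin object from the question and that the second collection is named clientPersonalDetail with clientID and name fields (adjust to your exact schema):

// Hypothetical helper: looks up the client's name by clientID with a
// second, nested query against the other collection.
async function getClientName(clientID) {
    const snapshot = await admin.firestore()
        .collection('clientPersonalDetail')
        .where('clientID', '==', clientID)
        .limit(1)
        .get();
    return snapshot.empty ? null : snapshot.docs[0].data().name;
}

// Usage inside the existing .then(results => ...) block:
// const name = await getClientName(user.clientID);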
The "easier" solution would probably be the duplication of data. This is quite common in NoSQL world.
More precisely you would add in your documents in the ClientDetail collection the value of the client name.
You can use two extra functions in this occasion to have your code clear. One function that will read all the documents form the collection ClientDetail and instead of getting all the fields, will get only the ClientID. Then call the other function, that will be scanning all the documents in collection ClientPersonalDetail and retrieve only the part with the ClientID. Compare if those two match and then do any operations there if they do so.
You can refer to Get started with Cloud Firestore documentation on how to create, add and load documents from Firestore.
Your package.json should look something like this:
{
    "name": "sample-http",
    "version": "0.0.1",
    "dependencies": {
        "firebase-admin": "^6.5.1"
    }
}
I have done a bit of coding myself and here is my example code on GitHub. Deploying this Function scans all the documents from one collection and compares the ClientID against the documents in the other collection. When it finds a match it logs a message; otherwise it logs that the IDs do not match. You can use the idea of how this function operates in your own code.
