Azure Cosmos DB - Update existing documents with an additional field - azure

Existing Cosmos DB documents need to be updated with a new property, and existing documents in other collections need to be updated with the same new property and its value.
Is there a recommended way or tool to update existing documents in Cosmos DB, or is writing a custom C# application/PowerShell script using the Cosmos DB SDK the only option?
Example:
Existing user document
{
    "id": "user1#mail.com",
    "name": "abc",
    "country": "xyz"
}
Updated user document
{
    "id": "user1#mail.com",
    "name": "abc",
    "country": "xyz",
    "guid": "4334fdfsfewr" // new field
}
Existing order document of the user
{
    "id": "user1#mail.com",
    "user": "user1#mail.com",
    "date": "09/28/2020",
    "amt": "$45"
}
Updated order document of the user
{
    "id": "user1#mail.com",
    "user": "user1#mail.com",
    "userid": "4334fdfsfewr", // new field, same value as in the user model
    "date": "09/28/2020",
    "amt": "$45"
}

I'd probably go with:
Update the user documents through a script
Have an Azure Function with a Cosmos DB trigger that listens to changes on the users documents and updates the orders accordingly (see the sketch below)
[UPDATE]
Use whatever type of script you feel best with: PowerShell, C#, Azure Functions...
Now, what do you mean by the documents needing to be altered with the new property "at the same time"? I'm not sure that's possible in any way. If you want such an effect, then I guess your best bet is:
create a new collection/container for users
have an Azure Function that listens to the change feed of your existing users container (so, with the StartFromBeginning option)
update your documents with the new field and store them in the newly created container
once done, switch your application to use the new container
It's your choice how you change the other collections (orders): using the change feed & Azure Functions from either the old or the new users container.
PS.
Yes, whatever flow I'd go with, it would still be Azure Functions with a Cosmos DB trigger.
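For illustration, here is a minimal JavaScript sketch of that second step (not the asker's code): an Azure Function bound to the users container's change feed that copies the new guid onto the matching order documents. The database/container names, the orders partition key (/user) and the COSMOS_CONNECTION app setting are assumptions.
// function.json - change feed trigger binding (startFromBeginning replays existing documents)
{
  "bindings": [
    {
      "type": "cosmosDBTrigger",
      "name": "documents",
      "direction": "in",
      "connectionStringSetting": "COSMOS_CONNECTION",
      "databaseName": "mydb",
      "collectionName": "users",
      "leaseCollectionName": "leases",
      "createLeaseCollectionIfNotExists": true,
      "startFromBeginning": true
    }
  ]
}
// index.js
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION);
const orders = client.database("mydb").container("orders");

module.exports = async function (context, documents) {
  for (const user of documents) {
    // Find the user's orders and stamp them with the new field
    const { resources } = await orders.items
      .query({
        query: "SELECT * FROM c WHERE c.user = @user",
        parameters: [{ name: "@user", value: user.id }]
      })
      .fetchAll();
    for (const order of resources) {
      order.userid = user.guid;
      await orders.item(order.id, order.user).replace(order);
    }
  }
};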

Here is a solution for .NET Core 3.0 or higher:
// You can apply any filter to select the documents to update
var result = _container.GetItemLinqQueryable<MessageNoteModel>()
    .Where(d => d.id == Id && d.work_id.ToLower() == workId.ToLower())
    .ToFeedIterator();

while (result.HasMoreResults)
{
    var existingDocuments = (await result.ReadNextAsync()).ToList();
    foreach (var document in existingDocuments)
    {
        // Build the partition key of the document
        var partitionKey = new PartitionKey(document?.work_id);
        document.IsConversation = true;
        // Replace the document in the Cosmos DB container with the new field set
        await _container.ReplaceItemAsync(document, document.document_id, partitionKey);
    }
}

We had the same issue of updating the Cosmos DB schema for existing documents. We were able to achieve this through a custom JsonSerializer.
We created a CosmosJsonDotNetSerializer inspired by the one in the Cosmos DB SDK. CosmosJsonDotNetSerializer exposes a FromStream method that lets us deal with the raw JSON. You can update the FromStream method to upgrade the document schema to your latest version. Here is the pseudo-code:
public override T FromStream<T>(Stream stream)
{
    using (stream)
    {
        if (typeof(Stream).IsAssignableFrom(typeof(T)))
        {
            return (T)(object)stream;
        }
        using (var sr = new StreamReader(stream))
        {
            using (var jsonTextReader = new JsonTextReader(sr))
            {
                var jsonSerializer = GetSerializer();
                return UpdateSchemaVersionToCurrent<T>(jsonSerializer.Deserialize<JObject>(jsonTextReader));
            }
        }
    }
}

private T UpdateSchemaVersionToCurrent<T>(JObject jObject)
{
    // Add logic to update the JObject to the latest version. For example:
    jObject["guid"] = Guid.NewGuid().ToString();
    return jObject.ToObject<T>();
}
You can set Serializer to CosmosJsonDotNetSerializer in CosmosClientOptions while creating CosmosClient.
var cosmosClient = new CosmosClient("<cosmosDBConnectionString>",
    new CosmosClientOptions
    {
        Serializer = new CosmosJsonDotNetSerializer()
    });
This way, you always deal with the latest Cosmos document throughout the code, and when you save the entity back to Cosmos, it is persisted with the latest schema version.
You can take this further by running the schema migration as a separate process, for example inside an Azure Function, where you load the old documents, convert them to the latest version and then save them back to Cosmos.
I also wrote a post on Cosmos document schema update that explains this in detail.

Related

How to use a Cosmos DB trigger to merge documents

I have a project that uses Azure Stream Analytics to save messages from an Event Hub to Cosmos DB.
But the Cosmos upsert overwrites the existing document, and I don't want to lose any existing data elements.
The idea is to use a Cosmos DB pre-trigger that finds the current document with the same id and merges it into the new document before saving it. The trigger looks like this:
function mergeData() {
    var context = getContext();
    var container = context.getCollection();
    var request = context.getRequest();
    // item to be created
    var newItem = request.getBody();
    // query for the latest existing record with the same id
    var filterQuery = "SELECT * FROM root c WHERE c.id = \"" + newItem.id + "\" ORDER BY c._ts DESC";
    var accept = container.queryDocuments(container.getSelfLink(), filterQuery, queryOldRecords);
    if (!accept) throw "Unable to query for older documents, abort";

    function queryOldRecords(err, items, responseOptions) {
        if (err) throw new Error("Error" + err.message);
        if (items.length > 0) {
            // merge logic here
            newItem.name = items[0].name;
            request.setBody(newItem);
        }
    }
}
The trigger is registered as a pre-trigger for the replace operation on the Cosmos DB container.
However, the trigger does not work: I keep seeing the ASA upsert overwrite the document, and the previous data is lost.
(I have not tested with code that inserts records into Cosmos DB directly.)
Here are the questions:
Will the trigger be executed automatically when the ASA upsert occurs?
If not, do I need to configure something in ASA so that the trigger is executed, and how?
How do I debug a Cosmos DB trigger in general, and in this case in particular?
Is there another way to achieve this?
Thanks for any help in advance.
From this link: ASA documentdb output
With compatibility level 1.0, Stream Analytics performs this update as a PATCH operation, so it enables partial updates to the document. Stream Analytics adds new properties or replaces an existing property incrementally. However, changes in the values of array properties in your JSON document result in overwriting the entire array. That is, the array isn't merged.
With 1.2, upsert behavior is modified to insert or replace the document.
I switched ASA to compatibility level 1.0, but then it does not even connect to Cosmos DB.
With level 1.1, however, the update works as a PATCH operation: a JSON element is not removed if it is not present in the new document, and it is updated if its value is different.
Level 1.2 (the default) just does the upsert without the PATCH behavior.
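As a side note on the direct-insert path that was not tested: Cosmos DB pre-triggers never fire automatically; each request that should execute them has to name them explicitly. A minimal JavaScript SDK sketch, assuming the trigger above is registered on the container as mergeData (database/container names and the connection setting are placeholders):
const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient(process.env.COSMOS_CONNECTION);
const container = client.database("mydb").container("messages");

async function upsertWithMerge(newItem) {
  // The pre-trigger runs only because it is listed in preTriggerInclude
  return container.items.upsert(newItem, { preTriggerInclude: ["mergeData"] });
}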

In a Cloud Function, how can I join data from another collection?

I am using a Cloud Function to send a notification to a mobile device. I have two collections in Firestore, clientDetail and clientPersonalDetail. The clientID is the same in both collections, but the date is stored in clientDetail and the name is stored in clientPersonalDetail.
Take a look:
ClientDetail         -- startDate
                     -- clientID
                     .......
ClientPersonalDetail -- name
                     -- clientID
                     .........
Here is my full code:
exports.sendDailyNotifications = functions.https.onRequest((request, response) => {
    var getApplicants = getApplicantList();
    console.log('getApplicants', getApplicants);
    cors(request, response, () => {
        admin
            .firestore()
            .collection("clientDetails")
            //.where("clientID", "==", "wOqkjYYz3t7qQzHJ1kgu")
            .get()
            .then(querySnapshot => {
                const promises = [];
                querySnapshot.forEach(doc => {
                    let clientObject = {};
                    clientObject.clientID = doc.data().clientID;
                    clientObject.monthlyInstallment = doc.data().monthlyInstallment;
                    promises.push(clientObject);
                });
                return Promise.all(promises);
            }) // below: code for notification
            .then(results => {
                response.send(results);
                results.forEach(user => {
                    //sendNotification(user);
                });
                return "";
            })
            .catch(error => {
                console.log(error);
                response.status(500).send(error);
            });
    });
});
The above function produces objects like this:
{ clientID: xxxxxxxxx, startDate: 23/1/2019 }
But I need the name, not just the clientID, to show in the notification, so I'll have to join to the clientPersonalDetail collection to get the name using the clientID.
What should I do?
How can I create another function that returns just the name for a given clientID, and waits until the name is returned?
Can anybody please help?
But I need the name, not just the clientID, to show in the notification, so I'll have to join to the clientPersonalDetail collection to get the name using the clientID. What should I do?
Unfortunately, there is no JOIN clause in Firestore. Queries in Firestore are shallow: they only get items from the collection the query is run against. There is no way to get documents from two top-level collections in a single query; a single query may only use properties of documents in a single collection.
How can I create another function that returns just the name for a given clientID, and waits until the name is returned?
The simplest solution I can think of is to first query the database to get the clientID. Once you have this id, make another database call (inside the callback) to get the corresponding name.
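A rough sketch of that nested lookup (not the asker's code; the clientPersonalDetails collection name and field names are assumptions based on the question):
// Hypothetical helper: resolve the client name for a given clientID
function getClientName(clientID) {
  return admin.firestore()
    .collection("clientPersonalDetails")
    .where("clientID", "==", clientID)
    .get()
    .then(snapshot => (snapshot.empty ? null : snapshot.docs[0].data().name));
}
// In the main query, push a promise per client instead of a plain object,
// so Promise.all waits for every name lookup
promises.push(
  getClientName(doc.data().clientID).then(name => ({
    clientID: doc.data().clientID,
    monthlyInstallment: doc.data().monthlyInstallment,
    name: name
  }))
);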
Another solution would be to add the name of the user as a new property under ClientDetail, so you can query the database only once. This practice is called denormalization and is a common practice when it comes to Firebase. If you are new to NoSQL databases, I recommend you watch the video Denormalization is normal with the Firebase Database for a better understanding. It is about the Firebase Realtime Database, but the same rules apply to Cloud Firestore.
Also, when you duplicate data, there is one thing you need to keep in mind: in the same way you add the data, you need to maintain it. In other words, if you want to update/delete an item, you need to do it in every place where it exists.
The "easier" solution would probably be the duplication of data. This is quite common in NoSQL world.
More precisely you would add in your documents in the ClientDetail collection the value of the client name.
You can use two extra functions in this occasion to have your code clear. One function that will read all the documents form the collection ClientDetail and instead of getting all the fields, will get only the ClientID. Then call the other function, that will be scanning all the documents in collection ClientPersonalDetail and retrieve only the part with the ClientID. Compare if those two match and then do any operations there if they do so.
You can refer to Get started with Cloud Firestore documentation on how to create, add and load documents from Firestore.
Your package.json should look something like this:
{
  "name": "sample-http",
  "version": "0.0.1",
  "dependencies": {
    "firebase-admin": "^6.5.1"
  }
}
I did a little bit of coding myself and here is my example code on GitHub. When deployed, this function scans all the documents from one collection and compares the ClientID with the documents in the other collection. When it finds a match it logs a message; otherwise it logs that the IDs do not match. You can use the idea of how this function operates in your own code.

Google Datastore not retrieving entities

I have been working with the Google Cloud library, and I can successfully save data to Datastore, specifically from my Particle Electron device (I used the tutorial here: https://docs.particle.io/tutorials/integrations/google-cloud-platform/).
The problem I am now having is retrieving the data again.
I am using this code, but it is not returning anything:
function getData() {
    var data = [];
    const query = datastore.createQuery('ParticleEvent').order('created');
    datastore.runQuery(query).then(results => {
        const event = results[0];
        console.log(results);
        event.forEach(data => data.push(data.data));
    });
    console.log(data);
}
But each time it returns empty, specifically this:
[ [], { moreResults: 'NO_MORE_RESULTS', endCursor: 'CgA=' } ]
and I can't figure out why, because I have multiple entities saved in this Datastore.
Thanks
In the tutorial.js from the repo mentioned in the tutorial I see the ParticleEvent entities are created using this data:
var obj = {
    gc_pub_sub_id: message.id,
    device_id: message.attributes.device_id,
    event: message.attributes.event,
    data: message.data,
    published_at: message.attributes.published_at
}
This means the entities don't have a created property. I suspect that ordering the query by such property name is the reason for which the query doesn't return results. From Datastore Queries (emphasis mine):
The results include all entities that have at least one value for every property named in the filters and sort orders, and whose property values meet all the specified filter criteria.
I'd try ordering the query by published_at instead, which appears to be the property closest in meaning to created.
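For illustration, a small sketch of that suggestion, assuming the same @google-cloud/datastore client as in the question; the changes are the sort property and returning the query promise instead of logging before it resolves:
function getData() {
    // Order by a property the ParticleEvent entities actually have
    const query = datastore.createQuery('ParticleEvent').order('published_at');
    return datastore.runQuery(query).then(results => {
        const entities = results[0];
        // Collect just the payload of each entity
        return entities.map(entity => entity.data);
    });
}
// Usage: getData().then(data => console.log(data));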

Migration of small Parse IDs to normal MongoDB ObjectIDs

I am using Parse Dashboard for user management of my iOS application. I am also using external APIs which use a MongoDB database.
The issue I am currently facing is that users created from Parse Dashboard have a short ID instead of MongoDB's ObjectID, while other resources which do not go through Parse are generated with a normal ObjectID.
eg. User Object:
{
    _id: "qVnyrGynJE",
    user_name: "Aditya Raval"
}
Document Object:
{
    _id: "507f191e810c19729de860ea",
    doc_name: "Marksheet",
    user: "qVnyrGynJE"
}
Task Object:
{
    _id: "507f191e810c19729de860ea",
    task_name: "Marksheet",
    user: "qVnyrGynJE"
}
I am also using Keystone.js as a backend admin dashboard. Due to this mix of ID types, relationships inside Keystone.js are broken and Keystone.js crashes.
So I want to migrate all my existing short IDs to normal MongoDB ObjectIDs without breaking any relationships, or find some other workaround such as fixing Keystone.js.
You can run something like this in the mongo shell:
var users = db.Users.find({ _id: { $type: "string" } }).toArray();
for (var i = 0; i < users.length; i++) {
    var oldId = users[i]._id;
    delete users[i]._id;
    // insertOne generates a fresh ObjectID for the copied user
    var newId = db.Users.insertOne(users[i]).insertedId;
    // Re-point every collection that references the user
    db.Documents.updateMany({ user: oldId }, { $set: { user: newId } });
    db.Tasks.updateMany({ user: oldId }, { $set: { user: newId } });
}
// Finally remove the old users that still have string ids
db.Users.deleteMany({ _id: { $type: "string" } });

Use Cloudant/CouchDB update handlers to write records into another database

I am using IBM Cloudant's update handlers to add a timestamp to documents when they are created or updated. I am able to use the following function to add the timestamp to documents in the update handler's own database.
function(doc, req) {
    // New document: use the uuid from the request as its id
    if (!doc) {
        doc = { _id: req.uuid };
    }
    // Copy every field from the request body onto the document
    var body = JSON.parse(req.body);
    for (var key in body) {
        doc[key] = body[key];
    }
    // Stamp the document with the current epoch time in milliseconds
    doc.timestamp = +new Date();
    return [doc, JSON.stringify(doc)];
}
However, I would like to keep all the history in another database (say, a HISTORY database). How can I insert a document from the current database's update handler into another database? Thank you.
One potential solution might be to set up continuous replication and define the update handler on the target database. The replication source database would be your HISTORY database containing the original documents, and the target database would store the time-stamped documents.
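For reference, continuous replication can be configured by creating a document in the _replicator database; a minimal sketch, assuming the two databases are named history and current (authentication options omitted, the account name is a placeholder):
{
  "_id": "history-to-current",
  "source": "https://<account>.cloudant.com/history",
  "target": "https://<account>.cloudant.com/current",
  "continuous": true
}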
