Copy documents from one DocumentCollection to another? - azure

In my Azure Cosmos DB account, which I use with the Gremlin API, there is one database called graphdb with several DocumentCollections.
I would like to copy a selected set of Vertices and Edges from one collection (graphdb) to another (Tintin).
I managed to do this by transferring all data via the client, but it would be much easier if data stayed in Azure. Thus I tried some SQL in the Azure portal like:
SELECT *
INTO Tintin
FROM graphdb;
However, this seems unsupported.

Currently you cannot join multiple collections, and your query violates this rule.
Still, +1 for your idea; you should post it on https://feedback.azure.com/

Related

How can I do bulk inserts into the Common Data Service?

I have 1000 records that I need to sync daily from an API. I am currently bulk inserting them into a SQL Database, however I would like to use Dataverse/a Common Data Service database instead.
The Logic App connector seems to do 1 record at a time and the SDK does PUTS and POSTS. How can I either insert 1000 records into the Common Data Service in bulk OR somehow synchronise my SQL DB with the CDS?
As far as I know, there is no other way to do that without programming. You can extend your Power Automate flow with Azure Functions to insert these records in a single transaction.
This link explains how to do it:
https://learn.microsoft.com/en-us/powerapps/developer/data-platform/webapi/execute-batch-operations-using-web-api#when-to-use-batch-requests
Please let me know if you have any questions.
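As a rough illustration of the batching idea, here is a minimal Python sketch of only the chunking step: it splits the daily records into groups so that each group can go into one batch request. The function name chunked and the group size are assumptions made for illustration; the per-request limit and the request format itself should be taken from the linked documentation.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            # A full group is ready to be sent as one batch request.
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, group.
        yield batch
```

Each list produced by chunked(records, size) would then be turned into one batch request by whatever client code you use.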
If you want to regularly ingest data (1000 rows) into Dataverse (CDS), then use Dataflows. The following link to MS Docs describes how to set up scheduled bulk data updates. It is therefore a pull rather than push model.
https://learn.microsoft.com/en-us/powerapps/maker/data-platform/create-and-use-dataflows

Logic App to push data from Cosmosdb into CRM and perform an update

I have created a logic app with the goal of pulling data from a container within cosmosdb (with a query), looping over the results and then pushing this data into CRM (or Common Data Service). When the data is pushed to CRM, an ID will be generated. I wish to then update cosmosdb with this new ID. Here is what I have so far:
The next step queries the data in our Cosmos DB database and selects all IDs with a length greater than 15. (This tells us that the ID is not yet in the CRM database.)
Then we loop over the results and push this into CRM (Dynamics365 or the Common Data Service)
Dilemma: The first part of this process appears to be correct, however, I want to make sure that I am on the right track with this. Furthermore, once the data is successfully pushed to CRM, CRM automatically generates an ID for each record. How would I then update cosmosDB with the newly generated IDs?
Any suggestion is appreciated
Thanks
I see a red flag in your approach with the length(c.id) > 15 query. This is not something I would do. I don't know how big your database is going to be, but high volumes of cross-partition queries generally do not perform well, especially if the database is going to keep growing.
Cosmos DB already provides an awesome streaming capability, so rather than doing this in a batch I would use Change Feed to accomplish whatever you're doing in your Logic App. This will likely give you better control of the process and likely allow you to get the ID back out of your CRM app to insert back into Cosmos DB.
Because you will be writing back to Cosmos DB, you will need a flag to ignore the update in Change Feed when the item is updated.
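A minimal Python sketch of that flag check, written against plain dictionaries rather than any real SDK. The field name crmId and the callables push_to_crm and save_to_cosmos are hypothetical stand-ins for the CRM call and the Cosmos DB write; the sketch only shows how already-synced items are skipped so the write-back does not trigger another round of processing.

```python
from typing import Callable, Dict, Iterable, List


def process_changes(
    changes: Iterable[Dict],
    push_to_crm: Callable[[Dict], str],
    save_to_cosmos: Callable[[Dict], None],
) -> List[str]:
    """Push unsynced items to CRM and write the generated ID back."""
    synced_ids: List[str] = []
    for item in changes:
        # Assumption: an item that already carries "crmId" was written back
        # by this process, so it is ignored to avoid an endless update loop.
        if item.get("crmId"):
            continue
        crm_id = push_to_crm(item)           # CRM returns the generated ID
        updated = {**item, "crmId": crm_id}  # copy with the flag set
        save_to_cosmos(updated)              # this write re-enters the feed
        synced_ids.append(crm_id)
    return synced_ids
```

When the updated item shows up in the feed again, the crmId check causes it to be skipped.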

Archiving Azure Search Service

I need suggestions on archiving unused data from a search service and reloading it when needed (the reload will be implemented later).
Initial design draft looks like this:
Find the keys in the search service, based on some conditions (e.g. inactive, how old), that need to be archived.
Run an archiver job (need suggestions here; it could be a web job or a function app).
Fetch the data, insert it into blob storage, and delete it from the search service.
Ideally the job should run in a pool and be asynchronous.
There's no right or wrong answer to this question. What you need to do is perform batch queries (up to 1000 docs each) and schedule them to archive past data (e.g. run an Azure Function that triggers on a schedule and searches for docs whose createdDate is older than a cutoff computed from DateTime.Now).
Then persist that data somewhere (can be a cosmos db or as blob into storage account). Once you need to upload it again, I would consider it as a new insert, so it should follow your current insert process.
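The fetch, persist, and delete loop can be sketched in Python as follows. This is only an outline under stated assumptions: fetch_batch, save_archive, and delete_keys are hypothetical callables standing in for the search query, the blob (or Cosmos DB) write, and the index delete, and the sketch assumes fetch_batch no longer returns documents once they have been deleted.

```python
from typing import Callable, Dict, List


def archive_in_batches(
    fetch_batch: Callable[[int], List[Dict]],
    save_archive: Callable[[List[Dict]], None],
    delete_keys: Callable[[List[str]], None],
    batch_size: int = 1000,
    key_field: str = "id",
) -> int:
    """Archive matching documents batch by batch; return how many were moved."""
    total = 0
    while True:
        docs = fetch_batch(batch_size)  # up to batch_size matching docs
        if not docs:
            break                       # nothing left to archive
        save_archive(docs)              # persist first, so no data is lost
        delete_keys([doc[key_field] for doc in docs])  # then remove from index
        total += len(docs)
    return total
```

Persisting before deleting means a failure between the two steps leaves a duplicate in the archive rather than a lost document.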
You can also take a look at this tool, which helps copy data from your index pretty quickly:
https://github.com/liamca/azure-search-backup-restore

Azure Mobile Services PullAsync not all data

Using Azure Mobile Services with Azure Easy Tables on the back end, I want to pull only filtered data to the client, since the tables could be quite large while the rows relevant to a specific user (matched by their own ID) would be few. I tried to use
IMobileServiceTableQuery<Messages> query =
    msgTable.Where(c => c.UserId == _myId);
await msgTable.PullAsync("syncmsg" + _myId, query);
but it turns out that PullAsync applies the query only on subsequent pulls; the first time, it pulls all the data. Is there any way, using Azure Mobile Services, to pull and store in local storage only the data filtered by the query?
So, first things first - you should do security filtering on the server, not the client. There are easy ways to adjust the filter on the server for your specifications. See https://github.com/Azure/azure-mobile-apps-node/tree/master/samples for plenty of samples.
As to this issue, you are building the query wrong. The thing you want is:
var query = msgTable.CreateQuery().Where(c => c.UserId == myId);
await msgTable.PullAsync("mysyncquery", query);
Note the CreateQuery() in the middle. Without that, you don't get the base query set up.

PouchDB - start local, replicate later

Does it create any major problems if we always create and populate a PouchDB database locally first, and then later sync/authenticate with a centralised CouchDB service like Cloudant?
Consider this simplified scenario:
1. You're building an accommodation booking service such as hotel search or airbnb.
2. You want people to be able to favourite/heart properties without having to create an account, and will use PouchDB to store this list, i.e. the idea is to not break their flow by making them create an account when it isn't strictly necessary.
3. If users wish to opt in, they can later create an account and receive credentials for a "server side" database to sync with.
At the point of step 3, once I've created a per-user CouchDB database server-side and assigned credentials to pass back to the browser for sync/replication, how can I link that up with the PouchDB data already created? i.e.
Can PouchDB somehow just reuse the existing database for this sync, therefore pushing all existing data up to the hosted CouchDB database, or..
Instead do we need to create a new PouchDB database and then copy over all docs from the existing (non-replicated) one to this new (replicated) one, and then delete the existing one?
I want to make sure I'm not painting myself into any corner I haven't thought of, before we begin the first stage, which is supporting non-replicated PouchDB.
It depends on what kind of data you want to sync from the server, but in general, you can replicate a pre-existing database into a new one with existing documents, just so long as those document IDs don't conflict.
So probably the best idea for the star-rating model would be to create documents client-side with IDs like 'star_<timestamp>' to ensure they don't conflict with anything. Then you can aggregate them with a map/reduce function.
