I want to implement my own backup mechanism for Cosmos DB. To do that, I want to grab the data every x hours and put it into another storage account or a different Cosmos DB instance.
Since I can't use Data Factory (not available in my region), is there any other easy way to get data from Cosmos DB and put it somewhere else?
The first thing that comes to mind is a set of SQL queries that go through all collections and copy them. Is there an easier way?
Since you can't use Data Factory (which would probably be the most suitable option for you), I suggest the two solutions below:
1. Azure Timer Trigger Function.
It supports CRON expressions, so you could query the data and copy it into the target collection via the Cosmos DB SDK. However, please note that Azure Functions have an execution time limit.
2. Azure Cosmos DB Data Migration Tool.
The tool can be run from the command line, so you could package the commands into a .bat file and run that file with a Windows scheduled task. Alternatively, you could use an Azure WebJob to do the same thing.
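A minimal sketch of the first option, assuming the function is scheduled with a six-field NCRONTAB expression such as "0 0 */6 * * *" (every six hours). The names `source_items` and `upsert` are hypothetical stand-ins for whatever Cosmos DB SDK calls you use to read the source and write the target; they are not SDK names. The loop stops once a time budget is used up and reports the last copied id, so a later run could resume from there if the source query is ordered by id:

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def copy_with_deadline(
    source_items: Iterable[dict],
    upsert: Callable[[dict], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, Optional[str]]:
    """Copy items until the source is exhausted or the time budget runs out.

    Returns (copied_count, last_id), where last_id is the id of the last
    item that was copied (None if nothing was copied).
    """
    deadline = clock() + budget_seconds
    copied = 0
    last_id = None
    for item in source_items:
        if clock() >= deadline:
            break  # stop cleanly instead of being cut off by the host
        upsert(item)  # hypothetical: write one document to the target
        copied += 1
        last_id = item["id"]
    return copied, last_id


if __name__ == "__main__":
    # In-memory stand-ins for the source query and the target container.
    target = {}
    docs = [{"id": str(i), "value": i} for i in range(5)]
    count, last = copy_with_deadline(
        docs, lambda d: target.__setitem__(d["id"], d), budget_seconds=60
    )
    print(count, last)
```

Choosing a budget comfortably below the function timeout leaves room for the last write to finish.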
Related
We are building a data migration pipeline using Azure Data Factory (ADF). We are transferring data from one Cosmos DB instance to another. We plan to enable dual writes before the migration begins, so that if any data point changes during the migration, both databases get the most recent data. However, in ADF only the insert and upsert options are available. Our case is: on insert, if it gets a 'conflict', continue and fail the pipeline. Can anyone give any pointers on how to achieve that in ADF?
The other option would be to create our own custom tool using the Cosmos DB libraries to transfer the data.
If you are doing a live migration, ADF is not the right tool to use, as it is intended for offline migrations. If you are migrating from one Cosmos DB account to another, your best option is to use the Cosmos DB Live Data Migrator.
This tool also provides dead letter support, which is another requirement you have.
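If you do end up writing a custom tool instead, the insert-and-skip-on-conflict behaviour with a dead letter list can be sketched roughly as below. This assumes the goal is to leave documents that already exist in the target untouched (for example, ones written by the dual-write path). `ConflictError` and `InMemoryTarget` are hypothetical stand-ins; a real client would surface the HTTP 409 conflict through its own exception type.

```python
from typing import Callable, Iterable, List, Tuple


class ConflictError(Exception):
    """Hypothetical stand-in for a 'document already exists' (409) error."""


class InMemoryTarget:
    """Hypothetical target store that rejects duplicate ids, for illustration."""

    def __init__(self) -> None:
        self.docs = {}

    def insert(self, doc: dict) -> None:
        if doc["id"] in self.docs:
            raise ConflictError(doc["id"])
        self.docs[doc["id"]] = doc


def insert_skipping_conflicts(
    docs: Iterable[dict], insert: Callable[[dict], None]
) -> Tuple[int, int, List[Tuple[dict, str]]]:
    """Insert each doc; skip conflicts, and dead-letter any other failure."""
    inserted = 0
    skipped = 0
    dead_letters: List[Tuple[dict, str]] = []
    for doc in docs:
        try:
            insert(doc)
            inserted += 1
        except ConflictError:
            skipped += 1  # target already has this id; keep the target's copy
        except Exception as exc:  # anything else is kept for later inspection
            dead_letters.append((doc, repr(exc)))
    return inserted, skipped, dead_letters


if __name__ == "__main__":
    target = InMemoryTarget()
    target.insert({"id": "2", "v": "new"})  # written earlier by a dual write
    source = [{"id": "1", "v": "old"}, {"id": "2", "v": "old"}]
    print(insert_skipping_conflicts(source, target.insert))
```

The migration run keeps going on conflicts, and the dead letter list gives you something to retry or inspect afterwards.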
I am using an Azure SQL Database for our team's reporting, and the data size is now too big to be handled by a single database (at least I think so; it has 2 fact tables with around 100 million rows in each table).
The Azure SQL Database is named "operation-db" and the Synapse is named "operation-synapse".
I want to make the transition for my team as smooth as possible, so I'm planning to copy all the tables, views, stored procedures and user-defined functions over to Synapse.
Once I'm done with that, is there a way to rename "operation-synapse" to "operation-db" so the team doesn't have to go to their code base to change the name of the db?
Thanks!
It is not possible to rename a SQL Pool via SQL Server Management Studio and you will receive the following error:
ALTER DATABASE NAME statement is not supported in a Synapse workspace.
To update the name of a SQL pool, use the Azure Synapse Portal or the
Synapse REST API. (Microsoft SQL Server, Error: 49978)
The REST API however does list a move method to change names:
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Synapse/workspaces/{workspaceName}/sqlPools/{sqlPoolName}/move?api-version=2019-06-01-preview
I couldn't get it to work though. YMMV. Not renaming your db shouldn't be a big deal though. Your team should feel comfortable with changing connection strings etc and it will help them understand they are moving to a different product (Synapse) with different characteristics.
Before you move to Synapse, however, have you looked at clustered columnstore indexes in Azure SQL DB? They are the default type of index in a SQL Pool database but are also available in SQL DB. They can compress your data 5-10x, so it might end up not that big at all. Columnstore is great for aggregate queries but less so for point lookups, so have a think about your workload before you migrate.
100 million rows is not big enough for Synapse. The CCI data in each shard will only have 1 row group (1 million rows).
Consider using partitioning or a CCI in your SQL DB itself.
Also, what's your usage pattern? If you are doing point lookups and updates, clustered indexes will perform better.
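The row group point can be checked with some quick arithmetic. This sketch assumes a dedicated SQL pool spreads a table across 60 distributions, that a columnstore row group holds at most 1,048,576 rows, and that rows are spread evenly (real tables will be skewed to some degree):

```python
DISTRIBUTIONS = 60                 # distributions in a dedicated SQL pool
MAX_ROWS_PER_ROWGROUP = 1_048_576  # columnstore row group capacity


def rows_per_distribution(total_rows: int) -> float:
    """Average rows landing in each distribution, assuming an even spread."""
    return total_rows / DISTRIBUTIONS


def full_rowgroups_per_distribution(total_rows: int) -> int:
    """How many completely full row groups each distribution can form."""
    return int(rows_per_distribution(total_rows) // MAX_ROWS_PER_ROWGROUP)


if __name__ == "__main__":
    total = 100_000_000
    print(round(rows_per_distribution(total)))     # about 1.67 million rows
    print(full_rowgroups_per_distribution(total))  # one full row group each
```

So a 100-million-row fact table gives each distribution only about one full row group, which is why the table is on the small side for Synapse.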
You can rename a Synapse database easily using the SSMS GUI. (I've just tried this on v18.8).
Just click once on the database name in the Object Explorer to select it, then press the F2 key to rename it.
The Synapse service must be running (i.e. not paused) for the rename to work.
You can rename a Synapse database using T-SQL. The command is as follows:
ALTER DATABASE [OldSynapseDBName]
MODIFY NAME = [NewSynapseDBName]
Note that you need to be connected to, and issue the command from, the master database; otherwise it will not work.
The command can take 30 seconds on a 100 GB DB, and there are some caveats, such as that the DB must not be in use during the operation.
My requirements are as below:
Move 3 local SAP databases to 3 Azure SQL DBs.
Then sync daily transactions or data to Azure every night. If a transaction from the local DB already exists in Azure, it should be updated; if not, it should be inserted.
The local systems will not stop after the move to Azure. They will keep running for about 6 months.
Note:
We can't use Azure Data Sync because of its limitations: it supports only 500 tables, can't sync tables without primary keys, and doesn't sync views or procedures. It also increases the database size on both sides (local and Azure).
An Azure Data Factory pipeline can fulfill my requirements, but I would have to create a pipeline and a procedure manually for each table. (SAP has over 2000 tables, so that's not good for me.)
We don't use Azure VMs or Managed Instance.
Can you guide me to the best solution to move and sync? I am new to Azure.
Thanks all.
Since you mentioned that ADF basically meets your needs, I will start from ADF. Actually, you don't need to create a pipeline for each table by hand. The creation can be done with the ADF SDK, a PowerShell script or the REST API. Please refer to the official documentation: https://learn.microsoft.com/en-us/azure/data-factory/
So, if you can get the list of SAP table names (I found this thread: https://answers.sap.com/questions/7375575/how-to-get-all-the-table-names.html), you can loop over the list and run the code to create the pipelines in a batch. Only the table name property needs to be set.
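As a rough illustration of that loop, the sketch below only builds one pipeline definition per table as a plain dictionary; it does not call the ADF SDK or REST API. The dataset names `SourceTableDataset` and `SinkTableDataset`, the `tableName` dataset parameter, and the source and sink type strings are assumptions to adapt to your own factory and connectors:

```python
import json
from typing import Dict, List

# Hypothetical parameterized datasets that already exist in the factory.
SOURCE_DATASET = "SourceTableDataset"
SINK_DATASET = "SinkTableDataset"


def build_copy_pipeline(
    table_name: str,
    source_type: str = "SqlServerSource",  # assumption: adjust to your connector
    sink_type: str = "AzureSqlSink",       # assumption: adjust to your connector
) -> Dict:
    """Build one copy-pipeline definition; only the table name varies."""
    def dataset_ref(name: str) -> Dict:
        return {
            "referenceName": name,
            "type": "DatasetReference",
            "parameters": {"tableName": table_name},
        }

    return {
        "name": f"Copy_{table_name}",
        "properties": {
            "activities": [
                {
                    "name": f"Copy_{table_name}",
                    "type": "Copy",
                    "inputs": [dataset_ref(SOURCE_DATASET)],
                    "outputs": [dataset_ref(SINK_DATASET)],
                    "typeProperties": {
                        "source": {"type": source_type},
                        "sink": {"type": sink_type},
                    },
                }
            ]
        },
    }


def build_all(table_names: List[str]) -> List[Dict]:
    """One definition per table, ready to be submitted by your tool of choice."""
    return [build_copy_pipeline(name) for name in table_names]


if __name__ == "__main__":
    for pipeline in build_all(["MARA", "VBAK"]):
        print(json.dumps(pipeline)[:60])
```

Each dictionary can then be handed to whichever creation path you prefer (SDK, PowerShell or REST).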
I have an Azure Cosmos DB account, and what I want to do is back up data that is one month old from Azure Cosmos DB to Azure Blob storage using my Node app. I have already created a pipeline and triggered it using the create run pipeline API for Node.js (using Azure Data Factory). But I am not able to figure out how to make the pipeline select only data that is one month old relative to the current date. Any suggestions for that?
EDIT: Actually, I want to run the API daily so that it backs up data that is one month old. For example, let's say I get 100 entries today in my Cosmos DB; the pipeline should select data from current date - 30 days and back it up, so that at any point my Azure Cosmos DB has data for the most recent 30 days only and the rest is backed up to Azure Blob.
Just a supplement to @David's answer here. If you mean the Cosmos DB SQL API, it has an automatic backup mechanism, as described in this link: Automatic and online backups.
With Azure Cosmos DB, not only your data, but also the backups of your
data are highly redundant and resilient to regional disasters. The
automated backups are currently taken every four hours and at any
point of time, the latest two backups are stored. If you have
accidentally deleted or corrupted your data, you should contact Azure
support within eight hours so that the Azure Cosmos DB team can help
you restore the data from the backups.
However, you cannot access this backup directly. Azure Cosmos DB will use this backup only if a backup restore is initiated.
But the document provides two options to manage your own backups.
1. Use Azure Data Factory to move data periodically to a storage of your choice.
2. Use Azure Cosmos DB change feed to read data periodically for full backups, as well as for incremental changes, and store it in your own storage.
You could trigger the copy activity in ADF on a schedule to transfer the data. If you want to filter data by date, have a look at _ts in Cosmos DB, which holds the last modified time of a document as a Unix timestamp in seconds.
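A small sketch of the date filter, assuming "one month old" means last modified more than 30 days before the run. It computes the cutoff as epoch seconds (the unit _ts uses) and builds a parameterized query in the query-plus-parameters shape that the Cosmos DB SDKs accept. The function names are hypothetical, and in an ADF copy activity you would place the computed number into the source query instead:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def cutoff_ts(now: datetime, days: int = 30) -> int:
    """Epoch seconds for `days` before `now`; _ts is stored in the same unit."""
    return int((now - timedelta(days=days)).timestamp())


def build_backup_query(now: datetime, days: int = 30) -> Dict:
    """Select documents last modified before the cutoff."""
    return {
        "query": "SELECT * FROM c WHERE c._ts < @cutoff",
        "parameters": [{"name": "@cutoff", "value": cutoff_ts(now, days)}],
    }


if __name__ == "__main__":
    # Use an explicit UTC time so the cutoff does not depend on local time.
    print(build_backup_query(datetime.now(timezone.utc)))
```

Because _ts changes on every update, a document that keeps being modified will not be selected until it has been left alone for 30 days.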
Not sure what pipeline you're referring to. That said: Cosmos DB doesn't have any built-in backup tools. You'd need to select and copy this data programmatically.
If using the MongoDB API, you could pass a query parameter to the mongoexport command-line tool (to serve as your date filter), but you'd still need to run mongoexport from your VM, write to a local directory, then copy to blob storage (I don't know if you can install/run MongoDB tools in something like Azure Functions or a DevOps pipeline).
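For the mongoexport route, the date filter goes into the --query option as an Extended JSON document. The sketch below only assembles the argument list and does not run the tool. The field name `createdAt`, the connection string, and the file name are hypothetical placeholders for your own schema and environment:

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List


def build_mongoexport_args(
    uri: str,
    collection: str,
    now: datetime,
    out_path: str,
    date_field: str = "createdAt",  # hypothetical field holding the document date
    days: int = 30,
) -> List[str]:
    """Build a mongoexport command that exports documents older than `days`."""
    cutoff = (now - timedelta(days=days)).astimezone(timezone.utc)
    query = {date_field: {"$lt": {"$date": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")}}}
    return [
        "mongoexport",
        "--uri", uri,
        "--collection", collection,
        "--query", json.dumps(query),
        "--out", out_path,
    ]


if __name__ == "__main__":
    args = build_mongoexport_args(
        "mongodb://localhost:27017/mydb",  # placeholder connection string
        "orders",
        datetime.now(timezone.utc),
        "orders-backup.json",
    )
    print(args)
```

The resulting list can be passed to subprocess.run on the machine where the MongoDB tools are installed, after which the output file still has to be copied to blob storage.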
I have an ADF copy activity that copies rows of data from Azure SQL to Azure Cosmos DB.
I need to manipulate the generated document. I wrote the logic for this inside a pre-create database trigger that should be executed whenever a new document is created.
The trigger is not getting executed.
I was not able to understand what the problem is, and I couldn't find any documentation either. The Cosmos DB client APIs for creating a document need the trigger to be specified explicitly. I'm not sure whether something similar can be done for the ADF copy activity as well. Please help.
I am trying to avoid writing a custom activity (so as to leverage built-in scaling and error handling capabilities).
This seems similar to Azure cosmos db trigger, but the answers are not applicable to this question.