Data Migration from AWS RDS to Azure SQL Data Warehouse - azure

I have my application's database running in AWS RDS (postgresql). I need to migrate the data from AWS to Azure SQL Data Warehouse.
This is a kind of ETL process and I need to do some calculations/computations/aggregations on the Data from Postgresql and put it in a different schema in Azure SQL Data Warehouse for reporting purpose.
Also, I need to sync the data on a regular basis without duplication.
I am new to this Data Migration concept and kindly let me know what are the best possible ways to achieve this task?
Thanks!!!

Azure datafactory is the option for you. It is a cloud data integration service, to compose data storage, movement, and processing services into automated data pipelines.
Please find the Postgresql connector below.
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-onprem-postgresql-connector
On the transform part you may have to put in some custom intermediate steps to do the data massaging.

Have you tried the Azure datafactory suggestion?
Did it solve your issue?
If not, you can try using Alooma. This solution can replicate PostgreSQL database hosted on Amazon RDS to Azure SQL data warehouse in near real time. (https://www.alooma.com/integrations/postgresql/)
Follow this steps to migrate from RDS to Azure SQL:
Verify your host configuration
On the RDS dashboard under Parameter Groups, navigate to the group that's associated with your instance.
Verify that hot_standby and hot_standby_feedback are set to 1.
Verify that max_standby_archive_delay and max_standby_streaming_delay are greater than 0 (we recommend 30000).
If any of the parameter values need to be changed, click Edit Parameters.
Connect to Alooma
You can connect via SSH server (https://support.alooma.com/hc/en-us/articles/214021869-Connecting-to-an-input-via-SSH) or to to whitelist access to Alooma's IP addresses.
52.35.19.31/32
52.88.52.130/32
52.26.47.1/32
52.24.172.83/32
Add and name your PostreSQL input from the Plumbing screen and enter the following details:
Hostname or IP address of the PostgreSQL server (default port is 5432)
User name and Password
Database name
Choose the replication method you'd like to use for PostgreSQL database replication
For full dump/load replication, provide:
A space- or comma-separated list of the names of the tables you want to replicate.
The frequency at which you'd like to replicate your tables. The more frequent, the more fresh your data will be, but the more load it puts on your PostgreSQL database.
For incremental dump/load replication, provide:
A table/update indicator column pairs for each table you want to replicate.
Don't have an update indicator column? Let us know! We can still make incremental load work for you.
Keep the mapping mode to the default of OneClick if you'd like Alooma to automatically map all PostgreSQL tables exactly to your target data warehouse. Otherwise, they'll have to be mapped manually from the Mapper screen.

Related

Database migration from AWS Aurora Serverless v1

We have a plan to create a new application, which won't have the same architecture of the current one, and will definitely have some differences in the tables' schemas. Our current application's database is running on a AWS Aurora Serverless v1 cluster, with MySQL 5.7. We need to migrate data from the old application to the new one, making some schema changes in some places.
One solution is to create dumps of the old tables, in an easily readable format like csv, then import the data in the new application, picking the data as needed. This will require writing bespoke code on both applications to export data from one and import it into the other.
Considering that we are in AWS, and we intend to keep using it, choosing Aurora as a database (we'll have to decide whether use a provided solution or go with Serverless v2), is there any service to export data from one Aurora Serverless v1 cluster to another Aurora cluster, making changes along the way?
I know that database snapshots exist, but that's not exacly our goal. We need to migrate data from one database to another, not creating another copy of the first one. I've seen that there is a service, called "AWS Database Migration Service", that's used to migrate data into RDS, but does this service suit our needs, to get data from one Aurora Serverless database, do some changes, and then put it into another RDS database of our choice?
You could use following options to accomplish your goal.
1.- AWS Database Migration Service, this service provide you capabilities to replicate your data and also perform some schema transformation during replication process. You can create selection and transformation rules, the service will execute those rules and apply changes to the target schemas. You could review more details on this link.
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations.html
2.- AWS Glue Studio, it is ETL service that allows you extract, convert and transfer data between source and target databases.
What is AWS Glue Studio?
Automate ETL jobs between Amazon RDS for SQL Server and Azure Managed SQL using AWS Glue Studio

How to access a Redshift DB through VPN to extract data and load into own Azure environment?

pretty new to the Azure environment and so far my search for information wasnt very successful.
Problem is as follows:
we wanna access a redshift DB which you can only connect to if you are conntected to a specific VPN beforehand - this is the main problem
we then wanna build an automated data pipeline which extracts daily updated data from the redshift db and create our own analytics solution from it
how can that be set up in a fully automated workflow and also in the simplest, most efficient way with the tools available on the azure platform?
thanks for the help.
If VPN is not the challenge and you just need to extract the data from Redshift DB and store it in any Azure Service like Blob Storage or Azure Synapse Analytics, then best possible way is to use Azure Data Factory. Azure Data Factory is a fully managed, serverless data integration service.
You can copy data using Copy activity from Amazon Redshift to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.
Specifically, this Amazon Redshift connector supports retrieving data from Redshift using query or built-in Redshift UNLOAD support.
Note: When copying data to an Azure data store, see Azure Data Center IP Ranges for the Compute IP address and SQL ranges used by the Azure data centers.
In case you need to import data into Azure SQL database from AWS Redshift, follow the link.

Is it possible to rename Azure Synapse SQL Pool?

I am using an Azure SQL Database for our team's reporting and the data size right now is too big to handle by a single data (at least I think so, it has 2 fact tables with around 100m rows in each table).
The Azure SQL Database is named "operation-db" and the Synapse is named "operation-synapse".
I want to make the transition for my team become as smooth as possible. So I'm planning to copy all the tables, views, stored procedure and user-defined function over to Synapse.
Once I'm done with that, is there a way to rename "operation-synapse" to "operation-db" so the team doesn't have to go to their code base to change the name of the db?
Thanks!
It is not possible to rename a SQL Pool via SQL Server Management Studio and you will receive the following error:
ALTER DATABASE NAME statement is not supported in a Synapse workspace.
To update the name of a SQL pool, use the Azure Synapse Portal or the
Synapse REST API. (Microsoft SQL Server, Error: 49978)
The REST API however does list a move method to change names:
POST https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Synapse/workspaces/{workspaceName}/sqlPools/{sqlPoolName}/move?api-version=2019-06-01-preview
I couldn't get it to work though. YMMV. Not renaming your db shouldn't be a big deal though. Your team should feel comfortable with changing connection strings etc and it will help them understand they are moving to a different product (Synapse) with different characteristics.
Before you move to Synapse however, have you look at Clustered Columnstore indexes in Azure SQL DB? They are default type of index in a SQL Pool database but are also available in SQL DB. They can compress your data 5-10x so it might end up not that big at all. Columnstore is great for aggregate queries but less so for point lookups so have a think about your workload before you migrate.
100 million rows is not big enough for synapse. Cci data in each shard will only have 1 row group (1mil rows).
Consider using partitioning or CCI in your sql db itself.
Also what's your usage pattern? If you are doing point lookups and updates clustered indexes will perform better.
You can rename a Synapse database easily using the SSMS GUI. (I've just tried this on v18.8).
Just click once on the database name in the Object Explorer to select it, then press the F2 key to rename it.
The Synapse service must be running (i.e. not paused) for the rename to work.
You can rename Synapse database using T-SQL. The command is as follows:
ALTER DATABASE [OldSynapseDBName]
MODIFY NAME = [NewSynapseDBName]
Note you need to be connected to/issue the command from the master database otherwise it will not work.
The command takes can 30 seconds on 100GB DB and there are some caveats such as DB must not be used during operation.

Apply local DB changes to Azure SQL Database

I have a backup file that came from Server A and I copied that .bak files into my local and setup that DB into my Sql Server Management Studio. Now After setting it up I deployed it in Azure Sql Database. But now there were change in the Data in Server A because it's still being used, so I need to get all those changes to the Azure SQL Database that I just deployed. How am I going to do that?
Note: I'm using Azure for my server and I have a local copy of Server A database. So basically in terms of data and structure my local and the previous Server A db is the same. But after a few days Server A data is now updated and my local DB is still the same as when I just backup the db in Server A.
How can I update the DB in Azure to take all the changes in Server A and deploy it in Azure?
You've got a few choices. It's just about migrating data. It's also a question of which data you're going to migrate. Let's say it's a neat, complete replacement. Then, I'd suggest looking at the bacpac mechanism. That's a way to export a database, it's structure and data, then import it into a new location. This is one mechanism of moving to Azure.
If you can't simply replace everything, you need to look at other options. First, there's SSIS. You can build a pipeline to move the data you need. There's also export and import through sqlcmd, which can connect to Azure SQL Database. You can also look to a third party tool like Redgate SQL Data Compare as a way to pick and choose the data that gets moved. There are a whole bunch of other possible Extract/Transform/Load (ETL) tools out there that can help.
Do you want to sync schema changes as well as Data change or just Data? If it is just Data then the best service to be used would be Azure Data Migration Service, where this service can help you copy the delta with respect to Data to Azure incrementally, both is online and offline manner and you can also decide on the schedule.

How can i change sql azure server location

I would like to transfer my existing SQL Azure location to other one, but I think there is no functionality right now to do so on the management portal of Azure.
I just googled it and found one link http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/e6c961cc-5eea-4f07-82c9-a8805d367b05 that says I need to use the data sync option in Azure's portal but I don't have that feature enabled in my Azure portal.
Also if I do use that option, is there any charge for it? Finally, are there any other option that is possible for moving the SQL Azure location?
To Move an Existing SQL Server Database to a New Region on Azure Assuming There Are No Blob Containers Associated With the Database. For further reference see:
https://azure.microsoft.com/en-us/blog/migrating-azure-services-to-new-regions/
Upgrade the database, if necessary, to one of the Premium pricing tiers
Add geo-replication to the existing database. You can choose what region to have the backup of the existing database. Create a new Database server in the target region of your choice. I suggest provisioning that new database server with the same admin username and password as the existing sql database. When creating the secondary database, I suggest making the Secondary type “Readable” as it will allow you the ability to check that all data and schemas were replicated correctly.
Allow the two databases time to sync. Rule of thumb according from Microsoft AzureCAT is: 3 * (5 minutes + database size / 150 MB/minute)
Configure the Firewall settings of the secondary database to allow the necessary IP addresses to access the database
Temporarily shut down whatever users or applications are accessing the existing database.
From the Azure portal select the existing database and change its geo-replication role from primary to secondary.
Run any ddl scripts that rely on the masterdb such as ddl scripts to recreate users and user profiles
Change the connection strings of any applications to point to the new database.
Users and applications can now connect to the new Database
At your discretion you can remove the old database as a backup and add any new regions as backup.
In terms of charges there will be charges for upgrading the old database if it isn't already a premium database. There will also be charges for creating the geo-replicated database. However, those charges can be limited to a day to a few days worth of fees (depending on how long geo-replication takes). Once the new database is up and running, delete the old database as soon as possible to limit additional fees. Finally, if you upgraded the service level of the old database to a premium tier to facilitate the geo-replication, you will want to downgrade the new database to the original service level of the old database to also limit fees.
I think you can use new Import/Export bacpac feature. I have used it to move databases between accounts and can't see why it wouldn't also work between regions.
See how here
If you are able to stop writes to the DB for a time then you can use the Copy feature on the Azure Portal.
Create a new SQL Server in the region of your choosing.
Add your service(s) IP addresses to the new SQL Server firewall.
Stop writes to the origin database.
Open the origin database in the Azure Portal and click Copy at the top of the blade.
Choose your new SQL server located in the destination region.
Wait for the copy to complete.
Update your service(s) to point to the destination DB.
Enable DB writes.
Verify everything is working.
Delete origin database (and server if it was the only DB on the server).
I wouldn't use DataSynch because it creates many objects in your database to perform synchronization (it's an invasive solution). You can indeed try the Import/Export feature; that should work fine. You can also download a trial version of the Enzo backup tool, which comes with a 30-day free trial: http://www.bluesyntax.net/backup.aspx. [disclaimer: I am the author of this tool]
Regarding the pricing question, you may be charged for data being extracted out of the database. Moving data "in" SQL Azure is free of charge for now. If you are transferring the data to a different data center, you will be charged for extracting the data. It's 15 cents per GB in the US and Europe, and 20 cents in Asia. Here are the pricing details: http://www.microsoft.com/windowsazure/pricing/
Keep in mind that a database that requires 4GB of storage doesn't mean you have 4GB of data. Sometimes indexes can take a lot of space. To estimate the size of the data you will need to transfer you can either drop your indexes (and wait a little for the database size to shrink; the database size should be roughly equal to your data transfer needs) or you can calculate the size of your tables by running a command. Here is a link to an article that shows how to do something similar (look at the second command with is a SELECT statement; just run it for all the tables): http://www.sqldocumentor.com/table-size-in-sql-server-find-rows-and-disk-space-usage
Azure has released a new tool called Azure Resource Mover.
Resource mover can for now handle these resources:
Azure VMs and associated disks
NICs
Availability sets
Azure virtual networks
Public IP addresses
Network security groups (NSGs)
Internal and public load balancers
Azure SQL databases and elastic pools
https://learn.microsoft.com/en-us/azure/resource-mover/move-region-within-resource-group
Azure SQL Server is not supported yet but Azure has a complete guide for this anyway:
https://learn.microsoft.com/en-us/azure/resource-mover/tutorial-move-region-sql#move-the-sql-server

Resources