I'm trying to get data from a different database in databricks that's not the default one. However, I can't seem to find details about how to go about it.
The docs here only mention that it uses the default db in databricks, however my data is not in there.
Can anyone point me to some resources on how to query a different database in Databricks?
Thanks
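For reference, a non-default database in Databricks can usually be reached from Spark SQL either by fully qualifying the table name with the database name or by switching the session's current database. A minimal sketch (the database and table names below are placeholders, and `spark` is the session object a Databricks notebook already provides):

```python
# Option 1: fully qualify the table with the database name (placeholder names)
df = spark.sql("SELECT * FROM my_other_db.my_table")

# Option 2: switch the session's current database, then query normally
spark.sql("USE my_other_db")
df = spark.table("my_table")

# List the databases the cluster can see, to confirm the name
spark.sql("SHOW DATABASES").show()
```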
Related
We have an Azure HDInsight cluster setup that runs Presto and Superset app connecting to it. We recently onboarded a new storage account to the cluster by updating core-site.xml, which allows us to create an external table from the Hive View.
We are able to query the external table from the new storage account in the Hive View without issue.
In the Superset app, we are able to locate the external table and see the table schema without issue.
But when trying to query the external table through the Superset app via Presto, it fails with: presto error: Configuration property storageaccount.dfs.core.windows.net not found
Anyone know what is missing from our setup? Any advice is appreciated.
Screenshots attached: the core-site.xml setting, a successful query of the external table in the Hive View, and Presto failing to query the same table.
Problem resolved. We simply needed to restart Presto.
What are the best ways to Back up and restore Azure SQL Database schema in Azure cloud?
I have tried creating bacpac files, but the problem is that a bacpac gets imported as a new database. I want to back up and restore a specific schema only, within the same database.
Another way I am looking at is generating a SQL script file that contains both the schema and the data using SSMS, but the resulting script file is huge.
Any help is greatly appreciated
We can use the bcp utility to export and import the data in a fast way.
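For example, a rough sketch of driving bcp from Python to export the tables of one schema and load them back (server, credentials, and table names are placeholders; bcp must be installed, and the target tables must already exist with the same structure):

```python
import subprocess

SERVER = "myserver.database.windows.net"        # placeholder
DATABASE = "MyDatabase"                         # placeholder
USER = "myuser"                                 # placeholder
PASSWORD = "mypassword"                         # placeholder
TABLES = ["Sales.Orders", "Sales.Customers"]    # placeholder tables in the schema

def bcp_out(table, data_file):
    # Export the table's data to a native-format file
    subprocess.run(["bcp", f"{DATABASE}.{table}", "out", data_file,
                    "-S", SERVER, "-U", USER, "-P", PASSWORD, "-n"], check=True)

def bcp_in(table, data_file):
    # Load the file back into a table with the same structure
    subprocess.run(["bcp", f"{DATABASE}.{table}", "in", data_file,
                    "-S", SERVER, "-U", USER, "-P", PASSWORD, "-n"], check=True)

for t in TABLES:
    bcp_out(t, t.replace(".", "_") + ".dat")
```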
"I want to back up and restore specific schema only within the same database."
There is no native tool for Azure SQL Database that can back up and restore only a specific schema.
The closest thing to the requirements is a bacpac; however, it can only restore data into an empty or new database.
Therefore, a possible option is to move the data out and then back in using ETL tools such as the following (a Databricks-based sketch follows the list):
SSIS
ADF
Databricks
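For the Databricks option above, a minimal PySpark sketch of reading a table from one schema over JDBC and writing it into another schema of the same Azure SQL database (connection details, schema, and table names are placeholders; `spark` is the session a Databricks notebook provides):

```python
# Placeholder connection details
jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
            "database=MyDatabase;encrypt=true;loginTimeout=30")
props = {"user": "myuser", "password": "mypassword",
         "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"}

# Read a table from the source schema
df = spark.read.jdbc(url=jdbc_url, table="Sales.Orders", properties=props)

# Write it into a different schema; "overwrite" recreates the target table
df.write.jdbc(url=jdbc_url, table="Backup.Orders", mode="overwrite", properties=props)
```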
I have an Oracle DB with data that I need to load and transform into an Azure SQL Database. I have no control over either the DB nor the application that updates its data.
I'm looking at Azure Data Factory, but I really need data changes in Oracle to be reflected as near to real-time as possible.
I would appreciate any suggestions / insights.
Is ADF the correct tool for the job? If so, what is a good approach to use? If not suitable, what should I consider using instead?
For real-time you don't really want an ELT/ETL tool like ADF. Consider a replication agent like Attunity or (gulp at the licensing costs) GoldenGate.
I don't think Data Factory is a good fit for you. Yes, you can copy data from Oracle to an Azure SQL database with it, but as #Thiago Custodio said, you need to do it for each table you have. That's too complicated.
For reference: Copy data from and to Oracle by using Azure Data Factory.
As you said, you really need data changes in Oracle to be reflected as near to real-time as possible.
The migration/copy time would have to be very short so that the data in Oracle and in the Azure SQL database stays the same until the Oracle data changes again. I searched a lot and didn't find any real-time copy tools. Actually, I think what you want is something more like 'data sync'.
I found this link, Sync Oracle Database with SQL Azure; hope it gives you some good ideas.
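If a hand-rolled sync on a short schedule were acceptable, one common pattern is a watermark-based incremental copy. A rough sketch using the oracledb and pyodbc packages (all table, column, and connection details here are made up for illustration; it assumes the Oracle table has a LAST_MODIFIED column and a watermark table exists on the Azure SQL side):

```python
import oracledb   # pip install oracledb
import pyodbc     # pip install pyodbc

# Placeholder connection details
ora = oracledb.connect(user="src_user", password="src_pwd", dsn="orahost/orclpdb1")
sql = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=myserver.database.windows.net;DATABASE=MyDb;"
                     "UID=myuser;PWD=mypassword")

# Read the last high-water mark stored on the Azure SQL side (assumed etl.watermark table)
sql_cur = sql.cursor()
sql_cur.execute("SELECT last_modified FROM etl.watermark WHERE table_name = 'ORDERS'")
watermark = sql_cur.fetchone()[0]

# Pull only the rows changed since the last run
ora_cur = ora.cursor()
ora_cur.execute("SELECT order_id, amount, last_modified FROM orders "
                "WHERE last_modified > :wm", wm=watermark)
rows = ora_cur.fetchall()

# Upsert-style load: delete matching keys, then insert the fresh rows
for order_id, amount, last_modified in rows:
    sql_cur.execute("DELETE FROM dbo.orders WHERE order_id = ?", order_id)
    sql_cur.execute("INSERT INTO dbo.orders (order_id, amount, last_modified) "
                    "VALUES (?, ?, ?)", order_id, amount, last_modified)

# Advance the watermark and commit
if rows:
    sql_cur.execute("UPDATE etl.watermark SET last_modified = ? WHERE table_name = 'ORDERS'",
                    max(r[2] for r in rows))
sql.commit()
```

Scheduled every minute or so this gives 'near' real time, but it is still polling rather than true change data capture.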
For the data migration or copy, you can use the following approaches:
SQL Server Migration Assistant for Oracle (OracleToSQL)
Azure Database Migration Service (DMS)
Reference tutorial:
Migrating Oracle Databases to SQL Server (OracleToSQL): SQL Server Migration Assistant (SSMA) for Oracle is a comprehensive environment that helps you quickly migrate Oracle databases to Azure SQL database.
How to migrate Oracle to Azure SQL Database with minimum downtime:
Hope this helps.
For the record, we went with a product named Qlik Replicate (aka Attunity) and it is working very well!
I am trying to find a way to use the data logged by Application Insights in Azure Search. So far I am able to export data from Application Insights to Blob storage, but when I try to fetch it from Azure Search I get data about the file rather than the real data stored in the file. Also, a new file is created in blob storage every 0.5-1 minute.
Could you please help me find a way to use the Application Insights data in Azure Search?
I don't want to use a SQL database as the data source in Azure Search, because there might be performance issues if I schedule it to pull data from the SQL database every hour, and the data would not stay in sync.
Please suggest.
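For reference, when the blob indexer returns only file metadata, it is often because no JSON parsing mode is set on the indexer. A rough sketch of creating a blob data source and an indexer with parsingMode set to jsonLines through the Azure Search REST API (service name, keys, container, and index name are all placeholders):

```python
import requests

SERVICE = "my-search-service"     # placeholder
API_KEY = "<admin-api-key>"       # placeholder
API_VERSION = "2020-06-30"
BASE = f"https://{SERVICE}.search.windows.net"
HEADERS = {"Content-Type": "application/json", "api-key": API_KEY}

# Data source pointing at the container Application Insights exports into
datasource = {
    "name": "appinsights-blobs",
    "type": "azureblob",
    "credentials": {"connectionString": "<storage-connection-string>"},  # placeholder
    "container": {"name": "appinsights-export"},                         # placeholder
}
requests.put(f"{BASE}/datasources/appinsights-blobs?api-version={API_VERSION}",
             headers=HEADERS, json=datasource).raise_for_status()

# Indexer that parses each line of the exported blobs as a JSON document,
# instead of indexing only the blob metadata
indexer = {
    "name": "appinsights-indexer",
    "dataSourceName": "appinsights-blobs",
    "targetIndexName": "appinsights-index",    # placeholder, index must already exist
    "parameters": {"configuration": {"parsingMode": "jsonLines"}},
    "schedule": {"interval": "PT5M"},          # re-run periodically to pick up new blobs
}
requests.put(f"{BASE}/indexers/appinsights-indexer?api-version={API_VERSION}",
             headers=HEADERS, json=indexer).raise_for_status()
```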
I have my application's database running in AWS RDS (postgresql). I need to migrate the data from AWS to Azure SQL Data Warehouse.
This is a kind of ETL process, and I need to do some calculations/computations/aggregations on the data from PostgreSQL and put it into a different schema in Azure SQL Data Warehouse for reporting purposes.
Also, I need to sync the data on a regular basis without duplication.
I am new to this data migration concept, so kindly let me know the best possible ways to achieve this task.
Thanks!!!
Azure Data Factory is the option for you. It is a cloud data integration service that composes data storage, movement, and processing services into automated data pipelines.
Please find the Postgresql connector below.
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-onprem-postgresql-connector
For the transform part, you may have to put in some custom intermediate steps to do the data massaging.
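As an illustration of such an intermediate step, here is a rough sketch that reads from the RDS PostgreSQL instance, computes an aggregation, and reloads a reporting table in Azure SQL Data Warehouse (all connection details, table names, and columns are placeholders):

```python
import psycopg2   # pip install psycopg2-binary
import pyodbc     # pip install pyodbc

# Placeholder connection details
pg = psycopg2.connect(host="mydb.xxxx.us-east-1.rds.amazonaws.com",
                      dbname="appdb", user="pguser", password="pgpassword")
dw = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                    "SERVER=mydw.database.windows.net;DATABASE=mydw;"
                    "UID=dwuser;PWD=dwpassword")

# Aggregate in PostgreSQL so only the summarized rows cross the wire
pg_cur = pg.cursor()
pg_cur.execute("""
    SELECT customer_id, date_trunc('day', order_ts)::date AS order_date,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id, date_trunc('day', order_ts)::date
""")
rows = pg_cur.fetchall()

# Full refresh of the reporting table, which also avoids duplicates on re-runs
dw_cur = dw.cursor()
dw_cur.execute("TRUNCATE TABLE reporting.daily_customer_totals")
dw_cur.executemany("INSERT INTO reporting.daily_customer_totals "
                   "(customer_id, order_date, total_amount) VALUES (?, ?, ?)", rows)
dw.commit()
```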
Have you tried the Azure Data Factory suggestion?
Did it solve your issue?
If not, you can try using Alooma. This solution can replicate a PostgreSQL database hosted on Amazon RDS to Azure SQL Data Warehouse in near real time. (https://www.alooma.com/integrations/postgresql/)
Follow these steps to migrate from RDS to Azure SQL:
Verify your host configuration
On the RDS dashboard under Parameter Groups, navigate to the group that's associated with your instance.
Verify that hot_standby and hot_standby_feedback are set to 1.
Verify that max_standby_archive_delay and max_standby_streaming_delay are greater than 0 (we recommend 30000).
If any of the parameter values need to be changed, click Edit Parameters. (A quick way to check the values the instance is actually running with is sketched below.)
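For that check, a small sketch using psycopg2 that reads the current values straight from pg_settings (connection details are placeholders; note that pg_settings reports booleans as on/off rather than 1/0):

```python
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(host="mydb.xxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
                        dbname="appdb", user="pguser", password="pgpassword")
cur = conn.cursor()
cur.execute("""
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('hot_standby', 'hot_standby_feedback',
                   'max_standby_archive_delay', 'max_standby_streaming_delay')
""")
for name, setting in cur.fetchall():
    print(name, "=", setting)
```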
Connect to Alooma
You can connect via an SSH server (https://support.alooma.com/hc/en-us/articles/214021869-Connecting-to-an-input-via-SSH) or whitelist access to Alooma's IP addresses:
52.35.19.31/32
52.88.52.130/32
52.26.47.1/32
52.24.172.83/32
Add and name your PostgreSQL input from the Plumbing screen and enter the following details:
Hostname or IP address of the PostgreSQL server (default port is 5432)
User name and Password
Database name
Choose the replication method you'd like to use for PostgreSQL database replication
For full dump/load replication, provide:
A space- or comma-separated list of the names of the tables you want to replicate.
The frequency at which you'd like to replicate your tables. The more frequently you replicate, the fresher your data will be, but the more load it puts on your PostgreSQL database.
For incremental dump/load replication, provide:
Table/update-indicator-column pairs for each table you want to replicate.
Don't have an update indicator column? Let us know! We can still make incremental load work for you.
Keep the mapping mode to the default of OneClick if you'd like Alooma to automatically map all PostgreSQL tables exactly to your target data warehouse. Otherwise, they'll have to be mapped manually from the Mapper screen.