I'm new to PySpark, so could you please suggest how to connect to Azure SQL DW from PySpark using a Jupyter notebook? I'm not using HDInsight or Databricks.
I have set up PySpark and the Jupyter notebook using this link.
First, please make sure you have downloaded the Microsoft JDBC Driver for SQL Server from here (Download Microsoft JDBC Driver for SQL Server) and added it to your Spark jars library path.
Second, it sounds like you set up PySpark and the Jupyter notebook on-premises or locally. If it is not running in the Azure cloud, you must add your client IP to your Azure SQL DW firewall rules as shown in the figure below; please refer to the section Create a server-level firewall rule of the official document Quickstart: Create and query an Azure SQL data warehouse in the Azure portal to learn more.
Next, you need to find the JDBC connection string of your Azure SQL DW, as described in the section Sample JDBC connection string of the document Connection strings for Azure SQL Data Warehouse; you can copy it from the Overview or SQL databases tabs in the Azure portal.
Then, you can refer to the blog PySpark connection with MS SQL Server to connect to Azure SQL DW via PySpark in your Jupyter notebook.
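For reference, here is a minimal sketch of what that connection can look like; the driver jar path, server name, database, table, and credentials below are placeholders you would replace with your own values from the portal:

from pyspark.sql import SparkSession

# Placeholder jar path: point this at the Microsoft JDBC driver you downloaded.
spark = (SparkSession.builder
         .appName("azure-sqldw-jupyter")
         .config("spark.jars", "/path/to/mssql-jdbc-8.4.1.jre8.jar")
         .getOrCreate())

# Placeholder connection string copied from the Azure portal.
jdbc_url = ("jdbc:sqlserver://<your-server>.database.windows.net:1433;"
            "database=<your-dw>;encrypt=true;trustServerCertificate=false;"
            "hostNameInCertificate=*.database.windows.net;loginTimeout=30;")

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.YourTable")          # table or subquery to read
      .option("user", "<your-user>@<your-server>")
      .option("password", "<your-password>")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

df.show(5)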
Hope it helps.
Related
I am thinking about using Snowflake as a data warehouse. My databases are in Azure SQL Database and I would like to know what tools I need to ETL my data from Azure SQL Database to Snowflake.
I think Snowpark could work for data transformations, but I wonder what other code tools I could use.
Also, I wonder if I should use Azure Blob Storage as the staging area, or whether Snowflake has its own.
Thanks
You can use Hevo Data, a third-party tool, to migrate data directly from Microsoft SQL Server to Snowflake.
STEPS TO BE FOLLOWED
Make a connection to your Microsoft SQL Server database.
Choose a replication mode.
Create a Snowflake Data Warehouse configuration.
Alternatively, you can use SnowSQL to connect Microsoft SQL Server to Snowflake: export the data from SQL Server using SSMS, upload it to either Azure Storage or S3, and then load the data from storage into Snowflake (see the sketch after the references below).
REFERENCES:
Microsoft SQL Server to Snowflake
How to move the data from Azure Blob Storage to Snowflake
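As a rough sketch of that second route (the account, stage URL, SAS token, table, and file format below are placeholders, not values from the question), the load step from Azure Blob Storage into Snowflake can be driven from Python like this:

import snowflake.connector

# Placeholder credentials -- replace with your Snowflake account details.
conn = snowflake.connector.connect(
    account="<your-account>",
    user="<your-user>",
    password="<your-password>",
    warehouse="<your-warehouse>",
    database="<your-database>",
    schema="PUBLIC",
)

cur = conn.cursor()
# Bulk-load CSV files exported from SQL Server and uploaded to Azure Blob Storage.
cur.execute("""
    COPY INTO my_target_table
    FROM 'azure://<your-account>.blob.core.windows.net/<container>/<path>/'
    CREDENTIALS = (AZURE_SAS_TOKEN = '<your-sas-token>')
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
""")
cur.close()
conn.close()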
I am trying to install the Apache Spark Connector for SQL Server and Azure SQL to use transactional data in big data analytics and persist results for ad hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.
The Spark SQL connector is located here: https://github.com/microsoft/sql-spark-connector
Can someone let me know how to import it in Azure Synapse Apache Spark?
As per the conversation with Synapse Product Group:
You don't need to add the Apache Spark connector jar files or the com.microsoft.sqlserver.jdbc.spark package to your Synapse Spark pool. The connector is available out of the box for Spark 2.4, and for Spark 3.1 it will most likely reach production in the upcoming weeks.
For more details, refer to the Microsoft Q&A thread which addresses a similar issue.
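To illustrate, since the connector ships with the Synapse Spark pool you can reference its format name directly from a notebook; the server, database, table, and credentials below are placeholders, and df is assumed to be an existing Spark DataFrame:

# Write a DataFrame to Azure SQL / SQL Server using the built-in connector format.
server = "<your-server>.database.windows.net"
database = "<your-database>"

(df.write
   .format("com.microsoft.sqlserver.jdbc.spark")
   .mode("overwrite")
   .option("url", f"jdbc:sqlserver://{server};databaseName={database}")
   .option("dbtable", "dbo.YourTable")
   .option("user", "<your-user>")
   .option("password", "<your-password>")
   .save())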
I want to execute Spark SQL commands from a Linux machine against a Databricks cluster. Is there any way to achieve this?
I have a set of Spark SQL commands in a .sql file and want to execute this file against a Databricks cluster from a Linux machine.
I am looking for something analogous to SQL*Plus, where we make a connection with the DB and execute SQL; in a similar way, is there any utility/solution to execute Spark SQL over a Databricks cluster?
You can connect to a Databricks cluster using the ODBC, JDBC, HTTP, or Thrift protocol. In every case you will need an access token with sufficient permissions.
I am using IntelliJ DataGrip to connect via JDBC. I had to configure the Databricks driver and used this URI:
jdbc:spark://mycompany.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<MY-DATABRICKS-ORGANIZATION-ID>/<MY-DATABRICKS-CLUSTER-ID>;AuthMech=3;UID=token;PWD=<MY-DATABRICKS-TOKEN>
I believe any modern SQL client should be able to connect as Databricks is exposing standard interfaces.
This is the official documentation from Databricks:
https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html
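As one possible sketch of the .sql-file use case (assuming the Simba Spark ODBC driver from Databricks is installed on the Linux machine and pyodbc is available; the host, HTTP path, token, and file name are placeholders):

import pyodbc

# Placeholder connection details for the Databricks cluster's ODBC endpoint.
conn_str = (
    "Driver=Simba Spark ODBC Driver;"
    "Host=mycompany.cloud.databricks.com;"
    "Port=443;"
    "HTTPPath=sql/protocolv1/o/<org-id>/<cluster-id>;"
    "SSL=1;ThriftTransport=2;"
    "AuthMech=3;UID=token;PWD=<your-databricks-token>"
)

conn = pyodbc.connect(conn_str, autocommit=True)
cursor = conn.cursor()

# Run each semicolon-separated statement from the .sql file against the cluster.
with open("queries.sql") as f:
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

for stmt in statements:
    cursor.execute(stmt)
    if cursor.description:          # the statement returned rows
        for row in cursor.fetchmany(10):
            print(row)

conn.close()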
I am trying to read data from a Databricks Delta Lake via Apache Superset. I can connect to the Delta Lake with a JDBC connection string supplied by the cluster, but Superset seems to require a SQLAlchemy string, so I'm not sure what I need to do to get this working. Thank you, anything helps.
superset database setup
Have you tried this?
https://flynn.gg/blog/databricks-sqlalchemy-dialect/
Thanks to contributions by Evan Thomas, the Python databricks-dbapi
package now supports using Databricks as a SQL dialect within
SQLAlchemy. This is particularly useful for hooking up Databricks to a
dashboard frontend application like Apache Superset. It provides
compatibility with both standard Databricks and Azure Databricks.
Just use PyHive and you should be ready to connect to the Databricks Thrift/JDBC server.
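For instance, with the databricks-dbapi package installed (it pulls in PyHive and registers a SQLAlchemy dialect), the URI can be built roughly like this; the host, token, cluster id, and exact URI shape are assumptions to verify against the package's README:

from sqlalchemy import create_engine

# Placeholder values; install databricks-dbapi[sqlalchemy] so the
# "databricks+pyhive" dialect is available to SQLAlchemy.
host = "mycompany.cloud.databricks.com"
token = "<your-databricks-token>"
cluster = "<your-cluster-id>"

# The same string goes into Superset's "SQLAlchemy URI" field.
uri = f"databricks+pyhive://token:{token}@{host}:443/default?cluster={cluster}"

engine = create_engine(uri)
with engine.connect() as conn:
    print(conn.execute("SHOW TABLES").fetchall())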
Hi, is it possible to connect an Azure Database for PostgreSQL to Power BI using DirectQuery? I can't seem to find information regarding this.
Currently these are the only data sources supported by DirectQuery:
Amazon Redshift
Azure HDInsight Spark (Beta)
Azure SQL Database
Azure SQL Data Warehouse
Google BigQuery (Beta)
IBM DB2 database
IBM Netezza (Beta)
Impala (version 2.x)
Oracle Database (version 12 and above)
SAP Business Warehouse Application Server
SAP Business Warehouse Message Server (Beta)
SAP HANA
Snowflake
Spark (Beta) (version 0.9 and above)
SQL Server
Teradata Database
Vertica (Beta)
PostgreSQL is supported, but only in import mode. So no, you can't use DirectQuery with PostgreSQL (unless you write your own custom connector). You can vote for this idea though.
I'm working on a custom connector that will allow DirectQuery from PostgreSQL through an ODBC driver. I'm working on a full write-up (this month when I get time), but until then I can share the repo here:
DirectQuery for Postgres via ODBC
This is working for us to DirectQuery our Postgres data source via an Azure-hosted Windows instance running the custom connector on an on-premises gateway 24/7.