Apache Superset connecting to Databricks Delta Lake

I am trying to read data from Databricks Delta Lake via Apache Superset. I can connect to Delta Lake with a JDBC connection string supplied by the cluster, but Superset seems to require a SQLAlchemy string, so I'm not sure what I need to do to get this working. Thank you, anything helps.
[Screenshot: Superset database setup form]

Have you tried this?
https://flynn.gg/blog/databricks-sqlalchemy-dialect/
Thanks to contributions by Evan Thomas, the Python databricks-dbapi
package now supports using Databricks as a SQL dialect within
SQLAlchemy. This is particularly useful for hooking up Databricks to a
dashboard frontend application like Apache Superset. It provides
compatibility with both standard Databricks and Azure Databricks.
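For a concrete picture, a Superset database URI built on databricks-dbapi would look roughly like the sketch below. The databricks+pyhive dialect name comes from that package; the host, token, database, and cluster values here are placeholders, and on Azure Databricks the connect argument is an HTTP path rather than a cluster name, so check the package README for the exact form.

# A minimal sketch assuming the databricks-dbapi package, which registers a
# "databricks+pyhive" SQLAlchemy dialect; all credentials are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine(
    "databricks+pyhive://token:<MY-DATABRICKS-TOKEN>"
    "@mycompany.cloud.databricks.com:443/default",
    connect_args={"cluster": "<MY-CLUSTER-NAME>"},
)

with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).fetchall())

In Superset itself you would paste the same databricks+pyhive://... string into the database connection form.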

Just use PyHive and you should be ready to connect to the Databricks Thrift/JDBC server.
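If you want to test the PyHive route outside Superset first, a rough sketch looks like the following; the host, organization ID, cluster ID, and token are hypothetical placeholders, and the URL mirrors the httpPath convention used by the JDBC driver.

# Rough sketch: PyHive over Databricks' HTTP transport; all IDs and the
# token below are placeholders.
import base64
from pyhive import hive
from thrift.transport import THttpClient

token = "<MY-DATABRICKS-TOKEN>"
url = ("https://mycompany.cloud.databricks.com:443/sql/protocolv1/o/"
       "<ORG-ID>/<CLUSTER-ID>")

transport = THttpClient.THttpClient(url)
# Databricks accepts HTTP basic auth with user "token" and the token as password.
auth = base64.standard_b64encode(f"token:{token}".encode()).decode()
transport.setCustomHeaders({"Authorization": f"Basic {auth}"})

conn = hive.connect(thrift_transport=transport)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())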

Related

How to connect to Azure Data Lake Storage using Presto in Python?

So I need to use Presto to connect to ADLS. I have read that Hive can connect to ADLS and that Presto can be used to connect to Hive, but I could not find a single article on how to connect to ADLS through Hive using Python.
Thanks in advance.
I was able to find the same kind of request from the OP in a past thread.
This answer from Sachin Sheth might help you accomplish your task.
Also adding the third-party links below:
https://stackoverflow.com/users/5781104/sachin-sheth
https://medium.com/azure-data-lake/connecting-your-own-hadoop-or-spark-to-azure-data-lake-store-93d426d6a5f4
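On the Python side specifically, PyHive also ships a Presto client, so a minimal sketch would look like the following; it assumes a Presto coordinator whose Hive catalog is already configured against ADLS (that configuration lives on the Presto/Hive side, not in this client code), and the host, catalog, and schema names are placeholders.

# Minimal sketch: querying Presto from Python via PyHive; connection
# values are hypothetical placeholders.
from pyhive import presto

conn = presto.connect(
    host="presto-coordinator.example.com",
    port=8080,
    catalog="hive",
    schema="default",
)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())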

Install sql-spark-connector library to Azure Synapse Apache Spark

I am trying to install the Apache Spark Connector for SQL Server and Azure SQL, to use transactional data in big data analytics and persist results for ad-hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.
The Spark SQL connector is located here: https://github.com/microsoft/sql-spark-connector
Can someone let me know how to import it in Azure Synapse Apache Spark?
As per the conversation with the Synapse Product Group:
You don't need to add the Apache Spark connector JAR files or any com.microsoft.sqlserver.jdbc.spark package to your Synapse Spark pool. The connector is there out of the box for Spark 2.4, and for Spark 3.1 it will most likely be in production in the upcoming weeks.
For more details, refer to the Microsoft Q&A thread, which addresses a similar issue.
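For reference, once the connector is available in the pool, reading a table from inside a Synapse notebook (where a spark session is predefined) looks roughly like this; the server, database, table, and credentials below are placeholders.

# Rough PySpark sketch using the sql-spark-connector data source; all
# connection values are hypothetical placeholders.
server_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

df = (spark.read
      .format("com.microsoft.sqlserver.jdbc.spark")
      .option("url", server_url)
      .option("dbtable", "dbo.my_table")
      .option("user", "<username>")
      .option("password", "<password>")
      .load())

df.show(5)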

How to run spark sql queries using Databricks Cluster through Linux?

I want to execute Spark SQL commands on a Databricks cluster from a Linux machine. Is there any way to achieve this?
I have a set of Spark SQL commands in a .sql file and want to execute this file against a Databricks cluster from a Linux machine.
I am looking for something analogous to SQL*Plus, where we make a connection with the DB and execute SQL; in a similar way, is there any utility/solution to execute Spark SQL over a Databricks cluster?
You can connect to a Databricks cluster using the ODBC, JDBC, HTTP, or Thrift protocols. In every case you will need an access token with sufficient permissions.
I am using IntelliJ DataGrip to connect via JDBC. I had to configure the Databricks driver and used this URI:
jdbc:spark://mycompany.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/<MY-DATABRICKS-ORGANIZATION-ID>/<MY-DATABRICKS-CLUSTER-ID>;AuthMech=3;UID=token;PWD=<MY-DATABRICKS-TOKEN>
I believe any modern SQL client should be able to connect, as Databricks exposes standard interfaces.
This is the official documentation from Databricks:
https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html
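If you want a scripted, SQL*Plus-like flow from Linux rather than a GUI client, one hedged option is pyodbc with the Simba Spark ODBC driver. The sketch below mirrors the connection attributes of the JDBC URI above; the driver name, HTTP path, and token are placeholders to adapt to your setup.

# Hedged sketch: execute the statements in a .sql file against a Databricks
# cluster over ODBC. Assumes the Simba Spark ODBC driver is installed; all
# host/path/token values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver=Simba Spark ODBC Driver;"
    "Host=mycompany.cloud.databricks.com;Port=443;"
    "HTTPPath=sql/protocolv1/o/<ORG-ID>/<CLUSTER-ID>;"
    "SSL=1;ThriftTransport=2;AuthMech=3;UID=token;PWD=<MY-DATABRICKS-TOKEN>",
    autocommit=True,
)

with open("commands.sql") as f:
    # Naive split on ";" -- good enough for a simple list of statements.
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

cursor = conn.cursor()
for stmt in statements:
    cursor.execute(stmt)
    if cursor.description:  # the statement returned rows
        print(cursor.fetchall())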

Does Azure Databricks support stream access from Azure PostgreSQL?

I have asked a similar question before, but I would like to ask whether I can use Microsoft Azure to achieve my goal:
Is streaming input from an external database (PostgreSQL) supported in Apache Spark?
I have a database deployed on Microsoft Azure PostgreSQL, with a table which I want to stream from. Using Kafka Connect, it seems that I could stream the table; however, looking at the online documentation, I could not find the database (PostgreSQL) listed as a data source.
Does Azure Databricks support stream reading a PostgreSQL table? Or is it better to use Azure HDInsight with Kafka and Spark?
I would appreciate any help.
Best Regards,
Yu Watanabe
Unfortunately, Azure Databricks does not support stream reading of an Azure PostgreSQL database.
Azure HDInsight with Kafka and Spark will be the right choice for your requirement: it gives you managed Kafka and integration with other HDInsight offerings that can be used to build a complete data platform.
Azure also offers a range of other managed services needed in a data platform, such as SQL Server, PostgreSQL, Redis, and Azure IoT Event Hub.
As per my research, I have also found a third-party tool named "Panoply" which integrates Databricks and PostgreSQL.
Hope this helps.
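To make the HDInsight route concrete: the usual pattern is to have Kafka Connect (for example, a Debezium PostgreSQL connector) push table changes into a Kafka topic, and then read that topic with Spark Structured Streaming. A hedged sketch, with placeholder broker and topic names:

# Hedged sketch: Spark Structured Streaming reading a Kafka topic fed by a
# PostgreSQL change-capture connector; broker and topic names are placeholders.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1.example.com:9092")
      .option("subscribe", "postgres.public.my_table")
      .load())

query = (df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
           .writeStream
           .format("console")
           .start())
query.awaitTermination()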

Azure ML - Import Hive Query Failing - Hive over ADLS

We are working on the Azure ML and ADLS combination. Since the HDInsight cluster works over ADLS, we are trying to use the Hive query and HDFS route and are running into problems.
We would appreciate your help in solving the problem of reading data from a Hive query and writing to HDFS. Below is the error URL for reference:
https://studioapi.azureml.net/api/sharedaccess?workspaceId=025ba20578874d7086e6c495cc49a3f2&signature=ZMUCNMwRjlrksrrmsrx5SaGedSgwMmO%2FfSHvq190%2F1I%3D&sharedAccessUri=https%3A%2F%2Fesprodussouth001.blob.core.windows.net%2Fexperimentoutput%2Fccf9a206-730d-4773-b44e-a2dd8c6e87b9%2Fccf9a206-730d-4773-b44e-a2dd8c6e87b9.txt%3Fsv%3D2015-02-21%26sr%3Db%26sig%3DHkuFm8B2Ba1kEWWIwanqlv%2FcQPWVz0XYveSsZnEa0Wg%3D%26st%3D2017-10-16T18%3A31%3A06Z%26se%3D2017-10-17T18%3A36%3A06Z%26sp%3Dr
Azure Machine Learning supports Hive but not over ADLS.
