I want to know what security protocol is used for the interaction between Databricks and BigQuery when using the Spark BigQuery connector:
https://github.com/GoogleCloudDataproc/spark-bigquery-connector
For example, the interaction between Azure Databricks and Salesforce Marketing Cloud is via SFTP (SSH File Transfer Protocol).
Related
Is it possible to connect Azure Data Factory with Azure Databricks SQL endpoints (Delta tables and views) instead of an interactive cluster? I tried the Azure Databricks Delta Lake connector, but it has options for clusters, not endpoints.
Unfortunately, you cannot connect to Azure Databricks SQL endpoints from ADF.
Note: With the Compute option, you can connect to an Azure Databricks workspace with the following cluster options:
New Job cluster
Existing interactive cluster
Existing instance pool
Note: With the Datastore option (Azure Databricks Delta Lake), you can connect only to existing interactive clusters.
We would appreciate it if you could share this feedback on our feedback channel, which is open for the user community to upvote and comment on. This allows our product teams to effectively prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.
I am trying to install the Apache Spark Connector for SQL Server and Azure SQL to use transactional data in big data analytics and persist results for ad-hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.
The SQL Spark connector is located here: https://github.com/microsoft/sql-spark-connector
Can someone let me know how to import it into Azure Synapse Apache Spark?
As per the conversation with the Synapse product group:
You don't need to add the Apache Spark connector JAR files or the com.microsoft.sqlserver.jdbc.spark package to your Synapse Spark pool. The connector is available out of the box for Spark 2.4; for Spark 3.1, it will most likely be in production in the upcoming weeks.
For more details, refer to the Microsoft Q&A thread, which addresses a similar issue.
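As an illustrative sketch only (the server, database, table, and credentials below are placeholders, not values from this thread), reading from Azure SQL with the built-in connector from a Synapse Spark 2.4 notebook would look roughly like this:

```python
from pyspark.sql import SparkSession

# Sketch: read a table from Azure SQL using the built-in
# com.microsoft.sqlserver.jdbc.spark connector on a Synapse Spark pool.
# Server, database, table, and credentials are placeholders.
spark = SparkSession.builder.getOrCreate()

df = (
    spark.read
    .format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", "jdbc:sqlserver://<your-server>.database.windows.net:1433;databaseName=<your-db>")
    .option("dbtable", "dbo.YourTable")
    .option("user", "<sql-user>")
    .option("password", "<sql-password>")
    .load()
)

df.show(5)
```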
I am trying to read data from Databricks Delta Lake via Apache Superset. I can connect to Delta Lake with a JDBC connection string supplied by the cluster, but Superset seems to require a SQLAlchemy string, so I'm not sure what I need to do to get this working. Thank you, anything helps.
[Screenshot: Superset database setup]
Have you tried this?
https://flynn.gg/blog/databricks-sqlalchemy-dialect/
Thanks to contributions by Evan Thomas, the Python databricks-dbapi package now supports using Databricks as a SQL dialect within SQLAlchemy. This is particularly useful for hooking up Databricks to a dashboard frontend application like Apache Superset. It provides compatibility with both standard Databricks and Azure Databricks.
Just use PyHive and you should be ready to connect to the Databricks Thrift JDBC server.
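For illustration, a minimal sketch of the resulting connection string, assuming the databricks+pyhive dialect registered by databricks-dbapi; the host, token, database, and cluster values are placeholders for your own workspace, not values from this thread:

```python
from sqlalchemy import create_engine

# Sketch: SQLAlchemy URI using the databricks+pyhive dialect provided by
# databricks-dbapi. Host, personal access token, database, and cluster id
# are placeholders to be replaced with your workspace's values.
uri = (
    "databricks+pyhive://token:<personal-access-token>"
    "@<workspace-host>:443/<database>?cluster=<cluster-id>"
)

engine = create_engine(uri)
with engine.connect() as conn:
    for row in conn.execute("SHOW TABLES"):
        print(row)
```

In Superset, the same URI would be pasted into the SQLAlchemy URI field of the database setup screen mentioned above.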
According to https://cloud.google.com/dataproc/docs/concepts/connectors/bigquery the connector uses BigQuery Storage API to read data using gRPC. However, I couldn't find any Storage API/gRPC usage in the source code here: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/tree/master/connector/src/main/scala
My questions are:
1. Could anyone point me to the source code that uses the Storage API and gRPC calls?
2. Does Dataset<Row> df = session.read().format("bigquery").load() work through the BigQuery Storage API? If not, how do I read from BigQuery into Spark using the BigQuery Storage API?
The Spark BigQuery connector uses only the BigQuery Storage API for reads; you can see it here, for example.
Yes, Dataset<Row> df = session.read().format("bigquery").load() works through the BigQuery Storage API.
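For reference, a minimal PySpark sketch of the same read (the project, dataset, and table names are placeholders); the connector performs the actual read through the Storage API under the hood:

```python
from pyspark.sql import SparkSession

# Sketch: reading a BigQuery table with the spark-bigquery-connector.
# "my-project.my_dataset.my_table" is a placeholder table reference.
spark = SparkSession.builder.getOrCreate()

df = (
    spark.read
    .format("bigquery")
    .option("table", "my-project.my_dataset.my_table")
    .load()
)

df.printSchema()
df.show(5)
```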
I have asked a similar question before, but I would like to ask whether I can use Microsoft Azure to achieve my goal.
Is streaming input from an external database (PostgreSQL) supported in Apache Spark?
I have a database deployed on Microsoft Azure PostgreSQL, and there is a table I want to access as a stream. Using Kafka Connect, it seems that I could stream the table, but looking at the online documentation, I could not find a database (PostgreSQL) listed as a data source.
Does Azure Databricks support stream reading of a PostgreSQL table? Or is it better to use Azure HDInsight with Kafka and Spark?
I would appreciate it if I could get some help.
Best Regards,
Yu Watanabe
Unfortunately, Azure Databricks does not support stream reading of an Azure PostgreSQL database.
Azure HDInsight with Kafka and Spark will be the right choice for your requirement.
HDInsight provides managed Kafka and integration with other HDInsight offerings that can be used to build a complete data platform.
Azure also offers a range of other managed services needed in a data platform, such as SQL Server, PostgreSQL, Redis, and Azure IoT Event Hub.
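As a rough sketch of that Kafka + Spark route (the broker address and topic name below are placeholders): a Kafka Connect source publishes PostgreSQL changes to a topic, and Spark Structured Streaming consumes it:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Sketch: consume a Kafka topic (fed by a Kafka Connect PostgreSQL source)
# with Spark Structured Streaming. Broker host and topic name are placeholders.
spark = SparkSession.builder.getOrCreate()

stream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<broker-host>:9092")
    .option("subscribe", "postgres.public.my_table")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka records arrive as binary key/value; cast to strings for inspection.
query = (
    stream_df.select(col("key").cast("string"), col("value").cast("string"))
    .writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```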
As per my research, I have found a third-party tool named "Panoply" which integrates Databricks and PostgreSQL.
Hope this helps.