I came across the following SQL command in a Databricks notebook, and I am confused about what the ${da.paths.working_dir} object in it is. Is it a Python object or something else?
SELECT * FROM parquet.${da.paths.working_dir}/weather
I know it contains the path of a working directory, but how can I access/print it?
I tried to demystify it myself but failed.
NOTE: My notebook is a SQL notebook.
Finally, I figured it out. This is a higher-level variable in Databricks SQL, and we can access it using the SELECT keyword as shown below:
SELECT '${da.paths.working_dir}';
EDIT: This variable is a Spark configuration property, which can be set as follows:
# spark.conf.set(key, value)
spark.conf.set("da.paths.working_dir", "/path/to/files")
To access this property in Python:
spark.conf.get("da.paths.working_dir")
To access this property in Databricks SQL:
SELECT '${da.paths.working_dir}';
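Putting it together, a minimal sketch of the round trip (the key and path below are just illustrative placeholders):

# Set the value as a Spark configuration property from a Python cell
spark.conf.set("da.paths.working_dir", "/path/to/files")

# Read it back in Python
print(spark.conf.get("da.paths.working_dir"))

# The same ${...} substitution used in SQL cells should also work through spark.sql
display(spark.sql("SELECT '${da.paths.working_dir}' AS working_dir"))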
I have created a Python notebook in Databricks. I have Python logic and need to execute a %sql cell. Say I wanted to execute the command in cmd2 based on a Python variable:
cmd1
EXECUTE_SQL = True

cmd2
if condition:
    %sql .....
As mentioned, you can use the following Python code (or Scala) to get behavior similar to the %sql cell:
if condition:
    display(spark.sql("your-query"))
One advantage of this approach is that you can embed variables into the query text.
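For example, a minimal sketch that builds the query text from Python variables (the table and column names and the filter value here are hypothetical; EXECUTE_SQL is the flag from the question):

EXECUTE_SQL = True                  # the flag set in cmd1
table_name = "events"               # hypothetical table name
cutoff_date = "2021-09-16"          # hypothetical filter value

if EXECUTE_SQL:
    query = f"SELECT * FROM {table_name} WHERE event_date >= '{cutoff_date}'"
    display(spark.sql(query))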
Another alternative I used:
Extract the SQL into a different notebook. In my case I don't want any results back; I am just cleaning up the Delta tables and deleting their contents.
/clean-deltatable-notebook (a SQL notebook):
delete from <database>.<table>
Then call dbutils.notebook.run() from the Python notebook.
cmd2
if condition:
    result = dbutils.notebook.run('<path-of-clean-deltatable-notebook>', timeout_seconds=30)
    print(result)
Link to dbutils.notebook.run() in the Databricks documentation
In a Hive session it is possible to list the available variables with the following:
0: jdbc:hive2://127.0.0.1:10000>set;
How can I list the Hive variables from a Databricks notebook?
The solution was to just run the SET SQL command:
%sql
SET
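If you prefer running this from a Python cell, a small sketch that should do the same thing (the LIKE pattern below is just an illustrative filter):

variables_df = spark.sql("SET")                         # returns the variables as a key/value DataFrame
display(variables_df)
display(variables_df.filter("key LIKE 'spark.sql%'"))   # narrow down to specific keys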
In a Databricks notebook, I have written the query:
%sql
set four_date='2021-09-16';
select * from df2_many where four_date='{{four_date}}'
It's not working. Please advise how to apply this in a direct query instead of using spark.sql(""" """).
Note: I don't want to use $, since that asks for a value in a text box. Please confirm whether there is any other alternative solution.
How can I apply variable values directly in a query in Databricks?
If you are using a Databricks Notebook, you will need to use Widgets:
https://docs.databricks.com/notebooks/widgets.html
CREATE WIDGET TEXT four_date DEFAULT "2021-09-16"
SELECT * FROM df2_many WHERE four_date=getArgument("four_date")
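If you drive this from a Python cell instead, a rough equivalent (the widget name, default, and table come from the question; the rest is a sketch, not the answer's exact code):

dbutils.widgets.text("four_date", "2021-09-16")   # create the text widget
four_date = dbutils.widgets.get("four_date")      # read its current value
display(spark.sql(f"SELECT * FROM df2_many WHERE four_date = '{four_date}'"))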
I am writing the code below in an Azure Synapse notebook:
%%spark
val df = spark.read.sqlanalytics("emea_analytics.abc.cde_mydata")
df.write.mode("overwrite").saveAsTable("default.t1")
I am getting the below error:
Error: com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: The specified table does not exist. Please provide a valid table.
at com.microsoft.spark.sqlanalytics.read.SQLAnalyticsReader.readSchema(SQLAnalyticsReader.scala:103)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation$.create(DataSourceV2Relation.scala:175)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:204)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at org.apache.spark.sql.SqlAnalyticsConnector$SQLAnalyticsFormatReader.sqlanalytics(SqlAnalyticsConnector.scala:42)
The error message clearly says - The specified table does not exist. Please provide a valid table.
Error : com.microsoft.spark.sqlanalytics.exception.SQLAnalyticsConnectorException: The specified table does not exist. Please provide a valid table.
Make sure the specified table exists before running the above code.
Reference: Azure Synapse Analytics - Load the NYC Taxi data into the Spark nyctaxi database.
I have a small log dataframe which has metadata regarding the ETL performed within a given notebook; the notebook is part of a bigger ETL pipeline managed in Azure Data Factory.
Unfortunately, it seems that Databricks cannot invoke stored procedures so I'm manually appending a row with the correct data to my log table.
However, I cannot figure out the correct syntax to update a table given a set of conditions.
The statement I use to append a single row is as follows:
spark_log.write.jdbc(sql_url, 'internal.Job', mode='append')
This works swimmingly. However, as my Data Factory is invoking a stored procedure, I need to work in a query like:
query = f"""
UPDATE [internal].[Job] SET
[MaxIngestionDate] date {date}
, [DataLakeMetadataRaw] varchar(MAX) NULL
, [DataLakeMetadataCurated] varchar(MAX) NULL
WHERE [IsRunning] = 1
AND [FinishDateTime] IS NULL"""
Is this possible? If so, can someone show me how?
Looking at the documentation, it only seems to mention using SELECT statements with the query parameter:
Target Database is an Azure SQL Database.
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
Just to add: this is a tiny operation, so performance is a non-issue.
You can't do single-record updates using JDBC in Spark with DataFrames; you can only append or replace the entire table.
You can do updates using pyodbc, which requires installing the MSSQL ODBC driver (How to install PYODBC in Databricks), or you can use JDBC via JayDeBeApi (https://pypi.org/project/JayDeBeApi/).
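For illustration, a minimal pyodbc sketch of that UPDATE; the driver name, server, database, and credentials are placeholders, and the SET clause is my reading of what the question's query intends:

import datetime
import pyodbc

max_ingestion_date = datetime.date.today()  # stands in for the question's {date}

# Placeholder connection details for the Azure SQL Database target
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>.database.windows.net;"
    "DATABASE=<database>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()
cursor.execute(
    """
    UPDATE [internal].[Job]
    SET [MaxIngestionDate] = ?,
        [DataLakeMetadataRaw] = NULL,
        [DataLakeMetadataCurated] = NULL
    WHERE [IsRunning] = 1
      AND [FinishDateTime] IS NULL
    """,
    max_ingestion_date,
)
conn.commit()
conn.close()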