how to pass static value into dynamic on basis of column value in azure databricks - azure

how to pass static value into dynamic on basis of column value in Azure Databricks. Currently, I have 13 notebook and its scheduled ,so I want to schedule only one notebook and In addition, data of column( 13 rows) which I defined separate in 13 notebook so how I dynamically pass that value .

You can create different jobs that refer the single notebook, pass parameters to a job and then retrieve these parameters using Databricks Widgets (widgets are available for all languages). In the notebook it will look as following (for example, in Python):
# this is necessary only if you execute notebook interactively
dbutils.widgets.text("param1_name", "default_value", "description")
# get job parameter
param1 = dbutils.widgets.get("param1_name")
# ... use param1

Related

For each activity in azure synapse

A notebook needs to be run over a set of Array of arrays in for each activity . Is there a way to pass in that subarray as a parameter to the notebook that's inside for each activity.
The datatype for a parameter in notebook in synapse doesn't accept an array type
Currently, Notebook activity does not support passing arrays to Notebook.
So, pass it as String, then In Notebook convert it into array.
My Array parameter in pipeline:
Inside ForEach, pass it as String.
Output of Notebook in one iteration:
Use eval() to convert string in to list in Notebook.

How can I access python variable in Spark SQL?

I have python variable created under %python in my jupyter notebook file in Azure Databricks. How can I access the same variable to make comparisons under %sql. Below is the example:
%python
RunID_Goal = sqlContext.sql("SELECT CONCAT(SUBSTRING(RunID,1,6),SUBSTRING(RunID,1,6),'01_')
FROM RunID_Pace").first()[0]
AS RunID_Goal
%sql
SELECT Type , KPIDate, Value
FROM table
WHERE
RunID = RunID_Goal (This is the variable created under %python and want to compare over here)
When I run this it throws an error:
Error in SQL statement: AnalysisException: cannot resolve 'RunID_Goal' given input columns:
I am new azure databricks and spark sql any sort of help would be appreciated.
One workaround could be to use Widgets to pass parameters between cells. For example, on Python side it could be as following:
# generate test data
import pyspark.sql.functions as F
spark.range(100).withColumn("rnd", F.rand()).write.mode("append").saveAsTable("abc")
# set widgets
import random
vl = random.randint(0, 100)
dbutils.widgets.text("my_val", str(vl))
and then you can refer the value from the widget inside the SQL code:
%sql
select * from abc where id = getArgument('my_val')
will give you:
Another way is to pass variable via Spark configuration. You can set variable value like this (please note that that the variable should have a prefix - in this case it's c.):
spark.conf.set("c.var", "some-value")
and then from SQL refer to variable as ${var-name}:
%sql
select * from table where column = '${c.var}'
One advantage of this is that you can use this variable also for table names, etc. Disadvantage is that you need to do the escaping of the variable, like putting into single quotes for string values.
You cannot access this variable. It is explained in the documentation:
When you invoke a language magic command, the command is dispatched to the REPL in the execution context for the notebook. Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language. REPLs can share state only through external resources such as files in DBFS or objects in object storage.
Here is another workaround.
# Optional code to use databricks widgets to assign python variables
dbutils.widgets.text('my_str_col_name','my_str_col_name')
dbutils.widgets.text('my_str_col_value','my_str_col_value')
my_str_col_name = dbutils.widgets.get('my_str_col_name')
my_str_col_value = dbutils.widgets.get('my_str_col_value')
# Query with string formatting
query = """
select *
from my_table
where {0} < '{1}'
"""
# Modify query with the values of Python variable
query = query.format(my_str_col_name,my_str_col_value)
# Execute the query
display(spark.sql(query))
A quick complement to answer.
Do you can use widgets to pass parameters to another cell using magic %sql, as was mentioned;
dbutils.widgets.text("table_name", "db.mytable")
And at the cell that you will use this variable do you can use $ shortcut ~ getArgument isn't supported;
%sql
select * from $table_name

Parametrize and loop KQL queries in JupyterLab

My question is how to assign variables within a loop in KQL magic command in Jupyter lab. I refer to Microsoft's document on this subject and will base my question on the code given here:
https://learn.microsoft.com/en-us/azure/data-explorer/kqlmagic
1. First query below
%%kql
StormEvents
| summarize max(DamageProperty) by State
| order by max_DamageProperty desc
| limit 10
2. Second: Convert the resultant query to a dataframe and assign a variable to 'statefilter'
df = _kql_raw_result_.to_dataframe()
statefilter =df.loc[0].State
statefilter
3. This is where I would like to modify the above query and let statefilter have multiple variables (i.e. consist of different states):
df = _kql_raw_result_.to_dataframe()
statefilter =df.loc[0:3].State
statefilter
4. And finally I would like to run my kql query within a for loop for each of the variables within statefilter. This below syntax may not be correct but it can give an example for what I am looking for:
dfs = [] # an empty list to store dataframes
for state in statefilters:
%%kql
let _state = state;
StormEvents
| where State in (_state)
| do some operations here for that specific state
df = _kql_raw_result_.to_dataframe()
dfs.append(df) # store the df specific to state in the list
The reason why I am not querying all the desired states within the KQL query is to prevent resulting in really large query outcomes being assigned to dataframes. This is not for this sample StormEvents table which has a reasonable size but for my research data which consists of many sites and is really big. Therefore I would like to be able to run a KQL query/analysis for each site within a for loop and assign each site's query results to a dataframe. Please let me know if this is possible or perhaps there may other logical ways to do this within KQL...
There are few ways to do it.
The simplest is to refractor your %%kql cell magic to a %kql line magic.
Line magic can be embedded in python cell.
Other option is to: from Kqlmagic import kql
The Kqlmagic kql method, accept as a string a kql cell or line.
You can call kql from python.
Third way is to call the kql magic via the ipython method:
ip.run_cell_magic('kql', {your kql magic cell text})
You can call it from python.
Example of using the single line magic mentioned by Michael and a return statement that converted the result to JSON. Without the conversion to JSON I wasn't getting anything back.
def testKQL():
%kql DatabaseName | take 10000
return _kql_raw_result_.to_dataframe().to_json(orient='records')

Parameter to notebook - widget not defined error

I have passed a parameter from JobScheduling to Databricks Notebook and tried capturing it inside the python Notebook using dbutils.widgets.get ().
When I run the scheduled job, I get the error "No Input Widget defined", thrown by the library module ""InputWidgetNotDefined"
May I know the reason, thanks.
To use widgets, you fist need to create them in the notebook.
For example, to create a text widget, run:
dbutils.widgets.text('date', '2021-01-11')
After the widget has been created once on a given cluster, it can be used to supply parameters to the notebook on subsequent job runs.

Dynamic Forms + SparkSQL Variable Binding in Zep Notebooks

Is it possible to use SparkSQL in a Zeppelin Notebook to take the input of a dynamic form and bind it, the way that one can with the Angular interpreter?
I'm trying to use SparkSQL in a notebook to create a dashboard, but I want the user to be able to input a universal variable value at the beginning of the notebook and have it apply for multiple paragraphs.
Note level dynamic forms in Zeppelin are not supported yet (there is a Jira Introduce Note level dynamic form).
I am using a workaround for now:
dedicate a paragraph to the dynamic forms and variable binding (e.g. z.angularBind("BIND_VAR_A", z.input("VAR_A", 111))
z.angularBind("BIND_VAR_B", z.input("VAR_B", "Default")) -> image)
recover the variables in any paragraph that shares the same context (e.g. val VAR_A = z.angular("BIND_VAR_A")
val data = "(select * from table where id = " + VAR_A + ") as data")
It works also with the sql interpreter:
%sql
select * from data where id = VAR_A

Resources