In an Azure Databricks SQL notebook, I am trying to declare a variable and use it across multiple SQL statements, e.g. declare a fiscal year and use it in the WHERE criteria. The intent is to avoid hardcoding.
It looks like I have to use Python / Scala. Is there any way to achieve this using pure SQL statements?
e.g.:
var #fiscalYear = 2018;
select * from table where fiscalyear = #fiscalYear
Check this link out:
https://forums.databricks.com/questions/176/how-do-i-pass-argumentsvariables-to-notebooks.html
Another way is to use a widget (to set the variable value):
%python
dbutils.widgets.text("var", "text")  # create a text widget named "var" with default value "text"
# dbutils.widgets.remove("var")      # run this later to remove the widget once you are done
Then you can go:
%sql
select * from table where value = '$var'
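If you want to stay in pure SQL, another option is Spark's variable substitution: SET a key in one cell and reference it with ${...} in later cells. A minimal sketch, assuming spark.sql.variable.substitute is left at its default of true; the table and column names are illustrative:
%sql
SET fiscal_year = 2018;
%sql
-- ${fiscal_year} is substituted into the query text before it is parsed
select * from sales where fiscalyear = ${fiscal_year};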
I need to create a dashboard inside Databricks that summarizes the number of rows per table in the current workspace.
Is there a way to write a SQL query that calculates the number of rows by table, schema, and catalog? The expected result would be:
Catalog            Schema       Table            Rows
example_catalog_1  Finance      table_example_1  1567000
example_catalog_1  Finance      table_example_2  67000
example_catalog_2  Procurement  table_example_1  45324888
example_catalog_2  Procurement  table_example_2  89765987
example_catalog_2  Procurement  table_example_3  145000
Currently, I am working on a pure SQL workflow, so I would like to understand whether it's possible to do this using SQL, because as far as I know the dashboards in Databricks do not accept PySpark code.
I know that it's possible to list the tables in the workspace using system.information_schema.tables, but how can I use it to count the total rows for each table listed there?
In SQL Server this is possible via the sys schema, dynamic queries, or a BEGIN...END block, but I couldn't find a way to do it in Databricks.
I strongly doubt you can run that kind of query in the Databricks dashboard. The link shared by @Sharma is more about how to get the record count using a dataframe, not about how to link that with the Databricks dashboard.
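That said, one pure-SQL workaround is to write a query that generates the counting query from system.information_schema.tables and then run the generated text yourself. A sketch, assuming Unity Catalog's information_schema with its table_catalog, table_schema and table_name columns; you still have to strip the trailing UNION ALL from the last generated line:
%sql
SELECT concat(
  'SELECT ''', table_catalog, ''' AS catalog_name, ',
  '''', table_schema, ''' AS schema_name, ',
  '''', table_name, ''' AS table_name, ',
  'COUNT(*) AS row_count FROM ',
  table_catalog, '.', table_schema, '.', table_name,
  ' UNION ALL') AS count_stmt
FROM system.information_schema.tables
WHERE table_schema <> 'information_schema';
This is the same dynamic-query idea you would use in SQL Server, just without an automatic EXEC step.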
This is my first time working with Azure Synapse and it seems SELECT ... INTO is not working. Is there any workaround where I just use a SELECT statement and then dump the result into a temporary table?
Here is the error prompted:
The query references an object that is not supported in distributed processing mode.
and this is my query:
Select *
Into #Temp1
FROM [dbo].[TblSample]
This is the Azure Synapse endpoint we are currently using:
ondemand-sql.azuresynapse.net
In Synapse On-Demand (serverless SQL pool), the use of temporary tables is limited. In your case I am assuming that dbo.TblSample is an external table, which is possibly why you are facing this restriction.
Instead of using a temp table, can you either just JOIN TblSample directly, or use a CTE if you are selecting specific rows and columns, as sketched below?
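For illustration, a minimal sketch of the CTE route (the column names and filter are hypothetical):
WITH Sample AS (
    SELECT Col1, Col2
    FROM [dbo].[TblSample]
    WHERE Col1 = 'some filter'
)
SELECT s.Col1, s.Col2
FROM Sample s;
The CTE only lives for the duration of the statement, but for a single select-then-reuse pattern that is often all you need.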
Is it possible to export the output of a "magic SQL" command cell in Databricks?
I like the fact that one doesn't have to escape the SQL command and that it can be easily formatted. But I can't seem to use the output in other cells. What I would like to do is export the data to a CSV file, but potentially finish some final manipulation of the dataframe before I write it out.
sql = "select * from calendar"
df = sqlContext.sql(sql)
display(df.limit(10))
vs. (Databricks formatted the following code):
%sql
select
*
from
calendar
But imagine once you bring in escaped strings, nested joins, etc. I'm wondering if there is a better way to work with SQL in Databricks.
The simplest solution is the most obvious one that I didn't think of: create a view!
%sql
CREATE OR REPLACE TEMPORARY VIEW vwCalendar as
/*
Comments to make your future self happy!
*/
select
c.line1, -- more comments
c.line2, -- more comments
c.zipcode
from
calendar c -- the alias is needed for the c.* column references above
where
c.status <> 'just an example\'s' -- <<imagine escaping this
And now you can use the view vwCalendar in subsequent SQL cells just like any other table.
And if you want to use it in a Python cell:
df = spark.table("vwCalendar")
display(df.limit(3))
https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-ddl-create-view.html
https://docs.databricks.com/spark/latest/spark-sql/udf-python.html#user-defined-functions---python
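And since the original goal was a CSV export, you can stay in SQL for that step too, using a CREATE TABLE ... AS SELECT that writes CSV files. A sketch; the table name, output path and header option are assumptions, and the path must be writable from your workspace:
%sql
CREATE TABLE calendar_export
USING CSV
OPTIONS (header 'true')
LOCATION '/tmp/calendar_export'
AS select * from vwCalendar;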
I can display the Databricks table format using: DESCRIBE {database name}.{table name};
This will display something like:
format  id    etc.
hive    null  ...
Is there a way to write a SQL statement like:
SELECT FORMAT FROM {some table} where database = {db name} and table = {table name};
I would like to know if there is a Databricks catalog table that I can query directly. I want to list all of the Databricks tables that have a "format = 'delta'".
Unlike a traditional relational database management system, Databricks (without Unity Catalog) has no system catalog to query this information directly.
You need to combine three Spark SQL statements with Python dataframe code to get the answer you want.
%sql
show databases
This command will list all the databases (schemas).
%sql
show tables from dim;
This command will list all the tables in a database (schema).
%sql
describe table extended dim.employee
This command will return detailed information about a table.
As you can see, we want to pick up the following fields (database, table, location, provider and type) for all tables in all databases, then filter for provider = 'delta'.
Databricks has Unity Catalog in public preview right now.
https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/
Databricks has implemented the information schema found in most relational database management systems. This is part of this new release.
https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
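Something along these lines should work; a sketch, assuming the data_source_format column documented for Unity Catalog's information_schema:
%sql
SELECT table_catalog, table_schema, table_name, data_source_format
FROM system.information_schema.tables
WHERE data_source_format = 'DELTA';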
In theory, this statement would bring information back on all tables if Unity Catalog were enabled in my service. Since it is not enabled, the query processor does not understand my request.
In short, use spark.sql() and dataframes to write a program to grab the information, but this is a lengthy task. An easier alternative is to use Unity Catalog; make sure it is available in your region.
To return the table format, we generally use DESCRIBE FORMATTED:
DESCRIBE FORMATTED [db_name.]table_name
DESCRIBE FORMATTED delta.`path-to-table` (Managed Delta Lake)
You cannot use a SELECT statement to get the format of a table. The supported SELECT statements are:
SELECT * FROM boxes
SELECT width, length FROM boxes WHERE height=3
SELECT DISTINCT width, length FROM boxes WHERE height=3 LIMIT 2
SELECT * FROM VALUES (1, 2, 3) AS (width, length, height)
SELECT * FROM VALUES (1, 2, 3), (2, 3, 4) AS (width, length, height)
SELECT * FROM boxes ORDER BY width
SELECT * FROM boxes DISTRIBUTE BY width SORT BY width
SELECT * FROM boxes CLUSTER BY length
For more details, refer to “Azure Databricks – SQL Guide: Select”.
Hope this helps.
I have a set of SQL SELECT queries to execute, and I need to share the consolidated results in an Excel sheet; I'm using SQLyog to do this.
Every time I execute them, the results come back as multiple result sets. Can I get the results in a single table?
Select * from a.table;
Select * from b.table;
To get the result in a single table you need to use a JOIN in your query.
For example, say I have an area_table and a height_table.
To get the result in one consolidated table I would use a JOIN and write the query as:
Select a.*, b.* from area_table a
Join height_table b;
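Note, though, that a JOIN with no ON condition pairs every row of one table with every row of the other. If the goal is simply to stack the results of several SELECTs into one result set, UNION ALL is usually the better fit. A sketch, assuming the queries return compatible columns (the column names are illustrative):
Select 'a' as source, id, amount from a.table
Union All
Select 'b' as source, id, amount from b.table;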