I have a table called tbl1 in Azure Databricks and I want to perform a simple unpivot operation using SQL. I am new to SQL and Databricks. I followed an online tutorial and came up with the syntax below, but I keep getting this error: You have an error in your SQL syntax; it seems the error is around: 'unpivot( height for details IN (ucol1, ucol2, ucol3))'.
SQL syntax for the unpivot operation:
%sql
Select date_format(X.Date,'dd-MMM')Datee
,X.width
,X.height
,X.details
,X.col1
From
(
Select
Datee,width,B.details,col1,height, from tbl1 A,
Unpivot
(
height
for details IN (
ucol1, ucol2, ucol3
)) B
GROUP BY Datee,width,B.Details,height,col1
)X
Is there anything wrong with the above SQL syntax?
Any hint would be appreciated.
Please let me know if you need any further details.
You can use STACK in Spark SQL to unpivot, e.g.:
SELECT
DATE_FORMAT( Date, 'dd-MMM') x,
STACK( 3, ucol1, ucol2, ucol3 )
FROM tbl1;
It would be helpful if you provided some simple sample data and expected results, as it's not 100% clear what you need: your query does not work in any SQL dialect.
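Since no sample data was posted, here is a small Python sketch of the reshaping that a stack-style unpivot performs, using made-up rows. The column names ucol1, ucol2, ucol3, details and height come from the question; the function name unpivot and the sample values are hypothetical. In Spark SQL, the same labelled shape can be obtained by interleaving string literals with the columns, for example STACK(3, 'ucol1', ucol1, 'ucol2', ucol2, 'ucol3', ucol3) AS (details, height).

```python
def unpivot(rows, id_cols, value_cols, name_col="details", value_col="height"):
    """Turn each wide row into one long row per value column."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            out = {k: row[k] for k in id_cols}  # carry the identifying columns over
            out[name_col] = col                 # which source column the value came from
            out[value_col] = row[col]           # the value itself
            long_rows.append(out)
    return long_rows


# Made-up sample data (assumption: the real tbl1 has more columns).
wide = [
    {"Date": "2021-01-05", "ucol1": 10, "ucol2": 20, "ucol3": 30},
    {"Date": "2021-01-06", "ucol1": 11, "ucol2": 21, "ucol3": 31},
]

long = unpivot(wide, ["Date"], ["ucol1", "ucol2", "ucol3"])
for r in long:
    print(r)
```

Each input row becomes three output rows, one per unpivoted column, which is the shape the original query appears to be aiming for.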
Related
This is my first time working with Azure Synapse, and it seems SELECT ... INTO is not working. Is there a workaround where I can use just a SELECT statement and dump the result into a temporary table?
Here is the error:
The query references an object that is not supported in distributed processing mode.
And this is my query:
Select *
Into #Temp1
FROM [dbo].[TblSample]
This is the Azure Synapse endpoint we are currently using:
ondemand-sql.azuresynapse.net
In Synapse On-Demand, the use of Temporary Tables is limited. In your case I am assuming that dbo.TblSample is an External Table which is possibly why you are facing this restriction.
Instead of using a Temp Table, can you either just JOIN the TblSample directly or use a CTE if you are SELECTing specific rows and columns?
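To illustrate the CTE suggestion, here is a minimal sketch that runs the pattern against an in-memory SQLite database from Python. SQLite is only a stand-in so the example is runnable; it is not Synapse, and the columns id, name and amount are hypothetical. The point is the shape of the query: the intermediate result that would have gone into #Temp1 is named in a WITH clause and consumed in the same statement.

```python
import sqlite3

# In-memory stand-in for dbo.TblSample (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TblSample (id INTEGER, name TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO TblSample VALUES (?, ?, ?)",
    [(1, "a", 5.0), (2, "b", 15.0), (3, "c", 25.0)],
)

# The CTE named Temp1 plays the role of the temporary table.
query = """
WITH Temp1 AS (
    SELECT id, name, amount
    FROM TblSample
    WHERE amount > 10
)
SELECT id, name
FROM Temp1
ORDER BY id
"""

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

A CTE only lives for the duration of one statement, so this fits when the intermediate result is needed once; it does not replace a temp table that several later statements read from.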
How can I get the latest row from a stream in a data flow transformation? Below is the equivalent SQL query. I checked the filter mapping transformation but I did not find any relevant function in the visual expression builder. I am new to Data Factory and I am currently exploring the data flow canvas.
SQL: SELECT TOP 1 * FROM XYZ ORDER BY timestamp DESC;
In the Source transformation, under Source options, you can set Input to Query and write the SQL query to get the latest record.
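For reference, the semantics of that query (sort by timestamp descending, keep one row) can be sketched in plain Python. The rows and the id column are made up, and the timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime

# Made-up rows standing in for the XYZ table.
rows = [
    {"id": 1, "timestamp": "2023-01-01T08:00:00"},
    {"id": 2, "timestamp": "2023-01-03T09:30:00"},
    {"id": 3, "timestamp": "2023-01-02T17:45:00"},
]


def latest_row(rows):
    """Return the row with the greatest timestamp, like TOP 1 ... ORDER BY timestamp DESC."""
    return max(rows, key=lambda r: datetime.fromisoformat(r["timestamp"]))


latest = latest_row(rows)
print(latest)
```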
I can display the Databricks table format using: DESCRIBE {database name}.{table name};
This will display something like:
format id etc.
hive null ...
Is there a way to write a SQL statement like:
SELECT FORMAT FROM {some table} where database = {db name} and table = {table name};
I would like to know if there is a Databricks catalog table that I can query directly. I want to list all of the Databricks tables whose format is 'delta'.
Unlike a relational database management system, there is no system catalog to query this information directly.
You need to combine three Spark SQL statements with Python dataframe code to get the answer you want.
%sql
show databases
This command will list all the databases (schemas).
%sql
show tables from dim;
This command will list all the tables in a database (schema).
%sql
describe table extended dim.employee
This command will return detailed information about a table.
From this output, we want to pick up the following fields (database, table, location, provider and type) for all tables in all databases, then filter for the provider 'delta'.
Databricks has Unity Catalog in public preview right now.
https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/
Databricks has implemented the information schema that is found in most relational database management systems. This is part of this new release.
https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html
In theory, this statement would bring information back on all tables if the unity catalog was enabled in my service. Since it is not enabled, the query processor does not understand my request.
In short, use spark.sql() and dataframes to write a program to grab the information. But this is a lengthy task. An easier alternative is to use Unity Catalog. Make sure it is available in your region.
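A minimal sketch of the spark.sql() approach, assuming spark is the active SparkSession that a Databricks notebook provides. The function name list_delta_tables is hypothetical. It relies on SHOW TABLES returning tableName and isTemporary columns and on DESCRIBE TABLE EXTENDED returning a row whose col_name is Provider; check both against the output on your runtime before relying on it.

```python
def list_delta_tables(spark):
    """Return (database, table) pairs whose provider is 'delta'.

    `spark` is assumed to be an active SparkSession.
    """
    delta_tables = []
    for db_row in spark.sql("SHOW DATABASES").collect():
        db_name = db_row[0]  # first column holds the database (schema) name
        for tbl_row in spark.sql(f"SHOW TABLES FROM `{db_name}`").collect():
            if tbl_row["isTemporary"]:
                continue  # skip temporary views, which are listed under every database
            tbl_name = tbl_row["tableName"]
            details = spark.sql(
                f"DESCRIBE TABLE EXTENDED `{db_name}`.`{tbl_name}`"
            ).collect()
            # The detailed section is returned as (col_name, data_type, comment) rows.
            provider = next(
                (r["data_type"] for r in details if r["col_name"] == "Provider"),
                None,
            )
            if provider is not None and provider.lower() == "delta":
                delta_tables.append((db_name, tbl_name))
    return delta_tables
```

In a notebook cell this would be called as list_delta_tables(spark). It issues one DESCRIBE per table, so it can be slow on workspaces with many tables.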
To return the table's format, we generally use DESCRIBE FORMATTED:
DESCRIBE FORMATTED [db_name.]table_name
DESCRIBE FORMATTED delta.`path-to-table` (Managed Delta Lake)
You cannot use a SELECT statement to get the format of the table.
The supported forms of the SELECT statement are:
SELECT * FROM boxes
SELECT width, length FROM boxes WHERE height=3
SELECT DISTINCT width, length FROM boxes WHERE height=3 LIMIT 2
SELECT * FROM VALUES (1, 2, 3) AS (width, length, height)
SELECT * FROM VALUES (1, 2, 3), (2, 3, 4) AS (width, length, height)
SELECT * FROM boxes ORDER BY width
SELECT * FROM boxes DISTRIBUTE BY width SORT BY width
SELECT * FROM boxes CLUSTER BY length
For more details, refer to “Azure Databricks – SQL Guide: Select”.
Hope this helps.
I have a requirement to migrate data from one Oracle DB table into different tables based on a condition: if a given column contains the value A, insert the row into tableA, otherwise insert it into tableB. Can we do this using Talend?
Could someone please guide me?
Yes, you can do a conditional load in Talend. Based on your scenario, you can use Talend's filter expression to do it. Check the screenshot for more details.
Add two Oracle outputs for loading into table A and table B, as in the screenshot below.
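The routing logic that the two filtered outputs implement can be sketched in a few lines of Python. This is not Talend code, only an illustration of the condition; the function name route_rows, the column name category and the sample rows are hypothetical.

```python
def route_rows(rows, column, match_value):
    """Split rows into two lists: matches go to table A, the rest to table B."""
    table_a, table_b = [], []
    for row in rows:
        if row[column] == match_value:
            table_a.append(row)   # equivalent of the output filtered on column == value
        else:
            table_b.append(row)   # equivalent of the output with the negated filter
    return table_a, table_b


# Made-up source rows standing in for the Oracle input table.
source = [
    {"id": 1, "category": "A"},
    {"id": 2, "category": "B"},
    {"id": 3, "category": "A"},
]

to_a, to_b = route_rows(source, "category", "A")
print(to_a)
print(to_b)
```

Every row lands in exactly one of the two outputs, which is the property to verify when setting up the two filter expressions.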
I have a Cassandra column family named Data3 with two columns and the following data:
URL Data
www.google.com Google
I want a query in Cassandra similar to (SELECT * FROM Table1 WHERE Data='Google').
Thanks
select * from Data3 where Data = 'Google'
This is CQL, as described in the CQL Language Reference on DataStax. Note that filtering on a non-key column such as Data requires a secondary index on that column or ALLOW FILTERING.
Oddly enough, we used an earlier version of Cassandra where CQL was not supported, and we never thought that we actually needed something like SQL. If you want more detail, read these articles:
CQL Utility
You could query without SQL type utility
You could see a non-SQL example/tutorial here. Here is how you select columns