Synapse Spark SQL Delta Merge Mismatched Input Error - azure

I am trying to update the historical table, but am getting a merge error. When I run this cell:
%%sql
select * from main
UNION
select * from historical
where Summary_Employee_ID=25148
I get a two-row table that looks like:
EmployeeID Name
25148 Wendy Clampett
25148 Wendy Monkey
I'm trying to update the Name... using the following merge command:
%%sql
MERGE INTO main m
using historical h
on m.Employee_ID=h.Employee_ID
WHEN MATCHED THEN
UPDATE SET
m.Employee_ID=h.Employee_ID,
m.Name=h.Name
WHEN NOT MATCHED THEN
INSERT(Employee,Name)
VALUES(h.Employee,h.Name)
Here's my error:
Error:
mismatched input 'MERGE' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD'}(line 1, pos 0)

Synapse doesn't support the SQL MERGE statement the way Databricks does. However, you can use the Python solution. Note that historical was really my updates...
So for the above, I used:
import delta

# main is the target Delta table; historical is assumed to be a DataFrame
# holding the updates (e.g. spark.table("historical"))
main = delta.DeltaTable.forPath(spark, "path")

(main.alias("main")
    .merge(historical.alias("historical"),
           "main.Employee_ID = historical.Employee_ID")   # join condition
    .whenMatchedUpdate(set={"Name": "historical.Name"})
    .whenNotMatchedInsert(values={
        "Employee_ID": "historical.Employee_ID",
        "Name": "historical.Name"})
    .execute()
)

Your goal is to upsert the target table historical, but in your query the target table is set to main instead of historical, and the UPDATE and INSERT clauses are likewise pointed at the wrong tables.
Try the following:
%%sql
MERGE INTO historical target
using main source
on source.Employee_ID=target.Employee_ID
WHEN MATCHED THEN
UPDATE SET
target.Name=source.Name
WHEN NOT MATCHED THEN
INSERT(Employee_ID,Name)
VALUES(source.Employee_ID,source.Name)

It's supported in Spark 3.0, which is currently in preview, so this might be worth a try. I did see the same error on the Spark 3.0 pool, but it's quite misleading: it actually means that you're trying to merge on duplicate data, or that you're offering duplicate data to the original set. I've validated this by querying the delta lake and the raw file for duplicates with the serverless SQL pool and PolyBase.
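For example, a quick way to check for duplicate keys before merging (a sketch, using the Employee_ID key from the question; run it against both tables):
%%sql
SELECT Employee_ID, COUNT(*) AS cnt
FROM historical
GROUP BY Employee_ID
HAVING COUNT(*) > 1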

Related

How to create temporary view in Spark SQL using a CTE?

I'm attempting to create a temp view in Spark SQL using a CTE with this statement:
create temporary view cars as (
with models as (
select 'abc' as model
)
select model from models
)
But this error is thrown:
error in SQL statement: ParseException:
mismatched input 'with' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'}(line 2, pos 8)
== SQL ==
create temporary view cars as (
with models as (
--------^^^
select 'abc' as model
)
select model from models
)
Removing the brackets after the first as makes it work:
create temporary view cars as
with models as (
select 'abc' as model
)
select model from models
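The bracket-free form also works with CREATE OR REPLACE TEMPORARY VIEW, which is convenient when re-running the cell:
create or replace temporary view cars as
with models as (
select 'abc' as model
)
select model from models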

Presto combining two columns and output as one

I'm trying to combine two columns together in Presto.
This is part of a query, and it has to be formatted in a certain way.
SELECT 'Display' AS channel,
DBM.dated,
DBM.market,
DBM.impressions,
DBM.clicks,
sum(DBM.amount_spent_EUR)+sum(DBm.platform_fee) as DBM.amount_spent_EUR
FROM
(
SELECT
DATE_FORMAT(DATE_PARSE(date,'%Y/%m/%d'),'%Y-%m-%d') AS dated,
trim(SPLIT_PART(insertion_order,'|',3)) AS market,
sum(cast(impressions as double)) as impressions,
sum(cast(clicks as double)) as clicks,
sum(CAST(media_cost_advertiser_currency AS DOUBLE)*1.15) AS amount_spent_EUR,
sum(CAST(media_fee_1_adv_currency AS DOUBLE)*1.15) as platform_fee
FROM ralph_lauren_google_sheet_dbm_data_2
WHERE dated <= {{days_ago 1}}
GROUP BY 1,2
)DBM
The error is as follows:
Query 20190814_125505_19433_rcrut failed: line 1:144: extraneous input
'.' expecting {<EOF>, ',', 'EXCEPT', 'FROM', 'GROUP', 'HAVING',
'INTERSECT', 'LIMIT', 'ORDER', 'UNION', 'WHERE'}
The error is caused by the alias DBM.amount_spent_EUR, but the column has to come out exactly like this.
How can I get around it?
You can use double quotes in such cases:
as "DBM.amount_spent_EUR"

How to convert a weird date time string with timezone into a timestamp (PySpark)

I have a column called datetime which is a string of the form
Month Name DD YYYY H:MM:SS,nnn AM/PM TZ
where nnn is the nanosecond precision, AM/PM is self-explanatory, and TZ is the timezone, for example MDT.
For example:
Mar 18 2019 9:48:08,576 AM MDT
Mar 18 2019 9:48:08,623 AM MDT
Mar 18 2019 9:48:09,273 AM MDT
The nanosecond precision is important since the logs are so close in time. TZ is optional as they're all in the same timezone, but ideally I would like to capture it too.
Is PySpark able to handle this? I've tried using unix_timestamp with no luck.
Edit
Tried
%sql
format = 'MMM dd yyyy H:mm:ss,SSS a z'
select to_date(string)
from table
Get error:
Error in SQL statement: ParseException:
mismatched input 'format' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'MERGE', 'UPDATE', 'CONVERT', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD', 'OPTIMIZE'}(line 1, pos 0)
I would recommend you take a look at the pyspark.sql.functions.to_date(col, format=None) function.
From the documentation:
Converts a Column of pyspark.sql.types.StringType or pyspark.sql.types.TimestampType into pyspark.sql.types.DateType using the optionally specified format. Specify formats according to SimpleDateFormats. By default, it follows casting rules to pyspark.sql.types.DateType if the format is omitted (equivalent to col.cast("date")).
So, you can use all the Date patterns specified in Java - SimpleDateFormat.
If you want to use the Python formats, then I would recommend defining your own UDF using datetime. But, using the Spark one has better performance and it's already defined.
Besides, is it nanoseconds or milliseconds (H:mm:ss,SSS)?
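As a sketch in Spark SQL, using the question's column and table names, and assuming the fraction is milliseconds (note h rather than H, since the string carries an AM/PM marker, and to_timestamp rather than to_date if you need to keep the time of day):
%sql
select to_timestamp(datetime, 'MMM dd yyyy h:mm:ss,SSS a z') as parsed_ts
from table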

How to Left Join in Presto SQL?

Can't for the life of me figure out a simple left join in Presto, even after reading the documentation. I'm very familiar with Postgres and tested my query there to make sure there wasn't a glaring error on my part. Please reference the code below:
select * from
(select cast(order_date as date),
count(distinct(source_order_id)) as prim_orders,
sum(quantity) as prim_tickets,
sum(sale_amount) as prim_revenue
from table_a
where order_date >= date '2018-01-01'
group by 1)
left join
(select summary_date,
sum(impressions) as sem_impressions,
sum(clicks) as sem_clicks,
sum(spend) as sem_spend,
sum(total_orders) as sem_orders,
sum(total_tickets) as sem_tickets,
sum(total_revenue) as sem_revenue
from table_b
where site like '%SEM%'
and summary_date >= date '2018-01-01'
group by 1) as b
on a.order_date = b.summary_date
Running that gives the following error:
SQL Error: Failed to run query
Failed to run query
line 1:1: mismatched input 'on' expecting {'(', 'SELECT', 'DESC', 'WITH',
'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'GRANT',
'REVOKE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'CALL', 'PREPARE', 'DEALLOCATE', 'EXECUTE'} (Service: AmazonAthena; Status Code: 400; Error Code: InvalidRequestException; Request ID: a33a6671-07a2-4d7b-bb75-f70f7b82409e)
The first problem I notice is that your join clause assumes the first sub-query is aliased as a, but it is not aliased at all. I recommend aliasing that table to see if that fixes it (I also recommend aliasing the order_date column explicitly outside of the cast() statement since you are joining on that column).
Try this:
select * from
(select cast(order_date as date) as order_date,
count(distinct(source_order_id)) as prim_orders,
sum(quantity) as prim_tickets,
sum(sale_amount) as prim_revenue
from table_a
where order_date >= date '2018-01-01'
group by 1) as a
left join
(select summary_date,
sum(impressions) as sem_impressions,
sum(clicks) as sem_clicks,
sum(spend) as sem_spend,
sum(total_orders) as sem_orders,
sum(total_tickets) as sem_tickets,
sum(total_revenue) as sem_revenue
from table_b
where site like '%SEM%'
and summary_date >= date '2018-01-01'
group by 1) as b
on a.order_date = b.summary_date
One option is to declare your subqueries by using with:
with a as
(select cast(order_date as date),
count(distinct(source_order_id)) as prim_orders,
sum(quantity) as prim_tickets,
sum(sale_amount) as prim_revenue
from table_a
where order_date >= date '2018-01-01'
group by 1),
b as
(select summary_date,
sum(impressions) as sem_impressions,
sum(clicks) as sem_clicks,
sum(spend) as sem_spend,
sum(total_orders) as sem_orders,
sum(total_tickets) as sem_tickets,
sum(total_revenue) as sem_revenue
from table_b
where site like '%SEM%'
and summary_date >= date '2018-01-01'
group by 1)
select * from a
left join b
on a.order_date = b.summary_date;

How to Import the data from an excel sheet to the MySQL table AUTOMATICALLY?

I got a requirement to develop a Java app to load the data from an Excel sheet into a MySQL database table every day.
My actual requirement is that whenever the user opens the application, it should automatically load the data from Excel into the MySQL database table.
I have no idea how to import the data from Excel into a MySQL table.
Can anyone help me with this?
Thanks in advance.
You can import the data from Excel using the MySQL LOAD DATA INFILE command.
For automation, you have to write a function to run the MySQL command on load:
1. Create a dummyTable for the daily load.
2. Clear it before you begin.
3. Save your Excel file with the same column structure as dummyTable, in CSV format, comma delimited.
4. Run:
LOAD DATA INFILE '/path/theFile1.csv'
INTO TABLE dummyTable
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
Proceed to use it (dummyTable, as sketched below), then clear it.
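For that last step, a minimal sketch, assuming a hypothetical target table realTable with the same columns as dummyTable:
INSERT INTO realTable
SELECT * FROM dummyTable;

TRUNCATE TABLE dummyTable;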
Yes, you can generate the MySQL queries from Excel using PHP.
error_reporting(E_ALL ^ E_NOTICE);
require_once 'query_generator.php';                   // assumed to pull in the Excel reader and the query builder
$data = new Spreadsheet_Excel_Reader("example.xls");  // parse the workbook; its rows drive the INSERTs below
Output:
INSERT INTO table_name VALUES ( '101', 'Narendra Modi', 'Cabinet Ministers', 'Personnel, Public Grievances and Pensions, Department of Atomic Energy, Department of Space, All important policy issues and all other portfolios not allocated to any Minister', 'NULL', 'NULL', 'NULL');
INSERT INTO table_name VALUES ( '102', 'Rajnath Singh', 'Cabinet Ministers', 'Home Affairs', 'NULL', 'NULL', 'NULL');
INSERT INTO table_name VALUES ( '103', 'Sushma Swaraj', 'Cabinet Ministers', 'External Affairs, Overseas Indian Affairs', 'NULL', 'NULL', 'NULL');
INSERT INTO table_name VALUES ( '104', 'Arun Jaitley', 'Cabinet Ministers', 'Finance, Corporate Affairs and Defence', 'NULL', 'NULL', 'NULL');
......
I have written PHP code to generate all the queries from Excel with a single click for our production system.
