Malformed SQL Statement: Expected token 'USING' but found Identifier with value 't' instead - apache-spark

I am trying to merge into a SQL database using the following code in Databricks with pyspark:
query = """
MERGE INTO deltadf t
USING df s
ON s.SLAId_Id = t.SLAId_Id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
con = driver_manager.getConnection(url)
stmt = con.createStatement()
stmt.executeUpdate(query)
stmt.close()
But I'm getting the following error:
SQLException: Malformed SQL Statement: Expected token 'USING' but found Identifier with value 't' instead at position 25.
Any thoughts on where I might be going wrong?

I don't know why you're getting this exact error. However, I believe there are a number of issues with what you are trying to do.
Running the query via JDBC makes it run in SQL Server only. A construct like WHEN MATCHED THEN UPDATE SET * / WHEN NOT MATCHED THEN INSERT * will not work there. Databricks accepts it, but for SQL Server you need to explicitly list the columns to update and the values to insert (reference).
Also, do you actually have tables named deltadf and df in SQL Server? I suppose you have a DataFrame or temporary view named df; that will not work because, as said, this query executes in SQL Server only. If you want to upload data from a DataFrame, use df.write.format("jdbc").save (reference).
See this fiddle: if deltadf and df are tables, running this query in SQL Server (any version) will only complain about Incorrect syntax near '*'.
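If the goal is to upsert the DataFrame's rows into SQL Server, one pattern is to first write the DataFrame to a staging table over JDBC and then run a MERGE with explicit columns on the server. A minimal sketch, assuming a hypothetical staging table dbo.stg_sla, a target table dbo.sla, one extra column SLAName, and credentials embedded in url as in the question:
# Sketch only: dbo.stg_sla, dbo.sla and SLAName are placeholder names.
df.write \
    .format("jdbc") \
    .option("url", url) \
    .option("dbtable", "dbo.stg_sla") \
    .mode("overwrite") \
    .save()

# SQL Server MERGE needs the columns spelled out (and a terminating semicolon).
merge_sql = """
MERGE dbo.sla AS t
USING dbo.stg_sla AS s
    ON s.SLAId_Id = t.SLAId_Id
WHEN MATCHED THEN
    UPDATE SET t.SLAName = s.SLAName
WHEN NOT MATCHED THEN
    INSERT (SLAId_Id, SLAName) VALUES (s.SLAId_Id, s.SLAName);
"""

driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
con = driver_manager.getConnection(url)
stmt = con.createStatement()
stmt.executeUpdate(merge_sql)
stmt.close()
con.close()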

SQLException: Malformed SQL Statement: Expected token 'USING' but found Identifier with value 't' instead at position 25.
You will get this error if you miss updating any specific field or get some other part of the syntax wrong.
I performed the merge operation and it works fine for me without errors. Please follow the references below; a minimal Delta merge is also sketched after them.
Reference:
https://www.youtube.com/watch?v=i5oM2bUyH0o
https://docs.databricks.com/delta/delta-update.html#upsert-into-a-table-using-merge
https://www.sqlshack.com/sql-server-merge-statement-overview-and-examples/
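For comparison, the Delta Lake upsert from those references runs inside Spark itself (not over JDBC), so UPDATE SET * / INSERT * is accepted there. A rough sketch, assuming deltadf is a Delta table and df is a DataFrame in the same session:
# Sketch: register the DataFrame so Spark SQL can see it as the merge source.
df.createOrReplaceTempView("df")

spark.sql("""
    MERGE INTO deltadf AS t
    USING df AS s
        ON s.SLAId_Id = t.SLAId_Id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")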

Related

Databricks SQL nondeterministic expressions using DELETE FROM

I am trying to execute the following SQL clause using Databricks SQL:
DELETE FROM prod_gbs_gpdi.bronze_data.sapex_ap_posted AS HISTORICAL_DATA
WHERE
HISTORICAL_DATA._JOB_SOURCE_FILE = (SELECT MAX(NEW_DATA._JOB_SOURCE_FILE) FROM temp_sapex_posted AS NEW_DATA)
The intention of the query is to delete a set of rows in a historical data table based on a value present in a column of the new data table.
For reasons that I cannot understand, it is raising an error like:
Error in SQL statement: AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate, Window, or Generate, but found:
(HISTORICAL_DATA._JOB_SOURCE_FILE IN (listquery()))
in operator DeleteCommandEdge
It seems it is not accepting a subquery inside the WHERE clause. That's odd to me, as the Databricks documentation (link) says it is acceptable.
I even tried other types of predicates, like:
(SELECT FIRST(NEW_DATA._JOB_SOURCE_FILE) FROM temp_sapex_posted AS NEW_DATA)
(SELECT DISTINCT NEW_DATA._JOB_SOURCE_FILE FROM temp_sapex_posted AS NEW_DATA)
IN (SELECT NEW_DATA._JOB_SOURCE_FILE FROM temp_sapex_posted AS NEW_DATA)
None of them seems to take effect in executing the query successfully.
What's even odder to me is that I was able to accomplish a similar case with a slightly different query, as can be seen in this link.
I have created demo_table1 & demo_table2 for querying purposes and wrote the following query with a similar intent. I haven't used double aliases and have given a straight query using a subquery (it may also depend on the DataFrame in use; I used a normal pandas DataFrame). It works fine for me.
delete from demo_table1 as t1 where t1.age = (select min(t2.age) from demo_table2 as t2);
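If the subquery form is still rejected on your runtime, another common workaround (not part of the answer above, just a pattern worth trying) is to evaluate the subquery first and inline the resulting literal, so the DELETE itself stays deterministic:
# Sketch: table and column names are taken from the question; assumes
# _JOB_SOURCE_FILE holds string values (adjust the quoting otherwise).
max_file = spark.sql(
    "SELECT MAX(_JOB_SOURCE_FILE) AS f FROM temp_sapex_posted"
).collect()[0]["f"]

spark.sql(f"""
    DELETE FROM prod_gbs_gpdi.bronze_data.sapex_ap_posted
    WHERE _JOB_SOURCE_FILE = '{max_file}'
""")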

Table name with spaces in JDBC connection gives error

I'm trying to establish a connection in AWS Glue, using a pyspark script.
The JDBC connection is pointing to a Microsoft SQL Server in Azure Cloud.
When I try to enter the connection string, it works until it gets to the table that it should read. That's mainly because of the whitespace inside the table name. Do you have any hint on how to write the syntax here?
source_df = sparksession.read.format("jdbc").option("url","jdbc:sqlserver://00.000.00.00:1433;databaseName=Sample").option("dbtable", "dbo.122 SampleCompany DE$Contract Header").option("user", "sampleuser").option("password", "sampL3p4ssw0rd").load()
When you execute this, it always throws the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load. : com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '.122'
Do you have any idea how to solve this?
Given the presence of spaces (and probably the dollar sign, and the fact that the identifier starts with numbers), you need to quote the object name. Quoting object names in SQL Server is done by enclosing them in brackets (or, though this may depend on the session config, double quotes).
Keep in mind that dbo is the schema, while 122 SampleCompany DE$Contract Header is the table name. Schema and table name need to be quoted separately, not as a unit.
So, try to pass "dbo.[122 SampleCompany DE$Contract Header]"
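In the pyspark call from the question that would look roughly like this (only the dbtable value changes); alternatively, the query option lets you push a full SELECT instead of a table name:
# The brackets quote only the table identifier; the dbo schema stays outside them.
source_df = (
    sparksession.read.format("jdbc")
    .option("url", "jdbc:sqlserver://00.000.00.00:1433;databaseName=Sample")
    .option("dbtable", "dbo.[122 SampleCompany DE$Contract Header]")
    .option("user", "sampleuser")
    .option("password", "sampL3p4ssw0rd")
    .load()
)

# Alternative: push a query instead of a table name.
# .option("query", "SELECT * FROM dbo.[122 SampleCompany DE$Contract Header]")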

Snowflake Python connector insert doesn't accept variables

Using the Snowflake connector, I am trying to insert a record into a table.
The Snowflake docs show examples with hard-coded strings, but when I try to use my variables instead, it doesn't work. Please suggest how to use variables in this case.
conn.cursor().execute(
    "INSERT INTO cm.crawling_metrics(FEED_DATE,COMP_NAME,REFRESH_TYPE,CRAWL_INPUT,CRAWL_SUCCESS) VALUES " +
    "(score_creation_date,compName,sRefreshType,mp_sku_count,comp_sku_count)"
)
I get the below error
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL compilation error: error line 1 at position 100
invalid identifier 'SCORE_CREATION_DATE'
NOTE: In the above code, if I hard-code strings instead of variables, it works.
Kindly suggest the right way to do this.
You need to use string interpolation / formatting for your code to use these as actual variables:
conn.cursor().execute(
"INSERT INTO cm.crawling_metrics (FEED_DATE, COMP_NAME, REFRESH_TYPE, CRAWL_INPUT, CRAWL_SUCCESS) VALUES " +
f"('{score_creation_date}', '{compName}', '{sRefreshType}', '{mp_sku_count}', '{comp_sku_count}')"
)
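Note that interpolating raw values means you are responsible for quoting and escaping them. The connector also supports bound parameters (%s placeholders with its default pyformat style), which avoids that; a sketch with the same variables:
# Sketch: the connector substitutes the bound values, so no manual quoting is needed.
conn.cursor().execute(
    "INSERT INTO cm.crawling_metrics "
    "(FEED_DATE, COMP_NAME, REFRESH_TYPE, CRAWL_INPUT, CRAWL_SUCCESS) "
    "VALUES (%s, %s, %s, %s, %s)",
    (score_creation_date, compName, sRefreshType, mp_sku_count, comp_sku_count),
)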

What am I missing in trying to pass Variables in an SSIS Execute SQL Task?

I am creating an SSIS Execute SQL Task that will use variables, but it is giving me an error when I try to use it. When I try to run the below it gives me an error, and when I try to build the query it gives me the error SQL Syntax Errors encountered and unable to parse query. I am using an OLEDB connection. Am I not able to use variables to specify the tables?
You can't parameterize a table name.
Use the Expressions editor in your Execute SQL Task to Select a SqlStatementSource property.
Try "SELECT * FROM " + #[User::TableName]
After clicking OK twice (to exit the Task editor), you should be able to reopen the editor and find your table name in the SQL statement.
If the variable might come through as a simple Object, add a string cast in front of it: "SELECT * FROM " + (DT_WSTR,100)@[User::TableName]
You are using only a single parameter (?) in the query but assigning 3 inputs to it, which will not work; map only a single input, assign a variable to it, and change the value of the variable as needed.
The parameter names should start at 0 and increment by 1, because they are the indexes of the "?" placeholders in the query written in the query window.

Rocket Software SQL ODBC, Retrieve Wrong Filenames

I installed Rocket Software for accessing a Unidata DB through SQL Server 2008. The idea is to write SQL procedures to populate SQL tables, but the problem I am getting is that the field names come back wrong, e.g. Select * from MyDb_Members returns field names like Member{Name and Phone{number, while in my Unidata core these fields are named Member Name and Phone Number.
Do you know if there is a way to run SQL queries with those field names without getting SQL query errors? It looks like SQL Server does not like that naming convention:
Select Member{Name from MyDb_Members
Error near '{'
Thanks for your help
Try formatting the query using the NATIVE keyword. I don't use Unidata, but in UniVerse this works well. I have a lot of columns that contain periods, and those are illegal column names in standard SQL.
{ NATIVE "Select * from MyDb_Members" }
