Table name with spaces in JDBC connection gives error - apache-spark

I'm trying to establish a connection in AWS Glue, using a pyspark script.
The JDBC connection is pointing to a Microsoft SQL Server in Azure Cloud.
When I enter the connection string, everything works until it gets to the table it should read. That's mainly because of the whitespace inside the table name. Do you have any hint on how to write the syntax here?
source_df = (sparksession.read.format("jdbc")
    .option("url", "jdbc:sqlserver://00.000.00.00:1433;databaseName=Sample")
    .option("dbtable", "dbo.122 SampleCompany DE$Contract Header")
    .option("user", "sampleuser")
    .option("password", "sampL3p4ssw0rd")
    .load())
When you execute this, it always throws the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load. : com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '.122'
Do you have any idea how to solve this?

Given the presence of spaces (and probably the dollar sign, and the fact that the identifier starts with a number), you need to quote the object name. Quoting object names in SQL Server is done by enclosing them in brackets (or, though this may depend on the session config, double quotes).
Keep in mind that dbo is the schema, while 122 SampleCompany DE$Contract Header is the table name. Schema and table name need to be quoted separately, not as a unit.
So, try to pass "dbo.[122 SampleCompany DE$Contract Header]"
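A minimal sketch of the corrected call, changing only the dbtable option from the snippet in the question:

source_df = (sparksession.read.format("jdbc")
    .option("url", "jdbc:sqlserver://00.000.00.00:1433;databaseName=Sample")
    # the schema stays outside the brackets; only the table name is quoted
    .option("dbtable", "dbo.[122 SampleCompany DE$Contract Header]")
    .option("user", "sampleuser")
    .option("password", "sampL3p4ssw0rd")
    .load())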

Related

Malformed SQL Statement: Expected token 'USING' but found Identifier with value 't' instead

I am trying to merge into a SQL database using the following code in Databricks with pyspark:
query = """
MERGE INTO deltadf t
USING df s
ON s.SLAId_Id = t.SLAId_Id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""
# get a raw JDBC connection through the Spark JVM gateway
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
con = driver_manager.getConnection(url)
stmt = con.createStatement()
stmt.executeUpdate(query)
stmt.close()
But I'm getting the following error:
SQLException: Malformed SQL Statement: Expected token 'USING' but found Identifier with value 't' instead at position 25.
Any thoughts on where I might be going wrong?
I don't know why you're getting this exact error. However, I believe there are a number of issues with what you are trying to do.
Running the query via JDBC makes it run in SQL Server only. Constructs like WHEN MATCHED THEN UPDATE SET * / WHEN NOT MATCHED THEN INSERT * will not work there: Databricks accepts them, but for SQL Server you need to explicitly list the columns to update and the values to insert (reference).
Also, do you actually have tables named deltadf and df in SQL Server? I suppose you have a DataFrame or temporary view named df; that will not work, because, as said, this query executes in SQL Server only. If you want to upload data from a DataFrame, use df.write.format("jdbc").save (reference); a sketch of the combined approach follows below.
See this Fiddle: if deltadf and df are tables, running this query in SQL Server (any version) only complains about Incorrect syntax near '*'.
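A minimal sketch of that combined approach, assuming a staging table named dbo.df_stage, that url embeds the credentials, and that col1/col2 stand in for the real columns (only SLAId_Id comes from the question):

# 1. Upload the DataFrame to a staging table in SQL Server
df.write.format("jdbc") \
    .option("url", url) \
    .option("dbtable", "dbo.df_stage") \
    .mode("overwrite") \
    .save()

# 2. Merge with explicit column lists, since T-SQL has no UPDATE SET * / INSERT *
query = """
MERGE INTO deltadf AS t
USING dbo.df_stage AS s
ON s.SLAId_Id = t.SLAId_Id
WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1, t.col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (SLAId_Id, col1, col2) VALUES (s.SLAId_Id, s.col1, s.col2);
"""
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
con = driver_manager.getConnection(url)
stmt = con.createStatement()
stmt.executeUpdate(query)
stmt.close()
con.close()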
SQLException: Malformed SQL Statement: Expected token 'USING' but found Identifier with value 't' instead at position 25.
If you miss updating any specific field or get the syntax wrong, you will get this error. I performed the merge operation and it works fine for me without errors. Please follow the references below.
Reference:
https://www.youtube.com/watch?v=i5oM2bUyH0o
https://docs.databricks.com/delta/delta-update.html#upsert-into-a-table-using-merge
https://www.sqlshack.com/sql-server-merge-statement-overview-and-examples/

Snowflake Python connector insert doesn't accept variables

Using the Snowflake connector, I am trying to insert a record into a table.
The Snowflake docs show examples with hard-coded strings, but when I try to use my variables instead, it doesn't work. Please suggest how to use variables in this case.
conn.cursor().execute(
    "INSERT INTO cm.crawling_metrics(FEED_DATE,COMP_NAME,REFRESH_TYPE,CRAWL_INPUT,CRAWL_SUCCESS) VALUES " +
    "(score_creation_date,compName,sRefreshType,mp_sku_count,comp_sku_count)"
)
I get the below error
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL compilation error: error line 1 at position 100
invalid identifier 'SCORE_CREATION_DATE'
NOTE: In the above code, if I hard-code strings instead of variables, it works.
Kindly suggest what the right way is.
You need to use string interpolation / formatting for your code to use these as actual variables:
conn.cursor().execute(
    "INSERT INTO cm.crawling_metrics (FEED_DATE, COMP_NAME, REFRESH_TYPE, CRAWL_INPUT, CRAWL_SUCCESS) VALUES " +
    f"('{score_creation_date}', '{compName}', '{sRefreshType}', '{mp_sku_count}', '{comp_sku_count}')"
)
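As a side note, the connector also supports bind variables (its default paramstyle is pyformat), which sidesteps quoting issues and SQL injection; a sketch using the same variable names:

conn.cursor().execute(
    "INSERT INTO cm.crawling_metrics (FEED_DATE, COMP_NAME, REFRESH_TYPE, CRAWL_INPUT, CRAWL_SUCCESS) "
    "VALUES (%s, %s, %s, %s, %s)",
    (score_creation_date, compName, sRefreshType, mp_sku_count, comp_sku_count)
)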

Can we run complex multi line SQL queries using Blueprism?

I am new to the SQL stuff in Blueprism. I am able to configure a SQL object and execute simple queries, but I am having trouble trying to run multi-line complex SQL queries.
When I try to execute the below query in Blueprism, I get an error message saying "Incorrect Syntax near Database2":
"select top 10 * from [Database1].[dbo].[Table1]
join [Database1].[dbo].[Table2] on [Database1].[dbo].[Table2].Fieldname1=[Database1].[dbo].[Table1].Fieldname2
join [Database2].[dbo].[Table1] on [Database2].[dbo].[Table1].Fieldname1=[Database1].[dbo].[Table2].Fieldname2"
Can somebody please tell me what is wrong with the above query?
I found the answer myself: there should not be any additional whitespace characters in the query; the entire query should be on one continuous line. The beauty of Blueprism is that it can execute complex queries of any level without constraints, but you need to adjust the syntax accordingly. Field and table names should always be written in the format [databasename].[dbo].[tablename].[fieldname].
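For example, the query from the question collapsed onto a single continuous line:

select top 10 * from [Database1].[dbo].[Table1] join [Database1].[dbo].[Table2] on [Database1].[dbo].[Table2].Fieldname1=[Database1].[dbo].[Table1].Fieldname2 join [Database2].[dbo].[Table1] on [Database2].[dbo].[Table1].Fieldname1=[Database1].[dbo].[Table2].Fieldname2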

What am I missing in trying to pass Variables in an SSIS Execute SQL Task?

I am creating an SSIS Execute SQL Task that will use variables, but it gives me an error when I try to use it. When I try to run the below, it gives me an error, and when I try to build the query, it gives me the error "SQL Syntax Errors encountered and unable to parse query". I am using an OLE DB connection. Am I not able to use variables to specify the tables?
You can't parameterize a table name.
Use the Expressions editor in your Execute SQL Task to Select a SqlStatementSource property.
Try "SELECT * FROM " + #[User::TableName]
After clicking OK twice (to exit the Task editor), you should be able to reopen the editor and find your table name in the SQL statement.
If the variable might be an Object rather than a string, add a string cast: (DT_WSTR, 100).
You are using only a single parameter marker (?) in the query but assigning three inputs to it, which will not work. Supply only a single input per marker and assign a variable as that input, changing the value of the variable as needed.
Parameter names should start at 0 and increment by 1, because they are the indexes of the "?" markers written in the query window.

ERROR loading files into HBase on Azure with ImportTsv

I am trying to load a tsv file into HBase running in HDInsight in the Microsoft Azure cloud, using a recommended approach: connecting through Remote Desktop and running on the command line. I am trying to load the file t1.tsv (with two tab-separated columns) from HDFS into the HBase table t1:
C:\apps\dist\hbase-0.98.0.2.1.5.0-2057-hadoop2\bin>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,num t1 t1.tsv
and get:
ERROR: One or more columns in addition to the row key and timestamp(optional) are required
Usage: importtsv -Dimporttsv.columns=a,b,c
Replacing the order of the specified columns with num,HBASE_ROW_KEY:
C:\apps\dist\hbase-0.98.0.2.1.5.0-2057-hadoop2\bin>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=num,HBASE_ROW_KEY t1 t1.tsv
I get:
ERROR: Must specify exactly one column as HBASE_ROW_KEY
Usage: importtsv -Dimporttsv.columns=a,b,c
This tells me that either the comma separator in the column list is not recognized or the column name is incorrect. I also tried to use the column with a qualifier, as num:v, and as 'num'; nothing helps.
Any ideas what could be wrong here? Thanks.
>hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,d:c1,d:c2" testtable /example/inputfile.txt
This works for me. I think there are some differences between terminals on Linux and Windows; on Windows you need to add quotation marks to clarify that this is a single value string, otherwise it might not be recognized.
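Applied to the command from the question, that would be (keeping the asker's num column; if a bare family name is rejected, the family:qualifier form such as num:v would be needed, as in the d:c1 example above):

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,num" t1 t1.tsv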