While reading the DataStax docs on supported Spark SQL syntax, I noticed you can use INSERT statements as you normally would:
INSERT INTO hello (someId,name) VALUES (1,"hello")
Testing this in a Spark 2.0 (Python) environment with a connection to a MySQL database throws the error:
File "/home/yawn/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException:
u'\nmismatched input \'someId\' expecting {\'(\', \'SELECT\', \'FROM\', \'VALUES\', \'TABLE\', \'INSERT\', \'MAP\', \'REDUCE\'}(line 1, pos 19)\n\n== SQL ==\nINSERT INTO hello (someId,name) VALUES (1,"hello")\n-------------------^^^\n'
However, if I remove the explicit column list, it works as expected:
INSERT INTO hello VALUES (1,"hello")
Am I missing something?
Spark supports Hive syntax, so if you want to insert a row you can do it as follows:
insert into hello select t.* from (select 1, 'hello') t;
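For example, a minimal PySpark sketch of this workaround; the JDBC view registration, URL, and credentials below are placeholders I've assumed, not details from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical setup: register the MySQL table as a temporary view so that
# Spark SQL statements can target it (URL and credentials are placeholders).
spark.sql("""
    CREATE TEMPORARY VIEW hello
    USING org.apache.spark.sql.jdbc
    OPTIONS (url 'jdbc:mysql://localhost:3306/mydb', dbtable 'hello', user 'user', password 'pass')
""")

# Hive-style INSERT ... SELECT: Spark 2.0's parser rejects an explicit column
# list in INSERT INTO ... VALUES, but accepts this form.
spark.sql("insert into hello select t.* from (select 1, 'hello') t")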
Hi all, I'm new to Dask.
I faced an error when I tried using read_sql_query to get data from an Oracle database.
Here is my Python script:
con_str = "oracle+cx_oracle://{UserID}:{Password}@{Domain}/?service_name={Servicename}"
sql = """
column_a, column_b
from
database.tablename
where
mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')
"""
from sqlalchemy.sql import select, text
from dask.dataframe import read_sql_query
sa_query = select(text(sql))
ddf = read_sql_query(sql=sa_query, con=con_str, index_col="index", head_rows=5)
I referred to this post: Reading an SQL query into a Dask DataFrame
It suggested removing the "select" keyword from the query, which I did.
And I got a cx_Oracle.DatabaseError: missing expression [SQL: SELECT FROM DUAL WHERE ROWNUM <= 5]
But I don't understand where that query came from; it seems dask didn't execute the SQL I provided.
I'm not sure which part I configured incorrectly.
*Note: using pandas.read_sql works fine; it only fails with dask.dataframe.read_sql_query.
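In case it helps, here is a sketch of what I believe dask needs here: a SQLAlchemy Select built from explicit column() objects rather than one opaque text() blob. A bare text() clause carries no column list, which is presumably why dask's sampling probe collapsed to SELECT FROM DUAL WHERE ROWNUM <= 5. The column names are taken from the question; index_col must be one of the selected columns, so column_a below is a guess:

from sqlalchemy import column, select, text
from dask.dataframe import read_sql_query

con_str = "oracle+cx_oracle://{UserID}:{Password}@{Domain}/?service_name={Servicename}"

# Build a Select with named columns so dask can wrap it (e.g. with ROWNUM
# limits for head_rows) and still know which columns come back.
sa_query = (
    select(column("column_a"), column("column_b"))
    .select_from(text("database.tablename"))
    .where(text("mydatetime >= to_date('1997-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')"))
)

# index_col must be one of the selected columns; "column_a" is a guess.
ddf = read_sql_query(sql=sa_query, con=con_str, index_col="column_a", head_rows=5)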
This is a pretty straightforward update statement that works on a SQL Server DB. I have rewritten it in Databricks, where it is not working. Can you provide your suggestions?
update
a
set
composite_account_key=nvl(e.account_key,0)
from
edw.account_fact a
join edw.account_dim b on (a.account_key=b.account_key)
join vw_account_hier c on (b.accountcode=c.accountcode)
join edw.analysis_codes_dim d on (d.anlys_code_dimkey=a.anlys_code_dimkey and c.atomic_anlys_appl_cde=d.anlys_appl_cde)
join vw_composite e on (c.edw_c_account_code=e.edw_c_account_code)
where
a.timekey='95'
ParseException:[PARSE_SYNTAX_ERROR] Syntax error at or near 'from'(line 5, pos 0)
The UPDATE statement syntax in Databricks SQL does not support a FROM clause.
You can create a temporary view from the result of all the join operations and use this view in the update statement directly instead.
The following demonstrates the same on sample tables demo and demo1.
When I try to use a FROM clause directly in the update statement (updating the id value to 10 wherever it is 1 in the join result), I get the same error.
So I created a view over the join first and then used it in the update query to get the result.
%sql
CREATE TEMPORARY VIEW for_updt AS
    (SELECT a.id, a.gname, b.team FROM demo AS a JOIN demo1 AS b ON a.id = b.id);

UPDATE demo SET id = 10 WHERE id IN (SELECT id FROM for_updt) AND demo.id = 1;
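As a side note (this is not from the answer above, just a technique I'd consider): when the new value must come per-row from the joined tables rather than being a constant, Databricks supports MERGE INTO on Delta tables for update-with-join. A rough, untested sketch against the question's tables, assuming edw.account_fact is a Delta table and the join yields one row per account key:

spark.sql("""
    MERGE INTO edw.account_fact a
    USING (
        -- Same join as the original statement, producing the new key per account
        SELECT f.account_key, f.timekey, nvl(e.account_key, 0) AS new_composite_account_key
        FROM edw.account_fact f
        JOIN edw.account_dim b ON f.account_key = b.account_key
        JOIN vw_account_hier c ON b.accountcode = c.accountcode
        JOIN edw.analysis_codes_dim d
          ON d.anlys_code_dimkey = f.anlys_code_dimkey
         AND c.atomic_anlys_appl_cde = d.anlys_appl_cde
        JOIN vw_composite e ON c.edw_c_account_code = e.edw_c_account_code
        WHERE f.timekey = '95'
    ) s
    ON a.account_key = s.account_key AND a.timekey = s.timekey
    WHEN MATCHED THEN UPDATE SET a.composite_account_key = s.new_composite_account_key
""")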
I am getting an error while selecting from a CSV file that contains quoted fields, using the serverless/on-demand SQL pool in Azure Synapse. The data contains the field terminator (,) within fields, but those fields are quoted with double quotes. I have even tried specifying FIELDQUOTE explicitly, even though I am using the default quote character.
My file contains the following data:
"number", "text"
1, "one"
2, "two"
11, "one, one"
12, "one, two"
The SQL I ran is below:
SELECT
*
FROM
OPENROWSET(
BULK 'https://mydatalake.dfs.core.windows.net/data/test_quoted_fields.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIELDQUOTE = '"',
FIELDTERMINATOR = ',',
HEADER_ROW = TRUE
) AS [result]
And the error message is:
Error handling external file: 'Quotes '' must be inside quoted fields at [byte: 10]. '. File/External table name: 'https://mydatalake.dfs.core.windows.net/data/test_quoted_fields.csv'
Please note that I am running the query using Serverless/ OnDemand SQL Pool.
Can someone help please? Thanks
(Screenshots omitted: the data in the ADLS portal, in Edit mode and in Preview mode.)
It appears that editing the .csv in Excel and then uploading it to ADLS adds additional " " around the quoted fields so they are read as strings, and with that it does seem to work well.
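For what it's worth, the error's byte offset supports this reading: byte 10 of the file is the opening quote that follows the space after the first comma in the header row, and parser version 2.0 expects a field quote to immediately follow the field terminator. A version of the file without the padding space after each comma should parse with the same OPENROWSET options (this is my reading of the error, not something confirmed in the thread):

number,text
1,"one"
2,"two"
11,"one, one"
12,"one, two"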
I have a requirement to filter a data frame on the condition that a column value should start with a predefined string.
I am trying the following:
val domainConfigJSON = sqlContext.read
.jdbc(url, "CONFIG", prop)
.select("DID", "CONF", "KEY").filter("key like 'config.*'")
And I am getting this exception:
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException:
You have an error in your SQL syntax; check the manual that
corresponds to your MariaDB server version for the right syntax to use
near 'KEY = 'config.*'' at line 1
Using spark: 1.6.1
You can use the startsWith function from the Column class.
myDataFrame.filter(col("columnName").startswith("PREFIX"))
I used the same function but was getting errors, so I checked what the error was: we actually need to use startsWith(literal: String), but the function above is written with a lowercase startswith().
Ex : df.filter(col("ACCOUNT_NUMBER").startsWith("9"))
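For completeness, a short sketch of the same filter in PySpark, where the method really is spelled lowercase startswith; url and prop are assumed to be defined as in the question (prop being a dict of connection properties in Python):

from pyspark.sql.functions import col

domain_config = (
    sqlContext.read.jdbc(url, "CONFIG", properties=prop)
    .select("DID", "CONF", "KEY")
    # Keep only rows whose KEY value starts with the literal prefix "config.".
    # Note that SQL LIKE patterns use % as the wildcard, not the regex-style
    # .* used in the question's filter string.
    .filter(col("KEY").startswith("config."))
)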
I read the post:
Turning a Comma Separated string into individual rows
And I really like the solution:
SELECT A.OtherID,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
( SELECT OtherID,
CAST ('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM Table1
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
But it did not work when I tried to apply the method in Teradata for a similar question. Here is the summarized error:
SELECT failed 3707: expected something between '.' and the 'value' keyword. So is the code only valid in SQL Server? Could anyone help me make it work in Teradata or SAS SQL? Your help would be really appreciated!
This is SQL Server syntax.
In Teradata there's a table UDF named STRTOK_SPLIT_TO_TABLE,
e.g.
SELECT * FROM dbc.DatabasesV AS db
JOIN
(
SELECT token AS DatabaseName, tokennum
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1, 'dbc,systemfe', ',')
RETURNS (outkey INTEGER,
tokennum INTEGER,
token VARCHAR(128) CHARACTER SET UNICODE)
) AS d
) AS dt
ON db.DatabaseName = dt.DatabaseName
ORDER BY tokennum;
Or see my answer to this similar question