I created a table like this and inserted some data:
spark.sql(s"create table if not exists test " +
"(key String," +
"name String," +
"address String," +
"inserted_at TIMESTAMP) " +
s" using delta LOCATION 's3://test/user/'")
I can view the table via
spark.table("test").show()
But when I do
DeltaTable.forPath(spark,"s3://test/user/" ).toDF.show(false)
I cannot see the data. But when i try this method
DeltaTable.isDeltaTable("s3://test/user/")
it is true. Can anyone please explain what I am missing?
Further, when I want to do a merge operation, I am getting this error.
[error] !
[error] java.lang.UnsupportedOperationException: null (DeltaTable.scala:639)
[error] io.delta.tables.DeltaTable$.forPath(DeltaTable.scala:639)
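A minimal diagnostic sketch (not an answer, and using only the table and path names from the question) that compares what the metastore entry for test points at with what a direct read of the S3 path returns:

import io.delta.tables.DeltaTable

// Format, location and file count recorded for the metastore table "test"
spark.sql("DESCRIBE DETAIL test").select("format", "location", "numFiles").show(false)

// Read the same path directly, via the DataFrame reader and via DeltaTable,
// so both results can be compared with spark.table("test").show()
spark.read.format("delta").load("s3://test/user/").show(false)
DeltaTable.forPath(spark, "s3://test/user/").toDF.show(false)

If the location reported by DESCRIBE DETAIL does not match the path passed to forPath, that would explain why the two reads return different results.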
The following code on an Azure Synapse serverless SQL pool gives this error:
Incorrect syntax near 'DISTRIBUTION'.
SELECT CM.EntityName,
--Before the first column of each table, construct a DROP TABLE statement if already exist
CASE WHEN CM.OrdinalPosition = 1
THEN
'DROP EXTERNAL TABLE MyTable' + '.' +
QUOTENAME(@EnrichedViewSchema) + '.' + CM.EntityName + '
CREATE TABLE MyTable' + '.' +
QUOTENAME(@EnrichedViewSchema) + '.' + CM.EntityName + '
WITH
(
DISTRIBUTION = ROUND_ROBIN
);
AS
SELECT DISTINCT '
ELSE ' ,'
END
Can someone look at the code and let me know where I might be going wrong?
Azure Synapse SQL Server Pool Error: Incorrect syntax near 'DISTRIBUTION'
CREATE TABLE MyTable' + '.' +
QUOTENAME(@EnrichedViewSchema) + '.' +
CM.EntityName + '
WITH
(
DISTRIBUTION = ROUND_ROBIN
)
A serverless SQL pool is used to query over the data lake; we cannot create regular tables in it. We can create only external tables and temporary tables in a serverless SQL pool.
Also, DISTRIBUTION is applicable only to dedicated SQL pool tables.
Therefore, the above SQL script is not possible.
Reference: Design tables using Synapse SQL - Azure Synapse Analytics | Microsoft Learn
There is an additional semicolon before AS in your script.
Wrong:
CREATE TABLE XXX WITH(DISTRIBUTION=ROUND_ROBIN); AS SELECT
Correct:
CREATE TABLE XXX WITH(DISTRIBUTION=ROUND_ROBIN) AS SELECT
I have two tables, one with 50K records and the other with 2.5K records, and I want to update table one with these 2.5K records. Currently I do this using an INSERT OVERWRITE statement in a Spark MapR cluster, and I want to do the same in Azure Databricks, where I created the two tables, read the data from on-prem servers into Azure, and then ran an INSERT OVERWRITE statement. But when I do this, my previous/history data is completely replaced with the new data.
In the MapR cluster:
src_df_name.write.mode("overwrite").format("hive").saveAsTable(s"cs_hen_mbr_stg") //stage table with 2.5K records.
spark.sql(s"INSERT OVERWRITE TABLE cs_hen_mbr_hist " +
s"SELECT NAMED_STRUCT('INID',stg.INID,'SEG_NBR',stg.SEG_NBR,'SRC_ID',stg.SRC_ID, "+
s"'SYS_RULE',stg.SYS_RULE,'MSG_ID',stg.MSG_ID, " +
s"'TRE_KEY',stg.TRE_KEY,'PRO_KEY',stg.PRO_KEY, " +
s"'INS_DATE',stg.INS_DATE,'UPDATE_DATE',stg.UPDATE_DATE,'STATUS_KEY',stg.STATUS_KEY) AS key, "+
s"stg.MEM_KEY,stg.INDV_ID,stg.MBR_ID,stg.SEGO_ID,stg.EMAIL, " +
s"from cs_hen_mbr_stg stg" )
By doing the above in the MapR cluster I was able to update the values, but when I try the same in Azure Databricks my history data gets lost.
In Databricks:
val VW_HISTORY_MAIN=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/VW_HISTORY")
VW_HISTORY_MAIN.write.mode("overwrite").format("hive").saveAsTable(s"demo.cs_hen_mbr_stg") //writing this to table in databricks.
spark.sql(s"INSERT OVERWRITE TABLE cs_hen_mbr_hist " +
s"SELECT NAMED_STRUCT('INID',stg.INID,'SEG_NBR',stg.SEG_NBR,'SRC_ID',stg.SRC_ID, "+
s"'SYS_RULE',stg.SYS_RULE,'MSG_ID',stg.MSG_ID, " +
s"'TRE_KEY',stg.TRE_KEY,'PRO_KEY',stg.PRO_KEY, " +
s"'INS_DATE',stg.INS_DATE,'UPDATE_DATE',stg.UPDATE_DATE,'STATUS_KEY',stg.STATUS_KEY) AS key, "+
s"stg.MEM_KEY,stg.INDV_ID,stg.MBR_ID,stg.SEGO_ID,stg.EMAIL, " +
s"from cs_hen_mbr_stg stg" )
Why is it not working in Databricks?
I am trying to connect to DB2/IDAA using ADFv2. While executing a simple query, "select * from table", I get the error below:
Operation on target Copy data from IDAA failed: An error has occurred on the Source side. 'Type = Microsoft.HostIntegration.DrdaClient.DrdaException, Message = Exception of type 'Microsoft.HostIntegration.Drda.Common.DrdaException' was thrown. SQLSTATE = HY000 SQLCODE = -343, Source = Microsoft.HostIntegration.Drda.Requester, '
I have checked a lot and tried various options, but it is still an issue.
I tried the query "select * from table with ur" to run it as read-only, but I still get the above result.
If I use a query like "select * from table; commit;", the activity succeeds but no records are fetched.
Does anyone have a solution?
My linked service is set up with the additional connection properties value: SET CURRENT QUERY ACCELERATION = ALL
I have a SQL watermark table which contains the last date in my destination table
My source data is coming from an Azure Storage Table and the date time is a string
I set up the date time in the watermark table to match the format in the Azure table storage
I create a lookup and a copy task
If I hard-code the date into the source query and run it, this works fine: CreatedAt ge '2019-03-06T14:03:11.000Z'
But obviously I don't want to hard-code this value; I want to use the date from the lookup.
But when I replace the hardcoded date with the lookup value
CreatedAt ge 'activity('LookupWatermarkOld').output'
I get an error
{
"errorCode": "2200",
"message":"ErrorCode=FailedStorageOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
storage operation failed with the following error 'The remote server returned an error: (400) Bad Request.'.,Source=,
''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=Syntax
error at position 42 in 'CreatedAt ge 'activity('LookupWatermarkOld').output''.\nRequestId:8c65ced9-b002-0051-79d9-d41d49000000\nTime:2019-03-07T11:35:39.0640233Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
"failureType": "UserError",
"target": "CopyMentions"
}
Can anyone help me with this? How do you use the Lookup value in an Azure Table query?
Check this out:
1) Lookup activity. Query field:
SELECT MAX(WatermarkColumnName) as LastId FROM TableName;
Also, make sure you checked the "First row only" option.
2) In the Copy Data activity, use a query. Query field:
@concat('SELECT * FROM TableName as s WHERE s.WatermarkColumnName > ''', activity('LookupActivity').output.firstRow.LastID, '''')
Finally I got some help on this and it works with
CreatedAt gt '@{activity('LookupWatermarkOld').output.firstRow.WaterMarkValue}'
WaterMarkValue is the column name from the SQL lookup table.
The Lookup creates an array, so you have to specify FirstRow from that array.
And wrap it in '' so it is used as a string value.
For recent ADFv2:
Use the watermark/lookup output value in a parameter, for example:
ParamUserCount = @{activity('LookupActivity').output.count}
or via the corresponding output function. You can then use it in the query, for example:
"select * from userDetails where usercount = {$ParamUserCount}"
Make sure you enclose the query in " " to set it as a string, and that the parameter in the query is enclosed in { }.
I am creating a DataFrame and registering it as a temp view using df.createOrReplaceTempView('mytable'). After that I try to write the content of 'mytable' into a Hive table (which has a partition) using the following query:
insert overwrite table
myhivedb.myhivetable
partition(testdate) -- (1) Note: I have a partition named 'testdate'
select
Field1,
Field2,
...
TestDate -- (2) Note: I have a field named 'TestDate'; both (1) and (2) have the same name
from
mytable
When I execute this query, I get the following error:
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.Table$ValidationFailureSemanticException: Partition spec
{testdate=, TestDate=2013-01-01}
It looks like I am getting this error because of the identical names, i.e. testdate (the partition in Hive) and TestDate (the field in the temp view 'mytable').
Whereas if my partition name testdate is different from the field name (i.e. TestDate), the query executes successfully. Example:
insert overwrite table
myhivedb.myhivetable
partition(my_partition) -- Note: here the partition name is not 'testdate'
select
Field1,
Field2,
...
TestDate
from
mytable
My guess is that this is a bug in Spark, but I would like a second opinion. Am I missing something here?
@DuduMarkovitz @dhee: apologies for the late response. I was finally able to resolve the issue. Earlier I was creating the table using camelCase (in the CREATE statement), which seems to be the reason for the exception. Now I have created the table using a DDL where the field names are in lower case, and this has resolved my issue.
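For reference, a minimal sketch of the working pattern described above, with hypothetical column names and assuming Hive support and Parquet storage; the point is that the field and partition names in the DDL are lower case:

// Hive table DDL with lower-case field and partition names (hypothetical columns)
spark.sql("""
  CREATE TABLE IF NOT EXISTS myhivedb.myhivetable (
    field1 STRING,
    field2 STRING
  )
  PARTITIONED BY (testdate STRING)
  STORED AS PARQUET
""")

// Dynamic partition inserts into Hive tables usually need these settings
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// df is the DataFrame from the question; the last selected column feeds the dynamic partition
df.createOrReplaceTempView("mytable")
spark.sql("""
  INSERT OVERWRITE TABLE myhivedb.myhivetable
  PARTITION (testdate)
  SELECT Field1, Field2, TestDate
  FROM mytable
""")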