Copying data in and out of Snowflake via Azure Blob Storage

I'm trying to copy into blob storage and then copy out of blob storage. The copy into works:
copy into 'azure://my_blob_url.blob.core.windows.net/some_folder/MyTable'
from (select *
from MyTable
where condition = 'true')
credentials = (azure_sas_token = 'my_token');
But the copy out fails:
copy into MyTable
from 'azure://my_blob_url.blob.core.windows.net/some_folder/MyTable'
credentials = (azure_sas_token = 'my_token');
the error is:
SQL Compilation error: Function 'EXTRACT' not supported within a COPY.
Weirdly enough, it worked once and hasn't worked since. I'm at a loss; searching turns up no details on this error.
I know there's an approach I could take using stages, but I don't want to for a bunch of reasons and even when I try with stages the same error presents itself.
Edit:
The cluster key definition is:
cluster by (idLocal, year(_ts), month(_ts), substring(idGlobal, 0, 1));
where idLocal and idGlobal are VARCHARs and _ts is a TIMESTAMP_TZ

I think I've seen this before with a cluster key on the table (which I don't think is supported with COPY INTO): the YEAR and MONTH functions in the CLUSTER BY compile down to the EXTRACT function shown in the error.
This is a bit of a hunch, but assuming this isn't occurring for all your tables, hoping it leads to investigation on the table configuration and perhaps that might help.

Alex, can you try a different function in the cluster key on your target table, like date_trunc('day', _ts)?
thanks
Chris
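
A hedged sketch of what that re-cluster might look like, assuming the goal is to replace the YEAR/MONTH expressions (which compile to EXTRACT) with DATE_TRUNC, using the table and columns from the question:

```sql
-- Sketch: DATE_TRUNC in place of YEAR/MONTH, which compile to EXTRACT
ALTER TABLE MyTable CLUSTER BY (idLocal, date_trunc('day', _ts), substring(idGlobal, 0, 1));
```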

Related

Databricks auto merge schema

Does anyone know how to resolve this error?
I have put the following before my merge, but it seems to not like it.
%sql set spark.databricks.delta.schema.autoMerge.enabled = true
Also, the reason for putting this in was that my notebook was failing on schema changes to a Delta Lake table. I have an additional column on one of the tables I am loading into. I thought Databricks was able to auto-merge schema changes.
The code works fine in my environment. I'm using Databricks runtime 10.4
TL;DR: add a semicolon to the end of the separate SQL statements:
set spark.databricks.delta.schema.autoMerge.enabled = true;
The error is actually a more generic SQL error; the IllegalArgumentException is a clue - though not a very helpful one :)
I was able to reproduce your error:
set spark.databricks.delta.schema.autoMerge.enabled = true
INSERT INTO records SELECT * FROM students
gives: Error in SQL statement: IllegalArgumentException: spark.databricks.delta.schema.autoMerge.enabled should be boolean, but was true
and was able to fix it by adding a ; to the end of the first line:
set spark.databricks.delta.schema.autoMerge.enabled = true;
INSERT INTO records SELECT * FROM students
succeeds.
Alternatively you could run the set in a different cell.

AnalysisException: Operation not allowed: `CREATE TABLE LIKE` is not supported for Delta tables;

create table if not exists map_table like position_map_view;
Running this gives me an "operation not allowed" error.
As pointed out in the documentation, you need to use CREATE TABLE AS; just add LIMIT 0 to the SELECT:
create table map_table as select * from position_map_view limit 0;
I didn't find an easy way of getting CREATE TABLE LIKE to work, but I've got a workaround. On DBR in Databricks you should be able to use SHALLOW CLONE to do something similar:
%sql
CREATE OR REPLACE TABLE $new_database.$new_table
SHALLOW CLONE $original_database.$original_table;
You'll need to replace the $ placeholders manually.
Notes:
This has an added side-effect of preserving the table content in case you need it.
Ironically, creating an empty table is much harder and involves manipulating the SHOW CREATE TABLE output with custom code
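
If an empty copy is what's wanted, a hedged sketch under the assumption that the source is a table (SHALLOW CLONE works on tables, not views, so position_map_view itself would first need materializing; names here are hypothetical):

```sql
-- Hypothetical names: clone the schema (and data), then empty the clone
CREATE OR REPLACE TABLE map_table SHALLOW CLONE position_map;
DELETE FROM map_table;  -- the schema survives, the rows are gone
```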

Azure Data Factory - Azure SQL Managed Services incorrect Output column type

I have decided to try and use Azure Data Factory to replicate data from one SQL Managed Instance Database to another with some trimming of the data in the process.
I have set up two Datasets, one for each Database/Table, and imported the schemas OK (the tables are duplicated, so identical). I created a dataflow with one as the source, updated the schema in the projection, added a simple AlterRow (column != 201), and gave it the PK. Then I add the second dataset as the sink, and for some reason all the output columns in the mapping show as 'string', while the input columns show correctly.
Because of this the mapping fails, as it thinks the input and output don't match. I can't understand why both schemas in the datasets show correctly, and the projection in the dataflow for the source shows correctly, but it thinks I am outputting to all-string columns.
TIA
Here is an easy way to map a set of unknown incoming fields to a defined database table schema: add a Select transformation before your Sink, and paste this into the script behind the Select:
select(mapColumn(
each(match(true()))
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> automap
Now, in your Sink, just leave schema drift and automapping on.

How to troubleshoot - Azure DataFactory - Copy Data Destination tables have not been properly configured

I'm setting up a SQL Azure Copy Data job using Data Factory. For my source I'm selecting the exact data that I want. For my destination I'm selecting "use stored procedure". I cannot move forward from the table mapping page, as it reports "one or more destination tables have not been properly configured". From what I can tell, everything looks good, as I can manually run the stored procedure from SQL without an issue.
I'm looking for troubleshooting advice on how to solve this problem as the portal doesn't appear to provide any more data then the error itself.
Additional but unrelated question: What is the benefit from me doing a copy job in data factory vs just having data factory call a stored procedure?
I've tried executing the stored procedure via SQL. I discovered one problem with that: I had LastUpdatedDate in the table type, but it isn't actually an input value. After fixing that, I'm able to execute the SP without issue.
Select Data from Source
SELECT
p.EmployeeNumber,
p.EmailName
FROM PersonFeed AS p
Create table Type
CREATE TYPE [person].[PersonSummaryType] AS TABLE(
[EmployeeNumber] [int] NOT NULL,
[EmailName] [nvarchar](30) NULL
)
Create UserDefined Stored procedure
CREATE PROCEDURE spOverwritePersonSummary @PersonSummary [person].[PersonSummaryType] READONLY
AS
BEGIN
MERGE [person].[PersonSummary] [target]
USING @PersonSummary [source]
ON [target].EmployeeNumber = [source].EmployeeNumber
WHEN MATCHED THEN UPDATE SET
[target].EmployeeNumber = [source].EmployeeNumber,
[target].EmailName = [source].EmailName,
[target].LastUpdatedDate = GETUTCDATE()
WHEN NOT MATCHED THEN INSERT (
EmployeeNumber,
EmailName,
LastUpdatedDate)
VALUES(
[source].EmployeeNumber,
[source].EmailName,
GETUTCDATE());
END
The Data Factory UI, when setting the destination to the stored procedure, reports "one or more destination tables have not been properly configured".
I believe the UI is broken when using Copy Data. I was able to map directly to a table to get the copy job created, then manually edit the JSON, and everything worked fine. Perhaps the UI is new, and that explains why all the support docs refer only to the JSON? After playing with this more, it looks like the UI sees the table type as schema.type but drops the schema for some reason. A simple edit in the JSON file corrects it.
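
For reference, a hedged sketch of what the corrected sink section of the copy activity JSON might look like, using the names from the question; the key point is keeping the schema on the table type rather than letting the UI drop it:

```json
"sink": {
    "type": "SqlSink",
    "sqlWriterStoredProcedureName": "spOverwritePersonSummary",
    "sqlWriterTableType": "[person].[PersonSummaryType]"
}
```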

memsql does not support temporary table or table variable?

Tried to create temp table in Memsql:
Create temporary table ppl_in_grp as
select pid from h_groupings where dt= '2014-10-05' and location = 'Seattle'
Got this error: Feature 'TEMPORARY tables' is not supported by MemSQL.
Is there any equivalence I can use instead? Thanks!
temp tables are definitely on the roadmap. For now, with MemSQL 4 you can create a regular table and clean it up at the end of your session, or use subqueries.
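
A sketch of that workaround using the query from the question; note the DROP at the end of the session is manual, not automatic:

```sql
-- A regular table standing in for a temp table
CREATE TABLE ppl_in_grp AS
SELECT pid
FROM h_groupings
WHERE dt = '2014-10-05' AND location = 'Seattle';

-- ... use ppl_in_grp ...

-- Clean up manually at the end of the session
DROP TABLE ppl_in_grp;
```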
