Databricks Drop Rollback

Is there a way to undo a DROP TABLE statement in Databricks? I know that for DELETE there is the time travel/restore option, but I am specifically looking at the DROP statement. Please help.

DROP TABLE removes the data only for a managed table, i.e. one created without an explicit location. To prevent the data from being dropped, create the table as unmanaged: even if you drop the table, only the table definition is removed, not the data, so you can always re-create the table on top of that data (this isn't limited to Delta; you can use other formats as well).
For SQL, specify the path to the data using LOCATION:
CREATE TABLE name
USING delta
LOCATION '<path-to-data>'
When using the APIs (Scala/Python/R/Java), provide the path option:
df.write.format("delta") \
.option("path", "path-to-data") \
.saveAsTable("table-name")
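For example, you can check whether an existing table is managed or external before dropping it, and re-register an external table afterwards. A minimal sketch, using a hypothetical table name and path:
-- "Type" in the output shows MANAGED or EXTERNAL
DESCRIBE TABLE EXTENDED events;
-- for an external table this removes only the table definition; the files stay in place
DROP TABLE events;
-- re-create the table on top of the existing data
CREATE TABLE events
USING delta
LOCATION '/mnt/datalake/events';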

Related

AnalysisException: Operation not allowed: `CREATE TABLE LIKE` is not supported for Delta tables;

create table if not exists map_table like position_map_view;
While using this, it gives me an 'operation not allowed' error.
As pointed out in the documentation, you need to use CREATE TABLE AS; just use LIMIT 0 in the SELECT:
create table map_table as select * from position_map_view limit 0;
I didn't find an easy way of getting CREATE TABLE LIKE to work, but I've got a workaround. On Databricks Runtime (DBR) you should be able to use SHALLOW CLONE to do something similar:
%sql
CREATE OR REPLACE TABLE $new_database.$new_table
SHALLOW CLONE $original_database.$original_table;
You'll need to replace the $ placeholders manually.
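For illustration, here's a filled-in version with hypothetical database and table names (the source needs to be a Delta table):
%sql
CREATE OR REPLACE TABLE reporting.position_map_copy
SHALLOW CLONE prod.position_map;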
Notes:
This has the added side effect of preserving the table content, in case you need it.
Ironically, creating an empty table is much harder and involves manipulating the output of SHOW CREATE TABLE with custom code.

Azure Data Factory - Azure SQL Managed Instance incorrect output column type

I have decided to try and use Azure Data Factory to replicate data from one SQL Managed Instance Database to another with some trimming of the data in the process.
I have set up two Datasets, one for each Database/Table, and imported the schema OK (these are duplicated, so identical). I created a data flow with one dataset as the source and updated the schema in the projection, added a simple AlterRow (column != 201), and gave it the PK. Then I add the second dataset as the sink, and for some reason in the mapping all the output columns are showing as 'string', even though the input columns show correctly.
Because of this the mapping fails, as it thinks the input and output don't match. I can't understand why both schemas in the datasets show correctly, and the projection in the data flow for the source shows correctly, yet it thinks I am outputting to all-string columns?
TIA
Here is an easy way to map a set of unknown incoming fields to a defined database table schema: add a Select transformation before your Sink, and paste this into the script behind the Select:
select(mapColumn(
each(match(true()))
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> automap
Now, in your Sink, just leave schema drift and automapping on.

How to troubleshoot - Azure DataFactory - Copy Data Destination tables have not been properly configured

I'm setting up a SQL Azure Copy Data job using Data Factory. For my source I'm selecting the exact data that I want. For my destination I'm selecting 'use stored procedure'. I cannot move forward from the table mapping page, as it reports 'one or more destination tables have not been properly configured'. From what I can tell, everything looks good, as I can manually run the stored procedure from SQL without an issue.
I'm looking for troubleshooting advice on how to solve this problem, as the portal doesn't appear to provide any more data than the error itself.
Additional but unrelated question: what is the benefit of doing a copy job in Data Factory vs. just having Data Factory call a stored procedure?
I've tried executing the stored procedure via SQL. I discovered one problem with that, as I had LastUpdatedDate in the table type but it isn't actually an input value. After fixing that, I'm able to execute the SP without issue.
Select Data from Source
SELECT
p.EmployeeNumber,
p.EmailName
FROM PersonFeed AS p
Create table Type
CREATE TYPE [person].[PersonSummaryType] AS TABLE(
[EmployeeNumber] [int] NOT NULL,
[EmailName] [nvarchar](30) NULL
)
Create user-defined stored procedure
CREATE PROCEDURE spOverwritePersonSummary @PersonSummary [person].[PersonSummaryType] READONLY
AS
BEGIN
MERGE [person].[PersonSummary] [target]
USING @PersonSummary [source]
ON [target].EmployeeNumber = [source].EmployeeNumber
WHEN MATCHED THEN UPDATE SET
[target].EmployeeNumber = [source].EmployeeNumber,
[target].EmailName = [source].EmailName,
[target].LastUpdatedDate = GETUTCDATE()
WHEN NOT MATCHED THEN INSERT (
EmployeeNumber,
EmailName,
LastUpdatedDate)
VALUES(
[source].EmployeeNumber,
[source].EmailName,
GETUTCDATE());
END
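For reference, this is roughly how I exercise the procedure directly from SQL (sample values are made up):
DECLARE @Summary [person].[PersonSummaryType];
INSERT INTO @Summary (EmployeeNumber, EmailName)
VALUES (1001, N'jdoe'), (1002, N'asmith');
EXEC spOverwritePersonSummary @PersonSummary = @Summary;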
The Data Factory UI, when setting the destination to the stored procedure, reports "one or more destination tables have not been properly configured".
I believe the UI is broken when using Copy Data. I was able to map directly to a table to get the copy job created, then manually edit the JSON, and everything worked fine. Perhaps the UI is new, and that explains why all the support docs refer only to the JSON? After playing with this more, it looks like the UI sees the table type as schema.type, but it drops the schema for some reason. A simple edit in the JSON file corrects it.

What is the difference between dynamic.partition=True and dynamic.partition.mode = nonstrict?

Spark 2.0 - pyspark
I've seen the following two properties paired. What is the difference between them?
hive> SET hive.exec.dynamic.partition=true;
hive> SET hive.exec.dynamic.partition.mode=nonstrict;
I know what the outcome is when they are used - you can use dynamic partitioning to load/create multiple partitions, but I don't know the difference between these two similar commands.
When I was running this code
input_field_names=['id','code','num']
df \
.select(input_field_names) \
.write \
.mode('append')\
.insertInto('test_insert_into_partition')
I got an error message that says Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
Using spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict") the code works. It doesn't require me to use the other one.
Why don't I need to set hive.exec.dynamic.partition=true, and what else should I know to choose which one to use?
Although there is much to google, here is a short answer.
If you want to insert dynamically into Hive partitions, both values need to be set, and you can then load many partitions in one go:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
create table tblename (h string, m string, mv double, country string) partitioned by (starttime string) location '/.../...';
INSERT overwrite table tblename PARTITION(starttime) SELECT h, m, mv, country, starttime from tblename2;
Otherwise you need to do it like this, setting the partition column value yourself explicitly:
INSERT into table tblename PARTITION(starttime='2017-08-09') SELECT h, m, mv, country from tblename2 where to_date(starttime)='2017-08-09';
The purpose of the default value of 'strict' for hive.exec.dynamic.partition.mode is to prevent a user from accidentally overwriting all the partitions, i.e. to avoid data loss.
So it is not so much a difference as a matter of caution, a bit like the safety catch on a firearm, as it were.
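Putting that together for the table from the question, a minimal sketch, assuming a hypothetical partition column dt on test_insert_into_partition and a hypothetical source view src_view (the partition column must come last in the SELECT):
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE test_insert_into_partition PARTITION(dt)
SELECT id, code, num, dt FROM src_view;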

memsql does not support temporary table or table variable?

Tried to create a temp table in MemSQL:
Create temporary table ppl_in_grp as
select pid from h_groupings where dt= '2014-10-05' and location = 'Seattle'
Got this error: Feature 'TEMPORARY tables' is not supported by MemSQL.
Is there any equivalence I can use instead? Thanks!
Temp tables are definitely on the roadmap. For now, with MemSQL 4, you can create a regular table and clean it up at the end of your session, or use subqueries.
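For example, a sketch of that workaround, assuming your MemSQL version supports CREATE TABLE ... AS SELECT:
CREATE TABLE ppl_in_grp AS
SELECT pid FROM h_groupings WHERE dt = '2014-10-05' AND location = 'Seattle';
-- ... run your queries against ppl_in_grp ...
-- then clean it up at the end of the session
DROP TABLE ppl_in_grp;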
