How to use SCD type 1 with Delta Live Tables - Databricks

I am looking for a practical, real-world example of applying SCD type 1 using Delta Live Tables.
I tried the references in the official DLT documentation but was not able to get to an exact answer.
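For what it's worth, here is a minimal sketch of the APPLY CHANGES approach the DLT documentation describes, written with the Python API. The source table, key, and sequencing column names below are made up:

import dlt
from pyspark.sql.functions import col

# Hypothetical CDC feed; replace with your own raw/bronze change table.
@dlt.view
def customer_changes():
    return spark.readStream.table("raw.customer_changes")

# Target streaming table that APPLY CHANGES will maintain.
dlt.create_streaming_table("customers")

# SCD type 1: keep only the latest version of each key, ordered by ts.
dlt.apply_changes(
    target="customers",
    source="customer_changes",
    keys=["id"],
    sequence_by=col("ts"),
    stored_as_scd_type=1,
)

Setting stored_as_scd_type=2 instead is how the same API keeps history rows, if you later need type 2 behaviour.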

Related

Delta Lake: How to merge with both SCD type 2 and automatic schema evolution enabled?

The Delta Lake documentation states that to use automatic schema evolution, one has to stick with the updateAll() and insertAll() methods when using Delta merge, i.e. you can't use sub-expressions/conditions to change column values selectively.
https://docs.delta.io/latest/delta-update.html#automatic-schema-evolution
This is fine until I need to run SCD type 2 merges on the same table.
For SCD type 2 I want to be able to 'retire' an existing/matching row and add another one with the appropriate flags/dates.
It seems I'm going to have to choose which of the two features I can have in my ETL process with Delta Lake.
Is there an alternative approach that isn't documented? Am I missing something obvious here?
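To make the constraint concrete, here is a sketch of the schema-evolution-compatible merge form the docs describe, using the Delta Lake Python API. It assumes an existing SparkSession named spark, a source DataFrame named updates, and a made-up target path; the comments show why the SCD type 2 'retire old row' logic doesn't fit into this form:

from delta.tables import DeltaTable

# Assumes an existing SparkSession `spark`, a source DataFrame `updates`,
# and a Delta table at this (made-up) path.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

target = DeltaTable.forPath(spark, "/delta/dim_customer")

(target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # schema evolution works, but no per-column expressions...
    .whenNotMatchedInsertAll()   # ...so the end-date / current-flag updates SCD2 needs can't go here
    .execute())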

Structure in Get Metadata activity for a CSV file dataset shows string data types for integer columns in Azure Data Factory

I want to perform validation as a first step before proceeding further in the pipeline execution.
I am running a Get Metadata activity on my dataset and then checking the result against a predefined schema in an If Condition.
The metadata for CSV files shows the column type as string even for integer columns, which breaks the validation.
Get Metadata doesn't support this; all data types in CSV files are reported as string.
You posted this question on the Microsoft forums here: https://learn.microsoft.com/en-us/answers/questions/44635/structure-in-getmetadata-activity-for-csv-file-dat.html, and Microsoft confirmed that using Get Metadata on a CSV file will report all columns as strings.
The link provided there doesn't help with the column types.
I think this is a by-design limitation with no workaround at the moment. And in my experience, the structure output only works well for database datasets.
The best option is to ask Azure Support for more details, or to post new Data Factory feedback here: https://feedback.azure.com/forums/270578-data-factory. Hopefully the Data Factory product team will see it and give us some guidance.
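One workaround that is not part of the answer above: since Get Metadata always reports strings for CSV, do the type check yourself in a step the pipeline calls (for example a Databricks notebook or an Azure Function) by sampling the file and letting a library infer the types. A rough Python sketch, where the expected schema and file URL are entirely hypothetical:

import pandas as pd

# Hypothetical expected schema and file location (use a SAS URL or a mounted path).
EXPECTED = {"id": "int64", "amount": "float64", "name": "object"}
CSV_URL = "https://<storage-account>.blob.core.windows.net/input/data.csv?<sas-token>"

# Sample the file and let pandas infer the actual types.
df = pd.read_csv(CSV_URL, nrows=1000)
actual = {name: str(dtype) for name, dtype in df.dtypes.items()}

# Fail the step (and therefore the pipeline) if any column type doesn't match.
mismatches = {c: (EXPECTED[c], actual.get(c)) for c in EXPECTED if actual.get(c) != EXPECTED[c]}
if mismatches:
    raise ValueError(f"Schema validation failed: {mismatches}")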

How to Partition Database Table in Azure Data Explorer?

I started exploring ADX a few days back. I imported my data from Azure SQL to ADX using an ADF pipeline, but when I query that data it takes a long time. To find a workaround I researched table data partitioning, and I am now fairly clear on the partition types and techniques.
The problem is, I couldn't find any sample (Kusto syntax) that shows how to define partitioning on ADX database tables. Can anyone please help me with this syntax?
The partition operator is probably what you are looking for:
T | partition by Col1 ( top 10 by MaxValue )
T | partition by Col1 { U | where Col2=toscalar(Col1) }
ADX doesn't currently have the notion of partitioning a table, though it may be added in the future.
That said, with the lack of technical information currently provided, it's somewhat challenging to understand how you reached the conclusion that partitioning your table is required and is the appropriate solution, as opposed to the (many) other directions that ADX does allow you to pursue.
If you would be willing to detail what actions you're performing, the characteristics of your data and schema, and which parts are performing slower than expected, that may help in providing a more meaningful and helpful answer.
[If you aren't keen on exposing that information publicly, it's fine to open a support ticket with these details (through the Azure portal).]
(Update: this functionality has been available for a while now. Read more at https://yonileibowitz.github.io/blog-posts/data-partitioning.html)
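As a rough sketch of how the now-available partitioning policy might be applied from Python with the azure-kusto-data client: the cluster URI, database, table, and partition column below are placeholders, and the exact policy JSON should be taken from the linked post or the official docs.

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholders: cluster URI, database, table and partition column are all made up.
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication("https://<cluster>.kusto.windows.net")
client = KustoClient(kcsb)

# A hash partitioning policy on a high-cardinality key column.
policy = '{"PartitionKeys": [{"ColumnName": "TenantId", "Kind": "Hash", "Properties": {"Function": "XxHash64", "MaxPartitionCount": 128}}]}'

# Apply the policy with a control command.
command = f".alter table MyTable policy partitioning ```{policy}```"
client.execute_mgmt("MyDatabase", command)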

Cassandra: storing data in a BLOB

We are using Cassandra 3 and came up with a data model based on the initial requirements. Since the requirements have changed very frequently, this model has subsequently changed many times as well. Given these requirement and model changes, there has been no major improvement in terms of development. The team has decided to go with the BLOB data type and store the entire record in a BLOB. Can you please share the drawbacks of using a BLOB in such a scenario? Thanks in advance.
We migrated from Astyanax Cassandra 1.1 directly to CQL Cassandra 3.0, so we still have a lot of column families whose value is a BLOB.
The major issues we face right now are:
1) It is difficult to inspect data directly in the database: the biggest advantage of CQL is that it supports SQL-like queries, so logging into the cql terminal and getting results directly from there normally saves a lot of time. If you use a BLOB you will not be able to do any of that.
2) CQL performs better when your table has a well-defined schema instead of using a blob to store a big chunk of data together.
If you are creating a new table, I suggest using Collections for your use case. You will be able to store different types of data and performance will also be good.
Here are some nice slides comparing the performance of schemaless tables with tables that have a schema and collections. You can skip to slide 26 if you just want the summary.
https://www.slideshare.net/DataStax/migration-from-thrift-to-cql-brij-bhushan-ravat-ericsson-cassandra-summit-2016
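As a rough illustration of the collections suggestion (the keyspace, table, and column names here are made up), compare a blob-only table with one that keeps a defined schema plus a map for the loosely structured part, using the Python cassandra-driver:

from cassandra.cluster import Cluster

# Assumes a locally reachable node and an existing keyspace called "demo".
session = Cluster(["127.0.0.1"]).connect("demo")

# Blob-style table: the payload is opaque to CQL, so you can't inspect or filter
# individual fields from cqlsh.
session.execute("""
    CREATE TABLE IF NOT EXISTS events_blob (
        id uuid PRIMARY KEY,
        payload blob
    )
""")

# Defined schema plus a collection: the fixed columns stay queryable, and the
# loosely structured part goes into a map.
session.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id uuid PRIMARY KEY,
        event_type text,
        created_at timestamp,
        attributes map<text, text>
    )
""")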

Is it possible to insert/write data without defining columns in Cassandra?

I am trying to understand the fundamentals of the Cassandra data model. I am using CQL. As far as I know, the schema must be defined before anyone can insert into new columns. If someone needs to add a column, they can use ALTER TABLE and then INSERT a value into that new column.
But in Cassandra: The Definitive Guide it is written that Cassandra is schemaless:
In Cassandra, you don’t define the columns up front; you just define the column families you want in the keyspace, and then you can start writing data without defining the columns anywhere. That’s because in Cassandra, all of a column’s names are supplied by the client.
I am getting confused and can't find the answer I expected. Can someone please explain it to me or tell me if I am missing something?
Thanks in advance.
There are two different APIs for writing data to Cassandra. First, there is the Thrift API, which has always allowed columns to be created dynamically, but also supports adding metadata for your columns.
Then there is the newer CQL-based API. CQL was created to provide another abstraction layer that makes it more user-friendly to work with Cassandra. With CQL you are required to define a schema up front for your column names and data types. However, that doesn't mean it's not possible to use dynamic columns with CQL.
See here for the differences:
http://www.datastax.com/dev/blog/thrift-to-cql3
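To make that last point concrete, here is a hedged sketch (keyspace, table, and column names are hypothetical) of the usual CQL way to emulate dynamic columns: model the "column name" as a clustering column, so each logical column becomes a row within the partition.

from uuid import uuid4
from cassandra.cluster import Cluster

# Assumes a local node and an existing keyspace "demo".
session = Cluster(["127.0.0.1"]).connect("demo")

# The schema itself is fixed, but the set of (column_name, value) pairs stored per
# user is open-ended: each "dynamic column" becomes a clustering row in the partition.
session.execute("""
    CREATE TABLE IF NOT EXISTS user_attributes (
        user_id uuid,
        column_name text,
        value text,
        PRIMARY KEY (user_id, column_name)
    )
""")

# "Adding a new column" is just inserting another (column_name, value) pair.
session.execute(
    "INSERT INTO user_attributes (user_id, column_name, value) VALUES (%s, %s, %s)",
    (uuid4(), "favourite_color", "blue"),
)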
You are reading "Cassandra: The Definitive Guide", a book that is 3-4 years old and is telling you something that changed a long time ago. Today you have to define the table structure before being able to write data.
Here you can find some of the reasons behind the introduction of CQL and the abandonment of the schemaless model.
The official DataStax documentation should be your definitive guide.
HTH,
Carlo
