The purpose of having TBLPROPERTIES in CREATE TABLE - Databricks

What is the purpose of using TBLPROPERTIES("quality" = "silver") when creating a table with the CREATE STREAMING LIVE TABLE ... syntax? Is it just to tag the table as a silver table, or does it drive anything else during data processing?

Yes, properties like this are just for tagging the table, so that if necessary you can quickly figure out which tables belong to which data-quality level. This is especially useful right now, when all tables are registered in the same database/schema. You can add more tags if necessary, for example to identify subprojects, data sources, etc.
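For reference, a minimal DLT SQL sketch of how such a tag is attached (the table names and the extra "source" tag are invented for illustration):

```sql
-- Sketch only: sales_silver / sales_bronze and the "source" tag are
-- invented; "quality" = "silver" is the tag from the question.
CREATE STREAMING LIVE TABLE sales_silver
TBLPROPERTIES ("quality" = "silver", "source" = "pos_feed")
AS SELECT * FROM STREAM(LIVE.sales_bronze);
```

The properties are stored as table metadata and can be inspected later, e.g. with SHOW TBLPROPERTIES sales_silver.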

Related

Data model tool to connect to Databricks or Data lake

For data modeling documentation (dimensional / ER diagrams), is there any tool available that can connect to Databricks / a data lake, read the table structure directly, and also update the documented structure whenever columns are added to or deleted from a table?
In the process, it should not remove the relationships drawn between tables whenever there is an update to columns and/or tables (addition/deletion). Version control on the model, e.g. via Git, would also be helpful.
The reason I ask is that, as I understand it, PK and FK details are not maintained on data lake / Databricks table entities. Please suggest any modeling tools suited to this use case.

How do we handle alter table column datatypes in Cassandra, in actual scenarios?

I'm aware of the restrictions Cassandra places on modifying column data types once a table is created, and even on dropping and re-adding a column with the same name but a different data type.
Dropping and adding is allowed, with restrictions.
But in actual scenarios it's not that uncommon to modify a table's schema during the initial phase of a project.
Example: changing the Name column of a User table from TEXT to a UDT (user-defined type) that could encapsulate more information.
Coming from an RDBMS background, I find this very strange behaviour; maybe someone with actual project experience on Cassandra can answer.
How do we handle such a scenario of modifying column data types? What are the best practices?
Also, is this common behaviour in other NoSQL or columnar databases?
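For concreteness, one commonly cited workaround for exactly this situation is to add a new column of the new type, backfill it from the application, and then drop the old one. A CQL sketch (the table, column, and type names are invented for illustration):

```sql
-- Invented names; illustrates the add-backfill-drop pattern only.
CREATE TYPE IF NOT EXISTS full_name (first text, last text);

-- An in-place change of name from TEXT to the UDT is rejected, so:
ALTER TABLE users ADD name_v2 frozen<full_name>;

-- ...application code backfills name_v2 from name, then:
ALTER TABLE users DROP name;
```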

Spotfire for cross fact table joins using conformed dimensions

A few years ago I ascertained that Spotfire cannot perform multi-fact-table queries using conformed dimensions a la Ralph Kimball (as with Tableau, where this is still the case).
Is this still so? Most people I speak to are not aware of this. I am not in a position to quickly assess it myself, hence my question.
If you are reading from a DB, you can create custom information links using SQL (or what Spotfire calls SQL; it's a little different) that can certainly join multiple fact tables together through conformed dimensions. These may perform well or poorly depending on the amount of data and the structure of the tables in question.
You can also 'join' fact tables across dimensions (or directly to each other, if you have the right keys) within the tool itself. These are called relations and work under the same principles, but don't kick off joined SQL statements.
If you create a view in the DB that does the joins, as you have said, Spotfire can read that into an information link as well.
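For illustration, the SQL behind such a view or information link is typically a drill-across query: aggregate each fact to the grain of the conformed dimension, then join the results (all table and column names below are invented):

```sql
-- Drill-across sketch: each fact is aggregated separately, then the
-- results are joined on the conformed date dimension (invented names).
SELECT s.calendar_date, s.total_sales, i.total_on_hand
FROM  (SELECT d.calendar_date, SUM(f.sales_amount) AS total_sales
       FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
       GROUP BY d.calendar_date) s
JOIN  (SELECT d.calendar_date, SUM(f.on_hand_qty) AS total_on_hand
       FROM fact_inventory f JOIN dim_date d ON f.date_key = d.date_key
       GROUP BY d.calendar_date) i
  ON s.calendar_date = i.calendar_date;
```

Aggregating before the join avoids the fan-out you would get by joining two facts of different grain directly.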

Cassandra Schema Design - Handling Merging of Similar but Differing Source Data Sets

I'm working on a project to merge data from multiple database tables and files into Cassandra. The data will come from different sources such as flat files, SQL DBs, etc.
Problem statement: most of these sources are similar; however, there are some differences, and I want to merge all of them into a single Cassandra table. There are about 50 shared fields and an extra 20 fields that don't coexist. My thought is that I can merge them all, adding every field and simply leaving the unshared ones null when not populated. The other option would be to merge the shared fields into regular columns and put the differing fields into a map column; however, I don't know if there is really any benefit in doing this other than it looking nicer.
Any ideas/advice from people who have dealt with this?
What you need is an ETL (Extract/Transform/Load) tool to combine, clean, and/or standardize the data, with Cassandra as your repository. There are multiple tools on the market that provide this functionality (a Google search for "ETL tools" will give you an overwhelming number of options to choose from).
As a personal preference, check https://nifi.apache.org/ ; you can define those transformations and filters as workflows.
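For concreteness, the map-column option from the question might be sketched in CQL like this (all names are invented for illustration):

```sql
-- Invented names. The ~50 shared fields become regular columns;
-- the ~20 source-specific ones go into a map, keyed by field name.
CREATE TABLE merged_records (
    record_id uuid PRIMARY KEY,
    source    text,
    field_a   text,            -- ...one column per shared field
    extras    map<text, text>  -- source-specific fields
);

INSERT INTO merged_records (record_id, source, field_a, extras)
VALUES (uuid(), 'flat_file', 'some value',
        {'legacy_code': '42', 'region': 'EMEA'});
```

The wide-table option (all 70 columns) is the same minus the map; columns that are never written take no space in Cassandra.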

PouchDB structure

I am new to the NoSQL concept, so when I started to learn PouchDB I found this conversion chart. My confusion is: how does PouchDB handle it if, let's say, I have multiple tables? Does that mean I need to create multiple databases? From my understanding, in PouchDB a database can store a lot of documents, but a document means a row in SQL, or have I misunderstood?
The answer to this question seems to be surprisingly under-documented. While @llabball clearly gave a decent answer, I don't think that views are always the way to go.
As you can read here in the section "When not to use map/reduce", Nolan explains that for simpler applications the key is to abuse _ids and leverage the power of allDocs().
In other words, if you had two separate types (say, artists and albums), you could prefix the id of each type to obtain an easily searchable data set. For example, _id: 'artist_name' and _id: 'album_title' would allow you to easily retrieve artists in name order.
Laying out the data this way results in better performance, since no extra indexes are required, and less code. If your data requirements are more complex, however, then views are the way to go.
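A minimal sketch of this _id-prefixing approach (the database name and documents are invented; assumes the pouchdb package in Node):

```js
// Invented data; shows prefixed _ids queried with allDocs().
const PouchDB = require('pouchdb');
const db = new PouchDB('music');

async function demo() {
  // One prefix per "table": artists and albums share a database.
  await db.bulkDocs([
    { _id: 'artist_bowie', name: 'David Bowie' },
    { _id: 'artist_eno',   name: 'Brian Eno' },
    { _id: 'album_low',    title: 'Low' }
  ]);

  // The key range returns only artists, already sorted by _id,
  // with no extra index to build or maintain.
  const artists = await db.allDocs({
    startkey: 'artist_',
    endkey: 'artist_\uffff',
    include_docs: true
  });
  console.log(artists.rows.map(r => r.doc.name));
}

demo();
```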
... does it mean that I need to create multiple databases?
No.
... a document means a row in SQL, or have I misunderstood?
That's right. The SQL table defines the column headers (name and type); those correspond to the JSON property names of the docs.
So all docs (rows) with the same properties (a so-called "schema") are the equivalent of your SQL table. You can have as many different schemata in one database as you want (visit json-schema.org for some inspiration).
How do you request them separately? Create CouchDB views! You can get all/some "rows" of your tabular data (docs with the same schema) with one request, as you know it from SQL.
To make such views easy to write, a type property is very common on CouchDB docs. Your familiar SQL table name can serve as the type, e.g. doc.type: "animal".
Your view names might then be animalByName or animalByWeight, depending on your needs.
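A minimal sketch of such a type-based view in PouchDB (the design document and data are invented):

```js
// Invented example: a map view that indexes only docs of one type.
const PouchDB = require('pouchdb');
const db = new PouchDB('zoo');

async function demo() {
  // Design doc: emit animals keyed by name, skipping other types.
  await db.put({
    _id: '_design/animalByName',
    views: {
      animalByName: {
        map: function (doc) {
          if (doc.type === 'animal') { emit(doc.name); }
        }.toString()
      }
    }
  });

  await db.bulkDocs([
    { _id: '1', type: 'animal', name: 'Zebra' },
    { _id: '2', type: 'plant',  name: 'Fern' }
  ]);

  // Like SELECT * FROM animal ORDER BY name in SQL.
  const result = await db.query('animalByName', { include_docs: true });
  console.log(result.rows.map(r => r.doc.name)); // ['Zebra']
}

demo();
```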
Sometimes a multiple-databases plan is a good option, such as a database per user or even a database per user feature. Take a look at this conversation on the CouchDB mailing list.
