Why is Cassandra called an unstructured database?

Why is Cassandra called unstructured even though a table/column family has to be defined with columns and their data types?
For a defined table with some fixed columns, we can choose to fill some columns in one particular row and not in another. But the same thing can be done in an RDBMS, where we can leave some columns out of the insert statement as long as the omitted columns allow null?
MongoDB stores data in JSON documents, so we can store different keys every time we insert a new document; we don't need to define anything up front. But in Cassandra we have to alter our table to accommodate any new columns.
I have read some articles on this, but it is still not clear to me. Can someone pinpoint the reason?

Basically, it is not about how it works, but about how the files are stored. That is why Cassandra has no fixed structure for its files: you can have the same records in different folders.
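To make the distinction concrete, here is a minimal sketch using the Python cassandra-driver, assuming a local node and a keyspace named demo that already exists (both hypothetical). The schema declares the set of possible columns, but each row only materializes the columns it is actually given:

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("demo")

    # The schema fixes the set of possible columns...
    session.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id int PRIMARY KEY,
            name text,
            email text
        )
    """)

    # ...but each row may populate any subset of them.
    session.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")    # no email
    session.execute("INSERT INTO users (id, email) VALUES (2, 'b@x.io')")  # no name

    # A column that is never written stores no cell at all in the SSTable,
    # unlike a NULL in most RDBMSs, which still occupies a slot in the row.
    # Adding a column later is a metadata-only change; no rows are rewritten.
    session.execute("ALTER TABLE users ADD age int")

In other words, the schema still declares which columns are possible, just as the question says; what is flexible is which columns each row physically stores. That sparseness at the storage level, rather than schemalessness in the MongoDB sense, is usually what people mean when they call Cassandra unstructured.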

Related

Excel - List of key values created from external files in Power Query, trouble with editing mapped values

I am attempting to create a standardized list of names for a long list of free-typed values in a set of CSVs pulled from Jira.
What I have tried so far is to use Get Data -> From File -> From Folder, and then narrow the result down to just the column I need and remove all duplicate rows.
After loading that, I have tried adding a column that is just an empty string; I have done this both in Power Query and in the data model, with the same effect. I want the second column so the user can map the values in the key column on a worksheet, and the table will then be used as a map for pivot tables to standardize names. However, updating the value in the worksheet and then refreshing just reverts the value back to an empty string.
Obviously I'm going about this the wrong way. The goal is to maintain this key/value map over the months as new keys are added, and to map only the new entries rather than comparing everything each time to see what's new. Is there a better way to achieve this that stays expandable over the months without having to redo the entire workbook?

Tabulator - Getting Columns including order and size

I am creating a table using Tabulator, which seems great and very powerful.
I want a way to save the relevant data of the table so it can be recreated on the fly.
Currently, I think there are a few things I need...
The row data - I get this using table.getData();
The columns - I get this using table.getColumnDefinitions();
The row data seems perfect; I can store that and use it. However, the column information I am saving doesn't appear to include the size of the columns if I have resized them.
Is there a way of getting ALL the relevant column info, so I can save and recreate it exactly?
Alternatively, if there is a single function that saves everything (row data, columns including order, size, etc.) in one go as JSON or similar, that would be handy.
So you have a few options here.
Config Persistence
If you simply want the table to look the same way it did the last time the user used it on that computer, you could look at using the Persistent Configuration module. This will store a copy of the table's column configuration in the browser's local storage so that the next time they load the page it will be laid out the same.
Column Layout
If you want to store it externally then you are correct: the column width is not updated in the definition after a user resizes it.
If you want to get the current layout of the columns, you can use the getColumnLayout function:
var columnLayout = table.getColumnLayout();
Note that this will only contain the key layout characteristics and not the full column definitions; you would need to merge the two if you wanted to store them in one place.
More details on this method can be found in the Manual Column Layout documentation.

How to list all tables and corresponding columns from a Databricks database?

I am trying to get a list of the tables and columns in a database, so I can find which tables have a particular column. The best I could find is to use separate queries: one to show all tables, and then one to show all columns in a given table, e.g. SHOW TABLES FROM database_name and SHOW COLUMNS FROM databasename.tablename. That is not ideal when you have many tables to go through. Is there any solution out there at all?
Unfortunately, there is no way to fetch all of the metadata in one call. You can only do SHOW DATABASES, SHOW TABLES IN ..., DESCRIBE TABLE ..., and so on. There are also programmatic equivalents such as spark.catalog.listTables, but they can be slower than the corresponding SQL queries.
I answered a related question yesterday; you can find code there.
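As an illustration of looping over those calls, here is a hedged sketch using the spark.catalog API in a Databricks notebook (where spark is the SparkSession provided automatically); the database name my_db and the column name customer_id are hypothetical:

    target_column = "customer_id"

    # List every table in the database, then list each table's columns and
    # keep the tables that contain the column we are searching for.
    for table in spark.catalog.listTables("my_db"):
        column_names = [c.name for c in spark.catalog.listColumns(table.name, "my_db")]
        if target_column in column_names:
            print(f"my_db.{table.name} has column {target_column}")

As noted above, each listColumns call is a separate catalog round trip, so this can be slow on databases with many tables, but it automates the table-by-table SHOW COLUMNS approach from the question.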

Limit data coming into Spotfire by a different data table

I have Table A prompted on Year/Month and Table B. Table B also has a Year/Month column. Table A is the default data table (gets pulled in first). I have set up a relationship between Table A and B on the common Year/Month column.
The goal is to get Table B to only pull through data where the Year/Month matches the Year/Month on Table A (what the user entered). The purpose is to keep the user from entering the Year/Month multiple times.
The issue is that Table B contains almost 35 million records, and I do not want Spotfire to pull across all 35 million of them. What is currently happening is that Spotfire pulls all of those records, and then, by setting filtering to Include Filtered Rows Only on Table B, I limit what is seen in the visualization to under 200,000 rows. I would much rather just pull across 200,000 rows to start with.
The question: Is there a way to force Spotfire to filter the data table (Table B) by another data table (Table A) as it pulls the data table (Table B) across, thus only pulling a small number of records into memory?
I'm writing this on the basis that most people use information links to get data into Spotfire, especially for large data sets where the data is not embedded in the analysis. With that being said, I prefer to handle as much (if not all) of the joining / filtering / massaging at the data source rather than in the Spotfire application. Here are my views on the best practices and why.
Tables / Views vs Procedures as Information Links
Most people are familiar with the table / view structure and get data into Spotfire in one of two ways:
1. Create all joins / links in Information Designer, based on data relations defined by the author, by selecting individual tables from the available data sources.
2. Create a view (or similar object) at the data source where all joining / data relations are done, thus giving Spotfire a single flat file of data.
Personally, option 2 is much easier IF you have access to the data source, since the data source is designed to handle this type of work; Spotfire just makes the data available, but with limited functionality (complex queries, IntelliSense, etc. aren't available, and there is no native IDE). What's even better, IMHO, is stored procedures, and here is why.
In options 1 and 2 above, if you want to add a column you have to change the view / source code at the data source, or individually add a column in Information Designer. This creates a sprawl of objects and clutters up your library. For example, when you create an information link there is a folder with all the elements associated with it; if you add columns later, you'll have another folder for the added columns, which gets confusing and hard to manage.
If you instead create a procedure at the data source to return the data you need, and later want to add some columns, you only have to change it in one place: the procedure. Everything else is inherited by Spotfire, so all you have to do is click the "Reload data" button, and nothing in Information Designer needs to change. Additionally, you can easily add new parameters, set default parameter properties, or prompt the user, making this a very efficient method of data retrieval. This is perfect when the data source is an OLTP system rather than a data mart / data warehouse (i.e. the data isn't already aggregated / cleansed), but it can be powerful in data warehouse environments as well.
Ditch the GUI, Edit the SQL
I find managing conditions, parameters, join paths, etc. in the GUI a bit annoying, but that's me. Instead, when possible, I prefer to click "Edit SQL" next to each of the elements in my information link and alter the SQL there. This also lets database people work in an environment that is more familiar to them.

How to quickly migrate from one table to another with a different table structure, in the same or a different Cassandra cluster?

I have a table with more than 10,000,000 records in Cassandra, and I want to build another Cassandra table with the same fields plus several additional fields, then migrate the previous data into it. Right now the two tables are in the same Cassandra cluster.
How can I finish this task in the shortest time?
And if my new table is in a different Cassandra cluster, how do I do it?
Any advice will be appreciated!
If you just need to add blank fields to a table, then the best thing to do is use the ALTER TABLE command to add the fields to the existing table. Then no copying of the data is needed, and the new fields will show up as null in the existing rows until you set them to something.
If you want to change the structure of the data in the new table, or write it to a different cluster, then you'd probably need to write an application to read each row of the old table, transform the data as needed, and then write each row to the new location.
You could also do this by exporting the data to a CSV file, writing a program to restructure the CSV as needed, and then importing the CSV into the new location.
Another possible method would be to use Apache Spark. You'd read the existing table into an RDD, transform and filter the data into a new RDD, then save the transformed RDD to the new table. That would only work within the same cluster and would be fairly complex to set up.
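For the Spark route, here is a rough sketch using the DataFrame API of the spark-cassandra-connector (the answer describes RDDs, but DataFrames are the more common way to do the same read-transform-write today). It assumes the job is launched with the DataStax spark-cassandra-connector on the classpath; the keyspace, table, and column names are hypothetical, and the target table must already exist with the new schema:

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("cassandra-migration")
             .config("spark.cassandra.connection.host", "127.0.0.1")
             .getOrCreate())

    # Read the whole source table into a DataFrame.
    old = (spark.read.format("org.apache.spark.sql.cassandra")
           .options(keyspace="ks", table="old_table")
           .load())

    # Transform as needed, e.g. populate one of the new fields with a default.
    new = old.withColumn("new_field", F.lit("default"))

    # Append the transformed rows into the pre-created target table.
    (new.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="ks", table="new_table")
     .mode("append")
     .save())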
