Enforcing Column Level Constraints on Iceberg Table - apache-spark

I am creating an Iceberg table using Athena. I want to specify some constraints (NOT NULL, value greater than a specific value) on the columns. How can I do that? I tried declaring a column as NOT NULL, but Athena does not allow it.
Thanks!

Related

How to understand the 'Flexible schema' in Cassandra?

I am new to Cassandra and found the following on Wikipedia:
A column family (called "table" since CQL 3) resembles a table in an RDBMS (Relational Database Management System). Column families contain rows and columns. Each row is uniquely identified by a row key. Each row has multiple columns, each of which has a name, value, and a timestamp. Unlike a table in an RDBMS, different rows in the same column family do not have to share the same set of columns, and a column may be added to one or multiple rows at any time.[29]
It says that 'different rows in the same column family do not have to share the same set of columns', but how do I implement that? I have read almost all of the documentation on the official site.
I can create table and insert data like below.
CREATE TABLE Emp_record(E_id int PRIMARY KEY,E_score int,E_name text,E_city text);
INSERT INTO Emp_record(E_id, E_score, E_name, E_city) values (101, 85, 'ashish', 'Noida');
INSERT INTO Emp_record(E_id, E_score, E_name, E_city) values (102, 90, 'ankur', 'meerut');
This is very much like what I would do in a relational database. So how do I create multiple rows with different columns?
I also found that the official documentation mentions a 'Flexible schema'. How should I understand it here?
Thanks very much in advance.
The column family comes from the original design of Cassandra, when the data model looked like Google BigTable or Apache HBase and the Thrift protocol was used for communication. But this required the schema to be defined inside the application, which makes access to the data from many applications more problematic, as you need to update the schema inside all of them.
CREATE TABLE and INSERT are part of the Cassandra Query Language (CQL), which was introduced a long time ago and replaced the Thrift-based implementation (Cassandra 4.0 removed Thrift support completely). In CQL you need to define a schema for a table, providing a name and a type for each column. If you really need dynamic columns, there are several approaches (I'll link answers that I have already written over time, so there won't be duplicates):
If you have values of the same type, you can use one column for the name of the attribute/column and another to store the value, as described here.
If you have values of different types, you can also use one column for the name of the attribute/column and define multiple value columns, one for each data type (int, text, ...), inserting the value into the corresponding column only (described here).
You can use maps (described here). This is similar to the first or second approach, but it is mostly designed for a very small number of "dynamic columns" and has other limitations; for example, you need to read the full map to fetch one value.
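As a rough sketch of the three approaches above: all table and column names below are hypothetical, and using the attribute name as a clustering column is an assumption made for illustration, not something the linked answers are confirmed to prescribe.

```sql
-- Hypothetical tables; names are illustrative only.

-- 1. Values of the same type: one column holds the attribute name,
--    another holds the value (here the name is a clustering column).
CREATE TABLE emp_attrs_text (
    e_id int,
    attr_name text,
    attr_value text,
    PRIMARY KEY (e_id, attr_name)
);
INSERT INTO emp_attrs_text (e_id, attr_name, attr_value) VALUES (101, 'city', 'Noida');
INSERT INTO emp_attrs_text (e_id, attr_name, attr_value) VALUES (102, 'nickname', 'ank');

-- 2. Values of different types: one value column per data type,
--    and each insert fills only the column that matches the value.
CREATE TABLE emp_attrs_typed (
    e_id int,
    attr_name text,
    int_value int,
    text_value text,
    PRIMARY KEY (e_id, attr_name)
);
INSERT INTO emp_attrs_typed (e_id, attr_name, int_value) VALUES (101, 'score', 85);
INSERT INTO emp_attrs_typed (e_id, attr_name, text_value) VALUES (101, 'city', 'Noida');

-- 3. A map column for a small number of dynamic attributes.
CREATE TABLE emp_attrs_map (
    e_id int PRIMARY KEY,
    attrs map<text, text>
);
INSERT INTO emp_attrs_map (e_id, attrs) VALUES (101, {'city': 'Noida'});
UPDATE emp_attrs_map SET attrs['nickname'] = 'ash' WHERE e_id = 101;
```

In the first two tables, rows 101 and 102 end up with different sets of attributes even though the table schema itself is fixed.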

How to migrate cassandra cluster column change

We have a use case that requires changing the type of a Cassandra table column from Int to Long. That change is not supported, but changing from Int to varInt is, and we are fine with that.
But in some of the tables this column is a clustering column, and we have no way of changing it.
I am curious what is the best way to handle this case.
You cannot alter a clustering column in Cassandra. You'll need to create a new table and load the data into it using an external tool (cqlsh COPY being the simplest, or something like Spark). If you're unable to tolerate a change in the table's name, you'll need to back up your data, drop the old table, and recreate it with the proper types.
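A minimal sketch of the new-table route with cqlsh COPY, assuming a hypothetical table events(sensor_id, seq, payload) whose clustering column seq should become a varint. The table, column, and file names are made up for illustration.

```sql
-- Hypothetical new table with the desired clustering column type.
CREATE TABLE events_v2 (
    sensor_id int,
    seq varint,
    payload text,
    PRIMARY KEY (sensor_id, seq)
);

-- COPY is a cqlsh command, not a CQL statement, so run these from cqlsh.
-- Export the old table to CSV, then load the CSV into the new table.
COPY events (sensor_id, seq, payload) TO 'events.csv';
COPY events_v2 (sensor_id, seq, payload) FROM 'events.csv';
```

COPY is generally better suited to modest data volumes; for larger tables, something like the Spark route mentioned above is likely the more practical choice.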

Cassandra: how to filter more than one column

Is it possible to filter more than one column?
I want to give the user an option to filter information that is all stored in one table. I added indexes to my table, but with these it was not possible to filter on more than one column.
Values could also be null, so it is not possible to define them as clustering columns, is it?
There are several types of queries you can do in CQL that return more than one row.
The most common and efficient are range queries based on a clustering key.
Another method is to use the IN clause with a SELECT statement.
But Cassandra has a lot of restrictive rules on when you are allowed to do these types of queries and on which types of columns.
See more details here: A deep look at the CQL WHERE clause
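To illustrate the two query types mentioned above, here is a small sketch against a hypothetical readings table. The schema and values are assumptions made for the example.

```sql
-- Hypothetical table: sensor_id is the partition key, ts is a clustering column.
CREATE TABLE readings (
    sensor_id int,
    ts timestamp,
    value double,
    PRIMARY KEY (sensor_id, ts)
);

-- Range query on the clustering column within a single partition.
SELECT * FROM readings
WHERE sensor_id = 1 AND ts >= '2020-01-01' AND ts < '2020-02-01';

-- IN clause on the partition key to fetch several partitions at once.
SELECT * FROM readings
WHERE sensor_id IN (1, 2, 3);
```

Both queries restrict the partition key, which is what lets Cassandra serve them without scanning the whole table.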

Cassandra: Adding new column to the table

Hi I just added a new column Business_sys to my table my_table:
ALTER TABLE my_table ALTER business_sys TYPE set<text>;
But then I dropped this column because I wanted to change its type:
ALTER TABLE my_table DROP business_sys;
When I then tried to add a column with the same name and a different type, I got this error message:
"Cannnot add a collection with the name business_sys because the collection with the same name and different type has already been used in past"
I just tried to execute this command to add a new column with different type-
ALTER TABLE my_table ADD business_sys list<text>;
What did I do wrong? I am pretty new to Cassandra. Any suggestions?
You're running into CASSANDRA-6276. The problem is that when you drop a column in Cassandra, the data in that column doesn't just disappear, and Cassandra may attempt to read that data with its new comparator type.
From the linked JIRA ticket:
Unfortunately, we can't allow dropping a component from the comparator, including dropping individual collection columns from ColumnToCollectionType.
If we do allow that, and have pre-existing data of that type, C* simply wouldn't know how to compare those...
...even if we did, and allowed [users] to create a different collection with the same name, we'd hit a different issue: the new collection's comparator would be used to compare potentially incompatible types.
The JIRA suggests that this may not be an issue in Cassandra 3.x, but I just tried it in 3.0.3 and it fails with the same error.
Unfortunately, the only way around this one is to use a different name for your new list.
EDIT: I tried this out in Cassandra and ended up with inconsistent and missing data. The best way to proceed is to use a different column name, as suggested in CASSANDRA-6276. And always follow the documentation guidelines :)
-WARNING-
According to this comment from CASSANDRA-6276, running the following workaround is unsafe.
Elaborating on @masum's comment: it's possible to work around the limitation by first recreating the column with a non-collection type such as int. Afterwards, you can drop it and recreate it again using the new collection type.
From your example, assuming we have a business_sys set:
ALTER TABLE my_table ADD business_sys set<text>;
ALTER TABLE my_table DROP business_sys;
Now re-add the column as int and drop it again:
ALTER TABLE my_table ADD business_sys int;
ALTER TABLE my_table DROP business_sys;
Finally, you can re-create the column with the same name but different collection type:
ALTER TABLE my_table ADD business_sys list<text>;
Cassandra doesn't allow you to recreate a collection column with the same name and a different collection type, but there is a workaround.
Once you have dropped the column with the set type, you can recreate it only with a non-collection type such as varchar or int.
After recreating with one of those types, you can drop the column once again and finally recreate with the proper type.
I illustrated it below
ALTER TABLE my_table DROP business_sys; -- the drop you've done
ALTER TABLE my_table ADD business_sys varchar; -- recreating with another type
ALTER TABLE my_table DROP business_sys; -- dropping again
ALTER TABLE my_table ADD business_sys list<text>; -- recreating with proper type

How to alter cassandra table columns

I need to add columns to a table in Cassandra, but the existing table is not empty. Is there a simple way to update it? Otherwise, what is the best approach for adding columns to a non-empty table? Thanks in advance.
There's a good example of adding columns to an existing table in the CQL documentation on ALTER. The following statement adds the column gravesite (with type varchar) to the table addamsFamily:
ALTER TABLE addamsFamily ADD gravesite varchar;
