Deleting all rows from Cassandra cql table [duplicate] - cassandra

This question already has answers here:
How do I delete all data in a Cassandra column family?
(6 answers)
Closed 6 years ago.
Is there a command to delete all the rows present in a CQL table in Cassandra, like the following in SQL?
delete from TABLE
Going by the documentation, I don't find any way to perform a delete operation without a WHERE condition:
DELETE col1 FROM SomeTable WHERE userID = 'some_key_value';

To remove all rows from a CQL table, you can use the TRUNCATE command:
TRUNCATE keyspace_name.table_name;
Or if you are already using the keyspace that contains your target table:
TRUNCATE table_name;
It is important to note that, by default, Cassandra creates a snapshot of the table just prior to a TRUNCATE. Be sure to clean up old snapshots (for example, with nodetool clearsnapshot), or set auto_snapshot: false in your cassandra.yaml.
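If you need to truncate from application code rather than cqlsh, here is a minimal sketch using the DataStax Python driver; the contact point, keyspace, and table name are placeholders:
from cassandra.cluster import Cluster

# Placeholder contact point and names; adjust for your cluster.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("keyspace_name")
# TRUNCATE removes all rows but keeps the table definition (and, by default,
# triggers the snapshot described above).
session.execute("TRUNCATE table_name")
cluster.shutdown()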

Related

Databricks - How to change a partition of an existing Delta table?

I have a table in Databricks Delta which is partitioned by transaction_date. I want to change the partition column to view_date. I tried to drop the table and then create it with a new partition column using PARTITIONED BY (view_date).
However, my attempt failed since the actual files reside in S3, and even if I drop a Hive table, the partitions remain the same.
Is there any way to change the partitioning of an existing Delta table? Or is the only solution to drop the actual data and reload it with a newly indicated partition column?
There's actually no need to drop tables or remove files. All you need to do is read the current table, overwrite the contents AND the schema, and change the partition column:
val input = spark.read.table("mytable")
input.write.format("delta")
.mode("overwrite")
.option("overwriteSchema", "true")
.partitionBy("colB") // different column
.saveAsTable("mytable")
UPDATE: There previously was a bug with time travel and changes in partitioning that has now been fixed.
As Silvio pointed out, there is no need to drop the table. In fact, the approach strongly recommended by Databricks is to replace the table.
https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html#parameters
In Spark SQL, this can be done easily with:
REPLACE TABLE <tablename>
USING DELTA
PARTITIONED BY (view_date)
AS
SELECT * FROM <tablename>
Adapted example from:
https://docs.databricks.com/delta/best-practices.html#replace-the-content-or-schema-of-a-table
Python solution:
def change_partition_of(table_name, column):
    df = spark.read.table(table_name)
    df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").partitionBy(column).saveAsTable(table_name)

change_partition_of("i.love_python", "column_a")
If you need to partition by more than one column, pass them all: partitionBy(column, column_2, ...).

How to overwrite multiple partitions in HIVE [duplicate]

This question already has answers here:
Overwrite specific partitions in spark dataframe write method
(14 answers)
Overwrite only some partitions in a partitioned spark Dataset
(3 answers)
Closed 4 years ago.
I have a large table in which I would like to overwrite certain top-level partitions. For example, I have a table which is partitioned by year and month, and I would like to overwrite only the partitions from, say, year 2000 to 2018.
How can I do that?
Note: I would not like to delete the previous table and overwrite the entire table with new data.
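Since this question was closed as a duplicate, the details are in the linked answers; they rely on Spark's dynamic partition overwrite mode (Spark 2.3+). A minimal sketch in PySpark, where the table and the source path are hypothetical:
# Only partitions present in the incoming DataFrame are overwritten;
# all other partitions are left untouched.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

# Hypothetical source containing replacement rows for years 2000-2018 only.
new_data = spark.read.parquet("/path/to/new_data")
# insertInto matches columns by position, so the DataFrame's column order
# must match the table definition.
new_data.write.mode("overwrite").insertInto("my_db.my_partitioned_table")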

Update the column I searched for

Is there any way to update a column value in Cassandra that I searched for and that is part of my primary key?
I have a (huge) list of items with a field called "LastUpdateDateTime", and from time to time I search for rows that haven't been updated for a while.
The reason I search for these rows is that I want to update them, and after updating them I want to set the timestamp to the current date.
How can I do this with Cassandra?
You can't update a primary key column; writing a new value will insert another record.
That's how Cassandra works.
You may have to use the spark-cassandra-connector, or delete the records with the old values and insert new ones.
Note: deleting and inserting is not recommended if you have many records, as it will create a corresponding number of tombstones.
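A minimal sketch of the delete-and-insert workaround, using the DataStax Python driver; the keyspace, table, and columns are hypothetical, assuming PRIMARY KEY (item_id, last_update):
from datetime import datetime, timezone
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("my_keyspace")

# Hypothetical clustering value found by the earlier search.
old_timestamp = datetime(2015, 1, 1, tzinfo=timezone.utc)

# Remove the row under its old clustering value...
session.execute(
    "DELETE FROM items WHERE item_id = %s AND last_update = %s",
    ("item42", old_timestamp),
)
# ...and re-insert it with the current timestamp as the new key value.
session.execute(
    "INSERT INTO items (item_id, last_update) VALUES (%s, %s)",
    ("item42", datetime.now(timezone.utc)),
)
cluster.shutdown()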

How to add multiple columns in cassandra table?

I need to add some new columns to my existing column_family/table in Cassandra.
I can add a single column like this:
ALTER TABLE keyspace_name.table_name ADD column_name cql_type;
Can I add all new columns using a single query? If yes, how to do it using cql and datastax cassandra driver?
This was added in Cassandra 3.6:
https://issues.apache.org/jira/browse/CASSANDRA-10411
ALTER TABLE foo ADD (colname1 int, colname2 int)
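Since the question also asks how to issue this from the DataStax driver, here is a minimal sketch using the Python driver against Cassandra 3.6+; the contact point, keyspace, and column names are placeholders:
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("keyspace_name")
# Schema changes are plain CQL statements executed through the session.
session.execute("ALTER TABLE table_name ADD (colname1 int, colname2 int)")
cluster.shutdown()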

Migrate data from cassandra to cassandra

We have 2 Cassandra clusters; the first one has the old data and the second one has the new data.
Now we want to move or copy the old data from the first cluster to the second. What is the best way to do this, and how do we do it?
We are using DSE 3.1.4.
One tool you could try would be the COPY TO/FROM cqlsh command.
On a node in the old cluster, you would use the COPY TO:
cqlsh> COPY myTable (col1, col2, col3, col4) TO 'temp.csv'
And then (after copying the file over) on a node in your new cluster, you would copy the data in the CSV file into Cassandra:
cqlsh> COPY myTable (col1, col2, col3, col4) FROM 'temp.csv'
Here is some more documentation on the COPY command.
Note that COPY TO/FROM is recommended only for tables that contain a few million rows or less. For larger datasets you should look at:
Cassandra Bulk Loader
sstable2json
There is a tool called /usr/bin/sstableloader for copying data between clusters. When I used it months ago I encountered an error and used another approach instead, but since that was a long time ago, sstableloader might have been fixed already.
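For small tables, an alternative not mentioned above is to stream rows between the clusters programmatically. A rough sketch with the DataStax Python driver, using hypothetical contact points and the same example table as above:
from cassandra.cluster import Cluster

# Hypothetical contact points for the old and new clusters.
old_cluster = Cluster(["10.0.0.1"])
new_cluster = Cluster(["10.0.1.1"])
old_session = old_cluster.connect("my_keyspace")
new_session = new_cluster.connect("my_keyspace")

# Page through the old table and re-insert each row into the new cluster.
insert = new_session.prepare(
    "INSERT INTO myTable (col1, col2, col3, col4) VALUES (?, ?, ?, ?)"
)
for row in old_session.execute("SELECT col1, col2, col3, col4 FROM myTable"):
    new_session.execute(insert, (row.col1, row.col2, row.col3, row.col4))

old_cluster.shutdown()
new_cluster.shutdown()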
