I have a table with many millions of rows. I added a new column, and it is null in all the old rows, so I need to update it to 0. Can I do that?
Yes, you can. To set the value of the new column you can write a utility that scans the complete table and updates the records one by one. If you are familiar with Spark and can use it, things will be easier and faster.
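A minimal sketch of what that utility would issue, assuming a hypothetical table `users` with partition key `id` and a newly added column `score` (Cassandra has no `UPDATE ... WHERE score IS NULL`, so the scan and the per-row updates are separate steps):

```cql
-- Page through the keys (the driver handles paging):
SELECT id FROM users;

-- For each key returned, set the new column; in Cassandra an UPDATE
-- on a non-existent row is an upsert, so this is safe per row:
UPDATE users SET score = 0 WHERE id = 42;
```

With Spark and the spark-cassandra-connector the same scan-and-write happens in parallel across the token ranges, which is why it is much faster on millions of rows.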
Related
Is there any possibility to update a column value in Cassandra that I searched for, when it is part of my primary key?
I have a (huge) list of items with a field called "LastUpdateDateTime", and from time to time I search for rows that haven't been updated for a while.
The reason I searched for these rows is that I want to update them, and after updating them I want to set the timestamp to the current date.
How can I do this with Cassandra?
You can't update a primary key column; writing a new value for it will insert another record.
That's how Cassandra works.
You may have to use the spark-cassandra-connector, or delete the records with the old values and insert new ones.
Note: deleting and re-inserting is not recommended if you have many records, as it will create a corresponding number of tombstones.
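As a sketch, assuming a hypothetical table `items` keyed by `item_id`, "changing" a key value is a delete plus an insert, optionally wrapped in a logged batch for atomicity:

```cql
-- Hypothetical table: items(item_id PRIMARY KEY, last_update timestamp, payload text).
-- "Renaming" key 1 to key 2 is really insert-new + delete-old:
BEGIN BATCH
  INSERT INTO items (item_id, last_update, payload)
  VALUES (2, toTimestamp(now()), 'copied payload');
  DELETE FROM items WHERE item_id = 1;
APPLY BATCH;
```

Each `DELETE` leaves a tombstone that survives until `gc_grace_seconds` passes and compaction runs, which is why doing this for a large number of rows hurts read performance.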
There is one table (also called a column family) in Cassandra. I want to know how many records in this table were inserted or updated since a given timestamp. How can I do that?
Your best option is to try writetime(column_name). That way you will get the write times of particular columns. You won't, however, get the write times of already deleted columns. It's far from what you want, but it's the only possibility.
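A sketch of what that looks like, assuming a hypothetical table `items` with a regular column `payload` (note that `writetime()` returns microseconds since the epoch, and CQL cannot filter on it in a `WHERE` clause, so the comparison against your timestamp has to happen client-side):

```cql
-- Write timestamp of the "payload" cell for every row:
SELECT item_id, writetime(payload) FROM items;

-- writetime() works only on regular columns, not on primary key
-- columns, and returns null if that cell was never written.
```

To count rows changed since a given moment you would iterate the result set in your client and count rows whose `writetime` exceeds your cutoff.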
I have created a Cassandra table with 20 million records. Now I want to delete expired data, determined by a non-primary-key column, but Cassandra doesn't support the delete operation on that column. So I tried retrieving the table and deleting the data line by line. Unfortunately, the table is too huge to retrieve in one go. I can't simply delete the whole table either, so how can I achieve my goal?
Your question is really about how to get the data from the table in chunks (also called pagination).
You can do that by selecting different slices of your primary key: for example, if your primary key is some sort of ID, select a range of IDs each time, process the results and do whatever you want to do with them, then get the next range, and so on.
Another way, which depends on the driver you're working with, is to use fetch_size. You can see a Python example here and a Java example here.
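The primary-key-slice approach above can be sketched in plain CQL using token ranges, assuming a hypothetical table `events` with partition key `id` (partition keys are ordered by token, not by value, so the cursor is the token of the last row seen):

```cql
-- First page:
SELECT id, payload, token(id) FROM events LIMIT 1000;

-- Next page resumes after the last token returned by the previous query
-- (the token value below is just an illustrative placeholder):
SELECT id, payload, token(id) FROM events
WHERE token(id) > -3485513579396041028 LIMIT 1000;

-- Repeat until a query returns fewer than LIMIT rows.
```

For each page you would then issue `DELETE` statements by primary key for the rows whose non-key column marks them as expired.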
In my project, I am using Cassandra to store huge amounts of data. With a big MySQL table it takes a long time to add a new column or index. Will Cassandra solve that issue?
Yes, it is relatively easy to add a column in Cassandra and to index that column.
Any added column is propagated to all nodes very quickly, too. The added column reads as null by default in existing rows.
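For illustration, assuming a hypothetical `users` table, both operations are schema-metadata changes rather than table rewrites, which is why they are fast even on large tables:

```cql
-- Adds the column to the schema only; no existing data is touched:
ALTER TABLE users ADD score int;

-- Builds a secondary index on the new column in the background:
CREATE INDEX IF NOT EXISTS users_score_idx ON users (score);
```

Note that secondary indexes in Cassandra suit low-cardinality columns queried within a partition; for high-volume lookup patterns a separate denormalised table is usually preferred.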
I want to fetch the last n, say the last 5, updated rows, i.e. order by updated_time desc, in Cassandra. Is there any good way of doing it?
The exact use case: I want to update the count of an event whenever it occurs in the event table, and fetch the last five events by updated time along with the count.
Table structure:
event_name text, updated_time timestamp, count counter
In Cassandra you can retrieve the write time of a cell with writetime(cell_name). But since you have multiple columns and Cassandra is optimised for fast reads, you should consider maintaining another table that provides exactly the data you need, already ordered. On that new table you would limit the read results and periodically trim old rows.
It may be possible to do it with writetime(), but that is not the Cassandra way, as it would be too slow in production. A separate table holding just the data you need is the denormalised Cassandra way of solving this.
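One hedged sketch of such a denormalised design (table and column names are hypothetical): counters cannot live alongside regular clustered columns, so the counts and the "latest events" ordering are split into two tables, with the application writing to both.

```cql
-- Counter table keyed by event (counter tables may contain only
-- the primary key and counter columns):
CREATE TABLE event_counts (
  event_name text PRIMARY KEY,
  count counter
);

-- Latest-events table, clustered newest-first under a single
-- bucket partition; trim old rows periodically:
CREATE TABLE recent_events (
  bucket int,
  updated_time timestamp,
  event_name text,
  PRIMARY KEY (bucket, updated_time)
) WITH CLUSTERING ORDER BY (updated_time DESC);

-- On each event occurrence:
UPDATE event_counts SET count = count + 1 WHERE event_name = 'login';
INSERT INTO recent_events (bucket, updated_time, event_name)
VALUES (0, toTimestamp(now()), 'login');

-- Fetch the last five events, newest first:
SELECT event_name, updated_time FROM recent_events WHERE bucket = 0 LIMIT 5;
```

The single `bucket = 0` partition keeps the query trivial but concentrates all writes on one partition; for high event rates you would shard the bucket (e.g. by day) and merge results client-side.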