I am new to Cassandra. I created a table and inserted some data into it. Now I want to select data from it, and in the output I want some calculated columns.
I created a user-defined function, concat, which concatenates two strings and returns the result. I noticed that this function shows data correctly when I use it in a SELECT statement, but it does not work when I use it in an UPDATE statement.
That is, this works:
select concat(prov,city), year,mnth,acno,amnt from demodb.budgets;
but this does not:
update demodb.budgets set extra=concat(prov,city) where prov='ON';
In addition, the UPDATE also does not work if I simply assign one column's value to another column of the same type (without any calculation), as below:
update demodb.budgets set extra=city where prov='ON';
Also, even a simple arithmetic calculation doesn't work in an UPDATE statement; this too fails:
update demodb.budgets set amnt = amnt + 20 where prov='ON';
Here amnt is a simple double column.
(When I saw this, all I could do was pull my hair and say I can't work with Cassandra; I just don't want it if it cannot do simple arithmetic.)
Can someone please explain how I can achieve the desired updates?
I think the basic answer to your question is that read-before-write is a huge anti-pattern in Cassandra. The issue of concurrency in a distributed environment is a key point there: an UPDATE cannot reference existing column values in its SET clause (the only exception is incrementing a counter column), so values like concat(prov, city) or amnt + 20 have to be computed by the client and written back.
More info.
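A minimal sketch of that client-side read-then-write using the DataStax Python driver, assuming for illustration that the primary key of demodb.budgets is (prov, city); adjust to your real schema:

# Read the rows, compute the new value in the client, and write it back
# with the full primary key.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('demodb')

rows = session.execute("SELECT prov, city FROM budgets WHERE prov = %s", ['ON'])

update = session.prepare(
    "UPDATE budgets SET extra = ? WHERE prov = ? AND city = ?")
for row in rows:
    session.execute(update, (row.prov + row.city, row.prov, row.city))

The same pattern applies to amnt = amnt + 20: read amnt, add 20 in the client, and write the result back (or, if integer increments are all you need, model it as a counter column in a separate counter table).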
I have two tables that use a unique concatenated column for their relationship. I simply want to make a measure that uses the values from C4 of Table1. I thought I could use a simple formula like =VALUES(Table1[C4]), but I get the error "A table of multiple values was supplied where a single value was expected."
Side note: I realize the concatenation is unnecessary in this simple example, but it is necessary in the full data I am working with, which is why I added it to this example.
Here's a simplified set of tables for what I am trying to do:
(Sample data for Table1 and Table2 and the Relationships diagram are omitted here.)
First you should ask yourself: do I really need a calculated column? Can't this be calculated at runtime?
But if you still want to do it, you can use RELATED or RELATEDTABLE.
Keep in mind that RELATEDTABLE returns a table of many values, so you should apply some type of aggregation such as SUMX or MAXX.
You can use context transition to retrieve the value.
= CALCULATE(MAX(Table1[C4]))
Hello, I am facing a problem when trying to execute a really simple update query in cqlsh:
update "Table" set "token"='111-222-333-444' where uid='123' and "key"='somekey';
It didn't throw any error, but the value of the token is still the same. However, if I try the same query for some other field it works just fine:
update "Table" set "otherfield"='value' where uid='123' and "key"='somekey';
Any ideas why Cassandra can prevent updates for some fields?
Most probably the entry was inserted by a client with an incorrect clock, or something similar. Data in Cassandra is "versioned" by write time, which could even be in the future (depending on the use case). When reading, Cassandra compares the write times of all versions of the specified column (there could be multiple versions in the data files on disk) and selects the one with the highest write time.
You need to check the write time of that column value (use the writetime function) and compare it with the current time:
select writetime("token") from "Table" where uid='123' and "key"='somekey';
The resulting value is in microseconds. You can drop the last 3 digits (to get milliseconds) and use an online epoch converter to turn it into a human-readable time.
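Alternatively, a small sketch of doing the conversion locally in Python (the writetime value below is just a placeholder):

# Convert a Cassandra writetime value (microseconds since the Unix epoch)
# into a human-readable UTC timestamp.
from datetime import datetime, timezone

writetime_us = 1650000000000000  # paste the value returned by writetime(...)
print(datetime.fromtimestamp(writetime_us / 1_000_000, tz=timezone.utc))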
Hi, I have a Cassandra table with around 200 records in it. Later I altered the table to add a new column named budget, of type boolean. I want the value of that column to default to true for the existing rows. What should the CQL look like?
I tried the following command, but it didn't work:
cqlsh:Openmind> update mep_primecastaccount set budget = true ;
SyntaxException: line 1:46 mismatched input ';' expecting K_WHERE
I'd appreciate any help. Thank you.
Any operation that would require a cluster-wide read-before-write is not supported (it won't work at the scale Cassandra is designed for). You must provide the partition key (and any clustering keys) in an UPDATE statement. If there are only 200 records, a quick Python script can do this for you: do a SELECT * FROM mep_primecastaccount, iterate through the result set, and issue an UPDATE for each row. If you have a lot more records you might want to use Spark or Hadoop, but for a small table like that a quick script can do it.
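A sketch of that quick script with the DataStax Python driver; the keyspace comes from the question's cqlsh prompt, and the single primary-key column uid is an assumption to be replaced with your table's real key column(s):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('Openmind')  # keyspace from the cqlsh prompt

# Read every row's key, then issue an UPDATE per row with its primary key.
update = session.prepare(
    "UPDATE mep_primecastaccount SET budget = ? WHERE uid = ?")
for row in session.execute("SELECT uid FROM mep_primecastaccount"):
    session.execute(update, (True, row.uid))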
Chris's answer is correct - there is no efficient or reliable way to modify a column value for each and every row in the database. But for a 200-row table that doesn't change in parallel, it's actually very easy to do.
But there's another way that also works on a table of billions of rows:
You can handle the notion of a "default value" in your client code. Pre-existing rows will not have a value for "budget" at all: it will be neither true nor false, but outright missing (a.k.a. "null"). Your client code can, when it reads a row with a missing "budget" value, replace it with some default value of its choice - e.g., true.
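A minimal sketch of that client-side default (the helper name effective_budget is purely illustrative):

def effective_budget(row, default=True):
    # Rows written before the budget column was added come back with
    # budget = None; substitute the chosen default in application code.
    return row.budget if row.budget is not None else default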
I have been unable to figure out how to proceed with a use case.
I want to keep count of some items, and query the data such that
counter_value < threshold value
Now, in Cassandra, indexes cannot be created on counters, which is a problem. Is there a workaround, or some modelling that can be done to accomplish something similar?
Thanks.
You have partially answered your own question by saying what you want to query. So first, model the data the way you will query it later.
If you want to query by counter value, it cannot be a counter type, because counters fail the two conditions needed to query on a column:
They cannot be part of an index
They cannot be part of the partition key
Counters are the most efficient way to do fast writes in Cassandra for a counting use case, but unfortunately they cannot be part of a WHERE clause because of the above two restrictions.
So if you want to solve the problem using Cassandra, change the type to a bigint (long), and either make it a clustering key or create an index on that column (a sketch of the clustering-key option follows this answer). In either case this will slow down your writes and increase the latency of every counter-update operation, as you will be using the read-before-write anti-pattern.
I would recommend using the index.
Last but not least, I would consider using a SQL database for this problem.
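A minimal sketch of the clustering-key option with the Python driver; the keyspace, table, and column names (demo, item_counts, bucket, item_id, cnt) are illustrative, not from the question:

# Store the count as a plain bigint and make it a clustering column so it
# can be range-filtered within a partition. Updating cnt then means
# deleting the old row and inserting a new one (the read-before-write
# trade-off mentioned above).
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('demo')

session.execute("""
    CREATE TABLE IF NOT EXISTS item_counts (
        bucket  text,    -- partition key, e.g. a category or time bucket
        cnt     bigint,  -- the maintained count
        item_id text,
        PRIMARY KEY (bucket, cnt, item_id)
    )
""")

# counter_value < threshold is now an ordinary clustering-range query:
rows = session.execute(
    "SELECT item_id, cnt FROM item_counts WHERE bucket = %s AND cnt < %s",
    ['daily', 100])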
Depending on what you're trying to return as a result, you might be able to do something with a user defined aggregate function. You can put arbitrary code in the user defined function to filter based on the value of the counter.
See some examples here and here.
Other approaches would be to filter the returned rows on the client side, or to load the data into Spark and filter the rows in Spark.
I have an Excel workbook that utilises a data table (A).
I now want to create another data table (B) that effectively sits on top of the other data table. That is, each "iteration" of B calls A.
This approach fails, although I cannot find any documentation about data tables indicating that this would not work.
Basically, I'd like to know if anyone has tried this before and whether I am missing something.
Is there a workaround? Do you know of any documentation that spells out whether and why this is not supported?
No.
I tried this at length some years ago in both xl03 and xl07, and my conclusion was that it can't be done - each data table seems to be an independent one-off run; they don't talk to each other if you try to link them.
I couldn't find any documentation on this issue either, on the process or from anyone else looking at a similar problem.
I want to share my experience using data tables.
We found a workaround for this problem.
Suppose you have two variables A & B that need to run through a data table to get one or more results.
What we did is:
Enumerate every combination (binary combinations) of A & B and assign an id to each combination (A=0 & B=0 => id=1).
You then run a single data table with a length of A*B (one row per id).
The drawback here is the time it takes to calculate the data (7 minutes with 25 data tables, and 2 data tables with a length of 8,000 rows).
Hope it helps!