Contradicting information on CQL counter type in docs - cassandra

I have been looking for information on counters, and the documentation seems to contradict itself about what you can do with them.
According to the official DataStax documentation (https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/counter_type.html): "You cannot set the value of a counter, which supports two operations: increment and decrement."
However, in the BATCH CQL documentation (https://docs.datastax.com/en/dse/6.0/cql/cql/cql_reference/cql_commands/cqlBatch.html#cqlBatch__batch-updates), the example at the bottom of the page appears to set, add to, and subtract from a counter column within a batch.
That example also seems to break the rule that a table with a counter column can only have counter columns for the rest of the table.
So what really are the limitations / usability for counters in cassandra DataStax? There does not seem to be a clear definition.

I think it's just a misunderstanding. Those pages do not contradict each other.
The CQL Counter type page correctly states that it is not possible to set the value of a counter column. For example, this is NOT valid:
UPDATE ks.counter_table
SET count = 10
WHERE pk = ?
The only valid operations on a counter column are increment and decrement. Here are some examples:
UPDATE ks.counter_table
SET count = count + 1
WHERE pk = ?
UPDATE ks.counter_table
SET count = count - 1
WHERE pk = ?
In the BATCH command page, the first 2 examples are increment operations:
UPDATE cycling.popular_count
SET popularity = popularity + 1
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
UPDATE cycling.popular_count
SET popularity = popularity + 125
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
The last example is a decrement operation:
UPDATE cycling.popular_count
SET popularity = popularity - 64
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
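To make the increment/decrement semantics concrete, here is a toy Python model of a counter. It is not Cassandra code: the `CounterColumn` class is invented for illustration, and starting an unwritten counter at 0 is a modelling assumption. It applies the three deltas from the BATCH example and shows that a direct assignment has no equivalent.

```python
class CounterColumn:
    """Toy model of a CQL counter: it can only be changed by a delta."""

    def __init__(self):
        # Modelling assumption: a counter that was never written starts at 0.
        self._value = 0

    @property
    def value(self):
        # Read-only on purpose: there is no setter, mirroring the fact
        # that "SET count = 10" is not a valid counter operation.
        return self._value

    def add(self, delta):
        # A positive delta is an increment, a negative delta a decrement.
        self._value += delta


popularity = CounterColumn()
for delta in (+1, +125, -64):  # the three updates from the BATCH example
    popularity.add(delta)

print(popularity.value)
```

The three updates combine to a net change of +62; none of them sets the value outright.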
I can't see how the examples would break the rule about only having counters in a counter table. From the examples, I would infer that the table schema is:
CREATE TABLE cycling.popular_count (
id uuid,
popularity counter,
PRIMARY KEY(id)
)
You can have as many non-counter columns in the table as you like, as long as they are part of the PRIMARY KEY.
As a side note, it is not correct to refer to the software as either "Cassandra DataStax" or "DataStax Cassandra" so I've updated the title accordingly. DataStax (the company) does not own Cassandra. The more appropriate reference is "Apache Cassandra" or just plain "Cassandra". Cheers!

Related

Timeseries differencing - ArangoDB (AQL or Python)

I have a collection which holds documents, with each document having a data observation and the time that the data was captured.
e.g.
{
_key:....,
"data":26,
"timecaptured":1643488638.946702
}
where timecaptured for now is a utc timestamp.
What I want is the duration between consecutive observations, that is, the difference in timestamps between two documents in time order. In SQL I could do this with LAG, for example, but I am struggling to see how to do it in the database with ArangoDB and AQL. I have a lot of data and I don't really want to pull it all into pandas.
Any help really appreciated.
Although the solution provided by CodeManX works, I prefer a different one:
FOR d IN docs
SORT d.timecaptured
WINDOW { preceding: 1 } AGGREGATE s = SUM(d.timecaptured), cnt = COUNT(1)
LET timediff = cnt == 1 ? null : d.timecaptured - (s - d.timecaptured)
RETURN timediff
We simply calculate the sum of timecaptured over the previous and the current document. Subtracting the current document's timecaptured from that sum gives the timecaptured of the previous document, so we can then easily calculate the requested difference.
I only use the COUNT to return null for the first document (which has no predecessor). If you are fine with having a difference of zero for the first document, you can simply remove it.
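The arithmetic behind the sum trick can be checked outside the database. The following is a minimal Python sketch that mimics the query step by step; the function name and the sample timestamps are made up for illustration, and it is not a replacement for running the AQL itself.

```python
def time_diffs(timestamps):
    """Difference to the previous timestamp, recovered from a two-row window sum."""
    ordered = sorted(timestamps)  # SORT d.timecaptured
    diffs = []
    for i, current in enumerate(ordered):
        window = ordered[max(0, i - 1): i + 1]  # WINDOW { preceding: 1 }
        s, cnt = sum(window), len(window)       # AGGREGATE s = SUM(...), cnt = COUNT(1)
        previous = s - current                  # only meaningful when cnt == 2
        diffs.append(None if cnt == 1 else current - previous)
    return diffs


print(time_diffs([30, 10, 25]))
```

The sample uses integers so the result is exact. With floating-point timestamps, adding and then subtracting the current value can introduce a small rounding error compared with subtracting the two timestamps directly.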
However, neither approach is very straightforward or obvious. I have put it on my TODO list to add an APPEND aggregate function that could be used in WINDOW and COLLECT operations.
The WINDOW function doesn't give you direct access to the data in the sliding window but here is a rather clever workaround:
FOR doc IN collection
SORT doc.timecaptured
WINDOW { preceding: 1 }
AGGREGATE d = UNIQUE(KEEP(doc, "_key", "timecaptured"))
LET timediff = doc.timecaptured - d[0].timecaptured
RETURN MERGE(doc, {timediff})
The UNIQUE() function is available for window aggregations and can be used to get at the desired data (previous document). Aggregating full documents might be inefficient, so a projection should do, but remember that UNIQUE() will remove duplicate values. A document _key is unique within a collection, so we can add it to the projection to make sure that UNIQUE() doesn't remove anything.
The time difference is calculated by subtracting the previous document's timecaptured value from the current document's one. In the case of the first record, d[0] is actually equal to the current document and the difference ends up being 0, which I think is sensible. You could also write d[-1].timecaptured - d[0].timecaptured to achieve the same. d[1].timecaptured - d[0].timecaptured, on the other hand, will give you the inverted timestamp for the first record, because d[1] is null (no previous document) and evaluates to 0.
There is one risk: UNIQUE() may alter the order of the documents. You could use a subquery to sort by timecaptured again:
LET timediff = doc.timecaptured - (
FOR dd IN d SORT dd.timecaptured LIMIT 1 RETURN dd.timecaptured
)[0]
But it's not great for performance to use a subquery. Instead, you can use the aggregation variable d to access both documents and calculate the absolute value of the subtraction so that the order doesn't matter:
LET timediff = ABS(d[-1].timecaptured - d[0].timecaptured)

How do you add elements to a set with DataStax QueryBuilder?

I have a table whose column types are
text, bigint, set<text>
I'm trying to update a single row and add an element to the set using QueryBuilder.
The code that overwrites the existing set looks like this (note this is scala):
val query = QueryBuilder.update("twitter", "tweets")
.`with`(QueryBuilder.set("sinceid", update.sinceID))
.and(QueryBuilder.set("tweets", setAsJavaSet(update.tweets)))
.where(QueryBuilder.eq("handle", update.handle))
I was able to find the actual CQL for adding an element to a set which is:
UPDATE users
SET emails = emails + {'fb@friendsofmordor.org'} WHERE user_id = 'frodo';
But could not find an example using QueryBuilder.
Based off of the CQL I also tried:
.and(QueryBuilder.set("tweets", "tweets"+{setAsJavaSet(update.tweets)}))
But it did not work. Thanks in advance
Use the add method (adds one element at a time) or the addAll method (adds any number of elements at a time) to add to a set.
To extend Ananth's answer:
QueryBuilder.add does not support BindMarker. To use a BindMarker when adding to a set, you have to use QueryBuilder.addAll.*
*Just a note, Collections.singleton may come in handy in this regard.
Using @Ananth's and @sazzad's answers, the code below works:
Session cassandraSession;
UUID uuid;
Long value;
Statement queryAddToSet = QueryBuilder
.update("tableName")
.with(QueryBuilder.addAll("setFieldName", QueryBuilder.bindMarker()))
.where(QueryBuilder.eq("whereFieldName", QueryBuilder.bindMarker()));
PreparedStatement preparedQuery = cassandraSession.prepare(queryAddToSet);
BoundStatement boundQuery = preparedQuery.bind();
boundQuery
.setUUID("whereFieldName", uuid)
.setSet("setFieldName", Collections.singleton(value));
cassandraSession.execute(boundQuery);
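For comparison with the CQL in the question, here is a small Python sketch of the parameterised statement that appending to a set boils down to. The helper `add_to_set_cql` is hypothetical and not part of any driver, and the exact text a driver emits may differ in whitespace; the point is only that the set column appears on both sides of the assignment.

```python
def add_to_set_cql(keyspace, table, set_column, key_column):
    """Return parameterised CQL that appends to a set column instead of overwriting it."""
    # "col = col + ?" adds the bound elements to the existing set;
    # "col = ?" would replace the whole set.
    return "UPDATE {ks}.{tbl} SET {col} = {col} + ? WHERE {key} = ?".format(
        ks=keyspace, tbl=table, col=set_column, key=key_column
    )


# Keyspace, table and column names taken from the question.
cql = add_to_set_cql("twitter", "tweets", "tweets", "handle")
print(cql)
```

The first placeholder would be bound to a set of new elements and the second to the row key, which matches the two bind markers in the Java code above.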

ColdFusion: Object with duplicate values (removing duplicates)

I have a query object (SQL) with some records. The problem is that some of the records contain duplicate values. :( (I can't use DISTINCT in my SQL query, so how can I remove the duplicates from my object?)
categories[1].id = 1
categories[2].id = 1
categories[3].id = 2
categories[4].id = 3
categories[5].id = 2
Now I want to get a list with 1, 2, 3
Is that possible?
I'm not quite sure why you say you can't use DISTINCT, even given the qualification you offered. It doesn't matter where a query came from (<cfquery>, <cfldap>, <cfdirectory>, built by hand): by the time it's exposed to your CFML code, it's just "a query", so you can definitely use DISTINCT on it:
<cfquery name="distinctCategories" dbtype="query">
SELECT DISTINCT id
FROM categories
</cfquery>
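The intended result can be illustrated in a few lines of Python, using the id values from the question. This is only a sketch of the outcome, not CFML, and it keeps the order of first appearance, which a DISTINCT query does not promise.

```python
category_ids = [1, 1, 2, 3, 2]  # the id values from the question

# dict keys are unique and keep insertion order, so this drops repeats
# while preserving the order in which each id first appears.
distinct_ids = list(dict.fromkeys(category_ids))

print(distinct_ids)
```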

Query WadPerformanceCountersTable in Increments?

I am trying to query the WadPerformanceCountersTable generated by Azure Diagnostics which has a PartitionKey based on tick marks accurate up to the minute. This PartitionKey is stored as a string (which I do not have any control over).
I want to be able to query this table to get data points for every minute, every hour, every day, etc., so I don't have to pull all of the data (I just want a sampling to approximate it). I was hoping to use the modulus operator to do this, but since the PartitionKey is stored as a string and this is an Azure Table, I am having issues.
Is there any way to do this?
Non-working example:
var query =
(from entity in ServiceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable")
where
long.Parse(entity.PartitionKey) % interval == 0 && //bad for a variety of reasons
String.Compare(entity.PartitionKey, partitionKeyEnd, StringComparison.Ordinal) < 0 &&
String.Compare(entity.PartitionKey, partitionKeyStart, StringComparison.Ordinal) > 0
select entity)
.AsTableServiceQuery();
If you just want to get a single row that falls between two points in time (now and N minutes back), you can use the following query, which returns a single row as described here:
// 10 minutes span Partition Key
DateTime now = DateTime.UtcNow;
// Current Partition Key
string partitionKeyNow = string.Format("0{0}", now.Ticks.ToString());
DateTime tenMinutesSpan = now.AddMinutes(-10);
string partitionKeyTenMinutesBack = string.Format("0{0}", tenMinutesSpan.Ticks.ToString());
// Get a single row sample created in the last 10 minutes
CloudTableQuery<PerformanceCountersEntity> cloudTableQuery =
(
from entity in ServiceContext.CreateQuery<PerformanceCountersEntity>("WADPerformanceCountersTable")
where
entity.PartitionKey.CompareTo(partitionKeyNow) < 0 &&
entity.PartitionKey.CompareTo(partitionKeyTenMinutesBack) > 0
select entity
).Take(1).AsTableServiceQuery();
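The partition-key construction above can also be used to enumerate the keys for a sampling interval on the client, instead of asking the table service to apply a modulus. The following Python sketch is an assumption-laden illustration: the function names are invented, and it assumes that keys sit exactly on whole-minute boundaries and follow the "0" + ticks format shown in the C# code.

```python
from datetime import datetime, timedelta

DOTNET_EPOCH = datetime(1, 1, 1)  # .NET DateTime ticks count 100 ns steps from here


def to_ticks(moment):
    """Convert a naive UTC datetime to .NET-style ticks (100 ns units)."""
    return (moment - DOTNET_EPOCH) // timedelta(microseconds=1) * 10


def partition_key(moment):
    """Mirror string.Format("0{0}", ticks), truncated to the whole minute (assumption)."""
    minute = moment.replace(second=0, microsecond=0)
    return "0{}".format(to_ticks(minute))


def sampled_keys(start, end, step_minutes):
    """Yield one partition key every step_minutes in the half-open range [start, end)."""
    moment = start.replace(second=0, microsecond=0)
    while moment < end:
        yield partition_key(moment)
        moment += timedelta(minutes=step_minutes)


# Hourly samples over a three-hour range (dates chosen arbitrarily).
keys = list(sampled_keys(datetime(2014, 1, 1, 0, 0), datetime(2014, 1, 1, 3, 0), 60))
print(keys)
```

Each generated key could then be used in an equality filter on PartitionKey, one lookup per sample point. Whether that is cheaper than a range scan depends on how many sample points are needed.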
The only way I can see to do this would be to create a process to keep the Azure table in sync with another version of itself. In this table, I would store the PartitionKey as a number instead of a string. Once done, I could use a method similar to what I wrote in my question to query the data.
However, this is a waste of resources, so I don't recommend it. (I'm not implementing it myself, either.)

Drupal 6 next tid value

I need to find the next tid value that will be created when I create a new term. The term_data table does not show this value as the term with the highest tid can be deleted. It looks like the sequences table held this value in Drupal 5 but where is it held in Drupal 6?
Thanks for your time with this!
What do you need this for? You should be creating the term, then pulling the ID based on the vocabulary ID plus the term, like this -
// In Drupal 6, taxonomy_get_term_by_name() returns an array of matching term objects.
$terms = taxonomy_get_term_by_name('whatever the term is');
$new_term_id = $terms[0]->tid;
But to answer your question specifically -
$result = db_query("SHOW TABLE STATUS LIKE 'term_data'");
$row = db_fetch_array($result);
$next_id = $row['Auto_increment'];
