Aggregation in Cassandra across partitions


I have a data model like the one below:
CREATE TABLE appstat.nodedata (
nodeip text,
timestamp timestamp,
flashmode text,
physicalusage int,
readbw int,
readiops int,
totalcapacity int,
writebw int,
writeiops int,
writelatency int,
PRIMARY KEY (nodeip, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
where nodeip is the partition key and timestamp is the clustering key (sorted in descending order so the latest rows come first).
Sample data in this table:
SELECT * from nodedata WHERE nodeip = '172.30.56.60' LIMIT 2;
nodeip | timestamp | flashmode | physicalusage | readbw | readiops | totalcapacity | writebw | writeiops | writelatency
--------------+---------------------------------+-----------+---------------+--------+----------+---------------+---------+-----------+--------------
172.30.56.60 | 2017-12-08 06:13:07.161000+0000 | yes | 34 | 57 | 19 | 27 | 8 | 89 | 57
172.30.56.60 | 2017-12-08 06:12:07.161000+0000 | yes | 70 | 6 | 43 | 88 | 79 | 83 | 89
This works: whenever I need statistics for a node, I can get the data using the partition key like below:
SELECT nodeip,readbw,timestamp FROM nodedata WHERE nodeip = '172.30.56.60' AND timestamp < 1512652272989 AND timestamp > 1512537899000;
I can also aggregate the data successfully:
SELECT sum(readbw) FROM nodedata WHERE nodeip = '172.30.56.60' AND timestamp < 1512652272989 AND timestamp > 1512537899000;
Now comes the next use case, where I need to get the cluster data (all the data of the four nodes), like below:
SELECT nodeip,readbw,timestamp FROM nodedata WHERE nodeip IN ('172.30.56.60','172.30.56.61','172.30.56.62','172.30.56.63') AND timestamp < 1512652272989 AND timestamp > 1512537899000;
But a number of sites clearly say that IN queries on the partition key have performance problems. So what do you suggest for the nodedata data model above? (NOTE: running multiple queries against different partitions is okay, but I see that as a last option.)
Do you have a better approach, a better way to redesign this data model, or any better solution for retrieving data from multiple partitions?
Any help would be much appreciated.
Thanks,
Harry

Yes, using IN on the partition key is discouraged because it puts more load on the coordinating node, especially if many partitions are listed in the IN clause. Multiple separate requests executed asynchronously, for example, can even perform better and put less load on the coordinating nodes.
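A minimal sketch of the "multiple separate requests, done async" idea, assuming a hypothetical `fetch` callable that stands in for one single-partition query. With the DataStax Python driver that would typically be a prepared statement passed to `session.execute_async(...)`, whose futures you then resolve; here a thread pool plays that role, and the aggregation happens on the client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# (nodeip, readbw, timestamp in epoch milliseconds)
Row = Tuple[str, int, int]
# Hypothetical stand-in for one single-partition query:
#   SELECT nodeip, readbw, timestamp FROM nodedata
#   WHERE nodeip = ? AND timestamp < ? AND timestamp > ?
FetchFn = Callable[[str, int, int], List[Row]]


def fan_out_sum(node_ips: Sequence[str], fetch: FetchFn,
                start_ms: int, end_ms: int, max_workers: int = 8) -> int:
    """Query each partition concurrently and sum readbw on the client."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per partition (with a real driver and token-aware routing,
        # each request can go straight to a replica that owns the partition)
        futures = [pool.submit(fetch, ip, start_ms, end_ms) for ip in node_ips]
        # result() re-raises any exception raised inside a worker
        return sum(row[1] for fut in futures for row in fut.result())


def make_fake_fetch(data: Dict[str, List[Row]]) -> FetchFn:
    """In-memory fake used for illustration instead of a Cassandra session."""
    def fetch(ip: str, start_ms: int, end_ms: int) -> List[Row]:
        # Same exclusive bounds as the CQL query in the question
        return [r for r in data.get(ip, []) if start_ms < r[2] < end_ms]
    return fetch
```

Each request touches exactly one partition, so no single coordinator has to gather rows for all four nodes.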
Also, you need to take into account the size of your partitions: from a quick look at the schema, I estimate that every partition will grow to ~55 MB per year if you sample every minute. Very wide partitions can also lead to performance problems (although not always; it depends on the use case). You may need to change the partition key to include the year, or the year and month, to make partitions smaller. In that case, your code needs some additional logic when retrieving data that spans several years or months.
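The "additional logic" for a bucketed key can be sketched as follows, assuming a hypothetical `month` column (an int such as 201712) added to the partition key, e.g. `PRIMARY KEY ((nodeip, month), timestamp)`, and buckets computed in UTC. For a time range, the client first lists every month bucket it touches, then queries each `(nodeip, month)` partition.

```python
from datetime import datetime, timezone
from typing import List


def month_buckets(start_ms: int, end_ms: int) -> List[int]:
    """YYYYMM values (UTC) for every month touched by the range [start_ms, end_ms]."""
    start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    year, month = start.year, start.month
    buckets = []
    while (year, month) <= (end.year, end.month):
        buckets.append(year * 100 + month)
        # Advance one calendar month, rolling over into the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets
```

For the range used in the question (1512537899000 to 1512652272989) this yields a single bucket, 201712, so nothing changes for short ranges; only ranges crossing a month boundary need more than one query per node.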
P.S. This may not fully answer your question, but the comment field is too small for it :-)

Related

How to scale a range sharded index on a timestamp column in YugabyteDB?

Is there any performance tuning to do for a write-bound workload in YugabyteDB? We thought that by simply adding nodes to our YugabyteDB cluster, without further tuning, we would see a noticeable increase in write throughput; however, this is not the case. The schema is below.
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
update_id | character varying(255) | | not null | | extended | |
node_id | character varying(255) | | not null | | extended | |
data | character varying | | not null | | extended | |
created_at | timestamp without time zone | | | timezone('utc'::text, now()) | plain | |
Indexes:
"test_pkey" PRIMARY KEY, lsm (update_id HASH)
"test_crat" lsm (created_at DESC)
This table has tablets spread across all tservers with RF=3. created_at is a timestamp that changes all the time. At this point the table holds no more than two days of data, and every new insert gets a new timestamp.

In the case of the schema above, the test_crat index is limited to one tablet because it is range-sharded. Since created_at only receives recent values, they will all end up in one shard/tablet even with tablet splitting, meaning that all inserts go to one shard. As explained in this Google Spanner documentation (YugabyteDB's sharding, replication, and transaction architecture is based on Spanner's), this is an anti-pattern for scalability. As that documentation puts it:
If you need a global (cross node) timestamp ordered table, and you need to support higher write rates to that table than a single node is capable of, use application-level sharding. Sharding a table means partitioning it into some number N of roughly equal divisions called shards. This is typically done by prefixing the original primary key with an additional ShardId column holding integer values between [0, N). The ShardId for a given write is typically selected either at random, or by hashing a part of the base key. Hashing is often preferred because it can be used to ensure all records of a given type go into the same shard, improving performance of retrieval. Either way, the goal is to ensure that, over time, writes are distributed across all shards equally. This approach sometimes means that reads need to scan all shards to reconstruct the original total ordering of writes.
What that would mean is: to get recent changes, you would have to query each of the shards. Suppose you have 32 shards:
select * from raw3 where shard_id = 0 and created_at > now() - INTERVAL 'xxx';
..
select * from raw3 where shard_id = 31 and created_at > now() - INTERVAL 'xxx';
On the insert, every row could just be given a random value for your shard_id column from 0..31. And your index would change from:
(created_at DESC)
to
(shard_id HASH, created_at DESC)
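The write and read sides of application-level sharding can be sketched in Python. This assumes 32 shards as in the example, a random `shard_id` chosen per insert, and a hypothetical row shape of `(created_at, update_id)`. Each per-shard query returns rows already ordered by created_at DESC, so restoring the global order is a k-way merge rather than a full sort.

```python
import heapq
import random
from datetime import datetime
from typing import List, Sequence, Tuple

NUM_SHARDS = 32  # the 32-shard example above

# Hypothetical row shape for illustration: (created_at, update_id)
Row = Tuple[datetime, str]


def pick_shard() -> int:
    """Random shard_id in [0, NUM_SHARDS), assigned at insert time."""
    return random.randrange(NUM_SHARDS)


def merge_shards_desc(per_shard: Sequence[List[Row]]) -> List[Row]:
    """Merge per-shard results into one global created_at DESC ordering.

    Each input list must already be sorted by created_at descending, as an
    ORDER BY created_at DESC query against a single shard would return it.
    """
    return list(heapq.merge(*per_shard, key=lambda row: row[0], reverse=True))
```

Hashing part of the base key instead of calling `random.randrange` would keep related rows in one shard, as the Spanner guidance notes; the merge step is the same either way.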
Another approach you could use that may not be as intuitive, but may be more effective, would be to use a partial index for each shard_id that you would want.
Here is a simple example using 4 shards:
create index partial_0 ON raw3(created_at DESC) where (extract(epoch from timezone('utc',created_at)) * 1000)::bigint % 4 = 0;
The partial index above only includes rows whose created_at, converted to epoch milliseconds, is 0 modulo 4. Then repeat for the other 3 shards:
create index partial_1 ON raw3(created_at DESC) where (extract(epoch from timezone('utc',created_at)) * 1000)::bigint % 4 = 1;
create index partial_2 ON raw3(created_at DESC) where (extract(epoch from timezone('utc',created_at)) * 1000)::bigint % 4 = 2;
create index partial_3 ON raw3(created_at DESC) where (extract(epoch from timezone('utc',created_at)) * 1000)::bigint % 4 = 3;
Then, when you query with the matching predicate, PostgreSQL is smart enough to pick the right index:
yugabyte=# explain analyze select * from raw3 where (extract(epoch from timezone('utc',created_at)) * 1000)::bigint % 4 = 3 AND created_at < now();
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Scan using partial_3 on raw3 (cost=0.00..5.10 rows=10 width=16) (actual time=1.429..1.429 rows=0 loops=1)
Index Cond: (created_at < now())
Planning Time: 0.210 ms
Execution Time: 1.502 ms
(4 rows)
No need for a new shard_id column in the base table or in the index. If you want to reshard down the road, you can recreate new partial indexes with different shards and drop the old indexes.
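To see which partial index a given row falls into (for example, to check how evenly rows spread), the predicate can be mirrored on the client. This sketch assumes created_at is a naive datetime holding UTC (matching `timestamp without time zone`), is after 1970, and has whole-millisecond precision; the sub-millisecond rounding done by the `::bigint` cast is not reproduced.

```python
from datetime import datetime, timedelta

NUM_PARTIAL_INDEXES = 4
EPOCH = datetime(1970, 1, 1)


def partial_index_bucket(created_at: datetime) -> int:
    """Which partial_N index a row belongs to, mirroring
    (extract(epoch from timezone('utc', created_at)) * 1000)::bigint % 4.
    """
    # Integer division of timedeltas avoids float rounding in the epoch value
    epoch_ms = (created_at - EPOCH) // timedelta(milliseconds=1)
    return epoch_ms % NUM_PARTIAL_INDEXES
```

Because 1000 is a multiple of 4, the bucket depends only on the millisecond component of created_at. If timestamps were ever truncated to whole seconds, every row would land in partial_0, so that distribution assumption is worth checking against real data.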
More information about the DocDB sharding layer within YugabyteDB can be found here. If you are interested in the different sharding strategies we evaluated, and why we decided on consistent hash sharding as the default sharding strategy, take a look at this blog written by our Co-Founder and CTO Karthik Ranganathan.

How to find range in Cassandra Primary key?

Use case: Find maximum counter value in a specific id range
I want to create a table with these columns: time_epoch int, t_counter counter
The frequent query is:
select time_epoch, MAX t_counter where time_epoch >= ... and time_epoch < ...
This is to find the counter in a specific time range. I'm planning to make time_epoch the primary key, but then I am not able to query the data: it always asks for ALLOW FILTERING. Since that is very costly, we don't want to use it.
How should I design the table and the query for this use case?

Let's assume that we can "bucket" (partition) your data by day, assuming that not so many writes happen in a day that the partitions become too large. Then we can cluster by time_epoch in DESCending order. With time-based data, storing data in descending order often makes the most sense (as business requirements usually care most about the most recent data).
Therefore, I'd build a table like this:
CREATE TABLE event_counter (
day bigint,
time_epoch timestamp,
t_counter counter,
PRIMARY KEY(day,time_epoch))
WITH CLUSTERING ORDER BY (time_epoch DESC);
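With day buckets, a time range that crosses midnight touches more than one partition; the range in the example below, '2021-02-18 18:00' to '2021-02-19 8:00', touches both 20210218 and 20210219. A minimal sketch of listing the `day` values to query, assuming buckets are computed from the same (UTC) timestamps the writer used:

```python
from datetime import datetime, timedelta
from typing import List


def day_buckets(start: datetime, end: datetime) -> List[int]:
    """yyyymmdd values for every day touched by the half-open range [start, end)."""
    if end <= start:
        return []
    last = (end - timedelta(microseconds=1)).date()  # end itself is exclusive
    day = start.date()
    buckets = []
    while day <= last:
        buckets.append(day.year * 10000 + day.month * 100 + day.day)
        day += timedelta(days=1)
    return buckets
```

Each returned value becomes one `WHERE day = ? AND time_epoch >= ? AND time_epoch < ?` query.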
After updating a few counters (counter columns are written with UPDATE ... SET t_counter = t_counter + 1, not INSERT), the clustering order becomes evident:
> SELECT * FROM event_counter
WHERE day=20210219
AND time_epoch>='2021-02-18 18:00'
AND time_epoch<'2021-02-19 8:00';
day | time_epoch | t_counter
----------+---------------------------------+-----------
20210219 | 2021-02-19 14:09:21.625000+0000 | 1
20210219 | 2021-02-19 14:08:32.913000+0000 | 2
20210219 | 2021-02-19 14:08:28.985000+0000 | 1
20210219 | 2021-02-19 14:08:05.389000+0000 | 1
(4 rows)
Now SELECTing the MAX t_counter in that range should work:
> SELECT day,max(t_counter) as max
FROM event_counter
WHERE day=20210219
AND time_epoch>='2021-02-18 18:00'
AND time_epoch<'2021-02-19 09:00';
day | max
----------+-----
20210219 | 2
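When the range spans several day partitions, each `max(t_counter)` query covers only its own partition, so the per-day results still have to be combined on the client. A small sketch, assuming each query's result is collected into a dict keyed by day and that a partition with no rows in range comes back as `None` (null):

```python
from typing import Dict, Optional


def overall_max(per_day_max: Dict[int, Optional[int]]) -> Optional[int]:
    """Combine per-partition max(t_counter) results into a single maximum."""
    values = [v for v in per_day_max.values() if v is not None]
    return max(values) if values else None
```

The same pattern works for min and sum; avg needs per-partition sums and counts rather than per-partition averages.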

Unfortunately there is no better way. Think about it.
If you know Cassandra's architecture, you know that your data is spread across multiple nodes based on the partition key. The only way to filter on a range of partition key values would be to traverse every node, which is essentially what ALLOW FILTERING does.

CQL query in Cassandra with composite partition key

My main problem is with paginating a Cassandra result set on a table with a composite partition key. To narrow it down, here is a simple scenario. Say I have a table:
CREATE TABLE numberofrequests (
cluster text,
date text,
time text,
numberofrequests int,
PRIMARY KEY ((cluster, date), time)
) WITH CLUSTERING ORDER BY (time ASC);
And I have data like this:
cluster | date | time | numberofrequests
---------+------------+------+------------------
c2 | 01/04/2015 | t1 | 1
c2 | d1 | t1 | 1
c2 | 02/04/2015 | t1 | 1
c1 | d1 | t1 | 1
c1 | d1 | t2 | 2
Question: Is there any way I can query data for cluster=c2? I don't care about the date; honestly, I keep it only for partitioning purposes, to avoid hot spots. I tried the following:
select * from numberofrequests where token(cluster,date)>=token('c2','00/00/0000');
select * from numberofrequests where token(cluster,date)>=token('c2','1');
select * from numberofrequests where token(cluster,date)>=token('c2','a');
select * from numberofrequests where token(cluster,date)>=token('c2','');
My schema uses the default partitioner (Murmur3Partitioner). Is this achievable at all?

Cassandra needs the complete partition key (PK) to locate the queried rows. Queries based on only part of the PK will not work, since a hash computed from part of the key won't match the Murmur3 hash of the complete PK that the partitioner computed when the row was written. What you could do instead is use the ByteOrderedPartitioner. It keeps the byte order of the PK instead of hashing it, which would allow you to use the token() function as in your examples. But in most cases that's a bad idea, as data will not be distributed evenly across the cluster and you'll end up with the hot spots you tried to avoid in the first place.
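If the `date` values follow a known format, one way to stay with Murmur3Partitioner is to enumerate the possible `(cluster, date)` partition keys on the client and issue one single-partition query per key, ideally concurrently. A minimal sketch; the `dd/mm/yyyy` format is an assumption (the sample rows are ambiguous, and arbitrary values such as `d1` cannot be enumerated this way):

```python
from datetime import date, timedelta
from typing import List, Tuple

# Assumed format of the `date` column; adjust to whatever the writer uses
DATE_FORMAT = "%d/%m/%Y"


def partition_keys(cluster: str, first: date, last: date) -> List[Tuple[str, str]]:
    """All (cluster, date) partition keys for one cluster between two dates, inclusive."""
    keys = []
    day = first
    while day <= last:
        keys.append((cluster, day.strftime(DATE_FORMAT)))
        day += timedelta(days=1)
    return keys
```

Each tuple binds to a query such as `SELECT * FROM numberofrequests WHERE cluster = ? AND date = ?`, and every query hits exactly one partition.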

Cassandra Data Model for Sensor Data - Value | Timestamp

I'm new to Cassandra and I'm trying to define a data model that fits my requirements.
I have a sensor that collects one value every millisecond, and I have to store that data in Cassandra. The queries that I want to perform are:
1) Give me all the sensor values from - to these timestamp values
2) Tell me when this range of values was recorded
I'm not sure whether a single schema can satisfy both queries, because I want to perform range queries on both values. For the first query I would use something like:
CREATE TABLE foo (
value text,
timestamp timestamp,
PRIMARY KEY (value, timestamp));
but then for the second query I need the opposite since I can't do range queries on the partition key without using a token that restricts the timestamp:
CREATE TABLE foo (
value text,
timestamp timestamp,
PRIMARY KEY (timestamp, value));
So do I need two tables for this, or is there another way?
Thanks
PS: reads need to be as fast as possible.

I have a sensor that collects one value every millisecond and I have to store those data in Cassandra.
The main problem I see here is that you're going to hit Cassandra's limit of 2 billion cells (column values) per partition fairly quickly. DataStax's Patrick McFadin has a good example for weather station data (Getting Started with Time Series Data Modeling) that seems to fit here. Applied to your model, it looks something like this:
CREATE TABLE fooByTime (
sensor_id text,
day text,
timestamp timestamp,
value text,
PRIMARY KEY ((sensor_id,day),timestamp)
);
This will partition on both sensor_id and day, while sorting rows within the partition by timestamp. So you could query like:
> SELECT * FROM fooByTime WHERE sensor_id='5' AND day='20151002'
AND timestamp > '2015-10-02 00:00:00' AND timestamp < '2015-10-02 19:00:00';
sensor_id | day | timestamp | value
-----------+----------+--------------------------+-------
5 | 20151002 | 2015-10-02 13:39:22-0500 | 24
5 | 20151002 | 2015-10-02 13:49:22-0500 | 23
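The arithmetic behind "fairly quickly" can be checked directly. This sketch assumes one reading per millisecond and, as a simplification, one cell per reading; real storage may use more than one cell per CQL row, which only shortens the time to the limit.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 readings per day at 1 reading/ms
CELL_LIMIT = 2_000_000_000        # the ~2 billion cells per partition limit


def days_until_limit(readings_per_day: int, cells_per_reading: int = 1,
                     cell_limit: int = CELL_LIMIT) -> float:
    """Days until a single, never-bucketed partition reaches the cell limit."""
    return cell_limit / (readings_per_day * cells_per_reading)
```

An unbucketed per-sensor partition fills up in roughly 23 days, while a `(sensor_id, day)` partition holds at most about 86.4 million readings, well under the limit.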
And yes, the way to model in Cassandra is to have one table for each query pattern. So your second table, where you want to range query on value, might look something like this:
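CREATE TABLE fooByValues (
sensor_id text,
day text,
timestamp timestamp,
value text,
PRIMARY KEY ((sensor_id,day),value,timestamp)
);
Including timestamp as a second clustering column keeps two readings with the same value on the same day from overwriting each other (rows with an identical primary key are upserts in Cassandra). Also note that because value is text, range conditions compare strings lexicographically (so '3' > '25'); a numeric type such as double would compare numerically.
And that would support queries like: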
> SELECT * FROM fooByValues WHERE sensor_id='5'
AND day='20151002' AND value > '20' AND value < '25';
sensor_id | day | value | timestamp
-----------+----------+-------+--------------------------
5 | 20151002 | 22 | 2015-10-02 14:49:22-0500
5 | 20151002 | 23 | 2015-10-02 13:49:22-0500
5 | 20151002 | 24 | 2015-10-02 13:39:22-0500
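"One table per query pattern" means every reading is written twice, once per table. The following in-memory model illustrates that denormalization; it is not a Cassandra client. Each dict maps a `(sensor_id, day)` partition to rows kept sorted by that table's clustering key. In real code the two writes would be two INSERTs, optionally in a logged batch so that both eventually apply.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Tuple

PartitionKey = Tuple[str, str]  # (sensor_id, day)

# In-memory stand-ins for fooByTime and fooByValues (illustration only)
foo_by_time: DefaultDict[PartitionKey, List[Tuple[datetime, str]]] = defaultdict(list)
foo_by_values: DefaultDict[PartitionKey, List[Tuple[str, datetime]]] = defaultdict(list)


def record(sensor_id: str, ts: datetime, value: str) -> None:
    """Write one reading to both tables, keeping each partition sorted."""
    key = (sensor_id, ts.strftime("%Y%m%d"))
    bisect.insort(foo_by_time[key], (ts, value))
    bisect.insort(foo_by_values[key], (value, ts))


def by_time(sensor_id: str, day: str, start: datetime, end: datetime) -> List[Tuple[datetime, str]]:
    """Query 1: readings with start < timestamp < end in one day partition."""
    return [row for row in foo_by_time[(sensor_id, day)] if start < row[0] < end]


def by_value(sensor_id: str, day: str, low: str, high: str) -> List[Tuple[str, datetime]]:
    """Query 2: readings with low < value < high (text comparison, as for a CQL text column)."""
    return [row for row in foo_by_values[(sensor_id, day)] if low < row[0] < high]
```

The cost of fast reads on both access paths is double the writes and storage, which is the usual trade-off in Cassandra.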

Data scheme Cassandra using various data types

Currently I am developing a solution in the field of time-series data. Within these data we have: an ID, a value and a timestamp.
So here it comes: the value might be of type boolean, float or string. I consider three approaches:
a) For every data type a distinct table, all sensor values of type boolean into a table, all sensor values of type string into another. The obvious disadvantage is that you have to know where to look for a certain sensor.
b) A meta-column describing the data type plus all values of type string. The obvious disadvantage is the data conversion e.g. for calculating the MAX, AVG and so on.
c) Three columns of different types, of which only one holds a value per record. The disadvantage is 500000 sensors firing every 100ms ... plenty of unused space.
As my knowledge is limited any help is appreciated.

500000 sensors firing every 100ms
The first thing is to partition properly, so that you don't exceed the limit of 2 billion columns (cells) per partition.
CREATE TABLE sensorData (
stationID uuid,
datebucket text,
recorded timeuuid,
intValue bigint,
strValue text,
blnValue boolean,
PRIMARY KEY ((stationID,datebucket),recorded));
With half a million sensors each firing every 100 ms, that's 5 million writes per second. So you'll want to set your datebucket to be very granular... down to the second. Next, I'll insert some data:
stationid | datebucket | recorded | blnvalue | intvalue | strvalue
--------------------------------------+---------------------+--------------------------------------+----------+----------+----------
8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c | 2015-04-22T14:54:29 | 6338df40-e929-11e4-88c8-21b264d4c94d | null | 59 | null
8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c | 2015-04-22T14:54:29 | 633e0f60-e929-11e4-88c8-21b264d4c94d | null | null | CD
8b466f1d-8d6b-46fa-9f5b-8c4eb51aa40c | 2015-04-22T14:54:29 | 6342f160-e929-11e4-88c8-21b264d4c94d | True | null | null
3221b1d7-13b4-40d4-b41c-8d885c63494f | 2015-04-22T14:56:19 | a48bbdf0-e929-11e4-88c8-21b264d4c94d | False | null | null
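Because stationID is part of the partition key, each partition only receives the readings of its own station, not the cluster-wide rate. This sketch sizes a `(stationID, datebucket)` partition for different bucket widths, under the assumption that each stationID carries a fixed number of sensors each firing every 100 ms (`sensors_per_station` is a hypothetical parameter).

```python
SENSORS = 500_000
READINGS_PER_SENSOR_PER_SEC = 10  # one reading every 100 ms

# Cluster-wide write rate
WRITES_PER_SEC = SENSORS * READINGS_PER_SENSOR_PER_SEC  # 5,000,000


def rows_per_partition(bucket_seconds: int, sensors_per_station: int = 1) -> int:
    """Rows landing in one (stationID, datebucket) partition for a given bucket width."""
    return READINGS_PER_SENSOR_PER_SEC * sensors_per_station * bucket_seconds
```

With one sensor per station, a per-second bucket holds 10 rows and a per-day bucket about 864,000, so the right granularity depends mostly on how many sensors share a stationID and on the target partition size.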
...plenty of unused space.
You might be surprised. In the CQL output of SELECT * above, it appears that there are null values all over the place. But watch what happens when we use the cassandra-cli tool to view how the data is stored "under the hood":
RowKey: 3221b1d7-13b4-40d4-b41c-8d885c63494f:2015-04-22T14\:56\:19
=> (name=a48bbdf0-e929-11e4-88c8-21b264d4c94d:, value=, timestamp=1429733297352000)
=> (name=a48bbdf0-e929-11e4-88c8-21b264d4c94d:blnvalue, value=00, timestamp=1429733297352000)
As you can see, the data (above) stored for the CQL row where stationid=3221b1d7-13b4-40d4-b41c-8d885c63494f AND datebucket='2015-04-22T14:56:19' shows that blnValue has a value of 00 (false). But also notice that intValue and strValue are not present. Unlike an RDBMS, Cassandra stores nothing for columns that were never written. (Explicitly inserting a null, by contrast, writes a tombstone.)
The obvious disadvantage is the data conversion e.g. for calculating the MAX, AVG and so on.
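Perhaps you already know this, but I did want to mention that older versions of Cassandra CQL had no definitions for MAX, AVG, or similar aggregation functions. Cassandra 2.2 added built-in aggregates (min, max, avg, sum, count) and user-defined aggregates, but these only operate over the rows a single query returns. For analytics across many partitions, you'll either need to aggregate client-side or use Apache Spark to perform OLAP-type queries.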
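Client-side aggregation with option (c) is straightforward: rows that hold a value of a different type come back with `None` in the numeric column and are simply skipped. A minimal sketch, assuming each row is fetched as an `(intValue, strValue, blnValue)` tuple:

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

# (intValue, strValue, blnValue); columns that were never set come back as None
Row = Tuple[Optional[int], Optional[str], Optional[bool]]


def int_stats(rows: Iterable[Row]) -> Tuple[Optional[int], Optional[float]]:
    """Client-side MAX and AVG over intValue, ignoring rows of other types."""
    ints = [r[0] for r in rows if r[0] is not None]
    if not ints:
        return None, None
    return max(ints), mean(ints)
```

No string-to-number conversion is needed, which is the main drawback of option (b).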
Be sure to read through Patrick McFadin's Getting Started With Time Series Data Modeling. It contains good suggestions on how to solve time series problems like this.
