OpsCenter won't use a separate Cassandra cluster

We're using CloudFormation to automate the setup and teardown of several Cassandra clusters we use for load testing. During these load tests, we use OpsCenter to monitor our throughput. What I've found is that storing the OpsCenter data in our test's target cluster is skewing our nodes' data ownership information. As a result, I'd like to move the OpsCenter and agent data to its own node. I have a single c3.4xl set up with a single Cassandra instance and OpsCenter. I have the following configuration files.
OpsCenter server
/etc/opscenter/clusters/usergrid.conf
[cassandra]
seed_hosts = ec2-23-22-188-56.compute-1.amazonaws.com,ec2-54-163-164-41.compute-1.amazonaws.com,ec2-54-166-10-160.compute-1.amazonaws.com,ec2-54-166-219-212.compute-1.amazonaws.com,ec2-54-211-181-126.compute-1.amazonaws.com,ec2-54-82-161-157.compute-1.amazonaws.com,ec2-54-82-30-122.compute-1.amazonaws.com,ec2-54-83-98-182.compute-1.amazonaws.com,ec2-54-91-209-251.compute-1.amazonaws.com
[storage_cassandra]
seed_hosts = ec2-54-204-237-40.compute-1.amazonaws.com
api_port = 9160
datastax-agent
cat /var/lib/datastax-agent/conf/address.yaml
stomp_interface: ec2-54-204-237-40.compute-1.amazonaws.com
However, on the agents I see this in /var/log/datastax-agent/agent.log.
INFO [thrift-init] 2014-11-03 14:33:41,069 Connected to Cassandra cluster: usergrid
INFO [thrift-init] 2014-11-03 14:33:41,071 in execute with client org.apache.cassandra.thrift.Cassandra$Client@6deebf54
INFO [thrift-init] 2014-11-03 14:33:41,072 Using partitioner: org.apache.cassandra.dht.Murmur3Partitioner
INFO [pdp-loader] 2014-11-03 14:33:41,072 Attempting to load stored metric values.
ERROR [pdp-loader] 2014-11-03 14:33:41,092 There was an error when attempting to load stored rollups.
me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:Keyspace 'OpsCenter' does not exist)
at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:112)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:251)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:132)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:290)
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:101)
at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
at clj_hector.core$execute_query.doInvoke(core.clj:201)
at clojure.lang.RestFn.invoke(RestFn.java:423)
at clj_hector.core$get_column_range.doInvoke(core.clj:298)
at clojure.lang.RestFn.invoke(RestFn.java:587)
at opsagent.cassandra$scan_pdps$fn__1051.invoke(cassandra.clj:182)
at opsagent.cassandra$scan_pdps.invoke(cassandra.clj:181)
at opsagent.cassandra$process_pdp_row$fn__1060.invoke(cassandra.clj:199)
at opsagent.cassandra$process_pdp_row.invoke(cassandra.clj:197)
at opsagent.cassandra$process_pdp_row.invoke(cassandra.clj:195)
at opsagent.cassandra$load_pdps_with_retry$fn__1066.invoke(cassandra.clj:213)
at opsagent.cassandra$load_pdps_with_retry.invoke(cassandra.clj:210)
at opsagent.cassandra$setup_cassandra$f__388__auto____1094$fn__1095$f__388__auto____1102.invoke(cassandra.clj:357)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
Caused by: InvalidRequestException(why:Keyspace 'OpsCenter' does not exist)
at org.apache.cassandra.thrift.Cassandra$set_keyspace_result.read(Cassandra.java:5452)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_set_keyspace(Cassandra.java:531)
at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(Cassandra.java:518)
at me.prettyprint.cassandra.connection.client.HThriftClient.getCassandra(HThriftClient.java:110)
... 22 more
Generally this would indicate that the client cannot connect to the storage Cassandra node. However, from the agent node, I can execute the following command.
cassandra-cli -h ec2-54-204-237-40.compute-1.amazonaws.com
From there I can describe the keyspace, which works.
[default@unknown] describe OpsCenter;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
Keyspace: OpsCenter
Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Durable Writes: true
Options: [us-east:1]
Column Families:
ColumnFamily: bestpractice_results
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.IntegerType)
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: events
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Column Metadata:
Column Name: success
Validation Class: org.apache.cassandra.db.marshal.BooleanType
Column Name: action
Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: level
Validation Class: org.apache.cassandra.db.marshal.LongType
Column Name: time
Validation Class: org.apache.cassandra.db.marshal.LongType
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: events_timeline
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.LongType
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: pdps
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups300
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups60
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups7200
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: rollups86400
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.IntegerType
GC grace seconds: 0
Compaction min/max thresholds: 4/32
Read repair chance: 0.25
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
ColumnFamily: settings
"{"info": "OpsCenter management data.", "version": [5, 0, 1]}"
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator: org.apache.cassandra.db.marshal.BytesType
Cells sorted by: org.apache.cassandra.db.marshal.BytesType
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: 0.01
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
[default@unknown]
This signals to me that the target Cassandra node is up and running, and has the keyspace and column families. It also indicates I don't have any network or firewall issues between the agent and Cassandra. I'm at a loss to explain why I'm receiving this error message. Am I still missing something in my configuration, or is this a bug?
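Independent of cassandra-cli, a quick TCP probe from the agent node can double-check that the Thrift port is reachable. A minimal sketch (the hostname below is my storage node; any TCP check would do):

```python
import socket

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.close()
        return True
    except (socket.error, socket.timeout):
        return False

# Thrift RPC port on the dedicated storage node
print(port_open("ec2-54-204-237-40.compute-1.amazonaws.com", 9160))
```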
Cassandra: 1.2.19
OpsCenter: 5.0.1
DS Agent: 5.0.1
Any help would be greatly appreciated!
Thanks,
Todd
UPDATE
Here is the agent log. Note my IPs have changed since this is a new environment. It appears the agent is trying to connect to 10.81.168.96:9160, which is NOT the EC2 hostname set in my OpsCenter settings (ec2-174-129-181-123.compute-1.amazonaws.com). I'm not sure where that address is coming from, but it's not what is set on the OpsCenter server.
agent.log
https://gist.github.com/tnine/f509c120465eb80ade92
Sorry for the gist, but I've exceeded the character limit.

If you look through the logs of the OpsCenter instance, you should see this:
exceptions.Exception: Storing data in a separate cluster is only supported when managing DSE clusters.
However, the OpsCenter docs say this:
[storage_cassandra] seed_hosts
Used when using a different cluster for OpsCenter storage. A Cassandra
seed node is used to determine the ring topology and obtain gossip
information about the nodes in the cluster. This should be the same
comma-delimited list of seed nodes as the one configured for your
Cassandra or DataStax Enterprise cluster by the seeds property in the
cassandra.yaml configuration file.
So you would think it's possible, but apparently it's not.


Cassandra data compacted until deletion with no TTL set up?

I was testing node repair on my Cassandra cluster (v3.11.5) while simultaneously stress-testing it with cassandra-stress (v3.11.4). The disk space ran out and the repair failed. As a result, gossip got disabled on the nodes. SSTables that were being anticompacted got cleaned up (effectively deleted), which dropped the disk usage by about half (to ~1.5TB per node) within a minute. This part I understand.
What I do not understand is what happened next. The sstables started getting continuously compacted into smaller ones and eventually deleted. As a result, the disk usage continued to drop (this time slowly); after a day or so it went from ~1.5TB per node to ~50GB per node. The data residing in the cluster was randomly generated by cassandra-stress, so I see no way to confirm whether it's intact; however, given how far the disk usage dropped, I find it highly unlikely that it is. Also, I have no TTL set up (at least none that I know of), so I would not expect the data to be deleted. But I believe this is the case.
Anyway, can anyone point me to what is happening?
Table schema:
> desc test-table1;
CREATE TABLE test-keyspace1.test-table1 (
event_uuid uuid,
create_date timestamp,
action text,
business_profile_id int,
client_uuid uuid,
label text,
params text,
unique_id int,
PRIMARY KEY (event_uuid, create_date)
) WITH CLUSTERING ORDER BY (create_date DESC)
AND bloom_filter_fp_chance = 0.1
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.DeflateCompressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Logs:
DEBUG [CompactionExecutor:7] 2019-11-23 20:17:19,828 CompactionTask.java:255 - Compacted (59ddec80-0e20-11ea-9612-67e94033cb24) 4 sstables to [/data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3259-big,] to level=0. 93.264GiB to 25.190GiB (~27% of original) in 5,970,059ms. Read Throughput = 15.997MiB/s, Write Throughput = 4.321MiB/s, Row Throughput = ~909/s. 1,256,595 total partitions merged to 339,390. Partition merge counts were {2:27340, 3:46285, 4:265765, }
(...)
DEBUG [CompactionExecutor:7] 2019-11-24 03:50:14,820 CompactionTask.java:255 - Compacted (e1bd7f50-0e4b-11ea-9612-67e94033cb24) 32 sstables to [/data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3301-big,] to level=0. 114.787GiB to 25.150GiB (~21% of original) in 14,448,734ms. Read Throughput = 8.135MiB/s, Write Throughput = 1.782MiB/s, Row Throughput = ~375/s. 1,546,722 total partitions merged to 338,859. Partition merge counts were {1:12732, 2:42441, 3:78598, 4:50454, 5:36032, 6:52989, 7:21216, 8:34681, 9:9716, }
DEBUG [CompactionExecutor:15] 2019-11-24 03:50:14,852 LeveledManifest.java:423 - L0 is too far behind, performing size-tiering there first
DEBUG [CompactionExecutor:15] 2019-11-24 03:50:14,852 CompactionTask.java:155 - Compacting (85e06040-0e6d-11ea-9612-67e94033cb24) [/data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3259-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3299-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3298-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3300-big-Data.db:level=0, /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3301-big-Data.db:level=0,]
(...)
DEBUG [NonPeriodicTasks:1] 2019-11-24 06:02:50,117 SSTable.java:105 - Deleting sstable: /data/cassandra/data/test-keyspace1/test-table1-f592e9600b9511eab562b36ee84fdea9/md-3259-big
EDIT:
I performed some additional testing. To the best of my knowledge there is no TTL set up; see this query result taken right after cassandra-stress started inserting data:
> SELECT event_uuid, create_date, ttl(action), ttl(business_profile_id), ttl(client_uuid), ttl(label), ttl(params), ttl(unique_id) FROM test-table1 LIMIT 1;
event_uuid | create_date | ttl(action) | ttl(business_profile_id) | ttl(client_uuid) | ttl(label) | ttl(params) | ttl(unique_id)
--------------------------------------+---------------------------------+-------------+--------------------------+------------------+------------+-------------+----------------
00000000-001b-adf7-0000-0000001badf7 | 2018-01-10 10:08:45.476000+0000 | null | null | null | null | null | null
So neither TTL expiry nor tombstone deletion should be related to the issue. It's unlikely that there are duplicates, as the data is highly randomized. No replication factor changes were made, either.
What I found out is that the data volume decrease starts every time cassandra-stress gets stopped. Sadly, I still don't know the exact reason.
When you think of it from a Cassandra perspective, there really are only a few reasons why your data shrinks:
1) TTL expired past GC grace
2) Deletes past GC grace
3) The same records exist in multiple sstables (i.e. "updates")
4) Change in RF to a lower number (essentially a "cleanup" - token reassignment)
In any of the above cases, when compaction runs it will either remove or reconcile records, which can shrink the space consumption. Without the sstables still around, it's hard to determine which one, or which combination, of the above occurred.
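Case 3 can be illustrated with a toy merge. This is not Cassandra code, just a sketch of why reconciling the same records across overlapping sstables shrinks space:

```python
# Toy model: each "sstable" maps (partition_key, column) -> (timestamp, value).
# Compaction merges sstables, keeping only the newest cell per key, so
# overlapping writes ("updates") collapse into a single copy.

def compact(*sstables):
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged

sstable_a = {("row1", "action"): (100, "create"), ("row2", "action"): (110, "update")}
sstable_b = {("row1", "action"): (200, "delete"), ("row3", "action"): (120, "create")}

result = compact(sstable_a, sstable_b)
print(len(sstable_a) + len(sstable_b), "cells before,", len(result), "after")
```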
-Jim

Adding a new node to a Cassandra cluster (that currently has only one node) with over 500GB of data

I have a Cassandra instance running in production, in a project where hundreds of thousands of sensors dump data to the Cassandra server via the C# Cassandra driver.
The Server has a 2 TB SSD.
I am not experiencing any performance issues right now, but I plan to add more sensors.
I have only one keyspace and only one table in that keyspace. The structure of the table is as follows:
CREATE TABLE xxxxkeyspace.sensorreadings (
signalid int,
monthyear int,
fromtime bigint,
totime bigint,
avg decimal,
insertdate bigint,
max decimal,
min decimal,
readings text,
PRIMARY KEY (( signalid, monthyear ), fromtime, totime)
) WITH bloom_filter_fp_chance = 0.01
AND comment = ''
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
AND caching = {
'keys' : 'ALL',
'rows_per_partition' : 'NONE'
}
AND compression = {
'chunk_length_in_kb' : 64,
'class' : 'LZ4Compressor',
'crc_check_chance' : 1.0,
'enabled' : true
}
AND compaction = {
'bucket_high' : 1.5,
'bucket_low' : 0.5,
'class' : 'SizeTieredCompactionStrategy',
'enabled' : true,
'max_threshold' : 32,
'min_sstable_size' : 50,
'min_threshold' : 4,
'tombstone_compaction_interval' : 86400,
'tombstone_threshold' : 0.2,
'unchecked_tombstone_compaction' : false
};
When I run nodetool status, it says that the keyspace occupies more than 550GB on the SSD.
Will there be any issues in production if I add more nodes to the cluster at run time? Note that I cannot tolerate downtime greater than an hour.
You can add a new node without any downtime. Once the new node joins the existing ring, range movement starts; when all the data corresponding to its acquired tokens has been replicated to it, the bootstrap process is complete. After bootstrap finishes, all write requests for its tokens are forwarded to the new node.
The only thing that remains is cleaning up, on the old nodes, the data for tokens that have moved to the new node; this can be done any time later using the nodetool cleanup command.
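The range movement can be pictured with a toy single-token ring (a sketch only; real clusters usually use vnodes). Going from one node to two, roughly half the token space moves to the new node, which is what bootstrap streams and what cleanup later reclaims on the old node:

```python
import bisect

# Toy ring: each node owns the token range (previous_token, its_token],
# wrapping around. Real Cassandra uses vnodes, but the idea is the same.
RING_SIZE = 2 ** 64

def owner(token, ring):
    """ring: sorted list of (token, node). Returns the owning node."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token % RING_SIZE)
    return ring[idx % len(ring)][1]

# One node owns the whole ring; a second node then joins at the midpoint.
one_node = [(RING_SIZE - 1, "node1")]
two_nodes = sorted(one_node + [(RING_SIZE // 2, "node2")])

# Sample the token space and count how much ownership moved to the new node.
step = RING_SIZE // 1000
moved = sum(1 for t in range(0, RING_SIZE, step)
            if owner(t, one_node) != owner(t, two_nodes))
print("roughly %d%% of the token space moves to node2" % (100 * moved // 1001))
```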

Cassandra Stress tool does not honor Consistency Level when using profiles

I am trying to run a stress test using the cassandra-stress tool with profiles on a 6 node cluster with a replication factor=3.
./cassandra-stress user profile=/path/to/cassandra_stress.yaml duration=2h ops\(insert=20,select=10\) cl=local_quorum no-warmup -node nodeaddress -transport truststore=/path/to/tls/truststore.jks truststore-password=***** -rate threads=5 -log level=verbose file=/path/to/log -graph file=graph_.html title='Graph' 2>&1 &
The execution stops at some point with a ReadTimeout and the logs show the following:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ALL (3 replica were required but only 2 acknowledged the read)
I am not sure why it is taking cl=local_quorum for writes but not for reads. Any insights would be helpful.
Profile
# Keyspace Name
keyspace: d3
keyspace_definition: |
  CREATE KEYSPACE d3 WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3'} AND DURABLE_WRITES = true;
# Table name
table: stress_offheap_long
table_definition: |
  CREATE TABLE d3.stress_offheap_long (
    dart_id timeuuid,
    dart_version_id timeuuid,
    account_id timeuuid,
    amount double,
    data text,
    state text,
    PRIMARY KEY (dart_id, dart_version_id)
  ) WITH CLUSTERING ORDER BY (dart_version_id DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
columnspec:
- name: dart_id
size: gaussian(36..64)
population: uniform(1..10M)
- name: dart_version_id
size: gaussian(36..64)
- name: account_id
size: gaussian(36..64)
population: uniform(1..10M)
- name: amount
size: fixed(1)
- name: data
size: gaussian(5000..20000)
- name: state
size: gaussian(1..2)
population: fixed(1)
### Batch Ratio Distribution Specifications ###
insert:
partitions: fixed(1)
select: fixed(1)/1000
batchtype: UNLOGGED # Unlogged batches
#
# A list of queries you wish to run against the schema
#
queries:
select:
cql: select * from stress_offheap_long where dart_id = ? and dart_version_id=? LIMIT 1
fields: samerow

Composite columns and "IN" relation in Cassandra

I have the following column family in Cassandra for storing time series data in a small number of very "wide" rows:
CREATE TABLE data_bucket (
day_of_year int,
minute_of_day int,
event_id int,
data ascii,
PRIMARY KEY (day_of_year, minute_of_day, event_id)
)
On the CQL shell, I am able to run a query such as this:
select * from data_bucket where day_of_year = 266 and minute_of_day = 244
and event_id in (4, 7, 11, 1990, 3433)
Essentially, I fix the value of the first component of the composite column name (minute_of_day) and want to select a non-contiguous set of columns based on the distinct values of the second component (event_id). Since the "IN" relation is interpreted as an equality relation, this works fine.
Now my question is, how would I accomplish the same type of composite column slicing programmatically and without CQL. So far I have tried the Python client pycassa and the Java client Astyanax, but without any success.
Any thoughts would be welcome.
EDIT:
I'm adding the describe output of the column family as seen through cassandra-cli. Since I am looking for a Thrift-based solution, maybe this will help.
ColumnFamily: data_bucket
Key Validation Class: org.apache.cassandra.db.marshal.Int32Type
Default column value validator: org.apache.cassandra.db.marshal.AsciiType
Cells sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.Int32Type)
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.1
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: default
Built indexes: []
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
There is no "IN"-type query in the Thrift API. You could perform a series of get queries, one for each composite column value (day_of_year, minute_of_day, event_id).
If your event_ids were sequential (and your question says they are not), you could perform a single get_slice query, passing in the range (e.g., day_of_year, minute_of_day, and a range of event_ids). You could grab bunches of them this way and filter the response programmatically yourself (e.g., grab all data on the date with event ids between 4 and 3433). That means more data transfer and more processing on the client side, so it's not a great option unless you really are looking for a range.
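As an illustration of the "series of gets" idea: pycassa represents composite column names as plain Python tuples, so the non-contiguous set of target columns can be built up front. The live-cluster calls below are commented out and untested here; treat the exact pycassa calls as an assumption to verify against its documentation:

```python
# With a CompositeType comparator, a column name is a tuple such as
# (minute_of_day, event_id). Build the non-contiguous set we want to fetch.
def composite_columns(minute_of_day, event_ids):
    return [(minute_of_day, event_id) for event_id in event_ids]

columns = composite_columns(244, [4, 7, 11, 1990, 3433])
print(columns[:2])

# Against a live cluster (untested sketch; names are placeholders):
#   import pycassa
#   pool = pycassa.ConnectionPool('my_keyspace', ['localhost:9160'])
#   cf = pycassa.ColumnFamily(pool, 'data_bucket')
#   row = cf.get(266, columns=columns)   # one get per row key
```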
So, if you want to use "IN" with Cassandra you will need to switch to a CQL-based solution. If you are considering using CQL in python another option is cassandra-dbapi2. This worked for me:
import cql

# Replace settings as appropriate
host = 'localhost'
port = 9160
keyspace = 'keyspace_name'

# Connect
connection = cql.connect(host, port, keyspace, cql_version='3.0.1')
cursor = connection.cursor()
print "connected!"

# Execute CQL
cursor.execute("select * from data_bucket where day_of_year = 266 and minute_of_day = 244 and event_id in (4, 7, 11, 1990, 3433)")
for row in cursor:
    print str(row)  # Do something with your data

# Close the connection
cursor.close()
connection.close()
(Tested with Cassandra 2.0.1.)

Cassandra query errors: invalid operations and exceptions

I am trying out some basic queries with a Cassandra counter column family from the CLI. I basically have a column family 'page_view_counts' with the row key as a URL. Each row has 3 counts (impressions, conversions and revenue) associated with it. I have indexed one of the counts because I want to retrieve rows with conversion > 10, etc.
This is the output of describe page_view_counts:
ColumnFamily: page_view_counts
Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.1
DC Local Read repair chance: 0.0
Replicate on write: true
Caching: KEYS_ONLY
Bloom Filter FP chance: default
Column Metadata:
Column Name: conversion
Validation Class: org.apache.cassandra.db.marshal.CounterColumnType
Index Name: page_view_counts_conversion_idx
Index Type: KEYS
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
Some sample data in my column family:
RowKey: row27
=> (counter=conversion, value=27000)
=> (counter=impression, value=54000)
=> (counter=revenue, value=81000)
-------------------
RowKey: row29
=> (counter=conversion, value=29000)
=> (counter=impression, value=58000)
=> (counter=revenue, value=87000)
-------------------
RowKey: row81
=> (counter=conversion, value=81000)
=> (counter=impression, value=162000)
=> (counter=revenue, value=243000)
But I am getting these errors when I run the following queries:
[default@hector] get page_view_counts where conversion>23;
invalid operation for commutative columnfamily page_view_counts
[default@hector] get page_view_counts where conversion=long('1000');
invalid operation for commutative columnfamily page_view_counts
[default@hector] get page_view_counts where conversion=1000;
invalid operation for commutative columnfamily page_view_counts
[default@hector] get page_view_counts where KEY='row81';
java.lang.NumberFormatException: An hex string representing bytes must have an even length
[default@hector] get page_view_counts where KEY=bytes('row81');
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: cannot parse 'row81' as hex bytes
[default@hector] get page_view_counts where KEY=utf8('row81');
invalid operation for commutative columnfamily page_view_counts
Can someone please help me with this?
Thanks.
The where-clause queries fail because index-based queries are not supported on counter ("commutative") column families. For the key lookup, tell the CLI to treat keys as UTF8 and fetch the row directly:
assume page_view_counts keys as UTF8Type;
get page_view_counts['row81'];
