Cassandra: Empty results in where clause with primary key

I have the following model in Cassandra:
CREATE TABLE segment (
organizationid varchar,
segmentid int,
lengthmm int,
optimal_speed int,
speed_limit int,
wkt varchar,
road_class int,
PRIMARY KEY (organizationid, segmentid)
);
Here is the table description:
CREATE TABLE tkm_fcd_cassandra.segment (
organizationid text,
segmentid int,
lengthmm int,
optimal_speed int,
road_class int,
speed_limit int,
wkt text,
PRIMARY KEY (organizationid, segmentid)
) WITH CLUSTERING ORDER BY (segmentid ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
When I run the following query:
select * from segment;
It gives me the following result:
organizationid | segmentid | lengthmm | optimal_speed | road_class | speed_limit | wkt
----------------------------+-----------+----------+---------------+------------+-------------+---------------------------------------------------------
'57ecdd14766299a02213c463' | 122406 | 49239 | 20 | 5 | 90 | 'LINESTRING (32.813454 39.918419,32.813469 39.917976)'
'57ecdd14766299a02213c463' | 122407 | 49239 | 20 | 5 | 90 | 'LINESTRING (32.813469 39.917976,32.813501 39.917533)'
'57ecdd14766299a02213c463' | 122408 | 49239 | 20 | 5 | 90 | 'LINESTRING (32.813501 39.917533,32.813532 39.917091)'
'57ecdd14766299a02213c463' | 122409 | 49239 | 20 | 5 | 90 | 'LINESTRING (32.813532 39.917091,32.813542 39.91665)'
'57ecdd14766299a02213c463' | 122410 | 49239 | 20 | 5 | 90 | 'LINESTRING (32.813542 39.91665,32.813112 39.916359)'
But when I run the following query:
select * from segment where organizationid = '57ecdd14766299a02213c463';
I have the following result:
organizationid | segmentid | lengthmm | optimal_speed | road_class | speed_limit | wkt
----------------+-----------+----------+---------------+------------+-------------+-----
(0 rows)
Here is my nodetool status:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.1.101 5.16 MiB 256 100.0% 249c522d-ead0-4370-ac1b-4ad446d4948b rack1
Other info:
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
I can't understand why Cassandra gives me an empty result when I add the where clause.

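It looks like you have inserted your IDs with the quotes included as part of the value. Try the following:
select * from segment where organizationid = '''57ecdd14766299a02213c463''';
Normally the output doesn't show ' around text values, so the quotes in your result are part of the stored data. In CQL, a single quote inside a string literal is escaped by doubling it.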
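To make the mismatch concrete, here is a minimal Python sketch. It assumes the quotes were stored as part of the value, and it relies on the CQL rule that a single quote inside a string literal is escaped by doubling it. The helper name cql_text_literal is hypothetical and used only for illustration.

```python
def cql_text_literal(value: str) -> str:
    """Render a string as a CQL text literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


# Assumption: the quotes were inserted as part of the stored value.
stored = "'57ecdd14766299a02213c463'"
# What the original WHERE clause compared against.
queried = "57ecdd14766299a02213c463"

print(stored == queried)          # False: the two partition keys differ
print(cql_text_literal(stored))   # '''57ecdd14766299a02213c463'''
```

In application code, a prepared statement with a bound parameter avoids hand-written escaping altogether. The longer-term fix is probably to re-insert the rows without the extra quotes.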

Related

Cassandra Upsert not working on each column

I am trying to update a record in a test keyspace and table. When I upsert a record, one column value change is accepted, while the other doesn't take. (Note: I'm also not able to delete the record, despite no error message)
Observe how middle_initial does not update, while title does... What gives?
//Before
cqlsh:my_keyspace> SELECT * FROM user;
last_name | first_name | middle_initial | title
-----------+------------+----------------+-------
Rodriguez | Mary | Q | null
Rodriquez | Mary | Q | O
Nguyen | Bill | null | Mr.
Nguyen | Wanda | null | Mrs.
//Command
cqlsh:my_keyspace> UPDATE user SET middle_initial = 'F', title = 'U' WHERE last_name = 'Rodriquez' AND first_name = 'Mary';
//After
cqlsh:my_keyspace> SELECT * FROM user;
last_name | first_name | middle_initial | title
-----------+------------+----------------+-------
Rodriguez | Mary | Q | null
Rodriquez | Mary | Q | U
Nguyen | Bill | null | Mr.
Nguyen | Wanda | null | Mrs.
//Additional Info
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE my_keyspace.user (
last_name text,
first_name text,
middle_initial text,
title text,
PRIMARY KEY (last_name, first_name)
) WITH CLUSTERING ORDER BY (first_name ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
Manish was correct. My timestamp for middle_initial was set to a future date of 1623699999999999.
To delete the record (which was actually my goal) I did:
cqlsh:my_keyspace> DELETE FROM user USING timestamp 1623699999999999 WHERE first_name = 'Mary' AND last_name = 'Rodriquez';
This generally happens when the writetime of your column is in the future. You can check the writetime of your column:
SELECT WRITETIME (middle_initial) FROM my_keyspace.user WHERE last_name = 'Rodriquez' AND first_name = 'Mary';
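The behaviour follows from Cassandra resolving conflicting writes per column by write timestamp (last write wins). Below is a minimal Python sketch of that rule, not Cassandra's actual implementation. The Cell class, the reconcile function and the "current" timestamp are hypothetical, and tie handling is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    writetime: int  # microseconds since the epoch


def reconcile(existing: Cell, incoming: Cell) -> Cell:
    """Keep the cell with the higher write timestamp (ties simplified: keep existing)."""
    return incoming if incoming.writetime > existing.writetime else existing


# middle_initial was written with the future timestamp quoted in the answer.
stale_future = Cell("Q", 1623699999999999)
# Hypothetical timestamp assigned to the UPDATE; it is earlier than the stored one.
update_now = Cell("F", 1500000000000000)

print(reconcile(stale_future, update_now).value)  # Q: the update is silently ignored
```

This is why the UPDATE reported no error yet left middle_initial unchanged, while title (written with an ordinary timestamp) was updated.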

Cassandra: sorting problem, ordering is wrong

I have a question about Cassandra. At present, sorting "entities_by_time" by column1 works for 18-digit IDs, but the ordering goes wrong once the IDs grow to 19 digits. Please help me.
cqlsh:minds> select * from entities_by_time where key='activity:user:990192934408163330' order by column1 desc limit 10;
key | column1 | value
----------------------------------+--------------------+--------------------
activity:user:990192934408163330 | 999979571363188746 | 999979571363188746
activity:user:990192934408163330 | 999979567064027139 | 999979567064027139
activity:user:990192934408163330 | 999979562764865555 | 999979562764865555
activity:user:990192934408163330 | 999979558465703953 | 999979558465703953
activity:user:990192934408163330 | 999979554170736649 | 999979554170736649
activity:user:990192934408163330 | 999979549871575047 | 999979549871575047
activity:user:990192934408163330 | 999979545576607752 | 999979545576607752
activity:user:990192934408163330 | 999979541290029073 | 999979541290029073
activity:user:990192934408163330 | 999979536990867461 | 999979536990867461
activity:user:990192934408163330 | 999979532700094475 | 999979532700094475
cqlsh:minds> select * from entities_by_time where key='activity:user:990192934408163330' order by column1 asc limit 10;
key | column1 | value
----------------------------------+---------------------+---------------------
activity:user:990192934408163330 | 1000054880351555598 | 1000054880351555598
activity:user:990192934408163330 | 1000054884671688706 | 1000054884671688706
activity:user:990192934408163330 | 1000054888966656017 | 1000054888966656017
activity:user:990192934408163330 | 1000054893257429005 | 1000054893257429005
activity:user:990192934408163330 | 1000054897552396308 | 1000054897552396308
activity:user:990192934408163330 | 1000054901843169290 | 1000054901843169290
activity:user:990192934408163330 | 1000054906138136577 | 1000054906138136577
activity:user:990192934408163330 | 1000054910433103883 | 1000054910433103883
activity:user:990192934408163330 | 1000054914723876869 | 1000054914723876869
activity:user:990192934408163330 | 1000054919010455568 | 1000054919010455568
CREATE TABLE minds.entities_by_time (
key text,
column1 text,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'enabled': 'false'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = '99PERCENTILE';
After investigating, I found that in Cassandra, 1007227353832624141 is less than 963426376394739730. Why?
Good call Chris! The table definition tells it all! I recreated your table and ran queries sorting in both directions:
flynn#cqlsh:stackoverflow> SELECT * FROM entities_by_time
WHERE key='activity:user:990192934408163330' ORDER BY column1 DESC;
key | column1 | value
----------------------------------+---------------------+---------------------
activity:user:990192934408163330 | 999979571363188746 | 999979571363188746
activity:user:990192934408163330 | 999979567064027139 | 999979567064027139
activity:user:990192934408163330 | 963426376394739730 | 963426376394739730
activity:user:990192934408163330 | 1007227353832624141 | 1007227353832624141
activity:user:990192934408163330 | 1000054884671688706 | 1000054884671688706
activity:user:990192934408163330 | 1000054880351555598 | 1000054880351555598
(6 rows)
flynn#cqlsh:stackoverflow> SELECT * FROM entities_by_time
WHERE key='activity:user:990192934408163330' ORDER BY column1 ASC;
key | column1 | value
----------------------------------+---------------------+---------------------
activity:user:990192934408163330 | 1000054880351555598 | 1000054880351555598
activity:user:990192934408163330 | 1000054884671688706 | 1000054884671688706
activity:user:990192934408163330 | 1007227353832624141 | 1007227353832624141
activity:user:990192934408163330 | 963426376394739730 | 963426376394739730
activity:user:990192934408163330 | 999979567064027139 | 999979567064027139
activity:user:990192934408163330 | 999979571363188746 | 999979571363188746
(6 rows)
So to your question...
in Cassandra, 1007227353832624141 is less than 963426376394739730. Why?
Simply put, because 9 > 1, that's why.
Your table definition clusters on column1, which is a TEXT/UTF8 string and not a numeric. Essentially, Cassandra is sorting strings the only way it knows how - in ASCII-betical order, which is not alpha-numeric order.
Store your numerics as numerics, and sorting will behave in ways that are more predictable.
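The same effect can be reproduced outside Cassandra. This short Python sketch uses four of the IDs from the output above and compares string ordering with numeric ordering. It only illustrates the comparison rule, not Cassandra's storage engine.

```python
ids = [
    "999979571363188746",
    "1000054880351555598",
    "963426376394739730",
    "1007227353832624141",
]

# Text comparison goes character by character, so "1..." sorts before "9...".
as_text = sorted(ids)
# Numeric comparison looks at magnitude, so 18-digit values sort before 19-digit ones.
as_numbers = sorted(ids, key=int)

print(as_text)
print(as_numbers)
```

The text ordering matches the ASC query result shown above, and the numeric ordering is what a bigint or varint clustering column would produce.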

Cassandra Predicates on non-primary-key columns (eventtype) are not yet supported for non secondary index queries

I created a table as shown below, with id (of type uuid) as the primary key:
id | date | eventtype | log | password | priority | sessionid | sourceip | user | useragent
--------------------------------------+--------------------------+--------------+----------+----------+----------+-----------+--------------+------------+------------
6b47e9b0-d11a-11e8-883c-5153f134200b | null | LoginSuccess | demolog | 1234 | 10 | Demo_1 | 123.12.11.11 | Aqib | demoagent
819a58d0-cd3f-11e8-883c-5153f134200b | null | LoginSuccess | demolog | 1234 | 10 | Demo_1 | 123.12.11.11 | Aqib | demoagent
f4fae220-d133-11e8-883c-5153f134200b | 2018-10-01 04:01:00+0000 | LoginSuccess | demolog | 1234 | 10 | Demo_1 | 123.12.11.11 | Aqib | demoagent
But when I try a query like the one below:
select * from loginevents where eventtype='LoginSuccess';
I get the following error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Predicates on non-primary-key columns (eventtype) are not yet supported for non secondary index queries"
This is my table
cqlsh:events> describe loginevents;
CREATE TABLE events.loginevents (
id uuid PRIMARY KEY,
date timestamp,
eventtype text,
log text,
password text,
priority int,
sessionid text,
sourceip text,
user text,
useragent text
) WITH bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
How can I solve this?
An immediate answer to your question would be to create a secondary index on the column eventtype like this:
CREATE INDEX my_index ON events.loginevents (eventtype);
Then you can filter on this particular column :
SELECT * FROM loginevents WHERE eventtype='LoginSuccess';
However, this solution can badly hurt the performance of your cluster.
If you come from the SQL world and are new to Cassandra, go read an introduction to Cassandra data modeling, like this one.
The first step is to identify the query, then design the table accordingly.
In Cassandra, data are distributed in the cluster according to the partition key, so reading records that belong to the same partition is very fast.
In your case, maybe a good start would be to group your records based on the eventtype :
CREATE TABLE events.loginevents (
id uuid,
date timestamp,
eventtype text,
log text,
password text,
priority int,
sessionid text,
sourceip text,
user text,
useragent text,
PRIMARY KEY (eventtype, id)
);
Then you can do select like this :
SELECT * FROM loginevents WHERE eventtype='LoginSuccess';
or even :
SELECT * FROM loginevents WHERE eventtype in ('LoginSuccess', 'LoginFailure');
(It's not a perfect model, it definitely needs to be improved before production.)
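As a rough illustration of why the choice of partition key matters, the Python sketch below groups rows into partitions with a plain dict. It is a simplified model, not how Cassandra stores data. The first two ids come from the question, while the third row and the partition_by helper are hypothetical.

```python
from collections import defaultdict

rows = [
    {"id": "6b47e9b0-d11a-11e8-883c-5153f134200b", "eventtype": "LoginSuccess"},
    {"id": "819a58d0-cd3f-11e8-883c-5153f134200b", "eventtype": "LoginSuccess"},
    # Hypothetical row, added only to have a second event type.
    {"id": "00000000-0000-0000-0000-000000000001", "eventtype": "LoginFailure"},
]


def partition_by(rows, key):
    """Group rows by the chosen partition key column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return partitions


by_id = partition_by(rows, "id")
by_eventtype = partition_by(rows, "eventtype")

# Partitioned by id: every partition must be inspected to find the matches.
scanned = [r for part in by_id.values() for r in part if r["eventtype"] == "LoginSuccess"]
# Partitioned by eventtype: a single partition lookup returns the same rows.
direct = by_eventtype["LoginSuccess"]

print(len(by_id), len(by_eventtype))  # 3 2
print(scanned == direct)              # True
```

The sketch also hints at the weakness of the proposed model: with only a few event types, each partition grows very large.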
In Cassandra, by default you can only filter on the primary key columns (the partition key, plus the clustering columns in order); it is not possible to filter on arbitrary fields.
If you want to query on "eventtype", you should either create a secondary index on the table, or index the table with Apache Solr and query through Solr. Something like the following:
CREATE INDEX loginevents_type
ON events.loginevents (eventtype);

Using partition key along with secondary index

Following are the two queries that I need to perform.
select * from where dept = 100 and emp_id = 1;
select * from where dept = 100 and name = 'One';
Which of the options below is better?
Option 1: Use a secondary index along with the partition key. I assume the query will execute faster this way, as there is no need to contact different nodes and the index only needs to be searched locally.
cqlsh:d2> desc table emp_by_dept;
CREATE TABLE d2.emp_by_dept (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX emp_by_dept_name_idx ON d2.emp_by_dept (name);
cqlsh:d2> select * from emp_by_dept where dept = 100;
dept | emp_id | name
------+--------+------
100 | 1 | One
100 | 2 | Two
100 | 10 | Ten
(3 rows)
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-06-15 17:36:55.860000 | 10.0.2.16 | 0
Parsing select * from emp_by_dept where dept = 100; [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 | 202
Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 | 418
Executing single-partition query on emp_by_dept [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10525
Acquiring sstable references [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10564
Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10635
Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10748
Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 | 10757
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18141
Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18166
Read 3 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 | 18335
Request complete | 2015-06-15 17:36:55.928174 | 10.0.2.16 | 68174
cqlsh:d2> select * from emp_by_dept where dept = 100 and name = 'One';
dept | emp_id | name
------+--------+------
100 | 1 | One
(1 rows)
Tracing session: c56e70a0-1357-11e5-ab8b-fb5400f1b4af
activity | timestamp | source | source_elapsed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 0
Parsing select * from emp_by_dept where dept = 100 and name = 'One'; [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 12
Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 | 19
Computing ranges to query [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 881
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1144
Submitting range requests on 1 ranges with a concurrency of 1 (0.003515625 rows per range expected) [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1238
Executing indexed scan for [100, 100] [SharedPool-Worker-2] | 2015-06-15 17:42:20.011000 | 10.0.2.16 | 1703
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 1827
Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 1929
Executing single-partition query on emp_by_dept.emp_by_dept_name_idx [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2058
Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2087
Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2173
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 | 2352
Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.012001 | 10.0.2.16 | 2377
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 | 4300
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 | 4322
Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-06-15 17:42:20.031000 | 10.0.2.16 | 21798
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 21989
Executing single-partition query on emp_by_dept [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22374
Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22385
Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22433
Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22514
Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 | 22523
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22963
Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22972
Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 22991
Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 | 23096
Request complete | 2015-06-15 17:42:20.033227 | 10.0.2.16 | 23227
Option 2: Create 2 tables as below.
CREATE TABLE d2.emp_by_dept (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC);
select * from emp_by_dept where dept = 100 and emp_id = 1;
CREATE TABLE d2.emp_by_dept_name (
dept int,
emp_id int,
name text,
PRIMARY KEY (dept, name)
) WITH CLUSTERING ORDER BY (name ASC);
select * from emp_by_dept_name where dept = 100 and name = 'One';
Normally it is a good approach to use secondary indexes together with the partition key, because - as you say - the secondary key lookup can be performed on a single machine.
The other concept that needs to be taken into account is the cardinality of the secondary index. In your case emp_id is probably unique, and name is almost unique, so the index will most probably return a single row, and therefore it is not too efficient. For a good explanation I recommend this article: http://www.wentnet.com/blog/?p=77.
As a consequence, if query time is critical and you can update both tables at the same time, I recommend using your option 2.
It would also be interesting to measure the two options with some generated data.
Option one won't be possible, as Cassandra does not support queries using both primary keys and secondary keys. Your best bet, would be to go with option two.
Although the similarities are many, don't think of it as a 'relational table'. Instead think of it as a nested, sorted map data structure.
Cassandra believes in de-normalization and duplication of data for better read performance. Therefore, option 2 is completely normal and within the best practices of Cassandra.
A few links that you might find useful: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
How do secondary indexes work in Cassandra?
Hope this helps.
Since maintaining two tables is harder than maintaining a single one, the first option would be preferable.
Query1 = select * from <> where dept = 100 and emp_id = 1;
Query2 = select * from <> where dept = 100 and name = 'One';
Option 1:
Write : time to write to emp_by_dept + time to update index
Read : Query1 will be a direct read from emp_by_dept, Query2 will be a read from emp_by_dept + get the location from index table + read the value from emp_by_dept
Option 2:
Write : time to write to emp_by_dept + time to write to emp_by_dept_name
Read: Query1 will be a direct read from emp_by_dept, Query2 will be a direct read from emp_by_dept_name (the required data is already sorted and kept )
So I assume write time should be almost the same in both cases (I have not tested this)
If your read response time is more important, then go for Option2.
If you are worried about maintaining 2 tables, go for option 1.
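Option 2 can be sketched in Python as two nested maps that are written together and read independently. This is only a model of the access pattern under the two table definitions above. The function names insert_employee, query1 and query2 are hypothetical.

```python
# Each dict models one table: partition key -> {clustering key: remaining column}.
emp_by_dept = {}        # dept -> {emp_id: name}
emp_by_dept_name = {}   # dept -> {name: emp_id}


def insert_employee(dept: int, emp_id: int, name: str) -> None:
    """Write the same employee to both tables (the cost of denormalizing)."""
    emp_by_dept.setdefault(dept, {})[emp_id] = name
    emp_by_dept_name.setdefault(dept, {})[name] = emp_id


def query1(dept: int, emp_id: int) -> str:
    """where dept = ? and emp_id = ?: direct read from emp_by_dept."""
    return emp_by_dept[dept][emp_id]


def query2(dept: int, name: str) -> int:
    """where dept = ? and name = ?: direct read from emp_by_dept_name."""
    return emp_by_dept_name[dept][name]


for emp_id, name in [(1, "One"), (2, "Two"), (10, "Ten")]:
    insert_employee(100, emp_id, name)

print(query1(100, 1))      # One
print(query2(100, "One"))  # 1
```

Note that with PRIMARY KEY (dept, name), two employees with the same name in one department would overwrite each other, in the sketch as in the table. Adding emp_id as a further clustering column would avoid that.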
Thanks everyone for your inputs.

Cassandra CQL: different SELECT results

I am using the latest Cassandra (2.1.0) and get different results for the following queries.
select * from zzz.contact where user_id = 53528c87-0691-46f7-81a1-77173fd8390f
and contact_id = 5ea82764-ce42-45f3-8724-e121c8b7d32e;
returns the one desired record, but
select * from zzz.contact where user_id = 53528c87-0691-46f7-81a1-77173fd8390f;
returns 6 other rows, but not the row returned by the first SELECT.
Structure of the keyspace/table is:
CREATE KEYSPACE zzz
WITH replication = { 'class' : 'NetworkTopologyStrategy', 'DC1' : '2' };
CREATE TABLE IF NOT EXISTS contact (
user_id uuid,
contact_id uuid,
approved boolean,
ignored boolean,
adding_initiator boolean,
PRIMARY KEY ( user_id, contact_id )
);
Both instances are in the keyspace and in the UN (Up/Normal) state:
d:\Tools\apache-cassandra-2.1.0\bin>nodetool status
Starting NodeTool
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
================
Status=Up/Down|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.0.146 135.83 KB 256 51.7% 6d035991-3471-498b-8051-55f99a2fdfed RAC1
UN 192.168.0.216 3.26 MB 256 48.3% d82f3a69-c6f8-4237-b50e-d2f370ac644a RAC1
I have two Cassandra instances.
Tried command "nodetool repair" - didn't help.
Tried to add ALLOW FILTERING at the end of the queries - didn't help.
Any help is highly appreciated.
UPD:
here is result of queries:
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
d:\Tools\apache-cassandra-2.1.0\bin>cqlsh 192.168.0.216
Connected to ClusterZzz at 192.168.0.216:9042.
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> select * from zzz.contact where user_id = 53528c87-0691-46f7-81a1-77173fd8390f and contact_id = 5ea82764-ce42-45f3-8724-e121c8b7d32e;
user_id | contact_id | adding_initiator | approved | ignored
--------------------------------------+--------------------------------------+------------------+----------+---------
53528c87-0691-46f7-81a1-77173fd8390f | 5ea82764-ce42-45f3-8724-e121c8b7d32e | False | True | False
(1 rows)
cqlsh> select * from zzz.contact where user_id = 53528c87-0691-46f7-81a1-77173fd8390f;
user_id | contact_id | adding_initiator | approved | ignored
--------------------------------------+--------------------------------------+------------------+----------+---------
53528c87-0691-46f7-81a1-77173fd8390f | 6fc7f6e4-ac48-484e-9660-128476ca5bf9 | False | False | False
53528c87-0691-46f7-81a1-77173fd8390f | 7a240937-8b28-4424-9772-8c4c8e381432 | False | False | False
53528c87-0691-46f7-81a1-77173fd8390f | 8e6cb13a-96e7-45af-b9d8-40ea459df996 | False | False | False
53528c87-0691-46f7-81a1-77173fd8390f | 938af09a-0fe3-4cdd-b02e-cbdfb078335c | False | True | False
53528c87-0691-46f7-81a1-77173fd8390f | d84d9e7a-e81d-42a2-87b3-f163f7a9a646 | False | True | False
53528c87-0691-46f7-81a1-77173fd8390f | fd2ec705-1661-4cf8-98ef-46f627a9a382 | False | False | False
(6 rows)
cqlsh>
UPD #2:
It is worth mentioning that my nodes run on Windows 7 machines. In production we use Linux, and we have not seen this problem there.

Resources