Cassandra sstableloader: Failed to decode value

I am bulk loading data by generating SSTables and then loading them with sstableloader.
When I query the loaded table, the primary key values fail to decode. I see the errors below.
Errors from the SELECT query (columns other than the primary key are rendered correctly):
Failed to decode value '\xe4\xedQ\x9aX\x8dF\xab\x86\xf1\r\xe4]\xc3\x14C' (for column 'first_name') as text: 'utf8' codec can't decode byte 0xe4 in position 0: invalid continuation byte
Failed to decode value '$q\x9d\x94P\xb9Ni\x9d);\xd0\x1d33~' (for column 'first_name') as text: 'utf8' codec can't decode byte 0x9d in position 2: invalid start byte
Code to generate SSTables:
SSTableSimpleUnsortedWriter eventWriter = new SSTableSimpleUnsortedWriter(directory, partitioner, keySpace, tableName, UTF8Type.instance,null, 64);
eventWriter.addColumn(compType.builder().add(ByteBufferUtil.bytes("first_name")).build(), ByteBufferUtil.bytes(entry.firstName), timestamp);
eventWriter.addColumn(compType.builder().add(ByteBufferUtil.bytes("last_name")).build(), ByteBufferUtil.bytes(entry.lastName), timestamp);
eventWriter.addColumn(compType.builder().add(ByteBufferUtil.bytes("country")).build(), ByteBufferUtil.bytes(entry.countryText), timestamp);
Table definition:
CREATE TABLE test4 (
first_name varchar PRIMARY KEY,
country text,
last_name text,
) WITH bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
What should be done so that the primary key is decoded properly?

This problem is resolved.
The statement below is not needed: the primary key column does not have to be added separately.
eventWriter.addColumn(compType.builder().add(ByteBufferUtil.bytes("first_name")).build(), ByteBufferUtil.bytes(entry.firstName), timestamp);
The primary key is set when the new row is created:
eventWriter.newRow(ByteBufferUtil.bytes(entry.firstName));
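For readers wondering what the error message itself means, the failure can be reproduced outside Cassandra. This is a minimal Python sketch, assuming only that the bytes shown in the first error message are what cqlsh was asked to render as text. The helper name try_decode is made up for illustration. It shows that those 16 bytes are not valid UTF-8, which is why the text column cannot be displayed.

```python
# Bytes copied from the first error message in the question.
raw = b'\xe4\xedQ\x9aX\x8dF\xab\x86\xf1\r\xe4]\xc3\x14C'

def try_decode(value):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return value.decode('utf-8'), None
    except UnicodeDecodeError as exc:
        return None, str(exc)

text, error = try_decode(raw)
print(len(raw), text, error)

# A value that really is UTF-8 text decodes without complaint.
print(try_decode('Alice'.encode('utf-8')))
```

The first call fails at byte 0xe4, matching the position reported by cqlsh, while the second returns the original string.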

Related

ORDER BY clause not working in Cassandra cqlsh

I want to run a query similar to this:
SELECT uuid,data,name,time,tracker,type,userid FROM standardevents080406 ORDER BY userid DESC;
but it does not work, whereas simple WHERE clause queries do:
SELECT uuid,data,name,time,tracker,type,userid FROM standardevents080406 where userid='64419';
Am I doing something wrong?
The column family is described below:
CREATE TABLE standardevents080406 ( uuid uuid PRIMARY KEY, data text, name text, time text, tracker text, type text, userid text ) WITH
bloom_filter_fp_chance=0.010000 AND caching='KEYS_ONLY' AND comment=''
AND dclocal_read_repair_chance=0.000000 AND gc_grace_seconds=864000 AND read_repair_chance=0.100000 AND replicate_on_write='true'
AND populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX time_ind ON standardevents080406 (time);
CREATE INDEX userid_ind ON standardevents080406 (userid);
You can't perform an ORDER BY on a column which is not part of the clustering key.
Your table definition has no clustering key, only a simple primary key. In your situation, sorting can be performed only client-side.
HTH,
Carlo
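The client-side sorting suggested above could look roughly like this. It is a minimal Python sketch with hypothetical sample rows shaped like the result of the SELECT in the question; how the rows are actually fetched depends on the driver in use and is not shown. Because userid is a text column, the sketch assumes the values are numeric strings and converts them to integers, since plain string comparison would place '9' above '64419'.

```python
# Hypothetical rows, shaped like the result of
# SELECT uuid, userid, name FROM standardevents080406;
rows = [
    {"uuid": "a1", "userid": "64419", "name": "login"},
    {"uuid": "b2", "userid": "9", "name": "click"},
    {"uuid": "c3", "userid": "100200", "name": "logout"},
]

def sort_by_userid_desc(rows):
    # userid is stored as text; convert to int (an assumption about the data)
    # so the order is numeric rather than lexicographic.
    return sorted(rows, key=lambda row: int(row["userid"]), reverse=True)

ordered = sort_by_userid_desc(rows)
print([row["userid"] for row in ordered])
```

sorted returns a new list, so the fetched rows are left in their original order.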

WHERE condition on date and time is not working

I want to execute this query:
SELECT uuid,data,name,time,tracker,type,userid FROM standardevents0805
where time > '2014/08/04 00:00:00';
(I also tried double quotes: "2014/08/04 00:00:00")
but it does not work properly.
Below is the description of my column family:
CREATE TABLE standardevents0805 (
uuid timeuuid PRIMARY KEY,
data text,
name text,
time timestamp,
tracker text,
type text,
userid text
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX time_in ON standardevents0805 (time);
CREATE INDEX userid_in ON standardevents0805 (userid);
You've put a secondary index on time, but such indexes (at the time of writing) support only equality comparisons. Range queries can be performed only on clustering keys.
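If the table cannot be remodeled so that time becomes a clustering column, one fallback is to fetch the rows and apply the range condition in the client. This is not part of the answer above and is only reasonable for small result sets. The Python sketch below uses hypothetical rows and assumes the timestamp column arrives as datetime objects; the function name newer_than is made up for illustration.

```python
from datetime import datetime

# Hypothetical rows; the timestamp column is assumed to arrive as datetime objects.
rows = [
    {"uuid": "a1", "time": datetime(2014, 8, 3, 23, 59, 59)},
    {"uuid": "b2", "time": datetime(2014, 8, 4, 0, 0, 1)},
    {"uuid": "c3", "time": datetime(2014, 8, 5, 12, 30, 0)},
]

def newer_than(rows, cutoff_text):
    # Parse the cutoff in the slash-separated format used in the question.
    cutoff = datetime.strptime(cutoff_text, "%Y/%m/%d %H:%M:%S")
    # Keep only rows strictly later than the cutoff, like "time > ...".
    return [row for row in rows if row["time"] > cutoff]

recent = newer_than(rows, "2014/08/04 00:00:00")
print([row["uuid"] for row in recent])
```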

Copy one table to another in Cassandra

I want to copy data from standardevents to standardeventstemp.
These are the steps I am following:
COPY events.standardevents (uuid, data, name, time, tracker, type, userid) TO 'temp.csv';
truncate standardevents;
COPY event.standardeventstemp (uuid, data, name, time, tracker, type, userid) FROM 'temp.csv';
but I get the error below after the third step:
Bad Request: Invalid STRING constant (3a1ccec0-ef77-11e3-9e56-22000ae3163a) for name of type uuid
aborting import at column #0, previously inserted values are still present.
Can anybody explain the cause of this error and how I can resolve it?
The datatype of uuid is uuid; the rest of the columns are varchar.
CREATE TABLE standardevents (
uuid uuid PRIMARY KEY,
data text,
name text,
time text,
tracker text,
type text,
userid text
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
It turned out that this is a known bug in Cassandra 1.2.2.
The same commands work fine in 2.0.x, so an upgrade will fix the problem.

Cassandra RPC TimeOut on Secondary Index

We are getting rpc_timeout when running a query on a secondary index in Cassandra.
The secondary index column holds only two values, either 'true' or 'false'.
The query has pagination built in, to limit the number of records returned.
Here is the query:
Select id_firm, id_uuid from efstatus where isFinal='true' and TOKEN(id_firm) >= TOKEN(99625490-29b4-4474-a731-9b7664f642f8) LIMIT 25;
This is the Table Structure
CREATE TABLE efstatus (
id_firm uuid,
id_uuid uuid,
isfinal text,
json_data text,
type text,
year text,
PRIMARY KEY (id_firm, id_uuid)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX efstatus_isfinal ON efstatus (isfinal);
CREATE INDEX efstatus_year ON efstatus (year);
Running with tracing on gives no relevant information. This is what I see:
Request did not complete within rpc_timeout.
unsupported operand type(s) for /: 'NoneType' and 'float'
We are using DataStax version 3.1.4, which I believe includes Cassandra 1.2.10.1.
Any help would be appreciated.

Insert into column family with case-sensitive column name

I am using the following Cassandra/CQL versions:
[cqlsh 4.0.1 | Cassandra 2.0.1 | CQL spec 3.1.1 | Thrift protocol 19.37.0]
I am trying to insert data into a pre-existing CF with case sensitive column names. I hit "unknown identifier" errors when trying to insert data.
Following is how the column family is described:
CREATE TABLE "Sample_List_CS" (
key text,
column1 text,
"fName" text,
"ipSubnet" text,
"ipSubnetMask" text,
value text,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=0 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='false' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='NONE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX ipSubnet ON "Sample_List_CS" ("ipSubnet");
The insert statements result in errors:
cqlsh:Sample_KS> INSERT INTO "Sample_List_CS" (key,column1,"fName") VALUES ('123','1','myValue');
Bad Request: Unknown identifier fName
cqlsh:Sample_KS> INSERT INTO "Sample_List_CS" (key,column1,"ipSubnet") VALUES ('123','1','255');
Bad Request: Unknown identifier ipSubnet
Any idea what I am doing wrong?
As I understand it, when using WITH COMPACT STORAGE, a table may have only one column that is not part of the primary key.
As quoted in the manual:
Using the compact storage directive prevents you from adding more than one column that is not part of the PRIMARY KEY.
For you that means you can only have one of these 4 columns in your table:
"fName"
"ipSubnet"
"ipSubnetMask"
value
(Alternatively, you could add 3 of them to the primary key definition.)
Thus it makes sense that the other three columns lead to an Unknown identifier error.
