Error in Cassandra while exporting data

I have a column family (eventData) in a Cassandra keyspace, which has the following definition:
CREATE TABLE eventData (
col1 text,
col2 text,
col3 text,
col4 int,
col5 int,
col6 int,
col7 timestamp,
PRIMARY KEY (col7, col1, col2)
);
I have scheduled a cron job which exports the data from Cassandra into a text file, and it is defined like this:
dd=$(date --date yesterday "+%Y-%m-%d")
echo "select * FROM keyspacename.eventData where col7 = '$dd' ;" | /home/cassuser/Desktop/cassandra212/bin/cqlsh > /home/cassuser/Desktop/cassandra212/Dump/output-$dd.txt
The above statement gives the following error every day when the cron job runs, but when I run the same query from cqlsh manually it exports the data without any error. Can anyone tell me the reason for this?
Error :
<stdin>:2:errors={}, last_host=localhost
I have read many posts on SO, and most of them say this error might be caused by a read timeout. But my question is: why don't I get the same error when running the same query from cqlsh manually?
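If the read-timeout theory from those posts is right, here is a sketch of something to try: raise cqlsh's client-side timeout for the scripted run and capture stderr separately so cron logs the full message. The option name and the 600-second value are assumptions to adapt; older cqlsh releases call the setting client_timeout and newer ones request_timeout, both under [connection] in cqlshrc.
# Suggested addition to /home/cassuser/.cassandra/cqlshrc (option name varies by
# cqlsh version; 600 seconds is only an illustrative value):
#
#   [connection]
#   client_timeout = 600

dd=$(date --date yesterday "+%Y-%m-%d")
echo "SELECT * FROM keyspacename.eventData WHERE col7 = '$dd';" \
  | /home/cassuser/Desktop/cassandra212/bin/cqlsh \
  > /home/cassuser/Desktop/cassandra212/Dump/output-$dd.txt \
  2> /home/cassuser/Desktop/cassandra212/Dump/error-$dd.txt
Keeping stderr in its own file also makes it easier to compare what the cron run sees against a manual run.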

Related

How to convert teradata recursive query to spark sql

I am trying to convert the Teradata SQL below to Spark SQL but am unable to. Can someone suggest a solution?
create multiset table test1 as
(
WITH RECURSIVE test1 (col1, col2, col3) AS
(
sel col11, col2, col3
from
test2 root
where
col3 = 1
UNION ALL
SELECT
indirect.col11,
indirect.col2 || ',' || direct.col2 as col2,
indirect.col3
FROM
test1 direct,
test2 indirect
WHERE
direct.col1 = indirect.col11
and direct.col3 + 1 = indirect.col3
)
sel col1 as col11,
col2
from
test1 QUALIFY ROW_NUMBER() OVER(PARTITION BY col1
ORDER BY
col3 DESC) = 1
)
with data primary index (col11) ;
Thanks.
I tried the approach set out here http://sqlandhadoop.com/how-to-implement-recursive-queries-in-spark/ myself some time ago.
I cannot find my simplified version, but this approach is currently the only way to do it. I assume Spark SQL will eventually add support for recursive CTEs, although that is not certain.
On a further note: I have myself seen the requirement to develop KPIs with this while-loop approach. I would suggest that recursive SQL, as well as while loops for KPI generation, not be treated as a use case for Spark; do that work in a fully ANSI-compliant database and Sqoop the result into Hadoop if required.
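For reference, here is a minimal Spark SQL sketch of a different workaround: unrolling the recursion to a fixed depth with repeated self-joins. It is only correct if your chains are at most as long as the number of levels you unroll (three here), and the lvl1/lvl2/lvl3 CTE names are illustrative; table and column names follow the question.
WITH lvl1 AS (
  SELECT col11 AS col1, col2, col3
  FROM test2
  WHERE col3 = 1
),
lvl2 AS (
  SELECT i.col11 AS col1, concat(i.col2, ',', d.col2) AS col2, i.col3
  FROM lvl1 d
  JOIN test2 i
    ON d.col1 = i.col11 AND d.col3 + 1 = i.col3
),
lvl3 AS (
  SELECT i.col11 AS col1, concat(i.col2, ',', d.col2) AS col2, i.col3
  FROM lvl2 d
  JOIN test2 i
    ON d.col1 = i.col11 AND d.col3 + 1 = i.col3
),
unioned AS (
  SELECT * FROM lvl1
  UNION ALL SELECT * FROM lvl2
  UNION ALL SELECT * FROM lvl3
)
SELECT col1 AS col11, col2
FROM (
  SELECT col1, col2, col3,
         row_number() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rn
  FROM unioned
) ranked
WHERE rn = 1
The row_number subquery replaces Teradata's QUALIFY; the while-loop approach in the linked article builds the same union iteratively without a hard-coded depth.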

c* schema - denormalization vs materialized view

Schema is:
col0 int,
col1 text,
col2 text,
stamp timestamp,
somemap map<text, int>
I want to query for somemap
- using col0, col1 and a range of stamp
- using col0, col1, col2 and a range of stamp
I need every value of somemap for each distinct (col0, col1, col2, stamp) to be present for either query (i.e., for the first query I want all the values of col2 to be there).
I've tried various combinations of columns for primary key but I can't find one that permits both types of queries.
I can denormalize this and create both types of tables:
- primary key ((col0, col1), stamp, col2)
- primary key ((col0, col1), col2, stamp)
What I'm hoping for is a way to use a materialized view to accomplish this.
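If you can run Cassandra 3.0 or later, a materialized view can serve the second query pattern while the base table serves the first. A minimal sketch, assuming the base table is keyed as PRIMARY KEY ((col0, col1), stamp, col2); the table and view names and the example dates are illustrative:
CREATE TABLE somemap_by_stamp (
  col0 int,
  col1 text,
  col2 text,
  stamp timestamp,
  somemap map<text, int>,
  PRIMARY KEY ((col0, col1), stamp, col2)
);

CREATE MATERIALIZED VIEW somemap_by_col2 AS
  SELECT col0, col1, col2, stamp, somemap
  FROM somemap_by_stamp
  WHERE col0 IS NOT NULL AND col1 IS NOT NULL
    AND col2 IS NOT NULL AND stamp IS NOT NULL
  PRIMARY KEY ((col0, col1), col2, stamp);

-- first pattern, against the base table:
SELECT somemap FROM somemap_by_stamp
 WHERE col0 = 1 AND col1 = 'a'
   AND stamp >= '2016-01-01' AND stamp < '2016-02-01';
-- second pattern, against the view:
SELECT somemap FROM somemap_by_col2
 WHERE col0 = 1 AND col1 = 'a' AND col2 = 'b'
   AND stamp >= '2016-01-01' AND stamp < '2016-02-01';
The view reuses every base primary key column, only reordered, which is what the materialized view rules require; on versions before 3.0 the two denormalized tables you listed remain the way to go.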

Error when changing column type in cassandra

I am having trouble changing the type of the column in a table in cassandra.
Here is the link to the documentation which says that it is possible to change the type if the column is not part of the PRIMARY KEY and does not have an INDEX on it. I just can't get it to work.
cqlsh:test> show VERSION
[cqlsh 3.1.8 | Cassandra 1.2.13 | CQL spec 3.0.0 | Thrift protocol 19.36.2]
CREATE TABLE mytable (
id1 text,
id2 text,
col1 int,
col2 bigint,
col3 text,
PRIMARY KEY (id1, id2)
) WITH CLUSTERING ORDER BY (id2 DESC);
UPDATE mytable SET col1 = 123, col2 = 1234, col3 = '12345' WHERE id1 = 'id1' AND id2 = 'id2';
SELECT * FROM mytable ;
id1 | id2 | col1 | col2 | col3
-----+-----+------+------+-------
id1 | id2 | 123 | 1234 | 12345
When I attempt to change column types, I get errors.
ALTER TABLE mytable ALTER col1 TYPE text;
Bad Request: Cannot change col1 from type int to type text: types are incompatible.
ALTER TABLE mytable ALTER col2 TYPE text;
Bad Request: Cannot change col2 from type bigint to type text: types are incompatible.
ALTER TABLE mytable ALTER col3 TYPE bigint;
Bad Request: Cannot change col3 from type text to type bigint: types are incompatible.
ALTER TABLE mytable ALTER col3 TYPE int;
Bad Request: Cannot change col3 from type text to type int: types are incompatible.
What am I missing or doing wrong? Any help is appreciated.
From the page you linked:
The bytes stored in values for that column remain unchanged, and if
existing data cannot be [deserialized] according to the new type, your CQL
driver or interface might report errors
Thus, you can only change between compatible types.
This is primarily intended for people upgrading from a Thrift schema where it was common to leave all columns defined as a blob.
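As a sketch of the usual workaround when the types are incompatible (the col1_text column name is illustrative, not from the post): add a new column of the target type, backfill it from your application, and switch readers over. Note that dropping a column requires Cassandra 2.0 or later, so on 1.2 you would simply stop using the old one.
-- add a sibling column with the target type
ALTER TABLE mytable ADD col1_text text;
-- backfill row by row from the application, e.g.:
UPDATE mytable SET col1_text = '123' WHERE id1 = 'id1' AND id2 = 'id2';
-- once readers use col1_text, drop the old column (Cassandra 2.0+ only):
-- ALTER TABLE mytable DROP col1;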

ERROR CASSANDRA: 'ascii' codec can't decode byte 0xe1 in position 27: ordinal not in range(128) cqlsh

I'm new to Cassandra and I'm having trouble inserting some rows into a database; I get the error in the title.
I use Cassandra 1.0.8 and cqlsh to make changes to my database.
Here are the steps I take before hitting the error:
CREATE A COLUMN FAMILY
CREATE TABLE test (
col1 int PRIMARY KEY,
col2 bigint,
col3 boolean,
col4 timestamp
);
INSERT SEVERAL ROWS WITHOUT SPECIFYING ALL OF THE COLUMNS OF THE TABLE
insert into test (col1, col2, col3) values (1, 100, true);
insert into test (col1, col2, col3) values (2, 200, false);
SELECT FOR CHECKING THAT ROWS HAVE BEEN INSERTED CORRECTLY
select * from test;
The result correctly shows the two inserted rows (col4 has no value for either).
INSERT A ROW SPECIFYING A VALUE FOR col4 (NOT SPECIFIED BEFORE)
insert into test (col1, col2, col3, col4) values (3, 100, true, '2011-02-03');
SELECT FOR CHECKING THAT ROW HAS BEEN INSERTED CORRECTLY
select * from test;
This SELECT is where the error occurs; instead of the rows, cqlsh prints the error quoted in the title.
SELECT EACH COLUMN OF THE TABLE SEPARATELY
select col1 from test;
select col2 from test;
select col3 from test;
select col4 from test;
Each of these works fine and shows the right values.
So my question is: what's the problem with the first SELECT? What's wrong?
Thanks in advance!!
NOTE:
If I define col4 as an integer rather than a timestamp it works. However, I've tried to insert col4 in the normalized format yyyy-mm-dd HH:mm (I've tried with '2011-02-03 01:05' and '2011-02-03 01:05:10') but it doesn't work.
Cassandra 1.0.8 shipped with CQL2 and that's where your problem is coming from. I managed to recreate this in 1.0.8 but it works fine with 1.2.x so my advice is upgrade if you can.
In C* 1.2.10
cqlsh> update db.user set date='2011-02-03 01:05' where user='JCTYpjJlM';
cqlsh> SELECT * from db.user ;
user | date | password
-----------+--------------------------+----------
xvkYQKerQ | null | 765
JCTYpjJlM | 2011-02-03 01:05:00+0200 | 391
#mol
Weird, try to insert col4 as an integer (convert to milliseconds first) or use the normalized format: yyyy-mm-dd HH:mm
According to the doc here, you can omit the time and just input the date, but it seems that breaks something in your case.
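A minimal sketch of the integer workaround suggested above, assuming your CQL version accepts a plain epoch value in milliseconds for a timestamp column (1296691200000 is 2011-02-03 00:00:00 UTC):
insert into test (col1, col2, col3, col4) values (3, 100, true, 1296691200000);
If that is still rejected, defining col4 as a bigint and storing the milliseconds yourself, as you already found, remains the fallback on 1.0.8.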

Cassandra: Query with where clause containing greater-than or less-than (< and >)

I'm using Cassandra 1.1.2. I'm trying to convert an RDBMS application to Cassandra. In my RDBMS application I have the following table, called table1:
| Col1 | Col2 | Col3 | Col4 |
Col1: String (primary key)
Col2: String (primary key)
Col3: Bigint (index)
Col4: Bigint
This table holds over 200 million records. The most frequently used query is something like:
Select * from table1 where col3 < 100 and col3 > 50;
In Cassandra I used following statement to create the table:
create table table1 (primary_key varchar, col1 varchar,
col2 varchar, col3 bigint, col4 bigint, primary key (primary_key));
create index on table1(col3);
I changed the primary key to an extra column (I calculate the key inside my application).
After importing a few records I tried to execute the following CQL:
select * from table1 where col3 < 100 and col3 > 50;
This result is:
Bad Request: No indexed columns present in by-columns clause with Equal operator
The query select col1,col2,col3,col4 from table1 where col3 = 67 works.
Google says there is no way to execute that kind of query. Is that right? Any advice on how to support such a query?
Cassandra indexes don't actually support sequential access; see http://www.datastax.com/docs/1.1/ddl/indexes for a good quick explanation of where they are useful. But don't despair; the more classical way of using Cassandra (and many other NoSQL systems) is to denormalize, denormalize, denormalize.
It may be a good idea in your case to use the classic bucket-range pattern, which lets you use the recommended RandomPartitioner and keep your rows well distributed around your cluster, while still allowing sequential access to your values. The idea in this case is that you would make a second dynamic columnfamily mapping (bucketed and ordered) col3 values back to the related primary_key values. As an example, if your col3 values range from 0 to 10^9 and are fairly evenly distributed, you might want to put them in 1000 buckets of range 10^6 each (the best level of granularity will depend on the sort of queries you need, the sort of data you have, query round-trip time, etc). Example schema for cql3:
CREATE TABLE indexotron (
rangestart int,
col3val int,
table1key varchar,
PRIMARY KEY (rangestart, col3val, table1key)
);
When inserting into table1, you should insert a corresponding row in indexotron, with rangestart = int(col3val / 1000000). Then when you need to enumerate all rows in table1 with col3 > X, you need to query up to 1000 buckets of indexotron, but all the col3vals within each bucket will be sorted. Example queries (shown with a smaller bucket size of 1000 so the numbers stay readable) to find all table1.primary_key values for which table1.col3 < 4021:
SELECT * FROM indexotron WHERE rangestart = 0 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 1000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 2000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 3000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 4000 AND col3val < 4021 ORDER BY col3val;
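For completeness, a sketch of the write side using the bucket size of 1000 from the example queries above (the row values are made up); a logged batch just keeps the two writes together:
BEGIN BATCH
  INSERT INTO table1 (primary_key, col1, col2, col3, col4)
    VALUES ('some_row_key', 'a', 'b', 4015, 7);
  INSERT INTO indexotron (rangestart, col3val, table1key)
    VALUES (4000, 4015, 'some_row_key');
APPLY BATCH;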
If col3 is always known small values/ranges, you may be able to get away with a simpler table that also maps back to the initial table, ex:
create table table2 (col3val int, table1key varchar,
primary key (col3val, table1key));
and use
insert into table2 (col3val, table1key) values (55, 'foreign_key');
insert into table2 (col3val, table1key) values (55, 'foreign_key3');
select * from table2 where col3val = 51;
select * from table2 where col3val = 52;
...
Or
select * from table2 where col3val in (51, 52, ...);
This may be OK if you don't have too large a range (you could get the same effect with your secondary index as well, but secondary indexes aren't generally recommended for this). You could theoretically parallelize these queries on the client side as well.
It seems the "Cassandra way" is to have some key like "userid" and you use that as the first part of "all your queries" so you may need to rethink your data model, then you can have queries like select * from table1 where userid='X' and col3val > 3 and it can work (assuming a clustering key on col3val).
