cassandra dsbulk mapping failed

I am using dsbulk to load a dataset into DataStax Astra.
Error message:
My table structure:
CREATE TABLE project (
    FL_DATE date,
    OP_CARRIER text,
    DEP_DELAY float,
    ARR_DELAY float,
    PRIMARY KEY ((FL_DATE), OP_CARRIER)
) WITH CLUSTERING ORDER BY (OP_CARRIER ASC);
My mapping error:
I tried changing the data type, but it is still not working. I would appreciate it if anyone could help me.

Assumptions:
Both the secure connect bundle and the input CSV are located in the /path/to/ directory.
Table Structure:
token@cqlsh:projectjk> DESC TABLE projectjk;
CREATE TABLE projectjk.projectjk (
    fl_date date,
    op_carrier text,
    arr_delay float,
    dep_delay float,
    PRIMARY KEY ((fl_date), op_carrier)
) WITH CLUSTERING ORDER BY (op_carrier ASC)
...;
Starting with an empty table:
token@cqlsh:projectjk> select * from projectjk;
fl_date | op_carrier | arr_delay | dep_delay
---------+------------+-----------+-----------
(0 rows)
Sample input CSV file contents:
% cat /path/to/projectjk.csv
fl_date,op_carrier,dep_delay,arr_delay
2020-01-01,WN,44.0,363.0
2020-01-02,AN,42.0,143.42
The DSBulk configuration file contents are:
% cat projectjk.conf
dsbulk {
    connector {
        name = "csv"
    }
    csv {
        url = "/path/to/projectjk.csv"
        header = true
    }
    schema {
        keyspace = projectjk
        table = projectjk
    }
    log.stmt.level = EXTENDED
}
datastax-java-driver {
    basic {
        cloud.secure-connect-bundle = "/path/to/secure-connect-projectjk.zip"
    }
    advanced.auth-provider {
        username = "CHANGE_ME"
        password = "CHANGE_ME"
    }
}
The DSBulk load command executed is:
./dsbulk load -f projectjk.conf
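If the mapping error comes from CSV headers that do not match the table's (lower-cased) column names, an explicit field-to-column mapping can be declared in the schema block. A minimal sketch, assuming the original CSV kept the upper-case headers from the question's DDL:

schema {
    keyspace = projectjk
    table = projectjk
    # Hypothetical mapping: CSV field name on the left, table column on the right
    mapping = "FL_DATE = fl_date, OP_CARRIER = op_carrier, DEP_DELAY = dep_delay, ARR_DELAY = arr_delay"
}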

Related

How to run CQL in Zeppelin by taking input in user input format?

I was trying to run a CQL query that takes user input through dynamic forms in the Zeppelin tool:
%cassandra
SELECT ${Select Fields Type=uuid ,uuid | created_by | email_verify| username} FROM
${Select Table=keyspace.table_name}
${WHERE email_verify="true" } ${ORDER BY='updated_date' }LIMIT ${limit = 10};
While running this query I was getting this error:
line 4:0 mismatched input 'true' expecting EOF
(SELECT uuid FROM keyspace.table_name ["true"]...)
You need to move WHERE and ORDER BY out of the dynamic form declaration.
The input field declaration looks like this: ${field_name=default_value}. In your case, instead of WHERE ..., you've declared a field named WHERE email_verify.
It should be as follows (not tested):
%cassandra
SELECT ${Select Fields Type=uuid ,uuid | created_by | email_verify| username} FROM
${Select Table=keyspace.table_name}
WHERE ${where_cond=email_verify='true'} ORDER BY ${order_by='updated_date'} LIMIT ${limit = 10};
Update:
Here is a working example for a table with the following structure:
CREATE TABLE test.scala_test2 (
    id int,
    c int,
    t text,
    tm timestamp,
    PRIMARY KEY (id, c)
) WITH CLUSTERING ORDER BY (c ASC)
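The query paragraph for that table could then look like this (a sketch; the form names and default values here are illustrative, not from the original answer):

%cassandra
SELECT ${fields=id,id|c|t|tm} FROM ${table=test.scala_test2}
WHERE ${where_cond=id=1} ORDER BY ${order_by=c} LIMIT ${limit=10};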

Cassandra QueryBuilder not returning any result, whereas same query works fine in CQL shell

SELECT count(*) FROM device_stats
WHERE orgid = 'XYZ'
AND regionid = 'NY'
AND campusid = 'C1'
AND buildingid = 'C1'
AND floorid = '2'
AND year = 2017;
The above CQL query returns the correct result, 32032, in the CQL shell.
But when I run the same query using the QueryBuilder Java API, I see the count as 0:
BuiltStatement summaryQuery = QueryBuilder.select()
    .countAll()
    .from("device_stats")
    .where(eq("orgid", "XYZ"))
    .and(eq("regionid", "NY"))
    .and(eq("campusid", "C1"))
    .and(eq("buildingid", "C1"))
    .and(eq("floorid", "2"))
    .and(eq("year", "2017"));
try {
    ResultSetFuture summaryResults = session.executeAsync(summaryQuery);
    summaryResults.getUninterruptibly().all().stream().forEach(result -> {
        System.out.println(" totalCount > " + result.getLong(0));
    });
} catch (Exception e) {
    e.printStackTrace();
}
I have only 20 partitions and 32032 rows per partition.
What could be the reason QueryBuilder is not executing the query correctly?
Schema :
CREATE TABLE device_stats (
    orgid text,
    regionid text,
    campusid text,
    buildingid text,
    floorid text,
    year int,
    endofwindow timestamp,
    categoryid timeuuid,
    devicestats map<text, bigint>,
    PRIMARY KEY ((orgid, regionid, campusid, buildingid, floorid, year), endofwindow, categoryid)
) WITH CLUSTERING ORDER BY (endofwindow DESC, categoryid ASC);
// Using the keys function to index the map keys
CREATE INDEX ON device_stats (keys(devicestats));
I am using Cassandra 3.10 and com.datastax.cassandra:cassandra-driver-core:3.1.4.
Moving my comment to an answer since that seems to solve the original problem:
Changing .and(eq("year", "2017")) to .and(eq("year", 2017)) solves the issue since year is an int and not a text.
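For reference, the corrected builder call (assuming a static import of QueryBuilder.eq, as in the question) would be:

BuiltStatement summaryQuery = QueryBuilder.select()
    .countAll()
    .from("device_stats")
    .where(eq("orgid", "XYZ"))
    .and(eq("regionid", "NY"))
    .and(eq("campusid", "C1"))
    .and(eq("buildingid", "C1"))
    .and(eq("floorid", "2"))
    .and(eq("year", 2017)); // int literal, since year is an int column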

com.datastax.driver.core.exceptions.InvalidQueryException: Invalid operator IN for PRIMARY KEY part

I have Cassandra 2.1.15 and this table:
CREATE TABLE ks_mobapp.messages (
    pair_id text,
    belong_to text,
    message_id timeuuid,
    cli_time bigint,
    sender text,
    text text,
    time bigint,
    PRIMARY KEY ((pair_id, belong_to), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
I was trying to delete multiple records like this:
instances.getCqlSession().execute(QueryBuilder.delete()
    .from(AppConstants.KEYSPACE, "messages")
    .where(QueryBuilder.eq("pair_id", pairId))
    .and(QueryBuilder.eq("belong_to", currentUser.value("userId")))
    .and(QueryBuilder.in("message_id", msgId)));
I am getting this error:
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid operator IN for PRIMARY KEY part message_id
Then I tried:
Session session = instances.getCqlSession();
PreparedStatement statement = session.prepare("DELETE FROM ks_mobApp.messages WHERE pair_id = ? AND belong_to = ? AND message_id = ?;");
Iterator<String> iterator = msgId.iterator();
while (iterator.hasNext()) {
    try {
        session.executeAsync(statement.bind(pairId, currentUser.value("userId"), UUID.fromString(iterator.next())));
    } catch (Exception ex) {
        // intentionally swallowed here; consider at least logging the failure
    }
}
It works nicely. Is this the correct way? Can't I use IN within the same partition?
DELETE with IN is only supported for partition key columns ("Delete IN relation is only supported for partition key").
There are some WHERE clause restrictions for the UPDATE and DELETE statements in Cassandra 2.x; more specifically, you can only use the IN operator on the last partition key column. So in your case the last partition key column is belong_to, so IN can only be used on that column.
However, these limitations are removed in Cassandra 3.0, which allows:
IN to be specified on any partition key column
IN to be specified on any clustering column
Here is the patch: https://issues.apache.org/jira/browse/CASSANDRA-6237
Read this also: http://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
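For example, on Cassandra 3.0+ the multi-message delete from the question should be accepted as a single statement (the key values and timeuuids below are placeholders):

DELETE FROM ks_mobapp.messages
WHERE pair_id = 'pair1'
AND belong_to = 'user1'
AND message_id IN (50554d6e-29bb-11e5-b345-feff819cdc9f, 6846a372-29bb-11e5-b345-feff819cdc9f);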

phpcassa throws warning on LongType and dateOf(TimeUUIDType)

I have the following table in Cassandra:
CREATE TABLE reports (
    c_date text,
    c_n int,
    c_id timeuuid,
    report_id bigint,
    report text,
    PRIMARY KEY ((c_date, c_n), c_id)
)
c_date is for querying reports by date.
c_n is the number of nodes, used to prevent hotspots (to distribute data evenly).
c_id is the inserted timeuuid.
My select query (CQL 3) is the following:
select report, dateOf(c_id), report_id
from keyspace.reports
where c_date = '2013-08-02' and
c_n = 1 and
c_id > minTimeuuid('2013-08-02 02:52:10-0400');
I successfully get the result set.
However, when I use the cql_get_rows() function implemented in another example (here), the timestamp (dateOf(c_id)) cannot be parsed correctly and bigint fields yield the following warning:
PHP Warning: unpack(): Type N: not enough input, need 4, have 0
in /home/arascan/my-project/tools/vendor/phpcassa/lib/phpcassa/Schema/DataType/LongType.php on line 47
The data returned from cql_get_rows() is the following:
[0] => Array
    (
        [reportid] => 281474976712782
        [report] => some_report
        [dateOf(c_id)] => d:1375426331.32100009918212890625;
    )
How can I prevent this function from throwing the warning and get the timestamp in date format?
(Please don't suggest suppressing the warning with @.)

Inserting data in table with umlaut is not possible

I am using Cassandra 1.2.5 (cqlsh 3.0.2) and trying to insert data with German characters into a small test database, which does not work. I get this message back from cqlsh: "Bad Request: Input length = 1"
Below is the setup of the keyspace, the table, and the insert.
CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use test;
CREATE TABLE testdata (
    id varchar,
    text varchar,
    PRIMARY KEY (id)
);
This is working:
insert into testdata (id, text) values ('4711', 'test');
This is not allowed:
insert into testdata (id, text) values ('4711', 'töst`);
->Bad Request: Input length = 1
My locale is de_DE.UTF-8.
Does Cassandra 1.2.5 have a problem with umlauts?
I just did what you posted and it worked for me. The one thing that was different, however, is that instead of a closing single quote, you finished 'töst` with a backtick. That doesn't let me finish the statement in cqlsh. When I replace it with 'töst' it succeeds and I get:
cqlsh:test> select * from testdata;
id | text
------+------
4711 | töst
