Using DSBulk to load into a CQL set returns "Invalid set literal - bind variables are not supported inside collection literals" - cassandra

I'm trying to load a huge amount of data with dsbulk into a table with a set column, using:
dsbulk load test.json \
-h cassandra-db -u ... -p ... -k mykeyspace \
-query "update mykeyspace.mytable set value_s = value_s +{:value_s} where value_1=:value_1 and value_2=:value_2"
I get the following error:
Operation LOAD_20220629-122020-418911 failed: Invalid set literal for value_s: bind variables are not supported inside collection literals
If I use
-query "update mykeyspace.mytable set value_s = value_s +{'mystaticvalue'} where value_1=:value_1 and value_2=:value_2"
the load is executed as expected. Does anyone have an idea how I can parameterize my set value?
Alternatively, I could generate individual update statements and execute them via cqlsh, but the processing time is really slow and I have more than 1 billion records to insert.

Unfortunately, Cassandra does not allow bind variables inside collection literals, so you won't be able to do it that way with the Bulk Loader either. Cheers!
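That said, CQL does accept a bind variable for the whole collection (as opposed to an element inside a {...} literal), so if loading through a driver is an option, a prepared statement can append one set per row. A minimal sketch with the DataStax Java driver 3.x, assuming value_1/value_2 are text and value_s is set<text> (the types are not given in the question):
import com.datastax.driver.core.*;
import java.util.Collections;

public class SetAppendLoad {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("cassandra-db").build();
             Session session = cluster.connect()) {
            // Bind the whole set rather than a single element inside a {...} literal
            PreparedStatement ps = session.prepare(
                "UPDATE mykeyspace.mytable SET value_s = value_s + ? " +
                "WHERE value_1 = ? AND value_2 = ?");
            // One record; in practice the values would come from each JSON row
            session.execute(ps.bind(Collections.singleton("myvalue"), "k1", "k2"));
        }
    }
}
Executed (or executed asynchronously) once per input record, this is typically much faster than piping individual statements through cqlsh.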

Related

cassandra bind variables produces error: no viable alternative at input '?'

I'm using Cassandra. I am trying to update the gc_grace_seconds value using a bind variable:
ALTER table keyspace.table_name with gc_grace_seconds = ? ;
I got the following error:
no viable alternative at input '?'
How can I solve this?
As far as I can see from the source code (maybe I'm wrong), ALTER TABLE doesn't support bind variables, so you can't use them for this command (or any other DDL command); you need to execute the statement with the specific value inlined.
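For example, a minimal sketch with the DataStax Java driver 3.x (the contact point, keyspace and table names below are placeholders), inlining the value into the DDL string:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class AlterGcGrace {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            int gcGraceSeconds = 3600; // value decided at runtime
            // DDL statements can't be prepared with bind markers,
            // so concatenate the value into the statement text instead
            session.execute("ALTER TABLE ks1.table_name WITH gc_grace_seconds = " + gcGraceSeconds);
        }
    }
}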
It looks like you're trying to bind parameters programmatically to set the GC grace on a table. It isn't possible to do that using the Cassandra drivers.
It will only work through cqlsh. For example:
cqlsh> ALTER TABLE community.maptbl WITH gc_grace_seconds = 3600;
It doesn't make sense to do it in your app and it is not recommended. Cheers!

updating all Cassandra tables starting with a specific name

I am trying to alter my Cassandra tables starting with a specific name.
My tables are named sample_1, sample_2, sample_13567, sample_adgf and so on...
The table names are random but start with the same prefix.
I want to add a new column to all these tables.
Can someone suggest the alter query using a regex for the table names?
If you are using Linux you can do this in two steps:
First, generate all the ALTER commands into a file like below:
for i in {1..13567}; do echo "ALTER TABLE sample_$i ADD test text;"; done > alter.cql
The above command will create ALTER commands that add a test text column to tables sample_1 through sample_13567 and store them in the file alter.cql.
Now you can just load the CQL file into cqlsh like below:
cqlsh 127.0.0.1 -u cassandra -p cassandra -k ashraful_test -f alter.cql
Here:
-u username
-p password
-k keyspace_name
-f file to load
By the way, having too many tables is not a good idea.
Check this link https://stackoverflow.com/a/33389204/2320144
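Since some of the table names are not numeric (sample_adgf, for example), another option is to read the table names from the schema tables and generate the ALTER statements from those. A minimal sketch with the DataStax Java driver 3.x, assuming Cassandra 3.x or later (where table metadata lives in system_schema.tables) and the keyspace name from the example above:
import com.datastax.driver.core.*;

public class AddColumnToPrefixedTables {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            ResultSet tables = session.execute(
                "SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'ashraful_test'");
            for (Row row : tables) {
                String table = row.getString("table_name");
                if (table.startsWith("sample_")) {
                    // DDL can't be parameterized, so build the statement text per table
                    session.execute("ALTER TABLE ashraful_test." + table + " ADD test text");
                }
            }
        }
    }
}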

greenplum database "relation does not exist"

I am getting a "relation does not exist" error while trying to truncate a particular table. The table actually exists in the database.
Also, when I click on this table in pgAdmin I get the warning for vacuum.
Are these things related?
------ Adding a few more details ------
The truncate statement is called within a Greenplum function. The job truncates and loads the table on a daily basis (this table is queried in reports). The issue pops up once in a while, and if we restart the same job again after a few minutes it succeeds.
Please try the below:
select * from schemaname.tablename limit 10;
If you don't use the schema name then you have to set the search path as below and then run your select:
set search_path=schemaname;
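The same applies to the TRUNCATE inside the function: schema-qualifying the table avoids any dependence on search_path. A minimal sketch over JDBC (Greenplum speaks the PostgreSQL protocol; the connection details and names below are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TruncateWithSchema {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-master:5432/mydb", "gpadmin", "secret");
             Statement st = conn.createStatement()) {
            // Schema-qualify the table so it resolves regardless of search_path
            st.execute("TRUNCATE TABLE schemaname.tablename");
        }
    }
}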

How can I describe a table in a Cassandra database?

$describe = new Cassandra\SimpleStatement(<<<EOD
describe keyspace.tablename
EOD
);
$session->execute($describe);
I used the above code but it is not working.
How can I fetch the field names and their data types from a Cassandra table?
Refer to the CQL documentation. DESCRIBE expects a table/schema/keyspace:
describe table keyspace.tablename
It's also a cqlsh command, not an actual CQL command, so the driver can't execute it. To get this information, query the system tables. Try:
select * from system.schema_columns;
- or for more recent versions -
select * from system_schema.columns;
If using the PHP driver, you may want to check out http://datastax.github.io/php-driver/features/#schema-metadata
Try desc table keyspace.tablename;
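To get just the field names and types for one table, filter the system table by keyspace and table name. A minimal sketch with the DataStax Java driver 3.x (the same SELECT works from the PHP driver), assuming Cassandra 3.x or later and placeholder keyspace/table names:
import com.datastax.driver.core.*;

public class DescribeColumns {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            ResultSet rs = session.execute(
                "SELECT column_name, type FROM system_schema.columns " +
                "WHERE keyspace_name = 'mykeyspace' AND table_name = 'mytable'");
            for (Row row : rs) {
                // column_name and type are both stored as text in system_schema.columns
                System.out.println(row.getString("column_name") + " : " + row.getString("type"));
            }
        }
    }
}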

RPC timeout error while exporting data from CQL

I am trying to export data from Cassandra using the CQL client. A column family has about 100,000 rows in it. When I am copying data into a CSV file using the COPY TO command I get the following rpc_timeout error:
copy mycolfamily to '/root/mycolfamily.csv'
Request did not complete within rpc_timeout.
I am running:
[cqlsh 3.1.6 | Cassandra 1.2.8 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
How can I increase RPC timeout limit?
I tried adding rpc_timeout_in_ms: 20000 (default is 10000) in my conf/cassandra.yaml file, but while restarting Cassandra I get:
[root@user ~]# null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=rpc_timeout_in_ms for JavaBean=org.apache.cassandra.config.Config#71bfc4fc; Unable to find property 'rpc_timeout_in_ms' on class: org.apache.cassandra.config.Config
Invalid yaml; unable to start server. See log for stacktrace.
The COPY command currently does the same thing as a SELECT with LIMIT 99999999. So, it will eventually time out as your data grows. Here's the export function:
https://github.com/apache/cassandra/blob/trunk/bin/cqlsh#L1524
I'm doing the same export in production. What I'm doing is the following:
make a select * from table where timeuuid = someTimeuuid limit 10000
write the result set to a csv file in >> (append) mode
make the next selects with respect to the last timeuuid
You can pipe a command into cqlsh like the following:
echo "{$cql}" | /usr/bin/cqlsh -u user -p password localhost 9160 > file.csv
You can use automatic paging by specifying the fetch size in the DataStax Java driver:
Statement stmt = new SimpleStatement("SELECT id FROM mycolfamily;");
stmt.setFetchSize(500);
ResultSet result = session.execute(stmt);
// iterating the ResultSet fetches further pages transparently
for (Row r : result) {
    // write the row to the file
}
I encountered the same problem a few minutes ago, then I found CAPTURE and it worked:
First start capturing in cqlsh and then run your query with some limit of your choice.
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/capture_r.html
The best way to export the data is to use the nodetool snapshot option. This returns immediately and can be restored later on. The only issue is that this export is per node and for the entire cluster.
Example:
nodetool -h localhost -p 7199 snapshot
See reference:
http://docs.datastax.com/en/archived/cassandra/1.1/docs/backup_restore.html#taking-a-snapshot
