Can i do 'DESCRIBE KEYSPACES' from cqlengine? - cassandra

I'm using cqlengine from Django. Is there a way to make DESCRIBE KEYSPACES from it. It works form cqlsh.
Couldn't find anything in Docs

Best bet is getting it from the cluster metadata. DESCRIBE is not part of cql but cqlsh is using the python driver just like cqlengine so you can use same mechanism:
https://github.com/apache/cassandra/blob/16490a48b02b6f206a78717e9b816983f0b76bb1/bin/cqlsh.py#L619
map(str, yourconnection.metadata.keyspaces.keys())
from the metadata you can collect most data you want like this. You can also query the system schema tables but that changes between versions a bit so I would recommend letting driver do it for you.

Related

"LIKE" and "ILIKE" queries in AWS Keyspaces

As there is no support for custom index in AWS Keyspaces what would be the best solution / pattern to be able to run LIKE or ILIKE queries on specific columns of a Cassandra Table?
In vanilla Cassandra, you can use SSTable secondary index to use LIKE queries, but we can't in AWS...
Is there any query for Cassandra as same as SQL:LIKE Condition?
Feeding an OpenSearch service, or even a good old Postgres at the same time of updating Keyspaces seems a bit overkill to me.
Fetching all columns in-memory somewhere to do the query seems slow as well.
What would be the lightest infra / architecture to implement to provide a LIKE query support based on AWS Keyspaces as source of truth?
You can use a Lexi-graphical Select statement to narrow your query down the same way you would do a LIKE statement. If you needed to further narrow it down you could do that narrowing client side. I would love to learn more your use case so I can better assist you.

How to migrate data between two tables in Cassandra properly

I have to change the schema of one of my tables in Cassandra. It's cannot be done by simply using ALTER TABLE command, because there are some changes in primary key.
So the question is: How to do such a migration in the best way?
Using COPY command in cql is not an option in here because dump file can be really huge.
Can I solve this problem by not creating some custom application?
Like Guillaume has suggested in the comment - you can't do this directly in cassandra. Schema altering operations are very limited here. You have to perform such migration manually using one of suggested there tools OR if you have very large tables you can leverage Spark.
Spark can efficiently read data from your nodes, transform them locally and save them back to db. Remember that such migration requires reading whole db content, so might take a while. It might be the most performant solution, however needs some bigger preparation - Spark cluster setup.

thrift or CQL3 for cassandra wide rows?

we are going to create a new project on cassandra with php or java.
As we estimated, there will be 20K req/sec to cassandra cluster.
Specially wide column feature is important for this project, but i can not make it clear: should i prefer thrift API or CQL3 library like php-driver etc?
There is an post that says 'Thrift API is not going to be getting new features' in this link. So i am not sure about thrift.
if i decided to use cql3, i have to alter table to be sure column exists before all insert queries like this, which is discussed at here. i think this will be a performance issue for me.
So which of them is best to my case ?
Thrift is a legacy interface in Cassandra. All new development should use the native CQL interface.
I'm not clear on why you think you'd need to do an alter table frequently. Typically you would define a schema once and rarely if ever use alter table.

Using Pig with Cassandra CQL3

When trying to run PIG against a CQL3 created Cassandra Schema,
-- This script simply gets a row count of the given column family
rows = LOAD 'cassandra://Keyspace1/ColumnFamily/' USING CassandraStorage();
counted = foreach (group rows all) generate COUNT($1);
dump counted;
I get the following Error.
Error: Column family 'ColumnFamily' not found in keyspace 'KeySpace1'
I understand that this is by design, but I have been having trouble finding the correct method to load CQL3 tables into PIG.
Can someone point me in the right direction? Is there a missing bit of documentation?
This is now supported in Cassandra 1.2.8
As you mention this is by design because if thrift was updated to allow for this it would compromise backwards computability. Instead of creating keyspaces and column families using CQL (I'm guessing you used cqlsh) try using the C* CLI.
Take a look at these issues as well:
https://issues.apache.org/jira/browse/CASSANDRA-4924
https://issues.apache.org/jira/browse/CASSANDRA-4377
Per this https://github.com/alexliu68/cassandra/pull/3, it appears that this fix is planned for the 1.2.6 release of Cassandra. It sounds like they're trying to get that out in the reasonably near future, but of course there's no certain ETA.
As e90jimmy said, its supported in Cassandra 1.2.8, but we have a issue when using counter column type. This was fixed by Alex Liu but due to regression problem in 1.2.7 the patch doesn't go ahead:
https://issues.apache.org/jira/browse/CASSANDRA-5234
To correct this, wait until 2.0 become production ready or download the source, apply the patch from the above link by yourself and rebuild the cassandra .jar. Worked for me by now...
The best way to access Cql3 Tables in Pig is by using the CqlStorage Handler
The syntax is similar to what you have a above
row = Load 'cql://Keyspace/ColumnFamily/' Using CqlStorage()
More info In the Dev Blog Post

What is a good Bulk data loading tool for Cassandra

I'm looking for a tool to load CSV into Cassandra. I was hoping to use RazorSQL for this but I've been told that it will be several months out.
What is a good tool?
Thanks
1) If you have all the data to be loaded in place you can try sstableloader(only for cassandra 0.8.x onwards) utility to bulk load the data.For more details see:cassandra bulk loader
2) Cassandra has introduced BulkOutputFormat bulk loading data into cassandra with hadoop job in latest version that is cassandra-1.1.x onwards.
For more details see:Bulkloading to Cassandra with Hadoop
I'm dubious that tool support would help a great deal with this, since a Cassandra schema needs to reflect the queries that you want to run, rather than just being a generic model of your domain.
The built-in bulk loading mechanism for cassandra is via BinaryMemtables: http://wiki.apache.org/cassandra/BinaryMemtable
However, whether you use this or the more usual Thrift interface, you still probably need to manually design a mapping from your CSV into Cassandra ColumnFamilies, taking into account the queries you need to run. A generic mapping from CSV-> Cassandra may not be appropriate since secondary indexes and denormalisation are commonly needed.
For Cassandra 1.1.3 and higher, there is the CQL COPY command that is available for importing (or exporting) data to (or from) a table. According to the documentation, if you are importing less than 2 million rows, roughly, then this is a good option. Is is much easier to use than the sstableloader and less error prone. The sstableloader requires you to create strictly formatted .db files whereas the CQL COPY command accepts a delimited text file. Documenation here:
http://www.datastax.com/docs/1.1/references/cql/COPY
For larger data sets, you should use the sstableloader.http://www.datastax.com/docs/1.1/references/bulkloader. A working example is described here http://www.datastax.com/dev/blog/bulk-loading.

Resources