Unable to get all the results of Astyanax IndexQuery execution - cassandra

I am using Astyanax client to read data from Cassandra column families. My requirement is to query on a secondary index column of a column family through astyanax. I was able to find a few examples from the astyanax website to read data from Cassandra using an expression on an index column. I used the examples to write my own code:
OperationResult<Rows<String, String>> result = null;
IndexQuery<String, String> query = keyspace.prepareQuery("ColumnFamilyName").searchWithIndex().setRowLimit(100).autoPaginateRows(true).addExpression().whereColumn("ColumnNameWhichHasSecondaryIndex").equals().value("value");
result = query.execute();
for(Row<String, String> row : result.getResult()) {
...
}
This is the issue I am facing while running this code is it is only returning me 100 rows, while the column family has a few thousand rows that satisfy the query criteria.
I am not able to find the right API/pattern to address my issue. Could some one please guide me on this issue please?
Astanax Version: 1.56.24
Apache Cassandra Version: 1.1.8
Jdk Version: 1.7
Platform: Rhel 5.4 32bit

setRowLimit(100) means every time you execute the query, it return at most 100 items.
So if you have more than 100 items, you should execute query again and again to get
all the result under the setting autoPaginateRows(true)

Increase your row limit:
.setRowLimit(100)

Related

Cassandra Python driver doesn't page large queries

It is said in the documentation that cassandra-driver does automatic paging when queries are large enough (with default_fetch_size being 5000 rows) and will return PagedResult.
I have tested reading data from my local Cassandra which contains 9999 rows with SimpleStatement with my own fetch size, but it returned the ResultSet (9999 rows) instead of pages (instance of PagedResult). Also, I tried to change the Session.default_fetch_size but it didn't work as well.
Here's my code..
My first attempt: This is the SimpleStatement code i have made to change the fetch size.
cluster = Cluster()
session = cluster.connect(keyspace_name)
query = "SELECT * FROM user"
statement = SimpleStatement(query, fetch_size=10)
rows = list(session.execute(statement))
print(len(rows))
It prints 9999 (all rows), not 10 rows as I already set the fetch_size.
My second attempt: I tried to change the query fetch size by changing session's default fetch size Session.default_fetch_size.
cluster = Cluster()
session = cluster.connect(keyspace_name)
session.default_fetch_size = 10
query = "SELECT * FROM user"
rows = list(session.execute(query))
print(len(rows))
It also prints 9999 rows instead of 10.
My goal is not to limit the rows from my fetch query, such as SELECT * FROM user LIMIT 10. What I want is to fetch the rows page by page to avoid overload on memory.
So what actually happened?
Note: I am using Cassandra-Driver 3.25 for Python and using Python3.7
I am sorry if my additional information still doesn't make my question a good one. I never ask any questions before. So...any suggestions are welcome :)
Your test is invalid because your code is faulty.
When you list(), you are in fact "materialising" all the result pages. Your code is not iterating over the rows but retrieving all of the rows.
The driver automatically fetches the next page in the background until there are no more pages to fetch. It may not seem like it but each page only contains fetch_size rows.
Retrieving the next page happens transparently so to you it seems like the results are not getting paged at all but that automatic behaviour from the driver is working as designed. Cheers!

Paging through Cassandra using QueryBuilder

The DataStax documentation says that to page through all data, the following CQL query is useful:
SELECT * FROM test WHERE token(k) > token(42);
Is it possible to build this query using the QueryBuilder? It provides a token method, but that seems to work only on column names, not on values.
Ideally, the value (in the example: 42) is of type Object, just like in the eq/gte/lte functions.
Try using automatic paging with the .fetchSize method. It uses token under the hood:
Automatic paging is introduced Cassandra 2.0. Automatic paging allows the developer to iterate on an entire ResultSet without having to care about its size: some extra rows are fetched as the client code iterate over the results while the old ones are dropped. The amount of rows that must be retrieved can be parameterized at query time. In the Java Driver this will looks like:
Statement stmt = new SimpleStatement("SELECT * FROM images");
stmt.setFetchSize(100);
ResultSet rs = session.execute(stmt);
Source: http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
QueryBuilder.fcall("token", value) ;
can solve the problem!

How to check if a Cassandra table exists

Is there an easy way to check if table (column family) is defined in Cassandra using CQL (or API perhaps, using com.datastax.driver)?
Right now I am leaning towards executing SELECT 1 FROM table and checking for exception but maybe there is a better way?
As of 1.1 you should be able to query the system keyspace, schema_columnfamilies column family. If you know which keyspace you want to check, this CQL should list all column families in a keyspace:
SELECT columnfamily_name
FROM schema_columnfamilies WHERE keyspace_name='myKeyspaceName';
The report describing this functionality is here: https://issues.apache.org/jira/browse/CASSANDRA-2477
Although, they do note that some of the system column names have changed between 1.1 and 1.2. So you might have to mess around with it a little to get your desired results.
Edit 20160523 - Cassandra 3.x Update:
Note that for Cassandra 3.0 and up, you'll need to make a few adjustments to the above query:
SELECT table_name
FROM system_schema.tables WHERE keyspace_name='myKeyspaceName';
The Java driver (since you mentioned it in your question) also maintains a local representation of the schema.
Driver 3.x and below:
KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("myKeyspace");
TableMetadata table = ks.getTable("myTable");
boolean tableExists = (table != null);
Driver 4.x and above:
Metadata metadata = session.getMetadata();
boolean tableExists =
metadata.getKeyspace("myKeyspace")
.flatMap(ks -> ks.getTable("myTable"))
.isPresent();
I just needed to manually check for the existence of a table using cqlsh.
Possibly useful general info.
describe keyspace_name.table_name
If it doesn't exist you'll get 'table_name' not found in keyspace 'keyspace'
If it does exist you'll get a description of the table.
For the .NET driver CassandraCSharpDriver version 3.17.1 the following code creates a table if it doesn't exist yet:
var ks = _cassandraSession.Cluster.Metadata.GetKeyspace(keyspaceName);
var tableNames = ks.GetTablesNames();
if(!tableNames.Contains(tableName.ToLowerInvariant()))
{
var stmt = new SimpleStatement($"CREATE TABLE {tableName} (id text PRIMARY KEY, name text, price decimal, volume int, time timestamp)");
_cassandraSession.Execute(stmt);
}
You will need to adapt the list of table columns to your needs. This can also be awaited by using await _cassandraSession.ExecuteAsync(stmt).ConfigureAwait(false) in an async method.
Also, I want to mention that I'm using Cassandra version 4.0.1.

Astyanax key range query

trying to write a query which will paginate through all rows in a column family using astyanax client and RowSliceQuery.
keyspace.prepareQuery(COLUMN_FAMILY).getKeyRange(null, null, null, null, 100);
Done this successfully using hector where 1st call is done with null start and end keys. After retrieving 1st page I use last key from the result to make query for second page and etc. This is code for 1st page using hector.
HFactory.createRangeSlicesQuery(keyspace,
LongSerializer.get(), new CompositeSerializer(),
BytesArraySerializer.get())
.setColumnFamily(COLUMN_FAMILY)
.setRange(null, null, false, 100).setRowCount(100);
Now when I am trying to do this with astyanax I am getting errors about null and non-null keys and tokens. Not sure what tokens do in this query. Also I am able to use allRows(), but would like to do this using key range query as it gives me more flexibility.
Does anybody have an example of key range query using astyanax? I cannot find an example neither in "getting started" documentation or anywhere else on the net.
Thanks!
Anton
What you are referring to is the getRowRange method:
keyspace.prepareQuery(CF_STANDARD1)
.getRowRange(startKey, endKey, startToken, endToken, count)
Note however that this works only when the ByteOrderedPartitioner is used. Since by default Cassandra uses the Murmur3Partitioner, this will usually not work. Using an index to do this instead is recommended. Astyanax also provides the reverse index search recipe which takes advantage of a second column family which stores your keys as columns to allow efficient range searches on the original data.
Check this sample code. I hope this code will help you in doing the paging.
IndexQuery<String, String> query = keyspace
.prepareQuery(CF_STANDARD1).searchWithIndex()
.setRowLimit(10).autoPaginateRows(true).addExpression()
.whereColumn("Index2").equals().value(42);
Best,

Cassandra Client - examining columns as Strings

I am new to Cassandra & I am using the Hector Java Client for writing/reading from it.
I've the following code to insert values -
Mutator<String> mutator = HFactory.createMutator(keyspaceOperator, StringSerializer.get());
mutator.insert("jsmith", stdColumnDefinition, HFactory.createStringColumn("first", "John"));
Now, when I get the values back via the Hector client - it works cleanly -
columnQuery.setColumnFamily(stdColumnDefinition).setKey("jsmith").setName("first");
QueryResult<HColumn<String, String>> result = columnQuery.execute();
However, when I try to get the values from the command line cassandra client, I get the data in bytes, rather than human readable String format. Is there a way for me to fix this so that I can use the cassandra client to spit out Strings -
here is the sample output
[default#keyspaceUno] list StandardUno ;
Using default limit of 100
RowKey: 6a736d697468
=> (column=6669727374, value=4a6f686e, timestamp=1317183324576000)
=> (column=6c617374, value=536d697468, timestamp=1317183324606000)
1 Row Returned.
Thanks.
You can either modify the schema so that Cassandra interprets the column as a string by default, or you can tell the CLI how to interpret the data, either on a one-shot basis, or for the rest of the session, using the "assume" command.
See http://www.datastax.com/docs/0.8/dml/using_cli#reading-rows-and-columns for examples.

Resources