Datastax Java Driver failed to scan an entire table - cassandra

I iterated over the entire table and received fewer partitions than expected.
Initially I thought something must be wrong on my end. But I checked the existence of every row with a simple WHERE query (I have a list of the billions of keys I used), and I verified the expected number with the Spark connector, so I conclude that it can't be anything other than the driver.
I have billions of data rows, yet I receive half a billion fewer.
Has anyone else encountered this issue and been able to resolve it?
Code snippet added below.
The table is a simple counter table:
CREATE TABLE counter_data (
    id text,
    name text,
    count_val counter,
    PRIMARY KEY (id, name)
);
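import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryLogger;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class CountTable {
    private Session session;
    private Statement countQuery;

    public void initSession(String table) {
        QueryOptions queryOptions = new QueryOptions();
        queryOptions.setConsistencyLevel(ConsistencyLevel.ONE);
        queryOptions.setFetchSize(100);
        QueryLogger queryLogger = QueryLogger.builder().build();
        // Pass the options to the builder; otherwise they are never applied.
        Cluster cluster = Cluster.builder().addContactPoints("ip").withPort(9042)
                .withQueryOptions(queryOptions)
                .build();
        cluster.register(queryLogger);
        this.session = cluster.connect("ks");
        this.countQuery = QueryBuilder.select("id").from(table);
    }

    public void performCount() {
        ResultSet results = session.execute(countQuery);
        // long, because the table is expected to hold billions of partitions
        long count = 0;
        String lastKey = "";
        // Rows of one partition arrive together, so count each change of key.
        for (Row row : results) {
            String key = row.getString(0);
            if (!key.equals(lastKey)) {
                lastKey = key;
                count++;
            }
        }
        session.close();
        System.out.println("count is " + count);
    }

    public static void main(String[] args) {
        CountTable countTable = new CountTable();
        countTable.initSession("counter_data");
        countTable.performCount();
    }
}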

Looking at your code, the requested consistency level is ONE, which is comparable to a dirty read in the RDBMS world.
queryOptions.setConsistencyLevel(ConsistencyLevel.ONE);
For stronger consistency, that is, to get back all records, use LOCAL_QUORUM. Update your code as follows:
queryOptions.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
LOCAL_QUORUM guarantees that a majority of the replicas in the local datacenter (in your case 2 out of 3) respond to the read request. That gives stronger consistency and therefore a more accurate row count. See the documentation on consistency levels for reference.
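The "2 out of 3" figure follows from the usual quorum rule, floor(RF / 2) + 1, where RF is the replication factor. A minimal sketch of that arithmetic in plain Java (QuorumMath and its methods are hypothetical names, not part of the driver):

```java
// Illustrative only: QuorumMath is a hypothetical helper, not a driver class.
final class QuorumMath {
    private QuorumMath() {}

    /** Replicas that must respond at (LOCAL_)QUORUM: floor(rf / 2) + 1. */
    static int quorumSize(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** True when any read set and any write set of these sizes share a replica. */
    static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        System.out.println(quorumSize(3));      // prints 2
        System.out.println(overlaps(2, 2, 3));  // prints true
        System.out.println(overlaps(1, 1, 3));  // prints false
    }
}
```

With RF = 3, a read at ONE after a write at ONE has no guaranteed overlap, while a quorum read after a quorum write always shares at least one replica.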

Azure Cosmos Db, select after row?

I'm trying to select some rows after x rows, something like:
SELECT * from collection WHERE ROWNUM >= 235 and ROWNUM <= 250
Unfortunately, it looks like ROWNUM isn't resolved in Azure Cosmos DB.
Is there another way to do this? I've looked at continuation tokens, but they don't help if a user skips to page 50. Would I need to keep querying with continuation tokens until I get to page 50?
I've tried playing around with the page size option, but it limits how many items can be returned at any one time.
For example, I have 1,000,000 records in Azure and I want to query rows 500,000 to 500,010. I can't do SELECT * from collection WHERE ROWNUM >= 500,000 and ROWNUM <= 500,010, so how do I achieve this?
If you don't have any filters, you currently can't retrieve items in a specific range directly with a SQL query in Cosmos DB, so you need to use pagination to locate the items you want. As far as I know, pagination is currently supported only through continuation tokens.
Please refer to the sample below:
using JayGongDocumentDB.pojo;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace JayGongDocumentDB.module
{
    class QuerySample1
    {
        // Returns Task rather than void so that callers can await it and observe exceptions
        public static async Task QueryPageByPage()
        {
            // Number of documents per page
            const int PAGE_SIZE = 2;
            int currentPageNumber = 1;
            int documentNumber = 1;
            // Continuation token for subsequent queries (null for the very first request/page)
            string continuationToken = null;
            do
            {
                Console.WriteLine($"----- PAGE {currentPageNumber} -----");
                // Loads ALL documents for the current page
                KeyValuePair<string, IEnumerable<Student>> currentPage = await QueryDocumentsByPage(currentPageNumber, PAGE_SIZE, continuationToken);
                foreach (Student student in currentPage.Value)
                {
                    Console.WriteLine($"[{documentNumber}] {student.Name}");
                    documentNumber++;
                }
                // Keep the continuation token for the next page query
                continuationToken = currentPage.Key;
                currentPageNumber++;
            } while (continuationToken != null);
            Console.WriteLine("\n--- END: Finished Querying ALL Documents ---");
        }

        public static async Task<KeyValuePair<string, IEnumerable<Student>>> QueryDocumentsByPage(int pageNumber, int pageSize, string continuationToken)
        {
            DocumentClient documentClient = new DocumentClient(new Uri("https://***.documents.azure.com:443/"), "***");
            var feedOptions = new FeedOptions
            {
                MaxItemCount = pageSize,
                EnableCrossPartitionQuery = true,
                // IMPORTANT: set the continuation token (null for the very first request/page)
                RequestContinuation = continuationToken
            };
            IQueryable<Student> filter = documentClient.CreateDocumentQuery<Student>("dbs/db/colls/item", feedOptions);
            IDocumentQuery<Student> query = filter.AsDocumentQuery();
            FeedResponse<Student> feedResponse = await query.ExecuteNextAsync<Student>();
            List<Student> documents = new List<Student>();
            foreach (Student t in feedResponse)
            {
                documents.Add(t);
            }
            // IMPORTANT: return the continuation token so it can be used for the next request
            return new KeyValuePair<string, IEnumerable<Student>>(feedResponse.ResponseContinuation, documents);
        }
    }
}
Hope it helps you.
Updated answer:
There is currently no function like ROW_NUMBER() (see "How do I use ROW_NUMBER()?") in Cosmos DB. I also considered SKIP and TOP. However, TOP is supported while SKIP is not yet (see the feedback item). It seems SKIP is already in progress and will be released in the future.
You could vote on the feedback item related to paging, or use the continuation-token approach above as a temporary workaround.
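To make the cost of jumping to an arbitrary page concrete: with continuation tokens only, reaching page N means requesting pages 1 to N-1 first and discarding them. The plain-Java sketch below simulates this against an in-memory list. PageWalker, PageSource, Page, and the use of a stringified offset as the token are assumptions made for illustration; they are not Cosmos DB SDK types, and real continuation tokens are opaque.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical walker: follows continuation tokens until the requested page is reached.
final class PageWalker {
    private PageWalker() {}

    // Returns the items of the 1-based page, or an empty list if that page does not exist.
    static List<String> fetchPage(PageSource source, int pageNumber) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 1");
        }
        String token = null; // null means "first page"
        for (int current = 1; ; current++) {
            Page page = source.fetch(token);
            if (current == pageNumber) {
                return page.items;
            }
            if (page.continuationToken == null) {
                return new ArrayList<>(); // ran out of pages before reaching the target
            }
            token = page.continuationToken;
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            docs.add("doc-" + i);
        }
        PageSource source = new PageSource(docs, 10);
        System.out.println(fetchPage(source, 3)); // prints [doc-21, doc-22, doc-23, doc-24, doc-25]
    }
}

// Hypothetical stand-in for one page of query results.
final class Page {
    final List<String> items;
    final String continuationToken; // null when there are no more pages

    Page(List<String> items, String continuationToken) {
        this.items = items;
        this.continuationToken = continuationToken;
    }
}

// Hypothetical stand-in for a paged query endpoint backed by an in-memory list.
final class PageSource {
    private final List<String> data;
    private final int pageSize;

    PageSource(List<String> data, int pageSize) {
        this.data = data;
        this.pageSize = pageSize;
    }

    // In this sketch the token is simply the next offset written as text.
    Page fetch(String continuationToken) {
        int start = continuationToken == null ? 0 : Integer.parseInt(continuationToken);
        int end = Math.min(start + pageSize, data.size());
        String next = end < data.size() ? Integer.toString(end) : null;
        return new Page(new ArrayList<>(data.subList(start, end)), next);
    }
}
```

The number of round trips grows linearly with the page number, which is why this is only a workaround for deep paging.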

IllegalArgumentException: Table xyz does not exist in keyspace my_ks

I am developing an application that creates a table if it does not exist and then queries it. It works fine in normal cases. But the first time, right after the table has been created, querying that table makes the application throw:
IllegalArgumentException: Table xyz does not exist in keyspace my_ks
The same happens if I drop the table and my code recreates it.
In the other cases, when the table already exists, it works fine. Is this some kind of replication issue, or should I wait for some time after the table is created for the first time?
The code snippet follows:
// Order 1: this is called first
public boolean isSchemaExists() {
    boolean isSchemaExists = false;
    Statement statement = QueryBuilder
            .select()
            .countAll()
            .from(keyspaceName, tableName);
    statement.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
    try {
        Session session = cassandraClient.getSession(someSessionKey);
        ResultSet resultSet = session.execute(statement);
        if (resultSet.one() != null) {
            isSchemaExists = true;
        }
    } catch (Exception e) {
        // all exception handling goes here
    }
    return isSchemaExists;
}

// Order 2: called if the previous method returns false
public void createSchema(String createTableScript) {
    Session session = cassandraClient.getSession(someSessionKey);
    if (isKeySpaceExists(keyspaceName, session)) {
        session.execute("USE " + keyspaceName);
    }
    session.execute(createTableScript);
}

// Order 3: now read the table; this throws the exception when the table
// has just been created for the first time
public int readTable() {
    Session session = cassandraClient.getSession(someSessionKey);
    MappingManager manager = new MappingManager(session);
    Mapper<MyPojo> mapper = manager.mapper(MyPojo.class);
    Statement statement = QueryBuilder
            .select()
            .from(keyspaceName, tableName)
            .where(eq("col_1", someValue)).and(eq("col_2", someValue));
    statement.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
    ResultSet resultSet = session.execute(statement);
    Result<MyPojo> result = mapper.map(resultSet);
    for (MyPojo myPojo : result) {
        return myPojo.getCol1();
    }
    return -1; // placeholder for "no matching row"
}
In the isSchemaExists function, query the schema tables instead of the target table. In Cassandra 3.0 and later they live in the system_schema keyspace:
SELECT * FROM system_schema.tables WHERE keyspace_name='YOUR KEYSPACE' AND table_name='YOUR TABLE'
Corresponding Java code:
Statement statement = QueryBuilder
        .select()
        .from("system_schema", "tables")
        .where(eq("keyspace_name", keyspace)).and(eq("table_name", table));
It seems that isSchemaExists queries the actual keyspace and table, which will not exist if the table was dropped or has not been created yet. That is why it throws the "table does not exist" error.
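On the "should I wait" part of the question: if the existence check runs immediately after CREATE TABLE, the new table may not be visible yet to every component involved, so one option is to poll the check a bounded number of times rather than sleep for a fixed period. The sketch below is plain Java. SchemaWait, waitUntil, and the simulated check in main are hypothetical names; the real check would be whatever isSchemaExists ends up doing.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: retries a check until it passes or the attempts run out.
final class SchemaWait {
    private SchemaWait() {}

    // Returns true as soon as the check passes, false after maxAttempts failures or an interrupt.
    static boolean waitUntil(BooleanSupplier check, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for callers
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated check that succeeds on the third call (stand-in for a schema lookup).
        int[] calls = {0};
        boolean visible = waitUntil(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println(visible + " after " + calls[0] + " checks"); // prints "true after 3 checks"
    }
}
```

Bounding the attempts keeps a genuinely missing table from blocking the caller forever.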

Properties lost when converting a Cassandra column to a Java object

I use spring-data-cassandra-1.2.1.RELEASE to work with a Cassandra database. Everything went well, but recently I ran into a problem when using this code to get data:
public UserInfoCassandra selectUserInfo(String passport) {
    Select select = QueryBuilder.select().from("userinfo");
    select.setConsistencyLevel(ConsistencyLevel.QUORUM);
    select.where(QueryBuilder.eq("passport", passport));
    UserInfoCassandra userinfo = operations.selectOne(select,
            UserInfoCassandra.class);
    return userinfo;
}
userinfo has many properties, but I only get two of them: passport and uid.
I debugged into the method and found that the data fetched from the database is correct, with all properties present. But when the row is converted to a Java object, some of them disappear. The converting code:
protected <T> T selectOne(Select query, CassandraConverterRowCallback<T> readRowCallback) {
    ResultSet resultSet = query(query);
    Iterator<Row> iterator = resultSet.iterator();
    if (iterator.hasNext()) {
        Row row = iterator.next();
        T result = readRowCallback.doWith(row);
        if (iterator.hasNext()) {
            throw new DuplicateKeyException("found two or more results in query " + query);
        }
        return result;
    }
    return null;
}
The row data is right, but the result is wrong. Can anyone help?
Most probably your entity class and the table it maps to are mismatched, for example property names that do not correspond to the column names.
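To illustrate how such a mismatch can silently drop values, here is a toy name-based mapper in plain Java. It is not Spring Data Cassandra's converter; ToyRowMapper, UserInfo, and the column names are invented for the example. Columns without a field of the same name are simply skipped, so the row can be complete while the object is not.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration only; real mappers are more elaborate.
final class ToyRowMapper {
    private ToyRowMapper() {}

    // Copies each column into the field with the same name; unmatched columns are ignored.
    static <T> T map(Map<String, Object> row, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> column : row.entrySet()) {
                try {
                    Field field = type.getDeclaredField(column.getKey());
                    field.setAccessible(true);
                    field.set(target, column.getValue());
                } catch (NoSuchFieldException e) {
                    // No field with this column's name: the value is dropped.
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("passport", "p-1");
        row.put("uid", "42");
        row.put("nick_name", "sam"); // the column is nick_name, the field is nickName
        UserInfo user = map(row, UserInfo.class);
        System.out.println(user.passport + " " + user.uid + " " + user.nickName); // prints "p-1 42 null"
    }
}

// Hypothetical entity whose third field name does not match its column name.
final class UserInfo {
    String passport;
    String uid;
    String nickName;
}
```

Comparing the entity's property names (or column annotations) against the table's column list is therefore a reasonable first check.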

How to print column values in Mutator before execute?

Below is the code I use to insert into Cassandra:
Set<String> keys = MY_KEYS;
Map<String, String> pairsOfNameValues = MY_MUTATION_BY_NAME_AND_VALUE;
Set<HColumn<String, String>> columns = new HashSet<HColumn<String, String>>();
for (Entry<String, String> pair : pairsOfNameValues.entrySet()) {
    columns.add(HFactory.createStringColumn(pair.getKey(), pair.getValue()));
}
Mutator<String> mutator = template.createMutator();
String column_family_name = template.getColumnFamily();
for (String key : keys) {
    for (HColumn<String, String> column : columns) {
        mutator.addInsertion(key, BASIC_COLUMN_FAMILY, column);
    }
}
mutator.execute();
In some cases I don't know how many columns have been added to the mutator. Is there any way to print the data before or after calling execute()?
I tried MutationResult.toString(). It gives the following response:
MutationResult took (3750us) for query (n/a) on host: localhost(127.0.0.1):9160
Calling toString() on the Mutator didn't give me the desired result either.
Please help.
Yep, try mutator.getPendingMutationCount() before executing the query.
Other than that, you'll have to keep track of the columns you add to the mutator yourself. toString() doesn't give you what you want, because you are supposed to assign the result of mutator.execute() to a MutationResult, e.g.:
MutationResult mr = mutator.execute();
But the mutation result doesn't tell you much more either. You can learn these 3 things (2, really):
// the execution time
long getExecutionTimeMicro();
long getExecutionTimeNano();
// host used for the execution
CassandraHost getHostUsed();
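One way to do that manual bookkeeping is to record every insertion in your own structure at the point where you hand it to the mutator, then print the record before calling execute(). The sketch below is plain Java and does not touch Hector. PendingInsertions and its methods are hypothetical names; in real code the record call would sit next to mutator.addInsertion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping class kept alongside a mutator; not part of Hector.
final class PendingInsertions {
    // row key -> "name=value" entries, in insertion order
    private final Map<String, List<String>> byKey = new LinkedHashMap<>();
    private int total;

    void record(String key, String columnName, String columnValue) {
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(columnName + "=" + columnValue);
        total++;
    }

    int count() {
        return total;
    }

    // One line per row key, listing the columns recorded for it.
    String describe() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : byKey.entrySet()) {
            out.append(entry.getKey()).append(" -> ").append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    void clear() {
        byKey.clear();
        total = 0;
    }

    public static void main(String[] args) {
        PendingInsertions pending = new PendingInsertions();
        pending.record("row1", "name", "alice");
        pending.record("row1", "city", "paris");
        pending.record("row2", "name", "bob");
        System.out.print(pending.describe());
        System.out.println(pending.count() + " columns pending"); // prints "3 columns pending"
        pending.clear(); // in real code, call this after mutator.execute()
    }
}
```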

Limitation in Cassandra-0.8.1 when using batch mutation

I get exceptions from Cassandra when I do a batch mutation. The message says "already has modifications in this mutation", but the info given shows two different operations.
I use super columns with counters in this case, like this:
Key: MD5 of the URL, UTF-8
SuperColumnName: date, UTF-8
ColumnName: counter name, a random number from 1 to 200
ColumnValue: 1L
public void SuperCounterMutation(ArrayList<String> urlList) {
    LinkedList<HCounterSuperColumn<String, String>> counterSuperColumns;
    for (String line : urlList) {
        String[] ele = StringUtils.split(StringUtils.strip(line), ':');
        String key = ele[0];
        String SuperColumnName = ele[1];
        LinkedList<HCounterColumn<String>> ColumnList = new LinkedList<HCounterColumn<String>>();
        for (int i = 2; i < ele.length; ++i) {
            ColumnList.add(HFactory.createCounterColumn(ele[i], 1L, ser));
        }
        mutator.addCounter(key, ColumnFamilyName, HFactory.createCounterSuperColumn(SuperColumnName, ColumnList, ser, ser));
        ++count;
        if (count >= BUF_MAX_NUM) {
            try {
                mutator.execute();
            } catch (Exception e) {
                e.printStackTrace();
            }
            mutator = HFactory.createMutator(keyspace, ser);
            count = 0;
        }
    }
    return;
}
Error info from cassandra log showed that the duplicated operations have the same key only. The SuperColumnNames differ, and the counter name sets intersect in some conflicts but not in others.
I'm using Cassandra 0.8.1 with Hector 0.8.0-rc2.
Can anyone tell me the reason for this problem? Thanks in advance!
Error info from cassandra log showed that the duplicated operations have the same key
Bingo. You'll need to combine operations from the same key into a single mutation.
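A sketch of that merging step, in plain Java and independent of Hector: group the input lines by row key first, so that each key yields one combined set of counter increments, and only then build one mutation per key. It assumes the key:superColumn:counter1:counter2... line format parsed in the question's code. CounterBatcher and groupByKey are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-processing step; the result would be turned into one mutation per key.
final class CounterBatcher {
    private CounterBatcher() {}

    // Result shape: key -> super column name -> counter name -> summed increment
    static Map<String, Map<String, Map<String, Long>>> groupByKey(List<String> lines) {
        Map<String, Map<String, Map<String, Long>>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split(":");
            if (parts.length < 3) {
                continue; // needs at least a key, a super column and one counter
            }
            Map<String, Long> counters = grouped
                    .computeIfAbsent(parts[0], k -> new LinkedHashMap<>())
                    .computeIfAbsent(parts[1], s -> new LinkedHashMap<>());
            for (int i = 2; i < parts.length; i++) {
                counters.merge(parts[i], 1L, Long::sum); // each occurrence adds 1
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "k1:2011-07-01:17:42",
                "k2:2011-07-01:5",
                "k1:2011-07-02:17",
                "k1:2011-07-01:42");
        System.out.println(groupByKey(lines));
        // prints {k1={2011-07-01={17=1, 42=2}, 2011-07-02={17=1}}, k2={2011-07-01={5=1}}}
    }
}
```

Each outer entry then corresponds to exactly one row key, so no key appears twice within the same batch.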
