Tombstoned cells without DELETE - Cassandra

I'm running a Cassandra cluster:
Software version: 2.0.9
Nodes: 3
Replication factor: 2
I have a very simple table where I insert and update data:
CREATE TABLE link_list (
    url text,
    visited boolean,
    PRIMARY KEY ((url))
);
There is no TTL on the rows and I'm not doing any DELETEs. As soon as I run my application, it quickly slows down due to the increasing number of tombstoned cells:
Read 3 live and 535 tombstoned cells
It gets up to thousands within a few minutes.
My question is: what is responsible for generating those cells if I'm not doing any deletions?
// Update
This is the implementation I'm using to talk to Cassandra via the DataStax Java driver (com.datastax.driver).
public class LinkListDAOCassandra implements DAO {

    public void save(Link link) {
        save(new VisitedLink(link.getUrl(), false));
    }

    @Override
    public void save(Model model) {
        save((Link) model);
    }

    public void update(VisitedLink link) {
        String cql = "UPDATE link_list SET visited = ? WHERE url = ?";
        Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getVisited(), link.getUrl());
    }

    public void save(VisitedLink link) {
        String cql = "SELECT url FROM link_list_inserted WHERE url = ?";
        if (Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl()).all().size() == 0) {
            cql = "INSERT INTO link_list_inserted (url) VALUES (?)";
            Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl());
            cql = "INSERT INTO link_list (url, visited) VALUES (?,?)";
            Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl(), link.getVisited());
        }
    }

    public VisitedLink getByUrl(String url) {
        String cql = "SELECT * FROM link_list WHERE url = ?";
        for (Row row : Cassandra.DB.execute(cql, url)) {
            return new VisitedLink(row.getString("url"), row.getBool("visited"));
        }
        return null;
    }

    public List<Link> getLinks(int limit) {
        List<Link> links = new ArrayList<>();
        String cql = "SELECT * FROM link_list WHERE visited = False LIMIT ?";
        for (Row row : Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, limit)) {
            try {
                links.add(new Link(new URL(row.getString("url"))));
            } catch (MalformedURLException e) {
                // ignore malformed URLs
            }
        }
        return links;
    }
}
This is the execute implementation
public ResultSet execute(String cql, ConsistencyLevel cl, Object... values) {
    PreparedStatement statement = getSession().prepare(cql).setConsistencyLevel(cl);
    BoundStatement boundStatement = new BoundStatement(statement);
    boundStatement.bind(values);
    return session.execute(boundStatement);
}
// Update 2
An interesting finding from cfstats is that only one table has tombstones: link_list_visited. Does this mean that updating a column with a secondary index creates tombstones?
Table (index): link_list.link_list_visited
SSTable count: 2
Space used (live), bytes: 5055920
Space used (total), bytes: 5055991
SSTable Compression Ratio: 0.3491883995187955
Number of keys (estimate): 256
Memtable cell count: 15799
Memtable data size, bytes: 1771427
Memtable switch count: 1
Local read count: 85703
Local read latency: 2.805 ms
Local write count: 484690
Local write latency: 0.028 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 32
Compacted partition minimum bytes: 8240
Compacted partition maximum bytes: 7007506
Compacted partition mean bytes: 3703162
Average live cells per slice (last five minutes): 3.0
Average tombstones per slice (last five minutes): 674.0

The only major differences between a secondary index and an extra column family that you maintain yourself as a manual index are that the secondary index only holds information about the local node (it contains nothing about other nodes' data), and that the index operations triggered by an update of the primary table are atomic. Other than that, you can think of it as a regular column family with the same weak spots: a high number of updates on the primary column family leads to a high number of deletes on the index table, because every update of the indexed column is translated into a delete/insert pair on the index table. Those deletions in the index table are the source of your tombstones. Deletes in Cassandra are logical deletes (tombstones), and they are only purged during compaction once gc_grace_seconds has elapsed.
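To make that concrete, here is a minimal sketch (DataStax Java driver; the keyspace name and URL are made up, the table and index names match the question) of the write pattern that produces those index tombstones:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class IndexTombstoneDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("crawler"); // hypothetical keyspace

        PreparedStatement update =
                session.prepare("UPDATE link_list SET visited = ? WHERE url = ?");

        // Each execution rewrites the indexed 'visited' value. Cassandra keeps the
        // internal index table (link_list_visited) in sync by tombstoning the previous
        // index entry (if any) and inserting a new one, so frequent updates accumulate
        // tombstones even though the application never issues a DELETE.
        session.execute(update.bind(false, "http://example.com/a"));
        session.execute(update.bind(true, "http://example.com/a"));

        cluster.close();
    }
}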
Hope it helps!

Related

Azure Change Feed and Querying based on Partition

When we fetch data from Document Db in the change feed, we only want it per partition, and we have tried adding PartitionKey to the code:
do
{
    FeedResponse<PartitionKeyRange> pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(
        collectionUri,
        new FeedOptions
        {
            RequestContinuation = pkRangesResponseContinuation,
            PartitionKey = new PartitionKey("KEY"),
        });
    partitionKeyRanges.AddRange(pkRangesResponse);
    pkRangesResponseContinuation = pkRangesResponse.ResponseContinuation;
}
while (pkRangesResponseContinuation != null);
It returns a single range, and when we perform the second query
IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
    collectionUri,
    new ChangeFeedOptions
    {
        PartitionKeyRangeId = pkRange.Id,
        StartFromBeginning = true,
        RequestContinuation = continuation,
        MaxItemCount = -1,
    });
it returns the results from all partitions. Is there a way to restrict the results to a single partition only?
Changefeed works at a PartitionKey Range level.
What are partition key ranges?
Document Db currently has 10 GB Physical partitions.
The partition key that you specify is the Logical Partition Key.
Document Db internally maps this logical partition key to a Physical Partition using a hash.
So it's possible that a bunch of logical partitions are sharing the same physical partition.
So a physical partition is assigned for a range of these hashes.
The minimum grain that is allowed to read from changefeed would be Partition key ranges.
So you would have to query the partition key range id for the partition that you are interested in, then query the changefeed for that range id and filter out the data that is not associated with your partition key.
Note: Document db transparently creates new physical partitions if a particular partition gets full. So the partition key range id for a given logical partition could change over time.
This link explains this in good detail:
https://learn.microsoft.com/en-us/azure/cosmos-db/partition-data#partitioning-in-azure-cosmos-db

Losing data on bulk inserts in Cassandra

I'm losing data on inserts into my Cassandra cluster.
I am doing large bulk inserts from CSV files, which I read via a Stream. The data is duplicated into two tables because they serve different queries. Every 30,000th element I move on to a new partition (chunkCounter).
private PersistenceInformation persist(final String period, final String tradePartner, final Integer version, Stream<Transaction> transactions) {
    int elementsInChunkCounter = 0;
    int chunkCounter = 1;
    int elementCounter = 0;

    Iterator<Transaction> iterator = transactions.filter(beanValidator).iterator();

    List<List<?>> listImportData = new ArrayList<>(30000);
    List<List<?>> listGtins = new ArrayList<>(30000);

    while (iterator.hasNext()) {
        Transaction tr = iterator.next();

        List<Object> importTemp = new ArrayList<>(9);
        importTemp.add(period);
        importTemp.add(tradePartner);
        importTemp.add(version);
        importTemp.add(chunkCounter);
        importTemp.add(tr.getMdhId());
        importTemp.add(tr.getGtin());
        importTemp.add(tr.getQuantity());
        importTemp.add(tr.getTransactionId());
        importTemp.add(tr.getTimestamp());
        listImportData.add(importTemp);

        List<Object> gtinTemp = new ArrayList<>(8);
        gtinTemp.add(period);
        gtinTemp.add(tradePartner);
        gtinTemp.add(version);
        gtinTemp.add(chunkCounter);
        gtinTemp.add(tr.getMdhId());
        gtinTemp.add(tr.getGtin());
        gtinTemp.add(tr.getQuantity());
        gtinTemp.add(tr.getTimestamp());
        listGtins.add(gtinTemp);

        elementsInChunkCounter++;
        elementCounter++;

        if (elementsInChunkCounter == 30000) {
            elementsInChunkCounter = 0;
            chunkCounter++;
            ingestImportData(listImportData);
            listImportData.clear();
            ingestGtins(listGtins);
            listGtins.clear();
        }
    }

    if (!listImportData.isEmpty()) {
        ingestImportData(listImportData);
    }
    if (!listGtins.isEmpty()) {
        ingestGtins(listGtins);
    }
    return new PersistenceInformation();
}

private void ingestImportData(List<List<?>> list) {
    String cqlIngest = "INSERT INTO import_data (pd, tp, ver, chunk, mdh_id, gtin, qty, id, ts) VALUES (?,?,?,?,?,?,?,?,?)";
    cassandraOperations.ingest(cqlIngest, list);
}

private void ingestGtins(List<List<?>> list) {
    String cqlIngest = "INSERT INTO gtins (pd, tp, ver, chunk, mdh_id, gtin, qty, ts) VALUES (?,?,?,?,?,?,?,?)";
    cassandraOperations.ingest(cqlIngest, list);
}
This worked pretty well until I noticed that sometimes a data set goes missing: there is an entry in the second table (gtins), but the corresponding row in the main table (import_data) was not inserted. The application counted it, but the database never wrote it.
The table is built this way:
CREATE TABLE import_data (
    tp text,
    pd text,
    ver int,
    chunk int,
    mdh_id uuid,
    gtin text,
    qty float,
    id text,
    ts timestamp,
    PRIMARY KEY ((tp, pd, ver, chunk), ts, mdh_id)
) WITH CLUSTERING ORDER BY (ts DESC);
The mdh_id is a UUID generated by my application, so every data set has a unique key and is not accidentally overwritten.
The Cassandra log files didn't even show a warning.
At the moment I am evaluating BatchStatement, but I have to flush it after every 8th data set because of the 5 KB limit; otherwise the database loses even more entries.
Any suggestions about what's going wrong in my application are highly appreciated. Thanks a lot.
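For reference, a minimal sketch of the chunked BatchStatement approach mentioned above (names such as session and prepared, and the chunk size of 8, are assumptions rather than code from the application):
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.List;

public class ChunkedBatchIngest {

    // Assumption: 8 statements keeps each batch under the 5 KB batch size warning threshold.
    private static final int CHUNK = 8;

    static void ingest(Session session, PreparedStatement prepared, List<Object[]> rows) {
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (Object[] row : rows) {
            batch.add(prepared.bind(row));
            if (batch.size() == CHUNK) {
                session.execute(batch);                                   // synchronous flush
                batch = new BatchStatement(BatchStatement.Type.UNLOGGED); // start a fresh batch
            }
        }
        if (batch.size() > 0) {
            session.execute(batch); // flush the remainder
        }
    }
}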

Insert is 10 times faster than Update in Cassandra. Is it normal?

My Java application accessing Cassandra can insert 500 rows per second, but only update 50 rows per second (and the rows being updated did not previously exist).
Updating one hundred fields is as fast as updating one field.
I just use CQL statements in the Java application.
Is this situation normal? How can I improve my application?
public void InsertSome(List<Data> data) {
    String insertQuery = "INSERT INTO Data (E,D,A,S,C,......) values(?,?,?,?,?,.............); ";
    if (prepared == null)
        prepared = getSession().prepare(insertQuery);

    count += data.size();
    for (int i = 0; i < data.size(); i++) {
        List<Object> objs = getFiledValues(data.get(i));
        BoundStatement bs = prepared.bind(objs.toArray());
        getSession().execute(bs);
    }
}

public void UpdateOneField(Data data) {
    String updateQuery = "UPDATE Data set C=? where E=? and D=? and A=? and S=?; ";
    if (prepared == null)
        prepared = getSession().prepare(updateQuery);

    BoundStatement bs = prepared.bind(data.getC(), data.getE(),
            data.getD(), data.getA(), data.getS());
    getSession().execute(bs);
}

public void UpdateOne(Data data) {
    String updateQuery = "UPDATE Data set C=?,U=?,F........where E=? and D=? and A=? and S=? and D=?; ";
    if (prepared == null)
        prepared = getSession().prepare(updateQuery);
    ......
    BoundStatement bs = prepared.bind(objs2.toArray());
    getSession().execute(bs);
}
Schema:
Create Table Data (
    E,
    D,
    A,
    S,
    D,
    C,
    U,
    S,
    ...
    PRIMARY KEY ((E, D), A, S)
) WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 }
  AND compaction = { 'class' : 'LeveledCompactionStrategy' };
Another scenario:
I used the same application to access another Cassandra cluster and the result was different: UPDATE was as fast as INSERT, but it could only INSERT/UPDATE 5 rows per second. That cluster is DataStax Enterprise running on GCE (I used the default DataStax Enterprise image from Google Cloud Launcher).
So I think some configuration difference is probably the reason, but I don't know what it is.
Conceptually UPDATE and INSERT are the same, so I would expect similar performance. UPDATE doesn't check whether the data already exists (unless you are doing a lightweight transaction with IF EXISTS).
I noticed that each of your methods prepares the statement only if the cached prepared field is null. Is it possible the statement is being re-prepared each time? That would add a round trip for every method invocation. I also noticed that InsertSome does multiple inserts per invocation, whereas UpdateOne / UpdateOneField each execute a single statement. So if the statement were re-prepared every time, that's one extra round trip per update, whereas for a list of inserts it only happens once.
Cassandra uses log-structured merge trees as its on-disk format, meaning all writes are done sequentially (the database is effectively an append-only log). That implies low write latency.
At the cluster level, Cassandra also achieves greater write scalability by partitioning the key space so that each machine is only responsible for a portion of the keys. That implies higher write throughput, as more writes can be done in parallel.
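If re-preparation is the culprit, caching one PreparedStatement per CQL string avoids the extra round trips. A rough sketch of that pattern (class and method names are illustrative, not from the question):
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StatementCache {
    private final Session session;
    // One cached PreparedStatement per CQL string, prepared exactly once.
    private final Map<String, PreparedStatement> cache = new ConcurrentHashMap<>();

    public StatementCache(Session session) {
        this.session = session;
    }

    public BoundStatement bind(String cql, Object... values) {
        PreparedStatement ps = cache.computeIfAbsent(cql, session::prepare);
        return ps.bind(values);
    }
}
With a cache like this, InsertSome, UpdateOneField, and UpdateOne each reuse their own prepared statement instead of re-preparing on every call.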

Cassandra Datastax driver not returning rows via prepare

I am using Cassandra 1.2.5 with a secondary index. When I run a prepared statement, no data is returned, although I do have data, and the indexed column does contain duplicate values. What I am doing is returning a list of video_id values based on a user_id. The describe output for the table looks like this:
[default@video] describe videos;
WARNING: CQL3 tables are intentionally omitted from 'describe' output.
See https://issues.apache.org/jira/browse/CASSANDRA-4377 for details.
ColumnFamily: videos
Key Validation Class: org.apache.cassandra.db.marshal.IntegerType
Default column value validator: org.apache.cassandra.db.marshal.IntegerType
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 0.1
DC Local Read repair chance: 0.0
Populate IO Cache on flush: false
Replicate on write: true
Caching: ALL
Bloom Filter FP chance: default
Built indexes: [videos.videos_user_id_idx]
Column Metadata:
Column Name: video_id
Validation Class: org.apache.cassandra.db.marshal.IntegerType
Column Name: user_id
Validation Class: org.apache.cassandra.db.marshal.IntegerType
Index Name: videos_user_id_idx
Index Type: KEYS
Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
My code looks like this:
int concurrency = 3;
//final BoundStatement query = null;
try {
    // Create session to hosts
    Cluster cluster = new Cluster.Builder().addContactPoints(String.valueOf("localhost")).build();

    // final int maxRequestsPerConnection = 10;
    // int maxConnections = concurrency / maxRequestsPerConnection + 1;
    int maxConnections = 3;

    PoolingOptions pools = cluster.getConfiguration().getPoolingOptions();
    pools.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.LOCAL, concurrency);
    pools.setCoreConnectionsPerHost(HostDistance.LOCAL, maxConnections);
    pools.setMaxConnectionsPerHost(HostDistance.LOCAL, maxConnections);
    pools.setCoreConnectionsPerHost(HostDistance.REMOTE, maxConnections);
    pools.setMaxConnectionsPerHost(HostDistance.REMOTE, maxConnections);

    Session session = cluster.connect();

    // get list of video ids
    String cql1 = "SELECT video_id from video.videos WHERE user_id=?";
    com.datastax.driver.core.PreparedStatement stmt = session.prepare(cql1);
    BoundStatement b = stmt.bind();
    BigInteger i = BigInteger.valueOf(9);
    b.setVarint("user_id", i);

    long start, end;
    start = System.nanoTime();
    com.datastax.driver.core.ResultSet rs1 = session.execute(b);
    end = System.nanoTime();
    System.out.println("Datastax driver CQL Query prepared overall time ns:"
            + (end - start));

    while (rs1.iterator().hasNext()) {
        System.out.println("user_id:" + rs1.iterator().next().getVarint("video_id"));
    }
Note that even if I change the statement and replace the ? with a literal value of 9, I still get no rows back.
Any ideas what I am doing wrong?
Thanks,
-Tony
Try this code to retrieve data:
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        // .addContactPoint("some.other.ip")
        .build();
Session session = cluster.connect();

String statement = "SELECT * FROM pixel.user;";
// String statement = "SELECT video_id from video.videos WHERE user_id=9";

ResultSet rs = session.execute(statement);
for (Row r : rs.all())
    System.out.println(r.toString());
Once you've got the basics down, it's time for the bound statement:
int user_id = 9;
String statement = "SELECT * from video.videos WHERE user_id=?";

PreparedStatement ps = session.prepare(statement);
BoundStatement bs = ps.bind();
bs.bind(user_id); // a csv list: bs.bind(9, "string val of second ?", ...);
ResultSet rs = session.execute(bs);

Cassandra CQL3 Composite keys return duplicate values

I am new to CQL and composite keys (I previously used the CLI).
I am looking to reimplement my old super column family with composite keys instead.
In short, my look-up model is:
blocks[file_id][position][block_id]=size
I have the following CQL table with composite keys:
CREATE TABLE blocks (
    file_id text,
    start_position bigint,
    block_id text,
    size bigint,
    PRIMARY KEY (file_id, start_position, block_id)
);
I insert these sample values:
/*Example insertions*/
INSERT INTO blocks (file_id, start_position, block_id,size) VALUES ('test_schema_file', 0, 'testblock1', 500);
INSERT INTO blocks (file_id, start_position, block_id,size) VALUES ('test_schema_file', 500, '2testblock2', 501);
I query using this Astyanax code:
OperationResult result = m_keyspace.prepareQuery(m_BlocksTable).getKey(file).execute();
ColumnList<BlockKey> columns = (ColumnList<BlockKey>) result.getResult();
for (Column<BlockKey> column : columns) {
    System.out.println(StaticUtils.fieldsToString(column.getName()));
    try {
        long value = column.getLongValue();
        System.out.println(value);
    } catch (Exception e) {
        System.out.println("Can't get size");
    }
}
When I iterate over the result, I get two entries for each row I inserted: one that contains a "size" value, and one where the "size" column doesn't exist.
recorder.data.models.BlockKey Object {
m_StartPosition: 0
m_BlockId: testblock1
m_Extra: null
}
Can't get size
recorder.data.models.BlockKey Object {
m_StartPosition: 0
m_BlockId: testblock1
m_Extra: size
}
500
recorder.data.models.BlockKey Object {
m_StartPosition: 500
m_BlockId: 2testblock2
m_Extra: null
}
Can't get size
recorder.data.models.BlockKey Object {
m_StartPosition: 500
m_BlockId: 2testblock2
m_Extra: size
}
501
So I have two questions:
Theoretically I do not need a size column; it should be the value addressed by the composite key: blocks[file_id][position][block_id] = size instead of blocks[file_id][position][block_id]['size'] = size. How do I correctly insert this data in CQL3 without creating the redundant size column?
Why am I getting the extra column without 'size', if I never inserted such a row?
The 'duplicates' appear because, with CQL, extra Thrift columns are inserted to store extra metadata. With your example, you can see what's going on from cassandra-cli:
[default@ks1] list blocks;
-------------------
RowKey: test_schema_file
=> (column=0:testblock1:, value=, timestamp=1373966136246000)
=> (column=0:testblock1:size, value=00000000000001f4, timestamp=1373966136246000)
=> (column=500:2testblock2:, value=, timestamp=1373966136756000)
=> (column=500:2testblock2:size, value=00000000000001f5, timestamp=1373966136756000)
If you insert data with CQL, you should query it with CQL too. You can do this with Astyanax by using m_keyspace.prepareCqlStatement().withCql("SELECT * FROM blocks").execute().
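For illustration, a rough sketch of reading the same data back through Astyanax's CQL support (result-handling calls vary between Astyanax versions, so treat the exact method names as assumptions; CF_BLOCKS is a hypothetical ColumnFamily<String, String> definition for the blocks table):
// Rough sketch; m_keyspace is the Keyspace from the question, CF_BLOCKS is a
// hypothetical ColumnFamily<String, String> used only to type the CQL result.
OperationResult<CqlResult<String, String>> result = m_keyspace
        .prepareQuery(CF_BLOCKS)
        .withCql("SELECT start_position, block_id, size FROM blocks WHERE file_id = 'test_schema_file'")
        .execute();

for (Row<String, String> row : result.getResult().getRows()) {
    ColumnList<String> cols = row.getColumns();
    System.out.println(cols.getLongValue("start_position", null) + ":"
            + cols.getStringValue("block_id", null) + " -> "
            + cols.getLongValue("size", null));
}
Queried this way, size comes back as a regular column value and the empty CQL row-marker column from the cassandra-cli listing never shows up.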
