I am unable to understand how Cassandra counters are stored on the disk.
Create test table
create table testcounter (
id text,
count counter,
PRIMARY KEY(id))
WITH compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'};
Add data
update testcounter set count = count + 10 where id = 'testrow';
Check sstable
nodetool flush test testcounter
sstabledump /usr/local/var/lib/cassandra/data/test/testcounter-87d6ae20908e11e9a5779f988085883a/mc-1-big-Data.db
Response from sstabledump
[
{
"partition" : {
"key" : [ "testrow" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 63,
"cells" : [
{ "name" : "count", "value" : 422215477737628, "tstamp" : "2019-06-16T23:30:34.423470Z" }
]
}
]
}
Update existing data
update testcounter set count = count + 10 where id = 'testrow';
update testcounter set count = count + 10 where id = 'testrow';
Flush
nodetool flush test testcounter
At this point, there are two sets of db files.
ls /usr/local/var/lib/cassandra/data/test/testcounter-87d6ae20908e11e9a5779f988085883a/
backups mc-1-big-Digest.crc32 mc-1-big-Statistics.db mc-2-big-CompressionInfo.db mc-2-big-Filter.db mc-2-big-Summary.db
mc-1-big-CompressionInfo.db mc-1-big-Filter.db mc-1-big-Summary.db mc-2-big-Data.db mc-2-big-Index.db mc-2-big-TOC.txt
mc-1-big-Data.db mc-1-big-Index.db mc-1-big-TOC.txt mc-2-big-Digest.crc32 mc-2-big-Statistics.db
sstabledump for mc-1
[
{
"partition" : {
"key" : [ "testrow" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 63,
"cells" : [
{ "name" : "count", "value" : 422215477737628, "tstamp" : "2019-06-16T23:30:34.423470Z" }
]
}
]
}
sstabledump for mc-2
[
{
"partition" : {
"key" : [ "testrow" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 65,
"cells" : [
{ "name" : "count", "value" : 422215477737628, "tstamp" : "2019-06-16T23:34:37.245893Z" }
]
}
]
}
It looks like there are no tombstones, and even the counter values are not stored. What is happening behind the scenes?
Since 2.1, a counter update is actually a read-before-write, and what gets stored is essentially a packed tuple, which isn't obvious or easy to deserialize. It might be worth opening a JIRA ticket to have sstabledump deserialize the context and make it more readable.
For more details see: https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
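To make the "packed tuple" idea a little more concrete, here is a minimal Python sketch of how a counter value could be derived from shards spread over several SSTables. It assumes (the answer above does not spell this out) that each shard is a (counter id, clock, count) triple, that the shard with the highest clock wins per counter id on merge, and that the visible value is the sum of the surviving counts. Shard, merge_shards and counter_total are hypothetical names, not Cassandra APIs, and the shard contents are made-up illustrations rather than decoded sstabledump output.

```python
from typing import Dict, Iterable, NamedTuple


class Shard(NamedTuple):
    counter_id: str  # id of the replica that owns this shard (hypothetical label)
    clock: int       # logical clock, bumped on every update to the shard
    count: int       # running total held by this shard


def merge_shards(*contexts: Iterable[Shard]) -> Dict[str, Shard]:
    """Keep, for each counter id, the shard with the highest clock."""
    merged: Dict[str, Shard] = {}
    for context in contexts:
        for shard in context:
            current = merged.get(shard.counter_id)
            if current is None or shard.clock > current.clock:
                merged[shard.counter_id] = shard
    return merged


def counter_total(merged: Dict[str, Shard]) -> int:
    """The value a reader sees is the sum of the surviving shard counts."""
    return sum(shard.count for shard in merged.values())


# Illustrative state: the first SSTable was flushed after one "+10",
# the second after two more. Because each write is a read-before-write,
# the later shard already carries the running total.
sstable_1 = [Shard("replica-a", clock=1, count=10)]
sstable_2 = [Shard("replica-a", clock=3, count=30)]

total = counter_total(merge_shards(sstable_1, sstable_2))
print(total)
```

Under these assumptions the older shard is simply superseded, which would explain why no tombstone is needed when a counter is updated.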
Related
I have a dataset like this:
{
"_id" : ObjectId("5ede1b6c317aca326c2f18d7"),
"createdate" : ISODate("2020-06-11T18:30:00.000Z"),
"userHolder" : [
{
"time" : "12:00",
"user" : [
"5ede1ff42b3e633edc0ba10e"
]
},
{
"time" : "16:30",
"user" : []
}
],
},
{
"_id" : ObjectId("5ede1b6c317aca326c2f18d8"),
"createdate" : ISODate("2020-06-12T18:30:00.000Z"),
"userHolder" : [
{
"time" : "12:30",
"user" : [
"5ede1ff42b3e633edc0ba10f"
]
},
{
"time" : "13:00",
"user" : [
"5ede1ff42b3e633edc0ba10e"
]
},
{
"time" : "12:00",
"user" : [
"5ede1ff42b3e633edc0ba10f"
]
},
{
"time" : "16:30",
"user" : []
}
],
}
I split each day into half-hour entries, i.e. up to 48 elements in the userHolder array for a full day (12:30, 13:00, 13:30, and so on). If a user has no entry for a slot, that element is not created.
How do I write a query that searches the whole collection for the id 5ede1ff42b3e633edc0ba10e?
I tried the $all operator, but it does not work on the nested structure.
There is $elemMatch, but the query would be too large because I would have to write 48 time conditions. The expected result is the _id of each matching entry, so that it is clear which entries contain this id. I want the data, not a count.
Any help is really appreciated.
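One possible approach, sketched in Python: MongoDB's dot notation reaches into arrays of embedded documents, so a filter on "userHolder.user" should match a document when any element of userHolder has the id in its user array, without listing the 48 time slots. The local matches function below only imitates that rule so the example runs without a server; matches, matching_ids and the coll handle mentioned in the comment are hypothetical names.

```python
from typing import Any, Dict, List

USER_ID = "5ede1ff42b3e633edc0ba10e"

# Filter and projection that could be passed to pymongo, e.g.
#   coll.find(query, projection)   # `coll` is an assumed collection handle
query = {"userHolder.user": USER_ID}
projection = {"_id": 1}


def matches(doc: Dict[str, Any], user_id: str) -> bool:
    """Local imitation of the dot-notation match on userHolder.user."""
    return any(user_id in slot.get("user", []) for slot in doc.get("userHolder", []))


def matching_ids(docs: List[Dict[str, Any]], user_id: str) -> List[str]:
    """Return the _id of every document in which the user id appears."""
    return [doc["_id"] for doc in docs if matches(doc, user_id)]


# Trimmed copies of the two sample documents (ObjectIds kept as plain strings).
docs = [
    {"_id": "5ede1b6c317aca326c2f18d7",
     "userHolder": [{"time": "12:00", "user": ["5ede1ff42b3e633edc0ba10e"]},
                    {"time": "16:30", "user": []}]},
    {"_id": "5ede1b6c317aca326c2f18d8",
     "userHolder": [{"time": "12:30", "user": ["5ede1ff42b3e633edc0ba10f"]},
                    {"time": "13:00", "user": ["5ede1ff42b3e633edc0ba10e"]},
                    {"time": "16:30", "user": []}]},
]

print(matching_ids(docs, USER_ID))
```

If the matching time slot is also needed, $elemMatch on userHolder with a single condition on user (rather than one per time value) may be worth trying.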
I have a document in the format below. The goal is to group the document by student name and sort each student's scores by rank in ascending order. Once that is done, I need to iterate through the ranks (within a student), and whenever a rank is greater than the previous one, the Version field needs to be incremented. As part of a pipeline, student_name will be passed to me, so matching by student name should be enough instead of grouping.
NOTE: I tried it with Python and it works to some extent. A Python solution would also be great!
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 2,
"scores" : [
{
"value" : "50",
"rank" : 2
},
{
"value" : "70",
"rank" : 1
}
]
},
{
"student_name" : "BBB",
"Version" : 5,
"scores" : [
{
"value" : 80,
"rank" : 2
},
{
"value" : 100,
"rank" : 1
},
{
"value" : 100,
"rank" : 1
}
]
}
]
}
I tried this piece of code to sort
def version(student_name):
    db.column.aggregate(
        [
            {"$unwind": "$students"},
            {"$unwind": "$students.scores"},
            {"$sort": {"students.scores.rank": 1}},
            {"$group": {"_id": "$students.student_name"}}
        ]
    )
    for i in range(0, (len(students.scores) - 1)):
        if students.scores[i].rank < students.scores[i+1].rank:
            tag.update_many(
                {"$inc": {"students.Version": 1}}
            )
The expected output for student AAA should be
{
"_id" : ObjectId("5d389c7907bf860f5cd11220"),
"class" : "I",
"students" : [
{
"student_name" : "AAA",
"Version" : 3, #version incremented
"scores" : [
{
"value" : "70",
"rank" : 1
},
{
"value" : "50",
"rank" : 2
}
]
}
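The logic described above can be sketched in plain Python on an already fetched document. This is only an illustration under one stated assumption: Version is incremented once for every adjacent pair of sorted ranks where the rank strictly increases (the question could also be read as a single increment overall). bump_version is a hypothetical helper, and writing the result back to MongoDB is left out.

```python
import copy
from typing import Any, Dict


def bump_version(doc: Dict[str, Any], student_name: str) -> Dict[str, Any]:
    """Return a copy of doc with the named student's scores sorted and Version bumped."""
    result = copy.deepcopy(doc)
    for student in result["students"]:
        if student["student_name"] != student_name:
            continue
        # Sort the scores by rank, ascending.
        student["scores"].sort(key=lambda score: score["rank"])
        ranks = [score["rank"] for score in student["scores"]]
        # Assumption: one increment per adjacent pair whose rank strictly increases.
        student["Version"] += sum(1 for a, b in zip(ranks, ranks[1:]) if a < b)
    return result


doc = {
    "class": "I",
    "students": [
        {"student_name": "AAA", "Version": 2,
         "scores": [{"value": "50", "rank": 2}, {"value": "70", "rank": 1}]},
        {"student_name": "BBB", "Version": 5,
         "scores": [{"value": 80, "rank": 2}, {"value": 100, "rank": 1},
                    {"value": 100, "rank": 1}]},
    ],
}

updated = bump_version(doc, "AAA")
print(updated["students"][0])
```

The function works on a deep copy, so the input document stays untouched and the change can be inspected before anything is persisted.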
I was able to sort the document.
import pprint

pipeline = [
    {"$unwind": "$properties"},
    {"$unwind": "$properties.values"},
    # Sort keys are plain field paths, without a leading "$".
    {"$sort": {"properties.values.rank": -1}},
    {"$group": {"_id": "$properties.property_name", "values": {"$push": "$properties.values"}}}
]
pprint.pprint(list(db.column.aggregate(pipeline)))
I want to view the "rowkey" with its stored data in Cassandra 3.0. I know the deprecated cassandra-cli had the 'list' command. However, in Cassandra 3.0 I cannot find a replacement for it. Does anyone know the new CLI command for 'list'?
You can use the sstabledump utility, as @chris-lohfink suggested. How do you use it? Create a keyspace and a table in it, then populate it with some data:
cqlsh> CREATE KEYSPACE IF NOT EXISTS minetest WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use minetest;
cqlsh:minetest> CREATE TABLE object_coordinates (
... object_id int PRIMARY KEY,
... coordinate text
... );
cqlsh:minetest> insert into object_coordinates (object_id, coordinate) values (564682,'59.8505,34.0035');
cqlsh:minetest> insert into object_coordinates (object_id, coordinate) values (1235,'61.7814,40.3316');
cqlsh:minetest> select object_id, coordinate, writetime(coordinate) from object_coordinates;
object_id | coordinate | writetime(coordinate)
-----------+-----------------+-----------------------
1235 | 61.7814,40.3316 | 1480436931275615
564682 | 59.8505,34.0035 | 1480436927707627
(2 rows)
object_id is the primary key (and therefore the partition key); coordinate is a regular column.
Flush changes to disk:
# nodetool flush
Find sstable on disk and analyze it:
# cd /var/lib/cassandra/data/minetest/object_coordinates-e19d4c40b65011e68563f1a7ec2d3d77
# ls
backups mc-1-big-CompressionInfo.db mc-1-big-Data.db mc-1-big-Digest.crc32 mc-1-big-Filter.db mc-1-big-Index.db mc-1-big-Statistics.db mc-1-big-Summary.db mc-1-big-TOC.txt
# sstabledump mc-1-big-Data.db
[
{
"partition" : {
"key" : [ "1235" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 18,
"liveness_info" : { "tstamp" : "2016-11-29T16:28:51.275615Z" },
"cells" : [
{ "name" : "coordinate", "value" : "61.7814,40.3316" }
]
}
]
},
{
"partition" : {
"key" : [ "564682" ],
"position" : 43
},
"rows" : [
{
"type" : "row",
"position" : 61,
"liveness_info" : { "tstamp" : "2016-11-29T16:28:47.707627Z" },
"cells" : [
{ "name" : "coordinate", "value" : "59.8505,34.0035" }
]
}
]
}
]
Or with -d flag:
# sstabledump mc-1-big-Data.db -d
[1235]#0 Row[info=[ts=1480436931275615] ]: | [coordinate=61.7814,40.3316 ts=1480436931275615]
[564682]#43 Row[info=[ts=1480436927707627] ]: | [coordinate=59.8505,34.0035 ts=1480436927707627]
The output shows that 1235 and 564682 are the partition keys and that the coordinates are stored in those partitions.
Link to doc http://www.datastax.com/dev/blog/debugging-sstables-in-3-0-with-sstabledump
PS. sstabledump is provided by the cassandra-tools package on Ubuntu.
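If you want something closer to the old 'list' view, the JSON printed by sstabledump can be post-processed. Below is a minimal Python sketch that assumes the JSON layout shown above (trimmed here to the fields it uses); rows_by_key is a hypothetical helper, and joining composite partition keys with ":" is just a display choice.

```python
import json
from typing import Dict, List

# Output of `sstabledump mc-1-big-Data.db`, trimmed to the fields used here.
DUMP = """
[
  {"partition": {"key": ["1235"], "position": 0},
   "rows": [{"type": "row", "position": 18,
             "cells": [{"name": "coordinate", "value": "61.7814,40.3316"}]}]},
  {"partition": {"key": ["564682"], "position": 43},
   "rows": [{"type": "row", "position": 61,
             "cells": [{"name": "coordinate", "value": "59.8505,34.0035"}]}]}
]
"""


def rows_by_key(dump_text: str) -> Dict[str, List[Dict[str, str]]]:
    """Map each partition key to a list of {column: value} dicts, one per row."""
    result: Dict[str, List[Dict[str, str]]] = {}
    for partition in json.loads(dump_text):
        key = ":".join(partition["partition"]["key"])
        rows = []
        for row in partition["rows"]:
            rows.append({cell["name"]: cell["value"] for cell in row.get("cells", [])})
        result[key] = rows
    return result


for key, rows in rows_by_key(DUMP).items():
    print("RowKey:", key, rows)
```

In practice the dump text would come from running sstabledump and capturing its standard output instead of the inline string.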
According to this issue, Cassandra's storage format was updated in 3.0.
If previously I could use cassandra-cli to see how the SSTable is built, to get something like this:
[default@test] list phonelists;
-------------------
RowKey: scott
=> (column=, value=, timestamp=1374684062860000)
=> (column=phonenumbers:bill, value='555-7382', timestamp=1374684062860000)
=> (column=phonenumbers:jane, value='555-8743', timestamp=1374684062860000)
=> (column=phonenumbers:patricia, value='555-4326', timestamp=1374684062860000)
-------------------
RowKey: john
=> (column=, value=, timestamp=1374683971220000)
=> (column=phonenumbers:doug, value='555-1579', timestamp=1374683971220000)
=> (column=phonenumbers:patricia, value='555-4326', timestamp=137468397122
What would the internal format look like in the latest version of Cassandra? Could you provide an example?
What utility can I use to see the internal representation of the table in Cassandra in the way listed above, but with the new SSTable format?
All that I have found on the internet is that the partition header now stores column names, that a row stores clustering values, and that there are no duplicated values.
How can I look into it?
Prior to 3.0, sstable2json was a useful utility for getting an understanding of how data is organized in SSTables. This feature is not currently present in Cassandra 3.0, but there will be an alternative eventually. Until then, Chris Lohfink and I have developed an alternative to sstable2json (sstable-tools) for Cassandra 3.0, which you can use to understand how data is organized. There is some talk about bringing this into Cassandra proper in CASSANDRA-7464.
A key difference between the storage format of older versions of Cassandra and that of Cassandra 3.0 is that an SSTable was previously a representation of partitions and their cells (identified by their clustering and column name), whereas in Cassandra 3.0 an SSTable represents partitions and their rows.
You can read about these changes in more detail in this blog post by their primary developer, who does a great job of explaining them.
The largest benefit you will see is that in the general case your data size will shrink (in some cases by a large factor), as a lot of the overhead introduced by CQL has been eliminated by some key enhancements.
Here's an example showing the difference between C* 2 and 3.
Schema:
create keyspace demo with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
use demo;
create table phonelists (user text, person text, phonenumbers text, primary key (user, person));
insert into phonelists (user, person, phonenumbers) values ('scott', 'bill', '555-7382');
insert into phonelists (user, person, phonenumbers) values ('scott', 'jane', '555-8743');
insert into phonelists (user, person, phonenumbers) values ('scott', 'patricia', '555-4326');
insert into phonelists (user, person, phonenumbers) values ('john', 'doug', '555-1579');
insert into phonelists (user, person, phonenumbers) values ('john', 'patricia', '555-4326');
sstable2json C* 2.2 output:
[
{"key": "scott",
"cells": [["bill:","",1451767903101827],
["bill:phonenumbers","555-7382",1451767903101827],
["jane:","",1451767911293116],
["jane:phonenumbers","555-8743",1451767911293116],
["patricia:","",1451767920541450],
["patricia:phonenumbers","555-4326",1451767920541450]]},
{"key": "john",
"cells": [["doug:","",1451767936220932],
["doug:phonenumbers","555-1579",1451767936220932],
["patricia:","",1451767945748889],
["patricia:phonenumbers","555-4326",1451767945748889]]}
]
sstable-tools toJson C* 3.0 output:
[
{
"partition" : {
"key" : [ "scott" ]
},
"rows" : [
{
"type" : "row",
"clustering" : [ "bill" ],
"liveness_info" : { "tstamp" : 1451768259775428 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-7382" }
]
},
{
"type" : "row",
"clustering" : [ "jane" ],
"liveness_info" : { "tstamp" : 1451768259793653 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-8743" }
]
},
{
"type" : "row",
"clustering" : [ "patricia" ],
"liveness_info" : { "tstamp" : 1451768259796202 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-4326" }
]
}
]
},
{
"partition" : {
"key" : [ "john" ]
},
"rows" : [
{
"type" : "row",
"clustering" : [ "doug" ],
"liveness_info" : { "tstamp" : 1451768259798802 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-1579" }
]
},
{
"type" : "row",
"clustering" : [ "patricia" ],
"liveness_info" : { "tstamp" : 1451768259908016 },
"cells" : [
{ "name" : "phonenumbers", "value" : "555-4326" }
]
}
]
}
]
The output is larger, but that is more a consequence of the tool. The key differences you can see are:
Data is now a collection of Partitions and their Rows (which include cells) instead of a collection of Partitions and their Cells.
Timestamps are now at the row level (liveness_info) instead of at the cell level. If some cells in a row differ in their timestamps, the new storage engine uses delta encoding to save space and associates the difference at the cell level. The same applies to TTLs. As you can imagine, this saves a lot of space if you have many non-key columns, as the timestamp does not need to be repeated.
The clustering information (in this case we are clustered on 'person') is now present at the Row level instead of the cell level, which saves a bunch of overhead as the clustering column values don't have to be repeated in every cell.
I should note that in this particular example the benefits of the new storage engine aren't completely realized, since there is only 1 non-clustering column.
There are a number of other improvements not shown here (like the ability to store row-level range tombstones).
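The regrouping from cells to rows can be illustrated with a small Python sketch that converts one partition from the 2.2-style JSON into the 3.0-style layout. It is only a model of the two JSON shapes, not of the on-disk encoding, and it assumes a single clustering column, no ":" inside clustering values, and one shared timestamp per row. to_row_layout is a hypothetical name.

```python
from typing import Any, Dict

# One partition as printed by sstable2json on C* 2.2 (taken from the example above).
OLD_PARTITION = {
    "key": "scott",
    "cells": [["bill:", "", 1451767903101827],
              ["bill:phonenumbers", "555-7382", 1451767903101827],
              ["jane:", "", 1451767911293116],
              ["jane:phonenumbers", "555-8743", 1451767911293116]],
}


def to_row_layout(old: Dict[str, Any]) -> Dict[str, Any]:
    """Regroup 'clustering:column' cells into one row per clustering value."""
    rows: Dict[str, Dict[str, Any]] = {}
    for name, value, tstamp in old["cells"]:
        clustering, _, column = name.partition(":")
        row = rows.setdefault(clustering, {"type": "row",
                                           "clustering": [clustering],
                                           "liveness_info": {"tstamp": tstamp},
                                           "cells": []})
        if column:  # the empty-named cell is only the 2.x row marker
            row["cells"].append({"name": column, "value": value})
    return {"partition": {"key": [old["key"]]}, "rows": list(rows.values())}


new_partition = to_row_layout(OLD_PARTITION)
print(new_partition)
```

Note how the clustering value and the timestamp each appear once per row in the result, instead of once per cell as in the input.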
I have the following collection:
/* 0 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b4f1cb3d2eacb1300002b"),
"answers" : [],
"questions" : []
}
/* 1 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6b9eb3d2eacb1300002c"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 2 */
{
"clientID" : ObjectId("51b9c10d91d1a3a52b0000b8"),
"_id" : ObjectId("532b6baeb3d2eacb1300002d"),
"answers" : [
"1",
"8"
],
"questions" : [
"1",
"2",
"3"
]
}
/* 3 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("533b828146ca43634000002d"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 4 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be327b539a4d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
/* 5 */
{
"clientID" : ObjectId("5335f9d864e2b1290c00012e"),
"_id" : ObjectId("5351be5ec89d717d1a00002b"),
"answers" : [
"ORANGE"
],
"questions" : [
"Color"
]
}
I am running the following code in order to find how many times the (questions,answers) combination appears in the collection:
o.map = function () {
    emit({"questions": this.questions, "answers": this.answers}, this.clientID);
};
o.reduce = function (answers, collection) {
    return collection.length;
};
logSearchDB.mapReduce(o, function (err, results) {
    results.sort(function (a, b) { return b.value - a.value; });
    for (var i = 0; i < results.length; i++) {
        console.log(JSON.stringify(results[i]));
    }
});
The output is:
{"_id":{"questions":[],"answers":[]},"value":"51b9c10d91d1a3a52b0000b8"}
{"_id":{"questions":["Color"],"answers":["ORANGE"]},"value":3}
{"_id":{"questions":["1","2","3"],"answers":["1","8"]},"value":2}
I expected the first row to have "value" : 1.
I guess the 'reduce' function got a plain 'collection' value, "51b9c10d91d1a3a52b0000b8", instead of an array, ["51b9c10d91d1a3a52b0000b8"].
Why doesn't mapReduce collect everything into an array?
The reason you have just a plain value in that first row is that there was only one occurrence of that key. This is generally how mapReduce works, at least in the way it was specified in the original papers.
So the reduce function is not actually called when a key has only a single emitted value. To work around this, you can use the finalize function in your mapReduce:
var finalize = function(key, value) {
    if ( typeof(value) != "number" )
        value = 1;
    return value;
};
db.collection.mapReduce(
    mapper,
    reducer,
    {
        "finalize": finalize,
        "out": { "inline": 1 }
    }
);
This runs over all of the output; when a value is not a number (that is, it is still the clientID you emitted), it is set to 1, because that is how many documents are in the grouping.
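The behaviour can be reproduced with a toy model in Python. This is not MongoDB's implementation, only a sketch of the rule described above: the reducer is skipped for keys with a single emitted value, and an optional finalize step runs over every result. map_reduce, count_values and to_count are hypothetical names, and the keys are tuples so they can be used as dictionary keys.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def map_reduce(pairs: Iterable[Tuple[Hashable, Any]],
               reducer: Callable[[Hashable, List[Any]], Any],
               finalize: Optional[Callable[[Hashable, Any], Any]] = None) -> Dict[Hashable, Any]:
    """Toy model: the reducer is skipped for keys that have a single emitted value."""
    grouped: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    results = {}
    for key, values in grouped.items():
        value = values[0] if len(values) == 1 else reducer(key, values)
        results[key] = finalize(key, value) if finalize else value
    return results


# (questions, answers) -> clientID, mirroring the emit() call in the question.
emitted = [
    (((), ()), "51b9c10d91d1a3a52b0000b8"),
    ((("1", "2", "3"), ("1", "8")), "51b9c10d91d1a3a52b0000b8"),
    ((("1", "2", "3"), ("1", "8")), "51b9c10d91d1a3a52b0000b8"),
    ((("Color",), ("ORANGE",)), "5335f9d864e2b1290c00012e"),
    ((("Color",), ("ORANGE",)), "5335f9d864e2b1290c00012e"),
    ((("Color",), ("ORANGE",)), "5335f9d864e2b1290c00012e"),
]


def count_values(key, values):
    """Same idea as the reduce function in the question: count the values."""
    return len(values)


def to_count(key, value):
    """Same idea as finalize above: a non-numeric value means a single document."""
    return value if isinstance(value, int) else 1


raw = map_reduce(emitted, count_values)
fixed = map_reduce(emitted, count_values, to_count)
print(raw[((), ())], fixed[((), ())])
```

Without the finalize step the single-value key keeps the raw clientID, which is the first row of output in the question.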
Really your query is better suited to the aggregation framework than mapReduce. The aggregation framework is a native code implementation as opposed to using a JavaScript interpreter. It runs much faster than mapReduce:
db.collection.aggregate([
{ "$group": {
"_id": {
"questions": "$questions",
"answers": "$answers"
},
"count": { "$sum": 1 }
}}
])
So it is the better option to use. It was introduced to MongoDB later than mapReduce, so people still tend to think in terms of mapReduce, or there is legacy code from earlier versions of MongoDB. But it has been around for quite a while now.
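For comparison, here is what the $group stage with {"$sum": 1} computes, written as a plain Python sketch over in-memory documents. group_count is a hypothetical helper; the results are sorted by count here for readability, which $group itself does not guarantee.

```python
from collections import Counter
from typing import Any, Dict, List


def group_count(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Count documents per (questions, answers) pair, like $group with {"$sum": 1}."""
    counts = Counter((tuple(d["questions"]), tuple(d["answers"])) for d in docs)
    return [{"_id": {"questions": list(q), "answers": list(a)}, "count": n}
            for (q, a), n in counts.most_common()]


# The six sample documents, reduced to the two fields that form the group key.
docs = (
    [{"questions": [], "answers": []}]
    + [{"questions": ["1", "2", "3"], "answers": ["1", "8"]}] * 2
    + [{"questions": ["Color"], "answers": ["ORANGE"]}] * 3
)

for group in group_count(docs):
    print(group)
```

Every group gets a numeric count, including the one that occurs only once, so no finalize-style correction is needed.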
Also see the operator reference for the aggregation framework.