Cassandra pagination: the difference between driver and CQL

I am reading about driver pagination, but CQL also supports a LIMIT clause alongside WHERE. What is the difference between these two?

Pagination is about how much of your result you work with at a time.
WHERE and LIMIT are about what is in your result.
Imagine you request all rows where X < 100. This may refer to 1 million different rows. If the client or the server requested all of this at once it would cause a lot of resource pressure. To avoid this the driver is capable of asking for just a few rows at a time. This allows the client and the server to work with a stream of items rather than allocating space for everything up front.
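To make the distinction concrete, here is a minimal sketch in plain Python that simulates it. Nothing here is the driver's real API: `MATCHING_ROWS`, `fetch_page`, `iter_rows` and the integer paging state are all hypothetical stand-ins. The point is only that the fetch size changes how many requests are made, while LIMIT changes which rows come back.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical stand-in for the rows matching "X < 100" (scaled down to 1,000).
MATCHING_ROWS = list(range(1000))

# Records one entry per simulated server request, so requests can be counted.
REQUESTS: List[int] = []

Page = Tuple[List[int], Optional[int]]


def fetch_page(paging_state: Optional[int], fetch_size: int,
               limit: Optional[int] = None) -> Page:
    """Simulated server: return one page and the state needed to resume."""
    REQUESTS.append(fetch_size)
    # LIMIT decides what is in the result set at all.
    rows = MATCHING_ROWS if limit is None else MATCHING_ROWS[:limit]
    start = paging_state or 0
    page = rows[start:start + fetch_size]
    # A state of None means there are no more pages.
    next_state = start + fetch_size if start + fetch_size < len(rows) else None
    return page, next_state


def iter_rows(fetch_size: int, limit: Optional[int] = None) -> Iterator[int]:
    """Simulated client: stream rows, holding only one page at a time."""
    state: Optional[int] = None
    while True:
        page, state = fetch_page(state, fetch_size, limit)
        yield from page
        if state is None:
            return
```

In this sketch, `iter_rows(fetch_size=100)` still yields all 1,000 rows but asks for them in 10 requests, whereas `iter_rows(fetch_size=100, limit=10)` yields only 10 rows in a single request.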

Related

What is the Impact of ALLOW FILTERING on Cassandra?

According to the official Cassandra blog, ALLOW FILTERING is highly inefficient. But if for some reason one has to use such a query, what would be the impact on other applications that use Cassandra to get data? Would only the thread(s) that are busy fetching rows for my query be slow, or would the whole of Cassandra be slow, so that all other applications getting data from Cassandra also see slow responses?
It will likely affect the whole node. A problem around it is that your one query with a limit of 10 will not just read 10 records and return, but (possibly) a LOT of data. It is possible to make efficient ALLOW FILTERING queries, which things like the spark driver (token limited queries per token range or within a partition) can do. I would very strongly recommend not even attempting it though. It might work at first but your poor operations team will curse your name.
With faster disks, the object allocations will cause serious GC overhead, since this is unthrottled. This is very similar to the issue seen when using queues or a lot of tombstones: the JVM building and throwing away the rows outruns the allocation rate the garbage collector can keep up with without longer pauses (early promotions, fragmentation in CMS, allocation spikes messing up G1 young-gen ratios).
If the query crosses partitions, as with normal range queries, the coordinator will attempt to estimate the ranges it needs to read and the replicas for them, and fan out with some limited concurrency. It's a rough estimate because it only has its own data to extrapolate from, and when the data is then further filtered, rather than just counting "number of partitions within range", it is likely to be wrong and to underestimate. Most likely it will query one range at a time, moving on to the next replica set's range if the limit isn't met. With vnodes this can be a very long list, and sequentially walking it will likely not complete within the timeout. Luckily this will impact mostly just the one query, but it is still essentially reading the entire dataset off disk from every replica set in the cluster for 1 query. If you make 100/sec the cluster will probably be hosed.
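The point that a limit of 10 does not mean reading 10 records can be illustrated with a toy scan in plain Python. This is only a sketch of the idea, not how Cassandra's read path is implemented; `filtered_scan` and the sample data are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def filtered_scan(rows: Iterable[int],
                  predicate: Callable[[int], bool],
                  limit: int) -> Tuple[List[int], int]:
    """Scan rows in order, keeping those that pass the filter.

    Returns the matches and the number of rows that had to be read
    before the limit was satisfied.
    """
    matches: List[int] = []
    rows_read = 0
    for row in rows:
        rows_read += 1  # every row is read, whether or not it matches
        if predicate(row):
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, rows_read


# Only 1 row in 10,000 passes the filter in this made-up data set.
matches, rows_read = filtered_scan(range(100_000),
                                   lambda r: r % 10_000 == 0,
                                   limit=10)
print(len(matches), rows_read)  # 10 90001
```

With a selective filter, returning 10 rows here means reading about 90,000 and discarding nearly all of them, which is the allocation churn described above.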

Will Elasticsearch survive this much load or simply die?

We have an Elasticsearch deployment with 1 cluster of 3 nodes. We expect 800-1000 queries per second, so we want to know: under a load of around 1000 queries per second, will the Elasticsearch server respond with delays, or will it simply stop working?
Queries are all query_string, fuzzy (prefix & wildcard queries are not used).
There are a few factors to consider, assuming that your network has the necessary throughput:
What's the CPU speed and number of cores for each node?
You should have 2 GHz quad cores at the very least. The nodes should also be dedicated to ELK, so they aren't busy with other tasks.
How much ram do your nodes have?
Probably want to be north of 10GB at least
Are your logs filtered and indexed?
Having your logs filtered will greatly reduce the work load generated by the queries. Additionally, filtered logs can make it so that you don't have to query as much with wild cards (which are very expensive).
Hope that helps point in a better direction :)
One immediate suggestion: if you are expecting sustained query rates of 800 - 1K/sec you do not want the nodes storing the data (which will be handling indexing of new records, merging and shard rebalancing) to also be having to deal with query scatter/gather operations. Consider a client + data node topology where you keep your 3 nodes and add n client nodes (data and master set to false in their configs.) The actual value for n will vary based on your actual performance; this will be something you'll want to determine via experimentation.
Other factors equal or unknown, abundant memory is a good resource to have. Review the Elastic team's guidance on hardware and be sure to link through to the discussion on heap.

Fetchsize for large result

I'm using the Datastax Java driver and have a partition key that has around 750000 items that I'd like to iterate over. I currently hit a ReadTimeoutException. Will setting Statement#setFetchSize(2000) be all I need to do to avoid the timeout (assuming I have memory in my client, that is)? Or will I need to do the paging myself manually?
Assuming you are using the driver with protocol v2 or higher, this should be all you need. Automatic paging will occur under the hood, returning up to 2000 rows at a time.
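As a rough back-of-the-envelope check (plain arithmetic, not a driver call), the number of pages the driver would fetch for this partition can be estimated as follows. `pages_needed` is a hypothetical helper, and the real driver may issue one extra request to discover that no rows remain.

```python
def pages_needed(total_rows: int, fetch_size: int) -> int:
    """Estimate how many pages are needed to stream total_rows rows."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    # Ceiling division without floating point.
    return -(-total_rows // fetch_size)


print(pages_needed(750_000, 2000))  # 375
```

So the 750000 items would arrive as roughly 375 smaller reads instead of one large one, and the client only needs to hold one page of rows in memory at a time.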

Cassandra multi row selection

Somewhere I have heard that using multi-row selection in Cassandra is bad because it runs a new query for each row selected. So, for example, if I want to fetch 1000 rows at once, it would be the same as running 1000 separate queries at once. Is that true?
And if it is, how bad would it be to keep selecting around 50 rows each time a page is loaded if, say, I have 1000 page views in a single minute? Would it severely slow Cassandra down or not?
P.S. I'm using phpcassa for my project
Yes, running a query for 1000 rows is the same as running 1000 queries (if you use the recommended RandomPartitioner). However, I wouldn't be overly concerned by this. In Cassandra, querying for a row by its key is a very common, very fast operation.
As to your second question, it's difficult to tell ahead of time. Build it and test it. Note that Cassandra does use in memory caching so if you are querying the same rows then they will cache.
We are using Playorm for Cassandra and there is a "findAll" pattern there which provides support to fetch all rows quickly. Visit
https://github.com/deanhiller/playorm/wiki/Support-for-retrieving-many-entities-in-parallel for more details.
1) I have debugged the Cassandra code base a little, and as per my observation Cassandra provides the multiget() functionality to query multiple rows at the same time, which is also available in phpcassa.
2) Multiget is optimized to handle the batch request and it saves network hops (fetching 1k rows one at a time takes 1k round trips while a single multiget takes one, so it saves the time of 999 round trips).
3) More about multiget() in phpcassa: php cassa multiget()
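The round-trip argument in point 2 can be sketched with a toy in-memory store in plain Python. `CountingStore` is hypothetical and is not phpcassa's API; it only counts client-to-server calls, and says nothing about how much work the cluster does per key behind each call.

```python
from typing import Dict, Iterable


class CountingStore:
    """Toy key-value store that counts client round trips."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key: str) -> str:
        # One network round trip per key.
        self.round_trips += 1
        return self.data[key]

    def multiget(self, keys: Iterable[str]) -> Dict[str, str]:
        # One network round trip for the whole batch; missing keys are skipped.
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


keys = [f"row-{i}" for i in range(1000)]
store = CountingStore({k: k.upper() for k in keys})

one_by_one = {k: store.get(k) for k in keys}
trips_single = store.round_trips          # 1000

store.round_trips = 0
batched = store.multiget(keys)
trips_batched = store.round_trips         # 1
```

Both approaches return the same rows; the difference in this sketch is 1000 client round trips versus 1.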

Cassandra multiget performance

I've got a cassandra cluster with a fairly small number of rows (2 million or so, which I would hope is "small" for cassandra). Each row is keyed on a unique UUID, and each row has about 200 columns (give or take a few). All in all these are pretty small rows, no binary data or large amounts of text. Just short strings.
I've just finished the initial import into the cassandra cluster from our old database. I've tuned the hell out of cassandra on each machine. There were hundreds of millions of writes, but no reads. Now that it's time to USE this thing, I'm finding that read speeds are absolutely dismal. I'm doing a multiget using pycassa on anywhere from 500 to 10000 rows at a time. Even at 500 rows, the performance is awful, sometimes taking 30+ seconds.
What would cause this type of behavior? What sort of things would you recommend after a large import like this? Thanks.
Sounds like you are io-bottlenecked. Cassandra does about 4000 reads/s per core, IF your data fits in ram. Otherwise you will be seek-bound just like anything else.
I note that normally "tuning the hell" out of a system is reserved for AFTER you start putting load on it. :)
See:
http://spyced.blogspot.com/2010/01/linux-performance-basics.html
http://www.datastax.com/docs/0.7/operations/cache_tuning
Is it an option to split up the multi-get into smaller chunks? By doing this you would be able to spread your get across multiple nodes, and potentially increase your performance, both by spreading the load across nodes and having smaller packets to deserialize.
That brings me to the next question, what is your read consistency set to? In addition to an IO bottleneck as @jbellis mentioned, you could also have a network traffic issue if you are requiring a particularly high level of consistency.
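The suggestion above to split the multiget into smaller chunks could look roughly like this in plain Python. It is a sketch under assumptions: `fetch_many` is a hypothetical callable standing in for whatever batch read your client offers, and the chunk size of 100 is arbitrary and would need tuning by measurement.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(keys: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of keys, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def multiget_in_chunks(fetch_many: Callable[[List[str]], Dict[str, dict]],
                       keys: Sequence[str],
                       size: int = 100) -> Dict[str, dict]:
    """Issue several small batch reads instead of one very large one."""
    results: Dict[str, dict] = {}
    for chunk in chunked(keys, size):
        results.update(fetch_many(chunk))
    return results
```

Usage would be along the lines of `multiget_in_chunks(my_batch_read, row_keys, size=100)`, where `my_batch_read` wraps the client's own multiget call. The chunks here are fetched sequentially; issuing them concurrently is a further step this sketch does not attempt.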
