Where is slow query data in OpsCenter read from? - cassandra

Since our former data model is not very well designed, the Slow Queries panel shows that some queries are performing slowly.
As I am planning to redesign the data model, I want to clear out the old information displayed in this panel, so I can see only information about my new data model. However, I do not know where OpsCenter is reading this data from.
My idea is that if this information is stored in a table or a file, I can truncate or delete it. Or am I totally wrong with that assumption, and could this instead be done through a configuration file modification or something similar?
OpsCenter Version: 6.0.3
Cassandra Version: 2.1.15.1423
DataStax Enterprise Version: 4.8.10

It comes from dse_perf.node_slow_log. Each node tracks new events in that log as they occur and stores its top X. When you view the panel in the UI, OpsCenter fetches the top X from each node and merges them. To "reset" it, you can truncate the log and restart the DataStax agents to clear their current top X. A feature to do this reset for you is planned, but in 6.0.3 it's a little difficult.
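To make that concrete, here is a minimal sketch of the reset using the DataStax Java driver; the contact point is a placeholder, and the service name in the comment assumes a standard agent installation:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Minimal sketch: clear the slow-query history that feeds the Slow Queries panel.
// 127.0.0.1 is a placeholder contact point for one of your nodes.
public class ResetSlowQueryLog {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("TRUNCATE dse_perf.node_slow_log");
        }
        // Then restart the DataStax agent on each node (e.g. `sudo service datastax-agent restart`)
        // so the agents drop the "top X" they currently hold in memory.
    }
}
```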

Related

Cursor based pagination in Cassandra?

I am planning to build cursor-based pagination APIs for the data in Cassandra.
Does the Cassandra Java driver support cursor-based pagination? Or what would be a good approach to achieve this?
Please review this DataStax document as it talks about large result sets and how to save "state" to continue where you left off:
https://docs.datastax.com/en/developer/java-driver-dse/1.2/manual/paging/
In particular, here is the piece that I believe discusses what you're trying to do:
"Saving and reusing the paging state
Sometimes it is convenient to save the paging state in order to restore it later. For example, consider a stateless web service that displays a list of results with a link to the next page. When the user clicks that link, we want to run the exact same query, except that the iteration should start where we stopped on the previous page."
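As an illustration of that pattern, here is a rough sketch using the Java driver's paging state; the keyspace/table ("app.events"), the WHERE clause, and the fetch size are placeholders:

```java
import com.datastax.driver.core.PagingState;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Sketch of cursor-style pagination by saving and reusing the paging state.
public class CursorPagination {

    // First page: run the query, render one page, return the serialized state
    // (e.g. to embed in a "next page" link).
    static String firstPage(Session session) {
        Statement stmt = new SimpleStatement("SELECT * FROM app.events WHERE day = ?", "2018-01-01")
                .setFetchSize(20);
        ResultSet rs = session.execute(stmt);
        int remaining = rs.getAvailableWithoutFetching();
        for (Row row : rs) {
            // render the row ...
            if (--remaining == 0) break;   // stop at the page boundary
        }
        PagingState state = rs.getExecutionInfo().getPagingState();
        return state == null ? null : state.toString();
    }

    // Next page: re-run the exact same query, resuming where the last page stopped.
    static void nextPage(Session session, String savedState) {
        Statement stmt = new SimpleStatement("SELECT * FROM app.events WHERE day = ?", "2018-01-01")
                .setFetchSize(20)
                .setPagingState(PagingState.fromString(savedState));
        ResultSet rs = session.execute(stmt);
        // iterate as above ...
    }
}
```

As the quoted documentation says, the saved state only works if the follow-up request runs the exact same query.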

ScyllaDB 2.1 - Inconsistency with Materialized View

While deciding on the technology stack for my own product, I decided to go with ScyllaDB as the database due to its impressive performance.
For local development, I set up Cassandra on my MacBook.
Since ScyllaDB now supports (experimental) materialized views (MV), it made development easy. For the dev server, I'm running ScyllaDB on Ubuntu 16.04 hosted on Linode.
I am facing the following issues:
After a few weeks, when I deleted an entry from the base table (on ScyllaDB running on Ubuntu) using the partition key, the MV still showed the entry for the deleted record.
It was fixed after I dropped the whole keyspace and recreated it, but I'm unable to pinpoint what caused this inconsistency.
When I dropped the MV and recreated it, it did not copy the old data.
I tried to search but could not find a way to force the MV to read from the base table and populate itself.
For the first issue, I would like to know if anyone has faced a similar scenario. Also, is there anything I can do to prevent this from happening, or can it not be prevented and is that what it means to be "experimental"?
Any help or reference is appreciated.
In 2.1 Scylla lacked view building (that is, using existing data to populate a view on creation), but that is solved in 2.2.
Indeed, the MV status in 2.1 is incomplete. It has gotten much better in 2.2, which will be released this week. It's still not GA yet, but we have a branch on top of 2.2 that merged newer changes from master and is almost there. It should reach GA quality within 2 months.
Note that Cassandra's MV status is experimental, and we have been opening JIRA tickets wherever we identified a design flaw in C*'s MVs.
tl;dr: I would suggest you either stick with Cassandra if you want MVs, or maintain the MVs manually in Scylla.
Materialized views are super experimental. I ran them in production for about 6 months before replacing their functionality manually; this was done to improve performance. So if performance is your goal here, I suggest avoiding them.
I can attest that materialized views, if created on an already-populated table, will in fact populate themselves, so this seems like a ScyllaDB problem. Cassandra has a different problem, where the writes will crater the DB if you do this on a large production table.
I also did not have issues with truncating the primary table and seeing that reflected in Cassandra.
Additionally, I tried ScyllaDB in a spike for performance reasons. I found it very difficult to work with and dropped it after spending a week trying to get it to do what I knew Cassandra would do.
Thanks @Highstead for confirming the automatic population of the MV if the base table already has entries when the MV is created.
For the main question about the inconsistency between the table and the MV, I found out that it was due to a TRUNCATE query on the base table.
I also found an issue for it: https://github.com/scylladb/scylla/issues/3188
It states that, currently, truncating the base table won't clear the MVs created from that table.
Vice versa, you can run a TRUNCATE query on the MV and it won't throw an exception (where it should), and the MV will be cleared even when the base table contains entries.
So the solution for now is to truncate each MV separately, along with its base table.
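Here is a small sketch of that workaround using the Java driver, following the ScyllaDB behavior described in this thread (truncating the view directly); the keyspace, table, and view names are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch of the current workaround: truncate the base table AND each of its
// materialized views explicitly, since truncating the base table alone does
// not clear the views (see the linked ScyllaDB issue).
// "ks.base_table" and "ks.mv_by_user" are placeholder names.
public class TruncateTableAndViews {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("TRUNCATE ks.base_table");
            session.execute("TRUNCATE ks.mv_by_user");   // repeat for every MV on the table
        }
    }
}
```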

Cassandra read operation error using datastax cassandra

Sorry if this is an existing question, but none of the existing ones resolved my problem.
I've installed a single-node Cassandra. I don't have a large application right now, but I think that may change soon, and I will need more and more nodes.
Well, I'm saving data from a stream to Cassandra, and this was going well, but suddenly, when I tried to read data, I started to receive this error:
"Not enough replica available for query at consistency ONE (1 required but only 0 alive)"
My keyspace was built using SimpleStrategy with replication_factor = 1. I'm saving data separated by a field called "catchId", so most of my queries look like "select * from data where catchId='xxx'". catchId is a partition key.
I'm using the cassandra-driver-core version 3.0.0-rc1.
The thing is that I don't have that much data right now, and I'm wondering if it would be better to use an RDBMS for now and migrate to Cassandra only when I have better infrastructure.
Thanks :)
It seems that your node is unable to respond when you try to perform your read (in general this error appears with more than one node). If you do not have lots of data, that's very strange, so this probably comes from a bad design choice. It can stem from several things, so you have to do some investigation:
Study your logs! In particular, system.log.
You can change the read_request_timeout_in_ms parameter in cassandra.yaml. Although it's not a good idea in production, it will tell you whether it's just a temporary problem (your request succeeds after a little while) or a bigger one.
Study your CPU and memory behavior while you are making requests.
If you are very motivated, you can install OpsCenter, which will give you more valuable information.
How many write requests are you making, and how are you making them? They can overwhelm Cassandra (even though it's designed for heavy write loads). I recommend making async requests to avoid problems; see the sketch below.
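To illustrate that last point, here is a rough sketch of asynchronous writes with the driver; the table and columns are placeholders modeled on the catchId example from the question:

```java
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

// Sketch: issue writes asynchronously instead of blocking on each one.
// The "data" table and its "payload" column are placeholders.
public class AsyncWrites {

    static PreparedStatement prepareInsert(Session session) {
        return session.prepare("INSERT INTO data (catchId, payload) VALUES (?, ?)");
    }

    static void saveEvent(Session session, PreparedStatement insert, String catchId, String payload) {
        ResultSetFuture future = session.executeAsync(insert.bind(catchId, payload));
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override public void onSuccess(ResultSet rs) { /* write acknowledged */ }
            @Override public void onFailure(Throwable t)  { /* log, back off, retry */ }
        });
    }
}
```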

How do production Cassandra DBA's do table changes & additions?

I am interested in how a production DBA's processes change when using Cassandra and performing many releases over a year. During these releases, columns in tables change frequently, and so does the number of Cassandra tables, as new features and queries are supported.
In the relational DB, in production, you create the 'view' and BOOM you get the data already there - loaded from the view's query.
With Cassandra, does the DBA have to create a new Cassandra table AND write/run a script to copy all the required data into that table? Can a production-level Cassandra DBA provide some pointers on their processes?
We run a small shop, so I can tell you how I manage table/keyspace changes, and that may differ from how others get it done. First, I keep a text .cql file in our (private) Git repository that has all of our tables and keyspaces in their current formats. When changes are made, I update that file. This lets other developers know what the current tables look like, without having to use SSH or DevCenter. This also has the added advantage of giving us a file that allows us to restore our schema with a single command.
If it's a small change (like adding a new column) I'll try to get that out there just prior to deploying our application code. If it's a new table, I may create that earlier, as a new table without code to use it really doesn't hurt anything.
However, if it is a significant change...such as updating/removing an existing column or changing a key...I will create it as a new table. That way, we can deploy our code to use the new table(s), and nobody ever knows that we switched something behind the scenes. Obviously, if the table needs to have data in it, I'll have export/import scripts ready ahead of time and run those right after we deploy.
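For illustration, such an export/import script can be as simple as paging through the old table and writing into its replacement. Here is a rough sketch with the Java driver; the table and column names are placeholders:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

// Rough sketch of a copy script: read every row from the old table and
// insert it into the new one. "ks.users_v1"/"ks.users_v2" and the columns
// "id" and "email" are placeholder names.
public class CopyToNewTable {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            PreparedStatement insert =
                    session.prepare("INSERT INTO ks.users_v2 (id, email) VALUES (?, ?)");
            Statement select = new SimpleStatement("SELECT id, email FROM ks.users_v1")
                    .setFetchSize(500);   // let the driver page through the old table
            for (Row row : session.execute(select)) {
                session.execute(insert.bind(row.getUUID("id"), row.getString("email")));
            }
        }
    }
}
```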
Larger corporations with enterprise deployments use tools like Chef to manage their schema deployments. When you have a large number of nodes or clusters, an automated deployment tool is really the best way to go.

how to reload maps from db when split brain occurs due to network partitioning in hazelcast

I am using hazelcast 3.2.2 community edition.
I am performing various tests with Hazelcast. I have two separate VMs running two Hazelcast instances as Linux services, forming a single cluster. I will refer to them as HAZ-A and HAZ-B in this context.
Here is the test flow ("link" means physical link in this context):
1) HAZ-A is up, HAZ-B is up.
2) Link of HAZ-A goes down; HAZ-B's link is up.
Perform some operations, say changing a user's password, so HAZ-B will have two versions of the user object (one will be the backup from HAZ-A, say version 1; the other will be the updated copy, say version 2).
3) Link of HAZ-B goes down; HAZ-A's link is already down. Hence the links of both HAZ-A and HAZ-B are down.
4) Restore the link of HAZ-A. The link of HAZ-B is still down.
Perform some operations, say changing a user's password. At this time I am getting stale data, since HAZ-A did not get a single chance to sync with HAZ-B.
So the point here is:
Can we implement/inject any kind of listener which will detect interface up/down or link up/down, and upon detection simply re-sync data from the DB?
From the documentation, it seems both HAZ-A and HAZ-B will load the values from the DB, and when they eventually see each other, they will merge.
From Chapter 18:
If a MapStore was in use, those lost partitions would be reloaded from some database, making each mini-cluster complete. Each mini-cluster will then recreate the missing primary partitions and continue to store data in them, including backups on the other nodes.
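As a sketch of how that looks in code (assuming Hazelcast 3.x), here is a map backed by a MapStore; the "passwords" map name and the stubbed database calls are placeholders for your own persistence layer:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MapStoreAdapter;

// Sketch of backing a map with a MapStore so that, as the quoted documentation
// describes, lost partitions can be reloaded from the database.
public class PasswordMapStore extends MapStoreAdapter<String, String> {

    @Override
    public String load(String username) {
        // Placeholder: look the value up in your database, e.g.
        // SELECT password_hash FROM users WHERE username = ?
        return null;
    }

    @Override
    public void store(String username, String passwordHash) {
        // Placeholder: write-through to the database so it stays the source of truth.
    }

    @Override
    public void delete(String username) {
        // Placeholder: remove the corresponding row from the database.
    }

    public static void main(String[] args) {
        Config config = new Config();
        MapConfig mapConfig = new MapConfig("passwords");
        mapConfig.setMapStoreConfig(new MapStoreConfig()
                .setEnabled(true)
                .setImplementation(new PasswordMapStore()));
        config.addMapConfig(mapConfig);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        // A get() for a key that is missing locally now triggers load() against the DB.
        hz.getMap("passwords").get("someUser");
    }
}
```

With a store like this configured, each mini-cluster in the quoted scenario can reload missing entries from the database rather than relying only on what survived the partition.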
