Cassandra cluster key usage

I'm banging my head against this, but frankly my brain just won't get it - or so it seems.
I have a column family that holds jobs for a rather large group of actors. It is a central job management and scheduling table that must be distributed and available throughout the whole cluster, and may even have to span datacenter boundaries some day in the near future.
Each job-executor actor system (the ones that actually execute the jobs) is installed alongside one Cassandra node, i.e. on the same host. Of course there is actually a master actor that pulls the jobs and distributes them to the actor agents, but that has nothing to do with my question.
There are also some actor systems that can create jobs in the central job table to be executed by other actors or even other actor systems, but usually the jobs are loaded batch-wise or manually through a web interface.
An actor that is to execute a job only ever queries its local Cassandra node. When finished, it updates the job table to mark the job as finished. Under normal circumstances this write should also only touch job records for which its local Cassandra node is authoritative.
Now, sometimes an actor system on a given host may have nothing to do. In that case it should indeed get jobs from other nodes too, but of course it will still only talk to its local Cassandra node. I know this works, and it doesn't bother me a bit.
What keeps me up at night is this:
How would I create a compound key that makes a Cassandra node authoritative for the job entries of its local actor system, and thereby its job-execution actors, without splitting the job table into multiple column families or the like?
In other words: how can I create a compound key that makes sure that a) jobs are evenly distributed across my cluster,
b) a local query on the job table only returns jobs for which this Cassandra node is authoritative, and
c) my distributed agent system still has the possibility to fetch jobs from other nodes in case it has no jobs of its own to execute?
A last word on c) above: I do not want to do two queries in the case where there is no local job, but still only one!
Any hints on this?
This is the general structure of the job table so far:
ClusterKey UUID: Primary Key
JobScope String: HOST / GLOBAL / SERVICE / CHANNEL
JobIdentifier String: Web-Crawler, Twitter
Description String:
URL String:
JobType String: FETCH / CLEAN / PARSE /
Job String: Definition of the job
AdditionalData Collection:
JobStatus String: NEW / WORKING / FINISHED
User String:
ValidFrom Timestamp:
ValidUntill Collection:
Still in the process of setting everything up, so no queries are defined yet. But an actor will pull jobs out of it, set the status, and so on.
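Rendered as CQL (just a sketch through the Java driver; the keyspace, table and column names are placeholders, the additional-data collection type is a guess, and the single-UUID primary key is exactly the part I'm unsure about):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateJobTable {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("jobs_ks")) {
                session.execute(
                    "CREATE TABLE IF NOT EXISTS jobs (" +
                    "  cluster_key uuid PRIMARY KEY," +        // the compound key design is the open question
                    "  job_scope text," +                      // HOST / GLOBAL / SERVICE / CHANNEL
                    "  job_identifier text," +                 // Web-Crawler, Twitter, ...
                    "  description text," +
                    "  url text," +
                    "  job_type text," +                       // FETCH / CLEAN / PARSE / ...
                    "  job text," +                            // definition of the job
                    "  additional_data map<text, text>," +     // guessed collection type
                    "  job_status text," +                     // NEW / WORKING / FINISHED
                    "  created_by text," +                     // the 'User' column
                    "  valid_from timestamp," +
                    "  valid_until timestamp" +                // 'ValidUntill' as a plain timestamp here
                    ")");
            }
        }
    }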

Cassandra has no way of "pinning" a key to a node, if that's what you are after.
If I were you, I'd stop worrying about whether my local node was authoritative for some set of data, and start leveraging the built-in consistency controls in Cassandra for managing the set of nodes that you read from or write to.
There is lots of information here on read consistency and write consistency; using the right consistency level will ensure that your application scales well while staying logically correct: http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
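For example, with the DataStax Java driver the consistency level can be set per statement; a minimal sketch, assuming the jobs table from the question and placeholder names:

    import com.datastax.driver.core.*;
    import java.util.UUID;

    public class ConsistencyExample {
        public static void main(String[] args) {
            UUID someJobId = UUID.randomUUID();   // placeholder job id
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("jobs_ks")) {

                // Write acknowledged by a quorum of replicas in the local datacenter.
                Statement write = new SimpleStatement(
                        "UPDATE jobs SET job_status = 'FINISHED' WHERE cluster_key = ?", someJobId)
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
                session.execute(write);

                // Read served by a single replica (fast, possibly slightly stale).
                Statement read = new SimpleStatement(
                        "SELECT job_status FROM jobs WHERE cluster_key = ?", someJobId)
                        .setConsistencyLevel(ConsistencyLevel.ONE);
                Row row = session.execute(read).one();
            }
        }
    }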
Another item worth mentioning is atomic "compare and swap", also known as lightweight transactions. Let's say you want to ensure that a given job is only performed once. You could add a field indicating whether the job has been "picked up", then query on that field (where picked_up = 0) and simultaneously (and atomically) update the field to indicate that you are "picking up" that work. That way no other actors will pick it up again.
Info on lightweight transactions here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_ltwt_transaction_c.html
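As a sketch with the Java driver (the boolean picked_up column is an addition to the jobs table above, and the names are placeholders), the check-and-set could look like this:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import java.util.UUID;

    public class ClaimJob {
        // Tries to claim a job; returns true only if this caller won the race.
        static boolean claim(Session session, UUID jobId) {
            // Lightweight transaction: the IF clause makes the check-and-set atomic,
            // so only one actor can flip picked_up from false to true.
            ResultSet rs = session.execute(
                    "UPDATE jobs SET picked_up = true WHERE cluster_key = ? IF picked_up = false",
                    jobId);
            return rs.wasApplied();   // false means another actor already picked the job up
        }
    }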

Related

Implications of keeping Cassandra ResultSet open for a while

I'm using the Cassandra Java driver with a fetch size set to 1k. I need to query all records in a table and perform a time-consuming action for every row.
What will happen if I keep the ResultSet open (not fully iterated) for a whole day?
What I don't care about:
consistency. If a new record is written in the meantime, I'm OK with fetching it. However, I'm also fine if I don't get it.
fault tolerance. If some node fails during that process, I'm fine with the query failing too. However, I would like to be able to detect that from the client's perspective.
What I care about:
Cassandra resource utilization - I don't want to cause a cluster outage due to blocked resources
latency - I don't want to block (or significantly slow down) the cluster for other consumers of that table
I would like to get all records that existed when I started the query (assuming no deletions). However, they don't have to be up to date.
The paging state is just information about the last data read (literally the serialized partition key, clustering columns, and remaining count). When it is sent to the coordinator, the coordinator simply looks for everything greater than that, so no server-side resources are held for your query and there is no performance impact versus a normal read.
Cassandra has no feature that provides isolation, even within a single query. If data has changed between fetching one page and the next, you will get the up-to-date information.
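As a sketch with the Java driver (fetch size and table names are placeholders), each page is fetched lazily as you iterate, carrying the paging state along, so nothing stays open on the server between pages:

    import com.datastax.driver.core.*;

    public class SlowFullScan {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("my_ks")) {

                Statement stmt = new SimpleStatement("SELECT * FROM my_table")
                        .setFetchSize(1000);            // one page = 1000 rows
                ResultSet rs = session.execute(stmt);   // only the first page is fetched here

                for (Row row : rs) {
                    // Time-consuming work per row; the next page is requested
                    // (with the paging state) only once the current page is drained.
                    process(row);
                }
            }
        }

        static void process(Row row) { /* placeholder for the per-row work */ }
    }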

Run VoltDB stored procedures at a regular interval from within VoltDB

Is there any way to execute VoltDB stored procedures at a regular interval, or to schedule a stored procedure to run at a specific time?
I am exploring VoltDB to move our product from an RDBMS to VoltDB. Our product is written in Java.
Most of the queries can be migrated into VoltDB stored procedures. But in our product we have a cron job in Oracle that executes at a regular interval, and I cannot find such a feature in VoltDB.
I know VoltDB stored procedures can be called from the application at a regular interval, but our product is deployed in an active-active mode; in that case every application instance would call the stored procedure at that interval, which is not a good solution, or else we would have to develop some mechanism to run the procedure from one instance only.
So it would be good to have a cron-job feature in VoltDB.
I work at VoltDB. There isn't currently a feature like this in VoltDB, for example like DBMS_JOB in Oracle.
You could certainly use a cron job on one of the servers in your cluster, or on some other server within your network, that invokes sqlcmd to run a script, or echoes individual SQL statements or procedure calls through sqlcmd to the database. Making cron jobs highly available is a general problem. You might find these other discussions helpful:
How to convert Linux cron jobs to "the Amazon way"?
https://www.reddit.com/r/linuxadmin/comments/3j3bz4/run_cronjob_only_on_one_node_in_cluster/
You could also look into something like rcron.
One thing to be careful of when converting from an RDBMS to VoltDB is that VoltDB is optimized for processing many small transactions in parallel across many partitions. While the architecture of serialized execution per partition excels for many operational and streaming workloads, it is not designed to perform bulk operations on many rows at a time, especially transactions that need to perform writes on many rows that may be in different partitions within one transaction.
If you have a periodic job that does something like "process all the new rows that meet some criteria" you may find this transaction is slow and every time it runs it could delay other parts of the workload, especially if many rows have accumulated. It would be more the "VoltDB Way" to replace a simple INSERT statement that you may be using to ingest data (to be processed later by a scheduled job) with a procedure that inserts and immediately processes the row of data. You might even need a procedure that checks for other records and processes small sets of rows as a group, for example stitching together segments of data that go together but may have arrived out of order. By operating on fewer records at a time within one partition at a time, this type of procedure would be more scalable and would keep the data closer to your desired finished state in real time, rather than always having some data waiting to be processed.
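For example, instead of a plain INSERT that a scheduled job processes later, the ingest path could be a single-partition procedure along these lines (a sketch only; the table, columns and procedure name are made up):

    import org.voltdb.SQLStmt;
    import org.voltdb.VoltProcedure;
    import org.voltdb.VoltTable;

    // Declared in the DDL with something like:
    // CREATE PROCEDURE PARTITION ON TABLE readings COLUMN device_id FROM CLASS IngestAndProcess;
    public class IngestAndProcess extends VoltProcedure {

        public final SQLStmt insert = new SQLStmt(
                "INSERT INTO readings (device_id, ts, value) VALUES (?, ?, ?);");
        public final SQLStmt rollup = new SQLStmt(
                "UPDATE device_totals SET total = total + ? WHERE device_id = ?;");

        // Insert the new row and immediately fold it into a running aggregate,
        // all inside one single-partition transaction.
        public VoltTable[] run(long deviceId, long ts, double value) {
            voltQueueSQL(insert, deviceId, ts, value);
            voltQueueSQL(rollup, value, deviceId);
            return voltExecuteSQL(true);
        }
    }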

Hazelcast - when a new cluster member is merging, is the new member operational?

When a new member joins a cluster, table repartitioning and data merge will happen.
If the data is large, I believe it will take some time. While it is happening, what is the state of the cache like?
If I am using embedded mode, does it block my application until the merging is completed? Or, if I don't want to work with an incomplete cache, do I need to wait (somehow) before starting my application's operations?
Partition migration will start as soon as the member joins the cluster. It will not block your application because it will progress asynchronously in the background.
Only mutating operations that fall into a migrating partition are blocked. Read-only operations are not blocked.
Mutating operations will get a PartitionMigrationException, which is a RetryableHazelcastException, so they will be retried for 2 minutes by default. If your partitions are small, migrating each one takes less time; you can make them smaller by increasing the partition count via the system property hazelcast.partition.count.
If you want to block your application until all migrations finish, you can check the isClusterSafe method to make sure there are no migrating partitions in the cluster. But beware that isClusterSafe returns the status of the whole cluster rather than of the current member, so it might not be something to rely on. Instead, I would recommend not blocking the application while partitions are migrating.
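For reference, the check looks roughly like this with the embedded Java API (a sketch assuming Hazelcast 3.x; polling in a loop is deliberately crude):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.PartitionService;

    public class WaitForMigrations {
        public static void main(String[] args) throws InterruptedException {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            PartitionService partitionService = hz.getPartitionService();

            // Cluster-wide check: no partitions migrating and all backups in sync.
            // Note: this reflects the whole cluster, not just the local member.
            while (!partitionService.isClusterSafe()) {
                Thread.sleep(1000);
            }
            // ... start application operations here ...
        }
    }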

Is my Cassandra configuration correct?

I have a Cassandra cluster with three nodes under normal circumstances. When I send a write request to the cluster from node.js, I want all nodes to acknowledge the write; when reading, I want to be able to read from whichever node I am connected to. I want this setup to keep working when one of the three nodes has died. I chose replication factor = 3, consistency = 2.
How should I implement such a configuration? Is my config correct?
With my respects...
Unfortunately I have no real clue about the numeric values provided by the node.js driver, but I know something about the consistency levels, which I suspect are being used in the background, assuming you are using this driver: http://datastax.github.io/nodejs-driver/.
Just a basic thing: the nodes don't write back to you directly. Your query is sent to one node, the coordinator of that query, which then distributes the query in your cluster according to your consistency level specification (at least for a simple query; more complex distribution logic applies in the case of batch queries). The coordinator then reports back to you when the query has been executed.
Whether your requirements can be fulfilled at all depends on the replication factor you chose; the problem is that Cassandra only knows a fixed set of consistency levels. The options for writing are: all (which at first looks like what you want), quorum (which is also an option), one and any. So let's assume you choose all, because you want to write to all replicas. That's totally fine, but if one of the nodes goes down there will be failing writes, because one of the replicas could not be updated. Since you are actually using replication factor 3, you can fall back to writing with quorum, which is 2 nodes in this case. What should happen if another node fails? I know, very unlikely, but I've seen it in production, so it happens from time to time. Should the single remaining node still be updated in that case? Then you need to fall back to consistency level one. Everything fine.
But what if you choose the replication factor to be 5? Well, there is no way of saying: I want 4 nodes. You can only have a quorum in case of a failure of one node, and that would be 3, not 4. And the next fallback would be 1 and not 2.
The final question is: if you lose one node and you fall back in the writing part, what happens when the node comes back (assuming there are lost updates because some of the hinted handoffs have already been discarded)? The reading part of your application can always read stale data, because you only ever read from a single node. It seems to me like you are trying to compensate for that in the write part. My personal suggestion would be to use quorum for both reading and writing; that way it is guaranteed that you read current data, and a single node can go down (with replication factor 5 it would even be 2 nodes). Also keep in mind that when you write to a node, Cassandra will always attempt to write to the other replicas in the background, so it tries to keep your data up to date. The risk of reading stale data even with a consistency level pair of one-one can be acceptable if you really need the speed.
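The question uses the node.js driver, which I can't speak to, but the idea is driver-independent; a minimal sketch with the Java driver (keyspace, table and columns are placeholders):

    import com.datastax.driver.core.*;

    public class QuorumEverywhere {
        public static void main(String[] args) {
            // With replication factor 3, QUORUM means 2 replicas, so reads and writes
            // keep working (and overlap on at least one replica) while one node is down.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.QUORUM)) // default for all statements
                    .build();
                 Session session = cluster.connect("my_ks")) {

                session.execute("INSERT INTO tbl (key, c) VALUES (5, 'data')");
                Row row = session.execute("SELECT c FROM tbl WHERE key = 5").one();
            }
        }
    }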

Why does a Cassandra cluster need synchronized clocks between nodes?

In the DataStax introductory Cassandra course they say that the clocks of all nodes in a Cassandra cluster have to be synchronized, in order to prevent READ queries from returning 'old' data.
If one or more nodes are down they cannot get updates, but as soon as they are back up again they will catch up and there is no problem...
So why does a Cassandra cluster need synchronized clocks between nodes?
In general it is always a good idea to keep your server clocks in sync, but the primary reason clock sync is needed between nodes is that Cassandra uses a concept called 'Last Write Wins' to resolve conflicts and determine which mutation represents the most correct, up-to-date state of the data. This is explained in Why cassandra doesn't need vector clocks.
Whenever you 'mutate' (write or delete) column(s) in Cassandra, a timestamp is assigned by the coordinator handling your request. That timestamp is written with the column value in a cell.
When a read request occurs, Cassandra builds your results by finding the mutations that match your query criteria, and when it sees multiple cells representing the same column it picks the one with the most recent timestamp (the read path is more involved than this, but that is all you need to know in this context).
Things start to become problematic when your nodes' clocks get out of sync. As I mentioned, the coordinator node handling your request assigns the timestamp. If you make multiple mutations to the same column and different coordinators are assigned, you can end up in situations where writes that happened in the past are returned instead of the most recent one.
Here is a basic scenario that describes that:
Assume we have a 2-node cluster with nodes A and B, and an initial state where A's clock is at time t10 and B's is at time t5.
A user executes DELETE C FROM tbl WHERE key=5. Node A coordinates the request and it is assigned timestamp t10.
A second passes and a user executes UPDATE tbl SET C='data' WHERE key=5. Node B coordinates the request and it is assigned timestamp t6.
The user executes the query SELECT C FROM tbl WHERE key=5. Because the DELETE from step 1 has a more recent timestamp (t10 > t6), no results are returned.
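You can reproduce the effect directly by forcing the timestamps with USING TIMESTAMP (a sketch through the Java driver; the keyspace is made up, and tbl/C are the table and column from the steps above):

    import com.datastax.driver.core.*;

    public class LastWriteWinsDemo {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo_ks")) {

                // Step 1: the delete gets the "later" timestamp (node A's fast clock).
                session.execute("DELETE C FROM tbl USING TIMESTAMP 10 WHERE key = 5");

                // Step 2: the update, issued later in wall-clock time, gets an
                // "earlier" timestamp (node B's slow clock).
                session.execute("UPDATE tbl USING TIMESTAMP 6 SET C = 'data' WHERE key = 5");

                // Step 3: the delete wins because 10 > 6, so no value for C comes back.
                Row row = session.execute("SELECT C FROM tbl WHERE key = 5").one();
            }
        }
    }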
Note that newer versions of the DataStax drivers default to client timestamps, i.e. your client application generates and assigns timestamps to requests instead of relying on the Cassandra nodes to assign them. The DataStax java-driver defaults to client timestamps as of 3.0 (read more about that under 'Client-side generation'). This is very nice if all requests come from the same client; however, if you have multiple applications writing to Cassandra, you now have to worry about keeping your client clocks in sync.
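With the 3.x Java driver, the client-side timestamp generator can also be configured explicitly; a minimal sketch (contact point and keyspace are placeholders):

    import com.datastax.driver.core.AtomicMonotonicTimestampGenerator;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class ClientTimestamps {
        public static void main(String[] args) {
            // Write timestamps are generated in this client process, so all writes from
            // this application are ordered by the client's clock, not the coordinators'.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")
                    .withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
                    .build();
                 Session session = cluster.connect("demo_ks")) {

                session.execute("UPDATE tbl SET C = 'data' WHERE key = 5");
            }
        }
    }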
