How to use sharding in ArangoDB? - node.js

I am new to the sharding feature of ArangoDB. I have gone through various documents related to it, but still can't find out how to configure it. Could anybody please provide a step-by-step procedure to set up sharding, or any reference that would be useful to me? I am using ArangoDB with node.js and AngularJS on Linux Mint.
I often visit https://docs.arangodb.com, but these pages were of little use:
https://docs.arangodb.com/2.5/Installing/Cluster.html and How to set clusters and sharding in ArangoDB?

Sharding is only available in ArangoDB when it is clustered.
Apart from configuring the number of shards a collection is distributed into, sharding is completely transparent to your AQL queries - the ArangoDB coordinator instances do all the heavy lifting for you. That is probably why shards don't get more detailed coverage there.
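For illustration, here is a minimal sketch of creating a sharded collection with the ArangoDB Java driver (arangojs exposes the same numberOfShards option for node.js); the host, database, and collection names are placeholder assumptions:

    import com.arangodb.ArangoDB;
    import com.arangodb.model.CollectionCreateOptions;

    public class ShardedCollection {
        public static void main(String[] args) {
            // Connect to a cluster coordinator (placeholder host/port).
            ArangoDB arango = new ArangoDB.Builder()
                    .host("coordinator1", 8529)
                    .build();
            // numberOfShards only takes effect when the server runs in
            // cluster mode; a single server silently ignores it.
            arango.db("mydb").createCollection("users",
                    new CollectionCreateOptions().numberOfShards(3));
            arango.shutdown();
        }
    }

After that, queries against the collection need no sharding-specific changes; the coordinator routes them to the right shards.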
The easiest way to set up an ArangoDB cluster is to use Mesosphere DC/OS: click on ArangoDB and choose the environment to deploy your cluster to.
A more detailed manual about setting up clusters, which also covers other installation methods, can be found here: https://docs.arangodb.com/3.0/Manual/Deployment/index.html
We will put more effort into explaining sharding in clusters soon.

Related

Alternative to Presto with fallback mechanism

I am using Presto as a querying layer over Cassandra for various aggregations, but I am facing an issue: if a node goes down or a timeout occurs for some reason, the running query fails. I need to have some kind of fallback mechanism.
Is there any alternative to Presto with which I can implement a fallback mechanism, or is there any way to implement it in Presto itself?
Some vendors, like Qubole, have implemented a retry mechanism specifically for these kinds of issues (which are more visible in the cloud, especially if you use spot nodes on AWS or preemptible VMs on GCP).
Note: this works only with Qubole's Managed Presto Service.
Disclaimer: I work for Qubole
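Outside a managed service, a plain client-side retry is one way to approximate this. Below is a rough sketch over the Presto JDBC driver; the coordinator URL, user, catalog/schema, and backoff policy are placeholder assumptions, and retrying is only safe for idempotent reads:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class RetryingPrestoQuery {

        // Retry an idempotent read up to maxAttempts times with linear backoff.
        static long countWithRetry(String url, String sql, int maxAttempts) throws SQLException {
            SQLException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try (Connection conn = DriverManager.getConnection(url, "presto_user", null);
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery(sql)) {
                    rs.next();
                    return rs.getLong(1);
                } catch (SQLException e) {
                    last = e; // node failure or timeout: back off and try again
                    try {
                        Thread.sleep(1000L * attempt);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
            throw last;
        }

        public static void main(String[] args) throws SQLException {
            // Placeholder coordinator address, catalog, and schema.
            String url = "jdbc:presto://coordinator:8080/cassandra/ks";
            System.out.println(countWithRetry(url, "SELECT count(*) FROM events", 3));
        }
    }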

Distributed Data Store - Hazelcast vs Cassandra

We need to choose between Hazelcast and Cassandra as a distributed data store. I have worked with Cassandra but not with Hazelcast, and would like a comparative analysis on features like:
Replication
Scalability
Availability
Data Distribution
Performance of reads/writes
Consistency
I would appreciate some help here in making the right choice.
The following page and the documents on it might help with your decision: https://hazelcast.com/use-cases/nosql/apache-cassandra-replacement/
https://db-engines.com/en/system/Cassandra%3BHazelcast

Using Spark in conjunction with Cassandra?

In our current infrastructure we use a Cassandra cluster as our backend database, and we provide a web UI, backed by Solr, for our customers to perform read queries on the database as necessary.
I've been asked to look into Spark as something that we could implement in the future, but I'm having trouble understanding how it will improve what we currently do.
So my basic questions are:
1) Is Spark something that would replace Solr for querying the database, like when a user is looking something up on our site?
2) Just as a general idea, what type of infrastructure would be necessary to improve our current situation (5 Cassandra nodes, all of which also run Solr)?
In other words, would we simply be looking at building another cluster of just Spark nodes?
3) Can Spark nodes run on the same physical machine as Cassandra? I'm guessing it would be a bad idea due to memory constraints as my very basic understanding of Spark is that it does everything in memory.
4) Any good quick/basic resources I can use to start figuring out how Spark might benefit us? I have access to Datastax Academy courses so I'm going through those, just wondering if there is anything else to help with my research.
Basically, once I figure out what it is and, more importantly, how/if it is something we can use to our advantage, I'll start playing with some test instances, but I should probably familiarize myself with the basics first.
1) No. Spark is a batch processing system and Solr is a live indexing solution. Latency on Solr is going to be sub-second, while Spark jobs are meant to take minutes (or more). There is really no situation where Spark can be a drop-in replacement for Solr.
2) I generally recommend a second datacenter running both C* and Spark on the same machines. It will receive the data from the first datacenter via replication.
3) Spark does not do everything in memory. Depending on your use case, it can be a great idea to run it on the same machines as C*. This allows for data locality when reading from C* and helps significantly with table scan times. I usually recommend colocating Spark executors and C* nodes (see the sketch after this list).
4) The DS Academy 320 course is probably the best resource out there at the moment. https://academy.datastax.com/courses/getting-started-apache-spark
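To make point 3 concrete, here is a minimal table-scan sketch using the DataStax spark-cassandra-connector Java API; the contact point, keyspace ks, and table events are placeholder assumptions:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

    public class LocalityScan {
        public static void main(String[] args) {
            // Placeholder contact point; aim this at the analytics datacenter.
            SparkConf conf = new SparkConf()
                    .setAppName("cassandra-table-scan")
                    .set("spark.cassandra.connection.host", "10.0.0.1");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // With executors colocated on the C* nodes, each Spark partition
            // is read from a local replica instead of over the network.
            long rows = javaFunctions(sc).cassandraTable("ks", "events").count();
            System.out.println("scanned " + rows + " rows");
            sc.stop();
        }
    }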

How to create nodes and collections (table equivalent in MySQL)

I am new to Cassandra.
I have more than 10 years of experience working with MySQL and SQL Server, which makes the move to NoSQL databases harder for me.
As you know, SQL databases provide very user-friendly workbenches for working with them.
But the only thing I found for Cassandra was DataStax, and I am not even sure whether I can create nodes and collections in a visual way rather than from the command line. Is it possible to do such a thing in Cassandra?
You can use OpsCenter and DevCenter, both from DataStax. OpsCenter lets you do operations-centric tasks, while DevCenter lets you perform developer-centric tasks.

How to integrate Cassandra with ZooKeeper to support transactions

I have a Cassandra cluster and a ZooKeeper server installed. Now I want to support transactions in Cassandra using ZooKeeper. How do I do that?
ZooKeeper creates znodes to perform read and write operations, and data flows back and forth through znodes. I want to know how to support rollback and commit in Cassandra using ZooKeeper. Is there any way to specify Cassandra configuration in ZooKeeper, or ZooKeeper configuration in Cassandra?
I know individually how data is read and written in Cassandra and in ZooKeeper, but I don't know how to integrate the two using Java.
How can we do transactions in Cassandra using ZooKeeper?
Thanks.
I have a Cassandra cluster and a ZooKeeper server installed. Now I want to support transactions in Cassandra using ZooKeeper. How do I do that?
With great difficulty. Cassandra does not work well as a transactional system. Writes to multiple rows are not atomic, there is no way to roll back writes if some of them fail, and there is no way to ensure readers see a consistent view when reading.
I want to know how to support rollback and commit in Cassandra using ZooKeeper.
ZooKeeper won't help you with this, especially the commit feature. You may be able to write enough information to ZooKeeper to roll back in case of failure, but if you are doing that, you might as well store the rollback information in Cassandra.
ZooKeeper and Cassandra work well together when you use ZooKeeper as a locking service. Look at the Cages library. Use ZooKeeper to coordinate reads and writes to Cassandra.
Trying to use Cassandra as a transactional system with atomic commits to multiple rows and rollbacks is going to be very frustrating.
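To show the locking pattern Cages is built around, here is a sketch using Apache Curator's InterProcessMutex recipe instead (a maintained alternative; the ensemble address and lock path are placeholders). The lock serializes concurrent writers; it does not add atomicity or rollback:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class LockedWrite {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble address.
            CuratorFramework zk = CuratorFrameworkFactory.newClient(
                    "zk1:2181", new ExponentialBackoffRetry(1000, 3));
            zk.start();
            // One lock znode per logical entity being updated.
            InterProcessMutex lock = new InterProcessMutex(zk, "/locks/accounts/42");
            lock.acquire();
            try {
                // Read-modify-write against Cassandra goes here; the lock only
                // keeps other cooperating clients from interleaving with it.
            } finally {
                lock.release();
            }
            zk.close();
        }
    }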
There are ways to implement transactions in Cassandra without ZooKeeper.
Cassandra itself has a feature called lightweight transactions, which provides per-key linearizability and compare-and-set. With such primitives you can implement serializable transactions at the application level yourself.
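A minimal compare-and-set sketch with the DataStax Java driver (contact point, keyspace, and table are placeholder assumptions); the special [applied] column reports whether the Paxos-backed condition held:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CasExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks"); // placeholder keyspace
            // Lightweight transaction: applied only if no row with this
            // primary key exists yet.
            ResultSet rs = session.execute(
                    "INSERT INTO accounts (id, owner) VALUES (42, 'alice') IF NOT EXISTS");
            Row row = rs.one();
            System.out.println("applied: " + row.getBool("[applied]"));
            cluster.close();
        }
    }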
Please see the Visualization of serializable cross shard client-side transactions post for details and a step-by-step visualization.
Variants of this approach are used in Google's Percolator system and in CockroachDB.
By the way, if you're fine with the Read Committed isolation level, then it makes sense to take a look at the RAMP transactions paper by Peter Bailis.
There is a BATCH feature in Cassandra's CQL3 (Cassandra 1.2 is the version that formally introduced CQL3), which can reputedly apply all the updates in the BATCH atomically, as one all-or-nothing unit.
This does not mean you can roll back a successfully executed BATCH as an RDBMS could; that would have to be done manually.
Depending on the consistency level and options you provide to the BATCH statement, the atomicity guarantees of the updates can be increased or decreased to some degree, e.g. weakened with the UNLOGGED option.
http://www.datastax.com/docs/1.2/cql_cli/cql/BATCH
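A short sketch of a LOGGED batch with the DataStax Java driver (keyspace and tables are placeholder assumptions); both statements eventually apply or neither does, but a batch that has already succeeded still cannot be rolled back:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class BatchExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("ks"); // placeholder keyspace
            PreparedStatement upd = session.prepare(
                    "UPDATE users SET email = ? WHERE id = ?");
            PreparedStatement ins = session.prepare(
                    "INSERT INTO users_by_email (email, id) VALUES (?, ?)");
            // Default constructor means a LOGGED batch; passing
            // BatchStatement.Type.UNLOGGED would drop the atomicity guarantee.
            BatchStatement batch = new BatchStatement();
            batch.add(upd.bind("a@example.com", 42));
            batch.add(ins.bind("a@example.com", 42));
            session.execute(batch);
            cluster.close();
        }
    }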
Well, I'm not an expert at this (far from it, actually), but the way I see it, either you deploy some middleware of your own to guarantee the specific properties you are looking for, or you have Cassandra write data to auxiliary files and then copy them through the file system, since the copy function in Java works as an atomic operation.
I don't know anything about the size of the data files you are considering, so I don't really know if it is doable; however, there might be a way to apply this property to smaller bits of information and then combine them as a whole.
Just my 2 cents...
