Which Couchbase node will serve a request? - node.js

I have a NodeJS service which talks to a Couchbase cluster to fetch data. The Couchbase cluster has 4 nodes (running on ip1, ip2, ip3, ip4), and the service also runs on the same 4 servers. On all of the NodeJS services my connection string looks like this:
couchbase://ip1,ip2,ip3,ip4
but whenever I try to fetch some document from bucket X, the console shows that the node on ip4 is doing that operation. No matter which NodeJS application makes the request, the same ip4 serves all the requests.
I want each NodeJS server to use its own Couchbase node so that RAM and CPU consumption is equal across all the servers, so I changed the order of IPs in the connection string, but every time the request is still served by the same ip4.
I created another bucket, put my data in it, and tried to fetch it, but again the request went to the same ip4. Can someone explain why this is happening, and can it cause high load on one of the nodes?

What do you mean by "I want each NodeJS server to use their couchbase node"?
In Couchbase, part of the active dataset is on each node in the cluster. The sharding is automatic. When you have a cluster, the 1024 active vBuckets (shards) for each bucket are spread out across all the nodes of the cluster, so with your 4 nodes there will be 256 vBuckets on each node. Given the consistent hashing algorithm used by the Couchbase SDK, the SDK can tell from the key which vBucket that object goes into and, combined with the cluster map it got from the cluster, knows which node that vBucket lives on. So an app will be getting data from each of the nodes in the cluster if you have it configured correctly, as the data is evenly spread out.
On the file system, as part of the Couchbase install, there is a CLI tool called vbuckettool that takes an object ID and a cluster map as arguments. All it does is the consistent hashing algorithm plus the cluster map lookup, so you can actually predict where an object will go even if it does not exist yet.
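For illustration, here is a minimal Java sketch of this kind of key-to-vBucket mapping. The CRC32-based formula and the even vBucket-to-node split below are assumptions for illustration only; the real SDK derives the mapping from the cluster map it downloads, so treat this as a conceptual sketch rather than the SDK's implementation.

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketSketch {

    // Number of vBuckets in a Couchbase bucket.
    static final int NUM_VBUCKETS = 1024;

    // Map a document key to a vBucket id (assumed CRC32-based formula,
    // shown for illustration only).
    static int vBucketForKey(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    public static void main(String[] args) {
        // With 4 nodes and 1024 vBuckets, each node owns ~256 active vBuckets.
        // The cluster map (vBucket -> node) is what the SDK downloads at connect time;
        // here we just assume an even split to show the idea.
        String key = "user::1234";
        int vbid = vBucketForKey(key);
        int nodeIndex = vbid / 256; // hypothetical even distribution across 4 nodes
        System.out.println("key=" + key + " vBucket=" + vbid + " -> node ip" + (nodeIndex + 1));
    }
}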
On a different note, the best practice in production is to not run your application on the same nodes as Couchbase. It really is supposed to be separate, to get the most out of its shared-nothing architecture, among other reasons.

Related

Sharing a Hazelcast cache between multiple applications and using write-behind and read-through

Question - Can I share the same Hazelcast cluster (cache) between multiple applications while using the write-behind and read-through functionality via MapStore and MapLoader?
Details
I have an enterprise environment with multiple applications and want to use a single cache.
I have multiple applications (microservices), i.e. APP_A, APP_B and APP_C, independent of each other.
I am running one instance of each application, and each node will be a member node of the cluster.
APP_A has MAP_A, APP_B has MAP_B and APP_C has MAP_C. Each application has a MapStore for its respective map.
If a client sends a command instance.getMap("MAP_A").put("Key","Value"), the behavior is inconsistent: sometimes I see the data persisted in the database, but sometimes not.
Note - I want to use the same Hazelcast instance across all applications, so that app A can access data from app B and vice versa.
I am assuming this depends on which node handles the request. If the request is handled by node A then it works fine, but it fails if the request is handled by node B or C. I assume this is because the MapStore_A implementation is not available on nodes B and C.
Am I doing something wrong? Is there something we can do to overcome this issue?
Thanks in advance.
Hazelcast is a clustered solution. If you have multiple nodes in the cluster, the data in each may get moved from place to place when data rebalancing occurs.
As a consequence of this, map store and map loader operations can occur from any node.
So all nodes in the cluster need the same ability to connect to the database.
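As a rough sketch of what that means in practice, the MapStore configuration below (names like MAP_A and com.example.MapAStore are modeled on the question; the write-delay value is arbitrary) has to be present on every member, together with the MapStore class and its database driver on the classpath, because the partition owning any given key can live on any member.

import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class MemberWithMapStore {

    public static void main(String[] args) {
        Config config = new Config();

        // Write-behind MapStore for MAP_A. Every member needs this class on its
        // classpath and the ability to reach the database, because the member
        // owning a given key (and therefore calling store/load) can be any of them.
        MapStoreConfig mapStoreConfig = new MapStoreConfig()
                .setEnabled(true)
                .setClassName("com.example.MapAStore") // hypothetical MapStore implementation
                .setWriteDelaySeconds(5);              // > 0 means write-behind

        MapConfig mapConfig = new MapConfig("MAP_A")
                .setMapStoreConfig(mapStoreConfig);
        config.addMapConfig(mapConfig);

        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
    }
}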

Need more insight into Hazelcast Client and the ideal scenario to use it

There is already a question on the difference between Hazelcast Instance and Hazelcast client.
And it is mentioned that
HazelcastInstance = HazelcastClient + AnotherFeatures
So is it right to say the client just reads from and writes to the cluster without getting involved in it, i.e. the client does not store the data?
This is important to know, since we can configure JVM memory according to usage: the instances forming the cluster will be allocated more memory than the ones that connect only as clients.
It is a little bit more complicated than that. The Hazelcast Lite Member is a full-blown cluster member, without getting partitions assigned. That said, it doesn't store any data but otherwise behaves like a normal member.
Clients, on the other hand, are simple proxies that have to forward everything to one cluster member to get any operation done. You can imagine a Hazelcast client to be something like a JDBC client: it has just enough code to connect to the cluster and redirect requests / retrieve responses.
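To make the distinction concrete, here is a rough Java sketch of the three ways to join or talk to a cluster. Package names and the setLiteMember call assume a Hazelcast 3.x-era API, and the address is a placeholder; adjust for your version.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class JoinModes {

    public static void main(String[] args) {
        // 1. Full member: joins the cluster and owns partitions (stores data).
        HazelcastInstance member = Hazelcast.newHazelcastInstance(new Config());

        // 2. Lite member: full-blown cluster member, but no partitions assigned,
        //    so it stores no data. It can be given less heap than data members.
        Config liteConfig = new Config();
        liteConfig.setLiteMember(true);
        HazelcastInstance liteMember = Hazelcast.newHazelcastInstance(liteConfig);

        // 3. Client: not part of the cluster at all; a proxy that forwards every
        //    operation to some member and reads back the response.
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress("127.0.0.1:5701"); // placeholder address
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
    }
}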

Unable to understand why N1QL queries in Couchbase hang

I have a Couchbase cluster setup (Couchbase version 4.1) with N data nodes, 1 query node and 1 index node. The data nodes have roughly 1 million key-value pairs in a single bucket. The whole setup is hosted in Microsoft Azure within a virtual network, and I can assure you that each node has enough resources, so RAM, CPU or disk is not an issue.
I can GET/SET JSON documents in my Couchbase server without any issue. I am just testing, so ports are not an issue either, as I have opened all ports between the machines for now.
But when I try to run N1QL queries (from the Couchbase shell or using the Python SDK) it does not work. The query just hangs and I don't get any reply from the server. Once in a while the query works without any issue, and then after a minute it stops working again.
I have created a PRIMARY index on my bucket and any other required global secondary indexes.
I also installed the sample buckets provided by Couchbase; the same problems exist.
Does anyone have a clue what the issue could be?
Your query probably hangs because you are straining the server too much. I don't know how many N1QL ops per second you are pushing, but for that type of query you will benefit most from a few tweaks that lower CPU usage and increase efficiency.
Create a specific covering index such as:
create index inx_id_email on clients(id,email) where transaction_successful=false
Use the EXPLAIN keyword to check whether your query is using the index:
EXPLAIN SELECT id, email FROM clients WHERE transaction_successful = false LIMIT 100 OFFSET 200
I believe your query/index nodes are utilized too heavily because you are effectively doing the equivalent of a primary (full) scan in relational databases.
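If you want to run that EXPLAIN check from code rather than from the shell, a rough sketch with the Couchbase Java SDK 2.x follows (the question uses the Python SDK, so this is just the same idea in Java; the node address and bucket name "clients" are placeholders):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;
import com.couchbase.client.java.query.N1qlQueryRow;

public class ExplainCheck {

    public static void main(String[] args) {
        Cluster cluster = CouchbaseCluster.create("ip-of-a-node"); // placeholder address
        Bucket bucket = cluster.openBucket("clients");             // placeholder bucket name

        // EXPLAIN returns the plan instead of the rows; look for the index name
        // (e.g. inx_id_email) rather than a primary scan in the output.
        N1qlQueryResult result = bucket.query(N1qlQuery.simple(
                "EXPLAIN SELECT id, email FROM clients "
                + "WHERE transaction_successful = false LIMIT 100 OFFSET 200"));

        for (N1qlQueryRow row : result) {
            System.out.println(row);
        }

        cluster.disconnect();
    }
}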

How to handle read/write requests in Cassandra

I have a 5-node cluster with 2 Cassandra, 2 Solr and 1 Hadoop node on EC2 with DSE 4.5.
My requirement is that I don't want to hard-code node IP addresses when reading from or writing to the cluster. I have to develop a web service through which a requester can send read/write requests to my cluster, and the web service has to determine the following:
1) route read requests to the appropriate node.
2) route write requests to the appropriate node.
If there is a write request then it should be directed to a Cassandra node on the basis of the keyspace and replication factor. If it is a read request then the request should be routed to a Solr node (as I have done the indexing in Solr), and if there is an analytics query then the request should be routed to Hadoop.
And if any node goes down, the response should not be affected.
Apart from dedicated requests, is there any way to request the cluster?
By dedicated I mean giving a specific IP address for reads and writes.
Does any method or algorithm exist in DSE for this? Or is there any tool available for it?
The Java driver should take care of all of that for you:
http://www.datastax.com/documentation/developer/java-driver/2.0/common/drivers/introduction/introArchOverview_c.html
For example:
Nodes discovery: the driver automatically discovers and uses all nodes of the Cassandra cluster, including newly bootstrapped ones
Configurable load balancing: the driver allows for custom routing and load balancing of queries to Cassandra nodes. Out of the box, round robin is provided with optional data-center awareness (only nodes from the local data-center are queried (and have connections maintained to)) and optional token awareness (that is, the ability to prefer a replica for the query as coordinator).
Transparent failover: if Cassandra nodes fail or become unreachable, the driver automatically and transparently tries other nodes and schedules reconnection to the dead nodes in the background.
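As a hedged example of wiring up that discovery and routing with the Java driver 2.0, the sketch below uses a token-aware, DC-aware policy; the contact points, data-center name and keyspace are placeholders, not values from the question.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class DriverRouting {

    public static void main(String[] args) {
        // Contact points are only used to discover the rest of the cluster;
        // after that, the driver knows every node and routes requests itself.
        Cluster cluster = Cluster.builder()
                .addContactPoints("10.0.0.1", "10.0.0.2")              // placeholder seed IPs
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(                          // prefer a replica as coordinator
                                new DCAwareRoundRobinPolicy("Cassandra"))) // placeholder DC name
                .build();

        Session session = cluster.connect("my_keyspace");              // placeholder keyspace

        // Failed or unreachable nodes are skipped transparently and retried later.
        session.execute("SELECT * FROM my_table LIMIT 10");

        cluster.close();
    }
}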
On the Solr query side, you can use the SolrJ load balancer, but you have to hard-wire the list of nodes to be used as coordinator nodes; SolrJ will then round-robin across them for you.

Which Cassandra node should I connect to?

I might be misunderstanding something here, as it's not clear to me how I should connect to a Cassandra cluster. I have a Cassandra 1.2.1 cluster of 5 nodes managed by Priam, on AWS. I would like to use Astyanax to connect to this cluster using code similar to the code below:
conPool = new ConnectionPoolConfigurationImpl(getConecPoolName())
        .setMaxConnsPerHost(CONNECTION_POOL_SIZE_PER_HOST)
        .setSeeds(MY_IP_SEEDS)
        .setMaxOperationsPerConnection(100); // 10000
What should I use as MY_IP_SEEDS? Should I use the IPs of all my nodes separated by commas? Or should I use the IP of just one machine (the seed machine)? If I use the IP of just one machine, I am worried about overloading it with too many requests.
I know Priam has a "get_seeds" REST API (https://github.com/Netflix/Priam/wiki/REST-API) that for each node returns a list of IPs, and I also know there is one seed per RAC. However, I am not sure what would happen if the seed node goes down... I would need to connect to the others when making new connections, right?
Seed nodes are only for finding the way into the cluster on node startup - no overload problems.
Of course one of the nodes must be reachable and up in the cluster to get the new one up and running.
So the best way is to update the seed list from Priam before starting the node. Priam should be behind an automatically updated DNS entry.
If you're after the highest availability, you should regularly fetch the current list of seeds from Priam and store it in a mirrored fashion, just as you store your Puppet or Chef config, to be able to get nodes up even when Priam isn't reachable.
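A rough Java sketch of that "refresh the seed list from Priam" step; the REST path and the assumption that the response is a comma-separated list of IPs are taken from the Priam wiki and should be verified against your Priam version.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PriamSeeds {

    // Assumed Priam endpoint returning a comma-separated list of seed IPs;
    // check the path against your Priam version's REST API.
    static final String GET_SEEDS_URL = "http://localhost:8080/Priam/REST/v1/cassconfig/get_seeds";

    static String fetchSeeds() throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(GET_SEEDS_URL).openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            return in.readLine(); // e.g. "10.0.0.1,10.0.0.2,10.0.0.3" (assumed format)
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        String seeds = fetchSeeds();
        // Persist this somewhere mirrored (S3, config repo, ...) and feed it into
        // Astyanax via setSeeds(seeds) so new nodes can start even if Priam is down.
        System.out.println("Current seeds: " + seeds);
    }
}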
