So I have an app I am working on and I am wondering if I am doing it correctly.
I am running the cluster module in my Node.js app. I couldn't find anything that states whether I should only run cluster on a single server or whether it is okay to run it on a cluster of servers. If I continue down the road I am going, I will have a cluster inside a cluster.
So that the answers are not just opinions, here is my question: was the cluster package made to do what I am doing (a cluster of workers on a single server inside a cluster of servers)?
Thanks in advance!
Cluster wasn't specifically designed for that, but there is nothing about it which would cause a problem. If you've designed your app to work with cluster, it's a good indication that your app will also scale across multiple servers. The main gotcha would be if you're doing anything stateful on the filesystem. For example, if a user uploads a photo and you store it on the server disk, that would be problematic when scaling out across multiple servers (that don't share the same disk).
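A minimal sketch of the single-server half of that setup, assuming a plain HTTP server: the primary process forks one worker per CPU core and each worker listens on the same port. The names `workerCount` and `start` and the port 3000 are illustrative and not from the original post.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Decide how many workers to fork: the requested number if it is a
// positive integer, otherwise one per CPU core (hypothetical helper).
function workerCount(requested) {
  if (Number.isInteger(requested) && requested > 0) return requested;
  return Math.max(1, os.cpus().length);
}

// Fork workers in the primary process; serve HTTP in each worker.
function start(port, requested) {
  // `isPrimary` exists on newer Node.js releases, `isMaster` on older ones.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    const count = workerCount(requested);
    for (let i = 0; i < count; i++) cluster.fork();
    // Replace a worker that exits so capacity stays constant.
    cluster.on('exit', () => cluster.fork());
  } else {
    http.createServer((req, res) => {
      res.end('handled by worker ' + process.pid + '\n');
    }).listen(port);
  }
}

// Usage (not invoked here): start(3000);
```

Running the same script on several servers behind a load balancer gives the "cluster inside a cluster" layout. Nothing in the sketch depends on which machine it runs on, as long as no state is kept on local disk.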
I have a distributed map stored in Hazelcast. My Hazelcast cluster runs in a cloud, either private or public. My app may not run on the same network as the Hazelcast cluster.
My app accesses the distributed map using IMap.get(), possibly thousands of times per second. I tried to measure the performance of this operation by running a Hazelcast cluster on my local machine, and I could read everything in 15-20ms. But I am not getting the same performance when the Hazelcast cluster runs in the cloud.
If I read the map more frequently, will it increase the load on Hazelcast in the cloud environment? If yes, what are the reasons?
Performance of running software locally will always be different than running in a distributed environment, more so when servers are located elsewhere - network latencies being the most prominent factor.
Servers in the cloud with the app running locally is not a recipe for the best performance. Either move all cluster components, servers and app clients, into one network (aim for the same availability zone if you want the best performance) or expect delays. It's not the cloud in particular that degrades performance, it's the way the VMs are set up in the cloud. For example, if one VM is in us-east-1, another is in London, and your app is in Tokyo, then expect inferior performance numbers.
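To see why network distance dominates, a rough back-of-the-envelope sketch helps. It assumes each IMap.get() costs about one network round trip and ignores server-side processing. The function name and the round-trip figures are hypothetical, not measurements.

```javascript
// Rough total time for `calls` gets, each costing about one round trip.
// `parallel` is how many gets are in flight at once (defaults to 1).
function estimateMillis(calls, roundTripMs, parallel) {
  const inFlight = Math.max(1, parallel || 1);
  return Math.ceil(calls / inFlight) * roundTripMs;
}

// Hypothetical round-trip times: same machine vs. a remote cloud region.
console.log(estimateMillis(1000, 1, 1));   // local, one get at a time
console.log(estimateMillis(1000, 80, 1));  // cross-region, one at a time
console.log(estimateMillis(1000, 80, 50)); // cross-region, 50 in flight
```

Under these assumptions the cost scales with the round-trip time, which is why placing the app in the same network as the cluster matters more than any client-side tuning.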
I've been using Couchbase for my database solution and so far it looks very good.
I'm confused however with connecting to a Cluster. A Cluster is just a group of nodes so when you use the API to connect to a Cluster what do you use as the IP? Do you just use one of the nodes in the Cluster? Does it matter which one?
I'm personally using the Node.js API.
Technically, all you need is one node in the list. As soon as the client connects to that node, it gets the cluster map of the entire cluster and learns about all of the other nodes. No, it does not matter which node.
That being said, best practice is to have at least 3 nodes of the cluster listed in the connection string or better yet if the SDK you are using supports it, use a DNS SRV record with at least 3 nodes in there. With three nodes in the list if for some reason (e.g. server failure or maintenance) one of the nodes is unavailable, you can still bootstrap an application server to get that cluster map with one of the other nodes in the list.
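A small sketch of that advice, assuming the `couchbase://host1,host2,...` connection string format. The helper name `bootstrapString` and the host names are hypothetical; the resulting string would then be passed to whatever connect call your SDK version provides.

```javascript
// Build a couchbase:// connection string from a list of bootstrap hosts.
// Hypothetical helper; the host names below are placeholders.
function bootstrapString(hosts) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error('at least one bootstrap host is required');
  }
  if (hosts.length < 3) {
    // With fewer hosts, bootstrapping can fail if a listed node is down.
    console.warn('fewer than 3 bootstrap hosts listed');
  }
  return 'couchbase://' + hosts.join(',');
}

const connStr = bootstrapString(['server1', 'server2', 'server3']);
console.log(connStr);
```

The list only affects bootstrapping: once any listed node answers, the client learns the rest of the cluster from the cluster map.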
I asked this question a few months ago on the Couchbase forums, and the author of the Node.js module answered that you should use "some" of them, like:
cluster.openBucket("couchbase://server1,server2,server3", function(err) {});
If server4 and server5 are added later, the client will pick them up automatically as soon as they are available in the cluster.
Check here for details : https://forums.couchbase.com/t/couchnode-connection-to-cluster/6281
I'm working on a project with Node.js that involves a server. Due to the large number of jobs, I need clustering to divide the jobs between different servers (different physical machines). Note that my jobs have nothing to do with the internet, so I cannot use stateless connections (or Redis to keep state) with a load balancer in front of the servers to distribute the connections.
I have already read about the "cluster" module but, from what I understood, it only scales across multiple processors on the same machine.
My question: is there any suitable distributed module available in Node.js for my work? What about Apache Mesos? I have heard that Mesos can abstract multiple physical machines into a single server. Is that correct? If yes, is it possible to use the Node.js cluster module on top of Mesos, since we now have only one virtual server?
Thanks
My question: is there any suitable distributed module available in Node.js for my work?
Don't know.
I have heard that Mesos can abstract multiple physical machines into a single server. Is that correct?
Yes, almost. It allows you to pool resources (CPU, RAM, disk) across multiple machines, lets you allocate resources to your applications, and runs and manages those applications. So you can ask Mesos to run X instances of Node.js and specify how many resources each instance needs.
http://mesos.apache.org
https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
If yes, is it possible to use the Node.js cluster module on top of Mesos, since we now have only one virtual server?
Admittedly, I don't know anything about node.js or clustering in node.js. Going by http://nodejs.org/api/cluster.html, it just forks off a bunch of child workers and then round robins the connection between them. You have 2 options off the top of my head:
Run Node.js on Mesos using an existing framework such as Marathon. This will be the fastest way to get something going on Mesos. https://github.com/mesosphere/marathon
Create a Mesos framework for Node.js, which essentially does what the Node.js cluster module does, but across machines. http://mesos.apache.org/documentation/latest/app-framework-development-guide/
With both of these solutions, you have the option of letting Mesos create as many instances of Node.js as you need, or of using Mesos to run the Node.js cluster module on each machine and letting it manage all the workers on that machine.
I didn't google, but there might already be a node.js mesos framework out there!
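As a rough illustration of the Marathon option, the sketch below builds an app definition that asks for several instances of a Node.js process. The field names (`id`, `cmd`, `cpus`, `mem`, `instances`) are assumed to follow Marathon's app JSON, and the app id, command, and resource numbers are placeholders. Check the Marathon documentation before relying on any of it.

```javascript
// Build a Marathon-style app definition asking for `instances` copies of a
// Node.js process. Field names are assumed to match Marathon's app JSON;
// the id, command, and resource numbers are placeholders.
function marathonApp(instances) {
  return {
    id: '/node-worker',
    cmd: 'node server.js',
    cpus: 0.5,   // CPU share per instance
    mem: 256,    // memory per instance, in MB
    instances: instances,
  };
}

// The JSON document that would be submitted to Marathon.
console.log(JSON.stringify(marathonApp(4), null, 2));
```

With this layout Mesos decides which machines run the instances, so the Node.js cluster module is only needed if each instance should also use several cores.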
I am new to Cassandra and I want to install it. So far I've read a small article on it.
But there is one thing that I do not understand: the meaning of 'node'.
Can anyone tell me what a 'node' is, what it is for, and how many nodes we can have in one cluster?
A node is the storage layer within a server.
Newer versions of Cassandra use virtual nodes, or vnodes. At the time of writing there were 256 vnodes per server by default; Cassandra 4.0 later lowered the default num_tokens to 16.
A vnode is essentially the storage layer.
machine: a physical server, EC2 instance, etc.
server: an installation of Cassandra. Each machine has one installation of Cassandra. The Cassandra server runs core processes such as the snitch, the partitioner, etc.
vnode: The storage layer in a Cassandra server. There are 256 vnodes per server by default.
Helpful tip:
Where you will get confused is that Cassandra terminology (in older blog posts, YouTube videos, and so on) has been used inconsistently. In older versions of Cassandra, each machine had one Cassandra server installed, and each server contained one node. Because of this 1-to-1-to-1 relationship between machine, server, and node, people used the three terms interchangeably.
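The machine/server/vnode relationship can be sketched as a toy token ring. This is an illustration under simplifying assumptions, not Cassandra's implementation: the hash is a simple string hash rather than the Murmur3 partitioner, replication is ignored, and all names are hypothetical.

```javascript
// Toy hash for illustration only; Cassandra's default partitioner is Murmur3.
function toyHash(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Give each server `vnodesPerServer` positions (tokens) on the ring.
function buildRing(servers, vnodesPerServer) {
  const ring = [];
  for (const server of servers) {
    for (let v = 0; v < vnodesPerServer; v++) {
      ring.push({ token: toyHash(server + '#' + v), server: server });
    }
  }
  return ring.sort((a, b) => a.token - b.token);
}

// A key belongs to the first vnode whose token is >= the key's hash,
// wrapping around to the start of the ring.
function ownerOf(ring, key) {
  const h = toyHash(key);
  const vnode = ring.find((entry) => entry.token >= h) || ring[0];
  return vnode.server;
}

const ring = buildRing(['server-a', 'server-b', 'server-c'], 8);
console.log(ownerOf(ring, 'user:42'));
```

Each server appears many times on the ring, once per vnode, which is why one machine can own many small, scattered slices of the data.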
Cassandra is a distributed database management system designed to handle large amounts of data across many commodity servers. Like all other distributed database systems, it provides high availability with no single point of failure.
The paragraph above should give you some idea. Generally, when we talk about Cassandra, we mean a Cassandra cluster, not a single PC. A node in a cluster is just a fully functional machine that is connected to the other nodes in the cluster through a high-speed internal network. All nodes work together so that, even if one of them fails due to an unexpected error, the cluster as a whole can still provide service.
All nodes in a Cassandra cluster are the same. There is no concept of a master node or slave nodes. There are multiple reasons for this design, and you can Google it for more details if you want.
Theoretically, you can have as many nodes as you want in a Cassandra cluster. For example, Apple reported running 75,000 nodes at the Cassandra Summit in 2014.
Of course, you can try Cassandra on one machine. It still works with just one node in the cluster.
What is meant by a node in Cassandra?
A Cassandra node is a place where data is stored.
A data center is a collection of related nodes.
A cluster is a component that contains one or more data centers.
In other words, it is a collection of multiple Cassandra nodes that communicate with each other to perform a set of operations.
In Cassandra, each node is independent and at the same time interconnected to other nodes.
All the nodes in a cluster play the same role.
Every node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster.
In the case of failure of one node, Read/Write requests can be served from other nodes in the network.
If you're looking to understand Cassandra terminology, then the following post is a good reference:
http://exponential.io/blog/2015/01/08/cassandra-terminology/
Cluster Stability: 1 - Experimental
Currently I'm working with Node.js. Are you guys using cluster in production? Should I go with nginx and run two Node processes in production? Please suggest.
I tried cluster with a single Node process, but didn't see much of a performance gain. :(
Did lots of trial and error with performance measurements and response times. Finally settled on N two-core machines in EC2, running two Node processes on different ports on each machine. Configured nginx on each machine to route requests to the two Node processes. Finally put all the machines under an ELB. Happy time :)
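One way to run the same script as two processes on different ports is to read the port from the environment. This is a minimal sketch, not the poster's actual setup: the `PORT` variable, the helper names, and ports 3000 and 3001 are assumptions.

```javascript
const http = require('http');

// Pick the listening port from the environment, falling back to a default
// when PORT is missing or not a valid port number.
function resolvePort(env, fallback) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

// Create (but do not start) the HTTP server for this process.
function createApp() {
  return http.createServer((req, res) => {
    res.end('pid ' + process.pid + '\n');
  });
}

// Usage (not invoked here), one command per process:
//   PORT=3000 node server.js
//   PORT=3001 node server.js
// createApp().listen(resolvePort(process.env, 3000));
```

nginx on the same machine would then list both ports as upstream servers and spread incoming requests across them.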