How to run the same script on several Linux systems concurrently - linux

I have a question related to latency benchmark. I run Apache ZooKeeper in a cluster of 5 machines (one leader and the rest are followers). There is another machine (client) used to sequence send requests to the protocol.
I manage to run a benchmark program which lasts for pre-selected time, aims to send requests simultaneously and continuously to each ZooKeeper server. When the pre-selected time elapses, I can see the latency result.
However, the above benchmark uses only one client machine to run the benchmark code. Now, I want to increase the number of client machines to make more machines send requests simultaneously. Note that I want to use same code above to test the latency. The question is how to run the benchmark code from different machines simultaneously?
I guess it should be a Linux script which runs in different machines at the same time.
My experiments are runusing remote Linux cluster which is accessed with SSH.
I look forward to hearing from you
Cheers,

Ansible is certainly the kind of tool you're searching for

Related

How does a project like folding#home communicate with your computer to solve protein folding?

How does a program like folding#home work? Does my computer
individually perform a unit of "work" on it completely separate to other computers running folding#home? Then send the answer back when it's completed?
Or does Folding#home see all the computers connected to it as the project having let's say 1000 cores and then when work is done it's the equivalent of saying something like make -j <total number of cores>
Projects likes Folding#Home and BOINC are examples of loosely-coupled parallel computing where each task is fully self-contained and can be completed without communication with other computing entities. They are also examples of a pattern known as controller/worker (used to be known as master/worker), in which a central controller splits a large task into a pool of small(er) subtasks and distributes it to a bunch of worker processes on a first come first served basis, which corresponds to your first point.
In F#H (and BOINC), client computers connect to the server, request a task, work on it until it's complete, then connect to the server again to return the result and request a new task. The benefits of this are automatic load balancing, fault tolerance (via redundancy), and no need for scheduling.
When you run make -j #cores, make launches a number of parallel jobs but those jobs are usually interdependent, so make has to schedule them in an optimal way. The jobs are then run as processes on the same computer which affords make full process control. If a build step fails, the entire build job aborts immediately and the user can quickly look into the problem, fix it, and restart the build. This is not a viable model for when a client computer could have an arbitrary compute speed, could connect and disconnect at any time, and/or could decide to simply stop processing tasks. There are distributed versions of make like dmake that run different parts of the build process on different remote nodes, but that still happens in a tightly controlled environment, typically on a build cluster.
Note that on a very high level of abstraction the two are basically equivalent with the main difference being whether jobs are pushed or pulled. While job pulling works fine on all kinds of systems, job pushing usually requires (tightly-coupled) systems with predictable characteristics and good scheduling algorithms to be efficient.

Faye clustering multiple nodes NodeJS

I am trying to make a pub/sub infra using faye (nodejs). I wish to know whether horizontal scaling would be possible or not.
One nodejs process will run on single core, so when people are talking about clustering, they talk about creating multiple processes on the same machine, sharing a port, and sharing data through redis.
Like this:
http://www.davidado.com/2013/12/18/using-node-js-cluster-with-socket-io-for-push-notifications/
Firstly, I don't understand how we make sure that each of the forked processes goes to a different core. If I fork 10 node servers on a machine with 4 cores, is it taken care that they are equally distributed?
What if I wish to add is a new machine, and thus scale it. I have not seen any such support anywhere. I am not sure if it is even possible to do it.
Let's say somehow multiple nodes are being used and there is some load balancer. But one client will connect to only one server process. So when a client C1 publishes on a channel on which a client C2 has subscribed, and C1 is connected to process P1 and C2 is connected to process P2, how will P1 publish the message to C2 when it doesn't have the connection?
This would probably be possible in case of a single machine, because the cluster module enables all processes to share the same port and the connections too.
I am fairly new to the web world, as well as nodejs and faye. Please enlighten me if there is something wrong in the question.
You are correct in thinking that the cluster module allows multiple cores to be used on a single machine. The cluster module allows the same application to be spawned multiple times whilst listening to the same port. The distribution amongst the cores is down to the operating system, so if you have 10 processes and 4 cores then the OS will figure out how best to distribute them (as long as they haven't been spawned with a set affinity). By default this shouldn't be a concern for you.
Load-balancing can be done through node too but that is separate from clustering. Instead you would have a separate application that would grab the load statistics on each running server and proxy the http request to the most appropriate server (using http-proxy as an example). A very primitive load balancer will send one request to each running server instance incrementally to give an even distribution.
The final point about sharing messages between all the instances assumes that there is a single point where all the messages are held. In the article you linked to they assume that there is only one server and all the processes share access to the redis instance. As they all access the same redis instance, all processes will be able to receive the same messages. If we're going to start thinking about multiple servers that are in different locations in the world that all have different message stores (i.e. their own redis instances) then we get into the domain of 'replication'. Some data stores are built with this in mind and redis is one of them. You end up with a 'master' set of data and a set of 'slaves' that will periodically update with the master and grab anything they are missing. It is important to note here that messages will not be sent in 'real-time' here unless you have a very intensive replication process.
In conclusion, developers go through this chain of scaling for their applications. The first is to make the application multi-process (the cluster module). The second is to have a load balancer that proxies the http request to the appropriate server that is running the multi-process application. The third is to replicate the datastores so that the servers can run independently but keep in sync with each other.

JMeter never fails

I'm trying to stress test a server with JMeter. I followed the manual and successfully created the tests (Test are running ok and response is correct).
However even if I keep increasing the number of threads it never fails, but I keep reading that there must be limitations? So what am I doing wrong?
My CPU is running on +/-5% when I'm not running JMeter. Running 3000 threads I see the number of threads increase by 3000 and CPU usage goes to +/-15%. Also JMeter never complains something went wrong.
My JMeter configuration is:
Number of threads: 3000
Ramp-Up Period: 30
LoopCount: Forever (Let it run for over an hour and still nothing goes wrong)
The bottleneck now is my internet connection which simply can't handle this load and maxes out at 2.1Mbps. Is this causing the problem? It is increasing my latency from 10ms per thread to over 5000ms per thread, but threads are still running.
Assuming you have confirmed that you definitely aren't getting back any errors (e.g. using a results table listener, or logging/displaying only errors using a results graph listener) and your internet connection is running at capacity then yes, it does sound like your internet connection is the bottleneck. It doesn't sound like your server is being stressed at all.
If you can easily make use of other machines (e.g. servers in the same location as the server you are testing), you could try using JMeter remote (distributed) testing to sidestep the limitations of your internet connection. See http://jmeter.apache.org/usermanual/remote-test.html for details.
Alternatively, if it's easy (e.g. if you're using VM's in a cloud and can easily spin one up with your software on), you could try using the least-powerful server you can instead and stress testing that to see if you can make it struggle even with your internet connection (just as a sanity check).
If this doesn't help, more details on your server (hardware specifications, web server software and thread pool settings, language) and the site/pages you are testing (mostly static or dynamic? large requests/responses?) would be useful. I've certainly managed to make lower-powered machines (e.g. EC2 m1.small) struggle using JMeter over a 2Mbps connection, but it depends on the site you're testing.

Performance/Load testing

I wish to do a performance test on my site, simulating thousands of user and find per server capacity limit. The tool I'm using is jmeter and I have prepared a .jmx for the test scenario. But when I try to simulate 1000 of users simultaneously I start to get:
<httpSample t="0" lt="0" ts="1338538936990" s="false" lb="VerifyPassword" rc="Non HTTP response code: java.net.SocketException" rm="Non HTTP response message: Too many open files" tn="LoadConfig 1-901" dt="text" by="1375"/>
I think the error is on the client side because of the too many socket connection. If so how can I simulate the case from my local machine? Can I increase the number of open sockets on linux?
Also one thing I discover testing from a single client can give false alarm where the client is the bottleneck and the server works fine. How can I do a performance testing such that I simulate a real life scenario such that I have 10K+ users each have its own CPU/ RAM and then do a performance testing?
I have run JMeter from .NET but I think will be the same for your case.
You cannot increase the number of sockets. You should do a distributed load testing.
Luckly for you Jmeter has this ability :)
The google term you should look for is distributed JMeter testing or remote JMeter testing. If it happens that you only can use your local machine you might use virtual machines to create several JMeter distributed instances...
Check:
http://jmeter.apache.org/usermanual/remote-test.html

Linux vs Win runtime timings

I have an application which was ported from Windows to Linux. Now the same code compiles on VS C++ and g++, but there is a difference in performance when it's running on Win and when it's running on Linux. The scope of this application is caching. It's a node between a server and a client, and it's caching client requests and server response in a list, so that any other client which makes requests that was already processed by the server, this node will response instead of forwarding it to server.
When this node runs on Windows, the client gets all it needs in about 7 seconds. But when same node is running on Linux (Ubuntu 9.04), the client starts up in 35 seconds. Every test is from scratch. I'm trying to understand why is this timing difference. A weird scenario is when the node is running on Linux but in a Virtual Machine, hosted by Win. In this case, load time is around 7 seconds, just like it was running Win natively. So, my impression is that there is a problem with networking.
This node is using UDP protocol for sending and receiving network data, and it's using boost::asio as implementation. I tried to change all supported socket flags, changed buffer size, but nothing.
Does someone know why is this happening, or any network settings related with UDP that might influence the performance?
Thanks.
If you suspect a network problem take a network capture (Wireshark is great for this kind of problem) and look at the traffic.
Find out where the time is being spent, either based on the network capture or based on the output of a profiler.
Once you know that you're half way to a solution.
These timing differences can depend on many factors, but the first one coming to mind is that you are using a modern Windows version. XP already had features to keep recently used applications in memory, but in Vista this was much better optimized. For each application you load, a special load file is created that is equal to how it looks in memory. Next time you load your application, it should go a lot faster.
I don't know about Linux, but it is very well possible that it needs to load your app completely each time. You can test the difference in performance between the two systems much better if you compare performance when running. Leave your application open (if it is possible with your design) and compare again.
These differences in how the system optimizes memory are backed up by your scenario using the VM approach.
Basically, if you rule out other running applications and if you run your application in high priority mode, the performance should be close to equal, but it depends on whether you use operating system specific code, how you access the file system, how you you use the UDP protocol etc etc.

Resources