Distributed A/B-like testing with Locust - performance-testing

I need to compare network latency for 2 clients half the globe away from each other and visualize the response time history for each location the request comes from side-by-side. I've been researching load and performance testing tools and found Locust to be very convenient. Is there a way I can achieve my goal with Locust in a quick/standard/non-hacky way?

Whether you mean you need 2 clients in different locations running the same task to compare or 2 clients in 2 different locations running 2 different tasks, Locust can handle either of those scenarios.
Check out the Tasks section of the documentation for details about how to write tasks. You can write a single task that does both of what you need, two different tasks that are randomly selected for each user that starts, write two tasks and have them be weighted in their randomness (including making one weighted at 0 so it's never run, effectively turning one task off so only the other one is run at that time), and many other options. Which method is best will depend on exactly what you need and how you want to do it. It may take some experimentation to determine what's best.
As for running in multiple locations, you can run the test separately in different places and compare results, or Locust can run distributed so you can have workers in multiple locations running at the same time. You may also want to look at using Docker, which in some ways can make running in different locations easier if you use AWS, Azure, GCP or whatever other cloud provider to spin up instances to run on.

Related

How does a project like folding#home communicate with your computer to solve protein folding?

How does a program like folding#home work? Does my computer
individually perform a unit of "work" on it completely separate to other computers running folding#home? Then send the answer back when it's completed?
Or does Folding#home see all the computers connected to it as the project having let's say 1000 cores and then when work is done it's the equivalent of saying something like make -j <total number of cores>
Projects likes Folding#Home and BOINC are examples of loosely-coupled parallel computing where each task is fully self-contained and can be completed without communication with other computing entities. They are also examples of a pattern known as controller/worker (used to be known as master/worker), in which a central controller splits a large task into a pool of small(er) subtasks and distributes it to a bunch of worker processes on a first come first served basis, which corresponds to your first point.
In F#H (and BOINC), client computers connect to the server, request a task, work on it until it's complete, then connect to the server again to return the result and request a new task. The benefits of this are automatic load balancing, fault tolerance (via redundancy), and no need for scheduling.
When you run make -j #cores, make launches a number of parallel jobs but those jobs are usually interdependent, so make has to schedule them in an optimal way. The jobs are then run as processes on the same computer which affords make full process control. If a build step fails, the entire build job aborts immediately and the user can quickly look into the problem, fix it, and restart the build. This is not a viable model for when a client computer could have an arbitrary compute speed, could connect and disconnect at any time, and/or could decide to simply stop processing tasks. There are distributed versions of make like dmake that run different parts of the build process on different remote nodes, but that still happens in a tightly controlled environment, typically on a build cluster.
Note that on a very high level of abstraction the two are basically equivalent with the main difference being whether jobs are pushed or pulled. While job pulling works fine on all kinds of systems, job pushing usually requires (tightly-coupled) systems with predictable characteristics and good scheduling algorithms to be efficient.

Is it possible to run different tasks on different schedules with prefect?

I'm moving my first steps with prefect, and I'm trying to see what its degrees of freedom are. To this end, I'm investigating whether prefect supports running different tasks on different schedules in the same python process. For example, Task A might have to run every 5 minutes, while Task B might run twice a day with a Cron scheduler.
It seems to me that schedules are associated with a Flow, not with a task, so to do the above, one would have to create two distinct one-task Flows, each with its own schedule. But even as that, given that running a flow is a blocking operation, I can't see how to "start" both flows concurrently (or pseudo-concurrently, I'm perfectly aware the flows won't execute on separate threads).
Is there a built-in way of getting the tasks running on their independent schedules? I'm under the impression that there is a way to achieve this, but given my limited experience with prefect, I'm completely missing it.
Many thanks in advance for any pointers.
You are right that schedules are associated with Flows and not Tasks, so the only place to add a schedule is a Flow. Running a Flow is a blocking operation if you are using the open source Prefect core only. For production use cases, it's recommended running your Flows against Prefect Cloud or Prefect Server. Cloud is the managed offering and Server is when you host it yourself. Note that Cloud has a very generous free tier.
When using a backend, you will use an agent that will kick off the flow run in a new process. This will not be blocking.
To start with using a backend, you can check the docs here
This Prefect Discourse topic discusses a very similar problem and shows how you could solve it using a flow-of-flows orchestrator pattern.
One way to approach it is to leverage Caching to avoid recomputation of certain tasks that require lower-frequency scheduling than the main flow.

Migrating multi-threaded program to Docker containers

I have an old algorithm that executes two steps, say A and B. A deals with connecting to an external service and retrieve data points. A needs to be executed n times, and all the data so gathered is passed as input to B. To help scale the A part, it was implemented using multi-threading where 10 threads are spawned and each connects to n/10 endpoints. Once all threads complete execution, the complete dataset is provided as input to B.
We are planning to create a Docker image of this algo. While we do that I would like to explore if we can do away with the multi threading and instead deploy multiple containers. This gives us better scalability as n is a variable and sometimes is very small or very large.
I could use Kubernetes to orchestrate these containers. However, I see two challenges:
1. How do I gather back all the data points into my core algo?
2. How do I know that all containers have finished processing, and that I could move to step B?
Any pointers or help is appreciated.

Jmeter web application performance testing doubts?

I am new to jmeter and have a couple of doubt about web application performance testing.
Is it necessary to load all embedded resource in jmeter for performance testing ?
I have written a Jmeter script that exercise all REST apis. Is this enough to find the application performance at the server side ?
How Ramp up time affects the Performance test ?
For how much time the test needs to be executed, to get an accurate performance report ?
Load Generation configuration - Generating load from machines attached to application cluster / from different LAN ?
Kindly find my view on the questions below:
I believe that load test needs to be as much realistic as possible so representing real browser behavior is a must. Real browsers download embedded resources like scripts, images and styles, moreover, they use a concurrent thread pool of 2 - 8 threads to do this in parallel. So you need to configure JMeter similarly. However real browsers download these assets only once, on subsequent requests they return embedded resources from cache. So make sure that you configure JMeter to:
download embedded resources
use concurrent pool for it
add HTTP Cache Manager to your test plan
It should be enough from functional point of view as usually static content is being served separately. However see point 1, if you have possibility to simulate real user behavior - go for it
It is better to have reasonable ramp-up and ramp-down periods so the load could increment gradually so both server and load generator sides won't experience peak stress loads (unless it is your test case). See the bit on ramp-up from JMeter documentation
Ramp-up needs to be long enough to avoid too large a work-load at the start of a test, and short enough that the last threads start running before the first ones finish (unless one wants that to happen).
Start with Ramp-up = number of threads and adjust up or down as needed.
By default, the thread group is configured to loop once through its elements.
Usually peak load follows general Pareto principle, during "peak" periods application served 80% of requests during 1-2 hours time frame and remaining 20% of requests were more or less equally distributed between remaining 20 hours in a day. So it should be enough to test your application providing anticipated peak load for a couple of hours. Again if time allows I would recommend to go for Soak testing to see if there are any memory leaks and for Stress testing to determine application load boundaries and whether it recovers from stress load or not
Theoretically application shouldn't care regarding requests source (unless it uses different logic to handle requests from i.e. different geo regions). One thing is obvious: don't run load generator and application under test on the same machine. If one JMeter instance cannot create enough load to implement test scenario - go for distributed testing
I'd like to add some more perspective:
Question 1 & 2:
The Pareto principle can be applied here also - meaning, that it takes a lot of effort to properly simulate reality, downloading all resources used by a browser to render a page and to give the proper 'weights' to different URLs, simulating user behaviour accurately. This is where many load tests fail, because simulating reality accurately is very, very hard. As the previous response mentions, most static content is often served via CDNs or similar anyway, and what you really want to test is usually your own system's capability to handle traffic.
Considering the above, I would say that if you spend 20% of effort setting up a load test that tests your REST API, you will get 80% of the results you want. If you on the other hand go for a completely realistic test, you will spend another 80% of effort for only 20% more results. The effect of this is that in many cases it is better to go for the simpler test, that does not simulate reality accurately. It gives you the most return on your invested time.
Question 3: Agree fully with previous response here. Ramp up slowly, unless your specific use case sees very sudden traffic peaks (like if you're an online auction service or ticket sales or similar). Can also be a good idea to configure your test so it spends some time on a "plateau" after ramping up to peak load, and not just stopping the load test once you reach the peak.
Question 4: I would say you need to run the load test long enough to produce stable, statistically significant results. This can be 5 minutes or 5 hours depending on your scenario, but half an hour is probably a good minimum time to aim for in mostly all cases. The test duration should not be dependent on how long your site tends to experience peak load in real life though - not unless you're doing some kind of soak test.
Question 5: Traffic origin is sometimes worth thinking about, as different source locations lead to different network delay between (simulated) clients and server, which affects transaction rates. If you run a load test with 1,000 VUs on a system located in New York, and generate the traffic from Australia, you will not get a lot of transactions per second due to the high network delay. If you run the same test using a load generator in New York instead, your transaction rate will be a lot higher because the network delay is so much lower. Of course, you can always add more concurrent clients/VUs/connections and get the same transaction rate on a high-delay network link that you would on a low-delay link, but at the cost of forcing the server to keep a lot more (TCP) connection state, using more file descriptors and buffer memory. I.e. might not be a very realistic scenario.

Resource mange external nodes in Jenkins for tests

My problem is that I have code that need a rebooted node. I have many long running Jenkins test jobs that needs to be executed on rebooted nodes.
My existing solution is to define multiple "proxy" machines in Jenkins with the same label (TestLable) and 1 executor per machine. I bind all the test jobs to the label (TestLable). In the test execution script I detect the Jenkins machine (Jenkins env. NODE_NAME) and use that to know what physical physical machine the tests should use.
Do anybody know of a better solution?
The above works but I need to define a high number of “nodes/machines” that may not be needed. What I would like was a plugin that would be able to grant a token to a Jenkins job. This way a job would not be executed before a Jenkins executor and a token was free. The token should be a string so that my test jobs could use it to know what external node it could use.
We have written our own scheduler that allocates stuff before starting Jenkins nodes. There may be a better solution - but this works for us mostly. I've yet to come across an off-the-shelf scheduler that can deal with complicated allocation of different hardware resources. We have n box types, allocated to n build types.
Some build types we have are not compatible together without destroying all persistent data - which may be required as it takes a long time to gather. Some jobs require combinations of these hardware types. We store the details in a DB, and then use business logic to determine how it is allocated. We've often found that particular job types need additional business logic or extra data fields to account for their specific requirements.
So it may be the best way is to write your own scheduler, in your own language of choice, which takes into account your particular needs.

Resources