Jbehave - thucydides multi-thread execution of selected stories - multithreading

my organization is using selenium automation framework designed using Jbehave-Thucydides-Maven. The framework consists of 1500+ tests, however, not all are required to be executed every time. Whenever we execute small batch (say 10 scripts or so), all 1500+ scripts are loaded in system and filtering is done on "Meta" tags (which are passed at time of execution) to execute selected 10 scripts. This is causing high overall execution time wherein actual scripts execution takes only 10 mins whereas loading the scripts and filtering takes 15+ mins making high total execution time. Below is the snap-shot of the maven pom which is used to trigger the multi-thread execution. Could you please advise what changes are required so that only required 10 scripts are loaded in system rather than whole 1500+ scripts.enter image description here

Related

JMeter - Wait until the predefined time to start next thread

I configured Java based Selenium WebDriver test in Apache JMeter with the following setup:
Number of Threads (Users): 10
Ramp-up period (Second): 120
Loop Count: 1
I ticked the Delay Thread Creation until needed to save resources.
My expectation regarding the functionality:
I expected that if I have 10 users with 120 seconds ramp up time, then every user activity will start each other and the Jmeter will wait at least 12 seconds to start the next thread.
The issue is:
The threads start sometimes within 11 seconds, sometimes 12 seconds.
I don't know why does it happen because I would like to see the threads start after each other exactly in 12 seconds.
The question is
Are there any solution that to tell the JMeter to wait exactly 12 seconds for next thread start?
Here is the picture about started jobs with date time stamp:
I don't think you will be able to achieve this level of precision using ramp-up period approach of the normal Thread Group, a better idea would be going for the Ultimate Thread Group (can be installed using JMeter Plugins Manager) which allows absolute flexibility in terms of definition of ramp-up, ramp-down and time to hold the load.
Example setup:
Example output:
In order to get only one execution of the "job" per each virtual user you can use Throughput Controller configured like:
You can add Flow Control Action for pausing exact time
it allows pauses to be included without needing to generate a sample. For variable delays, set the pause time to zero, and add a Timer as a child.

VSTest: Order the execution of test assemblies

Our codebase has more than 100 projects with tests in each. Some test assemblies take more time while some other assemblies are taking less time for the execution of the tests.
The Azure DevOps Server is running our whole test suit in parallel, which makes it really fast.
But the problem is, that the long running tests are started in the middle of the testrun, which has the effect, that the whole testrun will be longer.
Is there a way, to influence the order of how and when the test assemblies are started? I want to start the long running test assemblies first and after that the fast test assemblies.
Since you are running the Test in parallel, you could try to use the Based on past running time of tests option in Visual Studio Test task.
According to this doc about Parallel test:
This setting considers past running times to create slices of tests so that each slice has approximately the same running time. Short-running tests will be batched together, while long-running tests will be allocated to separate slices.
This option allows tests to be run in groups based on running time. Finally , each group will be completed in a similar time.
Hope this helps.
We have achieved this by arranging the project-folders so they sort to give the longest running test assemblies first. You can see the order that VSTest finds the assemblies in the Azure DevOps output. From there you can rename folder to affect the order.
It would be nice if there was another way to effect this.

Applying BDD testing to batch scenarios?

I'm trying to apply BDD practices to my organization. I work in a bank where the nightly batch job is a huge orchestration multi-system flow of batch jobs running and passing data between one another.
During our tests, interactive online tests probably make up only 40-50% of test scenarios while the rest are embedded inside the batch job. As an example, the test scenario may be:
Given that my savings account has a balance of $100 as of 10PM
When the nightly batch is run at 11PM
Then at 3AM after the batch run is finished, I should come back and see that I have an additional accrued interest of $0.001.
And the general ledger of the bank should have an additional entry for accrued interest of $0.001.
So as you can see, this is an extremely asynchronous scenario. If I were to use Cucumber to trigger it, I can probably create a step definition to insert the $100 balance into the account by 10PM, but it will not be realistic to use Cucumber to trigger the batch to be run at 11PM as batch jobs are usually executed by operators using their own scheduling tools such as Control-M. And having Cucumber then wait and listen a few hours before verifying the accrued interest, I'm not sure if I'll run into a timeout or not.
This is just one scenario. Batch runs are very expensive for the bank and we always tack on as many scenarios as possible to ride on a single batch run. We also have aging scenarios where we need to run 6 months of batch just to check whether the final interest at the end of a fixed deposit term is correct or not (I definitely cannot make Cucumber wait and listen for that long, can I?)
My question is, is there any example where BDD practices were applied to large batch scenarios such as these? How would one approach this?
Edit to explain why I am not targeting to execute isolated test scenarios where I am in control:
We do isolated scenarios in one of the test levels (we call it Systems Test in my bank) and BDD indeed does work in that context. But eventually, we need to hit a test level that has an entire end-to-end environment, typically in SIT. In this environment, it is a criteria for multiple test scenarios to be run in parallel, none of which have complete control over the environment. Depending on the scope of the project, this environment may run up to 200 applications. So customer channels such as Internet Banking will run transactional scenarios, whiles at the core banking system, scenarios such as interest calculation, automatic transfers etc will be executed. There will also be accounting scenarios where a general ledger system consolidates and balances all the accounts in the environment. To do manual testing in this environment frequently requires at least 30-50 personnel executing transactions and checking on results.
What I am trying to do is to find a way to leverage on a BDD framework to automate test execution and capture the results so that we do not have to manually track them all in the environment.
It sounds to me as if you are not in control over the execution of the scenario.
It is obviously so that waiting for a couple of hours before validating a result is a not a great idea.
Is it possible to extract just the part of the batch that is interesting in this scenario? If that is possible, then I would not expect the execution time to 4 - 6 hours.
If it isn't possible to execute the desired functionality in isolation, then you have a problem regarding test-ability of your system. This is very common and something you really want to address. If the only way to test is to run the entire system, then you are not able to confidently say that it is working properly since all combinations that need testing are hard, sometimes even impossible, to execute.
Unfortunately, there doesn't seem to exist a quick fix. You need to be in a position where you are able to verify small parts of the system in order to verify them fast and reliably. And it doesn't matter if you are using Cucumber or any other tool to for the verification, all tools will have the same issue.
One approach you might consider would be to have a reporting process that queries the results of each batch run. It would then store the results you were interested in (i.e. those from your tests) in to a test analysis database.
I'm assuming that each batch run has a unique identifier. This identifier would be used as the key for the test results.
Here is an example of how it might work:
We know when the batch runs are finished (say this is at 4am). We schedule a reporting job to start after batch run completion (say at 5am) that analyses the test accounts.
The reporting job looks at Account X and Account Y. It records the amount of money in their account in a table alongside the unique identifier for the batch run. This information is stored in a test results database.
A separate process matches up test scenarios with test results. It knows test scenario 29 was tied to batch run ZZ20 and so goes looking in the test results database for the analysis from batch run ZZ20.
In the morning the test engineer checks the results of the run. They see that test scenario 29 failed as there was only £100 in Account X rather than the £100.001 that was expected.
This setup would allow you to synchronously process asynchronous batch runs. It would be challenging to configure though, as you would need to do a lot of automation around reporting and linking test scenarios with test results.

Preventing cronjobs from overlapping

I have 3 different jobs set up in crontab (call them jobA, jobB, jobC) that run at different intervals and start at different times during the day. For example, jobA runs once per hour at 5 mins past the hour, jobB runs every 30 mins at 9 and 39 mins past the hour, and jobC runs every 15 mins. They are not dependent on each other, but for various reasons they can NOT be running at the same time.
The problem is that sometimes one of the jobs takes a long time to run and another one starts before the first one is done, causing issues.
Is there some way to queue or spool these jobs so that one will not start until the current running one has finished? I tried using this solution but this does not guarantee that the pending jobs will resume in the same order they were supposed to start. A queue would be best, but I cannot find anything about how to do this.
You can't do that using cron. Cron is used to run a specific command at specific time. You can do it by the solution you proposed, but that adds a lot more complexity.
I suggest, writing/coding the requirement in high level language like java and use a mutil-thread program to achieve what you need.
Control-m is another scheduling software, with a lot of other features as well. You would be able to integrate the above use-case in it.

How to define frequency of a job in application by users?

I have an application that has to launch jobs repeatingly. But (yes, that would have been to easy without a but...) I would like users to define their backup frequency in application.
In worst case, they would have to choose between :
weekly,
daily,
every 12 hours,
every 6 hours,
hourly
In best case, they should be able to use crontab expressions (see documentation for example)
How to do this? Do I launch a job every minutes that check for last execution time, frequency and then launches another job if needed? Do I create a sort of queue that will be executed by a masterjob?
Any clues, ideas, opinions, best pratices, experiences are welcome!
EDIT : Solved this problem using Akka scheduler. Ok, this is a technical solution not a design answer but still everything works great.
Each user defined repetition is an actor that send messages every period to a new actor to execute the actual job.
There may be two ways to do this depending on your requirements/architecture:
If you can only use Play:
The user creates the job and the frequency it will run (crontab, whatever).
On saving the job, you calculate the first time it will have to be run. You then add an entry to a table JOBS with the execution time, job id, and any other information required. This is required as Play is stateless and information must be stored in the DB for later retrieval.
You have a job that queries the table for entries whose execution date is less than now. Retrieves the first, runs it, removes it from the table and adds a new entry for next execution. You should keep some execution counter so if a task fails (which means the entry is not removed from DB) it won't block execution of the other tasks by the job trying again and again.
The frequency of this job is set to run every second. That way while there is information in the table, you should execute the request around as often as they are required. As Play won't spawn a new job while the current one is working if you have enough tasks this one job will serve all. If not, it will be killed at some point and restored when required.
Of course, the crons of the users will not be too precise, as you have to account for you own cron delays plus execution delays on all the tasks in queue, which will be run sequentially. Not the best approach, unless you somehow disallow crons which run every second or more often than every minute (to be safe). Doing a check on execution time of the crons to kill them if they are over a certain amount of time would be a good idea.
If you can use more than Play:
The better alternative I believe is to use Quartz (see this) to create a future execution when the user creates the job, and reproram it once the execution is over.
There was a discussion on google-groups about it. As far as I remember you must define a job which start every 6 hours and check which backups must be done. So you must remember when the last backup job was finished and make the control yourself. I'm unsure if Quartz can handle such a requirement.
I looked in the source-code (always a good source ;-)) and found a method every, where I think this should be do what you want. How ever I'm unsure if this is a clever design, because if you have 1000 user you will have then 1000 Jobs. I'm unsure if Play was build to handle such a large number of jobs.
[Update] For cron-expressions you should have a look into JobPlugin.scheduleForCRON()
There are several ways to solve this.
If you don't have a really huge load of jobs, I'd just persist them to a table using the required flexibility. Then check all of them every hour (or the lowest interval you support) and run those eligible. Simple.
Or, if you prefer to use cron syntax anyway, just write (export) jobs to a user crontab using a wrapper which calls back to your running app, or starts the job in a standalone process if that's possible.

Resources