I am executing a Python script that analyzes system metrics against the thresholds defined in the threshold_config.ini file.
The program can analyze data for metrics like disk, memory, swap and CPU.
For each metric I have two threshold values: one for warning and one for critical.
The script analyzes the data and produces a report in a text file marking each system as Critical or Warning.
I want to display this in Jenkins like a JUnit result. Does anyone have an idea how to take custom reports and show them in the Jenkins JUnit format? I also need to mark the build stable or unstable based on the warning and critical values.
Please help.
For the first bit (the JUnit result format), you may want to translate the results of comparing your metrics against thresholds into a JUnit XML file, one test case per comparison. This requires some low-level implementation, but you wouldn't be the first person doing it. Whether there is a better way depends on the exact format of the results you've got.
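For example, here is a minimal sketch in Python of that translation, assuming you have already parsed the metric values and thresholds from threshold_config.ini (the sample systems, values, and the warning-as-skipped mapping are just illustrative choices, not the only way to do it):

# Minimal sketch: turn metric/threshold comparisons into a JUnit XML file.
# The systems, metrics, values and thresholds below are invented; plug in
# whatever your threshold_config.ini parsing already produces.
import xml.etree.ElementTree as ET

results = [
    # (system, metric, value, warning_threshold, critical_threshold)
    ("web01", "cpu", 45.0, 70.0, 90.0),
    ("web01", "disk", 92.0, 80.0, 90.0),
]

suite = ET.Element("testsuite", name="system-metrics", tests=str(len(results)))
for system, metric, value, warn, crit in results:
    case = ET.SubElement(suite, "testcase", classname=system, name=metric, time="0")
    if value >= crit:
        failure = ET.SubElement(case, "failure", message="critical")
        failure.text = "%s=%s exceeds critical threshold %s" % (metric, value, crit)
    elif value >= warn:
        # One option: report warnings as skipped so they stay visible in Jenkins
        # without failing the test; use a failure here instead if warnings
        # should also make the build unstable.
        skipped = ET.SubElement(case, "skipped", message="warning")
        skipped.text = "%s=%s exceeds warning threshold %s" % (metric, value, warn)

ET.ElementTree(suite).write("metrics-junit.xml", encoding="utf-8", xml_declaration=True)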
For the second bit (passing/failing the build), you could use the popular Jenkins JUnit plugin, which will detect all your JUnit XML files and mark the build accordingly.
I would like to run all my features and then, after all of them, re-run all the scenarios that failed and have a certain tag (for example a "rerun-on-fail" tag).
I can always parse the results of the first run and then run all the filtered scenarios manually, but I was wondering whether there could be a way to dynamically add scenarios to the "queue" at runtime, probably by making a custom test runner. However, the Cucumber runner class is final, and overall the Cucumber code doesn't seem very open to extension. Any ideas how to achieve this?
Edit: it looks like there's an interface FeatureSupplier, which looks like a good fit for this.
We are using WebdriverIO for our automated tests and we generate HTML reports with Mochawesome in the end based on the result JSON files.
Now we have a lot of implemented tests and we want to see the differences between two test runs as quickly as possible. Therefore it would be great to have a way to compare two test run results with each other and to generate an HTML report containing only the differences.
Maybe there is an existing implementation/package to do that? Of course it is possible to compare the two JSON result files with each other manually, but I would prefer an existing solution to save effort.
How would you do the comparison in my case?
Thanks,
Martin
You could set up a job in a CI tool like Jenkins.
There it always compares the latest results with the previous build and tells you whether each test is a new failure, a regression, or a fixed script.
Regression indicates that the test passed in the previous build but is failing in the new build.
Failed indicates that the test has been failing for the past couple of builds.
Fixed indicates that the test was failing in the previous build but is passing in the latest build.
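If you do want to compare the two result files yourself, here is a rough sketch in Python of that classification. The load_statuses() helper and the data["tests"] layout are placeholders, since the exact structure of the Mochawesome JSON depends on your version and configuration:

# Sketch: classify per-test differences between two runs.
# load_statuses() must be adapted to the actual structure of your result files.
import json

def load_statuses(path):
    with open(path) as f:
        data = json.load(f)
    # Assumed layout: a flat list of tests with "title" and "state" fields.
    return {t["title"]: t["state"] for t in data["tests"]}

def diff_runs(previous, current):
    report = {"regressed": [], "fixed": [], "still_failing": []}
    for title, state in current.items():
        before = previous.get(title)
        if state == "failed" and before == "passed":
            report["regressed"].append(title)      # passed before, failing now
        elif state == "passed" and before == "failed":
            report["fixed"].append(title)          # failing before, passing now
        elif state == "failed" and before == "failed":
            report["still_failing"].append(title)  # failing in both runs
    return report

if __name__ == "__main__":
    prev = load_statuses("run1.json")
    curr = load_statuses("run2.json")
    print(json.dumps(diff_runs(prev, curr), indent=2))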
I have unit tests. If one of them fails, my build fails.
I would like to apply the same principle to performance. I have a series of microbenchmarks for several hot paths through a library. Empirically, slowdowns in these areas have a disproportionate effect on the library's overall performance.
It would be nice if there were some way to have some concept of a "performance build" that can fail in the event of a too-significant performance regression.
I had considered hard-coding thresholds that must not be exceeded. Something like:
Assert.IsTrue(hotPathTestResult.TotalTime <= threshold)
but pegging that to an absolute value is hardware and environment-dependent, and therefore brittle.
Has anyone implemented something like this? What does Microsoft do for Kestrel?
I would not do this via unit-tests -- it's the wrong place.
Do this in a build/test script. You gain more flexibility and can do a lot more things that may be necessary.
A rough outline would be:
1. build
2. run unit tests
3. run integration tests
4. run benchmarks
5. upload benchmark results to a results store (a commercial product, e.g. "PowerBI")
6. check current results against previous results
7. upload artefacts / deploy packages
At step 6, if there is a regression you can let the build fail with a non-zero exit code.
BenchmarkDotNet can export results as JSON, etc., so you can take advantage of that.
The point is how to determine whether a regression occurred. Especially on CI builds (with containers and the like) there may be different hardware on different benchmark runs, so the results are not 1:1 comparable, and you have to take this into account.
Personally I don't let the script fail in case of a possible regression; instead it sends a notification about it, so I can manually check whether it's a true regression or just caused by different hardware.
A regression is simply detected when the current result is worse than the median of the last 5 results. Of course this is a rough method, but it is an effective one, and you can tune it to your needs.
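For what it's worth, a rough sketch of that check in Python; the history directory and the {benchmark_name: mean_ns} file layout are assumptions, and extracting those numbers from the BenchmarkDotNet JSON export is left out:

# Sketch: flag a possible regression when the current mean is worse than
# the median of the last 5 stored results, with some tolerance for noise.
import json
import statistics
import sys
from pathlib import Path

TOLERANCE = 1.10  # allow 10% noise before flagging anything

def load(path):
    # Assumed layout: {"BenchmarkName": mean_time_in_ns, ...}
    return json.loads(Path(path).read_text())

def find_regressions(current_file, history_files):
    current = load(current_file)
    history = [load(p) for p in history_files[-5:]]
    regressions = []
    for name, mean_ns in current.items():
        past = [run[name] for run in history if name in run]
        if len(past) < 3:
            continue  # not enough history to judge
        baseline = statistics.median(past)
        if mean_ns > baseline * TOLERANCE:
            regressions.append((name, mean_ns, baseline))
    return regressions

if __name__ == "__main__":
    found = find_regressions("current.json", sorted(Path("history").glob("*.json")))
    for name, mean_ns, baseline in found:
        print("possible regression: %s: %.0f ns vs. median %.0f ns" % (name, mean_ns, baseline))
    # Either fail the build here, or just notify, as described above.
    sys.exit(1 if found else 0)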
I am trying to implement some additional functionality to the LibreOffice printing process (some special info should be added automatically to the margins of every printed page). I am using RHEL 6.4 with LibreOffice 4.0.4 and Gnome 2.28.
My purpose is to research the data flow between LibreOffice and system components and determine which source codes are responsible for printing. After that I will have to modify these parts of code.
Now I need advice on methods for researching the source code. I found plenty of tools; from my point of view:
strace seems to be very low-level;
gprof requires binaries recompiled with the "-pg" CFLAGS, and I have no idea how to do that with LibreOffice;
systemtap can probe syscalls only, can't it?
callgrind + Gprof2Dot work quite well together but produce strange results (see below).
For instance, here is the call graph from the callgrind output with Gprof2Dot visualisation. I started callgrind with the following command:
valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes /usr/lib64/libreoffice/program/soffice --writer
and received four output files:
-rw-------. 1 root root 0 Jan 9 21:04 callgrind.out.29808
-rw-------. 1 root root 427196 Jan 9 21:04 callgrind.out.29809
-rw-------. 1 root root 482134 Jan 9 21:04 callgrind.out.29811
-rw-------. 1 root root 521713 Jan 9 21:04 callgrind.out.29812
The last one (pid 29812) corresponds to the running LibreOffice Writer GUI application (i determined it with strace and ps aux). I pressed CTRL+P and OK button. Then I closed the application hoping to see the function responsible for printing process initialisation in logs.
The callgrind output was processed with the Gprof2Dot tool according to this answer. Unfortunately, the picture shows neither the actions I am interested in nor the call graph as such.
I would appreciate any info about the proper way of resolving such a problem. Thank you.
The proper way of solving this problem is remembering that LibreOffice is open source. The whole source code is documented and you can browse documentation at docs.libreoffice.org. Don't do that the hard way :)
Besides, remember that the printer setup dialog is not LibreOffice-specific, rather, it is provided by the OS.
What you want is a tool to identify the source code of interest. Test Coverage (TC) tools can provide this information.
What TC tools do is determine which code fragments have run when the program is exercised; think of it as collecting a set of code regions. Normally TC tools are used in conjunction with (interactive/unit/integration/system) tests to determine how effective the tests are. If only a small amount of code has been executed (as detected by the TC tool), the tests are interpreted as ineffective or incomplete; if a large percentage has been covered, one has good tests and reasonable justification for shipping the product (assuming all the tests passed).
But you can use TC tools to find the code that implements features. First, you execute some test (or perhaps manually drive the software) to exercise the feature of interest, and collect TC data. This tells you the set of all the code exercised when the feature is used; it is an overestimation of the code of interest to you. Then you exercise the program again, asking it to do some similar activity that does not exercise the feature. This identifies the set of code that definitely does not implement the feature. Compute the set difference of the code-exercised-with-feature and the code-exercised-without to determine the code that is more focused on supporting the feature.
You can naturally get tighter bounds by running more exercises-the-feature and doesn't-exercise-the-feature cases and computing differences over the unions of those sets.
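As a rough illustration of that set arithmetic (not tied to any particular TC tool), assuming you can dump each run's coverage as a set of (source file, line number) pairs; the sample data here is invented:

# Sketch: narrow down the code behind a feature via coverage-set differences.
# Each run is assumed to be a set of (source_file, line_number) pairs
# extracted from your coverage tool's output.

def feature_candidates(with_feature_runs, without_feature_runs):
    # Lines covered in some run that exercised the feature,
    # minus lines covered in any run that did not.
    exercised = set().union(*with_feature_runs)
    not_feature = set().union(*without_feature_runs)
    return exercised - not_feature

run_printing = {("vcl/source/print.cxx", 120), ("vcl/source/print.cxx", 121), ("sw/source/doc.cxx", 40)}
run_idle = {("sw/source/doc.cxx", 40)}
print(sorted(feature_candidates([run_printing], [run_idle])))
# -> [('vcl/source/print.cxx', 120), ('vcl/source/print.cxx', 121)]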
There are TC tools for C++, e.g. "gcov". Most of them, I think, won't let/help you compute such set differences over the results; many TC tools seem not to have any support for manipulating covered sets. (My company makes a family of TC tools that do have this capability, including computing coverage-set differences, including for C++.)
If you actually want to extract the relevant code, TC tools don't do that.
They merely tell you what code ran by designating text regions in source files. Most test coverage tools only report covered lines as such text regions; this is partly because the machinery many test coverage tools use is limited to line numbers recorded by the compiler.
However, one can have test coverage tools that are precise in reporting text regions in terms of starting file/line/column to ending file/line/column (ahem, my company's tools happen to do this). With this information, it is fairly straightforward to build a simple program to read the source files and extract literally the code that was executed. (This does not mean that the extracted code is a well-formed program! For instance, the data declarations won't be included in the executed fragments, although they are necessary.)
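For illustration, a small sketch of that extraction step, assuming the tool reports each covered region as a 1-based (path, start line, start column, end line, end column) tuple:

# Sketch: pull the literal text of covered regions out of the source files.
# Regions are assumed to be 1-based (path, start_line, start_col, end_line, end_col).

def extract_region(path, start_line, start_col, end_line, end_col):
    with open(path) as f:
        lines = f.readlines()
    chunk = lines[start_line - 1:end_line]       # the lines spanned by the region
    if start_line == end_line:
        return chunk[0][start_col - 1:end_col]
    chunk[0] = chunk[0][start_col - 1:]          # trim text before the start column
    chunk[-1] = chunk[-1][:end_col]              # trim text after the end column
    return "".join(chunk)

def extract_all(regions):
    return [extract_region(*region) for region in regions]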
The OP doesn't say what he intends to do with such code, so the set of fragments may be all that is needed. If he wants to extract the code and the necessary declarations, he'll need more sophisticated tools that can determine which declarations are needed. Program transformation tools with full parsers and name resolvers for the source language can provide the necessary capability for this. This is considerably more complicated to use than just test coverage tools with ad hoc text extraction.
I'm attempting to test an application which has a heavy dependency on the time of day. I would like to have the ability to execute the program as if it was running in normal time (not accelerated) but on arbitrary date/time periods.
My first thought was to abstract the time retrieval function calls with my own library calls which would allow me to alter the behaviour for testing but I wondered whether it would be possible without adding conditional logic to my code base or building a test variant of the binary.
What I'm really looking for is some kind of localised time domain, is this possible with a container (like Docker) or using LD_PRELOAD to intercept the calls?
I also saw a patch that enabled time to be disconnected from the system time using unshare(CLONE_NEWTIME), but it doesn't look like this got in.
It seems like a problem that must have been solved numerous times before; is anyone willing to share their solution(s)?
Thanks
AJ
Whilst alternative solutions and tricks are great, I think you're severely overcomplicating a simple problem. It's completely common and acceptable to include certain command-line switches in a program for testing/evaluation purposes. I would simply include a command line switch like this that accepts an ISO timestamp:
./myprogram --debug-override-time=2014-01-01T12:34:56
Then at startup, if the switch is set, subtract it from the current system time to get an offset, and make a local apptime() function which corrects the output of the regular system time accordingly; call that everywhere in your code instead.
The big advantage of this is that anyone can reproduce your testing results without a deep read-up on custom Linux tricks, including an external testing team or a future co-developer who's good at coding but not at runtime tricks. When (unit) testing, it's a major advantage to be able to call your code with a simple switch and check the results for equality against a sample set.
You don't even have to document it; lots of production tools in enterprise-grade products have hidden command-line switches for this kind of behaviour that the 'general public' need not know about.
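A minimal sketch of that idea in Python (the switch name follows the example above; the init_time_override() helper name and everything else are illustrative, and your real program would do this in its own language):

# Sketch: compute a fixed offset from --debug-override-time at startup,
# then call apptime() everywhere instead of reading the system clock directly.
import argparse
from datetime import datetime, timedelta

_offset = timedelta(0)

def init_time_override(argv=None):
    global _offset
    parser = argparse.ArgumentParser()
    parser.add_argument("--debug-override-time", default=None)
    args, _ = parser.parse_known_args(argv)
    if args.debug_override_time:
        fake_now = datetime.fromisoformat(args.debug_override_time)
        _offset = fake_now - datetime.now()

def apptime():
    # The current time as far as the application is concerned.
    return datetime.now() + _offset

if __name__ == "__main__":
    init_time_override()
    print(apptime())  # e.g. run with --debug-override-time=2014-01-01T12:34:56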
There are several ways to query the time on Linux. Read time(7); I know at least time(2), gettimeofday(2), clock_gettime(2).
So you could use LD_PRELOAD tricks to redefine each of these to, e.g., subtract from the seconds part (not the microsecond or nanosecond part) a fixed number of seconds, given e.g. by some environment variable. See this example as a starting point.