There are some libraries in Haskell for testing properties or for unit tests, etc. But I have yet to find one that allows for testing termination/nontermination of a function.
Of course you can't simply test whether an expression terminates, because of the halting problem. But I could guess an upper bound on how long the function should take to evaluate; if it doesn't finish within that time, the test would fail on the grounds that the function was (likely) never going to terminate.
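For illustration, here is roughly the kind of check I have in mind, as a minimal sketch using System.Timeout and Control.DeepSeq (the helper name terminatesWithin and the one-second budget are made up for the example):

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- Hypothetical helper: force a value to normal form and report whether
-- it finished within the given number of microseconds.
terminatesWithin :: NFData a => Int -> a -> IO Bool
terminatesWithin micros x = isJust <$> timeout micros (evaluate (force x))

-- Example: should report False, since the expression never finishes.
main :: IO ()
main = terminatesWithin 1000000 (length (repeat ())) >>= print
```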
Is there a library that allows for testing termination like this?
Background
I am writing integration tests using the guidelines in the Rust Book.
A requirement is to be able to run setup and teardown code at two levels:
Test level: Before and after each test.
Crate level: Before and after all tests in the crate.
The basic idea is as follows:
I wrote a TestRunner struct.
Each integration test file (crate) declares a lazy_static singleton instance of it.
Each #[test]-annotated function passes a test function to the TestRunner's run_test() method.
With this design it is straightforward to implement Test level setup and teardown. The run_test() method handles that.
It is also no problem to implement the setup part of the Crate level, because the TestRunner knows when run_test() is called for the first time.
The Problem
The remaining problem is how to get the TestRunner to execute the Crate level teardown after the last test has run.
As tests may run in any order (or even in parallel), the TestRunner cannot know when it is running the last test.
What I Have Tried
I used the ctor crate and its #[dtor] attribute to mark a function that runs before the process ends. This function calls the Crate level teardown function.
This is an unsatisfactory solution because of this issue, which documents the limitations on what can be done in a dtor function.
A Step in the Right Direction
I propose to pass a test_count argument to the TestRunner's constructor function.
As the TestRunner now knows how many test functions there are, it can call the Crate level teardown function after the last one completes. (There are some thread-safety issues to handle, but they are manageable.)
The Missing Link
Clearly, the above approach is error-prone, as it depends on the developer updating the test_count argument every time she adds or removes a test or marks one as ignored.
The Remaining Problem
I would therefore like to be able to detect the number of tests in the crate at compile time without any manual intervention by the developer.
I am not familiar enough with Rust macros to write such a macro myself, but I assume it is possible.
Maybe it is even possible to use the test crate's object model for this (see https://doc.rust-lang.org/test/struct.TestDesc.html).
Can anyone suggest a clean way to detect (at compile time) the number of tests that will run in the crate so I can pass it to the constructor of the TestRunner?
I am using the Haskell Test Framework (HTF) through Stack to evaluate QuickCheck properties. When I run stack test, failing properties are reported in the form Gave up! Passed only 95 tests. The many examples of property testing I've found report failures in the form Falsifiable, after 48 tests, followed by the arguments that failed. These examples, however, seem to be running QuickCheck directly instead of through Stack and HTF.
How can I configure my environment to report the arguments generated by QuickCheck that failed to satisfy the property under test? As pointed out in Testing with HTF, documentation is already sparse and poor for some of these tools on their own, let alone for combining them.
"Gave up!" means a different kind of failure than "Falsifiable".
QuickCheck has a way of discarding test cases you consider "improper"; discarded cases count toward neither success nor failure. A typical source of discards is the implication operator (==>): test cases that do not satisfy the precondition are discarded, so a "success" is only counted when the precondition holds, giving you a better idea of how thoroughly you are exercising the postcondition on the right (which is likely the part that actually matters to you as a user). Explicit use of the discard property is also possible, with a meaning distinct from an actual failure such as returning False.
Discarded tests thus do not falsify the property as a whole (an implication with a false precondition is logically true), but too many discarded tests can mean insufficient coverage, which is signaled by the failure you observed; there is no counterexample to print. To resolve this failure, find where the discards are coming from; possible resolutions include the following (a small example follows the list):
use a better generator (avoiding discards);
raise the discard threshold (@stefanwehr shows how to do this with HTF in the other answer);
decide that these discards should actually be failures.
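To make the difference concrete, here is a small sketch (the property names are made up) of a property that discards via ==> next to one that avoids discards by generating only valid inputs:

```haskell
import Test.QuickCheck

-- Discards: every generated pair with x >= y is thrown away by ==>;
-- too many such discards is what produces "Gave up!".
prop_withDiscards :: Int -> Int -> Property
prop_withDiscards x y = x < y ==> x <= y - 1

-- No discards: construct a valid pair directly instead of filtering.
prop_withBetterGenerator :: Int -> NonNegative Int -> Bool
prop_withBetterGenerator x (NonNegative d) = x <= (x + d + 1) - 1
```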
@Li-yao Xia is right in saying that your generator generates too many discardable test cases. To raise the discard threshold with HTF, you would write your property like this:
prop_somePropertyWithRaisedDiscardThreshold =
  withQCArgs (\args -> args { maxDiscardRatio = 1000 })
    somePredicateOrProperty
The args variable has type Args, which comes directly from the QuickCheck package.
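If you are running QuickCheck directly rather than through HTF, the same knob is available via quickCheckWith; a minimal sketch (the property is just a placeholder):

```haskell
import Test.QuickCheck

main :: IO ()
main = quickCheckWith stdArgs { maxDiscardRatio = 1000 } prop_example
  where
    -- Placeholder property that discards whenever x == 0.
    prop_example :: Int -> Property
    prop_example x = x /= 0 ==> x * 2 /= x
```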
I come from the JS world, where I am used to doing thorough testing of all the possible cases that can result from weak typing. That means that, inside a function, I check all the incoming parameters against some criteria.
As an example, in a function createUser(username, id, enabled, role){} I would check that username is a string, id is a UUID, enabled is a boolean, and role is a string that must be 'admin', 'user' or 'system'.
I write tests for these cases to make sure that when I pass wrong parameters, the tests fail and point me at the bugs that caused it. In the end, I have quite a lot of tests, many of which are just type-checking tests.
Now I am playing with Swift, which is strongly typed. I am using it to build a client app that consumes data from a NodeJS server. If I want to create a similar createUser() function in Swift, it seems I need far fewer tests because the type checking is built into the language itself.
Is it right to think that, basically, a strongly-typed language needs fewer tests than a weakly-typed one? Some tests just seem unnecessary in Swift, and the whole testing process seems more lightweight.
Are there things I can do to write even less tests by using language constructs in some specific manner and still be sure the code is correct and would pass tests by definition?
The use of optionals and non-optionals, guard, and if let may save you some nil checks. For example:
Guard Statement
A guard statement is used to transfer program control out of a scope if one or more conditions aren’t met.
A guard statement has the following form:
guard condition else {
    statements
}
More generally, read the Statements chapter of The Swift Programming Language:
https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/Statements.html
If you look at the call stack of a program and treat each return pointer as a token, what kind of automata is needed to build a recognizer for the valid states of the program?
As a corollary, what kind of automata is needed to build a recognizer for a specific bug state?
(Note: I'm only looking at the info that could be had from this function.)
My thought is that if these form regular languages, then some interesting tools could be built around that. E.g., given a set of crash/failure dumps, automatically group them and generate a recognizer to identify new instances of known bugs.
Note: I'm not suggesting this as a diagnostic tool but as a data management tool for turning a pile of crash reports into something more useful.
"These 54 crashes seem related, as do those 42."
"These new crashes seem unrelated to anything before date X."
etc.
It would seem that I've not been clear about what I'm thinking of accomplishing, so here's an example:
Say you have a program that has three bugs in it.
Two bugs that cause invalid args to be passed to a single function, tripping the same sanity check.
A function that, given a (valid) corner case, goes into infinite recursion.
Also assume that when the program crashes (failed assert, uncaught exception, seg-V, stack overflow, etc.) it grabs a stack trace, extracts the call sites from it, and ships them to a QA reporting server. (I'm assuming only that information is extracted because (1) it is easy to get with a one-time per-project cost and (2) it has a simple, definite meaning that can be used without any special knowledge of the program.)
What I'm proposing would be a tool that would attempt to classify incoming reports as connected to one of the known bugs (or as a new bug).
The simplest thing would be to assume that one failure site is one bug, but in the first example, two bugs get detected in the same place. The next easiest thing would be to require the entire stack to match, but again, this doesn't work in cases like the second example, where multiple pieces of valid code can trip the same bug.
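For what it's worth, here is a small sketch (in Haskell, with Report and CallSite as stand-ins for whatever the QA server actually stores) of the two naive groupings just described:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- A report is just the list of call sites from the stack trace,
-- innermost frame first.
type CallSite = String
type Report   = [CallSite]

-- Naive strategy 1: one failure site == one bug
-- (conflates the two bugs that trip the same sanity check).
groupByFailureSite :: [Report] -> [[Report]]
groupByFailureSite = groupBy ((==) `on` take 1) . sortOn (take 1)

-- Naive strategy 2: require the entire stack to match
-- (splits the infinite-recursion bug across its different callers).
groupByFullStack :: [Report] -> [[Report]]
groupByFullStack = groupBy (==) . sortOn id
```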
The return pointer on the stack is just a pointer to memory. In theory, if you look at the call stack of a program that makes just one function call, the return pointer (for that one function) can have a different value for every execution of the program. How would you analyze that?
In theory you could read through a core dump using a map file. But doing so is extremely platform and compiler specific. You would not be able to create a general tool for doing this with any program. Read your compiler's documentation to see if it includes any tools for doing postmortem analysis.
If your program is decorated with assert statements, then each assert statement defines a valid state. The program statements between the assertions define the valid state changes.
A program that crashes has violated enough assertions that something is broken.
A program that's incorrect but "flaky" has violated at least one assertion but hasn't failed.
It's not at all clear what you're looking for. The valid states are -- sometimes -- hard to define but -- usually -- easy to represent as simple assert statements.
Since a crashed program has violated one or more assertions, a program with explicit, executable assertions doesn't need any crash debugging. It will simply fail an assert statement and die visibly.
If you don't want to put in assert statements then it's essentially impossible to know what state should have been true and which (never-actually-stated) assertion was violated.
Unwinding the call stack to work out the position and the nesting is trivial. But it's not clear what that shows. It tells you what broke, but not what other things led to the breakage. That would require guessing which assertions were supposed to have been true, which requires deep knowledge of the design.
Edit.
"seem related" and "seem unrelated" are undefinable without recourse to the actual design of the actual application and the actual assertions that should be true in each stack frame.
If you don't know the assertions that should be true, all you have is a random puddle of variables. What can you claim about "related" given a random pile of values?
Crash 1: a = 2, b = 3, c = 4
Crash 2: a = 3, b = 4, c = 5
Related? Unrelated? How can you classify these without knowing everything about the code? If you know everything about the code, you can formulate standard assert-statement conditions that should have been true. And then you know what the actual crash is.
I have a Haskell XML-RPC (HaXR) server process, run with GHC, that needs to execute any function that it's passed. These functions will all be defined at runtime, so the compiled server won't know about them.
Is there a way to load a function definition at runtime? A method that avoids disk IO is preferable.
Thanks.
hint seems to be popular these days.
Although to load a function definition, I think you will either have to put it into a module or re-interpret it every time you use it.
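For example, a minimal sketch of interpreting a function received as a string at runtime with hint (the expression and its type are placeholders):

```haskell
import Language.Haskell.Interpreter

main :: IO ()
main = do
  -- The string stands in for whatever definition the RPC server is passed.
  result <- runInterpreter $ do
    setImports ["Prelude"]
    interpret "\\x -> x * 2 + 1" (as :: Int -> Int)
  case result of
    Left err -> print err
    Right f  -> print (f 20)  -- apply the freshly interpreted function
```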