Implicitly registering unit tests in Haskell

I'm new to Haskell, with a C++ background. I'm doing some exercises in Haskell, and I want to implement them as a bunch of functions covered with unit tests, so the testing driver is my only app.
And with my background, I'm looking for something like GTest. HUnit is its analogue in the Haskell world. But the need to explicitly register tests is really annoying - that's tedious and violates the DRY principle.
So I was thinking about experimenting with a custom testing framework. It seems that Template Haskell can be used to automate providing assertion descriptions and registering tests within one module. But how can I automatically collect all tests from all linked modules?
Of course, it is always possible to write a build script that would grep the sources and generate the required code, but I wonder if this can be done in Haskell alone.

test-framework-th provides this functionality for test-framework. The simplest thing is to use the defaultMainGenerator function to collect all top-level definitions prefixed with case_ (HUnit) or prop_ (QuickCheck) into test groups.
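For illustration, here is a minimal sketch of how that looks; the test names are placeholders, and it assumes the usual test-framework provider packages are available:
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Test.Framework.TH (defaultMainGenerator)
import Test.Framework.Providers.HUnit
import Test.Framework.Providers.QuickCheck2
import Test.HUnit (Assertion, (@?=))

-- picked up automatically because of the case_ prefix
case_reverseEmpty :: Assertion
case_reverseEmpty = reverse ([] :: [Int]) @?= []

-- picked up automatically because of the prop_ prefix
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

-- the splice gathers every case_/prop_ definition in this module
main :: IO ()
main = $(defaultMainGenerator)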
If you have multiple test groups, you do still need to list them in a main entry point for your tests. There's probably a way around that, and I guess that's what you're really asking about, but honestly I have found little need to break tests into more than a handful of modules. The effort needed to avoid that little bit of repetition is sometimes greater than the effort of just living with it.
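If you do end up splitting tests across modules, a common pattern (sketched here with made-up module names) is to let each module generate its own group with testGroupGenerator, so the only repetition left is one list in Main:
-- In each test module, e.g. FooTest.hs:
--   tests :: Test
--   tests = $(testGroupGenerator)

-- Main.hs then only lists the groups once:
module Main where

import Test.Framework (defaultMain)
import qualified FooTest
import qualified BarTest

main :: IO ()
main = defaultMain [FooTest.tests, BarTest.tests]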

Related

Is it common to have example values for compile-time checking and where should they go in the code?

I have a reasonably complex structure of data types and records, which is not easy for someone unfamiliar with the codebase to make sense of just by looking at the production code.
To make sense of it, I created these kinds of dummy definitions, which have two advantages: 1. they're checked at compile time, and 2. they serve as a kind of documentation, showcasing an example of how the overall structure of data types and records fits together:
-- benefit 1: a newcomer can quickly make sense of the type system
-- benefit 2: easier to keep track of how the types evolve because of compile-time checking
exampleValue1 :: ApiResponseContent
exampleValue1 = ApiOnlineResponseContent
  [ OnlineResultRow (EntityId 10) [Just (FvInt 1), Just (FvFloat 1.5), Nothing]
  , OnlineResultRow (EntityId 20) [Just (FvInt 2), Nothing, Just (FvBool True)]
  ]
The only thing that bothers me a bit is that they feel awkward sitting in the production code, as they're clearly dead code. However, they're not tests either; they're just compile-time-checked examples of how values can be assembled from complex nested types. Therefore they clearly don't belong in production code, but they don't quite belong in the tests either.
Is it common practice to have this kind of compile-time example? And where should such examples be placed within the codebase?
Have you considered including the examples as part of Haddock documentation?
Otherwise, there's a school of thought that tries to reframe tests as examples. You can find a brief mention of this in Gerard Meszaros's work on unit testing. Dan North has also repeatedly used similar language, but it can often be difficult to track down his many iterations of such ideas. He tends to 'think in public' - here's one such reflection:
"I call this development, using example-guided design"
I understand that the kind of example being asked about isn't quite like that. Still, I think that it better belongs with test code than with production code.
If you put the examples in the production code, you essentially make it part of the API of your library (if you're shipping a library). This means that changing the examples would constitute a breaking change. That doesn't seem right to me.
In the BDD/DDD community, there's a lot of emphasis on tests as examples, also in the sense that automated tests serve as documentation. If Haddock documentation isn't an option, I'd consider putting the examples in the test code, as documentation. I sometimes do that by simply putting such 'vacuous tests' in a file called examples, perhaps with a little comment at the top to explain that the code in that file assists learning rather than verifying behaviour.
It dodges the risk of introducing redundant breaking changes in the production code, and seems conceptually like a better fit.
I disagree that these aren't tests. Even "does this example compile" could be seen as a test, and you could probably also use them to actually test some functions while you're at it.
So, put these definitions in your test suite, and use them for unit testing the functions that will actually be dealing with such values.
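A hedged sketch of what that could look like with hspec; the Examples and Api modules and the summarise function are made up for illustration:
module ApiResponseSpec (spec) where

import Test.Hspec
import Examples (exampleValue1)                  -- hypothetical module holding the example values
import Api (ApiResponseContent (..), summarise)  -- 'summarise' is a hypothetical function under test

spec :: Spec
spec = describe "exampleValue1" $ do
  it "has two result rows" $
    case exampleValue1 of
      ApiOnlineResponseContent rows -> length rows `shouldBe` 2
      _                             -> expectationFailure "expected an online response"
  it "can be used to exercise real functions" $
    summarise exampleValue1 `shouldBe` "2 rows"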
The main convention I’ve seen for housing such examples is to include ….Tutorial modules, such as Dhall.Tutorial and Clash.Tutorial, or parallel …-tutorial packages such as lens-tutorial.
These contain Haddock documentation, organised so as to read in linear order, alongside example definitions like yours.
With good examples, I find this convention very helpful for understanding and experimenting with a new package in addition to types and reference docs. It’s also fairly discoverable when browsing packages on Hackage, especially if you ensure that the reference and README link to the tutorial for longer explanations.
These modules can be just compiled as basic validation, but having more machine validation for docs is incredibly valuable, so I think it’s even better to link them into the test suite (e.g. hspec) as examples for unit tests (HUnit) or property tests (QuickCheck/hedgehog), or organise them as documentation tests (doctest).
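For instance, a doctest-style sketch with the example value documented in a Haddock comment; the rowCount accessor here is made up:
-- | An example response, shown so that readers can see how the nested
--   types fit together. doctest executes the '>>>' lines below.
--
-- >>> rowCount exampleValue1
-- 2
exampleValue1 :: ApiResponseContent
exampleValue1 = ApiOnlineResponseContent
  [ OnlineResultRow (EntityId 10) [Just (FvInt 1), Just (FvFloat 1.5), Nothing]
  , OnlineResultRow (EntityId 20) [Just (FvInt 2), Nothing, Just (FvBool True)]
  ]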
In addition to GHC’s code coverage tools, weeder can help identify examples that aren’t being tested.

How to avoid code redundancy in large amounts of Node.JS BDD tests

For the last few months, I was working on the backend (REST API) of a quite big project that we started from scratch. We were following BDD (behavior-driven-development) standards, so now we have a large amount of tests (~1000). The tests were written using chai - a BDD framework for Node.JS, but I think that this question can be expanded to general good practices when writing tests.
At first, we tried to avoid code redundancy as much as possible, and it went quite well. As the number of lines of code and people working on the project grew, it became more and more chaotic, but still readable. Sometimes a minor change to the code that could be applied in 15 minutes required changing e.g. mock data and methods in 30+ files, which meant 6 hours of changes and re-running tests (an extreme example).
TL;DR
We now want to refactor these BDD tests. As an example, we have a function like this:
function RegisterUserAndGetJWTToken(user_data: any, next: any) {
    chai.request(server)
        .post(REGISTER_URL)
        .send(user_data)
        .end((err: any, res: any) => {
            const token = res.body.token;
            next(token);
        });
}
This function is used in most of our test files. Does it make sense to create something like a test-suite module that would contain these kinds of functions, or are there better ways to avoid redundancy when writing tests? Then we could use imports like these:
import {RegisterUserAndGetJWTToken} from "./test-suite";
import {user_data} from "./test-mock-data";
Do you have any good practices that you can share?
Are there any npm packages that could be useful (or packages for other programming languages)?
Do you think that this approach also has downsides (like chaos when there would be multiple imports)?
Maybe there is a way to inject or inherit the test-suite for each file, to avoid imports and have it by default in each file?
EDIT: Forgot to mention - I mean integration tests.
Thanks in advance!
Refactoring the current test suite
Your guiding principle should be to raise the level of abstraction in the tests themselves. This means that a test should consist of high-level method calls, expressed in the domain language. For example:
registerUser('John', 'john@smith.com')
lastEmail = getLastEmailSent()
lastEmail.recipient.should.equal('john@smith.com')
lastEmail.contents.should.contain('Dear John')
Now in the implementation of those methods, there could be a lot of things happening. In particular, the registerUser function could do a POST request (like in your example). The getLastEmailSent function could read from a message queue or a fake SMTP server. The point is that you hide the details behind an API.
If you follow this principle, you end up creating an Automation Layer - a domain-oriented, programmatic API to your system. When creating this layer, you follow all the good design principles, like DRY.
The benefit is that when a change in the code happens, there will be only one place to change in the test code - the Automation Layer - and not the tests themselves.
I see that what you propose (extracting RegisterUserAndGetJWTToken and the test data) is a good step towards creating an automation layer. I wouldn't worry about the require calls. I don't see any reason for not being explicit about what our tests depend on. Maybe at a later stage some of those could be gathered into larger modules (registration, emailing etc.).
Good practices towards a maintainable test suite
Automate at the right level.
Sometimes it's better to go through the UI or REST, but often a direct call to a function will be more sensible. For example, if you write a test for calculating taxes on an invoice, going through the whole application for each of the test cases would be overkill. It's much better to have one end-to-end test verify that all the pieces work together, and to automate all the specific cases at the lowest possible level. That way we get good coverage as well as speed and robustness of the test suite.
The guiding principle when writing a test is readability.
You can refer to this discussion for a good explanation.
Treat your test helper code / Automation Layer with the same care as you treat your production code.
This means you should refactor it with great care and attention, following all the good design principles.

Can I use the Rust lexer or parser to retrieve a list of functions within a Rust file?

The lexer/parser file located here is quite large and I'm not sure if it is suitable for just retrieving a list of Rust functions. Perhaps writing my own/using another library would be a better route to take?
The end objective would be to create a kind of execution manager. To contextualise, it would be able to read a list of function calls wrapped in a function. The function calls within that function could then be reordered from some web interface. I thought it might be nice to manage larger applications this way.
No. I mean, not really. Whether you write your own parser or re-use syntex, you're going to hit a fundamental limitation: macros.
So let's say you go all-out and expand macro_rules!-based macros, including the ones defined in external crates (which means you'll also need to extract rustc's crate metadata loading... which isn't stable). What about procedural macros and custom derive attributes? Those are defined in code and depend on compiler-internal interfaces to function.
The only way this is likely to ever work correctly is if you build on top of the compiler, or duplicate a huge amount of work (which also involves unstable binary interfaces).
You could use syntex to parse the Rust code in a build script.

What is the value add of BDD?

I am now working on a project where we are using cucumber-jvm to drive acceptance tests.
On previous projects I would create internal DSLs in Groovy or Scala to drive acceptance tests. These DSLs would be fairly simple to use, such that even a non-techie would be able to write tests with a little bit of guidance.
What I see is that BDD adds another layer of indirection and semantic sugar to the tests, but I fail to see the value-add, especially if the non-techies can use an internal DSL.
In the case of Cucumber, stepDefs seem to scatter the code that drives any given test over several different classes, making the test code difficult to read and debug outside the feature file. On the other hand, putting all the code pertaining to one test in a single stepDef class discourages reuse of stepDefs. Both outcomes are undesirable, leaving me asking whether the use of natural language is worth all this extra, unintuitive indirection.
Is there something I am missing? Like a subtle philosophical difference between ATDD and BDD? Does the former imply imperative testing whereas the latter implies declarative testing? Do these aesthetic differences have intrinsic value?
So I am left asking what the value add is that justifies the deterioration in the readability of the actual code that drives the test. Is this BDD stuff actually worth the pain? Is the value add more than just aesthetic?
I would be grateful if someone out there could come up with a compelling argument as to why the gain of BDD surpasses the pain of BDD?
What I see is that BDD adds another layer of indirection and semantic sugar to the tests, but I fail to see the value-add, especially if the non-techies can use an internal DSL.
The extra layer is the plain-language .feature file, and at the point of creation it has nothing to do with testing; it has to do with creating the requirements of the system, using a technique called specification by example to create well-defined stories. When written properly in the business language, specifications by example are very powerful at creating a shared understanding. This exercise alone can both reduce the amount of rework and find defects before development starts. This exercise is otherwise known as deliberate discovery.
Once you have a shared understanding of and agreement on the specifications, you enter development and make those specifications executable. Here is where you would use ATDD. So BDD and ATDD are not comparable; they are complementary. As part of ATDD, you drive the development of the system using the behaviour that has been defined by way of example in the story. The nice thing you have as a developer is a formal format that contains preconditions, events, and postconditions that you can automate.
From here on, the automated running of the executable specifications on a CI system will reduce regressions and provide you with all the benefits you get from any other automated testing technique.
The really interesting thing is that the executable specification files are long-lived and evolve over time as you add or change behaviour in your system. Unlike most Agile methodologies, where user stories are thrown away after they have been developed, here you have living documentation of your system that is also the specification and also the automated tests.
Let's now run through a healthy BDD-enabled delivery process (this is not the only way, but it is the way we like to work):
Deliberate Discovery session.
Output = agreed specifications delta
ATDD to drive development
Output = actualizing code, automated tests
Continuous Integration
Output = report with screenshots is browsable documentation of the system
Automated Deployment
Output = working software being consumed
Measure & Learn
Output = New ideas and feedback to feed the next deliberate discovery session
So BDD can really help you with the missing piece of most delivery processes: the specification part. This is typically undisciplined and freeform, and is left to a few individuals to hold together. This is how BDD is an Agile methodology and not just a testing technique.
With that in mind, let me address some of your other questions.
In the case of Cucumber, stepDefs seem to scatter the code that drives any given test over several different classes, making the test code difficult to read and debug outside the feature file. On the other hand, putting all the code pertaining to one test in a single stepDef class discourages reuse of stepDefs. Both outcomes are undesirable, leaving me asking whether the use of natural language is worth all this extra, unintuitive indirection.
If you make the stepDefs a super-thin layer on top of your test automation codebase, then it's easy to reuse the automation code from multiple steps. In the test codebase, you should apply techniques and principles such as the testing pyramid and shallow depth of test to ensure you have a robust and fast test automation layer. What's also interesting about this separation is that it allows you to reuse the code between your stepDefs and your unit/integration tests.
Is there something I am missing? Like a subtle philosophical difference between ATDD and BDD? Does the former imply imperative testing whereas the latter implies declarative testing? Do these aesthetic differences have intrinsic value?
As mentioned above, ATDD and BDD are complementary, not comparable. On the point of imperative versus declarative: specification by example as a technique is very specific. When you are performing the deliberate discovery phase, you always ask the question "can you give me an example?". In that example, you would use exact values. If there are two values that can be used in the precondition (Given) or event (When) steps, and they have different outcomes (Then step), it means you have two different scenarios. If they have the same outcome, it's likely the same scenario. Therefore, as part of the BDD practice, the steps need to be declarative in order to gain the benefits of deliberate discovery.
So I am left asking what the value add is that justifies the deterioration in the readability of the actual code that drives the test. Is this BDD stuff actually worth the pain? Is the value add more than just aesthetic?
It's worth it if you are working in a team where you want to solve the problem of miscommunication. One of the reasons people fail with BDD is that the writing and automation of features is left to the developers and the QAs, and the artifacts no longer cohere as living specifications; they are just test scripts.
Test scripts tell you how a system does a particular thing, but they do not tell you why.
I would be grateful if someone out there could come up with a compelling argument as to why the gain of BDD surpasses the pain of BDD?
It's about using the right tool for the right job. Using Cucumber for writing unit tests or automated test scripts is like using a hammer to put a screw into wood. It might work, but it's never pretty and it's always painful!
On the subject of tools, your typical business analyst / product owner is not going to have the knowledge needed to peek into your source control and work with you on adding or modifying specs. We created a commercial tool to fix this problem by allowing your whole team to collaborate on specifications in the cloud, staying in sync (in real time) with your repository. Check out Simian.
I have also answered a question about BDD here that may be of interest to you that focuses more on development:
Should TDD and BDD be used in conjunction?
Cucumber and Selenium are two popular technologies. Many organizations use Selenium for functional testing, and those organizations often want to integrate Cucumber with Selenium, as Cucumber makes it easy to read and understand the application flow. Cucumber is based on the Behaviour-Driven Development framework and acts as a bridge between the following people:
Software Engineer and Business Analyst. 
Manual Tester and Automation Tester. 
Manual Tester and Developers. 
Cucumber also helps the client understand the application, as it uses the Gherkin language, which is plain text. Anyone in the organization can understand the behaviour of the software: Gherkin's syntax is simple text that is readable and understandable.

Can you disallow a common lisp script called from common lisp to call specific functions?

Common Lisp allows executing and compiling code at runtime. But I thought that for some (scripting-like) purposes it would be good if one could disallow a user script from calling specific functions (especially for application extensions). One could still ask the user whether he will allow an extension to access files/... I'm thinking of something like the Android permission system for Common Lisp. Is this possible without rewriting the evaluation code?
The problem I see is that in Common Lisp you would probably want a script to be able to use reader macros and normal macros, and, for the latter, operators like intern; but those would allow you to obtain arbitrary symbols (by string manipulation and interning), so simply scanning the code before evaluation won't suffice to ensure that specific functions aren't called.
So, is there something like a lock for functions? I thought of using fmakunbound / makunbound (and keeping the values in a local variable), but would that be possible in a multi-threaded environment?
Thanks in advance.
This is not part of the Common Lisp specification and there is no Common Lisp implementation that is extended to make this kind of restriction easy.
It seems to me like it would be easier to use operating system restrictions (e.g. rlimit, capabilities, etc) to enforce what you want on the Common Lisp process.
This is not an unusual desire, i.e. to run untrusted 3rd party code in a sandbox.
You can hand-craft a sandbox by creating a custom parser and interpreter for your scripting language. It is pedantic, but true, that any program with an API is providing such a service. API designers and implementors need to worry about the vile users.
You can still call eval or the compiler to run your sandboxed scripts. It just means you need to ensure that your reader, parser, and language decline to provide access to any risky functionality.
You can use a Lisp package to create a good sandbox. You can still use s-expressions for your scripting language's syntax, but you must cripple the standard reader so the user can't escape the sandbox package. You can still use the evaluator and the compiler, but you need to be sure the package you have boxed the user into contains no functionality that they can use to do inappropriate things.
Successful sandbox design and construction is easier when you start with an empty sandbox and slowly add functionality. Common Lisp is a big language, and that creates a huge surface for an attacker to poke at. So if you create a sandbox out of a package, it's best to start with an empty package and add functions one at a time, thinking through what risks each one creates. The same approach is good when creating your crippled reader: don't start with the full reader and throw things away; start with a useless reader and add things. Sadly, taking that advice creates a pretty significant cost to getting started. But if you look around, I suspect you can find an existing safe reader.
Xach's suggestion is another way to go, and in many cases more straightforward.
