What's new in QuickCheck 2?

What are the major differences between QuickCheck 1 and QuickCheck 2? From looking at Haddock docs I can see that it is split across more modules, coarbitrary has been replaced by the new Fun type and FunArbitrary class (which seems easier to understand to me), and testing monadic code is now supported. What else should I be aware of?

One major advance I've seen in QuickCheck 2, which I think is at least as important as monadic code testing, is shrinking:
class Arbitrary a where
  arbitrary :: Gen a
  shrink :: a -> [a]
This is really awesome. The shrink method is optional, but if you provide a (possibly empty) list of reduced versions of a value, then when QuickCheck finds a failing test case it tries to shrink the failing data and re-tests each candidate. It keeps shrinking for as long as the property still fails, so you end up with a much smaller counterexample.
A little sample to convince you. Without shrinking:
FormulaPrim deparsing : *** Failed! Falsifiable (after 4 tests):
Poly (Polynome "p" [(CoeffRatio (26 % 25),PolyRest (CoeffRatio (129 % 40))),(CoeffInt 96,PolyRest (CoeffInt 11)),(CoeffInt 29,PolyRest (CoeffRatio (147 % 121))),(CoeffRatio (62 % 9),PolyRest (CoeffRatio (90 % 43))),(CoeffInt 56,PolyRest (CoeffInt 27))])
With shrinking:
FormulaPrim deparsing : *** Failed! Falsifiable (after 2 tests and 3 shrinks):
Poly (Polynome "t" [(CoeffInt 14,PolyRest (CoeffInt 126))])
A shorter failing example means quicker debugging :-)
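To make the idea concrete, here is a minimal sketch of an Arbitrary instance with a hand-written shrink. The Coeff type is hypothetical (it is not the Polynome type from the output above), and the shrinking strategy is just one reasonable choice: shrink the payload, and for the two-field constructor also offer the simpler constructor as a candidate.

```haskell
import Test.QuickCheck

-- Hypothetical type, for illustration only.
data Coeff
  = CoeffInt Int
  | CoeffPair Int Int
  deriving (Show, Eq)

instance Arbitrary Coeff where
  arbitrary = oneof
    [ CoeffInt <$> arbitrary
    , CoeffPair <$> arbitrary <*> arbitrary
    ]

  -- Candidates should be "smaller" than the input and must never
  -- include the input itself, otherwise shrinking can loop.
  shrink (CoeffInt n)    = CoeffInt <$> shrink n
  shrink (CoeffPair a b) =
    CoeffInt a : [CoeffPair a' b' | (a', b') <- shrink (a, b)]
```

With this in place, a failing property over Coeff should be reported with a reduced counterexample rather than the first random one QuickCheck happened to generate.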

Related

What does the maxSize field describe in Args from Test.QuickCheck?

I'm looking at maxSize in Args. Its description says: "Size to use for the biggest test cases". But how is the size of test cases determined? I'd rather ask than go through the source code:
myArgs :: Args
myArgs = Args { replay          = Nothing
              , maxSuccess      = 1000
              , maxDiscardRatio = 1
              , maxSize         = 1
              , chatty          = False
              , maxShrinks      = 0 }
So, for example, if I have one arbitrary of type Gen String and another of type Gen [String], does maxSize=1 mean that the generated string has length 1 and the generated list of String has length 1?
If I recall correctly, each Arbitrary instance is free to respect or use the size parameter as it makes sense. For instance, size doesn't make much sense for an Arbitrary for Bool.
Try experimenting with it yourself. For lists, it definitely has an effect like you assume:
Prelude Test.QuickCheck> propList = const True :: [Int] -> Bool
Prelude Test.QuickCheck> verboseCheckWith (Test.QuickCheck.stdArgs { maxSize = 2 }) propList
Passed:
[]
Passed:
[1]
Passed:
[]
Passed:
[0]
Passed:
[]
Passed:
[1]
(I've edited the output to get the point across, because the actual output is, as implied by the function name, verbose.)
According to the outdated manual it means length for lists (note that String is also a list). The documentation does not seem to specify it anywhere for the particular instances.
The size factor in the sized function mentioned in the manual is calculated with the following comment:
-- e.g. with maxSuccess = 250, maxSize = 100, goes like this:
-- 0, 1, 2, ..., 99, 0, 1, 2, ..., 99, 0, 2, 4, ..., 98.
Also, looking at the code it limits integral values by bounding them from -size to size.
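The schedule in that comment can be reproduced with a small pure model. This is a sketch based on my reading of the size computation, not QuickCheck's actual code: sizeForTest is a hypothetical name, discarded tests are ignored, and maxSize is assumed to be positive.

```haskell
-- | Size handed to the generator for the n-th test (0-based),
-- assuming no tests were discarded and maxSize > 0.
sizeForTest :: Int -> Int -> Int -> Int
sizeForTest maxSuccess maxSize n
  | roundDown n + maxSize <= maxSuccess
    || n >= maxSuccess
    || maxSuccess `mod` maxSize == 0
      -- Full cycles: the size simply counts 0 .. maxSize - 1.
      = n `mod` maxSize
  | otherwise
      -- Final partial cycle: stretch the remaining tests over the range.
      = (n `mod` maxSize) * maxSize `div` (maxSuccess `mod` maxSize)
  where
    roundDown k = (k `div` maxSize) * maxSize
```

In this model the size never actually reaches maxSize, which would be consistent with the maxSize = 2 session above only ever showing lists of length 0 or 1.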
The meaning of the size parameter is somewhat arbitrary. The manual says:
Different test data generators interpret the size parameter in different ways: some ignore it, while the list generator, for example, interprets it as an upper bound on the length of generated lists. You are free to use it as you wish to control your own test data generators.
It's basically an integer that the framework passes to the arbitrary action, which begins at 0 and grows gradually. This allows test cases to start simple and get gradually more complex. For example, it bounds the length of lists and strings, and the absolute magnitude of integers. Several combinators are provided for manipulating the size parameter.
The maxSize configuration option is provided to allow you to trade off the number of tests against individual test complexity. If you specify a large number of tests, do you want the complexity of the data to keep increasing (possibly causing large slowdowns) or do you just want to test lots of cases once a certain complexity threshold is reached?
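As an illustration of using the size parameter in your own generator, here is a small sketch with sized and resize. The generator shortList and its "half the size" rule are made up for the example; any interpretation of the size is allowed.

```haskell
import Test.QuickCheck

-- Hypothetical generator: a list of Ints whose length is at most
-- half of the current size parameter.
shortList :: Gen [Int]
shortList = sized $ \n -> do
  k <- choose (0, n `div` 2)
  vectorOf k arbitrary

-- resize pins the size parameter, regardless of what the test
-- driver would otherwise pass in.
shortListAtSize10 :: Gen [Int]
shortListAtSize10 = resize 10 shortList
```

Running sample shortListAtSize10 in GHCi should only ever print lists of at most five elements.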

Using Z3 with parallelization from SBV

I'd like to use Z3 via SBV using multiple cores. Based on this answer I should be able to do that just by passing parallel.enable=true to the z3 executable on the command line. Since I am using SBV, I need to go through SBV's interface to various SMTLib solvers, so here's what I tried:
foo = runSMTWith z3par $ do
    ...
  where
    z3par = z3
      { SBV.solver = (SBV.solver z3)
          { SBV.options = \cfg -> SBV.options (SBV.solver z3) cfg ++ ["parallel.enable=true"]
          }
      }
However, I am not seeing any signs of Z3 running with parallelization enabled:
CPU usage doesn't go above one core
No speedup compared to running without this flag
How do I enable Z3 parallelization, when going via SBV?
What you're doing is essentially how it is done from SBV. You might want to increase verbosity of z3 and output the diagnostics to a file to inspect later. Something like:
import Data.SBV
import Data.SBV.Control

foo :: IO (Word64, Word64)
foo = runSMTWith z3{solver = par} $ do
        x <- sWord64 "x"
        y <- sWord64 "y"
        setOption $ DiagnosticOutputChannel "diagnostic_output"
        constrain $ x * y .== 13
        constrain $ x .> 1
        constrain $ y .> 1
        query $ do ensureSat
                   (,) <$> getValue x <*> getValue y
  where par    = (solver z3) {options = \cfg -> options (solver z3) cfg ++ extras}
        extras = [ "parallel.enable=true"
                 , "-v:3"
                 ]
Here, we're not only setting z3's parallel-mode, but we're also telling it to increase verbosity and put all the diagnostics in a file. (Side note: There are many other settings in the parallel section of z3 config, you can see what they are by issuing z3 -pd in your command line and looking at the output. You can set any other parameters from there by adding it to the extras variable above.)
When I run the above, I get:
*Main> foo
(6379316565415788539,3774100875216427415)
But I also get a file named diagnostic_output created in the current directory, which contains the following lines, amongst others:
(tactic.parallel :progress 0% :open 1)
(tactic.parallel :split-cube 0)
(parallel.tactic simplify-1)
(tactic.parallel :progress 100.00% :status sat :open 0)
So z3 is indeed in parallel mode and things are happening. Of course, what exactly it does is more or less a black box, and it's impossible to interpret the above output without inspecting z3 internals. (I don't think the meaning of these stats or the strategies for the parallel solver are that well documented. If you find good documentation on the details, please do report!)
Update
As of this commit, you can now simply say:
runSMTWith z3{extraArgs = ["parallel.enable=true"]} $ do ...
simplifying the programming a bit further.
Solver agnostic concurrency directly from SBV
Note that SBV also has combinators for running things concurrently directly from Haskell. See the functions:
satConcurrentWithAny
satConcurrentWithAll
proveConcurrentWithAny
proveConcurrentWithAll
These functions are solver agnostic; you can use them with any solver of your choosing. Of course, they require you to restructure your problem and do a manual decomposition to take advantage of the multiple cores in your computer and stitch the solutions together yourself. But they also give you full control over how you want to structure your expensive search.

Haskell hanging on number conversion

I have the following piece of code that seems to consistently hang when run after compiling with GHC (there are no build failures, even with -Werror).
import Data.Aeson
import Data.Scientific
import qualified Data.HashMap.Strict as S

myObj = Object $
    S.fromList [("bla", Number $ pc * 100.0)]
  where pc = 10 / 9
When trying to access myObj, the program hangs. After some digging, it seems like Haskell has a tough time with the number conversion (although there are no warnings or errors with the above snippet). If I change the 9 above to a 10, it doesn't hang. But I'm curious, why does the above hang?
The conversion of 10 % 9 (a Rational) to Scientific is what does not terminate.
10 / 9 :: Scientific
From the documentation of Data.Scientific:
WARNING: Although Scientific is an instance of Fractional, the methods
are only partially defined! Specifically recip and / will diverge
(i.e. loop and consume all space) when their outputs have an infinite
decimal expansion. fromRational will diverge when the input Rational
has an infinite decimal expansion. Consider using fromRationalRepetend
for these rationals which will detect the repetition and indicate
where it starts.
Therefore, try this instead:
let Right (x, _) = fromRationalRepetend Nothing (10 / 9) in x
You will have to decide what measures are appropriate. I decided here to ignore the possibility of Left.
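To see why 10 / 9 hangs but 10 / 10 does not: a Scientific is a coefficient times a power of ten, so a rational is exactly representable only if its reduced denominator has no prime factors other than 2 and 5. A sketch of that check using only base (hasFiniteDecimal is a hypothetical helper, not part of the scientific package):

```haskell
import Data.Ratio (denominator, (%))

-- | True when the rational has a terminating decimal expansion,
-- i.e. its reduced denominator is of the form 2^a * 5^b.
hasFiniteDecimal :: Rational -> Bool
hasFiniteDecimal r = strip 5 (strip 2 (denominator r)) == 1
  where
    -- Divide out every factor p from n (n is always positive here).
    strip :: Integer -> Integer -> Integer
    strip p n
      | n `mod` p == 0 = strip p (n `div` p)
      | otherwise      = n

-- The two cases from the question.
nineths, tenths :: Rational
nineths = 10 % 9   -- repeating decimal 1.111...
tenths  = 10 % 10  -- reduces to 1
```

You could use a check like this to decide up front whether a plain conversion is safe or whether fromRationalRepetend (or rounding to a fixed number of digits) is needed.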

Running into memory issues with Data.Sequence on a manageably sized dataset

TL;DR: I'm working on a piece of code which generates a (long) array of numbers. I'm able to generate this array, convert it to a List and then calculate the maximum (using a strict left fold). BUT, I run into memory issues when I try to convert the list to a Sequence prior to calculating the maximum. This is quite counter-intuitive to me.
My question: Why is this happening and what is the correct approach for converting the data to a Sequence structure?
Background:
I'm working on a problem which I've chosen to tackle in three steps (below).
*Note: I'm intentionally keeping the problem statement vague so this post doesn't serve as a hint.
Anyways, my proposed approach:
(i) First, generate a long list of integers, namely, the number of factors for each integer from 1 to 100 million (NOT the factors themselves, just the number of factors)
(ii) second, convert this list into a Sequence.
(iii) lastly, use an efficient sliding window maximum algorithm to calc my answer (this step requires dequeue operations, hence the need for a Sequence)
(Again, the specifics of the problem aren't that relevant as I'm just curious as to why I'm running into this particular issue in the first place.)
What I've done so far?
Step 1 was fairly straightforward - see output below (full code is included at the bottom). I just bruteforce a sieve using an Unboxed Array and the accumArray function, nothing fancy. Note: I've used this same algorithm to solve a number of other such problems so I'm reasonably confident that it's giving the right answer.
For the purposes of showing execution time / memory-usage stats, I've (admittedly arbitrarily) chosen to calculate the maximum element in the resulting array - the idea is simply to use a function which forces construction of all elements of the Array, thereby ensuring that we see meaningful stats for exec time / memory-usage.
The following main function...
main = print $ maximum' $ elems (sieve (10^8))
...results in the following (i.e., it says that the number below 100 million with the most divisors has a total of 768 divisors):
Linking maxdivSO ...
768
33.73s user 70.80s system 99% cpu 1:44.62 total
344,214,504,640 bytes allocated in the heap
58,471,497,320 bytes copied during GC
200,062,352 bytes maximum residency (298 sample(s))
3,413,824 bytes maximum slop
386 MB total memory in use (0 MB lost due to fragmentation)
%GC time 24.7% (30.5% elapsed)
The problem
It seems like we can accomplish the first step without breaking a sweat since I've allocated a total of 5gb to my VirtualBox and the above code uses <400mb (as reference, I've seen programs execute successfully and report using 3gb+ of total memory). In other words, it seems like we've accomplished Step 1 with plenty of headroom.
So I'm a bit surprised as to why the following version of the main function fails. We attempt to perform the same calculation of the maximum but after converting the list of integers to a Sequence. The following code...
main = print $ maximum' $ fromList $ elems (sieve (10^8))
...results in the following:
Linking maxdivSO ...
maxdivSO: out of memory (requested 2097152 bytes)
39.48s user 76.35s system 99% cpu 1:56.03 total
My question: Why does the algorithm (as currently written) run out of memory if we try to convert the list to a Sequence? And how might I go about successfully converting this list into a Sequence?
(I'm not one to stubbornly stick to brute-force for these types of problems - but I have a strong suspicion that this particular issue is due to my not being able to reason well about evaluation.)
The code itself:
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.Word (Word32, Word16)
import Data.Foldable (Foldable, foldl')
import Data.Array.Unboxed (UArray, accumArray, elems)
import Data.Sequence (fromList)

main :: IO ()
main = print $ maximum' $ elems (sieve (10^8)) -- <-- works
--main = print $ maximum' $ fromList $ elems (sieve (10^8)) -- <-- doesn't work

-- Strict left fold; foldl' passes the accumulator as the first argument.
maximum' :: (Foldable t, Ord a, Num a) => t a -> a
maximum' = foldl' (\acc x -> if x > acc then x else acc) 0

-- Number of divisors of every integer in [1..u'].
sieve :: Int -> UArray Word32 Word16
sieve u' = accumArray (+) 2 (1,u) ( (1,-1) : factors )
  where
    u = fromIntegral u'
    cap = floor $ sqrt (fromIntegral u) :: Word32
    factors = [ (i*d,j) | d <- [2..cap]
                        , i <- [2..(u `quot` d)]
                        , d <= i, let j = if i == d then 1 else 2
              ]
I think the reason for this is that getting the first element of a sequence requires the full sequence to be constructed in memory (since the internal representation of the sequence is a tree). In the list case, elems yields the elements lazily, so the strict fold can consume and discard them one at a time.
Rather than turning the full array into a sequence, why not make the sequence only as long as your sliding window?
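A sketch of what that could look like, assuming the goal is the standard sliding-window maximum: the Seq is used as a deque that never holds more than one window's worth of candidates, and the input list is consumed lazily. The function name slidingMax and the choice to store (index, value) pairs are mine, not from the question.

```haskell
import Data.Sequence (ViewL(..), ViewR(..), viewl, viewr, (|>))
import qualified Data.Sequence as Seq

-- | Maximum of every length-k window of the input, in order.
-- The deque holds (index, value) pairs whose values are strictly
-- decreasing from front to back, so the front is the window maximum.
slidingMax :: Ord a => Int -> [a] -> [a]
slidingMax k xs
  | k <= 0    = []
  | otherwise = go Seq.empty (zip [0 ..] xs)
  where
    go _ [] = []
    go dq ((i, x) : rest) =
      let dq' = dropSmaller x (dropExpired i dq) |> (i, x)
      in case viewl dq' of
           (_, m) :< _ | i >= k - 1 -> m : go dq' rest
           _                        -> go dq' rest

    -- Remove entries from the front that have left the window.
    dropExpired i dq = case viewl dq of
      (j, _) :< dq' | j <= i - k -> dropExpired i dq'
      _                          -> dq

    -- Remove entries from the back that can never be a maximum again.
    dropSmaller x dq = case viewr dq of
      dq' :> (_, y) | y <= x -> dropSmaller x dq'
      _                      -> dq
```

For example, slidingMax 3 [1,3,2,5,4,1] yields [3,5,5,5]. Applied to elems (sieve (10^8)), this should keep at most k entries in the sequence at any time instead of all 10^8.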

Haskell Type that Supports Addition and Subtraction Only

Say I have a type defined like such.
data Seconds = Seconds Integer
I could define a function for counting down like this.
decrementTimer :: Seconds -> Seconds -> Seconds
decrementTimer (Seconds internalSecondsOne) (Seconds internalSecondsTwo) = Seconds $ internalSecondsOne - internalSecondsTwo
But that seems tedious and messy, and I'd have to do it for every representation of time: hours, minutes, time-period data that holds seconds, minutes, and hours.
What I really want to do is "implement"(?) the Num type class, so I can do something like this.
decrementTimer :: Seconds -> Seconds -> Seconds
decrementTimer a b = a - b
But then wouldn't I need to support multiplication and division? It doesn't really make sense to divide Seconds by Seconds. How would I go about making a type support addition and subtraction? Or if it's impossible or my reasoning is completely wrong, what would be the idiomatic way to do this in Haskell?
You're out of luck with the standard Prelude: the Num typeclass requires you to implement functions that just don't make sense for this datatype. There are basically three options:
Name the functions something other than + and -. This is probably the preferred option.
Implement the Num typeclass, but have the functions that don't make sense throw an error. This has the downside that it turns what should be a compile-time error into a run-time error.
Use a different Prelude, such as the Numeric Prelude that splits the functions from Num out into other type classes. This option is the most mathematically correct, but also kind of inconvenient since it doesn't use the standard Prelude.
First – why not use an existing physical-quantity–library? For instance, dimensional-tf. It's kind of strange to limit yourself to seconds, when these are really just one of many possible time units, though the fact that you use Integer rather than the more obvious Double indicates you're indeed interested in a fixed time-raster, quantised to seconds.
The precise type class for something that can be added and subtracted, but not multiplied, exists: AdditiveGroup in the vector-space package.
instance AdditiveGroup Seconds where
  zeroV = Seconds 0
  Seconds a ^+^ Seconds b = Seconds $ a+b
  negateV (Seconds a) = Seconds $ negate a
In fact, you might also define a vector space instance:
instance VectorSpace Seconds where
  type Scalar Seconds = Integer
  μ *^ (Seconds a) = Seconds $ μ * a
Though this doesn't really seem all that useful with the integer quantisation; you'd normally have type Scalar Seconds = Double instead.
You could make functions named + and - that work on seconds, but there's no way for it to be the same + and - from the Num type class without making Seconds an instance of Num (which therefore will lead any code that gets a Seconds value as a generic Num a to expect that it can use the other Num functions as well).
All you have to do is explicitly import the Prelude, either hiding + and - or importing it qualified.
The trouble then is that any code using your + and - also has to do something to resolve the ambiguity with the Prelude + and -; either you only have one version of + in scope, or at least one of them must always be referred to with a qualified name (some variant of Prelude.+, P.+, S.+, Seconds.+, etc). For an obscure name, this is sometimes acceptable. It's probably not a good idea with something as common and fundamental as +.
You could make that option nicer by making + and - functions in a new type class (say PlusMinus), and write instance Num a => PlusMinus a where (+) = (Prelude.+) etc. You then also make Seconds an instance of PlusMinus. [1]
What this buys you is that any code that wants to use your new + operator can at least safely hide the Prelude's + while still being able to use + on other Num types. It does still impose some bother on every module wanting to use your + though, and it has the potential to be confusing (someone one day may see + being used on Seconds without being deeply familiar with all this, and assume that they can use other numeric operations on Seconds).
Probably better would be to make functions that aren't called + and -. You can use new multi-character operators containing + and - if you want (though it can be tricky to find ones that aren't used by other libraries).
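A minimal sketch of that last suggestion, using only the standard Prelude. The class name Additive and the operators |+| and |-| are made up for the example; pick names that don't clash with libraries you use.

```haskell
-- Hypothetical class: types that support addition and subtraction only.
class Additive a where
  (|+|) :: a -> a -> a
  (|-|) :: a -> a -> a

-- Same precedence and associativity as the Prelude's + and -.
infixl 6 |+|, |-|

newtype Seconds = Seconds Integer
  deriving (Show, Eq)

instance Additive Seconds where
  Seconds a |+| Seconds b = Seconds (a + b)
  Seconds a |-| Seconds b = Seconds (a - b)

decrementTimer :: Seconds -> Seconds -> Seconds
decrementTimer a b = a |-| b
```

Since Seconds has no Num instance here, an expression like Seconds 2 * Seconds 3 is rejected at compile time, which is the behaviour the question asks for. The cost is one small instance per time type.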
Here's an approach I once took that was sort-of massive overkill, but also sort-of satisfying.
The problem was that I had vectors representing absolute positions, and also vectors representing offsets. I decided it made sense to add and subtract offsets, but not positions. However it did make sense to add an offset to a position to get a position, or to subtract two positions to get an offset, and even to multiply an offset by a scalar to get an offset.
So what I ended up doing was to define a type class something like this:
{-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}
class Addable a b where
  type Result a b
  (|+|) :: a -> b -> Result a b
instance Addable Offset Offset where
  type Result Offset Offset = Offset
  o1 |+| o2 = ...
instance Addable Position Offset where
  type Result Position Offset = Position
  p |+| o = ...
instance Addable Offset Position where
  type Result Offset Position = Position
  o |+| p = p |+| o
etc
So you end up using |+| rather than +, but it still ends up looking a bit like the algebra you're used to thinking in (once you get used to the convention that |+| is the "generalised" version of +, etc), and it lets you encode a lot of rules about what operations make sense in the type system, so the compiler can check them for you. The downside is a lot of boilerplate defining all the instances, but for a small fixed number of types that's something you only have to do once.
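For reference, here is a compilable version of that idea under simplifying assumptions: Position and Offset are one-dimensional newtypes over Double (the original used vectors), and only addition is shown.

```haskell
{-# LANGUAGE MultiParamTypeClasses, TypeFamilies #-}

-- Simplified stand-ins for the vector types described above.
newtype Offset   = Offset Double   deriving (Show, Eq)
newtype Position = Position Double deriving (Show, Eq)

class Addable a b where
  type Result a b
  (|+|) :: a -> b -> Result a b

infixl 6 |+|

instance Addable Offset Offset where
  type Result Offset Offset = Offset
  Offset a |+| Offset b = Offset (a + b)

instance Addable Position Offset where
  type Result Position Offset = Position
  Position p |+| Offset o = Position (p + o)

instance Addable Offset Position where
  type Result Offset Position = Position
  o |+| p = p |+| o

-- There is deliberately no "Addable Position Position" instance,
-- so adding two positions is a type error.
```

With these instances, Position 1 |+| Offset 2 evaluates to Position 3.0, while Position 1 |+| Position 2 fails to compile.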
[1] You'll need extensions to make this work; it's a little unsafe in principle because there could be an instance of Num for Seconds out there somewhere, which would make Seconds match PlusMinus two different ways.
If you want to restrict the operations on a data type, the standard trick is to not export the constructor for your type. That way functions outside the module don't have access to the internals of the data, and can only use the operations you provide. So you'd want something like:
module Seconds (Seconds, mkSeconds, addSeconds, subSeconds) where
newtype Seconds = Seconds Integer
mkSeconds :: Integer -> Seconds
mkSeconds = Seconds
addSeconds :: Seconds -> Seconds -> Seconds
addSeconds (Seconds a) (Seconds b) = Seconds (a + b)
subSeconds :: Seconds -> Seconds -> Seconds
subSeconds (Seconds a) (Seconds b) = Seconds (a - b)
Note that the module exports Seconds, not Seconds(..), so the type Seconds is available, but the constructor is not. Now it's impossible to write a function like this outside the module:
dangerousMult :: Seconds -> Seconds -> Seconds
dangerousMult (Seconds i) (Seconds j) = Seconds (i * j)
