QuickCheck Generator not terminating - haskell

This QuickCheck generator
notEqual :: Gen (Int, Int)
notEqual = do
  (a, b) <- arbitrary
  if a /= b
    then return (a, b)
    else notEqual
Doesn't seem to terminate when passed to forAll. I'm guessing the recursion is going into an infinite loop somehow, but why?

If you are in doubt of termination/finding results you can always try to circumvent this:
notEqual' :: Gen (Int, Int)
notEqual' = do
  start <- arbitrary
  delta <- oneof [pos, neg]
  pure (start, start + delta)
  where
    pos = getPositive <$> arbitrary
    neg = getNegative <$> arbitrary
Of course, internally both Positive and Negative use suchThat, so, as Ashesh mentioned,
notEqual :: Gen (Int, Int)
notEqual = genPair `suchThat` uncurry (/=)
  where genPair = arbitrary
might be easier.

The suchThat combinator that @Ashesh and @Carsten pointed out is definitely what I am looking for, to succinctly and idiomatically generate a non-equal pair.
An explanation for the infinite recursion (thanks to @oisdk):
All QuickCheck runners (quickCheck, forAll etc.) pass a size parameter to test generators. This has no defined semantics, but early tests use a small parameter, starting at 0*, and gradually growing. Generators use this to generate samples of different 'sizes', whatever that may mean for a specific datatype.
arbitrary for integral types (called in turn by arbitrary for (Int, Int)) uses this to bound the magnitude of the generated value: it generates an integer whose absolute value is at most size.
This means, unfortunately, that the first test attempted by quickCheck (or in my case forAll) uses the size 0, which can only generate (0, 0). This always fails the /= test, so the action recurses forever at the same size, looking for a pair that can never appear.
* I'm assuming this, as the behaviour of the size parameter doesn't seem to be documented anywhere.
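The size-0 behaviour described above can be observed outside a test run. The following is a minimal sketch, assuming the QuickCheck package is installed; pairAtSize and notEqualSafe are names introduced here for illustration only. It pins the size with resize, and contrasts that with suchThat, which retries at a larger size when no candidate passes.

```haskell
import Test.QuickCheck

-- Generate a pair of Ints at a fixed size, bypassing the test runner.
-- (Hypothetical helper, not part of QuickCheck.)
pairAtSize :: Int -> IO (Int, Int)
pairAtSize n = generate (resize n arbitrary)

-- suchThat grows the size when its predicate keeps failing, so this
-- generator terminates even when it is started at size 0.
notEqualSafe :: Gen (Int, Int)
notEqualSafe = arbitrary `suchThat` uncurry (/=)
```

Under the behaviour described in this answer, pairAtSize 0 should keep returning the same pair of zeros, while generate (resize 0 notEqualSafe) still produces a pair with distinct components.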

Related

Creating a conditioned Arbitrary instance ( * Ambiguous type variable `a' arising from a use of `quickCheck')

I have this test I want to make:
prop_inverse_stringsToInts st = isDigitList st ==> st == map show (stringsToInts st)
Which is testing a function that converts a list of Strings to a list of Integers, but of course the strings need to be digits so I created a pre-condition that checks that using the isDigitList function I made, but the condition is too specific and quickCheck gives up : "*** Gave up! Passed only 43 tests; 1000 discarded tests."
So I wanted to create an Arbitrary instance for my case, but the thing is I am inexperienced with working with Arbitrary, so I don't really know how to do this and every time I shuffle code I get a new error. All I want is an Arbitrary that only returns the Foo [String] if it passes the isDigitList (which receives a [String] and returns a Bool). So far I have something like this :
data Foo a = Foo [String] deriving (Show,Eq)
instance (Arbitrary a) => Arbitrary (Foo a) where
  arbitrary = do
    st <- (arbitrary :: Gen [String])
    if isDigitList st
      then do return (Foo st)
      else do return (Foo []) -- This is probably a bad idea
I altered my property to :
prop_inverse_stringsToInts :: Foo a -> Bool
prop_inverse_stringsToInts (Foo st) = st == map show (stringsToInts st)
But now I am getting the error "* Ambiguous type variable a0' arising from a use of `quickCheck'" even though I am running quickCheck like this : > quickCheck (prop_inverse_stringsToInts :: Foo a -> Bool)
Can someone help please? Thank you in advance!
It seems you know the basics, but I'll repeat them here just to be sure. There are two ways to get QuickCheck to generate the inputs you want:
1. Have it generate some inputs and then filter out ones you don't want, or
2. Have it generate only the inputs you want.
You started with option 1, but as you saw, that didn't work out great. Compared to all possible lists of String, there really aren't that many that are digit lists. The better option is to generate only the inputs you want.
To succeed at option 2, you need to make a generator, which would be a value of type Gen [String] that generates lists of Strings that fit your criteria. The generator you propose still uses the method of filtering, so you may want to try a different approach. Consider instead, something like:
genDigitStrings :: Gen [String]
genDigitStrings = do
  intList <- arbitrary :: Gen [Integer]
  return $ fmap show intList
This generator produces arbitrary lists of Strings that are always shown integers, meaning that they will always be digit lists. You can then go ahead and insert this into an Arbitrary instance for some newtype if you want.
For your own sanity, you can even check your work with a test like this:
propReallyActuallyDigitStrings = forAll genDigitStrings isDigitList
If that passes, you have some confidence that your generator really only produces digit lists, and if it fails, then you should adjust your generator.
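As a rough sketch of wrapping such a generator in an Arbitrary instance (assuming the QuickCheck package; DigitStrings, allDigits and the property name are hypothetical, and allDigits only stands in for the asker's isDigitList, whose definition is not shown): the wrapper has no type parameter, so quickCheck has no ambiguous type variable to resolve. It also draws from NonNegative Integer, since show on a negative number produces a leading '-'.

```haskell
import Data.Char (isDigit)
import Test.QuickCheck

-- Hypothetical wrapper: a list of strings that each consist only of digits.
newtype DigitStrings = DigitStrings [String]
  deriving (Show, Eq)

instance Arbitrary DigitStrings where
  -- NonNegative avoids the leading '-' that show gives negative numbers.
  arbitrary = DigitStrings . map (show . getNonNegative)
                <$> (arbitrary :: Gen [NonNegative Integer])

-- Stand-in for isDigitList (an assumption about what it checks).
allDigits :: [String] -> Bool
allDigits = all (\s -> not (null s) && all isDigit s)

-- Sanity check on the generator itself.
prop_generatorOnlyMakesDigits :: DigitStrings -> Bool
prop_generatorOnlyMakesDigits (DigitStrings st) = allDigits st
```

A property over the real function would then take a DigitStrings argument and pattern-match on the constructor, with no precondition and no discarded tests.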

Do notation for monad in function returning a different type

Is there a way to write do notation for a monad in a function whose return type isn't in that monad?
I have a main function doing most of the logic of the code, supplemented by another function which does some calculations for it in the middle. The supplementary function might fail, which is why it is returning a Maybe value. I'm looking to use the do notation for the returned values in the main function. Giving a generic example:
-- does some computation to two Ints which might fail
compute :: Int -> Int -> Maybe Int
-- actual logic
main :: Int -> Int -> Int
main x y = do
  first <- compute x y
  second <- compute (x+2) (y+2)
  third <- compute (x+4) (y+4)
  -- does some Int calculation to first, second and third
What I intend is for first, second, and third to have the actual Int values, taken out of the Maybe context, but doing the way above makes Haskell complain about not being able to match types of Maybe Int with Int.
Is there a way to do this? Or am I heading towards the wrong direction?
Pardon me if some terminology is wrongly used, I'm new to Haskell and still trying to wrap my head around everything.
EDIT
main has to return an Int, without being wrapped in Maybe, as there is another part of the code using the result of main as Int. The results of a single compute might fail, but they should collectively pass (i.e. at least one would pass) in main, and what I'm looking for is a way to use do notation to take them out of Maybe, do some simple Int calculations to them (e.g. possibly treating any Nothing returned as 0), and return the final value as just Int.
Well the signature is in essence wrong. The result should be a Maybe Int:
main :: Int -> Int -> Maybe Int
main x y = do
  first <- compute x y
  second <- compute (x+2) (y+2)
  third <- compute (x+4) (y+4)
  return (first + second + third)
For example here we return (first + second + third), and the return will wrap these in a Just data constructor.
This is because your do block, implicitly uses the >>= of the Monad Maybe, which is defined as:
instance Monad Maybe where
  Nothing >>= _ = Nothing
  (Just x) >>= f = f x
  return = Just
So that means that it will indeed "unpack" values out of a Just data constructor, but in case a Nothing comes out of it, then this means that the result of the entire do block will be Nothing.
This is more or less the convenience the Monad Maybe offers: you can make computations as a chain of successful actions, and in case one of these fails, the result will be Nothing, otherwise it will be Just result.
You can thus not at the end return an Int instead of a Maybe Int, since it is definitely possible - from the perspective of the types - that one or more computations can return a Nothing.
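To make the role of >>= concrete, here is a small sketch of the same do block written out by hand. The compute below is only an assumed stand-in (safe integer division), and sumAll is a name chosen for this example.

```haskell
-- Assumed stand-in for compute: integer division that fails on a zero divisor.
compute :: Int -> Int -> Maybe Int
compute _ 0 = Nothing
compute a b = Just (a `div` b)

-- The do block written with explicit (>>=): each lambda only runs if the
-- computation to its left produced a Just.
sumAll :: Int -> Int -> Maybe Int
sumAll x y =
  compute x y             >>= \first ->
  compute (x + 2) (y + 2) >>= \second ->
  compute (x + 4) (y + 4) >>= \third ->
  return (first + second + third)
```

With this stand-in, sumAll 10 2 evaluates to Just 10, while sumAll 10 (-2) is Nothing because the second step divides by zero and the remaining lambdas are never entered.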
You can however "post" process the result of the do block, if you for example add a "default" value that will be used in case one of the computations is Nothing, like:
import Data.Maybe(fromMaybe)
main :: Int -> Int -> Int
main x y = fromMaybe 0 $ do
  first <- compute x y
  second <- compute (x+2) (y+2)
  third <- compute (x+4) (y+4)
  return (first + second + third)
Here in case the do-block thus returns a Nothing, we replace it with 0 (you can of course add another value in the fromMaybe :: a -> Maybe a -> a as a value in case the computation "fails").
If you want to return the first element in a list of Maybes that is Just, then you can use asum :: (Foldable t, Alternative f) => t (f a) -> f a, so then you can write your main like:
-- first non-failing computation
import Data.Foldable(asum)
import Data.Maybe(fromMaybe)
main :: Int -> Int -> Int
main x y = fromMaybe 0 $ asum [
    compute x y,
    compute (x+2) (y+2),
    compute (x+4) (y+4)
  ]
Note that the asum can still contain only Nothings, so you still need to do some post-processing.
Willem's answer is basically perfect, but just to really drive the point home, let's think about what would happen if you could write something that allows you to return an int.
So you have the main function with type Int -> Int -> Int, let's assume an implementation of your compute function as follows:
compute :: Int -> Int -> Maybe Int
compute a 0 = Nothing
compute a b = Just (a `div` b)
Now this is basically a safe version of the integer division function div :: Int -> Int -> Int that returns a Nothing if the divisor is 0.
If you could write a main function as you like that returns an Int, you'd be able to write the following:
unsafe :: Int
unsafe = main 10 (-2)
This would make the second <- compute ... fail and return a Nothing, but now you have to interpret your Nothing as a number, which is not good. It defeats the whole purpose of using the Maybe monad, which captures failure safely. You can, of course, give a default value to Nothing as Willem described, but that's not always appropriate.
More generally, when you're inside a do block you should just think inside "the box" that is the monad and not try to escape. In some cases, like Maybe, you can unwrap the result with functions such as fromMaybe or maybe, but not in general.
I have two interpretations of your question, so to answer both of them:
Sum the Maybe Int values that are Just n to get an Int
To sum Maybe Ints while throwing out Nothing values, you can use sum with Data.Maybe.catMaybes :: [Maybe a] -> [a] to throw out Nothing values from a list:
sum . catMaybes $ [compute x y, compute (x+2) (y+2), compute (x+4) (y+4)]
Get the first Maybe Int value that's Just n as an Int
To get the first non-Nothing value, you can use catMaybes combined with listToMaybe :: [a] -> Maybe a to get Just the first value if there is one or Nothing if there isn't and fromMaybe :: a -> Maybe a -> a to convert Nothing to a default value:
fromMaybe 0 . listToMaybe . catMaybes $ [compute x y, compute (x+2) (y+2), compute (x+4) (y+4)]
If you're guaranteed to have at least one succeed, use head instead:
head . catMaybes $ [compute x y, compute (x+2) (y+2), compute (x+4) (y+4)]
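Put together as one self-contained sketch, the two interpretations look roughly like this. The compute here is an assumed stand-in (safe integer division), and attempts, sumOfSuccesses and firstSuccess are names made up for the example; the head variant is left out because it crashes if every attempt fails.

```haskell
import Data.Maybe (catMaybes, fromMaybe, listToMaybe)

-- Assumed stand-in for compute: integer division that fails on a zero divisor.
compute :: Int -> Int -> Maybe Int
compute _ 0 = Nothing
compute a b = Just (a `div` b)

-- The three attempts from the question, collected in one list.
attempts :: Int -> Int -> [Maybe Int]
attempts x y = [compute x y, compute (x + 2) (y + 2), compute (x + 4) (y + 4)]

-- Interpretation 1: add up whatever succeeded, ignoring the failures.
sumOfSuccesses :: Int -> Int -> Int
sumOfSuccesses x y = sum (catMaybes (attempts x y))

-- Interpretation 2: take the first success, or 0 if everything failed.
firstSuccess :: Int -> Int -> Int
firstSuccess x y = fromMaybe 0 (listToMaybe (catMaybes (attempts x y)))
```

For the inputs 10 and (-2), the attempts are Just (-5), Nothing and Just 7, so sumOfSuccesses gives 2 and firstSuccess gives -5.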

Do newtypes incur no cost even when you cannot pattern-match on them?

Context
Most Haskell tutorials I know (e.g. LYAH) introduce newtypes as a cost-free idiom that allows enforcing more type safety. For instance, this code will type-check:
type Speed = Double
type Length = Double
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
but this won't:
newtype Speed = Speed { getSpeed :: Double }
newtype Length = Length { getLength :: Double }
-- wrong!
computeTime :: Speed -> Length -> Double
computeTime v l = l / v
and this will:
-- right
computeTime :: Speed -> Length -> Double
computeTime (Speed v) (Length l) = l / v
In this particular example, the compiler knows that Speed is just a Double, so the pattern-matching is moot and will not generate any executable code.
Question
Are newtypes still cost-free when they appear as arguments of parametric types? For instance, consider a list of newtypes:
computeTimes :: [Speed] -> Length -> [Double]
computeTimes vs (Length l) = map (\v -> l / getSpeed v) vs
I could also pattern-match on Speed in the lambda:
computeTimes' :: [Speed] -> Length -> [Double]
computeTimes' vs (Length l) = map (\(Speed v) -> l / v) vs
In either case, for some reason, I feel that real work is getting done! I start to feel even more uncomfortable when the newtype is buried within a deep tree of nested parametric datatypes, e.g. Map Speed [Set Speed]; in this situation, it may be difficult or impossible to pattern-match on the newtype, and one would have to resort to accessors like getSpeed.
TL;DR
Will the use of a newtype never ever incur a cost, even when the newtype appears as a (possibly deeply-buried) argument of another parametric type?
On their own, newtypes are cost-free. Applying their constructor, or pattern matching on them has zero cost.
When used as parameter for other types e.g. [T] the representation of [T] is precisely the same as the one for [T'] if T is a newtype for T'. So, there's no loss in performance.
However, there are two main caveats I can see.
newtypes and instances
First, newtype is frequently used to introduce new instances of type classes. Clearly, when these are user-defined, there's no guarantee that they have the same cost as the original instances. E.g., when using
newtype Op a = Op a
instance Ord a => Ord (Op a) where
  compare (Op x) (Op y) = compare y x
comparing two Op Int will cost slightly more than comparing Int, since the arguments need to be swapped. (I am neglecting optimizations here, which might make this cost free when they trigger.)
newtypes used as type arguments
The second point is more subtle. Consider the following two implementations of the identity [Int] -> [Int]
id1, id2 :: [Int] -> [Int]
id1 xs = xs
id2 xs = map (\x->x) xs
The first one has constant cost. The second has a linear cost (assuming no optimization triggers). A smart programmer should prefer the first implementation, which is also simpler to write.
Suppose now we introduce newtypes on the argument type, only:
id1, id2 :: [Op Int] -> [Int]
id1 xs = xs -- error!
id2 xs = map (\(Op x)->x) xs
We can no longer use the constant cost implementation because of a type error. The linear cost implementation still works, and is the only option.
Now, this is quite bad. The input representation for [Op Int] is exactly, bit by bit, the same as for [Int]. Yet, the type system forbids us to perform the identity in an efficient way!
To overcome this issue, safe coercions were introduced in Haskell.
id3 :: [Op Int] -> [Int]
id3 = coerce
The magic coerce function, under certain hypotheses, removes or inserts newtypes as needed to make types match, even inside other types, as for [Op Int] above. Further, it is a zero-cost function.
Note that coerce works only under certain conditions (the compiler checks for them). One of these is that the newtype constructor must be visible: if a module does not export Op :: a -> Op a you can not coerce Op Int to Int or vice versa. Indeed, if a module exports the type but not the constructor, it would be wrong to make the constructor accessible anyway through coerce. This makes the "smart constructors" idiom still safe: modules can still enforce complex invariants through opaque types.
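Applied to the Speed newtype from the question, the idea might look like the sketch below. It uses only Data.Coerce from base; the function names are invented for this example.

```haskell
import Data.Coerce (coerce)

newtype Speed = Speed { getSpeed :: Double }
  deriving (Show, Eq)

-- Linear: walks the list and rebuilds it, unless the optimiser removes the map.
unwrapByMap :: [Speed] -> [Double]
unwrapByMap = map getSpeed

-- Zero-cost: the same list is reused, only its type changes.
unwrapByCoerce :: [Speed] -> [Double]
unwrapByCoerce = coerce

-- coerce also reaches through nested type constructors.
unwrapNested :: [Maybe (Speed, Speed)] -> [Maybe (Double, Double)]
unwrapNested = coerce
```

Both unwrapping functions return the same values; the difference is only in how much work is done to get there. This compiles because the Speed constructor is in scope, which is the visibility condition mentioned above.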
It doesn't matter how deeply buried a newtype is in a stack of (fully) parametric types. At runtime, the values v :: Speed and w :: Double are completely indistinguishable – the wrapper is erased by the compiler, so even v is really just a pointer to a single 64-bit floating-point number in memory. Whether that pointer is stored in a list or tree or whatever doesn't make a difference either. getSpeed is a no-op and will not appear at runtime in any way at all.
So what do I mean by “fully parametric”? The thing is, newtypes can obviously make a difference at compile time, via the type system. In particular, they can guide instance resolution, so a newtype that invokes a different class method may certainly have worse (or, just as easily, better!) performance than the wrapped type. For example,
class Integral n => Fibonacci n where
  fib :: n -> Integer
instance Fibonacci Int where
  fib = (fibs !!)
    where fibs = [ if i<2 then 1
                          else fib (i-2) + fib (i-1)
                 | i<-[0::Int ..] ]
this implementation is pretty slow, because it uses a lazy list (and performs lookups in it over and over again) for memoisation. On the other hand,
import qualified Data.Vector as Arr
-- | A number between 0 and 753
newtype SmallInt = SmallInt { getSmallInt :: Int }
instance Fibonacci SmallInt where
  fib = (fibs Arr.!) . getSmallInt
    where fibs = Arr.generate 754 $
            \i -> if i<2 then 1
                         else fib (SmallInt $ i-2) + fib (SmallInt $ i-1)
This fib is much faster, because thanks to the input being limited to a small range, it is feasible to strictly allocate all of the results and store them in a fast O(1) lookup array, not needing the spine-laziness.
This of course applies again regardless of what structure you store the numbers in. But the different performance only comes about because different method instantiations are called – at runtime this means simply, completely different functions.
Now, a fully parametric type constructor must be able to store values of any type. In particular, it cannot impose any class restrictions on the contained data, and hence also not call any class methods. Therefore this kind of performance difference can not happen if you're just dealing with generic [a] lists or Map Int a maps. It can, however, occur when you're dealing with GADTs. In this case, even the actual memory layout might be completely different, for instance with
{-# LANGUAGE GADTs #-}
import qualified Data.Vector as Arr
import qualified Data.Vector.Unboxed as UArr
data Array a where
  BoxedArray :: Arr.Vector a -> Array a
  UnboxArray :: UArr.Unbox a => UArr.Vector a -> Array a
might allow you to store Double values more efficiently than Speed values, because the former can be stored in a cache-optimised unboxed array. This is only possible because the UnboxArray constructor is not fully parametric.

Procedurally generating large list of values in Haskell -- most idiomatic approach? memory management?

I have a function that takes a series of random numbers/floats, and uses them to generate a value/structure (i.e., taking a random velocity and position of the point a ball is thrown from and outputting the coordinates of where it would land). I need to generate several thousand in succession.
The way I have everything implemented is that each calculation takes in a StdGen, uses it to generate several numbers, and passes out a new StdGen to allow it to be chained to another one.
To do this for 10000 items, I make a sort of list from generate_item n, which basically outputs a (value,gen) tuple (the value being the value I'm trying to calculate), where gen is the StdGen output recursively by the calculations involved in getting the value from generate_item (n-1).
However, this program slows to an impractical crawl at around a thousand results, and definitely does not seem to scale. Could it have to do with the fact that I am storing all of the generate_item results in memory?
Or is there a more idiomatic way of approaching this problem in Haskell, using monads or something, than what I have described above?
Note that the code to generate the algorithm from the random value generates 10k within seconds even in high-level scripting languages like ruby and python; these calculations are hardly intensive.
Code
-- helper functions that take in StdGen and return (Result,new StdGen)
plum_radius :: StdGen -> (Float,StdGen)
unitpoint :: Float -> StdGen -> ((Float,Float,Float),StdGen)
plum_speed :: Float -> StdGen -> (Float,StdGen)
-- The overall calculation of the value
plum_point :: StdGen -> (((Float,Float,Float),(Float,Float,Float)),StdGen)
plum_point gen = (((px,py,pz),(vx,vy,vz)),gen_out)
  where
    (r, gen2) = plum_radius gen
    ((px,py,pz),gen3) = unitpoint r gen2
    (s, gen4) = plum_speed r gen3
    ((vx,vy,vz),gen5) = unitpoint s gen4
    gen_out = gen5
-- Turning it into some kind of list
plum_data_list :: StdGen -> Int -> (((Float,Float,Float),(Float,Float,Float)),StdGen)
plum_data_list seed_gen 0 = plum_point seed_gen
plum_data_list seed_gen i = plum_point gen2
  where
    (_,gen2) = plum_data_list seed_gen (i-1)
-- Getting 100 results
main = do
  gen <- getStdGen
  let data_list = map (plum_data_list gen) [1..100]
  putStrLn $ List.intercalate " " (map show data_list)
Consider just using the mersenne-twister and vector-random packages, which are specifically optimized to generate large amounts of high-quality random data.
Lists are unsuitable for allocating large amounts of data -- better to use a packed representation -- unless you're streaming.
First of all, the pattern you are describing -- taking an StdGen and then returning a tuple with a value and another StdGen to be chained into the next computation -- is exactly the pattern the State monad encodes. Refactoring your code to use it might be a good way to start to become familiar with monadic patterns.
As for your performance problem, StdGen is notoriously slow. I haven't done a lot with this stuff, but I've heard mersenne twister is faster.
However, you might also want to post your code, since in cases where you are generating large lists, laziness can really work to your advantage or disadvantage depending on how you are doing it. But it is hard to give specific advice without seeing what you are doing. One rule of thumb just in case you are coming from another functional language such as Lisp -- when generating a list (or other lazy data structure -- e.g. a tree, but not an Int), avoid tail recursion. The intuition for it being faster does not transfer to lazy languages. E.g. use (written without the monadic style that I would actually use in practice)
randoms :: Int -> StdGen -> (StdGen, [Int])
randoms 0 g = (g, [])
randoms n g = let (x, g') = next g
                  (g'', xs) = randoms (n-1) g'
              in (g'', x : xs)
This will allow the result list to be "streamed", so you can access the earlier parts of it before generating the later parts. (In this state case, it's a little subtle because accessing the resulting StdGen will have to generate the whole list, so you'll have to carefully avoid doing that until after you have consumed the list -- I wish there was a fast random generator that supported a good split operation, then you could get around having to return a generator at all).
Oh, just in case you're having trouble getting going with the monads thing, here's the above function written with a state monad:
randomsM :: Int -> State StdGen [Int]
randomsM 0 = return []
randomsM n = do
  x <- state next
  xs <- randomsM (n-1)
  return (x : xs)
See the correspondence?
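The correspondence can also be checked side by side without the random package. In this sketch a tiny linear congruential generator (Seed and step are assumptions made for the example, not a recommended generator) plays the role of StdGen and next, and State comes from Control.Monad.Trans.State in the transformers package.

```haskell
import Control.Monad (replicateM)
import Control.Monad.Trans.State (State, evalState, state)

-- Stand-in for StdGen: the state of a small linear congruential generator.
type Seed = Int

-- Stand-in for next: produce a value and the following generator state.
step :: Seed -> (Int, Seed)
step s = (s' `mod` 100, s')
  where s' = (1103515245 * s + 12345) `mod` 2147483648

-- Threading the generator by hand.
twoManual :: Seed -> ((Int, Int), Seed)
twoManual s0 = ((a, b), s2)
  where
    (a, s1) = step s0
    (b, s2) = step s1

-- The same computation, with State doing the threading.
twoState :: State Seed (Int, Int)
twoState = do
  a <- state step
  b <- state step
  return (a, b)

-- Any number of draws, with no generator visible in the code at all.
manyState :: Int -> Seed -> [Int]
manyState n = evalState (replicateM n (state step))
```

For any seed, evalState twoState seed gives the same pair as fst (twoManual seed); the monadic version just hides the intermediate generators.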
The other posters have good points, StdGen doesn't perform very well, and you should probably try to use State instead of manually passing the generator along. But I think the biggest problem is your plum_data_list function.
It seems to be intended to be some kind of lookup, but since it's implemented recursively without any memoization, the calls you make have to recurse to the base case. That is, plum_data_list seed_gen 100 needs the random generator from plum_data_list seed_gen 99 and so on, until plum_data_list seed_gen 0. This will give you quadratic performance when you try to generate a list of these values.
Probably the more idiomatic way is to let plum_data_list seed_gen generate an infinite list of points like so:
plum_data_list :: StdGen -> [((Float,Float,Float),(Float,Float,Float))]
plum_data_list seed_gen = first_point : plum_data_list seed_gen'
  where
    (first_point, seed_gen') = plum_point seed_gen
Then you just need to modify the code in main to something like take 100 $ plum_data_list gen, and you are back to linear performance.
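The same shape can be written with unfoldr from Data.List. This is only a sketch using base: Seed and nextPoint are hypothetical stand-ins for StdGen and plum_point, built on a toy linear congruential generator so the example runs without the random package.

```haskell
import Data.List (unfoldr)

-- Stand-in for StdGen: the state of a tiny linear congruential generator.
type Seed = Int

-- Stand-in for plum_point: produce one value and the next generator state.
nextPoint :: Seed -> (Double, Seed)
nextPoint s = (fromIntegral s' / 2147483648, s')
  where s' = (1103515245 * s + 12345) `mod` 2147483648

-- Infinite lazy stream: each element starts from the state left behind by
-- the previous one, so producing n elements takes n generator steps.
points :: Seed -> [Double]
points = unfoldr (Just . nextPoint)
```

take 10000 (points 42) then walks the generator exactly once per element, instead of restarting from the seed for every index.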

Using functors for global variables?

I'm learning Haskell, and am implementing an algorithm for a class. It works fine, but a requirement of the class is that I keep a count of the total number of times I multiply or add two numbers. This is what I would use a global variable for in other languages, and my understanding is that it's anathema to Haskell.
One option is to just have each function return this data along with its actual result. But that doesn't seem fun.
Here's what I was thinking: suppose I have some function f :: Double -> Double. Could I create a data type (Double, IO) then use a functor to define multiplication across a (Double, IO) to do the multiplication and write something to IO. Then I could pass my new data into my functions just fine.
Does this make any sense? Is there an easier way to do this?
EDIT: To be more clear, in an OO language I would declare a class which inherits from Double and then override the * operation. This would allow me to not have to rewrite the type signature of my functions. I'm wondering if there's some way to do this in Haskell.
Specifically, if I define f :: Double -> Double then I should be able to make a functor :: (Double -> Double) -> (DoubleM -> DoubleM) right? Then I can keep my functions the same as they are now.
Actually, your first idea (return the counts with each value) is not a bad one, and can be expressed more abstractly by the Writer monad (in Control.Monad.Writer from the mtl package or Control.Monad.Trans.Writer from the transformers package). Essentially, the writer monad allows each computation to have an associated "output", which can be anything as long as it's an instance of Monoid - a class which defines:
The empty output (mempty), which is the output assigned to 'return'
An associative function (mappend) that combines outputs, which is used when sequencing operations
In this case, your output is a count of operations, the 'empty' value is zero, and the combining operation is addition. For example, if you're tracking operations separately:
data Counts = Counts { additions :: Int, multiplications :: Int }
Make that type an instance of Monoid (which is in the module Data.Monoid), and define your operations as something like:
add :: Num a => a -> a -> Writer Counts a
add x y = do
  tell (Counts {additions = 1, multiplications = 0})
  return (x + y)
The writer monad, together with your Monoid instance, then takes care of propagating all the 'tells' to the top level. If you wanted, you could even implement a Num instance for Num a => Writer Counts a (or, preferably, for a newtype so you're not creating an orphan instance), so that you can just use the normal numerical operators.
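A possible way to fill in the missing pieces is sketched below, using Control.Monad.Trans.Writer from the transformers package. The Semigroup instance is needed on current GHC versions, where it is a superclass of Monoid; mul and sumOfSquares are names added here for illustration.

```haskell
import Control.Monad.Trans.Writer (Writer, runWriter, tell)

data Counts = Counts { additions :: Int, multiplications :: Int }
  deriving (Show, Eq)

-- Combining two outputs adds the counters field by field.
instance Semigroup Counts where
  Counts a1 m1 <> Counts a2 m2 = Counts (a1 + a2) (m1 + m2)

-- The empty output: no operations performed yet.
instance Monoid Counts where
  mempty = Counts 0 0

-- Each primitive records one operation and returns the ordinary result.
add, mul :: Num a => a -> a -> Writer Counts a
add x y = tell (Counts 1 0) >> return (x + y)
mul x y = tell (Counts 0 1) >> return (x * y)

-- Example computation: x*x + y*y.
sumOfSquares :: Double -> Double -> Writer Counts Double
sumOfSquares x y = do
  xx <- mul x x
  yy <- mul y y
  add xx yy
```

runWriter (sumOfSquares 3 4) returns the result 25.0 paired with Counts {additions = 1, multiplications = 2}.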
Here is an example of using Writer for this purpose:
import Control.Monad.Writer
import Data.Monoid
import Control.Applicative -- only for the <$> spelling of fmap
type OpCountM = Writer (Sum Int)
add :: (Num a) => a -> a -> OpCountM a
add x y = tell (Sum 1) >> return (x+y)
mul :: (Num a) => a -> a -> OpCountM a
mul x y = tell (Sum 1) >> return (x*y)
-- and a computation
fib :: Int -> OpCountM Int
fib 0 = return 0
fib 1 = return 1
fib n = do
  n1 <- add n (-1)
  n2 <- add n (-2)
  fibn1 <- fib n1
  fibn2 <- fib n2
  add fibn1 fibn2
main = print (result, opcount)
  where
    (result, opcount) = runWriter (fib 10)
That definition of fib is pretty long and ugly... monadifying can be a pain. It can be made more concise with applicative notation:
fib 0 = return 0
fib 1 = return 1
fib n = join (fib <$> add n (-1) <*> add n (-2))
But admittedly more opaque for a beginner. I wouldn't recommend that way until you are pretty comfortable with the idioms of Haskell.
What level of Haskell are you learning? There are probably two reasonable answers: have each function return its counts along with its return value like you suggested, or (more advanced) use a monad such as State to keep the counts in the background. You could also write a special-purpose monad to keep the counts; I do not know if that is what your professor intended. Using IO for mutable variables is not the elegant way to solve the problem, and is not necessary for what you need.
Another solution, apart from returning a tuple or using the state monad explicitly, might be to wrap it up in a data type. Something like:
data OperationCountNum = OperationCountNum Int Double deriving (Show,Eq)
instance Num OperationCountNum where
  -- ...insert appropriate definitions here
The class Num defines functions on numbers, so you can define the functions +, * etc on your OperationCountNum type in such a way that they keep track of the number of operations required to produce each number.
That way, counting the operations would be hidden and you can use the normal +, * etc operations. You just need to wrap your numbers up in the OperationCountNum type at the start and then extract them at the end.
In the real world, this probably isn't how you'd do it, but it has the advantage of making the code easier to read (no explicit detupling and tupling) and being fairly easy to understand.
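One way the elided definitions might be filled in is sketched here. This is an assumption about what "appropriate" means: + and * each count as one operation, while negate, abs, signum and literals count as none. wrap, operationCount and poly are hypothetical helpers for the example.

```haskell
data OperationCountNum = OperationCountNum Int Double
  deriving (Show, Eq)

instance Num OperationCountNum where
  -- Each binary operation carries over both operands' counts, plus one.
  OperationCountNum c1 x + OperationCountNum c2 y = OperationCountNum (c1 + c2 + 1) (x + y)
  OperationCountNum c1 x * OperationCountNum c2 y = OperationCountNum (c1 + c2 + 1) (x * y)
  -- Unary operations are not counted in this sketch.
  negate (OperationCountNum c x) = OperationCountNum c (negate x)
  abs    (OperationCountNum c x) = OperationCountNum c (abs x)
  signum (OperationCountNum c x) = OperationCountNum c (signum x)
  -- Literals start with a count of zero.
  fromInteger n = OperationCountNum 0 (fromInteger n)

-- Wrap a plain number at the start of a computation.
wrap :: Double -> OperationCountNum
wrap = OperationCountNum 0

-- Read the count back out at the end.
operationCount :: OperationCountNum -> Int
operationCount (OperationCountNum c _) = c

-- Existing numeric code keeps its shape as long as it is polymorphic in Num.
poly :: Num a => a -> a
poly x = x * x + 2 * x + 1
```

poly (wrap 3) evaluates to OperationCountNum 4 16.0: two multiplications and two additions. One caveat of this design is that a value used twice contributes its count twice, so let y = x * x in y + y reports 3 operations for an x with count 0, even though only 2 are evaluated.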
