I'm currently writing an auto-grader for a Haskell course. For the section on "tail-recursion", I need a way to automatically and safely detect whether a given Haskell function is tail-recursive or not.
I've searched for existing tools but couldn't find anything. I assume there must be a way to do this automatically, because after all that's what the Haskell compiler does for us. The method doesn't have to be in a specific language or anything since the grader is an external entity in the project. For example, it can be a Haskell library, a command line tool, or code written in any other language (C, Java, Python, etc).
If there actually isn't any such tools, I assume I'm gonna have to use something like a lexical analyzer for Haskell, and write custom code that detects tail recursion myself.
I would first point out that tail recursion is rarely a virtue in Haskell. It's fine if you want to use Haskell as a medium for teaching tail recursion, but actually writing tail recursive functions in Haskell is usually misguided.
Presuming you still want to do this, I would highlight
after all that's what the Haskell compiler does for us
Yes, indeed. So why would any tool other than the compiler exist? The compiler already does exactly this. So, when you want to do this, use the compiler. I'm sure it won't be trivial, because you'll have to learn the compiler's types and other API. But it will actually be correct.
I would start by looking at a function like isAlwaysTailCalled, and see if that does what you want. If it doesn't, maybe you need to consume the AST of the function definition.
I basically agree with amalloy, however for this auto-grader (which presumably should only be a quick way to weed out clear-cut mistakes, not a complete reliable certification tool) I would just cobble something together in Template Haskell.
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE LambdaCase #-}
module TailRecCheck where
import Language.Haskell.TH
isTailRecursive :: Dec -> Bool
isTailRecursive (FunD fName clauses) = all isClauseTR clauses
where
isClauseTR (Clause _ (NormalB (AppE f x)) _)
-- Function application that could be a tail call
= case f of
-- It's only a tail call if the function is the
-- one we're currently defining, and if the rest
-- is not recursive
VarE fn -> fn==fName && isConstant x
-- Constant expressions are allowed as base cases
isClauseTR (Clause _ (NormalB body) _) = isConstant body
--isClauseTR _ ... _ = ...
isConstant (VarE n) = n /= fName
isConstant (ConE _) = True
isConstant (LitE _) = True
isConstant (AppE f x) = isConstant f && isConstant x
isConstant (InfixE l op r)
= all isConstant l && isConstant op && all isConstant r
--isConstant ... = ...
assertTailRecursiveDefs :: Q [Dec] -> Q [Dec]
assertTailRecursiveDefs n = n >>= mapM`id`\case
dec
| isTailRecursive dec -> return dec
| otherwise -> fail ("Function "++showName dec
++" is not tail recursive.")
where showName (FunD n _) = show n
To be used like
{-# LANGUAGE TemplateHaskell #-}
module TailRecCheckExample where
import TailRecCheck
assertTailRecursiveDefs [d|
f x = 4
g 0 = 1
g x = g (x-1)
h 0 = 1
h x = 1 + h (x-1)
|]
TailRecCheckExample.hs:7:1: error:
Function h_6989586621679042356 is not tail recursive.
|
7 | assertTailRecursiveDefs [d|
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^...
Related
Graham Hutton, in the 2nd edition of Programming in Haskell, spends the last 2 chapters on the topic of stack machine based implementation of an AST.
And he finishes by showing how to derive the correct implementation of that machine from the semantic model of the AST.
I'm trying to enlist the help of Data.SBV in that derivation, and failing.
And I'm hoping that someone can help me understand whether I'm:
Asking for something that Data.SBV can't do, or
Asking Data.SBV for something it can do, but asking incorrectly.
-- test/sbv-stack.lhs - Data.SBV assisted stack machine implementation derivation.
{-# LANGUAGE OverloadedLists #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import qualified Data.SBV.List as L
import Data.SBV.List ((.:), (.++)) -- Since they don't collide w/ any existing list functions.
-- AST Definition
data Exp = Val SWord8
| Sum Exp Exp
-- Our "Meaning" Function
eval :: Exp -> SWord8
eval (Val x) = x
eval (Sum x y) = eval x + eval y
type Stack = SList Word8
-- Our "Operational" Definition.
--
-- This function attempts to implement the *specification* provided by our
-- "meaning" function, above, in a way that is more conducive to
-- implementation in our available (and, perhaps, quite primitive)
-- computational machinery.
--
-- Note that we've (temporarily) assumed that this machinery will consist
-- of some form of *stack-based computation engine* (because we're
-- following Hutton's example).
--
-- Note that we give the *specification* of the function in the first
-- (commented out) line of the definition. The derivation of the actual
-- correct definition from this specification is detailed in Ch. 17 of
-- Hutton's book.
eval' :: Exp -> Stack -> Stack
-- eval' e s = eval e : s -- our "specification"
eval' (Val n) s = push n s -- We're defining this one manually.
where
push :: SWord8 -> Stack -> Stack
push n s = n .: s
eval' (Sum x y) s = add (eval' y (eval' x s))
where
add :: Stack -> Stack
add = uninterpret "add" s -- This is the function we're asking to be derived.
-- Now, let's just ask SBV to "solve" our specification of `eval'`:
spec :: Goal
spec = do x :: SWord8 <- forall "x"
y :: SWord8 <- forall "y"
-- Our spec., from above, specialized to the `Sum` case:
constrain $ eval' (Sum (Val x) (Val y)) L.nil .== eval (Sum (Val x) (Val y)) .: L.nil
We get:
λ> :l test/sbv-stack.lhs
[1 of 1] Compiling Main ( test/sbv-stack.lhs, interpreted )
Ok, one module loaded.
Collecting type info for 1 module(s) ...
λ> sat spec
Unknown.
Reason: smt tactic failed to show goal to be sat/unsat (incomplete quantifiers)
What happened?!
Well, maybe, asking SBV to solve for anything other than a predicate (i.e. - a -> Bool) doesn't work?
The fundamental issue here is that you are mixing SMTLib's sequence logic and quantifiers. And the problem turns out to be too difficult for an SMT solver to handle. This sort of synthesis of functions is indeed possible if you restrict yourself to basic logics. (Bitvectors, Integers, Reals.) But adding sequences to the mix puts it into the undecidable fragment.
This doesn't mean z3 cannot synthesize your add function. Perhaps a future version might be able to handle it. But at this point you're at the mercy of heuristics. To see why, note that you're asking the solver to synthesize the following definition:
add :: Stack -> Stack
add s = v .: s''
where (a, s') = L.uncons s
(b, s'') = L.uncons s'
v = a + b
while this looks rather innocent and simple, it requires capabilities beyond the current abilities of z3. In general, z3 can currently synthesize functions that only make a finite number of choices on concrete elements. But it is unable to do so if the output depends on input for every choice of input. (Think of it as a case-analysis producing engine: It can conjure up a function that maps certain inputs to others, but cannot figure out if something should be incremented or two things must be added. This follows from the work in finite-model finding theory, and is way beyond the scope of this answer! See here for details: https://arxiv.org/abs/1706.00096)
A better use case for SBV and SMT solving for this sort of problem is to actually tell it what the add function is, and then prove some given program is correctly "compiled" using Hutton's strategy. Note that I'm explicitly saying a "given" program: It would also be very difficult to model and prove this for an arbitrary program, but you can do this rather easily for a given fixed program. If you are interested in proving the correspondence for arbitrary programs, you really should be looking at theorem provers such as Isabelle, Coq, ACL2, etc.; which can deal with induction, a proof technique you will no doubt need for this sort of problem. Note that SMT solvers cannot perform induction in general. (You can use e-matching to simulate some induction like proofs, but it's a kludge at best and in general unmaintainable.)
Here's your example, coded to prove the \x -> \y -> x + y program is "correctly" compiled and executed with respect to reference semantics:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
import qualified Data.SBV.List as L
import Data.SBV.List ((.:))
-- AST Definition
data Exp = Val SWord8
| Sum Exp Exp
-- Our "Meaning" Function
eval :: Exp -> SWord8
eval (Val x) = x
eval (Sum x y) = eval x + eval y
-- Evaluation by "execution"
type Stack = SList Word8
run :: Exp -> SWord8
run e = L.head (eval' e L.nil)
where eval' :: Exp -> Stack -> Stack
eval' (Val n) s = n .: s
eval' (Sum x y) s = add (eval' y (eval' x s))
add :: Stack -> Stack
add s = v .: s''
where (a, s') = L.uncons s
(b, s'') = L.uncons s'
v = a + b
correct :: IO ThmResult
correct = prove $ do x :: SWord8 <- forall "x"
y :: SWord8 <- forall "y"
let pgm = Sum (Val x) (Val y)
spec = eval pgm
machine = run pgm
return $ spec .== machine
When I run this, I get:
*Main> correct
Q.E.D.
And the proof takes almost no time. You can easily extend this by adding other operators, binding forms, function calls, the whole works if you like. So long as you stick to a fixed "program" for verification, it should work out just fine.
If you make a mistake, let's say define add by subtraction (modify the last line of it to ready v = a - b), you get:
*Main> correct
Falsifiable. Counter-example:
x = 32 :: Word8
y = 0 :: Word8
I hope this gives an idea of what the current capabilities of SMT solvers are and how you can put them to use in Haskell via SBV.
Program synthesis is an active research area with many custom techniques and tools. An out of the box use of an SMT-solver will not get you there. But if you do build such a custom system in Haskell, you can use SBV to access an underlying SMT solver to solve many constraints you'll have to handle during the process.
(Aside: An extended example, similar in spirit but with different goals, is shipped with the SBV package: https://hackage.haskell.org/package/sbv-8.5/docs/Documentation-SBV-Examples-Strings-SQLInjection.html. This program shows how to use SBV and SMT solvers to find SQL injection vulnerabilities in an idealized SQL implementation. That might be of some interest here, and would be more aligned with how SMT solvers are typically used in practice.)
Suppose I have a function f :: String -> String and want to match arguments of the form
_ ++ "bar"
where _ is an unspecified string that I would like to return. In other words, I want to match arguments like foobar and bazbar and return foo and baz respectively.
Using ViewPatterns this is possible as follows:
{-# LANGUAGE ViewPatterns #-}
f :: String -> String
f x#(reverse . take 3 $ reverse -> "bar") = take (n-3) x
where n = length x
...but this is far from ideal. Mainly because things will get hairy very quickly if I decide that I want to combine two or more such patterns.
Ideally, I want something to be able to write something like this:
f (x:"bar") = x
but unfortunately this is not valid Haskell.
Is there an adequate solution in ViewPatterns or another extension?
On built-in String, this is a very bad idea, since your pattern match turns out to be quite expensive. On other string types, like Text or ByteString, you can use pattern guards:
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
f :: Text -> Text
f x | Just x' <- T.stripSuffix "bar" x = ...
Or with ViewPatterns (of which I am less fond):
{-# LANGUAGE OverloadedStrings, ViewPatterns #-}
import qualified Data.Text as T
f :: Text -> Text
f (T.stripSuffix "bar" -> Just x') = ...
stripSuffix :: (Eq a) => [a] -> [a] -> Maybe [a]
stripSuffix needle = go <*> drop (length needle)
where
go xs [] = if xs == needle then Just [] else Nothing
go (x:xs) (_:ys) = (x:) <$> go xs ys
f (stripSuffix "bar" -> Just pref) = pref
I haven't tested it too much but this is a simple solution that doesn't bring in extra machinery of regexs / parsers.
Don't do it with pattern matching. Pattern matches are typically cheap and match the structure of the input data. This is a very expensive pattern written as if it were very cheap to compute. If you want to do this, write it as a guard clause, where you can make it clear exactly what's happening.
This is almost possible with TemplateHaskell - someone more experienced with it could improve on this answer.
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
(+++) :: Q Pat -> String -> Q Pat
x +++ y = [p| ((\string -> splitAt (length string - length y) string) -> (x, $literal_pattern)) |]
where literal_pattern = returnQ (LitP (StringL y))
This is usable in the pattern position, and you can pass in a pattern as the first argument using TemplateHaskell's quasiquoting:
f $([p|x|] +++ "bar") = x
Rather annoyingly, I can't find any explanation of how you can pass a pattern to a TemplateHaskell splice any more succintly than this.
Following code illustrates my intention. I want to pattern match, if it doesn't result is Nothing, if matches result is Just something
data MyData =
A Int
| B String
| C
ifA (A i) = Just i
ifA _ = Nothing
ifB (B str) = Just str
ifB _ = Nothing
ifC C = Just ()
ifC _ = Nothing
mbMult3 i = Just (i*3)
concWorld str = Just (str ++ "World")
example1 v = ifA v >>= mbMult3
example2 v = ifB v >>= concWorld
-- example2 (B "Hello ,") == Just "Hello, World"
-- example2 (A 3) == Nothing
Are there other ways to do ifA ifB ifC.
Edit:
I suspect that lens library might have something. But I know nothing about lens for now.
Updated ifC
Prisms† from the lens package model this. You can generate prisms for your type using makePrisms, then use ^? (or its prefix equivalent, preview) to access members of the sum type, producing Nothing if a different value is provided:
{-# LANGUAGE TemplateHaskell #-}
import Control.Lens
data MyData
= A Int
| B String
| C
deriving (Show)
makePrisms ''MyData
ghci> A 42 ^? _A
Just 42
ghci> A 42 ^? _B
Nothing
ghci> C ^? _C
Just ()
The nice thing about prisms is that they compose with other optics (e.g. lenses, folds, and traversals), as well as use them to traverse through nested sum types by composing prisms:
ghci> (A 42, C) ^? _1._A
Just 42
ghci> (B "hello", C) ^? _1._A
Nothing
ghci> Just (A 42) ^? _Just._A
Just 42
ghci> Just (B "hello") ^? _Just._A
Nothing
ghci> Nothing ^? _Just._A
Nothing
The lens package is fairly complicated, and explaining all of its features is well outside the scope of this answer. If you don’t need much of it, your solution is probably fine. But if you find yourself writing that sort of code a lot, it’s likely that lens can help, as long as you’re willing to accept the steep learning curve and often-confusing type errors.
† More generally, ^? works on any Fold that produces zero or one value (or more, it just ignores all but the first one), but prisms are specifically designed with sum types in mind, so they are more relevant here.
Here's a cute idiom that I use from time to time, since I'm not yet up on my lenses.
{-# LANGUAGE MonadComprehensions #-}
example1 v = [ x | A x <- pure v ] >>= mbMult3
I have a Haskell data type like
data Mytype
= C1
| C2 Char
| C3 Int String
If I case on a Mytype and forget to handle one of the cases, GHC gives me a warning (exhaustiveness check).
I now want to write a QuickCheck Arbitrary instance to generate MyTypes like:
instance Arbitrary Mytype where
arbitrary = do
n <- choose (1, 3 :: Int)
case n of
1 -> C1
2 -> C2 <$> arbitrary
3 -> C3 <$> arbitrary <*> someCustomGen
The problem with this is that I can add a new alternative to Mytype and forget to update the Arbitrary instance, thus having my tests not test that alternative.
I would like to find a way of using GHC's exhaustiveness checker to remind me of forgotten cases in my Arbitrary instance.
The best I've come up with is
arbitrary = do
x <- elements [C1, C2 undefined, C3 undefined undefined]
case x of
C1 -> C1
C2 _ -> C2 <$> arbitrary
C3 _ _ -> C3 <$> arbitrary <*> someCustomGen
But it doesn't really feel elegant.
I intuitively feel that there's no 100% clean solution to this, but would appreciate anything that reduces the chance of forgetting such cases - especially in a big project where code and tests are separated.
I implemented a solution with TemplateHaskell, you can find a prototype at https://gist.github.com/nh2/d982e2ca4280a03364a8. With this you can write:
instance Arbitrary Mytype where
arbitrary = oneof $(exhaustivenessCheck ''Mytype [|
[ pure C1
, C2 <$> arbitrary
, C3 <$> arbitrary <*> arbitrary
]
|])
It works like this: You give it a type name (like ''Mytype) and an expression (in my case a list of arbitrary style Gens). It gets the list of all constructors for that type name and checks whether the expression contains all of these constructors at least once. If you just added a constructor but forgot to add it to the Arbitrary instance, this function will warn you at compile time.
This is how it's implemented with TH:
exhaustivenessCheck :: Name -> Q Exp -> Q Exp
exhaustivenessCheck tyName qList = do
tyInfo <- reify tyName
let conNames = case tyInfo of
TyConI (DataD _cxt _name _tyVarBndrs cons _derives) -> map conNameOf cons
_ -> fail "exhaustivenessCheck: Can only handle simple data declarations"
list <- qList
case list of
input#(ListE l) -> do
-- We could be more specific by searching for `ConE`s in `l`
let cons = toListOf tinplate l :: [Name]
case filter (`notElem` cons) conNames of
[] -> return input
missings -> fail $ "exhaustivenessCheck: missing case: " ++ show missings
_ -> fail "exhaustivenessCheck: argument must be a list"
I'm using GHC.Generics to easily traverse the syntax tree of the Exp: With toListOf tinplate exp :: [Name] (from lens) I can easily find all Names in the whole exp.
I was surprised that the types from Language.Haskell.TH do not have Generic instances, and neither (with current GHC 7.8) do Integer or Word8 - Generic instances for these are required because they appear in Exp. So I added them as orphan instances (for most things, StandaloneDeriving does it but for primitive types like Integer I had to copy-paste instances as Int has them).
The solution is not perfect because it doesn't use the exhaustiveness checker like case does, but as we agree, that's not possible while staying DRY, and this TH solution is DRY.
One possible improvement/alternative would be to write a TH function that does this check for all Arbitrary instances in a whole module at once instead of calling exhaustivenessCheck inside each Arbitrary instance.
You want to ensure that your code behaves in a particular way; the simplest way to check the behaviour of code is to test it.
In this case, the desired behaviour is that each constructor gets reasonable coverage in tests. We can check that with a simple test:
allCons xs = length xs > 100 ==> length constructors == 3
where constructors = nubBy eqCons xs
eqCons C1 C1 = True
eqCons C1 _ = False
eqCons (C2 _) (C2 _) = True
eqCons (C2 _) _ = False
eqCons (C3 _ _) (C3 _ _) = True
eqCons (C3 _ _) _ = False
This is pretty naive, but it's a good first shot. Its advantages:
eqCons will trigger an exhaustiveness warning if new constructors are added, which is what you want
It checks that your instance is handling all constructors, which is what you want
It also checks that all constructors are actually generated with some useful probability (in this case at least 1%)
It also checks that your instance is usable, eg. doesn't hang
Its disadvantages:
Requires a large amount of test data, in order to filter out those with length > 100
eqCons is quite verbose, since a catch-all eqCons _ _ = False would bypass the exhaustiveness check
Uses magic numbers 100 and 3
Not very generic
There are ways to improve this, eg. we can compute the constructors using the Data.Data module:
allCons xs = sufficient ==> length constructors == consCount
where sufficient = length xs > 100 * consCount
constructors = length . nub . map toConstr $ xs
consCount = dataTypeConstrs (head xs)
This loses the compile-time exhaustiveness check, but it's redundant as long as we test regularly and our code has become more generic.
If we really want the exhaustiveness check, there are a few places where we could shoe-horn it back in:
allCons xs = sufficient ==> length constructors == consCount
where sufficient = length xs > 100 * consCount
constructors = length . nub . map toConstr $ xs
consCount = length . dataTypeConstrs $ case head xs of
x#(C1) -> x
x#(C2 _) -> x
x#(C3 _ _) -> x
Notice that we use consCount to eliminate the magic 3 completely. The magic 100 (which determined the minimum required frequency of a constructor) now scales with consCount, but that just requires even more test data!
We can solve that quite easily using a newtype:
consCount = length (dataTypeConstrs C1)
newtype MyTypeList = MTL [MyType] deriving (Eq,Show)
instance Arbitrary MyTypeList where
arbitrary = MTL <$> vectorOf (100 * consCount) arbitrary
shrink (MTL xs) = MTL (shrink <$> xs)
allCons (MTL xs) = length constructors == consCount
where constructors = length . nub . map toConstr $ xs
We can put a simple exhaustiveness check in there somewhere if we like, eg.
instance Arbitrary MyTypeList where
arbitrary = do x <- arbitrary
MTL <$> vectorOf (100 * consCount) getT
where getT = do x <- arbitrary
return $ case x of
C1 -> x
C2 _ -> x
C3 _ _ -> x
shrink (MTL xs) = MTL (shrink <$> xs)
Here I exploit an unused variable _x. This is not really more elegant than your solution, though.
instance Arbitrary Mytype where
arbitrary = do
let _x = case _x of C1 -> _x ; C2 _ -> _x ; C3 _ _ -> _x
n <- choose (1, 3 :: Int)
case n of
1 -> C1
2 -> C2 <$> arbitrary
3 -> C3 <$> arbitrary <*> someCustomGen
Of course, one has to keep the last case coherent with the dummy definition of _x, so it is not completely DRY.
Alternatively, one might exploit Template Haskell to build a compile-time assert checking that the constructors in Data.Data.dataTypeOf are the expected ones. This assert has to be kept coherent with the Arbitrary instance, so this is not completely DRY either.
If you do not need custom generators, I believe Data.Data can be exploited to generate Arbitrary instances via Template Haskell (I think I saw some code doing exactly that, but I can't remember where). In this way, there's no chance the instance can miss a constructor.
Here is a solution using the generic-random library:
{-# language DeriveGeneric #-}
{-# language TypeOperators #-}
import Generic.Random
import GHC.Generics
import Test.QuickCheck
data Mytype
= C1
| C2 Char
| C3 Int String
deriving Generic
instance Arbitrary Mytype where
arbitrary = genericArbitraryG customGens uniform
where
customGens :: Gen String :+ ()
customGens = someCustomGen :+ ()
someCustomGen :: Gen String
someCustomGen = undefined
genericArbitraryG takes care of generating each constructor of MyType. In this case we use uniform to get a uniform distribution of constructors. With customGens we define that each String field in Mytype is generated with someCustomGen.
See Generic.Random.Tutorial for more examples.
In my program for solving discrete maths, I want to let the user input a string of logic operations; e.g., if the user inputs let f (x:y:_) = x && y, then I would get a function f for use in the rest of the program. In GHCi, I can easily test my program by inputting let f (x:y:_) = x && y.
I have no idea how to achieve this task. I have taken a look into the eval function from the plugins package, but it seems not to be the right function. Can I do this in Haskell?
The code I'm planning to use this with is:
type TruthTable = [[Bool]]
type TruthTableResult = [([Bool], Bool)]
solveTable :: ([Bool] -> Bool) -> Integer -> (TruthTableResult)
solveTable f n = let table = truthTable n
result = map f table
in zipWith (\v r -> (v, r)) table result
There is no standard Haskell function, because Haskell is complied, not interpreted. However, there are libraries that allow you to read and compile Haskell code at run time. One of them is hint. Example for your case:
import Control.Monad
import Language.Haskell.Interpreter
main = do
-- fExpr is a Haskell code supplied by your user as a String
let fExpr = "let f (x:y:_) = x && y in f"
-- Create an interpreter that runs fExpr
r <- runInterpreter $ do
setImports ["Prelude"]
interpret fExpr (const True :: [Bool] -> Bool)
-- run it and get an interface to the function
case r of
Left err -> putStrLn $ "Ups... " ++ (show err)
Right f -> do
print $ f [True, False]
print $ f [True, True]
More examples available here.
You are writing an eval function - a form of runtime metaprogramming.
eval :: String -> a
If the string represents a Haskell program, then you must parse the string, type check it, and then compile it to a target interpreter or runtime. This requires access to the compiler as a library, either exported as a runtime service (in an interpreter) or as a separate package (as for a compiler).
The GHC implementation of Haskell has several libraries for doing runtime evaluation of Haskell code:
via the GHCi interpreter- hint
via the compiler - plugins
These apply only if your input language is Haskell.
If instead your input string represents a program in some other language, then you are looking for a DSL interpreter. This can be done by writing your own interpreter for the input language (or reusing a library if it is a common language).
The short answer is that Haskell has no "eval" function, unlike interpreted languages which can do this quite easily (after all, they have the interpreter handy and already running).
You can include the Haskell compiler as a library: see http://www.haskell.org/haskellwiki/GHC/As_a_library. This is the nearest thing to what you ask for.
However it sounds like you don't want the whole of Haskell here; what you are really want is a different language which may have Haskell-like syntax but is not the whole of Haskell. If so then the real solution is to define that language and write a parser for it. The Parsec library is the place to start for that.