Visiting GHC AST with SYB - haskell

I wrote a program that visited the AST with Haskell-src-exts. I'm trying to convert it to use the GHC API. The former uses Uniplate, while for the latter it seems that unfortunately I'm forced with SYB (the documentation is horribly scarce).
Here is the original code:
module Argon.Visitor (funcsCC)
where
import Data.Data (Data)
import Data.Generics.Uniplate.Data (childrenBi, universeBi)
import Language.Haskell.Exts.Syntax
import Argon.Types (ComplexityBlock(..))
-- | Compute cyclomatic complexity of every function binding in the given AST.
funcsCC :: Data from => from -> [ComplexityBlock]
funcsCC ast = map funCC [matches | FunBind matches <- universeBi ast]
funCC :: [Match] -> ComplexityBlock
funCC [] = CC (0, 0, "<unknown>", 0)
funCC ms#(Match (SrcLoc _ l c) n _ _ _ _:_) = CC (l, c, name n, complexity ms)
where name (Ident s) = s
name (Symbol s) = s
sumWith :: (a -> Int) -> [a] -> Int
sumWith f = sum . map f
complexity :: Data from => from -> Int
complexity node = 1 + visitMatches node + visitExps node
visitMatches :: Data from => from -> Int
visitMatches = sumWith descend . childrenBi
where descend :: [Match] -> Int
descend x = length x - 1 + sumWith visitMatches x
visitExps :: Data from => from -> Int
visitExps = sumWith inspect . universeBi
where inspect e = visitExp e + visitOp e
visitExp :: Exp -> Int
visitExp (If {}) = 1
visitExp (MultiIf alts) = length alts - 1
visitExp (Case _ alts) = length alts - 1
visitExp (LCase alts) = length alts - 1
visitExp _ = 0
visitOp :: Exp -> Int
visitOp (InfixApp _ (QVarOp (UnQual (Symbol op))) _) =
case op of
"||" -> 1
"&&" -> 1
_ -> 0
visitOp _ = 0
I need to visit function bindings, matches and expressions. This is what I managed to write (not working):
import Data.Generics
import qualified GHC
import Outputable -- from the GHC package
funcs :: (Data id, Typeable id, Outputable id, Data from, Typeable from) => from -> [GHC.HsBindLR id id]
funcs ast = everything (++) (mkQ [] (\fun#(GHC.FunBind {}) -> [fun])) ast
It complains that there are too many instances for id, but I don't know what the heck it is. The relevant GHC module is:
http://haddock.stackage.org/lts-3.10/ghc-7.10.2/HsBinds.html
I'm getting insane from this. The goal is to count complexity (as you can see in the original code). I'd like to switch to the GHC API because it uses the same parser as the compiler, so it can parse every module without worrying about extensions.
EDIT: Here is why the current code does not work:
λ> :m +Language.Haskell.GHC.ExactPrint.Parsers GHC Data.Generics Outputable
λ> r <- Language.Haskell.GHC.ExactPrint.parseModule src/Argon/Visitor.hs
λ> let ast = snd $ (\(Right t) -> t) r
.>
λ> :t ast
ast :: Located (HsModule RdrName)
λ> let funcs = everything (++) (mkQ [] (un#(FunBind _ _ _ _ _ _) -> [fun])) ast :: (Data id, Typeable id, Outputable id) => [HsBindLR id id]
.>
λ> length funcs
<interactive>:12:8:
No instance for (Data id0) arising from a use of ‘funcs’
The type variable ‘id0’ is ambiguous
Note: there are several potential instances:
instance Data aeson-0.8.0.2:Data.Aeson.Types.Internal.Value
-- Defined in ‘aeson-0.8.0.2:Data.Aeson.Types.Internal’
instance Data attoparsec-0.12.1.6:Data.Attoparsec.Number.Number
-- Defined in ‘attoparsec-0.12.1.6:Data.Attoparsec.Number’
instance Data a => Data (Data.Complex.Complex a)
-- Defined in ‘Data.Complex’
../..plus 367 others
In the first argument of ‘length’, namely ‘funcs’
In the expression: length funcs
In an equation for ‘it’: it = length funcs

The GHC AST is parametrised on the type of names used in the tree: the parser outputs an AST with RdrName names which it seems you're working with. The GHC Commentary and the Haddocks have more information.
You might have more luck if you tell the compiler that you are working with HsBindLR RdrName RdrName.
Like this:
import Data.Generics
import GHC
import Outputable -- from the GHC package
funcs :: (Data from, Typeable from) => from -> [GHC.HsBindLR RdrName RdrName]
funcs ast = everything (++) (mkQ [] (\fun#(GHC.FunBind {}) -> [fun])) ast

Related

Traversing a Template Haskell AST

I have gethered that Haskell code in template-haskell is not represented as a single AST, but rather four cross-referencing types of Pat, Exp, Dec and Type. I have also found no traversal facilities within the library, or anywhere else for that matter.
I was initially looking for a unified representation of the four said types:
-- The single representation for Haskell code
data HCode = HE Exp | HD Dec | HP Pat | HT Type
-- And common functions in tree traversal such as:
children :: HCode -> [HCode]
children (HE (VarE _)) = []
children (HE (AppTypeE e t)) = [HE e, HT t]
children c = ...
-- Ultimately a transform function similar to:
-- (Not really arguing about this exact model of tree transformation)
preorder :: (HCode -> HCode) -> HCode -> HCode
preorder f h =
let h' = f h
in rebuildWithChildren h' . fmap (preorder f) . children $ h'
And now I have grown to believe writing it this way, aside from being time-consuming, is wasteful, since traversing/transforming ASTs is common practice, and I figured it might be best to ask what available solution there is among the practitioners.
Generally, I'm not sure that generic traversal of TH is likely to come up very often. (I'm struggling to imagine a useful transformation of a TH AST in a situation where you wouldn't just generate the TH already transformed that way.) I guess there are some situations where you want to perform queries or transformations of user-supplied quasiquotes without parsing the entire AST?
Anyway, if you can find a use for it, you can use SYB generics. For example, here's a query to extract literals from patterns and expressions from an arbitrary TH "thing":
{-# LANGUAGE TemplateHaskell #-}
import Data.Generics
import Language.Haskell.TH
getLiterals :: Data d => d -> [Lit]
getLiterals = everything (++) (mkQ [] litE `extQ` litP)
where litE (LitE l) = [l]
litE _ = []
litP (LitP l) = [l]
litP _ = []
main = do mydec <- runQ [d| foo 4 = "hello" |]
print mydec
print $ getLiterals mydec
myexp <- runQ [| '1' + "sixteen" |]
print myexp
print $ getLiterals myexp
Here's a transformation that commutes all infix operators in patterns, expressions, and types (example for InfixT not shown):
{-# LANGUAGE TemplateHaskell #-}
import Data.Generics
import Language.Haskell.TH
causeChaos :: Data d => d -> d
causeChaos = everywhere (mkT destroyExpressions `extT` manglePatterns `extT` bludgeonTypes)
where destroyExpressions (InfixE l x r) = InfixE r x l
destroyExpressions (UInfixE l x r) = UInfixE r x l
destroyExpressions e = e
manglePatterns (InfixP l x r) = InfixP r x l
manglePatterns (UInfixP l x r) = UInfixP r x l
manglePatterns e = e
bludgeonTypes (InfixT l x r) = InfixT r x l
bludgeonTypes (UInfixT l x r) = UInfixT r x l
bludgeonTypes e = e
main = do mydec <- runQ [d| append :: [a] -> [a] -> [a]
append (x:xs) ys = x : append xs ys
append [] ys = ys
|]
print mydec
print $ causeChaos mydec

Running all combinations of tests and test cases in Haskell

I aim to be able to define a collection of test methods and a collection of test cases (input/output data) and then execute all of their combinations. The goal is to avoid re-writing the same code over and over again when having say, 3 different implementations of the same function and 4 test cases that the function should satisfy. A naive approach would require me to write 12 lines of code:
testMethod1 testCase1
testMethod1 testCase2
...
testMethod3 testCase4
I've a gut feeling that Haskell should provide a way to abstract this pattern somehow. The best thing I've currently came up with is this piece of code:
import Control.Applicative
data TestMethod a = TM a
data TestData inp res = TD inp res
runMetod (TM m) (TD x res) = m x == res
runAllMethods ((m, inp):xs) = show (runMetod m inp) ++ "\n" ++ runAllMethods xs
runAllMethods _ = ""
head1 = head
head2 (x:xs) = x
testMethods = [TM head1, TM head2]
testData = [TD [1,2,3] 1, TD [4,5,6] 4]
combos = (,) <$> testMethods <*> testData
main = putStrLn $ runAllMethods combos
This works, computes 2 tests against two 2 functions and prints out 4 successes:
True
True
True
True
However, this works only for lists of the same type, even though the head function is list type agnostic. I would like to have a test data collection of any lists, like so:
import Control.Applicative
data TestMethod a = TM a
data TestData inp res = TD inp res
runMetod (TM m) (TD x res) = m x == res
runAllMethods ((m, inp):xs) = show (runMetod m inp) ++ "\n" ++ runAllMethods xs
runAllMethods _ = ""
head1 = head
head2 (x:xs) = x
testMethods = [TM head1, TM head2]
testData = [TD [1,2,3] 1, TD ['a','b','c'] 'a']
combos = (,) <$> testMethods <*> testData
main = putStrLn $ runAllMethods combos
but this fails with an error:
main.hs:12:21: error:
No instance for (Num Char) arising from the literal ‘1’
In the expression: 1
In the first argument of ‘TD’, namely ‘[1, 2, 3]’
In the expression: TD [1, 2, 3] 1
Is it possible to achieve this test-function X test-case cross testing somehow?
You should really use QuickCheck or similar, like hnefatl said.
But just for the fun of it, let's get your idea to work.
So you have a polymorphic function and a lot of test cases of different types. The only thing that matters is that you can apply the function each of the types.
So let's have a look at your function. It's of type [a] -> a. How should your test data look like? It should consist of a list, of a value, and it should support equality comparison. That leads you to a definition like:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImpredicativeTypes #-}
{-# LANGUAGE RankNTypes #-}
data TestData where
TestData :: Eq a => [a] -> a -> TestData
You need to enable the GADTs language extension for this to work. For the following to work, you need these other two extensions (although the whole thing can be generalised with type classes to avoid that, just look at QuickCheck).
Now test it:
head1 = head
head2 (a : as) = a
test :: (forall a . [a] -> a) -> TestData -> Bool
test f (TestData as a) = f as == a
testAll :: [(forall a . [a] -> a)] -> [TestData] -> Bool
testAll fs testDatas = and $ test <$> fs <*> testDatas
main = putStrLn $ if testAll [head1, head2] [TestData "Foo" 'F', TestData [1..] 1]
then "Success!"
else "Oh noez!"
I'll leave it to you to generalise this for different test function types.

Arbitrary String generator in Haskell (Test.QuickCheck.Gen)

I am struggling on Real World Haskell Chapter 11 quickCheck generator implementation for a an algebraic data type.
Following the book implementation (which was published in 2008), I came up with the following:
-- file: ch11/Prettify2.hs
module Prettify2(
Doc(..)
) where
data Doc = Empty
| Char Char
| Text String
| Line
| Concat Doc Doc
| Union Doc Doc
deriving (Show, Eq)
And my Arbitrary implementation:
-- file: ch11/Arbitrary.hs
import System.Random
import Test.QuickCheck.Gen
import qualified Test.QuickCheck.Arbitrary
class Arbitrary a where
arbitrary :: Gen a
-- elements' :: [a] => Gen a {- Expected a constraint, but ‘[a]’ has kind ‘*’ -}
-- choose' :: Random a => (a, a) -> Gen a
-- oneof' :: [Gen a] -> a
data Ternary = Yes
| No
| Unknown
deriving(Eq, Show)
instance Arbitrary Ternary where
arbitrary = do
n <- choose (0, 2) :: Gen Int
return $ case n of
0 -> Yes
1 -> No
_ -> Unknown
instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
arbitrary = do
x <- arbitrary
y <- arbitrary
return (x, y)
instance Arbitrary Char where
arbitrary = elements (['A'..'Z'] ++ ['a' .. 'z'] ++ " ~!##$%^&*()")
I tried the two following implementation with no success:
import Prettify2
import Control.Monad( liftM, liftM2 )
instance Arbitrary Doc where
arbitrary = do
n <- choose (1,6) :: Gen Int
case n of
1 -> return Empty
2 -> do x <- arbitrary
return (Char x)
3 -> do x <- arbitrary
return (Text x)
4 -> return Line
5 -> do x <- arbitrary
y <- arbitrary
return (Concat x y)
6 -> do x <- arbitrary
y <- arbitrary
return (Union x y)
instance Arbitrary Doc where
arbitrary =
oneof [ return Empty
, liftM Char arbitrary
, liftM Text arbitrary
, return Line
, liftM2 Concat arbitrary arbitrary
, liftM2 Union arbitrary arbitrary ]
But it doesn't compile since No instance for (Arbitrary String)
I tried then to implement the instance for Arbitrary String in the following ways:
import qualified Test.QuickCheck.Arbitrary but it does not implement Arbitrary String neither
installing Test.RandomStrings hackage link
instance Arbitrary String where
arbitrary = do
n <- choose (8, 16) :: Gen Int
return $ randomWord randomASCII n :: Gen String
With the following backtrace:
$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
Prelude> :l Arbitrary.hs
[1 of 2] Compiling Prettify2 ( Prettify2.hs, interpreted )
[2 of 2] Compiling Main ( Arbitrary.hs, interpreted )
Arbitrary.hs:76:9:
The last statement in a 'do' block must be an expression
return <- randomWord randomASCII n :: Gen String
Failed, modules loaded: Prettify2
Would you have any good suggestion about how to implement this particular generator and - more in general - how to proceed in these cases?
Thank you in advance
Don't define a new Arbitrary type class, import Test.QuickCheck instead. It defines most of these instances for you. Also be careful about the version of quickcheck, RWH assumes version 1.
The resulting full implementation will be:
-- file: ch11/Arbitrary.hs
import Test.QuickCheck
import Prettify2
import Control.Monad( liftM, liftM2 )
data Ternary = Yes
| No
| Unknown
deriving(Eq, Show)
instance Arbitrary Ternary where
arbitrary = do
n <- choose (0, 2) :: Gen Int
return $ case n of
0 -> Yes
1 -> No
_ -> Unknown
instance Arbitrary Doc where
arbitrary =
oneof [ return Empty
, liftM Char arbitrary
, liftM Text arbitrary
, return Line
, liftM2 Concat arbitrary arbitrary
, liftM2 Union arbitrary arbitrary ]

Haskell More efficient way to parse file of lines of digits

So I have about a 8mb file of each with 6 ints seperated by a space.
my current method for parsing this is:
tuplify6 :: [a] -> (a, a, a, a, a, a)
tuplify6 [l, m, n, o, p, q] = (l, m, n, o, p, q)
toInts :: String -> (Int, Int, Int, Int, Int, Int)
toInts line =
tuplify6 $ map read stringNumbers
where stringNumbers = split " " line
and mapping toInts over
liftM lines . readFile
which will return me a list of tuples. However, When i run this, it takes nearly 25 seconds to load the file and parse it. Any way I can speed this up? The file is just plain text.
You can speed it up by using ByteStrings, e.g.
module Main (main) where
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy.Char8 as C
import Data.Char
main :: IO ()
main = do
args <- getArgs
mapM_ doFile args
doFile :: FilePath -> IO ()
doFile file = do
bs <- C.readFile file
let tups = buildTups 0 [] $ C.dropWhile (not . isDigit) bs
print (length tups)
buildTups :: Int -> [Int] -> C.ByteString -> [(Int,Int,Int,Int,Int,Int)]
buildTups 6 acc bs = tuplify6 acc : buildTups 0 [] bs
buildTups k acc bs
| C.null bs = if k == 0 then [] else error ("Bad file format " ++ show k)
| otherwise = case C.readInt bs of
Just (i,rm) -> buildTups (k+1) (i:acc) $ C.dropWhile (not . isDigit) rm
Nothing -> error ("No Int found: " ++ show (C.take 100 bs))
tuplify6:: [a] -> (a, a, a, a, a, a)
tuplify6 [l, m, n, o, p, q] = (l, m, n, o, p, q)
runs pretty fast:
$ time ./fileParse IntList
200000
real 0m0.119s
user 0m0.115s
sys 0m0.003s
for an 8.1 MiB file.
On the other hand, using Strings and your conversion (with a couple of seqs to force evaluation) also took only 0.66s, so the bulk of the time seems to be spent not parsing, but working with the result.
Oops, missed a seq so the reads were not actually evaluated for the String version. Fixing that, String + read takes about four seconds, a bit above one with the custom Int parser from #Rotsor's comment
foldl' (\a c -> 10*a + fromEnum c - fromEnum '0') 0
so parsing apparently did take a significant amount of the time.

Haskell Hash table

I am trying to build a smallish haskell app that will translate a few key phrases from english to french.
First, i have a list of ordered pairs of strings that represent and english word/phrase followed by the french translations:
icards = [("the", "le"),("savage", "violent"),("work", "travail"),
("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")...]
next i have a new data:
data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
deriving Show
then i use the icards to populate a list of Entrys:
entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards
for simplicity, i create a new type that will be [Entry] called Run.
Now, i want to create a hash table based on the number of characters in the english word. This will be used later to speed up searchings. So i want to create a function called runs:
runs :: [Run]
runs = --This will run through the entries and return a new [Entry] that has all of the
words of the same length grouped together.
I also have:
maxl = maximum [len e | e <- entries]
It just so happens that Hackage has a hashmap package! I'm going to create a small data type based on that HashMap, which I will call a MultiMap. This is a typical trick: it's just a hash map of linked lists. I'm not sure what the correct name for MultiMap actually is.
import qualified Data.HashMap as HM
import Data.Hashable
import Prelude hiding (lookup)
type MultiMap k v = HM.Map k [v]
insert :: (Hashable k, Ord k) => k -> a -> MultiMap k a -> MultiMap k a
insert k v = HM.insertWith (++) k [v]
lookup :: (Hashable k, Ord k) => k -> MultiMap k a -> [a]
lookup k m = case HM.lookup k m of
Nothing -> []
Just xs -> xs
empty :: MultiMap k a
empty = HM.empty
fromList :: (Hashable k, Ord k) => [(k,v)] -> MultiMap k v
fromList = foldr (uncurry insert) empty
I mimicked only the essentials of a Map: insert, lookup, empty, and fromList. Now it is quite easy to turn entries into a MutliMap:
data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool}
deriving (Show)
icards = [("the", "le"),("savage", "violent"),("work", "travail"),
("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")]
entries :: [Entry]
entries = map (\(x, y) -> Entry x y (length x) (' ' `elem` x)) icards
fromEntryList :: [Entry] -> MutiMap Int Entry
fromEntryList es = fromList $ map (\e -> (len e, e)) es
Loading that up into ghci, we can now lookup a list of entries with a given length:
ghci> let m = fromEntryList entries
ghci> lookup 3 m
[Entry {wrd = "the", def = "le", len = 3, phr = False}]
ghci> lookup 4 m
[Entry {wrd = "work", def = "travail", len = 4, phr = False},
Entry {wrd = "wild", def = "sauvage", len = 4, phr = False}]
(Note that this lookup is not the one defined in Prelude.) You could similarly use the English word as a key.
-- import Data.List (find) -- up with other imports
fromEntryList' :: [Entry] -> MultiMap String Entry
fromEntryList' es = fromList $ map (\e -> (wrd e, e)) es
eLookup :: String -> MultiMap String Entry -> Maybe Entry
eLookup str m = case lookup str m of
[] -> Nothing
xs -> find (\e -> wrd e == str) xs
Testing...
ghci> let m = fromEntryList' entries
ghci> eLookup "the" m
Just (Entry {wrd = "the", def = "le", len = 3, phr = False})
ghci> eLookup "foo" m
Nothing
Notice how in eLookup we first perform the Map lookup in order to determine if anything has been placed in that slot. Since we are using a hash set, we need to remember that two different Strings might have the same hash code. So in the event that the slot is not empty, we perform a find on the linked list there to see if any of the entries there actually match the correct English word. If you are interested in performance, you should consider using Data.Text instead of String.
groupBy and sortBy are both in Data.List.
import Data.List
import Data.Function -- for `on`
runs :: [Run]
runs = f 0 $ groupBy ((==) `on` len) $ sortBy (compare `on` len) entries
where f _ [] = []
f i (r # (Entry {len = l} : _) : rs) | i == l = r : f (i + 1) rs
f i rs = [] : f (i + 1) rs
Personally, I would use a Map instead
import qualified Data.Map as M
runs :: M.Map String Entry
runs = M.fromList $ map (\entry -> (wrd entry, entry)) entries
and lookup directly by English word instead of a two step length-of-English-word and then English-word process.

Resources