Traversing a Template Haskell AST - haskell

I have gethered that Haskell code in template-haskell is not represented as a single AST, but rather four cross-referencing types of Pat, Exp, Dec and Type. I have also found no traversal facilities within the library, or anywhere else for that matter.
I was initially looking for a unified representation of the four said types:
-- The single representation for Haskell code
data HCode = HE Exp | HD Dec | HP Pat | HT Type
-- And common functions in tree traversal such as:
children :: HCode -> [HCode]
children (HE (VarE _)) = []
children (HE (AppTypeE e t)) = [HE e, HT t]
children c = ...
-- Ultimately a transform function similar to:
-- (Not really arguing about this exact model of tree transformation)
preorder :: (HCode -> HCode) -> HCode -> HCode
preorder f h =
let h' = f h
in rebuildWithChildren h' . fmap (preorder f) . children $ h'
And now I have grown to believe writing it this way, aside from being time-consuming, is wasteful, since traversing/transforming ASTs is common practice, and I figured it might be best to ask what available solution there is among the practitioners.

Generally, I'm not sure that generic traversal of TH is likely to come up very often. (I'm struggling to imagine a useful transformation of a TH AST in a situation where you wouldn't just generate the TH already transformed that way.) I guess there are some situations where you want to perform queries or transformations of user-supplied quasiquotes without parsing the entire AST?
Anyway, if you can find a use for it, you can use SYB generics. For example, here's a query to extract literals from patterns and expressions from an arbitrary TH "thing":
{-# LANGUAGE TemplateHaskell #-}
import Data.Generics
import Language.Haskell.TH
getLiterals :: Data d => d -> [Lit]
getLiterals = everything (++) (mkQ [] litE `extQ` litP)
where litE (LitE l) = [l]
litE _ = []
litP (LitP l) = [l]
litP _ = []
main = do mydec <- runQ [d| foo 4 = "hello" |]
print mydec
print $ getLiterals mydec
myexp <- runQ [| '1' + "sixteen" |]
print myexp
print $ getLiterals myexp
Here's a transformation that commutes all infix operators in patterns, expressions, and types (example for InfixT not shown):
{-# LANGUAGE TemplateHaskell #-}
import Data.Generics
import Language.Haskell.TH
causeChaos :: Data d => d -> d
causeChaos = everywhere (mkT destroyExpressions `extT` manglePatterns `extT` bludgeonTypes)
where destroyExpressions (InfixE l x r) = InfixE r x l
destroyExpressions (UInfixE l x r) = UInfixE r x l
destroyExpressions e = e
manglePatterns (InfixP l x r) = InfixP r x l
manglePatterns (UInfixP l x r) = UInfixP r x l
manglePatterns e = e
bludgeonTypes (InfixT l x r) = InfixT r x l
bludgeonTypes (UInfixT l x r) = UInfixT r x l
bludgeonTypes e = e
main = do mydec <- runQ [d| append :: [a] -> [a] -> [a]
append (x:xs) ys = x : append xs ys
append [] ys = ys
|]
print mydec
print $ causeChaos mydec

Related

Get all string splits

Say I have a string:
"abc7de7f77ghij7"
I want to split it by a substring, 7 in this case, and get all the left-right splits:
[ ("abc", "de7f77ghij7")
, ("abc7de", "f77ghij7")
, ("abc7de7f", "7ghij7")
, ("abc7de7f7", "ghij7")
, ("abc7de7f77ghij", "")
]
Sample implementation:
{-# LANGUAGE OverloadedStrings #-}
module StrSplits where
import qualified Data.Text as T
splits :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits d s =
let run a l r =
case T.breakOn d r of
(x, "") -> reverse a
(x, y) ->
let
rn = T.drop (T.length d) y
an = (T.append l x, rn) : a
ln = l `T.append` x `T.append` d
in run an ln rn
in run [] "" s
main = do
print $ splits "7" "abc7de7f77ghij7"
print $ splits "8" "abc7de7f77ghij7"
with expected result:
[("abc","de7f77ghij7"),("abc7de","f77ghij7"),("abc7de7f","7ghij7"),("abc7de7f7","ghij7"),("abc7de7f77ghij","")]
[]
I'm not too happy about the manual recursion and let/case/let nesting. If my feeling that it doesn't look too good is right, is there a better way to write it?
Is there a generalized approach to solving these kinds of problems in Haskell similar to how recursion can be replaced with fmap and folds?
How about this?
import Data.Bifunctor (bimap)
splits' :: T.Text -> T.Text -> [(T.Text, T.Text)]
splits' delimiter string = mkSplit <$> [1..numSplits]
where
sections = T.splitOn delimiter string
numSplits = length sections - 1
mkSplit n = bimap (T.intercalate delimiter) (T.intercalate delimiter) $ splitAt n sections
I like to believe there's a way that doesn't involve indices, but you get the general idea. First split the string by the delimiter. Then split that list of strings at in two everywhere possible, rejoining each side with the delimiter.
Not the most efficient, though. You can probably do something similar with indices from Data.Text.Internal.Search if you want it to be fast. In this case, you wouldn't need to do the additional rejoining. I didn't experiment with it since I didn't understand what the function was returning.
Here's an indexless one.
import Data.List (isPrefixOf, unfoldr)
type ListZipper a = ([a],[a])
moveRight :: ListZipper a -> Maybe (ListZipper a)
moveRight (_, []) = Nothing
moveRight (ls, r:rs) = Just (r:ls, rs)
-- As Data.List.iterate, but generates a finite list ended by Nothing.
unfoldr' :: (a -> Maybe a) -> a -> [a]
unfoldr' f = unfoldr (\x -> (,) x <$> f x)
-- Get all ways to split a list with nonempty suffix
-- Prefix is reversed for efficiency
-- [1,2,3] -> [([],[1,2,3]), ([1],[2,3]), ([2,1],[3])]
splits :: [a] -> [([a],[a])]
splits xs = unfoldr' moveRight ([], xs)
-- This is the function you want.
splitsOn :: (Eq a) => [a] -> [a] -> [([a],[a])]
splitsOn sub xs = [(reverse l, drop (length sub) r) | (l, r) <- splits xs, sub `isPrefixOf` r]
Try it online!
Basically, traverse a list zipper to come up with a list of candidates for the split. Keep only those that are indeed splits on the desired item, then (un)reverse the prefix portion of each passing candidate.

Running all combinations of tests and test cases in Haskell

I aim to be able to define a collection of test methods and a collection of test cases (input/output data) and then execute all of their combinations. The goal is to avoid re-writing the same code over and over again when having say, 3 different implementations of the same function and 4 test cases that the function should satisfy. A naive approach would require me to write 12 lines of code:
testMethod1 testCase1
testMethod1 testCase2
...
testMethod3 testCase4
I've a gut feeling that Haskell should provide a way to abstract this pattern somehow. The best thing I've currently came up with is this piece of code:
import Control.Applicative
data TestMethod a = TM a
data TestData inp res = TD inp res
runMetod (TM m) (TD x res) = m x == res
runAllMethods ((m, inp):xs) = show (runMetod m inp) ++ "\n" ++ runAllMethods xs
runAllMethods _ = ""
head1 = head
head2 (x:xs) = x
testMethods = [TM head1, TM head2]
testData = [TD [1,2,3] 1, TD [4,5,6] 4]
combos = (,) <$> testMethods <*> testData
main = putStrLn $ runAllMethods combos
This works, computes 2 tests against two 2 functions and prints out 4 successes:
True
True
True
True
However, this works only for lists of the same type, even though the head function is list type agnostic. I would like to have a test data collection of any lists, like so:
import Control.Applicative
data TestMethod a = TM a
data TestData inp res = TD inp res
runMetod (TM m) (TD x res) = m x == res
runAllMethods ((m, inp):xs) = show (runMetod m inp) ++ "\n" ++ runAllMethods xs
runAllMethods _ = ""
head1 = head
head2 (x:xs) = x
testMethods = [TM head1, TM head2]
testData = [TD [1,2,3] 1, TD ['a','b','c'] 'a']
combos = (,) <$> testMethods <*> testData
main = putStrLn $ runAllMethods combos
but this fails with an error:
main.hs:12:21: error:
No instance for (Num Char) arising from the literal ‘1’
In the expression: 1
In the first argument of ‘TD’, namely ‘[1, 2, 3]’
In the expression: TD [1, 2, 3] 1
Is it possible to achieve this test-function X test-case cross testing somehow?
You should really use QuickCheck or similar, like hnefatl said.
But just for the fun of it, let's get your idea to work.
So you have a polymorphic function and a lot of test cases of different types. The only thing that matters is that you can apply the function each of the types.
So let's have a look at your function. It's of type [a] -> a. How should your test data look like? It should consist of a list, of a value, and it should support equality comparison. That leads you to a definition like:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImpredicativeTypes #-}
{-# LANGUAGE RankNTypes #-}
data TestData where
TestData :: Eq a => [a] -> a -> TestData
You need to enable the GADTs language extension for this to work. For the following to work, you need these other two extensions (although the whole thing can be generalised with type classes to avoid that, just look at QuickCheck).
Now test it:
head1 = head
head2 (a : as) = a
test :: (forall a . [a] -> a) -> TestData -> Bool
test f (TestData as a) = f as == a
testAll :: [(forall a . [a] -> a)] -> [TestData] -> Bool
testAll fs testDatas = and $ test <$> fs <*> testDatas
main = putStrLn $ if testAll [head1, head2] [TestData "Foo" 'F', TestData [1..] 1]
then "Success!"
else "Oh noez!"
I'll leave it to you to generalise this for different test function types.

Read of types sum

When I want to read string to type A I write read str::A. Consider, I want to have generic function which can read string to different types, so I want to write something like read str::A|||B|||C or something similar. The only thing I could think of is:
{-# LANGUAGE TypeOperators #-}
infixr 9 |||
data a ||| b = A a|B b deriving Show
-- OR THIS:
-- data a ||| b = N | A a (a ||| b) | B b (a ||| b) deriving (Data, Show)
instance (Read a, Read b) => Read (a ||| b) where
readPrec = parens $ do
a <- (A <$> readPrec) <|> (B <$> readPrec)
-- OR:
-- a <- (flip A N <$> readPrec) <|> (flip B N <$> readPrec)
return a
And if I want to read something:
> read "'a'"::Int|||Char|||String
B (A 'a')
But what to do with such weird type? I want to fold it to Int or to Char or to String... Or to something another but "atomic" (scalar/simple). Final goal is to read strings like "1,'a'" to list-like [D 1, D 'a']. And main constraint here is that structure is flexible, so string can be "1, 'a'" or "'a', 1" or "\"xxx\", 1, 2, 'a'". I know how to read something separated with delimiter, but this something should be passed as type, not as sum of types like C Char|I Int|S String|etc. Is it possible? Or no way to accomplish it without sum of types?
There’s no way to do this in general using read, because the same input string might parse correctly to more than one of the valid types. You could, however, do this with a function like Text.Read.readMaybe, which returns Nothing on ambiguous input. You might also return a tuple or list of the valid interpretations, or have a rule for which order to attempt to parse the types in, such as: attempt to parse each type in the order they were declared.
Here’s some example code, as proof of concept:
import Data.Maybe (catMaybes, fromJust, isJust, isNothing)
import qualified Text.Read
data AnyOf3 a b c = FirstOf3 a | SecondOf3 b | ThirdOf3 c
instance (Show a, Show b, Show c) => Show (AnyOf3 a b c) where
show (FirstOf3 x) = show x -- Can infer the type from the pattern guard.
show (SecondOf3 x) = show x
show (ThirdOf3 x) = show x
main :: IO ()
main =
(putStrLn . unwords . map show . catMaybes . map readDBS)
["True", "2", "\"foo\"", "bar"] >>
(putStrLn . unwords . map show . readIID) "100"
readMaybe' :: (Read a, Read b, Read c) => String -> Maybe (AnyOf3 a b c)
-- Based on the function from Text.Read
readMaybe' x | isJust a && isNothing b && isNothing c =
(Just . FirstOf3 . fromJust) a -- Can infer the type of a from this.
| isNothing a && isJust b && isNothing c =
(Just . SecondOf3 . fromJust) b -- Can infer the type of b from this.
| isNothing a && isNothing b && isJust c =
(Just . ThirdOf3 . fromJust) c -- Can infer the type of c from this.
| otherwise = Nothing
where a = Text.Read.readMaybe x
b = Text.Read.readMaybe x
c = Text.Read.readMaybe x
readDBS :: String -> Maybe (AnyOf3 Double Bool String)
readDBS = readMaybe'
readToList :: (Read a, Read b, Read c) => String -> [AnyOf3 a b c]
readToList x = repack FirstOf3 x ++ repack SecondOf3 x ++ repack ThirdOf3 x
where repack constructor y | isJust z = [(constructor . fromJust) z]
| otherwise = []
where z = Text.Read.readMaybe y
readIID :: String -> [AnyOf3 Int Integer Double]
readIID = readToList
The first output line echoes every input that parsed successfully, that is, the Boolean constant, the number and the quoted string, but not bar. The second output line echoes every possible interpretation of the input, that is, 100 as an Int, an Integer and a Double.
For something more complicated, you want to write a parser. Haskell has some very good libraries to build them out of combinators. You might look at one such as Parsec. But it’s still helpful to understand what goes on under the hood.

Visiting GHC AST with SYB

I wrote a program that visited the AST with Haskell-src-exts. I'm trying to convert it to use the GHC API. The former uses Uniplate, while for the latter it seems that unfortunately I'm forced with SYB (the documentation is horribly scarce).
Here is the original code:
module Argon.Visitor (funcsCC)
where
import Data.Data (Data)
import Data.Generics.Uniplate.Data (childrenBi, universeBi)
import Language.Haskell.Exts.Syntax
import Argon.Types (ComplexityBlock(..))
-- | Compute cyclomatic complexity of every function binding in the given AST.
funcsCC :: Data from => from -> [ComplexityBlock]
funcsCC ast = map funCC [matches | FunBind matches <- universeBi ast]
funCC :: [Match] -> ComplexityBlock
funCC [] = CC (0, 0, "<unknown>", 0)
funCC ms#(Match (SrcLoc _ l c) n _ _ _ _:_) = CC (l, c, name n, complexity ms)
where name (Ident s) = s
name (Symbol s) = s
sumWith :: (a -> Int) -> [a] -> Int
sumWith f = sum . map f
complexity :: Data from => from -> Int
complexity node = 1 + visitMatches node + visitExps node
visitMatches :: Data from => from -> Int
visitMatches = sumWith descend . childrenBi
where descend :: [Match] -> Int
descend x = length x - 1 + sumWith visitMatches x
visitExps :: Data from => from -> Int
visitExps = sumWith inspect . universeBi
where inspect e = visitExp e + visitOp e
visitExp :: Exp -> Int
visitExp (If {}) = 1
visitExp (MultiIf alts) = length alts - 1
visitExp (Case _ alts) = length alts - 1
visitExp (LCase alts) = length alts - 1
visitExp _ = 0
visitOp :: Exp -> Int
visitOp (InfixApp _ (QVarOp (UnQual (Symbol op))) _) =
case op of
"||" -> 1
"&&" -> 1
_ -> 0
visitOp _ = 0
I need to visit function bindings, matches and expressions. This is what I managed to write (not working):
import Data.Generics
import qualified GHC
import Outputable -- from the GHC package
funcs :: (Data id, Typeable id, Outputable id, Data from, Typeable from) => from -> [GHC.HsBindLR id id]
funcs ast = everything (++) (mkQ [] (\fun#(GHC.FunBind {}) -> [fun])) ast
It complains that there are too many instances for id, but I don't know what the heck it is. The relevant GHC module is:
http://haddock.stackage.org/lts-3.10/ghc-7.10.2/HsBinds.html
I'm getting insane from this. The goal is to count complexity (as you can see in the original code). I'd like to switch to the GHC API because it uses the same parser as the compiler, so it can parse every module without worrying about extensions.
EDIT: Here is why the current code does not work:
λ> :m +Language.Haskell.GHC.ExactPrint.Parsers GHC Data.Generics Outputable
λ> r <- Language.Haskell.GHC.ExactPrint.parseModule src/Argon/Visitor.hs
λ> let ast = snd $ (\(Right t) -> t) r
.>
λ> :t ast
ast :: Located (HsModule RdrName)
λ> let funcs = everything (++) (mkQ [] (un#(FunBind _ _ _ _ _ _) -> [fun])) ast :: (Data id, Typeable id, Outputable id) => [HsBindLR id id]
.>
λ> length funcs
<interactive>:12:8:
No instance for (Data id0) arising from a use of ‘funcs’
The type variable ‘id0’ is ambiguous
Note: there are several potential instances:
instance Data aeson-0.8.0.2:Data.Aeson.Types.Internal.Value
-- Defined in ‘aeson-0.8.0.2:Data.Aeson.Types.Internal’
instance Data attoparsec-0.12.1.6:Data.Attoparsec.Number.Number
-- Defined in ‘attoparsec-0.12.1.6:Data.Attoparsec.Number’
instance Data a => Data (Data.Complex.Complex a)
-- Defined in ‘Data.Complex’
../..plus 367 others
In the first argument of ‘length’, namely ‘funcs’
In the expression: length funcs
In an equation for ‘it’: it = length funcs
The GHC AST is parametrised on the type of names used in the tree: the parser outputs an AST with RdrName names which it seems you're working with. The GHC Commentary and the Haddocks have more information.
You might have more luck if you tell the compiler that you are working with HsBindLR RdrName RdrName.
Like this:
import Data.Generics
import GHC
import Outputable -- from the GHC package
funcs :: (Data from, Typeable from) => from -> [GHC.HsBindLR RdrName RdrName]
funcs ast = everything (++) (mkQ [] (\fun#(GHC.FunBind {}) -> [fun])) ast

Optimising Haskell data reading from file

I am trying to implement Kosaraju's graph algorithm, on a 3.5m line file where each row is two (space separated) Ints representing a graph edge. To start I need to create a summary data structure that has the node and lists of its incoming and outgoing edges. The code below achieves that, but takes over a minute, whereas I can see from posts on the MOOC forum that people using other languages are completing in <<10s. (getLines is taking 10s compared to under 1s in benchmarks I read about.)
I'm new to Haskell and have implemented an accumulation method using foldl' (the ' was a breakthrough in making it terminate at all), but it feels rather imperative in style, and I'm hoping that that's the reason why it is running slow. Moreover, I'm currently planning to use a similar pattern to conduct the depth-first-search, and I fear it will all just become too slow.
I have found this presentation and blog that talk about these sort of issues but at too expert a level.
import System.IO
import Control.Monad
import Data.Map.Strict as Map
import Data.List as L
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored (Edges, Edges) deriving (Show)
type Graph1 = Map NodeName Node
getLines :: FilePath -> IO [[Int]]
getLines = liftM (fmap (fmap read . words) . lines) . readFile
getLines' :: FilePath -> IO [(Int,Int)]
getLines' = liftM (fmap (tuplify2 . fmap read . words) . lines) . readFile
tuplify2 :: [a] -> (a,a)
tuplify2 [x,y] = (x,y)
main = do
list <- getLines "testdata.txt" -- [String]
--list <- getLines "SCC.txt" -- [String]
let
list' = createGraph list
return list'
createGraph :: [[Int]] -> Graph1
createGraph xs = L.foldl' build Map.empty xs
where
build :: Graph1-> [Int] -> Graph1
build = \acc (x:y:_) ->
let tmpAcc = case Map.lookup x acc of
Nothing -> Map.insert x (Node False ([y],[])) acc
Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False ((y:fwd), bck))) x acc
in case Map.lookup y tmpAcc of
Nothing -> Map.insert y (Node False ([],[x])) tmpAcc
Just a -> Map.adjust (\(Node _ (fwd, bck)) -> (Node False (fwd, (x:bck)))) y tmpAcc
Using maps:
Use IntMap or HashMap when possible. Both are significantly faster for Int keys than Map. HashMap is usually faster than IntMap but uses more RAM and has a less rich library.
Don't do unnecessary lookups. The containers package has a large number of specialized functions. With alter the number of lookups can be halved compared to the createGraph implementation in the question.
Example for createGraph:
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
createGraph :: [(Int, Int)] -> Graph1
createGraph xs = foldl' build IM.empty xs
where
addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
addFwd y _ = Just (Node False [y] [])
addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
addBwd x _ = Just (Node False [] [x])
build :: Graph1 -> (Int, Int) -> Graph1
build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
Using vectors:
Consider the efficient construction functions (the accumulators, unfolds, generate, iterate, constructN, etc.). These may use mutation behind the scenes but are considerably more convenient to use than actual mutable vectors.
In the more general case, use the laziness of boxed vectors to enable self-reference when constructing a vector.
Use unboxed vectors when possible.
Use unsafe functions when you're absolutely sure about the bounds.
Only use mutable vectors when there aren't pure alternatives. In that case, prefer the ST monad to IO. Also, avoid creating many mutable heap objects (i. e. prefer mutable vectors to immutable vectors of mutable references).
Example for createGraph:
import qualified Data.Vector as V
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = V.Vector Node
createGraph :: Int -> [(Int, Int)] -> Graph1
createGraph maxIndex edges = graph'' where
graph = V.replicate maxIndex (Node False [] [])
graph' = V.accum (\(Node e f b) x -> Node e (x:f) b) graph edges
graph'' = V.accum (\(Node e f b) x -> Node e f (x:b)) graph' (map (\(a, b) -> (b, a)) edges)
Note that if there are gaps in the range of the node indices, then it'd be wise to either
Contiguously relabel the indices before doing anything else.
Introduce an empty constructor to Node to signify a missing index.
Faster I/O:
Use the IO functions from Data.Text or Data.ByteString. In both cases there are also efficient functions for breaking input into lines or words.
Example:
import qualified Data.ByteString.Char8 as BS
import System.IO
getLines :: FilePath -> IO [(Int, Int)]
getLines path = do
lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
let pairs = (map . map) (maybe (error "can't read Int") fst . BS.readInt) lines
return [(a, b) | [a, b] <- pairs]
Benchmarking:
Always do it, unlike me in this answer. Use criterion.
Based pretty much on András' suggestions, I've reduced a 113 second task down to 24 (measured by stopwatch as I can't quite get Criterion to do anything yet) (and then down to 10 by compiling -O2)!!! I've attended some courses this last year that talked about the challenge of optimising for large datasets but this was the first time I faced a question that actually involved one, and it was as non-trivial as my instructors' suggested. This is what I have now:
import System.IO
import Control.Monad
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
import qualified Data.ByteString.Char8 as BS
type NodeName = Int
type Edges = [NodeName]
type Explored = Bool
data Node = Node Explored Edges Edges deriving (Eq, Show)
type Graph1 = IM.IntMap Node
-- DFS uses a stack to store next points to explore, a list can do this
type Stack = [(NodeName, NodeName)]
getBytes :: FilePath -> IO [(Int, Int)]
getBytes path = do
lines <- (map BS.words . BS.lines) `fmap` BS.readFile path
let
pairs = (map . map) (maybe (error "Can't read integers") fst . BS.readInt) lines
return [(a,b) | [a,b] <- pairs]
main = do
--list <- getLines' "testdata.txt" -- [String]
list <- getBytes "SCC.txt" -- [String]
let list' = createGraph' list
putStrLn $ show $ list' IM.! 66
-- return list'
bmark = defaultMain [
bgroup "1" [
bench "Sim test" $ whnf bmark' "SCC.txt"
]
]
bmark' :: FilePath -> IO ()
bmark' path = do
list <- getLines path
let
list' = createGraph list
putStrLn $ show $ list' IM.! 2
createGraph' :: [(Int, Int)] -> Graph1
createGraph' xs = foldl' build IM.empty xs
where
addFwd y (Just (Node _ f b)) = Just (Node False (y:f) b)
addFwd y _ = Just (Node False [y] [])
addBwd x (Just (Node _ f b)) = Just (Node False f (x:b))
addBwd x _ = Just (Node False [] [x])
build :: Graph1 -> (Int, Int) -> Graph1
build acc (x, y) = IM.alter (addBwd x) y $ IM.alter (addFwd y) x acc
And now on with the rest of the exercise....
This is not really an answer, I would rather comment András Kovács post, if I add those 50 points...
I have implemented the loading of the graph in both IntMap and MVector, in a attempt to benchmark mutability vs. immutability.
Both program use Attoparsec for the parsing. There is surely more economic way to do it, but Attoparsec is relatively fast compared to its high abstraction level (the parser can stand in one line). The guideline is to avoid String and read. read is partial and slow, [Char] is slow and not memory efficient, unless properly fused.
As András Kovács noted, IntMap is better than Map for Int keys. My code provides another example of alter usage. If the node identifier mapping is dense, you may also want to use Vector and Array. They allow O(1) indexing by the identifier.
The mutable version handle on demand the exponential growth of the MVector. This avoid to precise an upper bound on node identifiers, but introduce more complexity (the reference on the vector may change).
I benchmarked with a file of 5M edges with identifiers in the range [0..2^16]. The MVector version is ~2x faster than the IntMap code (12s vs 25s on my computer).
The code is here [Gist].
I will edit when more profiling is done on my side.

Resources