Extracting Information from Haskell Object - haskell

I'm new to Haskell and I'm confused on how to get values out of function results. In my particular case, I am trying to parse Haskell files and see which AST nodes appear on which lines. This is the code I have so far:
import Language.Haskell.Parser
import Language.Haskell.Syntax
getTree :: String -> IO (ParseResult HsModule)
getTree path = do
    file <- readFile path
    let tree = parseModuleWithMode (ParseMode path) file
    return tree
main :: IO ()
main = do
    tree <- getTree "ex.hs"
    -- <do something with the tree other than print it>
    print tree
So on the line where I have the comment, I have a syntax tree as tree. It appears to have type ParseResult HsModule. What I want is just HsModule. I guess what I'm looking for is a function as follows:
extract :: ParseResult a -> a
Or better yet, a general Haskell function
extract :: AnyType a -> a
Maybe I'm missing a major concept about Haskell here?
p.s. I understand that thinking of these things as "Objects" and trying to access "Fields" from them is wrong, but I'd like an explanation of how to deal with this type of thing in general.

Looking for a general function of type
extract :: AnyType a -> a
does indeed show a big misunderstanding about Haskell. Consider the many things AnyType might be, and how you might extract exactly one object from it. What about Maybe Int? You can easily enough convert Just 5 to 5, but what number should you return for Nothing?
Or what if AnyType is [], so that you have [String]? What should be the result of
extract ["help", "i'm", "trapped"]
or of
extract []
?
ParseResult has a similar "problem", in that it uses ParseOk to contain results indicating that everything was fine, and ParseFailed to indicate an error. Your incomplete pattern match successfully gets the result if the parse succeeded, but will crash your program if in fact the parse failed. By using ParseResult, Haskell is encouraging you to consider what you should do if the code you are analyzing did not parse correctly, rather than to just blithely assume it will come out fine.
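To make the Maybe and list cases concrete, here is a minimal sketch using only base. The helper names extractOr and firstOrNothing are made up for illustration; they just wrap fromMaybe and listToMaybe from Data.Maybe.

```haskell
import Data.Maybe (fromMaybe, listToMaybe)

-- Hypothetical helper: "extracting" from Maybe only works once the
-- caller supplies the value to use for Nothing.
extractOr :: a -> Maybe a -> a
extractOr = fromMaybe

-- Hypothetical helper: for a list, the honest result type is Maybe a,
-- because the list may be empty.
firstOrNothing :: [a] -> Maybe a
firstOrNothing = listToMaybe
```

In both cases the "missing value" situation shows up in the type, either as an extra argument or as a Maybe result, which is the same pressure ParseResult puts on you.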

The definition of ParseResult is:
data ParseResult a = ParseOk a | ParseFailed SrcLoc String
(obtained from source code)
So there are two possibilities: either the parse succeeded, in which case you get a value built with the ParseOk constructor, or something went wrong during parsing, in which case you get a ParseFailed value carrying the location of the error and an error message.
So you can define a function:
getData :: ParseResult a -> a
getData (ParseOk x) = x
getData (ParseFailed _ s) = error s
Raising an error in the ParseFailed case is the better choice, since it is always possible that your compiler/interpreter/analyzer/... is handed a Haskell program containing syntax errors.

I just figured out how to do this. It seems that when I was trying to define
extract :: ParseResult a -> a
extract (ParseResult a) = a
I actually needed to use
extract :: ParseResult a -> a
extract (ParseOk a) = a
instead. I'm not 100% sure why this is.
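A likely explanation, going by the definition of ParseResult quoted in the other answer: ParseResult is the name of the type, so it belongs in type signatures, while ParseOk and ParseFailed are its data constructors, and patterns match on data constructors. The sketch below uses a stand-in type so that it runs with base only; Result, Ok, Failed and toEither are hypothetical names, and the real ParseFailed also carries a SrcLoc.

```haskell
-- Stand-in for haskell-src's ParseResult (hypothetical names).
data Result a = Ok a | Failed String

-- "Result" appears only in the type signature; the patterns use the
-- data constructors Ok and Failed, and cover both of them.
toEither :: Result a -> Either String a
toEither (Ok x)       = Right x
toEither (Failed msg) = Left msg
```

Covering both constructors also avoids the crash that a single-pattern extract would cause on a failed parse.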

Related

Getting current context error formatting function

I am using Megaparsec to build a tree representation of the code, which is later evaluated by separate functions. I would like to store in the nodes of the tree the Megaparsec error-formatting function together with the current context.
Why? E.g. the syntax might be okay, but some variable in the code might not exist, which will only be discovered by the separate functions that process the tree later. Those functions will have to report that the variable doesn't exist, and I would be glad if I could use Megaparsec's nicely formatted errors for this (with line number, context, ...).
Is there some way to do this, please?
Thanks.
I believe you can get the current position via getSourcePos. For example, in the open-recursion style of tree generation, you might write
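import Control.Applicative (liftA3)
import Text.Megaparsec
data Annotated f = Annotated
    { start :: SourcePos
    , term :: f (Annotated f)
    , end :: SourcePos
    }
-- Record the source position before and after running the wrapped parser
annotated :: (MonadParsec e s m, TraversableStream s) =>
    m (f (Annotated f)) -> m (Annotated f)
annotated p = liftA3 Annotated getSourcePos p getSourcePos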
(N.B. I haven't tried it or even type-checked it; only done my best to interpret megaparsec's documentation with an expert eye. Caveat lector.)
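Once positions are stored in the tree, a later pass can attach them to its own errors. The following is only a sketch of that idea with base alone: Pos is a hypothetical stand-in for Megaparsec's SourcePos, and Expr and eval are made-up names. It does not reproduce Megaparsec's pretty error layout, which also needs the source text.

```haskell
-- Hypothetical stand-in for Megaparsec's SourcePos (line and column only).
data Pos = Pos { posLine :: Int, posColumn :: Int }

-- A tiny expression tree whose variable nodes remember where they started.
data Expr
  = Lit Int
  | Var Pos String
  | Add Expr Expr

-- Evaluate against an environment; an unknown variable is reported with
-- the position that was recorded at parse time.
eval :: [(String, Int)] -> Expr -> Either String Int
eval _   (Lit n) = Right n
eval env (Var pos name) =
  case lookup name env of
    Just v  -> Right v
    Nothing -> Left (show (posLine pos) ++ ":" ++ show (posColumn pos)
                     ++ ": unknown variable " ++ name)
eval env (Add a b) = (+) <$> eval env a <*> eval env b
```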

How to combine Megaparsec with Text.Read (derived Read instance)

I want to use the derived instances of Read in the megaparsec module.
How can I use 'Text.Read.read' or 'Text.Read.readEither' in a 'Parser a' ?
It need not be fast, but it should be easy to maintain and to extend.
The megaparsec module is for testing my application via CLI, so many different datatypes must be parsed.
It shall work in the following way:
import Text.Megaparsec
readableDatatype :: Read a => Parser a
readableDatatype =
    -- This is wrong, but describes how it shall work
    -- liftA read chunkToTokens
expr' :: Parser UserControlExpr
expr' = timeExpr
    <|> timeEventExpr
    <|> digiInExpr
    <|> quitExpr
digiInExpr :: Parser UserControlExpr
digiInExpr = do
    cmdword "digiIn"
    inElement <- (readableDatatype :: Parser TI_I)
    return $ UserDigiIn inElement
What do I have to write so that the three functions typecheck, especially readableDatatype?
You can use getInput :: MonadParsec e s m => m s and setInput :: MonadParsec e s m => s -> m () together with reads :: Read a => String -> [(a, String)] for that. getInput and setInput just get and set the input stream the parser is working on and reads takes a string and returns a list of possible parses together with the remaining unconsumed portions of the input. We also need to tell the parser the new offset in the input, otherwise error locations are wrong. We can do that using getOffset and setOffset.
-- For equality constraint (~)
{-# LANGUAGE TypeFamilies #-}
import Text.Megaparsec
import Text.Read (reads)
readableDatatype :: (Read a, MonadParsec e s m, s ~ String) => m a
readableDatatype = do
    input <- getInput
    offset <- getOffset
    choice $
        (\(a, input') -> a <$ setInput input'
            <* setOffset (offset + length input - length input'))
        <$> reads input
If your input is something other than String you will have to convert between that and String after getInput and before setInput.
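To see what reads hands back, and where the offset arithmetic above comes from, here is a small sketch that needs only base. Direction and readPrefix are hypothetical names introduced for illustration.

```haskell
-- Hypothetical type with a derived Read instance.
data Direction = DirA | DirB
  deriving (Read, Show, Eq)

-- Run reads on the input and report the parsed value, the number of
-- characters consumed, and the unconsumed remainder (first parse only).
readPrefix :: Read a => String -> Maybe (a, Int, String)
readPrefix input =
  case reads input of
    (a, rest) : _ -> Just (a, length input - length rest, rest)
    []            -> Nothing
```

For the input "DirA Deactivated" this yields the value DirA, 4 consumed characters, and the remainder " Deactivated". The consumed count is the amount by which the parser offset has to advance.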
The following is about performance concerns, so not really relevant to your problem, but maybe it is educational and it may be useful to others who may need a solution with good performance.
Converting the whole input between String and some other type all the time during parsing is a rather big performance bottleneck for larger input. Furthermore using length to calculate the new offset here is not very performant either.
To solve both of these problems we need some way to know how much of the input was actually consumed by the Read-parser, so that we can just drop that part from the original input instead of having to convert the whole unconsumed part back to the original input type. But the Read class does not offer that. One could try to parse incrementally longer prefixes of the input, which may be faster in cases where the parses done using Read are short compared to the length of the entire input. You could also use unsafePerformIO to write to an IORef how much of the input was actually forced by the Read-parser, which would be the fastest but not so pretty solution.
I implemented the latter here. Feel free to use it, but be aware that it is not very well tested. It does however solve all the problems with the above approach.
That did it. Thank you! In the meantime I made a "conservative" solution to the problem by defining the constructors as strings and parsing them, without using read. That has the advantage that you get Megaparsec's impressive error messages, which tell you what symbols are missing.
Example with read:
1:8:
|
1 | digiIn TI_I_Signal1 DirA Dectivated
| ^
unknown parse error
(only an 'a' was missing in "Deactivated")
Example with a hand-written parser for the datatype:
1:19:
|
1 | digiIn TI_I_Signal1 Dectivated
| ^^^^^^^^
unexpected "Dectivat"
expecting "active", "inactive", '0', or '1'
I think I will use your code block in future datatypes.
Thank you very much!

Haskell: Runtime Data Type Iteration?

A friend and I have been working on a system for automatically importing C functions into GNU Guile, but need to use Haskell's C parser because no other parser seems sufficient or as accessible (let me know if we're wrong about that).
The trouble is coming up when we try to produce Scheme data from the parsed AST. We need to produce text that can be directly imported by scheme (S-Expressions, not M-Expressions), so below is an example...
toscm $ Just (SomeType (AnotherType "test" 5) (YetAnother "hello!"))
=> "(Just (SomeType (AnotherType \"test\" 5) (YetAnother \"hello!\")))"
The output of Haskell's C parser has type (Either ParseError CTranslUnit). CTranslUnit is a specialization (CTranslationUnit NodeInfo). The CTranslationUnit's contents have ever more contents going deeper than is really any fun at all.
Before realizing that, I tried the following...
class Schemable a where
    toscm :: a -> String
{- and then an (omitted) ridiculously (1000+ lines) long chain
   of horrible instance declarations for every single type
   that can be part of the C AST -}
I figured there really must be a better way to do this, but I haven't been able to find one I understand. GHC.Generics seems like it might have a solution, but the semantics of some of its internal types baffle me.
What should I do??
Update: I've been looking into Scrap Your Boilerplate, and while it definitely looks good for finding substructures, it doesn't provide a way for me to generically convert data to strings. The ideal would be if the data with the constructor Maybe (CoolType "awesome" (Something "funny")) could be processed with a function that would give me access to the names of the constructors, and allow me to recurse on the values of the arguments to that constructor. e.g...
data Constructorized = Constructed String [Constructorized]
                     | RawValue <anything>
constructorize :: <anything> -> Constructorized
constructorize a = <???>
toscm :: Constructorized -> String
toscm (Constructed c v) = "(" ++ c ++ " " ++ (intercalate " " (map toscm v)) ++ ")"
toscm (RawValue v) = show v
I guess the train of thought I'm on is: Show seems to be able to recurse into every single type that derives it, no problem. How are the functions for that generated? Shouldn't we be able to make our own Show that generates similar functions, with a slightly different output?
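One way to get at constructor names and recurse into arguments generically is the Data class from Data.Data (the machinery behind Scrap Your Boilerplate). The sketch below is a minimal illustration, assuming every type involved has a Data instance; whether all of the C AST types do is an assumption to check. The name toScm and the example types are hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, cast, gmapQ, showConstr, toConstr)

-- Render any value with a Data instance as an S-expression string.
toScm :: Data a => a -> String
toScm x =
  case cast x :: Maybe String of
    -- Strings are special-cased; otherwise they would print as cons cells.
    Just s  -> show s
    Nothing ->
      case gmapQ toScm x of
        []   -> showConstr (toConstr x)   -- nullary constructor or literal
        args -> "(" ++ unwords (showConstr (toConstr x) : args) ++ ")"

-- Hypothetical example types mirroring the question.
data Inner = AnotherType String Int deriving Data
newtype Wrapper = YetAnother String deriving Data
data Outer = SomeType Inner Wrapper deriving Data
```

Here showConstr (toConstr x) gives the constructor name and gmapQ applies the function to each immediate argument. Lists other than String are not special-cased in this sketch and come out as nested cons cells.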

How to convert IO Int to String in Haskell?

I'm learning to use input and output in Haskell. I'm trying to generate a random number and output it to another file. The problem is that the random number seems to be returning an IO Int, something that I can't convert to a String using show.
Could someone give me a pointer here?
It's helpful if you show us the code you've written that isn't working.
Anyway, you are in a do block and have written something like this, yes?
main = do
    ...
    writeFile "some-file.txt" (show generateRandomNumberSomehow)
    ...
You should instead do something like this:
main = do
    ...
    randomNumber <- generateRandomNumberSomehow
    writeFile "some-file.txt" (show randomNumber)
    ...
The <- operator binds the result of the IO Int value on the right to the Int-valued variable on the left. (Yes, you can also use this to bind the result of an IO String value to a String-valued variable, etc.)
This syntax is only valid inside a do block. It's important to note that the do block will itself result in an IO value --- you can't launder away the IO-ness.
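A complete version of that pattern might look like the following sketch. Since no random number library is named in the question, generateNumber is a hypothetical stand-in that always yields 42; a real generator of type IO Int would slot in the same way.

```haskell
-- Hypothetical stand-in for a random number generator; a real one
-- would come from a library.
generateNumber :: IO Int
generateNumber = pure 42

-- Bind the Int out of the IO action, then call show on the plain Int.
-- The function as a whole is still an IO action.
writeNumber :: FilePath -> IO ()
writeNumber path = do
  n <- generateNumber
  writeFile path (show n)
```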
dave4420's answer is what you want here. It uses the fact that IO is a Monad; that's why you can use the do notation.
However, I think it's worth mentioning that the concept of "applying a function to a value that's not 'open', but inside some wrapper" is actually more general than IO and more general than monads. It's what we have the Functor class for.
For any functor f (this could, for instance, be Maybe or [] or IO), when you have some value wrapped :: f t (for instance wrapped :: Maybe Int), you can use fmap to apply a function t -> t' to it (like show :: Int -> String) and get a wrappedApplied :: f t' (like wrappedApplied :: Maybe String).
In your example, it would be
genRandomNumAsString :: IO String
genRandomNumAsString = fmap show genRandomNumPlain
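The same fmap show works unchanged across different functors, as this short sketch shows. All names here are hypothetical, and genPlain is a fixed stand-in for genRandomNumPlain.

```haskell
-- fmap show over Maybe.
shownMaybe :: Maybe String
shownMaybe = fmap show (Just (5 :: Int))

-- fmap show over a list.
shownList :: [String]
shownList = fmap show [1, 2, 3 :: Int]

-- Hypothetical IO action standing in for genRandomNumPlain.
genPlain :: IO Int
genPlain = pure 7

-- fmap show over IO: the result is still wrapped in IO.
shownIO :: IO String
shownIO = fmap show genPlain
```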

haskell load module in list

Hey haskellers and haskellettes,
Is it possible to load a module's functions into a list?
In my concrete case I have a list of functions, all checked with or:
checkRules :: [Nucleotide] -> Bool
checkRules nucs = or $ map ($ nucs) [checkRule1, checkRule2]
I import checkRule1 and checkRule2 from a separate module, and I don't know if I will need more of them in the future.
I'd like the same functionality to look something like this:
-- import all functions from Rules as rules where
-- :t rules ~~> [([Nucleotide] -> Bool)]
checkRules :: [Nucleotide] -> Bool
checkRules nucs = or $ map ($ nucs) rules
The program sorts pseudo nucleotide sequences into viable and nonviable sequences according to given rules.
thanks in advance ε/2
Addendum:
So am I thinking about this right? I need:
genList :: File -> TypeSignature -> [TypeSignature]
chckfun :: (a->b) -> TypeSignature -> Bool
at compile time.
But I can't generate a list of all functions in the module, as they most probably will not all have the same type signature and hence will not all fit in one list. So I cannot filter the given list with chckfun.
In order to do this I either want to check the written type signatures in the source file (?) or the inferred types given by the compiler (?).
Another problem that comes to mind: not every function written in the source file might get exported?
Is this a problem a Haskell beginner should try to solve after 5 months of learning? My brain is shaped like a Klein bottle after all this "compile time thinking".
There is a nice package on Hackage just for this: language-haskell-extract. In particular, the Template Haskell function functionExtractor takes a regular expression and returns a list of the matching top level bindings as (name, value) pairs. As long as they all have matching types, you're good to go.
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.Extract
myFoo = "Hello"
myBar = "World"
allMyStuff = $(functionExtractor "^my")
main = print allMyStuff
Output:
[("myFoo","Hello"),("myBar","World")]
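If Template Haskell feels like too much, a plain fallback is to keep the list by hand in the rules module and export only that list, so there is a single place to extend. This is a sketch with base only; the Nucleotide constructors and both rule bodies are hypothetical placeholders, and any replaces the or $ map combination from the question.

```haskell
-- Hypothetical stand-in for the asker's Nucleotide type.
data Nucleotide = A | C | G | T
  deriving (Eq, Show)

-- Hypothetical rules; the real ones would live in the Rules module.
checkRule1, checkRule2 :: [Nucleotide] -> Bool
checkRule1 = null
checkRule2 nucs = length nucs > 3

-- The single place to extend when a new rule is added.
rules :: [[Nucleotide] -> Bool]
rules = [checkRule1, checkRule2]

-- True if at least one rule accepts the sequence.
checkRules :: [Nucleotide] -> Bool
checkRules nucs = any ($ nucs) rules
```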
