I'm writing a pretty-printer for a simple white-space sensitive language.
I like the Leijen pretty-printer library more than I like the Wadler library, but the Leijen library has one problem in my domain: any line break I insert may be overridden by the group construct, which may compress any line, which might change the semantics of the output.
I don't think I can implement an ungroupable line in wl-pprint (although I'd love to be wrong).
Looking a bit at the wl-pprint-extras package, I don't think that even the exposed internal interface allows me to create a line which will not be squashed by group.
Do I just have to rely on the fact that I never use group, or do I have some better option?
Given that you want to be able to group and you also need to ensure that some lines are never removed,
why don't we use the fact that the library designers encoded the semantics in the data type,
instead of in code. This fabulous decision makes it eminently re-engineerable.
The Doc data type encodes a line break using the constructor Line :: Bool -> Doc.
The Bool represents whether to omit a space when removing a line. (Lines indent when they're there.)
Let's replace the Bool:
data LineBehaviour = OmitSpace | AddSpace | Keep
data Doc = ...
...
Line !LineBehaviour -- not Bool any more
The beautiful thing about the semantics-as-data design is that if we replace
this Bool data with LineBehaviour data, functions that didn't use it but
passed it on unchanged don't need editing. Functions that look inside at what
the Bool is break with the change - we'll rewrite exactly the parts of the code
that need changing to support the new semantics by changing the data type where
the old semantics resided. The program won't compile until we've made all the
changes we should, while we won't need to touch a line of code that doesn't
depend on line break semantics. Hooray!
For example, renderPretty uses the Line constructor, but in the pattern Line _,
so we can leave that alone.
First, we need to replace Line True with Line OmitSpace, and Line False with Line AddSpace,
line = Line AddSpace
linebreak = Line OmitSpace
but perhaps we should add our own
hardline :: Doc
hardline = Line Keep
and we could perhaps do with a binary operator that uses it
infixr 5 <->
(<->) :: Doc -> Doc -> Doc
x <-> y = x <> hardline <> y
and the equivalent of the vertical separator, for which I can't think of a better name than very vertical separator:
vvsep,vvcat :: [Doc] -> Doc
vvsep = fold (<->)
vvcat = fold (<->)
The actual removing of lines happens in the group function. Everything can stay the same except:
flatten (Line break) = if break then Empty else Text 1 " "
should be changed to
flatten (Line OmitSpace) = Empty
flatten (Line AddSpace) = Text 1 " "
flatten (Line Keep) = Line Keep
That's it: I can't find anything else to change!
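To see the effect in isolation, here is a minimal sketch using a toy document type. It is not the real wl-pprint Doc (the names Cat, render and example are made up for illustration, and Text carries no length field here); it only shows that a Keep line survives flattening while the other two kinds are squashed.

```haskell
-- A toy document type, NOT the real wl-pprint Doc: just enough
-- constructors to show how flatten treats each LineBehaviour.
data LineBehaviour = OmitSpace | AddSpace | Keep
  deriving (Eq, Show)

data Doc
  = Empty
  | Text String
  | Line LineBehaviour
  | Cat Doc Doc
  deriving (Eq, Show)

-- flatten removes every line break it is allowed to remove.
flatten :: Doc -> Doc
flatten (Cat x y)        = Cat (flatten x) (flatten y)
flatten (Line OmitSpace) = Empty
flatten (Line AddSpace)  = Text " "
flatten (Line Keep)      = Line Keep   -- a hard line survives flattening
flatten other            = other

-- A renderer with no width logic: every remaining Line becomes a newline.
render :: Doc -> String
render Empty     = ""
render (Text s)  = s
render (Line _)  = "\n"
render (Cat x y) = render x ++ render y

-- Hypothetical input: a hard line after the colon, a soft line later on.
example :: Doc
example =
  Text "if x:" `Cat` Line Keep `Cat` Text "y" `Cat` Line AddSpace `Cat` Text "z"
```

Rendering flatten example keeps the break after "if x:" but joins "y" and "z" with a space, which is the behaviour the question asks for.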
You do need to avoid group, yes. The library's designed to facilitate wrapping or not wrapping based on the width of the output that you specify.
Depending on the syntax of the language you're implementing, you should also be cautious about softline and softbreak and the </> and <//> operators that use them. There's no reason I can see that you can't use <$> and <$$> instead.
sep, fillSep, cat and fillCat all use group directly or indirectly (and have the indeterminate semantics/width-dependent line breaks you want to avoid). However, given your purpose, I don't think you need them:
Use vsep or hsep instead of sep or fillSep.
Use hcat or vcat instead of cat or fillCat.
You could use a line like
import Text.PrettyPrint.Leijen hiding (group,softline,softbreak,
(</>),(<//>),
sep,fillSep,cat,fillCat)
to make sure you don't call these functions.
I can't think of a way to ensure that functions you do use don't call group somewhere along the line, but I think those are the ones to avoid.
I'm planning on writing a Parser for some language. I'm quite confident that I could cobble together a parser in Parsec without too much hassle, but I thought about including comments into the AST so that I could implement a code formatter in the end.
At first, adding an extra parameter to the AST types seemed like a suitable idea (this is basically what was suggested in this answer). For example, instead of having
data Expr = Add Expr Expr | ...
one would have
data Expr a = Add a Expr Expr
and use a for whatever annotation (e.g. for comments that come after the expression).
However, there are some not so exciting cases. The language features C-like comments (// ..., /* .. */) and a simple for loop like this:
for (i in 1:10)
{
... // list of statements
}
Now, excluding the body there are at least 10 places where one could put one (or more) comments:
/*A*/ for /*B*/ ( /*C*/ i /*E*/ in /*F*/ 1 /*G*/ : /*H*/ 10 /*I*/ ) /*J*/
{ /*K*/
...
In other words, while the for loop could previously be comfortably represented as an identifier (i), two expressions (1 & 10) and a list of statements (the body), we would now have to include at least 10 more parameters or records for annotations.
This gets ugly and confusing quite quickly, so I wondered whether there is a clearly better way to handle this. I'm certainly not the first person wanting to write a code formatter that preserves comments, so there must be a decent solution, or is writing a formatter just that messy?
You can probably capture most of those positions with just two generic comment productions:
Expr -> Comment Expr
Stmt -> Comment Stmt
This seems like it ought to capture comments A, C, F, H, J, and K for sure; possibly also G depending on exactly what your grammar looks like. That only leaves three spots to handle in the for production (maybe four, with one hidden in Range here):
Stmt -> "for" Comment "(" Expr Comment "in" Range Comment ")" Stmt
In other words: one before each literal string but the first. Seems not too onerous, ultimately.
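As a rough sketch, those productions could map onto an AST like the one below. All constructor and function names here are hypothetical, and using Maybe for the comment slots inside For is my own assumption, so that a missing comment costs nothing.

```haskell
import Data.Maybe (maybeToList)

-- Hypothetical AST for illustration; none of these names come from the question.
type Comment = String

data Expr
  = Var String
  | Lit Int
  | CommentedExpr Comment Expr   -- Expr -> Comment Expr
  deriving (Eq, Show)

data Range = Range Expr Expr
  deriving (Eq, Show)

data Stmt
  = Block [Stmt]
  | CommentedStmt Comment Stmt   -- Stmt -> Comment Stmt
  | For (Maybe Comment)          -- comment between "for" and "("
        Expr                     -- loop variable
        (Maybe Comment)          -- comment before "in"
        Range
        (Maybe Comment)          -- comment before ")"
        Stmt                     -- body
  deriving (Eq, Show)

-- Collect comments in source order, e.g. for a formatter to re-emit them.
exprComments :: Expr -> [Comment]
exprComments (CommentedExpr c e) = c : exprComments e
exprComments _                   = []

stmtComments :: Stmt -> [Comment]
stmtComments (Block ss)          = concatMap stmtComments ss
stmtComments (CommentedStmt c s) = c : stmtComments s
stmtComments (For c1 v c2 (Range lo hi) c3 body) =
  maybeToList c1 ++ exprComments v ++ maybeToList c2
    ++ exprComments lo ++ exprComments hi
    ++ maybeToList c3 ++ stmtComments body

-- /*A*/ for /*B*/ ( /*C*/ i in 1:10 ) { }
example :: Stmt
example =
  CommentedStmt "A"
    (For (Just "B") (CommentedExpr "C" (Var "i")) Nothing
         (Range (Lit 1) (Lit 10)) Nothing (Block []))
```

Only the for production needs explicit comment slots; every other position is covered by wrapping an ordinary node in CommentedExpr or CommentedStmt.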
I'm trying to use the wavefront-obj package to read an OBJ file. Here is an example of OBJ file.
After downloading this file, I do
import Data.WaveFrontObj
x <- loadWavefrontObj "pinecone.obj"
Then:
> :t x
x :: Either String WavefrontModel
import Data.Either.Extra
y = fromRight' x
Then:
> :t y
y :: WavefrontModel
> y
WavefrontModel []
Looks like the result is empty. What am I doing wrong?
Looks like your OBJ file has some directives that wavefront-obj doesn't recognize. You can see in the source that wavefront-obj only understands the #, v, vt, vn, and f directives. Your file kicks off with mtllib and o directives, and appears to have several others not in the supported list.
A priori, I would therefore expect a Left result instead of a Right as you're getting. But the wavefront-obj author fell into a common parser-combinator pitfall: their top-level parser does not end with eof. So it sees the first two comment lines, then none of its parsers match the next line but it doesn't mind not being at the end of the file, so it reports successfully parsing an empty list of directives.
Between this and a few other things I noticed while sourcediving (comments are almost certainly not treated correctly, failure to exploit the predictable structure of directives and therefore code duplication), I expect you're going to have to do quite a bit of work if you want this package to work reliably and correctly.
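The missing-eof pitfall is easy to reproduce in miniature. The sketch below uses ReadP from base rather than the parser library wavefront-obj itself is built on, and the toy grammar (comment lines only) and all names are assumptions made for illustration.

```haskell
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch, readP_to_S, (<++))

-- Toy grammar: the only thing this parser understands is a comment line.
commentLine :: ReadP String
commentLine = char '#' *> munch (/= '\n') <* char '\n'

-- Deterministic "zero or more": keep going while the item parser succeeds.
manyGreedy :: ReadP a -> ReadP [a]
manyGreedy p = ((:) <$> p <*> manyGreedy p) <++ pure []

-- Pitfall: nothing forces the parser to reach the end of the input,
-- so it stops at the first unknown line and still reports success.
lenient :: ReadP [String]
lenient = manyGreedy commentLine

-- Fix: demand end of input, so unknown lines become a parse failure.
strict :: ReadP [String]
strict = manyGreedy commentLine <* eof

-- Hypothetical input: two comments followed by an unsupported directive.
sample :: String
sample = "# first comment\n# second comment\nmtllib pinecone.mtl\n"
```

Here readP_to_S lenient sample yields one successful parse with the mtllib line left unconsumed, while readP_to_S strict sample yields no parse at all, which is the failure you would want surfaced as a Left.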
I often read that I shouldn't mix tabs and spaces in Haskell, or that I shouldn't use tabs at all. Why?
The problem is twofold. First of all, Haskell is indentation sensitive, e.g. the following code isn't valid:
example = (a, b)
where
a = "Hello"
b = "World"
Both bindings need to be indented with the same number of spaces/tabs (see off-side rule). While it's obvious in this case, it's rather hidden in the following one, where I denote a space by · and a tab by »:
example = (a, b)
··where
····a = "Hello"
» b = "World"
This will look like valid Haskell code if the editor shows tabs aligned to multiples of four. But it isn't. Haskell aligns tabs to multiples of eight, so the code will be interpreted like this:
example = (a, b)
··where
····a = "Hello"
» b = "World"
Second, if you use only tabs, you can end up with a layout that doesn't look right. For example, the following code looks correct if a tab gets displayed with six or more spaces (eight in this case):
example = (a, b)
» where» a = "Hello"
» » b = "World"
But in another editor that uses 4 spaces it won't look right anymore:
example = (a, b)
» where» a = "Hello"
» » b = "World"
It's still correct, though. However, someone who's used to spaces might reindent b's binding with spaces and end up with a parser error.
If you enforce a code convention throughout your code that makes sure that you only use tabs at the beginning of a line and use a newline after where, let or do you can avoid some of the problems (see 11). However, current releases of GHC warn about tabs by default, because they have been a source of many parser errors in the past, so you probably want to get rid of them too.
See also
A reddit thread on the topic (majority pro spaces, but some pro tabs)
Good Haskell Style (pro spaces)
Yet Another Tabs v Space debate (pro mixing)
In a small DSL, I'm parsing macro definitions, similarly to #define C pre-processor directives (here a simplistic example):
_def mymacro(a,b) = a + b / a
When the following call is encountered by the parser
c = mymacro(pow(10,2),3)
it is expanded to
c = pow(10,2) + 3 / pow(10,2)
My current approach is:
wrap the parser in a State monad
when parsing macro definitions, store them in the state, with their body unparsed (parse it as a string)
when parsing a macro call, find the definition in the state, replace the arguments in the body text, replace the call with this body and resume the parsing.
Some code from the last step:
macrocallStmt
= do -- capture starting position and content of old input before macro call
oldInput <- getInput
oldPos <- getPosition
-- parse the call
ret <- identifier
symbolCS "="
i <- identifier
args <- parens $ commaSep anyExprStr
-- expand the macro call
us <- get
let inlinedCall = replaceMacroArgs i args ret us
-- set up new input with macro call expanded
remainder <- getInput
let newInput = T.append inlinedCall (T.cons '\n' remainder)
setPosition oldPos
setInput newInput
-- update the expanded input script
modify (updateExpandedInput oldInput newInput)
anyExprStr = fmap praShow expression <|> fmap praShow algexpr
This approach does the job decently. However, it has a number of drawbacks.
Parsing multiple times
Any valid DSL expression can be an argument of the macro call. Therefore, even though I only need their textual representation (to be replaced in the macro body), I need to parse them and then convert them again to string - simply looking for the next comma wouldn't work. Then the complete and customised macro will be parsed. So in practice, macro arguments get parsed twice (and also show-ed, which has its cost). Moreover, each call requires a new parsing of the (almost same) body. The reason to keep the body unparsed in memory is to allow maximum flexibility: in the body, even DSL keywords could be constructed out of the macro arguments.
Error handling
Because the expanded body is inserted in front of the unconsumed input (replacing the call), the initial and final input can be quite different. In the event of a parse error, the position where the error occurred in the expanded input is available. However, when processing the error, I only have the original, not expanded, input. So the error position won't match.
That is why, in the code snippet above, I use the state to save the expanded input, so that it is available when the parser exits with an error.
This works well, but I noticed that it becomes quite costly, with new Text arrays (the input stream is Text) being allocated for the whole stream at every expansion. Perhaps keeping the expanded input in the state as String, rather than Text, would be cheaper in this case, i.e. when a middle part needs to be replaced?
The reasons for this question are:
I would appreciate suggestions / comments on the two issues described above
Can anyone suggest a better approach altogether?
I'm looking to call functions dynamically based on the contents found in an association list.
Here is an example in semi-pseudo-code. listOfFunctions would be passed to callFunctions.
listOfFunctions = [('function one', 'value one')
, ('function two', 'value two')
, ('function three', 'value three')]
callFunctions x = loop through functions
if entry found
then call function with value
else do nothing
The crux of the question is not looping through the list; rather, it's how to call a function once I have its name.
Consider this use case for further clarification. You open the command prompt and are presented with the following menu.
1: Write new vHost file
2: Exit
You write the new vHost file and are now presented with a new menu
1: Enter new directive
2: Write file
3: Exit
You enter some new directives for the vHost and are now ready to write the file.
The program isn't going to blindly write each and every directive it can, rather, it will only write the ones that you supplied. This is where the association list comes in. Writing a giant if/then/else or case statement is madness. It would be much more elegant to loop through the list, look for which directives were added and call the functions to write them accordingly.
Hence, loop, find a function name, call said function with supplied value.
Thanks to anyone who can help out with this.
Edit:
Here is the solution that I've come up with (constructive critiques are always welcome).
I exported the functions which write the directives in an association list as every answer provided said that just including the function is the way to go.
funcMap = [("writeServerName", writeServerName)
,("writeServeralias", writeServerAlias)
,("writeDocRoot", writeDocRoot)
,("writeLogLevel", writeErrorLog)
,("writeErrorPipe", writeErrorPipe)
,("writeVhostOpen", writeVhostOpen)]
In the file which actually writes the hosts, that file is imported.
I have an association list called hostInfo to simulate some dummy values that would be gathered from an end-user, and a function called runFunctions which uses the technique supplied by edalorzo to filter through both lists. By matching on the keys of both lists I ensure that the right function is called with the right value.
import Vhost.Directive
hostInfo = [("writeVhostOpen", "localhost:80")
,("writeServerName", "norics.com")]
runFunctions = [f val | (mapKey, f) <- funcMap, (key, val) <- hostInfo, mapKey == key]
You can simply include the function in the list directly; functions are values, so you can reference them by name in a list. Once you've got them out of the list, applying them is just as simple as func value. There's no need to involve their names at all.
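A minimal sketch of that idea follows. The two writer functions are hypothetical stand-ins that return a String so the example stays pure (the real ones would presumably perform IO), and the names funcTable and runDirectives are made up for illustration.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical directive writers; pure stand-ins for the real ones.
writeServerName, writeDocRoot :: String -> String
writeServerName v = "ServerName " ++ v
writeDocRoot v    = "DocumentRoot " ++ v

-- The list stores the functions themselves, each tagged with a key.
funcTable :: [(String, String -> String)]
funcTable =
  [ ("serverName", writeServerName)
  , ("docRoot", writeDocRoot)
  ]

-- Apply the matching function to each supplied value;
-- keys with no entry in funcTable are silently skipped.
runDirectives :: [(String, String)] -> [String]
runDirectives =
  mapMaybe (\(key, val) -> fmap ($ val) (lookup key funcTable))
```

For example, runDirectives [("docRoot", "/var/www"), ("bogus", "x")] produces a single DocumentRoot line and ignores the unknown key, so no if/then/else chain is needed.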
Since I am fairly new to Haskell I will risk that you consider my suggestion very naive, but anyway, here it goes:
let funcs = [("sum", (+3),1),("product", (*3),2),("square", (^2),4)]
[f x | (name, f, x) <- funcs, name == "sum"]
I think it satisfies the requirements of the question, but perhaps what you intend is more sophisticated than what I can see with my still limited knowledge of Haskell.
It might be a bit of an overkill (I agree with ehird's reasoning) but you can evaluate a string with Haskell code by using the eval function in System.Eval.Haskell.
EDIT
As pointed out in the comments, hint is a better option for evaluating strings with Haskell expressions. Quoting the page:
This library defines an Interpreter monad. It allows to load Haskell modules, browse them, type-check and evaluate strings with Haskell expressions and even coerce them into values. The library is thread-safe and type-safe (even the coercion of expressions to values). It is, esentially, a huge subset of the GHC API wrapped in a simpler API. Works with GHC 6.10.x and 6.8.x
First we define our list of functions. This could be built using more machinery, but for the sake of example I just make one explicit list:
listOfFunctions :: [(Int, IO ())]
listOfFunctions = [(0, print "HI") -- notice the inline, unnamed action
,(1, someNamedFunction) -- and something more traditional here
]
someNamedFunction = getChar >>= \x -> print x >> print x
Then we can select from this list however we want and execute the function:
executeFunctionWithVal :: Int -> IO ()
executeFunctionWithVal v = fromMaybe (return ()) (lookup v listOfFunctions)
and it works (if you import Data.Maybe):
Ok, modules loaded: Main.
> executeFunctionWithVal 0
"HI"
> executeFunctionWithVal 01
a'a'
'a'
Don't store the functions as strings, or rather, try storing the actual functions and then tagging them with a string. That way you can just call the function directly. Functions are first class values, so you can call the function using whatever name you assign it to.