I often read that I shouldn't mix tabs and spaces in Haskell, or that I shouldn't use tabs at all. Why?
The problem is twofold. First of all, Haskell is indentation sensitive, e.g. the following code isn't valid:
example = (a, b)
  where
    a = "Hello"
     b = "World"
Both bindings need to be indented with the same number of spaces/tabs (see off-side rule). While it's obvious in this case, it's rather hidden in the following one, where I denote a space by · and a tab by »:
example = (a, b)
··where
····a = "Hello"
»   b = "World"
This looks like valid Haskell code if the editor displays tabs aligned to multiples of four. But it isn't: Haskell aligns tabs to multiples of eight, so the code is interpreted like this:
example = (a, b)
··where
····a = "Hello"
»       b = "World"
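The difference between the two readings comes down to where a tab stop lands. As a minimal sketch (indentColumn is a hypothetical helper written for this illustration, not part of any library), the column of the first non-whitespace character can be computed for a given tab width like this:

```haskell
-- Hypothetical helper: the 0-based column at which the first
-- non-whitespace character of a line lands for a given tab width.
-- A tab advances the column to the next multiple of the tab width.
indentColumn :: Int -> String -> Int
indentColumn tabWidth = go 0
  where
    go col (' '  : rest) = go (col + 1) rest
    go col ('\t' : rest) = go (col + tabWidth - col `mod` tabWidth) rest
    go col _             = col
```

For the tab-indented b line, indentColumn 4 "\tb = \"World\"" gives 4, which matches the four spaces in front of a, while indentColumn 8 on the same line gives 8, which is the column the compiler works with.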
Second, if you use only tabs, you can end up with a layout that doesn't look right. For example, the following code looks correct if a tab gets displayed with six or more spaces (eight in this case):
example = (a, b)
»       where»  a = "Hello"
»       »       b = "World"
But in another editor that uses 4 spaces it won't look right anymore:
example = (a, b)
»   where»  a = "Hello"
»   »   b = "World"
It's still correct, though. However, someone who's used to spaces might reindent the b binding with spaces and end up with a parser error.
If you enforce a code convention throughout your code that makes sure that you only use tabs at the beginning of a line and use a newline after where, let or do you can avoid some of the problems (see 11). However, current releases of GHC warn about tabs by default, because they have been a source of many parser errors in the past, so you probably want to get rid of them too.
See also
A reddit thread on the topic (majority pro spaces, but some pro tabs)
Good Haskell Style (pro spaces)
Yet Another Tabs v Space debate (pro mixing)
I'm planning on writing a Parser for some language. I'm quite confident that I could cobble together a parser in Parsec without too much hassle, but I thought about including comments into the AST so that I could implement a code formatter in the end.
At first, adding an extra parameter to the AST types seemed like a suitable idea (this is basically what was suggested in this answer). For example, instead of having
data Expr = Add Expr Expr | ...
one would have
data Expr a = Add a Expr Expr
and use a for whatever annotation (e.g. for comments that come after the expression).
However, there are some not so exciting cases. The language features C-like comments (// ..., /* .. */) and a simple for loop like this:
for (i in 1:10)
{
... // list of statements
}
Now, excluding the body there are at least 10 places where one could put one (or more) comments:
/*A*/ for /*B*/ ( /*C*/ i /*E*/ in /*F*/ 1 /*G*/ : /*H*/ 10 /*I*/ ) /*J*/
{ /*K*/
...
In other words, while the for loop could previously be comfortably represented as an identifier (i), two expressions (1 & 10) and a list of statements (the body), we would now have to include at least 10 more parameters or records for annotations.
This gets ugly and confusing quite quickly, so I wondered whether there is a clearly better way to handle this. I'm certainly not the first person wanting to write a code formatter that preserves comments, so there must be a decent solution, or is writing a formatter just that messy?
You can probably capture most of those positions with just two generic comment productions:
Expr -> Comment Expr
Stmt -> Comment Stmt
This seems like it ought to capture comments A, C, F, H, J, and K for sure; possibly also G depending on exactly what your grammar looks like. That only leaves three spots to handle in the for production (maybe four, with one hidden in Range here):
Stmt -> "for" Comment "(" Expr Comment "in" Range Comment ")" Stmt
In other words: one before each literal string but the first. Seems not too onerous, ultimately.
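As a rough sketch of how those productions could map onto AST types (all names here are made up for illustration, and the Range type with its extra comment slot before ":" is an assumption about your grammar), a generic wrapper covers the two generic productions and the for node carries only the remaining slots:

```haskell
-- Hypothetical AST sketch; none of these names come from an existing library.
type Comment = String

-- A node together with the comments written directly before it.
-- This plays the role of "Expr -> Comment Expr" and "Stmt -> Comment Stmt".
data Commented a = Commented [Comment] a
  deriving (Show, Eq)

data Expr
  = Var String
  | IntLit Integer
  deriving (Show, Eq)

-- lower bound, comments before ":", upper bound
data Range = Range (Commented Expr) [Comment] (Commented Expr)
  deriving (Show, Eq)

data Stmt
  = For
      [Comment]         -- comments before "("
      (Commented Expr)  -- loop variable
      [Comment]         -- comments before "in"
      Range
      [Comment]         -- comments before ")"
      (Commented Stmt)  -- body
  | Block [Commented Stmt]
  deriving (Show, Eq)

-- Collect all comments of a statement in source order.
stmtComments :: Commented Stmt -> [Comment]
stmtComments (Commented cs stmt) = cs ++ inner stmt
  where
    exprComments (Commented ecs _) = ecs
    inner (For beforeParen var beforeIn (Range lo beforeColon hi) beforeClose body) =
      beforeParen ++ exprComments var ++ beforeIn
        ++ exprComments lo ++ beforeColon ++ exprComments hi
        ++ beforeClose ++ stmtComments body
    inner (Block stmts) = concatMap stmtComments stmts
```

A formatter would walk the tree the same way stmtComments does, emitting each comment list at the position its field stands for.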
For instance:
let x = 1 in putStrLn [dump|x, x+1|]
would print something like
x=1, (x+1)=2
And even if there isn't anything like this currently, would it be possible to write something similar?
TL;DR There is this package which contains a complete solution.
install it via cabal install dump
and/or
read the source code
Example usage:
{-# LANGUAGE QuasiQuotes #-}
import Debug.Dump
main = print [d|a, a+1, map (+a) [1..3]|]
  where a = 2
which prints:
(a) = 2 (a+1) = 3 (map (+a) [1..3]) = [3,4,5]
by turning this String
"a, a+1, map (+a) [1..3]"
into this expression
( "(a) = " ++ show (a) ++ "\t " ++
"(a+1) = " ++ show (a + 1) ++ "\t " ++
"(map (+a) [1..3]) = " ++ show (map (+ a) [1 .. 3])
)
Background
Basically, I found that there are two ways to solve this problem:
Exp -> String The bottleneck here is pretty-printing haskell source code from Exp and cumbersome syntax upon usage.
String -> Exp The bottleneck here is parsing haskell to Exp.
Exp -> String
I started out with what @kqr put together, and tried to write a parser to turn this
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
into this
["not x = False","x = True","x == True = True"]
But after trying for a day, my parsec-debugging-skills have proven insufficient to date, so instead I went with a simple regular expression:
simplify :: String -> String
simplify s = subRegex (mkRegex "_[0-9]+|([a-zA-Z]+\\.)+") s ""
For most cases, the output is greatly improved.
However, I suspect it will sometimes remove things it shouldn't.
For example:
$(dump [|(elem 'a' "a.b.c", True)|])
Would likely return:
["elem 'a' \"c\" = True","True = True"]
But this could be solved with proper parsing.
Here is the version that works with the regex-aided simplification: https://github.com/Wizek/kqr-stackoverflow/blob/master/Th.hs
Here is a list of downsides / unresolved issues I've found with the Exp -> String solution:
As far as I know, not using Quasi Quotation requires cumbersome syntax upon usage, like: $(d [|(a, b)|]) -- as opposed to the more succinct [d|a, b|]. If you know a way to simplify this, please do tell!
As far as I know, [||] needs to contain fully valid Haskell, which pretty much necessitates the use of a tuple inside, further exacerbating the syntactic situation. There is some upside to this too, however: at least we don't need to scratch our heads over where to split the expressions, since GHC does that for us.
For some reason, the tuple only seemed to accept Booleans. Weird, I suspect this should be possible to fix somehow.
Pretty pretty-printing Exp is not very straight-forward. A more complete solution does require a parser after all.
Printing an AST scrubs the original formatting for a more uniform look. I hoped to preserve the expressions letter-by-letter in the output.
The deal-breaker was the syntactic over-head. I knew I could get to a simpler solution like [d|a, a+1|] because I have seen that API provided in other packages. I was trying to remember where I saw that syntax. What is the name...?
String -> Exp
Quasi Quotation is the name, I remember!
I remembered seeing packages with heredocs and interpolated strings, like:
string = [qq|The quick {"brown"} $f {"jumps " ++ o} the $num ...|]
  where f = "fox"; o = "over"; num = 3
Which, as far as I knew, during compile-time, turns into
string = "The quick " ++ "brown" ++ " " ++ $f ++ "jumps " ++ o ++ " the" ++ show num ++ " ..."
  where f = "fox"; o = "over"; num = 3
And I thought to myself: if they can do it, I should be able to do it too!
A bit of digging in their source code revealed the QuasiQuoter type.
data QuasiQuoter = QuasiQuoter {quoteExp :: String -> Q Exp}
Bingo, this is what I want! Give me the source code as string! Ideally, I wouldn't mind returning string either, but maybe this will work. At this point I still know quite little about Q Exp.
After all, in theory, I would just need to split the string on commas, map over it, duplicate the elements so that first part stays string and the second part becomes Haskell source code, which is passed to show.
Turning this:
[d|a+1|]
into this:
"a+1" ++ " = " ++ show (a+1)
Sounds easy, right?
Well, it turns out that even though GHC is obviously capable of parsing Haskell source code, it doesn't expose that function. Or not in any way we know of.
I find it strange that we need a third-party package (thankfully there is at least one, called haskell-src-meta) to parse Haskell source code for metaprogramming. It looks to me like an obvious duplication of logic and a potential source of mismatch, resulting in bugs.
Reluctantly, I started looking into it. After all, if it is good enough for the interpolated-string folks (those packages did rely on haskell-src-meta) then maybe it will work okay for me too for the time being.
And indeed, it does contain the desired function:
Language.Haskell.Meta.Parse.parseExp :: String -> Either String Exp
Language.Haskell.Meta.Parse
From this point it was rather straightforward, except for splitting on commas.
Right now, I do a very simple split on all commas, but that doesn't account for this case:
[d|(1, 2), 3|]
Which unfortunately fails. To handle this, I began writing a parsec parser (again), which turned out to be more difficult than anticipated (again). At this point, I am open to suggestions. Maybe you know of a simple parser that handles the different edge cases? If so, tell me in a comment, please! I plan on resolving this issue with or without parsec.
But for most use cases: it works.
Update at 2015-06-20
Version 0.2.1 and later correctly parses expressions even if they contain commas inside them. Meaning [d|(1, 2), 3|] and similar expressions are now supported.
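For illustration only, here is one way a bracket-aware split could look. This is a sketch under simplifying assumptions, not necessarily how the package does it: splitTopLevel is a hypothetical name, and string and character literals are not recognised, so a comma or bracket inside a literal is treated like any other.

```haskell
-- Hypothetical helper: split a string on commas that are not nested
-- inside (), [] or {}. String and character literals are NOT handled.
splitTopLevel :: String -> [String]
splitTopLevel = go (0 :: Int) ""
  where
    -- depth: current bracket nesting; acc: the current piece, reversed
    go _ acc [] = [reverse acc]
    go depth acc (c : cs)
      | c == ',' && depth == 0 = reverse acc : go depth "" cs
      | c `elem` "([{"         = go (depth + 1) (c : acc) cs
      | c `elem` ")]}"         = go (max 0 (depth - 1)) (c : acc) cs
      | otherwise              = go depth (c : acc) cs
```

With this, splitTopLevel "(1, 2), 3" yields ["(1, 2)"," 3"], so the tuple stays in one piece; each piece would still need trimming before being handed to parseExp.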
You can
install it via cabal install dump
and/or
read the source code
Conclusion
During the last week I've learnt quite a bit of Template Haskell and QuasiQuotation, cabal sandboxes, publishing a package to hackage, building haddock docs and publishing them, and some things about Haskell too.
It's been fun.
And perhaps most importantly, I now am able to use this tool for debugging and development, the absence of which has been bugging me for some time. Peace at last.
Thank you @kqr, your engagement with my original question and attempt at solving it gave me enough spark and motivation to continue writing up a full solution.
I've actually almost solved the problem now. Not exactly what you imagined, but fairly close. Maybe someone else can use this as a basis for a better version. Either way, with
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
import Language.Haskell.TH
dump :: ExpQ -> ExpQ
dump tuple =
    listE . map dumpExpr . getElems =<< tuple
  where
    getElems = \case { TupE xs -> xs; _ -> error "not a tuple in splice!" }
    dumpExpr exp = [| $(litE (stringL (pprint exp))) ++ " = " ++ show $(return exp)|]
you get the ability to do something like
λ> let x = True
λ> print $(dump [|(not x, x, x == True)|])
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
which is almost what you wanted. As you see, it's a problem that the pprint function includes module prefixes and such, which makes the result... less than ideally readable. I don't yet know of a fix for that, but other than that I think it is fairly usable.
It's a bit syntactically heavy, but that is because it's using the regular [| quote syntax in Haskell. If one wanted to write their own quasiquoter, as you suggest, I'm pretty sure one would also have to re-implement parsing Haskell, which would suck a bit.
I am doing some haskell exercises to learn the language and I have a syntax error I was hoping someone could help me with:
-- Split a list l at element k into a tuple: The first part up to and including k, the second part after k
-- For example "splitAtIndex 3 [1,1,1,2,2,2]" returns ([1,1,1],[2,2,2])
splitAtIndex k l = ([l !! x | x <- firstHalfIndexes], [l !! x | x <- firstHalfIndexes])
where firstHalfIndexes = [0..k-1]
secondHalfIndexes = [k..(length l-1)]
The syntax error is "parse error on input ‘=’" and seems to be coming from my second where clause, but I can't work out why the first where clause is ok but not the second?
The Haskell Report specifies that tab characters flesh out text to the next multiple of eight. Your code appears to assume that it gets fleshed out to the next multiple of four. (My best guess. Might also be configured to be five or six, but those settings seem less popular than four.)
See my page on tabs for ideas on how to safely use tabs in Haskell code; or else do what most other folks do and configure your editor to expand tabs to spaces.
For an example of the style I use, your current code looks like this to the compiler (using > to mark tabs and _ for spaces):
splitAtIndex_..._=_...
>       where_> firstHalfIndexes_=_...
>       >       >       secondHalfIndexes_=_...
I would write it to look like this to the compiler:
splitAtIndex_..._=_...
>       where_> firstHalfIndexes_=_...
>       ______> secondHalfIndexes_=_...
This also looks correct with four-space tabstops (and indeed any size tabstop):
splitAtIndex_..._=_...
>   where_> firstHalfIndexes_=_...
>   ______> secondHalfIndexes_=_...
(Actually, I would probably just use one space after where rather than a space and a tab, but that's an aesthetics thing, not really a technical one.)
I'm writing a pretty-printer for a simple white-space sensitive language.
I like the Leijen pretty-printer library more than I like the Wadler library, but the Leijen library has one problem in my domain: any line break I insert may be overridden by the group construct, which may compress any line, which might change the semantics of the output.
I don't think I can implement an ungroupable line in the wl-pprint (although I'd love to be wrong).
Looking a bit at the wl-pprint-extras package, I don't think that even the exposed internal interface allows me to create a line which will not be squashed by group.
Do I just have to rely on the fact that I never use group, or do I have some better option?
Given that you want to be able to group and you also need to be able to ensure some lines aren't uninserted,
why don't we use the fact that the library designers encoded the semantics in the data type,
instead of in code. This fabulous decision makes it eminently re-engineerable.
The Doc data type encodes a line break using the constructor Line :: Bool -> Doc.
The Bool represents whether to omit a space when removing a line. (Lines indent when they're there.)
Let's replace the Bool:
data LineBehaviour = OmitSpace | AddSpace | Keep
data Doc = ...
...
Line !LineBehaviour -- not Bool any more
The beautiful thing about the semantics-as-data design is that if we replace
this Bool data with LineBehaviour data, functions that didn't use it but
passed it on unchanged don't need editing. Functions that look inside at what
the Bool is break with the change - we'll rewrite exactly the parts of the code
that need changing to support the new semantics by changing the data type where
the old semantics resided. The program won't compile until we've made all the
changes we should, while we won't need to touch a line of code that doesn't
depend on line break semantics. Hooray!
For example, renderPretty uses the Line constructor, but in the pattern Line _,
so we can leave that alone.
First, we need to replace Line True with Line OmitSpace, and Line False with Line AddSpace,
line = Line AddSpace
linebreak = Line OmitSpace
but perhaps we should add our own
hardline :: Doc
hardline = Line Keep
and we could perhaps do with a binary operator that uses it
infixr 5 <->
(<->) :: Doc -> Doc -> Doc
x <-> y = x <> hardline <> y
and the equivalent of the vertical separator, for which I can't think of a better name than very vertical separator:
vvsep,vvcat :: [Doc] -> Doc
vvsep = fold (<->)
vvcat = fold (<->)
The actual removing of lines happens in the group function. Everything can stay the same except:
flatten (Line break) = if break then Empty else Text 1 " "
should be changed to
flatten (Line OmitSpace) = Empty
flatten (Line AddSpace) = Text 1 " "
flatten (Line Keep) = Line Keep
That's it: I can't find anything else to change!
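To see the effect in isolation, here is a toy model of the idea. It is not the library's actual Doc type (the real one has more constructors and its Text carries a length), and render is a naive stand-in that ignores widths and indentation; it only shows that flatten removes ordinary line breaks while a Keep line survives.

```haskell
-- Toy model for illustration only; NOT the library's real Doc type.
data LineBehaviour = OmitSpace | AddSpace | Keep
  deriving (Show, Eq)

data Doc
  = Empty
  | Text String
  | Line LineBehaviour
  | Cat Doc Doc
  deriving (Show, Eq)

-- Same three Line cases as above, plus the plumbing for the toy type.
flatten :: Doc -> Doc
flatten (Cat x y)        = Cat (flatten x) (flatten y)
flatten (Line OmitSpace) = Empty
flatten (Line AddSpace)  = Text " "
flatten (Line Keep)      = Line Keep
flatten other            = other

-- Naive renderer: every remaining Line becomes a newline.
render :: Doc -> String
render Empty     = ""
render (Text s)  = s
render (Line _)  = "\n"
render (Cat x y) = render x ++ render y
```

Flattening Text "a", Line AddSpace, Text "b", Line Keep, Text "c" (joined with Cat) and rendering the result gives "a b\nc": the soft break collapses to a space and the hard one stays.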
You do need to avoid group, yes. The library's designed to facilitate wrapping or not wrapping based on the width of the output that you specify.
Depending on the syntax of the language you're implementing, you should also be cautious about softline and softbreak and the </> and <//> operators that use them. There's no reason I can see that you can't use <$> and <$$> instead.
sep, fillSep, cat and fillCat all use group directly or indirectly (and have the indeterminate semantics/width-dependent line breaks you want to avoid). However, given your purpose, I don't think you need them:
Use vsep or hsep instead of sep or fillSep.
Use hcat or vcat instead of cat or fillCat.
You could use a line like
import Text.PrettyPrint.Leijen hiding (group,softline,softbreak,
(</>),(<//>),
sep,fillSep,cat,fillCat)
to make sure you don't call these functions.
I can't think of a way to ensure that functions you do use don't call group somewhere along the line, but I think those are the ones to avoid.
It's standard in most modern editors that you can highlight a piece of code and indent or unindent a tab or however many spaces you're using; how do you do this in emacs?
So, for example I just opened sublime text, highlighted the following piece of code:
variation1 person phoneMap carrierMap addressMap =
    case M.lookup person phoneMap of
      Nothing -> Nothing
      Just number ->
          case M.lookup number carrierMap of
            Nothing -> Nothing
            Just carrier -> M.lookup carrier addressMap
then pressed tab and got
    variation1 person phoneMap carrierMap addressMap =
        case M.lookup person phoneMap of
          Nothing -> Nothing
          Just number ->
              case M.lookup number carrierMap of
                Nothing -> Nothing
                Just carrier -> M.lookup carrier addressMap
One shift-tab on that code returns it to where it was, and if I continue pressing shift-tab I eventually get the following:
variation1 person phoneMap carrierMap addressMap =
case M.lookup person phoneMap of
Nothing -> Nothing
Just number ->
case M.lookup number carrierMap of
Nothing -> Nothing
Just carrier -> M.lookup carrier addressMap
Quote from another response:
emacs language modes don't really have a notion of 'indent this block
1 tab further'. Instead they're very opinionated and have a notion of
'this is the correct indentation' and that's what you get when you hit
tab in a language mode.
Except when I do that with the following code (haskell mode and ghc mod enabled):
import Monad
import System
import IO
import Random
import Control.Monad.State
type RandomState a = State StdGen a
data CountedRandom = CountedRandom {
crGen :: StdGen
, crCount :: Int
}
type CRState = State CountedRandom
getRandom :: Random a => RandomState a
getRandom =
get >>= \gen ->
let (val, gen') = random gen in
put gen' >>
return val
I get the following:
import Monad
import System
import IO
import Random
import Control.Monad.State
type RandomState a = State StdGen a
data CountedRandom = CountedRandom {
crGen :: StdGen
, crCount :: Int
}
type CRState = State CountedRandom
getRandom :: Random a => RandomState a
getRandom =
get >>= \gen ->
let (val, gen') = random gen in
put gen' >>
return val
when I wanted
import Monad
import System
import IO
import Random
import Control.Monad.State
type RandomState a = State StdGen a
data CountedRandom = CountedRandom {
crGen :: StdGen
, crCount :: Int
}
type CRState = State CountedRandom
getRandom :: Random a => RandomState a
getRandom =
get >>= \gen ->
let (val, gen') = random gen in
put gen' >>
return val
Near enough to a solution from ataylor:
(defcustom tab-shift-width 4
  "Sets selected text shift width on tab"
  :type 'integer)
(make-variable-buffer-local 'tab-shift-width)
(global-set-key
 (kbd "<tab>")
 (lambda (start end)
   (interactive "r")
   (if (use-region-p)
       (save-excursion
         (let ((deactivate-mark nil))
           (indent-rigidly start end tab-shift-width)))
     (indent-for-tab-command))))
(global-set-key
 (kbd "S-<tab>")
 (lambda (start end)
   (interactive "r")
   (if (use-region-p)
       (save-excursion
         (let ((deactivate-mark nil))
           (indent-rigidly start end (- tab-shift-width))))
     (indent-for-tab-command))))
It'd be nice if emacs had support for indent detection (i.e., just grab the value of some variable); the closest thing I found to this was a plugin called dtrt indent but it doesn't work for Haskell.
indent-region will reindent a block of text according to the current mode.
To force an indentation level to be added, you can use string-rectangle, which will prompt you for a string. Here you can provide the string for an indentation level (e.g. a tab, 4 spaces, etc.). The string will be inserted on each line of the currently selected region, in the current column, effectively indenting it. Alternatively, you can get a similar effect from open-rectangle, which will insert whitespace into the rectangle with corners defined by the point and the mark.
Another way to force indentation is to call indent-rigidly (C-x TAB). This overrides the mode specific indentation rules and indents a fixed amount. The numeric argument specifies how much to indent, and a negative argument will unindent. If you want this to be the default behavior when a region is selected, you could do something like this:
(global-set-key
 (kbd "<tab>")
 (lambda (start end)
   (interactive "r")
   (if (use-region-p)
       (save-excursion
         (let ((deactivate-mark nil))
           (indent-rigidly start end 4)))
     (indent-for-tab-command))))
Haskell code is incredibly difficult to indent correctly, because there are multiple "correct" indentations for a piece of code.
haskell-mode has a very specific line format that it expects you to follow (As in, you have to make line breaks in the right places) and it has a few indentation rules for formatting code that matches that line format. These rules exist to make the automatic indentation results more consistent. The rules are roughly this:
After every keyword that introduces a block, you should make a line break or make sure that the entire block fits into the layout. Otherwise, you get a lot of hanging blocks like in your getRandom example
All blocks are indented exactly two spaces. This includes module blocks; if you do module Bla where, the whole part after that line will be indented. This means that you should keep the default Haskell module file format for the indentation order to work out.
The indentation of a line needs to be as unambiguous as possible; if a line could mean different things depending on its indentation, it will lead to it being indented to the position that haskell-mode thinks makes sense in the context. Fixing this can be impossible in some cases.
Because it is impossible to structure Haskell code so that it meets the requirements of haskell-mode, you cannot indent a whole Haskell code file like this. You need to only use automatic indentation locally. This can be done in a number of ways:
When you are on a line, you can indent the current line to the most likely "correct" position with regards to the previous line by pressing TAB. By pressing TAB again, you will bring the line to the "next" indentation level, and continuously cycle through all possible logical indentation steps.
If you select a series of blocks that are found locally (The body of a function etc.) and use M-x indent-region, the result will most likely be correct.
What I usually do in a situation like this is to start on the line that has the "wrong" indentation, press TAB once, and go down line-by-line, pressing TAB one or multiple times on each line until the indentation of that line is correct. The current "logical indentation positions" for the current line are calculated from the preceding code context, so correcting the indentation from the top almost always yields the correct result.
I highlight the area and hit C-M-\. It is indent-region and more fun can be found at the multi-line indent page.
emacs language modes don't really have a notion of 'indent this block 1 tab further'. Instead they're very opinionated and have a notion of 'this is the correct indentation' and that's what you get when you hit tab in a language mode.
Once you get used to it anything else seems weird. In your Haskell code those case statements only really have one indentation that's valid, anything else is a syntax error unless you add braces and semicolons. If you really want to customize what emacs considers the 'right' indentation look at how to customize your mode. A lot of language modes reuse the c-mode variables so here is probably a good place to start (though I'm not sure what Haskell mode does, I've never found a need to customize it).
edit:
I see in your comment that your troubles are from not having Haskell mode installed; go to the GitHub page to get it.