I have a pretty-printer like this:
somefun = text "woo" $+$ nest 4 (text "nested text") $+$ text "text without indent"
fun = text "------" $+$ somefun
What I want from it is to print this:
------ woo
    nested text
text without indent
But it prints:
------
woo
    nested text
text without indent
I can understand why it prints like this, but I'm having trouble getting what I want. One solution I found was this:
somefun p = p <+> text "woo" $+$ nest 4 (text "nested text") $+$ text "text without indent"
fun = somefun (text "------")
That is, I'm passing the Doc which I want my next Doc's indentation to be based on. This solves my problem but I'm looking for better ways to do this.
Your pass-the-Doc-as-an-argument solution is good. Once you've combined into a single Doc, you can't split it apart again, so here are two ways that use lists instead:
Alternative 1
Another way is to use [Doc] instead of Doc for the subsequent text if you want to treat the lines differently, and then recombine them with something like:
(<+$) :: Doc -> [Doc] -> Doc
doc <+$ [] = doc
doc <+$ (d:ds) = (doc <+> d) $+$ foldr ($+$) empty ds
somefun :: [Doc]
somefun = [text "woo",
           nest 4 (text "nested text"),
           text "text without indent"]
fun :: Doc
fun = text "------" <+$ somefun
This gives you
*Main> fun
------ woo
    nested text
text without indent
Alternative 2
You could rewrite this solution another way keeping lists, if you like to keep indenting the top line:
(<+:) :: Doc -> [Doc] -> [Doc]
doc <+: [] = [doc]
doc <+: (d:ds) = (doc <+> d) : ds -- pop doc in front.
We'll need to put those together into a single Doc at some stage:
vsep = foldr ($+$) empty
Now you can use : to put a line above, and <+: to push a bit in front of the top line:
start = [text "text without indent"]
next = nest 4 (text "nested text") : start
more = text "woo" : next
fun = text "------" <+: more
extra = text "-- extra! --" <+: fun
Test this with
*Main> vsep fun
------ woo
    nested text
text without indent
*Main> vsep extra
-- extra! -- ------ woo
    nested text
text without indent
The main issue is that if you use [Doc] instead of Doc it's almost as if you're not using the pretty-print library! It doesn't matter, though, if it's what you need.
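For reference, here is Alternative 2 as a small self-contained module, a sketch that assumes the pretty package (Text.PrettyPrint) is what you are importing. The names body and rendered are only for illustration; render turns the final Doc into a String so the layout can be checked.

```haskell
import Text.PrettyPrint

-- Put a Doc in front of the first line of a list of lines.
(<+:) :: Doc -> [Doc] -> [Doc]
doc <+: []     = [doc]
doc <+: (d:ds) = (doc <+> d) : ds

-- Stack the lines vertically; ($+$) never overlaps adjacent lines.
vsep :: [Doc] -> Doc
vsep = foldr ($+$) empty

-- Illustrative name: the lines that follow the prefix.
body :: [Doc]
body =
  [ text "woo"
  , nest 4 (text "nested text")
  , text "text without indent"
  ]

-- Illustrative name: the final layout as a plain String.
rendered :: String
rendered = render (vsep (text "------" <+: body))
```

Evaluating putStr rendered in GHCi should show the same three lines as vsep fun above.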
Related
I have the following programming language grammar:
data Expr = ...
data Stmt = SExpr Expr | SBlock Block | SLet Fundef | ...
data Block = Block [Stmt]
data Fundef = Fundef String [String] Block
data TopDef = TopFun Fundef
With the following example syntax:
function long_function_name () = {
  let g() = {
    {
      h()
    };
    3
  }
}
I am trying to use the HughesPJ pretty library to create a pretty printer for this language. My attempts so far look like:
-- \case requires the LambdaCase extension
instance Pretty Stmt where
  pPrint = \case
    SExpr e -> pPrint e
    SBlock b -> pPrint b
    SLet f -> text "let" <+> pPrint f
instance Pretty Block where
  pPrint (Block stmts) = lbrace $+$
    nest 2 (vcat (punctuate semi (map pPrint stmts))) $+$
    rbrace
instance Pretty Fundef where
  pPrint (Fundef name args body) = pPrint name <> parens (...) <+> text "=" <+> pPrint body
instance Pretty TopDef where
  pPrint (TopFun f) = text "function" <+> pPrint f
The problem is that I want { on the same line as the function declaration, but the following lines are always indented relative to the column of the brace instead of absolutely. This is visible in the pretty-printed output for the example above:
function long_function_name () = {
                                   let g() = {
                                               {
                                                 h()
                                               };
                                               3
                                             }
                                 }
Why does it happen and how should I tackle this problem? I would like to avoid as much code duplication as possible.
You’re writing <+> before the body, so the $+$ vertical concatenation is entirely within that horizontal concatenation of the function line, hence it’s all indented. I believe the way to do what you want with pretty is to explicitly match on the block, since it’s part of the vertical layout, i.e.:
pPrint (Fundef name args (Block stmts)) = vcat
  [ pPrint name <> parens (...) <+> text "=" <+> lbrace
  , nest 2 (vcat (punctuate semi (map pPrint stmts)))
  , rbrace
  ]
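To avoid repeating that layout for every construct that ends in a block, one option is to pass the header line into a shared helper. The following is only a sketch: Stmt here is a hypothetical, simplified stand-in for your AST (SRaw and the string-based SLet are made up for the example), and block, stmt, example and rendered are illustrative names. It uses $+$ rather than vcat for the braces, because $$ (which vcat uses) can merge a short line such as a lone { with the nested line below it.

```haskell
import Text.PrettyPrint

-- Hypothetical, simplified AST used only for this sketch.
data Stmt
  = SRaw String          -- any statement already rendered as text
  | SBlock [Stmt]        -- a bare { ... } block
  | SLet String [Stmt]   -- let name() = { ... }

-- Lay out a block whose opening brace ends the header line.
-- The body is nested relative to the start of the header line,
-- not relative to the column of the brace.
block :: Doc -> [Stmt] -> Doc
block header stmts =
  (header <+> lbrace)
    $+$ nest 2 (vcat (punctuate semi (map stmt stmts)))
    $+$ rbrace

stmt :: Stmt -> Doc
stmt (SRaw s)       = text s
stmt (SBlock ss)    = block empty ss   -- empty is the unit of <+>
stmt (SLet name ss) = block (text "let" <+> text (name ++ "()") <+> equals) ss

-- The example program from the question.
example :: Doc
example =
  block (text "function long_function_name () =")
    [ SLet "g" [ SBlock [SRaw "h()"], SRaw "3" ] ]

rendered :: String
rendered = render example
```

With this shape, the function definition, the let form and the bare block all share one piece of layout code and differ only in the header they pass in.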
The more modern pretty-printing libraries like prettyprinter make this a little easier: nest (or indent, or hang) handles the indentation of lines following the first line in a vertical layout, so you can put the nest around the opening brace and body, and the closing brace outside the nesting, like so:
"prefix" <+> vcat
  [ nest 4 $ vcat
    [ "{"
    , "body"
    ]
  , "}"
  ]
⇓
prefix {
    body
}
(NB. you can use OverloadedStrings like this instead of wrapping literals in text.)
But that won’t work with pretty, which seems to be designed to align the heck out of everything.
I also recommend prettyprinter for its other advantages, for example, a group function that allows you to express “put this on one line if it fits”, which is extremely helpful for making formatting robust & responsive to different rendering contexts.
Using the gogol package and following the example, I got:
> exampleGetValue
-- ValueRange' {_vrValues = Just [String "2018/1/1",String "2018/1/2"], _vrRange = Just "'\24037\20316\34920\&1'!A1:1", _vrMajorDimension = Just VRMDRows}
> exampleGetValue >>= return . view vrValues
-- [String "2018/1/1",String "2018/1/2"]
> mapM_ (print) (exampleGetValue >>= return . view vrValues)
String "2018/1/1"
String "2018/1/2"
Why is the word String printed?
How can I show only the following?
2018/1/1
2018/1/2
Take a look at
[String "2018/1/1",String "2018/1/2"]
the result of
> exampleGetValue >>= return . view vrValues
Here the strings you are interested in, like "2018/1/1", are wrapped in another data type whose constructor is String. I assume it has an automatically derived Show instance, which prints the name of the data constructor, String.
You need to unpack the strings somehow to get rid of the word String in the output.
As this is Stack Overflow, and we are expected to provide answers, I will give you one possibility now, but before you read it, try to do it yourself:
unpackString (String w) = w
mapM_ (print . unpackString) (exampleGetValue >>= return . view vrValues)
You have to determine the type signature for unpackString yourself, as you didn't provide any types.
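Since I don't know the real type, here is a self-contained sketch of the same idea with a made-up stand-in. Cell and its constructors are assumptions for illustration only, not the library's actual type, and cellText, cells, shown, plain and printPlain are illustrative names. It shows that a derived Show instance prints the constructor name, and that unpacking first and then using putStrLn (rather than print, which would add quotes) gives the bare text.

```haskell
-- Hypothetical stand-in for the library's value type; the real type
-- and its constructors are assumptions here.
data Cell = String String | Number Double
  deriving Show

-- Extract the payload of a String cell; other constructors give Nothing.
cellText :: Cell -> Maybe String
cellText (String s) = Just s
cellText _          = Nothing

cells :: [Cell]
cells = [String "2018/1/1", String "2018/1/2"]

-- What print uses: the derived Show output, constructor name included.
shown :: [String]
shown = map show cells

-- The unwrapped text, skipping any cell that is not a String.
plain :: [String]
plain = [s | Just s <- map cellText cells]

-- Print one bare value per line.
printPlain :: IO ()
printPlain = mapM_ putStrLn plain
```

Returning Maybe instead of using a partial function like unpackString avoids a runtime pattern-match failure if a cell holds something other than a string.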
So I had a Location type
data Location = Location {
      title :: String
    , description :: String
    }
instance Show Location where
    show l = title l ++ "\n"
           ++ replicate (length $ title l) '-' ++ "\n"
           ++ description l
Then I changed it to use Data.Text
data Location = Location {
      title :: Text
    , description :: Text
    }
instance Show Location where
    show l = T.unpack $
        title l <> "\n"
        <> T.replicate (T.length $ title l) "-" <> "\n"
        <> description l
Using criterion, I benchmarked the time taken by show on both the String and Data.Text implementations:
benchmarks = [ bench "show" (whnf show l) ]
  where l = Location {
              title="My Title"
            , description = "This is the description."
            }
The String implementation took 34ns; the Data.Text implementation was five times slower, at 170ns.
How do I get Data.Text working as fast as String?
Edit: Silly mistakes
I'm not sure how this happened, but I cannot replicate the original speed difference: now for String and Text I get 28ns and 24ns respectively
For the more aggressive bench "length.show" (whnf (length . show) l) benchmark, for String and Text, I get 467ns and 3954ns respectively.
If I use a very basic lazy builder, without the replicated dashes
import qualified Data.Text.Lazy.Builder as Bldr
instance Show Location where
    show l = show $
        Bldr.fromText (title l) <> Bldr.singleton '\n'
        -- <> Bldr.fromText (T.replicate (T.length $ title l) "-") <> Bldr.singleton '\n'
        <> Bldr.fromText (description l)
and try the original, ordinary show benchmark, I get 19ns. Now this is buggy, as using show to convert a builder to a String will escape newlines. If I replace it with LT.unpack $ Bldr.toLazyText, where LT is a qualified import of Data.Text.Lazy, then I get 192ns.
I'm testing this on a Mac laptop, and I suspect my timings are getting horribly corrupted by machine noise. Thanks for the guidance.
You can't make it as fast, but you can speed it up some.
Appending
Text is represented as an array. This makes <> rather slow, because a new array has to be allocated and each Text copied into it. You can fix this by converting each piece to a String first, and then concatenating them. I imagine Text probably also offers an efficient way to concatenate multiple texts at once (as a commenter mentions, you can use a lazy builder) but for this purpose that will be slower. Another good option might be the lazy version of Text, which probably supports efficient concatenation.
Sharing
In your String-based implementation, the description field doesn't have to be copied at all. It's just shared between the Location and the result of showing that Location. There's no way to accomplish this with the Text version.
In the String case you are not fully evaluating all of the string operations - (++) and replicate.
If you change your benchmark to:
benchmarks = [ bench "show" (whnf (length.show) l) ]
you'll see that the String case takes around 520 ns - approx 10 times longer.
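You can see how little whnf forces without running a benchmark at all. The sketch below uses only base and reuses the String-based Location from the question; probe, whnfIsCheap and firstLine are illustrative names. The description is deliberately an error, so anything that evaluates it would crash.

```haskell
-- The String-based Location from the question.
data Location = Location
  { title       :: String
  , description :: String
  }

instance Show Location where
  show l = title l ++ "\n"
        ++ replicate (length (title l)) '-' ++ "\n"
        ++ description l

-- A Location whose description blows up if anything ever looks at it.
probe :: Location
probe = Location
  { title       = "My Title"
  , description = error "description was forced"
  }

-- seq evaluates to weak head normal form, i.e. only the first (:) cell,
-- so the description is never touched and this is simply True.
whnfIsCheap :: Bool
whnfIsCheap = show probe `seq` True

-- Reading just the first line still leaves the description alone.
firstLine :: String
firstLine = takeWhile (/= '\n') (show probe)
```

By contrast, length (show probe) has to walk the whole string and hits the error. In criterion, nf (instead of whnf) is the usual way to force the complete result.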
I'm trying to hack together a new file format writer for pandoc using LaTeX.hs as a guide. Extensive use of a $$ operator is made, but I can't find this in the Haskell syntax documentation or even references to it in other projects. Here is an example:
let align dir txt = inCmd "begin" dir $$ txt $$ inCmd "end" dir
This almost looks like a concatenation operator of some kind, yet I can't make out how this is different from other concatenation operations. What is this operator, how does it work, and where is it documented?
This is a job for Hayoo or Hoogle. It's an operator defined in Text.Pandoc.Pretty.
($$) :: Doc -> Doc -> Doc infixr 5
a $$ b puts a above b.
Basically, it will make sure that a and b are on different lines, which leads to nicer LaTeX output:
\begin{dir}
txt
\end{dir}
Pandoc defines its own pretty-printing library internally, but the operations (and the name of the type, Doc) are standard in Haskell pretty-printing libraries. Pandoc also defines other familiar ones like vcat, hsep, <+> and so on; there are many pretty-printing modules around, and most of them support these operations.
> import Text.PrettyPrint
> text "hello" <> text "world"
helloworld
> text "hello" <+> text "world"
hello world
> text "hello" $$ text "world"
hello
world
> text "hello" <+> text "world" $$ text "goodbye" <+> text "world"
hello world
goodbye world
ghci here displays 'what the document will look like', crudely speaking.
I'm fairly new to Haskell. As input I want to take a list of strings, for example ["HEY", "I'LL", "BE", "RIGHT", "BACK"], look for, let's say, "BE" "RIGHT" "BACK", and replace it with a different word, let's say "CHEESE". I have a function that works for single words, but I want it to replace a whole phrase with a single word when the list contains that phrase. Oh, and I don't want to use external libraries.
Code:
replace :: [String] -> [String]
replace [] = []
replace (h:t)
    | h == "WORD" = "REPLACED" : replace t
    | otherwise = h : replace t
What you have now could also be implemented as
replace ("WORD":rest) = "REPLACED" : replace rest
replace (x:rest) = x : replace rest
replace [] = []
And this could be extended to your example as
replace ("BE":"RIGHT":"BACK":rest) = "CHEESE" : replace rest
replace (x:rest) = x : replace rest
replace [] = []
But obviously this is not a great way to write it. We'd like a more general solution where we can pass in a phrase (or sub-list) to replace. To start with we know the following things:
Input is a list of n elements (decreases as we recurse)
Phrase is a list of m elements (stays constant as we recurse)
If m > n, we definitely don't have a match
If m <= n, we might have a match
If we don't have a match, keep the head and try with the tail
While there are more efficient algorithms out there, a simple one would be to check our lengths at each step along the list. This can be done pretty simply as
--             Phrase      Replacement   Sentence    New sentence
replaceMany :: [String] -> String     -> [String] -> [String]
replaceMany phrase new sentence = go sentence
    where
        phraseLen = length phrase
        go [] = []
        go sent@(x:xs)
            | sentLen < phraseLen = sent
            | first == phrase     = new : go rest
            | otherwise           = x : go xs
            where
                sentLen = length sent
                first   = take phraseLen sent
                rest    = drop phraseLen sent
Here we can take advantage of Haskell's laziness and define first and rest without worrying about whether it's valid to do so: if they aren't used, they never get computed. I've also used an as-pattern, sent@(x:xs). This matches a list with at least one element, binding the entire list to sent, the first element to x, and the tail of the list to xs. Next, we just check each condition. If sentLen < phraseLen, there's no chance of a match in the rest of the list, so we return the whole thing. If the first m elements equal our phrase, we replace them and keep searching; otherwise we put back the first element and keep searching.
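If you'd rather not compute lengths at every step, the same walk can be written with stripPrefix from Data.List, which is part of base and so not an external library. This is a sketch of an alternative, not a drop-in requirement; replacePhrase is an illustrative name, and the first clause is an added guard so that an empty phrase leaves the sentence unchanged instead of matching forever.

```haskell
import Data.List (stripPrefix)

-- Replace every occurrence of a phrase (a sub-list of words) with one word.
replacePhrase :: [String] -> String -> [String] -> [String]
replacePhrase [] _ sentence = sentence   -- guard: an empty phrase matches everywhere
replacePhrase phrase new sentence = go sentence
  where
    go [] = []
    go sent@(x:xs) =
      case stripPrefix phrase sent of
        Just rest -> new : go rest   -- phrase found here: emit the replacement
        Nothing   -> x : go xs       -- no match here: keep the word and move on
```

For example, replacePhrase ["BE","RIGHT","BACK"] "CHEESE" ["HEY","I'LL","BE","RIGHT","BACK"] evaluates to ["HEY","I'LL","CHEESE"].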