How to properly indent blocks using HughesPJ's pretty printing library? - haskell

I have a following programming language grammar:
data Expr = ...
data Stmt = SExpr Expr | SBlock Block | SLet Fundef | ...
data Block = Block [Stmt]
data Fundef = Fundef String [String] Block
data TopDef = TopFun Fundef
With following example syntax:
function long_function_name () = {
let g() = {
{
h()
};
3
}
}
I am trying to use HughesPJ pretty library to create a pretty printer for this language. My attempts so far look like:
instance Pretty Stmt where
pPrint = \case
SExpr e -> pPrint e
SBlock b -> pPrint b
SLet f -> text "let" <+> pPrint f
instance Pretty Block where
pPrint (Block stmts) = lbrace $+$
nest 2 (vcat (punctuate semi (map pPrint stmts))) $+$
rbrace
instance Pretty Fundef where
pPrint (Fundef name args body) = pPrint name <> parens (...) <+> text "=" <+> pPrint body
instance Prettty TopDef where
pPrint (TopFun f) = text "function" <+> pPrint f
The problem is, I want to have { in the same line as the function declaration, but it always makes the indentation of the following lines relative to the column of the bracket instead of being absolute. Should be visible in the pretty print of the example above;
function long_function_name () = {
let g() = {
{
h()
};
3
}
}
Why does it happen and how should I tackle this problem? I would like to avoid as much code duplication as possible.

You’re writing <+> before the body, so the $+$ vertical concatenation is entirely within that horizontal concatenation of the function line, hence it’s all indented. I believe the way to do what you want with pretty is to explicitly match on the block, since it’s part of the vertical layout, i.e.:
pPrint (Fundef name args (Block stmts)) = vcat
[ pPrint name <> parens (...) <+> text "=" <+> lbrace
, nest 2 (vcat (punctuate semi (map pPrint stmts)))
, rbrace
]
The more modern pretty-printing libraries like prettyprinter make this a little easier: nest (or indent, or hang) handles the indentation of lines following the first line in a vertical layout, so you can put the nest around the opening brace and body, and the closing brace outside the nesting, like so:
"prefix" <+> vcat
[ nest 4 $ vcat
[ "{"
, "body"
]
, "}"
]
⇓
prefix {
body
}
(NB. you can use OverloadedStrings like this instead of wrapping literals in text.)
But that won’t work with pretty, which seems to be designed to align the heck out of everything.
I also recommend prettyprinter for its other advantages, for example, a group function that allows you to express “put this on one line if it fits”, which is extremely helpful for making formatting robust & responsive to different rendering contexts.

Related

How do I pass a rendered persistent/esqueleto query to another query?

I'd like to use Persistent/Esqueleto to implement count estimates.
One approach recommended in this article is to define a function like this
CREATE FUNCTION count_estimate(query text) RETURNS integer AS $$
DECLARE
rec record;
rows integer;
BEGIN
FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
EXIT WHEN rows IS NOT NULL;
END LOOP;
RETURN rows;
END;
$$ LANGUAGE plpgsql VOLATILE STRICT;
and then use it like this
SELECT count_estimate('SELECT * FROM companies WHERE status = ''Active''');
In order to use the count_estimate function, I'll need (I think?) to render the query that Peristent/Equeleto generates, however when I try rendering the query with renderQuerySelect, I get something like this
SELECT "companies"."id", "companies"."name", "companies"."status"
FROM "companies"
WHERE "companies"."status" IN (?)
; [PersistText "Active"]
This of course can't be stuffed into the count_estimate, because it will syntax error on the ? placeholder. I also can't naïvely replace the ? with "Active", because it will syntax error on that first double quote.
How do I render the query in a way that my count_estimate function will accept?
I tried something like this, but it fails at runtime
getEstimate :: (Text, [PersistValue]) -> DB [Single Int]
getEstimate (query, params) = rawSql [st|
SELECT count_estimate('#{query}');
|] params
I managed to figure it out (mostly).
It's a matter of escaping the single quotes in both the query and the PersistValue parameters. I'm doing it like this at the moment, but escaping will need to be added back in otherwise I think it creates a SQL injection vulnerability. I may also need to handle the other PersistValue constructors in some specific way, but I haven't run into problems there yet.
import qualified Data.Text as T
import qualified Database.Persist as P
getEstimate :: (Text, [PersistValue]) -> DB (Maybe Int)
getEstimate (query, params) = fmap unSingle . listToMaybe <$> rawSql [st|
SELECT count_estimate('#{T.replace "'" "''" query}');
|] (map replace' params)
where literal a = PersistLiteral_ P.Unescaped ("''" <> a <> "''")
replace' = \case
PersistText t -> literal $ encodeUtf8 t
PersistDay d -> literal $ encodeUtf8 $ pack $ showGregorian d
a -> a

Words from string to list

I'm trying to write a Haskell function which would read a string and return a list with the words from the string saved in it.
Here's how I did it:
toWordList :: String -> [String]
toWordList = do
[ toLower x | x <- str ]
let var = removePunctuation(x)
return (words var)
But I get this error:
Test1.hs:13:17: error: parse error on input 'let'
|
13 | let var = removePunctuation(x)
| ^^^
I'm new to Haskell so I don't have the grasp over its syntax so thanks in advance for the help.
There's quite a few mistakes here, you should spend more time reading over some tutorials (learn you a Haskell, Real World Haskell). You're pretty close though, so I'll try to do a break-down here.
do is special - it doesn't switch Haskell into "imperative mode", it lets you write clearer code when using Monads - if you don't yet know what Monads are, stay away from do! Keywords like return also don't behave the same as in imperative languages. Try to approach Haskell with a completely fresh mind.
Also in Haskell, indentation is important - see this link for a good explanation. Essentially, you want all the lines in the same "block" to have the same indentation.
Okay, let's strip out the do and return keywords, and align the indentation. We'll also name the parameter to the function str - in your original code, you missed this bit out.
toWordList :: String -> [String]
toWordList str =
[toLower x | x <- str]
let var = removePunctuation(x)
words var
The syntax for let is let __ = __ in __. There's different notation when using do, but forget about that for now. We also don't name the result of the list comprehension, so let's do that:
toWordList str =
let lowered = [toLower x | x <- str] in
let var = removePunctuation lowered in
words var
And this works! We just needed to get some syntax right and avoid the monadic syntactic sugar of do/return.
It's possible (and easy) to make it nicer though. Those let blocks are kinda ugly, we can strip those away. We can also replace the list comprehension with map toLower, which is a bit more elegant and is equivalent to your comprehension:
toWordList str = words (removePunctuation (map toLower str))
Nice, that's down to a single line now! But all those brackets are also a bit of an eyesore, how about we use the $ function?
toWordList str = words $ removePunctuation $ map toLower str
Looking good. There's another improvement we can make, which is to convert this into point-free style, where we don't explicitly name our parameter - instead we express this function as the composition of other functions.
toWordList = words . removePunctuation . (map toLower)
And we're done! Hopefully the first two code snippets make it clearer how the Haskell syntax works, and the last few might show you some nice examples of how you can make fairly verbose code much much cleaner.

Parsing a JSON document with a Monad to look for a specific value?

I am a beginner at Haskell and I am trying to use https://hackage.haskell.org/package/json-0.9.1/docs/Text-JSON.html to parse a JSON document.
In my task, i am given a JSON document, and I would like to return the value corresponding to "index" , for example in the following case:
{
"root": [
{
"root1": 157538
},
{
"root2": [
{
"leaf21": 3
}
]
},
{
"root3": [
{
"leaf31": "somestring"
},
{
"index": "foundit"
}
]
}
]
}
To be specific: if presented with a JSON document and a path like "root" -> "root3" -> "index" exists, I would like to return "foundit", else I would like to return Nothing. Everything else in the document is arbitrary: root3,root2,root1,root may or may not exist etc.
Now I can do this using lots of case statements and patterns matches, but having read https://wiki.haskell.org/All_About_Monads, I am wondering if there is a better way using something similar to the Maybe Monad and the sheep-cloning example, however i am not sure how to write the bind function ...
[in my real case the value I seek is actually 19-deep in the document so I have lots of case statements]
Please could you suggest how to use Monads to do this ?
Yes those case statements are not necessary
Your guesses are correct - but monads are not the correct answer in this case (no pun intended1).
This is a great job for Traversals, Prisms and Lenses - and of course the great aeson-library and lens-aeson.
{-# LANGUAGE OverloadedStrings #-}
module Test where
import Control.Lens
import Data.Aeson
import Data.Aeson.Lens
import Data.Monoid ((<>))
jsonString = "{\"root\":[{\"root1\":157538}"
<> ",{\"root2\":[{\"leaf21\":3}]}"
<> ",{\"root3\":[{\"leaf31\":\"somestring\"}"
<> ",{\"index\":\"foundit\"}]}]}"
val :: Maybe Value
val = decode jsonString
indexMaybe :: Maybe Value
indexMaybe = val ^? _Just . key "root" . values
. key "root3" . values
. key "index"
So what does this do?
decode transforms a ByteString into a Maybe Value - Maybe because parsing might fail!
then the (^?) operator previews a traversal - i.e. it goes through the JSON Object and follows the json path you give;
For this you have at least to know the path to "index", if this path is unknown, you'd have to invest a bit more research into lenses/prisms/traversals or do a simple tree search on the parsed object.
Here is a spoiler for those who lack time to implement a search for "index" in a json object:
search :: Text -> Value -> Maybe Value
search txt o#(Object o') = let f y#(Just x) _ = y
f _ v = search txt v
in case o ^? key txt of
Nothing -> foldl' f Nothing o'
x -> x
search txt a#(Array a') = let f y#(Just x) _ = y
f _ v = search txt v
in foldl' f Nothing a'
search _ _ = Nothing
Alternatives
As #MarkSeeman already mentioned - a simple text search, might be much more efficient.
1: okay maybe a little bit

Why is building a Haskell String from Data.Text so slow

So I had a location class
data Location = Location {
title :: String
, description :: String
}
instance Show Location where
show l = title l ++ "\n"
++ replicate (length $ title l) '-' ++ "\n"
++ description l
Then I changed it to use Data.Text
data Location = Location {
title :: Text
, description :: Text
}
instance Show Location where
show l = T.unpack $
title l <> "\n"
<> T.replicate (T.length $ title l) "-" <> "\n"
<> description l
Using criterion, I benchmarked the time taken by show on both the String and Data.Text implementations:
benchmarks = [ bench "show" (whnf show l) ]
where l = Location {
title="My Title"
, description = "This is the description."
}
The String implementation took 34ns, the Data.Text implementation was almost six times slower, at 170ns
How do I get Data.Text working as fast as String?
Edit: Silly mistakes
I'm not sure how this happened, but I cannot replicate the original speed difference: now for String and Text I get 28ns and 24ns respectively
For the more aggressive bench "length.show" (whnf (length . show) l) benchmark, for String and Text, I get 467ns and 3954ns respectively.
If I use a very basic lazy builder, without the replicated dashes
import qualified Data.Text.Lazy.Builder as Bldr
instance Show Location where
show l = show $
Bldr.fromText (title l) <> Bldr.singleton '\n'
-- <> Bldr.fromText (T.replicate (T.length $ title l) "-") <> Bldr.singleton '\n'
<> Bldr.fromText (description l)
and try the original, ordinary show benchmark, I get 19ns. Now this is buggy, as using show to convert a builder to a String will escape newlines. If I replace it with LT.unpack $ Bldr.toLazyText, where LT is a qualified import of Data.Text.Lazy, then I get 192ns.
I'm testing this on a Mac laptop, and I suspect my timings are getting horribly corrupted by machine noise. Thanks for the guidance.
You can't make it as fast, but you can speed it up some.
Appending
Text is represented as an array. This makes <> rather slow, because a new array has to be allocated and each Text copied into it. You can fix this by converting each piece to a String first, and then concatenating them. I imagine Text probably also offers an efficient way to concatenate multiple texts at once (as a commenter mentions, you can use a lazy builder) but for this purpose that will be slower. Another good option might be the lazy version of Text, which probably supports efficient concatenation.
Sharing
In your String-based implementation, the description field doesn't have to be copied at all. It's just shared between the Location and the result of showing that Location. There's no way to accomplish this with the Text version.
In the String case you are not fully evaluating all of the string operations - (++) and replicate.
If you change your benchmark to:
benchmarks = [ bench "show" (whnf (length.show) l) ]
you'll see that the String case takes around 520 ns - approx 10 times longer.

Text.PrettyPrint a better way to set indentation

I have a pretty-printer like that:
somefun = text "woo" $+$ nest 4 (text "nested text") $+$ text "text without indent"
fun = text "------" $+$ somefun
What I want from it is to print this:
------ woo
nested text
text without indent
But it prints:
------
woo
nested text
text without indent
I can understand why it prints like this, but I'm having trouble to do what I want. One solution I find was this:
somefun p = p <+> text "woo" $+$ nest 4 (text "nested text") $+$ text "text without indent"
fun = somefun (text "------")
That is, I'm passing the Doc which I want my next Doc's indentation to be based on. This solves my problem but I'm looking for better ways to do this.
Your pass-the-Doc-as-an-argument solution is good. Once you've combined into a single Doc, you can't split it apart again, so here are two ways that use lists instead:
Alternative 1
Another way of doing this is to use [Doc] instead of Doc for your subsequent text, if you want to treat the lines differently, then recombine using something like
(<+$) :: Doc -> [Doc] -> Doc
doc <+$ [] = doc
doc <+$ (d:ds) = (doc <+> d) $+$ foldr ($+$) empty ds
somefun :: [Doc]
somefun = [text "woo",
nest 4 (text "nested text"),
text "text without indent"]
fun :: Doc
fun = text "------" <+$ somefun
This gives you
*Main> fun
------ woo
nested text
text without indent
Alternative 2
You could rewrite this solution another way keeping lists, if you like to keep indenting the top line:
(<+:) :: Doc -> [Doc] -> [Doc]
doc <+: [] = [doc]
doc <+: (d:ds) = (doc <+> d) : ds -- pop doc in front.
We'll need to put those together into a single Doc at some stage:
vsep = foldr ($+$) empty
Now you can use : to put a line above, and <+: to push a bit in front of the top line:
start = [text "text without indent"]
next = nest 4 (text "nested text") : start
more = text "woo" : next
fun = text "------" <+: more
extra = text "-- extra! --" <+: fun
Test this with
*Main> vsep fun
------ woo
nested text
text without indent
*Main> vsep extra
-- extra! -- ------ woo
nested text
text without indent
The main issue is that if you use [Doc] instead of Doc it's almost as if you're not using the pretty-print library! It doesn't matter, though, if it's what you need.

Resources