Using pandoc as a library to make a PDF - haskell

Part of a project I am working on involves creating a PDF, using Pandoc. I have the part of the program which makes a PDF. To figure out how to do this, I am trying to modify fuel.hs from JGM BayHack 2014.
However, I am having difficulty. I have the following function:
export :: (MonadIO m) => Pandoc -> m (Either BL.ByteString BL.ByteString)
export = liftIO . makePDF "xelatex" writeLaTeX def { writerStandalone = True }
In the body of my modified fuel.hs,
pdfbytes <- export letter
print pdfbytes
I get the following output:
$ stack runghc fuel.hs
Run from outside a project, using implicit global project config
Using resolver: lts-3.7 from implicit global project's config file: /home/stevejb/.stack/global/stack.yaml
Left "! Emergency stop.\n<*> /tmp/tex2pdf.8283/input.tex\n \nNo pages of output.\nTranscript written on /tmp/tex2pdf.8283/input.log.\n"
"Fail"
However, the log file that is being referenced does not exist. I am not sure how to debug this. I have xelatex installed.

With great help from #haskell IRC, I was able to get it working. The key was to add my own LaTeX template. Thus, one can use the following:
export :: (MonadIO m) => String -> Pandoc -> m (Either BL.ByteString BL.ByteString)
export tmpl pdoc = liftIO $ makePDF "xelatex" writeLaTeX (def { writerStandalone = True, writerTemplate = tmpl}) pdoc
getLetter = do
json <- BL.readFile "cng_fuel_chicago.json"
let letter = case decode json of
Just stations -> createLetter [s | s <- stations,
"Voyager" `elem` cardsAccepted s]
Nothing -> error "Could not decode JSON"
return $ letter
main :: IO ()
main = do
letter <- getLetter
temp <- readFile "template.tex"
let str_should_have_something = writeLaTeX (def {writerStandalone = True, writerTemplate = temp}) letter
print str_should_have_something
mybytes <- export temp letter
case mybytes of Right b -> BL.writeFile "mypdf.pdf" b
Left _ -> putStrLn "Export error"
To get a template, you can use Pandoc in standalone mode from the shell:
pandoc -D latex > template.tex
Also, there may be an issue with Pandoc installed using stack, using cabal, and using the system package manger, in terms of finding the default templates. I am not sure exactly how all of this interacts.
Fully contained gist here.

Related

Errorbundles after parsing with megaparsec

I currently have a working parser in megaparsec, where I build an AST for my program. I now want to do some weeding operations on my AST, while being able to use the same kind of pretty errors as the parser. While this stage is after parsing, I'm wondering if there are general practices for megaparsec in doing so. Is there a way for me to extract every line and comment (used in the bundle) and add it to each item in my AST? Is there any other way that people tackle this problem?
Apologies in advance if this sounds open ended, but I'm mainly wondering is there are some better ideas than getting the line numbers and creating bundles myself. I'm still new to haskell so I haven't been able to navigate properly through all the source code.
This was answered by the megaparsec developer here.
To summarize, parsers have a getOffset function that returns the current char index. You can use that along with an initial PosState to create an error bundle which you can later pretty print.
I have a sample project within the github thread, and pasted again here:
module TestParser where
import Data.List.NonEmpty as NonEmpty
import qualified Data.Maybe as Maybe
import qualified Data.Set as Set
import Data.Void
import Parser
import Text.Megaparsec
data Sample
= Test Int
String
| TestBlock [Sample]
| TestBlank
deriving (Show, Eq)
sampleParser :: Parser Sample
sampleParser = do
l <- many testParser
return $ f l
where
f [] = TestBlank
f [s] = s
f p = TestBlock p
testParser :: Parser Sample
testParser = do
offset <- getOffset
test <- symbol "test"
return $ Test offset test
fullTestParser :: Parser Sample
fullTestParser = baseParser testParser
testParse :: String -> Maybe (ParseErrorBundle String Void)
testParse input =
case parse (baseParser sampleParser) "" input of
Left e -> Just e
Right x -> do
(offset, msg) <- testVerify x
let initialState =
PosState
{ pstateInput = input
, pstateOffset = 0
, pstateSourcePos = initialPos ""
, pstateTabWidth = defaultTabWidth
, pstateLinePrefix = ""
}
let errorBundle =
ParseErrorBundle
{ bundleErrors = NonEmpty.fromList [TrivialError offset Nothing Set.empty]
-- ^ A collection of 'ParseError's that is sorted by parse error offsets
, bundlePosState = initialState
-- ^ State that is used for line\/column calculation
}
return errorBundle
-- Sample verify; throw an error on the second test key
testVerify :: Sample -> Maybe (Int, String)
testVerify tree =
case tree of
TestBlock [_, Test a _, _] -> Just (a, "Bad")
_ -> Nothing
testMain :: IO ()
testMain = do
testExample "test test test"
putStrLn "Done"
testExample :: String -> IO ()
testExample input =
case testParse input of
Just error -> putStrLn (errorBundlePretty error)
Nothing -> putStrLn "pass"
Some parts are from other files, but the important parts are in the code.

How do I create, and distinguish, global options using 'optparse-applicative'?

In my Haskell executable, created using optparse-applicative, I would like to have a global option for --version alongside the global --help option that is available from all subcommands. However the example provided (see below) for adding a --version option to to a CLI with subcommands results in a --version option that is inconsistently available
$ cli create --version
Invalid option `--version'
Usage: cli create NAME
Create a thing
$ cli delete --version
0.0
and never shows up in help for subcommands
$ cli create -h
Usage: cli create NAME
Create a thing
Available options:
NAME Name of the thing to create
-h,--help Show this help text
$ cli delete -h
Usage: cli delete
Delete the thing
Available options:
-h,--help Show this help text
The behavior I would like is for --version to be available globally and to all subcommands:
$ cli create -h
Usage: cli create NAME
Create a thing
Available options:
NAME Name of the thing to create
--version Show version
-h,--help Show this help text
$ cli delete -h
Usage: cli delete
Delete the thing
Available options:
--version Show version
-h,--help Show this help text
$ cli create --version
0.0
$ cli delete --version
0.0
It's not clear from the documentation how to achieve this.
In fact, I'd ideally like to be able to clearly group options in the help output:
$ cli create -h
Usage: cli create NAME
Create a thing
Arguments:
NAME Name of the thing to create
Global options:
--version Show version
-h,--help Show this help text
$ cli delete -h
Usage: cli delete
Delete the thing
Global options:
--version Show version
-h,--help Show this help text
Is there a way to achieve this using optparse-applicative?
{-#LANGUAGE ScopedTypeVariables#-}
import Data.Semigroup ((<>))
import Options.Applicative
data Opts = Opts
{ optGlobalFlag :: !Bool
, optCommand :: !Command
}
data Command
= Create String
| Delete
main :: IO ()
main = do
(opts :: Opts) <- execParser optsParser
case optCommand opts of
Create name -> putStrLn ("Created the thing named " ++ name)
Delete -> putStrLn "Deleted the thing!"
putStrLn ("global flag: " ++ show (optGlobalFlag opts))
where
optsParser :: ParserInfo Opts
optsParser =
info
(helper <*> versionOption <*> programOptions)
(fullDesc <> progDesc "optparse subcommands example" <>
header
"optparse-sub-example - a small example program for optparse-applicative with subcommands")
versionOption :: Parser (a -> a)
versionOption = infoOption "0.0" (long "version" <> help "Show version")
programOptions :: Parser Opts
programOptions =
Opts <$> switch (long "global-flag" <> help "Set a global flag") <*>
hsubparser (createCommand <> deleteCommand)
createCommand :: Mod CommandFields Command
createCommand =
command
"create"
(info createOptions (progDesc "Create a thing"))
createOptions :: Parser Command
createOptions =
Create <$>
strArgument (metavar "NAME" <> help "Name of the thing to create")
deleteCommand :: Mod CommandFields Command
deleteCommand =
command
"delete"
(info (pure Delete) (progDesc "Delete the thing"))
As far as I know, this (in particular, the categorized help text) isn't really easy to do with optparse-applicative, since it isn't quite the pattern that they were planning for with global arguments. If you are okay with using program --global-options command --local-options (which is a fairly standard pattern) instead of program command --global-and-local-options, then you can use the approach shown in the linked example:
$ ./optparse-sub-example
optparse-sub-example - a small example program for optparse-applicative with
subcommands
Usage: optparse [--version] [--global-flag] COMMAND
optparse subcommands example
Available options:
-h,--help Show this help text
--version Show version
--global-flag Set a global flag
Available commands:
create Create a thing
delete Delete the thing
$ ./optparse-sub-example --version create
0.0
$ ./optparse-sub-example --version delete
0.0
$ ./optparse-sub-example --global-flag create HI
Created the thing named HI
global flag: True
$ ./optparse-sub-example --global-flag delete
Deleted the thing!
global flag: True
(Note: I would advise going with this approach, since "global options before the command" is fairly standard).
If you also want the global options to be available in every subcommand, you will have a few issues.
As far as I know, there is no way to affect the help text output in order to group them separately inside the individual command help texts.
You will need some custom subparser-like function that adds your global options & merges them with any global options before the command.
For #2, one way to restructure the example to support this might be something along these lines:
To start with, standard boilerplate and imports:
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TupleSections #-}
{-# LANGUAGE ApplicativeDo #-}
import Data.Monoid
import Data.Semigroup ((<>))
import Options.Applicative
import Options.Applicative.Types
Opts are explicitly split into optGlobals and optCommand, making it easy to deal with all of the global options at once if more are available:
data Opts = Opts
{ optGlobals :: !GlobalOpts
, optCommand :: !Command
}
data GlobalOpts = GlobalOpts { optGlobalFlag :: Bool }
GlobalOpts should be a Semigroup and a Monoid, since we need to merge options seen at various different points (before the command, after the command, etc.). It should also be possible, with suitable alterations to mysubparser below, to require global options to be given only after commands and omit this requirement.
instance Semigroup GlobalOpts where
-- Code for merging option parser results from the multiple parsers run
-- at various different places. Note that this may be run with the default
-- values returned by one parser (from a location with no options present)
-- and the true option values from another, so it may be important
-- to distinguish between "the default value" and "no option" (since "no
-- option" shouldn't override another value provided earlier, while
-- "user-supplied value that happens to match the default" probably should).
--
-- In this case this doesn't matter, since the flag being provided anywhere
-- should be enough for it to be considered true.
(GlobalOpts f1) <> (GlobalOpts f2) = GlobalOpts (f1 || f2)
instance Monoid GlobalOpts where
-- Default values for the various options. These should probably match the
-- defaults used in the option declarations.
mempty = GlobalOpts False
As before, a Command type to represent the different possible commands:
data Command
= Create String
| Delete
The real magic: mysubparser wraps hsubparser to add global options and deal with merging them. It takes the parser for global options as an argument:
mysubparser :: forall a b. Monoid a
=> Parser a
-> Mod CommandFields b
-> Parser (a, b)
mysubparser globals cmds = do
To start with, it runs the global parser (to catch any globals given before a command):
g1 <- globals
It then uses hsubparser to get a command parser, and modifies it to also parse global options:
(g2, r) <- addGlobals $ hsubparser cmds
Finally, it merges the two global option sets, and returns the parsed global options and the command parser result:
pure (g1 <> g2, r)
where
The addGlobals helper function:
addGlobals :: forall c. Parser c -> Parser (a, c)
If NilP was given, we just use mempty to get the default option set:
addGlobals (NilP x) = NilP $ (mempty,) <$> x
The important case: if we have an OptP around an Option that uses a CommandReader, the globals parser is added to every command parser:
addGlobals (OptP (Option (CmdReader n cs g) ps)) =
OptP (Option (CmdReader n cs $ fmap go . g) ps)
where go pi = pi { infoParser = (,) <$> globals <*> infoParser pi }
In all other cases, either just use the default option set, or merge option sets from recursive Parsers as appropriate:
addGlobals (OptP o) = OptP ((mempty,) <$> o)
addGlobals (AltP p1 p2) = AltP (addGlobals p1) (addGlobals p2)
addGlobals (MultP p1 p2) =
MultP ((\(g2, f) -> \(g1, x) -> (g1 <> g2, f x)) <$> addGlobals p1)
(addGlobals p2)
addGlobals (BindP p k) = BindP (addGlobals p) $ \(g1, x) ->
BindP (addGlobals $ k x) $ \(g2, x') ->
pure (g1 <> g2, x')
Modifications to the main function are fairly minimal, and mostly related to using the new GlobalOpts. Once a parser for GlobalOpts is available, passing it to mysubparser is quite easy:
main :: IO ()
main = do
(opts :: Opts) <- execParser optsParser
case optCommand opts of
Create name -> putStrLn ("Created the thing named " ++ name)
Delete -> putStrLn "Deleted the thing!"
putStrLn ("global flag: " ++ show (optGlobalFlag (optGlobals opts)))
where
optsParser :: ParserInfo Opts
optsParser =
info
(helper <*> programOptions)
(fullDesc <> progDesc "optparse subcommands example" <>
header
"optparse-sub-example - a small example program for optparse-applicative with subcommands")
versionOption :: Parser (a -> a)
versionOption = infoOption "0.0" (long "version" <> help "Show version")
globalOpts :: Parser GlobalOpts
globalOpts = versionOption <*>
(GlobalOpts <$> switch (long "global-flag" <> help "Set a global flag"))
programOptions :: Parser Opts
programOptions =
uncurry Opts <$> mysubparser globalOpts (createCommand <> deleteCommand)
createCommand :: Mod CommandFields Command
createCommand =
command
"create"
(info createOptions (progDesc "Create a thing"))
createOptions :: Parser Command
createOptions =
Create <$>
strArgument (metavar "NAME" <> help "Name of the thing to create")
deleteCommand :: Mod CommandFields Command
deleteCommand =
command
"delete"
(info (pure Delete) (progDesc "Delete the thing"))
Notice that mysubparser should be a quite generic/reusable component.
This exhibits behavior closer to what you wanted:
$ ./optparse-sub-example create --global-flag HI
Created the thing named HI
global flag: True
$ ./optparse-sub-example --global-flag create HI
Created the thing named HI
global flag: True
$ ./optparse-sub-example --global-flag delete
Deleted the thing!
global flag: True
$ ./optparse-sub-example delete --global-flag
Deleted the thing!
global flag: True
$ ./optparse-sub-example delete
Deleted the thing!
global flag: False
$ ./optparse-sub-example delete --version
0.0
$ ./optparse-sub-example create --version
0.0

Haskells stack or interact are adding characters

I'm doing my first steps using Haskell. I created a project using stack and changed the Main.hs into
module Main where
my_fkt :: String -> String
my_fkt input = show (length input)
main :: IO ()
main = interact my_fkt
I build the project via stack build, run it via stack exec firststeps-exe, enter "abcd" and finish input via <CTRL>-D. In the console I now see
abcd4%
The %is inverted. If I use a text file containing the "abcd" (without line break) and execute more sample.txt | stack exec firststeps-exe I see
abcd5%
Why do I get one additional character in the second case and what is the inverted percentage sign?
That is because the definition of interact uses putStr instead of putStrLn.
You can take a look at the source code here.
interact :: (String -> String) -> IO ()
interact f = do s <- getContents
putStr (f s)
To remedy your issue I would go on and create a similar function
interact' :: (String -> String) -> IO ()
interact' f = do s <- getContents
putStrLn (f s)
or if you like to mix it up and write a bit terser code
interact' f = putStrLn =<< (f <$> getContents)
I don't know what the % is or why it is showing up, my guess would be that it is the escaped CTRL-D.
With regards to your second question about the additional "non-existing" character, I am also not sure, but here my guess would be that this is the \EOF.
Btw. you can always check using more testinput | wc -c it should yield the same result as your haskell program.

Haskell map does not iterate over the whole list

I'm trying to learn the basics of Haskell while developing a filter for Pandoc to recursively include additional markdown files.
Based on the scripting guide I was able to create a somewhat working filter. This looks for CodeBlocks with the include class and tries to include the ASTs of the referenced files.
```include
section-1.md
section-2.md
#pleasedontincludeme.md
```
The whole filter and the input sources could be found in the following repository: steindani/pandoc-include (or see below)
One could run pandoc with the filter and see the output in markdown format using the following command: pandoc -t json input.md | runhaskell IncludeFilter.hs | pandoc --from json --to markdown
I've noticed that the map function (at line 38) — although gets the list of files to include — only calls the function for the first element. And this is not the only strange behavior. The included file could also have an include block that is processed and the referenced file is included; but it won't go deeper, the include blocks of the last file are ignored.
Why does not the map function iterate over the whole list? Why does it stop after 2 levels of hierarchy?
Please note that I'm just starting to learn Haskell, I'm sure I made mistakes, but I'm happy to learn.
Thank you
Full source code:
module Text.Pandoc.Include where
import Control.Monad
import Data.List.Split
import Text.Pandoc.JSON
import Text.Pandoc
import Text.Pandoc.Error
stripPandoc :: Either PandocError Pandoc -> [Block]
stripPandoc p =
case p of
Left _ -> [Null]
Right (Pandoc _ blocks) -> blocks
ioReadMarkdown :: String -> IO(Either PandocError Pandoc)
ioReadMarkdown content = return (readMarkdown def content)
getContent :: String -> IO [Block]
getContent file = do
c <- readFile file
p <- ioReadMarkdown c
return (stripPandoc p)
doInclude :: Block -> IO [Block]
doInclude cb#(CodeBlock (_, classes, _) list) =
if "include" `elem` classes
then do
files <- return $ wordsBy (=='\n') list
contents <- return $ map getContent files
result <- return $ msum contents
result
else
return [cb]
doInclude x = return [x]
main :: IO ()
main = toJSONFilter doInclude
I can spot the following error in your doInclude function:
doInclude :: Block -> IO [Block]
doInclude cb#(CodeBlock (_, classes, _) list) =
if "include" `elem` classes
then do
let files = wordsBy (=='\n') list
let contents = map getContent files
let result = msum contents -- HERE
result
else
return [cb]
doInclude x = return [x]
Since the type of the result of this whole function is IO [Block], we can work backward:
result has type IO [Block]
contents has type [IO [Block]]
msum is being used with type [IO [Block]] -> IO [Block]
And that third part is the problem—somehow in your program, there is a non-standard MonadPlus instance being loaded for IO, and I bet that what it does on msum contents is this:
Execute the first action
If that succeeds, produce the same result as that and discard the rest of the list. (This is the cause of the behavior you observe.)
If it fails with an exception, try the rest of the list.
This isn't a standard MonadPlus instance so it's coming from one of the libraries that you're importing. I don't know which.
A general recommendation here would be:
Split your program into smaller functions
Write type signatures for those functions
Because the problem here seems to be that msum is being used with a different type than the one you expect. Normally this would produce a type error, but here you got unlucky and it interacted with a strange type class instance in some library.
From the comments, your intent with msum contents was to create an IO action that executes all of the subactions in sequence, and collects their result as a list. Well, the MonadPlus class isn't normally defined for IO, and when it is it does something else. So the correct function to use here is sequence:
-- Simplified version, the real one is more general:
sequence :: Monad m => [m a] -> m [a]
sequence [] = return []
sequence (ma:mas) = do
a <- ma
as <- mas
return (a:as)
That gets you from [IO [Block]] to IO [[Block]]. To eliminate the double nested lists then you just use fmap to apply concat inside IO.

simple rss downloader in haskell

Yesterday i tried to write a simple rss downloader in Haskell wtih hte help of the Network.HTTP and Feed libraries. I want to download the link from the rss item and name the downloaded file after the title of the item.
Here is my short code:
import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)
getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)
downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
file <- get link
B.writeFile title file
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
I reached a state where i got a list which contains tuples, which contains name and the corresponding link. And i have a downloadUri function which properly downloads the given link to a file which has the name of the rss item title.
I already tried to modify downloadUri to work on (Maybe String,Maybe String) with fmap- ing on get and writeFile but failed with it horribly.
How can i apply my downloadUri function to the result of the getTuples function. I want to implement the following main function
main :: IO ()
main = some magic incantation donwloadUri more incantation getTuples
The character encoding of the result of getItemTitle broken, it puts code points in the places of the accented characters. The feed is utf8 encoded, and i thought that all haskell string manipulation functions are defaulted to utf8. How can i fix this?
Edit:
Thanks for you help, i implemented successfully my main and helper functions. Here comes the code:
downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
item <- get link
B.writeFile title item
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"
main = getTuples >>= downloadAllItems
The character encoding issue has been partially solved, i put decodeString before the feed parsing, so the files get named properly. But if i want to print it out, the issue still happens. Minimal working example:
main = getTuples
It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to do pattern matching on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:
downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri (Just title, Just link) = do
file <- get link
B.writeFile title file
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = error "One of my parameters was Nothing".
Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:
downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
Now the only Maybe you need to work with is the outer one, applied to the array of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:
downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...
As for your encoding issue, my guesses are:
You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
You are reading the information correctly, but it gets messed up when you output it.
In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.
Your main could be something like the shown below. There may be some more concise way to compose these two operations though:
main :: IO ()
main = getTuples >>= process
where
process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst
process Nothing = return ()
download (Just t, Just l) = downloadUri (t,l)
download _ = return ()

Resources