String Formatting columns in Haskell without Text.Printf - string

I am new to Haskell. I am at the last part of a school project. I have to take tuples and print them to an outfile and separate them by a tab column. So (709,4226408), (12965,4226412) and (5,4226016) should have and output of
709 4226408
12965 4226412
5 4226016
What I have been trying to do is this:
genOutput :: (Int, Int) -> String
genOutput (a,b) = (show a) ++ "\t" ++ (show b)
And this gives outputs like:
"709\t4226408"
"12965\t4226412"
"5\t4226016"
There are 3 things wrong with this. 1) Quotes still appear in the output. 2) The \t tab does not actually become a tab space. .Whenever I try to make an actual tab for the "" it just comes out as a " " space. 3) They are not aligned into columns like the above example. I know Text.Printf exists but we are not allowed to import anything other than:
import System.IO
import Data.List
import System.Environment

that's the output you get from GHCi I guess? Try to use putStrLn instead:
Prelude> genOutput (1,42)
"1\t42"
Prelude> putStrLn $ genOutput (1,42)
1 42
Why is that?
If you tell GHCi to evaluate an expression it will do so and (more or less) output it using show - show is designed to work with read and will usually output a value as if you would input it directly into Haskell. For a String that will include escape sequences and the "s
Now using putStrLn it will take the string and print it to stdout as you would expect.
Using print
Another reason could be that you use print to output your value - print is show + putStrLn so it'll show the values first re-introducing the escapes (as GHCi would) - so if you use print change it to putStrLn if you are using Strings

Related

Calculate simple math expression pass as a command line argument

I'm a newbie in Haskell and I'm lost. I was trying to parse a math expression, but really don't know how Haskell programming works well. So what I'm trying to write is a program to resolve a simple math expression. I'm looking for ideas on how I could resolve by giving arguments.
The command line could look like : ./math "3 + 2" or ./math "5 * 8"
My code looks like this:
import System.Environment (getArgs)
import Text.Printf
main :: IO ()
main = do
args <- getArgs
printf "%.2f" args[1] + args[2]
Haskell has no array[index] syntax. It does have list!!index syntax (which isn't really special syntax at all, !! is just an infix-function defined in the prelude). Note that Haskell indices are 0-based and unlike in Bash, the zeroth argument is not the command name itself, so you probably want indices 0 and 1.
Also, in Haskell function application binds more tightly than any operators. So, if you were to write
printf "%.2f" args!!0 + args!!1
it would parse as ((printf "%.2f" args)!!0) + (args!!1), which is obviously not right. You need to make explicit what precedence you want:
printf "%.2f" (args!!0 + args!!1)
or as we like to do it, with $ instead of parens:
printf "%.2f" $ args!!0 + args!!1
That's still not right, because the arguments come in as strings, but the addition should be performed on numbers. For this, you need to read the numbers; I'd suggest you do that separately:
import Text.Read (readMaybe)
main = do
args <- getArgs
let a, b :: Double
Just a = readMaybe $ args!!0
Just b = readMaybe $ args!!1
printf "%.2f" $ a + b
$ runhaskell Argsmath.hs 3 2
5.00
Of course this will not allow you to do stuff like ./math "5 * 8" because you have no means of parsing the *. For that, something read-based would be awkward; I suggest you check out parser combinator libraries, there are plenty of tutorials around; this one seems to be nice and simple.

Dealing with tabular data in Haskell

This is an excerpt of a file.csv file with some tabular data
John,23,Paris
Alban,28,London
Klaus,27,Berlin
Hans,29,Stockholm
Julian,25,Paris
Jonathan,26,Lyon
Albert,27,London
The column headers for this file would be
firstName, age, city
This file is loaded in ghci like this
𝛌> :m + Data.List Data.Function Data.List.Split
𝛌> contents <- readFile "file.csv"
𝛌> let t = map (splitOn ",") $ lines contents
𝛌> mapM print $ take 3 t
["John","23","Paris"]
["Alban","28","London"]
["Klaus","27","Berlin"]
[(),(),()]
Now, if I want to add a birthYear column to those 3 columns, I can do
𝛌> let getYear str = show $ 2016 - read str
𝛌> let withYear = map (\(x:xs) -> x : xs ++ [getYear (head xs)]) t
𝛌> mapM print $ take 3 withYear
["John","23","Paris","France","1993"]
["Alban","28","London","UK","1988"]
["Klaus","27","Berlin","Germany","1989"]
[(),(),()]
This works well but what bothers me is that the getYear function has type String -> String and as such, type checking is pretty much useless here.
I could easily convert t into a list of tuples like ("John", 23, "Paris") but what about if I have not 3, but 300 features (which is not that uncommon in machine learning problems)?
What would be the best way to deal with different column types? Using tuples? Using maps?
In case of a big number of columns, is there a way to make Haskell infer the column's types? For instance, it would detect that column 2 in the above example is Int, and the others are strings?
Concerning column headers, would there be a way that one could simply access the columns by label instead of by index, so that getYear could be something like 2016 - column['age'] (Python example)?
I'm used to Python's Pandas DataFrames which perform all this stuff automagically, but Haskell looks like it could perform a ton of it natively. Not sure how to do this however as of now.

Why is building a Haskell String from Data.Text so slow

So I had a location class
data Location = Location {
title :: String
, description :: String
}
instance Show Location where
show l = title l ++ "\n"
++ replicate (length $ title l) '-' ++ "\n"
++ description l
Then I changed it to use Data.Text
data Location = Location {
title :: Text
, description :: Text
}
instance Show Location where
show l = T.unpack $
title l <> "\n"
<> T.replicate (T.length $ title l) "-" <> "\n"
<> description l
Using criterion, I benchmarked the time taken by show on both the String and Data.Text implementations:
benchmarks = [ bench "show" (whnf show l) ]
where l = Location {
title="My Title"
, description = "This is the description."
}
The String implementation took 34ns, the Data.Text implementation was almost six times slower, at 170ns
How do I get Data.Text working as fast as String?
Edit: Silly mistakes
I'm not sure how this happened, but I cannot replicate the original speed difference: now for String and Text I get 28ns and 24ns respectively
For the more aggressive bench "length.show" (whnf (length . show) l) benchmark, for String and Text, I get 467ns and 3954ns respectively.
If I use a very basic lazy builder, without the replicated dashes
import qualified Data.Text.Lazy.Builder as Bldr
instance Show Location where
show l = show $
Bldr.fromText (title l) <> Bldr.singleton '\n'
-- <> Bldr.fromText (T.replicate (T.length $ title l) "-") <> Bldr.singleton '\n'
<> Bldr.fromText (description l)
and try the original, ordinary show benchmark, I get 19ns. Now this is buggy, as using show to convert a builder to a String will escape newlines. If I replace it with LT.unpack $ Bldr.toLazyText, where LT is a qualified import of Data.Text.Lazy, then I get 192ns.
I'm testing this on a Mac laptop, and I suspect my timings are getting horribly corrupted by machine noise. Thanks for the guidance.
You can't make it as fast, but you can speed it up some.
Appending
Text is represented as an array. This makes <> rather slow, because a new array has to be allocated and each Text copied into it. You can fix this by converting each piece to a String first, and then concatenating them. I imagine Text probably also offers an efficient way to concatenate multiple texts at once (as a commenter mentions, you can use a lazy builder) but for this purpose that will be slower. Another good option might be the lazy version of Text, which probably supports efficient concatenation.
Sharing
In your String-based implementation, the description field doesn't have to be copied at all. It's just shared between the Location and the result of showing that Location. There's no way to accomplish this with the Text version.
In the String case you are not fully evaluating all of the string operations - (++) and replicate.
If you change your benchmark to:
benchmarks = [ bench "show" (whnf (length.show) l) ]
you'll see that the String case takes around 520 ns - approx 10 times longer.

Is there a (Template) Haskell library that would allow me to print/dump a few local bindings with their respective names?

For instance:
let x = 1 in putStrLn [dump|x, x+1|]
would print something like
x=1, (x+1)=2
And even if there isn't anything like this currently, would it be possible to write something similar?
TL;DR There is this package which contains a complete solution.
install it via cabal install dump
and/or
read the source code
Example usage:
{-# LANGUAGE QuasiQuotes #-}
import Debug.Dump
main = print [d|a, a+1, map (+a) [1..3]|]
where a = 2
which prints:
(a) = 2 (a+1) = 3 (map (+a) [1..3]) = [3,4,5]
by turnint this String
"a, a+1, map (+a) [1..3]"
into this expression
( "(a) = " ++ show (a) ++ "\t " ++
"(a+1) = " ++ show (a + 1) ++ "\t " ++
"(map (+a) [1..3]) = " ++ show (map (+ a) [1 .. 3])
)
Background
Basically, I found that there are two ways to solve this problem:
Exp -> String The bottleneck here is pretty-printing haskell source code from Exp and cumbersome syntax upon usage.
String -> Exp The bottleneck here is parsing haskell to Exp.
Exp -> String
I started out with what #kqr put together, and tried to write a parser to turn this
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
into this
["not x = False","x = True","x == True = True"]
But after trying for a day, my parsec-debugging-skills have proven insufficient to date, so instead I went with a simple regular expression:
simplify :: String -> String
simplify s = subRegex (mkRegex "_[0-9]+|([a-zA-Z]+\\.)+") s ""
For most cases, the output is greatly improved.
However, I suspect this to likely mistakenly remove things it shouldn't.
For example:
$(dump [|(elem 'a' "a.b.c", True)|])
Would likely return:
["elem 'a' \"c\" = True","True = True"]
But this could be solved with proper parsing.
Here is the version that works with the regex-aided simplification: https://github.com/Wizek/kqr-stackoverflow/blob/master/Th.hs
Here is a list of downsides / unresolved issues I've found with the Exp -> String solution:
As far as I know, not using Quasi Quotation requires cumbersome syntax upon usage, like: $(d [|(a, b)|]) -- as opposed to the more succinct [d|a, b|]. If you know a way to simplify this, please do tell!
As far as I know, [||] needs to contain fully valid Haskell, which pretty much necessitates the use of a tuple inside further exacerbating the syntactic situation. There is some upside to this too, however: at least we don't need to scratch our had where to split the expressions since GHC does that for us.
For some reason, the tuple only seemed to accept Booleans. Weird, I suspect this should be possible to fix somehow.
Pretty pretty-printing Exp is not very straight-forward. A more complete solution does require a parser after all.
Printing an AST scrubs the original formatting for a more uniform looks. I hoped to preserve the expressions letter-by-letter in the output.
The deal-breaker was the syntactic over-head. I knew I could get to a simpler solution like [d|a, a+1|] because I have seen that API provided in other packages. I was trying to remember where I saw that syntax. What is the name...?
String -> Exp
Quasi Quotation is the name, I remember!
I remembered seeing packages with heredocs and interpolated strings, like:
string = [qq|The quick {"brown"} $f {"jumps " ++ o} the $num ...|]
where f = "fox"; o = "over"; num = 3
Which, as far as I knew, during compile-time, turns into
string = "The quick " ++ "brown" ++ " " ++ $f ++ "jumps " ++ o ++ " the" ++ show num ++ " ..."
where f = "fox"; o = "over"; num = 3
And I thought to myself: if they can do it, I should be able to do it too!
A bit of digging in their source code revealed the QuasiQuoter type.
data QuasiQuoter = QuasiQuoter {quoteExp :: String -> Q Exp}
Bingo, this is what I want! Give me the source code as string! Ideally, I wouldn't mind returning string either, but maybe this will work. At this point I still know quite little about Q Exp.
After all, in theory, I would just need to split the string on commas, map over it, duplicate the elements so that first part stays string and the second part becomes Haskell source code, which is passed to show.
Turning this:
[d|a+1|]
into this:
"a+1" ++ " = " ++ show (a+1)
Sounds easy, right?
Well, it turns out that even though GHC most obviously is capable to parse haskell source code, it doesn't expose that function. Or not in any way we know of.
I find it strange that we need a third-party package (which thankfully there is at least one called haskell-src-meta) to parse haskell source code for meta programming. Looks to me such an obvious duplication of logic, and potential source of mismatch -- resulting in bugs.
Reluctantly, I started looking into it. After all, if it is good enough for the interpolated-string folks (those packaged did rely on haskell-src-meta) then maybe it will work okay for me too for the time being.
And alas, it does contain the desired function:
Language.Haskell.Meta.Parse.parseExp :: String -> Either String Exp
Language.Haskell.Meta.Parse
From this point it was rather straightforward, except for splitting on commas.
Right now, I do a very simple split on all commas, but that doesn't account for this case:
[d|(1, 2), 3|]
Which fails unfortunatelly. To handle this, I begun writing a parsec parser (again) which turned out to be more difficult than anticipated (again). At this point, I am open to suggestions. Maybe you know of a simple parser that handles the different edge-cases? If so, tell me in a comment, please! I plan on resolving this issue with or without parsec.
But for the most use-cases: it works.
Update at 2015-06-20
Version 0.2.1 and later correctly parses expressions even if they contain commas inside them. Meaning [d|(1, 2), 3|] and similar expressions are now supported.
You can
install it via cabal install dump
and/or
read the source code
Conclusion
During the last week I've learnt quite a bit of Template Haskell and QuasiQuotation, cabal sandboxes, publishing a package to hackage, building haddock docs and publishing them, and some things about Haskell too.
It's been fun.
And perhaps most importantly, I now am able to use this tool for debugging and development, the absence of which has been bugging me for some time. Peace at last.
Thank you #kqr, your engagement with my original question and attempt at solving it gave me enough spark and motivation to continue writing up a full solution.
I've actually almost solved the problem now. Not exactly what you imagined, but fairly close. Maybe someone else can use this as a basis for a better version. Either way, with
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
import Language.Haskell.TH
dump :: ExpQ -> ExpQ
dump tuple =
listE . map dumpExpr . getElems =<< tuple
where
getElems = \case { TupE xs -> xs; _ -> error "not a tuple in splice!" }
dumpExpr exp = [| $(litE (stringL (pprint exp))) ++ " = " ++ show $(return exp)|]
you get the ability to do something like
λ> let x = True
λ> print $(dump [|(not x, x, x == True)|])
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
which is almost what you wanted. As you see, it's a problem that the pprint function includes module prefixes and such, which makes the result... less than ideally readable. I don't yet know of a fix for that, but other than that I think it is fairly usable.
It's a bit syntactically heavy, but that is because it's using the regular [| quote syntax in Haskell. If one wanted to write their own quasiquoter, as you suggest, I'm pretty sure one would also have to re-implement parsing Haskell, which would suck a bit.

Adding the possibility to write a AST-file to my (rail-)compiler

I'm writing rail-compiler (rail is an esoteric language) in Haskell and I get some problems within the main-function of my mainmodule.
1) I want my program to ask wheter I want to run the compiling-pipeline or simply stop after the lexer and write the AST to a file so another compiler can deal with my AST (Abstract Synatx Tree). Here is my program:
module Main (
main -- main function to run the program
)
where
-- imports --
import InterfaceDT as IDT
import qualified Testing as Test
import qualified Preprocessor as PreProc
import qualified Lexer
import qualified SyntacticalAnalysis as SynAna
import qualified SemanticalAnalysis as SemAna
import qualified IntermediateCode as InterCode
import qualified CodeOptimization as CodeOpt
import qualified Backend
-- functions --
main :: IO()
main = do putStr "Enter inputfile (path): "
inputfile <- getLine
input <- readFile inputfile
putStr "Enter outputfile (path): "
outputfile <- getLine
input <- readFile inputfile
putStr "Only create AST (True/False): "
onlyAST <- getLine
when (onlyAST=="True") do putStrLn "Building AST..."
writeFile outputfile ((Lexer.process . PreProc.process) input)
when (onlyAST=="False") do putStrLn ("Compiling "++inputfile++" to "++outputfile)
writeFile outputfile ((Backend.process . CodeOpt.process . InterCode.process . SemAna.process . SynAna.process . Lexer.process . PreProc.process) input)
I get an error in Line 21 (input <- readFile inputfile) caused by the <-. Why?
How should I do it?
2) Next thing is that I want to refactor the program in that way, that I can call it from the terminal with parameters like runhaskell Main(AST) (in that way it should just create the AST) or like runhaskell Main.hs (in that way it should do the whole pipeline).
I hope for your help!
For your error in (1), your program doesn't look syntactically incorrect at line 21 to me. However an error at <- would happen if that line were indented differently from the previous one. I suspect that you are having an indentation error due to mixing tabs and spaces in a way that looks correct in your editor but disagrees with Haskell's interpretation of tabs. The simplest recommendation is to always use spaces and never tabs.
You also have an extra copy of that line later, which you might want to remove.
I also suspect you may need to use hFlush stdin after your putStr's, for them to work as prompts.
For (2), I'd suggest using a library for proper command line argument and option parsing, such as System.Console.GetOpt which is included with GHC, or one of the fancier ones which you can find on Hackage.

Resources