Adding the possibility to write an AST file to my (rail) compiler - Haskell

I'm writing a rail compiler (rail is an esoteric language) in Haskell and I'm running into some problems in the main function of my main module.
1) I want my program to ask whether I want to run the whole compilation pipeline or simply stop after the lexer and write the AST (Abstract Syntax Tree) to a file, so that another compiler can work with it. Here is my program:
module Main (
main -- main function to run the program
)
where
-- imports --
import InterfaceDT as IDT
import qualified Testing as Test
import qualified Preprocessor as PreProc
import qualified Lexer
import qualified SyntacticalAnalysis as SynAna
import qualified SemanticalAnalysis as SemAna
import qualified IntermediateCode as InterCode
import qualified CodeOptimization as CodeOpt
import qualified Backend
-- functions --
main :: IO()
main = do putStr "Enter inputfile (path): "
          inputfile <- getLine
          input <- readFile inputfile
          putStr "Enter outputfile (path): "
          outputfile <- getLine
          input <- readFile inputfile
          putStr "Only create AST (True/False): "
          onlyAST <- getLine
          when (onlyAST=="True") do putStrLn "Building AST..."
                                    writeFile outputfile ((Lexer.process . PreProc.process) input)
          when (onlyAST=="False") do putStrLn ("Compiling "++inputfile++" to "++outputfile)
                                     writeFile outputfile ((Backend.process . CodeOpt.process . InterCode.process . SemAna.process . SynAna.process . Lexer.process . PreProc.process) input)
I get an error in Line 21 (input <- readFile inputfile) caused by the <-. Why?
How should I do it?
2) Next, I want to refactor the program so that I can call it from the terminal with parameters, e.g. runhaskell Main(AST) (which should just create the AST) or runhaskell Main.hs (which should run the whole pipeline).
I hope for your help!

For your error in (1), your program doesn't look syntactically incorrect at line 21 to me. However an error at <- would happen if that line were indented differently from the previous one. I suspect that you are having an indentation error due to mixing tabs and spaces in a way that looks correct in your editor but disagrees with Haskell's interpretation of tabs. The simplest recommendation is to always use spaces and never tabs.
You also have an extra copy of that line later, which you might want to remove.
I also suspect you may need to call hFlush stdout (from System.IO) after each putStr for them to work as prompts, since output to a terminal is usually buffered until a newline is written.
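A prompt written with putStr may not appear before getLine starts waiting, because stdout is typically line-buffered on a terminal. Here is a minimal sketch of one way to handle that by flushing stdout explicitly. The helper names prompt and parseBool are hypothetical and not part of the asker's program.

```haskell
import System.IO (hFlush, stdout)

-- Hypothetical helper: print a prompt, flush it, then read the reply.
prompt :: String -> IO String
prompt msg = do
  putStr msg
  hFlush stdout   -- make sure the prompt is visible before getLine blocks
  getLine

-- Hypothetical helper: interpret the "True/False" reply.
parseBool :: String -> Maybe Bool
parseBool "True"  = Just True
parseBool "False" = Just False
parseBool _       = Nothing

main :: IO ()
main = do
  inputfile <- prompt "Enter inputfile (path): "
  reply     <- prompt "Only create AST (True/False): "
  case parseBool reply of
    Just onlyAST -> putStrLn (inputfile ++ ", AST only: " ++ show onlyAST)
    Nothing      -> putStrLn "Please answer True or False."
```

Parsing the reply into a Maybe Bool also covers the case where the user types something other than True or False, which the two separate when checks would silently ignore.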
For (2), I'd suggest using a library for proper command line argument and option parsing, such as System.Console.GetOpt which is included with GHC, or one of the fancier ones which you can find on Hackage.
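As a rough sketch of what that could look like with System.Console.GetOpt: the --ast flag, the Flag type, the usage text, and the lexOnly and fullPipeline stand-ins below are all assumptions made for illustration. In the real program the stand-ins would be replaced by the composed process functions from the question.

```haskell
import System.Console.GetOpt
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Hypothetical flag type; only one option is modelled here.
data Flag = OnlyAST deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option ['a'] ["ast"] (NoArg OnlyAST) "stop after the lexer and write the AST" ]

-- Pure parsing step: flags and positional arguments, or an error message.
parseArgs :: [String] -> Either String ([Flag], [String])
parseArgs argv =
  case getOpt Permute options argv of
    (flags, rest, []) -> Right (flags, rest)
    (_, _, errs)      -> Left (concat errs ++ usageInfo header options)
  where
    header = "Usage: Main [--ast] INPUT OUTPUT"

-- Stand-ins for the real pipeline stages (assumptions, not the asker's modules).
lexOnly, fullPipeline :: String -> String
lexOnly      = id
fullPipeline = id

main :: IO ()
main = do
  parsed <- parseArgs <$> getArgs
  case parsed of
    Right (flags, [inputfile, outputfile]) -> do
      input <- readFile inputfile
      let process = if OnlyAST `elem` flags then lexOnly else fullPipeline
      writeFile outputfile (process input)
    Right _  -> hPutStrLn stderr "expected exactly two paths: INPUT OUTPUT" >> exitFailure
    Left msg -> hPutStrLn stderr msg >> exitFailure
```

With this shape, an invocation such as runhaskell Main.hs --ast in.rail out.ast would stop after the lexer, and leaving out --ast would run the whole pipeline.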

Related

Editable default string in Haskell's terminal input

I want to be able to prompt the user for input (let's say a FilePath), but also to offer a mutable/interactive string as a default, so instead of having the user type the full path, I can prompt with:
C:\Users\John\project\test
and have them be able to backspace 4 times and enter final to yield C:\Users\John\project\final, rather than type the entire path.
However printing a default string with putStr or System.IO.hPutStr stdout does print this default to the terminal, but does not allow me to alter any of it. E.g.
import System.IO
main = do
  hSetBuffering stdout NoBuffering
  putStr "C:\\Users\\John\\project\\test"
  l <- getLine
  doSomethingWith l
I suspect Data.Text.IO's interact may be able to do what I want but I could not get it to work.
Any suggestions would be greatly appreciated.
getLine doesn’t offer any facility for line editing. For this you can use a library like haskeline instead, for example:
import System.Console.Haskeline
main :: IO ()
main = do
  runInputT defaultSettings $ do
    mInput <- getInputLineWithInitial "Enter path: "
      ("C:\\Users\\John\\project\\test", "")
    case mInput of
      Nothing -> do
        outputStrLn "No entry."
      Just input -> do
        outputStrLn $ "Entry: " ++ show input
An alternative is to invoke the program with a wrapper that provides line editing, such as rlwrap. For building a more complex fullscreen text UI, there is also brick, which provides a simple text editing component in Brick.Widgets.Edit.

Read large lines in huge file without buffering

I was wondering if there's an easy way to get lines one at a time out of a file without eventually loading the whole file into memory. I'd like to do a fold over the lines with an attoparsec parser. I tried using Data.Text.Lazy.IO with hGetLine and that blows through my memory. I read later that it eventually loads the whole file.
I also tried using pipes-text with folds and view lines:
s <- Pipes.sum $
       folds (\i _ -> (i+1)) 0 id (view Text.lines (Text.fromHandle handle))
print s
to just count the number of lines and it seems to be doing some wonky stuff "hGetChunk: invalid argument (invalid byte sequence)" and it takes 11 minutes where wc -l takes 1 minute. I heard that pipes-text might have some issues with gigantic lines? (Each line is about 1GB)
I'm really open to any suggestions, can't find much searching except for newbie readLine how-tos.
Thanks!
The following code uses Conduit, and will:
UTF8-decode standard input
Run the lineC combinator as long as there is more data available
For each line, simply yield the value 1 and discard the line content, without ever reading the entire line into memory at once
Sum up the 1s yielded and print it
You can replace the yield 1 code with something which will do processing on the individual lines.
#!/usr/bin/env stack
-- stack --resolver lts-8.4 --install-ghc runghc --package conduit-combinators
import Conduit
main :: IO ()
main = (runConduit
     $ stdinC
    .| decodeUtf8C
    .| peekForeverE (lineC (yield (1 :: Int)))
    .| sumC) >>= print
This is probably easiest as a fold over the decoded text stream
{-#LANGUAGE BangPatterns #-}
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as PB
import qualified Pipes.Text.Encoding as PT
import qualified Control.Foldl as L
import qualified Control.Foldl.Text as LT
main = do
  n <- L.purely P.fold (LT.count '\n') $ void $ PT.decodeUtf8 PB.stdin
  print n
It takes about 14% longer than wc -l for the file I produced, which was just long lines of commas and digits. IO should properly be done with Pipes.ByteString, as the documentation says; the rest are conveniences of various sorts.
You can map an attoparsec parser over each line, distinguished by view lines, but keep in mind that an attoparsec parser can accumulate the whole text as it pleases and this might not be a great idea over a 1 gigabyte chunk of text. If there is a repeated figure on each line (e.g. word separated numbers) you can use Pipes.Attoparsec.parsed to stream them.
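For comparison, the same constant-memory idea can be sketched without a streaming library, using only chunked reads from the bytestring package. This is a different technique from the Conduit and Pipes answers above, it only counts newline bytes, and it does no UTF-8 decoding or parsing. The 64 KiB chunk size and the function names are arbitrary choices for illustration.

```haskell
import qualified Data.ByteString as B
import System.IO (Handle, stdin)

-- Number of newline bytes (value 10) in one chunk.
newlinesIn :: B.ByteString -> Int
newlinesIn = B.count 10

-- Read fixed-size chunks until EOF, keeping only the running total,
-- so a 1 GB line is never held in memory as a whole.
countLines :: Handle -> IO Int
countLines h = go 0
  where
    go n = do
      chunk <- B.hGetSome h 65536
      if B.null chunk
        then return n
        else let n' = n + newlinesIn chunk in n' `seq` go n'

main :: IO ()
main = countLines stdin >>= print
```

Counting bytes is safe for UTF-8 input because the byte 10 never occurs inside a multi-byte UTF-8 sequence. Anything that needs to look at the line content would still need an incremental parser or one of the streaming libraries.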

Get args Haskell

I'm having problems with an exercise, and can not understand the error.
It should be a simple exercise with args:
import System.IO
import System.Environment
main= do
args < - getArgs
nomeficheiro <- return( args !! 0)
putStrnLn ( "Name is" ++ nomeficheiro)
Then I should run it with: $ ./comando James
The error:
<interactive>:51:1:
parse error on input ‘$’
Perhaps you intended to use TemplateHaskell
I've read other questions about args on this forum, but I didn't find any answer that could help me.
$ ./comando James isn't meant to be run in GHCi. The $ at the start of the line is a shell prompt: it indicates that the rest of the line should be run in your shell (bash, cmd, and so on), not in GHCi:
# in your favourite shell, in the correct directory
./comando James
If you want to run main with arguments within GHCi, you can use :main args:
ghci> :main James
Further remarks
Your current code isn't indented correctly, so make sure that you fix this too. Also, you can use let nomeficheiro = head args instead of … <- return …. Keep in mind that this could lead to problems if one doesn't supply any argument to your program, since head [] calls error.
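Putting these remarks together, a corrected version of the program might look like the sketch below. Besides the indentation, it assumes that args < - getArgs was meant to be args <- getArgs and that putStrnLn was meant to be putStrLn. The greeting helper and the usage message are hypothetical additions that avoid the head [] problem.

```haskell
import System.Environment (getArgs)

-- Pure helper so the message can be checked without running IO.
-- Pattern matching replaces (args !! 0) and handles the empty case.
greeting :: [String] -> String
greeting (nomeficheiro : _) = "Name is " ++ nomeficheiro
greeting []                 = "usage: comando NAME"

main :: IO ()
main = do
  args <- getArgs
  putStrLn (greeting args)
```

After compiling this to comando, running ./comando James in a shell, or :main James in GHCi, passes James as the first argument.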

Flexible number of arguments to haskell program

I am using the System.FilePath.Find module of filemanip to recursively find all files I need to process (here I will be using just printing to console as the action to perform, in order not confuse things). Now, this code:
import System.Environment (getArgs)
import System.FilePath (FilePath)
import System.Directory (doesDirectoryExist, getDirectoryContents,doesFileExist)
import Control.Monad
import System.FilePath.Find (find,always,fileType,(==?),FileType(..),(&&?),extension)
main= do
        [dbFile,input]<- getArgs
        files <- findFiles input
        mapM_ putStrLn files
        return ()
searchExtension :: String
searchExtension = ".hs"
findFiles :: FilePath -> IO [String]
findFiles = find (always) ( fileType ==? RegularFile &&? extension ==? searchExtension)
works well with this call
./myprog tet .
In this case, the first argument is ignored (it will be the output database file later) and the second argument is searched recursively for matching files. It also allows me to specify just a single file, which is just perfect!
BUT, I would like to be able to specify
./myprog tet path1 path2 path4 file1
but this of course fails in the pattern matching:
./myprog tet . .
myprogt: user error (Pattern match failure in do expression at myprog.hs:11:9-22)
Now, how do I make this program more flexible, so that I can take more than two arguments?
Sorry for asking this, actually, but my Haskell knowledge is limited but increasing for every new thing I have to do in my first project.
Well, you can use a different pattern like:
(dbFile:inputs) <- getArgs
where dbFile matches the first argument passed, while inputs matches any number of path names, even zero. If you want at least one path name, use inputs@(_:_) instead of plain inputs.
Then you can use mapM to call findFiles for each path in inputs:
files <- mapM findFiles inputs
mapM_ putStrLn $ concat files
Instead of mapM you could modify findFiles to accept a [FilePath] argument instead of a simple FilePath.
Note that to parse command-line arguments you could consider using a module such as System.Console.GetOpt. You should also read this page about argument handling.
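A failed pattern match on the left of <- in an IO do block raises the user error shown in the question. One way to avoid that is to match in a pure function and report a usage message instead. This is a minimal sketch: splitArgs and the usage text are hypothetical, and printing the paths stands in for the findFiles call from the question.

```haskell
import System.Environment (getArgs)
import System.Exit (die)

-- Require the database file plus at least one input path.
splitArgs :: [String] -> Maybe (FilePath, [FilePath])
splitArgs (dbFile : inputs@(_ : _)) = Just (dbFile, inputs)
splitArgs _                         = Nothing

main :: IO ()
main = do
  args <- getArgs
  case splitArgs args of
    Nothing -> die "usage: myprog DBFILE PATH [PATH ...]"
    Just (dbFile, inputs) -> do
      putStrLn ("database file: " ++ dbFile)
      mapM_ putStrLn inputs   -- stand-in for: mapM findFiles inputs
```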

Haskell IO russian symbols

I am trying to process a file which contains Russian symbols. When reading and after writing some text to the file I get something like:
\160\192\231\229\240\225\224\233\228\230\224\237
How can I get normal symbols?
If you are getting strings with backslashes and numbers in, then it sounds like you might be calling "print" when you want to call "putStr".
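The difference comes from show, which print applies before writing: for a String, show escapes every non-ASCII character as a backslash followed by its decimal code point. A small sketch of the contrast follows. The sample string is arbitrary, and setting the handle encoding with hSetEncoding is an assumption about the environment, not something the question states is needed.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Arbitrary sample text containing Cyrillic characters.
name :: String
name = "Вася"

main :: IO ()
main = do
  hSetEncoding stdout utf8   -- assumption: the terminal expects UTF-8
  print name                 -- goes through show, so the characters are escaped
  putStrLn name              -- writes the characters themselves
```

The same applies when writing to a file: writeFile path (show s) stores the escaped form, while writeFile path s stores the text itself.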
If you deal with Unicode, you might try the utf8-string package:
import System.IO hiding (hPutStr, hPutStrLn, hGetLine, hGetContents, putStrLn)
import System.IO.UTF8
import Codec.Binary.UTF8.String (utf8Encode)
main = System.IO.UTF8.putStrLn "Вася Пупкин"
However, it didn't work well in my Windows CLI: the output was garbled because of the console codepage. I expect it to work fine on Unix-like systems if your locale is set correctly. Writing to a file, however, should be successful on all systems.
UPDATE:
Here is an example using the encoding package, which worked for me:
{-# LANGUAGE ImplicitParams #-}
import Network.HTTP
import Text.HTML.TagSoup
import Data.Encoding
import Data.Encoding.CP1251
import Data.Encoding.UTF8
openURL x = do
    x <- simpleHTTP (getRequest x)
    fmap (decodeString CP1251) (getResponseBody x)
main :: IO ()
main = do
    tags <- fmap parseTags $ openURL "http://www.trade.su/search?ext=1"
    let TagText r = partitions (~== "<input type=checkbox>") tags !! 1 !! 4
    appendFile "out" r

Resources