hGetContents hangs when getting contents utf-8 file - haskell

I'm parsing files from a git repository and, while planning to use the gitlib module for that, I'm getting the file contents using the git executable for now - until I find some tutorial or have time to dive into gitlib's code.
I have a function that essentially run a "git show" for a specific file on a specific commit, and return its contents. Here is a full working example.
import System.IO
import System.Process
import System.Exit
main = do
let commit = Commit { hash = "811e22679008298176d8be24eedc65f9e8c4900b", time = ""}
fileIO <- showFileIO "/path/to/the/repo" (commit, "/path/to/the/file")
putStr (show fileIO)
showFileIO :: String -> (Commit, String) -> IO (Commit, String, String)
showFileIO directory (commit, filepath) = do
(_, Just hout, Just herr, procHandle) <- createProcess $ createCommand command directory
hSetEncoding hout utf8
hSetEncoding herr utf8
exitCode <- waitForProcess procHandle
stdOut <- hGetContents hout
stdErr <- hGetContents herr
if exitCode == ExitSuccess
then return (commit, filepath, stdOut)
-- Continue in the case of an error.
else return (commit, filepath, "")
where command = "git show " ++ (hash commit) ++ ":" ++ filepath
createCommand :: String -> FilePath -> CreateProcess
createCommand command directory = (shell command){std_out = CreatePipe, std_err = CreatePipe, cwd = Just directory}
-- Where Commit is defined as:
data Commit = Commit { hash :: String
, time :: String
} deriving (Show)
I was initially getting some errors ("invalid byte sequence") when getting the contents of a php file with mime-type "text/x-php" and charset "utf-8", and that was resolved when I set the encoding of the Handles to utf8. There is another file with mime-type "text/html" that is actually a html.twig file (Twig templating engine) with charset "utf-8". Now the function hangs indefinitely when trying to get the contents of this file. It works fine for other files.
Any ideas what could be wrong? How do I even get to debug in Haskell something that does not give me an error or any info? Are there any debugging tools that could help with that?

I would try something like this: (untested)
showFileIO directory (commit, filepath) = do
(_, Just hout, Just herr, procHandle) <- createProcess $ createCommand command directory
hSetEncoding hout utf8
hSetEncoding herr utf8
stdOut <- hGetContents hout
evaluate (length stdOut) -- strictify the above lazy IO
stdErr <- hGetContents herr
evaluate (length stdErr)
exitCode <- waitForProcess procHandle
if exitCode == ExitSuccess
...
Alternatively, use some strict-IO variant of hGetContents.
Note that there still is, as far as I can see, some window for deadlock. If the command produces a vast amount of data on stderr, then the command & OS buffers will become full and writes to stderr will block. Since the Haskell consumer now first waits for stdout to be consumed completely, we have a deadlock. Note that this will not be an issue for "short" error messages.
If we want to make it more robust, we need to read from both stdout and stderr at the same time. E.g.
showFileIO directory (commit, filepath) = do
(_, Just hout, Just herr, procHandle) <- createProcess $ createCommand command directory
hSetEncoding hout utf8
hSetEncoding herr utf8
stdOutV <- newEmptyMVar
stdErrV <- newEmptyMVar
forkIO $ do
stdOut <- hGetContents hout
evaluate (length stdOut)
putMVar stdOutV stdOut
forkIO $ fo
stdErr <- hGetContents herr
evaluate (length stdErr)
putMVar stdErrV stdErr
stdOut <- takeMVar stdOutV
stdErr <- takeMVar stdErrV
exitCode <- waitForProcess procHandle
if exitCode == ExitSuccess
...
Update. This should also work, and is much simpler.
showFileIO directory (commit, filepath) = do
(_, Just hout, Just herr, procHandle) <- createProcess $ createCommand command directory
hSetEncoding hout utf8
hSetEncoding herr utf8
stdOut <- hGetContents hout
stdErr <- hGetContents herr
forkIO $ evaluate (length stdOut)
evaluate (length stdErr)
exitCode <- waitForProcess procHandle
if exitCode == ExitSuccess
...
I wouldn't be surprised if there were some library function doing all of this for you, but I can't remember anything at the moment.
Unrelated: I prefer proc to shell to construct the CreateProcess options. The latter requires careful escaping of filenames (spaces, special characters), while the former simply takes a list of strings parameters.

Related

Strange monadic behaviour

I tried to write a program which takes filepaths as (command line) arguments and returns the first line of each file
main = getArgs >>= ( mapM_ ( \file -> ( openFile file ReadMode >>= ( (\handle -> hGetLine handle >>= print) >> hClose ) ) ) )
I know that this doesn't look very beautiful but I am just a beginner in Haskell. I did also avoid the do notation on purpose because I just don't feel very comfortable with her (yet).
So the Code above compiles and returns an error for invalid file paths, and nothing (i.e. especially not the first line of a file) for valid paths.
I must confess that I have pretty much no idea what I did wrong, but I made the following observation:
If I add the following to check which parts still get executed
main = getArgs >>= ( mapM_ ( \file -> ( openFile file ReadMode >>= ( (\handle -> hGetLine handle >>= print) >> (const $ putStr "Hello1") >> hClose >> (const $ putStr "Hello2") ) ) ) )
the program prints only the second "Hello", this reminded me of the type signature of (>>):
(>>) :: Monad m => m a -> m b -> m b
so taking into perspective that only something of the type of the second argument gets returned, maybe the first argument is just ignored?
But the first argument against this theory is that such a function would not seem to be very useful (at least not in the context of the IO Monad), and the second is that the program
main = (putStr "Hello" >> putStr "World" >> putStr "!")
returns 'HelloWorld!' as expected. Hence I must be completely on the wrong track, which is why I came here.
Thanks for your help!
I think you main error is that you messed up with the handle:
main = getArgs >>= (mapM_ (\file -> (openFile file ReadMode >>= (\handle -> (hGetLine handle >>= print) >> hClose handle) ) ) )
this way you did it >> was for the (-> handle) Monad (it's a reader monad - see there is an Monad instance for (->) c for constant c) not the IO!
So it did indeed pass the handle to both hGetLine handle >>= print and hClose but >> ignored the first resulting IO action and returned the hClose one as the result to >>
Here the effect was passing the handle!
So yes in the end the only executed IO-effect was closing the file!
It's subtle and not obvious as you seldom see/think about the reader-monad instance like this.
here is this with do notation
main = do
args <- getArgs
mapM_ (\file -> do
handle <- openFile file ReadMode
line <- hGetLine handle
print line
hClose handle) args
and I'd suggest switching to forM_ (from Control.Monad) for the args parameter:
main = do
args <- getArgs
forM_ args (\file -> do
handle <- openFile file ReadMode
line <- hGetLine handle
print line
hClose handle)
now you should make sure you close the handle - you can use bracket from Control.Exception for this:
main = do
args <- getArgs
forM_ args (\file -> do
bracket
(openFile file ReadMode)
hClose
(\h -> do
line <- hGetLine h
print line
)
)
or (as this is very common) just withFile from System.IO which does the opening/closing for you:
main = do
args <- getArgs
forM_ args (\file -> do
withFile file ReadMode
(\h -> do
line <- hGetLine h
print line
)
)
finally you don't really have to use all the handle stuff you can use the (lazy) readFile instead and be a bit safer with empty files too:
main = do
args <- getArgs
forM_ args (\file -> do
content <- readFile file
let ls = lines content
case ls of
[] -> putStrLn "no line in file"
(firstLine:_) -> putStrLn firstLine
)

Binding in expression

I am trying to on wheatear argument with file name is passed to either parse the file or read from standard input:
let formulae = fmap parseInput $ if isInputFile args then (hGetContents $ openFile (last args) ReadMode) else hGetContents stdin
but of course this error occurs:
formula-2-bdd.hs:89:79:
Couldn't match expected type `Handle' with actual type `IO Handle'
In the second argument of `($)', namely
`openFile (last args) ReadMode'
In the expression: (hGetContents $ openFile (last args) ReadMode)
If I bind IO Handle first I can't decided according to `isInputFile args:
handle <- openFile (last args) ReadMode
formulae = fmap parseInput $ if isInputFile args then hGetContents handle else hGetContents stdin
Let's understand what the actual problem with the original let expression is.
let formulae = fmap parseInput $ if isInputFile args
then (hGetContents $ openFile (last args) ReadMode)
else hGetContents stdin
The problem is that stdin is a Handle, while the return value of openFile is an IO Handle. To handle (no pun intended) both cases with the same code, you need to promote stdin to an IO Handle.
let formulae = fmap parseInput $ if isInputFile args
then (hGetContents $ openFile (last args) ReadMode)
else (hGetContents (pure stdin))
This can be rewritten a little more simply calling hGetContents on the result of the if expression, rather than having the if expression return the result of hGetContents.
let formulae = do handle <- if isInputFile
then openFile (last args) ReadMode
else return stdin
data <- hGetContents handle
fmap parseInput data

How to retrieve output from a process without blocking the thread in Haskell

What is the best way to write to the stdin and read from the stdout of a subprocess without blocking?
The subprocess was created via System.IO.createProcess which returns handles for writing to and reading from the subprocess. The writing and reading is done in text format.
For example, my best attempt at doing non-blocking read is timeout 1 $ hGetLine out which returns a Just "some line" or Nothing if no line exists to be read. However, this seems an hack to me, so I am looking for a more "standard" way.
Thanks
Here are some examples of how to interact with a spawned process in a fashion mentioned by #jberryman.
The program interacts with a script ./compute which simply reads lines from stdin in the form <x> <y> and returns x+1 after a delay of y seconds. More details at this gist.
There are many caveats when interacting with spawned processes. In order to avoid "suffering from buffering" you need to flush the outgoing pipe whenever you send input and the spawned process needs to flush stdout every time it sends a response. Interacting with the process via a pseudo-tty is an alternative if you find that stdout is not flushed promptly enough.
Also, the examples assume that closing the input pipe will lead to termination of the spawn process. If this is not the case you will have to send it a signal to ensure termination.
Here is the example code - see the main routine at the end for sample invocations.
import System.Environment
import System.Timeout (timeout)
import Control.Concurrent
import Control.Concurrent (forkIO, threadDelay, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.Process
import System.IO
-- blocking IO
main1 cmd tmicros = do
r <- createProcess (proc "./compute" []) { std_out = CreatePipe, std_in = CreatePipe }
let (Just inp, Just outp, _, phandle) = r
hSetBuffering inp NoBuffering
hPutStrLn inp cmd -- send a command
-- block until the response is received
contents <- hGetLine outp
putStrLn $ "got: " ++ contents
hClose inp -- and close the pipe
putStrLn "waiting for process to terminate"
waitForProcess phandle
-- non-blocking IO, send one line, wait the timeout period for a response
main2 cmd tmicros = do
r <- createProcess (proc "./compute" []) { std_out = CreatePipe, std_in = CreatePipe }
let (Just inp, Just outp, _, phandle) = r
hSetBuffering inp NoBuffering
hPutStrLn inp cmd -- send a command, will respond after 4 seconds
mvar <- newEmptyMVar
tid <- forkIO $ hGetLine outp >>= putMVar mvar
-- wait the timeout period for the response
result <- timeout tmicros (takeMVar mvar)
killThread tid
case result of
Nothing -> putStrLn "timed out"
Just x -> putStrLn $ "got: " ++ x
hClose inp -- and close the pipe
putStrLn "waiting for process to terminate"
waitForProcess phandle
-- non-block IO, send one line, report progress every timeout period
main3 cmd tmicros = do
r <- createProcess (proc "./compute" []) { std_out = CreatePipe, std_in = CreatePipe }
let (Just inp, Just outp, _, phandle) = r
hSetBuffering inp NoBuffering
hPutStrLn inp cmd -- send command
mvar <- newEmptyMVar
tid <- forkIO $ hGetLine outp >>= putMVar mvar
-- loop until response received; report progress every timeout period
let loop = do result <- timeout tmicros (takeMVar mvar)
case result of
Nothing -> putStrLn "still waiting..." >> loop
Just x -> return x
x <- loop
killThread tid
putStrLn $ "got: " ++ x
hClose inp -- and close the pipe
putStrLn "waiting for process to terminate"
waitForProcess phandle
{-
Usage: ./prog which delay timeout
where
which = main routine to run: 1, 2 or 3
delay = delay in seconds to send to compute script
timeout = timeout in seconds to wait for response
E.g.:
./prog 1 4 3 -- note: timeout is ignored for main1
./prog 2 2 3 -- should timeout
./prog 2 4 3 -- should get response
./prog 3 4 1 -- should see "still waiting..." a couple of times
-}
main = do
(which : vtime : tout : _) <- fmap (map read) getArgs
let cmd = "10 " ++ show vtime
tmicros = 1000000*tout :: Int
case which of
1 -> main1 cmd tmicros
2 -> main2 cmd tmicros
3 -> main3 cmd tmicros
_ -> error "huh?"

How can I get unicode output in Haskell

I have a main function that outputs unicode which looks like this:
main = do
hSetEncoding stdout utf8
input <- getContents
mapM_ putStr $ myfunc input
How can I write this function without do notation?
I get <stdout>: commitBuffer: invalid argument (invalid character) when I try to compile this main function:
main = getContents >>= mapM_ putStr . myfunc
Just use sequence (>>):
main = do
hSetEncoding stdout utf8
input <- getContents
mapM_ putStr $ myfunc input
~~>
main = hSetEncoding stdout utf8 >> getContents >>= \input -> mapM_ putStr $ lines input
~~>
main = hSetEncoding stdout utf8 >> getContents >>= mapM_ putStr . lines

Portably opening a handle to stdin many times in a single session

Code:
main = do
putStrLn "4917 Microprocessor\nEnter the Machine Code to be run: "
inp <- getContents
putStrLn "The output of the Program is:"
fState <- ((runStateT _4917) . construct . parse) inp
args <- getArgs
if elem "-v" args then putStrLn ("\nFinal state was: " ++ (show . snd) fState) else return ()
putStrLn "\n================================ RESTART ================================"
main
where parse xs = array (0,15) $
zip [0..15] $
take 16 $
map ((makeArbInt 4) . read) (words (filter ((/=)'.') xs)) ++ repeat (makeArbInt 4 0)
construct xs = Program xs z z z z 0 False
z = (makeArbInt 4 0)
There's more but this is the relevant part. Basically line 3 needs to be evaluated multiple times but getContents is closing the stdin handle:
4917: <stdin>: hGetContents: illegal operation (handle is closed)
Is there a way to reopen the handle? Or some way of preventing getContents from doing that? (Maybe I'm sending the wrong signal. I'm sending over a Ctrl-D EOF on Linux. Maybe I should use EOT or something instead?)
edit: I've managed to get the desired behaviour but it won't port to windows.
mystdin <- openFile "/dev/tty" ReadMode
inp <- hGetContents mystdin
New question: is there a generic way to portably open a handle to stdin?
You cannot prevent getContents from closing the file, and a closed file stays closed.
It seems to me that you need some other function. Usually, when you read parts of a file, you know when to stop (end of line, certain bytes on the stream). So yes: If you cannot parse the data that you are reading and detect when it is done, you should use some kind of delimiter (possibly EOT, or an empty line, or special text unlikely to occur in the data like __END__).
Have them enter a blank line to end input:
getContents2 = helper "" where
helper str = do
a <- getLine
if "" == a then return str
else helper (str++"\n"++a)
You might also haven them signal the end of the session by entering a single blank line.
To open a handle to stdin portably, use hDuplicate function on the existing stdio handle to get a new one:
mystdin <- hDuplicate stdin
inp <- hGetContents mystdin
Make sure never to close the original stdin, so that you can make more duplicates as appropriate. (I'm not sure if this is good Haskell style)

Resources