How could I watch several files/sockets from Haskell and wait for these to become readable/writable?
Is there anything like select/epoll/... in Haskell? Or am I forced to spawn one thread per file/socket and always use blocking calls on the resource from within that thread?
The question is wrong: you aren't forced to spawn one thread per file/socket and use blocking calls, you get to spawn one thread per file/socket and use blocking calls. This is the cleanest solution (in any language); the only reason to avoid it in other languages is that it's a bit inefficient there. GHC's threads are cheap enough, however, that it is not inefficient in Haskell. (Additionally, behind the scenes, GHC's IO manager uses an epoll-alike to wake up threads as appropriate.)
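For concreteness, here is a minimal sketch of that style, assuming you already have a list of open Handles to watch (the watchHandle and watchAll names are made up for illustration):

import Control.Concurrent (ThreadId, forkIO)
import Control.Monad (forever)
import System.IO (Handle, hGetLine)

-- One green thread per handle; each simply blocks on its own read.
-- Behind the scenes, the IO manager parks the thread until the fd is readable.
watchHandle :: Handle -> IO ()
watchHandle h = forever $ do
  line <- hGetLine h
  putStrLn ("got: " ++ line)

watchAll :: [Handle] -> IO [ThreadId]
watchAll = mapM (forkIO . watchHandle)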
There's a wrapper for select(2): https://hackage.haskell.org/package/select
Example usage here: https://github.com/pxqr/udev/blob/master/examples/monitor.hs#L36
There's a wrapper for poll(2):
https://hackage.haskell.org/package/poll
GHC base comes with functionality that wraps epoll on Linux (and equivalent on other platforms) in the GHC.Event module.
Example usage:
import GHC.Event
import Data.Maybe (fromMaybe)
import Control.Concurrent (threadDelay)
main = do
  fd <- getSomeFileDescriptorOfInterest
  mgr <- fromMaybe (error "Must be compiled with -threaded") <$> getSystemEventManager
  registerFd mgr (\fdkey event -> print event) fd evtRead OneShot
  threadDelay 100000000
More documentation at http://hackage.haskell.org/package/base-4.11.1.0/docs/GHC-Event.html
Example use of an older version of the lib at https://wiki.haskell.org/Simple_Servers#Epoll-based_event_callbacks
Though, the loop in that example has since been moved to the hidden module GHC.Event.Manager, and is not exported publicly as far as I can tell. GHC.Event itself says "This module should be considered GHC internal."
In Control.Concurrent there's threadWaitRead and threadWaitWrite.
So, to translate the above epoll example:
import Control.Concurrent (threadWaitRead)
main = do
  fd <- getSomeFileDescriptorOfInterest
  threadWaitRead fd
  putStrLn "Got a read ready event"
You can wrap the threadWaitRead and subsequent IO action in Control.Monad.forever to run them repeatedly. You can also wrap the thing in forkIO to run it in the background while your program does something else.
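A minimal sketch of that pattern, still using the placeholder getSomeFileDescriptorOfInterest from the example above and a hypothetical doOtherWork for the rest of the program:

import Control.Concurrent (forkIO, threadWaitRead)
import Control.Monad (forever)

main :: IO ()
main = do
  fd <- getSomeFileDescriptorOfInterest     -- placeholder, as in the example above
  _ <- forkIO $ forever $ do
         threadWaitRead fd                  -- parks only this green thread
         putStrLn "Got a read ready event"
  doOtherWork                               -- hypothetical: the rest of the program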
Related
I wrote a test program in Haskell on the Raspberry Pi that plays a delightful tune
on a buzzer connected to a GPIO pin.
Here are the imports I used:
import qualified Control.Concurrent as C
import qualified Control.Monad as M
import System.IO
import qualified System.Posix.Unistd as P
Here are the functions that toggle the pin by writing to
the /sys/class/gpio/gpio16/value file:
changePin2 :: Handle -> String -> Int -> IO ()
changePin2 handle onOff delay = do
  pos <- hGetPosn handle
  hPutStr handle (onOff ++ "\n")
  hFlush handle
  hSetPosn pos
  P.usleep delay
  --C.threadDelay delay

blinkOn2 :: Handle -> Int -> IO ()
blinkOn2 handle delay = do
  changePin2 handle "1" delay
  changePin2 handle "0" delay
Finally, here is an example of playing one note with a pause before the next one:
mapM_ (blinkOn2 h) (replicate 26 1908)
P.usleep 50000
-- C.threadDelay 50000
When I first tried it, I used threadDelay and it sounded terrible. It was low pitched,
suggesting the delay was longer than expected and all notes sounded more or less the same.
Using the usleep function improved things considerably.
Finally, adding the -threaded option when compiling with ghc made the sound even cleaner.
ghc -threaded buzzer1t.hs
I do not understand why either of these changes improved it; if anyone knows, it would help greatly.
Googling suggests that usleep and friends are delays at the OS level, whereas threadDelay only pertains to the thread in the Haskell program itself. threadDelay also seems to be the more recommended one and is considered better practice, even though in this case usleep is clearly superior.
I think the documentation is a good start here:
GHC Note: threadDelay is a better choice. Without the -threaded option, usleep will block all other user threads. Even with the -threaded option, usleep requires a full OS thread to itself. threadDelay has neither of these shortcomings.
To expand a bit further: The GHC runtime multiplexes user threads over system threads. The default runtime uses only a single OS thread, regardless of how many user threads there are. Most blocking calls to external code are written such that they deschedule the current Haskell user thread while they're in external code, which is allowed to execute concurrently with Haskell code. This means that even the default runtime with a single OS thread can handle multiple user threads doing IO simultaneously, for instance.
In this world, actually blocking the OS thread is considered a somewhat hostile activity. threadDelay just marks the current thread as not runnable until the specified amount of time has expired. This is much friendlier with the runtime system, as it releases the underlying OS thread.
When you use the threaded runtime, you get multiple OS threads to execute user threads, but it's still somewhat hostile to grab one and not release it. Among other things, it prevents the garbage collector from running (it waits until it can pause all user threads at known safe points, so it doesn't corrupt memory in use concurrently), and OS threads are significantly more memory-heavy than user threads if you add extras to make up for lost concurrency.
So for most software, threadDelay is a much better citizen. But it has downsides. The thread doesn't necessarily resume immediately. It becomes available to be scheduled at the given time, but that doesn't mean it actually runs. That still depends on other threads yielding. That's almost certainly the cause of the trouble you were having - the additional delay waiting to go from runnable to actually running. usleep is around specifically for the cases when that gets in the way. Seems like a fine reason to use it when needed.
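If you want to see the difference for yourself, here is a small measurement sketch (assuming the time package for timestamps; the measure helper is made up) that reports how long each kind of delay really took compared with the requested pause:

import Control.Concurrent (threadDelay)
import qualified System.Posix.Unistd as P (usleep)
import Data.Time.Clock (getCurrentTime, diffUTCTime)

-- Time an IO delay and compare it with the requested pause (in microseconds).
measure :: String -> Int -> (Int -> IO ()) -> IO ()
measure label micros delayWith = do
  start <- getCurrentTime
  delayWith micros
  end <- getCurrentTime
  putStrLn (label ++ ": requested " ++ show micros
            ++ " us, actual " ++ show (diffUTCTime end start))

main :: IO ()
main = do
  measure "threadDelay" 1908 threadDelay
  measure "usleep" 1908 P.usleep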
I'm using the race function from the async package, exported by Control.Concurrent.Async.
The subtasks I fire off using race themselves invoke runInteractiveProcess to run (non-Haskell) executables. The idea is to run different external programs and take the result of the first one to finish. In a sense, Haskell "orchestrates" a bunch of external programs.
What I'm observing is that while race works correctly by killing the "slower" Haskell-level thread, the subprocesses spawned from the slow thread itself continue to run.
I suspect expecting race to kill processes spawned this way is a bit unrealistic, as they probably get orphaned and reparented to init. For my purposes, however, keeping the external processes running defeats the whole purpose of using race in the first place.
Is there an alternative way of using race so the subprocesses created this way get killed as well? While I don't have a use case yet, it'd be best if the entire chain of processes created from the raced tasks get killed; as one can imagine those external programs themselves creating a bunch of processes as well.
As already mentioned in the comments, you could use a combination of onException and terminateProcess.
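For reference, a hedged sketch of that approach using only the process and async packages (the runKillable name and the sleep commands are made up for illustration): when race kills the losing thread, the async exception interrupts waitForProcess and the onException handler terminates the external process.

import Control.Concurrent.Async (race)
import Control.Exception (onException)
import System.Process (createProcess, proc, terminateProcess, waitForProcess)

-- Run an external program; if this thread is killed (e.g. it loses the
-- race), terminate the process instead of leaving it running.
runKillable :: FilePath -> [String] -> IO ()
runKillable cmd args = do
  (_, _, _, ph) <- createProcess (proc cmd args)
  (waitForProcess ph >> return ()) `onException` terminateProcess ph

main :: IO ()
main = do
  result <- race (runKillable "sleep" ["2"]) (runKillable "sleep" ["7"])
  print result

Note that terminateProcess only signals the direct child; killing a whole tree of descendant processes needs extra work (e.g. process groups).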
My process-streaming library (which contains helper functions built on top of process and pipes) already does this. Asynchronous exceptions trigger the termination of the external process.
For example, the following code does not create file toolate.txt.
import qualified Pipes.ByteString as B
import System.Process.Streaming
import Control.Concurrent.Async
main :: IO ()
main =
  do
    result <- race (runProgram prog1) (runProgram prog2)
    putStrLn $ show $ result
  where
    -- collecting stdout and stderr as bytestrings
    runProgram = simpleSafeExecute $ pipeoe $
      separated (surely B.toLazyM) (surely B.toLazyM)
    prog1 = shell "{ echo aaa ; sleep 2 ; }"
    prog2 = shell "{ echo bbb ; sleep 7 ; touch toolate.txt ; }"
The result is:
Left (Right ("aaa\n",""))
I've got a program which uses several threads. As I understand it, when thread 0 exits, the entire program exits, regardless of any other threads which might still be running.
The thing is, these other threads may have files open. Naturally, this is wrapped in exception-handling code which cleanly closes the files in case of a problem. That also means that if I use killThread (which is implemented via throwTo), the file should also be closed before the thread exits.
My question is, if I just let thread 0 exit, without attempting to stop the other threads, will all the various file handles be closed nicely? Does any buffered output get flushed?
In short, can I just exit, or do I need to manually kill threads first?
You can use Control.Concurrent.MVar to achieve this. An MVar is essentially a flag which is either "empty" or "full". A thread that tries to read an empty MVar blocks until it is filled. Wherever you have a thread which performs file IO, create an MVar for it, and pass it that MVar as an argument. Put all the MVars you create into a list:
main = do
    mvars <- sequence (replicate num_of_child_threads newEmptyMVar)
    returnVals <- sequence (zipWith (\m f -> f m)
                                    mvars
                                    (list_of_child_threads :: [MVar () -> IO a]))
Once a child thread has finished all file operations that you are worried about, write to the MVar. Instead of writing killThread you can do
mapM_ takeMVar mvars >> killThread
and wherever your thread would otherwise exit, just take all the MVars first.
See the documentation on GHC concurrency for more details.
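For concreteness, a self-contained sketch of that pattern (the file names and the worker body are made up): each worker signals its MVar only after its handle has been closed, and main takes every MVar before it exits.

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)
import System.IO (IOMode (WriteMode), hPutStrLn, withFile)

-- Write a file, then signal "done". withFile closes (and flushes) the
-- handle even if an exception hits the worker; finally makes sure the
-- signal is sent either way.
worker :: FilePath -> MVar () -> IO ()
worker path done =
  withFile path WriteMode (\h -> hPutStrLn h "some output")
    `finally` putMVar done ()

main :: IO ()
main = do
  let paths = ["out1.txt", "out2.txt", "out3.txt"]   -- made-up file names
  dones <- mapM (const newEmptyMVar) paths
  mapM_ (forkIO . uncurry worker) (zip paths dones)
  mapM_ takeMVar dones   -- block until every worker has signalled
  putStrLn "all workers finished; safe to exit"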
From my testing, I have discovered a few things:
exitFailure and friends only work in thread 0. (The documentation actually says so, if you go to the trouble of reading it. These functions just throw exceptions, which are silently ignored in other threads.)
If an exception kills your thread, or your whole program, any open handles are not flushed. This is excruciatingly annoying when you're desperately trying to figure out exactly where your program crashed!
So it appears that if you want your stuff flushed before the program exits, you have to implement it yourself. Just letting thread 0 die doesn't flush anything and doesn't throw any exception; it just silently terminates all threads without running their exception handlers.
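A small sketch that demonstrates the problem (the file name is made up): the worker's output sits in the handle's buffer, and when main exits first, nothing is flushed. Uncommenting the hFlush line (or closing the handle in the worker) fixes it.

import Control.Concurrent (forkIO, threadDelay)
import System.IO (IOMode (WriteMode), hFlush, hPutStrLn, openFile)

main :: IO ()
main = do
  h <- openFile "demo-output.txt" WriteMode   -- made-up file name
  _ <- forkIO $ do
         hPutStrLn h "this line is probably lost"   -- stays in the buffer
         -- hFlush h                                -- would force it to disk
         threadDelay 1000000
  threadDelay 100000   -- main exits long before the worker finishes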
hGetContents returns a lazy String object that can be used in purely functional code to read from a file handle. If an I/O exception occurs while reading this lazy string, the underlying file handle is closed silently and no additional characters are added to the lazy string.
How can this I/O exception be detected?
As a concrete example, consider the following program:
import System.IO -- for stdin
lengthOfFirstLine :: String -> Int
lengthOfFirstLine "" = 0
lengthOfFirstLine s = (length . head . lines) s
main :: IO ()
main = do
  lazyStdin <- hGetContents stdin
  print (lengthOfFirstLine lazyStdin)
If an exception occurs while reading the first line of the file, this program will print the number of characters until the I/O exception occurs. Instead I want the program to crash with the appropriate I/O exception. How could this program be modified to have that behavior?
Edit: Upon closer inspection of the hGetContents implementation, it appears that the I/O exception is not ignored but rather bubbles up through the calling pure functional code to whatever IO code happened to trigger evaluation, which then has the opportunity to handle it. (I was not previously aware that pure functional code could raise exceptions.) Thus this question rests on a misunderstanding.
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
Lazy IO is considered a pitfall by many Haskellers, and the usual advice is to stay away from it. Your case colorfully describes why.
There is a strict (non-lazy) alternative to the hGetContents function. It works on Text, but Text is generally the preferred alternative to String anyway. For convenience, there are modern preludes that replace String with Text: basic-prelude and classy-prelude.
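A hedged sketch of the same example using the strict Data.Text.IO.hGetContents from the text package; because the whole input is read before the call returns, any I/O error surfaces right there:

import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (stdin)

lengthOfFirstLine :: T.Text -> Int
lengthOfFirstLine t = case T.lines t of
  []         -> 0
  (line : _) -> T.length line

main :: IO ()
main = do
  contents <- TIO.hGetContents stdin   -- strict: reads everything now
  print (lengthOfFirstLine contents)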
Aside: It would be best if this exceptional behavior were verified empirically. Unfortunately it is difficult to simulate a low level I/O error.
I was wondering about the same thing, found this old question, and decided to perform an experiment.
I ran this little program on Windows; it listens for a connection and reads from it lazily:
import System.IO
import Network
import Control.Concurrent
main :: IO ()
main = withSocketsDo (do
  socket <- listenOn (PortNumber 19999)
  print "created socket"
  (h,_,_) <- accept socket
  print "accepted connection"
  contents <- hGetContents h
  print contents)
From a Linux machine, I opened a connection using nc:
nc -v mymachine 19999
Connection to mymachine 19999 port [tcp/*] succeeded!
And then used the Windows Sysinternals TCPView utility to forcibly close the connection. The result was:
Main.exe: <socket: 348>: hGetContents: failed (Unknown error)
It appears that I/O exceptions do bubble up.
A further experiment: I added a delay just after the hGetContents call:
  ...
  contents <- hGetContents h
  threadDelay (60 * 1000^2)
  print contents)
With this change, killing the connection doesn't immediately raise an exception because, thanks to lazy I/O, nothing is actually read until print executes.
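Relatedly, if you want the error to surface at a known point instead of wherever the lazy string happens to be forced, you can force it yourself; a sketch using evaluate from Control.Exception (the readAllNow name is made up):

import Control.Exception (evaluate)
import System.IO (Handle, hGetContents)

-- Force the whole lazy string immediately, so any I/O error from the
-- handle is raised here rather than deep inside later pure code.
readAllNow :: Handle -> IO String
readAllNow h = do
  contents <- hGetContents h
  _ <- evaluate (length contents)   -- reads every character right away
  return contents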
Consider the following code snippet
import qualified Foreign.Concurrent
import Foreign.Ptr (nullPtr)
main :: IO ()
main = do
  putStrLn "start"
  a <- Foreign.Concurrent.newForeignPtr nullPtr $
    putStrLn "a was deleted"
  putStrLn "end"
It produces the following output:
start
end
I would have expected to see "a was deleted" somewhere after "start".
I don't know what's going on. I have a few guesses:
The garbage collector doesn't collect remaining objects when the program finishes
putStrLn stops working after main finishes. (btw I tried same thing with foreignly imported puts and got the same results)
My understanding of ForeignPtr is lacking
GHC bug? (env: GHC 6.10.3, Intel Mac)
When using Foreign.ForeignPtr.newForeignPtr instead of Foreign.Concurrent.newForeignPtr it seems to work:
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, newCString)
import Foreign.ForeignPtr (newForeignPtr)
import Foreign.Ptr (FunPtr)
foreign import ccall "&puts" puts :: FunPtr (CString -> IO ())
main :: IO ()
main = do
  putStrLn "start"
  message <- newCString "a was \"deleted\""
  a <- newForeignPtr puts message
  putStrLn "end"
outputs:
start
end
a was "deleted"
From the documentation of Foreign.ForeignPtr.newForeignPtr:
Note that there is no guarantee on how soon the finaliser is executed after the last reference was dropped; this depends on the details of the Haskell storage manager. Indeed, there is no guarantee that the finalizer is executed at all; a program may exit with finalizers outstanding.
So you're running into undefined behaviour: i.e., anything can happen, and it may change from platform to platform (as we saw under Windows) or release to release.
The cause of the difference in behaviour you're seeing between the two functions may be hinted at by the documentation for Foreign.Concurrent.newForeignPtr:
These finalizers necessarily run in a separate thread...
If the finalizers for the Foreign.ForeignPtr version of the function use the main thread, but the Foreign.Concurrent ones use a separate thread, it could well be that the main thread shuts down without waiting for other threads to complete their work, so the other threads never get to run the finalizers.
Of course, the docs for the Foreign.Concurrent version do claim,
The only guarantee is that the finalizer runs before the program terminates.
I'm not sure that they actually ought to be claiming this, since if the finalizers are running in other threads, they can take an arbitrary amount of time to do their work (even block forever), and thus the main thread would never be able to force the program to exit. That would conflict with this from Control.Concurrent:
In a standalone GHC program, only the main thread is required to terminate in order for the process to terminate. Thus all other forked threads will simply terminate at the same time as the main thread (the terminology for this kind of behaviour is "daemonic threads").
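If you need a particular finalizer to run before the program ends, the safest bet is to run it explicitly; a sketch of the original example with finalizeForeignPtr from Foreign.ForeignPtr added at the end of main (which runs the finalizer immediately):

import qualified Foreign.Concurrent
import Foreign.ForeignPtr (finalizeForeignPtr)
import Foreign.Ptr (nullPtr)

main :: IO ()
main = do
  putStrLn "start"
  a <- Foreign.Concurrent.newForeignPtr nullPtr $
    putStrLn "a was deleted"
  putStrLn "end"
  finalizeForeignPtr a   -- run the finalizer now instead of relying on GC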