Haskell: why is usleep + threaded compile option more precise than threadDelay?

I wrote a test program in Haskell on the Raspberry Pi that plays a delightful tune
on a buzzer connected to a GPIO pin.
Here are the imports I used:
import qualified Control.Concurrent as C
import qualified Control.Monad as M
import System.IO
import qualified System.Posix.Unistd as P
Here are the functions that toggle the pin by writing to
the /sys/class/gpio/gpio16/value file:
changePin2 :: Handle -> String -> Int -> IO ()
changePin2 handle onOff delay = do
  pos <- hGetPosn handle
  hPutStr handle (onOff ++ "\n")
  hFlush handle
  hSetPosn pos
  P.usleep delay
  -- C.threadDelay delay
blinkOn2 :: Handle -> Int -> IO ()
blinkOn2 handle delay = do
  changePin2 handle "1" delay
  changePin2 handle "0" delay
Finally, here is an example of playing one note with a pause before the next one:
mapM_ (blinkOn2 h) (replicate 26 1908)
P.usleep 50000
-- C.threadDelay 50000
When I first tried it, I used threadDelay and it sounded terrible. It was low-pitched, suggesting the delays were longer than expected, and all notes sounded more or less the same.
Using the usleep function improved things considerably.
Finally, adding the -threaded option when compiling with ghc made the sound even cleaner.
ghc -threaded buzzer1t.hs
I do not understand why either of these changes improved things, and an explanation would help greatly.
Googling suggests that usleep and friends are delays at the OS level, whereas threadDelay pertains only to the thread in the Haskell program itself. threadDelay also seems to be the more commonly recommended option and considered better practice, even though in this case usleep is clearly superior.

I think the documentation is a good start here:
GHC Note: threadDelay is a better choice. Without the -threaded option, usleep will block all other user threads. Even with the -threaded option, usleep requires a full OS thread to itself. threadDelay has neither of these shortcomings.
To expand a bit further: The GHC runtime multiplexes user threads over system threads. The default runtime uses only a single OS thread, regardless of how many user threads there are. Most blocking calls to external code are written such that they deschedule the current Haskell user thread while they're in external code, which is allowed to execute concurrently with Haskell code. This means that even the default runtime with a single OS thread can handle multiple user threads doing IO simultaneously, for instance.
In this world, actually blocking the OS thread is considered a somewhat hostile activity. threadDelay just marks the current thread as not runnable until the specified amount of time has expired. This is much friendlier with the runtime system, as it releases the underlying OS thread.
When you use the threaded runtime, you get multiple OS threads to execute user threads, but it's still somewhat hostile to grab one and not release it. Among other things, it prevents the garbage collector from running (it waits until it can pause all user threads at known safe points, so it doesn't corrupt memory in use concurrently), and OS threads are significantly more memory-heavy than user threads if you add extras to make up for lost concurrency.
So for most software, threadDelay is a much better citizen. But it has downsides. The thread doesn't necessarily resume immediately. It becomes available to be scheduled at the given time, but that doesn't mean it actually runs. That still depends on other threads yielding. That's almost certainly the cause of the trouble you were having - the additional delay waiting to go from runnable to actually running. usleep is around specifically for the cases when that gets in the way. Seems like a fine reason to use it when needed.
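To make that visible, here is a minimal sketch (assuming GHC 8.4+ for GHC.Clock) that measures how far past the requested delay threadDelay actually resumes; comparing runs with and without -threaded should show the extra runnable-to-running latency described above:
import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import GHC.Clock (getMonotonicTimeNSec)

main :: IO ()
main = forM_ [1 .. 10 :: Int] $ \_ -> do
  t0 <- getMonotonicTimeNSec
  threadDelay 1908                 -- request ~1.9 ms, as in the buzzer code
  t1 <- getMonotonicTimeNSec
  -- Elapsed time in microseconds; anything beyond 1908 is scheduling
  -- overshoot, the kind of extra delay that lowered the pitch.
  print ((t1 - t0) `div` 1000)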

Related

Will Go's scheduler yield control from one goroutine to another for CPU-intensive work?

The accepted answer at "golang methods that will yield goroutines" explains that Go's scheduler will yield control from one goroutine to another when a syscall is encountered. I understand that this means if you have multiple goroutines running, and one begins to wait for something like an HTTP response, the scheduler can use this as a hint to yield control from that goroutine to another.
But what about situations where there are no syscalls involved? What if, for example, you had as many goroutines running as logical CPU cores/threads available, and each were in the middle of a CPU-intensive calculation that involved no syscalls. In theory, this would saturate the CPU's ability to do work. Would the Go scheduler still be able to detect an opportunity to yield control from one of these goroutines to another, that perhaps wouldn't take as long to run, and then return control back to one of these goroutines performing the long CPU-intensive calculation?
There are few if any promises here.
The Go 1.14 release notes say this in the Runtime section:
Goroutines are now asynchronously preemptible. As a result, loops without function calls no longer potentially deadlock the scheduler or significantly delay garbage collection. This is supported on all platforms except windows/arm, darwin/arm, js/wasm, and plan9/*.
A consequence of the implementation of preemption is that on Unix systems, including Linux and macOS systems, programs built with Go 1.14 will receive more signals than programs built with earlier releases. This means that programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors. ...
I quoted part of the third paragraph here because this gives us a big clue as to how this asynchronous preemption works: the runtime system has the OS deliver some OS signal (SIGALRM, SIGVTALRM, etc.) on some sort of schedule (real or virtual time). This allows the Go runtime to implement the same kind of schedulers that real OSes implement with real (hardware) or virtual (virtualized hardware) timers. As with OS schedulers, it's up to the runtime to decide what to do with the clock ticks: perhaps just run the GC code, for instance.
We also see a list of platforms that don't do it. So we probably should not assume it will happen at all.
Fortunately, the runtime source is actually available: we can go look and see what happens, should any given platform implement it. In runtime/signal_unix.go we find:
// We use SIGURG because it meets all of these criteria, is extremely
// unlikely to be used by an application for its "real" meaning (both
// because out-of-band data is basically unused and because SIGURG
// doesn't report which socket has the condition, making it pretty
// useless), and even if it is, the application has to be ready for
// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
// likely to be used for real.
const sigPreempt = _SIGURG
and:
// doSigPreempt handles a preemption signal on gp.
func doSigPreempt(gp *g, ctxt *sigctxt) {
    // Check if this G wants to be preempted and is safe to
    // preempt.
    if wantAsyncPreempt(gp) && isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()) {
        // Inject a call to asyncPreempt.
        ctxt.pushCall(funcPC(asyncPreempt))
    }
    // Acknowledge the preemption.
    atomic.Xadd(&gp.m.preemptGen, 1)
    atomic.Store(&gp.m.signalPending, 0)
}
The actual asyncPreempt function is in assembly, but it just does some assembly-only trickery to save user registers, and then calls asyncPreempt2, which is in runtime/preempt.go:
//go:nosplit
func asyncPreempt2() {
    gp := getg()
    gp.asyncSafePoint = true
    if gp.preemptStop {
        mcall(preemptPark)
    } else {
        mcall(gopreempt_m)
    }
    gp.asyncSafePoint = false
}
Compare this to runtime/proc.go's Gosched function (documented as the way to voluntarily yield):
//go:nosplit
// Gosched yields the processor, allowing other goroutines to run. It does not
// suspend the current goroutine, so execution resumes automatically.
func Gosched() {
    checkTimeouts()
    mcall(gosched_m)
}
We see the main differences include some "async safe point" stuff and that we arrange for an M-stack-call to gopreempt_m instead of gosched_m. So, apart from the safety check stuff and a different trace call (not shown here) the involuntary preemption is almost exactly the same as voluntary preemption.
To find this, we had to dig rather deep into the (Go 1.14, in this case) implementation. One might not want to depend too much on this.
A little bit more on this to complement @torek's answer.
Goroutines are interruptible when there is a syscall, but also when a goroutine is waiting on a lock or a channel, or is sleeping.
As @torek said, since Go 1.14 goroutines can also be preempted even when they do none of the above: the scheduler can mark any goroutine as preemptible after it has run for more than 10 ms.
More reading here: https://medium.com/a-journey-with-go/go-goroutine-and-preemption-d6bc2aa2f4b7

Haskell - wait for the first event (no busy waiting) [duplicate]

How could I watch several files/sockets from Haskell and wait for these to become readable/writable?
Is there anything like select/epoll/... in Haskell? Or am I forced to spawn one thread per file/socket and always use the blocking resource from within that thread?
The question is wrong: you aren't forced to spawn one thread per file/socket and use blocking calls, you get to spawn one thread per file/socket and use blocking calls. This is the cleanest solution (in any language); the only reason to avoid it in other languages is that it's a bit inefficient there. GHC's threads are cheap enough, however, that it is not inefficient in Haskell. (Additionally, behind the scenes, GHC's IO manager uses an epoll-alike to wake up threads as appropriate.)
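As an illustration, here is a sketch of that thread-per-descriptor style (watchHandles is a hypothetical helper, and error handling such as EOF is omitted):
import Control.Concurrent (forkIO)
import Control.Monad (forM_, forever)
import System.IO (Handle, hGetLine)

-- One cheap GHC thread per handle, each making plain blocking reads.
-- The runtime's IO manager multiplexes them over epoll (or the platform
-- equivalent), so no OS thread is held hostage while waiting.
watchHandles :: [Handle] -> IO ()
watchHandles hs = forM_ hs $ \h ->
  forkIO $ forever $ do
    line <- hGetLine h
    putStrLn ("got: " ++ line)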
There's a wrapper for select(2): https://hackage.haskell.org/package/select
Example usage here: https://github.com/pxqr/udev/blob/master/examples/monitor.hs#L36
There's a wrapper for poll(2):
https://hackage.haskell.org/package/poll
GHC's base library comes with functionality that wraps epoll on Linux (and equivalents on other platforms) in the GHC.Event module.
Example usage:
import GHC.Event
import Data.Maybe (fromMaybe)
import Control.Concurrent (threadDelay)

main = do
  fd <- getSomeFileDescriptorOfInterest
  mgr <- fromMaybe (error "Must be compiled with -threaded") <$> getSystemEventManager
  registerFd mgr (\fdkey event -> print event) fd evtRead OneShot
  threadDelay 100000000
More documentation at http://hackage.haskell.org/package/base-4.11.1.0/docs/GHC-Event.html
Example use of an older version of the lib at https://wiki.haskell.org/Simple_Servers#Epoll-based_event_callbacks
However, the loop in that example has since been moved to the hidden module GHC.Event.Manager, and is not exported publicly as far as I can tell. GHC.Event itself says "This module should be considered GHC internal."
In Control.Concurrent there's threadWaitRead and threadWaitWrite.
So, to translate the above epoll example:
import Control.Concurrent (threadWaitRead)

main = do
  fd <- getSomeFileDescriptorOfInterest
  threadWaitRead fd
  putStrLn "Got a read ready event"
You can wrap the threadWaitRead and subsequent IO action in Control.Monad.forever to run them repeatedly. You can also wrap the thing in forkIO to run it in the background while your program does something else.
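For example, a sketch of that combination (watchReadable is a hypothetical helper, not a library function):
import Control.Concurrent (forkIO, threadWaitRead)
import Control.Monad (forever)
import System.Posix.Types (Fd)

-- Runs the handler every time fd becomes readable, from a background
-- thread, while the caller goes on to do something else.
watchReadable :: Fd -> IO () -> IO ()
watchReadable fd handler = do
  _ <- forkIO $ forever $ do
         threadWaitRead fd
         handler
  return ()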

What happens when an Async value is garbage-collected?

Well... apparently, nothing! If I try
Prelude Control.Concurrent.Async Data.List> do {_ <- async $ return $! foldl' (+) 0 [0,0.1 .. 1e+8 :: Double]; print "Async is lost!"}
"Async is lost!"
one processor core starts going wild for a while, but the interface stays normal. Evidently the thread is started and simply runs as long as there is something to do.
But (efficiency aside), is that in principle ok, or must Asyncs always be either cancelled or waited for? Does something break because there just isn't a way to read the result anymore? And does the GC properly clean up everything? Will perhaps the thread in fact be stopped, and that just doesn't happen yet when I try it (for lack of memory pressure)? Does the thread even properly "end" at all, simply when the forkIOed action comes to an end?
I'm quite uncertain about this concurrency stuff. Perhaps I'm still thinking too much in a C++ way about this. RAII / deterministic garbage collection certainly make you feel a bit better cared for in such regards...
Internally, an Async is just a Haskell thread that writes to an STM TMVar when finished. A cancel is just sending the Haskell thread a kill signal. In Haskell, you don't need to explicitly kill threads. If the Async itself can be garbage collected, the thread will still run to its end, and then everything will be properly cleaned up. However, if the Async ends in an exception, then wait will propagate the exception to the waiting thread. If you don't wait, you'll never know that the exception happened.
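To illustrate the last point, here is a small sketch (using the async package) of how an exception stays invisible until you wait for it:
import Control.Concurrent.Async (async, wait)
import Control.Exception (SomeException, evaluate, try)

main :: IO ()
main = do
  a <- async (evaluate (error "boom" :: Int))
  -- Nothing visible has gone wrong yet; the exception is stored inside
  -- the Async. Only wait (wrapped in try here) rethrows it to us.
  r <- try (wait a) :: IO (Either SomeException Int)
  print r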

Is the random number generator in Haskell thread-safe?

Is the same "global random number generator" shared across all threads, or does each thread get its own?
If one is shared, how can I ensure thread-safety? The approach using getStdGen and setStdGen described in the "Monads" chapter of Real World Haskell doesn't look safe.
If each thread has an independent generator, will the generators for two threads started in rapid succession have different seeds? (They won't, for example, if the seed is a time in seconds, but milliseconds might be OK. I don't see how to get a time with millisecond resolution from Data.Time.)
There is a function named newStdGen, which gives you a new standard generator every time it's called. Its implementation uses atomicModifyIORef and is thus thread-safe.
newStdGen is better than get/setStdGen not only in terms of thread-safety, but also because it guards you from potential single-threaded bugs like this: let rnd = (fst . randomR (1,5)) <$> getStdGen in (==) <$> rnd <*> rnd.
In addition, if you think about the semantics of newStdGen versus getStdGen/setStdGen, the former can be very simple: you just get a new generator in a random state, chosen non-deterministically. With the get/set pair, on the other hand, you can't abstract away the global program state, which is bad for multiple reasons.
I would suggest using getStdGen only once (in the main thread) and then using the split function to generate new generators. I would do it like this:
Make an MVar that contains the generator. Whenever a thread needs a new generator, it takes the current value out of the MVar, calls split, and puts one of the new generators back. Thanks to the semantics of MVar, this is thread-safe.
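A minimal sketch of that scheme (splitGenFromMVar is a hypothetical name, not a library function):
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar)
import System.Random (StdGen, getStdGen, split)

-- Keeps one half of the split in the MVar and hands the other half to
-- the caller; modifyMVar makes the take-split-put sequence atomic.
splitGenFromMVar :: MVar StdGen -> IO StdGen
splitGenFromMVar var = modifyMVar var (return . split)

main :: IO ()
main = do
  var <- getStdGen >>= newMVar
  g1 <- splitGenFromMVar var   -- safe to call from any thread
  g2 <- splitGenFromMVar var   -- g1 and g2 are independent generators
  print (g1, g2)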
By themselves, getStdGen and setStdGen are not thread-safe in a certain sense. Suppose two threads both perform this action:
do ...
   g <- getStdGen
   (v, g') <- someRandOperation g
   setStdGen g'
It is possible for both threads to run the g <- getStdGen line before either reaches setStdGen, so they could both get the exact same generator. (Am I wrong?)
If they both grab the same version of the generator, and use it in the same function, they will get the same "random" result. So you do need to be a little more careful when dealing with random number generation and multithreading. There are many solutions; one that comes to mind is to have a single dedicated random number generator thread that produces a stream of random numbers which other threads could consume in a thread-safe way. Putting the generator in an MVar, as FUZxxl suggests, is probably the simplest and most straightforward solution.
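A quick sketch of that dedicated-thread idea (note the Chan is unbounded, so a real program would want some form of backpressure):
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (replicateM)
import System.Random (getStdGen, randoms)

main :: IO ()
main = do
  ch <- newChan
  g  <- getStdGen
  -- The producer thread owns the generator outright and feeds a stream
  -- of values into the channel; readChan is safe from any thread.
  _ <- forkIO (mapM_ (writeChan ch) (randoms g :: [Double]))
  xs <- replicateM 3 (readChan ch)
  print xs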
Of course I would encourage you to inspect your code and make sure it is necessary to generate random numbers in more than one thread.
You can use split as in FUZxxl's answer. However, instead of using an MVar, whenever you call forkIO, just have your IO action for the forked thread close over one of the resulting generators, and leave the other one with the original thread. This way each thread has its own generator.
As Dan Burton said, do inspect your code and see if you really need RNG in multiple threads.
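For instance (forkWithGen is a hypothetical helper, not part of any library):
import Control.Concurrent (ThreadId, forkIO)
import System.Random (StdGen, split)

-- Split the generator at fork time: the child closes over one half and
-- the caller keeps the other, so no shared mutable state is needed.
forkWithGen :: (StdGen -> IO ()) -> StdGen -> IO (ThreadId, StdGen)
forkWithGen act g = do
  let (gChild, gParent) = split g
  tid <- forkIO (act gChild)
  return (tid, gParent)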

Haskell FFI: ForeignPtr seems not to get freed (maybe a GHC bug?)

Consider the following code snippet:
import qualified Foreign.Concurrent
import Foreign.Ptr (nullPtr)
main :: IO ()
main = do
  putStrLn "start"
  a <- Foreign.Concurrent.newForeignPtr nullPtr $
         putStrLn "a was deleted"
  putStrLn "end"
It produces the following output:
start
end
I would have expected to see "a was deleted" somewhere after "start".
I don't know what's going on. I have a few guesses:
The garbage collector doesn't collect remaining objects when the program finishes
putStrLn stops working after main finishes. (By the way, I tried the same thing with a foreign-imported puts and got the same results.)
My understanding of ForeignPtr is lacking
GHC bug? (env: GHC 6.10.3, Intel Mac)
When using Foreign.ForeignPtr.newForeignPtr instead of Foreign.Concurrent.newForeignPtr it seems to work:
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, newCString)
import Foreign.ForeignPtr (newForeignPtr)
import Foreign.Ptr (FunPtr)
foreign import ccall "&puts" puts :: FunPtr (CString -> IO ())
main :: IO ()
main = do
  putStrLn "start"
  message <- newCString "a was \"deleted\""
  a <- newForeignPtr puts message
  putStrLn "end"
outputs:
start
end
a was "deleted"
From the documentation of Foreign.ForeignPtr.newForeignPtr:
Note that there is no guarantee on how soon the finaliser is executed after the last reference was dropped; this depends on the details of the Haskell storage manager. Indeed, there is no guarantee that the finalizer is executed at all; a program may exit with finalizers outstanding.
So you're running into undefined behaviour: i.e., anything can happen, and it may change from platform to platform (as we saw under Windows) or release to release.
The cause of the difference in behaviour you're seeing between the two functions may be hinted at by the documentation for Foreign.Concurrent.newForeignPtr:
These finalizers necessarily run in a separate thread...
If the finalizers for the Foreign.ForeignPtr version of the function use the main thread, but the Foreign.Concurrent ones use a separate thread, it could well be that the main thread shuts down without waiting for other threads to complete their work, so the other threads never get to run the finalization.
Of course, the docs for the Foreign.Concurrent version do claim,
The only guarantee is that the finalizer runs before the program terminates.
I'm not sure that they actually ought to be claiming this, since if the finalizers are running in other threads, they can take an arbitrary amount of time to do their work (even block forever), and thus the main thread would never be able to force the program to exit. That would conflict with this from Control.Concurrent:
In a standalone GHC program, only the main thread is required to terminate in order for the process to terminate. Thus all other forked threads will simply terminate at the same time as the main thread (the terminology for this kind of behaviour is "daemonic threads").
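Given that daemonic behaviour, one workaround (a sketch with no guarantee from the docs, though it typically works in practice) is to trigger a collection and pause briefly before main returns:
import qualified Foreign.Concurrent
import Foreign.Ptr (nullPtr)
import Control.Concurrent (threadDelay)
import System.Mem (performGC)

main :: IO ()
main = do
  putStrLn "start"
  _ <- Foreign.Concurrent.newForeignPtr nullPtr (putStrLn "a was deleted")
  putStrLn "end"
  -- Force a major GC so the dead ForeignPtr is noticed, then give the
  -- finalizer thread a moment to print before main exits.
  performGC
  threadDelay 100000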
