Can ghci reorder IO actions within unsafePerformIO blocks - haskell

Can IO actions in IO blocks called within unsafePerformIO be reordered?
I have, effectively, the following IO function.
assembleInsts :: ... -> IO S.ByteString
assembleInsts ... = do
    tmpInputFile <- generateUniqueTmpFile
    writeFile tmpInputFile str
    (ec,out,err) <- readProcessWithExitCode asm_exe [tmpInputFile] ""
    -- asm generates binary output in tmpOutputFile
    removeFile tmpInputFile
    let tmpOutputFile = replaceExtension tmpInputFile "bits" -- the assembler creates this
    bs <- S.readFile tmpOutputFile -- fails because tmpOutputFile does not exist
    removeFile tmpOutputFile
    return bs
where S.ByteString is a strict byte string.
Sadly, I need to call this in a tree of pure code far from the IO monad,
but since the assembler behaves as a referentially transparent tool
(given unique files), I figured I could make an unsafe interface
for the time being.
{-# NOINLINE assembleInstsUnsafe #-}
assembleInstsUnsafe :: ... -> S.ByteString
assembleInstsUnsafe args = unsafePerformIO (assembleInsts args)
In addition, I added the following annotation at the top of the module,
as per the documentation's (System.IO.Unsafe's) instructions.
{-# OPTIONS -fno-cse #-}
module Gen.IsaAsm where
(I tried to add -fnofull-laziness as well, as per a reference I consulted,
but the compiler rejected that flag. I don't think that case applies
here, though.)
Running it in ghci reports the following error.
*** Exception: C:\Users\trbauer\AppData\Local\Temp\tempfile_13516_0.dat: openBinaryFile: does not exist (No such file or directory)
But if I remove removeFile tmpOutputFile, then it magically works.
Hence, it seems like the removeFile is executing ahead of the process termination.
Is this possible? The bytestring is strict, and I even tried to force the output at one point by adding
S.length bs `seq` return ()
before the removeFile.
Is there a way to dump intermediate code to find out what's going on?
(Maybe I can trace this with Process Monitor or something to find out.)
Unfortunately, I'd like to clean up within this operation (remove the file).
I think the exe version might work, but under ghci it fails (interpreted).
I am using GHC 7.6.3 from the latest Haskell Platform.
I know unsafePerformIO is a really big hammer and has other risks associated with it, but it would really limit the complexity of my software change.

This may not be applicable, since it is based on assumptions unspecified in your question. In particular, this answer is based on the following two assumptions. S, which is unspecified, is Data.ByteString.Lazy and tmpDatFile, which is undefined, is tmpOutputFile.
import qualified Data.ByteString.Lazy as S
...
let tmpDatFile = tmpOutputFile
Possible Cause
If these assumptions are true, removeFile will run too early, even without the use of unsafePerformIO. The following code
import System.Directory
import qualified Data.ByteString.Lazy as S

assembleInsts = do
    -- prepare a file, like asm might have generated
    let tmpOutputFile = "dataFile.txt"
    writeFile tmpOutputFile "a bit of text"
    -- read the prepared file
    let tmpDatFile = tmpOutputFile
    bs <- S.readFile tmpOutputFile
    removeFile tmpDatFile
    return bs

main = do
    bs <- assembleInsts
    print bs
Results in the error
lazyIOfail.hs: DeleteFile "dataFile.txt": permission denied (The process cannot access the file because it is being used by another process.)
Removing the line removeFile tmpDatFile will make this code execute correctly, just like you describe, but leaving behind the temporary file isn't what is desired.
Possible Solution
Changing the import S to
import qualified Data.ByteString as S
instead results in the correct output,
"a bit of text".
Explanation
The documentation for Data.ByteString.Lazy's readFile states that it will
Read an entire file lazily into a ByteString. The Handle will be held open until EOF is encountered.
Internally, readFile accomplishes this by calling unsafeInterleaveIO, which defers execution of the IO code until the term it returns is evaluated.
hGetContentsN :: Int -> Handle -> IO ByteString
hGetContentsN k h = lazyRead -- TODO close on exceptions
  where
    lazyRead = unsafeInterleaveIO loop
    loop = do
        c <- S.hGetSome h k -- only blocks if there is no data available
        if S.null c
          then do hClose h >> return Empty
          else do cs <- lazyRead
                  return (Chunk c cs)
Because nothing examines the constructor of the bs defined in the example above until it is printed, which doesn't happen until after removeFile has been executed, no chunks are read from the file (and the file is not closed) before removeFile runs. Therefore, when removeFile is executed, the Handle opened by readFile is still open, and the file can't be removed.
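Alternatively, if lazy reading is desired, forcing the whole ByteString before removing the file also works, because readFile closes the Handle once EOF is reached. A sketch, under the same assumptions as above (evaluate comes from Control.Exception):
import Control.Exception (evaluate)
...
    bs <- S.readFile tmpOutputFile
    _ <- evaluate (S.length bs) -- forces every chunk to be read, so the Handle is closed at EOF
    removeFile tmpDatFile
    return bs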

Even if you are using unsafePerformIO, IO actions should not be reordered. If you want to be sure of that, you can use the -ddump-simpl flag to see the intermediate Core that GHC produces, or even one of the other -ddump-* flags showing all the intermediate compilation steps, down to assembly.
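For example, to dump the Core for the module in the question (the file path here is assumed from the module name Gen.IsaAsm):
ghc -O -fforce-recomp -ddump-simpl Gen/IsaAsm.hs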
I am aware that this answers what you asked and not what you actually need, but this way you can at least rule out GHC bugs. It seems unlikely there's a bug in GHC affecting this, though.

Totally my fault... sorry everyone. GHC does not reorder IO actions in an IO block under the conditions stated above, as mentioned by those above. The assembler was just failing to assemble the output and create the expected file. I simply forgot to check the exit code or the error stream of the assembler. I assumed the input to be syntactically correct since it is generated; the assembler rejected it and simply failed to create the file. It gave a valid error code and error diagnostic too, so that was really bad on my part. I may have been using readProcess the first time around, which raises an exception on a non-zero exit, but must have eventually changed this. I think the assembler had a bug where it didn't correctly indicate a failing exit code in some cases, and I had to change to readProcessWithExitCode because of it.
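For anyone else who lands here: a minimal guard after readProcessWithExitCode (a sketch reusing the names from the question) would have caught this immediately:
import System.Exit (ExitCode(..))
...
    (ec, out, err) <- readProcessWithExitCode asm_exe [tmpInputFile] ""
    case ec of
      ExitSuccess   -> return ()
      ExitFailure n -> ioError (userError ("assembler failed with exit code " ++ show n ++ ": " ++ err))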
I am still not sure why the error went away when I elided the removeFile.
I thought about deleting the question, but I am hoping the suggestions above will help others debug similar (more valid) problems as well. I've been burned by the lazy IO issue Cirdec mentioned, and the -ddump-simpl flag mentioned by chi is good to know as well.

Related

Why doesn't hSetBuffering return a new handle instead of changing the given handle?

In Haskell we try to write most of our code in an immutable way: instead of changing variables or passed parameters, we create a new value from the old one with the required changes.
main = do
    withFile "something.txt" ReadMode (\handle -> do
        hSetBuffering handle $ BlockBuffering (Just 2048)
        contents <- hGetContents handle
        putStr contents)
So why does hSetBuffering, a function that takes a handle and sets its buffering mode, change the passed handle itself instead of returning a new handle with the required buffering mode?
With regular Haskell values there is no problem keeping older versions of a value around. However, Handles are references to mutable resources allocated by the operating system, and they carry state. After calling a version of hSetBuffering that returned a new Handle, what should happen to earlier versions of the Handle that are still kept around? Should they reflect the change? If the answer is yes, then the new-handle-returning version of hSetBuffering is a bit of a lie.
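A small example makes the aliasing concrete: any copy of a Handle refers to the same OS-level resource, so a hypothetical "new" Handle and the original would observe each other's changes (a minimal sketch; the file name is arbitrary):
import System.IO

main :: IO ()
main = do
    h <- openFile "something.txt" ReadMode
    let h' = h -- an "old version" of the handle, kept around
    hSetBuffering h (BlockBuffering (Just 2048))
    mode <- hGetBuffering h' -- the old alias reflects the change
    print mode -- prints BlockBuffering (Just 2048)
    hClose h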
This new-handle-returning version of hSetBuffering could work if the type system somehow disallowed keeping old versions of the Handle after calling the function. It could do that by enforcing a constraint: functions that receive a Handle as parameter can only use that parameter one single time, and functions that "duplicate" handles like dup :: Handle -> (Handle,Handle) are disallowed.
There is a (not yet accepted) proposal to extend Haskell with the ability to enforce such restrictions. In fact, file operations are one of the motivating examples. From section 2.3 of the paper:
type File
openFile :: FilePath → IOL 1 File
readLine :: File ⊸ IOL 1 (File,Unrestricted ByteString)
closeFile :: File ⊸ IOL ω ()
Under this proposal, we can only have a single version of a File around at any given time. closeFile makes the reference to File unavailable so that we can't close an already closed file. Every read operation takes the previous version of the File and returns a new one along with the read data. And hSetBuffering would have a type like:
hSetBuffering :: BufferingMode -> File ⊸ IOL 1 File
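Using a file then means threading the File value through every operation. The following is a hypothetical sketch in the paper's notation (not something GHC accepts today), just to show the shape of such code:
readFirstLine path = do
    f <- openFile path
    (f', Unrestricted line) <- readLine f -- f is consumed; f' is now the only live handle
    closeFile f' -- f' is consumed; no stale reference remains
    return line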

How can I make GHCI release memory

The introduction
The following code shows that, when using runhaskell, the Haskell garbage collector releases the memory when a is no longer used. It dumps core while releasing the variable a on purpose, to make the behaviour observable: a has nullFunPtr as its finalizer.
module Main where

import Foreign.Ptr
import Foreign.ForeignPtr

main :: IO ()
main = do
    a <- newForeignPtr nullFunPtr nullPtr
    putStrLn "Hello World"
The problem
When running the same thing in ghci, the memory is not released. How can I force ghci to release variables that are no longer used?
$ ghci
> import Foreign.Ptr
> import Foreign.ForeignPtr
> import System.Mem
> a <- newForeignPtr nullFunPtr nullPtr
> a <- return () -- rebinding variable a to show gc that I'm no longer using it
> performGC
> -- did not crash - GC didn't release memory
> ^D
Leaving GHCi.
[1] 4396 segmentation fault (core dumped) ghci
Memory was released on exit, but this is too late for me. I'm extending GHCi and using it for another purpose, and I need to release the memory earlier: on demand, or as soon as possible, would be really great.
I know that I can call finalizeForeignPtr, but I'm using the ForeignPtr just for debugging purposes. How can I release a in general in the last example?
If there is no way to do it from the ghci prompt, I can also modify the ghci code. Maybe I can release this a by modifying ghci's InteractiveContext or DynFlags? So far I've had no luck with my research.
Tracing through the code we find that the value is stored in the field closure_env of the data type PersistentLinkerState, which is a ClosureEnv, i.e. a mapping from name to HValues. The relevant function in Linker.hs is
extendLinkEnv :: [(Name,HValue)] -> IO ()
-- Automatically discards shadowed bindings
extendLinkEnv new_bindings =
    modifyPLS_ $ \pls ->
        let new_closure_env = extendClosureEnv (closure_env pls) new_bindings
        in return pls{ closure_env = new_closure_env }
and although the comment indicates that it should remove the shadowed binding, it does not, at least not the way you want it to.
The reason is, as AndrewC writes correctly: Although both variables have the same source code name, they are different to the compiler (they have a different Unique attached). We can observe this after adding some tracing to the function above:
*GHCiGC> a <- newForeignPtr nullFunPtr nullPtr
extendLinkEnv [a_azp]
*GHCiGC> a <- return ()
extendLinkEnv [a_aF0]
*GHCiGC> performGC
extendLinkEnv [it_aFL]
Removing bindings with the same source-name at this point should solve your GC problem, but I don’t know the compiler well enough to tell what else would break. I suggest you open a ticket, hopefully someone will know.
Confusion on binding vs. value
In the comments there seems to be some confusion about bindings and values. Consider this code:
> a <- return something
> b <- return somethingelse
> a <- return (b+b)
> b <- return anewthing
With the current implementation, the heap will consist of:
something
somethingelse
a thunk referencing the (+) operator and somethingelse
anewthing
Furthermore the environment of the interpreter has references to all four heap values, so nothing can be GC’ed.
What remdezx rightly expected is that GHCi would drop the reference to something and somethingelse. This, in turn, would allow the run time system to garbage collect something (we assume no further references). GHCi still references the thunk, which in turn references somethingelse, so this would not be garbage collected.
Clearly the question was very implementation specific, and so is this answer :-)

Is it safe to reuse a conduit?

Is it safe to perform multiple actions using the same conduit value? Something like
do
    let sink = sinkSocket sock
    something $$ sink
    somethingElse $$ sink
I recall that in the early versions of conduit there were some dirty hacks that made this unsafe. What's the current status?
(Note that sinkSocket doesn't close the socket.)
That usage is completely safe. The issue in older versions had to do with blurring the line between resumable and non-resumable components. With modern versions (I think since 0.4), the line between the two is very clear.
It might be safe to reuse sinks in the sense that the semantics for the "used" sink doesn't change. But you should be aware of another threat: space leaks.
The situation is analogous to lazy lists: you can consume a huge list lazily in constant space, but if you process the list twice, it will be kept in memory. The same thing can happen with a recursive monadic expression: if you use it once it runs in constant space, but if you reuse it, the structure of the computation is kept in memory, resulting in a space leak.
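The list analogue can be demonstrated directly (a small sketch; the list size is arbitrary):
main = do
    let xs = [1..10000000] :: [Int]
    print (sum xs) -- on its own, this traversal would run in constant space...
    print (sum xs) -- ...but since xs is consumed again, the whole evaluated list is retained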
Here's a conduit example:
import Data.Conduit
import Data.Conduit.List
import Control.Monad.Trans.Class (lift)

consumeN 0 _ = return ()
consumeN n m = do
    await >>= (lift . m)
    consumeN (n-1) m

main = do
    let sink = consumeN 1000000 (\i -> putStrLn ("Got one: " ++ show i))
    sourceList [1..9000000::Int] $$ sink
    sourceList [1..22000000::Int] $$ sink
This program uses about 150 MB of RAM on my machine, but if you remove the last line, or repeat the definition of sink in both places, you get nice constant space usage.
I agree that this is a contrived example (it was the first that came to my mind), and it is not very likely to happen with most Sinks; for example, it will not happen with your sinkSocket. (Why is it contrived: because the control structure of the sink doesn't depend on the values it gets, and that is also why it can leak.) For sources, however, this would be much more common: many of the common Sources exhibit this behavior. sourceList is an obvious example, because it would actually keep the source list in memory; but enumFromTo is no different, even though there is no data to keep in memory, just the structure of the monadic computation.
So, all in all, I think it's important to be aware of this.
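For completeness, here is the constant-space variant mentioned above: repeating the definition of the sink means the structure of the computation is rebuilt for each run instead of being shared (this main replaces the one in the example above):
main = do
    sourceList [1..9000000::Int] $$ consumeN 1000000 (\i -> putStrLn ("Got one: " ++ show i))
    sourceList [1..22000000::Int] $$ consumeN 1000000 (\i -> putStrLn ("Got one: " ++ show i))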

Haskell: need to Timeout when running eval from the Hint package

I'm creating a small program, to be used by an IRC bot, that should take a string and evaluate it. For this I'm using the hint package, which works very well for my needs. The problem I now have is that I want to prevent evaluation of expressions that take a very long time to calculate, e.g. 2^1000000000.
I tried using the System.Timeout package like this:
import Data.Maybe
import Language.Haskell.Interpreter
import System.Timeout
import System.Environment (getArgs)
main :: IO ()
main = do
    (arg:_) <- getArgs -- the expression to evaluate is the first command-line argument
    r <- timeout 500000 $ runInterpreter $ hEval arg
    case r of
        Nothing -> putStrLn "Timed out!"
        Just x ->
            case x of
                Left err -> putStrLn (show err)
                Right a -> putStrLn a

hEval e = do
    setImportsQ [("Prelude", Nothing),("Data.List",Nothing)]
    a <- eval e
    return $ take 200 a
But it's not working: the timeout does not fire unless I use such a short time that nothing can be evaluated at all. I read on the page for the Timeout package that it could have problems with some modules and has to let them finish, but my understanding is not good enough to know if Hint is such a module.
So any help on this would be appreciated, even if it's just to tell me that this isn't going to work.
GHC threads are cooperative. They can only yield or be terminated by asynchronous exceptions when they perform a memory allocation. This normally works fine, but someone malicious can write a tight loop that runs for a significant time without allocating.
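A sketch of the kind of expression that defeats timeout when compiled with optimizations (the loop bound is arbitrary):
import System.Timeout (timeout)

spin :: Int -> Int
spin 0 = 0
spin n = spin (n - 1) -- under -O this compiles to a tight, allocation-free loop

main :: IO ()
main = do
    -- the spinning thread never reaches a safe point, so the asynchronous
    -- exception that implements the timeout is never delivered
    r <- timeout 500000 (return $! spin maxBound)
    print r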
The mueval package was created to deal with things like this. It's implemented in terms of hint, but with a lot of extra safety added in various ways.
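For instance, the expression can be handed to the mueval executable instead of being evaluated in-process (assuming mueval is installed; check its documentation for the exact flags):
mueval -e "2^1000000000"
mueval applies time and memory limits itself and aborts the evaluation when they are exceeded.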

Haskell FFI: ForeignPtr seems not to get freed (maybe a GHC bug?)

Consider the following code snippet
import qualified Foreign.Concurrent
import Foreign.Ptr (nullPtr)

main :: IO ()
main = do
    putStrLn "start"
    a <- Foreign.Concurrent.newForeignPtr nullPtr $
        putStrLn "a was deleted"
    putStrLn "end"
It produces the following output:
start
end
I would have expected to see "a was deleted" somewhere after start.
I don't know what's going on. I have a few guesses:
The garbage collector doesn't collect remaining objects when the program finishes
putStrLn stops working after main finishes (by the way, I tried the same thing with a foreign-imported puts and got the same results)
My understanding of ForeignPtr is lacking
GHC bug? (env: GHC 6.10.3, Intel Mac)
When using Foreign.ForeignPtr.newForeignPtr instead of Foreign.Concurrent.newForeignPtr it seems to work:
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, newCString)
import Foreign.ForeignPtr (newForeignPtr)
import Foreign.Ptr (FunPtr)

foreign import ccall "&puts" puts :: FunPtr (CString -> IO ())

main :: IO ()
main = do
    putStrLn "start"
    message <- newCString "a was \"deleted\""
    a <- newForeignPtr puts message
    putStrLn "end"
outputs:
start
end
a was "deleted"
From the documentation of Foreign.ForeignPtr.newForeignPtr:
Note that there is no guarantee on how soon the finaliser is executed after the last reference was dropped; this depends on the details of the Haskell storage manager. Indeed, there is no guarantee that the finalizer is executed at all; a program may exit with finalizers outstanding.
So you're running into undefined behaviour: i.e., anything can happen, and it may change from platform to platform (as we saw under Windows) or release to release.
The cause of the difference in behaviour you're seeing between the two functions may be hinted at by the documentation for Foreign.Concurrent.newForeignPtr:
These finalizers necessarily run in a separate thread...
If the finalizers for the Foreign.ForeignPtr version of the function use the main thread, but the Foreign.Concurrent ones use a separate thread, it could well be that the main thread shuts down without waiting for other threads to complete their work, so those threads never get to run the finalization.
Of course, the docs for the Foreign.Concurrent version do claim,
The only guarantee is that the finalizer runs before the program terminates.
I'm not sure that they actually ought to be claiming this, since if the finalizers are running in other threads, they can take an arbitrary amount of time to do their work (even block forever), and thus the main thread would never be able to force the program to exit. That would conflict with this from Control.Concurrent:
In a standalone GHC program, only the main thread is required to terminate in order for the process to terminate. Thus all other forked threads will simply terminate at the same time as the main thread (the terminology for this kind of behaviour is "daemonic threads").
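Given all of the above, if the finalizer must run, the dependable option is to trigger it explicitly before main returns, for example with finalizeForeignPtr (a sketch based on the question's first snippet):
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (finalizeForeignPtr)
import Foreign.Ptr (nullPtr)

main :: IO ()
main = do
    putStrLn "start"
    a <- newForeignPtr nullPtr (putStrLn "a was deleted")
    putStrLn "end"
    finalizeForeignPtr a -- runs the finalizer now, on this thread, instead of relying on GC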
