I'm writing bindings (for the first time). At the C level there is a function to allocate some sort of resource; let's call it ParentRes. It returns IO (Ptr ParentRes). Every time a ParentRes is created, a child resource is allocated as well; let's call it ChildRes. This is all static: the child pointer cannot change, and it's freed when the parent resource is freed.
Now the catch: there is a function that takes pointer to parent and returns pointer to child:
foreign import ccall unsafe "…"
    c_get_child_res :: Ptr ParentRes -> IO (Ptr ChildRes)
I want to write a wrapper of type Ptr ParentRes -> Ptr ChildRes using unsafePerformIO. Is there a reason I should not do it?
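For concreteness, the wrapper would be something like this (a sketch; getChildRes is just my name for it, reusing the import above):

import Foreign.Ptr (Ptr)
import System.IO.Unsafe (unsafePerformIO)

getChildRes :: Ptr ParentRes -> Ptr ChildRes
getChildRes parent = unsafePerformIO (c_get_child_res parent)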
Here I'm answering from real experience that I had just now with “pure” functions that turned out not to be quite pure.
I can formulate it in these words: everything that can be affected by order of execution should stay in IO. The IO monad is Haskell's standard and reliable way to ensure order of execution. If order matters, your functions should live in the IO monad.
Now, if it's not obvious why order matters in this particular case, remember that the parent resource, as well as the child resource, is first allocated and later freed. Once they are freed you won't get the same results (but rather a segmentation fault), so referential transparency is broken. So these resource-dependent functions should stay in IO.
Also, I don't think it's impossible for different objects to be allocated at the same address during a single program execution. This again would break referential transparency.
My code uses a resource that can be described as a pointer; I'll use a void pointer here for simplicity. The resource must be closed after the computation with it finishes, so the Control.Exception.bracket function is a natural choice to make sure the code won't leak if an error occurs:
run :: (Ptr () -> IO a) -> IO a
run action = bracket acquireResource closeResource action
-- no eta reduction for clarity
The downside of this pattern is that the resource will always be closed after action completes. AFAIU this means that it isn't possible to do something like
cont <- run $ \ptr -> do
    a <- someAction ptr
    return (\x -> otherActionUsingResource ptr a x)
cont ()
The resource will already be closed by the time cont is executed. Now my approach is to use a ForeignPtr instead:
run' :: (ForeignPtr () -> IO a) -> IO a
run' action = do
    ptr <- acquireResource
    foreignPtr <- newForeignPtr closeResourceFunPtr ptr
    action foreignPtr
Now it seems that this is roughly equivalent to the first version, minor typing differences and resource-closing latency aside. However, I wonder whether this is true, or if I'm missing something. Can some error conditions lead to different outcomes with these two versions? Is ForeignPtr safe to use in this way?
If you want to do this, I'd recommend avoiding that run', which makes it look like you're going to close the resource. Do something like this instead:
acquire :: IO (ForeignPtr ())
acquire = mask $ \unmask -> do
    ptr <- unmask acquireResource
    newForeignPtr closeResourceFunPtr ptr
As Carl pointed out in a comment, it's important that exceptions be masked between acquiring the resource and installing the finalizer to close it; otherwise an asynchronous exception could be delivered in between and cause a resource leak.
The challenge with anything of this sort is that you're leaving it up to the user and/or the garbage collector to make sure the resource gets freed. The original Ptr-based code made the lifespan of the resource explicit; now it's not. Many people believe that explicit lifespans are better for critical resources, and consider what ForeignPtr gives you, automatic finalization by the GC, poor design. So think carefully! Is this a cheap resource (like a small piece of malloc'd memory) that you just want freed eventually? Or is it something expensive (like a file descriptor) that you really want to be sure about?
Side note: Ptr () and ForeignPtr () aren't very idiomatic. Usually the type argument should be a Haskell type representing whatever is being pointed to.
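For example, a minimal sketch of that convention (MyResource and the C symbol names are illustrative assumptions):

{-# LANGUAGE EmptyDataDecls #-}
import Foreign.Ptr (Ptr, FunPtr)

-- An empty type naming whatever the C pointer points to
data MyResource

-- Hypothetical C symbols; the "&" form imports a function pointer
-- suitable for use as a ForeignPtr finalizer.
foreign import ccall unsafe "acquire_resource"
    acquireResource :: IO (Ptr MyResource)

foreign import ccall unsafe "&close_resource"
    closeResourceFunPtr :: FunPtr (Ptr MyResource -> IO ())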
I'm trying to access a state of a State Monad inside an IO action.
More specifically: I'm trying to write a state-dependent signal handler using installHandler from System.Posix.Signals, which requires IO; however, I'd like to perform different actions and change the state from inside the handler. I took a look at unliftio, but I read that state monads shouldn't be unlifted.
Is this possible? I'm more looking for an explanation than for a "copy-paste" solution. If unlifting State inside IO doesn't work, what would a solution look like, for when one wants to do some state-aware processing inside IO?
A value of type State s a does not contain a state. It is just an encapsulated function that, given a starting state (via the runState function), produces a result and a resulting state. There is no way to access an intermediate (or "current") state from outside this function. This is what makes the State monad "pure".
You seem to intend to have a handler that does not always behave the same (even if invoked with the same parameters) but depends on outside state. This kind of behaviour is "impure", and cannot be achieved by pure means alone. So what you need here is something that encapsulates the impurity in such a way that the handler can access the "current value" of some state without that value being passed into the handler.
As you already know from the comments, the go-to tool for giving an IO action access to mutable state is an IORef. IORefs work because the Haskell runtime (traditionally, at least before multithreading) serializes IO actions, so the concept of a "current value" always makes sense: after every IO action, the value pointed to by each IORef is fixed. The order in which IO actions happen is also fixed; it is the order in which you chain them in do blocks or with the >>= operator. Signal handling is performed by the Haskell runtime in a deterministic way: roughly, every time two IO actions are chained, the runtime checks for pending signals and invokes the corresponding handler.
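For example, here is a minimal sketch of a state-aware handler (the SIGINT counter is purely illustrative):

import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import Data.IORef
import System.Posix.Signals

main :: IO ()
main = do
    counter <- newIORef (0 :: Int)
    let handler = do
            n <- atomicModifyIORef' counter (\c -> (c + 1, c + 1))
            putStrLn ("caught SIGINT " ++ show n ++ " time(s)")
    _ <- installHandler sigINT (Catch handler) Nothing
    forever (threadDelay 1000000)  -- keep the program alive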
If you want to write code that manipulates data in an imperative way (with lots of variables, or even arrays whose single elements you update), you could write your code as an I/O action using IORef and IOArray. But there is also a special "lite" version of IO that supports mutable state in the same way as I/O, without being able to interact with the environment. The shared state must be created, read, and written inside the same "capsule" of this IO lite, so that running the whole action interacts only with its internal state, never with outside state. The capsule as a whole is thus pure, even if single statements inside it could be considered impure. This lite version of IO is called ST, which is short for "state thread".
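A small example of such a capsule (a sketch): the updates inside runST are genuinely destructive, yet sumST as a whole is a pure function.

import Control.Monad.ST
import Data.STRef

sumST :: [Int] -> Int
sumST xs = runST $ do
    ref <- newSTRef 0                        -- mutable cell, local to the capsule
    mapM_ (\x -> modifySTRef' ref (+ x)) xs  -- imperative-style updates
    readSTRef ref                            -- the final value escapes as a pure result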
In The cost of weak pointers and finalizers in GHC, Edward Yang writes (emphasis added):
A weak pointer can also optionally be associated with a finalizer, which is run when the object is garbage collected. Haskell finalizers are not guaranteed to run.
I cannot find any documentation that corroborates this claim. The docs in System.Mem.Weak are not explicit about this. What I need to know is, given some primitive that has identity (MutVar#, MutableArray#, Array#, etc.), if I attach a finalizer to it, will it reliably be called when the value gets GCed?
The reason is that I'm considering doing something like this:
data OffHeapTree = OffHeapTree
    { ref :: IORef ()
    , nodeCount :: Int
    , nodeArray :: Ptr Node
    }

data Node = Node
    { childrenArray :: Ptr Node
    , childrenCount :: Int
    , value :: Int
    }
I want to make sure that I free the array (and everything the array points to) when an OffHeapTree goes out of scope. Otherwise, it would leak memory. So, can this be reliably accomplished with mkWeakIORef or not?
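For reference, what I have in mind is roughly this (a sketch; newOffHeapTree and freeNodeArray are hypothetical helpers):

import Data.IORef (IORef, newIORef, mkWeakIORef)
import Foreign.Marshal.Alloc (free)
import Foreign.Ptr (Ptr)

newOffHeapTree :: Int -> Ptr Node -> IO OffHeapTree
newOffHeapTree count nodes = do
    r <- newIORef ()
    _ <- mkWeakIORef r (freeNodeArray nodes)  -- finalizer runs when r is GCed
    return (OffHeapTree r count nodes)

freeNodeArray :: Ptr Node -> IO ()
freeNodeArray p = free p  -- real code would free the children recursively first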
"Haskell finalizers are not guaranteed to run" means that GC may not be performed (e.g. on program exit). But if GC is performed, then finalizers are executed.
Edit: For future readers: the statement above is not exactly correct. The RTS spawns a separate thread to execute finalizers after GC, so the program may exit after GC has been performed but before the finalizers have run; see this comment.
That is true in theory, anyway. In practice a finalizer may not be executed, e.g. when the RTS tries to execute a number of finalizers in a row and one of them throws an exception. So I'd not use finalizers unless it is unavoidable.
In Haskell, I have a container like:
data Container a = Container { length :: Int, buffer :: Unboxed.Vector (Int,a) }
This container is a flattened tree. Its accessor (!) performs a binary (log(N)) search through the vector in order to find the right bucket where index is stored.
(!) :: Container a -> Int -> a
container ! index = ... binary search ...
Since consecutive accesses are likely to be in the same bucket, this could be optimized in the following way:
if `index` is in the last accessed bucket, skip the search
The tricky point is the last accessed bucket part. In JavaScript, I'd just impurely modify a hidden variable on the container object.
function read(index, object) {
    var lastBucket = object.__lastBucket;
    var bucket;
    // if the last bucket contains index, no need to search
    if (contains(object, lastBucket, index))
        bucket = lastBucket;
    // if it doesn't
    else {
        // then we search for the bucket
        bucket = searchBucket(index, object);
        // and impurely annotate it on the container, so the
        // next time we access it we can skip the search
        object.__lastBucket = bucket;
    }
    return object.buffer[bucket].value;
}
Since this is just an optimization and the result is the same no matter which branch is taken, I believe it doesn't break referential transparency. How is it possible, in Haskell, to impurely modify a state associated with a runtime value?
I have thought of two possible solutions.
1. A global, mutable hashmap linking pointers to the lastBucket value, using unsafePerformIO to write to it. But I'd need a way to get the runtime pointer of an object, or at least a unique id of some sort (how?).
2. Adding an extra field to Container, lastBucket :: Int, somehow impurely modifying it within (!), and considering that field internal (because it obviously breaks referential transparency).
Using solution (1), I managed to get the following design. First, I added a __lastAccessedBucket :: IORef Int field to my datatype, as suggested by @Xicò:
data Container a = Container
    { length :: Int
    , buffer :: V.Vector (Int,a)
    , __lastAccessedBucket :: IORef Int
    }
Then, I had to update the functions that create a new Container in order to create a new IORef using unsafePerformIO:
fromList :: [a] -> Container a
fromList list = unsafePerformIO $ do
    ref <- newIORef 0
    return $ Container (L.length list) buffer ref
  where
    buffer = V.fromList (prepare list)
Finally, I created two new functions: findBucketWithHint, a pure function which searches for the bucket of an index given a guess (i.e., the bucket where you think it might be), and unsafeFindBucket, which replaces the pure findBucket when performance is needed by always using the last accessed bucket as the hint:
unsafeFindBucket :: Int -> Container a -> Int
unsafeFindBucket findIdx container = unsafePerformIO $ do
    let lastBucketRef = __lastAccessedBucket container
    lastBucket <- readIORef lastBucketRef
    let newBucket = findBucketWithHint lastBucket findIdx container
    writeIORef lastBucketRef newBucket
    return newBucket
With this, unsafeFindBucket is technically a pure function with the same API as the original findBucket function, but it is an order of magnitude faster in some benchmarks. I have no idea how safe this is or where it could cause bugs. Threads are certainly a concern.
(This is more an extended comment than an answer.)
First I'd suggest checking whether this isn't a case of premature optimization. After all, O(log n) isn't that bad.
If this part is indeed performance-critical, your intention is definitely valid. The usual warning for unsafePerformIO is "use it only if you know what you're doing", which you obviously do, and it can help to make things pure and fast at the same time.
Be sure that you follow all the precautions in the docs, in particular setting the proper compiler flags (you might want to use the OPTIONS_GHC pragma).
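For reference, the flags that the unsafePerformIO documentation mentions can be set per module with a pragma like this:

{-# OPTIONS_GHC -fno-cse -fno-full-laziness #-}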
Also make sure that the IO operation is thread safe. The easiest way to ensure that is to use IORef together with atomicModifyIORef.
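Applied to unsafeFindBucket above, that means doing the read-modify-write in one atomic step, roughly like this (a sketch):

unsafeFindBucketAtomic :: Int -> Container a -> Int
unsafeFindBucketAtomic findIdx container = unsafePerformIO $
    atomicModifyIORef' (__lastAccessedBucket container) $ \lastBucket ->
        let newBucket = findBucketWithHint lastBucket findIdx container
        in  (newBucket, newBucket)  -- new cache value and returned result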
The disadvantage of internal mutable state is that the performance of the cache will deteriorate if it's accessed from multiple threads that look up different elements.
One remedy would be to explicitly thread the updated state instead of using the internal mutable state. This is obviously what you want to avoid, but if your program is using monads, you could just add another monadic layer that'd internally keep the state for you and expose the lookup operation as a monadic action.
Finally, you could consider using splay trees instead of the array. You'd still have (amortized) O(log n) complexity, but their big advantage is that by design they move frequently accessed elements near the top. So if you'll be accessing a subset of elements of size k, they'll soon be moved to the top, and lookups will be just O(log k) (constant for a single, repeatedly accessed element). Again, they update the structure on lookups, but you could use the same approach with unsafePerformIO and atomic updates of an IORef to keep the outer interface pure.
I want to write a function
forkos_try :: IO (Maybe α) -> IO (Maybe α)
which takes a command x. Here x is an imperative operation that first mutates state and then checks whether that state is messed up or not. (It does not do anything external, which would require some kind of OS-level sandboxing to revert the state.)
if x evaluates to Just y, forkos_try returns Just y.
otherwise, forkos_try rolls back state, and returns Nothing.
Internally, it should fork() into parent and child threads, with x running in the child.
if x succeeds, child should keep running (returning x's result) and parent should die
otherwise, parent should keep running (returning Nothing) and child should die
Question: What's the way to write something with equivalent, or more powerful, semantics than forkos_try? N.B. -- the state mutated (by x) is in an external library and cannot be passed between threads. Hence, the semantics of which thread is kept alive is important.
Formally, "keep running" means "execute some continuation rest :: Maybe α -> IO () ". But, that continuation isn't kept anywhere explicit in code.
For my case, I think it will work (for the time being) to write it in a different style, using forkOS (which takes the entire computation the child will run), since I can write an explicit expression for rest. But it troubles me that I can't figure out how to do this with the primitive function forkOS -- one would think it would be general enough to support any specific case (which could appear as a high-level API, like forkos_try).
EDIT -- please see the example code with explicit rest if the problem's still not clear [ http://pastebin.com/nJ1NNdda ].
p.s. I haven't written concurrency code in a while; hopefully my knowledge of POSIX fork() is correct! Thanks in advance.
Things are a lot simpler to reason about if you model state explicitly.
someStateFunc :: s -> Maybe (a, s)

-- inside some other function
case someStateFunc initialState of
    Nothing            -> ... -- it failed; stick with the initial state
    Just (a, newState) -> ... -- it succeeded; do something with
                              -- the result and new state
With immutable state, "rolling back" is simple: just keep using initialState. And "not rolling back" is also simple: just use newState.
So...I'm assuming from your explanation that this "external library" performs some nontrivial IO effects that are nevertheless restricted to a few knowable and reversible operations (modify a file, an IORef, etc). There is no way to reverse some things (launch the missiles, write to stdout, etc), so I see one of two choices for you here:
clone the world, and run the action in a sandbox. If it succeeds, then go ahead and run the action in the Real World.
clone the world, and run the action in the real world. If it fails, then replace the Real World with the snapshot you took earlier.
Of course, both of these are actually the same approach: fork the world. One world runs the action, one world doesn't. If the action succeeds, then that world continues; otherwise, the other world continues. You are proposing to accomplish this by building upon forkOS, which would clone the entire state of the program, but this would not be sufficient to deal with, for example, file modifications. Allow me to suggest instead an approach that is nearer to the simplicity of immutable state:
tryIO :: IO s -> (s -> IO ()) -> IO (Maybe a) -> IO (Maybe a)
tryIO save restore action = do
    initialState <- save
    result <- action
    case result of
        Nothing -> restore initialState >> return Nothing
        Just x  -> return (Just x)
Here you must provide some data structure s, and a way to save to and restore from said data structure. This allows you the flexibility to perform any cloning you know to be necessary. (e.g. save could copy a certain file to a temporary location, and then restore could copy it back and delete the temporary file. Or save could copy the value of certain IORefs, and then restore could put the value back.) This approach may not be the most efficient, but it's very straightforward.
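For instance, a hypothetical usage with a single IORef as the saved state:

import Data.IORef

demo :: IORef Int -> IO (Maybe Int)
demo ref =
    tryIO (readIORef ref) (writeIORef ref) $ do
        modifyIORef ref (+ 1)  -- mutate the state
        v <- readIORef ref
        -- returning Nothing makes tryIO restore the saved value
        return (if v > 10 then Nothing else Just v)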