Should one check for a change before writing an IORef?

There's code that reads an IORef and, based on some conditions and calculations, creates a new value. It then writes the new value to that IORef. But there is a chance the value didn't change at all: the new value may be identical to the old.
What are the considerations regarding whether to check to see if the value is different before writing the IORef, or just write the IORef regardless?
Does writeIORef check to see if the value is changed before setting?
By checking first, could you avoid the write and possibly save a little on performance?

Does writeIORef check to see if the value is changed before setting?
No. writeIORef wraps writeSTRef, which is defined as
-- |Write a new value into an 'STRef'
writeSTRef :: STRef s a -> a -> ST s ()
writeSTRef (STRef var#) val = ST $ \s1# ->
    case writeMutVar# var# val s1# of { s2# ->
    (# s2#, () #) }
By checking first, could you avoid the write and possibly save a little on performance?
What are the considerations regarding whether to check to see if the value is different before writing the IORef, or just write the IORef regardless?
This is really contingent on the algorithm in question. What are you trying to optimize for? What is the frequency/ratio of reads to writes? What kind of data are you storing? How is it packed? What is the cost of an equality comparison for the data in question?
There is a whole host of factors to take into account when determining whether or not you want to destructively update cells in place: some algorithm-specific, some depending on cache locality, others depending on the structure and form of the code GHC generates. As such, it's exceedingly difficult to answer your question.
A quote from Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil
Unless you're at the stage where you're trying to eke out every modicum of performance from some well-understood implementation, you're probably better off picking the path that is
simplest to implement
easiest to reason about
and getting on with it. If you are at the stage where you'd like to tweak your program, I'd suggest learning to read GHC's human-readable generated output (Core), as you'd then be in the position to make these sorts of decisions (on a very granular level) on a per-program basis.
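To make the trade-off concrete, here is a minimal sketch of the check-first pattern. The helper name writeIORefIfChanged is made up for illustration, and whether the extra read and Eq comparison pays off depends entirely on the factors above:

import Data.IORef

-- Hypothetical helper (not part of base): skip the write when the
-- new value equals the old one. Note that read-then-write is not
-- atomic, so this sketch is only suitable for single-threaded use.
writeIORefIfChanged :: Eq a => IORef a -> a -> IO ()
writeIORefIfChanged ref new = do
    old <- readIORef ref
    if old == new
        then return ()          -- avoid the write entirely
        else writeIORef ref new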

Related

Using ST monad for efficiency purposes

I have some traditional state-passing code which I need to optimize. I've heard that if you're maintaining and updating state a lot, using the ST monad can help improve efficiency. However, after looking into the ST stuff a bit, I'm a bit unclear as to how/where the ST monad should be used.
A couple of approaches that come to mind:
1. Instead of passing state everywhere, pass STRefs instead. So, for instance, foo :: State -> a -> b -> (State, c) becomes foo :: STRef s State -> a -> b -> ST s c, and so on.
2. Keep my function signatures the same, but use ST under the hood via runST.
3. Only use ST when updating the state in my main execution loop, and escape ST using either runST or stToIO.
Obviously, these questions will ultimately depend on the specifics of my project but I'm wondering if there are any rule-of-thumb type guidelines that might be helpful before more detail is required.
Only approach 1 seems to be viable, in the general case.
When runST action returns its result, all the memory allocated during the action via newSTRef is released. If you use one runST per function, no reference can persist across function calls. Even if you don't need that, making each function call allocate its mutable state on entry only to deallocate it on exit (as in your option 2) looks pointless. I would just pass a STRef s State around and mutate the state through that.
There might be other aspects to consider, though. If a function performs many mutations inside, option 2 might still be OK, since the extra cost of entering/leaving would be negligible.
(I probably don't completely understand your option 3. Isn't it option 1, essentially?)
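Here is a minimal sketch of approach 1, with a toy State type assumed purely for illustration:

import Control.Monad.ST
import Data.STRef

type State = Int  -- stand-in for the real state type

-- ST style: the state lives in an STRef threaded through the calls.
fooST :: STRef s State -> Int -> ST s ()
fooST ref x = modifySTRef' ref (+ x)

-- One runST at the top level keeps the reference alive across calls.
runPipeline :: State
runPipeline = runST $ do
    ref <- newSTRef 0
    mapM_ (fooST ref) [1 .. 10]
    readSTRef ref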

Getting values out of context or leaving them in?

Assume the code below. Is there a quicker way to get the contextual values out of findSerial rather than writing a function like outOfContext?
The underlying question is: does one usually stick within the context and use Functors, Applicatives, Monoids and Monads to get the job done, or is it better to take the value out of context and apply the usual non-contextual computation methods? In brief: I don't want to learn Haskell all wrong, since it takes enough time as it is.
import qualified Data.Map as Map

type SerialNumber = (String, Int)

serialList :: Map.Map String SerialNumber
serialList = Map.fromList [("belt drive",     ("BD", 0001))
                          ,("chain drive",    ("CD", 0002))
                          ,("drive pulley",   ("DP", 0003))
                          ,("drive sprocket", ("DS", 0004))
                          ]

findSerial :: Ord k => k -> Map.Map k a -> Maybe a
findSerial input = Map.lookup input

outOfContext :: Maybe (a, b) -> (a, b)
outOfContext (Just (a, b)) = (a, b)
Assuming I understand it correctly, I think your question essentially boils down to “Is it idiomatic in Haskell to write and use partial functions?” (which your outOfContext function is, since it’s just a specialized form of the built-in partial function fromJust). The answer to that question is a resounding no. Partial functions are avoided whenever possible, and code that uses them can usually be refactored into code that doesn’t.
The reason partial functions are avoided is that they voluntarily compromise the effectiveness of the type system. In Haskell, when a function has type X -> Y, it is generally assumed that providing it an X will actually produce a Y, and that it will not do something else entirely (i.e. crash). If you have a function that doesn’t always succeed, reflecting that information in the type by writing X -> Maybe Y forces the caller to somehow handle the Nothing case, and it can either handle it directly or defer the failure further to its caller (by also producing a Maybe). This is great, since it means that programs that typecheck won’t crash at runtime. The program might still have logical errors, but knowing before even running the program that it won’t blow up is still pretty nice.
Partial functions throw this guarantee out the window. Any program that uses a partial function will crash at runtime if the function’s preconditions are accidentally violated, and since those preconditions are not reflected in the type system, the compiler cannot statically enforce them. A program might be logically correct at the time of its writing, but without enforcing that correctness with the type system, further modification, extension, or refactoring could easily introduce a bug by mistake.
For example, a programmer might write the expression
if isJust n then fromJust n else 0
which will certainly never crash at runtime, since fromJust’s precondition is always checked before it is called. However, the type system cannot enforce this, and a further refactoring might swap the branches of the if, or it might move the fromJust n to a different part of the program entirely and accidentally omit the isJust check. The program will still compile, but it may fail at runtime.
In contrast, if the programmer avoids partial functions, using explicit pattern-matching with case or total functions like maybe and fromMaybe, they can replace the tricky conditional above with something like
fromMaybe 0 n
which is not only clearer, but ensures any accidental misuse will simply fail to typecheck, and the potential bug will be detected much earlier.
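Applied to the question's code, a total version of the lookup might look like the following sketch (it reuses serialList, findSerial, and SerialNumber from the question; the fallback ("??", 0) and both function names are invented for illustration):

import Data.Maybe (fromMaybe)

-- Total: the caller supplies a fallback instead of risking a crash.
serialOrDefault :: String -> SerialNumber
serialOrDefault name = fromMaybe ("??", 0) (findSerial name serialList)

-- Or handle both cases explicitly with pattern matching.
describePart :: String -> String
describePart name = case findSerial name serialList of
    Just (code, n) -> code ++ "-" ++ show n
    Nothing        -> "unknown part"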
For some concrete examples of how the type system can be a powerful ally if you stick exclusively to total functions, as well as some good food for thought about different ways to encode type safety for your domain into Haskell’s type system, I highly recommend reading Matt Parsons’s wonderful blog post, Type Safety Back and Forth, which explores these ideas in more depth. It additionally highlights how using Maybe as a catch-all representation of failure can be awkward, and it shows how the type system can be used to enforce preconditions to avoid needing to propagate Maybe throughout an entire system.

What is the analogue of ConcurrentHashMap in Haskell?

update: please bear in mind that I've just started learning Haskell
Let's say we're writing an application with the following general functionality:
when starting, it gathers some data from an external source;
this data is a set of complex structures containing lists, arrays, ints, strings, etc.;
when running, the application serves a web API (servlets, in Java terms) that provides access to the data.
Now, if the application were written in Java, we could use a static ConcurrentHashMap object where the data could be stored (represented as Java classes). During start-up the app could fill the map with data, and then the servlets could access it, providing an API to the clients.
If the application were written in Erlang, we could use ETS/DETS for storing the data (as native Erlang structures).
Now the question: what is the proper Haskell way for implementing such design?
It shouldn't be a DB; it should be some sort of lightweight, in-memory store that can hold complex structures (native Haskell structures) and be accessed from different threads (the "servlets", speaking in Java-world terms). In Haskell there are no static global vars as in Java, and no ETS or OTP as in Erlang, so how do you do it the right way (without using external solutions like Redis)?
Thanks
update: another important part of the question: since Haskell doesn't (?) have 'global static' variables, what would be the right way to implement this globally accessible data-keeping object (say it is an "stm-containers" map)? Should I initialize it somewhere in the main function and then just pass it to every REST API handler, or is there some other, more correct way?
It's not clear from your question whether the client API will provide ways of mutating the data.
If not (i.e., the API will only be about querying), then any immutable data structure will suffice, since one beauty of immutable data is that it can be accessed safely from multiple threads, with certainty that it can't change. There's no need for the overhead of locks or other strategies for dealing with concurrency. You'll simply construct the immutable data during initialisation and then just query it. For this, consider a package like "unordered-containers".
If your API will also be mutating the data, then you will need mutable data structures that are optimised for concurrency. "stm-containers" is one package that provides those.
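For the query-only case, here is a minimal sketch of the "initialize in main and pass it to every handler" approach from the update (assuming the unordered-containers package; Db, loadData, and handler are invented for illustration):

import qualified Data.HashMap.Strict as HM

type Db = HM.HashMap String Int  -- stand-in for the real structures

loadData :: IO Db                -- stand-in for the external source
loadData = return (HM.fromList [("belt drive", 1), ("chain drive", 2)])

-- Handlers take the immutable map as an argument; it can be shared
-- freely across threads because it never changes.
handler :: Db -> String -> Maybe Int
handler db key = HM.lookup key db

main :: IO ()
main = do
    db <- loadData
    print (handler db "belt drive")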
First off, I'm going to assume you mean it needs to be available to multiple threads, not multiple processes. (The difference being that threads share memory, processes do not.) If that assumption is wrong, much of your question doesn't make sense.
So, the first important point: Haskell has mutable data structures. They can easily be shared between threads. Here's a small example:
import Control.Concurrent
import Control.Monad
main :: IO ()
main = do
    v <- newMVar 0 :: IO (MVar Int)
    forkIO . forever $ do
        x <- takeMVar v
        putMVar v $! x + 1
    forM_ [1..10] $ \_ -> do
        x <- readMVar v
        threadDelay 100
        print x
Note the use of ($!) when putting the value in the MVar. MVars don't enforce that their contents are evaluated. There's some subtlety in making sure everything works properly. You will get lots of space leaks until you understand Haskell's evaluation model. That's part of why this sort of thing is usually done in a library that handles all those details.
Given this, the first-pass approach is to just store a map of some sort in an MVar. Unless it's under a lot of contention, that actually has pretty good performance properties.
When it is under contention, you have a good fallback secondary approach, especially when using a hash map. That's striping. Instead of storing one map in one MVar, use N maps in N MVars. The first step in a lookup is using the hash to determine which of the N MVars to look in.
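A rough sketch of striping (assuming the hashable and vector packages; all names here are invented for illustration):

import Control.Concurrent.MVar
import Data.Hashable (Hashable, hash)
import qualified Data.Map.Strict as Map
import qualified Data.Vector as V

-- N independent maps, each guarded by its own MVar.
newtype Striped k v = Striped (V.Vector (MVar (Map.Map k v)))

newStriped :: Int -> IO (Striped k v)
newStriped n = Striped <$> V.replicateM n (newMVar Map.empty)

-- The hash picks which stripe to lock, so operations on unrelated
-- keys rarely contend for the same MVar.
insert :: (Hashable k, Ord k) => Striped k v -> k -> v -> IO ()
insert (Striped stripes) k v =
    modifyMVar_ (stripes V.! i) (return . Map.insert k v)
  where i = hash k `mod` V.length stripes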
There are fancy lock-free algorithms, which could be implemented using finer-grained mutable values. But in general, they are a lot of engineering effort for a few percent improvement in performance that doesn't really matter in most use cases.

Referential transparency and mmap in Haskell

I was hoping to use System.INotify and System.IO.MMap together in order to watch for file modifications and then quickly perform diffs for sending patches over a network. However, there are a couple of warnings in the documentation for System.IO.MMap about referential transparency. The documentation states:
It is only safe to mmap a file if you know you are the sole user. Otherwise referential transparency may be or may be not compromised. Sadly semantics differ much between operating systems.
The values that System.IO.MMap returns are IO ByteString; surely when I use this value with putStr I'm expecting a different result each time? I assume the author means that the value could change during an IO operation such as putStr, causing a crash?
START-OF-EDIT: Come to think of it, I guess the answer to this part of the question is somewhat obvious...
If the value changes at any time after it is unwrapped, it would be problematic.
do
    v <- mappedValue   -- mappedValue :: IO ByteString
    B.putStr v         -- B is Data.ByteString.Char8
    B.putStr v         -- expects the same value of v both times
END-OF-EDIT
Shouldn't it be possible to acquire some kind of lock on the mapped region or on the file?
Alternatively, would it be possible to write a function copy :: IO ByteString -> IO ByteString that takes a snapshot of the file in its current state in a safe way?
I think the author means that the value can change even inside a lifted function that can view it as a plain ByteString (no IO).
The memory-mapped file is a region of memory. It doesn't make much sense to copy its content back and forth, for performance reasons (otherwise one could just do plain old stream-based I/O). So the ByteString you are getting is live.
If you want a snapshot, just use stream-based I/O. That's what reading a file does: it creates a snapshot of the file in memory! I guess an alternative would be using the ForeignPtr interface, which does not carry the referential-transparency warning. I'm not familiar with ForeignPtrs, so I cannot guarantee it will work, but it looks promising and I would investigate it.
You can also try calling map id on your ByteString, but it is not guaranteed that you will get a copy distinct from the original.
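As for the copy function asked about: Data.ByteString.copy allocates a fresh buffer and copies the bytes into it, so the result no longer aliases the mapped region. A minimal sketch (the snapshot is still only consistent if nothing mutates the file mid-copy):

import qualified Data.ByteString as B

snapshot :: IO B.ByteString -> IO B.ByteString
snapshot getMapped = do
    bs <- getMapped
    return $! B.copy bs  -- force the copy before the mapping can change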
Mandatory file locking, especially on Linux, is a mess that is better avoided. Advisory file locking is OK, except nobody is using it, so it effectively does not exist.

ST Monad == code smell?

I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef

data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }

mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
    a' <- newSTRef a
    b' <- newSTRef b
    return $ STRefPair a' b'

derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
    modifySTRef (left p) (\x -> x + 1)
    modifySTRef (right p) (\x -> x - 1)

herp :: (Num a, Num b) => (a, b)
herp = runST $ do
    p <- mkStRefPair 0 0
    replicateM_ 10 $ derp p
    a <- readSTRef $ left p
    b <- readSTRef $ right p
    return (a, b)

main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.
I don't use ST much, but sometimes it is just the best solution. This can happen in many scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]) (see the sketch after this list). The map itself is being mutated, but much of the heavy lifting is done purely.
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
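A small sketch of that STRef s (Map k [v]) pattern (the addEdge name is invented): the reference is mutated, but each map update is ordinary pure code:

import Control.Monad.ST
import Data.STRef
import qualified Data.Map.Strict as M

-- Mutate the reference, but let pure Map code do the heavy lifting.
addEdge :: Ord k => STRef s (M.Map k [v]) -> k -> v -> ST s ()
addEdge ref k v = modifySTRef' ref (M.insertWith (++) k [v])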
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.
Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a straightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
A skim of the paper reveals to me that I don't think it makes essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different but related algorithm to that described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup as I understand it: A) a branching tree is constructed; B) payoffs are then pushed back from the leaves to the root, which then indicates the best choice at any given step. But this is expensive, so instead only portions of the tree are explored to the leaves, in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function of type randseed -> ptree -> ptree that, given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as it goes. We can then just iterate this function over an empty ptree to get a list of increasingly well-sampled spaces in the ptree, and walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps: 1) building the whole state tree, lazily; 2) updating some partial exploration with some sampling of a structure; and 3) deciding when we've gathered enough samples.
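A very rough sketch of steps 2 and 3 (assuming the random package; every type and name here is invented for illustration, and step 1's lazily built game tree is elided):

import System.Random (StdGen, mkStdGen)

-- Stand-in for the partially explored tree with reward estimates.
newtype PTree = PTree { samples :: Int } deriving Show

-- Hypothetical single exploration step; a real one would thread the
-- seed and update reward estimates (the seed is unused in this stub).
exploreOnce :: (StdGen, PTree) -> (StdGen, PTree)
exploreOnce (g, PTree n) = (g, PTree (n + 1))

-- Iterate explorations from an empty ptree, then walk the resulting
-- list of increasingly sampled trees until the cutoff is met.
search :: Int -> PTree
search cutoff =
    head [ t | (_, t) <- iterate exploreOnce (mkStdGen 42, PTree 0)
             , samples t >= cutoff ]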
It can be really difficult to tell when using ST is appropriate. I would suggest you do it both with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.
I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
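A tiny illustration of that path copying (a made-up example, not from the answer): updating one leaf of an immutable binary tree rebuilds only the nodes on the root-to-leaf path and shares everything else:

data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Follow a list of directions (False = left, True = right) and
-- replace the leaf found there; untouched subtrees are shared.
updateLeaf :: [Bool] -> a -> Tree a -> Tree a
updateLeaf []         x (Leaf _)   = Leaf x
updateLeaf (False:ds) x (Node l r) = Node (updateLeaf ds x l) r
updateLeaf (True :ds) x (Node l r) = Node l (updateLeaf ds x r)
updateLeaf _          _ t          = t  -- ill-formed path: leave unchanged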
In Java, for example, object creation is more expensive than mutation, so maybe you can gain a bit in Haskell by using mutation. But I don't know this for sure. And a small gain doesn't buy you much, because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.
Use of the ST monad is usually (but not always) an optimization. For any optimization, I apply the same procedure:
1. Write the code without it.
2. Profile and identify bottlenecks.
3. Incrementally rewrite the bottlenecks and test for improvements/regressions.
The other use case I know of is as an alternative to the state monad. The key difference is that with the state monad, the type of all the data stored is specified top-down, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.
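A rough illustration of that contrast (assuming the mtl package for Control.Monad.State; the names are invented):

import Control.Monad.ST
import Control.Monad.State
import Data.STRef

-- With State, the type of all the stored data is fixed top-down:
-- every action in the computation sees the same Int state.
bumpState :: State Int ()
bumpState = modify (+ 1)

-- With ST, each piece of mutable state is introduced bottom-up,
-- locally, wherever it is needed.
bumpST :: ST s Int
bumpST = do
    ref <- newSTRef (0 :: Int)
    modifySTRef' ref (+ 1)
    readSTRef ref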
