I wonder what the memory footprint of a variable of type IORef a is, if I know that the size of a is x.
Also, what is the expected performance of writeIORef applied to an integer, compared to a regular variable assignment (like x = 3) in, say, Java?
In Haskell, an IORef a behaves like a single-element mutable array. The definition of IORef is the following, disregarding newtype wrapping:
data IORef a = IORef (MutVar# RealWorld a)
Here, MutVar# RealWorld a is a primitive mutable reference type. It is a pointer to a two-word heap object: a header word and a payload, which is itself a pointer to a normal lifted Haskell object. Hence a MutVar# occupies two words (16 bytes on 64-bit systems) and adds one indirection.
The overhead of MutVar# is thus one extra indirection and one extra header word. This is unavoidable. In contrast, the overhead of the IORef constructor is also one header word and one indirection, but it can be eliminated by unpacking IORef:
data Foo a = Foo !(IORef a) a a
Here, the bang on the IORef field causes the underlying MutVar# to be unpacked into Foo. But while this unpacking works whenever we define new data types, it does not work if we use an existing parameterized type, like lists: in [IORef a], we pay the full cost of two extra indirections.
An IORef will also generally be unpacked by GHC's optimizer when it is used as a function argument: IORef a -> b is generally unboxed to MutVar# RealWorld a -> b if you compile with optimization.
However, all of the above overheads matter less than the garbage-collection overhead you incur when you use a large number of IORefs. To avoid that, it is advisable to use a single mutable array instead of many IORefs, as sketched below.
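As a rough illustration (assuming the vector package; manyCounters and bumpCounter are made-up names), a single unboxed mutable vector can stand in for thousands of IORef Ints:

import qualified Data.Vector.Unboxed.Mutable as VM

-- One array the GC can scan cheaply, instead of thousands of small
-- mutable heap objects.
manyCounters :: Int -> IO (VM.IOVector Int)
manyCounters n = VM.replicate n 0

bumpCounter :: VM.IOVector Int -> Int -> IO ()
bumpCounter v i = VM.modify v (+ 1) i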
Related
For example, the function length abstracts over the concrete sequence type (via Foldable), but does not abstract over the concrete integral result type Int:
length :: Foldable t => t a -> Int
Would it be more usable or more convenient to have the following type signature?
length' :: (Foldable t, Integral i) => t a -> i
Yes, perhaps; however, first take note that more polymorphism is not always a good thing. If all functions have highly polymorphic arguments and results, the compiler has little information to start type inference from, so you end up having to write more awkward local signatures.
Now, as for length, there is very little reason why you'd want a result of any Integral type other than Int, at least not on a 64-bit machine:
Smaller types like Word16 don't usually give much of a performance or memory advantage in Haskell, because there'll be some boxing somewhere, and then you have a 64-bit pointer to only 16 bits of information... a bit silly.
It's basically impossible to have a list (let alone an array or map) so large that its length can't be measured with a 63-bit word, even for a crazy lazy list that never completely exists in memory at any one time. Now, strictly speaking, Int is only guaranteed to cover the range ±2^29, which can in some extreme cases be exhausted, but in practice this is only relevant on 32-bit platforms, which are anyway more limited in memory and performance, so you wouldn't want to juggle such huge data there.
For that matter, any application where performance or range exhaustion could possibly be an issue should probably be optimised to something more efficient than lists or other Foldable structures (which are always boxed due to parametricity); unboxed vectors or ByteStrings perform much better.
If you need the result to be Integer or something else more expensive, not because of length itself but simply for context reasons, it is probably a better idea to just calculate the length in Int and convert once at the end, rather than dragging a slow addition through the entire list.
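For instance, a trivial sketch (lengthInteger is a made-up name):

-- Count in plain Int, convert to the expensive type only once at the end.
lengthInteger :: [a] -> Integer
lengthInteger = fromIntegral . length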
That said, I do in fact sometimes wish the signature were that of the actually existing genericLength (which in Data.List is list-only), generalised to Foldable:
genericLength :: (Foldable t, Num i) => t a -> i
...note that the function doesn't need to know its result will be integral. And in fact it would often be pretty useful to make the result rational, so we could then just write
average :: Fractional n => [n] -> n
average l = sum l / length l
instead of needing an extra fromIntegral.
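For comparison, a small sketch of what we write today with Data.List.genericLength versus the Int-returning Prelude.length (averageViaInt / averageGeneric are illustrative names):

import Data.List (genericLength)

-- Today: an explicit conversion from Int is needed.
averageViaInt :: Fractional n => [n] -> n
averageViaInt l = sum l / fromIntegral (length l)

-- With a Num-polymorphic length, the conversion disappears.
averageGeneric :: Fractional n => [n] -> n
averageGeneric l = sum l / genericLength l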
The reason this isn't the case for Prelude.length? I don't know; it seems rather historical. The rationale is probably, as I said at the beginning, that you don't want too much polymorphism. Then again, IMO it's usually a good idea to make function results as polymorphic as possible and rather constrain the arguments a bit more tightly, because then the result will always be bound to something that can be used for type inference.
I'll motivate the general question with a more specific one:
In GHC Haskell, should a Cofree [] a have the same performance as a containers-style Data.Tree a? Or does the additional polymorphism result in some kind of runtime cost?
Generally speaking, is there additional runtime cost associated with increasing "arity" of a type's kind?
I think a more classic concrete example would be something like vectors or arrays. The vector package exports both "boxed" and "unboxed" vectors. While boxed vectors can contain any Haskell type (including functions), unboxed vectors require their elements to be instances of the Unbox type class. Although this enables a more efficient packed memory representation without pointer indirections, you can no longer define a Functor instance for unboxed vectors, so it comes with a loss of generality.
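A small sketch of that difference (assuming the vector package):

import qualified Data.Vector as VB          -- boxed
import qualified Data.Vector.Unboxed as VU  -- unboxed, elements need Unbox

boxedDoubles :: VB.Vector Double
boxedDoubles = VB.fromList [1, 2, 3]

unboxedDoubles :: VU.Vector Double
unboxedDoubles = VU.fromList [1, 2, 3]

-- Boxed vectors have a Functor instance, so fmap works...
doubledB :: VB.Vector Double
doubledB = fmap (* 2) boxedDoubles

-- ...but unboxed vectors do not; you have to use the monomorphic VU.map.
doubledU :: VU.Vector Double
doubledU = VU.map (* 2) unboxedDoubles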
If you made use of
fmap :: (a -> b) -> Vector a -> Vector b
in a function of type
f :: Functor f => f SomeType -> f SomeOtherType
a "Dictionary", that is, a record with the corresponding fmap implementation will be passed at run-time as an additional implicit argument. You can actually see this by looking at the "Core" output produced by GHC, using the -ddump-simpl flag. Specifically, the arity of f above would be two instead of one.
In some cases, GHC can optimize this overhead away by creating specialized versions of your functions. You can help it along by using SPECIALIZE/INLINABLE/... pragmas, using explicit export lists, maybe adding some strictness, and a few other tweaks described in the GHC documentation. For example:
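A minimal sketch of a SPECIALIZE pragma (norm is a made-up function); the pragma asks GHC to also compile a dictionary-free copy for Double:

-- Polymorphic over any Floating type, so it normally takes a dictionary.
norm :: Floating a => [a] -> a
norm = sqrt . sum . map (^ (2 :: Int))
{-# SPECIALIZE norm :: [Double] -> Double #-}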
Regarding the overhead of using parametrically polymorphic types, it of course depends. My personal worst case was a factor 100 in an inner numeric loop (which was resolved by adding one SPECIALIZE pragma), so it can indeed bite you. Luckily, using the profiling tools and remembering that dictionaries influence a function's arity, tracking these issues down becomes very systematic.
GHC Haskell exposes the ghc-prim package (module GHC.Prim), which contains definitions of unboxed values, such as Int#, Char#, etc.
How do they differ from the default Int, Char, etc., types in regular Haskell? An assumption would be that they're faster, but why?
When should one reach down to use these instead of the boxed regular alternatives?
How does using boxed vs. unboxed values affect the program?
In simple terms, a value of type Int may be an unevaluated expression. The actual value isn't calculated until you "look at" the value.
A value of type Int# is an evaluated result. Always.
As a result of this, an Int is a data structure that lives on the heap. An Int# is... just a raw machine integer. It can live in a CPU register. You can operate on it with a single machine instruction. It has almost no overhead.
By contrast, when you write, say, x + 1, you're not actually computing x + 1, you're creating a data structure on the heap that says "when you want to compute this, do x + 1".
Put simply, Int# is faster, because it can't be lazy.
When should you use it? Almost never. That's the compiler's job. The idea is that you write nice high-level Haskell code involving Int, and the compiler figures out where it can replace Int with Int#. (We hope!) If it doesn't, it's almost always easier to throw in a few strictness annotations rather than play with Int# directly, as in the sketch below. (It's also non-portable; only GHC uses Int#, although currently there aren't really any other widely used Haskell compilers.)
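A small sketch of what "throw in a few strictness annotations" means (sumTo is a made-up example); with -O, GHC can typically keep the accumulator as a raw Int# even though the source never mentions it:

{-# LANGUAGE BangPatterns #-}

sumTo :: Int -> Int
sumTo n = go 0 1
  where
    -- The bang keeps acc evaluated, so the optimiser can pass it unboxed.
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)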
https://stackoverflow.com/a/15243682/944430
And then there is coding: use unboxed types (no GC), minimize lazy structure allocation. Keep long lived data around in packed form. Test and benchmark.
1.) What are unboxed types? I am pretty sure he is speaking about data types, something like Just x or IO y (boxed). But what about newtypes? If I understood it correctly, newtype has no overhead at all and therefore shouldn't count as a boxed type?
2.) What does he mean by Keep long lived data around in packed form.?
3.) What else can I do to prevent GC pauses?
1.
Unboxed types are the primitives in Haskell. For example, Int is defined as data Int = GHC.Types.I# GHC.Prim.Int# (in GHC). The trailing # symbol is used to indicate primitives (this is only a convention). Primitives don't really exist in Haskell: you can't define additional primitives. When they appear in code, the compiler is responsible for translating them into 'real' function calls (functions can be primitives too) and data types.
Yes, newtype does not 'box' a type additionally. But you can't have a newtype wrapping a primitive: newtype Int2 = Int2 Int# is invalid, while data Int2 = Int2 Int# is fine.
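A minimal sketch of that distinction (requires the MagicHash extension; Int2 mirrors the example above):

{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int#)

-- Fine: an ordinary data constructor may hold an unlifted primitive field.
data Int2 = Int2 Int#

-- Rejected by GHC (at least without the newer UnliftedNewtypes extension):
-- newtype Int3 = Int3 Int#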
The main difference between primitive and boxed types, in the context of the question you linked, is how they are represented in memory. A primitive value means there are no pointers to follow: a field of type Int# holds the number itself, whereas a field of type Int is a pointer which may point to a thunk, which points to another thunk... etc. Note that this means primitives are always strict. If you believe this will be an issue, use the UNPACK pragma, which removes any 'intermediate' boxing. That is,
data D = D (Int, Int)
is stored as a pointer (D) to a pointer (the tuple) to a block of memory containing two pointers (Ints) which each point to an actual Int#. However,
data D = D {-# UNPACK #-} !(Int, Int)
is stored as a pointer (D) to two Ints, thereby removing one level of boxing. Note the !: it indicates that the field is strict, and it is required for UNPACK.
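Going one step further (P is a made-up type), making the fields strict, UNPACKed Ints removes all the intermediate boxes, so the constructor stores two raw machine words directly:

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Int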
2. Any data which is going to be passed to polymorphic functions should be kept packed, since unpacked data passed to polymorphic functions will be repacked anyway (introducing unnecessary overhead). The reasoning behind keeping long-lived data packed is that it is more likely to end up in an intermediate data type or be passed to a function that requires repacking, whereas this is easier to control for short-lived data which is only passed to a few functions before being garbage collected.
3. In 99% of cases, you won't have issues with garbage-collector pauses. In general, there isn't much you can do to guarantee that the GC will not pause. The only suggestion I have is: don't try to reinvent the wheel. There are libraries designed for high-performance computation on large amounts of data (repa, vector, etc.). If you try to implement it yourself, chances are they did it better!
If you define data Int2 = Int2 Int, you could think of Int# as unboxed, plain Int as boxed, and Int2 as "double boxed". Had you used newtype instead of data, one level of indirection would have been avoided; but Int itself is still boxed, therefore Int2 is boxed too.
As for packed form, without going into too much detail, it is intuitively similar to this kind of C code:
struct PackedCoordinate {
    int x;
    int y;
};

struct UnpackedCoordinate {
    int *x;
    int *y;
};
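In Haskell terms, a rough analogue of those two structs might look like this (PackedCoordinate / UnpackedCoordinate are illustrative names):

-- Fields stored inline as raw machine words, like the struct of plain ints.
data PackedCoordinate = PackedCoordinate {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- Fields stored behind pointers (possibly to thunks), like the struct of pointers.
data UnpackedCoordinate = UnpackedCoordinate Int Int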
I'm not sure why he suggested keeping long-lived data in packed form. Anyhow, it seems from the documentation I linked to that one should be careful with the {-# UNPACK #-} pragma, because if you're unlucky, GHC might need to repack its values before function calls, making it allocate more memory than it would have if it weren't unpacked to begin with.
As for avoiding garbage collections: I think you should approach this like anything else related to profiling: find the bottleneck in your program and then work from there.
PS. Please comment on anything I happen to be incorrect about. :)
I have a function
slow :: Double -> Double
which gets called very often (hundreds of millions of times), but only gets called on about a thousand discrete values. This seems like an excellent candidate for memoization, but I can't figure out how to memoize a function of a Double.
The standard technique of memoizing via a lazy list doesn't work, since Double is not an integral type. I looked at Data.MemoCombinators, but it doesn't natively support Doubles. There is a bits function for handling more data types, but Double isn't an instance of Data.Bits.
Is there an elegant way to memoize slow?
You could always use ugly-memo. The internals are impure, but it's fast and does what you need (except if the argument is NaN).
I think StableMemo should do exactly what you want, but I don't have any experience with that.
There are two main approaches. One is to use the Ord property to store the keys in a tree structure, like Map. That doesn't require the integral property you'd need for e.g. a MemoTrie approach; it is thus slower, but very simple.
The alternative, which works with much more general types still, is to map the keys unorderedly onto a large integral domain with a hash function, in order to, well, store them in a hash map. This is going to be substantially faster, yet pretty much just as simple, since the interface of HashMap largely matches that of Map, so you probably want to go that way.
Now, sadly, neither is quite as simple to use as MemoCombinators, which builds directly on IntTrie and is specialised for offering a lazy / infinite / pure interface. Both Map and particularly HashMap, in contrast, can be used very well for impure memoisation, but are not inherently able to do it purely. You may throw in some unsafePerformIO (uh oh), or just do it openly in the IO monad (yuk!), as sketched below. Or use StableMemo.
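A sketch of the "do it openly in the IO monad" option (assuming the unordered-containers package; memoizeIO is a made-up helper):

import Data.IORef
import qualified Data.HashMap.Strict as HM

-- Wrap an expensive function in a mutable cache; the returned action
-- looks the argument up before falling back to the real computation.
memoizeIO :: (Double -> Double) -> IO (Double -> IO Double)
memoizeIO f = do
  cacheRef <- newIORef HM.empty
  pure $ \x -> do
    cache <- readIORef cacheRef
    case HM.lookup x cache of
      Just y  -> pure y
      Nothing -> do
        let y = f x
        modifyIORef' cacheRef (HM.insert x y)
        pure y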
But it's actually easy and safe if you already know at compile time which values it's going to be called with, at least for most of the calls. Then you can just fill a local hash map with those values at the beginning, and at each call look up whether the argument is there, otherwise simply call the expensive function directly:
import qualified Data.HashMap.Lazy as HM
type X = Double -- could really be anything else
notThatSlow :: Double -> X
notThatSlow = \v -> case HM.lookup v memo of
    Just x  -> x
    Nothing -> slow v
  where
    -- expectedValues :: [Double] is the known set of likely arguments.
    memo = HM.fromList [ (v, x) | v <- expectedValues, let x = slow v ]