In Data.ByteString.Internal, ByteString has the constructor
PS !!(ForeignPtr Word8) !!Int !!Int
What do these double exclamation marks mean here? I searched and only found that (!!) can be used to index a list: (!!) :: [a] -> Int -> a.
This is not part of the actual Haskell source but an (undocumented) feature of how Haddock renders strict fields marked with {-# UNPACK #-}. See https://mail.haskell.org/pipermail/haskell-cafe/2009-January/054135.html:
2009/1/21 Stephan Friedrichs <...>:
Hi,
using haddock-2.4.1 and this file:
module Test where
data Test
  = NonStrict Int
  | Strict !Int
  | UnpackedStrict {-# UNPACK #-} !Int
The generated documentation looks like this:
data Test
Constructors
NonStrict Int
Strict !Int
UnpackedStrict !!Int
Note the double '!' in the last constructor. This is not intended
behaviour, is it?
This is the way GHC pretty prints unboxed types, so I thought Haddock
should follow the same convention. Hmm, perhaps Haddock should have a
chapter about language extensions in its documentation, with a
reference to the GHC documentation. That way the language used is at
least documented. Not sure if it helps in this case though, since "!!"
is probably not documented there.
Perhaps we should not display unbox annotations at all since they are
an implementation detail, right? We could display one "!" instead,
indicating that the argument is strict.
David
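To make the difference between the three kinds of field concrete, here is a minimal sketch using only base. The helper raisesOnWhnf is a hypothetical name introduced for illustration; it checks whether evaluating a value to weak head normal form raises an ErrorCall. A strict field (with or without UNPACK) forces its argument when the constructor is evaluated, while a non-strict field does not.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

data Test
  = NonStrict Int
  | Strict !Int
  | UnpackedStrict {-# UNPACK #-} !Int

-- Hypothetical helper: True if forcing the value to WHNF raises an ErrorCall.
raisesOnWhnf :: a -> IO Bool
raisesOnWhnf x = do
  r <- try (evaluate (x `seq` ())) :: IO (Either ErrorCall ())
  pure (either (const True) (const False) r)

main :: IO ()
main = do
  -- The lazy field is left as an unevaluated thunk, so nothing is raised.
  raisesOnWhnf (NonStrict (error "boom")) >>= print
  -- Both strict variants force the field as soon as the constructor is evaluated.
  raisesOnWhnf (Strict (error "boom")) >>= print
  raisesOnWhnf (UnpackedStrict (error "boom")) >>= print
```

UNPACK changes only the memory layout (the Int is stored inline in the constructor), not the evaluation behaviour, which is why Haddock's "!!" carries no extra semantic meaning beyond "!".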
Related
I'm trying to use Data.Word but I don't even understand its source code. Here are some precise questions I have, but if you have a better resource for using Word or similar libraries, that might be helpful too.
For reference, let's look at the implementation of Word8 (source)
data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
What are the #s? As far as I can tell it's not part of the name or a regular function.
What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})? I have seen those as language declarations at the beginning of files but never in a definition.
As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else in the file or imported. Is it being implicitly defined here or am I missing something?
Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition?
What are the #s?
They are only marginally more special than the Ws, os, rs, and ds -- just part of the name. Standard Haskell doesn't allow this in a name, but it's just a syntactic extension (named MagicHash) -- nothing deep happening here. As a convention, GHC internals use # suffixes on types to indicate they are unboxed, and # suffixes on constructors to indicate they box up unboxed types, but these are just conventions, and not enforced by the compiler or anything like that.
What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})?
CTYPE declares that, when using the foreign function interface to marshall this type to C, the appropriate C type to marshall it to is HsWord8 -- a type defined in the GHC runtime's C headers.
As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else in the file or imported. Is it being implicitly defined here?
Well, it is being defined there, but I wouldn't call it implicit; it's quite explicit! Consider this typical Haskell data declaration:
data Foo = Bar Field1 Field2
It defines two new names: Foo, a new type at the type level, and Bar, a new function at the computation level which takes values of type Field1 and Field2 and constructs a value of type Foo. Similarly,
data Word8 = W8# Word#
defines a new type Word8 and a new constructor function W8# :: Word# -> Word8.
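The same pattern can be tried in user code. The following is only a sketch: MyWord, MW#, toWord, fromWord and addMyWord are hypothetical names made up for illustration, and the code assumes GHC with the MagicHash extension. It shows that a #-suffixed constructor is an ordinary constructor that can be applied and pattern matched like any other.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Word (W#), Word#, plusWord#)

-- A hypothetical boxed wrapper, declared the same way GHC.Word declares Word8.
data MyWord = MW# Word#

-- Re-box the raw Word# as an ordinary Word so it can be shown.
toWord :: MyWord -> Word
toWord (MW# w) = W# w

fromWord :: Word -> MyWord
fromWord (W# w) = MW# w

-- Unwrap by pattern matching, add the primitives, wrap again.
addMyWord :: MyWord -> MyWord -> MyWord
addMyWord (MW# a) (MW# b) = MW# (plusWord# a b)

main :: IO ()
main = print (toWord (addMyWord (fromWord 2) (fromWord 3)))  -- prints 5
```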
Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition.
Word# may be imported from GHC.Exts. You can discover this yourself via Hoogle. It is a compiler primitive, so while it is possible to look at its source, the thing you would be looking at would be metacode, not code: it would not be valid Haskell code declaring the type with a standard data declaration and listing constructors, but rather some combination of C code and Haskell code describing how to lay out bits in memory, emit assembly instructions for modifying it, and interact with the garbage collector.
Well, @DanielWagner covered most of this, but I was just about done writing this up, so maybe it'll provide some additional detail... I was originally confused about the nature of the "definitions" in GHC.Prim, so I've updated my answer with a correction.
You should be able to effectively use the Data.Word types without understanding the source code in GHC.Word.
Data.Word just provides a family of unsigned integral types of fixed bitsize (Word8, Word16, Word32, and Word64) plus a Word type of "default size" (same size as Int, so 64 bits on 64-bit architectures). Because these types all have Num and Integral instances, the usual operations on integers are available, and overflow is handled the usual way. If you want to use them as bit fields, then the facilities in Data.Bits will be helpful.
In particular, I don't see anything in the GHC.Word source that could possibly help you write "normal" code using these types.
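As a rough illustration of "normal" use, here is a small sketch that relies only on Data.Word and Data.Bits from base. The specific values are arbitrary examples chosen to show wrap-around arithmetic and bit operations on Word8.

```haskell
import Data.Bits (popCount, shiftL, (.&.))
import Data.Word (Word8)

main :: IO ()
main = do
  print (maxBound :: Word8)                   -- 255
  print (maxBound + 1 :: Word8)               -- 0: arithmetic wraps modulo 2^8
  print (fromIntegral (300 :: Int) :: Word8)  -- 44: conversion truncates to the low 8 bits
  print (0xF0 .&. 0x3C :: Word8)              -- 48: bitwise AND
  print (1 `shiftL` 7 :: Word8)               -- 128: left shift
  print (popCount (0xFF :: Word8))            -- 8: number of set bits
```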
That being said, the # character is not normally allowed in identifiers, but it can be permitted (only as a final character, so W# is okay but not bad#Identifier) by enabling the MagicHash extension. There is nothing special about such identifiers EXCEPT that specific identifiers are treated "magically" by the GHC compiler, and by convention these magic identifiers, plus some other identifiers that aren't actually "magic" but are intended for internal use only, use a final # character to mark them as special so they don't accidentally get used by someone who is trying to write "normal" Haskell code.
To illustrate, in the definition:
data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
the identifier W8# is not magic. It's just a regular constructor that's intended only for internal, or at least advanced, use. On the other hand, Word# is magic. It's internally defined by GHC as an "unboxed" unsigned integer (64 bits on 64-bit architectures) where "unboxed" here means that it's stored directly in memory in an 8-byte field without an extra field for its constructor.
You can find a nonsensical "definition", of sorts, in the source code for GHC.Prim:
data Word#
In normal Haskell code, this would define a data type Word# with no constructor. Such a data type would be "uninhabited", meaning it has no possible values. However, this definition isn't actually used. This GHC.Prim source code is automatically generated for the benefit of the Haddock documentation utility. Instead, GHC.Prim is a sort of virtual module, and its "real" implementation is built into the GHC compiler.
How do you know which identifiers ending in # are magic and which aren't? Well, you don't know just by looking at the names. I believe you can reliably tell by checking in GHCi if they are defined in the virtual GHC.Prim module:
> :set -XMagicHash
> import GHC.Prim
> :i Word#
data Word# :: TYPE 'GHC.Types.WordRep -- Defined in ‘GHC.Prim’
Anything defined in GHC.Prim is magic, and anything else isn't. In the generated GHC.Prim source, these magic identifiers will show up with nonsense definitions like:
data Foo#
or:
bar# = bar#
Constructs of the form {-# WHATEVER #-} are compiler pragmas. They provide special instructions to the compiler that relate to the source file as a whole or, usually, to "nearby" Haskell code. Some pragmas are placed at the top of the source file to enable language extensions or set compiler flags:
{-# LANGUAGE FlexibleInstances #-}
{-# OPTIONS_GHC -Wall #-}
Others are interleaved with Haskell code to influence the compiler's optimizations:
double :: Int -> Int
{-# NOINLINE double #-}
double x = x + x
or to specify special memory layout or handling of data structures:
data MyStructure = MyS {-# UNPACK #-} !Bool {-# UNPACK #-} !Int
These pragmas are documented in the GHC manual. The CTYPE pragma is a rather obscure pragma that relates to how the Word type will be interpreted when used with the foreign function interface and the capi calling convention. If you aren't planning to call C functions from a Haskell program using the capi calling convention, you can ignore it.
GHC has unboxed types for Int, Float, etc.
I know that code built on them runs with less overhead,
but I don't see how to get data into and out of a function based on unboxed Int.
GHC.Exts defines functions such as (+#) and (*#), but I cannot find functions for boxing and unboxing, i.e. something like:
readInt:: String -> Int#
showInt:: Int# -> String
boxInt :: Int# -> Int
unboxInt :: Int -> Int#
instance Show Int# and instance Read Int# cannot exist because show and read are polymorphic.
Without these functions, how could I integrate an optimized code block on unboxed types with the rest of the application?
Int, Float, etc. are just data types in GHC:
data Int = I# Int#
data Float = F# Float#
-- etc.
The constructors are only exported by GHC.Exts. Import it and use the constructors to convert:
{-# LANGUAGE MagicHash #-}
import GHC.Exts

main :: IO ()
main = do
  I# x <- readLn
  I# y <- readLn
  print (I# (x +# y))
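If dedicated helper functions are preferred, the four signatures from the question can be written in terms of the I# constructor. This is only a sketch: boxInt, unboxInt, showInt and readInt are the hypothetical names from the question, not functions that GHC.Exts provides.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#, (+#))

-- Boxing is just applying the constructor.
boxInt :: Int# -> Int
boxInt x = I# x

-- Unboxing is just pattern matching on it.
unboxInt :: Int -> Int#
unboxInt (I# x) = x

-- Show and Read go through the boxed Int instances.
showInt :: Int# -> String
showInt x = show (I# x)

readInt :: String -> Int#
readInt s = case read s of
  I# x -> x

main :: IO ()
main = putStrLn (showInt (readInt "20" +# unboxInt 22))  -- prints 42
```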
I know that code built on them runs with less overhead
Though this is true in a sense, it's nothing you should normally worry about. GHC tries very hard to optimise away the boxes of built-in types, and I'd expect that it manages to do it well in most cases where you could do that manually as well.
In practice, what you should be more careful about is to ensure two things:
First, that GHC actually sees the concrete Int or Float type for which it knows the unboxed form. In particular, this does not work for polymorphic functions (polymorphism generally relies on the boxes, like it does in OO languages). If you want a function to be polymorphic and still run fast with the primitive types, make sure you add a SPECIALIZE pragma and/or rewrite rules.
Second, that laziness doesn't get in the way. Unboxed types are always strict, so strictness annotations can make it a lot easier for GHC to remove the boxes.
And of course profile your code.
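Both points can be sketched in a few lines. The functions sumSquares and mean below are hypothetical examples, not taken from any library: the first is polymorphic with a SPECIALIZE pragma for Int, the second uses bang patterns so that its accumulators are never lazy. Whether GHC actually unboxes them depends on the optimisation level, so treat this as an illustration of the annotations rather than a performance guarantee.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Polymorphic over any Num type; the pragma asks GHC to also compile
-- a copy specialised to Int, where it knows the unboxed representation.
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

-- Strict accumulators: the bangs force s and n on every step,
-- so no chain of thunks builds up and the loop can work on raw values.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []       = if n == 0 then 0 else s / fromIntegral n
    go !s !n (y : ys) = go (s + y) (n + 1) ys

main :: IO ()
main = do
  print (sumSquares [1 .. 10 :: Int])  -- 385
  print (mean [1, 2, 3, 4])            -- 2.5
```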
Only if you're really sure you want it (e.g. to ensure that the boxes won't reappear when a new GHC version optimises differently), or maybe if you'd like to get SIMD instructions in, should you actually access unboxed primitive types manually.
I have the following problem: I defined a type class and want to declare tuples of types of this class to be instances as well. But I don't know how to get GHC to accept this declaration. Here a very simple example:
class Test a where
  elm :: a
And now for tuples I want to do something like
instance (Test a, Test b) => Test (a,b) where
  elm = (elm, elm) :: (a,b)
(Actually, I want to do something similar for more fancy type classes corresponding to vector spaces.)
How can this be done? Thanks in advance for any suggestions!
Try this instead:
instance (Test a, Test b) => Test (a,b) where
  elm = (elm, elm)
This should work. The issue with your code is that the :: (a,b) type annotation you added actually confuses GHC instead of helping it. When GHC sees a and b in the annotation, it treats them as fresh, arbitrary type variables. But you don't want them to be arbitrary; you want them to be the exact same types referenced in the instance head on the line above. GHC doesn't know that. If you leave the type annotation out, GHC will infer the correct types itself. Alternatively, you can change GHC's behavior by enabling the ScopedTypeVariables language extension, by adding the following at the top of your file:
{-# LANGUAGE ScopedTypeVariables #-}
This tells GHC that the type variables mentioned in the head of a class or instance declaration are in scope for the rest of that declaration. I am one of those people who think that ScopedTypeVariables should have been on by default, but unfortunately this is not the case, mostly for historical and compatibility reasons. In fact, this question provides a good argument why having ScopedTypeVariables off by default is counterintuitive.
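For completeness, here is a sketch of the original annotated version compiling once the extension is on. The Int and Bool instances are assumptions added only so that there is something to print; the class and the tuple instance are the ones from the question.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

class Test a where
  elm :: a

-- Two base instances, invented here purely for demonstration.
instance Test Int where
  elm = 0

instance Test Bool where
  elm = True

-- With ScopedTypeVariables, the a and b in the annotation refer to
-- the a and b bound in the instance head.
instance (Test a, Test b) => Test (a, b) where
  elm = (elm, elm) :: (a, b)

main :: IO ()
main = print (elm :: (Int, Bool))  -- prints (0,True)
```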
Learn You a Haskell has a code example like this:
ghci> B.pack [99,97,110]
Chunk "can" Empty
(B stands for Data.ByteString.Lazy)
But my ghci does not show Chunk and Empty data constructors.
> B.pack [99,97,110]
"can"
Did Haskell developers change the way the values of ByteString are printed?
Looks like Duncan added a hand-written Show instance for lazy ByteString somewhere between 0.9.2.1 and 0.10.0.1. See http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/src/Data-ByteString-Lazy-Internal.html#ByteString
Add: Here is the relevant patch
Old versions of BL.ByteString simply have a deriving Show in their data declaration. This results in the GHCi output as shown in LYAH, and ensures the output is valid Haskell code. The nice plain string "can" isn't really a valid Haskell representation of that bytestring – that is, not a valid Haskell 98 representation. However, it is common to use {-# LANGUAGE OverloadedStrings #-} in modules that use bytestrings, which makes it valid. Which is probably the reason that there is now (since 0.10) this nicer-to-read manual instance.
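A short sketch of that round trip, assuming bytestring 0.10 or later (the version bundled with any recent GHC): with OverloadedStrings enabled, the string that show produces is itself a valid ByteString literal.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as B

main :: IO ()
main = do
  let bs = B.pack [99, 97, 110]
  -- With the hand-written Show instance this prints "can", not Chunk "can" Empty.
  print bs
  -- The literal "can" is read as a lazy ByteString thanks to OverloadedStrings.
  print (bs == "can")
```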
I am starting Haskell and was looking at some libraries where data types are defined with "!". Example from the bytestring library:
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
Now I saw this question as an explanation of what this means and I guess it is fairly easy to understand. But my question is now: what is the point of using this? Since the expression will be evaluated whenever it is needed, why would you force the early evaluation?
In the second answer to this question C.V. Hansen says: "[...] sometimes the overhead of lazyness can be too much or wasteful". Is that supposed to mean that it is used to save memory (saving the value is cheaper than saving the expression)?
An explanation and an example would be great!
Thanks!
[EDIT] I think I should have chosen an example without {-# UNPACK #-}. So let me make one myself. Would this ever make sense? If yes, why and in what situation?
data MyType = Const1 !Int
            | Const2 !Double
            | Const3 !SomeOtherDataTypeMaybeMoreComplex
The goal here is not strictness so much as packing these elements into the data structure. Without strictness, any of those three constructor arguments could point either to a heap-allocated value structure or a heap-allocated delayed evaluation thunk. With strictness, it could only point to a heap-allocated value structure. With strictness and packed structures, it's possible to make those values inline.
Since each of those three values is a pointer-sized entity and is accessed strictly anyway, forcing a strict and packed structure saves pointer indirections when using this structure.
In the more general case, a strictness annotation can help reduce space leaks. Consider a case like this:
data Foo = Foo Int
makeFoo :: ReallyBigDataStructure -> Foo
makeFoo x = Foo (computeSomething x)
Without the strictness annotation, if you just call makeFoo, it will build a Foo pointing to a thunk pointing to the ReallyBigDataStructure, keeping it around in memory until something forces the thunk to evaluate. If we instead have
data Foo = Foo !Int
This forces the computeSomething evaluation to proceed immediately (well, as soon as something forces makeFoo itself), which avoids leaving a reference to the ReallyBigDataStructure.
Note that this is a different use case than the bytestring code; the bytestring code forces its parameters quite frequently so it's unlikely to lead to a space leak. It's probably best to interpret the bytestring code as a pure optimization to avoid pointer dereferences.
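A common place where the space-leak argument shows up in practice is an accumulator record inside a fold. The following is a minimal sketch under that assumption; Stats, step and summarize are hypothetical names. foldl' only forces the Stats constructor at each step, so it is the strict fields that keep the count and the sum evaluated instead of growing into chains of thunks.

```haskell
import Data.List (foldl')

-- Count and running sum; the bangs force both fields whenever a Stats is built.
data Stats = Stats !Int !Int

step :: Stats -> Int -> Stats
step (Stats n s) x = Stats (n + 1) (s + x)

summarize :: [Int] -> Stats
summarize = foldl' step (Stats 0 0)

main :: IO ()
main = do
  let Stats n s = summarize [1 .. 10000]
  print (n, s)  -- prints (10000,50005000)
```

Removing the two bangs leaves the result unchanged, but each field may then hold a thunk referring to the previous one until something finally demands it; GHC's strictness analysis may or may not rescue that at higher optimisation levels.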