I'm trying to use Data.Word but I don't even understand its source code. Here are some precise questions I have, but if you have a better resource for using Word or similar libraries, that might be helpful too.
For reference, let's look at the implementation of Word8 (source)
data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
What are the #s? As far as I can tell it's not part of the name or a regular function.
What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})? I have seen those as language declarations at the beginning of files but never in a definition.
As far as I can tell, W8 or W8# (I don't even know how to parse it) is not defined anywhere else in the file or imported. Is it being implicitly defined here, or am I missing something?
Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition?
What are the #s?
They are only marginally more special than the Ws, os, rs, and ds -- just part of the name. Standard Haskell doesn't allow this in a name, but it's just a syntactic extension (named MagicHash) -- nothing deep happening here. As a convention, GHC internals use # suffixes on types to indicate they are unboxed, and # suffixes on constructors to indicate they box up unboxed types, but these are just conventions, and not enforced by the compiler or anything like that.
What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})?
CTYPE declares that, when using the foreign function interface to marshall this type to C, the appropriate C type to marshall it to is HsWord8 -- a type defined in the GHC runtime's C headers.
As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else on the file or imported. Is it being implicitly defined here?
Well, it is being defined there, but I wouldn't call it implicit; it's quite explicit! Consider this typical Haskell data declaration:
data Foo = Bar Field1 Field2
It defines two new names: Foo, a new type at the type level, and Bar, a new function at the computation level which takes values of type Field1 and Field2 and constructs a value of type Foo. Similarly,
data Word8 = W8# Word#
defines a new type Word8 and a new constructor function W8# :: Word# -> Word8.
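To make that concrete, here is a minimal sketch that pattern matches on W8# to get at the unboxed value. It assumes a GHC version where Word8 really wraps a plain Word#, as in the source quoted above; in recent GHCs the field is a Word8# instead, so the details differ there.
{-# LANGUAGE MagicHash #-}
import GHC.Word (Word8(..))          -- exports the W8# constructor
import GHC.Exts (Int(I#), word2Int#)

-- Unwrap the boxed Word8 and convert the unboxed Word# to an ordinary Int.
word8ToInt :: Word8 -> Int
word8ToInt (W8# w#) = I# (word2Int# w#)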
Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition?
Word# may be imported from GHC.Exts. You can discover this yourself via Hoogle. It is a compiler primitive, so while it is possible to look at its source, the thing you would be looking at would be metacode, not code: it would not be valid Haskell code declaring the type with a standard data declaration and listing constructors, but rather some combination of C code and Haskell code describing how to lay out bits in memory, emit assembly instructions for modifying it, and interact with the garbage collector.
Well, @DanielWagner covered most of this, but I was just about done writing this up, so maybe it'll provide some additional detail... I was originally confused about the nature of the "definitions" in GHC.Prim, so I've updated my answer with a correction.
You should be able to effectively use the Data.Word types without understanding the source code in GHC.Word.
Data.Word just provides a family of unsigned integral types of fixed bitsize (Word8, Word16, Word32, and Word64) plus a Word type of "default size" (same size as Int, so 64 bits on 64-bit architectures). Because these types all have Num and Integral instances, the usual operations on integers are available, and overflow is handled the usual way. If you want to use them as bit fields, then the facilities in Data.Bits will be helpful.
In particular, I don't see anything in the GHC.Word source that could possibly help you write "normal" code using these types.
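For ordinary use, plain arithmetic and the Data.Bits operations mentioned above are all you need; a minimal sketch:
import Data.Word (Word8)
import Data.Bits ((.&.), shiftL, testBit)

wraps :: Word8
wraps = 250 + 10              -- overflows modulo 256, so this is 4

lowNibble :: Word8 -> Word8
lowNibble w = w .&. 0x0F      -- keep only the low four bits

flag :: Word8
flag = 1 `shiftL` 3           -- bit 3 set, i.e. 8

bit3Set :: Word8 -> Bool
bit3Set w = testBit w 3       -- True iff bit 3 is set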
That being said, the # character is not normally allowed in identifiers, but it can be permitted (only as a final character, so W# is okay but not bad#Identifier) by enabling the MagicHash extension. There is nothing special about such identifiers EXCEPT that specific identifiers are treated "magically" by the GHC compiler, and by convention these magic identifiers, plus some other identifiers that aren't actually "magic" but are intended for internal use only, use a final # character to mark them as special so they don't accidentally get used by someone who is trying to write "normal" Haskell code.
To illustrate, in the definition:
data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
the identifier W8# is not magic. It's just a regular constructor that's intended only for internal, or at least advanced, use. On the other hand, Word# is magic. It's internally defined by GHC as an "unboxed" unsigned integer (64 bits on 64-bit architectures) where "unboxed" here means that it's stored directly in memory in an 8-byte field without an extra field for its constructor.
You can find a nonsensical "definition", of sorts, in the source code for GHC.Prim:
data Word#
In normal Haskell code, this would define a data type Word# with no constructors. Such a data type would be "uninhabited", meaning it has no possible values. However, this definition isn't actually used: the GHC.Prim source code is automatically generated for the benefit of the Haddock documentation utility. GHC.Prim is a sort of virtual module, and its "real" implementation is built into the GHC compiler.
How do you know which identifiers ending in # are magic and which aren't? Well, you don't know just by looking at the names. I believe you can reliably tell by checking in GHCi if they are defined in the virtual GHC.Prim module:
> :set -XMagicHash
> import GHC.Prim
> :i Word#
data Word# :: TYPE 'GHC.Types.WordRep -- Defined in ‘GHC.Prim’
Anything defined in GHC.Prim is magic, and anything else isn't. In the generated GHC.Prim source, these magic identifiers will show up with nonsense definitions like:
data Foo#
or:
bar# = bar#
Constructs of the form {-# WHATEVER #-} are compiler pragmas. They provide special instructions to the compiler that relate to the source file as a whole or, usually, to "nearby" Haskell code. Some pragmas are placed at the top of the source file to enable language extensions or set compiler flags:
{-# LANGUAGE FlexibleInstances #-}
{-# OPTIONS_GHC -Wall #-}
Others are interleaved with Haskell code to influence the compiler's optimizations:
double :: Int -> Int
{-# NOINLINE double #-}
double x = x + x
or to specify special memory layout or handling of data structures:
data MyStructure = MyS {-# UNPACK #-} !Bool {-# UNPACK #-} !Int
These pragmas are documented in the GHC manual. The CTYPE pragma is a rather obscure pragma that relates to how the Word type will be interpreted when used with the foreign function interface and the capi calling convention. If you aren't planning to call C functions from a Haskell program using the capi calling convention, you can ignore it.
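For illustration, here is a hedged sketch of where CTYPE matters. The header rot13.h and the function rot13_byte are made up for the example; the point is only that, because Word8 carries CTYPE "HsWord8", GHC knows which C type to use in the stub it generates for a capi import:
{-# LANGUAGE CApiFFI #-}
import Data.Word (Word8)

-- Hypothetical C function:  uint8_t rot13_byte(uint8_t c);
-- With the capi convention GHC generates a small C wrapper, and the CTYPE
-- pragma on Word8 tells it to declare the argument and result as HsWord8.
foreign import capi "rot13.h rot13_byte"
  c_rot13 :: Word8 -> Word8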
Related
I have a large type hierarchy in Haskell.
Counting family instances, which (can) have separate class membership after all, there are hundreds of data types.
Since the top-most type needs to implement built-in classes like Generic, Eq, Ord, and Show, every single type in the hierarchy has to as well for a meaningful implementation overall. So my specification contains deriving (Generic,Eq,Ord,Show) hundreds of times, and I would like to avoid this clutter in the files.
A solution involving a single type class to attach everywhere, like deriving GEOS, with a single automatic derivation from that to the usual set in one centralized place, would already help a lot with readability.
Another question asking for similar conciseness in constraints is solved by using constraint synonyms (so my GEOS would be not just linked to, but explicitly made up of, exactly the classes I want), which, however, apparently cannot currently be instantiated.
(A side question of mine would be why that is so. It seems to me that the reason @simonpj gives, about the renamer not knowing what the type checker knows the synonym really to be, would only apply to explicitly written-out instance implementations.)
Maybe GHC.Generic itself (and something like generic-deriving) could help here?
You could of course use Template Haskell to generate the deriving clauses as -XStandaloneDeriving declarations.
{-# LANGUAGE TemplateHaskell #-}
module GEOSDerive where
import Language.Haskell.TH
import Control.Monad
import GHC.Generics
deriveGEOS :: Q Type -> DecsQ
deriveGEOS t = do
   t' <- t
   forM [ [t|Generic|], [t|Eq|], [t|Ord|], [t|Show|] ] $ \c -> do
      c' <- c
      return $ StandaloneDerivD Nothing [] (AppT c' t')
Then,
{-# LANGUAGE TemplateHaskell, StandaloneDeriving, QuasiQuotes, DeriveGeneric #-}
import GEOSDerive
data Foo = Foo
deriveGEOS [t|Foo|]
But I find it somewhat dubious that you need so many types in the first place, or rather that you have so many types yet each of them has so little code associated with it that mentioning those four classes for each one bothers you. There is nothing to be concerned about regarding refactoring or the like with these derivations, so I'd rather recommend simply sticking with deriving (Generic, Eq, Ord, Show) for each of them.
Is it possible in Haskell to apply a language pragma to a block of code, rather than the entire file itself?
For example, I enable the -fwarn-monomorphism-restriction flag, but I have a couple of files where I'd really like to disable this flag, so I use {-# LANGUAGE NoMonomorphismRestriction #-} at the top of the file.
However, instead of applying this pragma to the entire module, I'd like to apply it only to the block of code where I don't think this warning is helpful. The only solution I can think of right now is to move this block of code to its own file and then import it.
In general there is no way to do this, no.
For this particular pragma, you can disable the monomorphism restriction for a single declaration by giving it a type signature. Although I strongly recommend giving a full signature, there may be some situation where that is undesirable for some reason; in such a case even a signature full of holes is sufficient, e.g.
{-# LANGUAGE PartialTypeSignatures #-}
x :: _ => _
x = (+)
will be inferred to have type Num a => a -> a -> a instead of Integer -> Integer -> Integer.
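Putting both cases side by side in one self-contained module (a minimal sketch; the inferred types are noted in comments, and the partial signature triggers a warning unless -Wno-partial-type-signatures is set):
{-# LANGUAGE PartialTypeSignatures #-}
module MonoDemo where

-- No signature: the monomorphism restriction applies, and after
-- defaulting GHC infers  restricted :: Integer -> Integer -> Integer
restricted = (+)

-- A signature made of holes lifts the restriction:
-- GHC infers  generalized :: Num a => a -> a -> a
generalized :: _ => _
generalized = (+)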
In Data.ByteString.Internal, the ByteString type has the constructor
PS !!(ForeignPtr Word8) !!Int !!Int
What do these double exclamation marks mean here? I searched and only found that (!!) can be used to index a list: (!!) :: [a] -> Int -> a.
This is not part of the actual Haskell source but an (undocumented) feature of how Haddock renders unboxed data types. See https://mail.haskell.org/pipermail/haskell-cafe/2009-January/054135.html:
2009/1/21 Stephan Friedrichs <...>:
Hi,
using haddock-2.4.1 and this file:
module Test where
data Test
= NonStrict Int
| Strict !Int
| UnpackedStrict {-# UNPACK #-} !Int
The generated documentation looks like this:
data Test
Constructors
NonStrict Int
Strict !Int
UnpackedStrict !!Int
Note the double '!' in the last constructor. This is not intended
behaviour, is it?
This is the way GHC pretty prints unboxed types, so I thought Haddock
should follow the same convention. Hmm, perhaps Haddock should have a
chapter about language extensions in its documentation, with a
reference to the GHC documentation. That way the language used is at
least documented. Not sure if it helps in this case though, since "!!"
is probably not documented there.
Perhaps we should not display unbox annotations at all since they are
an implementation detail, right? We could display one "!" instead,
indicating that the argument is strict.
David
I have the following problem: I defined a type class and want to declare tuples of types of this class to be instances as well. But I don't know how to get GHC to accept this declaration. Here a very simple example:
class Test a where
    elm :: a
And now for tuples I want to do something like
instance (Test a, Test b) => Test (a,b) where
    elm = (elm, elm) :: (a,b)
(Actually, I want to do something similar for more fancy type classes corresponding to vector spaces.)
How can this been done? Thanks in advance for any suggestions!
Try this instead:
instance (Test a, Test b) => Test (a,b) where
    elm = (elm, elm)
This should work. The issue with your code is that the :: (a,b) annotation you added actually confuses GHC instead of helping it. When GHC sees a and b in the annotation, it treats them as fresh, arbitrary types. But you don't want them to be arbitrary; you want them to be exactly the same types referenced in the instance head on the line above, and GHC doesn't know that. If you leave the type annotation out, GHC will figure out the correct types itself. Alternatively, you can change GHC's behavior by enabling the ScopedTypeVariables language extension, adding the following at the top of your file:
{-# LANGUAGE ScopedTypeVariables #-}
This tells GHC that the type variables bound in a class or instance head stay in scope for the rest of the definition, so the a and b in your annotation would then refer to the instance's a and b. I am one of those people who think that ScopedTypeVariables should have been on by default, but unfortunately this is not the case, mostly for historical and compatibility reasons. In fact, this question provides a good argument for why ScopedTypeVariables being off by default is counterintuitive.
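For completeness, a minimal sketch of the annotated version under that extension:
{-# LANGUAGE ScopedTypeVariables #-}

class Test a where
    elm :: a

-- With ScopedTypeVariables, the a and b from the instance head are in
-- scope in the body, so the annotation now refers to the right types.
instance (Test a, Test b) => Test (a, b) where
    elm = (elm, elm) :: (a, b)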
One of the things I like best about Haskell is how the compiler tracks side effects via the IO monad in function signatures. However, it seems easy to bypass this type check by importing two GHC primitives:
{-# LANGUAGE MagicHash #-}
import GHC.Magic(runRW#)
import GHC.Types(IO(..))
hiddenPrint :: ()
hiddenPrint = case putStrLn "Hello !" of
    IO sideEffect -> case runRW# sideEffect of
        _ -> ()
hiddenPrint is of type unit, but it does trigger a side effect when called (it prints Hello). Is there a way to forbid those hidden IOs (other than trusting no one imports GHC's primitives) ?
This is the purpose of Safe Haskell. If you add {-# language Safe #-} to the top of your source file, you will only be allowed to import modules that are either inferred safe or labeled {-# language Trustworthy #-}. This also imposes some mild restrictions on overlapping instances.
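As a minimal sketch of what that looks like in practice (System.IO.Unsafe is marked Unsafe in base, so the import below is rejected at compile time):
{-# LANGUAGE Safe #-}
module NoEscape where

-- Rejected under Safe Haskell: System.IO.Unsafe is neither Safe nor Trustworthy.
import System.IO.Unsafe (unsafePerformIO)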
There are many ways in which you can break purity in Haskell. However, you have to go out of your way to find them. Here are a few:
Importing GHC internal modules to access low-level primitives
Using the FFI to import a C function as a pure one, when it is not pure
Using unsafePerformIO (see the sketch at the end of this answer)
Using unsafeCoerce
Using Ptr a and dereferencing pointers that do not point to valid data
Declaring your own Typeable instances, and lying so that you can abuse cast. (No longer possible in recent GHC)
Using the unsafe array operations
All these, and others, are not meant to be used regularly. They are there so that, if one is really, really sure, one can tell the compiler "I know what I am doing, don't get in my way". In doing so, you take on the burden of proof -- proving that what you are doing is safe.
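For instance, here is a minimal sketch of the unsafePerformIO item above: the type claims purity, yet evaluating the result performs I/O.
import System.IO.Unsafe (unsafePerformIO)

-- Looks pure from the outside, but prints when its result is first demanded.
leakyLength :: [a] -> Int
leakyLength xs = unsafePerformIO $ do
    putStrLn "counting..."
    return (length xs)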