Learn You a Haskell has a code example like this:
ghci> B.pack [99,97,110]
Chunk "can" Empty
(B stands for Data.ByteString.Lazy)
But my GHCi does not show the Chunk and Empty data constructors.
> B.pack [99,97,110]
"can"
Did Haskell developers change the way the values of ByteString are printed?
It looks like Duncan added a hand-written Show instance for lazy ByteString somewhere between 0.9.2.1 and 0.10.0.1. See http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/src/Data-ByteString-Lazy-Internal.html#ByteString
Added: here is the relevant patch
Old versions of BL.ByteString simply had a derived Show instance (deriving Show in the data declaration). That produces the GHCi output shown in LYAH, and ensures the output is valid Haskell code. The nice plain string "can" isn't really a valid Haskell representation of that bytestring – at least, not a valid Haskell 98 one. However, it is common to use {-# LANGUAGE OverloadedStrings #-} in modules that work with bytestrings, which makes it valid. That is probably why there is now (since 0.10) this nicer-to-read manual instance.
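To see this for yourself, here is a minimal check (assuming a reasonably recent bytestring, where lazy ByteString has an IsString instance):

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as B

-- With OverloadedStrings, the literal "can" type checks as a lazy
-- ByteString, so the nicer Show output is itself valid source code.
can :: B.ByteString
can = "can"

main :: IO ()
main = print (can == B.pack [99, 97, 110])  -- True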
I'm trying to write a small GHC extension as a way of getting started with GHC hacking. As suggested by the GHC GitLab wiki, I started off by having a look at this article by SPJ and SM to get a feel for GHC's architecture.
I now have an idea for what I think would be a relatively small GHC extension that I would like to have a crack at as a first attempt at actually modifying GHC's source code: CoercibleStrings. The outcome would be similar in spirit to that of the existing OverloadedStrings extension. The latter, as I understand it, "only" infers the type (Data.String.IsString a) => a for a string literal l during type checking and then desugars it to fromString l, where fromString :: (Data.String.IsString a) => String -> a is the method of the typeclass Data.String.IsString.
This comes in handy when using libraries that make frequent use of alternatives to Haskell's standard String type, particularly if we often wish to pass in arguments written as string literals, e.g. libraries used for terminal IO. For instance, consider the logging library simple-logger, which makes heavy use of Data.Text.Text and contains the function logError :: (?callStack :: CallStack) => MonadIO m => Text -> m ().
Rather than writing something like:
import qualified Data.Text as T
import Control.Logger.Simple
...
logError . T.pack $ "Some error occurred."
Using the OverloadedStrings GHC extension, the last line can be replaced with:
logError $ "Some error occured."
As Data.Text.Text is an instance of the typeclass IsString, this type checks and desugars to something like:
($) logError (fromString "Some error occurred.")
This can result in significantly less cluttered code, since the conversions are elided. However, in a similar and common case, where operations applied to the string literal force it to be typed as a String, the benefit is lost. For example, consider:
import qualified Data.Text as T
import Control.Logger.Simple
...
logError . T.pack $ "Some error occurred # " ++ (show locus)
where locus is bound in the enclosing context and has a type that is an instance of the typeclass Show.
Now, due to the type of the concatenation operator, (++) :: [a] -> [a] -> [a], the result of evaluating the expression "Some error occured # " ++ (show locus) must be of type [a], where a is such that [a] is an instance of the typeclass IsString, i.e. String.
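Here is a minimal illustration of the clash (with locus specialised to Int for concreteness):

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- A bare literal can be Text thanks to OverloadedStrings...
greeting :: T.Text
greeting = "hello"

-- ...but (++) :: [a] -> [a] -> [a] pins the literal to String,
-- so the explicit T.pack is still needed:
message :: Int -> T.Text
message locus = T.pack ("Some error occurred # " ++ show locus)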
Therefore, in this case, the OverloadedStrings extension does not help us. To remedy this, I propose the CoercibleStrings GHC extension, which would:
1. Allow the type checker to unify the type String with any instance of the typeclass IsString.
2. During desugaring, insert an application of the function fromString wherever these coercions are necessary (see the sketch after this list).
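For concreteness, here is what point 2 amounts to if written by hand today; the proposed extension would insert this fromString automatically:

import Data.String (fromString)
import qualified Data.Text as T

-- The coercion point: a String expression used where Text (an
-- IsString instance) is expected.
message :: Int -> T.Text
message locus = fromString ("Some error occurred # " ++ show locus)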
Although the architecture of such an extension would be quite different from that of the OverloadedStrings extension, the idea seems quite straightforward, while providing, I would argue, a fairly large advantage over it. This brings me to the question of why it has yet to be implemented.
There are a few potential problems with such an extension that I can think of. The first is that, given that String is itself an instance of the typeclass IsString, a type inference rule of the form
∀a ∈ IsString. Γ ⊢ e :: String => Γ ⊢ e :: a
can be applied to an expression of type String an unbounded number of times. So it may not be possible to modify GHC's type inference algorithm in such a way as to guarantee termination of type checking given the addition of such a rule. I think however that this issue can be prevented by requiring that a is not String and only allowing GHC to apply this rule if no other rules apply.
The second concerns point 2 above: where to insert the applications of fromString. The simplest answer, I guess, would be to apply it to every expression of type String; given that fromString for String is simply id, any unnecessary applications would likely be optimised away. This may of course lengthen compilation times somewhat. I believe, however, that the insertions could be targeted better using information from the type checker.
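To see why blanket insertion would be semantically harmless, note that fromString at type String behaves as the identity (the instance in Data.String is, modulo a type-equality trick, fromString = id):

import Data.String (fromString)

harmless :: Bool
harmless = fromString "hello" == ("hello" :: String)  -- True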
The third is that this kind of implicit coercion goes against the philosophy of strongly typed languages such as Haskell, and that the coercions may introduce run-time costs that are invisible to the programmer.
The second and third points are not existential issues that would prevent the extension from being written. However, I am unsure whether the solution I am suggesting to the first point actually works, is reasonably implementable within the existing GHC infrastructure, and does not interact badly with other existing extensions.
I'd welcome any comment on these points or any pointers to up-to-date information on GHC's type checking algorithm. Thank you!
I'm trying to use Data.Word but I don't even understand its source code. Here are some precise questions I have, but if you have a better resource for using Word or similar libraries, that might be helpful too.
For reference, let's look at the implementation of Word8 (source)
data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
What are the #s? As far as I can tell they're not part of the name or a regular function.
What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})? I have seen those as language declarations at the beginning of files but never in a definition.
As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else in the file or imported. Is it being implicitly defined here or am I missing something?
Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition?
What are the #s?
They are only marginally more special than the Ws, os, rs, and ds -- just part of the name. Standard Haskell doesn't allow this in a name, but it's just a syntactic extension (named MagicHash) -- nothing deep happening here. As a convention, GHC internals use # suffixes on types to indicate they are unboxed, and # suffixes on constructors to indicate they box up unboxed types, but these are just conventions, and not enforced by the compiler or anything like that.
What is the declaration before Word8 ({-# CTYPE "HsWord8" #-})?
CTYPE declares that, when using the foreign function interface to marshal this type to C, the appropriate C type to marshal it to is HsWord8 -- a type defined in the GHC runtime's C headers.
As far as I can tell W8 or W8# (I don't even know how to parse it) is not defined anywhere else on the file or imported. Is it being implicitly defined here?
Well, it is being defined there, but I wouldn't call it implicit; it's quite explicit! Consider this typical Haskell data declaration:
data Foo = Bar Field1 Field2
It defines two new names: Foo, a new type at the type level, and Bar, a new function at the computation level which takes values of type Field1 and Field2 and constructs a value of type Foo. Similarly,
data Word8 = W8# Word#
defines a new type Word8 and a new constructor function W8# :: Word# -> Word8.
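For example (with Int and Bool standing in for Field1 and Field2), the constructor really is just a function:

data Foo = Bar Int Bool

-- In GHCi, :t Bar reports: Bar :: Int -> Bool -> Foo
mkFoo :: Foo
mkFoo = Bar 42 True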
Similarly Word# is used in all definitions of Word, but I don't see it defined anywhere... where is it coming from and how can I see its definition?
Word# may be imported from GHC.Exts. You can discover this yourself via Hoogle. It is a compiler primitive, so while it is possible to look at its source, the thing you would be looking at would be metacode, not code: it would not be valid Haskell code declaring the type with a standard data declaration and listing constructors, but rather some combination of C code and Haskell code describing how to lay out bits in memory, emit assembly instructions for modifying it, and interact with the garbage collector.
Well, @DanielWagner covered most of this, but I was just about done writing this up, so maybe it'll provide some additional detail... I was originally confused about the nature of the "definitions" in GHC.Prim, so I've updated my answer with a correction.
You should be able to effectively use the Data.Word types without understanding the source code in GHC.Word.
Data.Word just provides a family of unsigned integral types of fixed bitsize (Word8, Word16, Word32, and Word64) plus a Word type of "default size" (same size as Int, so 64 bits on 64-bit architectures). Because these types all have Num and Integral instances, the usual operations on integers are available, and overflow is handled the usual way. If you want to use them as bit fields, then the facilities in Data.Bits will be helpful.
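For example, ordinary arithmetic wraps around, and the bit-field operations come from Data.Bits:

import Data.Word (Word8)
import Data.Bits ((.|.), shiftL, testBit)

wrapped :: Word8
wrapped = 255 + 1                -- wraps modulo 2^8, so 0

flags :: Word8
flags = (1 `shiftL` 3) .|. 1     -- 0b00001001, i.e. 9

bit3Set :: Bool
bit3Set = testBit flags 3        -- True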
In particular, I don't see anything in the GHC.Word source that could possibly help you write "normal" code using these types.
That being said, the # character is not normally allowed in identifiers, but it can be permitted (only as a final character, so W# is okay but not bad#Identifier) by enabling the MagicHash extension. There is nothing special about such identifiers EXCEPT that specific identifiers are treated "magically" by the GHC compiler, and by convention these magic identifiers, plus some other identifiers that aren't actually "magic" but are intended for internal use only, use a final # character to mark them as special so they don't accidentally get used by someone who is trying to write "normal" Haskell code.
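For instance, this compiles once the extension is enabled; answer# is just an ordinary name:

{-# LANGUAGE MagicHash #-}

answer# :: Int
answer# = 42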
To illustrate, in the definition:
data {-# CTYPE "HsWord8" #-} Word8 = W8# Word#
the identifier W8# is not magic. It's just a regular constructor that's intended only for internal, or at least advanced, use. On the other hand, Word# is magic. It's internally defined by GHC as an "unboxed" unsigned integer (64 bits on 64-bit architectures) where "unboxed" here means that it's stored directly in memory in an 8-byte field without an extra field for its constructor.
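As a small sketch of what working with the unboxed type looks like (using plusWord#, the primitive addition re-exported by GHC.Exts):

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Word (W#), plusWord#)

-- Matching on W# exposes the raw Word#; W# boxes the result back up.
addWords :: Word -> Word -> Word
addWords (W# x) (W# y) = W# (x `plusWord#` y)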
You can find a nonsensical "definition", of sorts, in the source code for GHC.Prim:
data Word#
In normal Haskell code, this would define a data type Word# with no constructors. Such a data type would be "uninhabited", meaning it has no possible values. However, this definition isn't actually used; the GHC.Prim source code is generated automatically for the benefit of the Haddock documentation utility. GHC.Prim is a sort of virtual module, and its "real" implementation is built into the GHC compiler.
How do you know which identifiers ending in # are magic and which aren't? Well, you don't know just by looking at the names. I believe you can reliably tell by checking in GHCi if they are defined in the virtual GHC.Prim module:
> :set -XMagicHash
> import GHC.Prim
> :i Word#
data Word# :: TYPE 'GHC.Types.WordRep -- Defined in ‘GHC.Prim’
Anything defined in GHC.Prim is magic, and anything else isn't. In the generated GHC.Prim source, these magic identifiers will show up with nonsense definitions like:
data Foo#
or:
bar# = bar#
Constructs of the form {-# WHATEVER #-} are compiler pragmas. They provide special instructions to the compiler that relate to the source file as a whole or, usually, to "nearby" Haskell code. Some pragmas are placed at the top of the source file to enable language extensions or set compiler flags:
{-# LANGUAGE FlexibleInstances #-}
{-# OPTIONS_GHC -Wall #-}
Others are interleaved with Haskell code to influence the compiler's optimizations:
double :: Int -> Int
{-# NOINLINE double #-}
double x = x + x
or to specify special memory layout or handling of data structures:
data MyStructure = MyS {-# UNPACK #-} !Bool {-# UNPACK #-} !Int
These pragmas are documented in the GHC manual. The CTYPE pragma is a rather obscure one that specifies how the Word8 type will be interpreted when used with the foreign function interface and the capi calling convention. If you aren't planning to call C functions from a Haskell program using the capi calling convention, you can ignore it.
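For completeness, here is a sketch of where CTYPE comes into play. The header and function below are made up for illustration, so this won't link without the corresponding C code:

{-# LANGUAGE CApiFFI #-}
import Data.Word (Word8)

-- Hypothetical C function: uint8_t frob(uint8_t);
-- with capi, GHC consults Word8's CTYPE ("HsWord8") when
-- generating the C call stub.
foreign import capi "frob.h frob" c_frob :: Word8 -> IO Word8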
In Data.ByteString.Internal, the ByteString type has the constructor
PS !!(ForeignPtr Word8) !!Int !!Int
What do these double exclamation marks mean here? I searched and only found that (!!) can be used to index a list: (!!) :: [a] -> Int -> a.
This is not part of the actual Haskell source but an (undocumented) feature of how Haddock renders unboxed data types. See https://mail.haskell.org/pipermail/haskell-cafe/2009-January/054135.html:
2009/1/21 Stephan Friedrichs <...>:
Hi,
using haddock-2.4.1 and this file:
module Test where
data Test
= NonStrict Int
| Strict !Int
| UnpackedStrict {-# UNPACK #-} !Int
The generated documentation looks like this:
data Test
Constructors
NonStrict Int
Strict !Int
UnpackedStrict !!Int
Note the double '!' in the last constructor. This is not intended
behaviour, is it?
This is the way GHC pretty prints unboxed types, so I thought Haddock
should follow the same convention. Hmm, perhaps Haddock should have a
chapter about language extensions in its documentation, with a
reference to the GHC documentation. That way the language used is at
least documented. Not sure if it helps in this case though, since "!!"
is probably not documented there.
Perhaps we should not display unbox annotations at all since they are
an implementation detail, right? We could display one "!" instead,
indicating that the argument is strict.
David
What exactly is going on with the following?
> let test = map show
> :t test
test :: [()] -> [String]
> :t (map show)
(map show) :: Show a => [a] -> [String]
I am wondering how I failed to notice this before. I actually encountered the problem with map fromIntegral rather than show: my code doesn't compile in the point-free form, but works fine without the eta reduction.
Is there a simple explanation of when eta reduction can change the meaning of Haskell code?
This is the monomorphism restriction. It applies when a binding takes no parameters, and it makes the binding shareable when it otherwise wouldn't be due to polymorphism, on the theory that if you don't give it a parameter you want to treat it as something "constant"-ish (hence shared). You can disable it in GHCi with :set -XNoMonomorphismRestriction; this is often useful there, since you usually intend such expressions to be fully polymorphic. (In a Haskell source file, make the first line
{-# LANGUAGE NoMonomorphismRestriction #-}
instead.)
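Alternatively, here are two standard ways to keep the fully polymorphic type without turning the restriction off:

test1 :: Show a => [a] -> [String]  -- give an explicit type signature
test1 = map show

test2 xs = map show xs              -- or eta-expand: bindings with
                                    -- parameters aren't restricted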