Would it be possible to derive Data.Vector.Unbox via GHC's generic deriving?


It's possible to derive Storable via GHC's generic deriving mechanism: http://hackage.haskell.org/package/derive-storable (and https://hackage.haskell.org/package/derive-storable-plugin for performance). The only library I can find for deriving Data.Vector.Unbox, however, uses Template Haskell: http://hackage.haskell.org/package/vector-th-unbox. It also requires the user to write a little code; it's not entirely automatic.
My question is, could a library like derive-storable also exist for Unbox, or is this not possible due to some fundamental way in which Unbox differs from Storable? If the latter, does that mean it's also not possible to create a library that automatically derives Unbox for any Storable type? I could not find such a library.
I ask because ideally I'd like to avoid Template Haskell and the manual annotations necessary for using vector-th-unbox.

Say we had some Generic_ class to convert between our own types and some uniform representation which happens to have an Unbox instance (which amounts to both MVector and Vector instances for the Unboxed variants):
class Generic_ a where
  type Rep_ (a :: Type) :: Type
  to_ :: a -> Rep_ a
  from_ :: Rep_ a -> a
Then we can use that to obtain generic implementations of the methods of MVector/Vector:
-- (auxiliary definitions of CMV and uncoercemv at the end of this block)
-- vector imports (see gist at the end for a compilable sample)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM
import Data.Vector.Generic.Mutable.Base (MVector(..))
-- MVector
gbasicLength :: forall a s. CMV s a => UM.MVector s a -> Int
gbasicLength = basicLength @UM.MVector @(Rep_ a) @s . coerce
gbasicUnsafeSlice :: forall a s. CMV s a => Int -> Int -> UM.MVector s a -> UM.MVector s a
gbasicUnsafeSlice i j = uncoercemv . basicUnsafeSlice @UM.MVector @(Rep_ a) @s i j . coerce
-- etc.
-- idem Vector
-- This constraint holds when the UM.MVector data instance of a is
-- representationally equivalent to the data instance of its generic
-- representation (Rep_ a).
type CMV s a = (Coercible (UM.MVector s a) (UM.MVector s (Rep_ a)), MVector UM.MVector (Rep_ a))
-- Sadly coerce doesn't seem to want to solve this correctly so we use
-- unsafeCoerce as a workaround.
uncoercemv :: CMV s a => UM.MVector s (Rep_ a) -> UM.MVector s a
uncoercemv = unsafeCoerce
Now if we have some generic type
data MyType = MyCons Int Bool ()
We can define a generic instance with its isomorphism to a tuple
instance Generic_ MyType where
  type Rep_ MyType = (Int, Bool, ())
  to_ (MyCons a b c) = (a, b, c)
  from_ (a, b, c) = MyCons a b c
And from there, there is a totally generic recipe to get its Unbox instance. If you have YourType instead, with its own Generic_ instance, you can take this and literally replace MyType with YourType.
newtype instance UM.MVector s MyType
  = MVMyType { unMVMyType :: UM.MVector s (Rep_ MyType) }
instance MVector UM.MVector MyType where
  basicLength = gbasicLength
  basicUnsafeSlice = gbasicUnsafeSlice
  -- etc.
-- idem (Vector U.Vector MyType)
-- MVector UM.MVector & Vector U.Vector = Unbox
instance Unbox MyType
In theory all this boilerplate could be automated with internal language features (as opposed to TemplateHaskell or CPP). But there are various issues that get in the way in the current state of things.
First, Generic_ is essentially Generic from GHC.Generics. However, the uniform representation that gets derived by GHC is not in terms of tuples (,) but in terms of somewhat ad-hoc type constructors (:+:, :*:, M1, etc.), which lack Unbox instances.
There are two ways around this: such Unbox instances could be added so that Generic can be used directly, or the generics-eot package, which has a variant of Generic relying on tuples, could be a direct replacement for Generic_ here.
And second, MVector and Vector have quite a few methods. To avoid having to list them all, one might expect to leverage DerivingVia (or GeneralizedNewtypeDeriving), however they are not applicable because there are a couple of polymorphic monadic methods that prevent coercions (e.g., basicUnsafeNew). For now, the easiest way I can think of to abstract this is a CPP macro. In fact the vector package uses that technique internally, and it might be reusable somehow. I believe properly addressing those issues requires a deep redesign of the Vector/MVector architecture.
Gist (not complete, but compilable): https://gist.github.com/Lysxia/c7bdcbba548ee019bf6b3f1e388bd660
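To illustrate the first issue above (the shape of the representation GHC derives), here is a minimal sketch using only base. It assumes GHC nests the three fields of MyCons as a :*: (b :*: c); toTuple and fromTuple are hypothetical helper names, not part of any library, and play the role of to_ and from_.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics

data MyType = MyCons Int Bool ()
  deriving (Show, Eq, Generic)

-- Peel off the metadata wrappers (M1), the field wrappers (K1) and the
-- products (:*:) of the derived representation to reach a plain tuple.
-- Assumption: three fields are nested as a :*: (b :*: c).
toTuple :: MyType -> (Int, Bool, ())
toTuple x = case from x of
  M1 (M1 (M1 (K1 a) :*: (M1 (K1 b) :*: M1 (K1 c)))) -> (a, b, c)

-- The inverse direction rebuilds the same wrappers and calls 'to'.
fromTuple :: (Int, Bool, ()) -> MyType
fromTuple (a, b, c) =
  to (M1 (M1 (M1 (K1 a) :*: (M1 (K1 b) :*: M1 (K1 c)))))
```

In GHCi, toTuple (MyCons 1 True ()) evaluates to (1,True,()). None of M1, K1 or :*: has an Unbox instance, which is why the tuple detour (or extra instances) is needed.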

Related

differences: GADT, data family, data family that is a GADT

What/why are the differences between those three? Is a GADT (and regular data types) just a shorthand for a data family? Specifically what's the difference between:
data GADT a where
  MkGADT :: Int -> GADT Int
data family FGADT a
data instance FGADT a where -- note not FGADT Int
  MkFGADT :: Int -> FGADT Int
data family DF a
data instance DF Int where -- using GADT syntax, but not a GADT
  MkDF :: Int -> DF Int
(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)
Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT; just like I can't declare any other constructors for GADT. I can declare other instances for DF.
With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int.
For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:
instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...
and similarly for (FGADT a), (FGADT Int). But not for (DF a): it must be for (DF Int) -- that makes sense; there's no data instance DF a, and if there were it would overlap.
ADDIT: to clarify @kabuhr's answer (thank you)
contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference
These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest
inferDF (MkDF x) = x -- works without a signature
The inferred type inferDF :: DF Int -> Int makes sense. Giving it a signature inferDF :: DF a -> a doesn't make sense: there is no declaration for a data instance DF a .... Similarly with foodouble :: Foo Char a -> a there is no data instance Foo Char a ....
GADTs are awkward, I already know. So neither of these work without an explicit signature
inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x
Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int (for all three constructors), so you'd better get any signatures consistent with that.
Then I'm thinking data GADT a where ... is as if data instance GADT a where .... I can give a signature inferGADT :: GADT a -> a or inferGADT :: GADT Int -> Int (likewise for inferFGADT). That makes sense: there is a data instance GADT a ... or I can give a signature at a more specific type.
So in some ways data families are generalisations of GADTs. I also see as you say
So, in some ways, GADTs are generalizations of data families.
Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there are too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. Then @HTNW's approach of explaining in terms of yet further extensions is the opposite of what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)

As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int) can be brought into scope by pattern matching.
This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n
but this does not:
inferDF :: DF a -> a
inferDF (MkDF n) = n
and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int.
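A small self-contained sketch of the contrast just described, reusing the GADT and DF declarations from the question. The only change is that inferDF is given the signature that does typecheck (DF Int -> Int) rather than the rejected DF a -> a.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}

data GADT a where
  MkGADT :: Int -> GADT Int

data family DF a
data instance DF Int where
  MkDF :: Int -> DF Int

-- Matching on MkGADT brings a ~ Int into scope, so n :: Int can be
-- returned at type a.
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n

-- For the data family no refinement happens on matching: the instance
-- (DF Int) has to be fixed in the signature already.
inferDF :: DF Int -> Int
inferDF (MkDF n) = n
```

Changing the second signature to DF a -> a makes GHC reject the definition, as described above.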
The situation becomes quite a bit more confusing for something like your FGADT type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:
data family Foo a b
data instance Foo Int a where
  Bar :: Double -> Foo Int Double
  Baz :: String -> Foo Int String
data instance Foo Char Double where
  Quux :: Double -> Foo Char Double
data instance Foo Char String where
  Zlorf :: String -> Foo Char String
In this case, Foo Int a is a GADT with an inferrable a parameter:
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"
but Foo Char a is just a collection of separate ADTs, so this won't typecheck:
foodouble :: Foo Char a -> a
foodouble (Quux x) = x
for the same reason inferDF won't typecheck above.
Now, getting back to your plain DF and GADT types, you can largely emulate DFs just using GADTs. For example, if you have a DF:
data family MyDF a
data instance MyDF Int where
  IntLit :: Int -> MyDF Int
  IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
  Positive :: MyDF Int -> MyDF Bool
you can write it as a GADT just by writing separate blocks of constructors:
data MyGADT a where
  -- MyGADT Int
  IntLit' :: Int -> MyGADT Int
  IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
  -- MyGADT Bool
  Positive' :: MyGADT Int -> MyGADT Bool
So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:
class MyClass a where
  data family MyRep a
instance MyClass Int where
  data instance MyRep Int = ...
instance MyClass String where
  data instance MyRep String = ...
where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
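As a rough sketch of that associated-data-type use case, here is a filled-in variant of the class above. The methods toRep and fromRep, the constructors, and the Bool instance (used in place of String to avoid needing FlexibleInstances) are all hypothetical choices for illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

class MyClass a where
  data MyRep a
  toRep   :: a -> MyRep a
  fromRep :: MyRep a -> a

-- Each instance brings its own, independent representation type.
instance MyClass Int where
  newtype MyRep Int = IntRep Int
  toRep = IntRep
  fromRep (IntRep n) = n

instance MyClass Bool where
  data MyRep Bool = BoolFalse | BoolTrue
  toRep b = if b then BoolTrue else BoolFalse
  fromRep BoolTrue  = True
  fromRep BoolFalse = False

-- Code that is generic in a never matches on constructors directly;
-- it goes through the class methods instead.
roundTrip :: MyClass a => a -> a
roundTrip = fromRep . toRep
```

A new module can add another MyClass instance with its own MyRep constructors without touching this code, which is the "open" property in question.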

I think the difference becomes clear if we use PatternSynonyms-style type signatures for data constructors. Let's start with Haskell 98
data D a = D a a
You get a pattern type:
pattern D :: forall a. a -> a -> D a
it can be read in two directions. D, in "forward" or expression contexts, says, "forall a, you can give me 2 as and I'll give you a D a". "Backwards", as a pattern, it says, "forall a, you can give me a D a and I'll give you 2 as".
Now, the things you write in a GADT definition are not pattern types. What are they? Lies. Lies lies lies. Give them attention only insofar as the alternative is writing them out manually with ExistentialQuantification. Let's use this one
data GD a where
  GD :: Int -> GD Int
You get
-- vv ignore
pattern GD :: forall a. () => (a ~ Int) => Int -> GD a
This says: forall a, you can give me a GD a, and I can give you a proof that a ~ Int, plus an Int.
Important observation: The return/match type of a GADT constructor is always the "data type head". I defined data GD a where ...; I got GD :: forall a. ... GD a. This is also true for Haskell 98 constructors, and also data family constructors, though it's a bit more subtle.
If I have a GD a, and I don't know what a is, I can pass into GD anyway, even though I wrote GD :: Int -> GD Int, which seems to say I can only match it with GD Ints. This is why I say GADT constructors lie. The pattern type never lies. It clearly states that, forall a, I can match a GD a with the GD constructor and get evidence for a ~ Int and a value of Int.
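One way to see what the pattern type is saying is to write the same constructor with the equality constraint spelled out. The sketch below puts the two forms side by side; GD', unGD and unGD' are hypothetical names introduced only for this comparison.

```haskell
{-# LANGUAGE GADTs #-}

-- GADT syntax, as above.
data GD a where
  GD :: Int -> GD Int

-- The same shape with the evidence written out: the constructor works
-- for any a, but carries a proof that a ~ Int.
data GD' a = (a ~ Int) => GD' Int

-- In both cases the match is against an arbitrary "GD a", and the
-- pattern brings a ~ Int into scope on the right-hand side.
unGD :: GD a -> a
unGD (GD n) = n

unGD' :: GD' a -> a
unGD' (GD' n) = n
```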
Ok, data families. Let's not mix them with GADTs yet.
data Nat = Z | S Nat
data Vect (n :: Nat) (a :: Type) :: Type where
  VNil :: Vect Z a
  VCons :: a -> Vect n a -> Vect (S n) a -- try finding the pattern types for these btw
data family Rect (ns :: [Nat]) (a :: Type) :: Type
newtype instance Rect '[] a = RectNil a
newtype instance Rect (n : ns) a = RectCons (Vect n (Rect ns a))
There are actually two data type heads now. As @K.A.Buhr says, the different data instances act like different data types that just happen to share a name. The pattern types are
pattern RectNil :: forall a. a -> Rect '[] a
pattern RectCons :: forall n ns a. Vect n (Rect ns a) -> Rect (n : ns) a
If I have a Rect ns a, and I don't know what ns is, I cannot match on it. RectNil only takes Rect '[] as, RectCons only takes Rect (n : ns) as. You might ask: "why would I want a reduction in power?" @KABuhr has given one: GADTs are closed (and for good reason; stay tuned), families are open. This doesn't hold in Rect's case, as these instances already fill up the entire [Nat] * Type space. The reason is actually newtype.
Here's a GADT RectG:
data RectG :: [Nat] -> Type -> Type where
  RectGN :: a -> RectG '[] a
  RectGC :: Vect n (RectG ns a) -> RectG (n : ns) a
I get
-- it's fine if you don't get these
pattern RectGN :: forall ns a. () => (ns ~ '[]) => a -> RectG ns a
pattern RectGC :: forall ns' a. forall n ns. (ns' ~ (n : ns)) =>
  Vect n (RectG ns a) -> RectG ns' a
-- just note that they both have the same matched type
-- which means there needs to be a runtime distinction
If I have a RectG ns a and don't know what ns is, I can still match on it just fine. The compiler has to preserve this information with a data constructor. So, if I had a RectG [1000, 1000] Int, I would incur an overhead of one million RectGN constructors that all "preserve" the same "information". Rect [1000, 1000] Int is fine, though, as I do not have the ability to match and tell whether a Rect is RectNil or RectCons. This allows the constructor to be newtype, as it holds no information. I would instead use a different GADT, somewhat like
data SingListNat :: [Nat] -> Type where
  SLNN :: SingListNat '[]
  SLNCZ :: SingListNat ns -> SingListNat (Z : ns)
  SLNCS :: SingListNat (n : ns) -> SingListNat (S n : ns)
that stores the dimensions of a Rect in O(sum ns) space instead of O(product ns) space (I think those are right). This is also why GADTs are closed and families are open. A GADT is just like a normal data type except it has equality evidence and existentials. It doesn't make sense to add constructors to a GADT any more than it makes sense to add constructors to a Haskell 98 type, because any code that doesn't know about one of the constructors is in for a very bad time. It's fine for families though, because, as you noticed, once you define a branch of a family, you cannot add more constructors in that branch. Once you know what branch you're in, you know the constructors, and no one can break that. You're not allowed to use any constructors if you don't know which branch to use.
Your examples don't really mix GADTs and data families. Pattern types are nifty in that they normalize away superficial differences in data definitions, so let's take a look.
data family FGADT a
data instance FGADT a where
  MkFGADT :: Int -> FGADT Int
Gives you
pattern MkFGADT :: forall a. () => (a ~ Int) => Int -> FGADT a
-- no different from a GADT; data family does nothing
But
data family DF a
data instance DF Int where
  MkDF :: Int -> DF Int
gives
pattern MkDF :: Int -> DF Int
-- GADT syntax did nothing
Here's a proper mixing
data family Map k :: Type -> Type
data instance Map Word8 :: Type -> Type where
  MW8BitSet :: BitSet8 -> Map Word8 Bool
  MW8General :: GeneralMap Word8 a -> Map Word8 a
Which gives patterns
pattern MW8BitSet :: forall a. () => (a ~ Bool) => BitSet8 -> Map Word8 a
pattern MW8General :: forall a. GeneralMap Word8 a -> Map Word8 a
If I have a Map k v and I don't know what k is, I can't match it against MW8General or MW8BitSet, because those only want Map Word8s. This is the data family's influence. If I have a Map Word8 v and I don't know what v is, matching on the constructors can reveal to me whether it's known to be Bool or is something else.

Can a Haskell type constructor have non-type parameters?

A type constructor produces a type given a type. For example, the Maybe constructor
data Maybe a = Nothing | Just a
could be given a concrete type, like Char, and give a concrete type, like Maybe Char. In terms of kinds, one has
GHCI> :k Maybe
Maybe :: * -> *
My question: Is it possible to define a type constructor that yields a concrete type given a Char, say? Put another way, is it possible to mix kinds and types in the type signature of a type constructor? Something like
GHCI> :k my_type
my_type :: Char -> * -> *

Can a Haskell type constructor have non-type parameters?
Let's unpack what you mean by type parameter. The word type has (at least) two potential meanings: do you mean type in the narrow sense of things of kind *, or in the broader sense of things at the type level? We can't (yet) use values in types, but modern GHC features a very rich kind language, allowing us to use a wide range of things other than concrete types as type parameters.
Higher-Kinded Types
Type constructors in Haskell have always admitted non-* parameters. For example, the encoding of the fixed point of a functor works in plain old Haskell 98:
newtype Fix f = Fix { unFix :: f (Fix f) }
ghci> :k Fix
Fix :: (* -> *) -> *
Fix is parameterised by a functor of kind * -> *, not a type of kind *.
Beyond * and ->
The DataKinds extension enriches GHC's kind system with user-declared kinds, so kinds may be built of pieces other than * and ->. It works by promoting all data declarations to the kind level. That is to say, a data declaration like
data Nat = Z | S Nat -- natural numbers
introduces a kind Nat and type constructors Z :: Nat and S :: Nat -> Nat, as well as the usual type and value constructors. This allows you to write datatypes parameterised by type-level data, such as the customary vector type, which is a linked list indexed by its length.
data Vec n a where
  Nil :: Vec Z a
  (:>) :: a -> Vec n a -> Vec (S n) a
ghci> :k Vec
Vec :: Nat -> * -> *
There's a related extension called ConstraintKinds, which frees constraints like Ord a from the yoke of the "fat arrow" =>, allowing them to roam across the landscape of the type system as nature intended. Kmett has used this power to build a category of constraints, with the newtype (:-) :: Constraint -> Constraint -> * denoting "entailment": a value of type c :- d is a proof that if c holds then d also holds. For example, we can prove that Ord a implies Eq [a] for all a:
ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict
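For readers without the constraints package at hand, the following sketch defines local stand-ins for Dict and (:-), modelled on that package's definitions as I understand them, and shows one way to unpack an entailment. sameList is a hypothetical helper; GHC could solve Eq [a] from Ord a on its own here, so the point is only to show the mechanics of matching on Sub Dict.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Constraint)

-- A value of type Dict c is runtime evidence that the constraint c holds.
data Dict (c :: Constraint) where
  Dict :: c => Dict c

-- Entailment: given c, we can produce evidence for d.
newtype c :- d = Sub (c => Dict d)

ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict

-- Matching on Sub Dict uses the Ord a we have to obtain Eq [a],
-- which is then in scope for the comparison.
sameList :: forall a. Ord a => [a] -> [a] -> Bool
sameList xs ys = case (ordToEqList :: Ord a :- Eq [a]) of
  Sub Dict -> xs == ys
```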
Life after forall
However, Haskell currently maintains a strict separation between the type level and the value level. Things at the type level are always erased before the program runs, (almost) always inferrable, invisible in expressions, and (dependently) quantified by forall. If your application requires something more flexible, such as dependent quantification over runtime data, then you have to manually simulate it using a singleton encoding.
For example, the specification of split says it chops a vector at a certain length according to its (runtime!) argument. The type of the output vector depends on the value of split's argument. We'd like to write this...
split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)
... where I'm using the type function (:+:) :: Nat -> Nat -> Nat, which stands for addition of type-level naturals, to ensure that the input vector is at least as long as n...
type family n :+: m where
  Z :+: m = m
  S n :+: m = S (n :+: m)
... but Haskell won't allow that declaration of split! There aren't any values of type Z or S n; only types of kind * contain values. We can't access n at runtime directly, but we can use a GADT which we can pattern-match on to learn what the type-level n is:
data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)
ghci> :k Natty
Natty :: Nat -> *
Natty is called a singleton, because for a given (well-defined) n there is only one (well-defined) value of type Natty n. We can use Natty n as a run-time stand-in for n.
split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (x :> xs) =
  let (ys, zs) = split n xs
  in (x :> ys, zs)
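Putting the pieces above into one file gives the sketch below. It only assembles the declarations already shown; the fixity declaration, the kind annotations, toList and example are additions made here so the result can be loaded and inspected.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)

data Nat = Z | S Nat

infixr 5 :>
data Vec (n :: Nat) (a :: Type) where
  Nil  :: Vec 'Z a
  (:>) :: a -> Vec n a -> Vec ('S n) a

-- Type-level addition, by recursion on the first argument.
type family (n :: Nat) :+: (m :: Nat) :: Nat where
  'Z   :+: m = m
  'S n :+: m = 'S (n :+: m)

-- Singleton: the runtime stand-in for a type-level Nat.
data Natty (n :: Nat) where
  Zy :: Natty 'Z
  Sy :: Natty n -> Natty ('S n)

split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy     xs        = (Nil, xs)
split (Sy n) (x :> xs) =
  let (ys, zs) = split n xs
  in  (x :> ys, zs)

-- Helper added for this sketch: forget the length index.
toList :: Vec n a -> [a]
toList Nil       = []
toList (x :> xs) = x : toList xs

-- Split a three-element vector after its first two elements.
example :: ([Int], [Int])
example =
  let (front, back) = split (Sy (Sy Zy)) (1 :> 2 :> 3 :> Nil)
  in  (toList front, toList back)
```

In GHCi, example evaluates to ([1,2],[3]). Passing Sy (Sy (Sy (Sy Zy))) with the same vector is a type error rather than a runtime failure.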
Anyway, the point is that values - runtime data - can't appear in types. It's pretty tedious to duplicate the definition of Nat in singleton form (and things get worse if you want the compiler to infer such values); dependently-typed languages like Agda, Idris, or a future Haskell escape the tyranny of strictly separating types from values and give us a range of expressive quantifiers. You're able to use an honest-to-goodness Nat as split's runtime argument and mention its value dependently in the return type.
@pigworker has written extensively about the unsuitability of Haskell's strict separation between types and values for modern dependently-typed programming. See, for example, the Hasochism paper, or his talk on the unexamined assumptions that have been drummed into us by four decades of Hindley-Milner-style programming.
Dependent Kinds
Finally, for what it's worth, with TypeInType modern GHC unifies types and kinds, allowing us to talk about kind variables using the same tools that we use to talk about type variables. In a previous post about session types I made use of TypeInType to define a kind for tagged type-level sequences of types:
infixr 5 :!, :?
data Session = Type :! Session -- Type is a synonym for *
             | Type :? Session
             | E

I'd recommend @Benjamin Hodgson's answer and the references he gives to see how to make this sort of thing useful. But, to answer your question more directly, using several extensions (DataKinds, KindSignatures, and GADTs), you can define types that are parameterized on (certain) concrete types.
For example, here's one parameterized on the concrete Bool datatype:
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}
{-# LANGUAGE FlexibleInstances #-}
module FlaggedType where
-- The single quotes below are optional. They serve to notify
-- GHC that we are using the type-level constructors lifted from
-- data constructors rather than types of the same name (and are
-- only necessary where there's some kind of ambiguity otherwise).
data Flagged :: Bool -> * -> * where
  Truish :: a -> Flagged 'True a
  Falsish :: a -> Flagged 'False a
-- separate instances, just as if they were different types
-- (which they are)
instance (Show a) => Show (Flagged 'False a) where
  show (Falsish x) = show x
instance (Show a) => Show (Flagged 'True a) where
  show (Truish x) = show x ++ "*"
-- these lists have types as indicated
x = [Truish 1, Truish 2, Truish 3] -- :: [Flagged 'True Integer]
y = [Falsish "a", Falsish "b", Falsish "c"] -- :: [Flagged 'False String]
-- this won't typecheck: it's just like [1,2,"abc"]
z = [Truish 1, Truish 2, Falsish 3] -- won't typecheck
Note that this isn't much different from defining two completely separate types:
data FlaggedTrue a = Truish a
data FlaggedFalse a = Falsish a
In fact, I'm hard pressed to think of any advantage Flagged has over defining two separate types, except if you have a bar bet with someone that you can write useful Haskell code without type classes. For example, you can write:
getInt :: Flagged a Int -> Int
getInt (Truish z) = z -- same polymorphic function...
getInt (Falsish z) = z -- ...defined on two separate types
Maybe someone else can think of some other advantages.
Anyway, I believe that parameterizing types with concrete values really only becomes useful when the concrete type is sufficiently "rich" that you can use it to leverage the type checker, as in Benjamin's examples.
As @user2407038 noted, most interesting primitive types, like Ints, Chars, Strings and so on can't be used this way. Interestingly enough, though, you can use literal positive integers and strings as type parameters, but they are treated as Nats and Symbols (as defined in GHC.TypeLits) respectively.
So something like this is possible:
import GHC.TypeLits
data Tagged :: Symbol -> Nat -> * -> * where
  One :: a -> Tagged "one" 1 a
  Two :: a -> Tagged "two" 2 a
  Three :: a -> Tagged "three" 3 a
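As a sketch of what such literal parameters can be used for, the type-level Symbol and Nat can be read back at runtime with symbolVal and natVal from GHC.TypeLits. The function describe is a hypothetical helper, and Type from Data.Kind is used in place of * here.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.TypeLits

data Tagged :: Symbol -> Nat -> Type -> Type where
  One   :: a -> Tagged "one" 1 a
  Two   :: a -> Tagged "two" 2 a
  Three :: a -> Tagged "three" 3 a

-- Recover the type-level tag and number as ordinary values.
-- The payload is ignored; only the type indices are inspected.
describe :: forall s n a. (KnownSymbol s, KnownNat n)
         => Tagged s n a -> (String, Integer)
describe _ = (symbolVal (Proxy :: Proxy s), natVal (Proxy :: Proxy n))
```

In GHCi, describe (Two 'x') evaluates to ("two",2).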

Look at using Generalized Algebraic Data Types (GADTs), which enable you to define concrete outputs based on input type, e.g.
data CustomMaybe a where
  MaybeChar :: Maybe a -> CustomMaybe Char
  MaybeString :: Maybe a -> CustomMaybe String
  MaybeBool :: Maybe a -> CustomMaybe Bool
exampleFunction :: CustomMaybe a -> a
exampleFunction (MaybeChar maybe) = 'e'
exampleFunction (MaybeString maybe) = True -- Compile error
main = do
  print $ exampleFunction (MaybeChar $ Just 10)
To a similar effect, RankNTypes can allow the implementation of similar behaviour:
exampleFunctionOne :: a -> a
exampleFunctionOne el = el
type PolyType = forall a. a -> a
exampleFunctionTwo :: PolyType -> Int
exampleFunctionTwo func = func 20
exampleFunctionTwo func = func "Hello" -- Compile error, PolyType being forced to return 'Int'
main = do
  print $ exampleFunctionTwo exampleFunctionOne
The PolyType definition allows you to insert the polymorphic function within exampleFunctionTwo and force its output to be 'Int'.

No. Haskell doesn't have dependent types (yet). See https://typesandkinds.wordpress.com/2016/07/24/dependent-types-in-haskell-progress-report/ for some discussion of when it may.
In the meantime, you can get behavior like this in Agda, Idris, and Cayenne.

Why is Identity monad useful?

I often read that
It seem that identity monad is useless. It's not... but that's another
topic.
So can anyone tell me how it is useful?

Identity is to monads, functors and applicative functors as 0 is to numbers. On its own it seems useless, but it's often needed in places where one expects a monad or an (applicative) functor that actually doesn't do anything.
As already mentioned, Identity allows us to define just monad transformers and then define their corresponding monads just as SomeT Identity.
But that's not all. It's often convenient to also define other concepts in terms of monads, which usually adds a lot of flexibility. For example Conduit i m o (also see this tutorial) defines an element in a pipeline that can request data of type i, can produce data of type o, and uses monad m for internal processing. Then such a pipeline can be run in the given monad using
($$) :: Monad m => Source m a -> Sink a m b -> m b
(where Source is an alias for Conduit with no input and Sink for Conduit with no output). And when no effectful computations are needed in the pipeline, just pure code, we just specialize m to Identity and run such a pipeline as
runIdentity (source $$ sink)
Identity is also the "empty" functor and applicative functor: Identity composed with another functor or applicative functor is isomorphic to the original. For example, Lens' is defined as a function polymorphic in a Functor:
Functor f => (a -> f a) -> s -> f s
roughly speaking, such a lens allows to read or manipulate something of type a inside s, for example a field inside a record (for an introduction to lenses see this post). If we specialize f to Identity, we get
(a -> Identity a) -> s -> Identity s
which is isomorphic to
(a -> a) -> s -> s
so given an updating function on a, return an updating function on s. (For completeness: If we specialize f to Const b, we get (a -> Const b a) -> s -> Const b s, which is isomorphic to (a -> b) -> (s -> b), that is, given a reader on a, return a reader on s.)
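The two specializations can be sketched with base alone. The names over and view follow common lens-library usage but are defined locally here; Point and xLens are hypothetical examples.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Hypothetical record to focus into.
data Point = Point { px :: Int, py :: Int }
  deriving (Show, Eq)

xLens :: Lens' Point Int
xLens f p = fmap (\x -> p { px = x }) (f (px p))

-- f = Identity: turn an update on a into an update on s.
over :: Lens' s a -> (a -> a) -> s -> s
over l g = runIdentity . l (Identity . g)

-- f = Const a: read the a out of s, ignoring the rebuilt structure.
view :: Lens' s a -> s -> a
view l = getConst . l Const
```

In GHCi, view xLens (Point 1 2) gives 1, and over xLens (+ 10) (Point 1 2) gives a Point whose px is 11 and whose py is unchanged.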

Sometimes I work with records whose fields are optional in some contexts (like when parsing the record from JSON) but mandatory in others.
I solve that by parametrizing the record with a functor, and using Maybe or Identity in each case.
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE StandaloneDeriving #-}
import GHC.Generics (Generic)
data Query f = Query
  {
    _viewName :: String
  , _target :: f Server -- Server is some type, it doesn't matter which
  }
  deriving (Generic)
The server field is optional when parsing JSON:
instance FromJSON (Query Maybe)
But then I have a function like
withDefaultServer :: Server -> Query Maybe -> Query Identity
withDefaultServer = undefined
that returns a record in which the _target field is mandatory.
(This answer doesn't use anything monadic about Identity, though.)
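One possible way to fill in withDefaultServer is sketched below. The Server newtype is a hypothetical stand-in (the answer leaves the type open), the JSON and Generic parts are omitted to keep the example within base, and targetOf is an illustrative helper.

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the real Server type.
newtype Server = Server String
  deriving (Show, Eq)

data Query f = Query
  { _viewName :: String
  , _target   :: f Server
  }

-- Replace a missing target by the default; afterwards the field is
-- wrapped in Identity, so it can no longer be absent.
withDefaultServer :: Server -> Query Maybe -> Query Identity
withDefaultServer def q =
  Query { _viewName = _viewName q
        , _target   = Identity (fromMaybe def (_target q))
        }

-- Consumers of the completed record can unwrap without a Maybe check.
targetOf :: Query Identity -> Server
targetOf = runIdentity . _target
```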

One use of it is as a base monad for monad transformer stacks: instead of having to provide two types Some :: * -> * and SomeT :: (* -> *) -> * -> *, it is enough to provide just the latter by setting type Some = SomeT Identity.
Another, somewhat similar use case (but completely detached from the whole monad business) is when you need to refer to tuples: we can say () is a nullary tuple, (a, b) is a binary tuple, (a, b, c) is a ternary tuple, and so on, but what does that leave for the unary case? Saying a is a unary tuple for any choice of a is often not satisfactory, for example when we are building some typeclass instances like Data.Tuple.Select, some type constructor is needed to act as the unambiguous key. So by adding e.g. Sel1 instances to Identity a, it forces us to distinguish between (a, b) (a two-tuple containing an a and a b), and Identity (a, b) (a one-tuple containing a single (a, b) value).
(Note that Data.Tuple.Select defines its own type called OneTuple instead of reusing Identity, but it is isomorphic to Identity—in fact, it's just a rename away—and I think it only exists to avoid a non-base dependency.)

One real use-case is to be a (pure) base of monad transformers stack, e.g.
type Reader r = ReaderT r Identity

Convert from type `T a` to `T b` without boilerplate

So, I have an AST data type with a large number of cases, which is parameterized by an "annotation" type
data Expr a = Plus a Int Int
            | ...
            | Times a Int Int
I have annotation types S and T, and some function f :: S -> T. I want to take an Expr S and convert it to an Expr T using my conversion f on each S which occurs within an Expr value.
Is there a way to do this using SYB or generics and avoid having to pattern match on every case? It seems like the type of thing that this is suited for. I just am not familiar enough with SYB to know the specific way to do it.

It sounds like you want a Functor instance. This can be automatically derived by GHC using the DeriveFunctor extension.
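A minimal sketch of that suggestion, assuming the annotation is the only place the type parameter occurs. Only two of the constructors are shown, and the concrete S, T and f are hypothetical placeholders for the asker's own.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data Expr a
  = Plus a Int Int
  | Times a Int Int
  deriving (Show, Eq, Functor)

-- Hypothetical annotation types and conversion.
newtype S = S Int deriving (Show, Eq)
newtype T = T String deriving (Show, Eq)

f :: S -> T
f (S n) = T (show n)

-- The derived fmap applies f to every annotation and leaves the Int
-- fields alone, with no per-constructor pattern matching.
convert :: Expr S -> Expr T
convert = fmap f
```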

Based on your follow-up question, it seems that a generics library is more appropriate to your situation than Functor. I'd recommend just using the function given on SYB's wiki page:
{-# LANGUAGE DeriveDataTypeable, ScopedTypeVariables, FlexibleContexts #-}
import Data.Generics
import Unsafe.Coerce
newtype C a = C a deriving (Data,Typeable)
fmapData :: forall t a b. (Typeable a, Data (t (C a)), Data (t a)) =>
    (a -> b) -> t a -> t b
fmapData f input = uc . everywhere (mkT $ \(x::C a) -> uc (f (uc x)))
                    $ (uc input :: t (C a))
  where uc = unsafeCoerce
The reason for the extra C type is to avoid a problematic corner case where there are occurrences of fields at the same type as a (more details on the wiki). The caller of fmapData doesn't need to ever see it.
This function does have a few extra requirements compared to the real fmap: there must be instances of Typeable for a, and Data for t a. In your case t a is Expr a, which means that you'll need to add a deriving Data to the definition of Expr, as well as have a Data instance in scope for whatever a you're using.

Class constraints for monads and monad functions

I am trying to write a new monad that only can contain a Num. When it fails, it returns 0 much like the Maybe monad returns Nothing when it fails.
Here is what I have so far:
data (Num a) => IDnum a = IDnum a
instance Monad IDnum where
  return x = IDnum x
  IDnum x >>= f = f x
  fail :: (Num a) => String -> IDnum a
  fail _ = return 0
Haskell is complaining that there is
No instance for (Num a) arising from a use of `IDnum'
It suggests that I add (Num a) to the context of the type signature for each of my monad functions, but when I tried that, it complained that they need to work "forall" a.
Ex:
Method signature does not match class; it should be
return :: forall a. a -> IDnum a
In the instance declaration for `Monad IDnum'
Does anyone know how to fix this?

The existing Monad typeclass expects your type to work for every possible type argument. Consider Maybe: in Maybe a, a is not constrained at all. Basically you can't have a Monad with constraints.
This is a fundamental limitation of how the Monad class is defined—I don't know of any way to get around it without modifying that.
This is also a problem for defining Monad instances for other common types, like Set.
In practice, this restriction is actually pretty important. Consider that (normally) functions are not instances of Num. This means that we could not use your monad to contain a function! This really limits important operations like ap (<*> from Applicative), since that depends on a monad containing a function:
ap :: Monad m => m (a -> b) -> m a -> m b
Your monad would not support many common uses and idioms we've grown to expect from normal monads! This would rather limit its utility.
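To make the point about ap concrete, here is a small sketch using Maybe (the names `applied` and `appliedA` are illustrative only). The first argument is a monadic value that contains a function, which is exactly what a Num-only monad could not hold.

```haskell
import Control.Monad (ap)

-- The left operand has type Maybe (Int -> Int): a monad containing a function.
applied :: Maybe Int
applied = Just (+ 1) `ap` Just 41

-- The Applicative spelling of the same thing.
appliedA :: Maybe Int
appliedA = Just (+ 1) <*> Just 41
```

Both definitions evaluate to `Just 42`.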
Also, as a side-note, you should generally avoid using fail. It doesn't really fit in with the Monad typeclass: it's more of a historic accident. Most people agree that you should avoid it in general: it was just a hack to deal with failed pattern matches in do-notation.
That said, looking at how to define a restricted monad class is a great exercise for understanding a few Haskell extensions and learning some intermediate/advanced Haskell.
Alternatives
With the downsides in mind, here are a couple of alternatives—replacements for the standard Monad class that do support restricted monads.
Constraint Kinds
The most modern option takes advantage of the ConstraintKinds extension in GHC, which lets you treat typeclass constraints as types of kind Constraint. This blog post details how to implement a restricted monad using constraint kinds.
The basic idea is simple: with ConstraintKinds, we can turn our constraint (Num a) into a type. We can then have a new Monad class which contains this type as a member (just like return and fail are members) and allows us to instantiate the constraint as Num a. This is what the code looks like:
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}
module Main where
import Prelude hiding (Monad (..))
import GHC.Exts
class Monad m where
    type Restriction m a :: Constraint
    type Restriction m a = ()
    return :: Restriction m a => a -> m a
    (>>=) :: Restriction m a => m a -> (a -> m b) -> m b
    fail :: Restriction m a => String -> m a
data IDnum a = IDnum a
instance Monad IDnum where
    type Restriction IDnum a = Num a
    return = IDnum
    IDnum x >>= f = f x
    fail _ = return 0
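The same technique can be sketched for Set, which was mentioned earlier as another type that cannot be a standard Monad. This is only an illustration under a few assumptions: the class and method names (`RestrictedMonad`, `rreturn`, `rbind`) are hypothetical and chosen to avoid clashing with the Prelude, and unlike the class above the bind here constrains the result type `b` as well, since building the resulting set needs `Ord b`.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Constraint)
import qualified Data.Set as Set

-- Hypothetical restricted-monad class; each instance picks its own constraint.
class RestrictedMonad m where
  type Restriction m a :: Constraint
  rreturn :: Restriction m a => a -> m a
  rbind :: (Restriction m a, Restriction m b) => m a -> (a -> m b) -> m b

instance RestrictedMonad Set.Set where
  type Restriction Set.Set a = Ord a
  rreturn = Set.singleton
  -- Apply f to every element and merge the resulting sets (this needs Ord b).
  rbind s f = Set.unions (map f (Set.toList s))

-- Example: each element contributes itself and its double.
doubled :: Set.Set Int
doubled = Set.fromList [1, 2, 3] `rbind` \x -> Set.fromList [x, x * 2]
```

Here `doubled` is the set containing 1, 2, 3, 4 and 6.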
RMonad
There is an existing library on hackage called rmonad (for "restricted monad") which provides a more general typeclass. You could probably use this to write your desired monad instance. (I haven't used it myself, so it's a bit hard to say.)
It doesn't use the ConstraintKinds extension and (I believe) supports older versions of GHC. However, I think it's a bit ugly; I'm not sure that it's the best option any more.
Here's the code I came up with:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
import Prelude hiding (Monad (..))
import Control.RMonad
import Data.Suitable
data IDnum a = IDnum a
data instance Constraints IDnum a = Num a => IDnumConstraints
instance Num a => Suitable IDnum a where
    constraints = IDnumConstraints
instance RMonad IDnum where
    return = IDnum
    IDnum x >>= f = f x
    fail _ = withResConstraints $ \ IDnumConstraints -> return 0
Further Reading
For more details, take a look at this SO question.
Oleg has an article about this pertaining specifically to the Set monad, which might be interesting: "How to restrict a monad without breaking it".
Finally, there are a couple of papers you could also read:
The Constrained-Monad Problem
Generic Monadic Constructs for Embedded Languages

This answer will be brief, but here's another alternative to go along with Tikhon's. You can apply a codensity transformation to your type to basically get a free monad for it. Just use it (in the below code it's IDnumM) instead of your base type, then convert the final value to your base type at the end (in the below code, you would use runIDnumM). You can also inject your base type into the transformed type (in the below code, that would be toIDnumM).
A benefit of this approach is that it works with the standard Monad class.
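{-# LANGUAGE RankNTypes, DatatypeContexts #-}
-- DatatypeContexts is deprecated; it is enabled only to keep the original declaration.
data Num a => IDnum a = IDnum a
newtype IDnumM a = IDnumM { unIDnumM :: forall r. (a -> IDnum r) -> IDnum r }
runIDnumM :: Num a => IDnumM a -> IDnum a
runIDnumM (IDnumM n) = n IDnum
toIDnumM :: Num a => IDnum a -> IDnumM a
toIDnumM (IDnum x) = IDnumM $ \k -> k x
-- Functor and Applicative are superclasses of Monad in current GHC.
instance Functor IDnumM where
    fmap f (IDnumM m) = IDnumM $ \k -> m (k . f)
instance Applicative IDnumM where
    pure x = IDnumM $ \k -> k x
    IDnumM mf <*> IDnumM mx = IDnumM $ \k -> mf $ \f -> mx (k . f)
instance Monad IDnumM where
    IDnumM m >>= f = IDnumM $ \k -> m $ \x -> f x `unIDnumM` k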

There is an easier way to do this. One can use multiple functions. First, write one in the Maybe monad. The Maybe monad returns Nothing upon failure. Second, write a function that returns the Just value if not Nothing or some safe value if Nothing. Third, write a function that composes those two functions.
This produces the desired result while being much easier to write and understand.
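The three steps could look roughly like this. All names here (`safeDiv`, `compute`, `orZero`, `computeOrZero`) are hypothetical and chosen only for illustration; the computation itself is an arbitrary example of something that can fail.

```haskell
import Data.Maybe (fromMaybe)

-- A primitive that can fail: division by zero yields Nothing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Step 1: write the computation in the Maybe monad.
compute :: Int -> Int -> Int -> Maybe Int
compute a b c = do
  q <- safeDiv a b
  safeDiv q c

-- Step 2: fall back to a safe value (0) when the result is Nothing.
orZero :: Num a => Maybe a -> a
orZero = fromMaybe 0

-- Step 3: compose the two.
computeOrZero :: Int -> Int -> Int -> Int
computeOrZero a b c = orZero (compute a b c)
```

With these definitions, `computeOrZero 100 5 2` gives 10, and `computeOrZero 1 0 2` gives 0 because the first division fails.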
