I have a library that includes a type data Zq q = Zq Int representing the integers mod q. For safety, I'd like to expose some operations on this type ((+), (*), etc.), but not export the constructor, to avoid people circumventing the safety gained by declaring such a type in the first place.
However, users of the library may reasonably need to declare instances for this type that I as the library author can't predict. To name just a few possible instances: DeepSeq, Storable, Unbox, ...
The only way I know of that allows third parties to make such instances is to export the constructor. (Alternatively, I could define and export a smart constructor and destructor, but this seems to be no better than just exporting the data constructor.)
Is there a way to ensure safety while also allowing third parties to extend the type?
Most well-formed instances shouldn't require the unsafe raw constructors. Unbox etc. are a bit unusually low-level, but other instances should generally be definable in terms of much the same high-level API you'd also use for end applications.
So I don't really see the force of your concern that not knowing the instances in advance means you can't hide the constructors. If you just define the critical close-to-the-metal instances yourself, you should be fine.
That said, I often find it rather annoying if a library doesn't export the constructors at all. Even if every instance and everything else can be defined using only the high-level API, it can make sense to grant unsafe low-level access for a lot of reasons that can't really be foreseen: debugging, special optimisations, simply seeing what's going on... Hence, in a similar vein to Python's “we're all consenting adults here” philosophy, I'd support kosmikus' suggestion: export the constructors of all important types, but do it in a way that makes it clear that using them directly is unsafe. An extra Unsafe module is a good way to achieve this. Simply giving the constructor a technical-sounding name may also be sufficient. And of course document what precisely is unsafe about these exports.
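For concreteness, here is a minimal sketch of that layout; the module names, the fixed modulus 7, and the helper names are invented for illustration rather than taken from the question (the two modules would live in separate files in practice):

-- Data/Zq/Unsafe.hs: the type lives here and the raw constructor is
-- exported; the module name signals "use at your own risk".
module Data.Zq.Unsafe (Zq (..)) where

data Zq q = Zq Int

-- Data/Zq.hs (a separate file): the safe, public face.  Only the
-- abstract type plus a smart constructor and destructor escape.
module Data.Zq (Zq, fromInt, toInt) where

import Data.Zq.Unsafe (Zq (..))

-- Smart constructor; 7 stands in for whatever modulus the phantom
-- parameter q denotes in the real library.
fromInt :: Int -> Zq q
fromInt n = Zq (n `mod` 7)

toInt :: Zq q -> Int
toInt (Zq n) = n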
In the Haskell community, we are slowly adding features of dependent types. Dependent types are an advanced typing feature by which types can depend on values. Some languages like Agda and Idris already have them. It appears to be a very advanced feature requiring an advanced type system, until you realize that Python has had the dynamic-typing version of dependent types, which may or may not be actual dependent types, from the beginning.
For almost any program in a functional programming language, there is a way to represent it as an untyped lambda calculus term, no matter how advanced the typing. That's because typing only eliminates programs, it does not enable new ones.
Strong typing wins us safety: whole classes of errors that would have happened at run time can no longer happen at run time. This safety is rather nice. Besides this safety, though, what does strong typing give you?
Are there additional benefits of a strong type system besides safety?
(Note that I'm not saying that strong typing is worthless. Safety is a huge benefit in and of itself. I'm just wondering if there are additional benefits.)
First, we need to talk a bit about the history of the simply typed lambda calculus.
There are two historical developments of the simply typed lambda calculus.
When Alonzo Church described the lambda calculus the types were baked in as part of the meaning / operational behavior of the terms.
When Haskell Curry described the lambda calculus the types were annotations put on the terms.
So we have the lambda calculus a la Church and the lambda calculus a la Curry. See https://en.wikipedia.org/wiki/Simply_typed_lambda_calculus#Intrinsic_vs._extrinsic_interpretations for more.
Ironically, the language Haskell, which is named after Curry, is based on a lambda calculus a la Church!
What this means is the types aren't simply annotations that rule out bad programs for you. They can "do stuff" too. Such types don't erase without leaving residue.
This shows up in Haskell's notion of type classes, which are really why Haskell is a language a la Church.
In Haskell, when I make a function
sort :: Ord a => [a] -> [a]
we're implicitly passing an object or dictionary for Ord a as the first argument.
But you aren't forced to plumb that argument around yourself in the code; it is the compiler's job to build it up and use it.
instance Ord Char
instance Ord Int
instance Ord a => Ord [a]
So if you go and use sort on a list of strings, which are themselves lists of Chars, the compiler will build up the dictionary by passing the Ord Char instance through the Ord a => Ord [a] instance to get Ord [Char], which is the same as Ord String, and then you can sort a list of strings.
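To make that dictionary building concrete, here is a hedged sketch in plain Haskell of roughly what gets assembled for you; OrdDict, ordChar, ordList and sortWith are invented names, and GHC's actual Core looks different:

import Data.List (sortBy)

-- A hand-rolled stand-in for the Ord dictionary.
newtype OrdDict a = OrdDict { cmp :: a -> a -> Ordering }

ordChar :: OrdDict Char
ordChar = OrdDict compare

-- "instance Ord a => Ord [a]" becomes a function from a dictionary
-- for a to a dictionary for [a].
ordList :: OrdDict a -> OrdDict [a]
ordList d = OrdDict go
  where
    go []     []     = EQ
    go []     _      = LT
    go _      []     = GT
    go (x:xs) (y:ys) = case cmp d x y of
                         EQ -> go xs ys
                         o  -> o

-- sort with the dictionary passed explicitly instead of via a constraint.
sortWith :: OrdDict a -> [a] -> [a]
sortWith d = sortBy (cmp d)

-- The dictionary the compiler assembles when you sort a list of Strings:
sortStrings :: [String] -> [String]
sortStrings = sortWith (ordList ordChar)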
If I were to compare the complexity of calling such a sort function in Haskell to calling it in C# or Java: calling sort above is a lot less verbose than manually building a LexicographicComparator<List<Char>>, passing an IComparator<Char> to its constructor, and calling the function with an extra second argument.
This shows us that programming with types can be significantly less verbose, because mechanisms like implicits and typeclasses can infer a large part of the code for your program during type checking.
At a more basic level, even the sizes of arguments can depend on types, unless you want to pay the fairly massive cost of boxing everything in your language so that it has a homogeneous representation.
This shows us that programming with types can be significantly more efficient, because the compiler can use dedicated representations rather than paying for boxed structures everywhere in your code. Without that type information, an int can't just be a machine integer, because it has to somehow look like everything else in the system. If you're willing to give up an order of magnitude or more of runtime performance, then this may not matter to you.
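As a small, simplified illustration of representations depending on types (the Point type is made up for this example):

-- Because the field types are known statically, GHC can store both
-- Ints directly in the constructor rather than behind pointers.
data Point = Point {-# UNPACK #-} !Int {-# UNPACK #-} !Int

norm2 :: Point -> Int
norm2 (Point x y) = x * x + y * y
-- In a uniformly boxed, dynamically typed representation, each
-- coordinate would instead be a pointer to a heap-allocated object.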
Finally, once we have types "doing stuff" for us, it is often beneficial to consider the refactoring benefits that mere safety provides.
If I refactor the smaller set of code that remains, the compiler will rewrite all that type-class plumbing for me. It'll figure out the new ways it can rewrite the code to unbox more arguments. I'm not stuck elaborating all of this stuff by hand; I can leave these mundane tasks to the type-checker.
But even when I do change the types, I can move arguments around fairly willy-nilly, comfortable that the compiler will very likely catch my errors. Types give you "free theorems" which are like unit tests for whole classes of such errors.
On the other hand, once I lock down an API in a language like Python I'm deathly afraid of changing it, because it'll silently break at runtime for all my downstream dependencies! This leads to baroque APIs that lean heavily on easily bit-rotted keyword-arguments, and the API of something that evolves over time rarely resembles what you'd build out of the box if you had it to do over again. Consequently, even the mere safety concern has long-term impact in API design once you ever want people to build on top of your work, rather than simply replace it when it gets too unwieldy.
That's because typing only eliminates programs, it does not enable new ones.
This is not a correct statement. Type-classes make it possible to generate parts of your program from type-level information.
Consider two expressions:
readMaybe "15" :: Maybe Integer
readMaybe "15" :: Maybe Bool
Here I'm using the readMaybe function from the Text.Read module. At term level those expressions are identical, only their type annotations are different. However, the results they produce at runtime differ (Just 15 in the first case, Nothing in the second case).
This is because the compiler generates code for you from the static type information you have. To be more precise, it selects a suitable type class instance and passes its dictionary to the polymorphic function (readMaybe in our case).
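For reference, the two expressions above as a self-contained program you can run:

import Text.Read (readMaybe)

main :: IO ()
main = do
  print (readMaybe "15" :: Maybe Integer)  -- Just 15
  print (readMaybe "15" :: Maybe Bool)     -- Nothing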
This example is simple, but there are way more complex use cases. Using the mtl library you can write computations that run in different computational contexts (aka Monads). The compiler will automatically insert a lot of code that manages the computational contexts. In a dynamically typed language, you would have no static information to make this possible.
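A minimal sketch of the kind of mtl code meant here; tick, the error message, and the concrete StateT/Except stack are invented for illustration:

{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.Except (MonadError, runExcept, throwError)
import Control.Monad.State (MonadState, evalStateT, get, put)

-- One definition, usable in any monad that offers Int state and String
-- errors; the compiler selects and threads the instances for us.
tick :: (MonadState Int m, MonadError String m) => m Int
tick = do
  n <- get
  if n > 2
    then throwError "too many ticks"
    else put (n + 1) >> return n

-- Running it in one particular stack, StateT Int (Except String):
example :: Either String Int
example = runExcept (evalStateT (tick >> tick) 0)  -- Right 1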
As you can see, static typing not only cuts off incorrect programs but also writes correct ones for you.
You need "safety" when you already know what and how you want to write. It's a very small part of what types are useful for. The most important thing about types is that they make your reasoning structured. When someone writes in Python a + b he doesn't see a and b as some abstract variables — he sees them as some numbers. Types are already there in the internal language of humans, Python just doesn't have a type system to talk about them. The actual question in the "typed vs untyped (unityped) programming" dispute is "do we want to reflect our internal structured concepts in a safe and explicit or unsafe and implicit way?". Types don't introduce new concepts — it's untyped reasoning forgets the existing ones.
When someone looks at a tree (I mean a real green one) he doesn't see every single leaf on it, but he doesn't treat it as an abstract nameless object either. "A tree" is an approximation that is good enough for most cases, and that's why we have Hindley-Milner type systems; but sometimes you want to talk about a specific tree, and then you do want to look at the leaves. And that's what dependent types give you: the ability to zoom. "A tree without leaves", "a tree in the forest", "a tree of a particular form"... Dependently typed programming is just another step towards how humans think.
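Haskell can already approximate this kind of zooming with DataKinds and GADTs; the length-indexed vector below is a standard textbook illustration rather than anything from this answer:

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- The length of the vector is part of its type, so "a vector with no
-- elements" is a distinct type you can talk about.
data Vec (n :: Nat) a where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

-- Total head: it cannot even be applied to an empty vector.
safeHead :: Vec ('S n) a -> a
safeHead (Cons x _) = x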
On a less abstract note, I have a type checker for a toy dependently typed language in which all typing rules are expressed as constructors of a data type. You don't need to dive into the type-checking procedure to understand the rules of the system. That's the power of "zooming": you can introduce invariants as complex as you want, and thus distinguish the essential parts from the unimportant ones.
Another example of the power dependent types give you is various forms of reflection. Look e.g. at Pierre-Évariste Dagand's thesis, which proves that
generic programming is just programming
And of course types are hints: many functions and abstractions I defined would have come out far clumsier in a weakly typed language, but the types suggested better alternatives.
There is just no question "What to choose: simple types or dependent types?". Dependent types are always better and they of course subsume simple types. The question is "What to choose: no types or dependent types?", but that question doesn't stand for me.
Refactoring. By having a strong type system you can safely refactor code and have the compiler tell you whether what you are doing even makes sense. The stronger the type system, the more refactoring errors are avoided. This of course means your code is a lot more maintainable.
When trying to learn Haskell, one of the difficulties that arises is knowing when something requires special magic from the compiler. One example that comes to mind is the seq function, which can't be defined in ordinary Haskell, i.e. you can't write a seq2 function behaving exactly like the built-in seq. Consequently, when teaching someone about seq, you need to mention that seq is special because it's a special symbol for the compiler.
Another example would be the do-notation which only works with instances of the Monad class.
It's not always obvious, though. For instance, continuations: does the compiler know about Control.Monad.Cont, or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler, even though continuations are a very strange kind of beast.
Language extensions set aside, what other compiler magic should Haskell learners be aware of?
Nearly all the GHC primitives that cannot be implemented in userland are in the ghc-prim package. (It even has a module called GHC.Magic!)
So browsing it will give a good sense.
Note that you should not use this module in userland code unless you know exactly what you are doing. Most of the usable stuff from it is exported in downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees as to how it will act from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement a single function that works for all possible types (and keep it from being optimised away even though it looks like a no-op).
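For example, here is a per-type seq that anyone can write (seqBool is an invented name):

-- Forcing a Bool is just pattern matching on its constructors.
seqBool :: Bool -> b -> b
seqBool True  b = b
seqBool False b = b
-- The polymorphic seq :: a -> b -> b cannot be written this way: for an
-- arbitrary type a there are no constructors to match on.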
Obviously the entire IO monad is deeply magic, as is everything to do with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to LANGUAGE RebindableSyntax, which lets you change what functions it binds to.) Also the deriving stuff; the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for newtype works for any class though. (It's just copying an instance from one type to another identical copy of that type.)
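The newtype case is GeneralizedNewtypeDeriving; a minimal sketch (the Age type is made up for illustration):

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Num has no built-in deriving rule, but since Age is just Int with a
-- new name, GHC can copy Int's Num instance onto it.
newtype Age = Age Int
  deriving (Eq, Ord, Num)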
Arrays are hard-wired, much as in every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.
I'm curious about the objections to implicit parameters discussed in the Functional Pearl: Implicit Configurations article by Kiselyov and Shan.
It is not sound to inline code (β-reduce) in the presence of implicit parameters.
Really? I'd expect that GHC should inline into the same scope as the passed implicit parameter, no?
I believe I understand their objection that:
A term’s behavior can change if its signature is added, removed, or changed.
GHC's user documentation explains that programmers must take care around polymorphic recursion and the monomorphism restriction. Is this somehow what they mean by a problem for inlining?
I presume this polymorphic recursion example covers what they mean by "generalizing over implicit parameters" as well? Anything else?
Is the ReifiesStorable type class from Data.Reflection really a sensible solution to these difficulties? It seemingly deserializes the entire implicit data structure every time it's accessed, which sounds disastrous for performance. We might, for example, want our implicit information to be a Cayley table or character table that occupies a gigabyte of RAM and must be accessed during millions of algebraic operations.
Is there perhaps some better solution that employs implicit parameters, or another technique the compiler can easily optimize behind the scenes, while still guaranteeing more via the type system, using state threads or whatever?
Yes, the example from the GHC manual shows how adding a type signature can change the semantics of code with implicit parameters, and I believe this is what they mean by breaking inlining; inlining an application of len_acc1 vs. an application of len_acc2 produces the same code, despite the two having different semantics.
As far as generalising over implicit parameters, it means that you can't write a function that can operate on multiple implicit parameters; there is no mechanism to abstract over them, as the implicit parameter a function uses is fixed by its type. With reflection, you can easily write a function like doSomethingWith :: (Reifies s a, Num a) => Proxy s -> a, which can operate over any type that reifies a numeric value.
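For completeness, here is roughly what such a function looks like with the current reflection API; doSomethingWith is the hypothetical name from above and the concrete numbers are arbitrary:

import Data.Proxy (Proxy)
import Data.Reflection (Reifies, reflect, reify)

-- The configuration value is carried by the phantom type s, so the
-- same function works for any reified numeric value.
doSomethingWith :: (Reifies s a, Num a) => Proxy s -> a
doSomethingWith p = reflect p * 2

example :: Integer
example = reify (21 :: Integer) (\p -> doSomethingWith p)  -- 42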
As for ReifiesStorable, you're looking at an old version of the reflection package; the latest version has a very efficient implementation in which reify only costs as much as a function call.1 Note that, even with the old implementation, you would generally not use the ReifiesStorable class directly, but instead Reifies, which uses ReifiesStorable to reify a StablePtr, so only a few bytes end up being copied, not the entire object. (This is what the original implementation in the paper does, too.) Both implementations are definitely fast enough for practical use, with the old, "slow" implementation taking about 100 ms to reify and reflect 100000 values, and the new implementation under 10 ms.
(Full disclosure: I worked on the new implementation.)
1 The fast implementation depends on Haskell implementation details. The older, slower implementation is automatically used for Haskell implementations that the fast implementation has not yet been tested with; so far, GHC and Hugs have been shown to work with the fast implementation. You can request the slow implementation with -fslow, but it's unlikely to stop working unless GHC significantly overhauls its implementation of typeclasses. (Even if it does, you'd only have to recompile the packages that use reflection to get it working again.)
Reading "Real world Haskell" i found some intresting question about data types:
"This pattern matching and positional data access make it look like you have very tight coupling between data and code that operates on it (try adding something to Book, or worse, change the type of an existing part). This is usually a very bad thing in imperative (particularly OO) languages... is it not seen as a problem in Haskell?"
source at RWH comments
And really, writing some Haskell programs, I found that when I make a small change to a data type's structure it affects almost all the functions that use that data type. Maybe there are some good practices for data type usage. How can I minimize code coupling?
What you are describing is commonly known as the expression problem -- http://en.wikipedia.org/wiki/Expression_Problem.
There is a definite trade-off to be made: Haskell code in general, and algebraic data types in particular, tend to fall on the side of being hard to change the type but easy to add functions over the type. This optimizes for (up-front) well-designed, complete data types.
All that said, there are a number of things that you can do to reduce the coupling.
Define good library functions. By defining a complete set of combinators and higher-order functions that are useful for interacting with your data type, you will reduce coupling. It is often said that whenever you think of pattern matching, there is a better solution using higher-order functions. If you look for these situations you will be in a better spot.
Expose your data structures as more abstract types. This means implementing all appropriate type classes. This will assist with defining library functions, as you will get a bunch for free with any of the type classes you implement; for examples, look at the operations over Functor or Monad.
Hide (as much as possible) any type constructors. Constructors expose implementation detail and will encourage coupling. Hint: this ties in with defining a good API for interacting with your type; consumers of your type should rarely, if ever, have to use the type constructors.
The Haskell community seems particularly good at this; if you look at many of the libraries on Hackage you will find really good examples of implementing type classes and exposing good library functions.
In addition to what's been said:
One interesting approach is the "scrap your boilerplate" style of defining functions over data types, which makes use of generic functions (as opposed to explicit pattern matching) to define functions over the constructors of a data type. Looking at the "scrap your boilerplate" papers, you will see examples of functions which can cope with changes to the structure of a data type.
A second method, as Hibberd pointed out, is to use folds, maps, unfolds, and other recursion combinators, to define your functions. When you write functions using higher order functions, oftentimes small changes to the underlying data type can be dealt with in the instance declarations for Functor, Foldable, and so on.
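As a small, hedged illustration of that second method (the Tree type and the functions around it are invented for this example): a consumer written against Foldable never mentions the constructors, so reshaping the data type only touches the instance.

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Only this instance knows the shape of Tree.
instance Foldable Tree where
  foldr _ z Leaf         = z
  foldr f z (Node l x r) = foldr f (f x (foldr f z r)) l

-- Written purely against Foldable; unaffected by changes to Tree.
total :: Tree Int -> Int
total = sum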
First, I'd like to mention that in my view, there are two kinds of couplings:
One that makes your code cease to compile when you change one and forget to change the other
One that makes your code buggy when you change one and forget to change the other
While both are problematic, the former is significantly less of a headache, and that seems to be the one you're talking about.
I think the main problem you're mentioning is due to over-using positional arguments. Haskell almost forces you to have positional arguments in your ordinary functions, but you can avoid them in your type products (records).
Just use records instead of multiple anonymous fields inside data constructors, and then you can pattern-match any field you want out of it, by name.
bad (Blah _ y) = ...
good (Blah{y = y}) = ...
Avoid over-using tuples, especially those beyond 2-tuples, and liberally create records/newtypes around things to avoid positional meaning.
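To make the Blah example concrete, here is a sketch using the Book type from the quoted question (the field names are invented):

data Book = Book
  { title  :: String
  , author :: String
  , year   :: Int
  }

-- Match only the fields you need; adding a new field to Book later
-- does not break this function.
describe :: Book -> String
describe Book{title = t, author = a} = t ++ ", by " ++ a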
What are the similarities and the differences between Haskell's TypeClasses and Go's Interfaces? What are the relative merits / demerits of the two approaches?
It looks like Go interfaces are like single-parameter type classes (constructor classes) in Haskell only in superficial ways:
Methods are associated with an interface type
Objects (particular types) may have implementations of that interface
It is unclear to me whether Go in any way supports bounded polymorphism via interfaces, which is the primary purpose of type classes. That is, in Haskell, the interface methods may be used at different types,
class I a where
    put :: a -> IO ()
    get :: IO a

instance I Int where
    ...

instance I Double where
    ...
So my question is whether Go supports that kind of bounded polymorphism. If not, interfaces are not really like type classes at all, and the two are not really comparable.
Haskell's type classes allow powerful reuse of code via "generics" -- higher-kinded polymorphism -- a good reference for cross-language support for such forms of generic programming is this paper.
Ad hoc (bounded) polymorphism via type classes is well described here. This is the primary purpose of type classes in Haskell, and one not addressed by Go interfaces, meaning they're not really very similar at all. Interfaces are strictly less powerful - a kind of zeroth-order type class.
I will add to Don Stewart's excellent answer that one of the surprising consequences of Haskell's type classes is that you can use logic programming at compile time to generate arbitrarily many instances of a class. (Haskell's type-class system includes what is effectively a cut-free subset of Prolog, very similar to Datalog.) This system is exploited to great effect in the QuickCheck library. Or, for a very simple example, you can see how to define a version of Boolean complement (not) that works on predicates of arbitrary arity, as sketched below. I suspect this ability was an unintended consequence of the type-class system, but it has proven incredibly powerful.
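A sketch of that arbitrary-arity Boolean complement; the class and method names are invented, and this is only one way to write it:

-- Instance selection acts as compile-time logic programming: together
-- these two instances define complement for predicates of any arity.
class Negatable a where
  neg :: a -> a

instance Negatable Bool where
  neg = not

instance Negatable b => Negatable (a -> b) where
  neg f = neg . f

-- neg even :: Integral a => a -> Bool     (behaves like odd)
-- neg (<)  :: Ord a => a -> a -> Bool     (pointwise >=)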
Go has nothing like it.
In Haskell, typeclass instantiation is explicit (i.e. you have to say instance Foo Bar for Bar to be an instance of Foo), while in Go implementing an interface is implicit (i.e. when you define a type with the right methods, it automatically implements the corresponding interface, without having to say something like implements InterfaceName).
An interface can only describe methods where the instance of the interface is the receiver. In a typeclass, the instantiating type can appear at any argument position or as the return type of a function (i.e. you can say: if Foo is an instance of class Bar, there must be a function named baz which takes an Int and returns a Foo; you can't say that with interfaces).
Very superficial similarities; Go's interfaces are more like structural subtyping in OCaml.
C++ Concepts (which didn't make it into C++0x) are like Haskell type classes. There were also "axioms", which aren't present in Haskell at all; they let you formalize things like the monad laws.