Data constructors without breaking the open/closed principle - haskell

I have a data constructor like this:
class FooClass a where
  foo :: a -> b
class BarClass a where
  bar :: a -> b
data FooBar = Foo :: FooClass a => a -> IO ()
            | Bar :: BarClass a => a -> IO ()
So that I can use pattern matching:
foobar :: FooBar -> a -> IO ()
foobar (Foo f) x = f (foo x)
foobar (Bar f) x = f (bar x)
However, this breaks the open/closed principle.
I'd like to be able to extend FooBar with additional methods based on other classes.
How would I implement this in Haskell?

As others have pointed out, this code is flawed in ways that obscure your question. It's also probably dangerous to try to think too hard about how OO principles translate to FP. They have a place, because much of OO is embedded in FP naturally, but it's much better to learn FP directly first and then observe the laws later as certain special cases.
In particular, we can talk about how greater refinement of types is a form of extension. For instance, comparing the types like
(Num a) => a -> IO ()
(Num a, Show a) => a -> IO ()
we can talk about how the second function takes in a set of types which is a natural subtype of the inputs to the first function. In particular, the set of possible types that can be input to the second function is a refinement of the inputs to the first. As users of these functions, there are fewer valid ways to use the second function. As implementers of these functions, there are more valid ways to implement the second function. In fact, we know the following
All values which are valid inputs to the second function are also valid inputs to the first
All functions which are correctly typed by the first signature are also correctly typed by the second.
This duality between giving and taking is explored in the study of Game semantics. The idea of "open for extension" plays out trivially in that we can always decide to ask for a more refined type, but it's almost completely uninteresting since that's just obvious in how refined types are used.
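A minimal sketch of the two facts above, using hypothetical names (discard, discard', report) that are not part of the question. It only illustrates that a definition typed at the weaker signature can be reused at the refined one, while the refined signature admits implementations the weaker one does not.

```haskell
-- Implemented against the weaker constraint: nothing but Num is available.
discard :: Num a => a -> IO ()
discard _ = pure ()

-- The same definition is also correctly typed by the refined signature.
discard' :: (Num a, Show a) => a -> IO ()
discard' = discard

-- With Show available there are more ways to implement the function.
-- This body would not type-check at the signature of 'discard'.
report :: (Num a, Show a) => a -> IO ()
report x = print (x + 1)
```

Going the other way, giving report the type Num a => a -> IO () is rejected, because print needs the Show constraint.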
So what about ADTs (data declarations) directly? Are they Open/Closed? Mu. ADTs aren't objects, so the rule does not apply directly.

The trick to doing your example in Haskell is to use functions instead of classes:
-- FooBar is like a base class
-- with methods foo and bar.
-- I've interpreted your example liberally
-- for purposes of illustration.
-- In particular, FooBar has two methods -
-- foo and bar - with different signatures.
data FooBar = FooBar
  { foo :: IO ()
  , bar :: Int -> Int
  }
-- Use functions for classes, like in Javascript.
-- This doesn't mean Haskell is untyped, it just means classes are not types.
-- Classes are really functions that make objects.
fooClass :: Int -> FooBar
fooClass n = FooBar
  { foo = putStrLn ("Foo " ++ show n)
  , bar = \m -> m + 1
  }
barClass :: FooBar
barClass = FooBar
  { foo = putStrLn "Bar "
  , bar = \m -> m * 2
  }
-- Now we can define a function that uses FooBar and it doesn't matter
-- if the FooBar we pass in came from fooClass, barClass or something else,
-- bazClass, say.
foobar :: FooBar -> IO ()
foobar (FooBar foo bar) = do
  -- invoke foo
  foo
  -- use bar
  print (bar 7)
Here FooBar is 'open for extension' because we can create as many FooBar values as we like with different behaviours.
To 'extend' FooBar with another field, baz, without changing FooBar, fooClass or barClass, we need to declare a FooBarBaz type that includes a FooBar. We can still use our foobar function; we just have to extract the FooBar from the FooBarBaz first.
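A rough sketch of what such a FooBarBaz could look like. The field names fooBar and baz, the constructor function bazClass, and the type of baz are assumptions made for illustration; the answer does not specify them.

```haskell
-- The original record, unchanged.
data FooBar = FooBar
  { foo :: IO ()
  , bar :: Int -> Int
  }

-- Hypothetical extension: wraps a FooBar and adds a 'baz' field.
data FooBarBaz = FooBarBaz
  { fooBar :: FooBar
  , baz    :: String -> String
  }

-- Existing client code, written against FooBar only.
foobar :: FooBar -> IO ()
foobar fb = do
  foo fb
  print (bar fb 7)

-- A hypothetical "subclass" built on top of any existing FooBar.
bazClass :: FooBar -> FooBarBaz
bazClass base = FooBarBaz
  { fooBar = base
  , baz    = reverse
  }

-- New client code extracts the FooBar before reusing foobar.
foobarbaz :: FooBarBaz -> IO ()
foobarbaz fbb = do
  foobar (fooBar fbb)
  putStrLn (baz fbb "zab")
```

Neither FooBar nor foobar had to change; the extension lives entirely in the new type and the functions written for it.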
So far, I've been keeping close to OOP. This is because Bertrand Meyer worded the open closed principle to require OOP or something very like it:
software entities (classes, modules, functions, etc.) should be open
for extension, but closed for modification
In particular, the word "extension" is traditionally interpreted as meaning "subclassing". If you're prepared to interpret the principle as merely "having extension points", then any function that takes another function as parameter is "open for extension". This is so common in functional programming that it's not considered a principle. The "parameterisation principle" just doesn't sound the same.
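As a small illustration of the "extension point" reading, here is a sketch with hypothetical names (greetWith, shout). The function body never changes, yet callers extend its behaviour by passing a different function.

```haskell
import Data.Char (toUpper)

-- Closed for modification: this definition stays as it is.
-- Open for extension: behaviour varies with the 'decorate' argument.
greetWith :: (String -> String) -> String -> String
greetWith decorate name = "Hello, " ++ decorate name

-- One possible extension, supplied by a caller.
shout :: String -> String
shout = map toUpper

plain, loud :: String
plain = greetWith id "world"
loud  = greetWith shout "world"
```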

Related

How to read the syntax `Typ{..}` in haskell? [duplicate]

While reading library code here I have noticed a really weird looking syntax that I can't make sense of:
momenta
    :: (KnownNat m, KnownNat n)
    => System m n
    -> Config n
    -> R n
momenta Sys{..} Cfg{..} = tr j #> diag _sysInertia #> j #> cfgVelocities
--      ^^^^^^^^^^^^^^^ the syntax in question
  where
    j = _sysJacobian cfgPositions
The relevant definition of System includes a record field { _sysJacobian :: R n -> L m n }, and { cfgVelocities :: R n } is part of the record declaration of Config, so I believe I know what the code does. I think the code is quite readable; props to the author.
The question is: what is this syntax called and how exactly can I use it?
In short: it is an extension of GHC called RecordWildCards.
In Haskell you can use record syntax to define data types. For example:
data Foo = Bar { foo :: Int, bar :: String } | Qux { foo :: Int, qux :: Int }
We can then pattern match on the data constructor, and match zero or more parameters, for example:
someFunction :: Int -> Foo -> Int
someFunction dd (Bar {foo=x}) = dd + x
someFunction dd (Qux {foo=x, qux=y}) = dd + x + y
But it can happen that we need access to a large number of (or even all) parameters. For example:
someOtherFunction :: Foo -> Int
someOtherFunction (Bar {foo=foo, bar=bar}) = foo
someOtherFunction (Qux {foo=foo, qux=qux}) = foo + qux
In case the number of parameters is rather large, then this becomes cumbersome. There is an extension RecordWildCards:
{-# LANGUAGE RecordWildCards #-}
With this extension enabled, writing {..} in a record pattern implicitly adds foo=foo for every field foo that is not mentioned explicitly.
So we can then write:
someOtherFunction :: Foo -> Int
someOtherFunction (Bar {..}) = foo
someOtherFunction (Qux {..}) = foo + qux
So here the compiler implicitly pattern matched all parameters with a variable with the same name, such that we can access those parameters without explicit pattern matching, nor by using getters.
The advantage is thus that we save a lot on large code chunks that have to be written manually. A downside is however the fact that the parameters are no longer explicit and hence the code is harder to understand. We see the use of parameters for which there exist actually getter counterparts, and thus it can introduce some confusion.
Like @leftroundabout says, probably lenses can do the trick as well, and it will prevent introducing variables that basically shadow getters, etc.
You can also merge the RecordWildCards with pattern matching on parameters, for example:
someOtherFunction :: Foo -> Int
someOtherFunction (Bar {bar=[], ..}) = foo
someOtherFunction (Bar {..}) = foo + 42
someOtherFunction (Qux {..}) = foo + qux
So here, in case the bar parameter of a Foo value built with the Bar data constructor is the empty string, we return the foo value; otherwise we add 42 to it.
It's the RecordWildCards syntax extension. From the docs:
For records with many fields, it can be tiresome to write out each field individually in a record pattern ... Record wildcard syntax permits a ".." in a record pattern, where each elided field f is replaced by the pattern f = f ... The expansion is purely syntactic, so the record wildcard expression refers to the nearest enclosing variables that are spelled the same as the omitted field names.
Basically it brings the fields of a record into scope.
It is particularly useful when writing encoders/decoders (e.g. Aeson), but should be used sparingly in the interest of code clarity.
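The following sketch shows the wildcard both in a pattern and in a record construction expression. The Config type and the functions describe and mkConfig are made up for this illustration and do not come from the library in the question.

```haskell
{-# LANGUAGE RecordWildCards #-}

-- Hypothetical record used only for this illustration.
data Config = Config
  { host :: String
  , port :: Int
  } deriving (Eq, Show)

-- In a pattern, Config{..} binds one local variable per field.
describe :: Config -> String
describe Config{..} = host ++ ":" ++ show port

-- In an expression, Config{..} fills each field from the locally
-- bound variable of the same name (here: an argument and a where binding).
mkConfig :: String -> Config
mkConfig host = Config{..}
  where
    port = 8080
```

Inside describe, host and port are plain values rather than the selector functions, which is the shadowing the answers above warn about.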

non-lawful Monoid instances for building up AST not considered harmful?

I've seen a data type defined like the following with a corresponding Monoid instance:
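{-# LANGUAGE GADTs, InstanceSigs #-}
data Foo where
  FooEmpty  :: String -> Foo
  FooAppend :: Foo -> Foo -> Foo
-- | Create a 'Foo' with a specific 'String'.
foo :: String -> Foo
foo = FooEmpty
-- Since base 4.11 (GHC 8.4), Monoid requires a Semigroup instance as well.
instance Semigroup Foo where
  (<>) :: Foo -> Foo -> Foo
  (<>) = FooAppend
instance Monoid Foo where
  mempty :: Foo
  mempty = FooEmpty ""
  mappend :: Foo -> Foo -> Foo
  mappend = FooAppend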
You can find the full code in a gist on Github.
This is how Foo can be used:
exampleFoo :: Foo
exampleFoo =
  (foo "hello" <> foo " reallylongstringthatislong") <>
  (foo " world" <> mempty)
exampleFoo ends up as a tree that looks like this:
FooAppend
  (FooAppend
    (FooEmpty "hello")
    (FooEmpty " reallylongstringthatislong"))
  (FooAppend
    (FooEmpty " world")
    (FooEmpty ""))
Foo can be used to turn sequences of Monoid operations (mempty and mappend) into an AST. This AST can then be interpreted into some other Monoid.
For instance, here is a translation of Foo into a String that makes sure the string appends will happen optimally:
fooInterp :: Foo -> String
fooInterp = go ""
  where
    go :: String -> Foo -> String
    go accum (FooEmpty str) = str ++ accum
    go accum (FooAppend foo1 foo2) = go (go accum foo2) foo1
This is really nice. It is convenient that we can be sure String appends will happen in the right order. We don't have to worry about left-associated mappends.
However, the one thing that worries me is that the Monoid instance for Foo is not a legal Monoid instance.
For instance, take the first Monoid law:
mappend mempty x = x
If we let x be FooEmpty "hello", we get the following:
mappend mempty (FooEmpty "hello") = FooEmpty "hello"
mappend (FooEmpty "") (FooEmpty "hello") = FooEmpty "hello" -- replace mempty with its def
FooAppend (FooEmpty "") (FooEmpty "hello") = FooEmpty "hello" -- replace mappend with its def
You can see that FooAppend (FooEmpty "") (FooEmpty "hello") does not equal FooEmpty "hello". The other Monoid laws also don't hold for similar reasons.
Haskellers are usually against non-lawful instances. But I feel like this is a special case. We are just trying to build up a structure that can be interpreted into another Monoid. In the case of Foo, we can make sure that the Monoid laws hold for String in the fooInterp function.
Is it ever okay to use these types of non-lawful instances to build up an AST?
Are there any specific problems that need to be watched for when using these types of non-lawful instances?
Is there an alternative way to write code that uses something like Foo? Some way to enable interpretation of a monoidal structure instead of using mappend on a type directly?
Quoting this answer on a similar question:
You can think of it from this alternative point of view: the law (a <> b) <> c = a <> (b <> c) doesn't specify which equality should be used, i.e. what specific relation the = denotes. It is natural to think of it in terms of structural equality, but note that very few typeclass laws actually hold up to structural equality (e.g. try proving fmap id = id for [] as opposed to forall x . fmap id x = id x).
For example, it's mostly fine if you do not export the constructors of Foo, and only export functions that, from the point of view of users, behave as if Foo were a monoid. But most of the time it is possible to come up with a representation that's structurally a monoid, good enough in practice, though maybe not as general (below, you cannot reassociate arbitrarily after the fact, because interpretation is mixed with construction).
import Data.Monoid (Endo (..))

type Foo = Endo String
foo :: String -> Foo
foo s = Endo (s <>)
unFoo :: Foo -> String
unFoo (Endo f) = f ""
Here is another SO question where a non-structural structure (Alternative) is considered at first.
This will come up for most non-trivial data structures. The only exceptions I can think of off the top of my head are (some) trie-like structures.
Balanced tree data structures allow multiple balancings of most values. This is true of AVL trees, red-black trees, B-trees, 2-3 finger trees, etc.
Data structures designed around "rebuilding", such as Hood-Melville queues, allow variable amounts of duplication within structures representing most values.
Data structures implementing efficient priority queues allow multiple arrangements of elements.
Hash tables will arrange elements differently depending on when collisions occur.
None of these structures can be asymptotically as efficient without this flexibility. The flexibility, however, always breaks laws under the strictest interpretation. In Haskell, the only good way to deal with this is by using the module system to make sure no one can detect the problem. In experimental dependently typed languages, researchers have been working on things like observational type theory and homotopy type theory to find better ways to talk about "equality", but that research is pretty far from becoming practical.
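One way to read "using the module system" for the Foo type from the question is sketched below. The module name FooAst and the Eq instance are assumptions added for illustration; the idea is that once the constructors are hidden, users can only observe a Foo through fooInterp, and under that notion of equality the Monoid laws hold.

```haskell
-- In a real project this would live in its own module with an export
-- list that hides the constructors, for example (hypothetical name):
--   module FooAst (Foo, foo, fooInterp) where

data Foo
  = FooEmpty String
  | FooAppend Foo Foo

foo :: String -> Foo
foo = FooEmpty

instance Semigroup Foo where
  (<>) = FooAppend

instance Monoid Foo where
  mempty = FooEmpty ""

fooInterp :: Foo -> String
fooInterp = go ""
  where
    go accum (FooEmpty str)        = str ++ accum
    go accum (FooAppend foo1 foo2) = go (go accum foo2) foo1

-- Equality is defined through the interpretation, so two trees that
-- differ only in shape compare equal.
instance Eq Foo where
  a == b = fooInterp a == fooInterp b

leftIdentity, associativity :: Bool
leftIdentity  = (mempty <> foo "hello") == foo "hello"
associativity = ((foo "a" <> foo "b") <> foo "c") == (foo "a" <> (foo "b" <> foo "c"))
```

Both checks evaluate to True here, although the underlying trees are structurally different.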
Is it ever okay to use these types of non-lawful instances to build up an AST?
This is a matter of opinion. (I'm firmly in the 'never ok' camp.)
Are there any specific problems that need to be watched for when using these types of non-lawful instances?
cognitive burden placed on potential users and future maintainers
potential bugs because we use the type in a place that makes assumptions based on the broken law(s)
edit to answer questions in comments:
Would you be able to come up with specific examples of how it raises the cognitive burden on users?
Imagine how annoyed you would be if someone did this in C:
// limit all while loops to 10 iterations
#define while(exp) for(int i = 0; (exp) && i < 10; ++i)
Now we have to keep track of the scope of this pseudo-while definition and its implications. It's a non-Haskell example, but I think the principle is the same. We shouldn't expect the semantics of while to be different in a particular source file just like we shouldn't expect the semantics of Monoid to be different for a particular data type.
When we say something is an X, then it should be an X because people understand the semantics of X. The principle here is don't create exceptions to well understood concepts.
I think the point of using lawful abstractions (like monoid) in the first place is to alleviate the need for programmers to learn and remember a myriad of different semantics. Thus, every exception we create undermines this goal. In fact, it makes it worse; we have to remember the abstraction and on top of that remember all the exceptions. (As an aside, I admire but pity those who learned English as a second language.)
Or how it can lead to potential bugs?
some library:
-- instances of this class must have property P
class AbidesByP a where
  ...
-- foo relies on the property P
foo :: AbidesByP a => a -> Result
foo a = ...
my code:
data MyData = ...
-- note: AbidesByP's are supposed to have property P, but this one doesn't
instance AbidesByP MyData where
  ...
some other programmer (or me in a few months):
doSomethingWithMyData :: MyData -> SomeResult
doSomethingWithMyData x = let ...
                              ...
                              ...
                              r = foo x -- potential bug
                              ...
                              ...
                          in ...
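To make the outline above concrete for the Foo type from the question, here is a small sketch. The helpers combineLeft and combineRight are hypothetical stand-ins for "some library" code that assumes the Monoid laws and therefore feels free to pick either fold direction.

```haskell
data Foo
  = FooEmpty String
  | FooAppend Foo Foo
  deriving (Eq, Show)

instance Semigroup Foo where
  (<>) = FooAppend

instance Monoid Foo where
  mempty = FooEmpty ""

-- For a lawful Monoid these two functions always return equal results,
-- so a library may switch between them without telling anyone.
combineLeft, combineRight :: Monoid m => [m] -> m
combineLeft  = foldl (<>) mempty
combineRight = foldr (<>) mempty

pieces :: [Foo]
pieces = [FooEmpty "a", FooEmpty "b"]

-- With the derived (structural) Eq, the two results differ for Foo.
shapesDiffer :: Bool
shapesDiffer = combineLeft pieces /= combineRight pieces

-- For a lawful instance such as String they agree.
stringsAgree :: Bool
stringsAgree = combineLeft ["a", "b"] == combineRight ["a", "b"]
```

Any code that compares, shows, or serialises the resulting Foo can therefore change behaviour when the library changes its fold direction.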
Is there an alternative way to write code that uses something like Foo?
I'd probably just use the constructor to construct:
(foo "hello" `FooAppend` foo " reallylongstringthatislong") `FooAppend` (foo " world" `FooAppend` foo "")
or make an operator:
(<++>) = FooAppend
(foo "hello" <++> foo " reallylongstringthatislong") <++> (foo " world" <++> foo "")

How to handle functions of a multi-parameter typeclass that do not need every type of the typeclass?

I've defined a typeclass similar to an interface with a bunch of functions required for my program. Sadly, it needs multiple polymorphic types, but not every function of this multi-parameter typeclass needs every type. GHC haunts me with types it cannot deduce and I can't get the code to compile.
A reduced example:
{-# LANGUAGE MultiParamTypeClasses #-}
class Foo a b where
  -- ...
  bar :: a -> ()
baz :: Foo a b => a -> ()
baz = bar
GHC says
Possible fix: add a type signature that fixes these type variable(s)
How can I do this for b? Especially when I want to keep b polymorphic. Only an instance of Foo should define what this type is.
This is impossible.
The underlying problem is that a multiparameter type class depends on every type parameter. If a particular definition in the class doesn't use every type parameter, the compiler will never be able to know what instance you mean, and you'll never even be able to specify it. Consider the following example:
class Foo a b where
  bar :: String -> IO a
instance Foo Int Char where
  bar x = return $ read x
instance Foo Int () where
  bar x = read <$> readFile x
Those two instances do entirely different things with their parameter. The only way the compiler has to select one of those instances is matching both type parameters. But there's no way to specify what the type parameter is. The class is just plain broken. There's no way to ever call the bar function, because you can never provide enough information for the compiler to resolve the class instance to use.
So why is the class definition not rejected by the compiler? Because you can sometimes make it work, with the FunctionalDependencies extension.
If a class has multiple parameters, but they're related, that information can sometimes be added to the definition of the class in a way that allows a class member to not use every type variable in the class's definition.
class Foo a b | a -> b where
  bar :: String -> IO a
With that definition (which requires the FunctionalDependencies extension), you are telling the compiler that for any particular choice of a, there is only one valid choice of b. Attempting to even define both of the above instances would be a compile error.
Given that, the compiler knows that it can select the instance of Foo to use based only on the type a. In that case, bar can be called.
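A minimal sketch of the functional-dependency version with one instance and one call site. The name readInt is hypothetical; the instance is the first of the two shown earlier.

```haskell
{-# LANGUAGE FunctionalDependencies #-}

-- For any choice of a there is exactly one b.
class Foo a b | a -> b where
  bar :: String -> IO a

instance Foo Int Char where
  bar x = return (read x)

-- Only 'a' is fixed here (to Int); the dependency a -> b lets the
-- compiler conclude that b must be Char and pick the instance.
readInt :: IO Int
readInt = bar "42"
```

Adding the second instance, Foo Int (), alongside this one would now be rejected because it conflicts with the dependency.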
Splitting it in smaller typeclasses might be sufficient.
{-# LANGUAGE MultiParamTypeClasses #-}
class Fo a => Foo a b where
  -- ...
  foo :: a -> b -> ()
class Fo a where
  bar :: a -> ()
baz :: Foo a b => a -> ()
baz = bar
Assuming you really want to use more than one instance for a given a (and so cannot use functional dependencies as others mentioned), one possibility which may or may not be right for you is to use a newtype tagged with a "phantom" type used only to guide type selection. This compiles:
{-# LANGUAGE MultiParamTypeClasses #-}
-- Also defined in the tagged package on Hackage
newtype Tagged t a = Tagged { unTagged :: a }
class Foo a b where
  bar :: Tagged b a -> ()
baz :: Foo a b => Tagged b a -> ()
baz = bar
Then you will be able to wrap your values in such a way that you can give an explicit type annotation to select the right instance.
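A sketch of how the annotation selects an instance. To make the choice visible, this variant lets bar return a String instead of (); that change, the two instances, and the names viaChar and viaBool are assumptions made for the example.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

newtype Tagged t a = Tagged { unTagged :: a }

class Foo a b where
  bar :: Tagged b a -> String

-- Two instances for the same 'a', distinguished only by 'b'.
instance Foo Int Char where
  bar (Tagged n) = "Int tagged with Char: " ++ show n

instance Foo Int Bool where
  bar (Tagged n) = "Int tagged with Bool: " ++ show n

-- The phantom tag in the annotation picks the instance.
viaChar, viaBool :: String
viaChar = bar (Tagged 3 :: Tagged Char Int)
viaBool = bar (Tagged 3 :: Tagged Bool Int)
```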
Another way of refactoring multi-parameter type classes when they get awkward is to use the TypeFamilies extension. Like FunctionalDependencies, this works well when you can reframe your class as having only a single parameter (or at least, fewer parameters), with the other types that are different from instance to instance being computed from the actual class parameters.
Generally I've found whenever I thought I needed a multi-parameter type class, the parameters almost always varied together rather than varying independently. In this situation it's much easier to pick one as "primary" and use some system for determining the others from it. Functional dependencies can do this as well as type families, but many find type families a lot easier to understand.
Here's an example:
{-# LANGUAGE TypeFamilies, FlexibleInstances #-}
class Glue a where
  type Glued a
  glue :: a -> a -> Glued a
instance Glue Char where
  type Glued Char = String
  glue x y = [x, y]
instance Glue String where
  type Glued String = String
  glue x y = x ++ y
glueBothWays :: Glue a => a -> a -> (Glued a, Glued a)
glueBothWays x y = (glue x y, glue y x)
The above declares a class Glue of types that can be glued together with the glue operation, and that have a corresponding type which is the result of the "gluing".
I then declared a couple of instances; Glued Char is String, Glued String is also just String.
Finally I wrote a function to show how you use Glued when you're being polymorphic over the instance of Glue you're using; basically you "call" Glued as a function in your type signatures; this means glueBothWays doesn't "know" what type Glued a is, but it knows how it corresponds to a. You can even use Glued Char as a type, if you know you're gluing Chars but don't want to hard-code the assumption that Glued Char = String.
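Applied to the class from the question, the same idea might look like the sketch below. The associated type Partner and the method pair are hypothetical; they stand in for the old second parameter b and for whichever methods actually need it.

```haskell
{-# LANGUAGE TypeFamilies #-}

class Foo a where
  -- Plays the role of the former second class parameter b.
  type Partner a
  -- A method that only needs a.
  bar  :: a -> ()
  -- A hypothetical method that needs both types.
  pair :: a -> Partner a -> (a, Partner a)

instance Foo Int where
  type Partner Int = Char
  bar _ = ()
  pair n c = (n, c)

-- No ambiguity: the instance is chosen from a alone.
baz :: Foo a => a -> ()
baz = bar
```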

Am I thinking about and using singleton types in Haskell correctly?

I want to create several incompatible, but otherwise equal, datatypes. That is, I'd like to have a parameterized type Foo a, and functions such as
bar :: (Foo a) -> (Foo a) -> (Foo a)
without actually caring about what a is. To clarify further, I'd like the type system to stop me from doing
x :: Foo Int
y :: Foo Char
bar x y
while I at the same time don't really care about Int and Char (I only care that they're not the same).
In my actual code I have a type for polynomials over a given ring. I don't actually care what the indeterminates are, as long as the type system stops me from adding a polynomial in t with a polynomial in s. So far I've solved this by creating a typeclass Indeterminate, and parameterizing my polynomial type as
data (Ring a, Indeterminate b) => Polynomial a b
This approach feels perfectly natural for the Ring part because I do care about which particular ring a given polynomial is over. It feels very contrived for the Indeterminate part, as detailed below.
The above approach works fine, but feels contrived. Especially so this part:
class Indeterminate a where
  indeterminate :: a
data T = T
instance Indeterminate T where
  indeterminate = T
data S = S
instance Indeterminate S where
  indeterminate = S
(and so on for perhaps a few more indeterminates). It feels weird and wrong. Essentially I'm trying to demand that instances of Indeterminate be singletons (in this sense). The feeling of weirdness is one indicator that I might be attacking this wrongly. Another is the fact that I end up having to annotate a lot of my Polynomial a bs since the actual type b often cannot be inferred (that's not strange, but is annoying nevertheless).
Any suggestions? Should I just keep on doing it like this, or am I missing something?
PS: Don't feel offended if I don't upvote or accept answers immediately. I'll be unable to check back in for a few days.
First of all, I'm not sure this:
data (Ring a, Indeterminate b) => Polynomial a b
...is doing what you expect it to. Contexts on data definitions are not terribly useful--see the discussion here for some reasons why, most of which amount to them forcing you to add extra annotations without actually providing many additional type guarantees.
Second, do you actually care about the "indeterminate" parameter other than to ensure that the types are kept distinct? A pretty standard way of doing that sort of thing is what's called phantom types--essentially, parameters in the type constructor that aren't used in the data constructor. You'll never use or need a value of the phantom type, so functions can be as polymorphic as you want, e.g.:
data Foo a b = Foo b
foo :: Foo a b -> Foo a b
foo (Foo x) = Foo x
bar :: Foo a c -> Foo b c
bar (Foo x) = Foo x
baz :: Foo Int Int -> Foo Char Int -> Foo () Int
baz (Foo x) (Foo y) = Foo $ x + y
Obviously this does require annotations, but only in places where you're deliberately adding restrictions. Otherwise, inference will work normally for the phantom type parameter.
It seems to me that the above approach should be sufficient for what you're doing here--the business with singleton types is mostly about bridging the gap between more complicated type-level stuff and regular value-level computations by creating type proxies for values. This could be useful for, say, marking vectors with types that indicate their basis, or marking numeric values with physical units--both cases where the annotation has more meaning than just "an indeterminate called X".
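For the polynomial case in the question, a phantom-type version could look roughly like this. The representation as a coefficient list, the function addPoly, and the sample values are assumptions for the sketch, not the asker's actual code.

```haskell
-- Phantom tags: these types have no values and exist only as labels.
data T
data S

-- 'x' is the phantom indeterminate, 'a' is the coefficient ring.
-- Coefficients are stored lowest degree first.
newtype Polynomial x a = Polynomial [a]
  deriving (Eq, Show)

-- Both arguments must share the same indeterminate x.
addPoly :: Num a => Polynomial x a -> Polynomial x a -> Polynomial x a
addPoly (Polynomial p) (Polynomial q) = Polynomial (go p q)
  where
    go (c:cs) (d:ds) = c + d : go cs ds
    go cs     []     = cs
    go []     ds     = ds

inT :: Polynomial T Int
inT = Polynomial [1, 2]      -- 1 + 2t

alsoInT :: Polynomial T Int
alsoInT = Polynomial [3]     -- 3

inS :: Polynomial S Int
inS = Polynomial [5]         -- 5

sumInT :: Polynomial T Int
sumInT = addPoly inT alsoInT

-- addPoly inT inS   -- rejected by the type checker: T does not match S
```

No Indeterminate class or singleton values are needed; T and S are never inhabited.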

Operate on values within structurally similar types in Haskell

Excuse me for my extremely limited Haskell-fu.
I have a series of data types, defined in different modules, that are structured the same way:
-- in module Foo
data Foo = Foo [Param]
-- in module Bar
data Bar = Bar [Param]
-- * many more elsewhere
I'd like to have a set of functions that operate on the list of params, eg to add and remove elements from the list (returning a new Foo or Bar with a different list of params, as appropriate).
As far as I can tell, even if I create a typeclass and create instances for each type, I'd need to define all of these functions each time, ie:
-- in some imported module
class Parameterized a where
  addParam :: a -> Param -> a
  -- ... other functions
-- in module Foo
instance Parameterized Foo where
  addParam (Foo params) param = Foo (param:params)
  -- ... other functions
-- in module Bar
instance Parameterized Bar where
  -- this looks familiar...
  addParam (Bar params) param = Bar (param:params)
  -- ... other functions
This feels tedious -- far past the degree where I start thinking I'm doing something wrong. If you can't pattern match regardless of constructor (?) to extract a value, how can boilerplate like this be reduced?
To rebut a possible line of argument: yes, I know I could simply have one set of functions (addParam, etc), that would explicitly list each constructor and pattern match out the params -- but as I'm building this fairly modularly (the Foo and Bar modules are pretty self-contained), and I'm prototyping a system where there will be dozens of these types, a verbose centralized listing of type constructors seems... wrong.
It's quite possible (probable?) that my approach is simply flawed and that this isn't anywhere near the right way to structure the type hierarchy anyhow -- but as I can't have a single data type somewhere and add a new constructor for the type in each of these modules (?) I'm stumped at how to get a nice "plugin" feel without having to redefine simple utility functions each time. Any and all kind suggestions gladly accepted.
You can have default implementations of functions in a typeclass, e.g.
class Parameterized a where
  params :: a -> [Param]
  fromParams :: [Param] -> a
  addParam :: a -> Param -> a
  addParam x par = fromParams $ par : params x
  -- ... other functions
instance Parameterized Foo where
  params (Foo pars) = pars
  fromParams = Foo
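A self-contained sketch of the default-method approach with both types. The definition of Param and the extra helper removeParam are assumptions added so the example compiles on its own.

```haskell
-- Hypothetical stand-in for the question's Param type.
newtype Param = Param String deriving (Eq, Show)

class Parameterized a where
  -- The only two methods each instance has to supply.
  params     :: a -> [Param]
  fromParams :: [Param] -> a

  -- Shared helpers, written once as default implementations.
  addParam :: a -> Param -> a
  addParam x par = fromParams (par : params x)

  -- Hypothetical second helper, to show that the list keeps growing
  -- without touching the instances.
  removeParam :: a -> Param -> a
  removeParam x par = fromParams (filter (/= par) (params x))

data Foo = Foo [Param] deriving (Eq, Show)
data Bar = Bar [Param] deriving (Eq, Show)

instance Parameterized Foo where
  params (Foo ps) = ps
  fromParams = Foo

instance Parameterized Bar where
  params (Bar ps) = ps
  fromParams = Bar
```

Each new type costs two one-line method definitions, regardless of how many helper functions the class accumulates.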
However, your design does look suspicious.
You could use a newtype for Foo and Bar, and then derive the implementation based on the underlying structure (in this case, lists) automatically with -XGeneralizedNewtypeDeriving.
If they're really supposed to be structurally similar, yet you've coded it in a way that no code can be shared, then that is a suspicious pattern.
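The newtype suggestion could be sketched as follows. The Param definition and the instance for [Param] are assumptions made for the example; the point is that the instance is written once for the underlying list and reused by each newtype through its deriving clause.

```haskell
{-# LANGUAGE FlexibleInstances, GeneralizedNewtypeDeriving #-}

-- Hypothetical stand-in for the question's Param type.
newtype Param = Param String deriving (Eq, Show)

class Parameterized a where
  addParam :: a -> Param -> a

-- The one hand-written instance, for the underlying representation.
instance Parameterized [Param] where
  addParam ps p = p : ps

-- Each newtype reuses that instance through newtype deriving.
newtype Foo = Foo [Param] deriving (Eq, Show, Parameterized)
newtype Bar = Bar [Param] deriving (Eq, Show, Parameterized)

extendedFoo :: Foo
extendedFoo = addParam (Foo [Param "a"]) (Param "b")
```

Foo and Bar remain distinct types, so a Foo cannot be passed where a Bar is expected.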
It's hard to tell from your example what would be the right way.
Why do you need both Foo and Bar if they are structurally similar? Can't you just go with Foo?
If, for some reason I can't see, you need both Foo and Bar you'll have to use a type class but you can clean up the code using Template Haskell.
Something like
$(superDuper "Foo")
could generate the code
data Foo = Foo [Param]
instance Parameterized Foo where
  addParam (Foo params) param = Foo (param:params)
where
superDuper :: String -> Q [Dec]
superDuper n =
  let name = mkName n
      dataD = DataD [] name [] [NormalC name [] ] -- correct constructor here
      instD = InstanceD [] ... -- add code here
  in return [dataD, instD]
That would at least get rid of the boilerplate code.

Resources