Haskell: Prefer pattern-matching or member access?

Suppose I have a Vector datatype defined as follows:
data Vector = Vector { x :: Double
                     , y :: Double
                     , z :: Double
                     }
Would it be more usual to define functions against it using member access:
vecAddA v w
  = Vector (x v + x w)
           (y v + y w)
           (z v + z w)
Or using pattern-matching:
vecAddB (Vector vx vy vz) (Vector wx wy wz)
  = Vector (vx + wx)
           (vy + wy)
           (vz + wz)
(Apologies if I've got any of the terminology incorrect).

I would normally use pattern matching, especially since you're using all of the constructor's arguments and there aren't a lot of them. Also, in this example it's not an issue, but consider the following:
data Foo = A {a :: Int} | B {b :: String}
fun x = a x + 1
If you use pattern matching to do work on the Foo type, you're safe; it's not possible to access a member that doesn't exist. If you use accessor functions on the other hand, some operations such as calling fun (B "hi!") here will result in a runtime error.
EDIT: while it's of course quite possible to forget to match on some constructor, pattern matching makes it pretty explicit that what happens depends on what constructor is used (you can also tell the compiler to detect and warn you about incomplete patterns) whereas the use of a function hints more that any constructor goes, IMO.
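For example, with warnings enabled, GHC flags the incomplete match at compile time (a minimal sketch; on older GHCs the flag is spelled -fwarn-incomplete-patterns):
{-# OPTIONS_GHC -Wincomplete-patterns #-}
-- GHC warns, roughly: "Pattern match(es) are non-exhaustive ... Patterns not matched: B _"
fun :: Foo -> Int
fun (A n) = n + 1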
Accessors are best saved for cases when you want to get at just one or a few of the constructor's (potentially many) arguments and you know that it's safe to use them (no risk of using an accessor on the wrong constructor, as in the example.)

Another minor "real world" argument: in general, it isn't a good idea to have such short record field names, as short names like x and y often end up being used for local variables.
So the "fair" comparison here would be:
vecAddA v w
  = Vector (vecX v + vecX w) (vecY v + vecY w) (vecZ v + vecZ w)
vecAddB (Vector vx vy vz) (Vector wx wy wz)
  = Vector (vx + wx) (vy + wy) (vz + wz)
I think pattern matching wins out in most cases of this type. Some notable exceptions:
You only need to access (or change!) one or two fields in a larger record (see the sketch below).
You want to remain flexible to change the record later, such as adding more fields.
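For the first exception, record update syntax is the usual tool; a minimal sketch against the Vector type above (scaleX is a made-up name for illustration):
-- Rebuild the record with one field changed, leaving the rest untouched:
scaleX :: Double -> Vector -> Vector
scaleX k v = v { x = k * x v }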

This is an aesthetic preference, since the two are semantically equivalent. Well, I suppose in a naive compiler the first one would be slower because of the function calls, but I have a hard time believing that would not be optimized away in real life.
Still, with only three elements in the record, since you're using all three anyway and there is presumably some significance to their order, I would use the second one. A second (albeit weaker) argument is that this way you're using the order for both composition and decomposition, rather than a mixture of order and field access.

(Alert, may be wrong. I am still a Haskell newbie, but here's my understanding)
One thing that other people have not mentioned is that pattern matching will make the function "strict" in its argument. (http://www.haskell.org/haskellwiki/Lazy_vs._non-strict)
To choose which pattern to use, the program must reduce the argument to WHNF before calling the function, whereas using the record-syntax accessor function would evaluate the argument inside the function.
I can't really give any concrete examples (still being a newbie), but this can have performance implications where huge piles of "thunks" build up in recursive, non-strict functions. (That is to say, for simple functions like extracting values, there should be no performance difference.)
(Concrete examples very much welcome)
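One standard (if not record-specific) illustration of the thunk problem is lazy foldl; a sketch:
import Data.List (foldl')

sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0  -- accumulator grows as a chain of unevaluated (+) thunks
sumStrict = foldl' (+) 0  -- the strict variant forces the accumulator at every step
On a list of millions of elements the lazy version can exhaust memory before performing a single addition, while the strict one runs in constant space.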
In short
f (Just x) = x
is actually (using BangPatterns)
f !jx = fromJust jx
Edit: the above is not a good example of strictness, because both are actually strict by definition (f bottom = bottom); it's just to illustrate what I meant on the performance side.

As kizzx2 pointed out, there is a subtle difference in strictness between vecAddA and vecAddB:
vecAddA ⊥ ⊥ = Vector ⊥ ⊥ ⊥
vecAddB ⊥ ⊥ = ⊥
To get the same semantics when using pattern matching, one would have to use irrefutable patterns.
vecAddB' ~(Vector vx vy vz) ~(Vector wx wy wz)
  = Vector (vx + wx)
           (vy + wy)
           (vz + wz)
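With the original lazy fields, the difference is observable in GHCi (a sketch, with undefined playing the role of ⊥):
-- vecAddA builds the Vector constructor without touching its arguments:
-- ghci> case vecAddA undefined undefined of Vector _ _ _ -> "reached WHNF"
-- "reached WHNF"
-- vecAddB must match its arguments against Vector first, so it diverges:
-- ghci> case vecAddB undefined undefined of Vector _ _ _ -> "reached WHNF"
-- "*** Exception: Prelude.undefined"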
However, in this case, the fields of Vector should probably be strict to begin with for efficiency:
data Vector = Vector { x :: !Double
                     , y :: !Double
                     , z :: !Double
                     }
With strict fields, vecAddA and vecAddB are semantically equivalent.

The Hackage package vect addresses both of these concerns, allowing pattern matching like f (Vec3 x y z) as well as indexed access like:
get1 :: Vec3 -> Float
get1 v = _1 v
Look up the HasCoordinates class.
http://hackage.haskell.org/packages/archive/vect/0.4.7/doc/html/Data-Vect-Float-Base.html

Related

How to use a refutation to direct the type checker in Haskell?

Does filling the hole in the following program necessarily require non-constructive means? If yes, is it still the case if x :~: y is decidable?
More generally, how do I use a refutation to guide the type checker?
(I am aware that I can work around the problem by defining Choose as a GADT, I'm asking specifically for type families)
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module PropositionalDisequality where
import Data.Type.Equality
import Data.Void
type family Choose x y where
  Choose x x = 1
  Choose x _ = 2
lem :: (x :~: y -> Void) -> Choose x y :~: 2
lem refutation = _
If you try hard enough to implement a function, you can convince yourself
that it is not possible. If you're not convinced, the argument can be made
more formal: we enumerate programs exhaustively to find that none is possible. It turns out there are only half a dozen meaningful cases to consider.
I wonder why this argument is not made more often.
Totally not accurate summary:
Act I: proof search is easy.
Act II: dependent types too.
Act III: Haskell is still fine for writing dependently typed programs.
I. The proof search game
First we define the search space.
We can reduce any Haskell definition to one of the form
lem = (exp)
for some expression (exp). Now we only need to find a single expression.
Look at all possible ways of making an expression in Haskell:
https://www.haskell.org/onlinereport/haskell2010/haskellch3.html#x8-220003
(this doesn't account for extensions, exercise for the reader).
It fits a page in a single column, so it's not that big to start with.
Moreover, most of them are sugar for some form of function application or
pattern-matching; we can also desugar away type classes with dictionary
passing, so we're left with a ridiculously small lambda calculus:
lambdas, \x -> ...
pattern-matching, case ... of ...
function application, f x
constructors, C (including integer literals)
constants, c (for primitives that cannot be written in terms of the constructs above, so various built-ins (seq) and maybe FFI if that counts)
variables (bound by lambdas and cases)
We can exclude every constant on the grounds that I think the question is
really about pure lambda calculus (or the reader can enumerate the constants,
to exclude black-magic constants like undefined, unsafeCoerce,
unsafePerformIO that make everything collapse (any type is inhabited and, for
some of those, the type system is unsound), and to be left with white-magic
constants to which the present theoretical argument can be generalized via a
well-funded thesis).
We can also reasonably assume that we want a solution with no recursion involved
(to get rid of noise like lem = lem, and fix if you felt like you couldn't
part with it before), and which actually has a normal form, or preferably, a
canonical form with respect to βη-equivalence. In other words, we refine and
examine the set of possible solutions as follows.
lem :: _ -> _ has a function type, so we can assume WLOG that its definition starts with a lambda:
-- Any solution
lem = (exp)
-- is η-equivalent to a lambda
lem = \refutation -> (exp) refutation
-- so let's assume we do have a lambda
lem = \refutation -> _hole
Now enumerate what could be under the lambda.
It could be a constructor,
which then has to be Refl, but there is no proof that Choose x y ~ 2 in
the context (here we could formalize and enumerate the type equalities the
typechecker knows about and can derive, or make the syntax of coercions
(proofs of equalities) explicit and keep playing this proof search game
with them), so this doesn't type check:
lem = \refutation -> Refl
Maybe there is some way of constructing that equality proof, but then the
expression would start with something else, which is going to be another
case of the proof.
It could be some application of a constructor C x1 x2 ..., or the
variable refutation (applied or not); but there's no possible way that's
well-typed, it has to somehow produce a (:~:), and Refl is really the
only way.
Or it could be a case. WLOG, there is no nested case on the left, nor any
constructor, because the expression could be simplified in both cases:
-- Any left-nested case expression
case (case (e) of { C x1 x2 -> (f) }) of { D y1 y2 -> (g) }
-- is equivalent to a right-nested case
case (e) of { C x1 x2 -> case (f) of { D y1 y2 -> (g) } }
-- Any case expression with a nested constructor
case (C u v) of { C x1 x2 -> f x1 x2 }
-- reduces to
f u v
So the last subcase is the variable case:
lem = \refutation -> case refutation (_hole :: x :~: y) of {}
and we have to construct a x :~: y. We enumerate ways of filling the
_hole again. It's either Refl, but no proof is available, or
(skipping some steps) case refutation (_anotherHole :: x :~: y) of {},
and we have an infinite descent on our hands, which is also absurd.
A different possible argument here is that we can pull out the case
from the application, to remove this case from consideration WLOG.
-- Any application to a case
f (case e of C x1 x2 -> g x1 x2)
-- is equivalent to a case with the application inside
case e of C x1 x2 -> f (g x1 x2)
There are no more cases. The search is complete, and we didn't find an
implementation of (x :~: y -> Void) -> Choose x y :~: 2. QED.
To read more on this topic, I guess a course/book about lambda calculus up
until the normalization proof of the simply-typed lambda calculus should give
you the basic tools to start with. The following thesis contains an
introduction on the subject in its first part, but admittedly I'm a poor judge
of the difficulty of such material: Which types have a unique inhabitant?
Focusing on pure program equivalence,
by Gabriel Scherer.
Feel free to suggest more adequate resources and literature.
II. Fixing the proposition and proving it with dependent types
Your initial intuition that this should encode a valid proposition
is definitely valid. How might we fix it to make it provable?
Technically, the type we are looking at is quantified with forall:
forall x y. (x :~: y -> Void) -> Choose x y :~: 2
An important feature of forall is that it is an irrelevant quantifier.
The variables it introduces cannot be used "directly" in a term of this type. Although that aspect becomes more prominent in the presence of dependent types,
it still pervades Haskell today, providing another intuition for why this (and many other examples) is not "provable" in Haskell: if you think about why you
think that proposition is valid, you will naturally start with a case split about whether x is equal to y, but to even do such a case split you need a way to
decide which side you're on, which will of course have to look at x and y,
so they cannot be irrelevant. forall in Haskell is not at all like what most people mean by "for all".
Some discussion on the matter of relevance can be found in the thesis Dependent Types in Haskell, by Richard Eisenberg (in particular, Section 3.1.1.5 for an initial example, Section 4.3 for relevance in Dependent Haskell and Section 8.7 for comparison with other languages with dependent types).
Dependent Haskell will need a relevant quantifier to complement forall, which would get us closer to proving this:
foreach x y. (x :~: y -> Void) -> Choose x y :~: 2
Then we could probably write this:
lem :: foreach x y. (x :~: y -> Void) -> Choose x y :~: 2
lem x y p = case x ==? y of
  Left r       -> absurd (p r) -- x ~ y, that's absurd!
  Right Irrefl -> Refl         -- x /~ y, so Choose x y = 2
That also assumes a first-class notion of disequality /~, complementing ~,
to help Choose reduce when it is in the context and a decision function
(==?) :: foreach x y. Either (x :~: y) (x :/~: y).
Actually, that machinery isn't necessary, that just makes for a shorter
answer.
At this point I'm making stuff up because Dependent Haskell does not exist yet,
but that is easily doable in related dependently typed languages (Coq, Agda,
Idris, Lean), modulo an adequate replacement of the type family Choose
(type families are in some sense too powerful to be translated as mere
functions, so may be cheating, but I digress).
Here is a comparable program in Coq, showing also that lem applied to 1 and 2
and a suitable proof does reduce to a proof by reflexivity of choose 1 2 = 2.
https://gist.github.com/Lysxia/5a9b6996a3aae741688e7bf83903887b
III. Without dependent types
A critical source of difficulty here is that Choose is a closed type
family with overlapping instances. It is problematic because there is
no proper way to express the fact that x and y are not equal in Haskell,
to know that the first clause Choose x x does not apply.
A more fruitful avenue if you're into Pseudo-Dependent Haskell is to use a
boolean type equality:
-- In the base library
import Data.Type.Bool (If)
import Data.Type.Equality (type (==))
type Choose x y = If (x == y) 1 2
An alternative encoding of equality constraints becomes useful for this style:
type x ~~ y = ((x == y) ~ 'True)
type x /~ y = ((x == y) ~ 'False)
with that, we can get another version of the type-proposition above,
expressible in current Haskell (where SBool is the singleton type of Bool),
which essentially can be read as adding the assumption that the equality of x
and y is decidable. This does not contradict the earlier claim about the "irrelevance" of forall: the function is inspecting a boolean (or rather an SBool), which postpones the inspection of x and y to whoever calls lem.
lem :: forall x y. SBool (x == y) -> ((x ~~ y) => Void) -> Choose x y :~: 2
lem decideEq p = case decideEq of
  STrue  -> absurd p
  SFalse -> Refl
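For completeness, a minimal standalone definition of the SBool singleton assumed above (the singletons library provides an equivalent):
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
-- One constructor per Bool, so matching on an SBool reveals the type-level value.
data SBool (b :: Bool) where
  STrue  :: SBool 'True
  SFalse :: SBool 'False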

Partial Derivatives in Haskell

A while back a friend wanted help with a program that could solve for the roots of functions using Newton's method, and naturally for that I needed some way to calculate the derivative of a function numerically, and this is what I came up with:
deriv f x = (f (x+h) - f x) / h where h = 0.00001
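For example, sanity-checking it in GHCi against a known derivative (cos 0 = 1):
-- ghci> deriv sin 0
-- result is ≈ 1.0, up to truncation and floating-point error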
Newton's method was a fairly easy thing to implement, and it works rather well. But now I've started to wonder - Is there some way I could use this function to solve partial derivatives in a numerical manner, or is that something that would require a full-on CAS? I would post my attempts but I have absolutely no clue what to do yet.
Please keep in mind that I am new to Haskell. Thank you!
You can certainly do much the same thing as you already implemented, only with multivariate perturbation instead. But first, as you should always do with top-level functions, add a type signature:
deriv :: (Double -> Double) -> Double -> Double
That's not the most general signature possible, but probably sufficiently general for everything you'll need. I'll call
type ℝ = Double
in the following for brevity, i.e.
deriv :: (ℝ -> ℝ) -> ℝ -> ℝ
Now what you want is, for example in ℝ²
grad :: ((ℝ,ℝ) -> ℝ) -> (ℝ,ℝ) -> (ℝ,ℝ)
grad f (x,y) = ((f (x+h,y) - f (x,y)) / h, (f (x,y+h) - f (x,y)) / h)
  where h = 0.00001
It's awkward to have to write out the components individually and make the definition specific to a particular-dimensional vector space. A generic way of doing it:
import Data.VectorSpace
import Data.Basis
grad :: (HasBasis v, Scalar v ~ ℝ) => (v -> ℝ) -> v -> v
grad f x = recompose [ (e, (f (x ^+^ h *^ basisValue e) - f x) ^/ h)
                     | (e,_) <- decompose x ]
  where h = 0.00001
Note that finite differentiation with a pre-chosen step size like this is always a tradeoff between inaccuracy from higher-order terms and from floating-point errors, so definitely check out automatic differentiation.
The exact approach to this problem is called automatic differentiation, and there is a lot of really neat work in this area in Haskell, though I don't know how accessible it is.
From the wiki page:
A paper Beautiful Differentiation and the corresponding talk.
Forward mode libraries: ad, fad, vector-space, Data.Ring.Module.AutomaticDifferentiation
Reverse mode libraries: also ad, rad
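For a taste, a sketch using the ad package's Numeric.AD.grad, which computes partial derivatives exactly (up to floating point), with no step size to choose:
import Numeric.AD (grad)

-- Gradient of f(x,y) = x*y + sin x at the point (1,2):
partials :: [Double]
partials = grad (\[x, y] -> x * y + sin x) [1, 2]
-- = [2 + cos 1, 1], since ∂f/∂x = y + cos x and ∂f/∂y = x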

Apply function to all pairs efficiently

I need a second-order function pairApply that applies a binary function f to all unique pairs of a list-like structure and then combines them somehow. An example / sketch:
pairApply (+) f [a, b, c] = f a b + f a c + f b c
Some research leads me to believe that Data.Vector.Unboxed probably will have good performance (I will also need fast access to specific elements); it's also needed by Statistics.Sample, which would come in handy further down the line.
With this in mind I have the following, which almost compiles:
import qualified Data.Vector.Unboxed as U

pairElement :: (U.Unbox a, U.Unbox b)
            => U.Vector a
            -> (a -> a -> b)
            -> Int
            -> a
            -> U.Vector b
pairElement v f idx el = U.map (f el) $ U.drop (idx + 1) v

pairUp :: (U.Unbox a, U.Unbox b)
       => (a -> a -> b)
       -> U.Vector a
       -> U.Vector (U.Vector b)
pairUp f v = U.imap (pairElement v f) v

pairApply :: (U.Unbox a, U.Unbox b)
          => (b -> b -> b)
          -> b
          -> (a -> a -> b)
          -> U.Vector a
          -> b
pairApply combine neutral f v = folder $ U.map folder (pairUp f v)
  where folder = U.foldl combine neutral
The reason this doesn't compile is that there is no Unboxed instance of U.Vector (U.Vector a). I have been able to create new unboxed instances in other cases using Data.Vector.Unboxed.Deriving, but I'm not sure it would be so easy in this case (transform it to a tuple pair where the first element is all the inner vectors concatenated and the second is the lengths of the vectors, to know how to unpack?)
My question can be stated in two parts:
Does the above implementation make sense at all or is there some quick library function magic etc that could do it much easier?
If so, is there a better way to make an unboxed vector of vectors than the one sketched above?
Note that I'm aware that foldl is probably not the best choice; once I've got the implementation sorted I plan to benchmark with a few different folds.
There is no way to define a classical instance for Unbox (U.Vector b), because that would require preallocating a memory area in which each element (i.e. each subvector!) has the same fixed amount of space. But in general, each of them may be arbitrarily big, so that's not feasible at all.
It might in principle be possible to define that instance by storing only a flattened form of the nested vector plus an extra array of indices (where each subvector starts). I once briefly gave this a try; it actually seems somewhat promising as far as immutable vectors are concerned, but a G.Vector instance also requires a mutable implementation, and that's hopeless for such an approach (because any mutation that changes the number of elements in one subvector would require shifting everything behind it).
Usually, it's just not worth it, because if the individual element vectors aren't very small the overhead of boxing them won't matter, i.e. often it makes sense to use B.Vector (U.Vector b).
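A sketch of that boxed-of-unboxed layout, mirroring the question's pairUp (the name pairUpB and the module aliases are illustrative):
import qualified Data.Vector as B          -- boxed vectors
import qualified Data.Vector.Unboxed as U  -- unboxed vectors

-- Boxed outer vector, unboxed inner rows: no Unbox (U.Vector b) instance needed.
pairUpB :: (U.Unbox a, U.Unbox b) => (a -> a -> b) -> U.Vector a -> B.Vector (U.Vector b)
pairUpB f v = B.generate (U.length v) $ \i -> U.map (f (v U.! i)) (U.drop (i + 1) v)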
For your application however, I would not do that at all – there's no need to ever wrap the upper element-choices in a single triangular array. (And it would be really bad for performance to do that, because it makes the algorithm take O(n²) memory rather than the O(n) which is all that's needed.)
I would just do the following:
pairApply combine neutral f v
  = U.ifoldl' (\acc i p -> U.foldl' (\acc' q -> combine acc' $ f p q)
                                    acc
                                    (U.drop (i+1) v))
              neutral v
This corresponds pretty much to the obvious nested-loops imperative implementation
pairApply(combine, b, f, v):
    for(i in 0..length(v)-1):
        for(j in i+1..length(v)-1):
            b = combine(b, f(v[i], v[j]));
    return b;
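A quick check of the Haskell version on a small input (hypothetical GHCi session), summing f p q = p * q over the three unique pairs of [1,2,3]:
-- ghci> pairApply (+) 0 (*) (U.fromList [1,2,3 :: Int])
-- 11   -- = 1*2 + 1*3 + 2*3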
My answer is basically the same as leftaroundabout's nested-loops imperative implementation:
import Data.List (foldl')
import Data.Vector (Vector, (!))
import qualified Data.Vector as V

pairApply :: (Int -> Int -> Int) -> Vector Int -> Int
pairApply f v = foldl' (+) 0 [f (v ! i) (v ! j) | i <- [0 .. n-1], j <- [i+1 .. n-1]]
  where n = V.length v
As far as I know, I do not see any performance issue with this implementation.
Non-polymorphic for simplicity.

Symbolic theory proving using SBV and Haskell

I'm using SBV (with the Z3 backend) in Haskell to create some theory provers. I want to check if forall x and y with given constraints (like x + y = y + x, where + is a "plus operator", not addition) some other terms are valid. I want to define axioms about the + expression (like associativity, identity etc.) and then check for further equalities, like check if a + (b + c) == (a + c) + b is valid for all a, b and c.
I was trying to accomplish it using something like:
main = do
  let x = forall "x"
  let y = forall "y"
  out <- prove $ (x .== x)
  print "end"
But it seems we cannot use the .== operator on symbolic values. Is this a missing feature or wrong usage? Are we able to do it somehow using SBV?
That sort of reasoning is indeed possible, through the use of uninterpreted sorts and functions. Be warned, however, that reasoning about such structures typically requires quantified axioms, and SMT-solvers are usually not terribly good at reasoning with quantifiers.
Having said that, here's how I would go about it, using SBV.
First, some boiler-plate code to get an uninterpreted type T:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
import Data.SBV
-- Uninterpreted type T
data T = TBase () deriving (Eq, Ord, Data, Typeable, Read, Show)
instance SymWord T
instance HasKind T
type ST = SBV T
Once you do this, you'll have access to an uninterpreted type T and its symbolic counterpart ST. Let's declare plus and zero, again just uninterpreted constants with the right types:
-- Uninterpreted addition
plus :: ST -> ST -> ST
plus = uninterpret "plus"
-- Uninterpreted zero
zero :: ST
zero = uninterpret "zero"
So far, all we told SBV is that there exists a type T, a function plus, and a constant zero; all expressly uninterpreted. That is, the SMT solver makes no assumptions other than the fact that they have the given types.
Let's first try to prove that 0+x = x:
bad = prove $ \x -> zero `plus` x .== x
If you try this, you'll get the following response:
*Main> bad
Falsifiable. Counter-example:
s0 = T!val!0 :: T
What the SMT solver is telling you is that the property does not hold, and here's a value where it doesn't hold. The value T!val!0 is a Z3-specific response; other solvers can return other things. It's essentially an internal identifier for an inhabitant of the type T; other than that we know nothing about it. This isn't terribly useful of course, as you don't really know what associations it made for plus and zero, but it is to be expected.
To prove the property, let's tell the SMT solver two more things. First, that plus is commutative. And second, that zero added on the right doesn't do anything. These are done via addAxiom calls. Unfortunately, you have to write your axioms in the SMTLib syntax, as SBV doesn't (at least yet) support axioms written using Haskell. Note also we switch to using the Symbolic monad here:
good = prove $ do
  addAxiom "plus-zero-axioms"
    [ "(assert (forall ((x T) (y T)) (= (plus x y) (plus y x))))"
    , "(assert (forall ((x T)) (= (plus x zero) x)))"
    ]
  x <- free "x"
  return $ zero `plus` x .== x
Note how we told the solver x+y = y+x and x+0 = x, and asked it to prove 0+x = x. Writing axioms this way looks really ugly since you have to use the SMTLib syntax, but that's the current state of affairs. Now we have:
*Main> good
Q.E.D.
Quantified axioms and uninterpreted-types/functions are not the easiest things to use via the SBV interface, but you can get some mileage out of it this way. If you have heavy use of quantifiers in your axioms, it's unlikely that the solver will be able to answer your queries; and will likely respond unknown. It all depends on the solver you use, and how hard the properties to prove are.
Your use of the API isn't quite right. The simplest way to prove mathematical equalities would be to use simple functions. For instance, associativity over unbounded Integers can be expressed this way:
prove $ \x y z -> x + (y + z) .== (x + y) + (z :: SInteger)
If you need a more programmatic interface (and sometimes you will), then you can use the Symbolic monad, thusly:
plusAssoc = prove $ do
  x <- sInteger "x"
  y <- sInteger "y"
  z <- sInteger "z"
  return $ x + (y + z) .== (x + y) + z
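Running either version in GHCi should produce something like:
*Main> plusAssoc
Q.E.D.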
I'd recommend browsing through many of the examples provided in the hackage site to get familiar with the API: https://hackage.haskell.org/package/sbv

Haskell - add typeclass?

Consider the following example:
data Dot = Dot Double Double
data Vector = Vector Double Double
First, I would like to overload the + operator for Vector addition. If I wanted to overload the equality (==) operator, I would write it like:
instance Eq Vector where ...blahblahblah
But I can't find whether there is an Add typeclass to make Vector behave like a type with an addition operation. I can't even find a complete list of Haskell typeclasses; I know only a few from different tutorials. Does such a list exist?
Also, can I overload the + operator for adding a Vector to a Dot (it seems rather logical, doesn't it)?
An easy way to discover information about which typeclass (if any) a function belongs to is to use GHCi:
Prelude> :i (+)
class (Eq a, Show a) => Num a where
  (+) :: a -> a -> a
  ...
  -- Defined in GHC.Num
infixl 6 +
The operator + in the Prelude is defined by the typeclass Num. However, as the name suggests, Num not only defines addition but also lots of other numeric operations (in particular the other arithmetic operators, as well as the ability to use numeric literals), so this doesn't fit your use case.
There is no way to overload just + for your type, unless you want to hide Prelude's + operator (which would mean you have to create your own Addable instance for Integer, Double etc. if you still want to be able to use + on numbers).
You can write an instance Num Vector to overload + for vector addition (and the other operators that make sense).
instance Num Vector where
  (Vector x1 y1) + (Vector x2 y2) = Vector (x1 + x2) (y1 + y2)
  -- and so on
However, note that + has the type Num a => a -> a -> a, i.e. both operands and the result all have to be the same type. This means that you cannot have a Dot plus a Vector be a Dot.
While you can hide Num from the Prelude and specify your own +, this is likely to cause confusion and make it harder to use your code together with regular arithmetic.
I suggest you define your own operator for vector-point addition, for example
(Dot x y) `offsetBy` (Vector dx dy) = Dot (x + dx) (y + dy)
or some variant using symbols if you prefer something shorter.
I sometimes see people defining their own operators that kind of look like ones from the Prelude. Even ++ probably uses that symbol because they wanted something that conveyed the idea of "adding" two lists together, but it didn't make sense for lists to be an instance of Num. So you could use <+> or |+| or something.
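A minimal sketch of that last suggestion (the operator name |+| is just an example):
infixl 6 |+|

-- Translate a point by a displacement; the asymmetry is explicit in the types.
(|+|) :: Dot -> Vector -> Dot
Dot x y |+| Vector dx dy = Dot (x + dx) (y + dy)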
