Haskell data type for hmatrix Vector and Matrix

I am just starting out with Haskell: I have read up to the section of LYAH on defining data types and am attempting to implement the sum-product algorithm for belief propagation. One of the rudimentary tasks is to define the probabilistic graphical model.
As shown below, I have attempted to represent the graph by tying the knot, where each node represents a Gaussian distribution and has constant-weight links (for now) to its neighbours. However, when trying to define the Mean and Covariance types, I am having some difficulty specifying the element type of the Matrix and Vector types, i.e. Float or Double.
module Graph(Graph) where
import Numeric.LinearAlgebra
data Mean = Mean Vector
data Covariance = Covariance Matrix
data Gaussian = Gaussian Mean Covariance
data Node = Node [Node] Gaussian
data Graph = Graph [Node]
In this simple example, what is the syntax to define Mean as a Vector of Double and Covariance as a Matrix of Double? Additionally, how would one generalise this so that Mean and Covariance can be over either Float or Double?
I currently get the following error from GHCi:
Graph.hs:5:18: error:
• Expecting one more argument to ‘Vector’
Expected a type, but ‘Vector’ has kind ‘* -> *’
• In the type ‘Vector’
In the definition of data constructor ‘Mean’
In the data declaration for ‘Mean’
Failed, modules loaded: none.
I am using the hmatrix package as described here

Vector and Matrix are parameterised on the scalar type (so you can have not only matrices of floating-point “real numbers”, but also matrices of integers, complex numbers etc.). This is what GHC is telling you with the message ‘Vector’ has kind ‘* -> *’: by itself, Vector is not a type (types have kind *, aka Type). Rather, it is a type-level function mapping types of kind * to types of kind *. Scalars like Double are already plain types, so you can just apply Vector to them.
GHCi> :kind Vector
Vector :: * -> *
GHCi> :k Double
Double :: *
GHCi> :k Vector Double
Vector Double :: *
Thus you need
newtype Mean = Mean (Vector Double)
newtype Covariance = Covariance (Matrix Double)
(newtype does the same thing as data here, but it's a bit more efficient because no extra box/pointer is needed).
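For the second part of the question (Float or Double), a minimal sketch is to give each wrapper the scalar type as a parameter instead of fixing Double. The smart constructor mkGaussian, the helper dimension, and the synonyms GaussianD/GaussianF are hypothetical additions; the sketch assumes a recent hmatrix, where Vector is Data.Vector.Storable.Vector, so Data.Vector.Storable.length gives its length.

```haskell
import Foreign.Storable (Storable)
import Numeric.LinearAlgebra (Matrix, Vector, cols, rows, (><))
import qualified Data.Vector.Storable as VS

-- Each wrapper takes the scalar type as a parameter (Double, Float, ...).
newtype Mean a = Mean (Vector a)
newtype Covariance a = Covariance (Matrix a)
data Gaussian a = Gaussian (Mean a) (Covariance a)
data Node a = Node [Node a] (Gaussian a)
newtype Graph a = Graph [Node a]

-- Convenience synonyms for the two precisions asked about.
type GaussianD = Gaussian Double
type GaussianF = Gaussian Float

-- Hypothetical smart constructor: accepts a mean of length n only
-- together with an n x n covariance matrix.
mkGaussian :: Storable a => Vector a -> Matrix a -> Maybe (Gaussian a)
mkGaussian mu sigma
  | rows sigma == n && cols sigma == n = Just (Gaussian (Mean mu) (Covariance sigma))
  | otherwise = Nothing
  where
    n = VS.length mu -- hmatrix's Vector is a storable vector

-- Hypothetical helper: the dimension of the distribution.
dimension :: Storable a => Gaussian a -> Int
dimension (Gaussian (Mean mu) _) = VS.length mu
```

Constraints such as Storable go on the functions that use the values, not on the data declarations, so the same types serve both Float and Double, e.g. mkGaussian (VS.fromList [0, 0 :: Double]) ((2><2) [1, 0, 0, 1]).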
Alternatively, you may use more meaningfully typed vector spaces, e.g. from the linearmap-category package:
import Math.LinearMap.Category
newtype Mean v = Mean v
newtype Covariance v = Covariance (v +> DualVector v)
The advantage of this is that dimensions are checked at compile time, which prevents nasty runtime errors (and can in principle also improve performance, though frankly the linearmap-category library is not optimised at all yet).
You'd then also parameterise the other types over the vector space:
data Gaussian v = Gaussian (Mean v) (Covariance v)
data Node v = Node [Node v] (Gaussian v)
data Graph v = Graph [Node v]
Somewhat unrelated to your question: this knot-tying certainly feels elegant, but it's not really a suitable way to represent a graph, because nodes can't be compared for identity. Any cycle in the graph leads to what is, for all practical purposes, an infinite structure. In practice, you can't avoid giving your nodes labels (e.g. Int) and keeping a separate structure for the edges.
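A minimal sketch of that labelled representation, assuming the containers package's Data.IntMap.Strict; the names Graph, fromEdges, neighbours and payload are hypothetical, and the node payload is left as a type parameter (it could be a Gaussian) so the sketch does not depend on hmatrix.

```haskell
import qualified Data.IntMap.Strict as IM

type NodeId = Int

-- Hypothetical labelled graph: payloads and edges live in separate maps
-- keyed by an Int label, so nodes can be compared by label and a cycle
-- is just a finite set of entries.
data Graph n = Graph
  { payloads  :: IM.IntMap n        -- label -> node payload
  , adjacency :: IM.IntMap [NodeId] -- label -> neighbouring labels
  }

-- Build an undirected graph: each edge (a, b) is recorded in both directions.
fromEdges :: [(NodeId, n)] -> [(NodeId, NodeId)] -> Graph n
fromEdges ns es = Graph
  { payloads  = IM.fromList ns
  , adjacency = IM.fromListWith (++) (concat [[(a, [b]), (b, [a])] | (a, b) <- es])
  }

neighbours :: Graph n -> NodeId -> [NodeId]
neighbours g i = IM.findWithDefault [] i (adjacency g)

payload :: Graph n -> NodeId -> Maybe n
payload g i = IM.lookup i (payloads g)
```

A triangle such as fromEdges [(1, "a"), (2, "b"), (3, "c")] [(1, 2), (2, 3), (3, 1)] is then an ordinary finite value, and message passing can iterate over labels instead of chasing knot-tied references.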

Related

Prevent users from using binary operators defined in a new type

I'm currently trying to define multivariate polynomials over a field in Haskell (work in progress). I have as a starting point:
data Polynomial a = Zero
| M (Monomial a)
| Polynomial a :+: Polynomial a
| Polynomial a :*: Polynomial a
deriving (Show)
Is it possible to prevent the user from using the binary operators :+: and :*:? I'd like this because I define addition and multiplication later, which not only perform the operation but also put the result in canonical form (a sum of monomials with distinct powers), and I would like the user to be able to use only these operations.
I would bet that's not possible if one exports the Polynomial type, but maybe the brilliant minds here have a trick?
You can export the Polynomial type without exporting its constructors.
module Foo(Polynomial()) where
would do this.
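A minimal sketch of such a module, assuming a canonical representation that differs from the question's: a Map from monomials (variable index to exponent) to nonzero coefficients. The module name and the functions constant, variable, add, mul and terms are hypothetical. Because the constructor is not exported, client code can only build values through these functions, all of which end in the unexported normalise.

```haskell
module Polynomial
  ( Polynomial -- exported without constructors, same as Polynomial()
  , zero, constant, variable
  , add, mul
  , terms
  ) where

import qualified Data.Map.Strict as M

-- A monomial maps a variable index to its (positive) exponent.
type Monomial = M.Map Int Int

-- Canonical form: distinct monomials, zero coefficients dropped.
newtype Polynomial a = Polynomial (M.Map Monomial a)
  deriving (Eq, Show)

-- Not exported: every public operation ends here, so results stay canonical.
normalise :: (Eq a, Num a) => M.Map Monomial a -> Polynomial a
normalise = Polynomial . M.filter (/= 0)

zero :: Polynomial a
zero = Polynomial M.empty

constant :: (Eq a, Num a) => a -> Polynomial a
constant c = normalise (M.singleton M.empty c)

-- The polynomial x_i.
variable :: Num a => Int -> Polynomial a
variable i = Polynomial (M.singleton (M.singleton i 1) 1)

add :: (Eq a, Num a) => Polynomial a -> Polynomial a -> Polynomial a
add (Polynomial p) (Polynomial q) = normalise (M.unionWith (+) p q)

-- Multiply term by term; exponents of a shared variable add up.
mul :: (Eq a, Num a) => Polynomial a -> Polynomial a -> Polynomial a
mul (Polynomial p) (Polynomial q) =
  normalise (M.fromListWith (+)
    [ (M.unionWith (+) m1 m2, c1 * c2) | (m1, c1) <- M.toList p, (m2, c2) <- M.toList q ])

-- Read-only view: (variable, exponent) pairs with their coefficient.
terms :: Polynomial a -> [([(Int, Int)], a)]
terms (Polynomial p) = [ (M.toList m, c) | (m, c) <- M.toList p ]
```

Since every value is canonical, the derived Eq coincides with polynomial equality: (x + 1)(x - 1) plus 1 compares equal to x * x.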

Haskell: how to write code that interacts with the internals of two wrapped types?

I'm wondering how to create two encapsulated types that interact with each other, without exposing the internal implementation to other modules.
As an example, consider my two modules, Vector.hs and Matrix.hs, that wrap Linear.V4 (a 4-element vector) and Linear.M44 (a 4x4 matrix). I wish to write a function that multiplies a Matrix by a Vector, returning a Vector, and using the wrapped data types to perform the operation.
In this example, in Vector.hs I have:
-- Vector.hs
module Vector (Vector, vector) where
import Linear (V4 (V4))
newtype Vector = Vector (V4 Double) deriving (Eq, Show, Read)
vector :: (Double, Double, Double, Double) -> Vector
vector (x, y, z, w) = Vector (V4 x y z w)
Note that I'm only exporting the new type and the factory function - the data constructor Vector is not exported. As far as I understand, this hides the internal implementation of Vector (i.e. that it's really a V4).
And in Matrix.hs I have something similar:
-- Matrix.hs
module Matrix (Matrix, matrix) where
import Linear (V4 (V4), M44)
type Row = (Double, Double, Double, Double)
newtype Matrix = Matrix (M44 Double) deriving (Eq, Show, Read)
matrix :: (Row, Row, Row, Row) -> Matrix
--matrix ((a00, a01, a02, a03), (a10, ... ) = Matrix (V4 (V4 a00 a01 a02 a03) (V4 a10 ...))
These two modules can be used by client code pretty effectively - the client code is not aware that they are implemented with Linear data structures, and there's apparently no way for client code to exploit that (is that true?).
The problem arises when those two types need to interact with each other at the level of their wrapped types - in this particular case, multiplying a Matrix by a Vector yields a new Vector. However to implement this operation, the code performing it needs access to the internal implementation of both Vector and Matrix, in order to do something like this:
-- Matrix.hs
-- ... include earlier definitions
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-} -- must go at the very top of Matrix.hs
import qualified Linear.Matrix ((!*))
class MatrixMultiplication a b c | a b -> c where
infixl 7 |*| -- set same precedence and associativity as *
(|*|) :: a -> b -> c
instance MatrixMultiplication Matrix Vector Vector where
(|*|) (Matrix a) q0 =
let (V4 x y z w) = a Linear.Matrix.!* _impl q0
in vector (x, y, z, w)
To do this I need a function, _impl, that allows the Matrix module to get at the internals of Vector:
-- Vector.hs
-- ... include earlier definitions
_impl :: Vector -> V4 Double
_impl (Vector q) = q
-- also added to export list
So after carefully hiding the internals of Vector (by not exporting the data constructor function of Vector), it seems I must expose them with a different exported function, and the only defence I have is some documentation telling client code they shouldn't be using it. This seems unfortunate - I've effectively just renamed the data constructor.
In a language like C++, the friend keyword could be used to give a function access to the private data members of a type. Is there a similar concept in Haskell?
I suppose I could implement both Vector and Matrix in the same module. They'd then have access to each other's data constructors. However, this isn't really a satisfying solution, as it just works around the issue.
Is there a better approach?
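The same-module workaround mentioned in the question can be sketched as follows. To keep the block self-contained, Linear's V4 Double and M44 Double are replaced by plain hypothetical constructors, the MatrixMultiplication class is replaced by a plain operator, and the module name LinAlg4 and the helper dot are hypothetical. Clients still see only the abstract types and the smart constructors.

```haskell
module LinAlg4
  ( Vector, vector -- abstract: constructors are not exported
  , Matrix, matrix
  , (|*|)
  ) where

-- Hypothetical stand-ins for Linear's V4 Double and M44 Double.
data Vector = Vector Double Double Double Double
  deriving (Eq, Show)

data Matrix = Matrix Vector Vector Vector Vector -- four rows
  deriving (Eq, Show)

type Row = (Double, Double, Double, Double)

vector :: Row -> Vector
vector (x, y, z, w) = Vector x y z w

matrix :: (Row, Row, Row, Row) -> Matrix
matrix (r0, r1, r2, r3) = Matrix (vector r0) (vector r1) (vector r2) (vector r3)

-- Internal helper, not exported.
dot :: Vector -> Vector -> Double
dot (Vector a b c d) (Vector x y z w) = a * x + b * y + c * z + d * w

infixl 7 |*|

-- Both constructors are in scope in this module, so no _impl-style
-- accessor has to be exported to implement the product.
(|*|) :: Matrix -> Vector -> Vector
(Matrix r0 r1 r2 r3) |*| v = Vector (dot r0 v) (dot r1 v) (dot r2 v) (dot r3 v)
```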

What is this type?

Haskell novice here. I know from type classes that => means "in the context of". Yet I can't read the following type, found in module Statistics.Sample:
(Vector v (Double, Double), Vector v Double) => v (Double, Double) -> Double
What constraints are being applied to v to the left of =>?
The Data.Vector.Generic.Vector typeclass takes two type arguments, v and a where v :: * -> * is the type of the container and a :: * is the type of the elements in the container. This is simply a generic interface for the vector types defined in the vector package, notably Data.Vector.Unboxed.Vector.
This is essentially saying that the type v must be able to hold (Double, Double) and also Double, although not simultaneously. If you were to use v ~ Data.Vector.Unboxed.Vector, this works just fine. The reason lies in the implementation of correlation, which uses unzip. This function splits a v (a, b) into (v a, v b). Since correlation works on v (Double, Double), it needs the additional constraint that v can hold Doubles.
This generic type is meant to make the correlation function work with more types than Data.Vector.Vector, including any vector style types that might be implemented in other libraries.
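A small function with the same constraint shape shows why both constraints are needed: G.unzip turns the v (Double, Double) into two v Double values, which are then processed separately. This is a hypothetical population covariance (covarianceOf is not the statistics package's code) and it assumes a non-empty input.

```haskell
import qualified Data.Vector as B
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Unboxed as U

-- Population covariance of paired samples, for any generic vector type v
-- that can hold both pairs and plain Doubles.
covarianceOf :: (G.Vector v (Double, Double), G.Vector v Double) => v (Double, Double) -> Double
covarianceOf xy = G.sum (G.zipWith (*) dx dy) / n
  where
    (xs, ys) = G.unzip xy -- needs G.Vector v Double for both halves
    n = fromIntegral (G.length xy)
    mx = G.sum xs / n
    my = G.sum ys / n
    dx = G.map (subtract mx) xs
    dy = G.map (subtract my) ys
```

Both U.fromList and B.fromList produce vectors that satisfy the two constraints, so the same function accepts unboxed and boxed input.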
I want to stress that these constraints
Data.Vector.Generic.Vector v (Double, Double)
Data.Vector.Generic.Vector v Double
state that whatever type you choose for v is capable of holding (Double, Double) and also of holding Double. This specifies prerequisites for your vector type, not the actual contents of the vector. The actual contents are specified by the first argument to the correlation function.

Why are unconditionally ambiguous methods involving type families not rejected?

I have the following declarations:
class NN a where
type Vector a :: *
vectorize :: Vector a -> WordVector
compute :: a -> SomeResult
In an instance of NN, I have this:
instance NN Model where
type Vector Model = Vec
compute m = .... vectorize v ...
compute uses vectorize, but this fails to typecheck: GHC says it cannot unify Vector a0 with Vec and that the type variable a0 is ambiguous.
I somewhat understand why typechecking would fail in the general case of calling vectorize with only a typeclass constraint: as the type family Vector is open, there is no way, given a particular type Vector a, to infer which a it is. Thanks to this answer, I can also understand why it fails to typecheck in the particular case of a call from within a method defined in the same instance: at the call site of vectorize there is no relationship between the a of Vector a -> WordVector and the a of compute.
What I don't understand is why the declaration of vectorize is not rejected by the compiler. Following this page and the referenced papers, I found the following statement in this paper:
Each method must mention the class variable somewhere that is not
under an associated synonym. For example, this declaration defines an
unconditionally-ambiguous method op, and is rejected:
class C a where
type S a
op :: S a → Int
Edit: I am using GHC 7.8.3
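For reference, one common way to make such calls unambiguous, in line with the rule quoted from the paper, is to give the method an argument that mentions the class variable outside the associated synonym, such as a Proxy a (Data.Proxy is in base since GHC 7.8). WordVector, SomeResult, Model and Vec below are hypothetical stand-ins for the question's types.

```haskell
{-# LANGUAGE TypeFamilies #-}
import Data.Proxy (Proxy (..))

type WordVector = [Double] -- hypothetical stand-in
type SomeResult = Double   -- hypothetical stand-in

class NN a where
  type Vector a
  -- The Proxy mentions `a` outside the associated synonym, so each
  -- call site can state which instance it means.
  vectorize :: Proxy a -> Vector a -> WordVector
  compute :: a -> SomeResult

newtype Model = Model [Double] -- hypothetical
newtype Vec = Vec [Double]     -- hypothetical

instance NN Model where
  type Vector Model = Vec
  vectorize _ (Vec xs) = xs
  -- Proxy :: Proxy Model pins down the instance, so nothing is left ambiguous.
  compute (Model ws) = sum (vectorize (Proxy :: Proxy Model) (Vec ws))
```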

Restrictions of unboxed types

I wonder why unboxed types in Haskell have these restrictions:
You cannot define a newtype for an unboxed type:
newtype Vec = Vec (# Float#, Float# #)
but you can define a type synonym:
type Vec = (# Float#, Float# #)
Type families can't return an unboxed type:
type family Unbox (a :: *) :: # where
Unbox Int = Int#
Unbox Word = Word#
Unbox Float = Float#
Unbox Double = Double#
Unbox Char = Char#
Are there some fundamental reasons behind this, or is it just because no one has asked for these features?
Parametric polymorphism in Haskell relies on the fact that all values of t :: * types are uniformly represented as a pointer to a runtime object. Thus, the same machine code works for all instantiations of polymorphic values.
Contrast polymorphic functions in Rust or C++. For example, the identity function there still has a type analogous to forall a. a -> a, but since values of different a types may have different sizes, the compilers have to generate different code for each instantiation. This also means that those languages can't pass polymorphic functions around in runtime boxes the way Haskell can (with RankNTypes):
data Id = Id (forall a. a -> a)
In Rust or C++, such a boxed function would have to work correctly for arbitrary-sized objects. Supporting this requires additional infrastructure; for example, we could require that a runtime forall a. a -> a function take extra implicit arguments that carry information about the size and constructors/destructors of a values.
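A minimal Haskell sketch of such a boxed polymorphic function: one compiled body serves every instantiation because, as described above, every value of a type of kind * is represented uniformly as a pointer. The helper applyBoth is hypothetical.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A polymorphic function stored in an ordinary (boxed) data structure.
newtype Id = Id (forall a. a -> a)

-- Hypothetical helper: use the stored function at two different types.
applyBoth :: Id -> (Int, String) -> (Int, String)
applyBoth (Id f) (n, s) = (f n, f s)
```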
Now, the problem with newtype Vec = Vec (# Float#, Float# #) is that even though Vec has kind *, runtime code that expects values of some t :: * can't handle it. It's a stack-allocated pair of floats, not a pointer to a Haskell object, and passing it to code expecting Haskell objects would result in segfaults or errors.
In general (# a, b #) isn't necessarily pointer-sized, so we can't copy it into pointer-sized data fields.
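Within these limits unboxed tuples remain usable: a synonym like the one in the question can be returned from a function and must then be taken apart immediately, typically with case, rather than stored in a field. A minimal sketch using GHC.Exts primitives; scaleVec and normSq are hypothetical names.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Float (F#), Float#, plusFloat#, timesFloat#)

-- Allowed: a type synonym for an unboxed tuple.
type Vec = (# Float#, Float# #)

-- Build an unboxed pair from boxed Floats, scaled by k.
scaleVec :: Float -> Float -> Float -> Vec
scaleVec (F# k) (F# x) (F# y) = (# timesFloat# k x, timesFloat# k y #)

-- The unboxed result is consumed right away by a case expression and
-- only a boxed Float escapes.
normSq :: Float -> Float -> Float
normSq x y = case scaleVec 1 x y of
  (# a, b #) -> F# (plusFloat# (timesFloat# a a) (timesFloat# b b))
```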
Type families returning # types are disallowed for related reasons. Consider the following:
type family Foo (a :: *) :: # where
Foo Int = Int#
Foo a = (# Int#, Int# #)
data Box = forall (a :: *). Box (Foo a)
Our Box is not representable at runtime, since Foo a has different sizes for different a. Generally, polymorphism over # would require generating different code for different instantiations, as in Rust, but this interacts badly with regular parametric polymorphism and makes the runtime representation of polymorphic values difficult, so GHC doesn't bother with any of this.
(That is not to say that a usable implementation could never be devised.)
A newtype would allow one to define class instances
instance C Vec where ...
which cannot be defined for unboxed tuples. Type synonyms, by contrast, do not offer this.
Also, Vec would not be a boxed type. This means that you could no longer instantiate type variables with Vec in general, unless their kind allows it; for instance, [Vec] should be disallowed. The compiler would have to distinguish "regular" newtypes from "unboxed" newtypes in some way. I think the only benefit would be allowing the data constructor Vec to wrap unboxed values at compile time (since it is erased at runtime). That is probably not useful enough to justify the necessary changes to the type inference engine.
