How to create a lattice-type data structure in Haskell? - haskell

I am trying to build a lattice-type of FCA-type of data structure in Haskell where I could check if two entities have a join or not. Actually, I'm not even sure that the lattice is the right structure as it might be a bit "too much".
Here is the context. In a COBOL program analyzer, I have an ontology of data set names, files, records and fields. A data set name can have multiple file names depending on the program, a file can have multiple records and a record can have multiple fields. I'd like this hierarchy to be reflected in the Haskell data structure. But I'd like also to be able to have a relation inherited for file1 and file2 such that I can check if file1 and file2 belong to the same data set name. Actually, this relation could almost be that of "==". But it could simply be that they do have a join in dsn0, for instance.
I have other ontologies in that context that would benefit from a lattice or FCA data structure. For example, I have programs that belongs to job steps and job steps that belong to jobs. If I could easily figure out if two programs belong to the same job, that would be great. Here also, it seems like a "join" operator. Getting the extension (code) of a certain entity would be useful too.
I'm still a bit new to Haskell. I tried to look at the Lattice library but I'm not sure where to go from there, concretely. Any idea of how to get started? A small example of a lattice in Haskell would be very helpful. Thank you very much for your help (and patience).
UPDATE:
The Lattice might not be the best formalism for this as mentioned in the comments. I realize I might just have to use a regular class type of data structure along those lines:
data DSN = DSN {
programFiles :: [ProgramFile]
name :: String
ddn :: DDN
}
data ProgramFile = ProgramFile {
records :: [Record]
name :: String
}
data Record = Record {
fields :: [Field]
name :: String
}
data Field = Field {
name :: String
order :: Int
}
I guess my initial intention behind using a tree/lattice/FCA type of structure is to take full advantage of the functor potential in Haskell which should lead to interesting lattice operations including geting all the extension of a concept, checking that two concepts belong to the same higher-level concept, checking equality '==' of two files through their DSN,...
Maybe a non-binary tree structure would be better? Is it easy to do that in Haskell?

I recommend making an abstract data type to represent one-to-many relationships. It might look something like this:
module OneToMany (OMRel, empty, insert, delete, source, targets) where
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map.Strict as M
import qualified Data.Set as S
data OMRel a b = OMRel
{ oneToMany :: Map a (Set b)
, manyToOne :: Map b a
} deriving (Eq, Ord, Read, Show)
empty :: OMRel a b
empty = OMRel M.empty M.empty
insert :: (Ord a, Ord b) => a -> b -> OMRel a b -> OMRel a b
insert a b (OMRel otm mto) = OMRel
{ oneToMany = M.insertWith S.union a (S.singleton b) $
case M.lookup b mto of
Just oldA -> M.adjust (S.delete b) oldA otm
Nothing -> otm
, manyToOne = M.insert b a mto
}
delete :: (Ord a, Ord b) => a -> b -> OMRel a b -> OMRel a b
delete a b (OMRel otm mto) = OMRel (M.adjust (S.delete b) a otm) (M.delete b mto)
source :: Ord b => b -> OMRel a b -> Maybe a
source b = M.lookup b . manyToOne
targets :: Ord a => a -> OMRel a b -> Set b
targets a = M.findWithDefault S.empty a . oneToMany
(Of course you can flesh out the API with more efficient bulk operations like merging, bulk insert, sequential composition, etc. But this is sort of the minimal construct/consume API that gets you where you need to go.)
Then you need a couple data types to represent your various ontological entries:
newtype Dataset = Dataset { dataset :: String }
newtype Record = Record { record :: String }
newtype Field = Field { order :: Int }
From there you can use values with types like OMRel Dataset FilePath to represent the fact that datasets "contain" files. For querying containment equality, you can write this once and for all via the OMRel API above:
sameSource :: (Eq a, Ord b) => OMRel a b -> b -> b -> Bool
sameSource rel b b' = source b rel == source b' rel
(You may need an extra clause if two missing targets should be considered inequal.) Then, e.g., this can be specialized to
sameSource :: OMRel Dataset FilePath -> FilePath -> FilePath -> Bool
and friends.
You will not be able to make a Functor (or Bifunctor) instance for OMRel because of the Ord constraints. It's not clear to me that fmap/bimap make a TON of sense for this particular data structure, though. For example, if we have associations x :: a <-> y :: b and x' :: a <-> y' :: b, and f b = f b', then should fmap f associate f b with a or a'? If they do turn out to have a sensible interpretation and be useful, you could either pursue the constrained-monads approach or simply offer a function named bimap from the OneToMany module with the appropriate type, but without making it an instance method.

Related

Is there an appropriate optic for set membership?

I’m using Data.Sets in deeply-nested heterogeneous data structures, and thought it would be helpful to create a Prism for set membership. Hence:
membership :: (Ord a) => a -> Prism' (Set a) (Set a)
membership a = prism (Set.insert a) g
where g as = if Set.member a as
then Right $ Set.delete a as
else Left as
However, this fails the first prism law, preview l (review l b) ≡ Just b, in the case where review l inserts a member that is already present in b, viz., if l is the membership lens for a, and b is {a}, then review l b is also {a}, and preview l (review l b) is just the null set, rather than just {a} as the first prism law requires.
Is there a better optic for capturing set membership? I like being able to check membership and conditionally decompose the set into the matching and non-matching parts simultaneously. Additionally, having an optic to do this is appealing because, since it captures all the use-cases I have for working with Sets in other parts of my code, it enables me to remove my import Data.Set statements from the rest of my package, which often indicates successful abstraction to me.
There is Contains type class with a member:
contains :: Contains m => Index m -> Lens' m Bool
which when specialised to Set is
contains :: Ord a => a -> Lens' (Set a) Bool
It is probably a good exercise to think why it is a lens (and not a prism, as in your attempt).

How to change the behavior of the function based on class constraints in Haskell?

I have a data type that represents a collection of values paired with a probability. At first, the implementation was just to use good old lists, but as you can imagine, this can be inefficient (for example, I use a Tree instead of a list to store ordered values)
After some research, I thought about using GADTs
data Tree a b = Leaf | Node {left::Tree a b, val :: (a, b), right :: Tree a b}
data Prob a where
POrd ::Ord a => Tree a Rational -> Prob a
PEq ::Eq a => [(a, Rational)] -> Prob a
PPlain ::[(a, Rational)] -> Prob a
So far, so good. I'm now stuck at trying to create a smart constructor for my new data type,
that takes [(a,Rational)] and depending on the constraints of a, chooses the correct constructor for Prob. Basically:
prob :: [(a, Rational)] -> Prob a
-- chooses the "best" constructor based on the constraints of a
Is this at all possible? If not, how should I go about designing something better? Am I missing something?
Thanks!
There is no way to perform a check of the form "is type T in class C?" in Haskell. The issue here is that it is hard to answer negatively to such question and allow separate compilation: T could be in C in the scope of one module but not in the scope of another one, causing a rather fragile semantics.
To ensure consistency, Haskell only allows to require a constraint, and raise an compile time error otherwise.
As far as I can see, the best you can do is to use another custom type class, which tells you which case is the best one. E.g.
{-# LANGUAGE AllowAmbiguousTypes, TypeApplications, ScopedTypeVariables #-}
data BestConstraint a where
BCOrd :: Ord a => BestConstraint a
BCEq :: Eq a => BestConstraint a
BCNone :: BestConstraint a
class BC a where
bestC :: BestConstraint a
instance BC Int where bestC = BCOrd
-- ... etc.
instance BC a => BC [a] where
bestC = case bestC #a of
BCOrd -> BCOrd
BCEq -> BCEq
BCNone -> BCNone
prob :: forall a . BestConstraint a => [(a, Rational)] -> Prob a
prob xs = case bestC #a of
BCOrd -> POrd .... -- build the tree
BCEq -> PEq xs
BCNone -> PPlain xs
You will have to provide an instance for any type you want to use, though.

What's a better way of managing large Haskell records?

Replacing fields names with letters, I have cases like this:
data Foo = Foo { a :: Maybe ...
, b :: [...]
, c :: Maybe ...
, ... for a lot more fields ...
} deriving (Show, Eq, Ord)
instance Writer Foo where
write x = maybeWrite a ++
listWrite b ++
maybeWrite c ++
... for a lot more fields ...
parser = permute (Foo
<$?> (Nothing, Just `liftM` aParser)
<|?> ([], bParser)
<|?> (Nothing, Just `liftM` cParser)
... for a lot more fields ...
-- this is particularly hideous
foldl1 merge [foo1, foo2, ...]
merge (Foo a b c ...seriously a lot more...)
(Foo a' b' c' ...) =
Foo (max a a') (b ++ b') (max c c') ...
What techniques would allow me to better manage this growth?
In a perfect world a, b, and c would all be the same type so I could keep them in a list, but they can be many different types. I'm particularly interested in any way to fold the records without needing the massive patterns.
I'm using this large record to hold the different types resulting from permutation parsing the vCard format.
Update
I've implemented both the generics and the foldl approaches suggested below. They both work, and they both reduce three large field lists to one.
Datatype-generic programming techniques can be used to transform all the fields of a record in some "uniform" sort of way.
Perhaps all the fields in the record implement some typeclass that we want to use (the typical example is Show). Or perhaps we have another record of "similar" shape that contains functions, and we want to apply each function to the corresponding field of the original record.
For these kinds of uses, the generics-sop library is a good option. It expands the default Generics functionality of GHC with extra type-level machinery that provides analogues of functions like sequence or ap, but which work over all the fields of a record.
Using generics-sop, I tried to create a slightly less verbose version of your merge funtion. Some preliminary imports:
{-# language TypeOperators #-}
{-# language DeriveGeneric #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
import Control.Applicative (liftA2)
import qualified GHC.Generics as GHC
import Generics.SOP
A helper function that lifts a binary operation to a form useable by the functions of generics-sop:
fn_2' :: (a -> a -> a) -> (I -.-> (I -.-> I)) a -- I is simply an Identity functor
fn_2' = fn_2 . liftA2
A general merge function that takes a vector of operators and works on any single-constructor record that derives Generic:
merge :: (Generic a, Code a ~ '[ xs ]) => NP (I -.-> (I -.-> I)) xs -> a -> a -> a
merge funcs reg1 reg2 =
case (from reg1, from reg2) of
(SOP (Z np1), SOP (Z np2)) ->
let npResult = funcs `hap` np1 `hap` np2
in to (SOP (Z npResult))
Code is a type family that returns a type-level list of lists describing the structure of a datatype. The outer list is for constructors, the inner lists contain the types of the fields for each constructor.
The Code a ~ '[ xs ] part of the constraint says "the datatype can only have one constructor" by requiring the outer list to have exactly one element.
The (SOP (Z _) pattern matches extract the (heterogeneus) vector of field values from the record's generic representation. SOP stands for "sum-of-products".
A concrete example:
data Person = Person
{
name :: String
, age :: Int
} deriving (Show,GHC.Generic)
instance Generic Person -- this Generic is from generics-sop
mergePerson :: Person -> Person -> Person
mergePerson = merge (fn_2' (++) :* fn_2' (+) :* Nil)
The Nil and :* constructors are used to build the vector of operators (the type is called NP, from n-ary product). If the vector doesn't match the number of fields in the record, the program won't compile.
Update. Given that the types in your record are highly uniform, an alternative way of creating the vector of operations is to define instances of an auxiliary typeclass for each field type, and then use the hcpure function:
class Mergeable a where
mergeFunc :: a -> a -> a
instance Mergeable String where
mergeFunc = (++)
instance Mergeable Int where
mergeFunc = (+)
mergePerson :: Person -> Person -> Person
mergePerson = merge (hcpure (Proxy :: Proxy Mergeable) (fn_2' mergeFunc))
The hcliftA2 function (that combines hcpure, fn_2 and hap) could be used to simplify things further.
Some suggestions:
(1) You can use the RecordWildCards extension to automatically
unpack a record into variables. Doesn't help if you need to unpack
two records of the same type, but it's a useful to keep in mind.
Oliver Charles has a nice blog post on it: (link)
(2) It appears your example application is performing a fold over the records.
Have a look at Gabriel Gonzalez's foldl package. There is also a blog post: (link)
Here is a example of how you might use it with a record like:
data Foo = Foo { _a :: Int, _b :: String }
The following code computes the maximum of the _a fields and the
concatenation of the _b_ fields.
import qualified Control.Foldl as L
import Data.Profunctor
data Foo = Foo { _a :: Int, _b :: String }
deriving (Show)
fold_a :: L.Fold Foo Int
fold_a = lmap _a (L.Fold max 0 id)
fold_b :: L.Fold Foo String
fold_b = lmap _b (L.Fold (++) "" id)
fold_foos :: L.Fold Foo Foo
fold_foos = Foo <$> fold_a <*> fold_b
theFoos = [ Foo 1 "a", Foo 3 "b", Foo 2 "c" ]
test = L.fold fold_foos theFoos
Note the use of the Profunctor function lmap to extract out
the fields we want to fold over. The expression:
L.Fold max 0 id
is a fold over a list of Ints (or any Num instance), and therefore:
lmap _a (L.Fold max 0 id)
is the same fold but over a list of Foo records where we use _a
to produce the Ints.

Structurally Enforcing No Red Children Of Red Node

While studying Learn You A Haskell For Great Good and Purely Functional Data Structures, I thought to try to reimplement a Red Black tree while trying to structurally enforce another tree invariant.
Paraphrasing Okasaki's code, his node looks something like this:
import Data.Maybe
data Color = Red | Black
data Node a = Node {
value :: a,
color :: Color,
leftChild :: Maybe (Node a),
rightChild :: Maybe (Node a)}
One of the properties of a red black tree is that a red node cannot have a direct-child red node, so I tried to encode this as the following:
import Data.Either
data BlackNode a = BlackNode {
value :: a,
leftChild :: Maybe (Either (BlackNode a) (RedNode a)),
rightChild :: Maybe (Either (BlackNode a) (RedNode a))}
data RedNode a = RedNode {
value :: a,
leftChild :: Maybe (BlackNode a),
rightChild :: Maybe (BlackNode a)}
This outputs the errors:
Multiple declarations of `rightChild'
Declared at: :4:5
:8:5
Multiple declarations of `leftChild'
Declared at: :3:5
:7:5
Multiple declarations of `value'
Declared at: :2:5
:6:5
I've tried several modifications of the previous code, but they all fail compilation. What is the correct way of doing this?
Different record types must have distinct field names. E.g., this is not allowed:
data A = A { field :: Int }
data B = B { field :: Char }
while this is OK:
data A = A { aField :: Int }
data B = B { bField :: Char }
The former would attempt to define two projections
field :: A -> Int
field :: B -> Char
but, alas, we can't have a name with two types. (At least, not so easily...)
This issue is not present in OOP languages, where field names can never be used on their own, but they must be immediately applied to some object, as in object.field -- which is unambiguous, provided we already know the type of object. Haskell allows standalone projections, making things more complicated here.
The latter approach instead defines
aField :: A -> Int
bField :: B -> Char
and avoids the issue.
As #dfeuer comments above, GHC 8.0 will likely relax this constraint.

Defining a suite of tests for a class

This question picks up where Haskell QuickCheck best practices (especially when testing type classes) left off.
I have a class and a bunch of implementations of that class. Something like this:
import Test.QuickCheck
import Control.Applicative
import Test.Framework
import Test.Framework.Providers.QuickCheck2
class C c where
f :: c -> Int
data A = A Int deriving Show
instance C A where
f (A a) = 2*a
data B = B Int deriving Show
instance C B where
f (B b) = 2*b
All of my implementations should satisfy a certain property. For example:
prop_f_is_even :: C c => c -> Property
prop_f_is_even x = property $ even (f x)
I want to test that property for each of the implementations. I can do something like this. (I'm using Test.Framework.)
instance Arbitrary A where
arbitrary = A <$> arbitrary
instance Arbitrary B where
arbitrary = B <$> arbitrary
test :: Test
test = testGroup "Whole buncha tests"
[
testProperty "prop_f_is_even - A" (prop_f_is_even :: A -> Property),
testProperty "prop_f_is_even - B" (prop_f_is_even :: B -> Property)
-- continue on for all combinations of properties and implementations
]
But in my case, I have dozens of properties to test, and a dozen or so
classes, so that approach is error-prone, and a hassle.
(A common mistake I make is to cut-and paste tests, but forget to change
the type name, so I end up testing A twice for that property, without testing B.)
I have a solution, which I'll post below in case anyone else finds it helpful.
This is my solution.
cProperties :: C t => String -> [(String, t -> Property)]
cProperties s =
[
("prop_f_is_even: " ++ s, prop_f_is_even)
-- plus any other tests that instances of C should satisfy
]
makeTests :: (Arbitrary t, Show t) => [(String, t -> Property)] -> [Test]
makeTests ts = map (\(s,t) -> testProperty s t) ts
aProperties :: [(String, A -> Property)]
aProperties = cProperties "A"
bProperties :: [(String, B -> Property)]
bProperties = cProperties "B"
easierTest :: Test
easierTest =
testGroup "tests" (makeTests aProperties ++ makeTests bProperties)
With this approach, if I want to add another property that all instances of C should satisfy, I just add it to cProperties. And if I create another instance of C, call it D, then I define dProperties similarly to aProperties and bProperties, and then update easierTest.
EDIT:
One disadvantage of this approach is that all tests in cProperties have to have the type signature t -> Property. I myself have not found this to be a hindrance because in cases where I apply this technique, I have already -- for unrelated reasons -- defined a type that encompasses all of the data for a test.
Another disadvantage is that, in ghci, I can no longer type, for example:
quickCheck prop_f_is_even
Now I have to type something like this:
quickCheck (prop_f_is_even :: A -> Property)

Resources