"Strategy Pattern" in Haskell - haskell

In the OO world, I have a class (let's call it "Suggestor") that implement something approaching a "Strategy Pattern" to provide differing implementations of an algorithm at runtime. As an exercise in learning Haskell, I want to rewrite this.
The actual use-case is quite complex, so I'll boil down a simpler example.
Let's say I have a class Suggester that's takes a list of rules, and applies each rule as a filter to a list of database results.
Each rule has three phases "Build Query", "Post Query Filter", and "Scorer". We essentially end up with an interface meeting the following
buildQuery :: Query -> Query
postQueryFilter :: [Record] -> [Record]
scorer :: [Record] -> [(Record, Int)]
Suggestor needs to take a list of rules that match this interface - dynamically at run time - and then execute them in sequence. buildQuery() must be run across all rules first, followed by postQueryFilter, then scorer. (i.e. I can't just compose the functions for one rule into a single function).
in the scala I simply do
// No state, so a singleton `object` instead of a class is ok
object Rule1 extends Rule {
def buildQuery ...
def postQueryFilter ...
def scorer ...
}
object Rule2 extends Rule { .... }
And can then initialise the service by passing the relevant rules through (Defined at runtime based on user input).
val suggester = new Suggester( List(Rule1, Rule2, Rule3) );
If the rules were a single function, this would be simple - just pass a list of functions. However since each rule is actually three functions, I need to group them together somehow, so I have multiple implementations meeting an interface.
My first thought was type classes, however these don't quite seem to meet my needs - they expect a type variable, and enforce that each of my methods must use it - which they don't.
No parameters for class `Rule`
My second thought was just to place each one in a haskell module, but as modules aren't "First Class" I can't pass them around directly (And they of course don't enforce an interface).
Thirdly I tried creating a record type to encapsulate the functions
data Rule = Rule { buildQuery :: Query -> Query, .... etc }
And then defined an instance of "Rule" for each. When this is done in each module it encapsulates nicely and works fine, but felt like a hack and I'm not sure if this is an appropriate use of records in haskell?
tl;dr - How do I encapsulate a group of functions together such that I can pass them around as an instance of something matching an interface, but don't actually use a type variable.
Or am I completely coming at this from the wrong mindset?

In my opinion your solution isn't the "hack", but the "strategy pattern" in OO languages: It is only needed to work around the limitations of a language, especially in case of missing, unsafe or inconvenient Lambdas/Closures/Function Pointers etc, so you need a kind of "wrapper" for it to make it "digestible" for that language.
A "strategy" is basically a function (may be with some additional data attached). But if a function is truly a first class member of the language - as in Haskell, there is no need to hide it in the object closet.

Just generate a single Rule type as you did
data Rule = Rule
{ buildQuery :: Query -> Query
, postQueryFilter :: [Record] -> [Record]
, scorer :: [Record] -> [(Record, Int)]
}
And build a general application method—I'm assuming such a generic thing exists given that these Rules are designed to operate independently over SQL results
applyRule :: Rule -> Results -> Results
Finally, you can implement as many rules as you like wherever you want: just import the Rule type and create an appropriate value. There's no a priori reason to give each different rule its own type as you might in an OO setting.
easyRule :: Rule
easyRule = Rule id id (\recs -> zip recs [1..])
upsideDownRule :: Rule
upsideDownRule = Rule reverse reverse (\recs -> zip recs [-1, -2..])
Then if you have a list of Rules you can apply them all in order
applyRules :: [Rule] -> Results -> Results
applyRules [] res = res
applyRules (r:rs) res = applyRules rs (applyRule r res)
which is actually just a foldr in disguise
applyRules rs res = foldr applyRule res rs
foo :: Results -> Results
foo = applyRules [Some.Module.easyRule, Some.Other.Module.upsideDownRule]

Related

Existential type classes vs. Data Constructors vs. Coproducts

I find myself running up against the same pattern in my designs where I start with a type with a few data constructors, eventually want to be able to type against those data constructors and thus split them into their own types, just to then have to increase the verbosity of other parts of the program by needing to use Either or another tagged-union for situations where I still need to represent multiple of these types (namely collections).
I am hoping someone can point me to a better way of accomplishing what I'm trying to do. Let me start with a simple example. I am modeling a testing system, where you can have nested test suites which eventually end in tests. So, something like this:
data Node =
Test { source::string }
Suite { title::string, children::[Node] }
So, pretty simple so far, essentially a fancy Tree/Leaf declaration. However, I quickly realize that I want to be able to make functions that take Tests specifically. As such, I'll now split it up as so:
data Test = Test { source::string }
data Suite = Suite { title::string, children::[Either Test Suite] }
Alternatively I might roll a "custom" Either (especially if the example is more complicated and has more than 2 options), say something like:
data Node =
fromTest Test
fromSuite Suite
So, already its pretty unfortunate that just to be able to have a Suite that can have a combination of Suites or Tests I end up with a weird overhead Either class (whether it be with an actual Either or a custom one). If I use existential type classes, I could get away with making both Test and Suite derive "Node_" and then have Suite own a List of said Nodes. Coproducts would allow something similar, where I'd essentially do the same Either strategy without the verbosity of the tags.
Allow me to expand now with a more complex example. The results of the tests can be either Skipped (the test was disabled), Success, Failure, or Omitted (the test or suite could not be run due to a previous failure). Again, I originally started with something like this:
data Result = Success | Omitted | Failure | Skipped
data ResultTree =
Tree { children::[ResultTree], result::Result } |
Leaf Result
But I quickly realized I wanted to be able to write functions that took specific results, and more importantly, have the type itself enforce the ownership properties: A successful suite must only own Success or Skipped children, Failure's children can be anything, Omitted can only own Omitted, etc. So now I end up with something like this:
data Success = Success { children::[Either Success Skipped] }
data Failure = Failure { children::[AnyResult] }
data Omitted = Omitted { children::[Omitted] }
data Skipped = Skipped { children::[Skipped] }
data AnyResult =
fromSuccess Success |
fromFailure Failure |
fromOmitted Omitted |
fromSkipped Skipped
Again, I now have these weird "Wrapper" types like AnyResult, but, I get type enforcement of something that used to only be enforced from runtime operation. Is there a better strategy to this that doesn't involve turning on features like existential type classes?
The first thing that came to my mind reading your sentence: "I quickly realized I wanted to be able to write functions that took specific results" is Refinement Types.
They allow to take only some values from a type as input, and make those constraints compile-time check/error.
There is this video from a talk at HaskellX 2018, that introduces LiquidHaskell, which allows the use of Refinement Types in Haskell:
https://skillsmatter.com/skillscasts/11068-keynote-looking-forward-to-niki-vazou-s-keynote-at-haskellx-2018
You have to decorate your haskell function signature, and have LiquidHaskell installed:
f :: Int -> i : Int {i | i < 3} -> Int would be a function which could only accept as second parameter an Int with a value < 3, checked at compile time.
You might as well put constraints on your Result type.
I think what you may be looking for is GADTs with DataKinds. This lets you refine the types of each constructor in a data type to a particular set of possible values. For example:
data TestType = Test | Suite
data Node (t :: TestType) where
TestNode :: { source :: String } -> Node 'Test
SuiteNode :: { title :: String, children :: [SomeNode] } -> Node 'Suite
data SomeNode where
SomeNode :: Node t -> SomeNode
Then when a function operates only on tests, it can take a Node 'Test; on suites, a Node 'Suite; and on either, a polymorphic Node a. When pattern-matching on a Node a, each case branch gets access to an equality constraint:
useNode :: Node a -> Foo
useNode node = case node of
TestNode source -> {- here it’s known that (a ~ 'Test) -}
SuiteNode title children -> {- here, (a ~ 'Suite) -}
Indeed if you took a concrete Node 'Test, the SuiteNode branch would be disallowed by the compiler, since it can’t ever match.
SomeNode is an existential that wraps a Node of an unknown type; you can add extra class constraints to this if you want.
You can do a similar thing with Result:
data ResultType = Success | Omitted | Failure | Skipped
data Result (t :: ResultType) where
SuccessResult
:: [Either (Result 'Success) (Result 'Skipped)]
-> Result 'Success
FailureResult
:: [SomeResult]
-> Result 'Failure
OmittedResult
:: [Result 'Omitted]
-> Result 'Omitted
SkippedResult
:: [Result 'Skipped]
-> Result 'Skipped
data SomeResult where
SomeResult :: Result t -> SomeResult
Of course I assume in your actual code there’s more information in these types; as it is, they don’t represent much. When you have a dynamic computation such as running a test that may produce different kinds of result, you can return it wrapped in SomeResult.
In order to work with dynamic results, you may need to prove to the compiler that two types are equal; for that, I direct you to Data.Type.Equality, which provides a type a :~: b which is inhabited by a single constructor Refl when the two types a and b are equal; you can pattern-match on this to inform the typechecker about type equalities, or use the various combinators to carry out more complicated proofs.
Also useful in conjunction with GADTs (and ExistentialTypes, less generally) is RankNTypes, which basically enables you to pass polymorphic functions as arguments to other functions; this is necessary if you want to consume an existential generically:
consumeResult :: SomeResult -> (forall t. Result t -> r) -> r
consumeResult (SomeResult res) k = k res
This is an example of continuation-passing style (CPS), where k is the continuation.
As a final note, these extensions are widely used and largely uncontroversial; you needn’t be wary of opting in to (most) type system extensions when they let you express what you mean more directly.

Checking Haskell typeclasses in a function

If I wanted to perform a search on a problem space and I wanted to keep track of different states a node has already visited, I several options to do it depending on the constraints of those states. However; is there a way I can dispatch a function or another depending on the constraint of the states the user is using as input? For example, if I had:
data Node a = Node { state :: a, cost :: Double }
And I wanted to perform a search on a Problem a, is there a way I could check if a is Eq, Ord or Hashable and then call a different kind of search? In pseudocode, something like:
search :: Eq a => Problem a -> Node a
search problem#(... initial ...) -- Where initial is a State of type a
| (Hashable initial) = searchHash problem
| (Ord initial) = searchOrd problem
| otherwise = searchEq problem
I am aware I could just let the user choose one search or another depending on their own use; but being able to do something like that could be very handy for me since search is not really one of the user endpoints as such (one example could be a function bfs, that calls search with some parameters to make it behave like a Breadth-First Search).
No, you can't do this. However, you could make your own class:
class Memorable a where
type Memory a
remember :: a -> Memory a -> Memory a
known :: a -> Memory a -> Bool
Instantiate this class for a few base types, and add some default implementations for folks that want to add new instances, e.g.
-- suitable implementations of Memorable methods and type families for hashable things
type HashMemory = Data.HashSet.HashSet
hashRemember = Data.HashSet.insert
hashKnown = Data.HashSet.member
-- suitable implementations for orderable things
type OrdMemory = Data.Set.Set
ordRemember = Data.Set.insert
ordKnown = Data.Set.member
-- suitable implementations for merely equatable things
type EqMemory = Prelude.[]
eqRemember = (Prelude.:)
eqKnown = Prelude.elem

Haskell: Confusion with own data types. Record syntax and unique fields

I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way or something similar to a "scope resolution operator", like :: or ., etc., so that Haskell distinguishes which data type the name (or any other none unique fields) belongs to and returns the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (seemed to me the correct way at the time). But tuples are hard to work with as it is impossible to extract individual parts of a complex type as easy as with the lists using "!!". So next thing I thought of the dictionaries/hashes.
When I tried using dictionaries I thought what is the point of having own data types then?
Playing/learning data types I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of own data types as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?
Haskell record syntax is a bit of a hack, but the record name emerges as a function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.
There are more reasons why this doesn't work: lowercase typenames and data constructors, OO-language-style member access with .. In Haskell, those member access functions actually are free functions, i.e. vmName = name vm rather than vmName = vm.name, that's why they can't have same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
name :: m -> String
index :: m -> String -- why String anyway? Shouldn't this be an Int?
id :: m -> String
and make instances
instance MachineInfo VmInfo where
name (VmInfo vmName _ _) = vmName
index (VmInfo _ vmIndex _) = vmIndex
...
instance MachineInfo HostInfo where
...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.
Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.
Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
{ _ballPos :: Point
, _ballSpeed :: Vector
, _paddle1 :: Float
, _paddle2 :: Float
, _score :: (Int, Int)
, _vectors :: [Vector]
-- Since gloss doesn't cover this, we store the set of pressed keys
, _keys :: Set Key
}
-- Some nice lenses to go with it
makeLenses ''Pong
That makes lenses to access the members without the underscores via some TemplateHaskell magic.
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
p <- get
let paddleMovement = time * paddleSpeed
keyPressed key = p^.keys.contains (SpecialKey key)
-- Update the player's paddle based on keys
when (keyPressed KeyUp) $ paddle1 += paddleMovement
when (keyPressed KeyDown) $ paddle1 -= paddleMovement
-- Calculate the optimal position
let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
acc = accuracy p
target = optimal * acc + (p^.ballPos._y) * (1 - acc)
dist = target - p^.paddle2
-- Move the CPU's paddle towards this optimal position as needed
when (abs dist > paddleHeight/3) $
case compare dist 0 of
GT -> paddle2 += paddleMovement
LT -> paddle2 -= paddleMovement
_ -> return ()
-- Make sure both paddles don't leave the playing area
paddle1 %= clamp (paddleHeight/2)
paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.
Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here

Any nice record Handling tricks in Haskell?

I'm aware of partial updates for records like :
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on subrecord?
In particular, I found that the first of these lines works but the second does not :
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.
Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.
You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.

Updating elements of multiple collections with dynamic functions

Setup:
I have several collections of various data structures witch represent the state of simulated objects in a virtual system. I also have a number of functions that transform (that is create a new copy of the object based on the the original and 0 or more parameters) these objects.
The goal is to allow a user to select some object to apply transformations to (within the rules of the simulation), apply those the functions to those objects and update the collections by replacing the old objects with the new ones.
I would like to be able to build up a function of this type by combining smaller transformations into larger ones. Then evaluate this combined function.
Questions:
How to I structure my program to make this possible?
What kind of combinator do I use to build up a transaction like this?
Ideas:
Put all the collections into one enormous structure and pass this structure around.
Use a state monad to accomplish basically the same thing
Use IORef (or one of its more potent cousins like MVar) and build up an IO action
Use a Functional Reactive Programing Framework
1 and 2 seem like they carry a lot of baggage around especially if I envision eventually moving some of the collections into a database. (Darn IO Monad)
3 seems to work well but starts to look a lot like recreating OOP. I'm also not sure at what level to use the IORef. (e.g IORef (Collection Obj) or Collection (IORef Obj) or data Obj {field::IORef(Type)} )
4 feels the most functional in style, but it also seems to create a lot of code complexity without much payoff in terms of expressiveness.
Example
I have a web store front. I maintain a collections of products with (among other things) the quantity in stock and a price. I also have a collection of users who have credit with the store.
A user comes along ands selects 3 products to buy and goes to check out using store credit. I need to create a new products collection that has the amount in stock for the 3 products reduced, create a new user collection with the users account debited.
This means I get the following:
checkout :: Cart -> ProductsCol -> UserCol -> (ProductsCol, UserCol)
But then life gets more complicated and I need to deal with taxes:
checkout :: Cart -> ProductsCol -> UserCol -> TaxCol
-> (ProductsCol, UserCol, TaxCol)
And then I need to be sure to add the order to the shipping queue:
checkout :: Cart
-> ProductsCol
-> UserCol
-> TaxCol
-> ShipList
-> (ProductsCol, UserCol, TaxCol, ShipList)
And so forth...
What I would like to write is something like
checkout = updateStockAmount <*> applyUserCredit <*> payTaxes <*> shipProducts
applyUserCredit = debitUser <*> creditBalanceSheet
but the type-checker would have go apoplectic on me. How do I structure this store such that the checkout or applyUserCredit functions remains modular and abstract? I cannot be the only one to have this problem, right?
Okay, let's break this down.
You have "update" functions with types like A -> A for various specific types A, which may be derived from partial application, that specify a new value of some type in terms of a previous value. Each such type A should be specific to what that function does, and it should be easy to change those types as the program develops.
You also have some sort of shared state, which presumably contains all the information used by any of the aforementioned update functions. Further, it should be possible to change what the state contains, without significantly impacting anything other than the functions acting directly on it.
Additionally, you want to be able to abstractly combine update functions, without compromising the above.
We can deduce a few necessary features of a straightforward design:
An intermediate layer will be necessary, between the full shared state and the specifics needed by each function, allowing pieces of the state to be projected out and replaced independently of the rest.
The types of the update functions themselves are by definition incompatible with no real shared structure, so to compose them you'll need to first combine each with the intermediate layer portion. This will give you updates acting on the entire state, which can then be composed in the obvious way.
The only operations needed on the shared state as a whole are to interface with the intermediate layer, and whatever may be necessary to maintain the changes made.
This breakdown allows each entire layer to be modular to a large extent; in particular, type classes can be defined to describe the necessary functionality, allowing any relevant instance to be swapped in.
In particular, this essentially unifies your ideas 2 and 3. There's an inherent monadic context of some sort here, and the type class interface suggested would allow multiple approaches, such as:
Make the shared state a record type, store it in a State monad, and use lenses to provide the interface layer.
Make the shared state a record type containing something like an STRef for each piece, and combine field selectors with ST monad update actions to provide the interface layer.
Make the shared state a collection of TChans, with separate threads to read/write them as appropriate to communicate asynchronously with an external data store.
Or any number of other variations.
You can store your state in a record, and use lenses to update pieces of state. This lets you write the individual state updating components as simple, focused functions that may be composed to build more complex checkout functions.
{-# LANGUAGE TemplateHaskell #-}
import Data.Lens.Template
import Data.Lens.Common
import Data.List (foldl')
import Data.Map ((!), Map, adjust, fromList)
type User = String
type Item = String
type Money = Int -- money in pennies
type Prices = Map Item Money
type Cart = (User, [(Item,Int)])
type ProductsCol = Map Item Int
type UserCol = Map User Money
data StoreState = Store { _stock :: ProductsCol
, _users :: UserCol
, msrp :: Prices }
deriving Show
makeLens ''StoreState
updateProducts :: Cart -> ProductsCol -> ProductsCol
updateProducts (_,c) = flip (foldl' destock) c
where destock p' (item,count) = adjust (subtract count) item p'
updateUsers :: Cart -> Prices -> UserCol -> UserCol
updateUsers (name,c) p = adjust (subtract (sum prices)) name
where prices = map (\(itemName, itemCount) -> (p ! itemName) * itemCount) c
checkout :: Cart -> StoreState -> StoreState
checkout c s = (users ^%= updateUsers c (msrp s))
. (stock ^%= updateProducts c)
$ s
test = checkout cart store
where cart = ("Bob", [("Apples", 2), ("Bananas", 6)])
store = Store initialStock initialUsers prices
initialStock = fromList
[("Apples", 20), ("Bananas", 10), ("Lambdas", 1000)]
initialUsers = fromList [("Bob", 20000), ("Mary", 40000)]
prices = fromList [("Apples", 100), ("Bananas", 50), ("Lambdas", 0)]

Resources