Updating elements of multiple collections with dynamic functions - haskell

Setup:
I have several collections of various data structures which represent the state of simulated objects in a virtual system. I also have a number of functions that transform these objects (that is, create a new copy of the object based on the original and 0 or more parameters).
The goal is to allow a user to select some objects to apply transformations to (within the rules of the simulation), apply the functions to those objects, and update the collections by replacing the old objects with the new ones.
I would like to be able to build up a function of this type by combining smaller transformations into larger ones. Then evaluate this combined function.
Questions:
How do I structure my program to make this possible?
What kind of combinator do I use to build up a transaction like this?
Ideas:
1. Put all the collections into one enormous structure and pass this structure around.
2. Use a State monad to accomplish basically the same thing.
3. Use IORef (or one of its more potent cousins like MVar) and build up an IO action.
4. Use a Functional Reactive Programming framework.
1 and 2 seem like they carry a lot of baggage around especially if I envision eventually moving some of the collections into a database. (Darn IO Monad)
3 seems to work well but starts to look a lot like recreating OOP. I'm also not sure at what level to use the IORef. (e.g IORef (Collection Obj) or Collection (IORef Obj) or data Obj {field::IORef(Type)} )
4 feels the most functional in style, but it also seems to create a lot of code complexity without much payoff in terms of expressiveness.
Example
I have a web store front. I maintain a collection of products with (among other things) the quantity in stock and a price. I also have a collection of users who have credit with the store.
A user comes along, selects 3 products to buy, and goes to check out using store credit. I need to create a new products collection that has the amount in stock for the 3 products reduced, and a new user collection with the user's account debited.
This means I get the following:
checkout :: Cart -> ProductsCol -> UserCol -> (ProductsCol, UserCol)
But then life gets more complicated and I need to deal with taxes:
checkout :: Cart -> ProductsCol -> UserCol -> TaxCol
         -> (ProductsCol, UserCol, TaxCol)
And then I need to be sure to add the order to the shipping queue:
checkout :: Cart
         -> ProductsCol
         -> UserCol
         -> TaxCol
         -> ShipList
         -> (ProductsCol, UserCol, TaxCol, ShipList)
And so forth...
What I would like to write is something like
checkout = updateStockAmount <*> applyUserCredit <*> payTaxes <*> shipProducts
applyUserCredit = debitUser <*> creditBalanceSheet
but the type-checker would go apoplectic on me. How do I structure this store such that the checkout or applyUserCredit functions remain modular and abstract? I cannot be the only one to have this problem, right?

Okay, let's break this down.
You have "update" functions with types like A -> A for various specific types A, which may be derived from partial application, that specify a new value of some type in terms of a previous value. Each such type A should be specific to what that function does, and it should be easy to change those types as the program develops.
You also have some sort of shared state, which presumably contains all the information used by any of the aforementioned update functions. Further, it should be possible to change what the state contains, without significantly impacting anything other than the functions acting directly on it.
Additionally, you want to be able to abstractly combine update functions, without compromising the above.
We can deduce a few necessary features of a straightforward design:
An intermediate layer will be necessary, between the full shared state and the specifics needed by each function, allowing pieces of the state to be projected out and replaced independently of the rest.
The types of the update functions themselves are by definition incompatible with no real shared structure, so to compose them you'll need to first combine each with the intermediate layer portion. This will give you updates acting on the entire state, which can then be composed in the obvious way.
The only operations needed on the shared state as a whole are to interface with the intermediate layer, and whatever may be necessary to maintain the changes made.
This breakdown allows each entire layer to be modular to a large extent; in particular, type classes can be defined to describe the necessary functionality, allowing any relevant instance to be swapped in.
In particular, this essentially unifies your ideas 2 and 3. There's an inherent monadic context of some sort here, and the type class interface suggested would allow multiple approaches, such as:
Make the shared state a record type, store it in a State monad, and use lenses to provide the interface layer.
Make the shared state a record type containing something like an STRef for each piece, and combine field selectors with ST monad update actions to provide the interface layer.
Make the shared state a collection of TChans, with separate threads to read/write them as appropriate to communicate asynchronously with an external data store.
Or any number of other variations.
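The layered design described above can be sketched without any library. This is only a minimal illustration, not a recommendation over a real lens package: the Lens record, the over helper, the Store type, and the sample data are all hypothetical names invented here. Each small update acts on one collection, a lens lifts it to the whole state, and the lifted updates compose with plain function composition.

```haskell
import qualified Data.Map as Map

-- A hand-rolled lens (assumption, not a library type):
-- how to read one part of the state and how to write it back.
data Lens s a = Lens { view :: s -> a, set :: a -> s -> s }

-- Lift an update on a part to an update on the whole state.
over :: Lens s a -> (a -> a) -> s -> s
over l f s = set l (f (view l s)) s

type Stock  = Map.Map String Int
type Credit = Map.Map String Int

-- Hypothetical shared state with two collections.
data Store = Store { stock :: Stock, credit :: Credit } deriving (Eq, Show)

stockL :: Lens Store Stock
stockL = Lens stock (\x s -> s { stock = x })

creditL :: Lens Store Credit
creditL = Lens credit (\x s -> s { credit = x })

-- Small updates that each know about one collection only.
destock :: String -> Int -> Stock -> Stock
destock item n = Map.adjust (subtract n) item

debit :: String -> Int -> Credit -> Credit
debit user amount = Map.adjust (subtract amount) user

-- Updates on the whole state compose with ordinary (.).
checkout :: Store -> Store
checkout = over creditL (debit "Bob" 200) . over stockL (destock "Apples" 2)

initial :: Store
initial = Store (Map.fromList [("Apples", 20)]) (Map.fromList [("Bob", 1000)])
```

Adding a new collection to Store then only requires one new lens; destock, debit and any existing composed update keep their types.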

You can store your state in a record, and use lenses to update pieces of state. This lets you write the individual state updating components as simple, focused functions that may be composed to build more complex checkout functions.
{-# LANGUAGE TemplateHaskell #-}
import Data.Lens.Template
import Data.Lens.Common
import Data.List (foldl')
import Data.Map ((!), Map, adjust, fromList)
type User = String
type Item = String
type Money = Int -- money in pennies
type Prices = Map Item Money
type Cart = (User, [(Item,Int)])
type ProductsCol = Map Item Int
type UserCol = Map User Money
data StoreState = Store { _stock :: ProductsCol
                        , _users :: UserCol
                        , msrp   :: Prices }
                  deriving Show
makeLens ''StoreState
updateProducts :: Cart -> ProductsCol -> ProductsCol
updateProducts (_,c) = flip (foldl' destock) c
  where destock p' (item,count) = adjust (subtract count) item p'
updateUsers :: Cart -> Prices -> UserCol -> UserCol
updateUsers (name,c) p = adjust (subtract (sum prices)) name
  where prices = map (\(itemName, itemCount) -> (p ! itemName) * itemCount) c
checkout :: Cart -> StoreState -> StoreState
checkout c s = (users ^%= updateUsers c (msrp s))
             . (stock ^%= updateProducts c)
             $ s
test = checkout cart store
  where cart = ("Bob", [("Apples", 2), ("Bananas", 6)])
        store = Store initialStock initialUsers prices
        initialStock = fromList
                       [("Apples", 20), ("Bananas", 10), ("Lambdas", 1000)]
        initialUsers = fromList [("Bob", 20000), ("Mary", 40000)]
        prices = fromList [("Apples", 100), ("Bananas", 50), ("Lambdas", 0)]

Related

Haskell lifetime of memoized function bound to record instance

First of all, I'm a beginner in Haskell so be kind :)
Consider the following example:
{-# LANGUAGE RecordWildCards #-}
data Item = Item {itemPrice :: Float, itemQuantity :: Float} deriving (Show, Eq)
data Order = Order {orderItems :: [Item]} deriving (Show, Eq)
itemTotal :: Item -> Float
itemTotal Item{..} = itemPrice * itemQuantity
orderTotal :: Order -> Float
orderTotal = sum . map itemTotal . orderItems
Is it possible to memoize the function orderTotal so it only executes once per "instance" of an Order record and, this is the tricky part, the cache entry bound to this instance is eliminated once this order is garbage collected? In other words, I don't want to have a cache that keeps growing forever.
Edit after comments:
Indeed, in this simple example the overhead of memoization probably doesn't pay off. But you can imagine a scenario where we have a complex graph of values (e.g. order, order items, products, client...) and lots of derived properties that operate on these values (like the orderTotal above). If we create a field for the order total, instead of using a function to compute it, we have to be very careful to not end up with an inconsistent order.
Wouldn't it be nice if we could express these data interdependencies declaratively (using functions instead of fields) and delegate the job of optimizing these calculations to the compiler? I believe that in a pure language like Haskell this would be possible, although I lack the knowledge to do that.
To illustrate what I'm trying to say, look at this code (in Python):
def memoized(function):
    function_name = function.__name__
    def wrapped(self):
        try:
            result = self._cache[function_name]
        except KeyError:
            result = self._cache[function_name] = function(self)
        return result
    return property(wrapped)
class Item:
    def __init__(self, price, quantity):
        self._price = price
        self._quantity = quantity
        self._cache = {}
    @property
    def price(self):
        return self._price
    @property
    def quantity(self):
        return self._quantity
    @memoized
    def total(self):
        return self.price * self.quantity
The class Item is immutable (kind of), so we know that each derived property can be computed only once per instance. That's exactly what the memoized function does. Besides that, the cache lives inside the instance itself (self._cache), so it will be garbage collected with it.
What I'm looking for is to achieve a similar thing in Haskell.
A relatively simple way of memoizing a calculation on a value of a particular type is to bring the calculated result into the data type and use a smart constructor. That is, write the Order data type as:
data Order = Order
  { orderItems :: [Item]
  , orderTotal :: Float
  } deriving (Show, Eq)
Note that the orderTotal field replaces your function of the same name. Then, construct orders using the smart constructor:
order :: [Item] -> Order
order itms = Order itms (sum . map itemTotal $ itms)
Because of lazy evaluation, the orderTotal field will be calculated only the first time it's needed with the value cached thereafter. When the Order is garbage collected, obviously the orderTotal will be garbage collected at the same time.
Some people would pack this into a module and export only the smart constructor order instead of the usual constructor Order to ensure that an order with an inconsistent orderTotal could never be created. I worry about these people. How do they get through their daily lives knowing that they might double-cross themselves at any moment? Anyway, it's an available option for the truly paranoid.

Checking Haskell typeclasses in a function

If I wanted to perform a search on a problem space and keep track of the states that have already been visited, I have several options for doing it, depending on the constraints on those states. However, is there a way I can dispatch to one function or another depending on the constraints satisfied by the state type the user supplies as input? For example, if I had:
data Node a = Node { state :: a, cost :: Double }
And I wanted to perform a search on a Problem a, is there a way I could check if a is Eq, Ord or Hashable and then call a different kind of search? In pseudocode, something like:
search :: Eq a => Problem a -> Node a
search problem@(... initial ...) -- Where initial is a State of type a
  | (Hashable initial) = searchHash problem
  | (Ord initial) = searchOrd problem
  | otherwise = searchEq problem
I am aware I could just let the user choose one search or another depending on their own use; but being able to do something like that could be very handy for me since search is not really one of the user endpoints as such (one example could be a function bfs, that calls search with some parameters to make it behave like a Breadth-First Search).
No, you can't do this. However, you could make your own class:
class Memorable a where
  type Memory a
  remember :: a -> Memory a -> Memory a
  known :: a -> Memory a -> Bool
Instantiate this class for a few base types, and add some default implementations for folks that want to add new instances, e.g.
-- suitable implementations of Memorable methods and type families for hashable things
type HashMemory = Data.HashSet.HashSet
hashRemember = Data.HashSet.insert
hashKnown = Data.HashSet.member
-- suitable implementations for orderable things
type OrdMemory = Data.Set.Set
ordRemember = Data.Set.insert
ordKnown = Data.Set.member
-- suitable implementations for merely equatable things
type EqMemory = []
eqRemember = (:)
eqKnown = elem
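To make the class-based suggestion concrete, here is a small sketch of how instances and a caller might look. It assumes the TypeFamilies extension and the containers package; the Opaque type and the unseen function are hypothetical names added for illustration, and the caller passes in the initial empty memory because the class as written has no method for creating one.

```haskell
{-# LANGUAGE TypeFamilies #-}
import qualified Data.Set as Set

class Memorable a where
  type Memory a
  remember :: a -> Memory a -> Memory a
  known    :: a -> Memory a -> Bool

-- Orderable states can use a Set as their memory.
instance Memorable Int where
  type Memory Int = Set.Set Int
  remember = Set.insert
  known    = Set.member

-- Hypothetical state type that only supports equality,
-- so it falls back to a plain list.
newtype Opaque = Opaque String deriving (Eq, Show)

instance Memorable Opaque where
  type Memory Opaque = [Opaque]
  remember = (:)
  known    = elem

-- Keep only states not seen before, threading the memory through.
-- The same code works for every instance; the instance picks the container.
unseen :: Memorable a => Memory a -> [a] -> [a]
unseen _ [] = []
unseen mem (x:xs)
  | known x mem = unseen mem xs
  | otherwise   = x : unseen (remember x mem) xs
```

A search function written against Memorable never mentions Eq, Ord or Hashable directly, which is the dispatch the question was after, moved from a runtime check to instance selection.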

"Strategy Pattern" in Haskell

In the OO world, I have a class (let's call it "Suggester") that implements something approaching a "Strategy Pattern" to provide differing implementations of an algorithm at runtime. As an exercise in learning Haskell, I want to rewrite this.
The actual use-case is quite complex, so I'll boil down a simpler example.
Let's say I have a class Suggester that takes a list of rules, and applies each rule as a filter to a list of database results.
Each rule has three phases "Build Query", "Post Query Filter", and "Scorer". We essentially end up with an interface meeting the following
buildQuery :: Query -> Query
postQueryFilter :: [Record] -> [Record]
scorer :: [Record] -> [(Record, Int)]
Suggester needs to take a list of rules that match this interface, dynamically at run time, and then execute them in sequence. buildQuery must be run across all rules first, followed by postQueryFilter, then scorer (i.e. I can't just compose the functions for one rule into a single function).
In Scala I simply do
// No state, so a singleton `object` instead of a class is ok
object Rule1 extends Rule {
def buildQuery ...
def postQueryFilter ...
def scorer ...
}
object Rule2 extends Rule { .... }
And can then initialise the service by passing the relevant rules through (Defined at runtime based on user input).
val suggester = new Suggester( List(Rule1, Rule2, Rule3) );
If the rules were a single function, this would be simple - just pass a list of functions. However since each rule is actually three functions, I need to group them together somehow, so I have multiple implementations meeting an interface.
My first thought was type classes, however these don't quite seem to meet my needs - they expect a type variable, and enforce that each of my methods must use it - which they don't.
No parameters for class `Rule`
My second thought was just to place each one in a haskell module, but as modules aren't "First Class" I can't pass them around directly (And they of course don't enforce an interface).
Thirdly I tried creating a record type to encapsulate the functions
data Rule = Rule { buildQuery :: Query -> Query, .... etc }
And then defined an instance of "Rule" for each. When this is done in each module it encapsulates nicely and works fine, but felt like a hack and I'm not sure if this is an appropriate use of records in haskell?
tl;dr - How do I encapsulate a group of functions together such that I can pass them around as an instance of something matching an interface, but don't actually use a type variable.
Or am I completely coming at this from the wrong mindset?
In my opinion your solution isn't the "hack", but the "strategy pattern" in OO languages: It is only needed to work around the limitations of a language, especially in case of missing, unsafe or inconvenient Lambdas/Closures/Function Pointers etc, so you need a kind of "wrapper" for it to make it "digestible" for that language.
A "strategy" is basically a function (may be with some additional data attached). But if a function is truly a first class member of the language - as in Haskell, there is no need to hide it in the object closet.
Just generate a single Rule type as you did
data Rule = Rule
  { buildQuery :: Query -> Query
  , postQueryFilter :: [Record] -> [Record]
  , scorer :: [Record] -> [(Record, Int)]
  }
And build a general application method—I'm assuming such a generic thing exists given that these Rules are designed to operate independently over SQL results
applyRule :: Rule -> Results -> Results
Finally, you can implement as many rules as you like wherever you want: just import the Rule type and create an appropriate value. There's no a priori reason to give each different rule its own type as you might in an OO setting.
easyRule :: Rule
easyRule = Rule id id (\recs -> zip recs [1..])
upsideDownRule :: Rule
upsideDownRule = Rule reverse reverse (\recs -> zip recs [-1, -2..])
Then if you have a list of Rules you can apply them all in order
applyRules :: [Rule] -> Results -> Results
applyRules [] res = res
applyRules (r:rs) res = applyRules rs (applyRule r res)
which is actually just a left fold in disguise (the first rule in the list is applied first)
applyRules rs res = foldl (flip applyRule) res rs
foo :: Results -> Results
foo = applyRules [Some.Module.easyRule, Some.Other.Module.upsideDownRule]
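The question also requires each phase to run across all rules before the next phase starts. As a rough sketch of how the record-of-functions approach could handle that, the code below folds one field at a time over the rule list. Everything beyond the Rule record is an assumption made for illustration: Query and Record as String synonyms, the runQuery parameter, the fakeDb stub, the two sample rules, and the decision to sum the scores that different rules give to the same record.

```haskell
import qualified Data.Map as Map

type Query  = String   -- assumption: stand-in for a real query type
type Record = String   -- assumption: stand-in for a real record type

data Rule = Rule
  { buildQuery      :: Query -> Query
  , postQueryFilter :: [Record] -> [Record]
  , scorer          :: [Record] -> [(Record, Int)]
  }

-- Run one phase of every rule, left to right, before moving on.
runPhase :: (Rule -> a -> a) -> [Rule] -> a -> a
runPhase phase rules x = foldl (\acc r -> phase r acc) x rules

-- All buildQuery steps, then the query, then all filters, then all scorers.
-- Scores from different rules are summed per record (an assumption).
suggest :: (Query -> [Record]) -> [Rule] -> Query -> [(Record, Int)]
suggest runQuery rules q = Map.toList (Map.fromListWith (+) scores)
  where
    query    = runPhase buildQuery rules q
    filtered = runPhase postQueryFilter rules (runQuery query)
    scores   = concatMap (\r -> scorer r filtered) rules

-- Two hypothetical rules.
limitRule :: Rule
limitRule = Rule (++ " LIMIT 10") (filter ((> 3) . length)) (map (\r -> (r, length r)))

bonusRule :: Rule
bonusRule = Rule id id (map (\r -> (r, if take 1 r == "a" then 10 else 0)))

-- Stub database that ignores the query.
fakeDb :: Query -> [Record]
fakeDb _ = ["apple", "fig", "avocado", "kiwi"]
```

Because the rule list is an ordinary value, it can be assembled at run time from user input, e.g. suggest fakeDb [limitRule, bonusRule] "SELECT *".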

Data value dependencies, updates and memoisation

I'm sorry this problem description is so abstract: it's for my job, and for commercial confidentiality reasons I can't give the real-world problem, just an abstraction.
I've got an application that receives messages containing key-value pairs. The keys are from a defined set of keywords, and each keyword has a fixed data type. So if "Foo" is an Integer and "Bar" is a date you might get a message like:
Foo: 234
Bar: 24 September 2011
A message may have any subset of keys in it. The number of keys is fairly large (several dozen). But let's stick with Foo and Bar for now.
Obviously there is a record like this corresponding to the messages:
data MyRecord = MyRecord
  { foo :: Maybe Integer
  , bar :: Maybe UTCTime
  -- ... and so on for several dozen fields.
  }
The record uses "Maybe" types because that field may not have been received yet.
I also have many derived values that I need to compute from the current values (if they exist). For instance I want to have
baz :: MyRecord -> Maybe String
baz r = do -- Maybe monad
  f <- foo r
  b <- bar r
  return $ show f ++ " " ++ show b
Some of these functions are slow, so I don't want to repeat them unnecessarily. I could recompute baz for each new message and memo it in the original structure, but if a message leaves the foo and bar fields unchanged then that is wasted CPU time. Conversely I could recompute baz every time I want it, but again that would waste CPU time if the underlying arguments have not changed since last time.
What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change. I could detect this manually by noting that baz depends only on foo and bar, and so only recomputing it on messages that change those values, but for complicated functions that is error-prone.
An added wrinkle is that some of these functions may have multiple strategies. For instance you might have a value that can be computed from either Foo or Bar using 'mplus'.
Does anyone know of an existing solution to this? If not, how should I go about it?
I'll assume that you have one "state" record and these messages all involve updating it as well as setting it. So if Foo is 12, it may later be 23, and therefore the output of baz would change. If any of this is not the case, then the answer becomes pretty trivial.
Let's start with the "core" of baz -- a function not on a record, but the values you want.
baz :: Int -> Int -> String
Now let's transform it:
data Cached a b = Cached (Maybe (a,b)) (a -> b)
getCached :: Eq a => Cached a b -> a -> (b,Cached a b)
getCached c@(Cached (Just (arg,res)) f) x | x == arg = (res,c)
getCached (Cached _ f) x = let ans = f x in (ans,Cached (Just (x,ans)) f)
bazC :: Cached (Int,Int) String
bazC = Cached Nothing (uncurry baz)
Now whenever you would use a normal function, you use a cache-transformed function instead, substituting the resulting cache-transformed function back into your record. This is essentially a manual memotable of size one.
For the basic case you describe, this should be fine.
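As a quick check of how the size-one cache behaves, here is a self-contained sketch. The body of baz, the isHit helper and the demo value are hypothetical additions for illustration; Cached, getCached and bazC follow the definitions above.

```haskell
data Cached a b = Cached (Maybe (a, b)) (a -> b)

-- Return the cached result when the argument is unchanged,
-- otherwise recompute and remember the new argument/result pair.
getCached :: Eq a => Cached a b -> a -> (b, Cached a b)
getCached c@(Cached (Just (arg, res)) _) x | x == arg = (res, c)
getCached (Cached _ f) x = let ans = f x in (ans, Cached (Just (x, ans)) f)

-- Stand-in for a slow function (assumption).
baz :: Int -> Int -> String
baz f b = show f ++ " " ++ show b

bazC :: Cached (Int, Int) String
bazC = Cached Nothing (uncurry baz)

-- Hypothetical helper: does the cache currently hold a result for x?
isHit :: Eq a => Cached a b -> a -> Bool
isHit (Cached (Just (arg, _)) _) x = x == arg
isHit _ _ = False

-- Thread the updated cache through successive calls.
demo :: (String, Bool, String, Bool)
demo =
  let (r1, c1) = getCached bazC (1, 2)   -- miss: computes and stores
      hit      = isHit c1 (1, 2)         -- the same argument is now cached
      (r2, c2) = getCached c1 (3, 4)     -- different argument: recomputes
      stale    = isHit c2 (1, 2)         -- the old entry has been replaced
  in (r1, hit, r2, stale)
```

The caller has to keep the returned Cached value (for example in the state record) and pass it to the next call; dropping it silently loses the cache.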
A fancier and more generalized solution involving a dynamic graph of dependencies goes under the name "incremental computation" but I've seen research papers for it more than serious production implementations. You can take a look at these for starters, and follow the reference trail forward:
http://www.carlssonia.org/ogi/Adaptive/
http://www.andres-loeh.de/Incrementalization/paper_final.pdf
Incremental computation is actually also very related to functional reactive programming, so you can take a look at conal's papers on that, or play with Heinrich Apfelmus' reactive-banana library: http://www.haskell.org/haskellwiki/Reactive-banana
In imperative languages, take a look at trellis in python: http://pypi.python.org/pypi/Trellis or Cells in lisp: http://common-lisp.net/project/cells/
You can build a stateful graph that corresponds to computations you need to do. When new values appear you push these into the graph and recompute, updating the graph until you reach the outputs. (Or you can store the value at the input and recompute on demand.) This is a very stateful solution but it works.
Are you perhaps creating market data, like yield curves, from live inputs of rates etc.?
What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change.
It sounds to me like you want a variable that is sort of immutable, but allows a one-time mutation from "nothing computed yet" to "computed". Well, you're in luck: this is exactly what lazy evaluation gives you! So my proposed solution is quite simple: just extend your record with fields for each of the things you want to compute. Here's an example of such a thing, where the CPU-intensive task we're doing is breaking some encryption scheme:
data Foo = Foo
  { ciphertext :: String
  , plaintext :: String
  }
-- a smart constructor for Foo's
foo c = Foo { ciphertext = c, plaintext = crack c }
The point here is that calls to foo have expenses like this:
If you never ask for the plaintext of the result, it's cheap.
On the first call to plaintext, the CPU churns a long time.
On subsequent calls to plaintext, the previously computed answer is returned immediately.
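The first of these cost claims can be checked directly. In the sketch below, crack is a cheap stand-in for the expensive computation and neverCracked is a hypothetical constructor whose plaintext field would raise an error if it were ever evaluated; both are assumptions added for illustration.

```haskell
data Foo = Foo
  { ciphertext :: String
  , plaintext  :: String
  }

-- Stand-in for the expensive computation (assumption).
crack :: String -> String
crack = reverse

-- The smart constructor from above: plaintext is stored as an
-- unevaluated thunk until something asks for it.
foo :: String -> Foo
foo c = Foo { ciphertext = c, plaintext = crack c }

-- Hypothetical constructor used only to show laziness:
-- evaluating the plaintext field would crash.
neverCracked :: String -> Foo
neverCracked c = Foo { ciphertext = c, plaintext = error "plaintext was forced" }
```

Evaluating ciphertext (neverCracked "abc") returns "abc" without raising the error, which shows that constructing the record does not run the computation. Note that this only holds for lazy fields; a strictness annotation on plaintext would force it at construction time.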

Haskell: Confusion with own data types. Record syntax and unique fields

I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way or something similar to a "scope resolution operator", like :: or ., etc., so that Haskell distinguishes which data type the name (or any other non-unique field) belongs to and returns the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (seemed to me the correct way at the time). But tuples are hard to work with as it is impossible to extract individual parts of a complex type as easily as with lists using "!!". So the next thing I thought of was dictionaries/hashes.
When I tried using dictionaries I thought what is the point of having own data types then?
Playing/learning data types I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of own data types as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?
Haskell record syntax is a bit of a hack, but the record name emerges as a function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.
There are more reasons why this doesn't work: lowercase type names and data constructors, and OO-language-style member access with a dot. In Haskell, those member access functions are actually free functions, i.e. vmName = name vm rather than vmName = vm.name; that's why they can't have the same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
  name :: m -> String
  index :: m -> String -- why String anyway? Shouldn't this be an Int?
  id :: m -> String
and make instances
instance MachineInfo VmInfo where
  name (VmInfo vmName _ _) = vmName
  index (VmInfo _ vmIndex _) = vmIndex
  ...
instance MachineInfo HostInfo where
  ...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.
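A complete version of this type class approach might look like the following sketch. The prefixed field names (vmName, hostName, and so on), the ident method (renamed from id so it does not clash with Prelude's id at use sites) and the describe function are choices made here for illustration, not part of the answer above.

```haskell
-- Each record gets unique, prefixed field names.
data VmInfo = VmInfo
  { vmName :: String, vmIndex :: String, vmId :: String }
  deriving Show

data HostInfo = HostInfo
  { hostName :: String, hostIndex :: String, hostId :: String }
  deriving Show

-- The shared, overloaded accessors live in a class.
class MachineInfo m where
  name  :: m -> String
  index :: m -> String
  ident :: m -> String

instance MachineInfo VmInfo where
  name  = vmName
  index = vmIndex
  ident = vmId

instance MachineInfo HostInfo where
  name  = hostName
  index = hostIndex
  ident = hostId

-- Works for any machine type with an instance.
describe :: MachineInfo m => m -> String
describe m = name m ++ " (#" ++ index m ++ ", id " ++ ident m ++ ")"
```

With the values from the question, describe (VmInfo "vm1" "01" "74653") and describe (HostInfo "host1" "02" "98732") both type-check, and name picks the right field for each.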
Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.
Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
  { _ballPos :: Point
  , _ballSpeed :: Vector
  , _paddle1 :: Float
  , _paddle2 :: Float
  , _score :: (Int, Int)
  , _vectors :: [Vector]
  -- Since gloss doesn't cover this, we store the set of pressed keys
  , _keys :: Set Key
  }
-- Some nice lenses to go with it
makeLenses ''Pong
That makes lenses to access the members without the underscores via some TemplateHaskell magic.
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
  p <- get
  let paddleMovement = time * paddleSpeed
      keyPressed key = p^.keys.contains (SpecialKey key)
  -- Update the player's paddle based on keys
  when (keyPressed KeyUp) $ paddle1 += paddleMovement
  when (keyPressed KeyDown) $ paddle1 -= paddleMovement
  -- Calculate the optimal position
  let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
      acc = accuracy p
      target = optimal * acc + (p^.ballPos._y) * (1 - acc)
      dist = target - p^.paddle2
  -- Move the CPU's paddle towards this optimal position as needed
  when (abs dist > paddleHeight/3) $
    case compare dist 0 of
      GT -> paddle2 += paddleMovement
      LT -> paddle2 -= paddleMovement
      _ -> return ()
  -- Make sure both paddles don't leave the playing area
  paddle1 %= clamp (paddleHeight/2)
  paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.
Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here

Resources