First of all, I'm a beginner in Haskell so be kind :)
Consider the following example:
{-# LANGUAGE RecordWildCards #-}
data Item = Item {itemPrice :: Float, itemQuantity :: Float} deriving (Show, Eq)
data Order = Order {orderItems :: [Item]} deriving (Show, Eq)
itemTotal :: Item -> Float
itemTotal Item{..} = itemPrice * itemQuantity
orderTotal :: Order -> Float
orderTotal = sum . map itemTotal . orderItems
Is it possible to memoize the function orderTotal so that it executes only once per "instance" of an Order record and, this is the tricky part, the cache entry bound to this instance is eliminated once this order is garbage collected? In other words, I don't want to have a cache that keeps growing forever.
Edit after comments:
Indeed, in this simple example the overhead of memoization probably doesn't pay off. But you can imagine a scenario where we have a complex graph of values (e.g. order, order items, products, client...) and lots of derived properties that operate on these values (like the orderTotal above). If we create a field for the order total, instead of using a function to compute it, we have to be very careful to not end up with an inconsistent order.
Wouldn't it be nice if we could express these data interdependencies declaratively (using functions instead of fields) and delegate the job of optimizing these calculations to the compiler? I believe that in a pure language like Haskell this would be possible, although I lack the knowledge to do it.
To illustrate what I'm trying to say, look at this code (in Python):
def memoized(function):
    function_name = function.__name__
    def wrapped(self):
        try:
            result = self._cache[function_name]
        except KeyError:
            result = self._cache[function_name] = function(self)
        return result
    return property(wrapped)

class Item:
    def __init__(self, price, quantity):
        self._price = price
        self._quantity = quantity
        self._cache = {}

    @property
    def price(self):
        return self._price

    @property
    def quantity(self):
        return self._quantity

    @memoized
    def total(self):
        return self.price * self.quantity
The class Item is immutable (kind of), so we know that each derived property can be computed only once per instance. That's exactly what the memoized function does. Besides that, the cache lives inside the instance itself (self._cache), so it will be garbage collected with it.
What I'm looking for is to achieve a similar thing in Haskell.
A relatively simple way of memoizing a calculation on a value of a particular type is to bring the calculated result into the data type and use a smart constructor. That is, write the Order data type as:
data Order = Order
  { orderItems :: [Item]
  , orderTotal :: Float
  } deriving (Show, Eq)
Note that the orderTotal field replaces your function of the same name. Then, construct orders using the smart constructor:
order :: [Item] -> Order
order itms = Order itms (sum . map itemTotal $ itms)
Because of lazy evaluation, the orderTotal field will be calculated only the first time it's needed with the value cached thereafter. When the Order is garbage collected, obviously the orderTotal will be garbage collected at the same time.
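If you want to see the caching happen, a quick, purely illustrative trick is to wrap the computation in Debug.Trace. This sketch reuses Item, itemTotal and the field-carrying Order from above:

import Debug.Trace (trace)

order' :: [Item] -> Order
order' itms = Order itms (trace "computing total" (sum (map itemTotal itms)))

-- In GHCi:
--   > let o = order' [Item 2 3, Item 1 5]
--   > orderTotal o   -- prints "computing total" once, then 11.0
--   > orderTotal o   -- 11.0 again, no trace: the thunk has already been forced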
Some people would pack this into a module and export only the smart constructor order instead of the usual constructor Order to ensure that an order with an inconsistent orderTotal could never be created. I worry about these people. How do they get through their daily lives knowing that they might double-cross themselves at any moment? Anyway, it's an available option for the truly paranoid.
Haskell enables one to construct algebraic data types using type constructors and data constructors. For example,
data Circle = Circle Float Float Float
and we are told this data constructor (the Circle on the right) is a function that constructs a circle when given data, e.g. x, y, radius.
Circle :: Float -> Float -> Float -> Circle
My questions are:
What is actually constructed by this function, specifically?
Can we define the constructor function?
I've seen Smart Constructors but they just seem to be extra functions that eventually call the regular constructors.
Coming from an OO background, I'm used to constructors having imperative specifications. In Haskell, they seem to be system-defined.
In Haskell, without considering the underlying implementation, a data constructor creates a value, essentially by fiat. “ ‘Let there be a Circle’, said the programmer, and there was a Circle.” Asking what Circle 1 2 3 creates is akin to asking what the literal 1 creates in Python or Java.
A nullary constructor is closer to what you usually think of as a literal. The Bool type is literally defined as
data Bool = False | True
where True and False are data constructors, not literals defined by the Haskell grammar.
The data type is also the definition of the constructor; as there isn't really anything to a value beyond the constructor name and its arguments, simply stating it is the definition. You create a value of type Circle by calling the data constructor Circle with 3 arguments, and that's it.
A so-called "smart constructor" is just a function that calls a data constructor, with perhaps some other logic to restrict which instances can be created. For example, consider a simple wrapper around Integer:
newtype PosInteger = PosInt Integer
The constructor is PosInt; a smart constructor might look like
mkPosInt :: Integer -> PosInteger
mkPosInt n | n > 0     = PosInt n
           | otherwise = error "Argument must be positive"
With mkPosInt, there is no way to create a PosInteger value with a non-positive argument, because only positive arguments actually call the data constructor. A smart constructor makes the most sense when it, and not the data constructor, is exported by a module, so that a typical user cannot create arbitrary instances (because the data constructor does not exist outside the module).
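As a sketch of what that module boundary looks like (the getPosInt accessor is an extra added here for illustration):

module PosInteger
  ( PosInteger   -- the type is exported...
  , mkPosInt     -- ...and the smart constructor, but not the PosInt data constructor
  , getPosInt
  ) where

newtype PosInteger = PosInt Integer

getPosInt :: PosInteger -> Integer
getPosInt (PosInt n) = n

mkPosInt :: Integer -> PosInteger
mkPosInt n | n > 0     = PosInt n
           | otherwise = error "Argument must be positive"

Outside this module, pattern matching on PosInt is impossible, so the only way to obtain a PosInteger is through mkPosInt.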
Good question. As you know, given the definition:
data Foo = A | B Int
this defines a type with a (nullary) type constructor Foo and two data constructors, A and B.
Each of these data constructors, when fully applied (to no arguments in the case of A and to a single Int argument in the case of B) constructs a value of type Foo. So, when I write:
a :: Foo
a = A
b :: Foo
b = B 10
the names a and b are bound to two values of type Foo.
So, data constructors for type Foo construct values of type Foo.
What are values of type Foo? Well, first of all, they are different from values of any other type. Second, they are wholly defined by their data constructors. There is a distinct value of type Foo, different from all other values of Foo, for each combination of a data constructor with a set of distinct arguments passed to that data constructor. That is, two values of type Foo are identical if and only if they were constructed with the same data constructor given identical sets of arguments. ("Identical" here means something different from "equality", which may not necessarily be defined for a given type Foo, but let's not get into that.)
That's also what makes data constructors different from functions in Haskell. If I have a function:
bar :: Int -> Bool
It's possible that bar 1 and bar 2 might be exactly the same value. For example, if bar is defined by:
bar n = n > 0
then it's obvious that bar 1 and bar 2 (and bar 3) are identically True. Whether the value of bar is the same for different values of its arguments will depend on the function definition.
In contrast, if Bar is a constructor:
data BarType = Bar Int
then it's never going to be the case that Bar 1 and Bar 2 are the same value. By definition, they will be different values (of type BarType).
By the way, the idea that constructors are just a special kind of function is a common viewpoint. I personally think this is inaccurate and causes confusion. While it's true that constructors can often be used as if they are functions (specifically that they behave very much like functions when used in expressions), I don't think this view stands up under much scrutiny -- constructors are represented differently in the surface syntax of the language (with capitalized identifiers), can be used in contexts (like pattern matching) where functions cannot be used, are represented differently in compiled code, etc.
So, when you ask "can we define the constructor function", the answer is "no", because there is no constructor function. Instead, a constructor like A or B or Bar or Circle is what it is -- something different from a function (that sometimes behaves like a function with some special additional properties) which is capable of constructing a value of whatever type the data constructor belongs to.
This makes Haskell constructors very different from OO constructors, but that's not surprising since Haskell values are very different from OO objects. In an OO language, you can typically provide a constructor function that does some processing in building the object, so in Python you might write:
class Bar:
    def __init__(self, n):
        self.value = n > 0
and then after:
bar1 = Bar(1)
bar2 = Bar(2)
we have two distinct objects bar1 and bar2 (which would satisfy bar1 != bar2) that have been configured with the same field values and are in some sense "equal". This is sort of halfway between the situation above with bar 1 and bar 2 creating two identical values (namely True) and the situation with Bar 1 and Bar 2 creating two distinct values that, by definition, can't possibly be the "same" in any sense.
You can never have this situation with Haskell constructors. Instead of thinking of a Haskell constructor as running some underlying function to "construct" an object which might involve some cool processing and deriving of field values, you should instead think of a Haskell constructor as a passive tag attached to a value (which may also contain zero or more other values, depending on the arity of the constructor).
So, in your example, Circle 10 20 5 doesn't "construct" an object of type Circle by running some function. It directly creates a tagged object that, in memory, will look something like:
<Circle tag>
<Float value 10>
<Float value 20>
<Float value 5>
(or you can at least pretend that's what it looks like in memory).
The closest you can come to OO constructors in Haskell is using smart constructors. As you note, eventually a smart constructor just calls a regular constructor, because that's the only way to create a value of a given type. No matter what kind of bizarre smart constructor you build to create a Circle, the value it constructs will need to look like:
<Circle tag>
<some Float value>
<another Float value>
<a final Float value>
which you'll need to construct with a plain old Circle constructor call. There's nothing else the smart constructor could return that would still be a Circle. That's just how Haskell works.
Does that help?
I’m going to answer this in a somewhat roundabout way, with an example that I hope illustrates my point, which is that Haskell decouples several distinct ideas that are coupled in OOP under the concept of a “class”. Understanding this will help you translate your experience from OOP into Haskell with less difficulty. The example in OOP pseudocode:
class Person {
    private int id;
    private String name;

    public Person(int id, String name) {
        if (id == 0)
            throw new InvalidIdException();
        if (name == "")
            throw new InvalidNameException();
        this.name = name;
        this.id = id;
    }

    public int getId() { return this.id; }
    public String getName() { return this.name; }
    public void setName(String name) { this.name = name; }
}
In Haskell:
module Person
  ( Person
  , mkPerson
  , getId
  , getName
  , setName
  ) where

data Person = Person
  { personId   :: Int
  , personName :: String
  }

mkPerson :: Int -> String -> Either String Person
mkPerson id name
  | id == 0    = Left "invalid id"
  | name == "" = Left "invalid name"
  | otherwise  = Right (Person id name)
getId :: Person -> Int
getId = personId
getName :: Person -> String
getName = personName
setName :: String -> Person -> Either String Person
setName name person = mkPerson (personId person) name
Notice:
The Person class has been translated to a module which happens to export a data type by the same name—types (for domain representation and invariants) are decoupled from modules (for namespacing and code organisation).
The fields id and name, which are specified as private in the class definition, are translated to ordinary (public) fields on the data definition, since in Haskell they’re made private by omitting them from the export list of the Person module—definitions and visibility are decoupled.
The constructor has been translated into two parts: one (the Person data constructor) that simply initialises the fields, and another (mkPerson) that performs validation—allocation & initialisation and validation are decoupled. Since the Person type is exported, but its constructor is not, this is the only way for clients to construct a Person—it’s an “abstract data type”.
The public interface has been translated to functions that are exported by the Person module, and the setName function that previously mutated the Person object has become a function that returns a new instance of the Person data type that happens to share the old ID. The OOP code has a bug: it should include a check in setName for the name != "" invariant; the Haskell code can avoid this by using the mkPerson smart constructor to ensure that all Person values are valid by construction. So state transitions and validation are also decoupled—you only need to check invariants when constructing a value, because it can’t change thereafter.
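For example, a quick usage sketch of the module above (the values shown in comments assume a Show instance, which the module as written doesn't derive):

demo :: Either String Person
demo = do
  p <- mkPerson 42 "Ada"   -- succeeds: Right (Person 42 "Ada")
  setName "Grace" p        -- still validated: Right (Person 42 "Grace")

bad :: Either String Person
bad = mkPerson 0 "Ada"     -- Left "invalid id"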
So as for your actual questions:
What is actually constructed by this function, specifically?
A constructor of a data type allocates space for the tag and fields of a value, sets the tag to record which constructor was used to create the value, and initialises the fields to the arguments of the constructor. You can't override it because the process is completely mechanical and there's no reason (in normal, safe code) to do so. It's an internal detail of the language and runtime.
Can we define the constructor function?
No—if you want to perform additional validation to enforce invariants, you should use a “smart constructor” function which calls the lower-level data constructor. Because Haskell values are immutable by default, values can be made correct by construction; that is, when you don’t have mutation, you don’t need to enforce that all state transitions are correct, only that all states themselves are constructed correctly. And often you can arrange your types so that smart constructors aren’t even necessary.
The only thing you can change about the generated data constructor “function” is making its type signature more restrictive using GADTs, to help enforce more invariants at compile-time. And as a side note, GADTs also let you do existential quantification, which lets you carry around encapsulated/type-erased information at runtime, exactly like an OOP vtable—so this is another thing that’s decoupled in Haskell but coupled in typical OOP languages.
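Here's a minimal sketch of that existential idea using GADT syntax (the names are made up): the wrapper erases the concrete type but carries a Show dictionary around at runtime, much like a one-method vtable.

{-# LANGUAGE GADTs #-}

data Showable where
  MkShowable :: Show a => a -> Showable

describe :: Showable -> String
describe (MkShowable x) = show x

heterogeneous :: [Showable]
heterogeneous = [MkShowable (42 :: Int), MkShowable "hello", MkShowable True]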
Long story short (too late), you can do all the same things, you just arrange them differently, because Haskell provides the various features of OOP classes under separate orthogonal language features.
I am attempting to create a function in Haskell returning the Resp type illustrated below in a strange mix between BNF and Haskell types.
elem ::= String | (String, String, Resp)
Resp ::= [elem]
My question is (a) how to define this type in Haskell, and (b) if there is a way of doing so without being forced to use custom constructors, e.g., Node, rather using only tuples and arrays.
You said that "the variety of keywords (data, type, newtype) has been confusing for me". Here's a quick primer on the data construction keywords in Haskell.
Data
The canonical way to create a new type is with the data keyword. A general type in Haskell is a union of product types, each of which is tagged with a constructor. For example, an Employee might be a line worker (with a name and a salary) or a manager (with a name, salary and a list of reports).
We use the String type to represent an employee's name, and the Int type to represent a salary. A list of reports is just a list of Employees.
data Employee = Worker String Int
              | Manager String Int [Employee]
Type
The type keyword is used to create type synonyms, i.e. alternate names for the same type. This is typically used to make the source more immediately understandable. For example, we could declare a type Name for employee names (which is really just a String) and Salary for salaries (which are just Ints), and Reports for a list of reports.
type Name = String
type Salary = Int
type Reports = [Employee]
data Employee = Worker Name Salary
              | Manager Name Salary Reports
Newtype
The newtype keyword is similar to the type keyword, but it adds an extra dash of type safety. One problem with the previous block of code is that, although a worker is a combination of a Name and a Salary, there is nothing to stop you using any old String in the Name field (for example, an address). The compiler doesn't distinguish between Names and plain old Strings, which introduces a class of potential bugs.
With the newtype keyword we can make the compiler enforce that the only Strings that can be used in a Name field are the ones explicitly tagged as Names:
newtype Name = Name String
newtype Salary = Salary Int
newtype Reports = Reports [Employee]
data Employee = Worker Name Salary
              | Manager Name Salary Reports
Now if we try to enter a String in the Name field without explicitly tagging it, we get a type error:
>>> let kate = Worker (Name "Kate") (Salary 50000) -- this is ok
>>> let fred = Worker "18 Tennyson Av." (Salary 40000) -- this will fail
<interactive>:10:19:
    Couldn't match expected type `Name' with actual type `[Char]'
    In the first argument of `Worker', namely `"18 Tennyson Av."'
    In the expression: Worker "18 Tennyson Av." (Salary 40000)
    In an equation for `fred':
        fred = Worker "18 Tennyson Av." (Salary 40000)
What's great about this is that because the compiler knows that a Name is really just a String, it optimizes away the extra constructor, so this is just as efficient as using a type declaration -- the extra type safety comes "for free". This requires an important restriction -- a newtype must have exactly one constructor with exactly one field. Otherwise the compiler wouldn't know which constructor or field was the correct synonym!
One disadvantage of using a newtype declaration is that now a Salary is no longer just an Int, you can't directly add them together. For example
>>> let kate'sSalary = Salary 50000
>>> let fred'sSalary = Salary 40000
>>> kate'sSalary + fred'sSalary
<interactive>:14:14:
    No instance for (Num Salary)
      arising from a use of `+'
    Possible fix: add an instance declaration for (Num Salary)
    In the expression: kate'sSalary + fred'sSalary
    In an equation for `it': it = kate'sSalary + fred'sSalary
The somewhat complicated error message is telling you that a Salary isn't a numeric type, so you can't add them together (or at least, you haven't told the compiler how to add them together). One option would be to define a function that gets the underlying Int from the Salary
getSalary :: Salary -> Int
getSalary (Salary sal) = sal
but in fact Haskell will write these for you if you use record syntax when declaring your newtypes
newtype Salary = Salary { getSalary :: Int }
Now you can write
>>> getSalary kate'sSalary + getSalary fred'sSalary
90000
Part 1:
data Elem = El String | Node String String Resp
type Resp = [Elem]
Part 2: Well... kinda. The unsatisfying answer is: you shouldn't want to, because doing so is less type safe. The more direct answer is that Elem needs its own constructor, but Resp is easily defined as a type synonym as above. However, I would recommend
newtype Resp = Resp { getElems :: [Elem] }
so that you can't mix up some random list of Elems with a Resp. This also gives you the function getElems, so you don't have to do as much pattern matching on a single constructor. The newtype basically lets Haskell know that it should get rid of the overhead of the constructor at runtime, so there's no extra indirection, which is nice.
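For what it's worth, building values with these definitions looks like this (a small sketch using the newtype version of Resp):

example :: Resp
example = Resp
  [ El "plain string"
  , Node "key" "value" (Resp [El "nested element"])
  ]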
I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way, or something similar to a "scope resolution operator" like :: or ., so that Haskell can distinguish which data type the name (or any other non-unique field) belongs to and return the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (which seemed to me the correct way at the time). But tuples are hard to work with, as it is impossible to extract individual parts of a complex type as easily as with lists using !!. So the next thing I thought of was dictionaries/hashes.
When I tried using dictionaries, I thought: what is the point of having my own data types then?
Playing with/learning data types, I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of my own data types, as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?
Haskell record syntax is a bit of a hack: each field name becomes a top-level function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
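A couple of hypothetical usages of that parameterised record (the field values are made up):

plainVm :: VmInfo ()
plainVm = VmInfo { vmId = 1, vmName = "vm1", vmInfo = () }

annotatedVm :: VmInfo String
annotatedVm = plainVm { vmInfo = "primary build host" }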
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.
There are more reasons why this doesn't work: lowercase type names and data constructors, and OO-language-style member access with a dot. In Haskell, those member-access functions are actually free functions, i.e. vmName = name vm rather than vmName = vm.name; that's why they can't have the same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
  name  :: m -> String
  index :: m -> String  -- why String anyway? Shouldn't this be an Int?
  id    :: m -> String
and make instances
instance MachineInfo VmInfo where
  name  (VmInfo vmName _ _)  = vmName
  index (VmInfo _ vmIndex _) = vmIndex
  ...

instance MachineInfo HostInfo where
  ...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.
Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.
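A sketch of the module-per-type approach (file and module names are assumed):

-- Vm.hs:    module Vm   where data VmInfo   = VmInfo   { name, index, id :: String }
-- Host.hs:  module Host where data HostInfo = HostInfo { name, index, id :: String }

-- Main.hs:
import qualified Vm
import qualified Host

vmName :: Vm.VmInfo -> String
vmName = Vm.name

hostName :: Host.HostInfo -> String
hostName = Host.name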
Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
  { _ballPos   :: Point
  , _ballSpeed :: Vector
  , _paddle1   :: Float
  , _paddle2   :: Float
  , _score     :: (Int, Int)
  , _vectors   :: [Vector]
    -- Since gloss doesn't cover this, we store the set of pressed keys
  , _keys      :: Set Key
  }
-- Some nice lenses to go with it
makeLenses ''Pong
That generates lenses for accessing the fields (minus the leading underscores) via some Template Haskell magic.
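If you haven't seen Template Haskell lens generation before, here is a minimal, self-contained sketch (the Point2 type is invented for illustration and is not part of the Pong example):

{-# LANGUAGE TemplateHaskell #-}
import Control.Lens

data Point2 = Point2 { _px :: Double, _py :: Double } deriving Show

makeLenses ''Point2   -- generates lenses px and py

example :: Point2
example = Point2 1 2 & px .~ 10     -- set _px to 10
                     & py %~ (+ 1)  -- bump _py by 1

readX :: Double
readX = example ^. px               -- read _px back out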
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
  p <- get

  let paddleMovement = time * paddleSpeed
      keyPressed key = p^.keys.contains (SpecialKey key)

  -- Update the player's paddle based on keys
  when (keyPressed KeyUp)   $ paddle1 += paddleMovement
  when (keyPressed KeyDown) $ paddle1 -= paddleMovement

  -- Calculate the optimal position
  let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
      acc     = accuracy p
      target  = optimal * acc + (p^.ballPos._y) * (1 - acc)
      dist    = target - p^.paddle2

  -- Move the CPU's paddle towards this optimal position as needed
  when (abs dist > paddleHeight/3) $
    case compare dist 0 of
      GT -> paddle2 += paddleMovement
      LT -> paddle2 -= paddleMovement
      _  -> return ()

  -- Make sure both paddles don't leave the playing area
  paddle1 %= clamp (paddleHeight/2)
  paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.
Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here
Setup:
I have several collections of various data structures which represent the state of simulated objects in a virtual system. I also have a number of functions that transform (that is, create a new copy of the object based on the original and 0 or more parameters) these objects.
The goal is to allow a user to select some objects to apply transformations to (within the rules of the simulation), apply those functions to those objects, and update the collections by replacing the old objects with the new ones.
I would like to be able to build up a function of this type by combining smaller transformations into larger ones. Then evaluate this combined function.
Questions:
How do I structure my program to make this possible?
What kind of combinator do I use to build up a transaction like this?
Ideas:
Put all the collections into one enormous structure and pass this structure around.
Use a state monad to accomplish basically the same thing
Use IORef (or one of its more potent cousins like MVar) and build up an IO action
Use a Functional Reactive Programing Framework
1 and 2 seem like they carry a lot of baggage around especially if I envision eventually moving some of the collections into a database. (Darn IO Monad)
3 seems to work well but starts to look a lot like recreating OOP. I'm also not sure at what level to use the IORef (e.g. IORef (Collection Obj), or Collection (IORef Obj), or data Obj { field :: IORef Type }).
4 feels the most functional in style, but it also seems to create a lot of code complexity without much payoff in terms of expressiveness.
Example
I have a web store front. I maintain a collection of products with (among other things) the quantity in stock and a price. I also have a collection of users who have credit with the store.
A user comes along and selects 3 products to buy and goes to check out using store credit. I need to create a new products collection that has the amount in stock for the 3 products reduced, and create a new user collection with the user's account debited.
This means I get the following:
checkout :: Cart -> ProductsCol -> UserCol -> (ProductsCol, UserCol)
But then life gets more complicated and I need to deal with taxes:
checkout :: Cart -> ProductsCol -> UserCol -> TaxCol
         -> (ProductsCol, UserCol, TaxCol)
And then I need to be sure to add the order to the shipping queue:
checkout :: Cart
         -> ProductsCol
         -> UserCol
         -> TaxCol
         -> ShipList
         -> (ProductsCol, UserCol, TaxCol, ShipList)
And so forth...
What I would like to write is something like
checkout = updateStockAmount <*> applyUserCredit <*> payTaxes <*> shipProducts
applyUserCredit = debitUser <*> creditBalanceSheet
but the type-checker would go apoplectic on me. How do I structure this store such that the checkout or applyUserCredit functions remain modular and abstract? I cannot be the only one to have this problem, right?
Okay, let's break this down.
You have "update" functions with types like A -> A for various specific types A, which may be derived from partial application, that specify a new value of some type in terms of a previous value. Each such type A should be specific to what that function does, and it should be easy to change those types as the program develops.
You also have some sort of shared state, which presumably contains all the information used by any of the aforementioned update functions. Further, it should be possible to change what the state contains, without significantly impacting anything other than the functions acting directly on it.
Additionally, you want to be able to abstractly combine update functions, without compromising the above.
We can deduce a few necessary features of a straightforward design:
An intermediate layer will be necessary, between the full shared state and the specifics needed by each function, allowing pieces of the state to be projected out and replaced independently of the rest.
The update functions' types are, by definition, mutually incompatible and share no real structure, so to compose them you'll first need to combine each with its portion of the intermediate layer. This will give you updates acting on the entire state, which can then be composed in the obvious way.
The only operations needed on the shared state as a whole are to interface with the intermediate layer, and whatever may be necessary to maintain the changes made.
This breakdown allows each entire layer to be modular to a large extent; in particular, type classes can be defined to describe the necessary functionality, allowing any relevant instance to be swapped in.
In particular, this essentially unifies your ideas 2 and 3. There's an inherent monadic context of some sort here, and the type class interface suggested would allow multiple approaches, such as:
Make the shared state a record type, store it in a State monad, and use lenses to provide the interface layer.
Make the shared state a record type containing something like an STRef for each piece, and combine field selectors with ST monad update actions to provide the interface layer.
Make the shared state a collection of TChans, with separate threads to read/write them as appropriate to communicate asynchronously with an external data store.
Or any number of other variations.
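To make the second of those approaches concrete, here is a small sketch of the STRef variant (the Store type, field names and prices are all invented for illustration):

import Control.Monad.ST
import Data.STRef

data Store s = Store { stockRef :: STRef s Int, creditRef :: STRef s Int }

buyOne :: Int -> Store s -> ST s ()
buyOne price st = do
  modifySTRef' (stockRef st)  (subtract 1)
  modifySTRef' (creditRef st) (subtract price)

runExample :: (Int, Int)   -- (remaining stock, remaining credit)
runExample = runST $ do
  st <- Store <$> newSTRef 10 <*> newSTRef 100
  buyOne 30 st
  (,) <$> readSTRef (stockRef st) <*> readSTRef (creditRef st)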
You can store your state in a record, and use lenses to update pieces of state. This lets you write the individual state updating components as simple, focused functions that may be composed to build more complex checkout functions.
{-# LANGUAGE TemplateHaskell #-}
import Data.Lens.Template
import Data.Lens.Common
import Data.List (foldl')
import Data.Map ((!), Map, adjust, fromList)
type User = String
type Item = String
type Money = Int -- money in pennies
type Prices = Map Item Money
type Cart = (User, [(Item,Int)])
type ProductsCol = Map Item Int
type UserCol = Map User Money
data StoreState = Store { _stock :: ProductsCol
                        , _users :: UserCol
                        , msrp   :: Prices }
                deriving Show

makeLens ''StoreState

updateProducts :: Cart -> ProductsCol -> ProductsCol
updateProducts (_,c) = flip (foldl' destock) c
  where destock p' (item,count) = adjust (subtract count) item p'

updateUsers :: Cart -> Prices -> UserCol -> UserCol
updateUsers (name,c) p = adjust (subtract (sum prices)) name
  where prices = map (\(itemName, itemCount) -> (p ! itemName) * itemCount) c

checkout :: Cart -> StoreState -> StoreState
checkout c s = (users ^%= updateUsers c (msrp s))
             . (stock ^%= updateProducts c)
             $ s

test = checkout cart store
  where cart  = ("Bob", [("Apples", 2), ("Bananas", 6)])
        store = Store initialStock initialUsers prices
        initialStock = fromList
          [("Apples", 20), ("Bananas", 10), ("Lambdas", 1000)]
        initialUsers = fromList [("Bob", 20000), ("Mary", 40000)]
        prices = fromList [("Apples", 100), ("Bananas", 50), ("Lambdas", 0)]
Has anyone written a generic function so that hash functions can be generated automatically for custom data types (using the deriving mechanism)? A few times, I've written the following kind of boilerplate,
data LeafExpr = Var Name | Star deriving (Eq, Show)
instance Hashable LeafExpr where
  hash (Var name) = 476743 * hash name
  hash Star       = 152857
This could be generated automatically: the basic idea is that whenever adding data, you multiply by a prime, for example with lists,
hash (x:xs) = hash x + 193847 * hash xs
Essentially, what I'd like to write is
data LeafExpr = ... deriving (Hashable)
Edit 1
Thanks for all of the very helpful responses, everyone. I'll try to add a generic method as an exercise when I have time. For now (perhaps what sclv was referring to?), I realized I could write the slightly better code,
instance Hashable LeafExpr where
  hash (Var name) = hash ("Leaf-Var", name)
  hash Star       = hash "Leaf-Star"
Edit 2
Using GHC, multiplying by random primes works much better than the tupling in edit 1. Conflicts with Data.HashTable went from something like 95% (very bad) to 36%. Code is here: http://pastebin.com/WD0Xp0T1 and http://pastebin.com/Nd6cBy6G.
How much speed do you need? You could use one of the packages that use Template Haskell to generate serialization code, convert the value to binary, and then hash the binary array with hashable.
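For example, here's a sketch of that "serialize, then hash the bytes" idea; it uses GHC Generics for the Binary instance instead of Template Haskell, but the shape is the same. (Note that recent versions of hashable can also derive Hashable generically, which gets close to the deriving (Hashable) asked for in the question.)

{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Hashable (hash)
import GHC.Generics (Generic)

type Name = String

data LeafExpr = Var Name | Star deriving (Eq, Show, Generic)

instance Binary LeafExpr   -- generic default implementation

hashLeafExpr :: LeafExpr -> Int
hashLeafExpr = hash . BL.toStrict . encode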