Idiomatic way to modify a member variable - Haskell

I know Haskell isn't OO so it isn't strictly a 'member variable'.
data Foo = Foo {
  bar :: Int,
  moo :: Int,
  meh :: Int,
  yup :: Int
}

modifyBar (Foo b m me y) = Foo b' m me y
  where b' = 2
This is how my code looks at the moment. The problem is I am now making data types with 16 or more members. When I need to modify a single member it results in very verbose code. Is there a way around this?

modifyBar foo = foo { bar = 2 }
This syntax will copy foo, and then modify the bar field of that copy to 2. This could be naturally extended to more fields, so you don't need to write that modifyBar function at all.
(See http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-format.html#id625467)
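The same syntax handles several fields at once. For instance, with the Foo record from the question, a minimal sketch (modifyBarAndMoo is an illustrative name):
modifyBarAndMoo :: Foo -> Foo
modifyBarAndMoo foo = foo { bar = 2, moo = 3 }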

Haskell's "record syntax" that #KennyTM shows is the built-in way to do this, though keep in mind that it's still just a way of constructing a new value based on the old one.
There are some annoying limitations to record syntax, though, particularly that the form used to "modify" a single item in a record aren't first-class entities in the language, so you can't abstract over them and pass them around the way you'd do with a regular function.
An alternative is using a library such as fclabels which provides similar functionality, using Template Haskell to auto-generate accessor functions instead of built-in syntax. The result is often much nicer, with the downside that you now have a dependency on TH....
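To make the limitation concrete: the update syntax itself can't be passed around, but ordinary functions wrapping it can, and that is roughly what such libraries generate for you. A hand-written sketch (the names are illustrative, not fclabels' actual API):
setBar :: Int -> Foo -> Foo
setBar x foo = foo { bar = x }

modifyBarWith :: (Int -> Int) -> Foo -> Foo
modifyBarWith f foo = foo { bar = f (bar foo) }
These are first-class values: you can store them, compose them, and pass them to other functions.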

Related

"Embedding/inheriting" one `data` constructor in another?

Consider the following fragment:
data File
  = NoFile
  | FileInfo {
      path :: FilePath,
      modTime :: Data.Time.Clock.UTCTime
    }
  | FileFull {
      path :: FilePath,
      modTime :: Data.Time.Clock.UTCTime,
      content :: String
    }
  deriving Eq
That duplication is a bit of a "wart", though in this one-off instance not particularly painful. To further improve my understanding of Haskell's rich type system: what would be preferred "clean"/"idiomatic" approaches to refactoring this, other than (a) creating a separate record type for the two duplicated fields and replacing them with a single field of that new type, or (b) replacing the FileFull record notation with something like | FileFull File String, which wouldn't be quite clean either (here one would only want FileInfo inside, not NoFile)?
(Both these "naive" approaches would be somewhat intrusive/annoying with respect to having to then fix up many modules manually throughout the rest of the code-base here.)
One thing I considered would be parameterizing like so:
data File a
  = NoFile
  | FileMaybeWithContent {
      path :: FilePath,
      modTime :: Data.Time.Clock.UTCTime,
      content :: a
    }
  deriving Eq
Then for those "just info, not loaded" contexts a would be (), otherwise String. That seems too general anyway: we want either String or nothing, which leads us to Maybe and does away with the a parameter once again.
Of course we've been there before: content could just be done with Maybe String of course, then "refactor any compile errors away" and "done". That'll probably be the order of the day, but knowing Haskell and the many funky GHC extensions.. who knows just what exotic theoretic trick/axiom/law I've been missing, right?! See, the differently-named "semantic insta-differentiator" between a "just meta-data info" value and a "file content with meta info" value does work well throughout the rest of the code-base as far as eased comprehension.
(And yes, I perhaps should have removed NoFile and used Maybe Files throughout, but then... not sure whether there's really a solid reason to do so and a different question altogether anyway..)
All of the following are equivalent/isomorphic, as I think you've discovered:
data F = U | X A B | Y A B C
data F = U | X AB | Y AB C
data AB = AB A B
data F = U | X A B (Maybe C)
So the color of the bike shed really depends on the context (e.g. do you have use for an AB elsewhere?) and your own aesthetic preferences.
It might clarify things, and help you understand what you're doing, to have some sense of the algebra of algebraic data types.
We call types like Either "sum types" and types like (,) "product types", and they are subject to the same kinds of transformations you're familiar with from arithmetic, like factoring:
f = 1 + (a * b) + (a * b * c)
  = 1 + ((a * b) * (1 + c))
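For instance, the factored form corresponds to Maybe (A, B, Maybe C). A sketch of the isomorphism, with placeholder types so it stands alone (toFactored and fromFactored are illustrative names):
-- Placeholder types to make the sketch self-contained:
data A = A
data B = B
data C = C

data F = U | X A B | Y A B C

toFactored :: F -> Maybe (A, B, Maybe C)
toFactored U         = Nothing
toFactored (X a b)   = Just (a, b, Nothing)
toFactored (Y a b c) = Just (a, b, Just c)

fromFactored :: Maybe (A, B, Maybe C) -> F
fromFactored Nothing                = U
fromFactored (Just (a, b, Nothing)) = X a b
fromFactored (Just (a, b, Just c))  = Y a b c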
As others have noted, the NoFile constructor is probably not necessary, but you can keep it if you want. If you feel your code is more readable and/or better understood with it, then I say keep it.
Now the trick with combining the other two constructors is by hiding the content field. You were on the right track by parameterizing File, but that alone isn't enough since then we can have File Foo, File Bar, etc. Fortunately, GHC has some nifty ways to help us.
I'll write out the code here and then explain how it works.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}

import Data.Void
import Data.Time.Clock (UTCTime)

data Desc = Info | Full

type family Content (a :: Desc) where
  Content Full = String
  Content Info = Void

data File a = File
  { path    :: FilePath
  , modTime :: UTCTime
  , content :: Content a
  }
There are a few things going on here.
First, note that in the File record, the content field now has type Content a instead of just a. Content is a type family, which is (in my opinion) a confusing name for type-level function. That is, the compiler replaces Content a with some other type based on what a is and how we've defined Content.
We defined Content Full to be String, so that when we have a value f1 :: File Full, its content field will have a String value. On the other hand, f2 :: File Info will have a content field with type Void which has no values.
Cool right? But what's preventing us from having File Foo now?
That's where DataKinds comes to the rescue. It "promotes" the data type Desc to a kind (the "type of a type" in Haskell) and the constructors Info and Full to types of kind Desc, instead of merely values of type Desc.
Notice in the declaration of Content that I have annotated a. It looks like a type annotation, but a is already a type. This is a kind annotation. It forces a to be something of kind Desc and the only types of kind Desc are Info and Full.
By now you're probably totally sold on how awesome this is, but I should warn you there's no free lunch. In particular, this is a compile-time construction. Your single File type becomes two different types. This can cause other related logic (producers and consumers of File records) to become complicated. If your use case doesn't mix File Info records with File Full records, then this is the way to go. On the other hand, if you want to do something like have a list of File records which can be a mixture of both types, then you're better off just making the type of your content field Maybe String.
Another thing: how exactly do you make a File Info, since there's no value of Void to use for the content field? Well, technically it should be OK to use undefined or error "this should never happen", since (morally) nothing can ever demand that field's value, but if that makes you feel uneasy (and it probably should), then just replace Void with (). Unit is almost as useless and doesn't require bottom 'values'.
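For concreteness, if you take the () suggestion (i.e. define Content Info as () rather than Void), building both kinds of value might look like this sketch (mkInfo and mkFull are illustrative names):
-- Assumes: Content Info = () (the () variant suggested above).
mkInfo :: FilePath -> UTCTime -> File Info
mkInfo p t = File { path = p, modTime = t, content = () }

mkFull :: FilePath -> UTCTime -> String -> File Full
mkFull p t c = File { path = p, modTime = t, content = c }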

What is the purpose of including the type in its definition in Haskell?

I'm a beginner in Haskell and I wonder about the right way to define a new type. Suppose I want to define a Point type. In an imperative language, it's usually the equivalent of:
data Point = Int Int
However in Haskell I usually see definitions such as:
data Point = Point Int Int
What are the differences and when should each approach be used?
In OO languages you can define a class with something like this:
class Point {
  int x, y;
  Point(int x, int y) { ... }
}
It's similar:
data Point = ...
is the type definition (similar to class Point above), and
... = Point Int Int
is the constructor. You can also give the constructor a different name, but you need a name regardless:
data Point = P Int Int
The data definitions are, ultimately, tagged unions. For example:
data Maybe a = Nothing | Just a
Now how would you write this type using your syntax?
Moreover, in Haskell you can pattern match on these values and see which constructor was used to build them. The constructor name is needed for pattern matching, and if the type has just one constructor it often reuses the type's name.
For example:
let x = someOperationReturningMaybe
in case x of
     Nothing -> 0
     Just y  -> y + 5
This is different from a plain union type, such as C's union, where you can say "this thing is either an int or a float" but have no way to know which one it actually is (except by keeping track of the state by hand).
Writing the code above with a C union, you have no way to use a case to perform different actions depending on the constructor used; you have to keep track explicitly of what type is contained in x and use an if.
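A small sketch of the same idea with the single-constructor Point from the question: the constructor name is what you use both to build and to match (pointX is an illustrative name):
data Point = Point Int Int

origin :: Point
origin = Point 0 0

pointX :: Point -> Int
pointX (Point x _) = x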

How do I restrict the types in a heterogeneous list?

I am currently trying to create a (sort of) typesafe XML-like syntax embedded in Haskell. In the end I am hoping to achieve something like this:
tree = group [arg1 "str", arg2 42]
[item [foo, bar] []
,item [foo, bar] []
]
where group and item are of kind Node :: [Arg t] -> [Node c] -> Node t. If this doesn't make any sense it is most probably because I have no idea what I am doing :)
My question now is how to make the type system prevent me from giving 'wrong' arguments to a Node. E.g. Nodes of type Group may only have arguments of type Arg1 and Arg2, but Items may have arguments of type Foo and Bar.
I guess the bottom-line question is: how do I restrict the types in a heterogeneous list?
Example of the (user) syntax I am trying to achieve:
group .: arg1 "str" .: arg2 42
item .: foo .: bar
item .: foo .: bar
where (.:) is a function that sets the parameter in the node. This would represent a group with some parameters containing two items.
Additionally there would be some (pseudo) definition like:
data Node = Node PossibleArguments PossibleChildNodes
type Group = Node [Arg1, Arg2] [Item]
type Item = Node [Foo, Bar] []
I am searching for a way to catch usage errors by the typechecker.
What you have doesn't sound to me like you need a heterogeneous list. Maybe you're looking for something like this?
data Foo = Foo Int
data Bar = Bar Int
data Arg = StringArg String | IntArg Int | DoubleArg Double
data Tree = Group Arg Arg [Item]
data Item = Item Foo Bar
example :: Tree
example = Group (StringArg "str") (IntArg 42)
            [Item (Foo 1) (Bar 2), Item (Foo 12) (Bar 36)]
Note that we could even create a list of Args of different "sub-types". For example, [StringArg "hello", IntArg 3, DoubleArg 12.0]. It would still be a homogeneous list, though.
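To illustrate that last point, here is a sketch of a function consuming such a list uniformly (renderArg is an illustrative name):
renderArg :: Arg -> String
renderArg (StringArg s) = s
renderArg (IntArg n)    = show n
renderArg (DoubleArg d) = show d

-- Works over the "mixed" but still homogeneous list:
rendered :: [String]
rendered = map renderArg [StringArg "hello", IntArg 3, DoubleArg 12.0]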
===== EDIT =====
There are a few ways you could handle the "default argument" situation. Suppose the Bar argument in an item is optional. My first thought is that while it may be optional for the user to specify it, when I store the data I want to include the default argument. That way, determining the default is separated from the code that actually does something with it. So, if the user specifies a Foo of 3 but doesn't supply a Bar, and the default is Bar 77, then I create my item as:
Item (Foo 3) (Bar 77)
This has the advantage that functions that operate on this object don't need to worry about defaults; both parameters will always be present as far as they are concerned.
However, if you really want to omit the default arguments in your data structure, you could do something like this:
data Bar = Bar Int | DefaultBar
example = Group (StringArg "str") (IntArg 42)
            [Item (Foo 1) (Bar 2), Item (Foo 12) DefaultBar]
Or even:
data Item = Item Foo Bar | ItemWithDefaultBar Foo
===== Edit #2 =====
So perhaps you could use something like this:
data ComplicatedItem = ComplicatedItem
  {
    location :: (Double, Double),
    size :: Int,
    rotation :: Double,
    . . . and so on . . .
  }

defaultComplicatedItem = ComplicatedItem { location = (0.0,0.0), size = 1, rotation = 0.0, ... }
To create a ComplicatedItem, the user only has to specify the non-default parameters:
myComplicatedItem = defaultComplicatedItem { size=3 }
If you add new parameters to the ComplicatedItem type, you need to update defaultComplicatedItem, but the definition of myComplicatedItem doesn't change.
You could also override the show function so that it omits the default parameters when printing.
Based on the ensuing discussion, it sounds like what you want is to create a DSL (Domain-Specific Language) to represent XML.
One option is to embed your DSL in Haskell so it can appear in Haskell source code. In general, you can do this by defining the types you need and providing a set of functions to work with those types. It sounds like this is what you're hoping to do. However, as an embedded DSL, it will be subject to some constraints, and this is the problem you're encountering. Perhaps there is a clever trick to do what you want, maybe something involving type functions, but I can't think of anything at present. If you want to keep trying, maybe add the tags dsl and gadt to your question to catch the attention of people who know more about this stuff than I do. Alternatively, you might be able to use something like Template Haskell or Scrap Your Boilerplate to allow your users to omit some information, which would then be "filled in" before Haskell "sees" it.
Another option is to have an external DSL, which you parse using Haskell. You could define a DSL, but maybe it would be easier to just use XML directly with a suitable DTD. There are Haskell libraries for parsing XML, of course.

Data value dependencies, updates and memoisation

I'm sorry this problem description is so abstract: its for my job, and for commercial confidentiality reasons I can't give the real-world problem, just an abstraction.
I've got an application that receives messages containing key-value pairs. The keys are from a defined set of keywords, and each keyword has a fixed data type. So if "Foo" is an Integer and "Bar" is a date you might get a message like:
Foo: 234
Bar: 24 September 2011
A message may have any subset of the keys in it. The number of keys is fairly large (several dozen), but let's stick with Foo and Bar for now.
Obviously there is a record like this corresponding to the messages:
data MyRecord = MyRecord {
  foo :: Maybe Integer,
  bar :: Maybe UTCTime
  -- ... and so on for several dozen fields.
}
The record uses "Maybe" types because a given field's value may not have been received yet.
I also have many derived values that I need to compute from the current values (if they exist). For instance I want to have
baz :: MyRecord -> Maybe String
baz r = do -- Maybe monad
  f <- foo r
  b <- bar r
  return $ show f ++ " " ++ show b
Some of these functions are slow, so I don't want to repeat them unnecessarily. I could recompute baz for each new message and memo it in the original structure, but if a message leaves the foo and bar fields unchanged then that is wasted CPU time. Conversely I could recompute baz every time I want it, but again that would waste CPU time if the underlying arguments have not changed since last time.
What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change. I could detect this manually by noting that baz depends only on foo and bar, and so only recomputing it on messages that change those values, but for complicated functions that is error-prone.
An added wrinkle is that some of these functions may have multiple strategies. For instance you might have a value that can be computed from either Foo or Bar using 'mplus'.
Does anyone know of an existing solution to this? If not, how should I go about it?
I'll assume that you have one "state" record and that these messages all involve updating it as well as setting it. So if Foo is 12, it may later be 23, and therefore the output of baz would change. If any of this is not the case, then the answer becomes pretty trivial.
Let's start with the "core" of baz -- a function not on a record, but the values you want.
baz :: Int -> Int -> String
Now let's transform it:
data Cached a b = Cached (Maybe (a, b)) (a -> b)

getCached :: Eq a => Cached a b -> a -> (b, Cached a b)
getCached c@(Cached (Just (arg, res)) _) x | x == arg = (res, c)
getCached (Cached _ f) x = let ans = f x in (ans, Cached (Just (x, ans)) f)

bazC :: Cached (Int, Int) String
bazC = Cached Nothing (uncurry baz)
Now whenever you would use a normal function, you use a cache-transformed function instead, substituting the resulting cache-transformed function back into your record. This is essentially a manual memotable of size one.
For the basic case you describe, this should be fine.
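As a usage sketch (the state record and field names here are illustrative, not part of the answer above): keep the cache in your state record and thread the updated cache back in whenever you read the value.
data AppState = AppState
  { stFoo :: Int
  , stBar :: Int
  , stBaz :: Cached (Int, Int) String  -- starts out as bazC
  }

currentBaz :: AppState -> (String, AppState)
currentBaz st =
  let (answer, cache') = getCached (stBaz st) (stFoo st, stBar st)
  in  (answer, st { stBaz = cache' })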
A fancier and more generalized solution involving a dynamic graph of dependencies goes under the name "incremental computation" but I've seen research papers for it more than serious production implementations. You can take a look at these for starters, and follow the reference trail forward:
http://www.carlssonia.org/ogi/Adaptive/
http://www.andres-loeh.de/Incrementalization/paper_final.pdf
Incremental computation is actually also very closely related to functional reactive programming, so you can take a look at Conal Elliott's papers on that, or play with Heinrich Apfelmus' reactive-banana library: http://www.haskell.org/haskellwiki/Reactive-banana
In imperative languages, take a look at trellis in python: http://pypi.python.org/pypi/Trellis or Cells in lisp: http://common-lisp.net/project/cells/
You can build a stateful graph that corresponds to computations you need to do. When new values appear you push these into the graph and recompute, updating the graph until you reach the outputs. (Or you can store the value at the input and recompute on demand.) This is a very stateful solution but it works.
Are you perhaps creating market data, like yield curves, from live inputs of rates etc.?
What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change.
It sounds to me like you want a variable that is sort of immutable, but allows a one-time mutation from "nothing computed yet" to "computed". Well, you're in luck: this is exactly what lazy evaluation gives you! So my proposed solution is quite simple: just extend your record with fields for each of the things you want to compute. Here's an example of such a thing, where the CPU-intensive task we're doing is breaking some encryption scheme:
data Foo = Foo
  { ciphertext :: String
  , plaintext  :: String
  }

-- a smart constructor for Foos; 'crack' stands for the CPU-intensive computation
foo c = Foo { ciphertext = c, plaintext = crack c }
The point here is that calls to foo have expenses like this:
If you never ask for the plaintext of the result, it's cheap.
On the first call to plaintext, the CPU churns a long time.
On subsequent calls to plaintext, the previously computed answer is returned immediately.
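Applied to the record from the question, the same trick might look like the following sketch (mkMyRecord and bazField are illustrative names, and only two of the underlying fields are shown):
import Data.Time.Clock (UTCTime)

data MyRecord = MyRecord
  { foo      :: Maybe Integer
  , bar      :: Maybe UTCTime
  , bazField :: Maybe String  -- computed lazily, at most once, by the smart constructor
  }

mkMyRecord :: Maybe Integer -> Maybe UTCTime -> MyRecord
mkMyRecord f b = MyRecord { foo = f, bar = b, bazField = baz' }
  where
    baz' = do
      x <- f
      y <- b
      return (show x ++ " " ++ show y)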

Any nice record handling tricks in Haskell?

I'm aware of partial updates for records like :
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on a subrecord?
In particular, I found that the first of these lines works but the second does not :
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.
Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.
You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.
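A minimal sketch of that "default value plus partial update" approach, using the A type from the question (defaultA is an illustrative name):
defaultA :: A Int ()
defaultA = A { a = 0, b = () }

-- Callers mention only the fields they care about:
z :: A Int ()
z = defaultA { a = 1 }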
