Any nice record handling tricks in Haskell?

I'm aware of partial updates for records like :
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on subrecord?
In particular, I found that the first of these lines works but the second does not :
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.

Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
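A minimal sketch of that partial initialisation, reusing the A type from the question (the name partial is made up for illustration). GHC normally only warns about the missing field, and thanks to laziness the record is usable as long as nobody touches b:

```haskell
-- The record type from the question.
data A a b = A { a :: a, b :: b }

-- Only the field a is given; GHC warns about the missing field b
-- and fills it with a call to error.
partial :: A Int ()
partial = A { a = 1 }

-- Reading the initialised field is fine; evaluating (b partial) would throw.
firstField :: Int
firstField = a partial
```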
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.

You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.
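The "default value plus partial update" idea could look roughly like this. Config, defaultConfig and the field names are hypothetical, chosen only for the example:

```haskell
-- A hypothetical record with three fields.
data Config = Config
  { host    :: String
  , port    :: Int
  , verbose :: Bool
  } deriving (Show, Eq)

-- One fully initialised "default" value.
defaultConfig :: Config
defaultConfig = Config { host = "localhost", port = 8080, verbose = False }

-- "Partial initialisation": override only the fields you care about.
custom :: Config
custom = defaultConfig { port = 9000 }
```

Every field always has a value, so nothing is left undefined, and adding a field later only requires touching defaultConfig.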

Related

Haskell conversion between types

Again stuck on something probably theoretical. There are many libraries in Haskell, and I'd like to use as few as possible. If I have a type like this:
data Note = Note { _noteID :: Int
, _noteTitle :: String
, _noteBody :: String
, _noteSubmit :: String
} deriving Show
And use that to create a list like [Note {_noteID=1...}, Note {_noteID=2...}], et cetera. I now have a list of type [Note]. Now I want to write it to a file using writeFile. GHC will probably not allow it, considering writeFile has type FilePath -> String -> IO (). But I also want to avoid deconstructing (writeFile) and constructing (readFile) the types all the time, assuming I will not leave the Haskell 'realm'. Is there a way to do that, without using special libs? Again: thanks a lot. Books on Haskell are good, but StackOverflow is the glue between the books and the real world.
If you're looking for a "quick fix", for a one-off script or something like that, you can derive Read in addition to Show, and then you'll be able to use show to convert to String and read to convert back, for example:
data D = D { x :: Int, y :: Bool }
deriving (Show, Read)
d1 = D 42 True
s = show d1
-- s == "D {x = 42, y = True}"
d2 :: D
d2 = read s
-- d2 == d1
However, please, please don't put this in production code. First, you're implicitly relying on how the record is coded, and there are no checks to protect from subtle changes. Second, the read function is partial - that is, it will crash if it can't parse the input. And finally, if you persist your data this way, you'll be stuck with this record format and can never change it.
For a production-quality solution, I'm sorry, but you'll have to come up with an explicit, documented serialization format. No way around it - in any language.
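On the point that read is partial: even for a quick script, readMaybe from Text.Read (part of base) returns a Maybe instead of crashing. A small sketch, where parseD is a made-up helper name:

```haskell
import Text.Read (readMaybe)

data D = D { x :: Int, y :: Bool }
  deriving (Show, Read, Eq)

-- Total version of read: Nothing on malformed input instead of an exception.
parseD :: String -> Maybe D
parseD = readMaybe

-- Round trip through the derived Show/Read instances.
roundTrip :: Maybe D
roundTrip = parseD (show (D 42 True))
```

This only removes the crash; the other caveats about relying on the derived format still apply.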

"Embedding/inheriting" one `data` constructor in another?

Consider the following fragment:
data File
= NoFile
| FileInfo {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
}
| FileFull {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime,
content :: String
}
deriving Eq
That duplication is a bit of a "wart", though in this one-off instance not particularly painful. In order to further improve my understanding of Haskell's rich type system, what might be preferred "clean"/"idiomatic" approaches for refactoring other than either simply creating a separate data record type for the 2 duplicate fields (then replacing them with single fields of that new data type) or replacing the FileFull record notation with something like | FileFull File String, which wouldn't be quite clean either (as here one would only want FileInfo in there for example, not NoFile)?
(Both these "naive" approaches would be somewhat intrusive/annoying with respect to having to then fix up many modules manually throughout the rest of the code-base here.)
One thing I considered would be parameterizing like so:
data File a
= NoFile
| FileMaybeWithContent {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime,
content :: a
}
deriving Eq
Then for those "just info, not loaded" contexts a would be (), otherwise String. Seems too general anyway, we want either String or nothing, leading us to Maybe, doing once again away with the a parameter.
Of course we've been there before: content could just be done with Maybe String of course, then "refactor any compile errors away" and "done". That'll probably be the order of the day, but knowing Haskell and the many funky GHC extensions.. who knows just what exotic theoretic trick/axiom/law I've been missing, right?! See, the differently-named "semantic insta-differentiator" between a "just meta-data info" value and a "file content with meta info" value does work well throughout the rest of the code-base as far as eased comprehension.
(And yes, I perhaps should have removed NoFile and used Maybe Files throughout, but then... not sure whether there's really a solid reason to do so and a different question altogether anyway..)
All of the following are equivalent/isomorphic, as I think you've discovered:
data F = U | X A B | Y A B C
data F = U | X AB | Y AB C
data AB = AB A B
data F = U | X A B (Maybe C)
So the color of the bike shed really depends on the context (e.g. do you have use for an AB elsewhere?) and your own aesthetic preferences.
It might clarify things and help you understand what you're doing to have some sense of the algebra of algebraic data types.
We call types like Either "sum types" and types like (,) "product types", and they are subject to the same kinds of transformations you're familiar with, like factoring:
f = 1 + (a * b) + (a * b * c)
= 1 + ((a * b) * ( 1 + c))
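To make the factoring concrete, here is a sketch of the two shapes as Haskell types with conversions in both directions. The names F1, F2, toF2 and fromF2 are invented for the example, and type parameters stand in for A, B and C:

```haskell
-- 1 + (a * b) + (a * b * c)
data F1 a b c = U1 | X1 a b | Y1 a b c
  deriving (Show, Eq)

-- 1 + ((a * b) * (1 + c)), with Maybe c playing the role of (1 + c)
data F2 a b c = U2 | XY a b (Maybe c)
  deriving (Show, Eq)

toF2 :: F1 a b c -> F2 a b c
toF2 U1         = U2
toF2 (X1 a b)   = XY a b Nothing
toF2 (Y1 a b c) = XY a b (Just c)

fromF2 :: F2 a b c -> F1 a b c
fromF2 U2                = U1
fromF2 (XY a b Nothing)  = X1 a b
fromF2 (XY a b (Just c)) = Y1 a b c
```

Both functions are total and undo each other, which is what "isomorphic" means here.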
As others have noted, the NoFile constructor is probably not necessary, but you can keep it if you want. If you feel your code is more readable and/or better understood with it, then I say keep it.
Now the trick with combining the other two constructors is by hiding the content field. You were on the right track by parameterizing File, but that alone isn't enough since then we can have File Foo, File Bar, etc. Fortunately, GHC has some nifty ways to help us.
I'll write out the code here and then explain how it works.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}
import Data.Time.Clock (UTCTime)
import Data.Void
data Desc = Info | Full
type family Content (a :: Desc) where
Content Full = String
Content _ = Void
data File a = File
{ path :: FilePath
, modTime :: UTCTime
, content :: Content a
}
There are a few things going on here.
First, note that in the File record, the content field now has type Content a instead of just a. Content is a type family, which is (in my opinion) a confusing name for a type-level function. That is, the compiler replaces Content a with some other type based on what a is and how we've defined Content.
We defined Content Full to be String, so that when we have a value f1 :: File Full, its content field will have a String value. On the other hand, f2 :: File Info will have a content field with type Void which has no values.
Cool right? But what's preventing us from having File Foo now?
That's where DataKinds comes to the rescue. It "promotes" the data type Desc to a kind (the type of types in Haskell) and its data constructors, Info and Full, to types of kind Desc instead of merely values of type Desc.
Notice in the declaration of Content that I have annotated a. It looks like a type annotation, but a is already a type. This is a kind annotation. It forces a to be something of kind Desc and the only types of kind Desc are Info and Full.
By now you're probably totally sold on how awesome this is, but I should warn you there's no free lunch. In particular, this is a compile-time construction. Your single File type becomes two different types. This can cause other related logic (producers and consumers of File records) to become complicated. If your use case doesn't mix File Info records with File Full records, then this is the way to go. On the other hand, if you want to do something like have a list of File records which can be a mixture of both types, then you're better off just making the type of your content field Maybe String.
Another thing is, how exactly do you make a File Info since there's no value of Void to use for the content field? Well, technically it should be ok to use undefined or error "this should never happen", since (morally) nothing should ever inspect a value of type Void, but if that makes you feel uneasy (and it probably should), then just replace Void with (). Unit is almost as useless and doesn't require bottom 'values'.
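Putting the pieces together with () in place of Void, a self-contained sketch might look like this. To avoid depending on the time package, modTime is an Integer stand-in rather than UTCTime, and infoOnly, loaded and contentLength are hypothetical names:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

data Desc = Info | Full

-- Type-level function: which type the content field has for each Desc.
type family Content (a :: Desc) where
  Content 'Full = String
  Content 'Info = ()

data File (a :: Desc) = File
  { path    :: FilePath
  , modTime :: Integer    -- stand-in for UTCTime (assumption for this sketch)
  , content :: Content a
  }

-- Metadata only: the content field can only hold ().
infoOnly :: File 'Info
infoOnly = File { path = "notes.txt", modTime = 0, content = () }

-- Metadata plus loaded content.
loaded :: File 'Full
loaded = File { path = "notes.txt", modTime = 0, content = "hello" }

-- Accepts only files whose content has been loaded;
-- passing infoOnly here is rejected at compile time.
contentLength :: File 'Full -> Int
contentLength = length . content
```

Something like File Int is a kind error, because Int does not have kind Desc.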

What is the purpose of including the type in its definition in haskell?

I'm a beginner in haskell and I wonder about the right way to define a new type. Suppose I want to define a Point type. In an imperative language, it's usually the equivalent of:
data Point = Int Int
However in haskell I usually see definitions such as:
data Point = Point Int Int
What are the differences and when should each approach be used?
In OO languages you can define a class with something like this
class Point {
int x,y;
Point(int x, int y) {...
}
it's similar
data Point = ...
is the type definition (similar to class Point above), and
... = Point Int Int
is the constructor, you can also define the constructor with a different name, but you need a name regardless.
data Point = P Int Int
The data definitions are, ultimately, tagged unions. For example:
data Maybe a = Nothing | Just a
Now how would you write this type using your syntax?
Moreover, the fact remains that in Haskell you can pattern match on these values and see which constructor was used to build a value. The name of the constructor is needed for pattern matching, and if the type has just one constructor it often re-uses the same name as the type.
For example:
let x = someOperationReturningMaybe
in case x of
Nothing -> 0
Just y -> y+5
This is different from a plain union type, such as C's union, where you can say "this thing is either an int or a float" but you have no way to know which one it actually is (except by keeping track of the state by hand).
Writing the code above using a C union you have no way to use a case to perform different actions depending on the constructor used, and you have to keep track explicitly what type is contained in that x and use an if.
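As a further sketch of why the constructor name matters, here is the Point type used inside a type with two constructors. Shape, Circle, Rect and describe are made up for illustration:

```haskell
-- A single-constructor type: the constructor re-uses the type's name.
data Point = Point Int Int
  deriving (Show, Eq)

-- A tagged union: each value remembers which constructor built it.
data Shape
  = Circle Point Int      -- centre and radius
  | Rect Point Point      -- two opposite corners

-- Pattern matching dispatches on the constructor tag
-- and unpacks the fields at the same time.
describe :: Shape -> String
describe (Circle _ r) = "circle of radius " ++ show r
describe (Rect (Point x1 y1) (Point x2 y2)) =
  "rectangle " ++ show (abs (x2 - x1)) ++ "x" ++ show (abs (y2 - y1))
```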

Haskell: Confusion with own data types. Record syntax and unique fields

I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way or something similar to a "scope resolution operator", like :: or ., etc., so that Haskell distinguishes which data type the name (or any other non-unique field) belongs to and returns the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (seemed to me the correct way at the time). But tuples are hard to work with as it is impossible to extract individual parts of a complex type as easy as with the lists using "!!". So next thing I thought of the dictionaries/hashes.
When I tried using dictionaries I thought what is the point of having own data types then?
Playing/learning data types I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of own data types as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?
Haskell record syntax is a bit of a hack, but the field name emerges as a function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.
There are more reasons why this doesn't work: lowercase type names and data constructors, and OO-language-style member access with a dot. In Haskell, those member access functions are actually free functions, i.e. vmName = name vm rather than vmName = vm.name; that's why they can't have the same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
name :: m -> String
index :: m -> String -- why String anyway? Shouldn't this be an Int?
id :: m -> String
and make instances
instance MachineInfo VmInfo where
name (VmInfo vmName _ _) = vmName
index (VmInfo _ vmIndex _) = vmIndex
...
instance MachineInfo HostInfo where
...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.
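A complete version of that idea might look like the sketch below. It assumes the records get distinct, prefixed field names so they do not clash with the class methods, and it calls the third method ident to avoid shadowing Prelude's id; describe is a hypothetical helper:

```haskell
-- Distinct, prefixed field names so both records can live in one module.
data VmInfo   = VmInfo   { vmName, vmIndex, vmId :: String }       deriving Show
data HostInfo = HostInfo { hostName, hostIndex, hostId :: String } deriving Show

-- The overloaded accessors live in a type class.
class MachineInfo m where
  name  :: m -> String
  index :: m -> String
  ident :: m -> String   -- named ident rather than id (assumption for this sketch)

instance MachineInfo VmInfo where
  name  = vmName
  index = vmIndex
  ident = vmId

instance MachineInfo HostInfo where
  name  = hostName
  index = hostIndex
  ident = hostId

-- Works for any machine type that has an instance.
describe :: MachineInfo m => m -> String
describe m = name m ++ " (#" ++ index m ++ ")"
```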
Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.
Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
{ _ballPos :: Point
, _ballSpeed :: Vector
, _paddle1 :: Float
, _paddle2 :: Float
, _score :: (Int, Int)
, _vectors :: [Vector]
-- Since gloss doesn't cover this, we store the set of pressed keys
, _keys :: Set Key
}
-- Some nice lenses to go with it
makeLenses ''Pong
That makes lenses to access the members without the underscores via some TemplateHaskell magic.
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
p <- get
let paddleMovement = time * paddleSpeed
keyPressed key = p^.keys.contains (SpecialKey key)
-- Update the player's paddle based on keys
when (keyPressed KeyUp) $ paddle1 += paddleMovement
when (keyPressed KeyDown) $ paddle1 -= paddleMovement
-- Calculate the optimal position
let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
acc = accuracy p
target = optimal * acc + (p^.ballPos._y) * (1 - acc)
dist = target - p^.paddle2
-- Move the CPU's paddle towards this optimal position as needed
when (abs dist > paddleHeight/3) $
case compare dist 0 of
GT -> paddle2 += paddleMovement
LT -> paddle2 -= paddleMovement
_ -> return ()
-- Make sure both paddles don't leave the playing area
paddle1 %= clamp (paddleHeight/2)
paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.
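To get a feel for what a lens is without installing anything, here is a hand-rolled sketch using only base. It follows the van Laarhoven style of representation; the cut-down Pong record and the hand-written paddle1 are stand-ins for what makeLenses would generate, and view and over here are simplified local definitions, not the library's own:

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A lens focuses on a part a inside a structure s.
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Read the focused part.
view :: Lens s a -> s -> a
view l = getConst . l Const

-- Apply a function to the focused part, returning an updated copy.
over :: Lens s a -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

-- Cut-down record for illustration only.
data Pong = Pong
  { _paddle1 :: Float
  , _score   :: (Int, Int)
  } deriving (Show, Eq)

-- Hand-written equivalent of a generated lens for _paddle1.
paddle1 :: Lens Pong Float
paddle1 f p = fmap (\v -> p { _paddle1 = v }) (f (_paddle1 p))

-- Because lenses are ordinary values, they can be passed to functions.
nudge :: Lens s Float -> s -> s
nudge l = over l (+ 1)
```

Unlike record update syntax, paddle1 is an ordinary value that can be passed around and composed.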
Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here

Idiomatic way to modify a member variable

I know Haskell isn't OO so it isn't strictly a 'member variable'.
data Foo = Foo {
bar :: Int,
moo :: Int,
meh :: Int,
yup :: Int
}
modifyBar (Foo b m me y) = (Foo b' m me y)
where b' = 2
This is how my code looks at the moment. The problem is I am now making data types with 16 or more members. When I need to modify a single member it results in very verbose code. Is there a way around this?
modifyBar foo = foo { bar = 2 }
This syntax will copy foo, and then modify the bar field of that copy to 2. This could be naturally extended to more fields, so you don't need to write that modifyBar function at all.
(See http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-format.html#id625467)
Haskell's "record syntax" that @KennyTM shows is the built-in way to do this, though keep in mind that it's still just a way of constructing a new value based on the old one.
There are some annoying limitations to record syntax, though, particularly that the forms used to "modify" a single item in a record aren't first-class entities in the language, so you can't abstract over them and pass them around the way you'd do with a regular function.
An alternative is using a library such as fclabels which provides similar functionality, using Template Haskell to auto-generate accessor functions instead of built-in syntax. The result is often much nicer, with the downside that you now have a dependency on TH....
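Without any library, one workaround is to wrap each update in an ordinary function, which can then be composed and passed around. A sketch using the Foo type from the question; setBar, modifyMoo, tweak and reset are names invented here:

```haskell
data Foo = Foo
  { bar :: Int
  , moo :: Int
  , meh :: Int
  , yup :: Int
  } deriving (Show, Eq)

-- A setter as a plain function: builds a copy with bar replaced.
setBar :: Int -> Foo -> Foo
setBar v foo = foo { bar = v }

-- A modifier: applies a function to the current value of moo.
modifyMoo :: (Int -> Int) -> Foo -> Foo
modifyMoo f foo = foo { moo = f (moo foo) }

-- Being ordinary functions, these compose.
tweak :: Foo -> Foo
tweak = setBar 2 . modifyMoo (+ 1)

-- Record update syntax can also change several fields at once.
reset :: Foo -> Foo
reset foo = foo { bar = 0, yup = 0 }
```

The boilerplate grows with the number of fields, which is the part that libraries like fclabels generate for you.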
