Again stuck on something probably theoretical. There are many libraries in Haskell, i'd like to use less as possible. If I have a type like this:
data Note = Note { _noteID :: Int
, _noteTitle :: String
, _noteBody :: String
, _noteSubmit :: String
} deriving Show
And use that to create a list of [Note {noteID=1...}, Note {noteID=2...}, ] et cetera. I now have a list of type Note. Now I want to write it to a file using writeFile. Probably it ghc will not allow it considering writeFile has type FilePath -> String -> IO (). But I also want to avoid deconstructing (writeFile) and constructing (readFile) the types all the time, assuming I will not leave the Haskell 'realm'. Is there a way to do that, without using special libs? Again: thanks a lot. Books on Haskell are good, but StackOverflow is the glue between the books and the real world.
If you're looking for a "quick fix", for a one-off script or something like that, you can derive Read in addition to Show, and then you'll be able to use show to convert to String and read to convert back, for example:
data D = D { x :: Int, y :: Bool }
deriving (Show, Read)
d1 = D 42 True
s = show d1
-- s == "D {x = 42, y = True}"
d2 :: D
d2 = read s
-- d2 == d1
However, please, please don't put this in production code. First, you're implicitly relying on how the record is coded, and there are no checks to protect from subtle changes. Second, the read function is partial - that is, it will crash if it can't parse the input. And finally, if you persist your data this way, you'll be stuck with this record format and can never change it.
For a production-quality solution, I'm sorry, but you'll have to come up with an explicit, documented serialization format. No way around it - in any language.
Related
Consider the following fragment:
data File
= NoFile
| FileInfo {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
}
| FileFull {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime,
content :: String
}
deriving Eq
That duplication is a bit of a "wart", though in this one-off instance not particularly painful. In order to further improve my understanding of Haskell's rich type system, what might be preferred "clean"/"idiomatic" approaches for refactoring other than either simply creating a separate data record type for the 2 duplicate fields (then replacing them with single fields of that new data type) or replacing the FileFull record notation with something like | FileFull File String, which wouldn't be quite clean either (as here one would only want FileInfo in there for example, not NoFile)?
(Both these "naive" approaches would be somewhat intrusive/annoying with respect to having to then fix up many modules manually throughout the rest of the code-base here.)
One thing I considered would be parameterizing like so:
data File a
= NoFile
| FileMaybeWithContent {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
content :: a
}
deriving Eq
Then for those "just info, not loaded" contexts a would be (), otherwise String. Seems too general anyway, we want either String or nothing, leading us to Maybe, doing once again away with the a parameter.
Of course we've been there before: content could just be done with Maybe String of course, then "refactor any compile errors away" and "done". That'll probably be the order of the day, but knowing Haskell and the many funky GHC extensions.. who knows just what exotic theoretic trick/axiom/law I've been missing, right?! See, the differently-named "semantic insta-differentiator" between a "just meta-data info" value and a "file content with meta info" value does work well throughout the rest of the code-base as far as eased comprehension.
(And yes, I perhaps should have removed NoFile and used Maybe Files throughout, but then... not sure whether there's really a solid reason to do so and a different question altogether anyway..)
All of the following are equivalent/isomorphic, as I think you've discovered:
data F = U | X A B | Y A B C
data F = U | X AB | Y AB C
data AB = AB A B
data F = U | X A B (Maybe C)
So the color of the bike shed really depends on the context (e.g. do you have use for an AB elsewhere?) and your own aesthetic preferences.
It might clarify things and help you understand what you're doing to have some sense of the algebra of algebraic data types
We call types like Either "sum types" and types like (,) "product types" and they are subject to the same kinds of transformations you're familiar with like factoring
f = 1 + (a * b) + (a * b * c)
= 1 + ((a * b) * ( 1 + c))
As others have noted, the NoFile constructor is probably not necessary, but you can keep it if you want. If you feel your code is more readable and/or better understood with it, then I say keep it.
Now the trick with combining the other two constructors is by hiding the content field. You were on the right track by parameterizing File, but that alone isn't enough since then we can have File Foo, File Bar, etc. Fortunately, GHC has some nifty ways to help us.
I'll write out the code here and then explain how it works.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}
import Data.Void
data Desc = Info | Full
type family Content (a :: Desc) where
Content Full = String
Content _ = Void
data File a = File
{ path :: FilePath
, modTime :: UTCTime
, content :: Content a
}
There are a few things going on here.
First, note that in the File record, the content field now has type Content a instead of just a. Content is a type family, which is (in my opinion) a confusing name for type-level function. That is, the compiler replaces Content a with some other type based on what a is and how we've defined Content.
We defined Content Full to be String, so that when we have a value f1 :: File Full, its content field will have a String value. On the other hand, f2 :: File Info will have a content field with type Void which has no values.
Cool right? But what's preventing us from having File Foo now?
That's where DataKinds comes to the rescue. It "promotes" the data type Desc to a kind (the type of types in Haskell) and type constructors ,Info and Full, to types of kind Desc instead of merely values of type Desc.
Notice in the declaration of Content that I have annotated a. It looks like a type annotation, but a is already a type. This is a kind annotation. It forces a to be something of kind Desc and the only types of kind Desc are Info and Full.
By now you're probably totally sold on how awesome this is, but I should warn you there's no free lunch. In particular, this is a compile-time construction. Your single File type becomes two different types. This can cause other related logic (producers and consumers of File records) to become complicated. If your use case doesn't mix File Info records with File Full records, then this is the way to go. On the other hand, if you want to do something like have a list of File records which can be a mixture of both types, then you're better off just making the type of your content field Maybe String.
Another thing is, how exactly do you make a File Info since there's no value of Void to use for the content field? Well, technically it should be ok to use undefined or error "this should never happen" since it is (morally) impossible to have a function of type Void -> a, but if that makes you feel uneasy (and it probably should), then just replace Void with (). Unit is almost as useless and doesn't require 'values' of bottom.
For the representation of a DSL syntax tree I have data types that represent this tree. At several places, within this tree I get quite a number of subelements that are optional and/or have a "*" multiplicity. So one data type might look something like
data SpecialDslExpression = MyExpression String [Int] Double [String] (Maybe Bool)
What I am looking for is a possibility to construct such a type without having to specify all of the parameters, assuming I have a valid default for each of them. The usage scenario is such that I need to create many instances of the type with all kinds of combinations of its parameters given or omitted (most of the time two or three), but very rarely all of them. Grouping the parameters into subtypes won't get me far as the parameter combinations don't follow a pattern that would have segmentation improve matters.
I could define functions with different parameter combinations to create the type using defaults for the rest, but I might end up with quite a number of them that would become hard to name properly, as there might be no possibility to give a proper name to the idea of createWithFirstAndThirdParameter in a given context.
So in the end the question boils down to: Is it possible to create such a data type or an abstraction over it that would give me something like optional parameters that I can specify or omit at wish?
I would suggest a combinations of lenses and a default instance. If you are not already importing Control.Lens in half of your modules, now is the time to start! What the heck are lenses, anyway? A lens is a getter and a setter mashed into one function. And they are very composable. Any time you need to access or modify parts of a data structure but you think record syntax is unwieldy, lenses are there for you.
So, the first thing you need to do – enable TH and import Control.Lens.
{-# LANGUAGE TemplateHaskell #-}
import Control.Lens
The modification you need to do to your data type is adding names for all the fields, like so:
data SpecialDslExpression = MyExpression { _exprType :: String
, _exprParams :: [Int]
, _exprCost :: Double
, _exprComment :: [String]
, _exprLog :: Maybe Bool
} deriving Show
The underscores in the beginning of the field names are important, for the following step. Because now we want to generate lenses for the fields. We can ask GHC to do that for us with Template Haskell.
$(makeLenses ''SpecialDslExpression)
Then the final thing that needs to be done is constructing an "empty" instance. Beware that nobody will check statically that you actually fill all the required fields, so you should preferably add an error to those fields so you at least get a run-time error. Something like this:
emptyExpression = MyExpression (error "Type field is required!") [] 0.0 [] Nothing
Now you are ready to roll! You cannot use an emptyExpression, and it will fail at run-time:
> emptyExpression
MyExpression {_exprType = "*** Exception: Type field is required!
But! As long as you populate the type field, you will be golden:
> emptyExpression & exprType .~ "Test expression"
MyExpression { _exprType = "Test expression"
, _exprParams = []
, _exprCost = 0.0
, _exprComment = []
, _exprLog = Nothing
}
You can also fill several fields at once, if you want to.
> emptyExpression & exprType .~ "Test expression"
| & exprLog .~ Just False
| & exprComment .~ ["Test comment"]
MyExpression { _exprType = "Test expression"
, _exprParams = []
, _exprCost = 0.0
, _exprComment = ["Test comment"]
, _exprLog = Just False
}
You can also use lenses to apply a function to a field, or look inside a field of a field, or modify any other existing expression and so on. I definitely recommend taking a look at what you can do!
Alright I'll actually expand upon my comment. Firstly, define your data type as a record (and throw in a few type synonyms).
data Example = E {
one :: Int,
two :: String,
three :: Bool,
four :: Double
}
next you create a default instance
defaultExample = Example 1 "foo" False 1.4
and then when a user wants to tweak a field in the default to make their own data they can do this:
myData = defaultExample{four=2.8}
Finally, when they want to pattern match just one item, they can use
foo MyData{four=a} = a
I'm sorry this problem description is so abstract: its for my job, and for commercial confidentiality reasons I can't give the real-world problem, just an abstraction.
I've got an application that receives messages containing key-value pairs. The keys are from a defined set of keywords, and each keyword has a fixed data type. So if "Foo" is an Integer and "Bar" is a date you might get a message like:
Foo: 234
Bar: 24 September 2011
A message may have any subset of keys in it. The number of keys is fairly large (several dozen). But lets stick with Foo and Bar for now.
Obviously there is a record like this corresponding to the messages:
data MyRecord {
foo :: Maybe Integer
bar :: Maybe UTCTime
-- ... and so on for several dozen fields.
}
The record uses "Maybe" types because that field may not have been received yet.
I also have many derived values that I need to compute from the current values (if they exist). For instance I want to have
baz :: MyRecord -> Maybe String
baz r = do -- Maybe monad
f <- foo r
b <- bar r
return $ show f ++ " " ++ show b
Some of these functions are slow, so I don't want to repeat them unnecessarily. I could recompute baz for each new message and memo it in the original structure, but if a message leaves the foo and bar fields unchanged then that is wasted CPU time. Conversely I could recompute baz every time I want it, but again that would waste CPU time if the underlying arguments have not changed since last time.
What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change. I could detect this manually by noting that baz depends only on foo and bar, and so only recomputing it on messages that change those values, but for complicated functions that is error-prone.
An added wrinkle is that some of these functions may have multiple strategies. For instance you might have a value that can be computed from either Foo or Bar using 'mplus'.
Does anyone know of an existing solution to this? If not, how should I go about it?
I'll assume that you have one "state" record and these message all involve updating it as well as setting it. So if Foo is 12, it may later be 23, and therefore the output of baz would change. If any of this is not the case, then the answer becomes pretty trivial.
Let's start with the "core" of baz -- a function not on a record, but the values you want.
baz :: Int -> Int -> String
Now let's transform it:
data Cached a b = Cached (Maybe (a,b)) (a -> b)
getCached :: Eq a => Cached a b -> a -> (b,Cached a b)
getCached c#(Cached (Just (arg,res)) f) x | x == arg = (res,c)
getCached (Cached _ f) x = let ans = f x in (ans,Cached (Just (x,ans) f)
bazC :: Cached (Int,Int) String
bazC = Cached Nothing (uncurry baz)
Now whenever you would use a normal function, you use a cache-transformed function instead, substituting the resulting cache-transformed function back into your record. This is essentially a manual memotable of size one.
For the basic case you describe, this should be fine.
A fancier and more generalized solution involving a dynamic graph of dependencies goes under the name "incremental computation" but I've seen research papers for it more than serious production implementations. You can take a look at these for starters, and follow the reference trail forward:
http://www.carlssonia.org/ogi/Adaptive/
http://www.andres-loeh.de/Incrementalization/paper_final.pdf
Incremental computation is actually also very related to functional reactive programming, so you can take a look at conal's papers on that, or play with Heinrich Apfelmus' reactive-banana library: http://www.haskell.org/haskellwiki/Reactive-banana
In imperative languages, take a look at trellis in python: http://pypi.python.org/pypi/Trellis or Cells in lisp: http://common-lisp.net/project/cells/
You can build a stateful graph that corresponds to computations you need to do. When new values appear you push these into the graph and recompute, updating the graph until you reach the outputs. (Or you can store the value at the input and recompute on demand.) This is a very stateful solution but it works.
Are you perhaps creating market data, like yield curves, from live inputs of rates etc.?
What I want is some kind of smart memoisation or push-based recomputation that only recomputes baz when the arguments change.
It sounds to me like you want a variable that is sort of immutable, but allows a one-time mutation from "nothing computed yet" to "computed". Well, you're in luck: this is exactly what lazy evaluation gives you! So my proposed solution is quite simple: just extend your record with fields for each of the things you want to compute. Here's an example of such a thing, where the CPU-intensive task we're doing is breaking some encryption scheme:
data Foo = Foo
{ ciphertext :: String
, plaintext :: String
}
-- a smart constructor for Foo's
foo c = Foo { ciphertext = c, plaintext = crack c }
The point here is that calls to foo have expenses like this:
If you never ask for the plaintext of the result, it's cheap.
On the first call to plaintext, the CPU churns a long time.
On subsequent calls to plaintext, the previously computed answer is returned immediately.
I'm aware of partial updates for records like :
data A a b = A { a :: a, b :: b }
x = A { a=1,b=2 :: Int }
y = x { b = toRational (a x) + 4.5 }
Are there any tricks for doing only partial initialization, creating a subrecord type, or doing (de)serialization on subrecord?
In particular, I found that the first of these lines works but the second does not :
read "A {a=1,b=()}" :: A Int ()
read "A {a=1}" :: A Int ()
You could always massage such input using a regular expression, but I'm curious what Haskell-like options exist.
Partial initialisation works fine: A {a=1} is a valid expression of type A Int (); the Read instance just doesn't bother parsing anything the Show instance doesn't output. The b field is initialised to error "...", where the string contains file/line information to help with debugging.
You generally shouldn't be using Read for any real-world parsing situations; it's there for toy programs that have really simple serialisation needs and debugging.
I'm not sure what you mean by "subrecord", but if you want serialisation/deserialisation that can cope with "upgrades" to the record format to contain more information while still being able to process old (now "partial") serialisations, then the safecopy library does just that.
You cannot leave some value in Haskell "uninitialized" (it would not be possible to "initialize" it later anyway, since Haskell is pure). If you want to provide "default" values for the fields, then you can make some "default" value for your record type, and then do a partial update on that default value, setting only the fields you care about. I don't know how you would implement read for this in a simple way, however.
Data.Binary is great. There is just one question I have. Let's imagine I've got a datatype like this:
import Data.Binary
data Ref = Ref {
refName :: String,
refRefs :: [(String, Ref)]
}
instance Binary Ref where
put a = put (refName a) >> put (refRefs a)
get = liftM2 Ref get get
It's easily to see that this is a recursive datatype, which works because Haskell is lazy. Since Haskell as a language uses neither references nor pointers, but presents the data as-is, I am not sure how this is going to be saved. I have the strong indication that this naive reproach will lead to an infinite bytestring...
So how can this type be safely saved?
If your data has no cycles you'll be fine. But a cycle, like
r = Ref "a" [("b", r)]
is indeed going to generate an infinite result. The only way around this is for you to give unique labels to all nodes and use those to avoid cycles when converting to binary.