In my app I'm doing a lot of conversions from Text to various datatypes, often just to Text itself, but sometimes to other datatypes.
I also, more rarely, do conversions from other string types, e.g. String and ByteString.
Interestingly, Readable.fromText does the job for me, at least for Integer and Text. However, I now also need UTCTime, which Readable.fromText doesn't have an instance for (but which I could write myself).
I was thinking of Readable.fromText as the Text analogue of Text.Read.readEither for [Char]; however, I've realised that Readable.fromText is actually subtly different, in that readEither for strings isn't just pure, but instead expects the input string to be quoted. This isn't the case for reading integers, which don't expect quotes.
I understand that this is because show shows strings with quotes, so for read to be consistent it needs to require quotes.
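For example, in GHCi the asymmetry looks something like this (the exact error wording may vary by GHC version):

ghci> import Text.Read (readEither)
ghci> readEither "42" :: Either String Integer
Right 42
ghci> readEither "foo" :: Either String String
Left "Prelude.read: no parse"
ghci> readEither "\"foo\"" :: Either String String
Right "foo"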
However this is not the behaviour I want. I'm looking for a typeclass where reading strings to strings is basically the id function.
Readable seems to do this, but it's misleadingly named, as its behaviour is not entirely analogous to read on [Char]. Is there another typeclass that has this behaviour? Or am I best off just extending Readable, perhaps with newtypes or alternatively PRs?
The what
Just use Data.Text and Data.Text.Read directly
With signed decimal or just decimal you get a simple yet expressive, minimalistic parser function. It's directly usable:
type Reader a = Text -> Either String (a, Text)
decimal :: Integral a => Reader a
signed :: Num a => Reader a -> Reader a
Or you cook up your own runReader :: Reader a -> M a combinator for some M, to handle non-empty leftover input and deal with the Left case.
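For instance, a minimal sketch with M ~ Either String, rejecting leftover input (the helper name runReader and the error wording are mine):

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- Run a Reader, requiring the whole input to be consumed.
runReader :: TR.Reader a -> Text -> Either String a
runReader reader input = case reader input of
  Left err -> Left err
  Right (x, rest)
    | T.null rest -> Right x
    | otherwise   -> Left ("unconsumed input: " ++ T.unpack rest)

-- e.g. runReader (TR.signed TR.decimal) (T.pack "-42") == Right (-42 :: Int)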
For turning a String into a Text, all you have to do is use pack.
The why
Disclaimer: The matter of parsing data the right way is answered differently depending on who you ask.
I belong to the school that believes typeclasses are a poor fit for parsing mainly for two reasons.
Typeclasses limit you to one instance per type
You can easily have two different time formats in the data. Now you might tell yourself that you only have one use case, but what if you depend on another library that itself (or transitively) introduces another instance Readable UTCTime? Now you have to use newtypes for no reason other than to be able to select a particular implementation, which is not nice!
Code transparency
You cannot infer what parser behavior you get from a type name alone. And for the most part, Haddock instance documentation does not exist, because the behavior is assumed to be obvious.
Consider for example: What will instance Readable Int64 do?
Will it assume an ASCII encoded numeric representation? Or some binary representation?
If binary, which endianness is going to be assumed?
What representation of signedness is expected? In the ASCII case, perhaps a minus sign? Or maybe a space? Or if binary, is it going to be one's complement? Two's complement?
How will it handle overflow?
Code transparency on call-sites
But the lack of transparency extends to call sites as well. Consider the following example:
do fieldA <- fromText
   fieldB <- fromText
   fieldC <- fromText
   pure T{..}
What exactly does this do? Which parsers will be invoked? You have to know the types of fieldA, fieldB and fieldC to answer that question. In simple code that might seem obvious, but you might easily forget when you look at the same code two weeks from now. Or you have more elaborate code, where the types involved are inferred non-locally. It becomes hard to follow which instance will end up being selected (and the instance can make a huge difference, especially if you start newtyping for different formats: say, you cannot infer anything from a field name fooTimestamp, because it might be UnixTime or UTCTime).
And much worse: if you refactor and alter one of the field types in the data declaration from one type to another - say a time field from Word64 to UTCTime - this might silently and unexpectedly switch to a different parser, leading to a bug. Yuk!
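To make the hazard concrete, here is a toy sketch (FromText and both instances are hypothetical stand-ins, not the real Readable):

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Read as TR

class FromText a where
  fromText :: Text -> Either String a

newtype UnixSeconds = UnixSeconds Integer deriving Show

-- Expects a raw number of seconds, e.g. "1684108800".
instance FromText UnixSeconds where
  fromText t = case TR.signed TR.decimal t of
    Right (n, rest) | T.null rest -> Right (UnixSeconds n)
    _                             -> Left "expected an integer"

newtype IsoDate = IsoDate Text deriving Show

-- Expects an ISO-8601 date, e.g. "2023-05-15" (validation omitted).
instance FromText IsoDate where
  fromText = Right . IsoDate

-- Change a record field from UnixSeconds to IsoDate and every call site
-- that says fromText silently selects the other parser.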
On the topic of Show/Read
By the way, the reason why show/read behave the way they do for Prelude instances and deriving-generated instances can be found in the Haskell 2010 Report.
On the topic of show, it says:
The result of show is a syntactically correct Haskell expression
containing only constants [...]
And, concerning read, it says:
The result of show is readable by read if all component types are readable.
(This is true for all instances defined in the Prelude but may not be true
for user-defined instances.) [...]
So show for a string foo produces "foo", because that is the syntactically valid Haskell literal representing the string value foo, and read will read that back, acting as a kind of eval.
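A quick GHCi session illustrating this:

ghci> show (42 :: Int)
"42"
ghci> show "foo"
"\"foo\""
ghci> read "\"foo\"" :: String
"foo"
ghci> read "42" :: Int
42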
My question came up while following the tutorial Functors, Applicatives, And Monads In Pictures and its JavaScript version.
When the text says that a functor unwraps a value from the context, I understand that a Just 5 -> 5 transformation is happening. As per What does the "Just" syntax mean in Haskell?, Just is "defined in scope" of the Maybe monad.
My question is: what is so magical about the whole unwrapping thing? I mean, what is the problem with having some language rule which automatically unwraps the "scoped" values? It looks to me that this action is merely a lookup in some kind of table where the symbol Just 5 corresponds to the integer 5.
My question is inspired by the JavaScript version, where Just 5 is a prototype array instance. So unwrapping is, indeed, not rocket science at all.
Is this a "for-computation" kind of reason or a "for-programmer" one? Why do we distinguish Just 5 from 5 at the programming language level?
First of all, I don't think you can understand Monads and the like without understanding a Haskell-like type system (i.e. without learning a language like Haskell). Yes, there are many tutorials that claim otherwise, but I've read a lot of them before learning Haskell and I didn't get it. So my advice: if you want to understand Monads, learn at least some Haskell.
To your question "Why do we distinguish Just 5 from 5 on the programming language level?": for type safety. In most languages that happen not to be Haskell, null, nil, or whatever is often used to represent the absence of a value. This, however, often results in things like NullPointerExceptions, because you didn't anticipate that a value may not be there.
In Haskell there is no null. So if you have a value of type Int, or anything else, that value cannot be null. You are guaranteed that there is a value. Great! But sometimes you actually want/need to encode the absence of a value. In Haskell we use Maybe for that. So something of type Maybe Int can either be something like Just 5 or Nothing. This way it is explicit that the value may not be there, and you cannot accidentally forget that it might be Nothing, because you have to explicitly unwrap the value.
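A small sketch of that explicitness (the function names are mine):

-- Division that makes the failure case explicit in the type.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- The pattern match forces us to handle both possibilities; with
-- -Wall, GHC warns if one of the cases is missing.
showResult :: Maybe Int -> String
showResult (Just x) = "result: " ++ show x
showResult Nothing  = "division by zero"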
This has nothing really to do with Monads, except that Maybe happens to implement the Monad type class (a type class is a bit like a Java interface, if you are familiar with Java). That is Maybe is not primarily a Monad, but just happens to also be a Monad.
I think you're looking at this from the wrong direction. Monad is explicitly not about unwrapping. Monad is about composition.
It lets you combine (not necessarily apply) a function of type a -> m b with a value of type m a to get a value of type m b. I can understand where you might think the obvious way to do that is unwrapping the value of type m a into a value of type a. But very few Monad instances work that way. In fact, the only ones that can work that way are the ones that are equivalent to the Identity type. For nearly all instances of Monad, it's just not possible to unwrap a value.
Consider Maybe. Unwrapping a value of type Maybe a into a value of type a is impossible when the starting value is Nothing. Monadic composition has to do something more interesting than just unwrapping.
Consider []. Unwrapping a value of type [a] into a value of type a is impossible unless the input just happens to be a list of length 1. In every other case, monadic composition is doing something more interesting than unwrapping.
Consider IO. A value like getLine :: IO String doesn't contain a String value. It's plain impossible to unwrap, because it isn't wrapping something. Monadic composition of IO values doesn't unwrap anything. It combines IO values into more complex IO values.
I think it's worthwhile to adjust your perspective on what Monad means. If it were only an unwrapping interface, it would be pretty useless. It's more subtle, though. It's a composition interface.
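A small sketch of that composition for Maybe and lists (the names are mine):

halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

composed :: Maybe Int
composed = Just 20 >>= halve >>= halve
-- Just 20 >>= halve gives Just 10; Just 10 >>= halve gives Just 5.
-- Had any step produced Nothing, the rest would be skipped, not "unwrapped".

listBind :: [Int]
listBind = [1, 2, 3] >>= \x -> [x, x * 10]
-- [1,10,2,20,3,30]: each element fans out; no single value is ever "unwrapped".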
A possible example is this: consider the Haskell type Maybe (Maybe Int). Its values can have one of the following forms:
Nothing
Just Nothing
Just (Just n) for some integer n
Without the Just wrapper we couldn't distinguish between the first two.
Indeed, the whole point of the optional type Maybe a is to add a new value (Nothing) to an existing type a. To ensure such Nothing is indeed a fresh value, we wrap the other values inside Just.
It also helps during type inference. When we see the function call f 'a' we can see that f is called at the type Char, and not at type Maybe Char or Maybe (Maybe Char). The typeclass system would allow f to have a different implementation in each of these cases (this is similar to "overloading" in some OOP languages).
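A toy illustration of that kind of overloading (the class and instances are hypothetical):

{-# LANGUAGE FlexibleInstances #-}

class Describe a where
  describe :: a -> String

instance Describe Char where
  describe c = "a plain Char: " ++ [c]

instance Describe (Maybe Char) where
  describe Nothing  = "no Char at all"
  describe (Just c) = "a wrapped Char: " ++ [c]

-- describe 'a'        gives "a plain Char: a"
-- describe (Just 'a') gives "a wrapped Char: a"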
My question is, what is so magical about the whole unwrapping thing?
There is nothing magical about it. You can use garden-variety pattern matching (here in the shape of a case expression) to define...
mapMaybe :: (a -> b) -> Maybe a -> Maybe b
mapMaybe f mx = case mx of
    Just x  -> Just (f x)
    Nothing -> Nothing
... which is exactly the same as fmap for Maybe. The only thing the Functor class adds -- and it is a very useful thing, make no mistake -- is an extra level of abstraction that covers various structures that can be mapped over.
Why do we distinguish Just 5 from 5 on programming language level?
More meaningful than the distinction between Just 5 and 5 is the one between their types -- e.g. between Maybe Int and Int. If you have x :: Int, you can be certain x is an Int value you can work with. If you have mx :: Maybe Int, however, you have no such certainty, as the Int might be missing (i.e. mx might be Nothing), and the type system forces you to acknowledge and deal with this possibility.
See also: jpath's answer for further comments on the usefulness of Maybe (which isn't necessarily tied to classes such as Functor and Monad); Carl's answer for further comments on the usefulness of classes like Functor and Monad (beyond the Maybe example).
What "unwrap" means depends on the container. Maybe is just one example. "Unwrapping" means something completely different when the container is [] instead of Maybe.
The magic in the whole unwrapping thing is the abstraction: in a Monad we have a notion of "unwrapping" which abstracts over the nature of the container; and then it starts to get "magical"...
You ask what Just means: Just is nothing but a data constructor in Haskell, defined via a data declaration like:
data Maybe a = Just a | Nothing
Just takes a value of type a and creates a value of type Maybe a. It's Haskell's way to distinguish values of type a from values of type Maybe a.
First of all, you need to remove monads from your question. They have nothing to do with this. Treat these articles as just one point of view on monads; it may not suit you, and you may still understand too little of the type system to understand monads in Haskell.
So, your question can be rephrased as: why is there no implicit conversion Just 5 => 5? The answer is very simple. The value Just 5 has type Maybe Integer, so this value might instead be Nothing, and what should the compiler do in that case? Only the programmer can resolve this situation.
But there is a more uncomfortable question. There are types, for example newtype Identity a = Identity a, that are just wrappers around some value. So why is there no implicit conversion Identity a => a?
The simple answer is: an attempt to realize this would lead to a different type system, one which would lack many of the fine qualities of the current one. So implicit unwrapping is sacrificed for the benefit of other possibilities.
For a few weeks I've been thinking about relations between objects, not especially OOP's objects. For instance in C++, we're used to representing that by layering pointers or containers of pointers in the structure that needs access to the other object. If an object A needs access to B, it's not uncommon to find a B *pB in A.
But I’m not a C++ programmer anymore, I write programs using functional languages, and more especially in Haskell, which is a pure functional language. It’s possible to use pointers, references or that kind of stuff, but I feel strange with that, like “doing it the non-Haskell way”.
Then I thought a bit deeper about all that relation stuff and came to the point:
“Why do we even represent such relations by layering?”
I read some folks have already thought about that (here). In my point of view, representing relations through explicit graphs is way better, since it enables us to focus on the core of our type and express relations later through combinators (a bit like SQL does).
By core I mean that when we define A, we expect to define what A is made of, not what it depends on. For instance, in a video game, if we have a type Character, it's legit to talk about Trait, Skill or that kind of stuff, but is it still if we talk about Weapon or Item? I'm not so sure anymore. Then:
data Character = Character {
    chSkills :: [Skill]
  , chTraits :: [Trait]
  , chName   :: String
  , chWeapon :: IORef Weapon -- or STRef, or whatever
  , chItems  :: IORef [Item] -- ditto
  }
sounds really wrong in terms of design to me. I'd rather have something like:
data Character = Character {
    chSkills :: [Skill]
  , chTraits :: [Trait]
  , chName   :: String
  }
-- link our character to a Weapon using a Graph Character Weapon
-- link our character to Items using a Graph Character [Item] or that kind of stuff
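To make that concrete, here is a toy sketch of one possible Graph representation (an association list; entirely hypothetical, a real design might use Data.Map keyed by some character ID):

-- A relation between two kinds of entities, stored outside both types.
newtype Graph a b = Graph [(a, b)]

emptyGraph :: Graph a b
emptyGraph = Graph []

-- Record that x is related to y.
link :: a -> b -> Graph a b -> Graph a b
link x y (Graph g) = Graph ((x, y) : g)

-- All ys related to a given x.
linked :: Eq a => a -> Graph a b -> [b]
linked x (Graph g) = [y | (x', y) <- g, x' == x]

Linking a character to a weapon is then link hero sword weapons :: Graph Character Weapon, and the Character type itself never has to change.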
Furthermore, when a day comes to add new features, we can just create new types, new graphs, and link them. In the first design, we'd have to break the Character type, or use some kind of workaround to extend it.
What do you think about that idea? What do you think is best to deal with that kind of issues in Haskell, a pure functional language?
I have thought of using Haskell for a game server, but when coding, I found myself looking at the part where I parse packets thinking "wow, this will result in a lot of pattern matching", seeing that the number of matches to be made is large (walk there, attack that, loot that, open that, and so on).
What I do is:
Receive a packet
Parse the packet header into a hexadecimal String (say "02B5" for example)
Get rest of data from the packet
Match header in parseIO
Call the appropriate function with the packet content
It would be easy to map String -> method, but the methods take different numbers of parameters.
I thought of the simple two ways of pattern matching shown below.
#1
packetIO :: String -> IO ()
packetIO packet =
  case packet of
    "02B5" -> function1
    "ADD5" -> function2
    ... and so on
#2
packetIO :: String -> IO ()
packetIO "02B5" = function1
packetIO "ADD5" = function2
... and so on
Looking at both performance and coding style, is there a way to better handle the packets received from the client?
If you have any resources or links I failed to find, please do point me in their direction!
EDIT 130521:
Seems like both alternatives, listed below, are good choices. Just waiting to see answers to my questions in the comments before choosing which is the best solution for me.
Storing (ByteString -> Function) in a Map structure. O(log n) (see the sketch after this list)
Converting the ByteString to Word16 and pattern matching. O(log n) through a tree, or O(1) through lookup tables
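A sketch of the first alternative, with all handlers unified to one type so they can live in a Map (the handler bodies and the two-byte header encoding are my assumptions):

import qualified Data.ByteString as BS
import qualified Data.Map as Map

-- Each handler receives the packet body.
type Handler = BS.ByteString -> IO ()

handlers :: Map.Map BS.ByteString Handler
handlers = Map.fromList
  [ (BS.pack [0x02, 0xB5], \body -> putStrLn ("walk, body = " ++ show body))
  , (BS.pack [0xAD, 0xD5], \body -> putStrLn ("attack, body = " ++ show body))
  ]

packetIO :: BS.ByteString -> BS.ByteString -> IO ()
packetIO header body = case Map.lookup header handlers of
  Just handler -> handler body
  Nothing      -> putStrLn "unknown packet header"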
EDIT 130521:
Decided to go for pattern matching with Word16 as Philip JF said.
Both are great alternatives, and while my guess is that both are equally fast (Map might even be faster, seeing as I don't have to convert to Word16), the other option gave more readable code for my use:
packetIO 0x02B5 = function1
packetIO 0xADD5 = function2
etc
Why not parse to numbers (Word16 in Data.Word?) and then do the matching with that, instead of using strings? Haskell supports hex literals...
Both of your functions are equivalent. The compiler desugars the second one to the first one. Pattern matching is syntactic sugar for case.
case is optimal for this kind of thing. It compiles to a jump table, which is O(1). That means both of the solutions you listed are optimal.
As far as coding style goes, both styles are perfectly idiomatic. I personally prefer case over pattern matching, but I know a lot of other people prefer pattern matching for top-level functions.
I find this answer and this wiki page to be excellent introductions to memoization in Haskell. They do, however, still leave me with a question that I hope to get answered:
It seems to me that the technique used requires you to "open up" (as in "access the internals of") the data structure you use to store your memoized values. For example, [1] implements a table structure and [2] implements a tree in its section 3. Is it possible to do something similar with a pre-made data structure? Suppose, for example, that you think that Data.Map is really awesome, and would like to store your memoized values in such a Map. Can one approach memoization with a pre-made data structure like this, where one does not implement the structure oneself, but rather uses a pre-made one?
Hopefully someone will give me a hint on how to think, or, perhaps more likely, correct my misunderstanding of functional memoization in general.
Edit: I can think of one way to do it, but it's not at all elegant: if f :: a -> b, then one can probably easily make a memoized version f' :: Map a b -> a -> (Map a b, b), where the first argument is the memoization storage, and the output pair contains a potentially updated storage and the computed value. This state-passing is certainly not what I want (although I guess it could be wrapped in a monad, but it's several orders of magnitude uglier than the approach in [1] and [2]).
Edit 2: Maybe it helps to try and express my current way of (incorrect) thought. Currently, I seem to repeatedly pull myself, against my will, into the non-solution
import qualified Data.Map as Map
memo :: (Ord a) => [a] -> (a -> b) -> (a -> b)
memo domain f = (Map.!) storage
  where
    -- storage maps every key in the domain to its (lazily computed) value
    storage = Map.fromList (zip domain (map f domain))
The more I stare at this, the more I realize I've misunderstood something basic. You see, it feels to me that my memo [True, False] is equivalent to the bool memoizer of [1].
If you notice, Data.Memocombinators actually relies on the "pre-made" Data.IntTrie. I'm sure you could take the same code and replace uses of the IntTrie with another data structure, though it may not be as efficient.
The general idea of memoization is to save computed results. In Haskell, the easiest way to do this is to map your function onto a table where the table has one dimension per parameter. Since Haskell is lazy (well, most data structures in Haskell are), it will only evaluate the value of a given cell when you specifically ask for it. "table" basically means "map" since it takes you from key(s) to value.
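A classic small illustration of that lazy-table idea, using a plain list as the table:

-- fibs is an infinite table; each cell is computed at most once, on demand.
fibs :: [Integer]
fibs = map fib [0 ..]

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)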
[edit] Additional thoughts regarding Example 2
If I'm not mistaken, the first time (Map.!) storage is forced to evaluate for a given key, the storage structure will be entirely wrung out (though the computation f won't happen for anything but the given key). So the first lookup will cause an additional O(n) operation, n being the length of domain. Subsequent lookups would not, afaik, incur this cost.
Lazier structures like typical int-indexed lists or the IntTrie similarly need to manifest their structure when a lookup is invoked, but unlike a Map, they need not do so all at once. Lists are wrung out until the indexed key is accessed. IntTries wring out only the integer keys that are "prefixes" (or suffixes? not sure; it could be implemented either way) of the desired key. Index 11 (1011) would wring out 1 (1), 2 (10), 5 (101), and 11 (1011). Data.Memocombinators simply transforms all keys into Ints (or "bits") so that an IntTrie can be used.
p.s. is there a better phrase than "wring out" for this? The words "force", "spine", and "manifest" come to mind, but I can't quite think of the right word/phrase for this.