Match a lot of patterns in Haskell efficiently

I have thought of using Haskell for a game server, but while coding I found myself looking at the part where I parse packets and thinking "wow, this will result in a lot of pattern matching", since there are many kinds of packets to match (walk there, attack that, loot that, open that, and so on).
What I do is:
Receive a packet
Parse the packet header into a hexadecimal String (say "02B5" for example)
Get rest of data from the packet
Match header in parseIO
Call the appropriate function with the packet content
It would be easy to map String -> method, but the methods take different numbers of parameters.
I thought of the simple two ways of pattern matching shown below.
#1
packetIO :: String -> IO ()
packetIO packet =
  case packet of
    "02B5" -> function1
    "ADD5" -> function2
    ... and so on
#2
packetIO :: String -> IO ()
packetIO "02B5" = function1
packetIO "ADD5" = function2
... and so on
Both looking at performance and coding style, is there a way to better handle the packets received from the client?
If you have any resources or links I failed to find, please do point me in their direction!
EDIT 130521:
Seems like both alternatives, listed below, are good choices. Just waiting to see answers to my questions in the comments before choosing which was the best solution for me.
Storing (ByteString -> Function) in a Map structure. O(log n)
Converting ByteString to Word16 and pattern match. O(log n) through tree or O(1) through lookup tables
EDIT 130521:
Decided to go for pattern matching with Word16 as Philip JF said.
Both are great alternatives. My guess is that both are equally fast, though Map might be faster since I don't have to convert to Word16. The other option gave more readable code for my use:
packetIO 0x02B5 = function1
packetIO 0xADD5 = function2
etc
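For comparison, the Map alternative from the first edit could look roughly like the sketch below. The handler names and the shared Handler type are assumptions made for illustration: giving every handler the raw payload to decode is one way around the "different numbers of parameters" problem, not necessarily what the real server does.

```haskell
import qualified Data.Map.Strict as Map
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- Hypothetical handler type: every handler receives the raw payload and
-- decodes its own arguments, so all handlers share one type.
type Handler = ByteString -> IO ()

-- Placeholder handlers standing in for function1 / function2.
handleWalk, handleAttack :: Handler
handleWalk payload = putStrLn ("walk: " ++ BC.unpack payload)
handleAttack payload = putStrLn ("attack: " ++ BC.unpack payload)

-- Header-to-handler table; lookup is O(log n) in the number of headers.
handlers :: Map.Map ByteString Handler
handlers = Map.fromList
  [ (BC.pack "02B5", handleWalk)
  , (BC.pack "ADD5", handleAttack)
  ]

-- Look up the header and run the handler; unknown headers are reported.
dispatch :: ByteString -> ByteString -> IO ()
dispatch header payload =
  case Map.lookup header handlers of
    Just handler -> handler payload
    Nothing      -> putStrLn ("unknown packet header: " ++ BC.unpack header)
```

Adding a packet type then means adding one entry to the table, at the cost of losing the compiler's help with overlapping patterns.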

Why not parse to numbers (Word16 in Data.Word?) and then do the matching with that, instead of using strings? Haskell supports hex literals...
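A minimal sketch of that idea, assuming the header is the first two bytes of the packet in big-endian order (the question does not say which byte order the protocol uses). The names splitHeader and packetName are made up for the example:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Word (Word16)

-- Split a packet into a big-endian 16-bit header and the remaining payload.
-- Big-endian order is an assumption; swap hi and lo for little-endian.
splitHeader :: ByteString -> Maybe (Word16, ByteString)
splitHeader packet
  | BS.length packet < 2 = Nothing
  | otherwise =
      let hi = fromIntegral (BS.index packet 0) :: Word16
          lo = fromIntegral (BS.index packet 1) :: Word16
      in  Just ((hi `shiftL` 8) .|. lo, BS.drop 2 packet)

-- Dispatch on the numeric header using hex literals as patterns.
packetName :: Word16 -> String
packetName 0x02B5 = "walk"
packetName 0xADD5 = "attack"
packetName _      = "unknown"
```

This skips the intermediate hexadecimal String entirely: the header never leaves its numeric form between the wire and the pattern match.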

Both of your functions are equivalent. The compiler desugars the second one to the first one. Pattern matching is syntactic sugar for case.
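case is a good fit for this kind of thing. When the scrutinee is a constructor or a machine-integer literal, GHC can compile the case to a jump table or a binary search instead of testing each alternative in turn. String literal patterns, however, are compared one after another, so the String versions above are linear in the number of alternatives. Either way, the two solutions you listed perform the same.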
As far as coding style goes, both styles are perfectly idiomatic. I personally prefer case over pattern matching, but I know a lot of other people prefer pattern matching for top-level functions.

Related

A version of "read" that doesn't require quoted strings

In my app I'm doing a lot of conversions from Text to various datatypes, often just to Text itself, but sometimes to other datatypes.
I also rarely do conversions from other string types, e.g. String and ByteString.
Interestingly, Readable.fromText does the job for me, at least for Integer and Text. However I also now need UTCTime, which Readable.fromText doesn't have an instance for (but which I could write myself).
I was thinking that Readable.fromText was a Text analogue of Text.Read.readEither for [Char]. However, I've realised that Readable.fromText is actually subtly different: readEither on strings isn't just pure, but instead expects the input string to be quoted. This isn't the case for reading integers, which don't expect quotes.
I understand that this is because show shows strings with quotes, so for read to be consistent it needs to require quotes.
However this is not the behaviour I want. I'm looking for a typeclass where reading strings to strings is basically the id function.
Readable seems to do this, but it's misleadingly named, as its behaviour is not entirely analogous to read on [Char]. Is there another typeclass that has this behaviour? Or am I best off just extending Readable, perhaps with newtypes or alternatively PRs?

The what
Just use Data.Text and Data.Text.Read directly
With signed decimal or just decimal you get a simple and yet expressive minimalistic parser function. It's directly usable:
type Reader a = Text -> Either String (a, Text)
decimal :: Integral a => Reader a
signed :: Num a => Reader a -> Reader a
Or you cook up your own runReader :: Reader a -> M a combinator for some M to possibly handle non-empty leftover and deal with the Left case.
For turning a String into Text, all you have to do is use pack.
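As a rough sketch of such a combinator, here with M fixed to Either String: parseAll and parseInt are hypothetical names, and treating leftover input as an error is one possible policy among several.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Read (decimal, signed)

-- Hypothetical helper: run a reader and reject any leftover input.
parseAll :: (Text -> Either String (a, Text)) -> Text -> Either String a
parseAll reader input =
  case reader input of
    Left err -> Left err
    Right (value, rest)
      | T.null rest -> Right value
      | otherwise   -> Left ("unexpected trailing input: " ++ T.unpack rest)

-- A signed decimal Int parser built from the Data.Text.Read primitives.
parseInt :: Text -> Either String Int
parseInt = parseAll (signed decimal)
```

With this, parseInt (T.pack "-42") yields Right (-42), while input such as "12abc" is rejected because of the trailing "abc".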
The why
Disclaimer: The matter of parsing data the right way is answered differently depending on who you ask.
I belong to the school that believes typeclasses are a poor fit for parsing mainly for two reasons.
Typeclasses limit you to one instance per type
You can easily have two different time formats in the data. Now you might tell yourself that you only have one use case, but what if you depend on another library that itself or transitively introduces another instance Readable UTCTime? Now you have to use newtypes for no reason other than be able to select a particular implementation, which is not nice!
Code transparency
You cannot infer what parser behavior you get from a type name alone. And for the most part, Haddock documentation for instances does not exist, because the behavior is assumed to be obvious.
Consider for example: What will instance Readable Int64 do?
Will it assume an ASCII encoded numeric representation? Or some binary representation?
If binary, which endianness is going to be assumed?
What representation of signedness is expected? In the ASCII case perhaps a minus? Or maybe with a space? Or if binary, is it going to be ones' complement? Two's complement?
How will it handle overflow?
Code transparency on call-sites
But the intransparency extends to call-sites as well. Consider the following example
do fieldA <- fromText
   fieldB <- fromText
   fieldC <- fromText
   pure T{..}
What exactly does this do? Which parsers will be invoked? You will have to know the types of fieldA, fieldB and fieldC to answer that question. In simple code that might seem obvious, but you might easily forget if you look at the same code 2 weeks from now. Or you have more elaborate code, where the types involved are inferred non-locally. It becomes hard to follow which instance this will end up selecting, and the instance can make a huge difference, especially if you start newtyping for different formats. Say you cannot make any inference from a field name fooTimestamp, because it might be UnixTime or UTCTime.
And much worse: If you refactor and alter one of the field types data declaration from one type to another - say a time field from Word64 to UTCTime - this might silently and unexpectedly switch out to a different parser, leading to a bug. Yuk!
On the topic of Show/Read
By the way, the reason why show/read behave the way they do for Prelude instances and deriving-generated instances can be discovered in the Haskell Report 2010.
On the topic of show it says
The result of show is a syntactically correct Haskell expression
containing only constants [...]
And equivalently for read
The result of show is readable by read if all component types are readable.
(This is true for all instances defined in the Prelude but may not be true
for user-defined instances.) [...]
So show for a string foo produces "foo" because that is the syntactically valid Haskell literal representing the string value of foo, and read will read that back, acting as a kind of eval.
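The round trip is easy to check with Text.Read.readMaybe. The binding names below are invented for the illustration:

```haskell
import Text.Read (readMaybe)

-- show renders a String as a Haskell string literal, quotes included.
shown :: String
shown = show "foo"             -- the five characters "foo", quotes included

-- read accepts exactly that literal syntax...
roundTrip :: Maybe String
roundTrip = readMaybe shown    -- Just "foo"

-- ...and rejects the bare, unquoted text.
unquoted :: Maybe String
unquoted = readMaybe "foo"     -- Nothing

-- Numeric literals have no quotes, so none are expected when reading them.
number :: Maybe Int
number = readMaybe "42"        -- Just 42
```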

How to craft a type matching "a list with a single element of type Int"?

I am currently writing my very first program in Haskell.
In the specification I am working with, [0] 5 is used to define a MAC key that could be written "\x00\x00\x00\x00\x00"::ByteString.
I somewhat fancy the idea of reusing that notation (even though it makes very little sense from a programming perspective). Eventually writing mackey so that mackey [0] 5 does the right thing was simple enough.
The only question that remains is how to define my input type so that it enforces the use of a list with a single integer element. Is that even possible?
NB: normally, I wouldn't bother too much about that. I shouldn't even use a list in such case: a simple Int would be enough and "enforce" everything I need; so I know that the correct way is to use a simple integer. But this is a very good way to explore what can be done (or not) with Haskell type system. :)

As you've observed yourself, a single Int does exactly what's needed and is probably the way to go. Don't use a list if you don't want a list!
That said, using a plain Int may not be the best thing either. Perhaps you want to make the meaning of each argument clear. For that purpose you might wrap Int in a newtype and name it accordingly:
newtype KeyWord = KeyWord Int
macKey :: KeyWord -> Int -> MAC
In this case the syntax at the call site would then be macKey (KeyWord 0) 5.
It would be possible to shorten that a bit more, but it's probably not worth it. In fact, even the newtype is probably overkill – the main benefit is that the type signature becomes more explicit, but for calling the function this is mostly boilerplate. A simple type-alias is probably enough:
type KeyWord = Int
and then you can again write macKey 0 5 while retaining the clear signature.
If you need to write out lots of those keys in a concise manner, you might consider making macKey an infix operator:
infix 7 #*
(#*) :: KeyWord -> Int -> MAC
and then write 0#*5.
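Put together, a minimal sketch of the type-alias version might look like this. The MAC type is never defined in the answer, so treating it as a ByteString of repeated bytes is an assumption taken from the question's example, as is truncating the Int to a byte with fromIntegral:

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

type KeyWord = Int

-- Assumption: a MAC key is just a strict ByteString.
type MAC = ByteString

-- macKey 0 5 builds five bytes of value 0, matching the spec's "[0] 5".
-- fromIntegral truncates the Int to a Word8, so values above 255 wrap.
macKey :: KeyWord -> Int -> MAC
macKey byte count = BS.replicate count (fromIntegral byte)

-- Infix spelling of the same function.
infix 7 #*
(#*) :: KeyWord -> Int -> MAC
(#*) = macKey
```

Both macKey 0 5 and 0 #* 5 then denote the same five zero bytes.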

Idiomatic way to take a substring of a ByteString

I need to make extensive use of:
slice :: Int -> Int -> ByteString -> ByteString
slice start len = take len . drop start
Two part question:
Does this already have a name? I can't find anything searching for that type on Hoogle, but it seems like it should be a really common need. I also tried searching for (Int, Int) -> ByteString -> ByteString and some flip'd versions of same. I also tried looking for [a] versions to see if there was a name in common use.
Is there a better way to write it?
I'm suspicious that I'm doing something wrong because I strongly expected to find lots of people having gone down the same road, but my google-fu isn't finding anything.

The idiomatic way is via take and drop, which has O(1) complexity on strict bytestrings.
slice is not provided, to discourage the reliance on unsafe indexing operations.

According to the documentation, there is no such function. Currently, strict ByteStrings are represented as a pointer to the beginning of pinned memory, an offset and a length. So your implementation is indeed a good way to slice. However, you should be careful with slices, because a sliced bytestring keeps the whole original bytestring alive in memory. To avoid this you might want to copy the slice, but this is not always necessary.
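Both variants can be sketched as follows. sliceCopy is a made-up name; it relies on Data.ByteString.copy to detach the slice from the original buffer:

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- O(1): the result shares the original buffer, so the whole original
-- stays alive for as long as the slice does.
slice :: Int -> Int -> ByteString -> ByteString
slice start len = BS.take len . BS.drop start

-- O(len): copies the slice into a fresh buffer so the original
-- bytestring can be garbage collected independently.
sliceCopy :: Int -> Int -> ByteString -> ByteString
sliceCopy start len = BS.copy . slice start len
```

Copying pays off mainly when a small slice would otherwise pin a large buffer for a long time, for example a short field cut out of a big file.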

Memoization in Haskell using premade data structures

I find this answer and this wiki page to be excellent introductions to memoization in Haskell. They do, however, still leave me with a question that I hope to get answered:
It seems to me that the technique used requires you to "open up" (as in "access the internals of") the data structure you use to store your memoization. For example, 1 implements a table structure and 2 implements a tree in section 3. Is it possible to do something similar with a pre-made data structure? Suppose, for example, that you think that Data.Map is really awesome, and would like to store your memoized values in such a Map. Can one approach memoization with a pre-made data structure such as this, where one does not implement the structure itself, but rather use a pre-made one?
Hopefully someone will give me a hint on how to think, or, perhaps more likely, correct my misunderstanding of functional memoization in general.
Edit: I can think of one way to do it, but it's not at all elegant: If f :: a -> b, then one can probably easily make a memoized version f' :: Map a b -> a -> (Map a b, b), where the first argument is the memoization storage, and the output pair contains a potentially updated storage and the computed value. This state-passing is certainly not what I want (although I guess it could be wrapped in a monad, but it's several orders of magnitude uglier than the approach in 1 and 2).
Edit 2: Maybe it helps to try and express my current way of (incorrect) thought. Currently, I seem to repeatedly pull myself, against my will, into the non-solution
import qualified Data.Map as Map
memo :: (Ord a) => [a] -> (a -> b) -> (a -> b)
memo domain f = (Map.!) storage
  where
    storage = Map.fromList (zip domain (map f domain))
The more I stare at this, the more I realize I've misunderstood something basic. You see, it feels to me that my memo [True, False] is equivalent to the bool memoizer of 1.

If you notice, Data.MemoCombinators actually relies on the "pre-made" Data.IntTrie. I'm sure you could take the same code and replace uses of the IntTrie with another data structure, though it may not be as efficient.
The general idea of memoization is to save computed results. In Haskell, the easiest way to do this is to map your function onto a table where the table has one dimension per parameter. Since Haskell is lazy (well, most data structures in Haskell are), it will only evaluate the value of a given cell when you specifically ask for it. "table" basically means "map" since it takes you from key(s) to value.
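To make the lazy-table idea concrete with a pre-made structure, here is a small sketch using the lazy Data.Map on a fixed domain. The Fibonacci function and the names fibTable and memoFib are chosen only for illustration:

```haskell
import qualified Data.Map as Map

-- Memoised Fibonacci over the fixed domain [0 .. limit].
-- Data.Map (the lazy variant) stores values as thunks, so each entry is
-- computed at most once, the first time it is looked up.
fibTable :: Int -> Map.Map Int Integer
fibTable limit = table
  where
    table = Map.fromList [(n, fib n) | n <- [0 .. limit]]
    -- fib refers back to the table, so recursive calls hit the cache.
    fib 0 = 0
    fib 1 = 1
    fib n = table Map.! (n - 1) + table Map.! (n - 2)

-- Builds a fresh table per call; keep the table around to share results.
-- Only defined for non-negative arguments.
memoFib :: Int -> Integer
memoFib n = fibTable n Map.! n
```

The recursion goes through the table rather than through fib itself, which is what turns the exponential definition into a linear number of additions. Using Data.Map.Strict here would defeat the purpose, since it forces every value when the map is built.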
[edit] Additional thoughts regarding Example 2
If I'm not mistaken, then the first time (Map.!) storage is forced to evaluate for a given key, the storage structure will be entirely wrung out (though the computation f won't happen for anything but the given key). So the first lookup will cause an additional O(n) operation, n being length domain. Subsequent lookups would not, afaik, incur this cost.
Lazier structures like typical int-indexed lists or the IntTrie similarly need to manifest their structure when a lookup is invoked, but unlike a Map, they need not do so all at once. Lists are wrung out until the indexed key is accessed. IntTries wring out only the integer keys that are "prefixes" (or suffixes? not sure. could be implemented either way) of the desired key. Index 11 (1011) would wring out 1 (1), 2 (10), 5 (101), and 11 (1011). Data.MemoCombinators simply transforms all keys into Ints (or "bits") so that an IntTrie can be used.
p.s. is there a better phrase than "wring out" for this? The words "force", "spine", and "manifest" come to mind, but I can't quite think of the right word/phrase for this.

Good Haskell coding standards

Could someone provide a link to a good coding standard for Haskell? I've found this and this, but they are far from comprehensive. Not to mention that the HaskellWiki one includes such "gems" as "use classes with care" and "defining symbolic infix identifiers should be left to library writers only."

Really hard question. I hope your answers turn up something good. Meanwhile, here is a catalog of mistakes or other annoying things that I have found in beginners' code. There is some overlap with the Cal Tech style page that Kornel Kisielewicz points to. Some of my advice is every bit as vague and useless as the HaskellWiki "gems", but I hope at least it is better advice :-)
Format your code so it fits in 80 columns. (Advanced users may prefer 87 or 88; beyond that is pushing it.)
Don't forget that let bindings and where clauses create a mutually recursive nest of definitions, not a sequence of definitions.
Take advantage of where clauses, especially their ability to see function parameters that are already in scope (nice vague advice). If you are really grokking Haskell, your code should have a lot more where-bindings than let-bindings. Too many let-bindings is a sign of an unreconstructed ML programmer or Lisp programmer.
Avoid redundant parentheses. Some places where redundant parentheses are particularly offensive are
Around the condition in an if expression (brands you as an unreconstructed C programmer)
Around a function application which is itself the argument of an infix operator (Function application binds tighter than any infix operator. This fact should be burned into every Haskeller's brain, in much the same way that us dinosaurs had APL's right-to-left scan rule burned in.)
Put spaces around infix operators. Put a space following each comma in a tuple literal.
Prefer a space between a function and its argument, even if the argument is parenthesized.
Use the $ operator judiciously to cut down on parentheses. Be aware of the close relationship between $ and infix .:
f $ g $ h x == (f . g . h) x == f . g . h $ x
Don't overlook the built-in Maybe and Either types.
Never write if <expression> then True else False; the correct phrase is simply <expression>.
Don't use head or tail when you could use pattern matching.
Don't overlook function composition with the infix dot operator.
Use line breaks carefully. Line breaks can increase readability, but there is a tradeoff: Your editor may display only 40–50 lines at once. If you need to read and understand a large function all at once, you mustn't overuse line breaks.
Almost always prefer the -- comments which run to end of line over the {- ... -} comments. The braced comments may be appropriate for large headers—that's it.
Give each top-level function an explicit type signature.
When possible, align -- lines, = signs, and even parentheses and commas that occur in adjacent lines.
Influenced as I am by GHC central, I have a very mild preference to use camelCase for exported identifiers and short_name with underscores for local where-bound or let-bound variables.
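A couple of the points above (the relationship between $ and ., and pattern matching instead of head) can be seen in a short sketch. The function names are invented for the example:

```haskell
import Data.Char (toUpper)

-- Three spellings of the same pipeline; all produce identical results.
shoutNested, shoutDollar, shoutComposed :: String -> String
shoutNested   s = reverse (map toUpper (filter (/= ' ') s))
shoutDollar   s = reverse $ map toUpper $ filter (/= ' ') s
shoutComposed   = reverse . map toUpper . filter (/= ' ')

-- Pattern matching instead of head: the empty case is handled explicitly
-- rather than crashing at run time.
firstOr :: a -> [a] -> a
firstOr def []      = def
firstOr _   (x : _) = x
```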

Some good rules of thumb imho:
Consult with HLint to make sure you don't have redundant brackets and that your code isn't pointlessly point-full.
Avoid recreating existing library functions. Hoogle can help you find them.
Often times existing library functions are more general than what one was going to make. For example if you want Maybe (Maybe a) -> Maybe a, then join does that among other things.
Argument naming and documentation is important sometimes.
For a function like replicate :: Int -> a -> [a], it's pretty obvious what each of the arguments does, from their types alone.
For a function that takes several arguments of the same type, like isPrefixOf :: (Eq a) => [a] -> [a] -> Bool, naming/documentation of arguments is more important.
If one function exists only to serve another function, and isn't otherwise useful, and/or it's hard to think of a good name for it, then it probably should exist in its caller's where clause instead of in the module's scope.
DRY
Use Template-Haskell when appropriate.
Bundles of functions like zip3, zipWith3, zip4, zipWith4, etc are very meh. Use Applicative style with ZipLists instead. You probably never really need functions like those.
Derive instances automatically. The derive package can help you derive instances for type-classes such as Functor (there is only one correct way to make a type an instance of Functor).
Code that is more general has several benefits:
It's more useful and reusable.
It is less prone to bugs because there are more constraints.
For example, suppose you want to write concat :: [[a]] -> [a] and notice that it can be generalized to join :: Monad m => m (m a) -> m a. There is less room for error when writing join: when writing concat you can reverse the lists by mistake, whereas in join there are very few things you can do.
When using the same stack of monad transformers in many places in your code, make a type synonym for it. This will make the types shorter, more concise, and easier to modify in bulk.
Beware of "lazy IO". For example, readFile doesn't actually read the file's contents at the moment it is called; they are read lazily as the result is consumed.
Avoid indenting so much that I can't find the code.
If your type is logically an instance of a type-class, make it an instance.
The instance can replace other interface functions you may have considered with familiar ones.
Note: If there is more than one logical instance, create newtype-wrappers for the instances.
Make the different instances consistent. It would have been very confusing/bad if the list Applicative behaved like ZipList.
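Two of the suggestions above, join as the general form of concat and ZipList in place of zipWith3, can be illustrated with a short sketch. The binding names are arbitrary:

```haskell
import Control.Applicative (ZipList (..))
import Control.Monad (join)

-- join specialises to concat on lists...
flatList :: [Int]
flatList = join [[1, 2], [3]]          -- [1,2,3]

-- ...and to Maybe (Maybe a) -> Maybe a on Maybe.
flatMaybe :: Maybe Int
flatMaybe = join (Just (Just 5))       -- Just 5

-- Applicative style with ZipList in place of zipWith3: the function is
-- applied position by position across the three lists.
sums :: [Int]
sums = getZipList ((\a b c -> a + b + c) <$> ZipList [1, 2, 3]
                                         <*> ZipList [10, 20, 30]
                                         <*> ZipList [100, 200, 300])
```

The ZipList form extends to any number of lists by adding another <*> line, which is why the fixed-arity zipWithN family is rarely needed.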

I like to try to organize functions as point-free style compositions as much as possible by doing things like:
func = boo . boppity . bippity . snd
  where boo = ...
        boppity = ...
        bippity = ...
I like using ($) only to avoid nested parens or long parenthesized expressions
... I thought I had a few more in me, oh well

I'd suggest taking a look at this style checker.

I found a good Markdown file covering almost every aspect of Haskell code style. It can be used as a cheat sheet. You can find it here: link
