Serialization of a TChan String - haskell

I have declared the following
type KEY = (IPv4, Integer)
type TPSQ = TVar (PSQ.PSQ KEY POSIXTime)
type TMap = TVar (Map.Map KEY [String])
data Qcfg = Qcfg { qthresh :: Int, tdelay :: Rational, cwpsq :: TPSQ, cwmap :: TMap, cwchan :: TChan String } deriving (Show)
and would like this to be serializable, in the sense that Qcfg can either be written to disk or be sent over the network. When I compile this I get the error
No instances for (Show TMap, Show TPSQ, Show (TChan String))
arising from the 'deriving' clause of a data type declaration
Possible fix:
add instance declarations for
(Show TMap, Show TPSQ, Show (TChan String))
or use a standalone 'deriving instance' declaration,
so you can specify the instance context yourself
When deriving the instance for (Show Qcfg)
I am now not quite sure whether there is any chance at all of serializing my TChan, even though all the individual nodes in it are members of the Show class.
For TMap and TPSQ, I wonder whether there is a way to show the values in the TVar directly (because they do not get changed, there should be no need to lock them) without having to declare an instance that does a readTVar?

I understood your comments to mean that you want to serialize the contents of the TVar and not the TVar itself.
There is only one way to extract the value from a TVar, and that's readTVar:
readTVar :: TVar a -> STM a
... which you can do in the IO monad using atomically:
atomically . readTVar :: TVar a -> IO a
TChan is more tricky, though, since you can't inspect the contents without flushing out the entire TChan. This is doable, even if wastefully, by inspecting the entire contents as a single STM action and then reinserting them all. If this is what you choose to do, it would also eventually require being run in the IO monad.
This means you won't be able to derive a Show instance for it, since Show expects a pure computation that converts it to a String, and not one residing in the IO monad.
However, there's no reason you have to use the Show class. You can just define a custom function to serialize your data type in the IO monad. Also, it's generally not advisable to use Show for serialization purposes since:
Some of your data types (like PSQ) have no Read instance
It's a pain in the butt to define Read instances in general
String representations are very space-inefficient
So I would recommend you use a proper serialization library like binary or cereal to do serialization and deserialization. These convert data types to a binary representation, and they make it very easy to define encoders and decoders.
However, even those libraries only accept instances for pure conversions and not operations in the IO monad, so what you must do is factor your serialization into a two-step process:
Extract the contents of your TVars in the IO monad.
Serialize the contents (along with the rest of your data-type) using cereal/binary.
There is still one last caveat: not all of your data types have Binary instances (assuming we use the binary package). Fortunately, lists do have a Binary instance, so a convenient workaround is to convert your data type to a list (using toList) and serialize the list. Then, when you deserialize the list, you use fromList to recover your original type.
So the following function will do all of that (using binary):
serializeQcfg file (Qcfg qthresh tdelay cwpsq cwmap cwchan) = do
    -- Step 1: Extract contents of concurrency variables
    psq    <- atomically $ readTVar cwpsq
    myMap  <- atomically $ readTVar cwmap
    myChan <- atomically $ entireTChan cwchan
    -- Step 2: Encode the extracted data
    encodeFile file (qthresh, tdelay, toList psq, myMap, myChan)
Edit: Actually, it's probably better to combine all the atomic transactions into a single transaction as Daniel pointed out, so you would actually do:
serializeQcfg file (Qcfg qthresh tdelay cwpsq cwmap cwchan) = do
    -- Step 1: Extract contents of concurrency variables in one transaction
    (psq, myMap, myChan) <- atomically $ (,,) <$> readTVar cwpsq
                                              <*> readTVar cwmap
                                              <*> entireTChan cwchan
    -- Step 2: Encode the extracted data
    encodeFile file (qthresh, tdelay, toList psq, myMap, myChan)
I left out the implementation of entireTChan, which would basically flush the TChan to inspect the entire contents and then reload it again, but its type signature would be something like:
entireTChan :: TChan a -> STM [a]
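One possible implementation of entireTChan, sketched with the stm package's tryReadTChan and unGetTChan: it drains the channel within the caller's transaction and then pushes every element back onto the read end. Using unGetTChan rather than writeTChan is a deliberate choice here: readers created with dupTChan keep their own read pointers, so they do not receive the elements a second time. This is a sketch, not the answerer's code.

```haskell
import Control.Concurrent.STM

-- Read everything currently in the TChan, then restore it in the same order.
-- Both halves run in the caller's STM transaction, so no other thread can
-- observe the channel in a partially drained state.
entireTChan :: TChan a -> STM [a]
entireTChan chan = do
  xs <- drain
  -- Push the elements back onto the read end, last one first, so this
  -- reader sees the original order again.
  mapM_ (unGetTChan chan) (reverse xs)
  return xs
  where
    drain = do
      mx <- tryReadTChan chan
      case mx of
        Nothing -> return []
        Just x  -> (x :) <$> drain
```

Because it runs in STM, it composes with the readTVar calls in the single-transaction version of serializeQcfg.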
I also left out the deserialization implementation, but I think if you understand the above example and take the time to learn how to use the binary or cereal packages you should be able to figure it out easily enough.
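For illustration, here is one way the deserialization half could look. It is a sketch under simplifying assumptions: the hypothetical QcfgLite drops the PSQ field and uses (String, Integer) instead of (IPv4, Integer) as the key, so that everything in the snapshot already has a Binary instance (Int, Rational, Map, lists and tuples). Decoding itself is pure; only re-creating the TVar and TChan happens in IO, mirroring the two-step serialization.

```haskell
import Control.Concurrent.STM
import Data.Binary (decode, encode)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map as Map

-- Hypothetical simplified stand-in for Qcfg (no PSQ, String-based keys).
data QcfgLite = QcfgLite
  { qthresh :: Int
  , tdelay  :: Rational
  , cwmap   :: TVar (Map.Map (String, Integer) [String])
  , cwchan  :: TChan String
  }

-- The pure snapshot that step 2 of the serializer would encode.
type Snapshot = (Int, Rational, Map.Map (String, Integer) [String], [String])

encodeSnapshot :: Snapshot -> BL.ByteString
encodeSnapshot = encode

-- Deserialization in reverse order of serialization:
-- 1. decode the pure snapshot (no IO needed),
-- 2. allocate a fresh TVar and TChan in IO and refill them.
decodeQcfg :: BL.ByteString -> IO QcfgLite
decodeQcfg bytes = do
  let (t, d, m, msgs) = decode bytes :: Snapshot
  mapVar <- newTVarIO m
  chan   <- newTChanIO
  atomically $ mapM_ (writeTChan chan) msgs
  return (QcfgLite t d mapVar chan)
```

With the real Qcfg, the PSQ field would be rebuilt from its decoded list with fromList in the same way.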

Related

Why do Data.Binary instances of bytestring add the length of the bytestring as prefix

Looking at the put instances of the various ByteString types we find that the length of the bytestring is always prefixed in the binary file before writing it. For example here - https://hackage.haskell.org/package/binary-0.8.8.0/docs/src/Data.Binary.Class.html#put
Taking an example:
instance Binary B.ByteString where
    put bs = put (B.length bs) -- Why this??
             <> putByteString bs
    get = get >>= getByteString
Is there any particular reason for doing this? And is the only way to write a ByteString without prefixing the length to create our own newtype wrapper with its own Binary instance?
Is there any particular reason for doing this?
The idea of get and put is that you can combine several objects. For example you can write:
write_func :: ByteString -> Char -> Put
write_func some_bytestring some_char = do
    put some_bytestring
    put some_char
Then you want to define a function that can read the data back, and evidently you want the two functions to act together as an identity: if the writer writes a certain ByteString and a certain Char, the reader should read back the same ByteString and the same character.
The reader function should look similar to:
read_fun :: Get (ByteString, Char)
read_fun = do
    bs <- get
    c <- get
    return (bs, c)
But the problem is: where does the ByteString end? If the character is, say, 'A', that byte could just as well be part of the ByteString. You thus need some mechanism to indicate where the ByteString ends. This can be done by saving the length, or by placing a marker at the end. With a marker, you would need to "escape" the bytestring so that it cannot contain the marker itself.
And is the only way to write Bytestring without prefixing the length - creating our own newtype wrapper and having an instance for Binary?
No; the instance definition itself already shows how. If you want to write a ByteString without its length, you can use putByteString :: ByteString -> Put:
write_func :: ByteString -> Char -> Put
write_func some_bytestring some_char = do
    putByteString some_bytestring
    put some_char
but when reading the ByteString back, you will need to figure out how many bytes you have to read.
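A small round trip illustrates both cases, using only functions from the binary package (put, get, putByteString, getByteString, runPut, runGet). With the Binary instance the reader learns where the ByteString ends from the length prefix; with putByteString the byte count has to be passed to the reader separately. The function names prefixed, raw, readPrefixed and readRaw are illustrative only.

```haskell
import Data.Binary (get, put)
import Data.Binary.Get (getByteString, runGet)
import Data.Binary.Put (putByteString, runPut)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Sample payload used in the examples below.
example :: B.ByteString
example = BC.pack "hello"

-- With the Binary instance: put writes the length first, so get knows
-- exactly how many bytes belong to the ByteString.
prefixed :: B.ByteString -> Char -> BL.ByteString
prefixed bs c = runPut (put bs >> put c)

readPrefixed :: BL.ByteString -> (B.ByteString, Char)
readPrefixed = runGet ((,) <$> get <*> get)

-- Without the prefix: the reader must be told the byte count some other way.
raw :: B.ByteString -> Char -> BL.ByteString
raw bs c = runPut (putByteString bs >> put c)

readRaw :: Int -> BL.ByteString -> (B.ByteString, Char)
readRaw n = runGet ((,) <$> getByteString n <*> get)
```

For "hello" and 'A', the prefixed encoding is 14 bytes (8 for the Int length, 5 for the payload, 1 for the UTF-8 encoded 'A'), while the raw one is 6 bytes and can only be read back as readRaw 5.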

Sharing observer function code between mutable and frozen versions of a type

I was working on creating my own custom mutable/frozen data type that internally contains an MVector/Vector. It needs to be mutable for performance reasons so switching to an immutable data structure is not something I am considering.
It seems like implementing an observer function for one of the two versions should allow me to just steal that implementation for the other type. Here are the two options I am considering:
render :: Show a => MCustom s a -> ST s String
render mc = ...non trivial implementation...
show :: Show a => Custom a -> String
show c = runST $ render =<< unsafeThaw c
Here unsafeThaw calls Vector.unsafeThaw under the covers, which should be safe because the thawed vector is never mutated, only read. This approach feels the cleanest; the only downside is that render is strict, which forces show to be strict, whereas a duplicate implementation could correctly stream the String without forcing it all at once.
The other option, which feels much dirtier but which I think is safe, is this:
show :: Show a => Custom a -> String
show c = ...non trivial implementation that allows lazy streaming...
render :: Show a => MCustom s a -> ST s String
render mc = do
    s <- show <$> unsafeFreeze mc
    s `deepseq` pure s
Are either of these my best option? If not what else should I do?
To me it seemed most intuitive to build one version on top of the other. But if I make the mutable version the base version, I will end up with a lot more strictness than I want, even if the implementations seem fairly clean and logical, simply because ST necessitates strictness unless I throw in some unsafeInterleaveST calls, and those would only be safe when the mutable observer was called via an immutable object.
On the other hand, if I make the immutable version the base version, I will end up with more dirty deepseq code, and sometimes I would just have to reimplement things. For example, all in-place editing functions can be done on a frozen object pretty easily: copy the frozen object, call unsafeThaw on the copy, modify it in place, then call unsafeFreeze and return it. But the opposite isn't really doable, as a copy-based modification written for the immutable version cannot be turned into an in-place modification.
Should I perhaps write all modification functions alongside the mutable implementation and all observer functions alongside the immutable implementation, and then have a module that depends on both and unifies everything via unsafeThaw and unsafeFreeze?
How about having a pure function
show :: (StringLike s, Show a) => Custom a -> s
You can get both lazy and strict output with different instantiations of s, in which cons is either lazy or strict; e.g. String or Text:
class StringLike s where
    cons :: Char -> s -> s
    nil :: s
    uncons :: s -> Maybe (Char, s)
instance StringLike String where ...
instance StringLike Text where ...
You could use other methods, e.g. phantom types, or simply having two functions (showString and showText), to distinguish between lazy and strict output if you like. But if you look at types as specifications of a function's semantics, then the place to indicate laziness or strictness is in the return type of that operation. This removes the need for some sort of strict show for Custom inside of ST.
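A minimal sketch of that idea, assuming the text package for the strict instance and a hypothetical list-backed Custom in place of the real Vector-backed type; only cons and nil from the class above are needed here. The same showC produces a lazily streamed String or a fully built Text depending solely on the requested result type.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Text as T

-- Subset of the StringLike class above (uncons omitted).
class StringLike s where
  cons :: Char -> s -> s
  nil  :: s

-- Lazy: (:) does not force its tail, so output can be consumed incrementally.
instance StringLike [Char] where
  cons = (:)
  nil  = []

-- Strict: each T.cons builds a new strict Text (quadratic overall, which is
-- acceptable for a sketch); the whole result exists once it is evaluated.
instance StringLike T.Text where
  cons = T.cons
  nil  = T.empty

-- Hypothetical stand-in for the Vector-backed Custom type.
newtype Custom a = Custom [a]

showC :: (StringLike s, Show a) => Custom a -> s
showC (Custom xs) = foldr (\x acc -> foldr cons acc (show x ++ " ")) nil xs
```

With the String instance, take 6 (showC (Custom [1 :: Int ..])) terminates even on an infinite input; requesting Text for the same input would not.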
For the MCustom version, you probably do not export the String version, e.g.:
render :: Show a => MCustom s a -> ST s Text
render a = show <$> unsafeFreeze a
You can throw in a seq to force the result when the function runs, but since Text is strict, the entire Text would be forced as soon as any character is used anyway.
But the simplest solution seems to just abstract the pattern of using a mutable structure in an immutable fashion, e.g.
atomically :: (NFData a) => (Custom x -> a) -> MCustom s x -> ST s a
atomically f v = do
    r <- f <$> unsafeFreeze v
    r `deepseq` pure r
This saves you from using unsafeFreeze/deepseq all over your code, much as Data.Vector's modify lets you apply in-place (mutable) operations to an immutable vector without the unsafe plumbing.
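A self-contained sketch of that combinator, assuming Data.Array's STArray/Array (from the array package) as stand-ins for MVector/Vector; Custom, MCustom, fromListM and setAt are hypothetical here, and the combinator is named withFrozen to avoid clashing with STM's atomically. The observer is written once against the immutable type; the mutable side only borrows a frozen view and fully forces the result before any further write can happen.

```haskell
import Control.DeepSeq (NFData, deepseq)
import Control.Monad.ST (ST, runST)
import Data.Array (Array, elems)
import Data.Array.ST (STArray, newListArray, writeArray)
import Data.Array.Unsafe (unsafeFreeze)

-- Hypothetical stand-ins for the question's frozen and mutable types.
newtype Custom a = Custom (Array Int a)
newtype MCustom s a = MCustom (STArray s Int a)

fromListM :: [a] -> ST s (MCustom s a)
fromListM xs = MCustom <$> newListArray (0, length xs - 1) xs

setAt :: MCustom s a -> Int -> a -> ST s ()
setAt (MCustom marr) = writeArray marr

-- The observer, written once against the immutable type.
render :: Show a => Custom a -> String
render (Custom arr) = unwords (map show (elems arr))

-- Run a pure observer on the mutable structure. The frozen view may share
-- the mutable buffer, so the result is forced to normal form before
-- returning; no leftover thunk can observe a later mutation.
withFrozen :: NFData b => (Custom a -> b) -> MCustom s a -> ST s b
withFrozen f (MCustom marr) = do
  frozen <- unsafeFreeze marr
  let r = f (Custom frozen)
  r `deepseq` pure r
```

Rendering, mutating, then rendering again inside one runST yields "1 2 3" and then "10 2 3" for an array built from [1, 2, 3] with element 0 set to 10.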

How to modify parts of a State in Haskell

I have a number of operations which modify a System. System is defined like this:
data System = Sys {
    sysId :: Int,
    sysRand :: StdGen,
    sysProcesses :: ProcessDb,
    sysItems :: ItemDb
}
with e.g.
type ProcessDb = M.Map Int Process
But I also have some functions, which do not need access to the full System, but have types like this:
foo' :: (Process, ItemDb) -> ((Process, ItemDb),[Event])
Currently I gave them types like
foo :: System -> (System, [Event])
But this is a needlessly broad interface. To use the narrow interface above in conjunction with System, I would have to extract a single Process and the ItemDb from System, run foo', and then modify System with the results.
This is quite a lot of unwrapping and wrapping, and results in more lines of code than just passing the system as a whole and letting foo extract whatever it needs. In the latter case, the wrapping and unwrapping is mingled with the actual foo' operation, and I have the feeling that these two aspects should be separated.
I suppose I need some kind of lifting operation which turns a narrow foo' into a foo. I could write this, but I would have to write such a lifter for every signature of the narrow functions, resulting in lots of different lifters.
Is there an idiom for solving such problems?
Is it worth bothering?
One common solution is to use a class, possibly created by the Template Haskell magic of Control.Lens.TH.makeClassy. The gist is that you pass in the whole System, but you don't let the function know that that's what you're giving it. All it's allowed to know is that what you're giving it offers methods for getting and/or modifying the pieces it's supposed to handle.
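A hand-written version of that idea, without Template Haskell or the lens package (makeClassy would generate a similar class for you). HasItemDb, getItemDb, setItemDb and addItem are hypothetical names, and System is trimmed to two fields for the sketch.

```haskell
import qualified Data.Map as M

type ItemDb = M.Map Int String
type Event = String

-- Trimmed-down stand-in for the question's System record.
data System = Sys { sysId :: Int, sysItems :: ItemDb }

-- The narrow interface: "something whose ItemDb can be read and replaced".
class HasItemDb s where
  getItemDb :: s -> ItemDb
  setItemDb :: ItemDb -> s -> s

instance HasItemDb System where
  getItemDb = sysItems
  setItemDb db sys = sys { sysItems = db }

-- The operation sees only the HasItemDb methods, not the whole System,
-- yet it can be applied directly to a System without manual wrapping.
addItem :: HasItemDb s => Int -> String -> s -> (s, [Event])
addItem k v s =
  (setItemDb (M.insert k v (getItemDb s)) s, ["added item " ++ show k])
```

One class per piece of state (rather than one lifter per function signature) is what keeps the number of adapters small.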
I ended up writing a function which works on any State and which requires a "Lens" that captures the specific transformation from the bigger State to the smaller State and back:
type Lens s' s = s -> (s', s' -> s)  -- as implied by how focus and onSys use it
focus :: (Lens s' s) -> State s' a -> State s a
focus lens ms' = do
    s <- get
    let (s', set) = lens s
        (a, s'') = runState ms' s'
    put (set s'')
    return a
It allows me to write things like
run :: ExitP -> State SimState Log
...
do
    evqs' <- focus onSys $ step (t, evt)
    ...
Where step operates on the "smaller" state
step :: Timed Event -> State Sys.System [EventQu]
Here onSys is a "Lens" and it works like this:
onSys :: Lens Sys.System SimState
onSys (Sis e s) = (s, Sis e)
where
data SimState = Sis {
    events :: EventQu,
    sisSys :: Sys.System
}
I suppose the existing lens libraries follow a similar approach, but do much more magic, like creating lenses automatically. I did shy away from lenses; instead, I was pleased to realise that all it took was a few lines of code to get what I needed.
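A self-contained version of focus, assuming mtl's Control.Monad.State and toy stand-ins for SimState and Sys.System (the counter field, events as [String], and this step are hypothetical). The Lens synonym is the one implied by the code above: a function from the big state to the small part, paired with a way to put a new small part back.

```haskell
import Control.Monad.State

-- From the big state, return the small part and a way to rebuild the big one.
type Lens small big = big -> (small, small -> big)

-- Run a computation on the small state inside a computation on the big one.
focus :: Lens small big -> State small a -> State big a
focus lens inner = do
  big <- get
  let (small, rebuild) = lens big
      (a, small') = runState inner small
  put (rebuild small')
  return a

-- Hypothetical toy states standing in for Sys.System and SimState.
newtype System = System { counter :: Int } deriving (Eq, Show)
data SimState = Sis { events :: [String], sisSys :: System } deriving (Eq, Show)

onSys :: Lens System SimState
onSys (Sis e s) = (s, Sis e)

-- Operates only on the smaller state.
step :: String -> State System [String]
step evt = do
  modify (\sys -> sys { counter = counter sys + 1 })
  n <- gets counter
  return [evt ++ "#" ++ show n]

run :: State SimState ()
run = do
  qs <- focus onSys (step "tick")
  modify (\sim -> sim { events = events sim ++ qs })
```

Running run twice from Sis [] (System 0) leaves Sis ["tick#1","tick#2"] (System 2).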

Haskell code littered with TVar operations and functions taking many arguments: code smell?

I'm writing a MUD server in Haskell (MUD = Multi User Dungeon: basically, a multi-user text adventure/role-playing game). The game world data/state is represented in about 15 different IntMaps. My monad transformer stack looks like this: ReaderT MudData IO, where the MudData type is a record type containing the IntMaps, each in its own TVar (I'm using STM for concurrency):
data MudData = MudData { _armorTblTVar    :: TVar (IntMap Armor)
                       , _clothingTblTVar :: TVar (IntMap Clothing)
                       , _coinsTblTVar    :: TVar (IntMap Coins)
...and so on. (I'm using lenses, thus the underscores.)
Some functions need certain IntMaps, while other functions need others. Thus, having each IntMap in its own TVar provides granularity.
However, a pattern has emerged in my code. In the functions that handle player commands, I need to read (and sometimes later write) to my TVars within the STM monad. Thus these functions end up having an STM helper defined in their where blocks. These STM helpers often have quite a few readTVar operations in them, as most commands need to access a handful of the IntMaps. Furthermore, a function for a given command may call out to a number of pure helper functions that also need some or all of the IntMaps. These pure helper functions thus sometimes end up taking a lot of arguments (sometimes over 10).
So, my code has become "littered" with lots of readTVar expressions and functions that take a large number of arguments. Here are my questions: is this a code smell? Am I missing some abstraction that would make my code more elegant? Is there a more ideal way to structure my data/code?
Thanks!
The solution to this problem is in changing the pure helper functions. We don't really want them to be pure, we want to leak out a single side-effect - whether or not they read specific pieces of data.
Let's say we have a pure function that uses only clothing and coins:
moreVanityThanWealth :: IntMap Clothing -> IntMap Coins -> Bool
moreVanityThanWealth clothing coins = ...
It's usually nice to know that a function only cares about e.g. clothing and coins, but in your case this knowledge is irrelevant and is just creating headaches. We are going to deliberately forget this detail. If we followed mb14's suggestion, we would pass an entire pure MudData' like the following to the helper functions.
data MudData' = MudData' { _armorTbl    :: IntMap Armor
                         , _clothingTbl :: IntMap Clothing
                         , _coinsTbl    :: IntMap Coins
                         -- ...and so on
                         }
moreVanityThanWealth :: MudData' -> Bool
moreVanityThanWealth md =
    let clothing = _clothingTbl md
        coins    = _coinsTbl md
    in  ...
MudData and MudData' are almost identical to each other. One of them wraps its fields in TVars and the other one doesn't. We can modify MudData so that it takes an extra type parameter (of kind * -> *) for what to wrap the fields in. MudData will have the slightly unusual kind (* -> *) -> *, which is closely related to lenses but doesn't have much library support. I call this pattern a Model.
data MudData f = MudData { _armorTbl    :: f (IntMap Armor)
                         , _clothingTbl :: f (IntMap Clothing)
                         , _coinsTbl    :: f (IntMap Coins)
                         -- ...and so on
                         }
We can recover the original MudData with MudData TVar. We can recreate the pure version by wrapping the fields in Identity, newtype Identity a = Identity {runIdentity :: a}. In terms of MudData Identity, our function would be written as
moreVanityThanWealth :: MudData Identity -> Bool
moreVanityThanWealth md =
    let clothing = runIdentity . _clothingTbl $ md
        coins    = runIdentity . _coinsTbl $ md
    in  ...
We've successfully forgotten which parts of the MudData we've used, but now we don't have the lock granularity we want. We need to recover, as a side effect, exactly what we just forgot. If we wrote the STM version of the helper it would look like
moreVanityThanWealth :: MudData TVar -> STM Bool
moreVanityThanWealth md =
    do
        clothing <- readTVar . _clothingTbl $ md
        coins    <- readTVar . _coinsTbl $ md
        return ...
This STM version for MudData TVar is almost exactly the same as the pure version we just wrote for MudData Identity. They only differ by the type of the reference (TVar vs. Identity), what function we use to get the values out of the references (readTVar vs runIdentity), and how the result is returned (in STM or as a plain value). It would be nice if the same function could be used to provide both. We are going to extract what is common between the two functions. To do so, we'll introduce a type class MonadReadRef r m for the Monads we can read some type of reference from. r is the type of the reference, readRef is the function to get the values out of the references, and m is how the result is returned. The following MonadReadRef is closely related to the MonadRef class from ref-fd.
{-# LANGUAGE FunctionalDependencies #-}
class Monad m => MonadReadRef r m | m -> r where
    readRef :: r a -> m a
As long as code is parameterized over all MonadReadRef r ms, it is pure. We can see this by running it with the following instance of MonadReadRef for ordinary values held in an Identity. The id in readRef = id is the same as return . runIdentity.
instance MonadReadRef Identity Identity where
    readRef = id
We'll rewrite moreVanityThanWealth in terms of MonadReadRef.
moreVanityThanWealth :: MonadReadRef r m => MudData r -> m Bool
moreVanityThanWealth md =
    do
        clothing <- readRef . _clothingTbl $ md
        coins    <- readRef . _coinsTbl $ md
        return ...
When we add a MonadReadRef instance for TVars in STM, we can use these "pure" computations in STM but leak the side-effect of which TVars were read.
instance MonadReadRef TVar STM where
    readRef = readTVar
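Putting the pieces together as a compilable sketch, trimmed to two tables. Clothing and Coins are hypothetical type synonyms, and the body of moreVanityThanWealth (more clothing items than total coins) is invented for illustration; the point is that a single definition runs purely over MudData Identity and transactionally over MudData TVar.

```haskell
{-# LANGUAGE FunctionalDependencies, FlexibleInstances #-}
import Control.Concurrent.STM
import Data.Functor.Identity (Identity (..))
import qualified Data.IntMap as IM

-- Hypothetical element types; the real game has richer records.
type Clothing = String
type Coins = Int

-- The "Model": every table is wrapped in f (TVar when shared, Identity when pure).
-- Trimmed to two tables; the real MudData has about 15.
data MudData f = MudData
  { _clothingTbl :: f (IM.IntMap Clothing)
  , _coinsTbl    :: f (IM.IntMap Coins)
  }

class Monad m => MonadReadRef r m | m -> r where
  readRef :: r a -> m a

-- Pure: references are plain values.
instance MonadReadRef Identity Identity where
  readRef = id

-- Transactional: each readRef is a readTVar, so only the tables that are
-- actually read become part of the transaction.
instance MonadReadRef TVar STM where
  readRef = readTVar

-- Hypothetical body: more clothing items than coins in total.
moreVanityThanWealth :: MonadReadRef r m => MudData r -> m Bool
moreVanityThanWealth md = do
  clothing <- readRef (_clothingTbl md)
  coins    <- readRef (_coinsTbl md)
  return (IM.size clothing > sum (IM.elems coins))
```

Because the dependency runs m -> r, the caller picks the monad (runIdentity or atomically) and the reference type follows from it.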
Yes, this obviously makes your code complex and clutters the important code with a lot of boilerplate details. And functions with more than 4 arguments are a sign of problems.
I'd ask the question: do you really gain anything by having separate TVars? Isn't it a case of premature optimization? Before making a design decision such as splitting your data structure among multiple separate TVars, I'd definitely do some measurements (see criterion). You can create a sample test that models the expected number of concurrent threads and frequency of data updates, and check what you are really gaining or losing by having multiple TVars vs. a single one vs. an IORef.
Keep in mind:
If there are multiple threads competing for common locks in an STM transaction, the transactions can get restarted several times before they manage to complete successfully. So under some circumstances, having multiple locks can actually make things worse.
If there is ultimately just one data structure that you need to synchronize, you might consider using a single IORef instead. Its atomic operations are very fast, which could compensate for having a single central lock.
In Haskell it's surprisingly difficult for a pure function to block an atomic STM or IORef transaction for a long time. The reason is laziness: you only need to create thunks within such a transaction, not evaluate them. This is true in particular for a single atomic IORef. The thunks are evaluated outside such transactions (by a thread that inspects them, or you can decide to force them at some point if you need more control; this may be desirable in your case, because if your system evolves without anybody observing it, you can easily accumulate unevaluated thunks).
If it turns out that having multiple TVars is indeed crucial, then I'd probably write all the code in a custom monad (as described by @Cirdec while I was writing my answer), whose implementation would be hidden from the main code, and which would provide functions for reading (and perhaps also writing) parts of the state. It would then be run as a single STM transaction, reading and writing only what's needed, and you could have a pure version of the monad for testing.
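For comparison, a minimal sketch of the single-IORef alternative, assuming a hypothetical World record that holds all the tables as plain IntMaps (giveCoins and runCommand are illustrative names). Command handlers stay pure functions of the whole world, and each command is one atomicModifyIORef' call, so there are no readTVar calls and no long argument lists.

```haskell
import Data.IORef
import qualified Data.IntMap as IM

-- Hypothetical world: every table lives in one immutable record.
data World = World
  { coinsTbl    :: IM.IntMap Int
  , clothingTbl :: IM.IntMap String
  }

-- A pure command handler: old world in, new world plus a player message out.
giveCoins :: Int -> Int -> World -> (World, String)
giveCoins who n w =
  ( w { coinsTbl = IM.insertWith (+) who n (coinsTbl w) }
  , "You receive " ++ show n ++ " coins." )

-- One atomic update of the central IORef. The primed variant forces the new
-- World and the result to WHNF, which limits (but does not eliminate) the
-- thunk build-up mentioned above; the fields themselves may stay lazy.
runCommand :: IORef World -> (World -> (World, String)) -> IO String
runCommand = atomicModifyIORef'
```

The trade-off is the one described above: every command serializes on the single IORef, which is cheap per operation but gives up the per-table granularity of separate TVars.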

How to preserve information when failing?

I'm writing some code that uses the StateT monad transformer to keep track of some stateful information (logging and more).
The monad I'm passing to StateT is very simple:
data CheckerError a = Bad {errorMessage :: Log} | Good a
    deriving (Eq, Show)
instance Monad CheckerError where
    return x = Good x
    fail msg = Bad msg
    (Bad msg) >>= f = Bad msg
    (Good x) >>= f = f x
type CheckerMonad a = StateT CheckerState CheckerError a
It's just a Left and Right variant.
What troubles me is the definition of fail. In my computation I produce a lot of information inside this monad and I'd like to keep this information even when failing.
Currently the only thing I can do is to convert everything to a String and create a Bad instance with the String passed as argument to fail.
What I'd like to do is something like:
fail msg = do
    info <- getInfoOutOfTheComputation
    return $ Bad info
However everything I tried until now gives type errors, probably because this would mix different monads.
Is there any way I can implement fail so that it preserves the information I need without having to convert all of it into a String?
I cannot believe that the best Haskell can achieve is using show+read to pass all the information as the string to fail.
Your CheckerError monad is very similar to the Either monad. I will use the Either monad (and its monad transformer counterpart ErrorT) in my answer.
There is a subtlety with monad transformers: order matters. Effects in the "inner" monad have primacy over effects caused by the "outer" layers. Consider these two alternative definitions of CheckerMonad:
import Control.Monad.State
import Control.Monad.Error
type CheckerState = Int -- dummy definitions for convenience
type CheckerError = String
type CheckerMonad a = StateT CheckerState (Either String) a
type CheckerMonad' a = ErrorT String (State CheckerState) a
In CheckerMonad, Either is the inner monad, and this means a failure will wipe the whole state. Notice the type of this run function:
runCM :: CheckerMonad a -> CheckerState -> Either CheckerError (a,CheckerState)
runCM m s = runStateT m s
You either fail, or return a result along with the state up to that point.
In CheckerMonad', State is the inner monad. This means the state will be preserved even in case of failures:
runCM' :: CheckerMonad' a -> CheckerState -> (Either CheckerError a,CheckerState)
runCM' m s = runState (runErrorT m) s
A pair is returned, which contains the state up to that point, and either a failure or a result.
It takes a bit of practice to develop an intuition of how to properly order monad transformers. The chart in the Type juggling section of this Wikibook page is a good starting point.
Also, it is better to avoid using fail directly, because it is considered a bit of a wart in the language. Instead, use the specialized functions for throwing errors provided by the error transformer. When working with ErrorT or some other instance of MonadError, use throwError.
sillycomp :: CheckerMonad' Bool
sillycomp = do
    modify (+1)
    s <- get
    if s == 3
        then throwError "boo"
        else return True
*Main> runCM' sillycomp 2
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
(Left "boo",3)
*Main> runCM' sillycomp 3
(Right True,4)
ErrorT is sometimes annoying to use because, unlike Either, it requires an Error constraint on the error type. The Error typeclass forces you to define two error constructors noMsg and strMsg, which may or may not make sense for your type.
You can use EitherT from the either package instead, which lets you use any type whatsoever as the error. When working with EitherT, use the left function to throw errors.
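Since this answer was written, ErrorT has been deprecated in favour of ExceptT (transformers' Control.Monad.Trans.Except), which, like EitherT, accepts any error type without an Error constraint. The sketch below shows both orderings with ExceptT; making sillycomp polymorphic over mtl's MonadState and MonadError classes is a choice made here so that the same code runs in both stacks.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except (MonadError, throwError)
import Control.Monad.State (MonadState, State, StateT, get, modify, runState, runStateT)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

type CheckerState = Int

-- Either is the inner monad: a failure discards the state.
type CheckerMonad a = StateT CheckerState (Either String) a

-- State is the inner monad: the state survives a failure.
type CheckerMonad' a = ExceptT String (State CheckerState) a

runCM :: CheckerMonad a -> CheckerState -> Either String (a, CheckerState)
runCM = runStateT

runCM' :: CheckerMonad' a -> CheckerState -> (Either String a, CheckerState)
runCM' m = runState (runExceptT m)

-- Uses only the class methods, so it can be run in either stack.
sillycomp :: (MonadState CheckerState m, MonadError String m) => m Bool
sillycomp = do
  modify (+1)
  s <- get
  if s == 3 then throwError "boo" else return True
```

runCM sillycomp 2 gives Left "boo" with the state lost, whereas runCM' sillycomp 2 gives (Left "boo",3), matching the GHCi session above.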
