The two major competing packages for serializing and deserializing data in Haskell that I am aware of are binary and cereal. When should one choose one of these packages over the other? Or are there other choices that I am neglecting?
They aren't competing; they are complementary. cereal works on strict ByteStrings, while binary works on lazy ones. Because of its lazy nature, binary relies on throwing an exception on a parse error, while cereal can report failure via Either.
Also, to imply there are "only" two main packages is a misrepresentation. At the very least you should look at blaze-builder too.
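To make the failure-handling difference concrete: cereal's decode returns an Either String, whereas binary's plain decode throws on bad input. Newer releases of binary also export decodeOrFail, which reports the error as a value. The following is a minimal sketch, assuming a binary version that provides decodeOrFail; safeDecodeWord32 and sampleBytes are names invented for this example.

```haskell
import Data.Binary (decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Hypothetical helper: decode a Word32 from a lazy ByteString and
-- report a parse failure as a Left instead of a runtime exception.
safeDecodeWord32 :: BL.ByteString -> Either String Word32
safeDecodeWord32 bytes =
  case decodeOrFail bytes of
    Left (_, _, err)    -> Left err      -- unconsumed input and offset are dropped
    Right (_, _, value) -> Right value

-- Sample input produced by binary's own encoder.
sampleBytes :: BL.ByteString
sampleBytes = encode (42 :: Word32)
```

With this in scope, safeDecodeWord32 sampleBytes yields a Right, while an empty input yields a Left carrying the parser's message.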
For one thing, binary has a questionable default encoding for floating-point values rather than plain IEEE-754. For example, NaN does not round-trip properly. cereal has no such known issue. The issue shows no sign of being resolved, but it can be circumvented by explicitly using functions such as getFloatle; generically derived instances of Binary, however, still have the issue.
On the flip side, binary seems to be more popular than cereal: Hackage currently lists 345 packages that depend on cereal versus 821 that depend on binary. So you may find the related libraries you need more easily if you choose binary.
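One way to apply the getFloatle-style workaround mentioned above is to wrap the value in a newtype whose Binary instance uses the explicit IEEE-754 functions. This is only a sketch: IEEEDouble and roundTripDouble are hypothetical names, and it assumes a binary version that exports getDoublele and putDoublele from Data.Binary.Get and Data.Binary.Put.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getDoublele)
import Data.Binary.Put (putDoublele)

-- Hypothetical wrapper: serialise a Double as 8 little-endian
-- IEEE-754 bytes instead of using binary's default Double encoding.
newtype IEEEDouble = IEEEDouble Double

instance Binary IEEEDouble where
  put (IEEEDouble d) = putDoublele d
  get = IEEEDouble <$> getDoublele

-- Encode and immediately decode a Double through the wrapper.
roundTripDouble :: Double -> Double
roundTripDouble d =
  let IEEEDouble d' = decode (encode (IEEEDouble d))
  in d'
```

Fields that need this treatment have to use the wrapper type explicitly, since a derived instance on a plain Double field would still pick up the default encoding.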
The Haskell parser/combinator Parsec supports input streams from Data.ByteString and Data.Text. Are there any plans to add more support for these types in future releases? The combinators (many, sepBy, string...) seem to be designed around lists, and the reason one uses ByteStrings and Text in the first place is to get around the use of lists. I understand that most will convert with a pack and therefore the lists will be garbage-collected away, but isn't this just half-way support of Text/ByteString? Shouldn't there be a Data.Parsec.Text.Combinator and a Data.Parsec.ByteString.Combinator?
To answer your question directly:
Are there any plans to add more support for these types in future releases?
Most probably, no. As is easy to infer from darcs changes, the package hasn't seen any active development for many years.
That said, the core API is exposed, so if you know what you want and how to do that, you can do it yourself.
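As a rough idea of what "doing it yourself" could look like, here is a small sketch that parses a Text input stream and packs each result back into Text. It still builds an intermediate list internally, so it is the pack-based approach from the question rather than a truly list-free combinator. many1Text, wordsCSV and parseWords are hypothetical names; Parser is the Text-stream parser type from Text.Parsec.Text.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec (char, letter, many1, parse, sepBy)
import Text.Parsec.Text (Parser)

-- Hypothetical helper: run a Char parser one or more times and pack
-- the collected characters into a Text value.
many1Text :: Parser Char -> Parser Text
many1Text p = T.pack <$> many1 p

-- Comma-separated runs of letters, read directly from a Text stream.
wordsCSV :: Parser [Text]
wordsCSV = many1Text letter `sepBy` char ','

-- Run the parser, rendering any ParseError as a String.
parseWords :: Text -> Either String [Text]
parseWords input = either (Left . show) Right (parse wordsCSV "<input>" input)
```

A module of such wrappers would give the Text-returning interface the question asks for, at the cost of the extra packing step.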
I'm working on updating my ReadArgs package. I had a request to add Arguable instances for Data.Text and FileSystem.Path.FilePath. The former is no big deal, since the text package ships with the Haskell Platform, but the latter requires system-filepath.
So I could release a ReadArgs-ext package, chock full of orphan instances, or I could update the ReadArgs package with an additional external dependency. Which option makes more sense?
My usual rule of thumb is to tend towards adding the instances for packages that are in the Haskell Platform, but don't involve less portable elements such as graphics. This covers both filepath and text. Since you are already dealing with the outside world for command line arguments, neither one of those seems like a particularly egregious addition.
Orphans can lead to pretty terrible problems.
I don't use them in 95% of my packages, and I go out of my way to avoid packages that use them.
The two exceptions I have at this point are a few missing monoids in reducers and a package full of vector-instances I picked up because I wasn't willing to make my entire hierarchy of packages depend on vector, downgrading everything from Safe to Trustworthy.
I find when I'm tempted to add an orphan instance, I can usually work around it by providing some kind of WrappedMonad-like newtype wrapper for lifting or lowering another class.
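The newtype workaround can be sketched as follows. The Arguable class here is a simplified stand-in defined locally for illustration, not the real ReadArgs class, and TextArg is a hypothetical wrapper name. In real use the class would come from one package and Text from another, and the wrapper would live in your own package so that the instance is not an orphan.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Simplified, hypothetical stand-in for a class owned by another package.
class Arguable a where
  parseArg :: String -> Maybe a

-- Wrapper type owned by this module. Because the type is defined here,
-- the instance below is attached to it and is therefore not an orphan.
newtype TextArg = TextArg { getTextArg :: Text }

instance Arguable TextArg where
  parseArg = Just . TextArg . T.pack
```

Callers unwrap with getTextArg after parsing, which is the price paid for keeping the instance next to a type you control.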
The text package is marked as GHC-only, whereas the aeson package is marked as Portable. However, aeson relies on Data.Text.Internal, which is in the text package. But if text is GHC-only, then surely aeson must be too?
The Portability/Stability tags aren't really taken too seriously most of the time; there's no community standard as to how they are used. aeson certainly isn't portable across Haskell implementations, since it uses Template Haskell, which is only available on GHC. I would assume, however, that it is portable across platforms (i.e. Mac, Windows, Linux), so my guess is that it uses the term in a different sense from the way text does.
There are many different libraries on Hackage dealing with interpolated strings. Some are of poor quality, while others vary in the number of features they support.
Which ones are worth using?
Examples of libraries (in no particular order): shakespeare, interpolatedstring-qq, Interpolation
I took a look at all the interpolation quasiquoter libraries I could find on Hackage.
Interpolation libraries worth using:
interpolatedstring-perl6: Supports interpolating arbitrary Haskell code with reasonable syntax, but requires haskell-src-exts. If you just want a general string interpolation syntax, I'd use this.
shakespeare-text: Based on the Shakespeare family of templates, and has minimal dependencies; most other interpolated string packages depend on haskell-src-exts, which is quite a heavy package and can take a lot of time and resources to compile. If you use any other Shakespeare templates, I'd suggest going with this one.
However, it doesn't support interpolating arbitrary Haskell code, although it supports more than simple variable expansion; it also does function application, operators, etc. I think it also uses Text rather than String; from reading the source code, I'm not sure whether it can be used with Strings, though there is support code suggesting it can.
Interpolation: Supports arbitrary expressions (again with haskell-src-exts), and also has built-in looping facilities. If you want more "template"-like features than just plain interpolation, it's worth considering, although I personally find the syntax quite ugly.
Interpolation libraries probably not worth using:
interpolatedstring-qq: Seems to be based on interpolatedstring-perl6; it hasn't been updated for over a year, and seems to have less functionality than interpolatedstring-perl6. Unless you're really attached to the #{expr} syntax, I wouldn't consider this one.
interpol: Implemented as a preprocessor, gives {foo} special meaning in strings; IMO too heavyweight a solution, and with such lightweight syntax, likely to break existing code.
In summary, I'd suggest interpolatedstring-perl6 if you don't mind the haskell-src-exts dependency, and shakespeare-text if you do (or are already using Shakespeare templates).
Another option might be to use the string-qq package with a more general template engine; it supports String, Text and ByteString, which should cover every use. However, this obviously doesn't support embedding Haskell code, and you'll need to specify the variables separately, so it's probably only useful in certain situations.
I have a cabal package that exports a type NBT which might be useful for other developers. I've gone through the trouble of defining an Arbitrary instance for my type, and it would be a shame to not offer it to other developers for testing their code that integrates my work.
However, I want to avoid situations where my instance might get in the way. Perhaps the other developer has a different idea for what the Arbitrary instance should be. Perhaps my package's dependency on a particular version of QuickCheck might interfere with or be unwanted in the dependencies of the client project.
My ideas, in no particular order, are:
Leave the Arbitrary instance next to the definition of the type, and let clients deal with shadowing the instance or overriding the QuickCheck version number.
Make the Arbitrary instance an orphan instance in a separate module within the same package, say Data.NBT.Arbitrary. The dependency on QuickCheck for the overall package remains.
Offer the Arbitrary instance in a totally separate package, so that it can be listed as a separate test dependency for client projects.
Conditionally include both the Arbitrary instance and the QuickCheck dependency in the main package, but only if a flag like -ftest is set.
I've seen combinations of all of these used in other libraries, but haven't found any consensus on which works best. I want to try and get it right before uploading to Hackage.
On the basis of not much specific experience, but a general desire for robustness, the guiding principle for package dependencies should perhaps be
From each according to their ability; to each according to their need.
It's good to keep the dependencies of a package to the minimum needed for its essential functionality. That suggests option 3 or option 4 to me. Of course, it's a pain to chop the package up so much. If options are capable of expressing the conditionality involved, then option 4 sounds like a sensible approach, based on using language effectively to say what you mean.
It would be really good if a consensus emerged about which one switch we need to flick to get the testing kit as well as the basic functionality.
It's also clear that there's room for refinement here. It's amazing that Cabal works as well as it does, but it could allow for more sophisticated notions of "package", perhaps after the manner of the SML module system. Translating dependencies into function types, we basically get to write
simplePackage :: (Dependency1, .., Dependencyn) -> Deliverable
but one could imagine more elaborate combinations of products and functions, like
fancyPackage :: BasicDependency -> (BasicDeliverable, HelpfulExtras -> Gravy)
Until then, pick the option that most accurately reflects the actual deal. And tell us about it, so we can build that consensus.
The problem comes down to: how likely is it that someone using your library will be wanting to run QuickCheck tests using your NBT type?
If it is likely, and the Arbitrary instance is detailed (and thus not likely to change for different people), it would probably be best to ship it with your package, especially if you're going to make sure you keep updating the package (as for using a flag or not, that comes down to a bit of personal preference). If the instance is relatively simple however (and thus more likely that people would want to customise it), then it might be an idea to just provide a sample instance in the documentation.
If the type is primarily internal in nature and not likely to be used by others wanting to run tests, then using a flag to conditionally bring in QuickCheck is probably the best way to go to avoid unnecessary dependencies (i.e. the test suite is there just so you can test the package).
I'm not a fan of having QuickCheck-only packages in general, though it might be useful in some situations.