Is there an object-identity-based, thread-safe memoization library somewhere? - haskell

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface; its use of unsafePerformIO gives it better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures rather than perhaps-slightly-more-efficient hash tables; but I think that if you find and replace Cmp with Hashable and Data.Map with Data.HashMap and add the appropriate imports, you get a hash-based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail that I can see one probably ought to worry about, but which they don't address, is thread safety: their code is apparently not thread-safe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
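For concreteness, the rough shape I have in mind is something like the toy sketch below. Unlike the scheme in the paper it never uses weak pointers, so it retains every key and result forever; it is only meant to show the moving parts (stable names, a hash table, an MVar for thread safety), and all the names are made up:

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent.MVar
import qualified Data.IntMap.Strict as IM
import System.IO.Unsafe (unsafePerformIO)
import System.Mem.StableName

-- Toy identity-based memoization: thread-safe thanks to the MVar, but it
-- leaks, because (unlike the paper's scheme) it never drops table entries.
memoByIdentity :: forall a b. (a -> b) -> (a -> b)
memoByIdentity f = unsafePerformIO $ do
  tableVar <- newMVar (IM.empty :: IM.IntMap [(StableName a, b)])
  return $ \x -> unsafePerformIO $ do
    sn <- makeStableName x   -- for predictable results, force x to WHNF first
    let h = hashStableName sn
    modifyMVar tableVar $ \table ->
      case lookup sn =<< IM.lookup h table of
        Just y  -> return (table, y)
        Nothing -> let y = f x   -- stored as a lazy thunk
                   in return (IM.insertWith (++) h [(sn, y)] table, y)
{-# NOINLINE memoByIdentity #-}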
UPDATE
Following #luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform :: Node -> Node
the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform :: Node -> Node
whose output 'looks the same' as the data structure you would get from:
recursiveTransform (Node originalChildren annotations) =
    myTransform (Node recursivelyTransformedChildren annotations)
  where
    recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if, say, the Nodes were numbered before I got them, or if I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.

If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
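For concreteness, here is a minimal self-contained sketch of the technique; the concrete types and names (size, expensive, makeFoo) are made up for illustration:

data Foo = Foo
  { size :: Int
  , memo :: Integer   -- lazily holds 'expensive' applied to this very Foo
  }

expensive :: Foo -> Integer
expensive foo = product [1 .. toInteger (size foo)]  -- stand-in for real work

-- The smart constructor ties the knot: 'memo' refers to the value being
-- constructed, so the expensive result is computed at most once per object.
makeFoo :: Int -> Foo
makeFoo n = let foo = Foo { size = n, memo = expensive foo } in foo

main :: IO ()
main = do
  let foo = makeFoo 20
  print (memo foo)  -- computed here
  print (memo foo)  -- the already-evaluated thunk is reused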

It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
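For example, applied to the question's recursiveTransform, the usage might look like the sketch below, assuming the package exports Data.StableMemo.memo with the usual memo-combinator shape memo :: (a -> b) -> a -> b (check the package documentation for the exact interface):

import Data.StableMemo (memo)

data Annotation = Annotation          -- placeholder for the real type
data Node = Node [Node] [Annotation]  -- the type from the question

-- Supplied by the external library in the question; stubbed out here.
myTransform :: Node -> Node
myTransform = id

recursiveTransform :: Node -> Node
recursiveTransform = memo go
  where
    -- Each distinct heap Node is transformed at most once, so the output
    -- preserves the input's structural sharing instead of exploding it.
    go (Node children annotations) =
      myTransform (Node (map recursiveTransform children) annotations)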

Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you apparently want is StableNames. So I realize this answers a related question, but it requires modifying your data type, which you want to avoid...

Related

Suitable Haskell type for large, frequently changing sequence of floats

I have to pick a type for a sequence of floats with 16K elements. The values will be updated frequently, potentially many times a second.
I've read the wiki page on arrays. Here are the conclusions I've drawn so far. (Please correct me if any of them are mistaken.)
IArrays would be unacceptably slow in this case, because they'd be copied on every change. With 16K floats in the array, that's 64KB of memory copied each time.
IOArrays could do the trick, as they can be modified without copying all the data. In my particular use case, doing all updates in the IO monad isn't a problem at all. But they're boxed, which means extra overhead, and that could add up with 16K elements.
IOUArrays seem like the perfect fit. Like IOArrays, they don't require a full copy on each change. But unlike IOArrays, they're unboxed, meaning they're basically the Haskell equivalent of a C array of floats. I realize they're strict. But I don't see that being an issue, because my application would never need to access anything less than the entire array.
Am I right to look to IOUArrays for this?
Also, suppose I later want to read or write the array from multiple threads. Will I have backed myself into a corner with IOUArrays? Or is the choice of IOUArrays totally orthogonal to the problem of concurrency? (I'm not yet familiar with the concurrency primitives in Haskell and how they interact with the IO monad.)
A good rule of thumb is that you should almost always use the vector library instead of arrays. In this case, you can use mutable vectors from the Data.Vector.Mutable module.
The key operations you'll want are read and write which let you mutably read from and write to the mutable vector.
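For instance, a minimal sketch using the unboxed mutable variant (Data.Vector.Unboxed.Mutable, the unboxed cousin of the module above) might look like this:

import qualified Data.Vector.Unboxed.Mutable as VUM

main :: IO ()
main = do
  vec <- VUM.replicate (16 * 1024) (0 :: Float)  -- allocate 16K floats once
  VUM.write vec 0 3.14                           -- O(1) in-place update
  VUM.write vec 1 2.71
  x <- VUM.read vec 0
  print x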
You'll want to benchmark of course (with criterion) or you might be interested in browsing some benchmarks I did e.g. here (if that link works for you; broken for me).
The vector library is a nice interface (crazy understatement) over GHC's more primitive array types, which you can get at more directly in the primitive package. The same goes for the types in the standard array package; an IOUArray, for instance, is essentially a MutableByteArray#.
Unboxed mutable arrays are usually going to be the fastest, but you should compare them in your application to IOArray or the vector equivalent.
My advice would be:
if you probably don't need concurrency, first try a mutable unboxed Vector, as Gabriel suggests
if you know you will want concurrent updates (and feel a little brave) then first try a MutableArray and then do atomic updates with these functions from the atomic-primops library. If you want fine-grained locking, this is your best choice. Of course concurrent reads will work fine on whatever array you choose.
It should also be theoretically possible to do concurrent updates on a MutableByteArray (equivalent to IOUArray) with those atomic-primops functions too, since a Float should always fit into a word (I think), but you'd have to do some research (or bug Ryan).
Also be aware of potential memory reordering issues when doing concurrency with the atomic-primops stuff, and help convince yourself with lots of tests; this is somewhat uncharted territory.

Why aren't FingerTrees used enough to have a stable implementation?

A while ago, I ran across an article on FingerTrees (see also an accompanying Stack Overflow question) and filed the idea away. I have finally found a reason to make use of them.
My problem is that the Data.FingerTree package seems to have a little bit of rot around the edges. Moreover, Data.Sequence in the containers package, which makes use of the data structure, re-implements a (possibly better) version, but doesn't export it.
As theoretically useful as this structure seems to be, it doesn't seem to get a lot of actual use or attention. Have people found that FingerTrees are not useful as a practical matter, or is this a case of not enough attention?
further explanation:
I'm interested in building a data structure holding text that has good concatenation properties. Think about building an HTML document from assorted fragments. Most pre-built solutions use bytestrings, but I really want something that deals with Unicode text properly. My plan at the moment is to layer Data.Text fragments into a FingerTree.
I would also like to borrow the trick from Data.Vector of taking slices without copying using (offset,length) manipulation. Data.Text.Text has this built into the data type, but only uses it for efficient uncons and unsnoc operations. In FingerTree this information could very easily become the v or annotation of the tree.
To answer your question about finger trees in particular, I think the problem is that they have relatively high constant costs compared to arrays, and are more complex than other ways of achieving efficient concatenation. A Builder has a more efficient interface for just appending chunks, and builders are usually readily available (see the links in #informatikr's answer). Given that Data.Text.Lazy is implemented as a linked list of chunks, consider creating a Data.Text.Lazy from a builder. Unless you have a lot of chunks (probably more than 50), or are repeatedly accessing data near the end of the list, the high constant cost of a finger tree probably isn't worth it.
The Data.Sequence implementation is specialized for performance reasons, and isn't as general as the full interface provided by the fingertree package. That's why it isn't exported; it's not really possible to use it for anything other than a Sequence.
I also suspect that many programmers are at a loss as to how to actually use the monoidal annotation, as it's behind a fairly significant abstraction barrier. So many people wouldn't use it because they don't see how it can be useful compared to other data types.
I didn't really get it until I read Chung-chieh Shan's blog series on word numbers (part2, part3, part4). That's proof that the idea can definitely be used in practical code.
In your case, if you need to both inspect partial results and have efficient appends, using a fingertree may be better than a builder. Depending on the builder's implementation, you may end up doing a lot of repeated work as you convert to Text, add more stuff to the builder, convert to Text again, etc. It would depend on your usage pattern though.
You might be interested in my splaytree package, which provides splay trees with monoidal annotations, and several different structures built upon them. Other than the splay tree itself, the Set and RangeSet modules have more-or-less complete APIs; the Sequence module is mostly a skeleton I used for testing. It's not a "batteries included" solution to what you're looking for (again, #informatikr's answer provides those), but if you want to experiment with monoidal annotations it may be more useful than Data.FingerTree. Be aware that a splay tree can get unbalanced if you traverse all the elements in sequence (or continually snoc onto the end, or similar), but if appends and lookups are interleaved performance can be excellent.
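To make the monoidal-annotation idea concrete, here is a minimal sketch along the lines of what the question describes, assuming the fingertree package's Data.FingerTree interface (Measured, fromList, split): a rope of Text chunks measured by their total length, so that splitting near a character offset only inspects O(log n) chunks.

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
import Data.FingerTree (FingerTree, Measured (..), fromList, split)
import Data.Monoid (Sum (..))
import qualified Data.Text as T

newtype Chunk = Chunk T.Text

-- The measure of a chunk is its length; the tree caches the running sums.
instance Measured (Sum Int) Chunk where
  measure (Chunk t) = Sum (T.length t)

type Rope = FingerTree (Sum Int) Chunk

fromChunks :: [T.Text] -> Rope
fromChunks = fromList . map Chunk

-- Split a rope near a character offset (at chunk granularity), guided by
-- the cached length annotations rather than by scanning the text.
splitNearChar :: Int -> Rope -> (Rope, Rope)
splitNearChar n = split (\(Sum len) -> len > n)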
In addition to John Lato's answer, I'll add some specific details about the performance of finger trees, since I spent some time looking at that in the past.
The broad summary is:
Data.Sequence has great constant factors and asymptotics: it is almost as fast as [] when accessing the front of the list (where both data structures have O(1) asymptotics), and much faster elsewhere in the list (where Data.Sequence's logarithmic asymptotics trounce []'s linear asymptotics).
Data.FingerTree has the same asymptotics as Data.Sequence, but is about an order of magnitude slower.
Just like lists, finger trees have high per-element memory overheads, so they should be combined with chunking for better memory and cache use. Indeed, a few packages do this (yi, trifecta, rope). If Data.FingerTree could be brought close to Data.Sequence in performance, I would hope to see a Data.Text.Sequence type, which implemented a finger tree of Data.Text values. Such a type would lose the streaming behaviour of Data.Text.Lazy, but benefit from improved random access and concatenation performance. (Similarly, I would want to see Data.ByteString.Sequence and Data.Vector.Sequence.)
The obstacle to implementing these now is that no efficient and generic implementation of finger trees exists (see below where I discuss this further). To produce efficient implementations of Data.Text.Sequence one would have to completely reimplement finger trees, specialised to Text - just as Data.Text.Lazy completely reimplements lists, specialised to Text. Unfortunately, finger trees are much more complex than lists (especially concatenation!), so this is a considerable amount of work.
So as I see it the answer is:
specialised finger trees are great, but a lot of work to implement
chunked finger trees (e.g. Data.Text.Sequence) would be great, but at present the poor performance of Data.FingerTree means they are not a viable alternative to chunked lists in the common case
builders and chunked lists achieve many of the benefits of chunked finger trees, and so they suffice for the common case
in the uncommon case where builders and chunked lists don't suffice, we grit our teeth and put up with the poor constant factors of chunked finger trees (e.g. in yi and trifecta).
Obstacles to an efficient and generic finger tree
Much of the performance gap between Data.Sequence and Data.FingerTree is due to two optimisations in Data.Sequence:
The measure type is specialised to Int, so measure manipulations compile down to efficient integer arithmetic rather than being dispatched through a Monoid dictionary.
The measure type is unpacked into the Deep constructor, which saves pointer dereferences in the inner loops of the tree operations.
It is possible to apply these optimisations in the general case of Data.FingerTree by using data families for generic unpacking and by exploiting GHC's inliner and specialiser - see my fingertree-unboxed package, which brings generic finger tree performance almost up to that of Data.Sequence. Unfortunately, these techniques have some significant problems:
Using data families for generic unpacking is unpleasant for the user, because they have to define lots of instances. There is no clear solution to this problem.
finger trees use polymorphic recursion, which GHC's specialiser doesn't handle well (1, 2). This means that, to get sufficient specialisation on the measure type, we need lots of INLINE pragmas, which causes GHC to generate huge amounts of code.
Due to these problems, I never released the package on Hackage.
Ignoring your Finger Tree question and only responding to your further explanation: did you look into Data.Text.Lazy.Builder or, specifically for building HTML, blaze-html?
Both allow fast concatenation. For slicing, if that is important for solving your problem, they might not have ideal performance.
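For instance, a minimal sketch of the builder approach with Data.Text.Lazy.Builder (the fragment list here is just a stand-in for real HTML pieces):

import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- Builders give cheap appends; the lazy Text is only assembled at the end.
renderPage :: [T.Text] -> TL.Text
renderPage fragments = B.toLazyText (mconcat (map B.fromText fragments))

main :: IO ()
main = print (renderPage ["<p>", "hello", "</p>"])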

What's so bad about Template Haskell?

It seems that Template Haskell is often viewed by the Haskell community as an unfortunate convenience. It's hard to put into words exactly what I have observed in this regard, but consider these few examples
Template Haskell listed under "The Ugly (but necessary)" in response to the question Which Haskell (GHC) extensions should users use/avoid?
Template Haskell considered a temporary/inferior solution in Unboxed Vectors of newtype'd values thread (libraries mailing list)
Yesod is often criticized for relying too much on Template Haskell (see the blog post in response to this sentiment)
I've seen various blog posts where people do pretty neat stuff with Template Haskell, enabling prettier syntax that simply wouldn't be possible in regular Haskell, as well as tremendous boilerplate reduction. So why is it that Template Haskell is looked down upon in this way? What makes it undesirable? Under what circumstances should Template Haskell be avoided, and why?
One reason for avoiding Template Haskell is that it as a whole isn't type-safe, at all, thus going against much of "the spirit of Haskell." Here are some examples of this:
You have no control over what kind of Haskell AST a piece of TH code will generate, beyond where it will appear; you can have a value of type Exp, but you don't know if it is an expression that represents a [Char] or a (a -> (forall b . b -> c)) or whatever. TH would be more reliable if one could express that a function may only generate expressions of a certain type, or only function declarations, or only data-constructor-matching patterns, etc.
You can generate expressions that don't compile. You generated an expression that references a free variable foo that doesn't exist? Tough luck, you'll only see that when actually using your code generator, and only under the circumstances that trigger the generation of that particular code. It is very difficult to unit test, too.
TH is also outright dangerous:
Code that runs at compile-time can do arbitrary IO, including launching missiles or stealing your credit card. You don't want to have to look through every cabal package you ever download in search for TH exploits.
TH can access "module-private" functions and definitions, completely breaking encapsulation in some cases.
Then there are some problems that make TH functions less fun to use as a library developer:
TH code isn't always composable. Let's say someone makes a generator for lenses; more often than not, that generator will be structured in such a way that it can only be called directly by the "end user", and not by other TH code, for example by taking as its parameter a list of type constructors to generate lenses for. It is tricky to generate that list in code, while the user only has to write generateLenses [''Foo, ''Bar].
Developers don't even know that TH code can be composed. Did you know that you can write forM_ [''Foo, ''Bar] generateLens? Q is just a monad, so you can use all of the usual functions on it (see the sketch below). Some people don't know this, and because of that they create multiple overloaded versions of essentially the same functions with the same functionality, which leads to a certain bloat effect. Also, most people write their generators in the Q monad even when they don't have to, which is like writing bla :: IO Int; bla = return 3: you are giving a function more "environment" than it needs, and clients of the function are then required to provide that environment.
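To illustrate that point, here is a toy sketch; makeShowName is a made-up generator, not a real library function, and the splice has to live in another module because of the stage restriction:

{-# LANGUAGE TemplateHaskell #-}
module THCompose (makeShowName, makeShowNames) where

import Language.Haskell.TH

-- For a type name T, generate:  showNameT :: String ; showNameT = "T"
makeShowName :: Name -> Q [Dec]
makeShowName name = do
  let fnName = mkName ("showName" ++ nameBase name)
  body <- litE (stringL (nameBase name))
  return [ SigD fnName (ConT ''String)
         , FunD fnName [Clause [] (NormalB body) []]
         ]

-- Q is just a monad, so generators compose with ordinary combinators;
-- splice with  $(makeShowNames [''Bool, ''Ordering])  from another module.
makeShowNames :: [Name] -> Q [Dec]
makeShowNames names = concat <$> mapM makeShowName names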
Finally, there are some things that make TH functions less fun to use as an end-user:
Opacity. When a TH function has type Q Dec, it can generate absolutely anything at the top-level of a module, and you have absolutely no control over what will be generated.
Monolithism. You can't control how much a TH function generates unless the developer allows it; if you find a function that generates a database interface and a JSON serialization interface, you can't say "No, I only want the database interface, thanks; I'll roll my own JSON interface"
Run time. TH code takes a relatively long time to run. The code is interpreted anew every time a file is compiled, and the running TH code often requires a ton of packages to be loaded. This slows down compile time considerably.
This is solely my own opinion.
It's ugly to use. $(fooBar ''Asdf) just does not look nice. Superficial, sure, but it contributes.
It's even uglier to write. Quoting works sometimes, but a lot of the time you have to do manual AST grafting and plumbing. The API is big and unwieldy, there's always a lot of cases you don't care about but still need to dispatch, and the cases you do care about tend to be present in multiple similar but not identical forms (data vs. newtype, record-style vs. normal constructors, and so on). It's boring and repetitive to write and complicated enough to not be mechanical. The reform proposal addresses some of this (making quotes more widely applicable).
The stage restriction is hell. Not being able to splice functions defined in the same module is the smaller part of it: the other consequence is that if you have a top-level splice, everything after it in the module will be out of scope to anything before it. Other languages with this property (C, C++) make it workable by allowing you to forward declare things, but Haskell doesn't. If you need cyclic references between spliced declarations or their dependencies and dependents, you're usually just screwed.
It's undisciplined. What I mean by this is that most of the time when you express an abstraction, there is some kind of principle or concept behind that abstraction. For many abstractions, the principle behind them can be expressed in their types. For type classes, you can often formulate laws which instances should obey and clients can assume. If you use GHC's new generics feature to abstract the form of an instance declaration over any datatype (within bounds), you get to say "for sum types, it works like this, for product types, it works like that". Template Haskell, on the other hand, is just macros. It's not abstraction at the level of ideas, but abstraction at the level of ASTs, which is better, but only modestly, than abstraction at the level of plain text.*
It ties you to GHC. In theory another compiler could implement it, but in practice I doubt this will ever happen. (This is in contrast to various type system extensions which, though they might only be implemented by GHC at the moment, I could easily imagine being adopted by other compilers down the road and eventually standardized.)
The API isn't stable. When new language features are added to GHC and the template-haskell package is updated to support them, this often involves backwards-incompatible changes to the TH datatypes. If you want your TH code to be compatible with more than just one version of GHC you need to be very careful and possibly use CPP.
There's a general principle that you should use the right tool for the job, and the smallest one that will suffice, and by that standard Template Haskell is a very big tool. If there's a way to do it that's not Template Haskell, it's generally preferable.
The advantage of Template Haskell is that you can do things with it that you couldn't do any other way, and it's a big one. Most of the time the things TH is used for could otherwise only be done if they were implemented directly as compiler features. TH is extremely beneficial to have both because it lets you do these things, and because it lets you prototype potential compiler extensions in a much more lightweight and reusable way (see the various lens packages, for example).
To summarize why I think there are negative feelings towards Template Haskell: It solves a lot of problems, but for any given problem that it solves, it feels like there should be a better, more elegant, disciplined solution better suited to solving that problem, one which doesn't solve the problem by automatically generating the boilerplate, but by removing the need to have the boilerplate.
* Though I often feel that CPP has a better power-to-weight ratio for those problems that it can solve.
EDIT 23-04-14: What I was trying to get at in the above, and have only recently managed to pin down exactly, is that there's an important distinction between abstraction and deduplication. Proper abstraction often results in deduplication as a side effect, and duplication is often a telltale sign of inadequate abstraction, but that's not why it's valuable. Proper abstraction is what makes code correct, comprehensible, and maintainable. Deduplication only makes it shorter. Template Haskell, like macros in general, is a tool for deduplication.
I'd like to address a few of the points dflemstr brings up.
I don't find the fact that you can't typecheck TH to be that worrying. Why? Because even if there is an error, it will still surface at compile time. I'm not sure if this strengthens my argument, but this is similar in spirit to the errors that you receive when using templates in C++. I think these errors are more understandable than C++'s errors though, as you'll get a pretty-printed version of the generated code.
If a TH expression / quasi-quoter does something that's so advanced that tricky corners can hide, then perhaps it's ill-advised?
I break this rule quite a bit with quasi-quoters I've been working on lately (using haskell-src-exts / meta) - https://github.com/mgsloan/quasi-extras/tree/master/examples . I know this introduces some bugs such as not being able to splice in the generalized list comprehensions. However, I think that there's a good chance that some of the ideas in http://hackage.haskell.org/trac/ghc/blog/Template%20Haskell%20Proposal will end up in the compiler. Until then, the libraries for parsing Haskell to TH trees are a nearly perfect approximation.
Regarding compilation speed / dependencies, we can use the "zeroth" package to inline the generated code. This is at least nice for the users of a given library, but we can't do much better for the case of editing the library. Can TH dependencies bloat generated binaries? I thought it left out everything that's not referenced by the compiled code.
The staging restriction / splitting of compilation steps of the Haskell module does suck.
RE Opacity: This is the same for any library function you call. You have no control over what Data.List.groupBy will do. You just have a reasonable "guarantee" / convention that the version numbers tell you something about the compatibility. It is a somewhat different kind of change, though.
This is where using zeroth pays off - you're already versioning the generated files - so you'll always know when the form of the generated code has changed. Looking at the diffs might be a bit gnarly, though, for large amounts of generated code, so that's one place where a better developer interface would be handy.
RE Monolithism: You can certainly post-process the results of a TH expression, using your own compile-time code. It wouldn't be very much code to filter on top-level declaration type / name. Heck, you could imagine writing a function that does this generically. For modifying / de-monolithisizing quasiquoters, you can pattern match on "QuasiQuoter" and extract out the transformations used, or make a new one in terms of the old.
This answer is in response to the issues brought up by illissius, point by point:
It's ugly to use. $(fooBar ''Asdf) just does not look nice. Superficial, sure, but it contributes.
I agree. I feel like $( ) was chosen to look like it was part of the language - using the familiar symbol palette of Haskell. However, that's exactly what you /don't/ want in the symbols used for your macro splicing. They definitely blend in too much, and this cosmetic aspect is quite important. I like the look of {{ }} for splices, because they are quite visually distinct.
It's even uglier to write. Quoting works sometimes, but a lot of the time you have to do manual AST grafting and plumbing. The API is big and unwieldy, there's always a lot of cases you don't care about but still need to dispatch, and the cases you do care about tend to be present in multiple similar but not identical forms (data vs. newtype, record-style vs. normal constructors, and so on). It's boring and repetitive to write and complicated enough to not be mechanical. The reform proposal addresses some of this (making quotes more widely applicable).
I also agree with this, however, as some of the comments in "New Directions for TH" observe, the lack of good out-of-the-box AST quoting is not a critical flaw. In this WIP package, I seek to address these problems in library form: https://github.com/mgsloan/quasi-extras . So far I allow splicing in a few more places than usual and can pattern match on ASTs.
The stage restriction is hell. Not being able to splice functions defined in the same module is the smaller part of it: the other consequence is that if you have a top-level splice, everything after it in the module will be out of scope to anything before it. Other languages with this property (C, C++) make it workable by allowing you to forward declare things, but Haskell doesn't. If you need cyclic references between spliced declarations or their dependencies and dependents, you're usually just screwed.
I've run into the issue of cyclic TH definitions being impossible before... It's quite annoying. There is a solution, but it's ugly - wrap the things involved in the cyclic dependency in a TH expression that combines all of the generated declarations. One of these declarations generators could just be a quasi-quoter that accepts Haskell code.
It's unprincipled. What I mean by this is that most of the time when you express an abstraction, there is some kind of principle or concept behind that abstraction. For many abstractions, the principle behind them can be expressed in their types. When you define a type class, you can often formulate laws which instances should obey and clients can assume. If you use GHC's new generics feature to abstract the form of an instance declaration over any datatype (within bounds), you get to say "for sum types, it works like this, for product types, it works like that". But Template Haskell is just dumb macros. It's not abstraction at the level of ideas, but abstraction at the level of ASTs, which is better, but only modestly, than abstraction at the level of plain text.
It's only unprincipled if you do unprincipled things with it. The only difference is that with the compiler implemented mechanisms for abstraction, you have more confidence that the abstraction isn't leaky. Perhaps democratizing language design does sound a bit scary! Creators of TH libraries need to document well and clearly define the meaning and results of the tools they provide. A good example of principled TH is the derive package: http://hackage.haskell.org/package/derive - it uses a DSL such that the example of many of the derivations /specifies/ the actual derivation.
It ties you to GHC. In theory another compiler could implement it, but in practice I doubt this will ever happen. (This is in contrast to various type system extensions which, though they might only be implemented by GHC at the moment, I could easily imagine being adopted by other compilers down the road and eventually standardized.)
That's a pretty good point - the TH API is pretty big and clunky. Re-implementing it seems like it could be tough. However, there are really only a few ways to slice the problem of representing Haskell ASTs. I imagine that copying the TH ADTs, and writing a converter to the internal AST representation, would get you a good deal of the way there. This would be equivalent to the (not insignificant) effort of creating haskell-src-meta. It could also simply be re-implemented by pretty printing the TH AST and using the compiler's internal parser.
While I could be wrong, I don't see TH as being that complicated of a compiler extension, from an implementation perspective. This is actually one of the benefits of "keeping it simple" and not having the fundamental layer be some theoretically appealing, statically verifiable templating system.
The API isn't stable. When new language features are added to GHC and the template-haskell package is updated to support them, this often involves backwards-incompatible changes to the TH datatypes. If you want your TH code to be compatible with more than just one version of GHC you need to be very careful and possibly use CPP.
This is also a good point, but somewhat dramatized. While there have been API additions lately, they haven't been extensively breakage-inducing. Also, I think that with the superior AST quoting I mentioned earlier, the API that actually needs to be used can be very substantially reduced. If construction / matching don't need distinct functions, and are instead expressed as literals, then most of the API disappears. Moreover, the code you write would port more easily to AST representations for languages similar to Haskell.
In summary, I think that TH is a powerful, semi-neglected tool. Less hate could lead to a more lively eco-system of libraries, encouraging the implementation of more language feature prototypes. It's been observed that TH is an overpowered tool, that can let you /do/ almost anything. Anarchy! Well, it's my opinion that this power can allow you to overcome most of its limitations, and construct systems capable of quite principled meta-programming approaches. It's worth the usage of ugly hacks to simulate the "proper" implementation, as this way the design of the "proper" implementation will gradually become clear.
In my personal ideal version of nirvana, much of the language would actually move out of the compiler, into libraries of these variety. The fact that the features are implemented as libraries does not heavily influence their ability to faithfully abstract.
What's the typical Haskell answer to boilerplate code? Abstraction. What're our favorite abstractions? Functions and typeclasses!
Typeclasses let us define a set of methods, that can then be used in all manner of functions generic on that class. However, other than this, the only way classes help avoid boilerplate is by offering "default definitions". Now here is an example of an unprincipled feature!
Minimal binding sets are not declarable / compiler checkable. This could lead to inadvertent definitions that yield bottom due to mutual recursion.
Despite the great convenience and power this would yield, you cannot specify superclass defaults, due to orphan instances http://lukepalmer.wordpress.com/2009/01/25/a-world-without-orphans/ These would let us fix the numeric hierarchy gracefully!
Going after TH-like capabilities for method defaults led to http://www.haskell.org/haskellwiki/GHC.Generics . While this is cool stuff, in my only experience debugging code using these generics it was nigh impossible, due to the size of the type induced for an ADT as complicated as an AST. https://github.com/mgsloan/th-extra/commit/d7784d95d396eb3abdb409a24360beb03731c88c
In other words, this went after the features provided by TH, but it had to lift an entire domain of the language, the construction language, into a type system representation. While I can see it working well for your common problem, for complex ones, it seems prone to yielding a pile of symbols far more terrifying than TH hackery.
TH gives you value-level compile-time computation of the output code, whereas generics forces you to lift the pattern matching / recursion part of the code into the type system. While this does restrict the user in a few fairly useful ways, I don't think the complexity is worth it.
I think that the rejection of TH and lisp-like metaprogramming led to the preference towards things like method defaults instead of more flexible, macro-expansion-like declarations of instances. The discipline of avoiding things that could lead to unforeseen results is wise; however, we should not ignore that Haskell's capable type system allows for more reliable metaprogramming than in many other environments (by checking the generated code).
One rather pragmatic problem with Template Haskell is that it only works when GHC's bytecode interpreter is available, which is not the case on all architectures. So if your program uses Template Haskell or relies on libraries that use it, it will not run on machines with an ARM, MIPS, S390 or PowerPC CPU.
This is relevant in practice: git-annex is a tool written in Haskell that makes sense to run on machines that care about storage, and such machines often have non-i386 CPUs. Personally, I run git-annex on an NSLU 2 (32 MB of RAM, 266 MHz CPU; did you know Haskell works fine on such hardware?). If it used Template Haskell, this would not be possible.
(The situation with GHC on ARM is improving a lot these days, and I think 7.4.2 even works, but the point still stands.)
Why is TH bad? For me, it comes down to this:
If you need to produce so much repetitive code that you find yourself trying to use TH to auto-generate it, you're doing it wrong!
Think about it. Half the appeal of Haskell is that its high-level design allows you to avoid huge amounts of useless boilerplate code that you have to write in other languages. If you need compile-time code generation, you're basically saying that either your language or your application design has failed you. And we programmers don't like to fail.
Sometimes, of course, it's necessary. But sometimes you can avoid needing TH by just being a bit more clever with your designs.
(The other thing is that TH is quite low-level. There's no grand high-level design; a lot of GHC's internal implementation details are exposed. And that makes the API prone to change...)

How does one avoid creating an ad-hoc type system in dynamically typed languages?

In every project I've started in languages without type systems, I eventually begin to invent a runtime type system. Maybe the term "type system" is too strong; at the very least, I create a set of type/value-range validators when I'm working with complex data types, and then I feel the need to be paranoid about where data types can be created and modified.
I hadn't thought twice about it until now. As an independent developer, my methods have been working in practice on a number of small projects, and there's no reason they'd stop working now.
Nonetheless, this must be wrong. I feel as if I'm not using dynamically-typed languages "correctly". If I must invent a type system and enforce it myself, I may as well use a language that has types to begin with.
So, my questions are:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Here is a concrete example for you to consider. I'm working with datetimes and timezones in Erlang (a dynamic, strongly typed language). This is a common datatype I work with:
{{Y,M,D},{tztime, {time, HH,MM,SS}, Flag}}
... where {Y,M,D} is a tuple representing a valid date (all entries are integers), tztime and time are atoms, HH,MM,SS are integers representing a sane 24-hr time, and Flag is one of the atoms u,d,z,s,w.
This datatype is commonly parsed from input, so to ensure valid input and a correct parser, the values need to be checked for type correctness and for valid ranges. Later on, instances of this datatype are compared to each other, making the type of their values all the more important, since all terms compare. From the Erlang reference manual:
number < atom < reference < fun < port < pid < tuple < list < bit string
Aside from the confusion of static vs. dynamic and strong vs. weak typing:
What you want to implement in your example isn't really solved by most existing static typing systems. Range checks and complications like February 31st, and especially parsed input, are usually checked at runtime no matter what type system you have.
Your example being in Erlang I have a few recommendations:
Use records. Besides being useful and helpful for a whole bunch of reasons, they give you easy runtime type checking without a lot of effort, e.g.:
is_same_day(#datetime{year=Y1, month=M1, day=D1},
            #datetime{year=Y2, month=M2, day=D2}) -> ...
This effortlessly matches only two datetime records. You could even add guards to check ranges if the source is untrusted. And it conforms to Erlang's let-it-crash approach to error handling: if no match is found you get a badmatch, and can handle this at the level where it is appropriate (usually the supervisor level).
Generally, write your code so that it crashes when its assumptions are not valid.
If this doesn't feel statically checked enough: use typer and dialyzer to find the kinds of errors that can be found statically; whatever remains will be checked at runtime.
Don't be too restrictive about what "types" your functions accept; sometimes the added functionality of just doing something useful even for different inputs is worth more than checking the types and ranges in every function. If you do it where it matters, you will usually catch errors early enough for them to be easily fixable. This is especially true in a functional language, where you always know where every value comes from.
A lot of good answers, let me add:
Are there existing programming paradigms (for languages without types) that avoid the necessity of using or inventing type systems?
The most important paradigm, especially in Erlang, is this: Assume the type is right, otherwise let it crash. Don't write excessively checking paranoid code, but assume that the input you get is of the right type or the right pattern. Don't write (there are exceptions to this rule, but in general)
foo({tag, ...}) -> do_something(..);
foo({tag2, ...}) -> do_something_else(..);
foo(Otherwise) ->
    report_error(Otherwise),
    try to fix problem here...
Kill the last clause and have it crash right away. Let a supervisor and other processes do the cleanup (you can use monitors() for janitorial processes to know when a crash has occurred).
Do be precise however. Write
bar(N) when is_integer(N) -> ...
baz([]) -> ...
baz(L) when is_list(L) -> ...
if the function is known only to work with integers or lists respectively. Yes, it is a runtime check, but the goal is to convey information to the programmer. Also, HiPE tends to use the hint for optimization and eliminate the type check if possible. Hence, the price may be less than what you think it is.
You choose an untyped/dynamically-typed language so the price you have to pay is that type checking and errors from clashes will happen at runtime. As other posts hint, a statically typed language is not exempt from doing some checks as well - the type system is (usually) an approximation of a proof of correctness. In most static languages you often get input which you can't trust. This input is transformed at the "border" of the application and then converted to an internal format. The conversion serves to mark trust: From now on, the thing has been validated and we can assume certain things about it. The power and correctness of this assumption is directly tied to its type signature and how good the programmer is with juggling the static types of the language.
Are there otherwise common recommendations on how to solve the problems that static typing solves in dynamically-typed languages (without sheepishly reinventing types)?
Erlang has the dialyzer which can be used to statically analyze and infer types of your programs. It will not come up with as many type errors as a type checker in e.g., Ocaml, but it won't "cry wolf" either: An error from the dialyzer is provably an error in the program. And it won't reject a program which may be working ok. A simple example is:
%% 'and' is a reserved word in Erlang, so the function is named my_and here.
my_and(true, true) -> true;
my_and(true, _)    -> false;
my_and(false, _)   -> false.
The invocation my_and(true, greatmistake) will return false, yet a static type system will reject the program because it will infer from the first line that the type signature takes a boolean() value as the 2nd parameter. The dialyzer will accept this function in contrast and give it the signature (boolean(), term()) -> boolean(). It can do this because there is no need to protect a priori against an error. If there is a mistake, the runtime system has a type check that will capture it.
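For comparison, the Haskell counterpart below (a trivial sketch) cannot even be called with a non-Bool second argument; the rejection happens at the call site, at compile time:

-- A Haskell sketch of the same function: the signature rules out a call
-- like  myAnd True "greatmistake"  before the program ever runs.
myAnd :: Bool -> Bool -> Bool
myAnd True  True = True
myAnd True  _    = False
myAnd False _    = False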
In order for a statically-typed language to match the flexibility of a dynamically-typed one, I think it would need a lot, perhaps infinitely many, features.
In the Haskell world, one hears a lot of sophisticated, sometimes to the point of being scary, terminology. Type classes. Parametric polymorphism. Generalized algebraic data types. Type families. Functional dependencies. The Ωmega programming language takes it even further, with the website listing "type-level functions" and "level polymorphism", among others.
What are all these? Features added to static typing to make it more flexible. These features can be really cool, and tend to be elegant and mind-blowing, but are often difficult to understand. Learning curve aside, type systems often fail to model real-world problems elegantly. A particularly good example of this is interacting with other languages (a major motivation for C# 4's dynamic feature).
Dynamically-typed languages give you the flexibility to implement your own framework of rules and assumptions about data, rather than be constrained by the ever-limited static type system. However, "your own framework" won't be machine-checked, meaning the onus is on you to ensure your "type system" is safe and your code is well-"typed".
One thing I've found from learning Haskell is that I can carry lessons learned about strong typing and sound reasoning over to weaker-typed languages, such as C and even assembly, and do the "type checking" myself. Namely, I can prove that sections of code are correct in and of themselves, by bearing in mind the rules my functions and values are supposed to follow, and the assumptions I am allowed to make about other functions and values. When debugging, I go through and check things again, and think through whether or not my approach is sound.
The bottom line: dynamic typing puts more flexibility at your fingertips. On the other hand, statically-typed languages tend to be more efficient (by orders of magnitude), and good static type systems drastically cut down on debugging time by letting the computer do much of it for you. If you want the benefits of both, install a static type checker in your brain by learning decent, strongly-typed languages.
Sometimes data need validation. Validating any data received from the network is almost always a good idea — especially data from a public network. Being paranoid here is only good. If something resembling a static type system helps this in the least painful way, so be it. There's a reason why Erlang allows type annotations. Even pattern matching can be seen as just a kind of dynamic type checking; nevertheless, it's a central feature of the language. The very structure of data is its 'type' in Erlang.
The good thing is that you can custom-tailor your 'type system' to your needs, making it flexible and smart, while the type systems of OO languages typically have fixed features. When the data structures you use are immutable, once you've validated such a structure, you're safe to assume it conforms to your restrictions, just as with static typing.
There's no point in being ready to process any kind of data at any point of a program, dynamically-typed or not. A 'dynamic type' is essentially a union of all possible types; limiting it to a useful subset is a valid way to program.
A statically typed language detects type errors at compile time. A dynamically typed language detects them at runtime. There are some modest restrictions on what one can write in a statically typed language such that all type errors can be caught at compile time.
But yes, you still have types even in a dynamically typed language, and that's a good thing. The problem is that you end up writing lots of runtime checks to ensure that you have the types you think you do, since the compiler hasn't taken care of that for you.
Erlang has a very nice tool for specifying and statically verifying lots of types: dialyzer (see the documentation on the Erlang type system for references).
So don't reinvent types, use the typing tools that Erlang already provides, to handle the types that already exist in your program (but which you haven't yet specified).
And this on its own won't eliminate range checks, unfortunately. Without lots of special sauce you really have to enforce this on your own by convention (and smart constructors, etc. to help), or fall back to runtime checks, or both.
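For comparison, the smart-constructor idiom mentioned above looks like the following minimal Haskell sketch (the Hour type and its bounds are made up for illustration); because the real constructor is not exported, every value of the type has already passed the range check:

module Hour (Hour, mkHour, unHour) where

newtype Hour = Hour Int deriving (Eq, Ord, Show)

-- The only way to build an Hour is through the range-checking constructor.
mkHour :: Int -> Maybe Hour
mkHour h
  | h >= 0 && h < 24 = Just (Hour h)
  | otherwise        = Nothing

unHour :: Hour -> Int
unHour (Hour h) = h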

What does it mean for something to "compose well"?

Many a time, I've come across statements of the form
X does/doesn't compose well.
I can remember a few instances that I've read recently:
Macros don't compose well (context: clojure)
Locks don't compose well (context: clojure)
Imperative programming doesn't compose well... etc.
I want to understand the implications of composability in terms of designing/reading/writing code. Examples would be nice.
"Composing" functions basically just means sticking two or more functions together to make a big function that combines their functionality in a useful way. Essentially, you define a sequence of functions and pipe the results of each one into the next, finally giving the result of the whole process. Clojure provides the comp function to do this for you, you could do it by hand too.
Functions that you can chain with other functions in creative ways are more useful in general than functions that you can only call in certain conditions. For example, if we didn't have the last function and only had the traditional Lisp list functions, we could easily define last as (def last (comp first reverse)). Look at that — we didn't even need to defn or mention any arguments, because we're just piping the result of one function into another. This would not work if, for example, reverse took the imperative route of modifying the sequence in-place. Macros are problematic as well because you can't pass them to functions like comp or apply.
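The same idea reads almost identically in Haskell (a throwaway sketch for comparison):

-- Defining 'last' by composing existing functions; no new machinery needed.
myLast :: [a] -> a
myLast = head . reverse

main :: IO ()
main = print (myLast [1, 2, 3 :: Int])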
Composition in programming means assembling bigger pieces out of smaller ones.
Composition of unary functions creates a more complicated unary function by chaining simpler ones.
Composition of control flow constructs places control flow constructs inside other control flow constructs.
Composition of data structures combines multiple simpler data structures into a more complicated one.
Ideally, a composed unit works like a basic unit and you as a programmer do not need to be aware of the difference. If things fall short of the ideal, if something doesn't compose well, your composed program may not have the (intended) combined behavior of its individual pieces.
Suppose I have some simple C code.
void run_with_resource(void) {
    Resource *r = create_resource();
    do_some_work(r);
    destroy_resource(r);
}
C facilitates compositional reasoning about control flow at the level of functions. I don't have to care about what actually happens inside do_some_work(); I know just by looking at this small function that every time a resource is created on line 2 with create_resource(), it will eventually be destroyed on line 4 by destroy_resource().
Well, not quite. What if create_resource() acquires a lock and destroy_resource() frees it? Then I have to worry about whether do_some_work acquires the same lock, which would prevent the function from finishing. What if do_some_work() calls longjmp(), and skips the end of my function entirely? Until I know what goes on in do_some_work(), I won't be able to predict the control flow of my function. We no longer have compositionality: we can no longer decompose the program into parts, reason about the parts independently, and carry our conclusions back to the whole program. This makes designing and debugging much harder and it's why people care whether something composes well.
"Bang for the Buck" - composing well implies a high ratio of expressiveness per rule-of-composition. Each macro introduces its own rules of composition. Each custom data structure does the same. Functions, especially those using general data structures have far fewer rules.
Assignment and other side effects, especially wrt concurrency have even more rules.
Think about when you write functions or methods. You create a group of functionality to do a specific task. When working in an Object Oriented language you cluster your behavior around the actions you think a distinct entity in the system will perform. Functional programs break away from this by encouraging authors to group functionality according to an abstraction. For example, the Clojure Ring library comprises a group of abstractions that cover routing in web applications.
Ring is composable in that functions that describe paths in the system (routes) can be grouped into higher-order functions (middleware). In fact, Clojure is so dynamic that it is possible (and you are encouraged) to come up with patterns of routes that can be dynamically created at runtime. This is the essence of composability: instead of coming up with patterns that solve a certain problem, you focus on patterns that generate solutions to a certain class of problem. Builders and code generators are just two of the common patterns used in functional programming. Functional programming is the art of patterns that generate other patterns (and so on and so on).
The idea is to solve a problem at its most basic level then come up with patterns or groups of the lowest level functions that solve the problem. Once you start to see patterns in the lowest level you've discovered composition. As folks discover second order patterns in groups of functions they may start to see a third level. And so on...
Composition (in the context you describe at a functional level) is typically the ability to feed one function into another cleanly and without intermediate processing. Such an example of composition is in std::cout in C++:
cout << each << item << links << on;
That is a simple example of composition which doesn't really "look" like composition.
Another example with a form more visibly compositional:
foo(bar(baz()));
Wikipedia Link
Composition is useful for readability and compactness, however chaining large collections of interlocking functions which can potentially return error codes or junk data can be hazardous (this is why it is best to minimize error code or null return values.)
Provided your functions use exceptions, or alternatively return null objects, you can minimize the requirement for branching (if) on errors and maximize the compositional potential of your code at no extra risk.
Object composition (vs. inheritance) is a separate issue (and not what you are asking about, but it shares the name). It is about using containment to build up an object hierarchy, as opposed to direct inheritance.
Within the context of clojure, this comment addresses certain aspects of composability. In general, it seems to emerge when units of the system do one thing well, do not require other units to understand its internals, eschew side-effects, and accept and return the system's pervasive data structures. All of the above can be seen in M2tM's C++ example.
Composability, applied to functions, means that the functions are smaller and well-defined, and thus easy to integrate into other functions (I have seen this idea in the book "The Joy of Clojure").
The concept can apply to other things that are supposed to be composed into something else.
The purpose of composability is reuse. For example, a well-built (composable) function is easier to reuse.
Macros aren't that composable because you can't pass them as parameters.
Locks are crap because you can't really give them names (define them well) or reuse them; you just take them in place.
Imperative languages aren't that composable because some of them, at least, don't have closures. If you want functionality passed as a parameter, you're screwed: you have to build an object and pass that. Disclaimer here: I'm not entirely convinced this last idea is true, so research more before taking it for granted.
Another idea on imperative languages is that they don't compose well because they imply state (from the Wikipedia knowledge base :) "Imperative programming - describes computation in terms of statements that change a program state").
State does not compose well because, although you have given a specific "something" as input, that "something" generates an output according to its state. Different internal state, different behaviour. And thus you can say good-bye to what you were expecting to happen.
With state, you depend too much on knowing what the current state of an object is if you want to predict its behavior. More stuff to keep in the back of your mind, less composable (remember "well-defined"? Or "small and simple", as in "easy to use"?).
PS: Thinking of learning Clojure, huh? Investigating...? Good for you! :P
