How does lazy-evaluation allow for greater modularization? - haskell

In his article "Why Functional Programming Matters," John Hughes argues that "Lazy evaluation is perhaps the most powerful tool for modularization in the functional programmer's repertoire." To do so, he provides an example like this:
Suppose you have two functions, "infiniteLoop" and "terminationCondition." You can do the following:
terminationCondition(infiniteLoop input)
Lazy evaluation, in Hughes' words "allows termination conditions to be separated from loop bodies." This is definitely true, since "terminationCondition" using lazy evaluation here means this condition can be defined outside the loop -- infiniteLoop will stop executing when terminationCondition stops asking for data.
But couldn't higher-order functions achieve the same thing as follows?
infiniteLoop(input, terminationCondition)
How does lazy evaluation provide modularization here that's not provided by higher-order functions?

Yes you could use a passed in termination check, but for that to work the author of infiniteLoop would have had to forsee the possibility of wanting to terminate the loop with that sort of condition, and hardwire a call to the termination condition into their function.
And even if the specific condition can be passed in as a function, the "shape" of it is predetermined by the author of infiniteLoop. What if they give me a termination condition "slot" that is called on each element, but I need access to the last several elements to check some sort of convergence condition? Maybe for a simple sequence generator you could come up with "the most general possible" termination condition type, but it's not obvious how to do so and remain efficient and easy to use. Do I repeatedly pass the entire sequence so far into the termination condition, in case that's what it's checking? Do I force my callers to wrap their simple termination conditions up in a more complicated package so they fit the most general condition type?
The callers certainly have to know exactly how the termination condition is called in order to supply a correct condition. That could be quite a bit of dependence on this specific implementation. If they switch to a different implementation of infiniteLoop written by another third party, how likely is it that exactly the same design for the termination condition would be used? With a lazy infiniteLoop, I can drop in any implementation that is supposed to produce the same sequence.
And what if infiniteLoop isn't a simple sequence generator, but actually generates a more complex infinite data structure, like a tree? If all the branches of the tree are independently recursively generated (think of a move tree for a game like chess) it could make sense to cut different branches at different depths, based on all sorts of conditions on the information generated thus far.
If the original author didn't prepare (either specifically for my use case or for a sufficiently general class of use cases), I'm out of luck. The author of the lazy infiniteLoop can just write it the natural way, and let each individual caller lazily explore what they want; neither has to know much about the other at all.
Furthermore, what if the decision to stop lazily exploring the infinite output is actually interleaved with (and dependent on) the computation the caller is doing with that output? Think of the chess move tree again; how far I want to explore one branch of the tree could easily depend on my evaluation of the best option I've found in other branches of the tree. So either I do my traversal and calculation twice (once in the termination condition to return a flag telling infinteLoop to stop, and then once again with the finite output so I can actually have my result), or the author of infiniteLoop had to prepare for not just a termination condition, but a complicated function that also gets to return output (so that I can push my entire computation inside the "termination condition").
Taken to extremes, I could explore the output and calculate some results, display them to a user and get input, and then continue exploring the data structure (without recalling infiniteLoop based on the user's input). The original author of the lazy infiniteLoop need have no idea that I would ever think of doing such a thing, and it will still work. If we've got purity enforced by the type system, then that would be impossible with the passed-in termination condition approach unless the whole infiniteLoop was allowed to have side effects if the termination condition needs to (say by giving the whole thing a monadic interface).
In short, to allow the same flexibility you'd get with lazy evaluation by using a strict infiniteLoop that takes higher order functions to control it can be a large amount of extra complexity for both the author of infiniteLoop and its caller (unless a variety of simpler wrappers are exposed, and one of them matches the caller's use case). Lazy evaluation can allow producers and consumers to be almost completely decoupled, while still giving the consumer the ability to control how much output the producer generates. Everything you can do that way you can do with extra function arguments as you say, but it requires to the producer and consumer to essentially agree on a protocol for how the control functions work; and that protocol is almost always either specialised to the use case at hand (tying the consumer and producer together) or so complicated in order to be fully-general that the producer and consumer are up tied to that protocol, which is unlikely to be recreated elsewhere, and so they're still tied together.

Related

Why does FRP consider time as a factor for values?

Behaviors are ubiquitously defined as “time-varying value”s1.
Why? time being the dependency/parameter for varying values is very uncommon.
My intuition for FRP would be to have behaviors as event-varying values instead; it is much more common, much more simple, I wage a much more of an efficient idea, and extensible enough to support time too (tick event).
For instance, if you write a counter, you don't care about time/associated timestamps, you just care about the "Increase-button clicked" and "Decrease-button clicked" events.
If you write a game and want a position/force behavior, you just care about the WASD/arrow keys held events, etc. (unless you ban your players for moving to the left in the afternoon; how iniquitous!).
So: Why time is a consideration at all? why timestamps? why are some libraries (e.g. reactive-banana, reactive) take it up to the extent of having Future, Moment values? Why work with event-streams instead of just responding to an event occurrence? All of this just seems to over-complicate a simple idea (event-varying/event-driven value); what's the gain? what problem are we solving here? (I'd love to also get a concrete example along with a wonderful explanation, if possible).
1 Behaviors have been defined so here, here, here... & pretty much everywhere I've encountered.
Behaviors differ from Events primarily in that a Behavior has a value right now while an Event only has a value whenever a new event comes in.
So what do we mean by "right now"? Technically all changes are implemented as push or pull semantics over event streams, so we can only possibly mean "the most recent value as of the last event of consequence for this Behavior". But that's a fairly hairy concept—in practice "now" is much simpler.
The reasoning for why "now" is simpler comes down to the API. Here are two examples from Reactive Banana.
Eventually an FRP system must always produce some kind of externally visible change. In Reactive Banana this is facilitated by the reactimate :: Event (IO ()) -> Moment () function which consumes event streams. There is no way to have a Behavior trigger external changes---you always have to do something like reactimate (someBehavior <# sampleTickEvent) to sample the behavior at concrete times.
Behaviors are Applicatives unlike Events. Why? Well, let's assume Event was an applicative and think about what happens when we have two event streams f and x and write f <*> x: since events occur all at different times the chances of f and x being defined simultaneously are (almost certainly) 0. So f <*> x would always mean the empty event stream which is useless.
What you really want is for f <*> x to cache the most current values for each and take their combined value "all of the time". That's really confusing concept to talk about in terms of an event stream, so instead lets consider f and x as taking values for all points in time. Now f <*> x is also defined as taking values for all points in time. We've just invented Behaviors.
Because it was the simplest way I could think of to give a precise denotation (implementation-independent meaning) to the notion of behaviors, including the sorts of operations I wanted, including differentiation and integration, as well as tracking one or more other behaviors (including but not limited to user-generated behavior).
Why? time being the dependency/parameter for varying values is very uncommon.
I suspect that you're confusing the construction (recipe) of a behavior with its meaning. For instance, a behavior might be constructed via a dependency on something like user input, possibly with additional synthetic transformation. So there's the recipe. The meaning, however, is simply a function of time, related to the time-function that is the user input. Note that by "function", I mean in the math sense of the word: a (deterministic) mapping from domain (time) to range (value), not in the sense that there's a purely programmatic description.
I've seen many questions asking why time matters and why continuous time. If you apply the simple discipline of giving a mathematical meaning in the style of denotational semantics (a simple and familiar style for functional programmers), the issues become much clearer.
If you really want to grok the essence of and thinking behind FRP, I recommend you read my answer to "Specification for a Functional Reactive Programming language" and follow pointers, including "What is Functional Reactive Programming".
Conal Elliott's Push-Pull FRP paper describes event-varying data, where the only points in time that are interesting are when events occcur. Reactive event-varying data is the current value and the next Event that will change it. An Event is a Future point in the event-varying Reactive data.
data Reactive a = a ‘Stepper ‘ Event a
newtype Event a = Ev (Future (Reactive a))
The Future doesn't need to have a time associated with it, it just need to represent the idea of a value that hasn't happened yet. In an impure language with events, for example, a future can be an event handle and a value. When the event occurs, you set the value and raise the handle.
Reactive a has a value for a at all points in time, so why would we need Behaviors? Let's make a simple game. In between when the user presses the WASD keys, the character, accelerated by the force applied, still moves on the screen. The character's position at different points in time is different, even though no event has occurred in the intervening time. This is what a Behavior describes - something that not only has a value at all points in time, but its value can be different at all points in time, even with no intervening events.
One way to describe Behaviors would be to repeat what we just stated. Behaviors are things that can change in-between events. In-between events they are time-varying values, or functions of time.
type Behavior a = Reactive (Time -> a)
We don't need Behavior, we could simply add events for clock ticks, and write all of the logic in our entire game in terms of these tick events. This is undesirable to some developers as the code declaring what our game is is now intermingled with the code providing how it is implemented. Behaviors allow the developer to separate this logic between the description of the game in terms of time-varying variables and the implementation of the engine that executes that description.

FRP - Event streams and Signals - what is lost in using just signals?

In recent implementations of Classic FRP, for instance reactive-banana, there are event streams and signals, which are step functions (reactive-banana calls them behaviours but they are nevertheless step functions). I've noticed that Elm only uses signals, and doesn't differentiate between signals and event streams. Also, reactive-banana allows to go from event streams to signals (edited: and it's sort of possible to act on behaviours using reactimate' although it not considered good practice), which kind of means that in theory we could apply all the event stream combinators on signals/behaviours by first converting the signal to event stream, applying and then converting again. So, given that it's in general easier to use and learn just one abstraction, what is the advantage of having separated signals and event streams ? Is anything lost in using just signals and converting all the event stream combinators to operate on signals ?
edit: The discussion has been very interesting. The main conclusions that I took from the discussion myself is that behaviours/event sources are both needed for mutually recursive definitions (feedback) and for having an output depend on two inputs (a behaviour and an event source) but only cause an action when one of them changes (<#>).
(Clarification: In reactive-banana, it is not possible to convert a Behavior back to an Event. The stepper function is a one-way ticket. There is a changes function, but its type indicates that it is "impure" and it comes with a warning that it does not preserve the semantics.)
I believe that having two separates concepts makes the API more elegant. In other words, it boils down to a question of API usability. I think that the two concepts behave sufficiently differently that things flow better if you have two separate types.
For example, the direct product for each type is different. A pair of Behavior is equivalent to a Behavior of pairs
(Behavior a, Behavior b) ~ Behavior (a,b)
whereas a pair of Events is equivalent to an Event of a direct sum:
(Event a, Event b) ~ Event (EitherOrBoth a b)
If you merge both types into one, then neither of these equivalence will hold anymore.
However, one of the main reasons for the separation of Event and Behavior is that the latter does not have a notion of changes or "updates". This may seem like an omission at first, but it is extremely useful in practice, because it leads to simpler code. For instance, consider a monadic function newInput that creates an input GUI widget that displays the text indicated in the argument Behavior,
input <- newInput (bText :: Behavior String)
The key point now is that the text displayed does not depend on how often the Behavior bText may have been updated (to the same or a different value), only on the actual value itself. This is a lot easier to reason about than the other case, where you would have to think about what happens when two successive event occurrences have the same value. Do you redraw the text while the user edits it?
(Of course, in order to actually draw the text, the library has to interface with the GUI framework and does keep track of changes in the Behavior. This is what the changes combinator is for. However, this can be seen as an optimization and is not available from "within FRP".)
The other main reason for the separation is recursion. Most Events that recursively depend on themselves are ill-defined. However, recursion is always allowed if you have mutual recursion between an Event and a Behavior
e = ((+) <$> b) <#> einput
b = stepper 0 e
There is no need to introduce delays by hand, it just works out of the box.
Something critically important to me is lost, namely the essence of behaviors, which is (possibly continuous) variation over continuous time.
Precise, simple, useful semantics (independent of a particular implementation or execution) is often lost as well.
Check out my answer to "Specification for a Functional Reactive Programming language", and follow the links there.
Whether in time or in space, premature discretization thwarts composability and complicates semantics.
Consider vector graphics (and other spatially continuous models like Pan's). Just as with premature finitization of data structures as explained in Why Functional Programming Matters.
I don't think there's any benefit to using the signals/behaviors abstraction over elm-style signals. As you point out, it's possible to create a signal-only API on top of the signal/behavior API (not at all ready for use, but see https://github.com/JohnLato/impulse/blob/dyn2/src/Reactive/Impulse/Syntax2.hs for an example). I'm pretty sure it's also possible to write a signal/behavior API on top of an elm-style API as well. That would make the two APIs functionally equivalent.
WRT efficiency, with a signals-only API the system should have a mechanism where only signals that have updated values will cause recomputations (e.g. if you don't move the mouse, the FRP network won't re-calculate the pointer coordinates and redraw the screen). Provided this is done, I don't think there's any loss of efficiency compared to a signals-and-streams approach. I'm pretty sure Elm works this way.
I don't think the continuous-behavior issue makes any difference here (or really at all). What people mean by saying behaviors are continuous over time is that they are defined at all times (i.e. they're functions over a continuous domain); the behavior itself isn't a continuous function. But we don't actually have a way to sample a behavior at any time; they can only be sampled at times corresponding to events, so we can't use the full power of this definition!
Semantically, starting from these definitions:
Event == for some t ∈ T: [(t,a)]
Behavior == ∀ t ∈ T: t -> b
since behaviors can only be sampled at times where events are defined, we can create a new domain TX where TX is the set of all times t at which Events are defined. Now we can loosen the Behavior definition to
Behavior == ∀ t ∈ TX: t -> b
without losing any power (i.e. this is equivalent to the original definition within the confines of our frp system). Now we can enumerate all times in TX to transform this to
Behavior == ∀ t ∈ TX: [(t,b)]
which is identical to the original Event definition except for the domain and quantification. Now we can change the domain of Event to TX (by the definition of TX), and the quantification of Behavior (from forall to for some) and we get
Event == for some t ∈ TX: [(t,a)]
Behavior == for some t ∈ TX: [(t,b)]
and now Event and Behavior are semantically identical, so they could obviously be represented using the same structure in an FRP system. We do lose a bit of information at this step; if we don't differentiate between Event and Behavior we don't know that a Behavior is defined at every time t, but in practice I don't think this really matters much. What elm does IIRC is require both Events and Behaviors to have values at all times and just use the previous value for an Event if it hasn't changed (i.e. change the quantification of Event to forall instead of changing the quantification of Behavior). This means you can treat everything as a signal and it all Just Works; it's just implemented so that the signal domain is exactly the subset of time that the system actually uses.
I think this idea was presented in a paper (which I can't find now, anyone else have a link?) about implementing FRP in Java, perhaps from POPL '14? Working from memory, so my outline isn't as rigorous as the original proof.
There's nothing to stop you from creating a more-defined Behavior by e.g. pure someFunction, this just means that within an FRP system you can't make use of that extra defined-ness, so nothing is lost by a more restricted implementation.
As for notional signals such as time, note that it's impossible to implement an actual continuous-time signal using typical programming languages. Since the implementation will necessarily be discrete, converting that to an event stream is trivial.
In short, I don't think anything is lost by using just signals.
I've unfortunately have no references in mind, but I distinctly remember
different reactive authors claiming this choice is just for efficiency. You expose
both to give the programmer a choice in what implementation of the same idea is
more efficient for your problem.
I might be lying now, but I believe Elm implements everything as event streams under the
hood. Things like time wouldn't be so nice like event streams though, since there are an
infinite amount of events during any time frame. I'm not sure how Elm solves this, but I
think it's a good example on something that makes more sense as a signal, both conceptually
and in implementation.

How to find the optimal processing order?

I have an interesting question, but I'm not sure exactly how to phrase it...
Consider the lambda calculus. For a given lambda expression, there are several possible reduction orders. But some of these don't terminate, while others do.
In the lambda calculus, it turns out that there is one particular reduction order which is guaranteed to always terminate with an irreducible solution if one actually exists. It's called Normal Order.
I've written a simple logic solver. But the trouble is, the order in which it processes the constraints seems to have a huge effect on whether it finds any solutions or not. Basically, I'm wondering whether something like a normal order exists for my logic programming language. (Or wether it's impossible for a mere machine to deterministically solve this problem.)
So that's what I'm after. Presumably the answer critically depends on exactly what the "simple logic solver" is. So I will attempt to briefly describe it.
My program is closely based on the system of combinators in chapter 9 of The Fun of Programming (Jeremy Gibbons & Oege de Moor). The language has the following structure:
The input to the solver is a single predicate. Predicates may involve variables. The output from the solver is zero or more solutions. A solution is a set of variable assignments which make the predicate become true.
Variables hold expressions. An expression is an integer, a variable name, or a tuple of subexpressions.
There is an equality predicate, which compares expressions (not predicates) for equality. It is satisfied if substituting every (bound) variable with its value makes the two expressions identical. (In particular, every variable equals itself, bound or not.) This predicate is solved using unification.
There are also operators for AND and OR, which work in the obvious way. There is no NOT operator.
There is an "exists" operator, which essentially creates local variables.
The facility to define named predicates enables recursive looping.
One of the "interesting things" about logic programming is that once you write a named predicate, it typically works fowards and backwards (and sometimes even sideways). Canonical example: A predicate to concatinate two lists can also be used to split a list into all possible pairs.
But sometimes running a predicate backwards results in an infinite search, unless you rearrange the order of the terms. (E.g., swap the LHS and RHS of an AND or an OR somehwere.) I'm wondering whether there's some automated way to detect the best order to run the predicates in, to ensure prompt termination in all cases where the solution set is exactually finite.
Any suggestions?
Relevant paper, I think: http://www.cs.technion.ac.il/~shaulm/papers/abstracts/Ledeniov-1998-DCS.html
Also take a look at this: http://en.wikipedia.org/wiki/Constraint_logic_programming#Bottom-up_evaluation

Call by need: When is it used in Haskell?

http://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_need says:
"Call-by-need is a memoized version of call-by-name where, if the function argument is evaluated, that value is stored for subsequent uses. [...] Haskell is the most well-known language that uses call-by-need evaluation."
However, the value of a computation is not always stored for faster access (for example consider a recursive definition of fibonacci numbers). I asked someone on #haskell and the answer was that this memoization is done automatically "only in one instance, e.g. if you have `let foo = bar baz', foo will be evaluated once".
My questions is: What does instance exactly mean, are there other cases than let in which memoization is done automatically?
Describing this behavior as "memoization" is misleading. "Call by need" just means that a given input to a function will be evaluated somewhere between 0 and 1 times, never more than once. (It could be partially evaluated as well, which means the function only needed part of that input.) In contrast, "call by name" is simply expression substitution, which means if you give the expression 2 + 3 as an input to a function, it may be evaluated multiple times if the input is used more than once. Both call by need and call by name are non-strict: if the input is not used, then it is never evaluated. Most programming languages are strict, and use a "call by value" approach, which means that all inputs are evaluated before you begin evaluating the function, whether or not the inputs are used. This all has nothing to do with let expressions.
Haskell does not perform any automatic memoization. Let expressions are not an example of memoization. However, most compilers will evaluate let bindings in a call-by-need-esque fashion. If you model a let expression as a function, then the "call by need" mentality does apply:
let foo = expression one in expression two that uses foo
==>
(\foo -> expression two that uses foo) (expression one)
This doesn't correctly model recursive bindings, but you get the idea.
The haskell language definition does not define when, or how often, code is invoked. Infinite loops are defined in terms of 'the bottom' (written ⊥), which is a value (which exists within all types) that represents an error condition. The compiler is free to make its own decisions regarding when and how often to evaluate things as long as the program (and presence/absence of error conditions, including infinite loops!) behaves according to spec.
That said, the usual way of doing this is that most expressions generate 'thunks' - basically a pointer to some code and some context data. The first time you attempt to examine the result of the expression (ie, pattern match it), the thunk is 'forced'; the pointed-to code is executed, and the thunk overwritten with real data. This in turn can recursively evaluate other thunks.
Of course, doing this all the time is slow, so the compiler usually tries to analyze when you'd end up forcing a thunk right away anyway (ie, when something is 'strict' on the value in question), and if it finds this, it'll skip the whole thunk thing and just call the code right away. If it can't prove this, it can still make this optimization as long as it makes sure that executing the thunk right away can't crash or cause an infinite loop (or it handles these conditions somehow).
If you don't want to have to get very technical about this, the essential point is that when you have an expression like some_expensive_computation of all these arguments, you can do whatever you want with it; store it in a data structure, create a list of 53 copies of it, pass it to 6 other functions, etc, and then even return it to your caller for the caller to do whatever it wants with it.
What Haskell will (mostly) do is evaluate it at most once; if it the program ever needs to know something about what that expression returned in order to make a decision, then it will be evaluated (at least enough to know which way the decision should go). That evaluation will affect all the other references to the same expression, even if they are now scattered around in data structures and other not-yet-evaluated expressions all throughout your program.

What does it mean for something to "compose well"?

Many a times, I've come across statements of the form
X does/doesn't compose well.
I can remember few instances that I've read recently :
Macros don't compose well (context: clojure)
Locks don't compose well (context: clojure)
Imperative programming doesn't compose well... etc.
I want to understand the implications of composability in terms of designing/reading/writing code ? Examples would be nice.
"Composing" functions basically just means sticking two or more functions together to make a big function that combines their functionality in a useful way. Essentially, you define a sequence of functions and pipe the results of each one into the next, finally giving the result of the whole process. Clojure provides the comp function to do this for you, you could do it by hand too.
Functions that you can chain with other functions in creative ways are more useful in general than functions that you can only call in certain conditions. For example, if we didn't have the last function and only had the traditional Lisp list functions, we could easily define last as (def last (comp first reverse)). Look at that — we didn't even need to defn or mention any arguments, because we're just piping the result of one function into another. This would not work if, for example, reverse took the imperative route of modifying the sequence in-place. Macros are problematic as well because you can't pass them to functions like comp or apply.
Composition in programming means assembling bigger pieces out of smaller ones.
Composition of unary functions creates a more complicated unary function by chaining simpler ones.
Composition of control flow constructs places control flow constructs inside other control flow constructs.
Composition of data structures combines multiple simpler data structures into a more complicated one.
Ideally, a composed unit works like a basic unit and you as a programmer do not need to be aware of the difference. If things fall short of the ideal, if something doesn't compose well, your composed program may not have the (intended) combined behavior of its individual pieces.
Suppose I have some simple C code.
void run_with_resource(void) {
Resource *r = create_resource();
do_some_work(r);
destroy_resource(r);
}
C facilitates compositional reasoning about control flow at the level of functions. I don't have to care about what actually happens inside do_some_work(); I know just by looking at this small function that every time a resource is created on line 2 with create_resource(), it will eventually be destroyed on line 4 by destroy_resource().
Well, not quite. What if create_resource() acquires a lock and destroy_resource() frees it? Then I have to worry about whether do_some_work acquires the same lock, which would prevent the function from finishing. What if do_some_work() calls longjmp(), and skips the end of my function entirely? Until I know what goes on in do_some_work(), I won't be able to predict the control flow of my function. We no longer have compositionality: we can no longer decompose the program into parts, reason about the parts independently, and carry our conclusions back to the whole program. This makes designing and debugging much harder and it's why people care whether something composes well.
"Bang for the Buck" - composing well implies a high ratio of expressiveness per rule-of-composition. Each macro introduces its own rules of composition. Each custom data structure does the same. Functions, especially those using general data structures have far fewer rules.
Assignment and other side effects, especially wrt concurrency have even more rules.
Think about when you write functions or methods. You create a group of functionality to do a specific task. When working in an Object Oriented language you cluster your behavior around the actions you think a distinct entity in the system will perform. Functional programs break away from this by encouraging authors to group functionality according to an abstraction. For example, the Clojure Ring library comprises a group of abstractions that cover routing in web applications.
Ring is composable where functions that describe paths in the system (routes) can be grouped into higher order functions (middlewhere). In fact, Clojure is so dynamic that it is possible (and you are encouraged) to come up with patterns of routes that can be dynamically created at runtime. This is the essence of composablilty, instead of coming up with patterns that solve a certain problem you focus on patterns that generate solutions to a certain class of problem. Builders and code generators are just two of the common patterns used in functional programming. Function programming is the art of patterns that generate other patterns (and so on and so on).
The idea is to solve a problem at its most basic level then come up with patterns or groups of the lowest level functions that solve the problem. Once you start to see patterns in the lowest level you've discovered composition. As folks discover second order patterns in groups of functions they may start to see a third level. And so on...
Composition (in the context you describe at a functional level) is typically the ability to feed one function into another cleanly and without intermediate processing. Such an example of composition is in std::cout in C++:
cout << each << item << links << on;
That is a simple example of composition which doesn't really "look" like composition.
Another example with a form more visibly compositional:
foo(bar(baz()));
Wikipedia Link
Composition is useful for readability and compactness, however chaining large collections of interlocking functions which can potentially return error codes or junk data can be hazardous (this is why it is best to minimize error code or null return values.)
Provided your functions use exceptions, or alternatively return null objects you can minimize the requirement for branching (if) on errors and maximize the compositional potential of your code at no extra risk.
Object composition (vs inheritance) is a separate issue (and not what you are asking, but it shares the name). It is one of containment to derive object hierarchy as opposed to direct inheritance.
Within the context of clojure, this comment addresses certain aspects of composability. In general, it seems to emerge when units of the system do one thing well, do not require other units to understand its internals, eschew side-effects, and accept and return the system's pervasive data structures. All of the above can be seen in M2tM's C++ example.
composability, applied to functions, means that the functions are smaller and well-defined, thus easy to integrate into other functions (i have seen this idea in the book "the joy of clojure")
the concept can apply to other things that are supposed be composed into something else.
the purpose of composability is reuse. for example, a function well-build (composable) is easier to reuse
macros aren't that well-composable because you can't pass them as parameters
lock are crap because you can't really give them names (define them well) or reuse them. you just do them inplace
imperative languages aren't that composable because (some of them, at least) don't have closures. if you want functionality passed as parameter, you're screwed. you have to build an object and pass that; disclaimer here: this last idea i'm not entirely convinced is true, therefore research more before taking it for granted
another idea on imperative languages is that they don't compose well because they imply state (from wikipedia knowledgebase :) "Imperative programming - describes computation in terms of statements that change a program state").
state does not compose well because although you have given a specific "something" in input, that "something" generates an output according to it's state. different internal state, different behaviour. and thus you can say good-bye to what you where expecting to happen.
with state, you depend to much on knowing what the current state of an object is... if you want to predict it's behavior. more stuff to keep in the back of your mind, less composable (remember well-defined ? or "small and simple", as in "easy to use" ?)
ps: thinking of learning clojure, huh ? investigating... ? good for you ! :P

Resources