Access the configuration parameters through a monad? - haskell

Quote from here: http://www.haskell.org/haskellwiki/Global_variables
If you have a global environment, which various functions read from (and you might, for example, initialise from a configuration file) then you should thread that as a parameter to your functions (after having, very likely, set it up in your 'main' action). If the explicit parameter passing annoys you, then you can 'hide' it with a Monad.
Now I'm writing something that needs access to configuration parameters and I wonder if someone could point me to a tutorial or any other resource that describes how monads can be used for this purpose. Sorry if this question is stupid, I'm just starting to grok monads. Reading Mike Vainer's tutorial on them now.

The basic idea is that you write code like this:
main = do
  parameters <- readConfigurationParametersSomehow
  forever $ do
    myData <- readUserInput
    putStrLn $ bigComplicatedFunction myData parameters

bigComplicatedFunction d params = someFunction params x y z
  where x = function1 params d
        y = function2 params x d
        z = function3 params y
You read the parameters in the "main" function with an IO action, and then pass those parameters to your worker function(s) as an extra argument.
The trouble with this style is that the parameter block has to be passed down to every little function that needs to access it. This is a nuisance. You find that some function ten levels down in the call tree now needs some run-time parameter, and you have to add that run-time parameter as an argument to all the functions in between. This is known as tramp data.
The monad "solution" is to embed the run-time parameter in the Reader Monad, and make all your functions into monadic actions. This gets rid of the explicit tramp data parameter, but replaces it with a monadic type, and under the hood this monad is actually doing the data tramping for you.
The imperative world solves this problem with a global variable. In Haskell you can sort-of do the same thing like this:
{-# NOINLINE parameters #-}
parameters = unsafePerformIO readConfigurationParametersSomehow

The first time you use "parameters" the "readConfigurationParametersSomehow" gets executed, and from then on it behaves like a constant value, at least as long as your program is running. (The NOINLINE pragma is needed so the compiler doesn't inline the definition and run the action more than once.) This is one of the few righteous uses for unsafePerformIO.
However if you find yourself needing such a solution then you really need to have a think about your design. Odds are you are not thinking hard enough about generalising your functions lower down; if some previously pure function suddenly needs a run-time parameter then look at the reason and see if you can exploit higher order functions in some way. For instance:
- Pass down a function built using the parameter rather than the parameter itself (a sketch of this follows the list).
- Have the worker function at the bottom return a function as a result, which gets passed up to be composed with a parameter-based function at the higher level.
- Refactor your call stack so that fundamental operations are done by lower-level primitives at the bottom, which are composed in a parameter-dependent way at the top.
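Here is a minimal sketch of that first option; all names (Params, threshold, keepIf) are made up for illustration:

data Params = Params { threshold :: Int }

-- Before: the low-level worker takes the whole parameter block.
keepBigBefore :: Params -> [Int] -> [Int]
keepBigBefore params = filter (> threshold params)

-- After: the worker takes only the predicate it needs; the predicate
-- is built from the parameter once, at the top of the call tree.
keepIf :: (Int -> Bool) -> [Int] -> [Int]
keepIf = filter

main :: IO ()
main = do
  let params = Params 10
      isBig  = (> threshold params)   -- parameter used only here
  print (keepIf isBig [5, 20, 3, 42]) -- [20,42]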

Related

How does variable binding work with recursion in Haskell?

I was reading about Haskell and it stated that once a variable is bound to an expression it cannot be rebound. For example:

x = 10
x = 11

Assign.hs:2:1: error:
    Multiple declarations of ‘x’
    Declared at: Assign.hs:1:1
                 Assign.hs:2:1
So if this is the case... how is recursion working where the same variable keeps getting bound to other things? For example
drop n xs = if n <= 0 || null xs
              then xs
              else drop (n-1) (tail xs)
the variable n... is being bound again and again every time the function recurses. Why is this allowed?
Also if you guys could tell me some key words to search for so I can learn more about this myself I'd highly appreciate it.
Haskell has lexical scoping. When you declare a variable inside a scope, it hides any variable with the same name from an outer scope. For example, in the following program,
x :: Int
x = 5

main :: IO ()
main = do
  x <- readLn :: IO Int
  print x
If you compile with ghc -Wall, it will compile correctly, but give you the following warnings:
sx-scope.hs:2:1: warning: [-Wunused-top-binds]
    Defined but not used: ‘x’

sx-scope.hs:6:10: warning: [-Wname-shadowing]
    This binding for ‘x’ shadows the existing binding
      defined at sx-scope.hs:2:1
So the x outside main and the x in main are different variables, and the one in the inner scope temporarily hides the one in the outer scope. (Okay, “temporarily” in this example means until main returns, which is as long as the program is running, but you get the concept.)
Similarly, when you declare drop n xs or \n xs ->, n and xs are local variables that only exist during that invocation of the function. It can be called more than once with different values of n and xs.
When a function is tail-recursive, that is, returns a call to itself, the compiler knows it is about to replace the old parameters, which were from a scope that no longer exists, with updated parameters of the same type. So it can re-use the stack frame and store the new parameters in the same locations as the previous ones. The resulting code can be as fast as iteration in a procedural language.
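For instance (an illustrative sketch, not from the original answer), this sum is tail-recursive: the recursive call is the entire result of the second equation, so nothing remains to be done in the caller's frame.

-- Tail-recursive sum with an accumulator.
-- (A production version would force acc, e.g. with a bang pattern,
-- to avoid building a chain of thunks under lazy evaluation.)
sumAcc :: Int -> [Int] -> Int
sumAcc acc []     = acc
sumAcc acc (x:xs) = sumAcc (acc + x) xs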
In almost any language, identifiers are only meaningful because they exist in a scope; conceptually a sort of implicit map from names to the things they denote.
In modern languages you usually have (at least) a global scope, and local scopes associated with each function/procedure. Pseudocode example:
x = 1
print x
x = 2
print x
function plus_one(a):
    b = 1
    return a + b
print plus_one(x)
x is a name in the global scope, a and b are names in the local scope of the function plus_one.
Imperative languages (and non-pure declarative languages) are generally understood by thinking of those names as mapped to a sort of slot or pigeon hole, into which things can be stored by assignment, and the current contents referred to by using the name. This works only because an imperative step-by-step way of thinking about these programs gives us a way of understanding what "current" means.1 The above example shows this; x is first assigned 1, then printed, then assigned 2, then printed again; we'd expect this to print "1" and then print "2" (and then print "3" from the last line).
Given that understanding of variables as slots that can store things, it's easy to fall into the trap of thinking of the local variables that represent a function's arguments as just slots that get filled when you call the function. I say trap because this is not a helpful way of thinking about function arguments and calls, even in imperative languages. It more or less works as long as each function only ever has one call "in flight" at once, but introducing one of any number of common programming features breaks this model (recursion, concurrency, laziness, closures, etc). That's the misunderstanding at the heart of a number of questions I've seen here on SO, where the poster was having trouble understanding recursive calls, or wanted to know how to access the local variables of a function from outside, and so on.
You should actually think of a function as having a separate scope associated with each call of that function2. The function itself is kind of like a template for a scope, rather than a scope itself (although common language does usually talk about "the scope of a function" as shorthand). If you provide bindings for the function's parameters then you can produce a scope, but a typical function is called many times with different parameters, so there isn't just one scope for that function.
Consider my pseudocode plus_one, with argument a. You could imagine that a is a local name for a variable, and the call plus_one(x) just assigns the contents of x into the slot for a and then starts executing the code of plus_one. But I contend that it's better to think that when you call plus_one on x you're creating a new scope, in which there is a variable called a (containing the contents of the global variable x at that point), but it is not "the" a variable.
This is vitally important for understanding recursion, even in imperative languages:
function drop(n, xs):
    if n <= 0 || null(xs):
        return xs
    else:
        return drop(n - 1, tail(xs))
Here we could try to imagine that there's only one xs variable, and when we make the recursive call to drop we're assigning the tail of the original xs to the xs variable and starting the function's code again. But that falls down as soon as we change it to something like:
function drop(n, xs):
    if n <= 0 || null(xs):
        return xs
    else:
        result = drop(n - 1, tail(xs))
        print xs
        return result
Now we're using xs after the recursive call. What this does is hard to explain if we're imagining that there's only one xs, but trivial if we think of there being a separate xs in a separate scope every time we call drop. When we make the recursive call to drop (passing it n - 1 and tail(xs)) it creates its own separate xs, so it's entirely unmysterious that print xs in this scope still can access xs.
So, is this story different with Haskell? It's true that the nature of variables is quite different in Haskell from typical imperative languages. Instead of scopes mapping names to slots in which we can place different contents at different times, scopes map names directly to values, and there's no notion of time in which we could say an identifier "was" bound to one value (or was not bound to anything) and "now" is bound to a different value. x = 1 at global scope in Haskell isn't a step that "happens" and changes something, it's just a fact: x just is 1. Haskell isn't throwing an error at your x = 10 and x = 11 lines because the designers wanted to restrict you to only assigning a variable once; rather, Haskell does not have a concept of assignment at all. x = 10 is giving a definition for x in that scope, and you can't have two separate definitions for the same name in the same scope.
The local scopes that come from functions in Haskell are the same; each name just is associated with a particular value3. But the actual values are parameterised on the values on which you call the function (they are quite literally a function of those values); each call has its own separate scope, with a different mapping from names to values. It's not that at each call n "changes" to become bound to the new argument, just that each call has a different n.
So recursion affects variable binding in Haskell in pretty much the same way it does in all major imperative languages. What's more, the extremely flippant way to describe how recursion affects name binding in all of these languages is to say "it doesn't". Recursion actually isn't special at all in this way; parameter passing, local scopes, etc, work exactly the same way when you call a recursive function as when you call an "ordinary" non-recursive function, provided you understand the local scopes as being associated with each call of the function, not as a single thing associated with the function.
1 Sometimes even the scope mapping itself is mutable, and entries mapping names to slots can be added and removed as steps of the program. Python, for example, has a mutable global scope (in each module); you can add and remove module global variables dynamically, even with names determined from runtime data. But it uses immutable local scopes: a function's local variables exist even before they've been assigned a value (or after that value has been removed with del).
2 At least, this is how functions/procedures very commonly work in modern languages. It's not totally universal, but it's generally acknowledged to be a good way for functions/procedures to work.
3 Of course, thanks to laziness the particular value might be the bottom value, which is the "value" we pretend is the value of infinite loops and other forms of nontermination, so that we can interpret expressions as always having a value.
From https://en.wikibooks.org/wiki/Haskell/Variables_and_functions,
Within a given scope, a variable in Haskell gets defined only once and cannot change.
but scope isn't just the code of the function. Roughly, it's the code of the function plus its context, i.e. the arguments it's been called with, and anything from its outer scope. This is often referred to as a closure. Every time you call a function, the code runs under a new closure, and so the variables can evaluate to different things.
The recursive case isn't anything special with regards to this: a function being called twice by other code would run with a different closure, and its internal variables can evaluate to different things.
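As a small sketch (names made up): each call creates fresh bindings, and a returned function closes over the bindings of the call that created it:

addN :: Int -> (Int -> Int)
addN n = \x -> x + n     -- the returned function closes over this call's n

main :: IO ()
main = do
  let add3 = addN 3      -- one call: a closure where n is 3
      add5 = addN 5      -- a second call: a separate closure where n is 5
  print (add3 10, add5 10)   -- prints (13,15)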

Guarantee of sameness of output after switching order in functional programming

I started reading some of Haskell's documentation, and there's a fundamental concept I just don't understand. I read about it in other places as well, but I want to understand it once and for all.
In many places discussing functional programing, I keep reading that if the functions you're using are pure (have no side effects, and give same response for the same input at every call) then you can switch the order in which they are called when composing them, with it being guaranteed that the output of this composed call will remain the same regardless of the order.
For example, here is an entry from the Haskell Wiki:
Haskell is a pure language, which means that the result of any function call is fully determined by its arguments. Pseudo-functions like rand() or getchar() in C, which return different results on each call, are simply impossible to write in Haskell. Moreover, Haskell functions can't have side effects, which means that they can't effect any changes to the "real world", like changing files, writing to the screen, printing, sending data over the network, and so on. These two restrictions together mean that any function call can be replaced by the result of a previous call with the same parameters, and the language guarantees that all these rearrangements will not change the program result!
But when I fiddle with this idea I can quickly think of examples that contradict the statement above. For instance, let's say I have two functions (I will use pseudo code rather than Haskell):
x(a) -> a + 3
y(a) -> a * 3
z(a) -> x(y(a))
w(a) -> y(x(a))
Now, if we execute z and w, we get:
z(5) //gives 3*5+3=18
w(5) //gives (5+3)*3=24
That being so, I think I misunderstood the promised guarantee they speak about. Can anybody explain it to me?
When you compare x(y(a)) to y(x(a)), those two expressions are not equivalent because x and y aren't called with the same arguments in each. In the first expression x is called with the argument y(a) and y is called with the argument a. Whereas in the second y is called with x(a), not a, as its argument and x is called with a, not y(a). So: different arguments, (possibly) different results.
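To restate that in Haskell (a small sketch, not part of the original answer): z and w compose x and y in different orders, so they are simply different functions, and purity never promised they would agree:

x, y :: Int -> Int
x a = a + 3
y a = a * 3

z, w :: Int -> Int
z = x . y   -- z 5 == 5*3 + 3 == 18
w = y . x   -- w 5 == (5+3)*3 == 24; composition is not commutative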
When people say that the order does not matter, they mean that in the following code:
a = f(x)
b = g(y)
you can switch the definitions of a and b without affecting their values. That is, it makes no difference whether f is called before g or vice versa. This is clearly not true for the following code:
a = getchar()
b = getchar()
If you switch a and b here, their values are switched as well, because getchar returns a (possibly) different character each time that it's called. So a purely functional language can't have a function exactly like getchar.
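Here is a tiny Haskell sketch (names made up) of the reordering that purity does license: two independent pure bindings can be swapped without changing their values:

f, g :: Int -> Int
f = (+ 3)
g = (* 3)

main :: IO ()
main = do
  let a = f 5   -- these two bindings can be written in either order:
      b = g 7   -- neither depends on the other, and f and g are pure
  print (a, b)  -- (8,21) either way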

Is there a fast way of going from a symbol to a function call in Julia? [duplicate]

This question already has an answer here:
Julia: invoke a function by a given string
(1 answer)
Closed 6 years ago.
I know that you can call functions using their name as follows
f = x -> println(x)
y = :f
eval(:($y("hi")))
but this is slow since it uses eval. Is it possible to do this in a different way? I know it's easy to go the other direction by just doing symbol(f).
What are you trying to accomplish? Needing to eval a symbol sounds like a solution in search of a problem. In particular, you can just pass around the original function, thereby avoiding issues with needing to track the scope of f (or, since f is just an ordinary variable in your example, the possibility that it would get reassigned), and with fewer characters to type:
f = x -> println(x)
g = f
g("hi")
I know it's easy to go the other direction by just doing symbol(f).
This is misleading, since it's not actually going to give you back f (that transform would be non-unique). Instead it gives you the string representation of the function (which might happen to be "f" sometimes). It is simply equivalent to calling Symbol(string(f)); the combination is common enough to be useful for other purposes.
Actually I have found use for the above scenario. I am working on a simple form compiler allowing for the convenient definition of variational problems as encountered in e.g. finite element analysis.
I am relying on the Julia parser to do an initial analysis of the syntax. The equations entered are valid Julia syntax, but will trigger errors on execution because some of the symbols or methods are not available at the point of the problem definition.
So what I do is roughly this:
I have a type that can hold my problem description:
type Cmd f; a; b; end
I have defined a macro so that I have access to the problem description AST. I traverse this expression and create a Cmd object from its elements (this is not completely unlike the strategy behind the @mat macro in MATLAB.jl):
macro m(xp)
    c = Cmd(xp.args[1], xp.args[3], xp.args[2])
    :($c)
end
At a later step, I run the Cmd. Evaluation of the symbols happens only at this stage (yes, I need to be careful of the evaluation context):
function run(c::Cmd)
    xp = Expr(:call, c.f, c.a, c.b)
    eval(xp)
end
Usage example:
c = @m a^b
...
a, b = 2, 3
run(c)
which returns 9. So in short, the question is relevant in at least some meta-programming scenarios. In my case I have to admit I couldn't care less about performance as all of this is mere preprocessing and syntactic sugar.

Does Unbound always need to be in a `FreshM` monad?

I'm working on a project based on some existing code that uses the unbound library.
The code uses unsafeUnbind a bunch, which is causing me problems.
I've tried using freshen, but I get the following error:
error "fresh encountered bound name!
Please report this as a bug."
I'm wondering:
Is the library intended to be used entirely within a FreshM monad? Or are there ways to do things like lambda application without being in Fresh?
What kinds of values can I give to freshen, in order to avoid the errors they list?
If I end up using unsafeUnbind, under what conditions is it safe to use?
Is the library intended to be used entirely within a FreshM monad? Or are there ways to do things like lambda application without being in Fresh?
In most situations you will want to operate within a Fresh or an LFresh monad.
What kinds of values can I give to freshen, in order to avoid the errors they list?
So I think the reason you're getting the error is that you're passing a term to freshen rather than a pattern. In Unbound, patterns are like a generalization of names: a single Name E is a pattern consisting of a single variable which stands for Es, but also (p1, p2) or [p] are patterns comprised of a pair of patterns p1 and p2 or a list of patterns p, respectively. This lets you define terms that bind two variables at the same time, for example. Other more exotic type constructors include Embed t and Rebind p1 p2: the former makes a pattern that embeds a term inside of a pattern, while the latter is similar to (p1, p2) except that the names within p1 scope over p2 (for example, if p2 has Embed-ed terms in it, p1 will scope over those terms). This is really powerful because it lets you define things like Scheme's let* form, or telescopes like in dependently typed languages. (See the paper for details.)
Now finally the type constructor Bind p t is what brings a pattern and a term together: a term of type Bind p t means that the names in p are bound in Bind p t and scope over t. So an (untyped) lambda term might be constructed with data Expr = Lam (Bind Var Expr) | App Expr Expr | V Var, where type Var = Name Expr.
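To make that concrete, here is a minimal sketch of that untyped lambda calculus written against unbound-generics (mentioned at the end of this answer); the original unbound API is very close, but derives its instances with RepLib instead of GHC.Generics:

{-# LANGUAGE DeriveGeneric, DeriveDataTypeable #-}
import Unbound.Generics.LocallyNameless
import GHC.Generics (Generic)
import Data.Typeable (Typeable)

type Var = Name Expr

data Expr
  = V Var
  | App Expr Expr
  | Lam (Bind Var Expr)       -- the pattern Var is bound in the body
  deriving (Show, Generic, Typeable)

instance Alpha Expr
instance Subst Expr Expr where
  isvar (V x) = Just (SubstName x)
  isvar _     = Nothing

-- Beta-reduce to weak head normal form inside a Fresh monad;
-- unbind freshens the bound name safely before we inspect the body.
whnf :: Fresh m => Expr -> m Expr
whnf (App f a) = do
  f' <- whnf f
  case f' of
    Lam b -> do
      (x, body) <- unbind b
      whnf (subst x a body)
    _ -> return (App f' a)
whnf e = return e

-- run with: runFreshM (whnf someTerm)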
So back to freshen. You should only call freshen on patterns so calling it on something of type Bind p t is incorrect (and I suspect the source of the error message you're seeing) - you should call it on just the p and then apply the resulting permutation to the term t to apply the renaming that freshen constructs.
If I end up using unsafeUnbind, under what conditions is it safe to use?
The place where I've used it is if I need to temporarily sneak under a binder and do some operation that I know for sure does not do anything to the names. An example might be collecting some source position annotations from a term, or replacing some global constant by a closed term. It is also safe if you can guarantee that the term you're working with has already been renamed, so that any names you unsafeUnbind are going to be unique already.
Hope this helps.
PS: I maintain unbound-generics which is a clone of Unbound, but using GHC.Generics instead of RepLib.

Call by need: When is it used in Haskell?

http://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_need says:
"Call-by-need is a memoized version of call-by-name where, if the function argument is evaluated, that value is stored for subsequent uses. [...] Haskell is the most well-known language that uses call-by-need evaluation."
However, the value of a computation is not always stored for faster access (for example, consider a recursive definition of Fibonacci numbers). I asked someone on #haskell and the answer was that this memoization is done automatically "only in one instance, e.g. if you have 'let foo = bar baz', foo will be evaluated once".
My question is: what exactly does "instance" mean here? Are there cases other than let in which memoization is done automatically?
Describing this behavior as "memoization" is misleading. "Call by need" just means that a given input to a function will be evaluated somewhere between 0 and 1 times, never more than once. (It could be partially evaluated as well, which means the function only needed part of that input.) In contrast, "call by name" is simply expression substitution, which means if you give the expression 2 + 3 as an input to a function, it may be evaluated multiple times if the input is used more than once. Both call by need and call by name are non-strict: if the input is not used, then it is never evaluated. Most programming languages are strict, and use a "call by value" approach, which means that all inputs are evaluated before you begin evaluating the function, whether or not the inputs are used. This all has nothing to do with let expressions.
Haskell does not perform any automatic memoization. Let expressions are not an example of memoization. However, most compilers will evaluate let bindings in a call-by-need-esque fashion. If you model a let expression as a function, then the "call by need" mentality does apply:
let foo = expression one in expression two that uses foo
==>
(\foo -> expression two that uses foo) (expression one)
This doesn't correctly model recursive bindings, but you get the idea.
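A small sketch (not from the original answer) that makes the sharing visible with Debug.Trace: the traced expression is bound with let, used twice, and evaluated once:

import Debug.Trace (trace)

main :: IO ()
main = do
  let x = trace "evaluating x" (2 + 3 :: Int)  -- x starts life as a thunk
  print (x + x)
  -- "evaluating x" is printed once, not twice: the first use forces
  -- the thunk, and the second use sees the already-computed 5.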
The Haskell language definition does not define when, or how often, code is invoked. Infinite loops are defined in terms of 'the bottom' (written ⊥), a value that exists within all types and represents an error condition. The compiler is free to make its own decisions regarding when and how often to evaluate things, as long as the program (and the presence/absence of error conditions, including infinite loops!) behaves according to spec.
That said, the usual way of doing this is that most expressions generate 'thunks' - basically a pointer to some code and some context data. The first time you attempt to examine the result of the expression (ie, pattern match it), the thunk is 'forced'; the pointed-to code is executed, and the thunk overwritten with real data. This in turn can recursively evaluate other thunks.
Of course, doing this all the time is slow, so the compiler usually tries to analyze when you'd end up forcing a thunk right away anyway (ie, when something is 'strict' on the value in question), and if it finds this, it'll skip the whole thunk thing and just call the code right away. If it can't prove this, it can still make this optimization as long as it makes sure that executing the thunk right away can't crash or cause an infinite loop (or it handles these conditions somehow).
If you don't want to have to get very technical about this, the essential point is that when you have an expression like some_expensive_computation of all these arguments, you can do whatever you want with it; store it in a data structure, create a list of 53 copies of it, pass it to 6 other functions, etc, and then even return it to your caller for the caller to do whatever it wants with it.
What Haskell will (mostly) do is evaluate it at most once; if the program ever needs to know something about what that expression returned in order to make a decision, then it will be evaluated (at least enough to know which way the decision should go). That evaluation will affect all the other references to the same expression, even if they are now scattered around in data structures and other not-yet-evaluated expressions throughout your program.
