Using parameters that don't change after being read - haskell

I'm learning Haskell and writing a program to solve a toy problem. The program uses a parameter k that doesn't change while it is running, after the parameter has been read from a file. I'm very new to using pure functions, and I would like to write as many functions as pure ones as I can.
I have a data type Node, and functions to compare nodes, get the descendants of nodes, and more. Currently, all of these functions take a parameter k as an argument, such as
compare k node1 node2 = ...
desc k node = ...
and whenever I have to recursively call any of these within the function, I have to repeat the k parameter. It seems redundant, since k will never have a different value for these functions and since it makes the type signatures less readable, and I would like to refactor it out if possible.
Are there any strategies to do this with pure functions, or is it simply a limitation I'll have to deal with?
What I've thought of
Earlier I hard-coded k at the top level, and it seemed to work (I was able to use k in the functions without needing it as an explicit argument). But this obviously wasn't feasible once I needed to read input from a file.
Another possible strategy would be to define all these functions in the main function, but this seems to be strongly discouraged in Haskell.

The usual Haskell approach would be to use the Reader monad. One way of thinking about Reader is that it provides computations with access to an environment. It can be defined as
newtype Reader r a = Reader { runReader :: r -> a }
So your functions would have the types
compare :: Node -> Node -> Reader k Ordering -- or Bool, or whatever the return value is
desc :: Node -> Reader k String -- again, guessing at the output type.
Within a Reader computation, use the function ask :: Reader r r to get access to the parameter.
At the top level, you can run a Reader computation with runReader theComputation env
This is often nicer than passing arguments explicitly. First, any function that doesn't need the environment can be written as a normal function without taking it as a parameter. If it calls another function that does use the environment, the monad will provide it automatically with no extra work on your part.
You can even define a type synonym,
type MyEnv = Reader Env
and use that for your functions' type signatures. Then if you need to change the environment, you only need to change one type instead of changing all your type signatures.
The definition from standard libraries is a bit more complicated to handle monad transformers, but it works the same way as this simpler version.
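To make this concrete, here is a minimal sketch built on the simplified Reader from this answer, with the instances needed for do notation written by hand. The Node type, the descendant rule, and the name compareNodes (chosen to avoid clashing with Prelude's compare) are assumptions for illustration only; the question does not show the real definitions.

```haskell
-- The simplified Reader from the answer, plus hand-written instances.
newtype Reader r a = Reader { runReader :: r -> a }

instance Functor (Reader r) where
  fmap f (Reader g) = Reader (f . g)

instance Applicative (Reader r) where
  pure = Reader . const
  Reader f <*> Reader g = Reader (\r -> f r (g r))

instance Monad (Reader r) where
  Reader g >>= k = Reader (\r -> runReader (k (g r)) r)

ask :: Reader r r
ask = Reader id

-- Hypothetical node type and environment (the parameter k, assumed positive).
newtype Node = Node Int deriving (Eq, Show)
type Env = Int

-- Hypothetical rule: compare nodes by their value modulo k.
compareNodes :: Node -> Node -> Reader Env Ordering
compareNodes (Node a) (Node b) = do
  k <- ask
  return (compare (a `mod` k) (b `mod` k))

-- Hypothetical rule: a node n has children 2n+1 and 2n+2 while 0 <= n < k.
-- The recursive calls never mention k.
desc :: Node -> Reader Env [Node]
desc (Node n) = do
  k <- ask
  if n < 0 || n >= k
    then return []
    else do
      let children = [Node (2 * n + 1), Node (2 * n + 2)]
      rest <- mapM desc children
      return (children ++ concat rest)
```

In GHCi, runReader (desc (Node 0)) 3 supplies k once at the top; in a real program you would more likely import Control.Monad.Reader from mtl instead of defining Reader yourself.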

Ultimately you have to pass in the value of k everywhere it is needed, but there are some things you can do to avoid repeating it.
One thing you can do is to define convenience functions once the value of k is known:
myfunc = let k = ...
             compare' = compare k
             desc' = desc k
         in ... -- use compare' and desc' here
Another approach is to use the Implicit Parameters extension. This involves defining compare and desc to take k as an implicit parameter:
{-# LANGUAGE ImplicitParams #-}
compare :: (?k :: Int) => Node -> Node -> Ordering -- guessing at the result type
compare n1 n2 = ... (can use ?k here) ...
desc :: (?k :: Int) => Node -> [Node] -- again, guessing at the result type
desc node = ... (can use ?k here) ...
myfunc = let ?k = ...
         in ... use compare and desc ...
Note that in either case you can't call compare or desc until you've defined what k is.
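Here is a small compilable sketch of the implicit-parameter version. The Node type, the descendant rule, and the names compareNodes and runWith are made up for illustration; only the ImplicitParams mechanics are the point.

```haskell
{-# LANGUAGE ImplicitParams #-}

-- Hypothetical node type; the question does not show the real one.
newtype Node = Node Int deriving (Eq, Show)

-- Hypothetical rule: compare nodes by their value modulo ?k (assumed positive).
compareNodes :: (?k :: Int) => Node -> Node -> Ordering
compareNodes (Node a) (Node b) = compare (a `mod` ?k) (b `mod` ?k)

-- Hypothetical rule: a node n has children 2n+1 and 2n+2 while 0 <= n < ?k.
-- The recursive call picks up ?k from the constraint; nothing is passed by hand.
desc :: (?k :: Int) => Node -> [Node]
desc (Node n)
  | n < 0 || n >= ?k = []
  | otherwise        = children ++ concatMap desc children
  where
    children = [Node (2 * n + 1), Node (2 * n + 2)]

-- Bind ?k once, then call the functions without mentioning it again.
runWith :: Int -> ([Node], Ordering)
runWith k = let ?k = k in (desc (Node 0), compareNodes (Node 4) (Node 2))
```

The (?k :: Int) constraint still shows up in every signature, so this hides the argument at call sites rather than in the types.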

This is how I like to structure recursive functions with values that don't change:
map f xs = map' xs
  where map' [] = []
        map' (y:ys) = f y : map' ys

Two simple tricks with local function definitions that may be useful. First, you can make k implicit within your recursive definitions by just playing with scope:
desc :: Int -> Node -> [Node]
desc k node = desc' node
  where
    desc' node = -- Carry on; k is in scope now.
Second, if you are going to call your functions a lot of times with the same k within the same scope, you can use a local definition for the partially applied function:
main = do
    k <- getKFromFile
    -- etc.
    let desc' = desc k -- Partial application.
        descA = desc' nodeA
        descB = desc' nodeB
    print [descA, descB]
Proper implicit parameter passing is commonly done (or, arguably, simulated) with the Reader monad (see John L's answer), though that sounds kind of heavyweight for your use case.
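Both tricks fit in a few lines once there is a concrete node type to work with. The sketch below assumes a hypothetical Node wrapping an Int and a made-up descendant rule; descAll is likewise just an illustrative name.

```haskell
-- Hypothetical node type; the question does not show the real one.
newtype Node = Node Int deriving (Eq, Show)

-- Trick 1: k is bound once by the outer function; the local helper go
-- recurses without passing it along.
-- Hypothetical rule: a node n has children 2n+1 and 2n+2 while 0 <= n < k.
desc :: Int -> Node -> [Node]
desc k = go
  where
    go (Node n)
      | n < 0 || n >= k = []
      | otherwise       = children ++ concatMap go children
      where
        children = [Node (2 * n + 1), Node (2 * n + 2)]

-- Trick 2: partially apply desc to k once, then reuse the result.
descAll :: Int -> [Node] -> [[Node]]
descAll k = map desc'
  where
    desc' = desc k
```

The type of desc still mentions k, but only the outermost call site has to supply it.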

Related

Using different Ordering for Sets

I was reading Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
    compare (ReverseInt a) (ReverseInt b)
        | a == b = EQ
        | a < b = GT
        | a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?

No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: [Int] into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't be: you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReverseInt, and you get the instance you wrote out for free!
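A short sketch of the Down approach, assuming the containers package for Data.Set; the helper name descending is made up for this example.

```haskell
import Data.Ord (Down (..))
import qualified Data.Set as Set

-- Build a set ordered by the reversed Ord instance, then read the elements
-- back out. Set.toList walks the set in ascending order of Down Int, which
-- is descending order of the underlying Int.
descending :: [Int] -> [Int]
descending = map (\(Down x) -> x) . Set.toList . Set.fromList . map Down
```

The set's type is Set (Down Int), so the ordering in use is visible to anyone reading a signature.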
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisonKey :: a -> a -> Ordering
                 , ...
                 }
Then, every time you make a Set, you would have to pass in the comparison key.
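For what it's worth, a toy version of such a set might look like the following. This is only a sketch: a sorted list stands in for the tree, and CmpSet, emptyWith, insert, and fromListWith are hypothetical names, not part of any library.

```haskell
import Data.List (insertBy)

-- A set that carries its own comparison function.
data CmpSet a = CmpSet
  { comparisonKey :: a -> a -> Ordering
  , elements      :: [a] -- kept sorted according to comparisonKey
  }

emptyWith :: (a -> a -> Ordering) -> CmpSet a
emptyWith cmp = CmpSet cmp []

-- Insert an element unless the set already holds one that compares EQ.
insert :: a -> CmpSet a -> CmpSet a
insert x s
  | any (\y -> comparisonKey s x y == EQ) (elements s) = s
  | otherwise = s { elements = insertBy (comparisonKey s) x (elements s) }

fromListWith :: (a -> a -> Ordering) -> [a] -> CmpSet a
fromListWith cmp = foldr insert (emptyWith cmp)
```

For example, elements (fromListWith (flip compare) [3, 1, 2, 3]) gives [3, 2, 1]. The cost is the one named in the first benefit above: two CmpSet Int values built with different comparison functions have the same type, so nothing stops you from combining them incorrectly.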

Accessing vector element by index using lens

I'm looking for a way to reference an element of a vector using lens library...
Let me try to explain what I'm trying to achieve using a simplified example of my code.
I'm working in this monad transformer stack (where StateT is the focus, everything else is not important)
newtype MyType a = MyType (StateT MyState (ExceptT String IO) a)
MyState has a lot of fields but one of those is a vector of clients which is a data type I defined:
data MyState = MyState { ...
                       , _clients :: V.Vector ClientT
                       }
Whenever I need to access one of my clients I tend to do it like this:
import Control.Lens (use)
c <- use clients
let neededClient = c V.! someIndex
... -- calculate something, update client if needed
clients %= (V.// [(someIndex, updatedClient)])
Now, here is what I'm looking for: I would like my function to receive a "reference" to the client I'm interested in and use it (retrieve it from State, update it if needed).
In order to clear up what I mean here is a snippet (that won't compile even in pseudo code):
...
myFunction (clients.ix 0)
...
myFunction clientLens = do
c <- use clientLens -- I would like to access a client in the vector
... -- calculate stuff
clientLens .= updatedClient
Basically, I would like to pass to myFunction something from Lens library (I don't know what I'm passing here... Lens? Traversal? Getting? some other thingy?) which will allow me to point at particular element in the vector which is kept in my StateT. Is it at all possible? Currently, when using "clients.ix 0" I get an error that my ClientT is not an instance of Monoid.
This is a very dumbed-down version of what I have. Answering the question "why do I need it this way" would require a lot more explanation. I'm interested in whether it is possible to pass this "reference", which will point to some element of the vector kept in my State.

clients.ix 0 is a traversal. In particular, traversals are setters, so setting and modifying should work fine:
clients.ix 0 .= updatedClient
Your problem is with use. Because a traversal doesn't necessarily contain exactly one value, when you use a traversal (or use some other getter function on it), it combines all the values assuming they are of a Monoid type.
In particular,
use (clients.ix n)
would want to return mempty if n is out of bounds.
Instead, you can use the preuse function, which discards all but the first target of a traversal (or more generally, a fold), and wraps it in a Maybe. E.g.
Just c <- preuse (clients.ix n)
Note this will give a pattern match error if n is out of bounds, since preuse returns Nothing then.
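The Monoid requirement and the preuse workaround can be reproduced without the lens library. The following is a base-only sketch, not lens itself: Traversal' has the same shape lens uses, but ixList, viewT, previewT, and setT are hand-written stand-ins (for ix on lists, view/use, preview/preuse, and set/.= respectively).

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Monoid (First (..))

-- Van Laarhoven traversal: visits zero or more targets of type a inside s.
type Traversal' s a = forall f. Applicative f => (a -> f a) -> s -> f s

-- Stand-in for ix on a list: targets the element at index n, if there is one.
ixList :: Int -> Traversal' [a] a
ixList n f xs = case splitAt n xs of
  (before, x : after) | n >= 0 -> (\x' -> before ++ x' : after) <$> f x
  _                            -> pure xs

-- Like view/use: Const a is only Applicative when a is a Monoid, which is
-- where the "not an instance of Monoid" error comes from.
viewT :: Monoid a => Traversal' s a -> s -> a
viewT t = getConst . t Const

-- Like preview/preuse: wrap each target in First, so any a works and a
-- missing target becomes Nothing.
previewT :: Traversal' s a -> s -> Maybe a
previewT t = getFirst . getConst . t (Const . First . Just)

-- Like set/.=: replace every target; a no-op when there is none.
setT :: Traversal' s a -> a -> s -> s
setT t v = runIdentity . t (const (Identity v))
```

Because a single Traversal' value serves both previewT and setT, a function that takes one as its argument can read and then update the same element. With lens itself the argument would typically be typed Traversal' MyState ClientT (with RankNTypes enabled), and the Nothing case of preuse handled explicitly rather than with a partial Just pattern.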

How do I use the Supply monad to create a function that generates globally unique names?

Background:
I'm doing a code translation project that requires me to generate variable names. None of the names I generate should be duplicates of each other.
I'm really frustrated since this would be stupidly simple and elegant with a Python generator function.
What I've tried:
The way I was doing it before was to pass a counter variable down through recursive calls to my translate code, and pass the (possibly incremented) counter back up in the return value of basically every function.
This was really messy: it added an extra parameter to keep track of to each of these functions; and worse still it forced me to work with messy tuple return values where I would otherwise have a simple unary return value.
I've never really gotten proficient with monads in my short time with Haskell, but I had an inkling that I could use a wrapper on the State monad to simulate a global counter variable. After 3 days of messing around trying to grok monads and make one of my own, then trying to alter someone else's monads to generate the values I needed, I've finally resigned myself to straight-up using someone else's high-level monad (perhaps with a few alterations.)
My problem now:
I've identified the MonadSupply and MonadUnique modules as a couple which likely provide the simple kind of interface I need. Unfortunately I can't figure out how to use them.
In particular the MonadSupply module documentation provides this nice example use case:
runSupplyVars x = runSupply x vars
    where vars = [replicate k ['a'..'z'] | k <- [1..]] >>= sequence
Looks like what I want! Once I got the module to compile I checked the type of this function in the interpreter:
> :t runSupplyVars
runSupplyVars :: Supply [Char] a -> Identity (a, [[Char]])
I've tried passing lots (hours worth) of different things to this function, with no success. I also tried passing the function to some various other functions to see if they would provide the parameters I needed implicitly. No luck so far.
The Questions:
Could someone please provide an example use case of this runSupplyVars function?
Would it be possible to do what I'm thinking with it? I want to have a function I can call from anywhere in the program, which will provide me with a different variable name or integer on each call.

To actually use the Supply monad you should structure your code with do notation and call the supply function when you actually need a name.
For example, this will produce a new variable name prefixed with var_, just to show how you might get something from the supply and use it:
newVar :: Supply [Char] [Char]
newVar = do
    name <- supply
    return ("var_" ++ name)
You'll need to structure your whole program around the Supply monad and then only call runSupplyVars once at the top-level, otherwise different parts of the program will have independent supplies and so might reuse the same variable name.
Finally, you'll need runIdentity from Control.Monad.Identity to unpack the result of runSupplyVars into the underlying tuple of type (a, [[Char]]), and then throw away the second value which is just the (infinite) list of unused names. You might be better off redefining runSupplyVars to do this for you:
import Control.Monad.Identity
[...]
runSupplyVars :: Supply [Char] a -> a
runSupplyVars x = fst (runIdentity (runSupply x vars))
    where vars = [replicate k ['a'..'z'] | k <- [1..]] >>= sequence
Here's a more complete example putting it all together. Note the different monads with which do notation is used - IO for the main function, and Supply [Char] for realProgram and probably most of the rest of the code in a bigger version:
import MonadSupply
import Control.Monad.Identity
main :: IO ()
main = do
    let result = runSupplyVars realProgram
    print result
realProgram :: Supply [Char] Int
realProgram = do
    x <- newVar
    return 0
newVar :: Supply [Char] [Char]
newVar = do
    name <- supply
    return ("var_"++name)
runSupplyVars :: Supply [Char] a -> a
runSupplyVars x = fst (runIdentity (runSupply x vars))
    where vars = [replicate k ['a'..'z'] | k <- [1..]] >>= sequence
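If you want to experiment without installing the MonadSupply module, the idea can be sketched with base alone. The Supply type below is a hand-rolled stand-in, not the module's actual definition (which, judging by the Identity in its type, is built on a monad transformer); threeVars is a made-up computation showing that successive calls to newVar yield distinct names.

```haskell
-- Stand-in Supply: a computation that consumes values from a list.
newtype Supply s a = Supply { runSupply :: [s] -> (a, [s]) }

instance Functor (Supply s) where
  fmap f (Supply g) = Supply $ \ss -> let (a, ss') = g ss in (f a, ss')

instance Applicative (Supply s) where
  pure a = Supply $ \ss -> (a, ss)
  Supply f <*> Supply g = Supply $ \ss ->
    let (h, ss')  = f ss
        (a, ss'') = g ss'
    in (h a, ss'')

instance Monad (Supply s) where
  Supply g >>= k = Supply $ \ss -> let (a, ss') = g ss in runSupply (k a) ss'

-- Take the next value from the supply.
supply :: Supply s s
supply = Supply $ \ss -> case ss of
  (x : xs) -> (x, xs)
  []       -> error "supply exhausted"

-- Run with the infinite name list a, b, ..., z, aa, ab, ...
runSupplyVars :: Supply String a -> a
runSupplyVars x = fst (runSupply x vars)
  where vars = [replicate k ['a' .. 'z'] | k <- [1 ..]] >>= sequence

newVar :: Supply String String
newVar = ("var_" ++) <$> supply

-- Each call to newVar advances the supply, so the names differ.
threeVars :: Supply String [String]
threeVars = do
  a <- newVar
  b <- newVar
  c <- newVar
  return [a, b, c]
```

The names stay unique only within one runSupplyVars call, which is why the whole translation should run inside a single Supply computation.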

Short-lived memoization in Haskell?

In an object-oriented language when I need to cache/memoize the results of a function for a known life-time I'll generally follow this pattern:
Create a new class
Add to the class a data member and a method for each function result I want to cache
Implement the method to first check to see if the result has been stored in the data member. If so, return that value; else call the function (with the appropriate arguments) and store the returned result in the data member.
Objects of this class will be initialized with values that are needed for the various function calls.
This object-based approach is very similar to the function-based memoization pattern described here: http://www.bardiak.com/2012/01/javascript-memoization-pattern.html
The main benefit of this approach is that the results are kept around only for the lifetime of the cache object. A common use case is in the processing of a list of work items. For each work item one creates the cache object for that item, processes the work item with that cache object then discards the work item and cache object before proceeding to the next work item.
What are good ways to implement short-lived memoization in Haskell? And does the answer depend on if the functions to be cached are pure or involve IO?
Just to reiterate - it would be nice to see solutions for functions which involve IO.

Let's use Luke Palmer's memoization library: Data.MemoCombinators
import qualified Data.MemoCombinators as Memo
import Data.Function (fix) -- we'll need this too
I'm going to define things slightly differently from how his library does, but it's basically the same (and furthermore, compatible). A "memoizable" thing takes itself as input, and produces the "real" thing.
type Memoizable a = a -> a
A "memoizer" takes a function and produces the memoized version of it.
type Memoizer a b = (a -> b) -> a -> b
Let's write a little function to put these two things together. Given a Memoizable function and a Memoizer, we want the resultant memoized function.
runMemo :: Memoizer a b -> Memoizable (a -> b) -> a -> b
runMemo memo f = fix (f . memo)
This is a little magic using the fixpoint combinator (fix). Never mind that; you can google it if you are interested.
So let's write a Memoizable version of the classic fib example:
fib :: Memoizable (Integer -> Integer)
fib self = go
  where go 0 = 1
        go 1 = 1
        go n = self (n-1) + self (n-2)
Using a self convention makes the code straightforward. Remember, self is what we expect to be the memoized version of this very function, so recursive calls should be on self. Now fire up ghci.
ghci> let fib' = runMemo Memo.integral fib
ghci> fib' 10000
WALL OF NUMBERS CRANKED OUT RIDICULOUSLY FAST
Now, the cool thing about runMemo is you can create more than one freshly memoized version of the same function, and they will not share memory banks. That means that I can write a function that locally creates and uses fib', but then as soon as fib' falls out of scope (or earlier, depending on the intelligence of the compiler), it can be garbage collected. It doesn't have to be memoized at the top level. This may or may not play nicely with memoization techniques that rely on unsafePerformIO. Data.MemoCombinators uses a pure, lazy Trie, which fits perfectly with runMemo. Rather than creating an object which essentially becomes a memoization manager, you can simply create memoized functions on demand. The catch is that if your function is recursive, it must be written as Memoizable. The good news is you can plug in any Memoizer that you wish. You could even use:
noMemo :: Memoizer a b
noMemo f = f
ghci> let fib' = runMemo noMemo fib
ghci> fib' 30 -- wait a while; it's computing stupidly
1346269
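To show the short-lived part without installing data-memocombinators, here is a base-only sketch. memoList is a hypothetical memoizer backed by a lazy list (much slower to index than the library's tries, and only valid for non-negative arguments), and fibsUpTo is a made-up consumer whose memo table exists only for the duration of one call.

```haskell
import Data.Function (fix)

type Memoizable a = a -> a
type Memoizer a b = (a -> b) -> a -> b

runMemo :: Memoizer a b -> Memoizable (a -> b) -> a -> b
runMemo memo f = fix (f . memo)

-- Hypothetical memoizer: results live in a lazily built list indexed by the
-- argument. Each application of memoList builds its own table.
memoList :: Memoizer Int b
memoList f = (table !!)
  where table = map f [0 ..]

fib :: Memoizable (Int -> Integer)
fib self = go
  where
    go 0 = 1
    go 1 = 1
    go n = self (n - 1) + self (n - 2)

-- fib' and its table are local to this call and can be garbage collected
-- once the result list has been produced.
fibsUpTo :: Int -> [Integer]
fibsUpTo n = map fib' [0 .. n]
  where fib' = runMemo memoList fib
```

Two separate calls to fibsUpTo do not share a table, which is the per-work-item lifetime the question asks about.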

Lazy Haskell programming is, in a way, the memoization paradigm taken to an extreme. Also, whatever you do in an imperative language is possible in Haskell, using the IO monad, the ST monad, monad transformers, arrows, you name it.
The only problem is that these abstraction devices are much more complicated than the imperative equivalent that you mentioned, and they need a pretty deep mind-rewiring.

I believe the above answers are both more complex than necessary, although they might be more portable than what I'm about to describe.
As I understand it, there is a rule in GHC that each value is computed exactly once when its enclosing lambda expression is entered. You may thus create exactly your short-lived memoization object as follows.
import qualified Data.Vector as V
indexerVector :: (t -> Int) -> V.Vector t -> Int -> [t]
indexerVector idx vec = \e -> tbl V.! e
  where m = maximum $ map idx $ V.toList vec
        -- indices 0..m need m + 1 slots
        tbl = V.accumulate (flip (:)) (V.replicate (m + 1) [])
                           (V.map (\v -> (idx v, v)) vec)
What does this do? It groups all the elements of the Data.Vector t passed as its second argument vec according to the integer computed by its first argument idx, retaining the grouping as a Data.Vector [t]. It returns a function of type Int -> [t] which looks up this grouping by the pre-computed index value.
Our compiler GHC has promised that tbl shall only be thunked once when we invoke indexerVector. We may therefore assign the lambda expression \e -> tbl V.! e returned by indexerVector to another value, which we may use repeatedly without fear that tbl ever gets recomputed. You may verify this by inserting a trace on tbl.
In short, your caching object is exactly this lambda expression.
I've found that almost anything you can accomplish with a short term object can be better accomplished by returning a lambda expression like this.

You can use the very same pattern in Haskell too. Lazy evaluation will take care of checking whether the value has been evaluated already. This has been mentioned multiple times already, but a code example could be useful. In the example below, memoedValue will be calculated only once, when it is demanded.
data Memoed = Memoed
    { value :: Int
    , memoedValue :: Int
    }
memo :: Int -> Memoed
memo i = Memoed
    { value = i
    , memoedValue = expensiveComputation i
    }
Even better, you can memoize values which depend on other memoized values. You should avoid dependency loops; they can lead to nontermination.
data Memoed = Memoed
    { value :: Int
    , memoedValue1 :: Int
    , memoedValue2 :: Int
    }
memo :: Int -> Memoed
memo i = r
  where
    r = Memoed
        { value = i
        , memoedValue1 = expensiveComputation i
        , memoedValue2 = anotherComputation (memoedValue1 r)
        }

SML conversions to Haskell

A few basic questions, for converting SML code to Haskell.
1) I am used to having local embedded expressions in SML code, for example test expressions, prints, etc., which function as local tests and produce output when the code is loaded (evaluated).
In Haskell it seems that the only way to get results (evaluation) is to add code in a module, and then go to main in another module and add something to invoke and print results.
Is this right? in GHCi I can type expressions and see the results, but can this be automated?
Having to go to the top-level main for each test evaluation seems inconvenient to me; maybe I just need to shift my paradigm for laziness.
2) in SML I can do pattern matching and unification on a returned result, e.g.
val myTag(x) = somefunct(a,b,c);
and get the value of x after a match.
Can I do something similar in Haskell easily, without writing separate extraction functions?
3) How do I do a constructor with a tuple argument, i.e. uncurried.
in SML:
datatype Thing = Info of int * int;
but in Haskell, I tried:
data Thing = Info ( Int Int)
which fails. ("Int is applied to too many arguments in the type:A few Int Int")
The curried version works fine,
data Thing = Info Int Int
but I wanted un-curried.
Thanks.

This question is a bit unclear -- you're asking how to evaluate functions in Haskell?
If it is about inserting debug and tracing into pure code, this is typically only needed for debugging. To do this in Haskell, you can use Debug.Trace.trace, in the base package.
If you're concerned about calling functions, Haskell programs evaluate from main downwards, in dependency order. In GHCi you can, however, import modules and call any top-level function you wish.
You can return the original argument to a function, if you wish, by making it part of the function's result, e.g. with a tuple:
f x = (x, y)
  where y = g a b c
Or do you mean to return either one value or another? Then use a tagged union (sum type), such as Either:
f x = if x > 0 then Left x
               else Right (g a b c)
How do I do a constructor with a tuple argument, i.e. uncurried in SML
Using the (,) constructor. E.g.
data T = T (Int, Int)
though more Haskell-like would be:
data T = T Int Bool
and those should probably be strict fields in practice:
data T = T !Int !Bool

Debug.Trace allows you to print debug messages inline. However, since these functions use unsafePerformIO, they might behave in unexpected ways compared to a call-by-value language like SML.
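As a small illustration of inline tracing, here is a sketch using trace from Debug.Trace (part of base). The function collatzSteps is a made-up example, not something from the question.

```haskell
import Debug.Trace (trace)

-- Counts Collatz steps from n down to 1, logging each value visited.
-- trace prints its message to stderr when the wrapped expression is forced,
-- so with lazy evaluation the timing and order of messages depend on how the
-- result is consumed, not on where the call appears in the source.
collatzSteps :: Int -> Int
collatzSteps 1 = 0
collatzSteps n =
  trace ("visiting " ++ show n) $
    1 + collatzSteps (if even n then n `div` 2 else 3 * n + 1)
```

Evaluating collatzSteps 6 in GHCi prints the trace messages alongside the result; if the result is never demanded, nothing is printed at all.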
I think the @ syntax (an as-pattern) is what you're looking for here:
data MyTag = MyTag Int Bool String
someFunct :: MyTag -> (MyTag, Int, Bool, String)
someFunct x@(MyTag a b c) = (x, a, b, c) -- x is bound to the entire argument
In Haskell, tuple types are separated by commas, e.g., (t1, t2), so what you want is:
data Thing = Info (Int, Int)

Reading the other answers, I think I can provide a few more examples and one recommendation.
data ThreeConstructors = MyTag Int | YourTag (String,Double) | HerTag [Bool]
someFunct :: Char -> Char -> Char -> ThreeConstructors
MyTag x = someFunct 'a' 'b' 'c'
This is like the "let MyTag x = someFunct a b c" examples, but it is at the top level of the module.
As you have noticed, Haskell's top level can define commands, but there is no way to automatically run any code merely because your module has been imported by another module. This is entirely different from Scheme or SML. In Scheme the file is interpreted as being executed form-by-form, but Haskell's top level is only declarations. Thus libraries cannot do normal things like run initialization code when loaded; they have to provide a "pleaseRunMe :: IO ()" kind of command to do any initialization.
As you point out this means running all the tests requires some boilerplate code to list them all. You can look under hackage's Testing group for libraries to help, such as test-framework-th.
For #2, yes, Haskell's pattern matching does the same thing. Both let and where do pattern matching. You can do
let MyTag x = someFunct a b c
in ...
or
...
where MyTag x = someFunct a b c
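A compilable sketch of both forms, with a case alternative for comparison. The Tagged type and the body of someFunct are invented for illustration.

```haskell
-- Hypothetical tagged type and a function returning it.
data Tagged = MyTag Int | OtherTag String

someFunct :: Int -> Int -> Int -> Tagged
someFunct a b c = MyTag (a + b + c)

-- Pattern binding in a let.
viaLet :: Int
viaLet = let MyTag x = someFunct 1 2 3 in x * 2

-- Pattern binding in a where clause.
viaWhere :: Int
viaWhere = x + 1
  where MyTag x = someFunct 1 2 3

-- let/where pattern bindings are lazy: if someFunct returned OtherTag, the
-- failure would only surface as a runtime error when x is used. A case
-- expression handles the other constructors explicitly.
viaCase :: Maybe Int
viaCase = case someFunct 1 2 3 of
  MyTag x -> Just x
  _       -> Nothing
```

For a type with a single constructor the let and where forms are entirely safe; with several constructors the case form is usually the better choice.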

Resources