I've come to Idris from Scala. Scala has tail call optimization (TCO), and I can tell the compiler to balk if it can't optimize a recursive function using TCO. For example, see these posts.
This Scala function successfully counts String lengths
def allLengths(strs: List[String]): List[Int] = strs match {
case Nil => Nil
case x :: xs => x.length :: allLengths(xs)
}
but if I annotate it with @annotation.tailrec, the compiler reports an error
error: could not optimize @tailrec annotated method allLengths: it contains a recursive call not in tail position
because the function doesn't directly return a call to allLengths. If I run it (without annotation) with a really long List, I get "ERROR: too much recursion", as expected.
I can, however, rewrite it as
@tailrec
def allLengths(strs: List[String], acc: List[Int] = Nil): List[Int] = strs match {
case Nil => acc.reverse
case x :: xs => allLengths(xs, x.length :: acc)
}
which compiles fine with the annotation. With @tailrec, Scala compiles the code by converting it to an imperative loop, which isn't at risk of recursion errors. I believe it may also be faster as an imperative loop.
In Brady's Idris book, he uses the example
allLengths : List String -> List Nat
allLengths [] = []
allLengths (x :: xs) = length x :: allLengths xs
which can be compiled with a total annotation, and I can't seem to cause a recursion error (though allLengths (replicate 5000 "hi") is having difficulty). Having come from Scala, I was surprised that he doesn't write it in a tail recursive way. A few questions:
are runtime recursion errors possible in Idris for recursive functions that aren't tail recursive?
are tail recursive functions optimized in Idris during compilation? What about non-tail recursive?
is there an annotation like in Scala that ensures TCO? @tailrec feels very similar to total, but the former doesn't guarantee totality and the latter doesn't guarantee tail recursion
TCO is largely up to the particular runtime or backend used for Idris code generation: a backend can take the Idris IR, identify tail calls, and choose to optimize them. As I've been working on the JVM backend for Idris, I can say that it does eliminate tail recursion and uses trampolines for non-self tail calls, without any explicit annotation required from the user.
Here is how the Idris 2 JVM backend handles the following tail-recursive functions:
reverse : List a -> List a
reverse xs = go [] xs where
go : List a -> List a -> List a
go acc [] = acc
go acc (x :: xs) = go (x :: acc) xs
allLengths : List Nat -> List String -> List Nat
allLengths acc [] = reverse acc
allLengths acc (x :: xs) = allLengths (length x :: acc) xs
Here both allLengths and go inside reverse are tail recursive; note also that allLengths calls reverse in tail position. The Idris 2 JVM backend turns both of these functions into loops at the bytecode level and trampolines the other tail calls. This is what the decompiled bytecode looks like:
// `go` function decompiled code
public static Object Main$$nested1190$243$go(Object arg$0, Object arg$1, IdrisObject arg$2, IdrisObject arg$3) {
while(true) {
switch(arg$3.getConstructorId()) {
case 0:
return arg$2;
case 1:
Object e$2 = arg$3.getProperty(0);
IdrisObject e$3 = (IdrisObject)Runtime.unwrap(arg$3.getProperty(1));
arg$0 = null;
arg$2 = (IdrisObject)(new col(1, Runtime.unwrap(e$2), arg$2));
arg$3 = e$3;
break;
default:
return Runtime.crash("Unreachable code");
}
}
}
// `reverse` function
public static Thunk Main$reverse(Object arg$0, IdrisObject arg$1) {
return () -> {
return Runtime.createThunk(Main$$nested1190$243$go((Object)null, arg$1, (IdrisObject)(new Nil(0)), arg$1));
};
}
// `allLengths` function
public static Thunk Main$allLengths(IdrisObject arg$0, IdrisObject arg$1) {
while(true) {
switch(arg$1.getConstructorId()) {
case 0:
return () -> {
return Main$reverse((Object)null, arg$0);
};
case 1:
String e$2 = (String)Runtime.unwrap(arg$1.getProperty(0));
IdrisObject e$3 = (IdrisObject)Runtime.unwrap(arg$1.getProperty(1));
arg$0 = (IdrisObject)(new col(1, Runtime.unwrap(Prelude.length(e$2)), arg$0));
arg$1 = e$3;
break;
default:
return Runtime.createThunk(Runtime.crash("Unreachable code"));
}
}
}
We can see while loops for both go and allLengths: the self tail calls are eliminated entirely by storing new values into the function arguments for the next iteration of the loop. We can also see lambdas creating trampoline thunks for other functions called in tail position (the reverse call in allLengths). Non-tail-recursive functions are currently not transformed by the JVM backend, so they can still exhaust the stack.
Related
I need to create a parse function. I am new to Haskell and I am interested in whether my idea can be implemented using only GHC base functions.
The problem is: I have a message in a string with coordinates and values, like (x: 01, 01, ... y:01, 02,: v: X, Y, Z), and I need to parse it into a type like ([Char], [Int], [Int]).
In a language like C, I would write a loop, go from the start, check each character, and put the values into arrays, but I am afraid this would not work in Haskell. Can someone give a hint on an approachable solution to this problem?
If you’re accustomed to imperative programming with loops, you can actually do a fairly literal translation of an imperative solution to Haskell using direct recursion.
Bear in mind, this isn’t the easiest or best way to arrive at a working solution, but it’s good to learn the technique so that you understand what more idiomatic solutions are abstracting away for you.
The basic principle is to replace each loop with a recursive function, and replace each mutable variable with an accumulator parameter to that function. Where you would modify the variable within an iteration of the loop, just make a new variable; where you would modify it between iterations of the loop, call the looping function with a different argument in place of that parameter.
For a simple example, consider computing the sum of a list of integers. In C, that might be written like this:
struct ListInt { int head; struct ListInt *tail; }
int total(ListInt const *list) {
int acc = 0;
ListInt const *xs = list;
while (xs != NULL) {
acc += xs->head;
xs = xs->tail;
}
return acc;
}
We can translate that literally to low-level Haskell:
total :: [Int] -> Int
total list
= loop
0 -- acc = 0
list -- xs = list
where
loop
:: Int -- int acc;
-> [Int] -- ListInt const *xs;
-> Int
loop acc xs -- loop:
| not (null xs) = let -- if (xs != NULL) {
acc' = acc + head xs -- acc += xs->head;
xs' = tail xs -- xs = xs->tail;
in loop acc' xs' -- goto loop;
-- } else {
| otherwise = acc -- return acc;
-- }
The outer function total sets up the initial state, and the inner function loop handles the iteration over the input. In this case, total immediately returns after the loop, but if there were some more code after the loop to process the results, that would go in total:
total list = let
result = loop 0 list
in someAdditionalProcessing result
It’s extremely common in Haskell for a helper function to accumulate a list of results by prepending them to the beginning of an accumulator list with :, and then reversing this list after the loop, because appending a value to the end of a list is much more costly. You can think of this pattern as using a list as a stack, where : is the “push” operation.
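As a small illustration of that accumulate-and-reverse pattern (squares is just an illustrative name, not taken from the question):
squares :: [Int] -> [Int]
squares = go []
  where
    -- "Push" each result onto the accumulator, then "pop" them all back
    -- into the original order with a single reverse at the end.
    go acc []       = reverse acc
    go acc (x : xs) = go (x * x : acc) xs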
Also, straight away, we can make some simple improvements to the loop above. First, the accessor functions head and tail may throw an error if our code is wrong and we call them on empty lists, just like accessing the head or tail member of a NULL pointer (although an exception is clearer than a segfault!), so we can make the loop simpler and safer by using pattern matching instead of guards and head/tail:
loop :: Int -> [Int] -> Int
loop acc [] = acc
loop acc (h : t) = loop (acc + h) t
Finally, this pattern of recursion happens to be a fold: there’s an initial value of the accumulator, updated for each element of the input, with no complex recursion. So the whole thing can be expressed with foldl':
total :: [Int] -> Int
total list = foldl' (\ acc h -> acc + h) 0 list
And then abbreviated:
total = foldl' (+) 0
So, for parsing your format, you can follow a similar approach: instead of a list of integers, you have a list of characters, and instead of a single integer result, you have a compound data type, but the overall structure is very similar:
parse :: String -> ([Char], [Int], [Int])
parse input = let
(…, …, …) = loop ([], [], []) input
in …
where
loop (…, …, …) (c : rest) = … -- What to do for each character.
loop (…, …, …) [] = … -- What to do at end of input.
If there are different sub-parsers, where you would use a state machine in an imperative language, you can make the accumulator include a data type for the different states. For example, here’s a parser for numbers separated by spaces:
import Data.Char (isSpace, isDigit)
data ParseState
= Space
| Number [Char] -- Digit accumulator
numbers :: String -> [Int]
numbers input = loop (Space, []) input
where
loop :: (ParseState, [Int]) -> [Char] -> [Int]
loop (Space, acc) (c : rest)
| isSpace c = loop (Space, acc) rest -- Ignore space.
| isDigit c = loop (Number [c], acc) rest -- Push digit.
| otherwise = error "expected space or digit"
loop (Number ds, acc) (c : rest)
| isDigit c = loop (Number (c : ds), acc) rest -- Push digit.
| otherwise
= loop
(Space, read (reverse ds) : acc) -- Save number, expect space.
(c : rest) -- Repeat loop for same char.
loop (Number ds, acc) [] = let
acc' = read (reverse ds) : acc -- Save final number.
in reverse acc' -- Return final result.
loop (Space, acc) [] = reverse acc -- Return final result.
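A quick check of this character-level parser in GHCi (assuming the definitions above are loaded):
-- ghci> numbers "12 34 5"
-- [12,34,5]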
Of course, as you may be able to tell, this approach quickly becomes very complicated! Even if you write your code very compactly, or express it as a fold, if you’re working at the level of individual characters and parser state machines, it will take a lot of code to express your meaning, and there are many opportunities for error. A better approach is to consider the data flow at work here, and put together the parser from high-level components.
For example, the intent of the above parser is to do the following:
Split the input on whitespace
For each split, read it as an integer
And that can be expressed very directly with the words and map functions:
numbers :: String -> [Int]
numbers input = map read (words input)
One readable line instead of dozens! Clearly this approach is better. Consider how you can express the format you’re trying to parse in this style. If you want to avoid libraries like split, you can still write a function to split a string on separators using base functions like break, span, or takeWhile; then you can use that to split the input into records, and split each record into fields, and parse fields as integers or textual names accordingly.
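For instance, here is a minimal sketch of such a splitter built only on break; the names splitOn and readInts are just for illustration:
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- read tolerates surrounding whitespace, so this turns "01, 02, 3" into [1,2,3].
readInts :: String -> [Int]
readInts = map read . splitOn ','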
But the preferred approach for parsing in Haskell is not to manually split up input at all, but to use parser combinator libraries like megaparsec. There are parser combinators in base too, under Text.ParserCombinators.ReadP. With those, you can express a parser in the abstract, without talking about splitting up input at all, by just combining subparsers with standard interfaces (Functor, Applicative, Alternative, and Monad), for example:
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP
( ReadP
, endBy
, eof
, munch1
, readP_to_S
, skipSpaces
)
numbers :: String -> [Int]
numbers = fst . head . readP_to_S onlyNumbersP
where
onlyNumbersP :: ReadP [Int]
onlyNumbersP = skipSpaces *> numbersP <* eof
numbersP :: ReadP [Int]
numbersP = numberP `endBy` skipSpaces
numberP :: ReadP Int
numberP = read <$> munch1 isDigit
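As a quick sanity check, running this parser in GHCi should give something like:
-- ghci> numbers "  1 2  3 "
-- [1,2,3]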
This is the approach I would recommend in your case. Parser combinators are also an excellent way to get comfortable using applicatives and monads in practice.
I am trying to go through a list of characters in a list and do something to the current character. My java equivalent of what I am trying to accomplish is:
public class MyClass {
void repeat(String s) {
String newString = "";
for(int i = 0; i < s.length(); i++) {
newString += s.charAt(i);
newString += s.charAt(i);
}
}
public static void main(String args[]) {
MyClass test = new MyClass();
test.repeat("abc");
}
}
One of the nicest things about functional programming is that patterns like yours can be encapsulated in a single higher-order function; if nothing fits, you can still use recursion.
Recursion
First up, a simple recursive solution. The idea behind this is that it's like a for-loop:
recursiveFunction [] = baseCase
recursiveFunction (char1:rest) = (doSomethingWith char1) : (recursiveFunction rest)
So let's write your repeat function in this form. What is the base case? Well, if you repeat an empty string, you'll get an empty string back. What is the recursion? In this case, we're doubling the first character, then recursing along the rest of the string. So here's a recursive solution:
repeat1 [] = []
repeat1 (c:cs) = c : c : (repeat1 cs)
Higher-order Functions
As you start writing more Haskell, you'll discover that these sort of recursive solutions often fit into a few repetitive patterns. Luckily, the standard library contains several predefined recursive functions for these sort of patterns:
fmap is used to map each element of a list to a different value using a function given as a parameter. For example, fmap (\x -> x + 1) adds 1 to each element of a list. Unfortunately, it can't change the length of a list, so we can't use fmap by itself.
concat is used to 'flatten' a nested list. For example, concat [[1,2],[3,4,5]] is [1,2,3,4,5].
foldr/foldl are two more complex and generic functions. For more details, consult Learn You a Haskell.
None of these seem to directly fit your needs. However, we can use concat and fmap together:
repeat2 list = concat $ fmap (\x -> [x,x]) list
The idea is that fmap changes e.g. [1,2,3] to a nested list [[1,1],[2,2],[3,3]], which concat then flattens. This pattern of generating multiple elements from a single one is so common that the combination even has a special name: concatMap. You use it like so:
repeat3 list = concatMap (\x -> [x,x]) list
Personally, this is how I'd write repeat in Haskell. (Well, almost: I'd use eta-reduction to simplify it slightly more. But at your level that's irrelevant.) This is why Haskell in my opinion is so much more powerful than many other languages: this 7-line Java method is one line of highly readable, idiomatic Haskell!
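For reference, the eta-reduced form mentioned in the aside above is simply:
repeat3 :: [a] -> [a]
repeat3 = concatMap (\x -> [x, x])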
As others have suggested, it's probably wise to start with a list comprehension:
-- | Repeat each element of a list twice.
double :: [x] -> [x]
double xs = [d | x <- xs, d <- [x, x]]
But the fact that the second list in the comprehension always has the same number of elements, regardless of the value of x, means that we don't need quite that much power: the Applicative interface is sufficient. Let's start by writing the comprehension a bit differently:
double xs = xs >>= \x -> [x, x] >>= \d -> pure d
We can simplify immediately using a monad identity law:
double xs = xs >>= \x -> [x, x]
Now we switch over to Applicative, but let's leave a hole for the hard part:
double :: [x] -> [x]
double xs = liftA2 _1 xs [False, True]
The compiler lets us know that
_1 :: x -> Bool -> x
Since the elements of the inner/second list are always the same, and always come from the current outer/first list element, we don't have to care about the Bool:
double xs = liftA2 const xs [False, True]
Indeed, we don't even need to be able to distinguish the list positions:
double xs = liftA2 const xs [(),()]
Of course, we have a special Applicative method, (<*), that corresponds to liftA2 const, so let's use it:
double xs = xs <* [(),()]
And then, if we like, we can avoid mentioning xs by switching to a "point-free" form:
-- | Repeat each element of a list twice.
double :: [x] -> [x]
double = (<* [(),()])
Now for the test:
main :: IO ()
main = print $ double [1..3]
This will print [1,1,2,2,3,3].
double admits a slight generalization of dubious value:
double :: Alternative f => f x -> f x
double = (<* join (<|>) (pure ()))
This will work for sequences as well as lists:
double (Data.Sequence.fromList [1..3]) = Data.Sequence.fromList [1,1,2,2,3,3]
but it could be a bit confusing for some other Alternative instances:
double (Just 3) = Just 3
I am trying to write a function to find the index of a given element using tail recursion. Let's say the list contains the numbers 1 through 10 and I am searching for 5; then the output should be 4. The problem I am having is 'counting' using tail recursion. However, I am not even sure if I need to manually 'count' the number of recursive calls in this case. I tried using !!, which does not help because it returns the element at a particular position. I need the function to return the position of a particular element (the exact opposite).
I have been trying to figure this one out for hours now.
Code:
whatIndex a [] = error "cannot search empty list"
whatIndex a (x:xs) = foo a as
where
foo m [] = error "empty list"
foo m (y:ys) = if m==y then --get index of y
else foo m ys
Note: I am trying to implement this without using library functions
Your helper function needs an additional parameter for the count.
whatIndex a as = foo as 0
where
foo [] _ = error "empty list"
foo (y:ys) c
| a == y = c
| otherwise = foo ys (c+1)
BTW, it's better form to give this function a Maybe return type instead of using errors. That's how elemIndex works too, for good reason. This would look like
whatIndex a as = foo as 0
where
foo [] _ = Nothing
foo (y:ys) c
| a == y = Just c
| otherwise = foo ys (c+1)
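With this Maybe version, the question's own example behaves like this in GHCi:
-- ghci> whatIndex 5 [1..10]
-- Just 4
-- ghci> whatIndex 11 [1..10]
-- Nothing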
Note: I am trying to implement this without using library functions
This is not a good idea in general. A better exercise is this:
Figure out how to implement it using library functions.
Figure out how to implement whichever library functions you used in step 1 on your own.
This way you're learning three key skills:
What the standard library functions are, with examples of when they are useful.
How to break problems into smaller pieces
How to write basic functions like the ones in the libraries.
In this case, however, your whatIndex is more or less the same function as elemIndex in Data.List, so your problem reduces to writing your own version of this library function.
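For step 1 of that exercise, using the library function directly is a one-liner (shown here as a sketch):
import Data.List (elemIndex)

-- elemIndex returns the zero-based index of the first match, or Nothing.
whatIndex :: Eq a => a -> [a] -> Maybe Int
whatIndex = elemIndex

-- ghci> whatIndex 5 [1..10]
-- Just 4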
The trick here is that you want to increment a counter while you recurse down the list. There is a standard technique for writing tail recursive functions, which is called an accumulating parameter. It works like this:
You write an auxiliary function that, compared to the "front-end" function, takes an extra parameter (or more) to keep track of the extra information.
You then define the "real" function as a call to the auxiliary one.
So for elemIndex, the auxiliary function would be something like this (with i as the accumulating parameter for the current element index):
-- I'll leave the blanks for you to fill.
elemIndex' i x [] = ...
elemIndex' i x (x':xs) = ...
Then the "driver" function is this:
elemIndex x xs = elemIndex' 0 x xs
But there is a serious problem here that I must mention: getting this function to perform well in Haskell is tricky. Tail recursion is a useful trick in strict (non-lazy) functional languages, but not so much in Haskell, because:
A tail-recursive function in Haskell can still blow the stack,
A non-tail-recursive function can run in constant space.
This older answer of mine shows an example of the second point.
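To sketch the second point here as well: a function written with guarded recursion, like the map' below, is not tail recursive, yet printing its output runs in constant space because each cons cell is consumed before the next recursive call is forced (map' is just an illustrative stand-in for Prelude's map):
map' :: (a -> b) -> [a] -> [b]
map' _ []       = []
map' f (x : xs) = f x : map' f xs   -- the recursive call sits under (:), not in tail position

main :: IO ()
main = mapM_ print (map' (* 2) [1 .. 1000000 :: Int])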
So in your case, a non-tail-recursive solution is probably the easiest one you can give that will run in constant space (i.e., not blow the stack on a long list):
elemIndex x xs = elemIndex' x (zip xs [0..])
elemIndex' x pairs = fmap snd (find (\(x', _) -> x == x') pairs)
-- | Combine two lists by pairing together their first elements, their second
-- elements, etc., until one of the lists runs out.
--
-- EXERCISE: write this function on your own!
zip :: [a] -> [b] -> [(a, b)]
zip xs ys = ...
-- | Return the first element x of xs such that pred x == True. Returns Nothing if
-- there isn't one, Just x if there is one.
--
-- EXERCISE: write this function on your own!
find :: (a -> Bool) -> [a] -> Maybe a
find pred xs = ...
I've written a Haskell function which splits a list xs into (init xs, last xs) like so:
split xs = split' [] xs
where
split' acc (x:[]) = (reverse acc, x)
split' acc (x:xs) = split' (x:acc) xs
Since an empty list cannot be split in this way, there is no match for the empty list. However, I did not want to simply make the function call error, so I defined the following:
split [] = ([], undefined)
Thanks to lazy evaluation I can thus define a safe init which simply returns the empty list for the empty list:
init' = fst . split
Is there some way I could detect the undefined value if I tried to access it, such that
last' xs
| isUndefined (snd xs) = ...
| otherwise = ...
I do know about Maybe and Either, and that those are a better choice for expressing what I want. However, I wondered whether there is a way to detect an actual value of undefined, i.e. in terms of catching errors, like catching exceptions.
undefined is no better than using error. In fact, undefined in Prelude is defined as
undefined = error "Prelude.undefined"
Now, a function that can't result in an error is called a "total function", i.e. it is valid for all input values.
The split function you've currently implemented has the signature
split :: [a] -> ([a], a)
This is a problem, since the type signature promises that the result always contains a list and an element, which is clearly impossible to provide for empty lists of generic type.
The canonical way in Haskell to address this is to change the type signature to signify that sometimes we don't have a valid value for the second item.
split :: [a] -> ([a], Maybe a)
Now you can write a proper implementation for the case where you get an empty list
split [] = ([], Nothing)
split xs = split' [] xs
where
split' acc (x:[]) = (reverse acc, Just x)
split' acc (x:xs) = split' (x:acc) xs
Now you can detect the missing value case by pattern-matching
let (init', last') = split xs
in case last' of
Nothing -> ... -- do something if we don't have a value
Just x -> ... -- do something with value x
Because bottom subsumes non-termination, the function isUndefined would have to solve the halting problem and thus cannot exist.
But note that even if it existed, you still could not tell if the undefined value in the 2nd element of your tuple was put there through your split function or if the last element of the list was already undefined.
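As a small illustration (assuming the asker's split together with the extra split [] = ([], undefined) equation is in scope), both of the following second components are ⊥, for two different reasons, so even a hypothetical isUndefined could not tell you which situation you are in:
-- undefined was inserted by split itself:
bottom1 = snd (split ([] :: [Int]))
-- undefined was already the last element of the input list:
bottom2 = snd (split [1, undefined :: Int])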
The error function doesn't do anything until it is evaluated, so you can do something like:
split [] = ([], error "split: empty list")
last' = snd . split
From the Haskell 2010 Language Report > Introduction # Values and Types
Errors in Haskell are semantically equivalent to ⊥ (“bottom”). Technically, they are indistinguishable from nontermination, so the language includes no mechanism for detecting or acting upon errors.
To be clear, undefined is intended to be a way to insert ⊥ into your program, and given that (as shang noted) undefined is defined in terms of error, there is, therefore, "no mechanism for detecting or acting upon undefined".
Although semantically speaking Ingo's answer is correct, if you're using GHC there is a way, using a couple of "unsafe" functions. It isn't quite perfect: if you pass it a computation of type IO a which contains an exception, it will also return True. It's a bit of a cheat though :).
import Control.Exception
import System.IO.Unsafe
import Unsafe.Coerce
isUndefined :: a -> Bool
isUndefined x = unsafePerformIO $
  catch ((unsafeCoerce x :: IO ()) >> return False)
        (const $ return True :: SomeException -> IO Bool)
I know this is horrible, but nonetheless it works. It won't detect non-termination though ;)
In a programming language that is purely functional (like Haskell), or one that you are only using in a functional way (e.g. Clojure), suppose you have a list/seq/enumerable (of unknown size) of integers and you want to produce a new list/seq/enumerable that contains the differences between successive items. How would you do it?
What I did previously in C# was to fold over the list and keep a state object as the aggregating value, which recorded the 'previous' item so that I could diff it against the current item. The result list also had to go into the state object (which is a problem for a list of unknown size).
What is the general approach for doing this kind of thing functionally?
In Haskell you would probably just use some higher order function like zipWith. So you could do something like this:
diff [] = []
diff ls = zipWith (-) (tail ls) ls
Note how I handled the [] case separately--if you pass an empty list to tail you get a runtime error, and Haskellers really, really hate runtime errors. However, in my function, I'm guaranteed that ls is not empty, so using tail is safe. (For reference, tail just returns everything except the first item of the list. It's the same as cdr in Scheme.)
This just takes the list and its tail and combines corresponding items using the (-) function.
Given a list [1,2,3,4], this would go something like this:
zipWith (-) [2,3,4] [1,2,3,4]
[2-1, 3-2, 4-3]
[1,1,1]
This is a common pattern: you can compute surprisingly many things by cleverly using standard higher-order functions. You also don't need to be afraid of passing a list and its own tail to a function--there is no mutation to mess you up, and the compiler is often very clever about optimizing code like this.
Coincidentally, if you like list comprehensions and don't mind enabling the ParallelListComp extension, you could write zipWith (-) (tail ls) ls like this:
[b - a | a <- ls | b <- tail ls]
In Clojure, you can use the map function:
(defn diff [coll]
(map - coll (rest coll)))
You can also pattern-match consecutive elements. In OCaml:
let rec diff = function
| [] | [_] -> []
| x::(y::_ as t) -> (y-x) :: diff t
And the usual tail-recursive version:
let diff =
let rec aux accu = function
| [] | [_] -> List.rev accu
| x::(y::_ as t) -> aux ((y-x)::accu) t in
aux []
For another Clojure solution, try
(map (fn [[a b]] (- b a))
(partition 2 1 coll))
Just to complement the idiomatic answers: it is possible in functional languages to process a list using a state object, just like you described. It is definitely discouraged in cases when simpler solutions exist, but possible.
The following example implements the iteration by computing the new 'state' and passing it recursively to itself.
(defn diffs
([coll] (diffs (rest coll) (first coll) []))
([coll prev acc]
(if-let [s (seq coll)]
; new 'state': rest of the list, head as the next 'prev' and
; diffs with the next difference appended at the end:
(recur (rest s) (first s) (conj acc (- (first s) prev)))
acc)))
The state is represented by the previous value from the list (prev), the diffs computed so far (acc), and the rest of the list left to process (coll).
This is how it can be done in Haskell without any standard functions, just recursion and pattern matching:
diff :: [Int] -> [Int]
diff [] = []
diff (x:xs) = hdiff xs x
hdiff :: [Int] -> Int -> [Int]
hdiff [] p = []
hdiff (x:xs) p = (x-p):hdiff xs x
OK, here are two C# versions for those who are interested:
First, the bad version, or the one a previously imperative programmer (in other words, me) might try to write while learning functional programming:
private static IEnumerable<int> ComputeUsingFold(IEnumerable<int> source)
{
    // Track whether we have seen a previous item yet, so the first item
    // only seeds Previous and produces no difference.
    var seed = new { Result = new List<int>(), Previous = 0, HasPrevious = false };
    return source.Aggregate(
        seed,
        (aggr, item) =>
        {
            if (aggr.HasPrevious)
            {
                aggr.Result.Add(item - aggr.Previous);
            }
            return new { Result = aggr.Result, Previous = item, HasPrevious = true };
        }).Result;
}
Then a better version using the idioms expressed in other answers in this question:
private static IEnumerable<int> ComputeUsingMap(IEnumerable<int> source)
{
return source.Zip(source.Skip(1), (f, s) => s - f);
}
Note that in this version the source enumerable is iterated over twice: once directly by Zip and once through Skip.