Haskell, Monads, Stack Space, Laziness -- how to structure code to be lazy? - haskell

A contrived example, but the below code demonstrates a class of problems I keep running into while learning Haskell.
import Control.Monad.Error
import Data.Char (isDigit)
countDigitsForList [] = return []
countDigitsForList (x:xs) = do
    q <- countDigits x
    qs <- countDigitsForList xs
    return (q:qs)
countDigits x = do
    if all isDigit x
        then return $ length x
        else throwError $ "Bad number: " ++ x
t1 = countDigitsForList ["1", "23", "456", "7890"] :: Either String [Int]
t2 = countDigitsForList ["1", "23", "4S6", "7890"] :: Either String [Int]
t1 gives me the right answer and t2 correctly identifies the error.
Seems to me that, for a sufficiently long list, this code is going to run out of stack space because it runs inside of a monad and at each step it tries to process the rest of the list before returning the result.
An accumulator and tail recursion seems like it may solve the problem but I repeatedly read that neither are necessary in Haskell because of lazy evaluation.
How do I structure this kind of code into one which won't have a stack space problem and/or be lazy?

How do I structure this kind of code into one which won't have a stack space problem and/or be lazy?
You can't make this function process the list lazily, monads or no. Here's a direct translation of countDigitsForList to use pattern matching instead of do notation:
countDigitsForList [] = return []
countDigitsForList (x:xs) = case countDigits x of
    Left e -> Left e
    Right q -> case countDigitsForList xs of
        Left e -> Left e
        Right qs -> Right (q:qs)
It should be easier to see here that, because a Left at any point in the list makes the whole thing return that value, in order to determine the outermost constructor of the result, the entire list must be traversed and processed; likewise for processing each element. Because the final result potentially depends on the last character in the last string, this function as written is inherently strict, much like summing a list of numbers.
Given that, the thing to do is ensure that the function is strict enough to avoid building up a huge unevaluated expression. A good place to start for information on that is discussions on the difference between foldr, foldl and foldl'.
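To make the foldl versus foldl' point concrete, here is a minimal sketch using the list-summing case mentioned above. The names sumLazy and sumStrict are made up for illustration. Whether sumLazy actually exhausts the stack depends on the compiler version and optimisation settings, so treat the comments as describing the unoptimised behaviour only.

```haskell
import Data.List (foldl')

-- foldl defers every addition: without optimisation this builds a chain of
-- unevaluated (+) applications as long as the list before anything is added.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- foldl' forces the accumulator at each step, so the running total stays a
-- plain number and the loop runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Both functions compute the same result; only the amount of pending work held in memory differs.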
An accumulator and tail recursion seems like it may solve the problem but I repeatedly read that neither are necessary in Haskell because of lazy evaluation.
Both are unnecessary when you can instead generate, process, and consume a list lazily; the simplest example here being map. For a function where that's not possible, strictly-evaluated tail recursion is precisely what you want.

camccann is right that the function is inherently strict. But that doesn't mean that it can't run in constant stack!
countDigitsForList xss = go xss []
    where go (x:xs) acc = case countDigits x of
              Left e -> Left e
              Right q -> go xs (q:acc)
          go [] acc = Right (reverse acc)
This accumulating-parameter version is a partial CPS transform of camccann's code, and I bet that you could get the same result by working over a CPS-transformed Either monad as well.
Edited to take into account jwodder's correction regarding reverse. oops. As John L notes an implicit or explicit difference list would work as well...
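As a rough sketch of the difference-list idea mentioned above (not code from any of the answers): the accumulator is a function that prepends the results gathered so far, so no reverse is needed. The name countAll is made up for illustration, and countDigits is restated with an explicit Either type so that the snippet stands alone.

```haskell
import Data.Char (isDigit)

-- Restated from the question, specialised to Either String.
countDigits :: String -> Either String Int
countDigits x
  | all isDigit x = Right (length x)
  | otherwise     = Left ("Bad number: " ++ x)

-- Hypothetical variant: acc is a difference list, i.e. a function that
-- prepends everything collected so far onto whatever list it is given.
countAll :: [String] -> Either String [Int]
countAll = go id
  where
    go acc []     = Right (acc [])
    go acc (x:xs) = case countDigits x of
      Left e  -> Left e
      Right q -> go (acc . (q :)) xs
```

The results come out in input order because each new element is composed onto the right-hand end of the accumulated function.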

Related

Should I use foldr or foldl' to build a String in Haskell?

Assuming that foldr should be used to build data structures and foldl' if the result is supposed to be a single value, I'm not sure what to use for Strings. On the one hand it is a data structure, but on the other hand a String is usually only used as a whole, meaning that short-circuiting isn't very relevant. To answer this question, it's probably crucial how functions like putStrLn use Strings, isn't it? Or am I on a completely wrong track?
EDIT: So I want my function to turn something like [(5, 's'), (1, 'a'), (3, 'd')] into sssssaddd (following an exercise from https://en.m.wikibooks.org/wiki/Haskell) and I have to choose one from those two functions:
decode :: [(Int, Char)] -> String
decode = foldr ff []
    where
        ff (l, c) xs = replicate l c ++ xs
decode' :: [(Int, Char)] -> String
decode' = foldl' ff []
    where
        ff xs (l, c) = xs ++ replicate l c
You're on the completely wrong track. The only correct way to decide what fold to use involves knowing what the fold will do. Knowing only the output type is not enough.
A String is just an alias for [Char], so it is a list. If you use foldl or foldl' with append ((++)), the applications nest to the left, for example:
(("hell" ++ "o") ++ " ") ++ "world"
Concatenation takes time linear in the length of the left operand. So if you append a single character each time, constructing a string of n characters takes O(n^2) time.
Another problem arises with an infinite list: foldl will get stuck in an infinite loop, whereas with foldr the output can be consumed incrementally, in a generator-like fashion.
But as @chepner says, using Strings for large amounts of text is not efficient: it requires a cons cell per character, so memory use blows up. Text offers a more compact and efficient way to store text, in an unboxed type, and its algorithms are often more efficient than what one can do with a String.
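The point about infinite input can be checked with the foldr version of decode from the question. This is a small sketch; the helper is renamed step here only for readability.

```haskell
-- foldr-based decode: each step yields its own characters first and only
-- then refers to the (lazily evaluated) rest of the output.
decode :: [(Int, Char)] -> String
decode = foldr step []
  where
    step (n, c) rest = replicate n c ++ rest
```

With this definition, take 10 (decode (cycle [(2, 'a'), (1, 'b')])) terminates, because only as much of the infinite input is decoded as the consumer demands. A foldl or foldl' version would never return on the same input.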

Efficient string swapping in Haskell

I'm trying to solve a problem for a functional programming exercise in Haskell. I have to implement a function such that, given a string with an even number of characters, the function returns the same string with character pairs swapped.
Like this:
"helloworld" -> "ehllworodl"
This is my current implementation:
swap :: String -> String
swap s = swapRec s ""
where
swapRec :: String -> String -> String
swapRec [] result = result
swapRec (x:y:xs) result = swapRec xs (result++[y]++[x])
My function returns the correct results, however the programming exercise is timed, and it seems like my code is running too slowly.
Is there something I could do to make my code run faster, or am I following the wrong approach to the problem?
Yes. If you use (++) :: [a] -> [a] -> [a], then this takes time linear in the number of elements of the first list you concatenate. Since result can be large, this results in an inefficiency: the algorithm is then O(n^2).
You however do not need to construct the result with an accumulator. You can return a list, and do the processing of the remaining elements with a recursive call, like:
swap :: [a] -> [a]
swap [] = []
swap [x] = [x]
swap (x:y:xs) = y : x : swap xs
The above also uncovered a problem with the implementation: if the list had an odd length, then the function would have crashed. Here in the second case, we handle a list with one element by returning that list (perhaps you need to modify this according to the specifications).
Furthermore, here we can benefit from Haskell's laziness: if we have a large list, pass it through the swap function, but are only interested in the first five elements, then the entire list will not be calculated.
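A quick way to see the laziness claim in action is to run swap on an infinite list. The sketch below restates the function from this answer, with the two base cases folded into one catch-all clause.

```haskell
-- Guarded recursion: the first two output elements are available before the
-- recursive call on the rest of the list is evaluated.
swap :: [a] -> [a]
swap (x:y:xs) = y : x : swap xs
swap xs       = xs   -- [] and single-element lists are returned unchanged
```

For example, take 5 (swap [1..]) evaluates to [2,1,4,3,6] even though the input never ends, while the accumulator version with (++) would never produce any output for it.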
We can also process all kinds of list with the above function: a list of numbers, of strings, etc.
Note that (++) itself is not inherently bad: if you need to concatenate, it is of course the most efficient way to do this. The problem is that here every recursive step concatenates again, and the left list grows each time.
Affixing something at the end of the accumulator passed into a recursive call
swapRec (x:y:xs) resultSoFar = swapRec xs
    (resultSoFar ++ [y] ++ [x])
is the same as prepending it at the start of the result returned from the recursive call:
swapRec (x:y:xs) = [y] ++ [x] ++ swapRec xs
You will have to amend your function accordingly throughout.
This is known as guarded recursion. What you were using is known as tail recursion (a left fold).
The added benefit is that it will now be on-line (i.e., taking O(1) time per processed element). You were creating the (++) nesting on the left, which leads to quadratic behaviour, as discussed e.g. here.

Functionally solving questions: how to use Haskell?

I am trying to solve one of the problem in H99:
Split a list into two parts; the length of the first part is given.
Do not use any predefined predicates.
Example:
> (split '(a b c d e f g h i k) 3)
( (A B C) (D E F G H I K))
And I can quickly come with a solution:
split'::[a]->Int->Int->[a]->[[a]]
split' [] _ _ _ = []
split' (x:xs) y z w = if y == z then [w,xs] else split' xs y (z+1) (w++[x])
split::[a]->Int->[[a]]
split x y = split' x y 0 []
My question is that what I am doing is kind of just rewriting the loop version in a recursion format. Is this the right way you do things in Haskell? Isn't it just the same as imperative programming?
EDIT: Also, how do you generally avoid the extra function here?
It's convenient that you can often convert an imperative solution to Haskell, but you're right, you do usually want to find a more natural recursive statement. For this one in particular, reasoning in terms of base case and inductive case can be very helpful. So what's your base case? Why, when the split location is 0:
split x 0 = ([], x)
The inductive case can be built on that by prepending the first element of the list onto the result of splitting with n-1:
split (x:xs) n = (x:left, right)
    where (left, right) = split xs (n-1)
This may not perform wonderfully (it's probably not as bad as you'd think) but it illustrates my thought process when I first encounter a problem and want to approach it functionally.
Edit: Another solution relying more heavily on the Prelude might be:
split l n = (take n l, drop n l)
It's not really the same as imperative programming: each function call avoids side effects, and the functions are just simple expressions. But I have a suggestion for your code:
split :: Int -> [a] -> ([a], [a])
split p xs = go p ([], xs)
    where go 0 (xs, ys) = (reverse xs, ys)
          go n (xs, y:ys) = go (n-1) (y : xs, ys)
Note how we've declared that we're only returning two things, ([a], [a]), instead of a list of things (which is a bit misleading), and that we've constrained our tail-recursive call to local scope.
I'm also using pattern matching, which is a more idiomatic way to write recursive functions in Haskell: when go is called with a zero, the first case is run. It's generally more pleasant to write recursive functions that count down rather than up, since you can use pattern matching rather than if statements.
Finally this is more efficient since ++ is linear in the length of the first list, which means that the complexity of your function is quadratic rather than linear. This method is also tail recursive unlike Daniel's solution, which is important for handling any large lists.
TLDR: Both versions are functional style, avoiding mutation, using recursion instead of loops. But the version I've presented is a little more Haskell-ish and slightly faster.
A word on tail recursion
This solution uses tail recursion, which isn't always essential in Haskell. In this case it is helpful when you use the resulting lists, but at other times it is actually a bad thing. For example, map isn't tail recursive, but if it were, you couldn't use it over infinite lists!
In this case, we can use tail recursion, since an integer is always finite. But, if we only use the first element of the list, Daniel's solution is much faster, since it produces the list lazily. On the other hand, if we use the whole list, my solution is much faster.
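The trade-off described above can be sketched by putting the two approaches side by side. The names splitLazy and splitAcc are made up for illustration, both take the count first, and the clause for a list shorter than n is an addition that neither original answer has.

```haskell
-- Guarded (non-tail) recursion: the pair and the first element of its left
-- component are available before the recursive call is evaluated.
splitLazy :: Int -> [a] -> ([a], [a])
splitLazy 0 xs     = ([], xs)
splitLazy _ []     = ([], [])   -- assumed behaviour for a too-short list
splitLazy n (x:xs) = (x : left, right)
  where (left, right) = splitLazy (n - 1) xs

-- Tail recursion with an accumulator: nothing is returned until the count
-- reaches zero (or the input runs out), then the accumulator is reversed.
splitAcc :: Int -> [a] -> ([a], [a])
splitAcc p xs0 = go p ([], xs0)
  where
    go 0 (acc, rest)   = (reverse acc, rest)
    go _ (acc, [])     = (reverse acc, [])   -- assumed, as above
    go n (acc, y:rest) = go (n - 1) (y : acc, rest)
```

With these definitions, head (fst (splitLazy 1000000 [1..])) only needs a single step of the recursion, whereas splitAcc has to walk all one million elements before it can return anything.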
split'::[a]->Int->([a],[a])
split' [] _ = ([],[])
split' xs 0 = ([],xs)
split' (x:xs) n = (x:(fst splitResult),snd splitResult)
    where splitResult = split' xs (n-1)
It seems you have already shown an example of a better solution.
I would recommend you read SICP. You will then come to the conclusion that the extra function is normal. There's also a widely used approach of hiding such functions in a local scope. The book may seem boring to you, but its early chapters will get you used to the functional approach to solving problems.
There are tasks for which the recursive approach is more necessary. But if, for example, you use tail recursion (which is so often praised without cause), you will notice that it is just ordinary iteration, often with an "extra function" that hides the iteration variable ("variable" is not quite the right word here; "argument" is more apt).

Compare the head of a haskell string?

Struggling to learn Haskell: how does one take the head of a string and compare it with the next character until it finds a character for which that's not true?
In pseudo code I'm trying to:
while x == 'next char in string' put in new list to be returned
The general approach would be to create a function that recursively evaluates the head of the string until it finds the false value or reaches the end.
To do that, you would need to
- understand recursion (prerequisite: understand recursion) and how to write recursive functions in Haskell
- know how to use the head function
- quite possibly know how to use list comprehension in Haskell
I have notes on Haskell that you may find useful, but you may well find Yet Another Haskell Tutorial more comprehensive (Sections 3.3 Lists; 3.5 Functions; and 7.8 More Lists would probably be good places to start in order to address the bullet points I mention)
EDIT0:
An example using guards to test the head element and continue only if it is the same as the second element:
someFun :: String -> String
someFun [] = []
someFun [_] = []
someFun (x:y:xs)
    | x == y = someFun (y:xs)
    | otherwise = []
EDIT1:
I sort of want to say x = (newlist) and then rather than otherwise = [] have otherwise = [newlist] if that makes any sense?
It makes sense in an imperative programming paradigm (e.g. C or Java), less so for functional approaches
Here is a concrete example to, hopefully, highlight the difference between the if, then, else concept the quote suggests and what is happening in the someFun function:
When we call someFun [a,a,b,b] we match this to someFun (x:y:xs), and since x is 'a', y is 'a', and x == y, someFun [a,a,b,b] = someFun [a,b,b]. This again matches someFun (x:y:xs), but the condition x == y is false, so we use the otherwise guard, and we get someFun [a,a,b,b] = someFun [a,b,b] = []. Hence, the result of someFun [a,a,b,b] is [].
So where did the data go? Well, I'll hold my hands up and admit a bug in the code, which is now a feature I'm using to explain how Haskell functions work.
I find it helpful to think more in terms of constructing mathematical expressions rather than programming operations. So, the expression on the right of the = is your result, and not an assignment in the imperative (e.g. Java or C) sense.
I hope the concrete example has shown that Haskell evaluates expressions using substitution, so if you don't want something in your result, then don't include it in that expression. Conversely, if you do want something in the result, then put it in the expression.
Since your pseudo code is
while x == 'next char in string' put in new list to be returned
I'll modify the someFun function to do the opposite and let you figure out how it needs to be modified to work as you desire.
someFun2 :: String -> String
someFun2 [] = []
someFun2 [x] = [x]
someFun2 (x:y:xs)
    | x == y = []
    | otherwise = x : someFun2 (y:xs)
Example Output:
someFun2 [a,a,b,b] = []
someFun2 [a,b,b,a,b] = [a]
someFun2 [a,b,a,b,b,a,b] = [a,b,a]
someFun2 [a,b,a,b] = [a,b,a,b]
(I'd like to add at this point, that these various code snippets aren't tested as I don't have a compiler to hand, so please point out any errors so I can fix them, thanks)
There are two typical ways to get the head of a string. head, and pattern matching (x:xs).
In fact, the source for the head function shows that it is simply defined with pattern matching:
head (x:_) = x
head _ = badHead
I highly recommend you check out Learn You a Haskell # Pattern Matching. It gives this example, which might help:
tell (x:y:[]) = "The list has two elements: " ++ show x ++ " and " ++ show y
Notice how it pattern matched against (x:y:[]), meaning the list must have two elements, and no more. To match the first two elements in a longer list, just swap [] for a variable (x:y:xs)
If you choose the pattern matching approach, you will need to use recursion.
Another approach is the zip xs (drop 1 xs). This little idiom creates tuples from adjacent pairs in your list.
ghci> let xs = [1,2,3,4,5]
ghci> zip xs (drop 1 xs)
[(1,2),(2,3),(3,4),(4,5)]
You could then write a function that looks at these tuples one by one. It would also be recursive, but it could be written as a foldl or foldr.
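As one possible reading of the pseudo code in the question (collect characters while each one equals the next), the zip idiom can be combined with takeWhile instead of an explicit fold. The name leadingRun and the exact behaviour are assumptions, since the question does not pin them down.

```haskell
-- Hypothetical helper: returns the leading run of equal elements.
-- Adjacent pairs are compared until the first mismatch; the first element
-- is always part of the run when the list is non-empty.
leadingRun :: Eq a => [a] -> [a]
leadingRun []       = []
leadingRun xs@(x:_) = x : map snd (takeWhile (uncurry (==)) (zip xs (drop 1 xs)))
```

For instance, leadingRun "aaabbc" gives "aaa", and a string whose first two characters differ gives just its first character.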
For understanding recursion in Haskell, LYAH is again highly recommended:
Learn You a Haskell # Recursion

Haskell guards not being met

test :: [String] -> [String]
test = foldr step []
    where step x ys
              | elem x ys = x : ys
              | otherwise = ys
I am trying to build a new list consisting of all the distinct strings being input. My test data is:
test ["one", "one", "two", "two", "three"]
expected result:
["one", "two", "three"]
I am new to Haskell, and I am sure that I am missing something very fundamental and obvious, but have run out of ways to explore this. Could you provide pointers to where my thinking is deficient?
The actual response is []. It seems that the first guard condition is never met (if I replace it with True, the original list is replicated), so the output list is never built.
My understanding was that the fold would accumulate the result of step on each item of the list, adding it to the empty list. I anticipated that step would test each item for its inclusion in the output list (the first element tested not being there) and would add anything that was not already included to the output list. Obviously not :-)
Your reasoning is correct: you just need to switch = x : ys and = ys so that you add the x when it's not an element of ys. Also, Data.List.nub does this exact thing.
Think about it: your code is saying "when x is in the remainder, prepend x to the result", i.e. creating a duplicate. You just need to change it to "when x is not in the remainder, prepend x to the result" and you get the correct function.
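Applied to the code from the question, the swap of the two guard results would look roughly like this. The function is renamed distinct here only to avoid clashing with the original test.

```haskell
-- Keep x only when it does not already occur in the result built from the
-- rest of the list.
distinct :: [String] -> [String]
distinct = foldr step []
  where
    step x ys
      | x `elem` ys = ys        -- already present further right: skip it
      | otherwise   = x : ys    -- not seen in the remainder: keep it
```

On the test data from the question, distinct ["one", "one", "two", "two", "three"] yields ["one", "two", "three"]. Note that for duplicates it is the last occurrence that survives, which only matters if the order of the output is significant.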
This function differs from Data.List.nub in an important way: this function is more strict. Thus:
test [1..] = _|_ -- infinite loop (try it)
nub [1..] = [1..]
nub gives the answer correctly for infinite lists -- this means that it doesn't need the whole list to start computing results, and thus it is a nice player in the stream processing game.
The reason it is strict is that elem is strict: it searches the whole list (presuming it doesn't find a match) before it returns a result. You could write nub like this:
nub :: (Eq a) => [a] -> [a]
nub = go []
    where
        go seen [] = []
        go seen (x:xs) | x `elem` seen = go seen xs
                       | otherwise = x : go (x:seen) xs
Notice how seen grows like the output so far, whereas yours grows like the remainder of the output. The former is always finite (starting at [] and adding one at a time), whereas the latter may be infinite (eg. [1..]). So this variant can yield elements more lazily.
This would be faster (O(n log n) instead of O(n^2)) if you used a Data.Set instead of a list for seen. But it adds an Ord constraint.
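A minimal sketch of the Data.Set variant suggested above, assuming the containers package is available (it ships with GHC). The name nubSet is made up for illustration.

```haskell
import qualified Data.Set as Set

-- Same structure as the list-based version, but membership tests and
-- inserts on the set of seen elements are logarithmic rather than linear.
nubSet :: Ord a => [a] -> [a]
nubSet = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs
```

Like the list-based nub, this keeps the first occurrence of each element and still produces output lazily, so take 3 (nubSet [1..]) terminates.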
