Should I use foldr or foldl' to build a String in Haskell?

Assuming that foldr should be used to build data structures and foldl' if the result is supposed to be a single value, I'm not sure what to use for Strings. On the one hand it is a data structure, but on the other hand a String is usually only used as a whole, meaning that short-circuiting isn't very relevant. To answer this question, it's probably crucial how functions like putStrLn use Strings, isn't it? Or am I on a completely wrong track?
EDIT: So I want my function to turn something like [(5, 's'), (1, 'a'), (3, 'd')] into sssssaddd (following an exercise from https://en.m.wikibooks.org/wiki/Haskell), and I have to choose between these two functions:
import Data.List (foldl')

decode :: [(Int, Char)] -> String
decode = foldr ff []
  where
    ff (l, c) xs = replicate l c ++ xs

decode' :: [(Int, Char)] -> String
decode' = foldl' ff []
  where
    ff xs (l, c) = xs ++ replicate l c

You're on the completely wrong track. The only correct way to decide what fold to use involves knowing what the fold will do. Knowing only the output type is not enough.

A String is just a type alias for [Char], i.e. a list. If you use foldl or foldl' to append ((++)) to an accumulator, the appends end up nested to the left, for example:
(("hell" ++ "o") ++ " ") ++ "world"
Concatenation takes time linear in the length of the left operand. So if you concatenate a single character each time, constructing a string of n characters takes O(n²) time.
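To put numbers on this, here is a small cost model: a sketch that assumes a ++ b costs exactly length a steps and shares b without copying, as described above. The function names are hypothetical; each takes the lengths of the pieces being appended.

```haskell
-- Cost model: a ++ b walks (and copies) every cell of a, so it costs
-- length a; b is reused as-is.

-- foldl-style nesting: ((([] ++ p1) ++ p2) ++ p3) ...
-- The left operand is the whole result so far, so the step costs are
-- 0, p1, p1 + p2, ...
leftNestedCost :: [Int] -> Int
leftNestedCost = fst . foldl step (0, 0)
  where
    step (cost, soFar) piece = (cost + soFar, soFar + piece)

-- foldr-style nesting: p1 ++ (p2 ++ (p3 ++ [])).
-- Every piece is copied exactly once.
rightNestedCost :: [Int] -> Int
rightNestedCost = sum
```

For "hell", "o", " ", "world" (lengths 4, 1, 1, 5) the left-nested version copies 15 cells against 11 for the right-nested one; for 1000 single characters the gap grows to 499500 against 1000.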
Another problem arises with infinite lists: foldl never returns, because it has to reach the end of the list before it can produce anything. With foldr, by contrast, you can consume the output incrementally, in a generator-like fashion.
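A minimal sketch of the infinite-list case, reusing the foldr-based decode from the question; endless and firstSeven are hypothetical names for illustration.

```haskell
-- The question's foldr-based decode.
decode :: [(Int, Char)] -> String
decode = foldr ff []
  where
    ff (l, c) xs = replicate l c ++ xs

-- An endless run-length input.
endless :: [(Int, Char)]
endless = cycle [(2, 'a'), (1, 'b')]

-- take forces only the first seven characters, so this terminates.
-- The foldl'-based decode' from the question would never return here.
firstSeven :: String
firstSeven = take 7 (decode endless)
```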
But as @chepner says, using Strings for large amounts of text is not efficient: every character needs its own cons cell, so memory use blows up. Text offers a more compact and efficient way to store text, in an unboxed representation, and its algorithms are often more efficient than what one can do with a String.
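A sketch of the same decoding producing a Text value, assuming the text package is available; decodeText is a hypothetical name.

```haskell
import qualified Data.Text as T

-- Each (count, char) run becomes a small Text via T.replicate, and
-- T.concat joins the runs into one packed Text value.
decodeText :: [(Int, Char)] -> T.Text
decodeText = T.concat . map (\(n, c) -> T.replicate n (T.singleton c))
```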

Related

Efficient string swapping in Haskell

I'm trying to solve a problem for a functional programming exercise in Haskell. I have to implement a function such that, given a string with an even number of characters, the function returns the same string with character pairs swapped.
Like this:
"helloworld" -> "ehllworodl"
This is my current implementation:
swap :: String -> String
swap s = swapRec s ""
  where
    swapRec :: String -> String -> String
    swapRec [] result = result
    swapRec (x:y:xs) result = swapRec xs (result++[y]++[x])
My function returns the correct results; however, the programming exercise is timed, and it seems my code is running too slowly.
Is there something I could do to make my code run faster, or am I following the wrong approach to the problem?
Yes. If you use (++) :: [a] -> [a] -> [a], it takes time linear in the number of elements of the first list. Since result can be large, this is inefficient: the algorithm is then O(n²).
However, you do not need to construct the result with an accumulator. You can return a list directly and process the remaining elements with a recursive call:
swap :: [a] -> [a]
swap [] = []
swap [x] = [x]
swap (x:y:xs) = y : x : swap xs
The above also uncovers a problem with your implementation: if the list has an odd length, your function crashes. Here, the second case handles a list with one element by returning that list unchanged (you may need to adapt this to the specification).
Furthermore, here we benefit from Haskell's laziness: if we pass a large list through the swap function but are only interested in the first five elements, the rest of the list is never computed.
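To see the laziness claim in action, here is the swap from this answer applied to an infinite list; firstFive is a hypothetical name.

```haskell
-- The guarded-recursion swap from this answer.
swap :: [a] -> [a]
swap [] = []
swap [x] = [x]
swap (x:y:xs) = y : x : swap xs

-- Only the first five elements of the infinite list are ever computed.
firstFive :: [Integer]
firstFive = take 5 (swap [1 ..])
```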
We can also process all kinds of lists with the above function: lists of numbers, of strings, etc.
Note that (++) itself is not inherently bad: if you need to concatenate, it is of course the most efficient way to do this. The problem here is that every recursive step concatenates again, and the left list grows each time.
Affixing something at the end of the accumulator passed into a recursive call
swapRec (x:y:xs) resultSoFar = swapRec xs
                                       (resultSoFar ++ [y] ++ [x])
is the same as prepending it at the start of the result returned from the recursive call:
swapRec (x:y:xs) = [y] ++ [x] ++ swapRec xs
You will have to amend your function accordingly throughout.
This is known as guarded recursion. What you were using is known as tail recursion (a left fold).
The added benefit is that it will now be on-line (i.e., taking O(1) time per processed element). You were nesting the (++) calls on the left, which leads to quadratic behaviour, as discussed e.g. here.
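Applying that rewrite throughout gives something like the following sketch. Like the question's version, it assumes an even-length input; the name swap and the String type follow the question.

```haskell
swap :: String -> String
swap = swapRec
  where
    swapRec :: String -> String
    -- The accumulator is gone: the empty input simply yields [].
    swapRec [] = []
    -- Guarded recursion: two characters are emitted before the recursive
    -- call, so each (++) has a one-element left operand.
    swapRec (x:y:xs) = [y] ++ [x] ++ swapRec xs
```

Because output is produced before recursing, even take 4 (swap (cycle "ab")) terminates.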

Exchanging multiple pairs of characters in a Haskell string

I'm trying to write a Haskell function that takes a string of pairs of letters, and exchanges the letters of the pair in a string of all letters, but what I've come up with feels awkward and unidiomatic.
I have
import Data.List.Split (splitOn)

swap a b = map (\x-> if x == a then b else if x == b then a else x)
sub n = foldr (.) id (zipWith swap (head <$> splitOn "." n) (last <$> splitOn "." n)) ['A'..'Z']
which works well enough giving
> sub "RB.XD.EU.ZM.IJ"
"ARCXUFGHJIKLZNOPQBSTEVWDYM"
and
> sub "YC.LU.EB.TZ.RB.XD.IJ"
"ARYXBFGHJIKUMNOPQESZLVWDCT"
but I'm new to Haskell and feel like my approach — especially my swap helper function (which I only use here) — is more elaborate than it needs to be.
Is there a better, more idiomatic, approach to this problem; especially one that takes advantage of a language feature, builtin, or library function that I've missed?
Doing a left fold over the substitution list makes the code shorter:
import Data.List
import Data.List.Split

sub = foldl' swap ['A'..'Z'] . splitOn "." . reverse
  where
    swap az [a,b] = map (\x -> if x == a then b else if x == b then a else x) az
Drop the reverse if you don't care whether EB or RB is swapped first.
If you want to replace instead of swap:
import Data.List
import Data.List.Split

replace needle replacement haystack =
  intercalate replacement (splitOn needle haystack)

rep = foldl' replace' ['A'..'Z'] . splitOn "."
  where
    replace' az [a,b] = replace [a] [b] az
I'd break the problem down a bit more. It's important to remember that shorter code is not necessarily the best code. Your implementation works, but it's too compact for me to quickly understand. I'd recommend something more like
import Data.List.Split (splitOn)
import Data.Maybe (mapMaybe)

swap :: Char -> Char -> String -> String
swap = undefined -- Your current implementation is fine,
                 -- although you could rewrite it using
                 -- a local function instead of a lambda

-- |Parses the swap specification string into a list
-- of characters to swap as tuples
parseSwap :: String -> [(Char, Char)]
parseSwap = mapMaybe toTuple . splitOn "."
  where
    toTuple (first:second:_) = Just (first, second)
    toTuple _                = Nothing

-- |Takes a list of characters to swap and applies it
-- to a target string
sub :: [(Char, Char)] -> String -> String
sub charsToSwap = foldr (.) id (map (uncurry swap) charsToSwap)
The equivalent of your sub function would be
sub swapSpec = foldr (.) id (map (uncurry swap) $ parseSwap swapSpec) ['A'..'Z']
But the former is probably easier to understand for most Haskellers. Representing the swap specification as a list of tuples also makes it easier to apply further transformations to it, which makes the approach more powerful overall. You essentially decouple the representation of the swap specification from the actual swapping. Even for small programs like this, it's important to maintain loose coupling so that you develop the habit for when you write larger programs!
This implementation also avoids recalculating splitOn for the swap specification string.
(I wasn't able to execute this code because I'm on a computer without Haskell installed; if anyone notices any bugs, please edit to fix.) Tried it out in FPComplete; the output matches @raxacoricofallapatorius'.
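For a version that runs without the split package, here is a sketch that assumes a hand-written splitOnDot (hypothetical, standing in for splitOn ".") and fills in the question's swap helper under the name swapLetters.

```haskell
import Data.Maybe (mapMaybe)

-- The question's helper: exchange a and b wherever they occur.
swapLetters :: Char -> Char -> String -> String
swapLetters a b = map (\x -> if x == a then b else if x == b then a else x)

-- Hypothetical stand-in for Data.List.Split.splitOn "." so the sketch
-- needs nothing beyond base.
splitOnDot :: String -> [String]
splitOnDot s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnDot rest

-- Chunks shorter than two characters are dropped by mapMaybe.
parseSwap :: String -> [(Char, Char)]
parseSwap = mapMaybe toTuple . splitOnDot
  where
    toTuple (first:second:_) = Just (first, second)
    toTuple _                = Nothing

sub :: [(Char, Char)] -> String -> String
sub charsToSwap = foldr (.) id (map (uncurry swapLetters) charsToSwap)
```

Usage: sub (parseSwap "RB.XD.EU.ZM.IJ") ['A'..'Z'] reproduces the question's first output.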
Some things I noticed from reading your code (I haven't tried to rewrite it). My first suggestion involves separation of concerns:
I'm trying to write a Haskell function that takes a string of pairs of letters, and exchanges the letters of the pair in a string of all letters
That means a more natural type for your function would be:
sub :: [(Char, Char)] -> String -> String
Or, using Data.Map for more efficient lookups:
sub :: Map Char Char -> String -> String
This is a lot more precise than taking a string with dot-separated pairs. You can then generate the associations between Chars in a separate step:
parseCharPairs :: String -> Map Char Char
Ideally you should also handle invalid inputs (e.g. AB.CDE) and empty input strings.
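A sketch of the Map-based design, assuming the containers package and a hypothetical splitOnDot helper. One behavioural difference to note: this version applies all substitutions at once (one lookup per character), so it matches the sequential foldr version only when no letter appears in two pairs (true for "RB.XD.EU.ZM.IJ", not for "YC.LU.EB.TZ.RB.XD.IJ"). Malformed chunks are skipped rather than reported.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Replace every character that has an entry in the map; others are kept.
sub :: Map Char Char -> String -> String
sub m = map (\c -> Map.findWithDefault c c m)

-- "RB.XD" becomes R->B, B->R, X->D, D->X.
-- Chunks that are not exactly two characters (e.g. "CDE") are ignored.
parseCharPairs :: String -> Map Char Char
parseCharPairs = Map.fromList . concatMap toPairs . splitOnDot
  where
    toPairs [a, b] = [(a, b), (b, a)]
    toPairs _      = []

-- Hypothetical stand-in for Data.List.Split.splitOn ".".
splitOnDot :: String -> [String]
splitOnDot s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnDot rest
```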
my swap helper function (which I only use here)
Then you probably should define it in a where clause. I would also avoid the name swap, as there is a relatively common function in Data.Tuple with the same name (swapLetters might be a nice choice).
sub n = foldr (.) id -- etc.
foldr (.) id (fmap f xs) y is the same thing as foldr f y xs. I'm almost certain this can be rewritten in a simpler way.
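Applying that identity with f = uncurry swapLetters gives the simpler form below (a sketch; swapLetters is the name suggested above, the pairs are passed already parsed, and the alphabet is fixed as in the question).

```haskell
swapLetters :: Char -> Char -> String -> String
swapLetters a b = map (\x -> if x == a then b else if x == b then a else x)

-- foldr (.) id (map f xs) y == foldr f y xs, with f = uncurry swapLetters.
-- As with the composition, the last pair is applied first.
sub :: [(Char, Char)] -> String
sub pairs = foldr (uncurry swapLetters) ['A' .. 'Z'] pairs
```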

How lazy is Haskell's `++`?

I'm curious how I should go about improving the performance of a Haskell routine that finds the lexicographically minimal cyclic rotation of a string.
import Data.List
swapAt n = f . splitAt n where f (a,b) = b++a
minimumrotation x = minimum $ map (\i -> swapAt i x) $ elemIndices (minimum x) x
I'd imagine that I should use Data.Vector rather than lists because Data.Vector provides in-place operations, probably just manipulating some indices into the original data. I shouldn't actually need to bother tracking the indices myself to avoid excess copying, right?
I'm curious how ++ impacts the optimization, though. I'd imagine it produces a lazy string thunk that never does the appending until the string is read that far. Ergo, a should never actually be appended onto b whenever minimum can eliminate that string early, for example because it begins with a much later letter. Is this correct?
xs ++ ys adds some overhead in all the list cells from xs, but once it reaches the end of xs it's free — it just returns ys.
Looking at the definition of (++) helps to see why:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
i.e., it has to "re-build" the entire first list as the result is traversed. This article is very helpful for understanding how to reason about lazy code in this way.
The key thing to realise is that appending isn't done all at once; a new linked list is incrementally built by first walking through all of xs, and then putting ys where the [] would go.
So, you don't have to worry about reaching the end of b and suddenly incurring the one-time cost of "appending" a to it; the cost is spread out over all the elements of b.
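A few small checks of this behaviour, using undefined as a tail that would crash if it were ever forced; the binding names are hypothetical.

```haskell
-- compare stops at the first differing characters, so neither
-- undefined tail is ever forced.
headsDecide :: Ordering
headsDecide = compare ("b" ++ undefined) ("a" ++ undefined)

-- take stops after three cells, before (++) ever reaches its second argument.
prefixOnly :: [Int]
prefixOnly = take 3 ([1, 2, 3] ++ undefined)

-- minimum discards the "b..." candidate after looking at one character,
-- which is the early elimination the question hopes for.
earlyExit :: String
earlyExit = minimum ["b" ++ undefined, "a" ++ "c"]
```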
Vectors are a different matter entirely; they're strict in their structure, so even examining just the first element of xs V.++ ys incurs the entire overhead of allocating a new vector and copying xs and ys into it, just like in a strict language. The same applies to mutable vectors (except that the cost is incurred when you perform the operation, rather than when you force the resulting vector), although I think you'd have to write your own append operation for those anyway. You could represent a bunch of appended (immutable) vectors as [Vector a] or similar if this is a problem for you, but that just moves the overhead to when you flatten it back into a single Vector, and it sounds like you're more interested in mutable vectors.
Try
import Data.List (tails)

minimumrotation :: Ord a => [a] -> [a]
minimumrotation xs = minimum . take len . map (take len) $ tails (cycle xs)
  where
    len = length xs
I expect that to be faster than what you have, though index-juggling on an unboxed Vector or UArray would probably be faster still. But is it really a bottleneck?
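A rough sketch of the index-juggling idea with an unboxed array from the array package: rotations are compared through index arithmetic without being built, and only the winning rotation is materialised. This is the naive comparison (O(n²) in the worst case), not a specialised linear-time algorithm such as Booth's, and minimumRotationU is a hypothetical name.

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.List (minimumBy)

minimumRotationU :: String -> String
minimumRotationU [] = []
minimumRotationU s = [arr ! ((best + k) `mod` n) | k <- [0 .. n - 1]]
  where
    n = length s
    arr = listArray (0, n - 1) s :: UArray Int Char
    -- Compare the rotations starting at i and j character by character,
    -- reading straight from the array instead of copying.
    cmpRot i j = go 0
      where
        go k
          | k == n = EQ
          | otherwise =
              case compare (arr ! ((i + k) `mod` n)) (arr ! ((j + k) `mod` n)) of
                EQ -> go (k + 1)
                r  -> r
    -- Start index of a lexicographically smallest rotation.
    best = minimumBy cmpRot [0 .. n - 1]
```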
If you're interested in fast concatenation and a fast splitAt, use Data.Sequence.
I've made some stylistic modifications to your code, to make it look more like idiomatic Haskell, but the logic is exactly the same, except for a few conversions to and from Seq:
import qualified Data.Sequence as S
import qualified Data.Foldable as F
minimumRotation :: Ord a => [a] -> [a]
minimumRotation xs = F.toList
                   . F.minimum
                   . fmap (`swapAt` xs')
                   . S.elemIndicesL (F.minimum xs')
                   $ xs'
  where xs' = S.fromList xs
        swapAt n = f . S.splitAt n
          where f (a,b) = b S.>< a

Character & strings

I am new to Haskell and I have a problem (aka homework).
So, I have a list of tuples, each holding a string and an integer:
xxs :: [([Char], Integer)]
I need to know how many of the strings in xxs start with a given character.
Let me exemplify:
foo 'A' [("Abc",12),("Axx",34),("Zab",56)]
Output: 2
foo 'B' [("Abc",12),("Bxx",34),("Zab",56)]
Output: 1
My best attempt so far:
foo c xxs = length (foldl (\acc (x:xs) -> if x == c then c else x) [] xxs)
But, of course, there's something VERY wrong inside the lambda expression.
Any suggestion?
Thanks.
You can use a fold, but I would suggest another way, which breaks the problem in three steps:
transform the input list to the list of first letters. You can use map for this
filter out all elements not equal to the given Char
take the length of the remaining list
Obviously the first step is the hardest, but it is not as hard as it looks. To do it, you just have to combine the functions fst and head somehow, or, even easier, map twice.
You can write this as a simple one-liner, but maybe you should start with a let:
foo c xxs = let strings = map ...
                firstLetters = map ...
                filteredLetters = filter ...
            in length ...
There are a few problems with your attempt:
You plan to use foldl to construct a shorter list and then take its length. While that is possible, the filter function is much better suited for the task, as @landei suggests.
foldl can be used to accumulate the length without constructing a shorter list. See the answer of @WuXingbo: his answer is incorrect, but once you realize that length is not needed at all with his approach, it should be easy for you to come up with a correct solution.
Somewhat contradictory to common sense, in a lazy language foldr is faster and uses less memory than foldl. You should ask your teacher why.
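Following the hint about accumulating the count directly, here is one possible sketch with foldl' (not the only solution). It uses take 1 instead of head so that empty strings do not crash.

```haskell
import Data.List (foldl')

-- Count the strings that start with c, keeping only a running total in
-- the accumulator: no intermediate list is built and length is not needed.
foo :: Char -> [(String, Integer)] -> Int
foo c = foldl' step 0
  where
    -- take 1 s is "" for an empty string, where head would crash.
    step n (s, _) = if take 1 s == [c] then n + 1 else n
```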
I would rewrite foo as
foo :: Char -> [(String, Int)] -> Int
foo c = length . filter ((==c).head.fst)
fst fetches the first element of a two-element tuple.
(==c) is a one-argument function that compares its input with c (see http://www.haskell.org/tutorial/functions.html paragraph 3.2.1 for better explanation).

Haskell, Monads, Stack Space, Laziness -- how to structure code to be lazy?

A contrived example, but the below code demonstrates a class of problems I keep running into while learning Haskell.
import Control.Monad.Except (throwError)
import Data.Char (isDigit)

countDigitsForList [] = return []
countDigitsForList (x:xs) = do
  q <- countDigits x
  qs <- countDigitsForList xs
  return (q:qs)

countDigits x = do
  if all isDigit x
    then return $ length x
    else throwError $ "Bad number: " ++ x

t1 = countDigitsForList ["1", "23", "456", "7890"] :: Either String [Int]
t2 = countDigitsForList ["1", "23", "4S6", "7890"] :: Either String [Int]
t1 gives me the right answer and t2 correctly identifies the error.
Seems to me that, for a sufficiently long list, this code is going to run out of stack space because it runs inside of a monad and at each step it tries to process the rest of the list before returning the result.
An accumulator and tail recursion seems like it may solve the problem but I repeatedly read that neither are necessary in Haskell because of lazy evaluation.
How do I structure this kind of code into one which won't have a stack space problem and/or be lazy?
How do I structure this kind of code into one which won't have a stack space problem and/or be lazy?
You can't make this function process the list lazily, monads or no. Here's a direct translation of countDigitsForList to use pattern matching instead of do notation:
countDigitsForList [] = return []
countDigitsForList (x:xs) = case countDigits x of
  Left e -> Left e
  Right q -> case countDigitsForList xs of
    Left e -> Left e
    Right qs -> Right (q:qs)
It should be easier to see here that, because a Left at any point in the list makes the whole thing return that value, in order to determine the outermost constructor of the result, the entire list must be traversed and processed; likewise for processing each element. Because the final result potentially depends on the last character in the last string, this function as written is inherently strict, much like summing a list of numbers.
Given that, the thing to do is ensure that the function is strict enough to avoid building up a huge unevaluated expression. A good place to start for information on that is discussions on the difference between foldr, foldl and foldl'.
An accumulator and tail recursion seems like it may solve the problem but I repeatedly read that neither are necessary in Haskell because of lazy evaluation.
Both are unnecessary when you can instead generate, process, and consume a list lazily; the simplest example here being map. For a function where that's not possible, strictly-evaluated tail recursion is precisely what you want.
camccann is right that the function is inherently strict. But that doesn't mean that it can't run in constant stack!
countDigitsForList xss = go xss []
  where go (x:xs) acc = case countDigits x of
          Left e -> Left e
          Right q -> go xs (q:acc)
        go [] acc = Right (reverse acc)
This accumulating-parameter version is a partial CPS transform of camccann's code, and I bet that you could get the same result by working over a CPS-transformed Either monad as well.
Edited to take into account jwodder's correction regarding reverse. Oops. As John L notes, an implicit or explicit difference list would work as well...
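A sketch of the explicit difference-list variant: the accumulator is a function [Int] -> [Int], extended with (acc . (q :)), so no final reverse is needed. countDigits is written directly in Either here, an assumption that avoids the mtl dependency.

```haskell
import Data.Char (isDigit)

-- Same check as in the question, written directly in Either.
countDigits :: String -> Either String Int
countDigits x
  | all isDigit x = Right (length x)
  | otherwise     = Left ("Bad number: " ++ x)

-- The accumulator is a difference list: appending q at the end is just
-- composing with (q :), and applying it to [] at the end yields the
-- results in their original order.
countDigitsForList :: [String] -> Either String [Int]
countDigitsForList xss = go xss id
  where
    go [] acc = Right (acc [])
    go (x:xs) acc = case countDigits x of
      Left e  -> Left e
      Right q -> go xs (acc . (q :))
```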

Resources