Project Euler 8 - I don't understand it - haskell

I looked up a solution in Haskell for the 8th Euler problem, but I don't quite understand it.
import Data.List
import Data.Char
euler_8 = do
  str <- readFile "number.txt"
  print . maximum . map product
        . foldr (zipWith (:)) (repeat [])
        . take 13 . tails . map (fromIntegral . digitToInt)
        . concat . lines $ str
Here is the link to the solution, and here you can find the task.
Could anyone explain the solution to me step by step?

Reading the data
readFile reads the file "number.txt". Suppose we put a small 16-digit number in a file called number.txt:
7316
9698
8586
1254
Running
euler_8 = do
  str <- readFile "number.txt"
  print $ str
Results in
"7316\n9698\n8586\n1254"
This string has extra newline characters in it. To remove them, the author splits the string into lines.
euler_8 = do
  str <- readFile "number.txt"
  print . lines $ str
The result no longer has any '\n' characters, but is a list of strings.
["7316","9698","8586","1254"]
To turn this into a single string, the strings are concatenated together.
euler_8 = do
  str <- readFile "number.txt"
  print . concat . lines $ str
The concatenated string is a list of characters instead of a list of numbers
"7316969885861254"
Each character is converted into an Int by digitToInt and then into an Integer by fromIntegral (Integer is the type Haskell's defaulting rules pick here, since nothing else in the program fixes it). On 32-bit hardware, using a full-sized Integer is important, since the product of 13 digits could be larger than 2^31-1. This conversion is mapped onto each item in the list.
euler_8 = do
  str <- readFile "number.txt"
  print . map (fromIntegral . digitToInt)
        . concat . lines $ str
The resulting list is full of Integers.
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4]
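To see why the fromIntegral step matters, here is a small standalone sketch. The names toDigits, largestPossibleProduct and overflowsInt32 are hypothetical, not part of the original solution. It converts the input the same way as the pipeline and checks that the largest possible 13-digit product, 9^13, does not fit in a 32-bit signed integer.

```haskell
import Data.Char (digitToInt)
import Data.Int (Int32)

-- Hypothetical helper: the same conversion as in the pipeline, i.e. drop
-- the newlines, then turn each digit character into an Integer.
toDigits :: String -> [Integer]
toDigits = map (fromIntegral . digitToInt) . concat . lines

-- The largest product that 13 decimal digits can produce is 9^13.
largestPossibleProduct :: Integer
largestPossibleProduct = 9 ^ (13 :: Int)

-- True when that value exceeds the largest 32-bit signed integer.
overflowsInt32 :: Bool
overflowsInt32 = largestPossibleProduct > fromIntegral (maxBound :: Int32)
```

In GHCi, largestPossibleProduct evaluates to 2541865828329, well above 2^31 - 1 = 2147483647, so an Int on 32-bit hardware would silently wrap around.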
Subsequences
The author's next goal is to find all of the 13-digit runs in this list of integers. tails returns all of the suffixes of a list, starting at each position and running to the end of the list.
euler_8 = do
  str <- readFile "number.txt"
  print . tails
        . map (fromIntegral . digitToInt)
        . concat . lines $ str
This results in 17 lists for our 16-digit example. (I've added formatting, lining the lists up on the right.)
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
  [3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
    [1,6,9,6,9,8,8,5,8,6,1,2,5,4],
      [6,9,6,9,8,8,5,8,6,1,2,5,4],
        [9,6,9,8,8,5,8,6,1,2,5,4],
          [6,9,8,8,5,8,6,1,2,5,4],
            [9,8,8,5,8,6,1,2,5,4],
              [8,8,5,8,6,1,2,5,4],
                [8,5,8,6,1,2,5,4],
                  [5,8,6,1,2,5,4],
                    [8,6,1,2,5,4],
                      [6,1,2,5,4],
                        [1,2,5,4],
                          [2,5,4],
                            [5,4],
                              [4],
                               []
]
The author is going to pull a trick where we rearrange these lists to read off 13-digit-long sublists. If we look at these lists left-aligned instead of right-aligned, we can see the subsequences running down each column.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4],
[2,5,4],
[5,4],
[4],
[]
]
We only want these columns to be 13 digits long, so we only want to take the first 13 rows.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4]
]
foldr (zipWith (:)) (repeat []) transposes a list of lists (explaining how it works perhaps belongs in another question). It discards the parts of the rows that are longer than the shortest row.
euler_8 = do
  str <- readFile "number.txt"
  print . foldr (zipWith (:)) (repeat [])
        . take 13 . tails
        . map (fromIntegral . digitToInt)
        . concat . lines $ str
We can now read the subsequences across the lists, as usual:
[
[7,3,1,6,9,6,9,8,8,5,8,6,1],
[3,1,6,9,6,9,8,8,5,8,6,1,2],
[1,6,9,6,9,8,8,5,8,6,1,2,5],
[6,9,6,9,8,8,5,8,6,1,2,5,4]
]
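For comparison, Data.List also has transpose, but it behaves differently on ragged input: it keeps the shorter columns instead of cutting every column to the length of the shortest row. A small sketch (transposeShortest and ragged are hypothetical names; the body of transposeShortest is the expression from the solution) makes the difference visible:

```haskell
import Data.List (transpose)

-- Hypothetical name for the expression used in the solution. Columns are
-- cut to the length of the shortest row, because zipWith stops at the end
-- of the shorter list. (On an empty list of rows it would return the
-- infinite list repeat [], but take 13 . tails never produces that.)
transposeShortest :: [[a]] -> [[a]]
transposeShortest = foldr (zipWith (:)) (repeat [])

-- Hypothetical ragged example: the middle row is shorter than the others.
ragged :: [[Int]]
ragged = [[1, 2, 3], [4, 5], [6, 7, 8]]

-- transposeShortest ragged == [[1,4,6],[2,5,7]]
-- transpose ragged         == [[1,4,6],[2,5,7],[3,8]]
```

This is why the foldr version suits the problem: the columns that would be shorter than 13 digits are exactly the ones we want to drop.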
The problem
We find the product of each of the subsequences by mapping product onto them.
euler_8 = do
  str <- readFile "number.txt"
  print . map product
        . foldr (zipWith (:)) (repeat [])
        . take 13 . tails
        . map (fromIntegral . digitToInt)
        . concat . lines $ str
This reduces each list to a single number:
[940584960,268738560,447897600,1791590400]
From these we must find the maximum.
euler_8 = do
  str <- readFile "number.txt"
  print . maximum . map product
        . foldr (zipWith (:)) (repeat [])
        . take 13 . tails
        . map (fromIntegral . digitToInt)
        . concat . lines $ str
The answer is
1791590400
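The same pipeline can be pulled out of IO into a pure function, which makes it easy to test without a number.txt file. This is a sketch: largestProduct, its window-size parameter and euler8 are hypothetical additions, while the body is the author's pipeline with 13 replaced by the parameter.

```haskell
import Data.Char (digitToInt)
import Data.List (tails)

-- Hypothetical pure version of the author's pipeline. The window size n
-- should be at least 1 (with n = 0 the fold yields infinitely many empty
-- columns and maximum never finishes).
largestProduct :: Int -> String -> Integer
largestProduct n =
  maximum . map product
    . foldr (zipWith (:)) (repeat [])
    . take n . tails . map (fromIntegral . digitToInt)
    . concat . lines

-- Hypothetical IO wrapper: it only reads the file and prints the result.
euler8 :: IO ()
euler8 = readFile "number.txt" >>= print . largestProduct 13
```

With the 16-digit example above, largestProduct 13 "7316\n9698\n8586\n1254" gives 1791590400, the same answer as the IO version.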

If you're not familiar with the functions used, the first thing you should do is examine the types of each function. Since this is function composition, you apply from inside out (i.e. operations occur right to left, bottom to top when reading). We can walk through this line by line.
Starting from the last line, we'll first examine the types.
:t str
str :: String -- This is your input
:t lines
lines :: String -> [String] -- Split a string into a list of strings, one per line
:t concat
concat :: [[a]] -> [a] -- Merge a list of lists into a single list (hint: type String = [Char])
Since type String = [Char] (so [String] is equivalent to [[Char]]), this line converts the multi-line number into a single list of digit characters. More precisely, it first splits the full string into a list of strings, one per line. It then merges all of these lines (now containing only digit characters) into a single list of characters (that is, a single String).
The next line takes this new String as input. Again, let's observe the types:
:t digitToInt
digitToInt :: Char -> Int -- Convert a digit char to an int
:t fromIntegral
fromIntegral :: (Integral a, Num b) => a -> b -- Convert an integral value to any numeric type
:t map
map :: (a -> b) -> [a] -> [b] -- Apply a function to each element of the list
:t tails
tails :: [a] -> [[a]] -- Returns all final segments of input (see: http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-List.html#v:tails)
:t take
take :: Int -> [a] -> [a] -- Return the first n values of the list
If we apply these operations to our current input string, the first thing that happens is that we map the composed function (fromIntegral . digitToInt) over each character in the string. This turns our string of digits into a list of numbers. EDIT: As pointed out below in the comments, the fromIntegral in this example is there to prevent overflow on 32-bit integer types. Now that we have converted our string into actual numeric types, we run tails on this result. Since (by the problem statement) the digits we multiply must be adjacent, and we know that all of the integers are non-negative (by virtue of being places of a larger number), we keep only the first 13 tails: every group of 13 consecutive digits will be read off from those 13 lists. How this works is difficult to understand without considering the next line.
So, let's do a quick experiment. After converting our string into numeric types and taking the first 13 tails, we now have a list of lists. It is actually hard to picture what we have here. For the sake of understanding, the contents of the lists are not very important; what is important is their size. So let's take a look at an artificial example:
(map length . take 13 . tails) [1..1000]
[1000,999,998,997,996,995,994,993,992,991,990,989,988]
You can see that what we have here is a list of 13 elements. Each element is a list, of size 1000 (i.e. the full dataset) down to 988, in descending order. This is the input to the next line, which is arguably the most difficult, yet most important, line to understand. Why understanding it is important should become clear as we walk through that line.
:t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b -- Combine values into a single value
:t zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] -- Generalization of zip
:t (:)
(:) :: a -> [a] -> [a] -- Cons operator. Add element to list
:t repeat
repeat :: a -> [a] -- Infinite list containing specified value
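Remember how I mentioned we had a list of 13 elements before (of varying-sized lists)? This is important now. foldr combines that list from the last row back to the first, applying zipWith (:) to each row and the result built so far. The initial accumulator, (repeat []), is an infinite list of empty lists, so the first step turns every element of the last row into a one-element list. Each later step conses the elements of the next row onto those lists, and because zipWith stops at the end of the shorter list, the result has as many columns as the shortest row. This constructs a list of lists containing our adjacent subsequences of length 13.
Finally, we get to the last line, which is pretty easy. That said, we should still be mindful of our types: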
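One way to watch the fold build the columns is scanr, which keeps every intermediate accumulator that the corresponding foldr would compute. The sketch below (foldSteps is a hypothetical helper) uses two short rows and truncates each accumulator to 4 entries, because the starting value repeat [] is infinite:

```haskell
-- Hypothetical helper: every intermediate result of
-- foldr (zipWith (:)) (repeat []) rows, from the final result back to the
-- initial accumulator. Each one is cut to 4 columns so that the infinite
-- starting value can be printed.
foldSteps :: [[Int]] -> [[[Int]]]
foldSteps rows = map (take 4) (scanr (zipWith (:)) (repeat []) rows)
```

foldSteps [[1,2,3],[4,5]] gives [[[1,4],[2,5]],[[4],[5]],[[],[],[],[]]]. Reading it from right to left: the fold starts from infinitely many empty lists, the row [4,5] turns them into [[4],[5]], and the row [1,2,3] is consed onto those, with its 3 dropped because only two columns are left.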
:t product
product :: Num a => [a] -> a -- Multiply all elements of a list together and return result
:t maximum
maximum :: Ord a => [a] -> a -- Return maximum element in the list
The first thing we do is map the product function over each subsequence. When this has completed we end up with a list of numeric types (hey, we finally don't have a list of lists anymore!). These values are the products of each subsequence. Finally, we apply the maximum function which returns only the largest element in the list.

EDIT: I found out later what the foldr expression was for (see the comments below my answer).
I think that this could be expressed in a different way: you can simply add a guard value at the end of the list.
My verbose version of that solution would be:
import Data.List
import Data.Char
euler_8 = do
  let len = 13
  let str1 = "123456789\n123456789"
  -- Join lines
  let str2 = concat (lines str1)
  -- Transform the list of characters into a list of numbers
  let lst1 = map (fromIntegral . digitToInt) str2
  -- EDIT: Add a guard at the end of the list; every window that
  -- reaches it contains -1, so its product is not positive
  let lst2 = lst1 ++ [-1]
  -- Get all tails of the list of digits, dropping the final
  -- empty tail (its product would be 1)
  let lst3 = init (tails lst2)
  -- Get first 13 digits from each tail
  let lst4 = map (take len) lst3
  -- Get a list of products
  let prod = map product lst4
  -- Find max product
  let m = maximum prod
  print m
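An alternative to the -1 guard is to keep only the windows that really have the full length. This is a sketch, not the answerer's code; windows, maxWindowProduct and naiveMaxProduct are hypothetical names. The test input [1,0,0,9,9] shows why some filtering is needed at all: without it, the short tail [9,9] wins.

```haskell
import Data.List (tails)

-- Hypothetical helper: all runs of exactly n consecutive elements. The
-- tails get shorter as we go, so we can stop at the first one that is
-- too short.
windows :: Int -> [a] -> [[a]]
windows n = takeWhile ((== n) . length) . map (take n) . tails

-- Hypothetical: largest product over full-length windows only.
maxWindowProduct :: Int -> [Integer] -> Integer
maxWindowProduct n = maximum . map product . windows n

-- Hypothetical: taking n elements from every tail without any check also
-- keeps the short tails at the end of the list.
naiveMaxProduct :: Int -> [Integer] -> Integer
naiveMaxProduct n = maximum . map (product . take n) . tails
```

With a window of 3, maxWindowProduct 3 [1,0,0,9,9] is 0 (every full window contains a 0), while naiveMaxProduct 3 [1,0,0,9,9] is 81, coming from the incomplete tail [9,9].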

Related

Haskell - Exclude lists based on a test in a nested list comprehension

I want to create a series of possible equations based on a general specification:
test = ["12", "34=", "56=", "78"]
Each string (e.g. "12") represents the possible characters at that location, in this case '1' or '2'.
So possible equations from test would be "13=7" or "1=68".
I know the examples I give are not balanced, but that's because I'm deliberately giving a simplified short string.
(I also know that I could use 'sequence' to search all possibilities, but I want to be more intelligent, so I need the different approach explained below.)
What I want is to try fixing each of the equals in turn and then removing all other equals in the equation. So I want:
[["12","=","56","78"],["12","34","=","78"]]
I've written this nested list comprehension:
(it needs: {-# LANGUAGE ParallelListComp #-} )
fixEquals :: [String] -> [[String]]
fixEquals re =
  [ [ if index == outerIndex then equals else remain
    | equals <- map (filter (== '=')) re
    | remain <- map (filter (/= '=')) re
    | index  <- [1..]
    ]
  | outerIndex <- [1 .. length re]
  ]
This produces:
[["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
but I want to filter out any with empty lists within them. i.e. in this case, the first and last.
I can do:
countOfEmpty :: (Eq a) => [[a]] -> Int
countOfEmpty = length . filter (== [])
fixEqualsFiltered :: [String] -> [[String]]
fixEqualsFiltered re = filter (\x -> countOfEmpty x == 0) (fixEquals re)
so that "fixEqualsFiltered test" gives:
[["12","=","56","78"],["12","34","=","78"]]
which is what I want, but it doesn’t seem elegant.
I can’t help thinking there’s another way to filter these out.
After all, it’s whenever "equals" is used in the if expression and is empty that we want to drop the result, so it seems a waste to build the list (e.g. ["","34","56","78"]) and then ditch it.
Any thoughts appreciated.
I don't know if this is any cleaner than your code, but it might be a bit clearer, and maybe more efficient, using recursion:
fixEquals = init . f
f :: [String] -> [[String]]
f [] = [[]]
f (x:xs) | '=' `elem` x = ("=" : removeEq xs) : map (removeEq [x] ++) (f xs)
         | otherwise    = map (x:) (f xs)
removeEq :: [String] -> [String]
removeEq = map (filter (/= '='))
It works like this: if there's an '=' in the current string, the result splits in two (one list that fixes this '=' and removes every later one, plus the recursive results with this '=' removed); if not, it just recurses. The init is needed because the last list returned has no '=' in any string.
Finally, I believe you can probably find a better data structure to do what you need to achieve, instead of using a list of strings.
Evaluating
let xs = [["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
in filter (not . any null) xs
will give
[["12","=","56","78"],["12","34","=","78"]]
If you want a list comprehension, then do:
[x | x <- xs, and [not $ null y | y <- x]]
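Another option, closer to what the question asks for, is to never build the lists in which the fixed position would be empty: filter the outer generator so that only positions whose string contains an '=' are fixed. A minimal sketch (fixEqualsAt is a hypothetical name):

```haskell
-- Hypothetical name. For every position j whose string contains '=',
-- produce the list in which that position becomes "=" and every other
-- string loses its '=' characters.
fixEqualsAt :: [String] -> [[String]]
fixEqualsAt re =
  [ [ if i == j then "=" else filter (/= '=') s | (i, s) <- indexed ]
  | (j, t) <- indexed
  , '=' `elem` t
  ]
  where
    indexed = zip [0 :: Int ..] re
```

One difference from the filter-based versions: if some other string consists only of '=', it still turns into an empty string here, so combining this with filter (not . any null) would be needed to cover that case as well.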
I think I'd probably do it this way. First, a preliminary that I've written so many times it's practically burned into my fingers by now:
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
  go _ [] = []
  go b (h:e) = (b,h,e) : go (h:b) e
Probably running it once or twice in ghci will be a clearer explanation of what this does than any English writing I could do:
> zippers "abcd"
[("",'a',"bcd"),("a",'b',"cd"),("ba",'c',"d"),("cba",'d',"")]
In other words, it gives a way of selecting each element of a list in turn, giving the "leftovers" of what was before and after the selection point. Given that tool, here's our plan: we'll nondeterministically choose a String to serve as our equals sign, double-check that we've got an equals sign in the first place, and then clear out the equals from the others. So:
-- needs: import Control.Monad (guard) at the top of the module
fixEquals ss = do
  (prefix, s, suffix) <- zippers ss
  guard ('=' `elem` s)
  return (reverse (deleteEquals prefix) ++ ["="] ++ deleteEquals suffix)
deleteEquals = map (filter ('='/=))
Let's try it:
> fixEquals ["12", "34=", "56=", "78"]
[["12","=","56","78"],["12","34","=","78"]]
Perfect! But this is just a stepping-stone to actually generating the equations, right? It turns out to be not that hard to go all the way in one step, skipping this intermediate. Let's do that:
equations ss = do
  (prefixes, s, suffixes) <- zippers ss
  guard ('=' `elem` s)
  prefix <- mapM (filter ('='/=)) (reverse prefixes)
  suffix <- mapM (filter ('='/=)) suffixes
  return (prefix ++ "=" ++ suffix)
And we can try it in ghci:
> equations ["12", "34=", "56=", "78"]
["1=57","1=58","1=67","1=68","2=57","2=58","2=67","2=68","13=7","13=8","14=7","14=8","23=7","23=8","24=7","24=8"]
The easiest way to achieve what you want is to create all the combinations and keep only the ones that make sense:
Prelude> test = ["12", "34=", "56=", "78"]
Prelude> sequence test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8","2=57","2=58","2=67","2=68","2==7","2==8"]
Prelude> filter ((1==).length.filter('='==)) $ sequence test
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
You pointed out the drawback: imagine we have the following list of strings: ["=", "=", "0123456789", "0123456789"]. We will generate 100 combinations and drop them all.
You can look at the combinations as a tree. For the ["12", "34"], you have:
     /   \
    1     2
   / \   / \
  3   4 3   4
You can prune the tree: just ignore the subtrees when you have two = on the path.
Let's try to do it. First, a simple combinations function:
Prelude> :set +m
Prelude> let combinations :: [String] -> [String]
Prelude|     combinations [] = [""]
Prelude|     combinations (cs:ts) = [c:t | c<-cs, t<-combinations ts]
Prelude|
Prelude> combinations test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8", ...]
Second, we need a variable to store the current number of = signs met:
if we find a second = sign, just drop the subtree
if we reach the end of a combination with no =, drop the combination
That is:
Prelude> let combinations' :: [String] -> Int -> [String]
Prelude|     combinations' [] n = if n==1 then [""] else []
Prelude|     combinations' (cs:ts) n = [c:t | c<-cs, let p = n+(fromEnum $ c=='='), p <= 1, t<-combinations' ts p]
Prelude|
Prelude> combinations' test 0
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
We use p as the new number of = signs on the path: if p > 1, we drop the subtree.
If n is zero when we reach the end, there is no = sign on the path, so we drop the combination.
You may use the variable n to store more information, e.g. the type of the last character (to avoid sequences such as +*).
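Outside GHCi, the same pruning search can be written as an ordinary top-level function. The sketch below is the answer's combinations' with the counter hidden behind a wrapper (oneEquals is a hypothetical name). On the ["=", "=", ...] example from above it abandons each path at the second '=' instead of enumerating all 100 combinations.

```haskell
-- Hypothetical name. All choices of one character per position that
-- contain exactly one '='. The counter n is the number of '=' signs on
-- the current path.
oneEquals :: [String] -> [String]
oneEquals css = go css (0 :: Int)
  where
    go [] n = if n == 1 then [""] else []
    go (cs : ts) n =
      [ c : t
      | c <- cs
      , let p = n + fromEnum (c == '=')
      , p <= 1 -- prune: a second '=' ends this subtree
      , t <- go ts p
      ]
```

oneEquals ["12", "34=", "56=", "78"] returns the same 16 equations as combinations' test 0.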

Adding two functions together in Haskell

Hi, I am new to Haskell and I came across an interesting problem, but I am not really sure how to go about solving it. I will show you only two parts of the question as an example.
The task is to input a number that is between 13 and 15 digits long.
Then we remove the last digit from that number. For example, 19283828382133 should output the exact same number without the final 3: 1928382838213.
Then every digit in an odd position (not every odd-valued digit) of this number is doubled, so you get 2,9,4,8,6, etc.
This is my code so far. As you can see, I have been able to complete these two parts individually (they work), but I am not sure how to combine them.
lastdigit :: Integer -> Integer -- This function removes the last digit
lastdigit x = x`div`10
doubleOdd (x:xs) = (2*x):(doubleEven xs) -- This function doubles every digit in an odd position, not every odd-valued digit
doubleOdd [] = []
doubleEven (x:xs) = x:(doubleOdd xs)
doubleEven [] = []
To explain further: the program I am trying to build will first take in a number between 13 and 15 digits long. It will then remove the last digit and automatically go on to the next step of doubling each digit in an odd position. Thanks
First, you need a way to break some large number into digits.
digits :: Integral x => x -> [x]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]
Which gives you...
Prelude> digits 12345
[1,2,3,4,5]
You can then drop the last digit with init
Prelude> (init . digits) 12345
[1,2,3,4]
Then a helper function to map over the elements in odd positions of a list.
mapOdd _ [] = []
mapOdd f (x:[]) = [f x]
mapOdd f (x:y:rest) = f x : y : mapOdd f rest
Giving you...
Prelude> mapOdd (+10) [1..10]
[11,2,13,4,15,6,17,8,19,10]
And a function to get back to a large number...
undigits = sum . zipWith (*) [10^n | n <- [0..]] . reverse
Resulting in...
Prelude> undigits [1, 2, 3, 4]
1234
And putting it all together
Prelude> undigits . mapOdd (*2) . init . digits $ 12345
2264
In functional languages particularly, always try to solve a problem by composing solutions to smaller problems :)
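Putting those pieces together for the 14-digit example from the question, plus the 13-to-15-digit check the question mentions, might look like the sketch below. process is a hypothetical name, and returning Nothing for out-of-range (or non-positive) input is an assumption, since the question does not say what should happen then.

```haskell
-- Helpers from the answer above.
digits :: Integral x => x -> [x]
digits 0 = []
digits x = digits (x `div` 10) ++ [x `mod` 10]

mapOdd :: (a -> a) -> [a] -> [a]
mapOdd _ [] = []
mapOdd f (x : []) = [f x]
mapOdd f (x : y : rest) = f x : y : mapOdd f rest

undigits :: Num a => [a] -> a
undigits = sum . zipWith (*) [10 ^ n | n <- [0 :: Int ..]] . reverse

-- Hypothetical name. Drop the last digit and double the digits in odd
-- positions, but only for positive numbers that are 13 to 15 digits long
-- (digits would loop forever on a negative number).
process :: Integer -> Maybe Integer
process n
  | n > 0 && len >= 13 && len <= 15 = Just (undigits . mapOdd (* 2) . init $ ds)
  | otherwise = Nothing
  where
    ds = digits n
    len = length ds
```

process 19283828382133 gives Just 2948684868416. Note that mapOdd (*2) can turn a digit into a two-digit number (8 becomes 16), and undigits then simply carries it into the next place; if the digits of a doubled value are supposed to be added together instead, the next answer shows one way to do that.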
The missing component is a way to break down an integer into its digits, and build it back up from there. That's easy:
digits:: Int -> [Int]
digits = map (`mod` 10) . takeWhile (/= 0) . iterate (`div` 10)
undigits :: [Int] -> Int
undigits = foldr f 0 where f i r = 10 * r + i
Then it looks like you need to post-process those digits in two different ways, but only if they match a predicate. Let's build a combinator for that:
when :: (a -> Bool) -> (a -> a) -> a -> a
when p f a = if p a then f a else a
The first case appears when you want to double digits in odd positions (from left to right). Again trivial, with the minor inconvenience that digits breaks a number down in increasing powers of ten. Let's prefix each digit with its position:
prefix :: [Int] -> [(Int, Int)]
prefix is = let n = length is in zip [n, n-1..1] is
doubleodd can now be expressed as follows (the (***) operator comes from Control.Arrow):
doubleodd :: [Int] -> [Int]
doubleodd = map (snd . when (odd . fst) (id *** double)) . prefix
You mentioned in a comment that when a doubled digit is 10 or more, its digits must be added together. This is the second case I was referring to, and it is again simplicity itself:
double :: Int -> Int
double = when (>= 10) (sum . digits) . (* 2)
Here is your final program:
program = undigits . doubleodd . tail . digits
... assuming the "between 13 and 15 digits" part is verified separately.
I hope this helps, and I realize it could be cleaned up a lot. List indices start with 0, which is an even number and the index of the first element of a list. The list comprehension doubles the items at indices 0, 2, 4, ..., that is, the 1st, 3rd, 5th, ... items.
let f n = [mod n 10] ++ f (div n 10)
let r = [if even i then d*2 else d|(i,d)<-zip [0..] (init.reverse.take 14.f$19283828382133)]
sum [b*(10^a)|(a,b) <- zip [12,11..0] r]
2948684868416
If you want it to handle numbers of any length, the easiest way here is length $ show 19283828382133, but I do have a function somewhere that does that. Use the length as a value in 3 places, once at full value in the take function in the composition.
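Following that suggestion, here is a sketch that takes the digit count from show instead of hard-coding 14 and 12. doubleEveryOther is a hypothetical name; like the one-liners above, it assumes a positive number and does not add together the digits of doubled values.

```haskell
-- Hypothetical name. The one-liner above, generalised to any number of
-- digits by taking the digit count k from show n.
doubleEveryOther :: Integer -> Integer
doubleEveryOther n = sum [b * 10 ^ a | (a, b) <- zip [k - 2, k - 3 .. 0] r]
  where
    k = length (show n)
    -- infinite list of digits, least significant first
    f m = m `mod` 10 : f (m `div` 10)
    -- drop the last digit, then double the digits at even indices
    r = [if even i then d * 2 else d | (i, d) <- zip [0 :: Int ..] (init . reverse . take k . f $ n)]
```

For the question's number, doubleEveryOther 19283828382133 gives 2948684868416, the same result as the hard-coded version.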

Splitting lists in Haskell

In Haskell I need to perform a function, whose declaration of types is as follows:
split ::[Integer] -> Maybe ([Integer],[Integer])
Let it work as follows:
split [1,2,3,4,5,15] = Just ([1,2,3,4,5],[15])
Because 1 + 2 + 3 + 4 + 5 = 15
split [1,3,3,4,3] = Just ([1,3,3],[4,3])
Because 1 + 3 + 3 = 7 = 4 + 3
split [1,5,7,8,0] = Nothing
I have tried this, but it doesn't work:
split :: [Integer] -> ([Integer], [Integer])
split xs = (ys, zs)
where
ys <- subsequences xs, ys isInfixOf xs, sum ys == sum zs
zs == xs \\ ys
Determines whether the list of positive integers xs can be divided into two parts (without rearranging its elements) with the same sum. If possible, its value is the pair formed by the two parts. If it's not, its value is Nothing.
How can I do it?
Not a complete answer, since this is a learning exercise and you want hints, but if you want to use subsequences from Data.List, you could then remove each element of the subsequence you are checking from the original list with \\, to get the difference, and compare the sums. You were on the right track, but you need to either find the first subsequence that works and return Just (ys, zs), or else Nothing.
You can make the test for some given subsequence a predicate and search with find.
What you could also do is create a function that gives all possible splittings of a list:
splits :: [a] -> [([a], [a])]
splits xs = zipWith splitAt [1..(length xs)-1] $ repeat xs
Which works as follows:
*Main> splits [1,2,3,4,5,15]
[([1],[2,3,4,5,15]),([1,2],[3,4,5,15]),([1,2,3],[4,5,15]),([1,2,3,4],[5,15]),([1,2,3,4,5],[15])]
Then you could just use find from Data.List to find the first pair of split lists that have equal sums:
import Data.List
splitSum :: [Integer] -> Maybe ([Integer], [Integer])
splitSum xs = find (\(x, y) -> sum x == sum y) $ splits xs
Which works as intended:
*Main> splitSum [1,2,3,4,5,15]
Just ([1,2,3,4,5],[15])
Since find returns Maybe a, the types automatically match up.
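Each candidate split above recomputes two sums, so the search takes quadratic time in the length of the list. Since the parts must stay in their original order, a linear-time search over the running prefix sums is enough: a split after k elements works exactly when twice the k-th prefix sum equals the total. A sketch (splitSumLinear is a hypothetical name):

```haskell
-- Hypothetical name. Split after the first k elements (0 < k < length xs)
-- whose sum is exactly half of the total.
splitSumLinear :: [Integer] -> Maybe ([Integer], [Integer])
splitSumLinear xs =
  case [k | (k, p) <- zip [1 .. length xs - 1] (scanl1 (+) xs), 2 * p == total] of
    (k : _) -> Just (splitAt k xs)
    [] -> Nothing
  where
    total = sum xs
```

It returns the same first split as splitSum, for example Just ([1,3,3],[4,3]) for [1,3,3,4,3] and Nothing for [1,5,7,8,0].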

How to split a [String] into [[String]] based on length

I'm trying to split a list of Strings into a list of lists of Strings,
so, as in the title, [String] -> [[String]].
This has to be done based on length in characters, so that the lists in the output are no longer than 10 characters in total. So if the input had length 20, it would be broken down into 2 lists, and if length 21, into 3 lists.
I'm not sure what to use to do this; I don't even know how to break down a list into a list of lists, never mind based on a certain length.
For example if the limit was 5 and the input was:
["abc","cd","abcd","ab"]
The output would be:
[["abc","cd"],["abcd"],["ab"]]
I'd like to be pointed in the right direction: what methods should I use? List comprehension? Recursion?
Here's an intuitive solution:
import Data.List (foldl')
breakup :: Int -> [[a]] -> [[[a]]]
breakup size = foldl' accumulate [[]]
  where accumulate broken l
          | length l > size = error "Breakup size too small."
          | sum (map length (last broken ++ [l])) <= size
              = init broken ++ [last broken ++ [l]]
          | otherwise = broken ++ [[l]]
Now, let's go through it line-by-line:
breakup :: Int -> [[a]] -> [[[a]]]
Since you hinted that you may want to generalize the function to accept different size limits, our type signature reflects this. We also generalize beyond [String] (that is, [[Char]]), since our problem is not specific to [[Char]], and could equally apply to any [[a]].
breakup size = foldl' accumulate [[]]
We're using a left fold because we want to transform a list, left-to-right, into our target, which will be a list of sub-lists. Even though we're not concerned with efficiency, we're using Data.List.foldl' instead of Prelude's own foldl because this is standard practice. You can read more about foldl vs. foldl' here.
Our folding function is called accumulate. It will consider a new item and decide whether to place it in the last-created sub-list or to start a new sub-list. To make that judgment, it uses the size we passed in. We start with an initial value of [[]], that is, a list with one empty sub-list.
Now the question is, how should you accumulate your target?
where accumulate broken l
We're using broken to refer to our constructed target so far, and l (for "list") to refer to the next item to process. We'll use guards for the different cases:
| length l > size = error "Breakup size too small."
We need to raise an error if the item surpasses the size limit on its own, since there's no way to place it in a sub-list that satisfies the size limit. (Alternatively, we could build a safe function by wrapping our return value in the Maybe monad, and that's something you should definitely try out on your own.)
| sum (map length (last broken ++ [l])) <= size
= init broken ++ [last broken ++ [l]]
The guard condition is sum (map length (last broken ++ [l])) <= size, and the return value for this guard is init broken ++ [last broken ++ [l]]. Translated into plain English, we might say, "If the item can fit in the last sub-list without going over the size limit, append it there."
| otherwise = broken ++ [[l]]
On the other hand, if there isn't enough "room" in the last sub-list for this item, we start a new sub-list, containing only this item. When the accumulate helper is applied to the next item in the input list, it will decide whether to place that item in this sub-list or start yet another sub-list, following the same logic.
There you have it. Don't forget to import Data.List (foldl') up at the top. As another answer points out, this is not a performant solution if you plan to process 100,000 strings. However, I believe this solution is easier to read and understand. In many cases, readability is the more important optimization.
Thanks for the fun question. Good luck with Haskell, and happy coding!
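The answer above suggests returning Maybe instead of calling error. One way to do that, sketched here with a hypothetical breakupSafe, is a direct recursion that keeps the group being built (in reverse) and its running length, so it never has to call last, init or ++ on the result built so far:

```haskell
-- Hypothetical name. Greedily pack the items into groups whose total
-- length is at most size; Nothing if a single item is already too long.
breakupSafe :: Int -> [[a]] -> Maybe [[[a]]]
breakupSafe size = go 0 []
  where
    -- used: total length of the current group; cur: current group, reversed
    go _ [] [] = Just []
    go _ cur [] = Just [reverse cur]
    go used cur (l : ls)
      | n > size = Nothing
      | used + n <= size = go (used + n) (l : cur) ls
      | otherwise = (reverse cur :) <$> go n [l] ls
      where
        n = length l
```

breakupSafe 5 ["abc","cd","abcd","ab"] gives Just [["abc","cd"],["abcd"],["ab"]]. Unlike breakup, it returns an empty list (rather than one empty group) for empty input.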
You can do something like this:
splitByLen :: Int -> [String] -> [[String]]
splitByLen n s = go (zip s $ scanl1 (+) $ map length s) 0
  where go [] _ = []
        go xs prev = let (lst, rest) = span (\ (x, c) -> c - prev <= n) xs
                     in (map fst lst) : go rest (snd $ last lst)
And then:
*Main> splitByLen 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
In case there is a string longer than n, this function will fail. Now, what you want to do in those cases depends on your requirements and that was not specified in your question.
[Update]
As requested by @amar47shah, I made a benchmark comparing his solution (breakup) with mine (splitByLen):
import Data.List
import Data.Time.Clock
import Control.DeepSeq
import System.Random
main :: IO ()
main = do
  s <- mapM (\ _ -> randomString 10) [1..10000]
  test "breakup 10000" $ breakup 10 s
  test "splitByLen 10000" $ splitByLen 10 s
  putStrLn ""
  r <- mapM (\ _ -> randomString 10) [1..100000]
  test "breakup 100000" $ breakup 10 r
  test "splitByLen 100000" $ splitByLen 10 r
test :: (NFData a) => String -> a -> IO ()
test s a = do time1 <- getCurrentTime
              time2 <- a `deepseq` getCurrentTime
              putStrLn $ s ++ ": " ++ show (diffUTCTime time2 time1)
randomString :: Int -> IO String
randomString n = do
  l <- randomRIO (1,n)
  mapM (\ _ -> randomRIO ('a', 'z')) [1..l]
Here are the results:
breakup 10000: 0.904012s
splitByLen 10000: 0.005966s
breakup 100000: 150.945322s
splitByLen 100000: 0.058658s
Here is another approach. It is clear from the problem that the result is a list of lists, and we need a running length and an inner list to keep track of how much we have accumulated (we use foldl' with these two as the accumulator). We then describe what we want, which is basically:
If the length of the current input string by itself exceeds the limit len, we ignore that string (you may change this if you want a different behavior).
If the new length, after adding the length of the current string, is within the limit, we add the string to the current result list.
If the new length exceeds the limit, we add the result so far to the output and start a new result list.
import Data.List (foldl')
chunks len = reverse . map reverse . snd . foldl' f (0, [[]]) where
  f resSoFar@(lenSoFar, (currRes: acc)) curr
    | currLength > len = resSoFar -- ignore
    | newLen <= len = (newLen, (curr: currRes):acc)
    | otherwise = (currLength, [curr]:currRes:acc)
    where
      newLen = lenSoFar + currLength
      currLength = length curr
Every time we add a result to the output list, we add it to the front, hence we need reverse . map reverse at the end.
> chunks 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
> chunks 5 ["abc","cd","abcdef","ab"]
[["abc","cd"],["ab"]]
Here is an elementary approach. First, the type String doesn't matter, so we can define our function in terms of a general type a:
breakup :: [a] -> [[a]]
I'll illustrate with a limit of 3 instead of 10. It'll be obvious how to implement it with another limit.
The first pattern will handle lists which are of size >= 3, and the second pattern handles all of the other cases:
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
It is important to have the patterns in this order. That way the second pattern will only be used when the first pattern does not match, i.e. when there are fewer than 3 elements in the list.
Examples of running this on some inputs:
breakup [1..5] -> [ [1,2,3], [4,5] ]
breakup [1..4] -> [ [1,2,3], [4] ]
breakup [1..2] -> [ [1,2] ]
breakup [1..3] -> [ [1,2,3], [] ]
We see there is an extra [] when we run the function on [1..3]. Fortunately this is easy to fix by inserting another rule before the last one:
breakup [] = []
The complete definition is:
breakup :: [a] -> [[a]]
breakup [] = []
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
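The answer leaves the limit of 3 hard-coded. Passing it as a parameter is straightforward with splitAt; this is a sketch with a hypothetical name, breakupN. Like the answer's version, it limits the number of elements per group rather than their total length in characters.

```haskell
-- Hypothetical name. Split a list into consecutive groups of n elements;
-- the last group may be shorter. n must be positive, otherwise the
-- recursion would never make progress.
breakupN :: Int -> [a] -> [[a]]
breakupN n xs
  | n <= 0 = error "breakupN: group size must be positive"
  | otherwise = go xs
  where
    go [] = []
    go ys = let (h, t) = splitAt n ys in h : go t
```

breakupN 3 [1..5] gives [[1,2,3],[4,5]], and breakupN 3 [1..3] gives [[1,2,3]] without the extra empty list.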

String letter percentages in Haskell

I'm trying to write a Haskell function that will take a String say "PLATYPUS" and will return the relative percentages of Characters in that word i.e. characterPercentages "PLATYPUS" would return: [(P,25),(A,13),(L,13),(S,13),(T,13),(U,13),(Y,13)]. I know I can use tuples, but after that I'm a bit stumped?
First, you need to understand what you are going to get.
As I understand, you wish to have
type String = [Char] --already in Prelude
String -> [(Char,Int)]
"PLATYPUS" -=> [('P',2),('A',1),('L',1),('S',1),('T',1),('U',1),('Y',1)]
You could combine group from Data.List (which groups equal adjacent elements of a list) with mapping the length function:
String -> [[Char]]
[[Char]] -> [(Char,Int)]
UPDATED
For the first part, counting letters, we can do the following:
> :m Data.List
> map (\c -> (head c, length c)) $ group $ sort "PLATYPUSAAA"
[('A',4),('L',1),('P',2),('S',1),('T',1),('U',1),('Y',1)]
To find relative numbers, we change length c to 100 * (length c) `div` ls:
> let frqLetters s = let ls = length s in map (\c -> (head c, 100 * (length c) `div` ls)) $ group $ sort s
> frqLetters "PLATYPUSAAA"
[('A',36),('L',9),('P',18),('S',9),('T',9),('U',9),('Y',9)]
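Note that div truncates, so for "PLATYPUS" a letter that occurs once gets 12 rather than the 13 the question expects (1/8 is 12.5%). The sketch below rounds halves up using integer arithmetic and sorts by descending percentage, as in the question's expected output. characterPercentages is the question's name; the rounding rule and the ordering are assumptions read off that expected output.

```haskell
import Data.List (group, sort, sortOn)
import Data.Ord (Down (..))

-- Percentage of each character in s, rounded half up, most frequent first.
-- (200 * count + total) `div` (2 * total) is floor(100 * count / total + 1/2).
characterPercentages :: String -> [(Char, Int)]
characterPercentages s =
  sortOn (Down . snd)
    [(c, (200 * length g + total) `div` (2 * total)) | g@(c : _) <- group (sort s)]
  where
    total = length s
```

characterPercentages "PLATYPUS" gives [('P',25),('A',13),('L',13),('S',13),('T',13),('U',13),('Y',13)]. Because sortOn is stable, letters with equal percentages stay in alphabetical order.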
