Haskell: Pattern Matching to combine String - haskell

I'm trying to write a function which adds single characters from a string to a list of strings, for instance
combine ", !" ["Hello", "", "..."] = ["Hello,", " ", "...!"]
I've tried this:
combine :: String -> [String] -> [String]
combine (y:ys) (x:xs) =
[x:y, combine ys xs]

A simple one would be
combine :: [Char] -> [String] -> [String]
combine [] _ = []
combine _ [] = []
combine (c:cs) (x:xs) = x ++ [c] : combine cs xs
Or even more simply using zipWith
combine :: [Char] -> [String] -> [String]
combine = zipWith (\c x -> x ++ [c])
I had to do a bit extra to get this to work. I'll break it down for you.
First, I specified the type of the function as [Char] -> [String] -> [String]. I could have used String for the first argument, but what you're operating on conceptually is a list of characters and a list of strings, not a string and a list of strings.
Next, I had to specify the edge cases for this function. What happens when either argument is the empty list []? The easy answer is to just end the computation then, so we can write
combine [] _ = []
combine _ [] = []
Here the _ is matching anything, but throwing it away because it isn't used in the return value.
Next, for the actual body of the function, we want to take the first character and the first string, then append that character to the end of the string:
combine (c:cs) (x:xs) = x ++ [c]
But this doesn't do anything with cs or xs, the rest of our lists (and won't even compile with the type signature above). We need to keep going, and since we're generating a list, this is normally done with the prepend operator :
combine (c:cs) (x:xs) = x ++ [c] : combine cs xs
However, this is such a common pattern that there is a helper function called zipWith that handles the edge cases for us. Its type signature is
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
It walks down both input lists simultaneously, passing the corresponding elements into the provided function. Since the function we want to apply is \c x -> x ++ [c] (turned into a lambda function), we can drop it in to zipWith as
combine cs xs = zipWith (\c x -> x ++ [c]) cs xs
But Haskell will let us drop arguments when possible, so we can eta reduce this to
combine :: [Char] -> [String] -> [String]
combine = zipWith (\c x -> x ++ [c])
And that's it!
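One thing worth seeing in action is how the zipWith version behaves when the two lists have different lengths. The sketch below repeats the definition from this answer; the sample inputs and the names sameLength, extraChars and extraStrings are made up for illustration. zipWith stops as soon as either list runs out, which is the same behaviour the two explicit base cases give.

```haskell
-- Same definition as in the answer above.
combine :: [Char] -> [String] -> [String]
combine = zipWith (\c x -> x ++ [c])

-- Hypothetical sample calls (names and inputs are illustrative only).
sameLength, extraChars, extraStrings :: [String]
sameLength   = combine ", !" ["Hello", "", "..."]  -- ["Hello,", " ", "...!"]
extraChars   = combine "abc" ["x"]                 -- ["xa"]; 'b' and 'c' are dropped
extraStrings = combine "a" ["x", "y", "z"]         -- ["xa"]; "y" and "z" are dropped
```

If silently dropping the leftovers is not what you want, you would need to write the recursion by hand and decide what the two base cases should return.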

When you want to combine lists element by element, it is usually a zip you are looking at. In this case, you know exactly how you want to combine the elements – that makes it a zipWith.
zipWith takes a "combining function" and then creates a function that combines two lists using said combining function. Let's call your "combining" function append, because it adds a character to the end of a string. You can define it like this:
append char string = string ++ [char]
Do you see how this works? For example,
append 'e' "nic" = "nice"
or
append '!' "Hello" = "Hello!"
Now that we have that, recall that zipWith takes a "combining function" and then creates a function that combines two lists using that function. So your function is then easily implemented as
combine = zipWith append
and it will do append on each of the elements in order in the lists you supply, like so:
combine ", !" ["Hello", "", "..."] = ["Hello,", " ", "...!"]

You are close. There are a couple issues with what you have.
y has type Char, and x has type String which is an alias for [Char]. This means that you can add y to the top of a list with y : x, but you can't add y to the end of a list using the same : operator. Instead, you make y into a list and join the lists.
x ++ [y]
There must also be a base case, or this recursion will continue until it has no elements in either list and crash. In this case, we likely don't have anything we want to add.
combine [] [] = []
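Finally, once we have created the new element, we want to add it to the front of the list built from the remaining items, so we use : to cons it on. Note that in the code below the names are swapped relative to your attempt: x is the character and y is the string, so the new element is y ++ [x].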
combine :: String -> [String] -> [String]
combine [] [] = []
combine (x : xs) (y : ys) = (y ++ [x]) : (combine xs ys)
One note about this code: if the number of characters in your string ever differs from the number of strings in your list, this will crash with a pattern-match failure. You can handle that case in a number of ways; bheklilr's answer addresses this.
kqr's answer also works perfectly and is probably the best one to use in practice.

Related

Number each list element and format text in haskell

I want to give each one a number from 1 to length(x:xs), like a book's page number. Unfortunately it only works backwards.
numberL :: [String] -> [String]
numberL [] = []
numberL (x:xs) = ([show (length(x:xs)) ++ ": " ++ x] ++ numberL (xs))
Also how do I remove any new line and tab from the text and replace it with the actual new line and tabulator?
Prelude has several built-in functions that are well worth learning and using. zip and zipWith are two of them; reach for them whenever you want to combine two lists into one result list:
[1..] will generate the list of indices for you, it's an infinite list
appendIndex :: String -> Int -> String
appendIndex s i = show i ++ ": " ++ s
indexThem :: [String] -> [String]
indexThem l = zipWith appendIndex l [1..]
If you want to use zip, which is more basic but a little more verbose:
appendIndex :: (String,Int) -> String
appendIndex (s,i) = show i ++ ": " ++ s
indexThem :: [String] -> [String]
indexThem l = fmap appendIndex $ zip l [1..]
-- if you don't know about Functors yet, `fmap` is the generic way of doing `map`
To get it right, it's important to understand why you're thinking wrong. Your recursion looks like this:
numberL (x:xs) = ... ++ numberL xs
So you calculate numberL xs and then put something in front of it. If numberL xs were correct, then it would be numbered from 1 onwards, like: 1:..., 2:..., 3:.... So you could never build numberL (x:xs) from numberL xs just by adding new elements at the front. The whole numbering would be wrong. Instead you'd have to change the whole numbering of numberL xs.
The problem therefore is that it's not very useful to know numberL xs in order to calculate numberL (x:xs), due to the fact numberL always starts numbering from 1.
So lift that restriction. Build a function that numbers starting at n,
numberLFrom :: Int -> [String] -> [String]
numberLFrom n [] = ...
numberLFrom n (x:xs) = ...
Now the question you have to ask yourself is: in order to number (x:xs) starting at n, at which number do you need to start numbering xs? And then how do you add the numbered x to that result?
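If you want to check your own attempt afterwards, here is one possible way to fill in the blanks. It is only a sketch: it assumes the "n: text" format from the question, numbers the tail starting at n + 1, and then recovers numberL by starting at 1.

```haskell
-- Number a list of strings, starting at the given number.
numberLFrom :: Int -> [String] -> [String]
numberLFrom _ []       = []
numberLFrom n (x : xs) = (show n ++ ": " ++ x) : numberLFrom (n + 1) xs

-- The original function is the special case that starts at 1.
numberL :: [String] -> [String]
numberL = numberLFrom 1
```

For example, numberL ["a", "b", "c"] gives ["1: a", "2: b", "3: c"].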

Haskell filter out circular permutations

You have a list with N elements
You only want to print elements that are not circular permutations of other elements of the same list
To check if two strings are the circular permutations of each other I do this, which works fine :
string1 = "abc"
string2 = "cab"
stringconc = string1 ++ string1
if string2 `isInfixOf` stringconc
then -- it's a circular permutation
else -- it's not
Edit: As one comment pointed out, this test only works for strings of the same length
Back to the real use case :
checkClean :: [String] -> [String] -> IO String
checkClean [] list = return ""
checkClean (x:xs) list = do
let sequence = cleanInfix x list
if sequence /= "abortmath"
then putStr sequence
else return ()
checkClean xs list
cleanInfix :
cleanInfix :: String -> [String] -> String
cleanInfix seq [] = seq
cleanInfix seq (x:xs) = do
let seqconc = x ++ x
if seq `isInfixOf` seqconc && seq /= x
then "abortmath"
else cleanInfix seq xs
However this just outputs... nothing
With some research I found out that sequence in checkClean is always "abortmath"
Also I'm not quite comfortable with this "flag" abortmath, because if by any chance one element of the list is "abortmath", well..
For example :
if I have a list composed of :
NUUNNFFUF
FFUFNUUNN
I should write
NUUNNFFUF
I guess you call your initial code (question) with something like that:
result = ["NUUNNFFUF", "FFUFNUUNN"]
main = do
checkClean result result
It won't print anything because:
the first call of cleanInfix has the following arguments: "NUUNNFFUF" and ["NUUNNFFUF", "FFUFNUUNN"]
in cleanInfix, since seq == x you have a recursive call with the following arguments: "NUUNNFFUF" and ["FFUFNUUNN"]
now, "NUUNNFFUF" is a real permutation of "FFUFNUUNN": cleanInfix returns "abortmath", and checkClean returns ()
then you have a recursive call of checkClean with following arguments: "FFUFNUUNN" and ["NUUNNFFUF", "FFUFNUUNN"]
again, "FFUFNUUNN" is a real permutation of "NUUNNFFUF": cleanInfix returns "abortmath", and checkClean returns ()
this is the end.
Basically, x is a permutation of y and y is a permutation of x, thus x and y are discarded.
Your answer works, but it is horribly complicated.
I won't try to improve either version of your code, but I will make a general comment: you should (you really should) avoid returning a monadic value when you don't need to. In the question, checkClean just needs to remove duplicates (or "circular duplicates") from a list. That is a pure computation: you have all the information you need. So remove those dos, lets and returns!
Now, let's try to focus on this:
You have a list with N elements. You only want to print elements that are not circular permutations of other elements of the same list.
Why don't you use your initial knowledge on circular permutations?
isCircPermOf x y = x `isInfixOf` (y ++ y)
Now, you need a function that takes a sequence and a list of sequences, and returns only the elements of the list that are not circular permutations of the sequence:
filterCircDuplicates :: String -> [String] -> [String]
filterCircDuplicates seq [] = []
filterCircDuplicates seq (x:xs) =
  if seq `isCircPermOf` x
    then filterCircDuplicates seq xs
    else x:filterCircDuplicates seq xs
This pattern is well known, and you can use filter to simplify it:
filterCircDuplicates seq l = filter (\x -> not (seq `isCircPermOf` x)) l
Or better:
filterCircDuplicates seq = filter (not.isCircPermOf seq)
Note the signature: not . isCircPermOf seq :: String -> Bool. It returns True if the current element is not a circular permutation of seq. (You don't have to add the list argument.)
Final step: you need a function that takes a list and returns this list without (circular) duplicates.
removeCircDuplicates :: [String] -> [String]
removeCircDuplicates [] = []
removeCircDuplicates (x:xs) = x:filterCircDuplicates x (removeCircDuplicates xs)
When your list has a head and a tail, you clean the tail, then remove the circular duplicates of the head from the cleaned tail, and keep the head at the front.
Again, you have a well known pattern, a fold:
removeCircDuplicates = foldr (\x acc -> x:filterCircDuplicates x acc) []
It removes the duplicates from right to left.
And if you want a one-liner:
Prelude Data.List> foldr (\x -> ((:) x).filter(not.(flip isInfixOf (x++x)))) [] ["abcd", "acbd", "cdab", "abdc", "dcab"]
["abcd","acbd","abdc"]
The wonders you can make with a pen and some paper...
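Since the question's edit notes that the infix test is only valid for strings of the same length, here is a sketch that puts the pieces of this answer together with that extra check. The length comparison is my addition, not part of the answer above; everything else follows the same fold.

```haskell
import Data.List (isInfixOf)

-- x is a circular permutation (rotation) of y.
-- The length check is an added assumption: without it, "ab" would be
-- reported as a rotation of "abab".
isCircPermOf :: String -> String -> Bool
isCircPermOf x y = length x == length y && x `isInfixOf` (y ++ y)

-- Drop every element of the list that is a rotation of seq.
filterCircDuplicates :: String -> [String] -> [String]
filterCircDuplicates seq' = filter (not . isCircPermOf seq')

-- Keep the first representative of each rotation class,
-- working from right to left as in the fold above.
removeCircDuplicates :: [String] -> [String]
removeCircDuplicates = foldr (\x acc -> x : filterCircDuplicates x acc) []
```

With the list from the question, removeCircDuplicates ["NUUNNFFUF", "FFUFNUUNN"] gives ["NUUNNFFUF"]. The parameter is called seq' only to avoid shadowing the Prelude function seq.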
So if anyone is interested here is how I solved it, it's probably badly optimised but at least it works (I'm just trying to learn haskell, so it's good enough for now)
-- cleanInfix function
cleanInfix :: String -> [String] -> [String] -> [String]
cleanInfix sequence [] cleanlist = cleanlist
cleanInfix sequence (x:xs) cleanlist = do
-- this is where I check for the circular permutation
let sequenceconc = x ++ x
if sequence `isInfixOf` sequenceconc
then cleanInfix sequence xs (delete x cleanlist)
else cleanInfix sequence xs cleanlist
-- checkClean Function
checkClean :: [String] -> [String] -> [String] -> [String]
checkClean [] listesend cleanlist = cleanlist
checkClean (x:xs) listesend cleanlist = do
-- The first delete is to avoid checking if an element is the circular permutation of... itself, because it obviously is... in some way
let liste2 = cleanInfix x (delete x listesend) cleanlist
checkClean xs (delete x listesend) liste2
-- Clean function, first second and third are the command line argument don't worry about them
clean first second third = do
-- creation of the result list by asking the user for input
let printlist = checkClean result result result -- yes, it's the same list, three times
print printlist -- print the list

How to "pack" some strings in a list on Haskell?

I want to write a function pack such that
pack ['a','a','a','b','c','c','a','a','d','e','e','e']
= ["aaa","b","cc","aa","d","eee"]
How can I do this? I'm stuck...
Use Data.List.group:
λ> import Data.List (group)
λ> :t group
group :: Eq a => [a] -> [[a]]
λ> group ['a','a','a','b','c','c','a','a','d','e','e','e']
["aaa","b","cc","aa","d","eee"]
Unless you want to write the function yourself (see Michael Foukarakis's answer)
Here's something off the top of my head:
pack :: (Eq a) => [a] -> [[a]]
pack [] = []
-- We split elements of a list recursively into those which are equal to the first one,
-- and those that are not. Then do the same for the latter:
pack (x:xs) = let (first, rest) = span (==x) xs
in (x:first) : pack rest
Data.List already has what you're looking for, though.
I think it's worth adding a more explicit/beginner version:
pack :: [Char] -> [String]
pack [] = []
pack (c:cs) =
let (v, s) = findConsecutive [c] cs
in v : pack s
where
findConsecutive ds [] = (ds, [])
findConsecutive s@(d:ds) t@(e:es)
| d /= e = (s, t)
| otherwise = findConsecutive (e:s) es
If the input is an empty list, the outcome is also an empty list. Otherwise, we find the next consecutive Chars that are equal and group them together into a String, which is returned in the result list. In order to do that we use the findConsecutive auxiliary function. This function's behavior resembles the takeWhile function, with the difference that we know in advance the predicate to use (equality comparison) and that we return both the consumed and the remaining list.
In other words, the signature of findConsecutive could be written as:
findConsecutive :: String -> [Char] -> (String, String)
which means that it takes a string containing only repeated characters to be used as an accumulator and a list from which characters are "extracted". It returns a tuple containing the current sequence of elements and the remaining list. Its body should be intuitive to follow: while the list of characters is not empty and the current element is equal to the ones in the accumulator, we add the character to the accumulator and recurse. The function returns when we reach the end of the list or a different character is encountered.
The same rationale can be used to understand the body of pack.
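To convince yourself that the explicit version behaves like the library function, you can compare the two directly. The sketch below restates the definition with as-patterns and layout restored; the extra catch-all clause and the helper agreesWithGroup are my additions for illustration, not part of the answer.

```haskell
import Data.List (group)

pack :: String -> [String]
pack [] = []
pack (c : cs) =
  let (run, rest) = findConsecutive [c] cs
   in run : pack rest
  where
    -- Accumulate characters equal to the head of the accumulator.
    findConsecutive acc [] = (acc, [])
    findConsecutive acc@(d : _) rest@(e : es)
      | d /= e    = (acc, rest)
      | otherwise = findConsecutive (e : acc) es
    -- Never reached, since the accumulator always starts non-empty;
    -- added only to make the helper total.
    findConsecutive [] rest = ([], rest)

-- Hypothetical helper: does pack agree with Data.List.group on this input?
agreesWithGroup :: String -> Bool
agreesWithGroup s = pack s == group s
```

Consing onto the accumulator builds each run in reverse, but that is harmless here because every character in a run is the same.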

Why can't I pattern match on the concatenation function (++) in Haskell?

I'm trying to match **String Newline String** pattern in a function Split.
split::String -> [String]
split[] = []
split (x++'\n':xs) = [x]++split(xs)
I'm getting this error:
Parse error in pattern: x ++ ('\n' : xs)
What am I doing wrong here?
I know there are other ways of achieving the same result but I'd like to understand what's wrong with this pattern. I'm fairly new to Haskell BTW.
One problem (as I understand it) is that ++ is not a constructor of the list data type the way : is. You can think of the list data type being defined as
data [a] = [] | a : [a]
Where : is a constructor that prepends an element to the front of a list. However, ++ is a function (defined in the documentation here: http://hackage.haskell.org/package/base-4.8.1.0/docs/src/GHC.Base.html#%2B%2B) as
(++) :: [a] -> [a] -> [a]
(++) [] ys = ys
(++) (x:xs) ys = x : xs ++ ys
We could define our own data type list like
data List a = Empty | Cons a (List a)
That would mimic the behavior of our familiar list. In fact, you could use (Cons val) in a pattern. I believe you could also define a type with a concat constructor like so
data CList a = Empty | Cons a (CList a) | Concat (CList a) (CList a)
Which you could use to lazily concatenate two lists and defer joining them into one. With such a data type you could pattern match against the Concat xs ys input, but that would only work on the boundary of two lists and not in the middle of one.
Anyway I'm still fairly new to Haskell myself but I hope this is on point.
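To make the CList idea concrete, here is a small sketch. The type is the one proposed above; the functions toList and boundary and the value example are hypothetical names made up for illustration. It shows that Concat can be matched because it is a constructor, and that the match only succeeds at the outermost join.

```haskell
data CList a
  = Empty
  | Cons a (CList a)
  | Concat (CList a) (CList a)

-- Flatten a CList into an ordinary list; this is where the deferred
-- concatenation finally happens.
toList :: CList a -> [a]
toList Empty          = []
toList (Cons x xs)    = x : toList xs
toList (Concat xs ys) = toList xs ++ toList ys

-- Pattern matching on Concat recovers the two halves, but only when
-- the value was built with Concat at the top level.
boundary :: CList a -> Maybe (CList a, CList a)
boundary (Concat xs ys) = Just (xs, ys)
boundary _              = Nothing

-- "ab" joined with "c", kept as two separate pieces.
example :: CList Char
example = Concat (Cons 'a' (Cons 'b' Empty)) (Cons 'c' Empty)
```

Here boundary example yields the pieces "ab" and "c", while boundary (Cons 'a' Empty) is Nothing: there is no way to ask for a split in the middle of a Cons chain.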
Imagine you could. Then matching "a\nb\nc" could produce x = "a", xs = "b\nc" or x = "a\nb", xs = "c" and you'd need some ad hoc rule to decide which to use. Matching against functions is also impossible to reasonably implement in general: you need to find an x given f x, and there is no way to do this other than trying all possible x.
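For completeness, the split the question was aiming for can be written with ordinary constructor patterns once the "ad hoc rule" is made explicit. The sketch below assumes the rule "split at the first newline" and uses break from the Prelude; the name splitOnNewline is made up for illustration. For typical input it gives the same result as the Prelude function lines.

```haskell
-- Split a string at each newline, choosing the first newline every time.
splitOnNewline :: String -> [String]
splitOnNewline [] = []
splitOnNewline s =
  case break (== '\n') s of
    -- break stops at the first newline, so the remainder starts with it;
    -- '\n' : rest is a legal pattern because (:) is a constructor.
    (x, '\n' : rest) -> x : splitOnNewline rest
    -- No newline left: the whole remainder is the last piece.
    (x, _)           -> [x]
```

For example, splitOnNewline "a\nb\nc" gives ["a", "b", "c"].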

mapping over entire data set to get results

Suppose I have the arrays:
A = "ABACUS"
B = "YELLOW"
And they are zipped so: Pairing = zip A B
I also have a function Connect :: Char -> [(Char,Char)] -> [(Char,Char,Int)]
What I want to do is given a char such as A, find the indices of where it is present in the first string and return the character in the same positions in the second string, as well as the position e.g. if I did Connect 'A' Pairing I'd want (A,Y,0) and (A,L,2) as results.
I know I can do
pos = x!!map fst pairing
to retrieve the positions. And fnd = findIndices (==pos) map snd pairing to get what's in this position in the second string but in Haskell how would I do this over the whole set of data (as if I were using a for loop) and how would I get my outputs?
To do exactly as you asked (but correct the initial letter of function names to be lowercase), I could define
connect :: Char -> [(Char,Char)] -> [(Char,Char,Int)]
connect c pairs = [(a,b,n)|((a,b),n) <- zip pairs [0..], a == c]
so if
pairing = zip "ABACUS" "YELLOW"
we get
ghci> connect 'A' pairing
[('A','Y',0),('A','L',2)]
However, I think it'd be neater to zip once, not twice, using zip3:
connect3 :: Char -> String -> String -> [(Char,Char,Int)]
connect3 c xs ys = filter (\(a,_,_) -> a==c) (zip3 xs ys [0..])
which is equivalent to
connect3' c xs ys = [(a,b,n)| (a,b,n) <- zip3 xs ys [0..], a==c]
they all work as you wanted:
ghci> connect3 'A' "ABACUS" "YELLOW"
[('A','Y',0),('A','L',2)]
ghci> connect3' 'A' "ABACUS" "AQUAMARINE"
[('A','A',0),('A','U',2)]
In comments, you said you'd like to get pairs for matches the other way round.
This time, it'd be most convenient to use the monadic do notation, since lists are an example of a monad.
connectEither :: (Char,Char) -> String -> String -> [(Char,Char,Int)]
connectEither (c1,c2) xs ys = do
(a,b,n) <- zip3 xs ys [0..]
if a == c1 then return (a,b,n) else
if b == c2 then return (b,a,n) else
fail "Doesn't match - leave it out"
I've used the fail function to leave out ones that don't match. The three lines starting if, if and fail are increasingly indented because they're actually one line from Haskell's point of view.
ghci> connectEither ('a','n') "abacus" "banana"
[('a','b',0),('a','n',2),('n','u',4)]
In this case, it hasn't included ('n','a',2) because it's only checking one way.
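If you would rather not rely on fail in the list monad, the same one-way check can be sketched with guards that return either a one-element list or an empty list. The name connectEither' and the helper pick are made up for illustration; the behaviour is meant to match connectEither above.

```haskell
connectEither' :: (Char, Char) -> String -> String -> [(Char, Char, Int)]
connectEither' (c1, c2) xs ys = concatMap pick (zip3 xs ys [0 ..])
  where
    -- Each position contributes at most one triple.
    pick (a, b, n)
      | a == c1   = [(a, b, n)]  -- match on the first string
      | b == c2   = [(b, a, n)]  -- otherwise, match on the second string
      | otherwise = []           -- no match: leave it out
```

The empty list plays the role that fail played in the do block.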
We can allow both ways by reusing existing functions:
connectBoth :: (Char,Char) -> String -> String -> [(Char,Char,Int)]
connectBoth (c1,c2) xs ys = lefts ++ rights where
lefts = connect3 c1 xs ys
rights = connect3 c2 ys xs
which gives us everything we want to get:
ghci> connectBoth ('a','n') "abacus" "banana"
[('a','b',0),('a','n',2),('n','a',2),('n','u',4)]
but unfortunately it lists some things more than once:
ghci> connectBoth ('A','A') "Austria" "Antwerp"
[('A','A',0),('A','A',0)]
So we can get rid of that using nub from Data.List. (Add import Data.List at the top of your file.)
connectBothOnce (c1,c2) xs ys = nub $ connectBoth (c1,c2) xs ys
giving
ghci> connectBothOnce ('A','A') "ABACUS" "Antwerp"
[('A','A',0),('A','t',2)]
I would recommend not zipping the lists together, since that'd just make it more difficult to use the function elemIndices from Data.List. You then have a list of the indices that you can use directly to get the values out of the second list.
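A minimal sketch of that idea, assuming you keep the two strings separate: elemIndices finds the positions in the first string, and each position is then looked up in the second. The name connectByIndex is made up for illustration, and take 1 (drop i ys) is used instead of !! so that a shorter second string does not cause an error.

```haskell
import Data.List (elemIndices)

connectByIndex :: Char -> String -> String -> [(Char, Char, Int)]
connectByIndex c xs ys =
  [ (c, y, i)
  | i <- elemIndices c xs      -- every position of c in the first string
  , y <- take 1 (drop i ys)    -- the character at that position, if any
  ]
```

For example, connectByIndex 'A' "ABACUS" "YELLOW" gives [('A','Y',0),('A','L',2)].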
You can add indices with another zip, then filter on the given character and convert tuples to triples. Especially because of this repackaging, a list comprehension seems appropriate:
connect c pairs = [(a, b, idx) | ((a, b), idx) <- zip pairs [0..], a == c]
