Complex pattern matching with strings

I have a list of strings that looks like this:
xs = ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
I would like to find only the strings in the list which have a vowel followed by two b's followed by any character followed by a vowel. How are simple matches like this done in Haskell? Is there a better solution than regular expressions? Can anyone help me with an example? Thanks.

You could just use the classic filter function in conjunction with any regexp library. Your pattern is simple enough that this would work with any of them:
filter (=~ "[aeiou]bb.[aeiou]") xs
The confusing part of regexps in Haskell is that there is a very powerful generic API (in regex-base) that works the same way across all the specific libraries and supports whatever result type you might want (Bool, String, Int...). For basic usage it should mostly do what you mean (tm). For your specific need, regex-posix should be sufficient (it comes with the Haskell Platform, so normally there is no need to install it). So don't forget to import it:
import Text.Regex.Posix
This tutorial should show you the basics of the regex API if you have other needs. It is a bit outdated now, but the fundamentals remain the same; only details of regex-base have changed.

One approach would be to build a small pattern-matching language and to embed it in Haskell.
In your example, a pattern is basically a list of character specifications. Let's define a type of abstract characters the values of which will serve as such specifications,
data AbsChar = Exactly Char | Vowel | Any
together with an "interpreter" that tells us whether a character matches a specification:
(=?) :: AbsChar -> Char -> Bool
Exactly c' =? c = c == c'
Vowel =? c = c `elem` "aeiou"
Any =? c = True
For example, Vowel =? 'x' will produce False, while Vowel =? 'a' will produce True.
Then, indeed, a pattern is just a list of abstract characters:
type Pattern = [AbsChar]
Next, we write a function that tests whether the prefix of a string matches a given pattern:
matchesPrefix :: Pattern -> String -> Bool
matchesPrefix [] _ = True
matchesPrefix (a : as) (c : cs) = a =? c && matchesPrefix as cs
matchesPrefix _ _ = False
For example:
> matchesPrefix [Vowel, Exactly 'v'] "eva"
True
> matchesPrefix [Vowel, Exactly 'v'] "era"
False
As we do not want to restrict ourselves to matching prefixes, but rather match anywhere within a word, our next function matches the prefixes of every end segment of a string:
containsMatch :: Pattern -> String -> Bool
containsMatch pat = any (matchesPrefix pat) . tails
It uses the function tails which can be found in the module Data.List, but which we can, to make this explanation self-contained, easily define ourselves as well:
tails :: [a] -> [[a]]
tails [] = [[]]
tails l@(_ : xs) = l : tails xs
For example:
> tails "xabbaua"
["xabbaua","abbaua","bbaua","baua","aua","ua","a",""]
Now, finally, the function you were looking for, that selects all strings from a list that contain a matching segment, is written simply as:
select :: Pattern -> [String] -> [String]
select = filter . containsMatch
Let's test it on your example:
> let pat = [Vowel, Exactly 'b', Exactly 'b', Any, Vowel]
> select pat ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
["xabbaua"]

Well, you can try this function, although it may not be the best method (it needs tails from Data.List):
elem' :: String -> String -> Bool
elem' p xs = any (p==) $ map (take $ length p) $ tails xs
Usage:
filter (elem' "bb") ["xxbbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "bbbaab"]
or
bbFilter = filter (elem' "bb")
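For a fixed literal such as "bb", Data.List already provides isInfixOf, which performs the same substring test as elem'. A minimal sketch, where the name containsBB is made up for this illustration:

```haskell
import Data.List (isInfixOf, tails)

-- elem' as defined in the answer above: does p occur anywhere inside xs?
elem' :: String -> String -> Bool
elem' p xs = any (p ==) $ map (take $ length p) $ tails xs

-- Hypothetical name: the same filter written with the library function.
containsBB :: [String] -> [String]
containsBB = filter ("bb" `isInfixOf`)
```

Like elem', this only finds the literal "bb"; it does not check the vowels around it, so it is a building block rather than a full answer to the question.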

Well, if you're absolutely opposed to doing it with regexes, you could do it with just pattern matching and recursion, although it is ugly.
xs = ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
vowel = "aeiou"
filter' strs = filter matches strs
matches [] = False
matches str@(x:'b':'b':_:y:xs)
  | x `elem` vowel && y `elem` vowel = True
  | otherwise = matches $ tail str
matches (x:xs) = matches xs
Calling filter' xs will return ["xabbaua"] which I believe is the required result.

Related

Is implementing the words function possible without a postprocessing step after folding?

Real World Haskell, chapter 4, page 98 of the printed version asks if words can be implemented using folds, and this is my question too:
Is it possible? If not, why? If it is, how?
I came up with the following, which is based on the idea that each non-space should be prepended to the last word in the output list (this happens in the otherwise guard), and that a space should trigger the appending of an empty word to the output list if there is not one already (this is handled in the if-then-else).
myWords :: String -> [String]
myWords = foldr step [[]]
  where
    step x yss@(y:ys)
      | x == ' ' = if y == "" then yss else "":yss
      | otherwise = (x:y):ys
Clearly this solution is wrong, since leading spaces in the input string result in one leading empty string in the output list of strings.
At the link above, I've looked into several of the proposed solutions for other readers, and many of them work similarly to my solution, but they generally "post-process" the output of the fold, for instance by tailing it if there is an empty leading string.
Other approaches use tuples (actually just pairs), so that the fold deals with the pair and can well handle the leading/trailing spaces.
In all these approaches, foldr (or another fold, fwiw) is not the function that provides the final output out of the box; there's always something else which has to adjust the output somehow.
Therefore I go back to the initial question and ask if it is actually possible to implement words (in a way that it correctly handles trailing/leading/repeated spaces) using folds. By using folds I mean that the folding function has to be the outermost function:
myWords :: String -> [String]
myWords input = foldr step seed input

If I understand correctly, your requirements include
(1) words "a b c" == words " a b c" == ["a", "b", "c"]
(2) words "xa b c" == ["xa", "b", "c"] /= ["x", "a", "b", "c"] == words "x a b c"
This implies that we can not have
words = foldr step base
for any step and base.
Indeed, if we had that, then
  words "xa b c"
= { definition of words and foldr }
  step 'x' (words "a b c")
= { by (1) }
  step 'x' (words " a b c")
= { definition of words and foldr }
  words "x a b c"
and this contradicts (2).
You definitely need some post-processing after the foldr.
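A sketch of what such a fold plus post-processing can look like, along the lines of the pair-based approaches mentioned in the question. The names wordsViaFold, step, flush and finish are invented here, and only the space character is treated as a separator (the Prelude's words splits on any whitespace):

```haskell
-- Hypothetical sketch: foldr over a pair, followed by a small finishing step.
wordsViaFold :: String -> [String]
wordsViaFold = finish . foldr step ("", [])
  where
    -- current is the word being built (the fold runs right to left),
    -- done is the list of words already completed.
    step c (current, done)
      | c == ' '  = ("", flush current done)
      | otherwise = (c : current, done)
    -- Only non-empty words are recorded, so repeated spaces add nothing.
    flush w ws = if null w then ws else w : ws
    -- Post-processing: the leftmost word may still be pending in the pair.
    finish (current, done) = flush current done
```

The finish step is exactly the part that the argument above says cannot be avoided: the fold alone returns a pair, not the list of words.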

@chi has a wonderful argument that you cannot implement words using "a" fold, but you did say using folds.
words = filterNull . words1
  where
    filterNull = foldr (\xs -> if null xs then id else (xs:)) []
    words1 = foldr (\c -> if c == ' ' then ([]:) else consHead c) []
    consHead c [] = [[c]]
    consHead c (xs:xss) = (c:xs):xss
Both the outermost and innermost function are folds. ;-)

Yes. Even though it's a little tricky, you can still do this job properly with a single foldr and nothing else if you delve into CPS (continuation-passing style). I had previously shown a special kind of chunksOf function this way.
In this kind of fold the accumulator, and hence the result of the fold, is a function, and we have to apply it to an identity-like input to get the final result. So this may or may not count as a final processing stage, since we are using a single fold here and its type includes the function. Open to debate :)
ws :: String -> [String]
ws str = foldr go sf str $ ""
  where
    sf :: String -> [String]
    sf s = if s == " " then [""] else [s]
    go :: Char -> (String -> [String]) -> (String -> [String])
    go c f = \pc -> let (s:ss) = f [c]
                    in case pc of
                         "" -> dropWhile (== "") (s:ss)
                         _  -> case (pc == " ", s == "") of
                                 (True, False) -> "":s:ss
                                 (True, True)  -> s:ss
                                 _             -> (pc++s):ss
λ> ws " a b c "
["a","b","c"]
sf : The initial function value to start with.
go : The iterator function
We are actually not fully utilizing the power of CPS here, since we have both the previous character pc and the current character c at hand in every turn. It was very useful in the chunksOf function mentioned above, which chunks an [Int] into [[Int]] every time an ascending sequence of elements is broken.
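To see the function-valued accumulator in isolation, here is a smaller sketch of the same idea. It is not taken from the chunksOf answer; dropRepeats and its helper names are made up for illustration. The fold builds a function, which is then applied to the initial "previous element", Nothing:

```haskell
-- Hypothetical example: remove adjacent duplicates with a single foldr
-- whose result is a function from the previous element to the output.
dropRepeats :: Eq a => [a] -> [a]
dropRepeats xs = foldr step (const []) xs Nothing
  where
    -- k is the function built for the rest of the list;
    -- prev is the element seen just before x, if any.
    step x k prev
      | prev == Just x = k prev            -- same as the previous one: skip it
      | otherwise      = x : k (Just x)    -- keep it and pass it along as the new prev
```

For instance, dropRepeats "aabbbcaa" evaluates to "abca". As with ws, whether applying the folded function to Nothing counts as post-processing is a matter of taste.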

Capitalizing first letter of words while removing spaces (Haskell)

I'm just starting out in Haskell and this is like the third thing I'm writing, so, naturally, I'm finding myself a little stumped.
I'm trying to write a bit of code that will take a string, delete the spaces, and capitalize the first letter of each word in that string.
For example, if I input "this is a test", I would like to get back something like: "thisIsATest"
import qualified Data.Char as Char
toCaps :: String -> String
toCaps [] = []
toCaps xs = filter(/=' ') xs
toCaps (_:xs) = map Char.toUpper xs
I think the method I'm using is wrong. With my code in this order, I am able to remove all the spaces using the filter function, but nothing becomes capitalized.
When I move the filter bit to the very end of the code, I am able to use the map Char.toUpper bit. When I map that function Char.toUpper, it just capitalizes everything "HISISATEST", for example.
I was trying to make use of an if function to say something similar to
if ' ' then map Char.toUpper xs else Char.toLower xs, but that didn't work out for me. I haven't utilized if in Haskell yet, and I don't think I'm doing it correctly. I also know using "xs" is wrong, but I'm not sure how to fix it.
Can anyone offer any pointers on this particular problem?

I think it might be better if you split the problem into smaller subproblems. First we can make a function that, for a given word will capitalize the first character. For camel case, we thus can implement this as:
import Data.Char(toUpper)
capWord :: String -> String
capWord "" = ""
capWord (c:cs) = toUpper c : cs
We can then use words to obtain the list of words:
toCaps :: String -> String
toCaps = go . words
  where go [] = ""
        go (w:ws) = concat (w : map capWord ws)
For example:
Prelude Data.Char> toCaps "this is a test"
"thisIsATest"
For Pascal case, we can make use of concatMap instead:
toCaps :: String -> String
toCaps = concatMap capWord . words

Inspired by this answer from Will Ness, here's a way to do it that avoids unnecessary Booleans and comparisons:
import qualified Data.Char as Char
toCaps :: String -> String
toCaps = flip (foldr go (const [])) id
  where go ' ' acc _ = acc Char.toUpper
        go x   acc f = f x:acc id
Or more understandably, but perhaps slightly less efficient:
import qualified Data.Char as Char
toCaps :: String -> String
toCaps = go id
  where go _ []       = []
        go _ (' ':xs) = go Char.toUpper xs
        go f (x  :xs) = f x:go id xs

There are a number of ways of doing it, but if I were trying to keep it as close to how you've set up your example, I might do something like:
import Data.Char (toUpper)
toCaps :: String -> String
toCaps [] = [] -- base case
toCaps (' ':c:cs) = toUpper c : toCaps cs -- throws out the space and capitalizes next letter
toCaps (c:cs) = c : toCaps cs -- anything else is left as is
This is just using basic recursion, dealing with a character (element of the list) at a time, but if you wanted to use higher-order functions such as map or filter that work on the entire list, then you would probably want to compose them (the way that Willem suggested is one way) and in that case you could probably do without using recursion at all.
It should be noted that this solution is brittle in the sense that it assumes the input string does not contain leading, trailing, or multiple consecutive spaces.
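One way to make the recursive version tolerate such input is to remember whether a space has just been skipped instead of looking at the next character directly. This is only a sketch; toCapsRobust and the pending flag are names assumed for this example:

```haskell
import Data.Char (toUpper)

-- Hypothetical variant: tolerates leading, trailing and repeated spaces.
toCapsRobust :: String -> String
toCapsRobust = go False . dropWhile (== ' ')
  where
    -- The Bool records whether at least one space was skipped
    -- since the last character that was emitted.
    go _ [] = []
    go _ (' ' : cs) = go True cs
    go pending (c : cs) = (if pending then toUpper c else c) : go False cs
```

With this definition, toCapsRobust "  this   is a test " gives "thisIsATest".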

Inspired by Joseph Sible's answer, a coroutine-style solution using two mutually recursive functions:
import Data.Char
toCamelCase :: String -> String
toCamelCase [] = []
toCamelCase (' ': xs) = toPascalCase xs
toCamelCase (x : xs) = x : toCamelCase xs
toPascalCase :: String -> String
toPascalCase [] = []
toPascalCase (' ': xs) = toPascalCase xs
toPascalCase (x : xs) = toUpper x : toCamelCase xs
Be careful to not start the input string with a space, or you'll get the first word capitalized as well.

Haskell - Exclude lists based on a test in a nested list comprehension

I want to create a series of possible equations based on a general specification:
test = ["12", "34=", "56=", "78"]
Each string (e.g. "12") represents the possible characters at that location, in this case '1' or '2'.
So possible equations from test would be "13=7" or "1=68".
I know the examples I give are not balanced but that's because I'm deliberately giving a simplified short string.
(I also know that I could use 'sequence' to search all possibilities but I want to be more intelligent so I need a different approach explained below.)
What I want is to try fixing each of the equals in turn and then removing all other equals in the equation. So I want:
[["12","=","56","78"],["12","34","=","78"]]
I've written this nested list comprehension:
(it needs: {-# LANGUAGE ParallelListComp #-} )
fixEquals :: [String] -> [[String]]
fixEquals re
  = [
      [
        if index == outerIndex then equals else remain
        | equals <- map (filter (== '=')) re
        | remain <- map (filter (/= '=')) re
        | index <- [1..]
      ]
      | outerIndex <- [1..length re]
    ]
This produces:
[["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
but I want to filter out any with empty lists within them. i.e. in this case, the first and last.
I can do:
countOfEmpty :: (Eq a) => [[a]] -> Int
countOfEmpty = length . filter (== [])
fixEqualsFiltered :: [String] -> [[String]]
fixEqualsFiltered re = filter (\x -> countOfEmpty x == 0) (fixEquals re)
so that "fixEqualsFiltered test" gives:
[["12","=","56","78"],["12","34","=","78"]]
which is what I want but it doesn’t seem elegant.
I can’t help thinking there’s another way to filter these out.
After all, it’s whenever "equals" is used in the if statement and is empty that we want to drop the equals, so it seems a waste to build the list (e.g. ["","34","56","78"]) and then ditch it.
Any thoughts appreciated.

I don't know if this is any cleaner than your code, but it might be a bit more clear and maybe more efficient using a recursion:
fixEquals = init . f
f :: [String] -> [[String]]
f [] = [[]]
f (x:xs) | '=' `elem` x = ("=":removeEq xs) : map (removeEq [x] ++) (f xs)
         | otherwise    = map (x:) (f xs)
removeEq :: [String] -> [String]
removeEq = map (filter (/= '='))
The way it works is that, if there's an '=' in the current string, then it splits the return into two, if not just calls recursively. The init is needed as in the last element returned there's no equal in any string.
Finally, I believe you can probably find a better data structure than a list of strings for what you need to achieve.

Let
xs = [["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
in
filter (not . any null) xs
will give
[["12","=","56","78"],["12","34","=","78"]]
If you want list comprehension then do
[x | x <- xs, and [not $ null y | y <- x]]

I think I'd probably do it this way. First, a preliminary that I've written so many times it's practically burned into my fingers by now:
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
  go _ [] = []
  go b (h:e) = (b,h,e):go (h:b) e
Probably running it once or twice in ghci will be a more clear explanation of what this does than any English writing I could do:
> zippers "abcd"
[("",'a',"bcd"),("a",'b',"cd"),("ba",'c',"d"),("cba",'d',"")]
In other words, it gives a way of selecting each element of a list in turn, giving the "leftovers" of what was before and after the selection point. Given that tool, here's our plan: we'll nondeterministically choose a String to serve as our equals sign, double-check that we've got an equals sign in the first place, and then clear out the equals from the others. So:
import Control.Monad (guard)
fixEquals ss = do
  (prefix, s, suffix) <- zippers ss
  guard ('=' `elem` s)
  return (reverse (deleteEquals prefix) ++ ["="] ++ deleteEquals suffix)
deleteEquals = map (filter ('='/=))
Let's try it:
> fixEquals ["12", "34=", "56=", "78"]
[["12","=","56","78"],["12","34","=","78"]]
Perfect! But this is just a stepping-stone to actually generating the equations, right? It turns out to be not that hard to go all the way in one step, skipping this intermediate. Let's do that:
equations ss = do
  (prefixes, s, suffixes) <- zippers ss
  guard ('=' `elem` s)
  prefix <- mapM (filter ('='/=)) (reverse prefixes)
  suffix <- mapM (filter ('='/=)) suffixes
  return (prefix ++ "=" ++ suffix)
And we can try it in ghci:
> equations ["12", "34=", "56=", "78"]
["1=57","1=58","1=67","1=68","2=57","2=58","2=67","2=68","13=7","13=8","14=7","14=8","23=7","23=8","24=7","24=8"]

The easiest way to achieve what you want is to create all the combinations and to keep only the ones that are meaningful:
Prelude> test = ["12", "34=", "56=", "78"]
Prelude> sequence test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8", ...]
Prelude> filter ((1==).length.filter('='==)) $ sequence test
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
You pointed out the drawback: imagine we have the following list of strings: ["=", "=", "0123456789", "0123456789"]. We will generate 100 combinations and drop them all.
You can look at the combinations as a tree. For the ["12", "34"], you have:
      /   \
     1     2
    / \   / \
   3   4 3   4
You can prune the tree: just ignore the subtrees when you have two = on the path.
Let's try to do it. First, a simple combinations function:
Prelude> :set +m
Prelude> let combinations :: [String] -> [String]
Prelude|     combinations [] = [""]
Prelude|     combinations (cs:ts) = [c:t | c<-cs, t<-combinations ts]
Prelude|
Prelude> combinations test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8", ...]
Second, we need a variable to store the current number of = signs met:
if we find a second = sign, just drop the subtree
if we reach the end of a combination with no =, drop the combination
That is:
Prelude> let combinations' :: [String] -> Int -> [String]
Prelude|     combinations' [] n = if n==1 then [""] else []
Prelude|     combinations' (cs:ts) n = [c:t | c<-cs, let p = n+(fromEnum $ c=='='), p <= 1, t<-combinations' ts p]
Prelude|
Prelude> combinations' test 0
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
We use p as the new number of = signs on the path: if p>1, drop the subtree.
If n is still zero when we reach the end of the list, there was no = sign on the path, so we drop the combination.
You may use the variable n to store more information, e.g. the type of the last char (to avoid +* sequences).
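Outside GHCi the same pruning can be written as an ordinary definition, with the counter hidden in a local helper. This is only a sketch; the name combinationsOneEq is an assumption of this example:

```haskell
-- Hypothetical wrapper around the pruned search shown above.
combinationsOneEq :: [String] -> [String]
combinationsOneEq = go 0
  where
    -- n is the number of '=' signs chosen so far on the current path.
    go :: Int -> [String] -> [String]
    go n [] = [ "" | n == 1 ]              -- keep only paths with exactly one '='
    go n (cs : ts) =
      [ c : t
      | c <- cs
      , let p = n + fromEnum (c == '=')
      , p <= 1                             -- prune as soon as a second '=' appears
      , t <- go p ts
      ]
```

Called on test, it should return the same list as filtering sequence test for exactly one '=', and on ["=", "=", "0123456789", "0123456789"] it returns [] without enumerating any of the 100 digit pairs.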

Haskell : Filtering a list of strings

I have a list of Strings I want to filter through. My predicate is that the string should begin with a lowercase letter.
E.g. when I run onlyLowercase ["boy", "girl", "Hi"]
it should give me a list of ["boy", "girl"]
I can do it using pattern matching and guards, but I'm using the learnyouahaskell (http://learnyouahaskell.com) book and I came across the topic on higher-order functions. I read about the filter function and thought it could achieve what I want to do in far fewer lines of code.
Using pattern Matching/Guards (This works well and solves my problem)
onlyLowercase :: [[Char]] -> [[Char]]
onlyLowercase [] = []
onlyLowercase (x:xs)
  | isLower (head x) = x : onlyLowercase xs
  | otherwise = onlyLowercase xs
Using the filter function
onlyLowercase2 :: [String] -> [String]
onlyLowercase2 [] = []
onlyLowercase2 (x:xs) = filter isLower x : onlyLowercase2 xs
Unfortunately, when I run onlyLowercase2 ["boy", "girl", "Hi"],
I get a list of ["boy", "girl", "i"].
I want to know if there's a way I can filter my list of strings using the first character in my string (without creating any auxiliary function that could check the String and return true if the first letter is lowercase).
I also tried using
onlyLowercase2 (x:xs) = filter (isLower head x) : onlyLowercase2 xs
but that didn't even compile. Basically, I'm just trying to figure out how the filter function can be used on a list of lists. Thank you, in advance, for any assistance rendered.

Thanks to Willem Van Onsem's suggestion to use a lambda expression as a filter function, I read further and came up with this 2 line solution.
onlyLowercase2 :: [String] -> [String]
onlyLowercase2 = filter (\st-> ("" /= st) && (isLower $ head st))
Not sure if it's perfect, but at least it's working.

Using Data.List and Data.Char:
import Data.List
import Data.Char
onlyLowerCase :: [String] -> [String]
onlyLowerCase = filter (all isLower)
I use the all function which checks that all elements of a list satisfy a predicate. In this case all isLower will return true if all letters in a String are lowercase. Then just filter the Strings that are all lowercase. The Haskell Report has a good reference for List and Char functions among other useful libraries.
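Note that all isLower tests every character, whereas the question asks only about the first one. A possible way to test just the first character without head and without a named helper is sketched below; onlyFirstLower is a name invented for this example:

```haskell
import Data.Char (isLower)

-- Hypothetical name: keep the strings whose first character is lowercase.
-- take 1 yields "" for an empty string, so any returns False there
-- and head is never needed.
onlyFirstLower :: [String] -> [String]
onlyFirstLower = filter (any isLower . take 1)
```

On ["boy", "girl", "Hi", "", "gIrl"] this keeps "boy", "girl" and "gIrl", while filter (all isLower) would drop "gIrl" and keep the empty string.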

How to "pack" some strings in a list on Haskell?

I want to write a function pack such that
pack ['a','a','a','b','c','c','a','a','d','e','e','e']
= ["aaa","b","cc","aa","d","eee"]
How can I do this? I'm stuck...

Use Data.List.group:
λ> import Data.List (group)
λ> :t group
group :: Eq a => [a] -> [[a]]
λ> group ['a','a','a','b','c','c','a','a','d','e','e','e']
["aaa","b","cc","aa","d","eee"]
Unless you want to write the function yourself (see Michael Foukarakis's answer).

Here's something off the top of my head:
pack :: (Eq a) => [a] -> [[a]]
pack [] = []
-- We split elements of a list recursively into those which are equal to the first one,
-- and those that are not. Then do the same for the latter:
pack (x:xs) = let (first, rest) = span (==x) xs
              in (x:first) : pack rest
Data.List already has what you're looking for, though.

I think it's worth adding a more explicit/beginner version:
pack :: [Char] -> [String]
pack [] = []
pack (c:cs) =
  let (v, s) = findConsecutive [c] cs
  in v : pack s
  where
    findConsecutive ds [] = (ds, [])
    findConsecutive s@(d:ds) t@(e:es)
      | d /= e = (s, t)
      | otherwise = findConsecutive (e:s) es
If the input is an empty list, the outcome is also an empty list. Otherwise, we find the next consecutive Chars that are equal and group them together into a String, which is returned in the result list. In order to do that we use the findConsecutive auxiliary function. This function's behavior resembles the takeWhile function, with the difference that we know in advance the predicate to use (equality comparison) and that we return both the consumed and the remaining list.
In other words, the signature of findConsecutive could be written as:
findConsecutive :: String -> [Char] -> (String, String)
which means that it takes a string containing only repeated characters to be used as an accumulator and a list from which characters are "extracted". It returns a tuple containing the current sequence of elements and the remaining list. Its body should be intuitive to follow: while the character list is not empty and the current element is equal to the ones in the accumulator, we add the character to the accumulator and recurse into the function. The function returns when we reach the end of the list or a different character is encountered.
The same rationale can be used to understand the body of pack.
