I want to generate a list of all possible combinations of a string that contains optional parts. This is probably best explained with some examples:
A[B] → A and AB
A[B][C] → A, AB, AC and ABC
A[B[C]] → A, AB and ABC
I hope this sufficiently explains what I'm trying to do.
I could hack together my own little parser or "algorithm" for this, but I have a strong feeling that there's an existing (and easier) solution for this. Because I don't have any kind of CS education (yet), I have no idea what kind of algorithm I'm looking for or even just what search terms to use.
Is my hunch correct and is there actually an existing (well-documented) approach for this?
I haven't read the article, but it seems that this question has been studied. See the article on page 117, "Enumeration of Formal Languages": http://www.eatcs.org/images/bulletin/beatcs89.pdf
You may find more on this subject by searching for keywords such as "enumerate language for DFA".
The grammar you defined contains only 2 different entities: Terminals (a character) or an Optional part of the grammar.
In the Haskell code below, this is reflected by the definition of the discriminated union Grammar.
The first task is to convert a given concrete syntax (e.g. "A[B]") into a list of Grammar parts (each terminal or optional). In the code below, the function fromString does just that.
The interesting part, though, is how to generate all possible strings for a given syntax.
In the code below, the function generate does this, recursively.
For an empty grammar list (which is the end of the recursion), the output is a single empty string.
If a terminal symbol is found in the grammar list at a given position, the respective character is pasted in front of all variations, generated from the remainder of the grammar list.
If an optional part is found in the grammar list, the list of the remainder of the grammar list is yielded twice; once with the optional part prepended and once without.
For the non-functional people reading this, it should be pointed out that fmap is a function which maps a list to another list (element-wise).
The other function used in the code below that non-Haskellers might stumble over is concat, which flattens a list of lists into a single list.
data Grammar = Terminal Char | Optional [Grammar] deriving (Show,Eq)
fromString :: String -> [Grammar] -> ([Grammar],String)
fromString [] acc = (acc,"")
fromString ('[':cs) acc =
  let (o,rest) = fromString cs [] in
  fromString rest (acc ++ [Optional o])
fromString (']':cs) acc = (acc,cs)
fromString (c:cs) acc = fromString cs (acc ++ [Terminal c])
generate :: [Grammar] -> [String]
generate [] = [""]
generate ((Terminal c) : parts) = fmap (\s -> c : s) $ generate parts
generate ((Optional gs) : parts) = tails ++ (concat . fmap prependOpts $ tails)
  where
    tails = generate parts
    opts = generate gs
    prependOpts :: String -> [String]
    prependOpts tail = fmap (\o -> o ++ tail) $ opts
Putting it all together in the REPL (interactive shell), running
fromString "A[B][C]" [], for example yields:
([Terminal 'A',Optional [Terminal 'B'],Optional [Terminal 'C']],"")
And if we run generate on the grammar list above (the first part of the tuple), we get all our strings:
generate (fst $ fromString "A[B][C]" [])
["A","AC","AB","ABC"]
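The same recursion can also be phrased with traverse, which picks one choice per grammar part in every possible way. This is only a sketch under assumptions: the names Part, parseParts, expandParts and expand are made up for illustration, and malformed input (unbalanced brackets) is not validated.

```haskell
-- Hypothetical names throughout; not the code from the answer above.
data Part = Lit Char | Opt [Part]
  deriving (Show, Eq)

-- Parse up to the matching ']' (or the end of input) and return
-- the parsed parts together with the unread rest of the string.
parseParts :: String -> ([Part], String)
parseParts []         = ([], [])
parseParts (']' : cs) = ([], cs)
parseParts ('[' : cs) =
  let (inner, afterGroup) = parseParts cs
      (more, rest)        = parseParts afterGroup
  in  (Opt inner : more, rest)
parseParts (c : cs) =
  let (more, rest) = parseParts cs
  in  (Lit c : more, rest)

-- Each part offers a list of choices; traverse (in the list applicative)
-- combines one choice per part in every possible way.
expandParts :: [Part] -> [String]
expandParts = fmap concat . traverse choices
  where
    choices (Lit c)  = [[c]]                -- a terminal has exactly one choice
    choices (Opt ps) = "" : expandParts ps  -- absent, or any expansion of the inside

expand :: String -> [String]
expand = expandParts . fst . parseParts
```

With these definitions, expand "A[B[C]]" evaluates to ["A","AB","ABC"], so the nested case from the question is covered as well.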
Here's something in JavaScript, only tested for the three examples provided. Hopefully the (attempted) recurrence is clear from the code:
function f(string, index, combinations){
  if (index == string.length)
    return [combinations, index]
  if (string[index] == "["){
    let prefixes = []
    let [suffixes, nextIndex] = f(string, index + 1, [""])
    for (let combination of combinations)
      for (let suffix of suffixes)
        prefixes.push(combination + suffix)
    if (nextIndex == string.length)
      return [combinations.concat(prefixes), nextIndex]
    else
      return f(string, nextIndex, combinations.concat(prefixes))
  } else if (string[index] == "]"){
    return [combinations, index + 1]
  } else {
    for (let i=0; i<combinations.length; i++)
      combinations[i] += string[index]
    return f(string, index + 1, combinations)
  }
}
strings = [
"A[B]",
"A[B][C]",
"A[B[C]]"
]
for (let string of strings)
console.log(JSON.stringify(f(string, 0, [""])[0]))
The algorithm could look like this:
- Parse the string and build a rooted binary tree in which each node branches on whether the next bracketed part is included or not.
- Walk from the root to every leaf. The paths from the root to the leaves generate all combinations.
You can also use a push-down automaton to parse the string and find where each bracket is opened and where it is closed. Many existing implementations of this are available.
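The bracket-matching part of that idea can be sketched with an explicit stack, which is the essential ingredient of a push-down automaton. The function name bracketPairs is hypothetical, and the sketch only finds matching bracket positions; it does not build the tree.

```haskell
-- Hypothetical helper: pair up the positions of '[' and ']' using an
-- explicit stack of open-bracket positions.
-- Returns Nothing when the brackets are unbalanced.
bracketPairs :: String -> Maybe [(Int, Int)]
bracketPairs = go [] [] . zip [0 ..]
  where
    go stack acc ((i, c) : rest)
      | c == '['  = go (i : stack) acc rest           -- push the opening position
      | c == ']'  = case stack of
          open : stack' -> go stack' ((open, i) : acc) rest
          []            -> Nothing                    -- ']' without a matching '['
      | otherwise = go stack acc rest
    go [] acc []  = Just (reverse acc)                -- everything was closed
    go _  _   []  = Nothing                           -- some '[' was never closed
```

For "A[B[C]]" this gives Just [(3,5),(1,6)]: pairs are reported in the order in which the brackets close.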
Doing the third of the 99-Haskell problems (I am currently trying to learn the language) I tried to incorporate pattern matching as well as recursion into my function which now looks like this:
myElementAt :: [a] -> Int -> a
myElementAt (x ++ xs) i =
  if length (x ++ xs) == i && length xs == 1 then xs!!0
  else myElementAt x i
Which gives me Parse error in pattern: x ++ xs. The questions:
Why does this give me a parse error? Is it because Haskell has no idea where to cut my list (which is my best guess)?
How could I reframe my function so that it works? The algorithmic idea is to check whether the list has the same length as the specified index; if yes, return the last element; if not, cut away one element at the end of the list and then recurse.
Note: I know that this is a really bad algorithm, but I've set myself the challenge to write that function using recursion and pattern matching. I also tried not to use the !! operator, but that is fine for me since the only thing it really does (or should do if it compiled) is to convert a one-element list into that element.
Haskell has two different kinds of value-level entities: variables (this also includes functions, infix operators like ++ etc.) and constructors. Both can be used in expressions, but only constructors can also be used in patterns.
In either case, it's easy to tell whether you're dealing with a variable or constructor: a constructor always starts with an uppercase letter (e.g. Nothing, True or StateT) or, if it's an infix, with a colon (:, :+). Everything else is a variable. Fundamentally, the difference is that a constructor is always a unique, immediately matchable value from a predefined collection (namely, the alternatives of a data definition), whereas a variable can have any value, and it's often impossible in principle to uniquely distinguish different variables, in particular if they have a function type.
Yours is actually a good example of this: for the pattern match x ++ xs to make sense, there would have to be one unique way in which the input list could be written in the form x ++ xs. But for, say, [0,1,2,3], there are multiple different ways in which this can be done:
[] ++ [0,1,2,3]
[0] ++ [1,2,3]
[0,1] ++ [2,3]
[0,1,2] ++ [3]
[0,1,2,3] ++ []
Which one should the runtime choose?
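To make the ambiguity concrete, here is a small sketch that enumerates every way a list can be written as front ++ back. The name splits is made up for this illustration; inits and tails are from Data.List.

```haskell
import Data.List (inits, tails)

-- Hypothetical helper: every pair (front, back) with front ++ back == xs.
splits :: [a] -> [([a], [a])]
splits xs = zip (inits xs) (tails xs)
```

For [0,1,2,3] this produces the five decompositions listed above, so a pattern like x ++ xs would have five equally valid matches.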
Presumably, you're trying to match the head and tail part of a list. Let's step through it:
myElementAt (x:_) 0 = x
This means that if the head is x, the tail is something, and the index is 0, return the head. Note that your x ++ xs is a concatenation of two lists, not the head and tail parts.
Then you can have
myElementAt (_:tl) i = myElementAt tl (i - 1)
which means that if the previous pattern was not matched, ignore the head, and take the i - 1 element of the tail.
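Those two equations crash with a pattern-match failure on an empty list or an out-of-range index. A minimal sketch of a total variant, assuming zero-based indexing as above (the name elementAtMaybe and the choice of Maybe are mine, not part of the original exercise):

```haskell
-- Hypothetical, zero-based variant that returns Nothing instead of crashing.
elementAtMaybe :: [a] -> Int -> Maybe a
elementAtMaybe [] _ = Nothing              -- ran off the end of the list
elementAtMaybe (x : _) 0 = Just x          -- found the element
elementAtMaybe (_ : tl) i
  | i < 0     = Nothing                    -- negative indices are never valid
  | otherwise = elementAtMaybe tl (i - 1)  -- skip the head, look in the tail
```

It still uses only the constructors : and [] in its patterns, plus recursion.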
In patterns, you can only use constructors like : and []. The append operator (++) is a non-constructor function.
So, try something like:
myElementAt :: [a] -> Int -> a
myElementAt (x:xs) i = ...
There are more issues in your code, but at least this fixes your first problem.
In standard Haskell, pattern matches like this:
f :: Int -> Int
f (g n 1) = n
g :: Int -> Int -> Int
g a b = a+b
are illegal because function calls aren't allowed in patterns. Your case is just a special case, since the operator ++ is just a function.
To pattern match on lists you can do it like this:
myElementAt :: [a] -> Int -> a
myElementAt (x:xs) i = undefined -- result goes here
But in this case x is of type a, not [a]: it is the head of the list and xs is its tail, so you'll need to change your function implementation to accommodate this. This function will also fail on the empty list []. However, that's the idiomatic Haskell way to pattern match against lists.
I should mention that when I said "illegal" I meant in standard Haskell. There are GHC extensions that give something similar; the relevant one is called ViewPatterns. But I don't think you need it, especially since you're still learning.
I am trying to divide an array of strings into sub-arrays. I am trying the following:
content <- readFile "/tmp/foo.txt"
let all_paragraphs = lines content
let number = elemIndex "I THE LAY OF THE LAND" all_paragraphs
let number2 = elemIndex "IV THE MOST INTELLIGENT ANIMALS" all_paragraphs
Is it possible to parse the content to an array like;
let new_array = all_paragraphs[number,number2] or let new_Array = all_paragraphs(number:number2) a code similar to that?
You're probably talking about lists, not arrays, since there are no arrays in Haskell's Prelude and lines returns [String], i.e. a list of Strings.
So you want to get the sublist from index n to index m of a list? You can do that with a combination of drop and take.
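As a rough sketch of that index-based approach: elemIndex returns a Maybe Int, so both lookups have to succeed before drop and take can be applied. The helper name between is hypothetical.

```haskell
import Data.List (elemIndex)

-- Hypothetical helper: the elements from the first marker up to, but not
-- including, the second one. Nothing if either marker is missing.
between :: Eq a => a -> a -> [a] -> Maybe [a]
between start end xs = do
  i <- elemIndex start xs            -- position of the first marker
  j <- elemIndex end xs              -- position of the second marker
  pure (take (j - i) (drop i xs))    -- empty if the markers are out of order
```

For the question's data this would be called as between "I THE LAY OF THE LAND" "IV THE MOST INTELLIGENT ANIMALS" all_paragraphs.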
However, this is not idiomatic functional programming where explicitly dealing with indices is discouraged, since it's error prone (e.g. off-by-one errors) and there are better ways. So it seems you want to get all the lines between the line I THE LAY OF THE LAND and the line IV THE MOST INTELLIGENT ANIMALS. You'd do that in idiomatic Haskell with:
main :: IO ()
main = do
  content <- readFile "/tmp/foo.txt"
  let ls = excerpt $ lines content
  -- the dollar just rearranges precedence, so this is the same as:
  -- ... = excerpt (lines content)
  print ls
-- do as little as possible in monads (the thing with `do ... let <- ...` etc)
-- rather define pure functions like this one and use them above...
excerpt :: [String] -> [String]
excerpt xs = takeWhile (/= "IV THE MOST INTELLIGENT ANIMALS")
           $ dropWhile (/= "I THE LAY OF THE LAND") xs
-- the excerpt function could alternatively also be written as
-- the composition of `takeWhile x` and `dropWhile y`
-- (named excerpt' here so that both versions can live in one file)
excerpt' :: [String] -> [String]
excerpt' = takeWhile (/= "IV THE MOST INTELLIGENT ANIMALS")
         . dropWhile (/= "I THE LAY OF THE LAND")
But you really should read up on how Haskell (and functional programming in general) take a different approach to solving problems than imperative languages. Maybe Try Haskell is more to your liking, and if you wonder what a function does (or are looking for one), Hoogle is indispensable.
Like @mb21 indicates but does not state outright, what you are looking for is a simple sequence of drop and take.
For example, defining in your program
slice list lo hi = drop lo (take hi list)
will let you do slices of the form that you're used to in JavaScript and Python, and you could also define it with an infix operator and a pair, if you wanted. Let's say we want to use .! for a slice operator, and handle negative arguments like in JS and Python; we then define at the top level:
list .! (lo, hi) = drop l $ take h list
  where
    len = length list
    handle_negative x | x < 0 = x + len | otherwise = x
    h = handle_negative hi
    l = handle_negative lo
This should work on any finite list. Negative indices will in general break on potentially infinite lists, because length never returns: this problem can be worked around for the second index, but it is essential for the first one. For example, to do the slice [0..] .! (1, -2), which should be equivalent to [1..], you would in general need to keep both list and drop (-last_index) list and travel down them together, emitting elements of the first while the second is not [].
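The "travel down together" idea for the second index can be sketched with zipWith: one copy of the list runs n elements ahead, and output stops as soon as that copy is exhausted. The name dropEnd is hypothetical, and this is only the building block, not a full replacement for .! above.

```haskell
-- Hypothetical helper: drop the last n elements without calling length.
-- The second argument of zipWith runs n elements ahead of the first,
-- so the result ends n elements before the input does.
dropEnd :: Int -> [a] -> [a]
dropEnd n xs = zipWith const xs (drop n xs)
```

Because nothing forces the whole list, take 3 (dropEnd 2 (drop 1 [0..])) still terminates and gives [1,2,3].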
I'm pretty much brand new to Haskell (I've only written a fizzbuzz program before the current one) and am trying to write a program that takes the unix wordlist ('/usr/share/dict/words') and prints out the list of anagrams for each word, with any direct palindromes starred. I have the meat of this summed up into one function:
findAnagrams :: [String] -> [(String, [String])]
findAnagrams d =
[x | x <- map (\s -> (s, [if reverse s == t then t ++ "*" else t | t <- d, s /= t && null (t \\ s)])) d, not (null (snd x))]
However, when I run the program I get this output:
abase: babes, bases
abased: debase
abasement: basements
abasements: abatements
abases: basses
And so on, so clearly it isn't working properly. My intention is for the list comprehension to read as follows: for all t in d such that t is not equal to s and there is no difference between t and s other than order, if t is the reverse of s include as t*, otherwise include as t. The problem seems to be with the "no difference between t and s other than order" part, which I'm trying to accomplish by using "null (t \\ s)". It seems like it should work. Testing in GHCi gives:
Prelude Data.List> null ("abatements" \\ "abasements")
False
And yet it passes the predicate test. My assumption is that I'm missing something simple here, but I've looked at it a while and can't quite come up with it.
In addition, any notes regarding best practice would be greatly appreciated.
If you break it out into multiple functions (remember, source code size is not really that important), you could do something like:
import Data.List
isPalindrome :: String -> Bool
isPalindrome s = s == reverse s
flagPalins :: [String] -> [String]
flagPalins [] = []
flagPalins (x:xs)
  | isPalindrome x = (x ++ "*") : flagPalins xs
  | otherwise      = x : flagPalins xs
isAnagram :: String -> String -> Bool
isAnagram s t = (isPalindrome s || s /= t) && ??? -- test for anagram
findAnagrams :: String -> [String] -> [String]
findAnagrams s ws = flagPalins $ filter (isAnagram s) ws
findAllAnagrams :: [String] -> [(String, [String])]
findAllAnagrams ws = filter (not . null . snd) ??? -- words paired with their anagrams
I've intentionally left some holes for you to fill in, I'm not going to give you all the answers ;)
There are only two spots for you to do yourself. The one in findAllAnagrams should be pretty easy to figure out, you're already doing something pretty similar with your map (\s -> ...) part. I intentionally structured isAnagram so it'll return True if it's a palindrome or if it's just an anagram, and you only need one more check to determine if t is an anagram of s. Look at the comment I made on your question for a hint about what to do there. If you get stuck, comment and ask for an additional hint, I'll give you the name of the function I think you should use to solve this problem.
If you really want to make a list comprehension, I would recommend solving it this way, then converting back to a comprehension. In general you should write more verbose code, then compress it once you understand it fully.
Think of a \\ b as "items in a that are not in b."
Consider the implications.
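One possible reading of that hint, as a sketch: \\ only removes letters in one direction, so an empty result means "no letters left over", not "same letters". The names leftover and isRearrangement are made up here, and checking both directions is just one way to close the gap, not necessarily what the other answer had in mind.

```haskell
import Data.List ((\\))

-- Letters of t that remain after removing the letters of s, one for one.
leftover :: String -> String -> String
leftover t s = t \\ s

-- Hypothetical helper: two words use exactly the same letters when
-- nothing is left over in either direction.
isRearrangement :: String -> String -> Bool
isRearrangement s t = null (s \\ t) && null (t \\ s)
```

For example, leftover "base" "abase" is empty even though "base" is not an anagram of "abase", while isRearrangement "base" "abase" is False.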
I am trying to learn some Haskell and I find it difficult. I am having some issues with my
current project. The idea is that I have to go through a String and substitute certain chars
with new substrings. For instance if I have a String "FLXF" and I want to replace every F
with a substring called "FLF" the result should be "FLFLXFLF". Now I have been working on this
specific problem for hours. I have been reading up on types, different functions that might come in handy (map, fold, etc) and yet I have not been able to solve this problem.
The code below is some of the different tries I have had:
apply :: String -> String
apply [] = []
apply (x:xs) = if (x == 'F')
                 then do show "Hello"
                         apply xs
                 else (apply (xs))
This example here I was just trying to show hello every time I encountered a 'F', but all it shows is "", so this clearly does not work. I am really not sure an if else statement is the way to go here. I was also thinking the function map might do the trick. Here the code I was thinking about could look something like this:
map (\x y -> if y == 'F' then "FLD" else y) "FLF"
but that gives me a type error. So as you can see I am lost. Excuse my poor knowledge of Haskell, but I am still new to it. I really hope some of you can help me out here or give me a push in the right direction. Feel free to ask questions if I have been unclear about something.
Thank you in advance!
John
map (\x y -> if y == 'F' then "FLD" else y) "FLF"
This is nearly right.
First... why does the function take two arguments?
map (\y -> if y == 'F' then "FLD" else y) "FLF"
The remaining type error is because the then branch gives a String, but the else branch gives a Char (the two branches must each give a value of the same type). So we'll make the else branch give a String instead (recall that String is a synonym for [Char]):
map (\y -> if y == 'F' then "FLD" else [y]) "FLF"
Now the problem is that this gives you a [String] value instead of a String. So we'll concatenate all those strings together:
concat (map (\y -> if y == 'F' then "FLD" else [y]) "FLF")
This combination of concat and map is common enough that there's a standard function that combines them.
concatMap (\y -> if y == 'F' then "FLD" else [y]) "FLF"
concatMap is the most intuitive thing here. This combination, mapping a function that itself returns the data structure's type (in this case, a list) over a data structure and then flattening the results back into a single "tight" list, is indeed very common in Haskell, and not only for lists.
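As a small sketch of that connection: for lists, the monadic bind operator >>= is concatMap with its arguments flipped. The names rewrite, viaConcatMap and viaBind are made up for this illustration, and the replacement string "FLF" is taken from the question.

```haskell
-- Hypothetical rewrite rule: 'F' expands to "FLF", everything else is kept.
rewrite :: Char -> String
rewrite 'F' = "FLF"
rewrite c   = [c]

-- Map, then flatten.
viaConcatMap :: String -> String
viaConcatMap = concatMap rewrite

-- The same thing through the list monad: (>>=) is concatMap, flipped.
viaBind :: String -> String
viaBind str = str >>= rewrite
```

Both functions turn "FLXF" into "FLFLXFLF".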
I'd like to explain why your first attempt compiles at all, and what it actually does – because it's completely different from what you probably think!
apply (x:xs) = if (x == 'F')
that line is still perfectly clear: you just take the first char off the string and compare it to 'F'. A bit "pedestrian" to manually take the string apart, but fine. Well, the name you gave the function is not particularly great, but I'll stick with it here.
then do show "Hello"
now this is interesting. You probably think do starts a list of points, "first do this, then do that"... like in simple Hello, World-ish example programs. But always remember: in Haskell, there's normally no such thing as an order in which stuff is calculated. That only happens in the IO context. But there's no IO in your code!?!
Not sure if you've heard about what IO actually is, anyway here you go: it's a Monad. Those "mythical Haskell constructs you've only read about in story books"...
Indeed, though this might lead a bit far here, this question covers all there is to know about Monads! How is that?
Here's another (correct!) way to define your function.
apply' str = do
  x <- str
  if (x == 'F')
    then "FLF"
    else return x
So I'm using this weird do syntax, and it's not in IO, and it looks completely different from what you'd write in IO, but it works. How?
x <- str
In do notation, variable <- action always means something like "take one value out of this monadic thingy, and call it variable". What you've probably seen is something like
response <- getLine
which means "take a user input out of the real world (out of the IO monad!) and call it response". In x <- str, it's a string that we have, not an IO action. So we take a character out of a string – nice and easy!
Actually, it's not quite right, though. "take a character" is what you do with apply (x:xs) = ..., which simply takes the first one. In contrast, x <- str actually takes all possible characters out of the string, one by one. If you're used to procedural languages, this may seem very inconsistent with response <- getLine, but in fact it's not: getLine also consists of every possible input that the user might give, and the program has to act according to this.
if (x == 'F')
nothing unexpected here, but
then "FLF"
whoah! Just like that? Let's first look at the next line
else return x
ok, this looks familiar, but actually it's not. In other languages, this would mean "we're done with our function, x is the result". But that's obviously not what happens here, because x is Char, and the "return type" of apply' is String. In Haskell, return actually has little to do with returning values from a function, instead it means "put that value into the monadic context that we're working in". If the monad were IO, that would be quite the same: give this value back to the real-world context (this does not mean to print the value or something, just to hand it on). But here, our context is a string, or rather a list (of chars, so it is a String).
Right, so if x is not 'F' we put it back into the string. That sounds reasonable enough, but what about then "FLF"? Note that I can also write it this way:
if (x == 'F')
  then do
    x' <- "FLF"
    return x'
  else return x
which means, I take all characters out of "FLF" and return them back into the overall result. But there's no need to only think about the final result, we can as well isolate only this part, do { x' <- "FLF"; return x' }, and quite obviously its value is nothing but the string "FLF" itself!
So I hope you have now grasped why apply' works. Back to your version, though it actually doesn't make much sense...
then do
  show "Hello"
  apply xs
here we have a line that's not at the end of a do block, but doesn't have a <- in it. You normally see this in IO in something like
main = do
  putStrLn "How ya doin'?"
  response <- getLine
  ...
Remember that "output-only" actions have type IO () in Haskell, which means they don't directly return any meaningful value, just the trivial value (). So you don't really care about this, but you could still evaluate it:
main = do
  trivial <- putStrLn "Hello, let's see what this IO action returns:"
  print trivial
compiles and outputs
Hello, let's see what this IO action returns:
()
It would be stupid if we had to do this evaluating of () all the time, so Haskell allows you to just leave the () <- out. It's really just that!
So a line like show "Hello" in the middle of a do block basically means "take one character out of show "Hello" (which is simply a string with the value "\"Hello\""), but don't do anything else with this character / just throw it away".
The rest of your definition is just other recursive calls to apply, but because none of them does anything more interesting than throwing away characters, you eventually end up at apply [] = [], so that's the final result: an empty string.
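To see this concretely, here is the original definition with its do block written out using >>, which for lists repeats the right-hand side once per element of the left-hand side (for example, "ab" >> "xy" is "xyxy"). The name applyDesugared is hypothetical; the behaviour is meant to match the asker's apply.

```haskell
-- The asker's definition with the do block desugared to (>>).
-- The name applyDesugared is made up for this sketch.
applyDesugared :: String -> String
applyDesugared [] = []
applyDesugared (x : xs) =
  if x == 'F'
    -- one copy of the recursive result per character of "\"Hello\"",
    -- the characters themselves are thrown away
    then show "Hello" >> applyDesugared xs
    else applyDesugared xs
```

Every branch eventually reaches the empty-list case, and repeating an empty string any number of times is still empty, so applyDesugared "FLXF" is "".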
if-then-else... I know that Haskell supports these, however, I'm very surprised that no one here removed them...
So below are my solutions for different cases of making replacements.
- Replacing a character
- Replacing words
- Replacing through a function on each word
$ cat replace.hs
import Data.List (isPrefixOf)
replaceC :: Char -> Char -> String -> String
replaceC _ _ [] = []
replaceC a b (x:xs)
  | x == a    = b:replaceC a b xs
  | otherwise = x:replaceC a b xs
replaceW :: String -> String -> String -> String
replaceW a b s = unwords . map replaceW' $ words s
  where replaceW' x | x == a    = b
                    | otherwise = x
replaceF :: (String -> String) -> String -> String
replaceF f = unwords . map f . words
string = "Hello world ^fg(blue)"
main = do
  print string
  print $ replaceC 'o' 'z' string
  print $ replaceW "world" "kitty" string
  print . replaceF f . replaceW "world" "kitty" $ replaceC 'H' 'Y' string
  where f s | "^" `isPrefixOf` s = '^':'^':drop 1 s
            | otherwise = s
$ runhaskell replace.hs
"Hello world ^fg(blue)"
"Hellz wzrld ^fg(blue)"
"Hello kitty ^fg(blue)"
"Yello kitty ^^fg(blue)"
Your basic error was that you wanted to replace a Char in a String with a String.
This is impossible because String is a list of Char and a Char is a Char and not a short String. Neither is a String ever a Char, even if its length is 1.
Hence, what you really wanted is to replace some Char with some other Chars. Your approach was promising and could have been completed like so:
replace [] = [] -- nothing to replace in an empty string
replace (c:cs) = if c == 'F' then 'F':'L':'F':replace cs
                             else c:replace cs
I have a list of strings that looks like this:
xs = ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
I would like to find only the strings in the list which have a vowel followed by two b's followed by any character followed by a vowel. How are simple matches like this done in Haskell? Is there a better solution than regular expressions? Can anyone help me with an example? Thanks.
You could just use the classic filter function in conjunction with any regexp library. Your pattern is simple enough that this would work with any of them:
filter (=~ "[aeiou]bb.[aeiou]") xs
The confusing part of regexps in Haskell is that there is a very powerful generic API (in regex-base) that lets you use them the same way across all the specific libraries and with whichever result type you could wish for (Bool, String, Int...). For basic usage it should mostly work as you mean (tm). For your specific need, regex-posix should be sufficient (and it comes with the Haskell Platform, so normally there is no need to install it). So don't forget to import it:
import Text.Regex.Posix
This tutorial should show you the basics of the regex API if you have other needs. It is a bit outdated now, but the fundamentals remain the same; only details of regex-base have changed.
One approach would be to build a small pattern-matching language and to embed it in Haskell.
In your example, a pattern is basically a list of character specifications. Let's define a type of abstract characters the values of which will serve as such specifications,
data AbsChar = Exactly Char | Vowel | Any
together with an "interpreter" that tells us whether a character matches a specification:
(=?) :: AbsChar -> Char -> Bool
Exactly c' =? c = c == c'
Vowel =? c = c `elem` "aeiou"
Any =? c = True
For example, Vowel =? 'x' will produce False, while Vowel =? 'a' will produce True.
Then, indeed, a pattern is just a list of abstract characters:
type Pattern = [AbsChar]
Next, we write a function that tests whether the prefix of a string matches a given pattern:
matchesPrefix :: Pattern -> String -> Bool
matchesPrefix [] _ = True
matchesPrefix (a : as) (c : cs) = a =? c && matchesPrefix as cs
matchesPrefix _ _ = False
For example:
> matchesPrefix [Vowel, Exactly 'v'] "eva"
True
> matchesPrefix [Vowel, Exactly 'v'] "era"
False
As we do not want to restrict ourselves to matching prefixes, but rather match anywhere within a word, our next function matches the prefixes of every end segment of a string:
containsMatch :: Pattern -> String -> Bool
containsMatch pat = any (matchesPrefix pat) . tails
It uses the function tails which can be found in the module Data.List, but which we can, to make this explanation self-contained, easily define ourselves as well:
tails :: [a] -> [[a]]
tails [] = [[]]
tails l@(_ : xs) = l : tails xs
For example:
> tails "xabbaua"
["xabbaua","abbaua","bbaua","baua","aua","ua","a",""]
Now, finally, the function you were looking for, that selects all strings from a list that contain a matching segment, is written simply as:
select :: Pattern -> [String] -> [String]
select = filter . containsMatch
Let's test it on your example:
> let pat = [Vowel, Exactly 'b', Exactly 'b', Any, Vowel]
> select pat ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
["xabbaua"]
Well, you can try this function, although it may not be the best method (tails comes from Data.List):
elem' :: String -> String -> Bool
elem' p xs = any (p==) $ map (take $ length p) $ tails xs
Usage:
filter (elem' "bb") ["xxbbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "bbbaab"]
or
bbFilter = filter (elem' "bb")
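For plain substring checks like this one, Data.List already provides isInfixOf, which tests whether one list occurs contiguously inside another. A minimal sketch (the name bbFilter' is made up to avoid clashing with the definition above):

```haskell
import Data.List (isInfixOf)

-- Hypothetical name; keeps the strings that contain "bb" somewhere,
-- using the library function instead of a hand-written elem'.
bbFilter' :: [String] -> [String]
bbFilter' = filter ("bb" `isInfixOf`)
```

Like elem', this only matches a fixed substring; it does not express "vowel" or "any character".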
Well, if you're absolutely opposed to doing it with regexes, you could do it with just pattern matching and recursion, although it is ugly.
xs = ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
vowel = "aeiou"
filter' strs = filter matches strs
matches [] = False
matches str@(x:'b':'b':_:y:xs)
  | x `elem` vowel && y `elem` vowel = True
  | otherwise = matches $ tail str
matches (x:xs) = matches xs
Calling filter' xs will return ["xabbaua"] which I believe is the required result.