Searching through a String

I found a good example in a book that I'm trying to tackle. I'm trying to write a function called pointer with the signature pointer :: String -> Int. It takes text containing "pointers" of the form [Int] and returns the total number of pointers found.
The text that the pointer function will examine will look like:
txt :: String
txt = "[1] and [2] are friends who grew up together who " ++
"went to the same school and got the same degrees." ++
"They eventually opened up a store named [2] which was pretty successful."
In the command line, we will run the code as follows:
> pointer txt
3
The 3 signifies the number of pointers that were found.
WHAT I UNDERSTAND:
I get that "words" will break down a string into a list with words.
Example:
words "where are all of these apples?"
["where","are","all","of","these","apples?"]
I get that "filter" will choose a specific element(s) in a list.
Example:
filter (>3) [1,5,6,4,3]
[5,6,4]
I get that "length" will return the length of a list
WHAT I THINK I NEED TO DO:
Step 1) look at txt and then break it down into single words until you have a long list of words.
Step 2) use filter to examine the list for [1] or [2]. Once found, filter will place these pointers into a list.
Step 3) call the length function on the resulting list.
PROBLEM BEING FACED:
I'm having a tough time trying to take everything I know and implementing it.

Here is a hypothetical ghci session:
ghci> words txt
[ "[1]", "and", "[2]", "are", "friends", "who", ...]
ghci> filter (\w -> w == "[1]" || w == "[2]") (words txt)
[ "[1]", "[2]", "[2]" ]
ghci> length ( filter (\w -> w == "[1]" || w == "[2]") (words txt) )
3
You can make the last expression more readable using the $ operator:
length $ filter (\w -> w == "[1]" || w == "[2]") $ words txt
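To count every token of the form [digits] rather than only [1] and [2], the same words/filter/length pipeline can take a more general predicate. This is a minimal sketch that assumes pointers are separated from the surrounding text by whitespace, so a token such as "[2]." with trailing punctuation is not counted. The names isPointer and pointerWords are illustrative and do not come from the question.

```haskell
import Data.Char (isDigit)

-- Hypothetical helper: True only for tokens shaped exactly like "[123]".
isPointer :: String -> Bool
isPointer ('[' : rest) =
  case span isDigit rest of
    (_ : _, "]") -> True   -- at least one digit, then the closing bracket
    _            -> False
isPointer _ = False

-- Split into words, keep the pointer-shaped ones, count them.
pointerWords :: String -> Int
pointerWords = length . filter isPointer . words
```

In GHCi, pointerWords txt can then be used in place of the hard-coded comparison against "[1]" and "[2]".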

If you want to find all patterns of the form [Int] in a string, such as [3] or [465], and not only [1] and [2], the easiest approach is a regular expression:
{-# LANGUAGE NoOverloadedStrings #-}
import Text.Regex.Posix
txt :: String
txt = "[1] and [2] are friends who grew up together who " ++
"went to the same school and got the same degrees." ++
"They eventually opened up a store named [2] which was pretty successful."
pointer :: String -> Int
pointer source = source =~ "\\[[0-9]{1,}\\]"
We can now run:
> pointer txt
3

This works for single digit "pointers":
pointer :: String -> Int
pointer ('[':_:']':xs) = 1 + pointer xs
pointer (_: xs) = pointer xs
pointer _ = 0
This is better handled with parser combinators such as those provided by Parsec, but that might be overkill.
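The same character-level recursion can be stretched to multi-digit pointers without a parser library by using span from the Prelude. This is only a sketch; pointerMulti is an illustrative name, and the rule that brackets must contain at least one digit and nothing else is an assumption about what counts as a pointer.

```haskell
import Data.Char (isDigit)

-- Count occurrences of '[' followed by one or more digits and then ']'.
pointerMulti :: String -> Int
pointerMulti ('[' : xs) =
  case span isDigit xs of
    (_ : _, ']' : rest) -> 1 + pointerMulti rest  -- a complete pointer
    _                   -> pointerMulti xs        -- not a pointer, keep scanning
pointerMulti (_ : xs) = pointerMulti xs
pointerMulti []       = 0
```

Unlike the words-based approach, this version also counts pointers that touch punctuation or other text.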

Related

Is implementing the words function possible without a postprocessing step after folding?

Real World Haskell, chapter 4, page 98 of the printed version asks if words can be implemented using folds, and this is my question too:
Is it possible? If not, why? If it is, how?
I came up with the following, which is based on the idea that each non-space should be prepended to the last word in the output list (this happens in the otherwise guard), and that a space should trigger the appending of an empty word to the output list if there is not one already (this is handled in the if-then-else).
myWords :: String -> [String]
myWords = foldr step [[]]
  where
    step x yss@(y:ys)
      | x == ' ' = if y == "" then yss else "":yss
      | otherwise = (x:y):ys
Clearly this solution is wrong, since leading spaces in the input string result in one leading empty string in the output list of strings.
At the link above, I've looked into several of the proposed solutions for other readers, and many of them work similarly to my solution, but they generally "post-process" the output of the fold, for instance by tailing it if there is an empty leading string.
Other approaches use tuples (actually just pairs), so that the fold deals with the pair and can well handle the leading/trailing spaces.
In all these approaches, foldr (or another fold, fwiw) is not the function that provides the final output out of the box; there's always something else which has to adjust the output somehow.
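For reference, the pair-based approach mentioned above might look roughly like the following. This is a sketch rather than any particular reader's solution: wordsPair, push and finish are illustrative names, and only ' ' is treated as a separator, whereas the Prelude's words splits on any whitespace. The finish step is exactly the kind of adjustment after the fold that this question is about.

```haskell
-- Accumulator: (word currently being built, words already completed).
wordsPair :: String -> [String]
wordsPair = finish . foldr step ("", [])
  where
    step c (cur, done)
      | c == ' '  = ("", push cur done)   -- a space closes the current word
      | otherwise = (c : cur, done)       -- extend the current word
    -- Only non-empty words are kept, so repeated spaces are harmless.
    push cur done = if null cur then done else cur : done
    -- Post-processing: flush whatever word is still open.
    finish (cur, done) = push cur done
```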
Therefore I go back to the initial question and ask if it is actually possible to implement words (in a way that it correctly handles trailing/leading/repeated spaces) using folds. By using folds I mean that the folding function has to be the outermost function:
myWords :: String -> [String]
myWords input = foldr step seed input

If I understand correctly, your requirements include
(1) words "a b c" == words " a b c" == ["a", "b", "c"]
(2) words "xa b c" == ["xa", "b", "c"] /= ["x", "a", "b", "c"] == words "x a b c"
This implies that we can not have
words = foldr step base
for any step and base.
Indeed, if we had that, then
words "xa b c"
= def words and foldr
step 'x' (words "a b c")
= (1)
step 'x' (words " a b c")
= def words and foldr
words "x a b c"
and this contradicts (2).
You definitely need some post-processing after the foldr.
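The two facts the argument rests on can be checked directly against the Prelude's words. The sketch below only restates requirements (1) and (2) as Boolean values; the names sameTail and differentWhole are illustrative. Because foldr step base (x:xs) equals step x (foldr step base xs), a plain fold could only see the common value of the two tails and could not produce two different results.

```haskell
-- Requirement (1): a leading space does not change the result.
sameTail :: Bool
sameTail = words "a b c" == words " a b c"

-- Requirement (2): consing 'x' onto those two inputs must give different results,
-- which step 'x' cannot do when it receives the same argument both times.
differentWhole :: Bool
differentWhole = words ('x' : "a b c") /= words ('x' : " a b c")
```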

@chi has a wonderful argument that you cannot implement words using "a" fold, but you did say using folds.
words = filterNull . words1
  where
    filterNull = foldr (\xs -> if null xs then id else (xs:)) []
    words1 = foldr (\c -> if c == ' ' then ([]:) else consHead c) []
    consHead c [] = [[c]]
    consHead c (xs:xss) = (c:xs):xss
Both the outermost and innermost function are folds. ;-)

Yes. Even though it's a little tricky, you can still do this job properly with a single foldr and nothing else if you delve into CPS (continuation-passing style). I had shown a special kind of chunksOf function previously.
In this kind of fold the accumulator, and hence the result of the fold, is a function, and we have to apply it to an identity-like input to get the final result. So this may or may not count as a final processing stage, since we are using a single fold here and its type includes the function. Open to debate :)
ws :: String -> [String]
ws str = foldr go sf str $ ""
  where
    sf :: String -> [String]
    sf s = if s == " " then [""] else [s]
    go :: Char -> (String -> [String]) -> (String -> [String])
    go c f = \pc -> let (s:ss) = f [c]
                    in case pc of
                         ""        -> dropWhile (== "") (s:ss)
                         otherwise -> case (pc == " ", s == "") of
                                        (True, False) -> "":s:ss
                                        (True, True)  -> s:ss
                                        otherwise     -> (pc++s):ss
λ> ws " a b c "
["a","b","c"]
sf : The initial function value to start with.
go : The iterator function
We are actually not fully utilizing the power of CPS here, since we have both the previous character pc and the current character c at hand in every turn. It was very useful in the chunksOf function mentioned above while chunking an [Int] into [[Int]] every time an ascending sequence of elements was broken.
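The same idea can be written with a smaller piece of state. The following is an alternative sketch, not the code above: the fold builds a function from a Bool ("was the previous character part of a word?") to the list of words, and the whole thing is started by applying it to False. The names wordsCps, step and end are illustrative, and only ' ' is treated as a separator.

```haskell
wordsCps :: String -> [String]
wordsCps s = foldr step end s False
  where
    -- At the end of input, close the open word (if any) with an empty remainder.
    end :: Bool -> [String]
    end inWord = if inWord then [""] else []

    -- k is the function built from the rest of the string.
    -- Invariant: k True always returns a non-empty list whose head is the
    -- remainder of the word that is currently open.
    step :: Char -> (Bool -> [String]) -> Bool -> [String]
    step c k inWord
      | c == ' '  = if inWord then "" : k False else k False
      | otherwise = case k True of
          (w : ws) -> (c : w) : ws
          []       -> [[c]]  -- unreachable given the invariant above
```

Whether the final application to False counts as post-processing is the same open question as before.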

Calculating all possibilities for a string with optional parts

I want to generate a list of all possible combinations of a string that contains optional parts. This is probably best explained with some examples:
A[B] → A and AB
A[B][C] → A, AB, AC and ABC
A[B[C]] → A, AB and ABC
I hope this sufficiently explains what I'm trying to do.
I could hack together my own little parser or "algorithm" for this, but I have a strong feeling that there's an existing (and easier) solution for this. Because I don't have any kind of CS education (yet), I have no idea what kind of algorithm I'm looking for or even just what search terms to use.
Is my hunch correct and is there actually an existing (well-documented) approach for this?

I haven't read the article, but it seems that this question has been studied. There's an article on page 117: "Enumeration of Formal Languages" http://www.eatcs.org/images/bulletin/beatcs89.pdf
You may find more on this subject by searching with suitable keywords such as "enumerate language for DFA".

The grammar you defined contains only 2 different entities: Terminals (a character) or an Optional part of the grammar.
In the Haskell code below, this is reflected by the definition of the discriminated union Grammar.
The first task is to convert a given concrete syntax (e.g. "A[B]") into a list of Grammar parts (each terminal or optional). In the code below, the function fromString does just that.
The interesting part, though is how to generate all possible strings for a given syntax.
In the code below, the function generate does this, recursively.
For an empty grammar list (which is the end of the recursion), the output is a single empty string.
If a terminal symbol is found in the grammar list at a given position, the respective character is pasted in front of all variations, generated from the remainder of the grammar list.
If an optional part is found in the grammar list, the list of the remainder of the grammar list is yielded twice; once with the optional part prepended and once without.
For the non-functional people reading this: fmap applies a function to every element of a list, producing a new list.
The other function used in the code below that non-Haskellers might stumble over is concat, which flattens a list of lists into a single list.
data Grammar = Terminal Char | Optional [Grammar] deriving (Show,Eq)
fromString :: String -> [Grammar] -> ([Grammar],String)
fromString [] acc = (acc,"")
fromString ('[':cs) acc =
  let (o,rest) = fromString cs [] in
  fromString rest (acc ++ [Optional o])
fromString (']':cs) acc = (acc,cs)
fromString (c:cs) acc = fromString cs (acc ++ [Terminal c])
generate :: [Grammar] -> [String]
generate [] = [""]
generate ((Terminal c) : parts) = fmap (\s -> c : s) $ generate parts
generate ((Optional gs) : parts) = tails ++ (concat . fmap prependOpts $ tails)
  where
    tails = generate parts
    opts = generate gs
    prependOpts :: String -> [String]
    prependOpts tail = fmap (\o -> o ++ tail) $ opts
Putting it all together in the REPL (interactive shell), running
fromString "A[B][C]" [], for example yields:
([Terminal 'A',Optional [Terminal 'B'],Optional [Terminal 'C']],"")
And if we run generate on the grammar list above (the first part of the tuple), we get all our strings:
generate (fst $ fromString "A[B][C]" [])
["A","AC","AB","ABC"]

Here's something in JavaScript, only tested for the three examples provided. Hopefully the (attempted) recurrence is clear from the code:
function f(string, index, combinations){
if (index == string.length)
return [combinations, index]
if (string[index] == "["){
let prefixes = []
let [suffixes, nextIndex] = f(string, index + 1, [""])
for (let combination of combinations)
for (let suffix of suffixes)
prefixes.push(combination + suffix)
if (nextIndex == string.length)
return [combinations.concat(prefixes), nextIndex]
else
return f(string, nextIndex, combinations.concat(prefixes))
} else if (string[index] == "]"){
return [combinations, index + 1]
} else {
for (let i=0; i<combinations.length; i++)
combinations[i] += string[index]
return f(string, index + 1, combinations)
}
}
strings = [
"A[B]",
"A[B][C]",
"A[B[C]]"
]
for (let string of strings)
console.log(JSON.stringify(f(string, 0, [""])[0]))

The algorithm could be as follows:
- Parse the string and build a rooted binary tree in which each node branches on whether the next bracketed part is included or not.
- Walk every path from the root to a leaf; the set of all root-to-leaf paths yields all combinations.
You can also use a pushdown automaton to parse the string and track where each bracket opens and closes. Many existing implementations are available.
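The include-or-omit branching can be sketched compactly in Haskell. This is one possible reading of the idea, not a reference implementation: Part, parseParts, expandParts and expand are illustrative names, and the input is assumed to have balanced brackets (unbalanced input is not reported as an error).

```haskell
-- A pattern is a sequence of literal characters and optional groups.
data Part = Lit Char | Opt [Part]

-- Parse until the end of input or the ']' closing the current group;
-- returns the parts read and the unconsumed remainder.
parseParts :: String -> ([Part], String)
parseParts ('[' : cs) =
  let (inner, rest)  = parseParts cs    -- contents of the bracket
      (more,  rest') = parseParts rest  -- what follows at the same level
  in (Opt inner : more, rest')
parseParts (']' : cs) = ([], cs)
parseParts (c : cs)   =
  let (more, rest) = parseParts cs in (Lit c : more, rest)
parseParts []         = ([], [])

-- Each Opt is a two-way branch: omit it ("") or use one of its expansions.
expandParts :: [Part] -> [String]
expandParts = foldr combine [""]
  where
    combine (Lit c)  suffixes = map (c :) suffixes
    combine (Opt ps) suffixes =
      [o ++ t | o <- "" : expandParts ps, t <- suffixes]

expand :: String -> [String]
expand = expandParts . fst . parseParts
```

For the three examples in the question, expand yields the listed combinations, although not necessarily in the same order.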

No instance for Foldable arising from length inside lambda

This is my first question here and I'm a complete noob at Haskell, so please be kind to me :)
I was playing with question number 6 of these Haskell exercises
and in the end came to a solution (or something similar, I hope) with this code
combinations gr lis = filter clean $ sequence $ replicate gr lis
  where
    clean string
      | total > gr = False
      | otherwise = True
      where total = sum [ rpt c string | c <- string]
    rpt chr list = length $ filter (== chr) list
The part I'd like to highlight is the function rpt, which counts the number of times a character is repeated in a string. Applied to each character of the string in turn, for example:
"aaba" -> [3,3,1,3] (the 3 comes from the letter a, which repeats 3 times)
"aaccva" -> [3,3,2,2,1,3]
Later on I tried to write the function with a lambda and a map, resulting in this:
rpt chr list = map (\chr -> length $ filter (== chr)) list
At first GHCi told me to use FlexibleContexts to allow this, but if I do, it yields:
<interactive>:7:1:
No instance for (Foldable ((->) [Char]))
arising from a use of ‘rpt’
In the expression: rpt 'a' string
In an equation for ‘it’: it = rpt 'a' string
and here I'm stuck. I have not been able to understand what's happening. What is needed to fix this function?

You likely intend to filter over list, so to make your code work, you also need to pass list as an argument to filter:
rpt chr list = map (\chr -> length $ filter (== chr) list) list
For beginners, I recommend ignoring GHCi's suggestion of FlexibleContexts. It often ends up producing error messages like the one you had (or other confusing ones like No instance for (Num (Int -> Bool))).
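The error arises because filter (== chr) without its list argument is still a function of type [Char] -> [Char], and length then asks for a Foldable instance for functions. A small sketch of the corrected version follows; rptAll is an illustrative name, and the unused outer chr parameter from the question has been dropped.

```haskell
-- For every character of the string, count how often it occurs in that string.
rptAll :: String -> [Int]
rptAll list = map (\chr -> length (filter (== chr) list)) list
```

With this definition, rptAll "aaba" gives [3,3,1,3], matching the example in the question.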

Iterating through a String and replacing single chars with substrings in haskell

I am trying to learn some Haskell and I find it difficult. I am having some issues with my
current project. The idea is that I have to go through a String and substitute certain chars
with new substrings. For instance if I have a String "FLXF" and I want to replace every F
with a substring called "FLF" the result should be "FLFLXFLF". Now I have been working on this
specific problem for hours. I have been reading up on types, different functions that might come in handy (map, fold, etc) and yet I have not been able to solve this problem.
The code below is some of the different tries I have had:
apply :: String -> String
apply [] = []
apply (x:xs) = if (x == 'F')
                 then do show "Hello"
                         apply xs
                 else (apply (xs))
This example here I was just trying to show hello every time I encountered a 'F', but all it shows is "", so this clearly does not work. I am really not sure an if else statement is the way to go here. I was also thinking the function map might do the trick. Here the code I was thinking about could look something like this:
map (\x y -> if y == 'F' then "FLD" else y) "FLF"
but that gives me a type error. So as you can see, I am lost. Please excuse my poor knowledge of Haskell; I am still new to it. I really hope some of you can help me out here or give me a push in the right direction. Feel free to ask questions if I have been unclear about something.
Thank you in advance!
John

map (\x y -> if y == 'F' then "FLD" else y) "FLF"
This is nearly right.
First... why does the function take two arguments?
map (\y -> if y == 'F' then "FLD" else y) "FLF"
The remaining type error is because the then branch gives a String, but the else branch gives a Char (the two branches must each give a value of the same type). So we'll make the else branch give a String instead (recall that String is a synonym for [Char]):
map (\y -> if y == 'F' then "FLD" else [y]) "FLF"
Now the problem is that this gives you a [String] value instead of a String. So we'll concatenate all those strings together:
concat (map (\y -> if y == 'F' then "FLD" else [y]) "FLF")
This combination of concat and map is common enough that there's a standard function that combines them.
concatMap (\y -> if y == 'F' then "FLD" else [y]) "FLF"
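Wrapped up as a reusable function, the same idea might look like this. The name replaceChar and its argument order are assumptions made for illustration.

```haskell
-- Replace every occurrence of one character by a whole substring.
replaceChar :: Char -> String -> String -> String
replaceChar target replacement =
  concatMap (\c -> if c == target then replacement else [c])
```

With the example from the question, replaceChar 'F' "FLF" "FLXF" evaluates to "FLFLXFLF".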

concatMap is the most intuitive thing here. This pattern, mapping a function that itself returns the data structure's type (here, a list) over a data structure and then combining the results back into a single "tight" list, is very common in Haskell, and not only for lists.
I'd like to explain why your first attempt compiles at all, and what it actually does – because it's completely different from what you probably think!
apply (x:xs) = if (x == 'F')
that line is still perfectly clear: you just take the first char off the string and compare it to 'F'. A bit "pedestrian" to manually take the string apart, but fine. Well, the name you gave the function is not particularly great, but I'll stick with it here.
then do show "Hello"
now this is interesting. You probably think do starts a list of points, "first do this, then do that"... like in simple Hello, World-ish example programs. But always remember: in Haskell, there's normally no such thing as an order in which stuff is calculated. That only happens in the IO context. But there's no IO in your code!?!
Not sure if you've heard about what IO actually is, anyway here you go: it's a Monad. Those "mythical Haskell constructs you've only read about in story books"...
Indeed, though this might lead a bit far here, this question covers all there is to know about Monads! How is that?
Here's another (correct!) way to define your function.
apply' str = do
  x <- str
  if (x == 'F')
    then "FLF"
    else return x
So I'm using this weird do syntax, and it's not in IO, and it looks completely different from what you'd write in IO, but it works. How?
x <- str
In do notation, variable <- action always means something like "take one value out of this monadic thingy, and call it variable". What you've probably seen is something like
response <- getLine
which means "take a user input out of the real world (out of the IO monad!) and call it response". In x <- str, it's a string that we have, not an IO action. So we take a character out of a string – nice and easy!
Actually, it's not quite right, though. "take a character" is what you do with apply (x:xs) = ..., which simply takes the first one. In contrast, x <- str actually takes all possible characters out of the string, one by one. If you're used to procedural languages, this may seem very inconsistent with response <- getLine, but in fact it's not: getLine also consists of every possible input that the user might give, and the program has to act according to this.
if (x == 'F')
nothing unexpected here, but
then "FLF"
whoah! Just like that? Let's first look at the next line
else return x
ok, this looks familiar, but actually it's not. In other languages, this would mean "we're done with our function, x is the result". But that's obviously not what happens here, because x is Char, and the "return type" of apply' is String. In Haskell, return actually has little to do with returning values from a function, instead it means "put that value into the monadic context that we're working in". If the monad were IO, that would be quite the same: give this value back to the real-world context (this does not mean to print the value or something, just to hand it on). But here, our context is a string, or rather a list (of chars, so it is a String).
Right, so if x is not 'F' we put it back into the string. That sounds reasonable enough, but what about then "FLF"? Note that I can also write it this way:
if (x == 'F')
  then do
    x' <- "FLF"
    return x'
  else return x
which means I take all characters out of "FLF" and return them back into the overall result. But there's no need to think only about the final result; we can just as well isolate this part, do { x' <- "FLF"; return x' }, and quite obviously its value is nothing but the string "FLF" itself!
So I hope you have now grasped why apply' works. Back to your version, though it actually doesn't make much sense...
then do
  show "Hello"
  apply xs
here we have a line that's not at the end of a do block, but doesn't have a <- in it. You normally see this in IO in something like
main = do
  putStrLn "How ya doin'?"
  response <- getLine
  ...
Remember that "output-only" actions have type IO () in Haskell, which means they don't directly return any meaningful value, just the trivial value (). So you don't really care about this, but you could still evaluate it:
main = do
  trivial <- putStrLn "Hello, let's see what this IO action returns:"
  print trivial
compiles and outputs
Hello, let's see what this IO action returns:
()
It would be stupid if we had to bind () like this all the time, so Haskell lets you just leave the () <- out. It's really just that!
So a line like show "Hello" in the middle of a do block basically means "take one character out of show "Hello" (which is simply a string with the value "\"Hello\""), but don't do anything else with this character / just throw it away".
The rest of your definition is just other recursive calls to apply, but because none of them does anything more interesting than throwing away characters, you eventually end up at apply [] = [], so that's the final result: an empty string.
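Without the do sugar, both points can be seen directly in terms of >>= and >> on lists. This is a sketch with illustrative names (applyBind, discardDemo); it relies only on the standard list instance of Monad, where xs >>= f behaves like concatMap f xs.

```haskell
-- apply' written with an explicit bind instead of do notation.
applyBind :: String -> String
applyBind str = str >>= \x -> if x == 'F' then "FLF" else return x

-- do { show "Hello"; rest } desugars to show "Hello" >> rest:
-- one copy of rest for each of the 7 characters of "\"Hello\"",
-- with the characters themselves thrown away.
discardDemo :: String -> String
discardDemo rest = show "Hello" >> rest
```

Since every recursive call in the original apply bottoms out at the empty string, repeating that empty result seven times still gives "".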

if-then-else... I know that Haskell supports these, however, I'm very surprised that no one here removed them...
So below are my solutions for different cases of making replacements.
Replacing a character
Replacing words
Replacing through a function on each word
$ cat replace.hs
import Data.List (isPrefixOf)
replaceC :: Char -> Char -> String -> String
replaceC _ _ [] = []
replaceC a b (x:xs)
  | x == a = b:replaceC a b xs
  | otherwise = x:replaceC a b xs
replaceW :: String -> String -> String -> String
replaceW a b s = unwords . map replaceW' $ words s
  where replaceW' x | x == a = b
                    | otherwise = x
replaceF :: (String -> String) -> String -> String
replaceF f = unwords . map f . words
string = "Hello world ^fg(blue)"
main = do
  print string
  print $ replaceC 'o' 'z' string
  print $ replaceW "world" "kitty" string
  print . replaceF f . replaceW "world" "kitty" $ replaceC 'H' 'Y' string
  where f s | "^" `isPrefixOf` s = '^':'^':drop 1 s
            | otherwise = s
$ runhaskell replace.hs
"Hello world ^fg(blue)"
"Hellz wzrld ^fg(blue)"
"Hello kitty ^fg(blue)"
"Yello kitty ^^fg(blue)"

Your basic error was that you wanted to replace a Char in a String with a String.
This is impossible because String is a list of Char and a Char is a Char and not a short String. Neither is a String ever a Char, even if its length is 1.
Hence, what you really wanted is to replace some Char with some other Chars. Your approach was promising and could have been completed like so:
replace [] = [] -- nothing to replace in an empty string
replace (c:cs) = if c == 'F' then 'F':'L':'F':replace cs
                 else c:replace cs
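The explicit recursion above follows the usual shape of a right fold, so it can also be expressed with foldr. A minimal sketch, with replaceFold as an illustrative name:

```haskell
-- Same behaviour as the recursive replace, written as a right fold:
-- acc is the already-processed rest of the string.
replaceFold :: String -> String
replaceFold = foldr (\c acc -> if c == 'F' then "FLF" ++ acc else c : acc) ""
```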

Shortening a String Haskell

How do you shorten a string in Haskell to a given number of characters?
Say:
comp :: String -> String
short :: String -> String
chomp (x:xs) = (x : takeWhile (==x) xs)
using comp I want to select a run of repeated characters from the start of a string, with
the run comprising at most nine characters.
For example:
short "aaaavvvdd"
would output "aaaa"
and short "dddddddddd"
outputs "ddddddddd".
I know I need take, but I'm not sure how to put that into the code.
I've got this far, but it doesn't work:
short x:xs | length(short x:xs) >9 = take(9)
| otherwise = comp

The Quick Answer
import Data.List
short [] = []
short x = (take 9 . head . group) x
This will give you output that matches your desired output.
That is,
*> short "aaaavvvdd"
"aaaa"
*> short "dddddddddd"
"ddddddddd"
Step by Step Development
Use "group" to separate the items
This solution depends on the "group" function in the Data.List library. We begin with the definition:
short x = group x
This gives us:
*> short "aaaavvvddd"
["aaaa","vvv","ddd"]
Use "head" to return only the first element
Once we have the elements in a list, we want only the first item of the list. We achieve this using "head":
short x = (head . group) x
"." is the Haskell function for function composition. It's the same as:
short x = head (group x)
or
short x = head $ group x
This will give us:
*> short "aaaavvvdd"
"aaaa"
*> short "dddddddddddddd"
"dddddddddddddd"
Use "take" to get the first nine characters
We finish the program by taking only the first nine characters of this result, and end up with our final function. To do this, we use the "take" function from the prelude:
short x = (take 9 . head . group) x
We now have the result that we wanted, but with one minor problem.
Add another case to eliminate the error
Note that using our current definition on the empty list causes an error,
*> short "aaaavvvddd"
"aaaa"
*> short ""
"*** Exception: Prelude.head: empty list
Because "head" is undefined on the empty list, we need to handle another case: the empty list. Now we have:
short [] = []
short x = (take 9 . head . group) x
This is our "final answer".
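If the separate empty-list case feels awkward, one alternative is to avoid head altogether. The sketch below (shortTotal is an illustrative name) uses take 1 and concat, which are both defined on empty lists, so no special case is needed.

```haskell
import Data.List (group)

-- take 1 keeps at most the first run; concat turns [run] or [] into a String.
shortTotal :: String -> String
shortTotal = take 9 . concat . take 1 . group
```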

Here is another version:
short xs = take 9 $ takeWhile (== head xs) xs
So we take from the list as long as the content equals the head of list (which is the first char of the string). Then we use take to shorten the result when necessary.
Note that we don't need an additional case for empty strings, which is a consequence of Haskell's laziness: if takeWhile sees that the list argument is empty, it doesn't bother to evaluate the condition argument, so head xs doesn't throw an error.

Here's a definition:
import Data.List (group)
short = take 9 . head . group
Interestingly enough, since our returned string is a prefix of the original string, and is constrained to be at most 9 characters long, it doesn't matter whether we trim down to that limit first or last. So we could also use this definition:
short = head . group . take 9
Both of these are written in the "pointfree" style, which refers not to a lack of punctuation but to a lack of unnecessary variables. We could also have written the definition as
short s = take 9 (head (group s))
Or, using $ to get rid of parentheses:
short s = take 9 $ head $ group s
The only other step is to extract only the first block of matching characters, which is what head . group does (equivalent to your chomp function).
From the docs:
group :: Eq a => [a] -> [[a]]
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
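The claim that trimming first or last gives the same result can be spot-checked on a few inputs. This is only a sketch with illustrative names (shortTrimLast, shortTrimFirst, agree); both definitions use head and are therefore undefined on the empty string, which the check skips explicitly.

```haskell
import Data.List (group)

shortTrimLast, shortTrimFirst :: String -> String
shortTrimLast  = take 9 . head . group   -- find the first run, then cut to 9
shortTrimFirst = head . group . take 9   -- cut to 9, then find the first run

-- True when both orders agree; the empty string is excluded because
-- head is partial.
agree :: String -> Bool
agree s = null s || shortTrimLast s == shortTrimFirst s
```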
