Show a list of words repeated in haskell - string

I need to write a function that finds the repeated words in a string and returns them as a list of strings in order of occurrence, ignoring non-letters.
For example, at the Hugs prompt:
repetitions :: String -> [String]
repetitions > "My bag is is action packed packed."
output> ["is","packed"]
repetitions > "My name name name is Sean ."
output> ["name","name"]
repetitions > "Ade is into into technical drawing drawing ."
output> ["into","drawing"]

To split a string into words, use the words function (in the Prelude).
To eliminate non-word characters, filter with Data.Char.isAlphaNum.
Zip the list together with its tail to get adjacent pairs (x, y).
Fold the list, consing a new list that contains all x where x == y.
Something like:
repetitions s = map fst . filter (uncurry (==)) . zip l $ tail l
    where l = map (filter isAlphaNum) (words s)
I'm not sure that works, but it should give you a rough idea.
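The steps above can be sketched as a complete definition. This is a minimal sketch, not the answerer's exact code: it assumes that tokens left empty after stripping punctuation (such as a lone ".") should be dropped, and it uses drop 1 in place of tail so the empty list needs no special case.

```haskell
import Data.Char (isAlphaNum)

-- Pair every word with its successor and keep the words whose successor is equal.
repetitions :: String -> [String]
repetitions s = map fst . filter (uncurry (==)) . zip l $ drop 1 l
  where
    -- Strip non-alphanumeric characters from each word.
    -- Assumption: tokens that become empty are discarded.
    l = filter (not . null) (map (filter isAlphaNum) (words s))
```

With the inputs from the question, this yields ["is","packed"], ["name","name"] and ["into","drawing"] respectively.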

I am new to this language, so my solution may look ugly to a Haskell veteran, but anyway:
let repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') x)))))
This part will remove all non letters and non spaces from a string s:
filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') s
This one splits a string s into words and groups adjacent equal words, returning a list of lists:
List.group (words s)
Then this part removes all lists with fewer than two elements:
filter (\x -> (length x) > 1) s
After that we concatenate all the lists into one, dropping one element from each:
concat (map tail s)
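To see what each stage produces, the one-liner can be split into named steps. This is only an illustrative sketch: the names lettersOnly, grouped and repeatedGroups are hypothetical and do not appear in the answer, and group is imported from Data.List rather than the older List module.

```haskell
import Data.List (group)

-- Stage 1: keep only ASCII letters and spaces.
lettersOnly :: String -> String
lettersOnly = filter (\c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' ')

-- Stage 2: split into words and group adjacent equal words.
grouped :: String -> [[String]]
grouped = group . words . lettersOnly

-- Stage 3: keep only the groups that contain a repetition.
repeatedGroups :: String -> [[String]]
repeatedGroups = filter (\g -> length g > 1) . grouped

-- Stage 4: drop one element from each group and flatten.
repetitions :: String -> [String]
repetitions = concat . map tail . repeatedGroups
```

For "My name name name is Sean ." the stages give [["My"],["name","name","name"],["is"],["Sean"]], then [["name","name","name"]], and finally ["name","name"].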

This might be inelegant, but it is conceptually very simple. I'm assuming that it's looking for consecutive duplicate words, as in the examples.
-- a wrapper that lets you give the input as a String
repetitions :: String -> [String]
repetitions s = repetitionsLogic (words s)
-- does the real work
repetitionsLogic :: [String] -> [String]
repetitionsLogic [] = []
repetitionsLogic [a] = []
repetitionsLogic (a:as)
    | a == head as = a : repetitionsLogic as
    | otherwise = repetitionsLogic as

Building on what Alexander Prokofyev answered:
repetitions x = concat (map tail (filter (\x -> (length x) > 1) (List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') x)))))
Remove unnecessary parentheses:
repetitions x = concat (map tail (filter (\x -> length x > 1) (List.group (words (filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x)))))
Use $ to remove more parentheses (each $ can replace an opening parenthesis if the matching closing parenthesis is at the end of the expression):
repetitions x = concat $ map tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x
Replace character ranges with functions from Data.Char, merge concat and map:
repetitions x = concatMap tail $ filter (\x -> length x > 1) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x
Use a section and function composition in point-free style to simplify (\x -> length x > 1) to ((>1) . length). This combines length with (>1) (a partially applied operator, or section) in a right-to-left pipeline.
repetitions x = concatMap tail $ filter ((>1) . length) $ List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x
Eliminate the explicit "x" variable to make the overall expression point-free:
repetitions = concatMap tail . filter ((>1) . length) . List.group . words . filter (\c -> isAlpha c || isSeparator c)
Now the entire function, reading from right to left, is a pipeline that keeps only alphabetic or separator characters, splits the result into words, groups adjacent equal words, keeps the groups with more than one element, and then drops the first element of each remaining group and concatenates what is left.
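For reference, the final point-free pipeline can be written as a self-contained module fragment. This sketch assumes the modern Data.List import in place of the qualified List.group used above.

```haskell
import Data.Char (isAlpha, isSeparator)
import Data.List (group)

-- Read from bottom to top: filter characters, split, group, select, flatten.
repetitions :: String -> [String]
repetitions =
  concatMap tail                                -- drop one copy from each group
    . filter ((> 1) . length)                   -- keep groups with a repetition
    . group                                     -- group adjacent equal words
    . words                                     -- split on whitespace
    . filter (\c -> isAlpha c || isSeparator c) -- keep letters and separators
```

A word repeated n times in a row contributes n - 1 entries, which matches the ["name","name"] example in the question.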

Related

Word count in haskell

I'm working on this exercise:
Given a phrase, count the occurrences of each word in that phrase.
For the purposes of this exercise you can expect that a word will always be one of:
A number composed of one or more ASCII digits (ie "0" or "1234") OR
A simple word composed of one or more ASCII letters (ie "a" or "they") OR
A contraction of two simple words joined by a single apostrophe (ie "it's" or "they're")
When counting words you can assume the following rules:
The count is case insensitive (ie "You", "you", and "YOU" are 3 uses of the same word)
The count is unordered; the tests will ignore how words and counts are ordered
Other than the apostrophe in a contraction all forms of punctuation are ignored
The words can be separated by any form of whitespace (ie "\t", "\n", " ")
For example, for the phrase "That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled. the count would be:
that's: 1
the: 2
password: 2
123: 1
cried: 1
special: 1
agent: 1
so: 1
i: 1
fled: 1
My code:
module WordCount (wordCount) where
import qualified Data.Char as C
import qualified Data.List as L
import Text.Regex.TDFA as R
wordCount :: String -> [(String, Int)]
wordCount xs =
    do
        ys <- words xs
        let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
        g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
        return (head g, length g)
But it fails on the input "one fish two fish red fish blue fish". It outputs one count for each word, even the repeated ones, as if the sort and group aren't doing anything. Why?
I've read this answer, which basically does the same thing in a more advanced way using Control.Arrow.
You don't need to use words to split the line, the regex should achieve the desired splitting:
wordCount :: String -> [(String, Int)]
wordCount xs =
    do
        let zs = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
        g <- L.group $ L.sort [map C.toLower w | w <- zs]
        return (head g, length g)
wordCount xs =
    do
        ys <- words xs
        let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
        g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
        return (head g, length g)
You’re splitting the input xs into words by whitespace using words. You iterate over these in the list monad with the binding statement ys <- …. Then you split each of those words into subwords using the regular expression, of which there happens to be only one match in your example. You sort and group each of the subwords in a list by itself.
I believe you can essentially just delete the initial call to words:
wordCount xs =
    do
        let ys = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
        g <- L.group $ L.sort [map C.toLower w | w <- ys]
        return (head g, length g)
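The effect of binding each word in the list monad can be reproduced without the regex library. In the sketch below, words stands in for the regular expression as a simplifying assumption, and the names perWord and wholeInput are hypothetical.

```haskell
import Data.Char (toLower)
import Data.List (group, sort)

-- Mirrors the question's structure: the bind runs the rest of the block once
-- per word, so sort and group only ever see a one-element list.
perWord :: String -> [(String, Int)]
perWord xs = do
  w <- words xs
  g <- group (sort [map toLower w])
  return (head g, length g)

-- Mirrors the fix: sort and group the whole token list in one go.
wholeInput :: String -> [(String, Int)]
wholeInput xs = do
  g <- group (sort (map (map toLower) (words xs)))
  return (head g, length g)
```

For "one fish two fish", perWord reports every word with a count of 1, while wholeInput reports [("fish",2),("one",1),("two",1)].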

Haskell parse error on input ')' in a function

Basically I have this code:
ola xs = foldl (\acc x -> if (chr x >= 65 && chr x <= 71 || chr x >= 97 && chr x <= 103) then acc (++) x) [] xs
When I try to load it in GHCi, it says I have a parse error at the last parenthesis. Can you help me? What is wrong here?
You are finding all occurrences of the letters A to G in a string, case-insensitively. As said in the comments, you are missing the else case. The else case should simply return the acc variable.
But there are still mistakes in your program. The function chr takes a number and returns the corresponding character. What you need is ord, which takes a Char and returns its character code.
So the corrected version of your program will be
ola xs = foldl (\acc x -> if (ord x >= 65 && ord x <= 71 || ord x >= 97 && ord x <= 103) then acc ++ [x] else acc) [] xs
Char is already an instance of the Ord class, so you can compare characters directly. There is no need to use ord. With this you can write the above as
ola xs = foldl (\acc x -> if (x >= 'A' && x <= 'G' || x >= 'a' && x <= 'g') then acc ++ [x] else acc) [] xs
There is a standard list function called filter, which keeps only the elements of a list that satisfy a predicate. Using filter, the above program becomes
ola xs = filter (\x -> x >= 'A' && x <= 'G' || x >= 'a' && x <= 'g') xs
Now the argument can be dropped, and it can simply be written as
ola = filter (\x -> x >= 'A' && x <= 'G' || x >= 'a' && x <= 'g')
Bonus:
But what if you want to check for an arbitrary set of characters? You can first create a string of the characters you want to keep. For the above example this will suffice:
strList = ['A'..'G'] ++ ['a'..'g']
Now filter using elem, which checks whether a given element is present in a list.
ola xs = filter (\x -> elem x strList) xs
Bonus 2:
What if you want to remove the argument xs? For that you first need to flip the arguments of elem, so that the list comes first.
ola = filter ((flip elem) strList)
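The fold and the filter versions can be placed side by side to check that they agree. This is a small sketch: the names letters and olaFold are hypothetical, and the backtick section (`elem` letters) is just another spelling of flip elem.

```haskell
-- The characters to keep: A to G in both cases.
letters :: String
letters = ['A' .. 'G'] ++ ['a' .. 'g']

-- Fold version: append matching characters to the accumulator.
olaFold :: String -> String
olaFold = foldl (\acc x -> if x `elem` letters then acc ++ [x] else acc) []

-- Filter version: the same result without an explicit accumulator.
ola :: String -> String
ola = filter (`elem` letters)
```

Both functions return "eBgCafe" for the input "Hello, Big Cafe!". The filter version also avoids the repeated acc ++ [x], which copies the accumulator on every match.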

haskell expected type trouble

I'm new to Haskell and functional programming and still in the learning process. What is wrong with this code?
import System.IO
import Data.Char
import System.Environment
main = do
    args <- getArgs
    progName <- getProgName
    content <- readFile $ head args
    putStrLn $ show $ getWordsInfo content
getWordsInfo = let
        wordList = filter (\x -> length x > 2 && all isAlpha x) . words
    in foldl foldingFunction 0 wordList
    where foldingFunction acc tWord = acc + length tWord
When I try to compile it, I get the following
Couldn't match expected type `[[a0]]'
with actual type `String -> [[Char]]'
In the third argument of `foldl', namely `wordList'
In the expression: foldl foldingFunction 0 wordList
In the expression:
let
wordList = filter (\ x -> length x > 2 && all isAlpha x) . words
in foldl foldingFunction 0 wordList
You appear to be using point free notation incorrectly.
The only line of this that is wrong is:
let wordList = filter (\x -> length x > 2 && all isAlpha x) . words
The error message is saying that wordList hasn't been applied to enough arguments where it is used: foldl expects a list of lists, but it has instead been given a function that takes a string and produces a list of lists. So we simply need to give the wordList function the input string.
You can rewrite it two ways:
The first is by explicitly specifying the argument:
getWordsInfo xs = let wordList = filter (\x -> length x > 2 && all isAlpha x) (words xs)
                  in foldl foldingFunction 0 wordList
    where foldingFunction acc tWord = acc + length tWord
The second is by keeping the point free bit not in a let binding:
getWordsInfo = foldl foldingFunction 0 . filter (\x -> length x > 2 && all isAlpha x) . words
    where foldingFunction acc tWord = acc + length tWord
Your folding function is taking the length of each word and summing them up, which can be simplified by mapping over the list, and taking the length, then summing the list.
getWordsInfo = sum . map length . filter (\x -> length x > 2 && all isAlpha x) . words
And that line is getting a bit long, so we should probably factor some of it out into another definition giving us finally:
import Data.Char (isAlpha)
getWordsInfo = sum . map length . filter isLongWord . words
    where isLongWord x = length x > 2 && all isAlpha x
Usage:
λ> getWordsInfo "apple banana orange a a b b punctuation!!"
17
λ> getWordsInfo "aa bb cc"
0
λ> getWordsInfo "!!!"
0
λ> getWordsInfo "apple"
5
λ>
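Writing out the type signatures makes the mismatch from the error message visible. The sketch below keeps the original fold but gives wordList an explicit type. The signatures are additions for illustration and are not part of the original code.

```haskell
import Data.Char (isAlpha)

-- wordList is a function from a String to a list of words, not a list itself.
wordList :: String -> [String]
wordList = filter (\x -> length x > 2 && all isAlpha x) . words

-- foldl needs the list, so wordList must be applied to the input first.
getWordsInfo :: String -> Int
getWordsInfo s = foldl foldingFunction 0 (wordList s)
  where
    foldingFunction acc tWord = acc + length tWord
```

Passing wordList itself to foldl, as in the question, is what produces the "Couldn't match expected type" message, because a String -> [String] function is not a list.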

rotate a string in haskell with some exceptions

I want to rotate a string in Haskell: if I give "Now I want to scream" to rotate, it should return ["want to scream now I", "scream now I want to"]. If a rotation starts with "I" or "to", it must be eliminated. So far I still have problems with the rotation.
reverseWords :: String -> String
reverseWords = unwords . reverse . words
shiftt :: [a] -> Int -> [a]
shiftt l n = drop n l ++ take n l
rot::String->[String]
rot l = [ reverseWords l i | i <- [0 .. (length l) -1]]
Create a list of all rotations, then filter based on your predicate. For example,
rotations x = take (length x) $ iterate rot1 x
    where rot1 xs = drop 1 xs ++ take 1 xs
filteredRots = map unwords . filter (\x -> length (head x) > 2) . rotations . words
and use as
> filteredRots "Now I want to scream"
["Now I want to scream","want to scream Now I","scream Now I want to"]
Prelude>
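The same idea can be written with the exclusion rule from the question spelled out. This is a sketch under two assumptions: rot1 takes the list as an argument so that iterate can apply it repeatedly, and the hypothetical helper keep checks for a leading "I" or "to" directly instead of using the word-length test above.

```haskell
-- All rotations of a list, starting with the list itself.
rotations :: [a] -> [[a]]
rotations x = take (length x) (iterate rot1 x)
  where
    rot1 xs = drop 1 xs ++ take 1 xs

-- Rotate by whole words and drop rotations that start with "I" or "to".
filteredRots :: String -> [String]
filteredRots = map unwords . filter keep . rotations . words
  where
    -- Hypothetical helper: compare the first word (if any) with the excluded ones.
    keep ws = take 1 ws `notElem` [["I"], ["to"]]
```

For "Now I want to scream" this gives the same three rotations as the session above. It differs from the length-based filter only for other short leading words such as "a" or "we".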

Implementing Backtracking on Haskell

I have a problem implementing backtracking in Haskell. I know how to write recursive functions, but I run into trouble when I try to get multiple solutions or the best one (backtracking).
I am given a list of strings, a start string, and an end string, and I need to find a way to get from the start string to the end string by changing one letter at a time, using only strings from the list. If there is a solution, the function returns the number of steps taken; if there is no solution, it returns -1. Here's an example:
wordF ["spice","stick","smice","stock","slice","slick","stock"] "spice" "stock"
Here I have my list, and I need to start with "spice" and get to "stock".
The best solution is ["spice","slice","slick","stick","stock"] with four steps to get from "spice" to "stock", so it returns 4.
Another solution is ["spice","smice","slice","slick","stick","stock"] with five steps to get to "stock", which would return 5. But this is a wrong answer, because there is a better solution with fewer steps.
I'm having trouble writing the backtracking to get the best solution, because I don't know how to make my code search for other solutions and not just one.
Here's the code I tried to write, but I get some errors. By the way, I don't know if my way of doing backtracking is good or if there are mistakes that I'm not seeing.
wordF :: [String] -> String -> String -> (String, String, Int)
wordF [] a b = (a, b, -1)
wordF list a b | (notElem a list || notElem b list) = (a, b, -1)
    | otherwise = (a, b, (wordF2 list a b [a] 0 (length list)))
wordF2 :: [String] -> String -> String -> [String] -> Int -> Int -> Int
wordF2 list a b list_aux cont maxi | (cont==maxi) = 1000
    | (a==b) = length list_aux
    | (a/=b) && (cont<maxi) && notElemFound && (checkin /= "ThisWRONG") && (wording1<=wording2) = wording1
    | (a/=b) && (cont<maxi) && notElemFound && (checkin /= "ThisWRONG") && (wording1>wording2) = wording2
    | (a/=b) && (checkin == "ThisWRONG") = wordF2 list a b list_aux (cont+1) maxi
    where
        checkin = (check_word2 a (list!!cont) (list!!cont) 0)
        wording1 = (wordF2 list checkin b (list_aux++[checkin]) 0 maxi)
        wording2 = (wordF2 list checkin b (list_aux++[checkin]) 1 maxi)
        notElemFound = ((any (==(list!!cont)) list_aux) == False)
check_word2 :: String -> String -> String -> Int -> String
check_word2 word1 word2 word3 dif | (dif > 1) = "ThisWRONG"
    | ((length word1 == 1) && (length word2 == 1) && (head word1 == head word2)) = word3
    | ((length word1 == 1) && (length word2 == 1) && (head word1 /= head word2) && (dif<1)) = word3
    | ((head word1) == (head word2)) = check_word2 (tail word1) (tail word2) word3 dif
    | otherwise = check_word2 (tail word1) (tail word2) word3 (dif+1)
My first function, wordF2, takes the list, the start, the end, an auxiliary list holding the current solution (whose first element is always the start, [a]), a counter starting at 0, and the maximum value of the counter (length list).
The second function, check_word2, checks whether one word can be turned into another, like "spice" into "slice". If it can't, as with "spice" and "spoca", it returns "ThisWRONG".
This solution fails with a pattern match failure:
Program error: pattern match failure: wordF2 ["slice","slick"] "slice" "slick" ["slice"] 0 1
I tried small cases without success, and I am using the counter and the maximum to avoid indexing outside the list.
Or maybe I don't know how to implement backtracking in Haskell to get multiple solutions, the best solution, etc.
UPDATE: I wrote a solution, but it's not backtracking:
wordF :: [String] -> String -> String -> (String, String, Int)
wordF [] a b = (a, b, -1)
wordF list a b | (notElem a list || notElem b list) = (a, b, -1)
    | otherwise = (a, b, (wordF1 list a b))
wordF1 :: [String] -> String -> String -> Int
wordF1 list a b | ((map length (wordF2 (subconjuntos2 (subconjuntos list) a b))) == []) = -1
    | (calculo > 0) = calculo
    | otherwise = -1
    where
        calculo = (minimum (map length (wordF2 (subconjuntos2 (subconjuntos list) a b))))-1
wordF2 :: [[String]] -> [[String]]
wordF2 [[]] = []
wordF2 (x:xs) | ((length xs == 1) && ((check_word x) == True) && ((check_word (head xs)) == True)) = x:xs
    | ((length xs == 1) && ((check_word x) == False) && ((check_word (head xs)) == True)) = xs
    | ((length xs == 1) && ((check_word x) == True) && ((check_word (head xs)) == False)) = [x]
    | ((length xs == 1) && ((check_word x) == False) && ((check_word (head xs)) == False)) = []
    | ((check_word x) == True) = x:wordF2 xs
    | ((check_word x) == False ) = wordF2 xs
check_word :: [String] -> Bool
check_word [] = False
check_word (x:xs) | ((length xs == 1) && ((check_word2 x (head xs) 0) == True)) = True
    | ((length xs >1) && ((check_word2 x (head xs) 0) == True)) = True && (check_word xs)
    | otherwise = False
check_word2 :: String -> String -> Int -> Bool
check_word2 word1 word2 dif | (dif > 1) = False
    | ((length word1 == 1) && (length word2 == 1) && (head word1 == head word2)) = True
    | ((length word1 == 1) && (length word2 == 1) && (head word1 /= head word2) && (dif<1)) = True
    | ((head word1) == (head word2)) = check_word2 (tail word1) (tail word2) dif
    | otherwise = check_word2 (tail word1) (tail word2) (dif+1)
subconjuntos2 :: [[String]] -> String -> String -> [[String]]
subconjuntos2 [] a b = []
subconjuntos2 (x:xs) a b | (length x <= 1) = subconjuntos2 xs a b
    | ((head x == a) && (last x == b)) = (x:subconjuntos2 xs a b)
    | ((head x /= a) || (last x /= b)) = (subconjuntos2 xs a b)
subconjuntos :: [a] -> [[a]]
subconjuntos [] = [[]]
subconjuntos (x:xs) = [x:ys | ys <- sub] ++ sub
    where sub = subconjuntos xs
Hmm, maybe it's inefficient, but at least it finds the solution.
I generate all possible candidate solutions, keep those with head == "slice" and last == "stock", filter the ones that are valid solutions, and report the shortest one.
Thanks, and if you have any suggestions, please share them :)
Not thoroughly tested, but this hopefully will help:
import Prelude hiding (Word)
import Data.Function (on)
import Data.List (minimumBy, delete)
import Control.Monad (guard)
type Word = String
type Path = [String]
wordF :: [Word] -> Word -> Word -> Path
wordF words start end =
    start : minimumBy (compare `on` length) (generatePaths words start end)
-- Use the list monad to do the nondeterminism and backtracking.
-- Returns a list of all paths that lead from `start` to `end`
-- in steps that satisfy `differsByOne`.
generatePaths :: [Word] -> Word -> Word -> [Path]
generatePaths words start end = do
    -- Choose one of the words, nondeterministically
    word <- words
    -- If the word doesn't satisfy `differsByOne` with `start`, reject the
    -- choice and backtrack.
    guard $ differsByOne word start
    if word == end
        then return [word]
        else do
            next <- generatePaths (delete word words) word end
            return $ word : next
differsByOne :: Word -> Word -> Bool
differsByOne "" "" = False
differsByOne (a:as) (b:bs)
    | a == b = differsByOne as bs
    | otherwise = as == bs
-- Words of different lengths never differ by exactly one letter.
differsByOne _ _ = False
Example run:
>>> wordF ["spice","stick","smice","stock","slice","slick","stock"] "spice" "stock"
["spice","slice","slick","stick","stock"]
The list monad in Haskell is commonly described as a form of nondeterministic, backtracking computation. What the code above is doing is allowing the list monad to take on the responsibility of generating alternatives, testing whether they satisfy criteria, and backtracking on failure to the most recent choice point. The bind of the list monad, e.g. word <- words, means "nondeterministically pick one of the words". guard means "if the choices so far don't satisfy this condition, backtrack and make a different choice". The result of a list monad computation is the list of all the results that stem from choices that did not violate any guards.
If this looks like list comprehensions, well, list comprehensions are the same thing as the list monad—I chose to express it with the monad instead of comprehensions.
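The correspondence between the two notations can be seen on a deliberately tiny search problem. The example below is hypothetical and unrelated to the word puzzle: pairsDo and pairsComp both look for ordered pairs from a list that add up to a target.

```haskell
import Control.Monad (guard)

-- List monad: each bind is a choice point, and guard rejects bad choices.
pairsDo :: [Int] -> Int -> [(Int, Int)]
pairsDo xs target = do
  a <- xs
  b <- xs
  guard (a < b && a + b == target)
  return (a, b)

-- The same search written as a list comprehension.
pairsComp :: [Int] -> Int -> [(Int, Int)]
pairsComp xs target = [(a, b) | a <- xs, b <- xs, a < b && a + b == target]
```

Both return [(1,5),(2,4)] for the list [1..5] and target 6. Every failed guard simply contributes no results, which is the backtracking described above.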
There have been several articles published recently on classic brute-force search problems.
Mark Dominus published a simple example of using lists for a simple exhaustive search.
Justin Le followed up with a small modification to the previous article that simplified tracking the current state of the search.
I followed up with a further modification that allowed measuring the gains from early rejection of part of the search tree.
Note that the code in my article is quite slow because it's measuring the amount of work done as well as doing it. My article has good examples for how to quickly reject parts of the search tree, but it should be considered only an illustration - not production code.
A brute force approach using recursion:
import Data.List (filter, (\\), reverse, delete, sortBy)
import Data.Ord (comparing)
neighbour :: String -> String -> Bool
neighbour word = (1 ==) . length . (\\ word)
process :: String -> String -> [String] -> [(Int, [String])]
process start end dict =
    let
        loop :: String -> String -> [String] -> [String] -> [(Int,[String])] -> [(Int,[String])]
        loop start end dict path results =
            case next of
                [] -> results
                xs ->
                    if elem end xs
                        then (length solution, solution) : results
                        else results ++ branches xs
            where
                next = filter (neighbour start) dict'
                dict' = delete start dict
                path' = start : path
                branches xs = [a | x <- xs, a <- loop x end dict' path' results]
                solution = reverse (end : path')
    in
        loop start end dict [] []
shortestSolution :: Maybe Int
shortestSolution = shortest solutions
    where
        solutions = process start end dict
        shortest s =
            case s of
                [] -> Nothing
                xs -> Just $ fst $ head $ sortBy (comparing fst) xs
start = "spice"
end = "stock"
dict = ["spice","stick","smice","slice","slick","stock"]
Notes:
This code computes all possible solutions (process) and selects the shortest one (shortestSolution). As Carl said, you might want to prune parts of the search tree for better performance.
Using a Maybe instead of returning -1 when a function can fail to return results is preferred.
Another way using a tree with breadth-first search:
import Data.Tree
import Data.List( filter, (\\), delete )
import Data.Maybe
node :: String -> [String] -> Tree String
node label dict = Node{ rootLabel = label, subForest = branches label (delete label dict) }
branches :: String -> [String] -> [Tree String]
branches start dict = map (flip node dict) (filter (neighbour start) dict)
neighbour :: String -> String -> Bool
neighbour word = (1 ==) . length . (\\ word)
-- breadth first traversal
shortestBF tree end = find [tree] end 0
    where
        find ts end depth
            | null ts = Nothing
            | elem end (map rootLabel ts) = Just depth
            | otherwise = find (concat (map subForest ts)) end (depth+1)
result = shortestBF tree end
tree :: Tree String
tree = node start dict
start = "spice"
end = "stock"
dict = ["spice","stick","smice","slice","slick","stock"]
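A breadth-first search can also be run directly over the word list, without building a tree first. The sketch below is an alternative under stated assumptions, not the answerer's code: the names oneLetterApart and steps are hypothetical, and the neighbour test compares letters position by position (unlike the \\ based test, which ignores positions).

```haskell
import Data.List (partition)

-- True when both words have the same length and differ in exactly one position.
oneLetterApart :: String -> String -> Bool
oneLetterApart a b =
  length a == length b && length (filter id (zipWith (/=) a b)) == 1

-- Breadth-first search by levels: `frontier` holds the words reachable in
-- exactly `depth` steps, and `unseen` holds the words not reached yet.
steps :: [String] -> String -> String -> Maybe Int
steps dict start end = go [start] (filter (/= start) dict) 0
  where
    go :: [String] -> [String] -> Int -> Maybe Int
    go frontier unseen depth
      | null frontier = Nothing
      | end `elem` frontier = Just depth
      | otherwise =
          let (next, rest) = partition (\w -> any (oneLetterApart w) frontier) unseen
           in go next rest (depth + 1)
```

With the word list from the question, steps returns Just 4 for "spice" to "stock", and Nothing when no chain exists. Each word leaves unseen as soon as it is reached, so the first level containing the end word gives the shortest distance.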

Resources