Haskell - count most repeated element in a list with interact - haskell
I'm trying to make a Haskell program using interact, that returns the most used word and the # of times it appears. I've seen examples with sort- but I don't need to know the counts for all words, I only need the most repeated word. So far I have:
import Data.List -- (sort)
import Data.Char -- (isAlpha, toLower)
import Data.Ord -- (maximumBy)
main =
interact
$ unwords
-- comment: here show the size of the list and the word (probably head)
. maximumBy(comparing length)
. group
. sort
. words
. map (\char -> if isAlpha char then toLower char else ' ')
The above compiles. maximumBy gives the most used word like this:
[the, the, the, the, the, the, the, the...]
for the number of times the word "the" appears in the text; and I have verified that "the" is the most used word for the text I've supplied.
What I want to output is something like this: "the, 318"
I tried the following which only gives the first letter "t" and 3:
import Data.List -- sort
import Data.Char -- isAlpha, toLower
import Data.Ord -- maximumBy
main =
interact
$ unwords
. map (\(n, w) -> show n ++ ", " ++ show w)
. map (\s -> (length s, head s))
. maximumBy(comparing length)
. group
. sort
. words
. map (\char -> if isAlpha char then toLower char else ' ')
Which gives the output:
"3, 't' 3, 't' 3, 't' 3, 't' ..."
Anyone know what I'm doing wrong?
The map in map (\s -> (length s, head s)) means that the function \s -> (length s, head s) is applied to each "the" instead of to the list of "the"'s, repeatedly giving the length and the first character of "the". So removing the map should work better. You will also need to fix up the final two steps (remove the unwords and the map):
$ (\(n, w) -> show n ++ ", " ++ show w)
. (\s -> (length s, head s))
. maximumBy(comparing length)
More efficiently, you can apply map (\s -> (length s, head s)) earlier in the pipeline than the maximum, which allows you to
Avoid recomputing the length in each comparison the maximum function does
Use just plain maximum instead of maximumBy. (This may be slightly different in which word is chosen if there are two equally frequent ones, since it then compares the actual strings.)
In other words, you can use
$ (\(n, w) -> show n ++ ", " ++ show w)
. maximum
. map (\s -> (length s, head s))
Or to put it all together:
import Data.List (group, sort)
import Data.Char (isAlpha, toLower)
main =
interact
$ (\(n, w) -> show n ++ ", " ++ show w)
. maximum
. map (\s -> (length s, head s))
. group
. sort
. words
. map (\char -> if isAlpha char then toLower char else ' ')
Note also how I changed the import statements to use the official syntax for explicitly naming what you are importing. I'd strongly recommend that over using comments, as this actually gave me error messages pointing out one function (group) you had missed and one (maximumBy) you had listed with the wrong module.
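Since the question says the counts for all words are not needed in sorted form, the sorting step can be avoided altogether. The following is only a sketch under some assumptions: it uses Data.Map.Strict from the containers package (shipped with GHC), and the names normalise, mostFrequent and render are made up for this illustration. It also returns Nothing on empty input instead of failing in maximum.

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as Map

-- Lower-case the text and treat every non-letter as a separator.
normalise :: String -> [String]
normalise = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Count each word in a map, then pick the largest (count, word) pair.
-- Nothing when the input contains no words at all.
mostFrequent :: String -> Maybe (Int, String)
mostFrequent text
  | Map.null counts = Nothing
  | otherwise       = Just (maximum [(n, w) | (w, n) <- Map.toList counts])
  where
    counts = Map.fromListWith (+) [(w, 1 :: Int) | w <- normalise text]

-- Render the result in the same "count, word" shape as the answer above.
render :: Maybe (Int, String) -> String
render Nothing       = "no words"
render (Just (n, w)) = show n ++ ", " ++ show w
```

With these definitions, main = interact (render . mostFrequent) would play the role of the pipeline above.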
Related
Filtering String from List Haskell
I'm trying to write a program that reads a text file, then displays the frequencies and count of words in the file. What I need to do next is filter certain words from the text file. I have been looking at online resources for a couple of hours and still can't find what I'm looking for! I have provided my code for the program so far below:
lowercase = map toLower
top doc = wordPairs
  where
    listOfWords = words (lowercase doc)
    wordGroups = group (sort listOfWords)
    wordPairs = reverse $ sort $ map (\x -> (length x, head x)) $ filterWords wordGroups
filterWords :: String -> String
filterWords = filter (all (`elem` ["poultry outwits ants"])) . words
It might be easier if you split the program in a different way. For example:
import Data.List(group,sort)
import Control.Arrow((&&&))
freq :: Ord a => [a] -> [(Int,a)]
freq = reverse . sort . map (length &&& head) . group . sort
The second part will be defining the input to this function. You want to filter only certain elements.
select :: Eq a => [a] -> [a] -> [a]
select list = filter (`elem` list)
These will make testing easier, since you don't need the specific typed input. Finally, you can tie it all together:
freq $ select ["a","b","c"] $ words "a b a d e a b b b c d e c"
will give you
[(4,"b"),(3,"a"),(2,"c")]
Here is my code, which solves your problem:
import Control.Arrow ((&&&))
import Data.Char (toLower)
import Data.List (group, sort, sortBy)
top :: String -> [(Int,String)] -- Signature is always important
top = sorter . wordFrequency . groups . filtered -- just compose the `where` functions
  where
    -- This will filter your words
    filtered = filter (`notElem` ["poultry","outwits","ants"]) . words . map toLower
    -- Group your words
    groups = group . sort
    -- Create the pairs of (count, word)
    wordFrequency = map (length &&& head)
    -- Sort your list by count, descending. For ascending order just switch a and b
    sorter = sortBy (\ a b -> fst b `compare` fst a)
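The descending sort in the answers above can also be written with sortOn and Down from base. This is a sketch rather than a replacement for either answer; topWithout is a hypothetical name, and passing the stop words as a parameter is an assumption made here so the function is easy to test.

```haskell
import Data.Char (toLower)
import Data.List (group, sort, sortOn)
import Data.Ord (Down (..))

-- Count the words of a document, skipping the given stop words,
-- and list them from most to least frequent.
topWithout :: [String] -> String -> [(Int, String)]
topWithout stopWords =
    sortOn (Down . fst)                  -- highest count first
  . map (\ws -> (length ws, head ws))    -- groups are never empty, so head is safe
  . group
  . sort
  . filter (`notElem` stopWords)
  . words
  . map toLower
```

Because sortOn is stable, words with the same count stay in alphabetical order, which differs from reverse . sort (that one lists ties in reverse alphabetical order).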
Haskell - Trying to apply a function to lines of multiple numbers
I am new to Haskell and I am trying to apply a function (gcd) to input on standard in, which is line separated, and each line contains exactly two numbers. Here is an example of my input:
3
10 4
1 100
288 240
I am currently breaking up each line into a tuple of both numbers, but I am having trouble figuring out how to separate these tuples and apply a function to them. Here is what I have so far:
import Data.List
main :: IO ()
main = do
  n <- readLn :: IO Int
  content <- getContents
  let points = map (\[x, y] -> (x, y)). map (map (read::String->Int)). map words. lines $ content
      ans = gcd (fst points :: Int) (snd points :: Int)
  print ans
Any information as to a good place to start looking for this answer would be much appreciated. I have read through the Learning Haskell tutorial and have not found any information on this particular problem.
You are pretty close. There is no reason to convert to a tuple or list of tuples before calling gcd.
main = do
  contents <- getContents
  print $ map ((\[x,y] -> gcd (read x) (read y)) . words) . lines $ contents
All the interesting stuff is between print and contents. lines will split the contents into lines. map (...) applies the function to each line. words splits the line into words. \[x,y] -> gcd (read x) (read y) will match on a list of two strings (and throw an error otherwise, which is not good practice in general but fine for a simple program like this), read those strings as Integers and compute their GCD.
If you want to make use of lazy IO, in order to print each result after you enter each line, you can change it as follows.
main = do
  contents <- getContents
  mapM_ (print . (\[x,y] -> gcd (read x) (read y)) . words) . lines $ contents
Or, you can do it in a more imperative style:
import Control.Monad
main = do
  n <- readLn
  replicateM_ n $ do
    [x, y] <- (map read . words) `liftM` getLine
    print $ gcd x y
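The lambda \[x,y] -> ... used above throws on any line that does not hold exactly two numbers, which includes a leading count line like the 3 in the sample input. A minimal sketch of a total variant, assuming readMaybe from Text.Read and two helper names (parsePair, gcdLines) invented for this example:

```haskell
import Text.Read (readMaybe)

-- Parse one line holding exactly two integers; anything else yields Nothing.
parsePair :: String -> Maybe (Integer, Integer)
parsePair line = case mapM readMaybe (words line) of
  Just [x, y] -> Just (x, y)
  _           -> Nothing

-- GCD for every well-formed line; malformed lines (such as a leading
-- count line with a single number) are silently skipped.
gcdLines :: String -> [Integer]
gcdLines input = [gcd x y | Just (x, y) <- map parsePair (lines input)]
```

It could be wired up as main = interact (unlines . map show . gcdLines). Skipping bad lines silently is a design choice made for brevity here; reporting them may be preferable in a real program.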
Project Euler 8 - I don't understand it
I looked up a solution in Haskell for the 8th Euler problem, but I don't quite understand it.
import Data.List
import Data.Char
euler_8 = do
  str <- readFile "number.txt"
  print . maximum . map product
    . foldr (zipWith (:)) (repeat [])
    . take 13 . tails . map (fromIntegral . digitToInt)
    . concat . lines $ str
Here is the link for the solution and here you can find the task. Could anyone explain the solution to me step by step?
Reading the data
readFile reads the file "number.txt". If we put a small 16-digit number in a file called number.txt
7316
9698
8586
1254
Running
euler_8 = do
  str <- readFile "number.txt"
  print $ str
Results in
"7316\n9698\n8586\n1254"
This string has extra newline characters in it. To remove them, the author splits the string into lines.
euler_8 = do
  str <- readFile "number.txt"
  print . lines $ str
The result no longer has any '\n' characters, but is a list of strings.
["7316","9698","8586","1254"]
To turn this into a single string, the strings are concatenated together.
euler_8 = do
  str <- readFile "number.txt"
  print . concat . lines $ str
The concatenated string is a list of characters instead of a list of numbers
"7316969885861254"
Each character is converted into an Int by digitToInt, then converted into an Integer by fromIntegral. On 32-bit hardware using a full-sized Integer is important, since the product of 13 digits could be larger than 2^31-1. This conversion is mapped onto each item in the list.
euler_8 = do
  str <- readFile "number.txt"
  print . map (fromIntegral . digitToInt) . concat . lines $ str
The resulting list is full of Integers.
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4]
Subsequences
The author's next goal is to find all of the 13-digit runs in this list of integers. tails returns all of the sublists of a list, starting at any position and running till the end of the list.
euler_8 = do
  str <- readFile "number.txt"
  print . tails . map (fromIntegral . digitToInt) . concat . lines $ str
This results in 17 lists for our 16-digit example. (I've added formatting)
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
  [3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
    [1,6,9,6,9,8,8,5,8,6,1,2,5,4],
      [6,9,6,9,8,8,5,8,6,1,2,5,4],
        [9,6,9,8,8,5,8,6,1,2,5,4],
          [6,9,8,8,5,8,6,1,2,5,4],
            [9,8,8,5,8,6,1,2,5,4],
              [8,8,5,8,6,1,2,5,4],
                [8,5,8,6,1,2,5,4],
                  [5,8,6,1,2,5,4],
                    [8,6,1,2,5,4],
                      [6,1,2,5,4],
                        [1,2,5,4],
                          [2,5,4],
                            [5,4],
                              [4],
                                []
]
The author is going to pull a trick where we rearrange these lists to read off 13-digit-long sublists. If we look at these lists left-aligned instead of right-aligned, we can see the subsequences running down each column.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4],
[2,5,4],
[5,4],
[4],
[]
]
We only want these columns to be 13 digits long, so we only want to take the first 13 rows.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4]
]
foldr (zipWith (:)) (repeat []) transposes a list of lists (explaining it belongs to perhaps another question). It discards the parts of the rows longer than the shortest row.
euler_8 = do
  str <- readFile "number.txt"
  print . foldr (zipWith (:)) (repeat [])
    . take 13 . tails . map (fromIntegral . digitToInt)
    . concat . lines $ str
We are now reading the subsequences across the lists as usual
[
[7,3,1,6,9,6,9,8,8,5,8,6,1],
[3,1,6,9,6,9,8,8,5,8,6,1,2],
[1,6,9,6,9,8,8,5,8,6,1,2,5],
[6,9,6,9,8,8,5,8,6,1,2,5,4]
]
The problem
We find the product of each of the subsequences by mapping product onto them.
euler_8 = do
  str <- readFile "number.txt"
  print . map product
    . foldr (zipWith (:)) (repeat [])
    . take 13 . tails . map (fromIntegral . digitToInt)
    . concat . lines $ str
This reduces the lists to a single number each
[940584960,268738560,447897600,1791590400]
From which we must find the maximum.
euler_8 = do
  str <- readFile "number.txt"
  print . maximum . map product
    . foldr (zipWith (:)) (repeat [])
    . take 13 . tails . map (fromIntegral . digitToInt)
    . concat . lines $ str
The answer is 1791590400
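To see the transposing fold in isolation, here is a small sketch. The names columns and windows are introduced only for this illustration and do not appear in the original solution; the fold itself is copied unchanged.

```haskell
import Data.List (tails)

-- The fold from the solution: zip the rows together column by column,
-- stopping at the shortest row. The infinite seed (repeat []) supplies
-- an empty list for every column to be consed onto.
columns :: [[a]] -> [[a]]
columns = foldr (zipWith (:)) (repeat [])

-- All runs of n adjacent elements, built the same way as in the solution.
-- n is assumed to be at least 1; for n = 0 the fold returns the infinite seed.
windows :: Int -> [a] -> [[a]]
windows n = columns . take n . tails
```

For instance, columns [[1,2,3],[4,5,6],[7,8]] evaluates to [[1,4,7],[2,5,8]]: the third column is dropped because the shortest row has only two elements. That truncation is what removes the incomplete runs at the end of the digit list.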
If you're not familiar with the functions used, the first thing you should do is examine the types of each function. Since this is function composition, you apply from inside out (i.e. operations occur right to left, bottom to top when reading). We can walk through this line by line. Starting from the last line, we'll first examine the types.
:t str
str :: String -- This is your input
:t lines
lines :: String -> [String] -- Turn a string into an array of strings splitting on new line
:t concat
concat :: [[a]] -> [a] -- Merge a list of lists into a single list (hint: type String = [Char])
Since type String = [Char] (so [String] is equivalent to [[Char]]), this line is converting the multi-line number into a single array of number characters. More precisely, it first creates an array of strings based on the full string. That is, one string per new line. It then merges all of these lines (now containing only number characters) into a single array of characters (or a single String).
The next line takes this new String as input. Again, let's observe the types:
:t digitToInt
digitToInt :: Char -> Int -- Convert a digit char to an int
:t fromIntegral
fromIntegral :: (Num b, Integral a) => a -> b -- Convert integral to num type
:t map
map :: (a -> b) -> [a] -> [b] -- Perform a function on each element of the array
:t tails
tails :: [a] -> [[a]] -- Returns all final segments of input (see: http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-List.html#v:tails)
:t take
take :: Int -> [a] -> [a] -- Return the first n values of the list
If we apply these operations to our current string input, the first thing that happens is we map the composed function (fromIntegral . digitToInt) over each character in our string. What this does is turn our string of digits into a list of number types.
EDIT As pointed out below in the comments, the fromIntegral in this example is to prevent overflow on 32-bit integer types.
Now that we have converted our string into actual numeric types, we start by running tails on this result. Since (by the problem statement) all values must be adjacent and we know that all of the integers are non-negative (by virtue of being places of a larger number), we take only the first 13 elements since we want to ensure our multiplication is groupings of 13 consecutive elements. How this works is difficult to understand without considering the next line.
So, let's do a quick experiment. After converting our string into numeric types, we now have a big list of lists. It is actually kind of hard to think about what we have here. For the sake of understanding, the contents of the list are not very important. What is important is its size. So let's take a look at an artificial example:
(map length . take 13 . tails) [1..1000]
[1000,999,998,997,996,995,994,993,992,991,990,989,988]
You can see what we have here is a big list of 13 elements. Each element is a list of size 1000 (i.e. the full dataset) down to 988 in descending order. So this is what we currently have for input into the next line which is, arguably, the most difficult, yet most important, line to understand. Why understanding this is important should become clear as we walk through the next line.
:t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b -- Combine values into a single value
:t zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] -- Generalization of zip
:t (:)
(:) :: a -> [a] -> [a] -- Cons operator. Add element to list
:t repeat
repeat :: a -> [a] -- Infinite list containing specified value
Remember how I mentioned we had a list of 13 elements before (of varying-sized lists)? This is important now. The line is going to iterate over that list and apply (zipWith (:)) to it. The (repeat []) is such that each time zipWith is called on a subsequence, it starts with an empty list as its base. This allows us to construct a list of lists containing our adjacent subsequences of length 13.
Finally, we get to the last line, which is pretty easy. That said, we should still be mindful of our types:
:t product
product :: Num a => [a] -> a -- Multiply all elements of a list together and return result
:t maximum
maximum :: Ord a => [a] -> a -- Return maximum element in the list
The first thing we do is map the product function over each subsequence. When this has completed we end up with a list of numeric types (hey, we finally don't have a list of lists anymore!). These values are the products of each subsequence. Finally, we apply the maximum function, which returns only the largest element in the list.
EDIT: I found out later what the foldr expression was for. (See comments below my answer.) I think that this could be expressed in a different way: you can simply add a guard at the end of the list. My verbose version of that solution would be:
import Data.List
import Data.Char
euler_8 = do
  let len = 13
  let str1 = "123456789\n123456789"
  -- Join lines
  let str2 = concat (lines str1)
  -- Transform the list of characters into a list of numbers
  let lst1 = map (fromIntegral . digitToInt) str2
  -- EDIT: Add a guard at the end of list
  let lst2 = lst1 ++ [-1]
  -- Get all tails of the list of digits
  let lst3 = tails lst2
  -- Get first 13 digits from each tail
  let lst4 = map (take len) lst3
  -- Get a list of products
  let prod = map product lst4
  -- Find max product
  let m = maximum prod
  print m
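An alternative to the guard value is to keep only the windows that really have the requested length. The sketch below is one way to do that, not the approach of either answer: maxAdjacentProduct is a hypothetical name, and dropping every non-digit character with filter isDigit (instead of concat . lines) is an assumption made here.

```haskell
import Data.Char (digitToInt, isDigit)
import Data.List (tails)

-- Largest product of n adjacent digits in a (possibly multi-line) string.
-- Returns Nothing when the input has fewer than n digits.
maxAdjacentProduct :: Int -> String -> Maybe Integer
maxAdjacentProduct n str
  | null products = Nothing
  | otherwise     = Just (maximum products)
  where
    digits   = map (fromIntegral . digitToInt) (filter isDigit str)
    -- keep only the full-length windows, so no guard value is needed
    runs     = filter ((== n) . length) (map (take n) (tails digits))
    products = map product runs
```

Filtering by length means the short tails at the end, including the empty one whose product is 1, can never win the maximum.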
How to capitalize a string using control lens?
I'm playing with the lens package and I'm trying to capitalize a string using only lens. Basically I want to call toUpper on the first element of every word. That seems like it should be easy to do with lens, but I can't figure out how to do it at all. Do I need a traversable? How do I split by spaces, etc.?
It's not really an isomorphism to call words then unwords because it'll convert repeated spaces to single ones, but let's pretend:
words :: Iso' String [String]
words = iso Prelude.words Prelude.unwords
Now we can capitalize words by building a lens which focuses on the first letter of each word and applying over and toUpper:
capitalize :: String -> String
capitalize = over (words . traverse . _head) toUpper
capitalize xs = xs & words <&> _head %~ toUpper & unwords
Okay, that's the solution, but how to get there? Let's remove some lens parts. Exchange (<&>) with fmap and (&) with ($):
capitalize xs = unwords $ fmap (_head %~ toUpper) $ words $ xs
This looks familiar. _head %~ f will apply f on the first element of the list. In the end, this is (almost*) equivalent to
capitalize xs = unwords $ fmap (\(x:xs) -> toUpper x : xs) $ words $ xs
which you are probably familiar with.
* _head also takes care of the empty list case
A solution that doesn't collapse repeated spaces:
import Control.Lens
import Data.List.Split
import Data.List.Split.Lens
import Data.Char
capitalize :: String -> String
capitalize = view $ splitting (whenElt isSpace) traversed.to (over _head toUpper)
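For comparison, the same space-preserving behaviour can be sketched without lens, using only base. This is not what the question asked for (it wanted a lens-only solution); it is meant to show what the lens versions do underneath. upperHead is a hypothetical helper that mirrors what over _head toUpper does, including the empty-list case.

```haskell
import Data.Char (isSpace, toUpper)
import Data.Function (on)
import Data.List (groupBy)

-- Upper-case the first character of a chunk, leaving empty chunks alone.
upperHead :: String -> String
upperHead []       = []
upperHead (c : cs) = toUpper c : cs

-- Split into alternating runs of whitespace and non-whitespace, capitalise
-- every run, and glue the runs back together. Whitespace runs come out
-- unchanged because toUpper leaves space characters as they are.
capitalize :: String -> String
capitalize = concatMap upperHead . groupBy ((==) `on` isSpace)
```

Because the runs are concatenated back exactly as they were split, repeated spaces survive, unlike the words/unwords round trip.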
Haskell: how to operate the string type in a do block
I want to make a function that first divides a list l into two lists m and n, then creates two threads to find the longest palindrome in each of the two lists. My code is:
import Control.Concurrent (forkIO)
import System.Environment (getArgs)
import Data.List
import Data.Ord
main = do
  l <- getArgs
  forkIO $ putStrLn $ show $ longestPalindr $ mList l
  forkIO $ putStrLn $ show $ longestPalindr $ nList l
longestPalindr x = snd $ last $ sort $ map (\l -> (length l, l)) $ map head $ group $ sort $ filter (\y -> y == reverse y) $ concatMap inits $ tails x
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
Now I can compile it, but the result is a [ ]. When I just run longestPalindr and mList, I get the right result. I thought the logic here was right. So what is the problem?
The question title may need to be changed, as this is no longer about type errors. The functionality of the program can be fixed by simply mapping longestPalindr across the two halves of the list. In your code, you are finding the longest palindrome across [[Char]], so the result length is usually just 1.
I've given a simple example of par and pseq. This just suggests to the compiler that it may be smart to evaluate left and right independently. It doesn't guarantee parallel evaluation, but rather leaves it up to the compiler to decide. Consult Parallel Haskell on the wiki to understand sparks, compile with the -threaded flag, then run it with +RTS -N2. Add -sstderr for runtime statistics, and see if there is any benefit to sparking here. I would expect negative returns until you start to feed it longer lists.
For further reading on functional parallelism, take a look at Control.Parallel.Strategies. Manually wrangling threads in Haskell is only really needed in nondeterministic scenarios.
import Control.Parallel (par, pseq)
import System.Environment (getArgs)
import Data.List
import Data.Ord
import Data.Function (on)
main = do
  l <- getArgs
  let left = map longestPalindr (mList l)
      right = map longestPalindr (nList l)
  left `par` right `pseq` print $ longest (left ++ right)
longestPalindr x = longest pals
  where
    pals = nub $ filter (\y -> y == reverse y) substrings
    substrings = concatMap inits $ tails x
longest = maximumBy (compare `on` length)
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
For reference, please read the Parallel chapter from Simon Marlow's book: http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
As others have stated, using rpar from the Eval monad seems to be the correct approach here. Here is a simplified view of your problem. You can test it out by compiling with -threaded, and then you can use ThreadScope to profile your performance.
import Control.Parallel.Strategies
import Data.List (maximumBy, subsequences)
import Data.Ord
isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = xs == reverse xs
-- * note while subsequences is correct, it is asymptotically
-- inefficient due to nested foldr calls
getLongestPalindrome :: Ord a => [a] -> Int
getLongestPalindrome = length . maximum' . filter isPalindrome . subsequences
  where
    maximum' :: Ord a => [[a]] -> [a]
    maximum' = maximumBy $ comparing length
--- Do it in parallel, in a monad
-- rpar rpar seems to fit your case, according to Simon Marlow's book
-- http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
main :: IO ()
main = do
  let shorter = [2,3,4,5,4,3,2]
      longer = [1,2,3,4,5,4,3,2,1]
      result = runEval $ do
        a <- rpar $ getLongestPalindrome shorter
        b <- rpar $ getLongestPalindrome longer
        if a > b -- 'a > b' will always be false in this case
          then return (a,"shorter")
          else return (b,"longer")
  print result
  -- This will print the length of the longest palindrome along w/ the list name
  -- Don't forget to compile w/ -threaded and use ThreadScope to check
  -- performance and evaluation
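If one still wants explicit threads as in the question, note that a GHC program ends when main returns, whether or not forked threads have finished, so the forkIO version may print nothing for that reason alone. The following is a rough sketch using only base, with an MVar per worker so that the caller waits for the results. The names inThread and longestOfHalves are invented for this example, and it works on a single String rather than on the argument list.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.List (inits, maximumBy, tails)
import Data.Ord (comparing)

-- Longest palindromic contiguous substring (naive, for illustration only).
-- The candidate list always contains "", so maximumBy cannot fail.
longestPalindrome :: String -> String
longestPalindrome s =
  maximumBy (comparing length) (filter isPal (concatMap inits (tails s)))
  where isPal xs = xs == reverse xs

-- Run a pure computation on its own thread and return an action that
-- waits for the result. Forcing the length before putMVar makes the
-- worker thread do the evaluation instead of handing back a thunk.
inThread :: (String -> String) -> String -> IO (IO String)
inThread f x = do
  box <- newEmptyMVar
  _ <- forkIO $ let r = f x in length r `seq` putMVar box r
  return (takeMVar box)

-- Search both halves of the input concurrently and keep the longer hit.
-- A palindrome that spans the midpoint is missed, as in the question.
longestOfHalves :: String -> IO String
longestOfHalves s = do
  let (m, n) = splitAt (length s `div` 2) s
  waitM <- inThread longestPalindrome m
  waitN <- inThread longestPalindrome n
  a <- waitM
  b <- waitN
  return (maximumBy (comparing length) [a, b])
```

As the answers above point out, par/pseq or the Eval monad are usually a better fit for pure work like this; the sketch only illustrates waiting for forked threads.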