Haskell - count most repeated element in a list with interact - haskell

I'm trying to make a Haskell program using interact, that returns the most used word and the # of times it appears. I've seen examples with sort- but I don't need to know the counts for all words, I only need the most repeated word. So far I have:
import Data.List -- (sort)
import Data.Char -- (isAlpha, toLower)
import Data.Ord -- (maximumBy)
main =
interact
$ unwords
-- comment: here show the size of the list and the word (probably head)
. maximumBy(comparing length)
. group
. sort
. words
. map (\char -> if isAlpha char then toLower char else ' ')
The above compiles. maximumBy gives the most used word like this:
[the, the, the, the, the, the, the, the...]
for the number of times the word "the" appears in the text; and I have verified that "the" is the most used word for the text I've supplied.
What I want to output is something like this: "the, 318"
I tried the following which only gives the first letter "t" and 3:
import Data.List -- sort
import Data.Char -- isAlpha, toLower
import Data.Ord -- maximumBy
main =
interact
$ unwords
. map (\(n, w) -> show n ++ ", " ++ show w)
. map (\s -> (length s, head s))
. maximumBy(comparing length)
. group
. sort
. words
. map (\char -> if isAlpha char then toLower char else ' ')
Which gives the output:
"3, 't' 3, 't' 3, 't' 3, 't' ..."
Anyone know what I'm doing wrong?

The map in map (\s -> (length s, head s)) means that the function \s -> (length s, head s) is applied to each "the" instead of to the list of "the"'s, repeatedly giving the length and the first character of "the". So removing the map should work better. You will also need to fix up the final two steps (remove the unwords and the map):
$ (\(n, w) -> show n ++ ", " ++ show w)
. (\s -> (length s, head s))
. maximumBy(comparing length)
More efficiently, you can apply map (\s -> (length s, head s)) earlier in the pipeline than the maximum, which allows you to:
Avoid recomputing the length in each comparison the maximum function does.
Use just plain maximum instead of maximumBy. (This may be slightly different in which word is chosen if there are two equally frequent ones, since it then compares the actual strings.)
In other words, you can use
$ (\(n, w) -> show n ++ ", " ++ show w)
. maximum
. map (\s -> (length s, head s))
Or to put it all together:
import Data.List (group, sort)
import Data.Char (isAlpha, toLower)
main =
  interact
    $ (\(n, w) -> show n ++ ", " ++ show w)
    . maximum
    . map (\s -> (length s, head s))
    . group
    . sort
    . words
    . map (\char -> if isAlpha char then toLower char else ' ')
Note also how I changed the import statements to use the official syntax for explicitly naming what you are importing. I'd strongly recommend that over using comments, as this actually gave me error messages pointing out one function (group) you had missed and one (maximumBy) you had listed with the wrong module.
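Note that the pipeline above prints the count first and, because of show w, puts quotes around the word (for example 318, "the"). The following is a minimal sketch of a variant that prints the word first, as the question asks, and also copes with input that contains no words. The helper name mostCommon and the "no words" message are assumptions made for illustration; they are not part of the original answer.

```haskell
import Data.Char (isAlpha, toLower)
import Data.List (group, sort)

-- Hypothetical helper (not from the original answer): the most frequent
-- word and its count, or Nothing when the input contains no words.
mostCommon :: String -> Maybe (Int, String)
mostCommon txt =
  case [ (length g, w) | g@(w:_) <- group (sort (words (map normalize txt))) ] of
    []    -> Nothing
    pairs -> Just (maximum pairs)
  where
    -- Lower-case letters; turn everything else into a word separator.
    normalize c = if isAlpha c then toLower c else ' '

main :: IO ()
main = interact $ \txt ->
  case mostCommon txt of
    Nothing     -> "no words\n"
    Just (n, w) -> w ++ ", " ++ show n ++ "\n"
```

As with plain maximum in the answer, ties between equally frequent words are broken by comparing the words themselves.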

Related

Filtering String from List Haskell

I'm trying to write a program that reads a text file, then displays the frequencies and count of words in the file. What I need to do next is filter certain words from the text file. I have been looking at online resources for a couple of hours and still can't find what I'm looking for!
I have provided my code for the program so far below:
lowercase = map toLower
top doc = wordPairs
where
listOfWords = words (lowercase doc)
wordGroups = group (sort listOfWords)
wordPairs = reverse
$ sort
$ map (\x -> (length x, head x))
$ filterWords
wordGroups
filterWords :: String -> String
filterWords = filter (all (`elem` ["poultry outwits ants"])) . words
It might be easier if you split the program in a different way. For example
import Data.List(group,sort)
import Control.Arrow((&&&))
freq :: Ord a => [a] -> [(Int,a)]
freq = reverse . sort . map (length &&& head) . group . sort
The second part is defining the input to this function: you want to keep only certain elements.
select :: Eq a => [a] -> [a] -> [a]
select list = filter (`elem` list)
These make testing easier, since neither function is tied to one specific input type.
Finally, you can tie it all together
freq $ select ["a","b","c"] $ words "a b a d e a b b b c d e c"
will give you
[(4,"b"),(3,"a"),(2,"c")]
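For the original task of leaving certain words out, the same split works with the test inverted. A minimal sketch, assuming a hypothetical helper named reject and a made-up sample sentence; freq is the function defined above:

```haskell
import Control.Arrow ((&&&))
import Data.Char (toLower)
import Data.List (group, sort)

-- freq as defined above: (count, element) pairs, most frequent first.
freq :: Ord a => [a] -> [(Int, a)]
freq = reverse . sort . map (length &&& head) . group . sort

-- Hypothetical counterpart to select: drop every element found in the list.
reject :: Eq a => [a] -> [a] -> [a]
reject list = filter (`notElem` list)

main :: IO ()
main = print . freq . reject ["poultry", "outwits", "ants"] . words . map toLower
     $ "Ants like poultry but poultry outwits ants like that"
-- prints [(2,"like"),(1,"that"),(1,"but")]
```

Keeping reject separate from freq means the list of excluded words can be tested on its own, in the same spirit as select.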
Here is my code, which solves your problem:
import Control.Arrow ((&&&))
import Data.Char (toLower)
import Data.List (group, sort, sortBy)
top :: String -> [(Int,String)] -- Signature is always important
top = sorter . wordFrequency . groups . filtered -- just compose the `where` functions
  where
    -- This will filter your words
    filtered = filter (`notElem` ["poultry","outwits","ants"]) . words . map toLower
    -- Group your words
    groups = group . sort
    -- Create the pairs of (count, word)
    wordFrequency = map (length &&& head)
    -- Sort your list by count, descending. To reverse the order, just switch a and b
    sorter = sortBy (\ a b -> fst b `compare` fst a)

Haskell - Trying to apply a function to lines of multiple numbers

I am new to Haskell and I am trying to apply a function (gcd) to input on standard in, which is line-separated, with each line containing exactly two numbers. Here is an example of my input:
3
10 4
1 100
288 240
I am currently breaking up each line into a tuple of both numbers, but I am having trouble figuring out how to separate these tuples and apply a function to them. Here is what I have so far:
import Data.List
main :: IO ()
main = do
n <- readLn :: IO Int
content <- getContents
let
points = map (\[x, y] -> (x, y)). map (map (read::String->Int)). map words. lines $ content
ans = gcd (fst points :: Int) (snd points :: Int)
print ans
Any information as to a good place to start looking for this answer would be much appreciated. I have read through the Learning Haskell tutorial and have not found any information on this particular problem.
You are pretty close. There is no reason to convert to a tuple or list of tuples before calling gcd.
main = do
  contents <- getContents
  -- drop 1 skips the first line of the sample input, which holds only the count
  print $ map ((\[x,y] -> gcd (read x) (read y)) . words) . drop 1 . lines $ contents
All the interesting stuff is between print and contents. lines will split the contents into lines. map (...) applies the function to each line. words splits the line into words. \[x,y] -> gcd (read x) (read y) will match on a list of two strings (and throw an error otherwise - not good practice in general but fine for a simple program like this), read those strings as Integers and compute their GCD.
If you want to make use of lazy IO, in order to print each result after you enter each line, you can change it as follows.
main = do
  contents <- getContents
  -- drop 1 skips the first line of the sample input, which holds only the count
  mapM_ (print . (\[x,y] -> gcd (read x) (read y)) . words) . drop 1 . lines $ contents
Or, you can do it in a more imperative style:
import Control.Monad
main = do
  n <- readLn
  replicateM_ n $ do
    [x, y] <- (map read . words) `liftM` getLine
    print $ gcd x y
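The lambda \[x,y] used above fails on any line that does not hold exactly two numbers, including the leading count line in the sample input. A minimal sketch of a more forgiving variant, assuming a hypothetical helper named parsePair built on readMaybe from Text.Read; the "skipped" message is also made up for illustration:

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: parse a line that holds exactly two integers.
parsePair :: String -> Maybe (Integer, Integer)
parsePair line =
  case mapM readMaybe (words line) of
    Just [x, y] -> Just (x, y)
    _           -> Nothing  -- wrong number of fields, or not numbers

main :: IO ()
main = do
  contents <- getContents
  -- drop 1 skips the leading count line from the sample input
  mapM_ (putStrLn . report . parsePair) (drop 1 (lines contents))
  where
    report (Just (x, y)) = show (gcd x y)
    report Nothing       = "skipped: expected two integers"
```

Returning Maybe keeps the parsing decision in one place, so the caller can choose between skipping, reporting, or aborting on bad input.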

Project Euler 8 - I don't understand it

I looked up a solution in Haskell for the 8th Euler problem, but I don't quite understand it.
import Data.List
import Data.Char
euler_8 = do
str <- readFile "number.txt"
print . maximum . map product
. foldr (zipWith (:)) (repeat [])
. take 13 . tails . map (fromIntegral . digitToInt)
. concat . lines $ str
Here is the link for the solution and here you can find the task.
Could anyone explain the solution to me step by step?
Reading the data
readFile reads the file "number.txt". If we put a small 16 digit number in a file called number.txt
7316
9698
8586
1254
Running
euler_8 = do
str <- readFile "number.txt"
print $ str
Results in
"7316\n9698\n8586\n1254"
This string has extra newline characters in it. To remove them, the author splits the string into lines.
euler_8 = do
str <- readFile "number.txt"
print . lines $ str
The result no longer has any '\n' characters, but is a list of strings.
["7316","9698","8586","1254"]
To turn this into a single string, the strings are concatenated together.
euler_8 = do
str <- readFile "number.txt"
print . concat . lines $ str
The concatenated string is a list of characters instead of a list of numbers
"7316969885861254"
Each character is converted into an Int by digitToInt, then converted into an Integer by fromIntegral. On 32-bit hardware using a full-sized Integer is important, since the product of 13 digits could be larger than 2^31-1. This conversion is mapped onto each item in the list.
euler_8 = do
str <- readFile "number.txt"
print . map (fromIntegral . digitToInt)
. concat . lines $ str
The resulting list is full of Integers.
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4]
Subsequences
The author's next goal is to find all of the 13 digit runs in this list of integers. tails returns all of the sublists of a list, starting at any position and running till the end of the list.
euler_8 = do
str <- readFile "number.txt"
print . tails
. map (fromIntegral . digitToInt)
. concat . lines $ str
This results in 17 lists for our 16 digit example. (I've added formatting)
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
  [3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
    [1,6,9,6,9,8,8,5,8,6,1,2,5,4],
      [6,9,6,9,8,8,5,8,6,1,2,5,4],
        [9,6,9,8,8,5,8,6,1,2,5,4],
          [6,9,8,8,5,8,6,1,2,5,4],
            [9,8,8,5,8,6,1,2,5,4],
              [8,8,5,8,6,1,2,5,4],
                [8,5,8,6,1,2,5,4],
                  [5,8,6,1,2,5,4],
                    [8,6,1,2,5,4],
                      [6,1,2,5,4],
                        [1,2,5,4],
                          [2,5,4],
                            [5,4],
                              [4],
                               []
]
The author is going to pull a trick where we rearrange these lists to read off 13 digit long sub lists. If we look at these lists left-aligned instead of right-aligned we can see the sub sequences running down each column.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4],
[2,5,4],
[5,4],
[4],
[]
]
We only want these columns to be 13 digits long, so we only want to take the first 13 rows.
[
[7,3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[3,1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[1,6,9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,6,9,8,8,5,8,6,1,2,5,4],
[9,6,9,8,8,5,8,6,1,2,5,4],
[6,9,8,8,5,8,6,1,2,5,4],
[9,8,8,5,8,6,1,2,5,4],
[8,8,5,8,6,1,2,5,4],
[8,5,8,6,1,2,5,4],
[5,8,6,1,2,5,4],
[8,6,1,2,5,4],
[6,1,2,5,4],
[1,2,5,4]
]
foldr (zipWith (:)) (repeat []) transposes a list of lists (explaining it belongs to perhaps another question). It discards the parts of the rows longer than the shortest row.
euler_8 = do
str <- readFile "number.txt"
print . foldr (zipWith (:)) (repeat [])
. take 13 . tails
. map (fromIntegral . digitToInt)
. concat . lines $ str
We are now reading the sub-sequences across the lists as usual
[
[7,3,1,6,9,6,9,8,8,5,8,6,1],
[3,1,6,9,6,9,8,8,5,8,6,1,2],
[1,6,9,6,9,8,8,5,8,6,1,2,5],
[6,9,6,9,8,8,5,8,6,1,2,5,4]
]
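The transposition step can be tried in isolation on a small input. A minimal sketch, assuming the illustrative name transposeShortest for the fold and a made-up three-row example; transpose from Data.List is shown next to it for comparison, since it keeps the leftover elements instead of discarding them:

```haskell
import Data.List (transpose)

-- The fold from the solution, given a name for illustration.
-- Note: on an empty list of rows it returns the infinite list repeat [].
transposeShortest :: [[a]] -> [[a]]
transposeShortest = foldr (zipWith (:)) (repeat [])

main :: IO ()
main = do
  let rows = [[1, 2, 3], [4, 5], [6, 7, 8]] :: [[Int]]
  print (transposeShortest rows)  -- [[1,4,6],[2,5,7]]
  print (transpose rows)          -- [[1,4,6],[2,5,7],[3,8]]
```

Because zipWith stops at the shorter of its two arguments, every column is cut off at the length of the shortest row, which is exactly what limits the windows to complete runs of 13 digits.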
The problem
We find the product of each of the sub-sequences by mapping product on to them.
euler_8 = do
str <- readFile "number.txt"
print . map product
. foldr (zipWith (:)) (repeat [])
. take 13 . tails
. map (fromIntegral . digitToInt)
. concat . lines $ str
This reduces the lists to a single number each
[940584960,268738560,447897600,1791590400]
From which we must find the maximum.
euler_8 = do
str <- readFile "number.txt"
print . maximum . map product
. foldr (zipWith (:)) (repeat [])
. take 13 . tails
. map (fromIntegral . digitToInt)
. concat . lines $ str
The answer is
1791590400
If you're not familiar with the functions used, the first thing you should do is examine the types of each function. Since this is function composition, you apply from inside out (i.e. operations occur right to left, bottom to top when reading). We can walk through this line by line.
Starting from the last line, we'll first examine the types.
:t str
str :: String -- This is your input
:t lines
lines :: String -> [String] -- Turn a string into an array of strings splitting on new line
:t concat
concat :: [[a]] -> [a] -- Merge a list of lists into a single list (hint: type String = [Char])
Since type String = [Char] (so [String] is equivalent to [[Char]]), this line is converting the multi-line number into a single list of digit characters. More precisely, it first creates a list of strings from the full string, one string per line. It then merges all of these lines (now containing only digit characters) into a single list of characters (that is, a single String).
The next line takes this new String as input. Again, let's observe the types:
:t digitToInt
digitToInt :: Char -> Int -- Convert a digit char to an int
:t fromIntegral
fromIntegral :: (Num b, Integral a) => a -> b -- Convert integral to num type
:t map
map :: (a -> b) -> [a] -> [b] -- Perform a function on each element of the array
:t tails
tails :: [a] -> [[a]] -- Returns all final segments of input (see: http://hackage.haskell.org/package/base-4.8.0.0/docs/Data-List.html#v:tails)
:t take
take :: Int -> [a] -> [a] -- Return the first n values of the list
If we apply these operations to our current string input, the first thing that happens is that we map the composed function (fromIntegral . digitToInt) over each character in the string. This turns our string of digits into a list of numbers. EDIT: As pointed out below in the comments, the fromIntegral in this example is there to prevent overflow on 32-bit integer types. Now that we have converted our string into actual numeric types, we run tails on the result. Since (by the problem statement) all values must be adjacent, and we know that all of the integers are non-negative (by virtue of being digits of a larger number), we keep only the first 13 tails, because we want each product to cover a group of 13 consecutive elements. How this works is difficult to understand without considering the next line.
So, let's do a quick experiment. After running tails on our list of numbers, we have a big list of lists. It is actually kind of hard to picture what we have here. For the sake of understanding, the contents of the lists are not very important. What is important is their size. So let's take a look at an artificial example:
(map length . take 13 . tails) [1..1000]
[1000,999,998,997,996,995,994,993,992,991,990,989,988]
You can see what we have here is a big list of 13 elements. Each element is a list of size 1000 (i.e. the full dataset) down to 988 in descending order. So this is what we currently have for input into the next line which is, arguably, the most difficult-- yet most important-- line to understand. Why understanding this is important should become clear as we walk through the next line.
:t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b -- Combine values into a single value
:t zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] -- Generalization of zip
:t (:)
(:) :: a -> [a] -> [a] -- Cons operator. Add element to list
:t repeat
repeat :: a -> [a] -- Infinite list containing specified value
Remember how I mentioned we had a list of 13 elements before (of varying-sized lists)? This is important now. The fold walks over that list from the right and uses zipWith (:) to prepend the elements of each row, position by position, onto the columns built so far. The (repeat []) supplies an unlimited number of empty columns to start from, and because zipWith stops at the shorter of its arguments, the result is only as long as the shortest row. This allows us to construct a list of lists containing our adjacent subsequences of length 13.
Finally, we get to the last line which is pretty easy. That said, we should still be mindful of our types
:t product
product :: Num a => [a] -> a -- Multiply all elements of a list together and return result
:t maximum
maximum :: Ord a => [a] -> a -- Return maximum element in the list
The first thing we do is map the product function over each subsequence. When this has completed we end up with a list of numeric types (hey, we finally don't have a list of lists anymore!). These values are the products of each subsequence. Finally, we apply the maximum function which returns only the largest element in the list.
EDIT: I found out later what the foldr expression was for. (See comments below my answer.)
I think that this could be expressed in a different way: you can simply add a guard at the end of the list.
My verbose version of that solution would be:
import Data.List
import Data.Char
euler_8 = do
  let len = 13
  let str1 = "123456789\n123456789"
  -- Join lines
  let str2 = concat (lines str1)
  -- Transform the list of characters into a list of numbers
  let lst1 = map (fromIntegral . digitToInt) str2
  -- EDIT: Add a guard at the end of list
  let lst2 = lst1 ++ [-1]
  -- Get all tails of the list of digits
  let lst3 = tails lst2
  -- Get first 13 digits from each tail
  let lst4 = map (take len) lst3
  -- Get a list of products
  let prod = map product lst4
  -- Find max product
  let m = maximum prod
  print m
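The idea of taking the first 13 elements of every tail can also be written without the -1 guard by dropping the windows that come out too short. A minimal sketch, assuming hypothetical helper names windows and maxWindowProduct, and reusing the 16-digit example number from the walkthrough above:

```haskell
import Data.Char (digitToInt)
import Data.List (tails)

-- Hypothetical helper: every run of n adjacent elements, in order.
-- Tails get shorter from left to right, so takeWhile can stop at the
-- first window that has fewer than n elements.
windows :: Int -> [a] -> [[a]]
windows n = takeWhile ((== n) . length) . map (take n) . tails

-- Largest product of n adjacent digits; calls error (via maximum) if the
-- input has fewer than n digits.
maxWindowProduct :: Int -> String -> Integer
maxWindowProduct n =
  maximum . map product . windows n . map (fromIntegral . digitToInt)

main :: IO ()
main = print (maxWindowProduct 13 "7316969885861254")  -- 1791590400
```

This is a different technique from the transposing fold in the original solution, but it produces the same four windows for the example.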

How to capitalize a string using control lens?

I'm playing with the lens package and I'm trying to capitalize a string using only lens.
Basically I want to call toUpper on the first element of every word. That seems like it should be easy to do with lens, but I can't figure out at all how to do it. Do I need a traversable? How do I split by spaces, etc.?
It's not really an isomorphism to call words then unwords because it'll convert repeated spaces to single ones, but let's pretend:
words :: Iso' String [String]
words = iso Prelude.words Prelude.unwords
Now we can capitalize words by building a lens which focuses on the first letter of each word and applying over and toUpper
capitalize :: String -> String
capitalize = over (words . traverse . _head) toUpper
An alternative that uses the Prelude's words and unwords directly:
capitalize xs = xs & words <&> _head %~ toUpper & unwords
Okay, that's the solution, but how to get there? Let's remove some lens parts. Exchange (<&>) with fmap and (&) with ($):
capitalize xs = unwords $ fmap (_head %~ toUpper) $ words $ xs
This looks familiar. _head %~ f will apply f on the first element of the list. At the end, this is (almost*) equivalent to
capitalize xs = unwords $ fmap (\(x:xs) -> toUpper x : xs) $ words $ xs
which you are probably familiar with.
* _head also takes care of the empty list case
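To make the footnote concrete, here is a minimal sketch of the plain-Haskell version with the empty case handled explicitly, so it is total like _head. The helper name capitalizeWord and the sample string are assumptions for illustration; no lens import is needed:

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: upper-case the first character, leaving the
-- empty string untouched instead of failing on it.
capitalizeWord :: String -> String
capitalizeWord []       = []
capitalizeWord (c : cs) = toUpper c : cs

capitalize :: String -> String
capitalize = unwords . map capitalizeWord . words

main :: IO ()
main = putStrLn (capitalize "hello  lens world")  -- Hello Lens World
```

Like the words/unwords versions above, this collapses repeated spaces into single ones.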
A solution that doesn't collapse repeated spaces:
import Control.Lens
import Data.List.Split
import Data.List.Split.Lens
import Data.Char
capitalize :: String -> String
capitalize = view $ splitting (whenElt isSpace) traversed.to (over _head toUpper)

Haskell: how to operate the string type in a do block

I want to make a function that first divides a list l into two lists, m and n, and then creates two threads to find the longest palindrome in each of the two lists. My code is:
import Control.Concurrent (forkIO)
import System.Environment (getArgs)
import Data.List
import Data.Ord
main = do
l <- getArgs
forkIO $ putStrLn $ show $ longestPalindr $ mList l
forkIO $ putStrLn $ show $ longestPalindr $ nList l
longestPalindr x =
snd $ last $ sort $
map (\l -> (length l, l)) $
map head $ group $ sort $
filter (\y -> y == reverse y) $
concatMap inits $ tails x
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
Now I can compile it, but the result is a [ ]. When I just run the longestPalindr and mList , I get the right result. I thought the logic here is right. So what is the problem?
The question title may need to be changed, as this is no longer about type errors.
The functionality of the program can be fixed by simply mapping longestPalindr across the two halves of the list. In your code, you are finding the longest palindrome across [[Char]], so the result length is usually just 1.
I've given a simple example of par and pseq. This just suggests to the compiler that it may be smart to evaluate left and right independently. It doesn't guarantee parallel evaluation, but rather leaves it up to the compiler to decide.
Consult Parallel Haskell on the wiki to understand sparks, compile with the -threaded flag, then run it with +RTS -N2. Add -sstderr to print runtime statistics, and see if there is any benefit to sparking here. I would expect negative returns until you start to feed it longer lists.
For further reading on functional parallelism, take a look at Control.Parallel.Strategies. Manually wrangling threads in Haskell is only really needed in nondeterministic scenarios.
import Control.Parallel (par, pseq)
import System.Environment (getArgs)
import Data.List
import Data.Ord
import Data.Function (on)
main = do
  l <- getArgs
  let left = map longestPalindr (mList l)
      right = map longestPalindr (nList l)
  left `par` right `pseq` print $ longest (left ++ right)
longestPalindr x = longest pals
  where pals = nub $ filter (\y -> y == reverse y) substrings
        substrings = concatMap inits $ tails x
longest = maximumBy (compare `on` length)
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
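If explicit threads are wanted anyway, keep in mind that a GHC program exits as soon as main returns, without waiting for threads started with forkIO. A minimal sketch of waiting for two workers through MVars; the helper name worker, the function name longestPalindrome, and the sample words are assumptions made for illustration, and only modules from base are used:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (inits, maximumBy, tails)
import Data.Ord (comparing)

-- Longest palindromic substring of a single string.
longestPalindrome :: String -> String
longestPalindrome s = maximumBy (comparing length) (filter isPal substrings)
  where
    substrings = concatMap inits (tails s)  -- always contains ""
    isPal y    = y == reverse y

-- Hypothetical helper: run the search in a new thread and hand back
-- an MVar that will receive the results.
worker :: [String] -> IO (MVar [String])
worker ws = do
  box <- newEmptyMVar
  _ <- forkIO $ do
    let rs = map longestPalindrome ws
    mapM_ (evaluate . length) rs  -- force each search inside the thread
    putMVar box rs
  return box

main :: IO ()
main = do
  let ws = ["xabbay", "racecars", "noon"]  -- made-up stand-in for getArgs
      (left, right) = splitAt (length ws `div` 2) ws
  boxL <- worker left
  boxR <- worker right
  ls <- takeMVar boxL  -- blocks until the first thread has finished
  rs <- takeMVar boxR
  print (ls ++ rs)
```

Whether this is actually faster than the sequential version depends on input size and on compiling with -threaded, as noted above for par and pseq.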
For reference, please read the Parallel chapter from Simon Marlow's book.
http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
As others have stated, using par from the Eval monad seems to be the correct approach here.
Here is a simplified view of your problem. You can test it out by compiling with -threaded and running with +RTS -N -RTS, and then you can use ThreadScope to profile your performance.
import Control.Parallel.Strategies
import Data.List (maximumBy, subsequences)
import Data.Ord
isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = xs == reverse xs
-- * note while subsequences is correct, it is asymptotically
-- inefficient due to nested foldr calls
getLongestPalindrome :: Ord a => [a] -> Int
getLongestPalindrome = length . maximum' . filter isPalindrome . subsequences
  where maximum' :: Ord a => [[a]] -> [a]
        maximum' = maximumBy $ comparing length
--- Do it in parallel, in a monad
-- rpar rpar seems to fit your case, according to Simon Marlow's book
-- http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
main :: IO ()
main = do
  let shorter = [2,3,4,5,4,3,2]
      longer = [1,2,3,4,5,4,3,2,1]
      result = runEval $ do
        a <- rpar $ getLongestPalindrome shorter
        b <- rpar $ getLongestPalindrome longer
        if a > b -- 'a > b' will always be false in this case
          then return (a,"shorter")
          else return (b,"longer")
  print result
-- This will print the length of the longest palindrome along w/ the list name
-- Don't forget to compile w/ -threaded and use ThreadScope to check
-- performance and evaluation
