Histogram counting apostrophes as a word - haskell

I have to create a histogram of the 20 most frequent words in a text, excluding the 20 most common words in the world. My code and the result I get are below:
import Data.List(sort, group, sortBy)
toWordList = words
countCommonWords wordList = length (filter isCommon wordList)
where isCommon word = elem word commonWords
dropCommonWords wordList = filter isUncommon wordList
where isUncommon w = notElem w commonWords
commonWords = ["the","and","have","not","as","be","a","I","on", "you","to","in","it","with","do","of","that","for","he","at"]
countWords wordList = map (\w -> (head w, length w)) $group $ sort wordList
compareTuples (w1, n1) (w2, n2) = if n1 < n2 then LT else if n1> n2 then GT else EQ
sortWords wordList = reverse $ sortBy compareTuples wordList
toAsteriskBar x = (replicate (snd x) '*') ++ " -> " ++ (fst x) ++ "\n"
makeHistogram wordList = concat $ map toAsteriskBar (take 20 wordList)
--Do word list
text = "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way--in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only. there were a king with a large jaw and a queen with a plain face, on the throne of England; there were a king with a large jaw and a queen with a fair face, on the throne of France. In both countries it was clearer than crystal to the lords of the State preserves of loaves and fishes, that things in general were settled for ever of."
main = do
let wordlist = toWordList text
putStrLn "Report:"
putStrLn ("\t" ++ (show $ length wordlist) ++ " words")
putStrLn ("\t" ++ (show $ countCommonWords wordlist) ++ " common words")
putStrLn "\nHistogram of the most frequent words (excluding common words):\n"
putStr $ makeHistogram $ sortWords $ countWords $ dropCommonWords $ wordlist
Result:
Report:
186 words
71 common words
Histogram of the most frequent words (excluding common words):
************ -> was
***** -> were
**** -> we
** -> us,
** -> times,
** -> throne
** -> there
** -> season
** -> queen
** -> large
** -> king
** -> jaw
** -> its
** -> had
** -> going
** -> face,
** -> epoch
** -> direct
** -> before
** -> all
Does anybody know why the counter is counting any word with punctuation attached (e.g. "us," with its trailing comma) as a whole word?

In Brief
toWordList = words
This is the function I'd modify to sanitize your words. For example, toWordList = map (filter isAlpha) . words keeps only the alphabetic characters of each word, instead of whole blocks of characters separated by whitespace (which is what words gives you). isAlpha comes from the Data.Char module, which you'd need to import.
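As a minimal sketch of that idea, assuming it is acceptable to drop every non-letter character and to discard tokens that end up empty (cleanWords is a hypothetical name, not part of the original code):

```haskell
import Data.Char (isAlpha)

-- Hypothetical helper: split on whitespace, then keep only the letters of
-- each token, so "times," and "times" are counted as the same word.
-- Tokens made only of punctuation (such as "--") become empty and are dropped.
cleanWords :: String -> [String]
cleanWords = filter (not . null) . map (filter isAlpha) . words
```

Two caveats with this approach: a token like "way--in" collapses to "wayin", and "It" and "it" still count as different words unless you also lower-case the input.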
Epilog
Moving forward, I'm just going to make some code comments because why not.
import Data.List(sort, group, sortBy)
Yay, using pre-existing code. You will probably also want comparing from Data.Ord.
countCommonWords wordList = length (filter isCommon wordList)
where isCommon word = elem word commonWords
dropCommonWords wordList = filter isUncommon wordList
where isUncommon w = notElem w commonWords
These operations are O(n * m), where n is the length of wordList and m is the length of commonWords. You could make this faster by using a Set if you desire.
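A sketch of the Set variant, assuming the containers package is available (it ships with GHC). The primed names and commonWordSet are hypothetical; each membership test becomes O(log m) instead of O(m):

```haskell
import qualified Data.Set as Set

-- The same 20 common words as in the question, stored in a Set.
commonWordSet :: Set.Set String
commonWordSet = Set.fromList
  [ "the", "and", "have", "not", "as", "be", "a", "I", "on", "you"
  , "to", "in", "it", "with", "do", "of", "that", "for", "he", "at" ]

-- Count the words that are in the common-word set.
countCommonWords' :: [String] -> Int
countCommonWords' = length . filter (`Set.member` commonWordSet)

-- Keep only the words that are not in the common-word set.
dropCommonWords' :: [String] -> [String]
dropCommonWords' = filter (`Set.notMember` commonWordSet)
```

With only 20 common words the gain is small; the change matters more as the exclusion list grows.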
commonWords = ["the","and","have","not","as","be","a","I"
,"on","you","to","in","it","with","do","of","that"
,"for","he","at"]
countWords wordList = map (\w -> (head w, length w)) $ group $ sort wordList
A similar performance comment here. A common method is to use Data.Map.insertWith to keep a counter for each word.
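A sketch of the insertWith counter, assuming the containers package; countWordsMap is a hypothetical name. It walks the list once and bumps a per-word counter, rather than sorting the whole list first:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical replacement for countWords: fold over the words, adding 1
-- to the existing count of a word or inserting it with count 1.
countWordsMap :: [String] -> [(String, Int)]
countWordsMap = Map.toList . foldl' bump Map.empty
  where
    bump counts w = Map.insertWith (+) w 1 counts
```

Map.toList returns the pairs in ascending key order, so the result still has to be sorted by count before drawing the histogram.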
compareTuples (w1, n1) (w2, n2) = if n1 < n2 then LT else if n1> n2 then GT else EQ
This is more easily spelled compareTuples = comparing snd, since it is the counts (the second components of the tuples) that are being compared.
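A short sketch of sorting (word, count) pairs with comparing from Data.Ord; sortByCountDesc is a hypothetical name. It compares the count, the second component of each pair, and uses Down to get descending order without a separate reverse:

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- Hypothetical replacement for sortWords: highest count first.
sortByCountDesc :: [(String, Int)] -> [(String, Int)]
sortByCountDesc = sortBy (comparing (Down . snd))
```

Because sortBy is stable, words with equal counts keep their incoming order here, whereas reverse $ sortBy ... flips the order of ties.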

Related

Word count in haskell

I'm working on this exercise:
Given a phrase, count the occurrences of each word in that phrase.
For the purposes of this exercise you can expect that a word will always be one of:
A number composed of one or more ASCII digits (ie "0" or "1234") OR
A simple word composed of one or more ASCII letters (ie "a" or "they") OR
A contraction of two simple words joined by a single apostrophe (ie "it's" or "they're")
When counting words you can assume the following rules:
The count is case insensitive (ie "You", "you", and "YOU" are 3 uses of the same word)
The count is unordered; the tests will ignore how words and counts are ordered
Other than the apostrophe in a contraction all forms of punctuation are ignored
The words can be separated by any form of whitespace (ie "\t", "\n", " ")
For example, for the phrase "That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled. the count would be:
that's: 1
the: 2
password: 2
123: 1
cried: 1
special: 1
agent: 1
so: 1
i: 1
fled: 1
My code:
module WordCount (wordCount) where
import qualified Data.Char as C
import qualified Data.List as L
import Text.Regex.TDFA as R
wordCount :: String -> [(String, Int)]
wordCount xs =
do
ys <- words xs
let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
return (head g, length g)
But it fails on the input "one fish two fish red fish blue fish". It outputs one count for each word, even the repeated ones, as if the sort and group aren't doing anything. Why?
I've read this answer, which basically does the same thing in a more advanced way using Control.Arrow.
You don't need to use words to split the line, the regex should achieve the desired splitting:
wordCount :: String -> [(String, Int)]
wordCount xs =
do
let zs = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
g <- L.group $ L.sort [map C.toLower w | w <- zs]
return (head g, length g)
wordCount xs =
do
ys <- words xs
let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
return (head g, length g)
You’re splitting the input xs into words by whitespace using words. You iterate over these in the list monad with the binding statement ys <- …. Then you split each of those words into subwords using the regular expression, of which there happens to be only one match in your example. You sort and group each of the subwords in a list by itself.
I believe you can essentially just delete the initial call to words:
wordCount xs =
do
let ys = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
g <- L.group $ L.sort [map C.toLower w | w <- ys]
return (head g, length g)
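The effect of the extra binding can be reproduced without the regex library. In this sketch, plain words stands in for the regex match (an assumption made only to keep the example dependency-free), and perWord and allAtOnce are hypothetical names:

```haskell
import Data.List (group, sort)

-- Mirrors the original do-block: each word is bound on its own in the
-- list monad, so sort and group only ever see a one-element list.
perWord :: String -> [(String, Int)]
perWord xs = do
  w <- words xs
  g <- group (sort [w])
  return (head g, length g)

-- Sorting and grouping the whole word list at once gives the real counts.
allAtOnce :: String -> [(String, Int)]
allAtOnce xs = do
  g <- group (sort (words xs))
  return (head g, length g)
```

For "one fish two fish", perWord gives [("one",1),("fish",1),("two",1),("fish",1)], while allAtOnce gives [("fish",2),("one",1),("two",1)].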

Outputting Pascal's triangle

import Data.List (intercalate)
import Control.Concurrent (threadDelay)
import System.IO
-- I love how amazingly concise Haskell code can be. This same program in C, C++ or Java
-- would be at least twice as long.
pascal :: Int -> Int -> Int
pascal row col | col >= 0 && col <= row =
if row == 0 || col == 0 || row == col
then 1
else pascal (row - 1) (col - 1) + pascal (row - 1) col
pascal _ _ = 0
pascalsTriangle :: Int -> [[Int]]
pascalsTriangle rows =
[[pascal row col | col <- [0..row]] | row <- [0..rows]]
main :: IO ()
main = do
putStrLn ""
putStr "Starting at row #0, how many rows of Pascal's Triangle do you want to print out? "
hFlush stdout
numRows <- (\s -> read s :: Int) <$> getLine
let triangle = pascalsTriangle numRows
triangleOfStrings = map (intercalate ", ") $ map (map show) triangle
lengthOfLastDiv2 = div ((length . last) triangleOfStrings) 2
putStrLn ""
mapM_ (\s -> let spaces = [' ' | x <- [1 .. lengthOfLastDiv2 - div (length s) 2]]
in (putStrLn $ spaces ++ s) >> threadDelay 200000) triangleOfStrings
putStrLn ""
My little program above finds the values of Pascal's Triangle. But if you compile it and use it you'll see that the "triangle" looks more like a Christmas tree than a triangle! Ouch!
I'm just taking half the length of the last line, subtracting from that half the length of each preceding line, and creating that many blank spaces to add to the beginning of each string. It ALMOST works, but I'm looking for an equilateral triangle type of effect, and to me the output resembles a sloping Christmas tree! Is there a better way to do this? What am I missing besides some programming talent?! Thanks for the help. THIS IS NOT A HOMEWORK ASSIGNMENT. I'm just doing this for fun and recreation. I appreciate the help.
Best.
Douglas Lewit.
Here's a straightforward implementation:
space n = replicate n ' '
pad n s | n < length s = take n s
pad n s = space (n - length s) ++ s
triangle = iterate (\ xs -> zipWith (+) (xs ++ [0]) (0:xs)) [1]
rowPrint n hw xs = space (n * hw) ++ concatMap (pad (2*hw) . show) xs
triRows n hw = [rowPrint (n-i) hw row | (i,row) <- zip [1..n] triangle]
main = do
s <- getLine
mapM_ putStrLn (triRows (read s) 2)
Note that triangle is an infinite Pascal's triangle, generated by the recurrence relation. Also, hw stands for "half-width": half the width allocated for printing a number, and pad is a strict left-pad that truncates the output rather than disrupt the formatting.
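A variation on the same layout idea, sketched under the assumption that you would rather widen the cells than truncate large numbers: the half-width is derived from the widest entry. pascalRows and render are hypothetical names, not part of the answer above.

```haskell
-- First n rows of Pascal's triangle, using the same recurrence as above.
pascalRows :: Int -> [[Integer]]
pascalRows n = take n (iterate (\xs -> zipWith (+) (xs ++ [0]) (0 : xs)) [1])

-- Hypothetical helper: every number is right-aligned in a cell of width
-- 2 * hw, and each row is indented by hw less than the row above it.
render :: Int -> [String]
render n = [indent i ++ concatMap cell row | (i, row) <- zip [1 ..] rows]
  where
    rows = pascalRows n
    widest = maximum (1 : map (length . show) (concat rows))
    hw = (widest + 2) `div` 2 -- cell width 2 * hw always exceeds the widest number
    cell x = let s = show x in replicate (2 * hw - length s) ' ' ++ s
    indent i = replicate ((n - i) * hw) ' '
```

Print it with mapM_ putStrLn (render 10). Since every cell has the same width, the left and right edges slope evenly no matter how many digits the entries have.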

Haskell: Given a list of numbers and a number k, return whether any two numbers from the list add up to k

Given a list of numbers and a number k, return whether any two numbers from the list add up to k.
For example, given [10, 15, 3, 7] and k of 17, return true since 10 + 7 is 17.
The program must prompt the user for input.
The program must accept the list as a collection of comma separated values.
The values should all be integers.
The input list can be between 1 and 42 numbers long.
What I have done
I have been able to read the input as a list of integers separated by commas, but I am not able to return true when two numbers add up to k.
toList :: String -> [Integer]
toList input = read ("[" ++ input ++ "]")
main = do
putStrLn "Enter a list of numbers (separated by comma):"
input <- getLine
print $ k (toList input)
There are the following approaches.
1) Create a list of pairs covering all combinations of two elements: [(10,15),(10,3),(10,7),(15,3),...].
Now you can use the simple any function on this list to check whether any pair adds up to the given number.
getCoupleList :: [a]->[(a,a)]
getCoupleList [] = []
getCoupleList [x] = []
getCoupleList (x:xs) = map (\y->(x,y)) xs ++ getCoupleList xs
getSumOfCoupleList :: Num a => [(a,a)]->[a]
getSumOfCoupleList xs = map (\x -> fst x + snd x) xs
isSum :: [Int]->Int->Bool
isSum xs k = any (==k) $ (getSumOfCoupleList.getCoupleList) xs
or check directly, without getSumOfCoupleList:
isSum xs k = any (\(a,b) -> a + b == k) $ getCoupleList xs
On closer inspection, building the list of pairs and then summing each pair is not needed. We can get the list of sums directly with a simple change.
getSumList :: Num a=>[a]->[a]
getSumList [] = []
getSumList [x] = []
getSumList (x:xs) = map (+x) xs ++ getSumList xs
isSum1 :: [Int]->Int->Bool
isSum1 xs k = any (==k) $ getSumList xs
2) Create another list from the given list by subtracting every element from k. Now just check whether any number from the first list is present in the second. (Caveat: an element equal to k/2 matches itself, so [5] with k = 10 wrongly gives True.)
import Data.List (intersect)
isSum2 :: [Int]->Int->Bool
isSum2 xs k = let newList = map (k-) xs
                  intersectList = xs `intersect` newList
              in not (null intersectList)
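For comparison, a single-pass sketch that remembers the values seen so far in a Set (assuming the containers package); hasPairSum is a hypothetical name. Each element is only checked against earlier elements, so a lone value equal to k/2 does not match itself:

```haskell
import qualified Data.Set as Set

-- Hypothetical helper: True when two elements at different positions
-- of the list add up to k.
hasPairSum :: Int -> [Int] -> Bool
hasPairSum k = go Set.empty
  where
    go _ [] = False
    go seen (x : xs)
      | Set.member (k - x) seen = True -- the complement appeared earlier
      | otherwise = go (Set.insert x seen) xs
```

For example, hasPairSum 17 [10, 15, 3, 7] is True, while hasPairSum 10 [5] is False and hasPairSum 10 [5, 5] is True.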
This is a naive method, not optimized; it just shows an example.
toList :: String -> [Integer]
toList input = read ("[" ++ input ++ "]")
check :: Integer -> [Integer] -> Bool
check k (x:xs) = if ((k-x) `elem` xs)
then True
else (check k xs)
check k x = False
main = do
let k = 12
putStrLn "Enter a list of numbers (separated by comma):"
input <- getLine
print $ (check k (toList input))
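Note that read throws an exception on malformed input. A sketch of validating the line instead, assuming the limits from the problem statement (1 to 42 integers); parseNumbers is a hypothetical name:

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: Nothing unless the input is a comma-separated
-- list of between 1 and 42 integers.
parseNumbers :: String -> Maybe [Integer]
parseNumbers input = do
  xs <- readMaybe ("[" ++ input ++ "]")
  if null xs || length xs > 42 then Nothing else Just xs
```

In main you could then pattern match on the result and print an error message for Nothing instead of crashing.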
I was recently asked the exact same question in an interview; here's one of my answers (in Python):
arr = [10, 15, 3, 8]
k = 17
# Complement of every element with respect to k
arr_new = [k - x for x in arr]
# Values whose complement is also in the list
# (note: an element equal to k/2 matches itself)
res = list(set(arr).intersection(arr_new))
if len(res) > 0:
    print(str(res[0]) + " + " + str(k - res[0]) + " = " + str(k))
else:
    print("No numbers add up to k")

rotate a string in haskell with some exceptions

I want to rotate a string in Haskell: if I give "Now I want to scream" to rotate, it should return [[want to scream now I],[scream now I want to]]. If a rotated string starts with "I" or "to", it must be eliminated. So far I still have problems with the rotation.
reverseWords :: String -> String
reverseWords = unwords . reverse . words
shiftt :: [a] -> Int -> [a]
shiftt l n = drop n l ++ take n l
rot::String->[String]
rot l = [ reverseWords l i | i <- [0 .. (length l) -1]]
Create a list of all rotations, then filter based on your predicate. For example,
rotations x = take (length x) $ iterate rot1 x
  where rot1 xs = drop 1 xs ++ take 1 xs
filteredRots = map unwords . filter (\x -> length (head x) > 2) . rotations . words
and use as
> filteredRots "Now I want to scream"
["Now I want to scream","want to scream Now I","scream Now I want to"]
Prelude>
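The length test above is a stand-in for the stated rule. A sketch that checks the first word against "I" and "to" explicitly, assuming those are the only two excluded starts; rotateSentence is a hypothetical name:

```haskell
-- All rotations of a list, starting with the list itself.
rotations :: [a] -> [[a]]
rotations xs = take (length xs) (iterate rot1 xs)
  where
    rot1 ys = drop 1 ys ++ take 1 ys

-- Hypothetical helper: rotate word by word and drop every rotation
-- whose first word is "I" or "to".
rotateSentence :: String -> [String]
rotateSentence = map unwords . filter keep . rotations . words
  where
    keep ws = take 1 ws `notElem` [["I"], ["to"]]
```

On "Now I want to scream" this yields the same three strings as filteredRots, but it would also keep rotations that start with other short words such as "a".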

Accessing a list entry and showing its values

I have a function
(.#.) :: [a] -> Integer -> a -- 1-indexing with 'Integer'
xs .#. j = xs !! (fromIntegral $ j-1)
showIntegers :: [Integer] -> String
showIntegers r = let
str = concat $ "List: " : [r (.#.) j | j <- [1..length r]]
How can I show r (.#.) j as a Char/String rather than an integer? I tried using show, but it gave me an error.
Here is an example of how I used show:
str = concat $ "List: " : [show $ r (.#.) j | j <- [1..length r]]
Example input and output:
> showIntegers [1,2,3]
List: 1 2 3
You should just use Data.List.intercalate or even better use unwords.
import Data.List
showIntegers :: [Integer] -> String
showIntegers r = "List: " ++ intercalate " " (map show r)
--showIntegers r = "List: " ++ unwords (map show r)
EDIT: In either case you should avoid using !! especially to enumerate the original list.
First, I would get rid of .#.; using a different numbering system is just going to confuse you, so it is best to rip that bandaid off.
Next, realize that [show $ r !! j | j <- [0 .. length r - 1]] is the same as map show r (and the latter is standard).
Now going with that you have: concat $ "List: " : (map show r) which creates List: 123 because we lost the spaces.
We could reproduce the spaces but what is the difference between using intercalate and concat? Honestly the best solution without using intercalate would be to reproduce intercalate (whose source code is available on Hackage).
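A minimal sketch of what reproducing intercalate by hand could look like for strings; joinWith is a hypothetical name, and this is not the library's actual definition:

```haskell
-- Hypothetical helper: join strings with a separator between them,
-- behaving like Data.List.intercalate on lists of strings.
joinWith :: String -> [String] -> String
joinWith _ [] = ""
joinWith _ [x] = x
joinWith sep (x : xs) = x ++ sep ++ joinWith sep xs

showIntegers :: [Integer] -> String
showIntegers r = "List: " ++ joinWith " " (map show r)
```

With this definition, showIntegers [1,2,3] gives "List: 1 2 3".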
Just remove the parentheses around (.#.) and it works.
If you have an infix operator !#$ , with something before and after it, e.g. x !#$ y, you must not use parentheses. In the other cases, add parentheses, like in the type declaration.
(this technically answers the question, but Guvante's advice is better.)
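A small sketch of the two spellings, using the operator from the question; infixUse and prefixUse are hypothetical names:

```haskell
(.#.) :: [a] -> Integer -> a -- 1-indexing with 'Integer'
xs .#. j = xs !! fromIntegral (j - 1)

-- Infix: operands on both sides, no parentheses around the operator.
infixUse :: Integer
infixUse = [10, 20, 30] .#. 2

-- Prefix: the parenthesized operator behaves like an ordinary function.
prefixUse :: Integer
prefixUse = (.#.) [10, 20, 30] 2
```

Both evaluate to 20. Writing r (.#.) j instead tries to apply the list r as a function to two arguments, which is the source of the type error.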

Resources