identifying number of words in a paragraph using haskell

I am new to Haskell and functional programming. I have a .txt file which contains some paragraphs. I want to count the number of words in each paragraph, using Haskell.
I have written the input/output code:
paragraph-words :: String -> Int
no_of_words :: IO ()
no_of_words =
  do
    putStrLn "enter the .txt file name:"
    fileName1 <- getLine
    text <- readFile fileName1
    let wordscount = paragraph-words text
Can anyone help me write the function paragraph-words, which will calculate the number of words in each paragraph?

First: you don't want to be bothered with dirty IO() any more than necessary, so the signature should be
wordsPerParagraph :: String -> [Int]
As for doing this: you should first split the text into paragraphs. Counting the words in each of them is pretty trivial then.
What you basically need is to match on empty lines (two adjacent newline characters). So I'd first use the lines function, giving you a list of lines. Then you separate these at each empty line:
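paragraphs :: String -> [String]
paragraphs = split . lines
  where split [] = []
        split (ln : "" : lns) = ln : split lns
        -- otherwise the next line belongs to the same paragraph: join them
        -- with a space so the words on either side stay separate
        split (ln : lns) = let (hd, tl) = splitAt 1 $ split lns
                           in unwords (ln : hd) : tl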
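To make the "pretty trivial" counting step concrete, here is a minimal self-contained sketch. The helper name splitParagraphs is hypothetical (it is not the paragraphs function above), and it assumes a paragraph separator is one or more completely empty lines; a line containing only spaces would not count as a separator.

```haskell
-- Hypothetical helper: split text into paragraphs at runs of empty lines.
splitParagraphs :: String -> [String]
splitParagraphs = go . lines
  where
    go ls = case dropWhile null ls of        -- skip separator lines
      []   -> []
      rest -> let (para, rest') = break null rest
              in  unlines para : go rest'    -- one paragraph, then recurse

-- Count the words of every paragraph.
wordsPerParagraph :: String -> [Int]
wordsPerParagraph = map (length . words) . splitParagraphs

-- wordsPerParagraph "a b\nc\n\nd e f\n" == [3,3]
```

Keeping this part pure means it can be tried in ghci without touching any file.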

A list of lines can be split into paragraphs if one takes all lines until at least one empty line ("") is reached or the list is exhausted (1). We ignore all consecutive empty lines (2) and apply the same method for the rest of our lines:
type Line = String
type Paragraph = [String]
parify :: [Line] -> [Paragraph]
parify [] = []
parify ls
  | null first = parify rest
  | otherwise  = first : parify rest
  where first = takeWhile (/= "") ls -- (1) take until newline or end
        rest  = dropWhile (== "") . drop (length first) $ ls
        --      ^ (2) drop all empty lines
In order to split a string into its lines, you can simply use lines. To get the number of words in a Paragraph, you simply sum over the number of words in each line:
singleParagraphCount :: Paragraph -> Int
singleParagraphCount = sum . map lineWordCount
The words in each line are simply length . words:
lineWordCount :: Line -> Int
lineWordCount = length . words
So all in all we get the following function:
wordsPerParagraph :: String -> [Int]
wordsPerParagraph = map (singleParagraphCount) . parify . lines
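To connect this back to the IO code in the question, one possible wiring is sketched below. The names formatCounts and printWordCounts are made up for illustration, and the counting function is passed in as an argument so the sketch does not depend on any particular paragraph splitter; you could hand it the wordsPerParagraph from this answer.

```haskell
-- Turn per-paragraph counts into printable lines (pure, so easy to test).
formatCounts :: [Int] -> [String]
formatCounts counts =
  [ "Paragraph " ++ show i ++ ": " ++ show n ++ " words"
  | (i, n) <- zip [(1 :: Int) ..] counts ]

-- Read a file and print one line per paragraph. The first argument is
-- whatever String -> [Int] counting function you settled on.
printWordCounts :: (String -> [Int]) -> FilePath -> IO ()
printWordCounts count path = do
  text <- readFile path
  mapM_ putStrLn (formatCounts (count text))

-- formatCounts [3,5] == ["Paragraph 1: 3 words","Paragraph 2: 5 words"]
```

Only printWordCounts lives in IO; everything that does actual work stays a plain function.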

First, you can't use - in a function name, you would have to use _ instead (or better, use camelCase as leftroundabout suggests below).
Here is a function which satisfies your type signature:
paragraph_words = length . words
This first splits the text into a list of words, then counts them by returning the length of that list of words.
However this does not completely solve the problem because you haven't written code to split your text into paragraphs.

Related

How can I split a string with two conditions?

So basically I want to split my string on two conditions: when there is an empty space, or when a letter is different from the next one.
An example:
If I have the string "AAA ADDD DD", I want to split it into ["AAA","A","DDD","DD"].
So I made this code:
sliceIt :: String -> [String]
sliceIt xs = words xs
But it only splits the initial string where an empty space exists.
How can I also split when a character is next to a different one?
Can this problem be solved more easily with recursion?

So you want to split by words and then group equal elements in each split. You have the functions for doing so,
import Data.List
sliceIt :: String -> [String]
sliceIt s = concatMap group $ words s
sliceItPointFree = concatMap group . words -- Point free notation. Same but cooler

split :: String -> [String]
split [] = []
split (' ':xs) = split xs
split (x:xs) = (takeWhile (== x) (x:xs)) : (split $ dropWhile (== x) (x:xs))
So this is a recursive definition where there are 2 cases:
If head is a space then ignore it.
Otherwise, take as many of the same characters as you can, then call the function on the remaining part of the string.
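For comparison, here is a small self-contained sketch putting the library-based and the recursive approach side by side. The names sliceItGroup and sliceItRec are only for illustration, and the recursive one uses span to get the run and the remainder in one pass instead of takeWhile plus dropWhile. One difference worth knowing: words splits on any whitespace (tabs, newlines), while the recursive version only skips the ' ' character.

```haskell
import Data.List (group)

-- Library version: split into words, then group equal neighbours.
sliceItGroup :: String -> [String]
sliceItGroup = concatMap group . words

-- Recursive version: skip spaces, otherwise peel off one run of equal chars.
sliceItRec :: String -> [String]
sliceItRec [] = []
sliceItRec (' ' : xs) = sliceItRec xs
sliceItRec s@(x : _) = run : sliceItRec rest
  where (run, rest) = span (== x) s

-- sliceItGroup "AAA ADDD DD" == ["AAA","A","DDD","DD"]
-- sliceItRec   "AAA ADDD DD" == ["AAA","A","DDD","DD"]
```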

Getting text between two empty lines in haskell

Hey, I am back with another Haskell question. I asked a question here and now I can get the empty lines perfectly. Now I want to get the text between two specific empty lines in Haskell (for example, the text between the beginning and the first empty line). I can't think of a way to do that in Haskell because I don't understand the syntax well enough to use it efficiently, so I really need your help. My practice with some IO is as follows:
main = do
  readFile "/tmp/foo.txt" >>= print . length . filter (== '?')
  readFile "/tmp/foo.txt" >>= print . length . words
  readFile "/tmp/foo.txt" >>= print . length . filter (== '.')
  readFile "/tmp/foo.txt" >>= print . length . filter null . lines
With this, I can count the number of sentences, the number of question marks, the number of empty lines and so on. Now I want to get the text between two empty lines. I would be very pleased if you could help me with this last exercise that I couldn't solve. Thanks in advance!

The easiest way is to use the functions lines, groupBy and filter:
lines splits a String into a list of Strings (one element per line);
groupBy then groups all consecutive non-empty lines. This is the most difficult part: you have to write a predicate that is true for two succeeding elements if they are both non-empty: groupBy (\x y -> ???)
filter then removes the elements of shape [""].
Here is some example usage in ghci:
λ > import Data.List
λ > let groupify = ???
λ > l <- readFile "~/tmux.conf"
λ > map length $ groupify l
[4,7,3,1,4,2,2,5,4,3,3,2,7,4,4,4,3,3,3,2]
you can check with the contents of my tmux config file at my github-repo
UPDATE
the solution for this problem would be
groupify = filter (/= [""]) . groupBy (\x y -> x /= "" && y /= "") . lines
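To get back to the original question (the text of one particular block), a small lookup on top of groupify could look like the sketch below. blockAt is a hypothetical helper name, blocks are numbered from 0, and leading empty lines are ignored, so block 0 is the first non-empty block rather than strictly "everything before the first empty line".

```haskell
import Data.List (groupBy)

-- Same definition as in the UPDATE above, with a type signature added.
groupify :: String -> [[String]]
groupify = filter (/= [""]) . groupBy (\x y -> x /= "" && y /= "") . lines

-- Hypothetical helper: the n-th block (0-based) as text, if it exists.
blockAt :: Int -> String -> Maybe String
blockAt n text = case drop n (groupify text) of
  (b : _) | n >= 0 -> Just (unlines b)
  _                -> Nothing

-- blockAt 0 "a\nb\n\nc\n" == Just "a\nb\n"
-- blockAt 2 "a\nb\n\nc\n" == Nothing
```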

You can try pattern-matching, it literally says what it does:
betweenEmptyLines :: [String] -> [String]
betweenEmptyLines [] = []
betweenEmptyLines ("":line:"":rest) = line:(betweenEmptyLines $ "":rest)
betweenEmptyLines (line:rest) = betweenEmptyLines rest
How it works:
> betweenEmptyLines ["foo", "", "the bar", "", "and", "also", "", "the baz", "", "but", "not", "rest"]
["the bar","the baz"]

Haskell: trim String and eliminate multiple spaces

I have just started programming with Haskell and would like to do a String transformation.
I have an arbitrary String e.g.
" abcd \n dad "
I would like to remove the whitespace characters on the left and on the right. And I would like to eliminate multiple whitespaces as well as escape sequences: " \n " -> " "
So the String above would look like this
"abcd dad"
I have already written a function that trims the String and removes the whitespace characters (I'm removing the character if isSpace is true):
trim :: [Char] -> [Char]
trim x = dropWhileEnd isSpace (dropWhile isSpace x)
Now my idea is to do a pattern matching on the input String. But how do I apply the trim function directly to the input? So at first I would like to trim the String at both ends and then apply a pattern matching. So the only thing I would have to do is comparing two characters and removing one if both are whitespace characters
--How do I apply trim directly to the input
s :: [Char] -> [Char]
s [x] = [x]
s(x:xx) = ...
Note: Efficiency is not important. I would like to learn the concepts of pattern matching and understand how Haskell works.
Cheers

trim = unwords . words
Examine the source of words in the Prelude.
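As a quick illustration of why this one-liner works, the sketch below applies it to the string from the question and also spells out a words-like function by hand. normalizeSpaces and myWords are illustrative names; myWords is written along the lines of the Haskell Report's definition of words, so treat it as a reading aid rather than the exact library source.

```haskell
import Data.Char (isSpace)

-- words drops all whitespace runs, unwords joins with single spaces.
normalizeSpaces :: String -> String
normalizeSpaces = unwords . words

-- A hand-written equivalent of words, to show the recursion involved.
myWords :: String -> [String]
myWords s = case dropWhile isSpace s of
  "" -> []
  s' -> let (w, rest) = break isSpace s'
        in  w : myWords rest

-- normalizeSpaces " abcd \n dad " == "abcd dad"
```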

If you want to pattern-match on the output of trim, you have to call trim, of course! For example, if you want cases for lists of length 0, 1, and longer, you could use
s xs = case trim xs of
  [] -> ...
  [x] -> ...
  x:x':xs -> ...

Your first pattern matches a single character and returns it. Surely this is not what you want - it could be whitespace. Your first match should be the empty list.
If you were only removing space chars, you could do something like this:
trim :: [Char] -> [Char]
trim [] = []
trim (' ':xs) = trim xs
...
You should be able to see that this removes all leading spaces. At this point, either the string is empty (and matches the first pattern) or it falls through to... leaving that up to you.
If you want to remove all whitespace, you need a list or set of those characters. That might look like this:
trim :: [Char] -> [Char]
trim = t
  where
    whitespace = [' ', '\t', '\v'] -- There are more than this, of course
    t [] = []
    t (x:xs) | elem x whitespace = t xs
             | otherwise = ...
Again, this has shown how to match the beginning part of the string. Leave it up to you to think about getting to the end.

You can also do pattern matching in a nested function:
s str = removeInnerSpaces (trim str)
  where
    removeInnerSpaces [] = []
    removeInnerSpaces (x:xs) = ...
Here removeInnerSpaces is a nested function, local to s.
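If you want to see one way the pieces could fit together, here is a possible sketch of the trim-then-pattern-match idea. It is only one of several ways to fill in the body; squeeze is a made-up name, trim is the definition from the question, and the choice to turn every whitespace run (including "\n") into a single ' ' is an assumption based on the example in the question.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- trim as in the question: strip whitespace from both ends.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Trim first, then look at two characters at a time.
squeeze :: String -> String
squeeze = removeInnerSpaces . trim
  where
    removeInnerSpaces (x : y : rest)
      | isSpace x && isSpace y = removeInnerSpaces (' ' : rest) -- merge the pair
      | isSpace x              = ' ' : removeInnerSpaces (y : rest)
      | otherwise              = x : removeInnerSpaces (y : rest)
    removeInnerSpaces short = short -- zero or one character left

-- squeeze " abcd \n dad " == "abcd dad"
```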

haskell front letters

So I'm educating myself for the future.
firstLetter :: IO String
firstLetter = do
  x <- getChar
  if (x == ' ')
    then return (show x)
    else firstLetter
So it would read lines until the first line that starts with an empty char.
How can I make it so that, when an empty line comes, it returns the head of every line read so far?
For example:
Liquid
Osone
Liquid
(empty line)
returns
"LOL"

Try this. The library function lines will split the input into lines for you, so all that is left is extracting the first character from each string in a list until one string is empty. An empty string is just a null list, so you can check for that to end the recursion over the list of strings.
firstLetters :: [String] -> String
firstLetters [] = [] -- input ended without an empty line
firstLetters (x:xs)
  | null x = []
  | otherwise = head x : firstLetters xs
main = do
  contents <- getContents
  putStrLn . firstLetters . lines $ contents

Have you seen interact? This'll help you eliminate the IO, and that always seems to make things simpler for me and hopefully you too.
That reduces it to a problem that reads a string and returns a string.
Here's a rough go at it. getLines takes a string, breaks it into lines and consumes them (takeWhile) until it meets a line containing a single space (I wasn't sure on your ending condition, as the other poster says using null will stop at the first empty list). Then it goes over those lines and gets the first character of each (with map head).
getLines :: String -> String
getLines = map head . takeWhile (/= " ") . lines
main :: IO ()
main = interact getLines
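Note that map head crashes if an empty line shows up before the terminating line. A variant that cannot hit head on an empty string is sketched below; firstLettersSafe is a hypothetical name, and it assumes the end marker is a completely empty line, as in the example from the question.

```haskell
-- Stop at the first empty line (or at end of input) and take at most
-- one character from each line before it.
firstLettersSafe :: String -> String
firstLettersSafe = concatMap (take 1) . takeWhile (not . null) . lines

-- firstLettersSafe "Liquid\nOsone\nLiquid\n\nignored\n" == "LOL"
```

It has the same String -> String shape as getLines, so it can be passed to interact in the same way.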

Functional paragraphs

Sorry, I don't quite get FP yet. I want to split a sequence of lines into a sequence of sequences of lines, treating an empty line as the paragraph division. I could do it in Python like this:
def get_paraghraps(lines):
    paragraphs = []
    paragraph = []
    for line in lines:
        if line == "": # I know it could also be "if line:"
            paragraphs.append(paragraph)
            paragraph = []
        else:
            paragraph.append(line)
    return paragraphs
How would you go about doing it in Erlang or Haskell?

I'm only a beginning Haskell programmer (and the little Haskell I learnt was 5 years ago), but for a start, I'd write the natural translation of your function, with the accumulator ("the current paragraph") being passed around (I've added types, just for clarity):
type Line = String
type Para = [Line]
-- Takes a list of lines, and returns a list of paragraphs
paragraphs :: [Line] -> [Para]
paragraphs ls = paragraphs2 ls []
-- Helper function: takes a list of lines, and the "current paragraph"
paragraphs2 :: [Line] -> Para -> [Para]
paragraphs2 [] para = [para]
paragraphs2 ("":ls) para = para : (paragraphs2 ls [])
paragraphs2 (l:ls) para = paragraphs2 ls (para++[l])
This works:
*Main> paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["Line 3","Line 4"]]
So that's a solution. But then, Haskell experience suggests that there are almost always library functions for doing things like this :) One related function is called groupBy, and it almost works:
paragraphs3 :: [Line] -> [Para]
paragraphs3 ls = groupBy (\x y -> y /= "") ls
*Main> paragraphs3 ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["","Line 3","Line 4"]]
Oops. What we really need is a "splitBy", and it's not in the libraries, but we can filter out the bad ones ourselves:
paragraphs4 :: [Line] -> [Para]
paragraphs4 ls = map (filter (/= "")) (groupBy (\x y -> y /= "") ls)
or, if you want to be cool, you can get rid of the argument and do it the pointless way:
paragraphs5 = map (filter (/= "")) . groupBy (\x y -> y /= "")
I'm sure there is an even shorter way. :-)
Edit: ephemient points out that (not . null) is cleaner than (/= ""). So we can write
paragraphs = map (filter $ not . null) . groupBy (const $ not . null)
The repeated (not . null) is a strong indication that we really should abstract this out into a function, and this is what the Data.List.Split module does, as pointed out in the answer below.

I'm also trying to learn Haskell. A solution for this question could be:
paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs rest)
  where (p, rest) = span (/= "") (dropWhile (== "") lines)
where I'm using the functions from Data.List. The ones I'm using are already available from the Prelude, but you can find their documentation in the link.
The idea is to find the first paragraph using span (/= ""). This will return the paragraph, and the lines following. We then recurse on the smaller list of lines which I call rest.
Before splitting out the first paragraph, we drop any empty lines using dropWhile (== ""). This is important to eat the empty line(s) separating the paragraphs. My first attempt was this:
paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs $ tail rest)
  where (p, rest) = span (/= "") lines
but this fails when we reach the final paragraph, since rest is then the empty list:
*Main> paragraphs ["foo", "bar", "", "hehe", "", "bla", "bla"]
[["foo","bar"],["hehe"],["bla","bla"]*** Exception: Prelude.tail: empty list
Dropping empty lines solves this, and it also makes the code treat any number of empty lines as a paragraph separator, which is what I would expect as a user.

The cleanest solution would be to use something appropriate from the split package.
You'll need to install that first, but then Data.List.Split.splitWhen null should do the job perfectly.
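If installing the split package is not an option, a base-only stand-in can be sketched in a few lines. splitWhenBase is a hypothetical name; it is meant to mimic the behaviour described here (split at elements matching a predicate and drop those elements), not to reproduce the library's implementation. Consecutive empty lines produce empty chunks, so this sketch filters them out afterwards.

```haskell
-- Split a list at every element that satisfies p, dropping those elements.
splitWhenBase :: (a -> Bool) -> [a] -> [[a]]
splitWhenBase p xs = case break p xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitWhenBase p rest

-- Paragraphs are the non-empty chunks between empty lines.
paragraphs :: [String] -> [[String]]
paragraphs = filter (not . null) . splitWhenBase null

-- paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"]
--   == [["Line 1","Line 2"],["Line 3","Line 4"]]
```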

Think recursively.
get_paragraphs [] paras para = paras ++ [para]
get_paragraphs ("":ls) paras para = get_paragraphs ls (paras ++ [para]) []
get_paragraphs (l:ls) paras para = get_paragraphs ls paras (para ++ [l])
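The accumulator style above appends with ++ at every step, which walks the whole accumulated list each time. A right fold that conses onto the front is one way to avoid that; the sketch below uses the made-up name paragraphsFoldr and, like the Python original, keeps empty paragraphs when empty lines are adjacent or trailing.

```haskell
-- Build the result from the right: an empty line starts a new paragraph,
-- any other line is consed onto the paragraph currently at the front.
paragraphsFoldr :: [String] -> [[String]]
paragraphsFoldr = foldr step [[]]
  where
    step "" acc            = [] : acc
    step l  (para : paras) = (l : para) : paras
    step l  []             = [[l]] -- unreachable: acc is never empty

-- paragraphsFoldr ["Line 1", "Line 2", "", "Line 3", "Line 4"]
--   == [["Line 1","Line 2"],["Line 3","Line 4"]]
```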

You want to group the lines, so groupBy from Data.List seems like a good candidate. It uses a custom function to determine which lines are "equal" so one can supply something that makes lines in the same paragraph "equal". For example:
import Data.List( groupBy )
inpara :: String -> String -> Bool
inpara _ "" = False
inpara _ _ = True
paragraphs :: [String] -> [[String]]
paragraphs = groupBy inpara
This has some limitations, since inpara can only compare two adjacent lines and more complex logic doesn't fit into the framework given by groupBy. A more elementary solution is more flexible. Using basic recursion one can write:
paragraphs [] = []
paragraphs as = para : paragraphs (dropWhile null reminder)
  where (para, reminder) = span (not . null) as
        -- splits list at the first empty line
span splits a list at the point the supplied function becomes false (the first empty line), dropWhile removes leading elements for which the supplied function is true (any leading empty lines).

Better late than never.
import Data.List.Split (splitOn)
paragraphs :: String -> [[String]]
paragraphs s = filter (not . null) $ map words $ splitOn "\n\n" s
paragraphs "a\nb\n\nc\nd" == [["a", "b"], ["c", "d"]]
paragraphs "\n\na\nb\n\n\nc\nd\n\n\n" == [["a", "b"], ["c", "d"]]
paragraphs "\n\na\nb\n\n \n c\nd\n\n\n" == [["a", "b"], ["c", "d"]]
