Getting text between two empty lines in haskell - haskell

Hey, I am back with another Haskell question. I asked a question here before, and now I can find the empty lines perfectly. Next I want to get the text between two specific empty lines (for example, the text between the beginning and the first empty line). I can't think of a way to do that in Haskell because I don't yet understand the syntax well enough to use it efficiently, so I really need your help. My practice with IO so far looks like this:
main = do {
  readFile "/tmp/foo.txt" >>= print . length . filter (== '?');
  readFile "/tmp/foo.txt" >>= print . length . words;
  readFile "/tmp/foo.txt" >>= print . length . filter (== '.');
  readFile "/tmp/foo.txt" >>= print . length . filter null . lines;
}
With this, I can count the number of question marks, words, sentences, empty lines, and so on. Now I want to get the text between two empty lines. I would be very pleased if you could help me with this last exercise, which I couldn't solve. Thanks in advance!

The easiest way is to use the functions lines, groupBy and filter:
lines splits a String into a list of Strings (one element per line).
groupBy then groups all consecutive non-empty lines. This is the most difficult part: you have to write a predicate that is true for two lines only if both are non-empty: groupBy (\x y -> ???)
filter then removes the groups of shape [""].
Here is some example usage in GHCi:
λ > import Data.List
λ > let groupify = ???
λ > l <- readFile "~/tmux.conf"
λ > map length $ groupify l
[4,7,3,1,4,2,2,5,4,3,3,2,7,4,4,4,3,3,3,2]
You can check the result against the contents of my tmux config file in my GitHub repo.
UPDATE
The solution for this problem would be:
groupify = filter (/= [""]) . groupBy (\x y -> x /= "" && y /= "") . lines
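Putting the pieces together, a minimal self-contained sketch might look like the following. The groupify function is the one from the answer; the sample text is made up for illustration and is not taken from the question.

```haskell
import Data.List (groupBy)

-- Split a text into blocks of consecutive non-empty lines.
groupify :: String -> [[String]]
groupify = filter (/= [""]) . groupBy (\x y -> x /= "" && y /= "") . lines

-- Hypothetical sample input, used only for this illustration.
sample :: String
sample = "first\nblock\n\nsecond block\n\n\nthird\nblock\nhere\n"

main :: IO ()
main = do
  -- All blocks, each as a list of lines.
  print (groupify sample)
  -- The first block only; take 1 avoids a crash on empty input.
  mapM_ putStrLn (concat (take 1 (groupify sample)))
```

Note that the first block equals "the text between the beginning and the first empty line" only if the input does not itself start with an empty line.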

You can try pattern-matching, it literally says what it does:
betweenEmptyLines :: [String] -> [String]
betweenEmptyLines [] = []
betweenEmptyLines ("":line:"":rest) = line:(betweenEmptyLines $ "":rest)
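betweenEmptyLines (_:rest) = betweenEmptyLines rest
How it works (note that it only picks up single lines enclosed by two empty lines, so "and" and "also" below are skipped):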
> betweenEmptyLines ["foo", "", "the bar", "", "and", "also", "", "the baz", "", "but", "not", "rest"]
["the bar","the baz"]
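For blocks that span several lines, one possible sketch uses takeWhile and break instead of a fixed three-element pattern. The names beforeFirstEmpty and betweenEmpty are hypothetical helpers introduced here, not part of the answer above.

```haskell
-- Lines before the first empty line (the example from the question).
beforeFirstEmpty :: [String] -> [String]
beforeFirstEmpty = takeWhile (not . null)

-- Lines after the n-th empty line (counting from 1), up to the next
-- empty line or the end of the input. Returns [] if there are fewer
-- than n empty lines.
betweenEmpty :: Int -> [String] -> [String]
betweenEmpty n ls
  | n <= 0    = []
  | otherwise = case drop (n - 1) (emptyTails ls) of
      (rest : _) -> takeWhile (not . null) rest
      []         -> []
  where
    -- Every suffix of the input that starts right after an empty line.
    emptyTails xs = case break null xs of
      (_, [])       -> []
      (_, _ : rest) -> rest : emptyTails rest

main :: IO ()
main = do
  let input = ["foo", "", "the bar", "", "and", "also", "", "the baz", "", "but", "not", "rest"]
  print (beforeFirstEmpty input)
  print (betweenEmpty 2 input)
```

With the sample list from the answer, betweenEmpty 2 picks up the two-line block that the pattern-matching version skips.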

Related

Haskell - Removing non-letter characters but ignoring white spaces?

I am very new to Haskell. I am trying to return a list of strings from a given string (which could contain non-letter characters), but I get a single string in the list.
The code below shows what I have tried so far:
toLowerStr xs = map toLower xs
--drop non-letters characters
dropNonLetters xs = words $ (filter (\x -> x `elem` ['a'..'z'])) $ toLowerStr xs
The intended steps are:
Lowercase all the characters using the toLower function.
Remove non-letter characters using the filter function.
Return a list of strings using the words function.
I think the filter function is removing the white spaces and therefore it becomes a single string. I tried using isSpace function but I don't know exactly how to implement it in this case.
What is it that I am doing wrong? I get this output:
λ> dropNonLetters "ORANGE, apple! APPLE!!"
["orangeappleapple"]
But I want to achieve the below output:
λ> dropNonLetters "ORANGE, apple! APPLE!!"
["orange","apple","apple"]
"I think the filter function is removing the white spaces and therefore it becomes a single string."
That is correct. The filter predicate you wrote is \x -> x `elem` ['a'..'z']. The list ['a'..'z'] contains only lowercase letters, so the predicate fails for whitespace and every space is dropped. You should therefore allow spaces as well.
We can, for instance, add the space character to the list:
dropNonLetters xs = words $ (filter (\x -> x `elem` (' ':['a'..'z']))) $ toLowerStr xs
But this is inelegant and does not really explain itself. The Data.Char module, however, ships with two functions that are interesting here: isLower :: Char -> Bool and isSpace :: Char -> Bool. We can use these like:
dropNonLetters xs = words $ (filter (\x -> isLower x || isSpace x)) $ toLowerStr xs
isLower and isSpace are not only more descriptive and elegant. They are usually also faster than a membership check on a list (which takes O(n) time in the length of the list), and isSpace also takes tabs, newlines, etc. into account.
We can also perform an eta-reduction on the function:
dropNonLetters = words . (filter (\x -> isLower x || isSpace x)) . toLowerStr
This then produces:
Prelude Data.Char> dropNonLetters "ORANGE, apple! APPLE!!"
["orange","apple","apple"]
I advise you to rename the function dropNonLetters, since now it does not fully explain that it will generate a list of words. Based on the name, I would think that it only drops non-letters, not that it converts the string to lowercase nor that it constructs words.
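Following that renaming advice, here is a small sketch under the hypothetical name lowercaseWords. It swaps isLower for isLetter, which is an assumption on my part rather than something the answer uses: isLetter also keeps letters that have no lowercase form, which may or may not be what you want.

```haskell
import Data.Char (isLetter, isSpace, toLower)

-- Lowercase the input, keep only letters and whitespace,
-- then split the result into words.
lowercaseWords :: String -> [String]
lowercaseWords = words . filter (\c -> isLetter c || isSpace c) . map toLower

main :: IO ()
main = print (lowercaseWords "ORANGE, apple! APPLE!!")
-- prints ["orange","apple","apple"]
```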
Here is an example that collects the digit characters of a string into separate strings, one per digit from '1' to '9' (digits that do not occur are dropped):
sortNumbers :: [Char] -> [String]
sortNumbers args = filter (\strings -> strings /= "") $ zipWith (\x number -> filter (\char -> char == number) x) (repeat args) ['1'..'9']

Haskell - Trying to apply a function to lines of multiple numbers

I am new to Haskell and I am trying to apply a function (gcd) to input on standard input. The input is line separated: after a first line holding the number of cases, each line contains exactly two numbers. Here is an example of my input:
3
10 4
1 100
288 240
I am currently breaking up each line into a tuple of both numbers, but I am having trouble figuring out how to separate these tuples and apply a function to them. Here is what I have so far:
import Data.List
main :: IO ()
main = do
  n <- readLn :: IO Int
  content <- getContents
  let
    points = map (\[x, y] -> (x, y)). map (map (read::String->Int)). map words. lines $ content
    ans = gcd (fst points :: Int) (snd points :: Int)
  print ans
Any information as to a good place to start looking for this answer would be much appreciated. I have read through the Learning Haskell tutorial and have not found any information on this particular problem.
You are pretty close. There is no reason to convert to a tuple or list of tuples before calling gcd.
main = do
  contents <- getContents
  print $ map ((\[x,y] -> gcd (read x) (read y)) . words) . drop 1 . lines $ contents
All the interesting stuff is between print and contents. lines splits the contents into lines, and drop 1 skips the first line, which holds only the count. map (...) applies the function to each remaining line. words splits the line into words. \[x,y] -> gcd (read x) (read y) matches a list of two strings (and throws an error otherwise, which is not good practice in general but fine for a simple program like this), reads those strings as Integers and computes their GCD.
If you want to make use of lazy IO, in order to print each result right after you enter each line, you can change it as follows (drop 1 skips the first line, which holds only the count):
main = do
  contents <- getContents
  mapM_ (print . (\[x,y] -> gcd (read x) (read y)) . words) . drop 1 . lines $ contents
Or, you can do it in a more imperative style:
import Control.Monad
main = do
  n <- readLn
  replicateM_ n $ do
    [x, y] <- (map read . words) `liftM` getLine
    print $ gcd x y
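If a malformed line should not crash the program, one possible sketch parses each line with readMaybe from Text.Read and returns a Maybe. The helper name gcdLine and the hard-coded sample string (copied from the question's input) are choices made here for illustration.

```haskell
import Text.Read (readMaybe)

-- Parse a line of exactly two integers and return their gcd.
-- Malformed lines yield Nothing instead of a runtime error.
gcdLine :: String -> Maybe Integer
gcdLine line = case mapM readMaybe (words line) of
  Just [x, y] -> Just (gcd x y)
  _           -> Nothing

-- The sample input from the question, including the leading count line.
sample :: String
sample = "3\n10 4\n1 100\n288 240\n"

main :: IO ()
main =
  -- drop 1 skips the leading count line.
  mapM_ (print . gcdLine) (drop 1 (lines sample))
```

To read from standard input instead, replace sample with the result of getContents.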

Haskell filter more than one char

I have a question. I have this line of code here.
map length [filter (/= ' ') someString]
I know it removes the spaces from someString. But is it possible to remove more than just spaces from the string using filter? Let's say, remove the spaces and some other characters.
Thanks!
You can just do:
filter (not . flip elem "<your chars here>")
Example:
ghci> filter (not . flip elem " .,") "This is an example sentence, which uses punctuation."
"Thisisanexamplesentencewhichusespunctuation"
Just to put the comment in here: To filter out letters of both cases (a-z and A-Z), you should probably use Data.Char's isLetter function.
A slightly shorter way is to use notElem:
λ> filter (flip notElem " f") "foo bar"
"oobar"
An alternative is to use a fold to get the length, if you don't actually need to return the filtered string but only want its length.
f x charlist = foldl (\acc c -> if c `notElem` charlist then acc + 1 else acc) 0 x
charlist is your list of banned characters. If you have a fixed list always, you can directly define it as a top-level value and use it in the function body instead of passing it as a parameter.
A more concise version using list comprehensions:
f x = length [z | z <- x, notElem z charlist]
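The variants above can be collected into one small sketch. The names removeChars and countKept are hypothetical; the banned characters are passed as a parameter rather than defined at the top level.

```haskell
-- Remove every character that occurs in the banned list.
removeChars :: String -> String -> String
removeChars banned = filter (`notElem` banned)

-- Count the characters that survive the filter.
countKept :: String -> String -> Int
countKept banned = length . removeChars banned

main :: IO ()
main = do
  putStrLn (removeChars " .," "This is an example sentence, which uses punctuation.")
  print (countKept " f" "foo bar")
```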

haskell front letters

So I'm educating myself for the future:
firstLetter :: IO String
firstLetter = do
  x <- getChar
  if (x == ' ')
    then return (show x)
    else firstLetter
The idea is to keep reading lines until the first line that starts with an empty character.
How can I make it so that, when an empty line comes, it returns the head of every line read so far?
for example:
Liquid
Osone
Liquid
(empty line)
returns
"LOL"
Try this. The library function lines will split the input into lines for you, so all that is left is extracting the first character from each string in a list until one string is empty. An empty string is just a null list, so you can check for that to end the recursion over the list of strings.
firstLetters :: [String] -> String
firstLetters [] = []
firstLetters (x:xs)
  | null x = []
  | otherwise = head x : firstLetters xs
main = do
  contents <- getContents
  putStrLn . firstLetters . lines $ contents
Have you seen interact? It helps you eliminate the explicit IO, which always seems to make things simpler for me and hopefully for you too.
That reduces the task to a function that takes a string and returns a string.
Here's a rough go at it. getLines takes a string, breaks it into lines and consumes them (takeWhile) until it meets a line containing a single space (I wasn't sure about your ending condition; as the other poster says, using null will stop at the first empty line). Then it goes over those lines and gets the first character of each (with map head).
getLines :: String -> String
getLines = map head . takeWhile (/= " ") . lines
main :: IO ()
main = interact getLines
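A variant that stops at the first empty line and never calls head on an empty string could be sketched as follows. It assumes the ending condition is an empty line, as in the question's example; the sample string is made up.

```haskell
-- First character of every line up to (not including) the first empty line.
-- concatMap (take 1) is a total replacement for map head.
firstLetters :: String -> String
firstLetters = concatMap (take 1) . takeWhile (not . null) . lines

main :: IO ()
main = putStrLn (firstLetters "Liquid\nOsone\nLiquid\n\nignored\n")
-- prints LOL
```

For interactive use, main = interact firstLetters should work in the same way as the version above.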

Functional paragraphs

Sorry, I don't quite get FP yet. I want to split a sequence of lines into a sequence of sequences of lines, treating an empty line as the paragraph division. I could do it in Python like this:
def get_paragraphs(lines):
    paragraphs = []
    paragraph = []
    for line in lines:
        if line == "":  # I know it could also be "if not line:"
            paragraphs.append(paragraph)
            paragraph = []
        else:
            paragraph.append(line)
    return paragraphs
How would you go about doing it in Erlang or Haskell?
I'm only a beginning Haskell programmer (and the little Haskell I learnt was 5 years ago), but for a start, I'd write the natural translation of your function, with the accumulator ("the current paragraph") being passed around (I've added types, just for clarity):
type Line = String
type Para = [Line]
-- Takes a list of lines, and returns a list of paragraphs
paragraphs :: [Line] -> [Para]
paragraphs ls = paragraphs2 ls []
-- Helper function: takes a list of lines, and the "current paragraph"
paragraphs2 :: [Line] -> Para -> [Para]
paragraphs2 [] para = [para]
paragraphs2 ("":ls) para = para : (paragraphs2 ls [])
paragraphs2 (l:ls) para = paragraphs2 ls (para++[l])
This works:
*Main> paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["Line 3","Line 4"]]
So that's a solution. But then, Haskell experience suggests that there are almost always library functions for doing things like this :) One related function is called groupBy, and it almost works:
paragraphs3 :: [Line] -> [Para]
paragraphs3 ls = groupBy (\x y -> y /= "") ls
*Main> paragraphs3 ["Line 1", "Line 2", "", "Line 3", "Line 4"]
[["Line 1","Line 2"],["","Line 3","Line 4"]]
Oops. What we really need is a "splitBy", and it's not in the libraries, but we can filter out the bad ones ourselves:
paragraphs4 :: [Line] -> [Para]
paragraphs4 ls = map (filter (/= "")) (groupBy (\x y -> y /= "") ls)
or, if you want to be cool, you can get rid of the argument and write it in point-free ("pointless") style:
paragraphs5 = map (filter (/= "")) . groupBy (\x y -> y /= "")
I'm sure there is an even shorter way. :-)
Edit: ephemient points out that (not . null) is cleaner than (/= ""). So we can write
paragraphs = map (filter $ not . null) . groupBy (const $ not . null)
The repeated (not . null) is a strong indication that we really should abstract this out into a function, and this is what the Data.List.Split module does, as pointed out in the answer below.
I'm also trying to learn Haskell. A solution for this question could be:
paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs rest)
  where (p, rest) = span (/= "") (dropWhile (== "") lines)
where I'm using the functions from Data.List. The ones I'm using are already available from the Prelude, but you can find their documentation in the link.
The idea is to find the first paragraph using span (/= ""). This will return the paragraph, and the lines following. We then recurse on the smaller list of lines which I call rest.
Before splitting out the first paragraph, we drop any empty lines using dropWhile (== ""). This is important to eat the empty line(s) separating the paragraphs. My first attempt was this:
paragraphs :: [String] -> [[String]]
paragraphs [] = []
paragraphs lines = p : (paragraphs $ tail rest)
  where (p, rest) = span (/= "") lines
but this fails when we reach the final paragraph, since rest is then the empty list:
*Main> paragraphs ["foo", "bar", "", "hehe", "", "bla", "bla"]
[["foo","bar"],["hehe"],["bla","bla"]*** Exception: Prelude.tail: empty list
Dropping empty lines solves this, and it also makes the code treat any number of empty lines as a paragraph separator, which is what I would expect as a user.
The cleanest solution would be to use something appropriate from the split package.
You'll need to install that first, but then Data.List.Split.splitWhen null should do the job perfectly.
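If installing the split package is not an option, the idea can be sketched with base only. splitWhen' below is a hand-written stand-in intended to mimic Data.List.Split.splitWhen; it is not the library's implementation.

```haskell
-- Split a list at every element satisfying the predicate,
-- dropping the separators themselves.
splitWhen' :: (a -> Bool) -> [a] -> [[a]]
splitWhen' p xs = case break p xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitWhen' p rest

-- Paragraphs are the non-empty chunks between empty lines.
paragraphs :: [String] -> [[String]]
paragraphs = filter (not . null) . splitWhen' null

main :: IO ()
main = print (paragraphs ["Line 1", "Line 2", "", "", "Line 3", "Line 4", ""])
-- prints [["Line 1","Line 2"],["Line 3","Line 4"]]
```

The extra filter (not . null) drops the empty chunks that appear between consecutive separators or after a trailing one.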
Think recursively.
get_paragraphs [] paras para = paras ++ [para]
get_paragraphs ("":ls) paras para = get_paragraphs ls (paras ++ [para]) []
get_paragraphs (l:ls) paras para = get_paragraphs ls paras (para ++ [l])
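The accumulator-based versions above append with ++ [x], which costs time proportional to the length of the list being extended on every step. One possible alternative, sketched here with foldr, builds the paragraphs from the right so that each line is consed in constant time. It is meant to give the same results as the accumulator versions, including empty paragraphs for consecutive or trailing empty lines.

```haskell
-- Split lines into paragraphs at every empty line, building from the right.
paragraphs :: [String] -> [[String]]
paragraphs = foldr step [[]]
  where
    -- An empty line starts a fresh paragraph in front of the others.
    step "" acc          = [] : acc
    -- Any other line is added to the front of the current paragraph.
    step l  (para : acc) = (l : para) : acc
    -- Not reachable, because the accumulator always holds at least one paragraph.
    step l  []           = [[l]]

main :: IO ()
main = print (paragraphs ["Line 1", "Line 2", "", "Line 3", "Line 4"])
-- prints [["Line 1","Line 2"],["Line 3","Line 4"]]
```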
You want to group the lines, so groupBy from Data.List seems like a good candidate. It uses a custom function to determine which lines are "equal" so one can supply something that makes lines in the same paragraph "equal". For example:
import Data.List( groupBy )
inpara :: String -> String -> Bool
inpara _ "" = False
inpara _ _ = True
paragraphs :: [String] -> [[String]]
paragraphs = groupBy inpara
This has some limitations, since inpara can only compare two lines at a time and more complex logic doesn't fit into the framework given by groupBy. A more elementary solution is more flexible. Using basic recursion one can write:
paragraphs [] = []
paragraphs as = para : paragraphs (dropWhile null remainder)
  where (para, remainder) = span (not . null) as
        -- span splits the list at the first empty line
span splits a list at the point the supplied function becomes false (the first empty line), dropWhile removes leading elements for which the supplied function is true (any leading empty lines).
Better late than never.
import Data.List.Split (splitOn)
paragraphs :: String -> [[String]]
paragraphs s = filter (not . null) $ map words $ splitOn "\n\n" s
paragraphs "a\nb\n\nc\nd" == [["a", "b"], ["c", "d"]]
paragraphs "\n\na\nb\n\n\nc\nd\n\n\n" == [["a", "b"], ["c", "d"]]
paragraphs "\n\na\nb\n\n \n c\nd\n\n\n" == [["a", "b"], ["c", "d"]]

Resources