Expanding the abbreviated words from a file in Haskell

I am new to working with files in Haskell. I wrote some code to check for occurrences of words in a .c file. The words are listed in a .txt file.
For example:
abbreviations.txt
ix=index
ctr=counter
tbl=table
Another file is:
main.c
main ()
{
    ix = 1
    for (ctr =1; ctr < 10; ctr++)
    {
        tbl[ctr] = ix
    }
}
On encountering ix, it should be expanded to index, and likewise for ctr and tbl.
This is the code I wrote to check for occurrences (it does not yet replace the words it finds):
import System.Environment
import System.IO
import Data.Char
import Control.Monad
import Data.Set
main = do
  s <- getLine
  f <- readFile "abbreviations.txt"
  g <- readFile s
  let dict = fromList (lines f)
  mapM_ (spell dict) (words g)
spell d w = when (w `member` d) (putStrLn w)
When I run the code, it produces no output.
Instead of the code above, I tried reading the file line by line with hGetLine and then converting each line into a list of words using words:
getLines' h = do
  isEOF <- hIsEOF h
  if isEOF then
    return ()
  else
    do
      line <- hGetLine h
      list<-remove (return (words line))
      getLines' h
      -- print list
main = do
  inH <- openFile "abbreviations.txt" ReadMode
  getLines' inH
  hClose inH
remove [] = []
remove (x:xs)| x == "=" = remove xs
             | otherwise = x:remove (xs)
But it gives me errors relating to IO (). Is there another way I could do this?
Where am I going wrong?
Thank you for any help.

First, your spell function can also be written with an explicit if ... else. Your version using when is equivalent, since when supplies the return () branch for you:
spell :: (Show a, Ord a) => Set a -> a -> IO ()
spell d w = if w `member` d
  then print w
  else return ()
Also, note that I have changed your putStrLn to print and added a type signature to your code.
You wrote: "On executing the code it is giving no output."
That's because spell always takes the else branch. If you trace the execution of your program, you will see that dict actually contains this Set: fromList ["ctr=counter","ix=index","tbl=table"]. Its elements are whole lines of abbreviations.txt, so none of them is equal to a word from main.c such as ix or ctr. I hope this is enough to get you started.
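One possible next step, as a rough sketch rather than the code of either poster: split each line of abbreviations.txt at the '=' sign into a Data.Map, then replace every identifier-like token of the source file that appears as a key. The names parseDict, tokens and expand are made up for illustration, and the rule that an identifier consists of letters, digits and underscores is an assumption.

```haskell
import qualified Data.Map as Map
import Data.Char (isAlphaNum)
import Data.Function (on)
import Data.List (groupBy)

type Dict = Map.Map String String

-- Parse lines of the form "abbr=expansion"; lines without '=' are skipped.
parseDict :: String -> Dict
parseDict txt = Map.fromList
  [ (k, v) | l <- lines txt, (k, '=':v) <- [break (== '=') l] ]

-- Assumed definition of an identifier character.
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_'

-- Split the text into maximal runs of identifier and non-identifier
-- characters, so that concatenating the tokens gives back the input.
tokens :: String -> [String]
tokens = groupBy ((==) `on` isIdentChar)

-- Replace every token that is a key of the dictionary; keep the rest.
expand :: Dict -> String -> String
expand dict = concatMap (\t -> Map.findWithDefault t t dict) . tokens

main :: IO ()
main = do
  dict <- parseDict <$> readFile "abbreviations.txt"
  src  <- readFile "main.c"
  putStr (expand dict src)
```

Because tokens keeps punctuation and whitespace as separate runs, tbl[ctr] is expanded even though it is not surrounded by spaces, which plain words would not handle.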

Related

Remove digits from file

I'm trying to create a new file with the digits removed from its strings:
main :: IO ()
main = do
  contents <- readFile "input1.txt"
  putStr (process contents)
check = if isDigit x
x = "a"
process :: String -> String
process = map check
but I am getting this error: "Syntax error in expression (unexpected symbol "process")". What am I doing wrong?
In Haskell, if “statements” are actually expressions and must produce a value, so you need an else branch. Note also that check has to take the character x as a parameter:
import Data.Char (isDigit)
check x = if isDigit x then 'a' else x
process :: String -> String
process = map check
main :: IO ()
main = do
  contents <- readFile "input1.txt"
  putStr (process contents)
Also, if you want to remove the digits, then filter is a better option than using map check. So you can refactor process to
process :: String -> String
process = filter (not . isDigit)
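Since the question asks for a new file rather than output on the terminal, here is a minimal sketch that writes the filtered text with writeFile. The output name "output1.txt" is an assumption, not something given in the question.

```haskell
import Data.Char (isDigit)

-- Drop every digit character, keep everything else.
process :: String -> String
process = filter (not . isDigit)

main :: IO ()
main = do
  contents <- readFile "input1.txt"
  -- "output1.txt" is a hypothetical name for the new file
  writeFile "output1.txt" (process contents)
```

Writing to a different file than the one being read matters here: readFile is lazy, so writing back to input1.txt while it is still being read would fail.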

Read a list of integers lazily as a bytestring

I'm trying to find the sum of the integers in a file. The code using a normal String is:
main = do
  contents <- getContents
  print (sumFile contents)
  where sumFile = sum . map read . words
I tried to change it to use the Data.ByteString.Lazy module like this:
import Data.ByteString.Lazy as L
main = do
  contents <- L.getContents
  L.putStrLn (sumFile contents)
  where sumFile = sum . L.map read. words
But this was rejected because words works on String, not on ByteString. Then I tried Data.ByteString.Char8, but that uses a strict ByteString.
How can I make this function completely lazy?
I found a slightly lengthy workaround: read the input as a ByteString and then convert it to a list of integers. Thanks to @melpomene.
import qualified Data.ByteString.Lazy.Char8 as L
main :: IO ()
main = do
  contents <- L.getContents
  print (sumFile contents)
  where
    -- Parse each whitespace-separated token as an Int and add them up
    sumFile x = sum $ map (tups . L.readInt) (L.words x)
-- Take the parsed number; a token that fails to parse counts as 0
tups :: Num a => Maybe (a, b) -> a
tups (Just (a, _)) = a
tups Nothing = 0
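A somewhat shorter variant, offered only as a sketch: mapMaybe drops tokens that do not parse (which gives the same sum as counting them as 0), and foldl' keeps the running total evaluated instead of building a chain of thunks. The name sumInts is made up for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Sum all whitespace-separated integers in a lazy ByteString.
-- Tokens that L.readInt cannot parse are skipped.
sumInts :: L.ByteString -> Int
sumInts = foldl' (+) 0 . map fst . mapMaybe L.readInt . L.words

main :: IO ()
main = L.getContents >>= print . sumInts
```

L.readInt returns the parsed Int together with the unparsed remainder, so map fst keeps only the number.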

Why does this function not fail immediately?

I have the following piece of code. main gets the stdin text and sequences it through g, after which f prints its output and returns an appropriate ExitCode, which is committed using exitWith.
My question is why does this program, when run with the sample input, not terminate immediately after the first line (test) is entered, but only fails after it reads the second line (test2)? What I want to happen is for the g function to return immediately after parse1 returns Left "left: test" and not wait until the second line is entered.
Code:
import System.Exit
import Control.Monad
import Data.Either
type ErrType = String
parse1 :: String -> Either ErrType Int
parse1 "test" = Left "left: test"
parse1 _ = Left "left"
parse2 :: String -> Either ErrType Char
parse2 s = Right (head s)
g :: String -> Either String String
g str =
  let l1:l2:ls = lines str
  in either (Left . show) (Right . show) $ do
       a <- parse1 l1
       b <- parse2 l2
       return "placeholder"
main = getContents >>= f.g >>= exitWith
  where f (Right s) = putStrLn s >> return ExitSuccess
        f (Left s) = putStrLn s >> return (ExitFailure 1)
Standard input stream:
test
test2
The line
let l1:l2:ls = lines str
means that to evaluate even just l1, the whole pattern l1:l2:ls needs to match, which means that a check needs to be done that str actually contains at least two lines. With lazy input, that causes the behavior you see.
You can fix it with an explicitly lazy pattern that defers the check for the second line:
let l1 : ~(l2:ls) = lines str
or, since a top pattern in a let is implicitly lazy, you could split it up like:
let l1:ls' = lines str
    l2:ls = ls'
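The difference between the two patterns can be seen in isolation with a list whose tail is undefined. This is a small illustrative sketch; firstStrict and firstLazy are made-up names.

```haskell
-- Nested pattern without ~: demanding l1 forces the whole pattern to match,
-- so the tail of the list must already be a cons cell.
firstStrict :: [a] -> a
firstStrict xs = let l1:_l2:_ = xs in l1

-- With the lazy pattern, demanding l1 only forces the first cons cell.
firstLazy :: [a] -> a
firstLazy xs = let l1 : ~(_l2:_) = xs in l1

main :: IO ()
main = do
  print (firstLazy (1 : undefined :: [Int]))  -- prints 1
  -- print (firstStrict (1 : undefined :: [Int])) would throw, because
  -- matching _l2:_ has to evaluate the undefined tail.
```

With lazy input, evaluating the tail corresponds to waiting for the second line on stdin, which is the delay described in the question.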

Why doesn't this code operate in constant memory?

I'm using Data.Text.Lazy to process some text files. I read in 2 files and distribute their text to 3 files according to some criteria. The loop which does the processing is go'. I've designed it in a way in which it should process the files incrementally and keep nothing huge in memory. However, as soon as the execution reaches the go' part the memory keeps on increasing till it reaches around 90MB at the end, starting from 2MB.
Can someone explain why this memory increase happens and how to avoid it?
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TI
import System.IO
import System.Environment
import Control.Monad
main = do
  [in_en, in_ar] <- getArgs
  [h_en, h_ar] <- mapM (`openFile` ReadMode) [in_en, in_ar]
  hSetEncoding h_en utf8
  en_txt <- TI.hGetContents h_en
  let len = length $ T.lines en_txt
  len `seq` hClose h_en
  h_en <- openFile in_en ReadMode
  hs@[hO_lm, hO_en, hO_ar] <- mapM (`openFile` WriteMode) ["lm.txt", "tun_"++in_en, "tun_"++in_ar]
  mapM_ (`hSetEncoding` utf8) [h_en, h_ar, hO_lm, hO_en, hO_ar]
  [en_txt, ar_txt] <- mapM TI.hGetContents [h_en, h_ar]
  let txts@[_, _, _] = map T.unlines $ go len en_txt ar_txt
  zipWithM_ TI.hPutStr hs txts
  mapM_ (liftM2 (>>) hFlush hClose) hs
  print "success"
  where
    go len en_txt ar_txt = go' (T.lines en_txt) (T.lines ar_txt)
      where (q,r) = len `quotRem` 3000
            go' [] [] = [[],[],[]]
            go' en ar = let (h:bef, aft) = splitAt q en
                            (hA:befA, aftA) = splitAt q ar
                            ~[lm,en',ar'] = go' aft aftA
                        in [bef ++ lm, h:en', hA:ar']
EDIT
As per @kosmikus's suggestion, I've tried replacing zipWithM_ TI.hPutStr hs txts with a loop that prints line by line, as shown below. The memory consumption is now 2GB+!
fix (\loop lm en ar -> do
  case (en,ar,lm) of
    ([],_,lm) -> TI.hPutStr hO_lm $ T.unlines lm
    (h:t,~(h':t'),~(lh:lt)) -> do
      TI.hPutStrLn hO_en h
      TI.hPutStrLn hO_ar h'
      TI.hPutStrLn hO_lm lh
      loop lt t t')
  lm en ar
What's going on here?
The function go' builds a list with three elements, each of which is itself a list of lines ([T.Text]). The structure is built lazily: in each step of go', each of the three lists becomes known to a certain extent. However, you consume this structure by printing each element to a file in order, using the line:
zipWithM_ TI.hPutStr hs txts
So the way you consume the data does not match the way you produce the data. While printing the first of the three list elements to a file, the other two are built and kept in memory. Hence the space leak.
Update
I think that for the current example, the easiest fix would be to write to the target files during the loop, i.e., in the go' loop. I'd modify go' as follows:
go' :: [T.Text] -> [T.Text] -> IO ()
go' [] [] = return ()
go' en ar = let (h:bef, aft) = splitAt q en
                (hA:befA, aftA) = splitAt q ar
            in do
              TI.hPutStrLn hO_en h
              TI.hPutStrLn hO_ar hA
              mapM_ (TI.hPutStrLn hO_lm) bef
              go' aft aftA
And then replace the call to go and the subsequent zipWithM_ call with a plain call to:
go hs len en_txt ar_txt

How to do something with data from stdin, line by line, a maximum number of times and printing the number of line in Haskell

This code reads the number of lines to process from the first line of stdin, then it loops number_of_lines_to_process times doing some calculations and prints the result.
I want it to print the line number after the "#" in "Line #", but I don't know how to obtain it.
import IO
import Control.Monad (replicateM)
main :: IO ()
main = do
  hSetBuffering stdin LineBuffering
  s <- getLine
  let number_of_lines_to_process = read s :: Integer
  lines <- replicateM (fromIntegral(number_of_lines_to_process)) $ do
    line <- getLine
    let number = read line :: Integer
        result = number*2 --example
    putStrLn ("Line #"++": "++(show result)) --I want to print the number of the iteration and the result
  return ()
I guess that the solution to this problem is really easy, but I'm not familiar with Haskell (coding in it for the first time) and I didn't find any way of doing this. Can anyone help?
You could use forM_ instead of replicateM:
import System.IO
import Control.Monad
main :: IO ()
main = do
  hSetBuffering stdin LineBuffering
  s <- getLine
  let number_of_lines_to_process = read s :: Integer
  forM_ [1..number_of_lines_to_process] (\i -> do
    line <- getLine
    let number = read line :: Integer
        result = number * 2
    putStrLn $ "Line #" ++ show i ++ ": " ++ show result)
Note that because you use forM_ (which discards the results of each iteration) you don't need the additional return () at the end - the do block returns the value of the last statement, which in this case is the () which is returned by forM_.
The trick is to first create a list of all the line numbers you want to print, and to then loop through that list, printing each number in turn. So, like this:
import Control.Monad
import System.IO
main :: IO ()
main = do
  hSetBuffering stdin LineBuffering
  s <- getLine
  let lineCount = read s :: Int
      -- Create a list of the line numbers
      lineNumbers = [1..lineCount]
  -- `forM_` is like a "for-loop"; it takes each element in a list and performs
  -- an action function that takes the element as a parameter
  forM_ lineNumbers $ \ lineNumber -> do
    line <- getLine
    let number = read line :: Integer
        result = number*2 --example
    putStrLn $ "Line #" ++ show lineNumber ++ ": " ++ show result
  return ()
Read the definition of forM_.
By the way, I wouldn't recommend using the old Haskell98 IO library. Use System.IO instead.
You could calculate the results, enumerate them, and then print them:
import System.IO
import Control.Monad (replicateM)
-- I'm assuming you start counting from zero
enumerate xs = zip [0..] xs
main :: IO ()
main = do
  hSetBuffering stdin LineBuffering
  s <- getLine
  let number_of_lines_to_process = read s :: Integer
  lines <- replicateM (fromIntegral(number_of_lines_to_process)) $ do
    line <- getLine
    let number = read line :: Integer
        result = number*2 --example
    return result
  mapM_ putStrLn [ "Line "++show i++": "++show l | (i,l) <- enumerate lines ]
I'm still new at Haskell, so there could be problems with the program below (it does work). This program is a tail recursive implementation. The doLine helper function carries around the line number. The processing step is factored into process, which you can change according to the problem you are presented.
import System.IO
import Text.Printf
main = do
  hSetBuffering stdin LineBuffering
  s <- getLine
  let number_of_lines_to_process = read s :: Integer
  processLines number_of_lines_to_process
  return ()
-- This reads "max" lines from stdin, processing each line and
-- printing the result.
processLines :: Integer -> IO ()
processLines max = doLine 0
  where doLine i
          | i == max = return ()
          | otherwise =
              do
                line <- getLine
                let result = process line
                Text.Printf.printf "Line #%d: %d\n" (i + 1) result
                doLine (i + 1)
-- Just an example. (This doubles the input.)
process :: [Char] -> Integer
process line = let number = read line :: Integer
               in
                 number * 2
I'm a haskell rookie, so any critiques of the above are welcome.
Just as an alternative, I thought that you might enjoy an answer with minimal monad mucking and no do notation. We zip a lazy list of the user's data with an infinite list of the line number using the enumerate function to give us our desired output.
import System.IO
import Control.Monad (liftM)
--Here's the function that does what you really want with the data
example = (* 2)
--Enumerate takes a function, a line number, and a line of input and returns
--an enumerated line number of the function performed on the data
enumerate :: (Show a, Show b, Read a) => (a->b) -> Integer -> String -> String
enumerate f i x = "Line #" ++
                  show i ++
                  ": " ++
                  (show . f . read $ x) -- show . f . read handles our string conversion
-- Runover takes a list of lines and runs
-- an enumerated version of the sample over those lines.
-- The first line is the number of lines to process.
runOver :: [String] -> [String]
runOver (line:lines) = take (read line) $ --We only want to process the number of lines given in the first line
                       zipWith (enumerate example) [1..] lines -- run the enumerated example
                       -- over the list of numbers and the list of lines
-- In our main, we'll use liftM to lift our functions into the IO Monad,
-- then print each resulting line
main = liftM (runOver . lines) getContents >>= mapM_ putStrLn
