Haskell readFile with text encoding as argument - haskell

I needed a version of readFile taking text encoding as its argument. I ended up with following:
readFile' e name = openFile name ReadMode >>= (flip hSetEncoding $ e) >&&> hGetContents
f >&&> g = \x -> f x >> g x
Is there a better way to do this?
It seems like the thing I defined as >&&> should be something standard but I couldn't find it.
Thanks,
Adam

It's liftM2 (>>), with Control.Monad.Instances imported. There's no more succinct version of it in the standard libraries.

I think the simple "do block" approach reads nicely for this, not that it is any more succinct.
readFile' e name = do {h <- openFile name ReadMode; hSetEncoding h e; hGetContents h}

Related

How to check that two files are equal in Haskell

I'm learning Haskell and I need to compare two files.
I did not find a function that does this, so I coded it myself. Below is the function I came up with.
cmpFiles :: FilePath -> FilePath -> IO Bool
cmpFiles a b = withBinaryFile a ReadMode $ \ha ->
withBinaryFile b ReadMode $ \hb ->
fix (\loop -> do
isEofA <- hIsEOF ha
isEofB <- hIsEOF hb
if | isEofA && isEofB -> return True -- both files reached EOF
| isEofA || isEofB -> return False -- only one reached EOF
| otherwise -> do -- read content
x <- hGet ha 4028 -- TODO: How to use a constant?
y <- hGet hb 4028 -- TODO: How to use a constant?
if x /= y
then return False -- different content
else loop -- same content, contunue...
)
My questions are:
Is this code idiomatic? It looks very imperative rather than functional.
Is this code efficient (Layz IO issues with big files, performance...)?
Is there a more compact way to write it?
How about
cmpFiles a b = do
aContents <- readFile a
bContents <- readFile b
return (aContents == bContents)
You can even make a one-liner out of it:
cmpFiles a b = liftM2 (==) (readFile a) (readFile b)
This one is actually equivalent to Reid Barton's solution. Equivalent is not a weasel word here, if you take the definition of liftM2 from hackage
liftM2 f m1 m2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) }
and insert (==) and the readFiles you are there immediately.
Laziness is your friend in haskell. The documentation of readFile states that the input is read lazily, i.e. only on demand. == is lazy, too. Thus the entire liftM22 ... reads the files only until it finds a difference.
cmpFiles a b = (==) <$> readFile a <*> readFile b

Why I can't use the (cnt <- hGetContents h) expression instead of cnt?

I learn Haskell. It works fine:
import System.IO
main = do
h <- openFile "text.txt" ReadMode
cnt <- hGetContents h
mapM_ putStrLn $ lines cnt
hClose h
But this isn't working:
import System.IO
main = do
h <- openFile "text.txt" ReadMode
mapM_ putStrLn $ lines (cnt <- hGetContents h)
hClose h
Why my second variant isn't working? I expected both variants are equal, because the (cnt <- hGetContents h) is an expression and returns the value too.
The problem is that cnt <- hGetContents h is not an expression, it's some special syntax sugar inside do notation. This means it is a different way of writing the following normal Haskell code:
hGetContents h >>= \ cnt -> {- rest of do block -}
The part before the {- rest of the do block -} is not a whole expression here, because the rest of the do block is needed to complete the lambda's body.
You could desugar it manually to get something like:
hGetContents h >>= \ cnt -> mapM_ putStrLn (lines cnt)
or the point-free version
hGetContents h >>= mapM_ putStrLn . lines
You can tell that it's a special expression because it introduces a new identifier (cnt) that you can use in the rest of your code, outside of the expression itself. This is not something that normal Haskell expressions get to do (at least without compile-time magic).
cnt <- hGetContents h is essentially syntactical sugar for hGetContents h >>= \cnt ->.
It is not an expression, it is sugar intended for its own line in a do-block.
If you still want to keep it on one line, you can do this, though you will not be able to refer to the file's contents later on:
import System.IO
main = do
h <- openFile "text.txt" ReadMode
hGetContents h >>= mapM_ putStrLn . lines
hClose h

haskell evaluation $ sign

I am going through 'learn you some haskell' and I have written following application:
import System.IO
main = do
filename <- getLine
handle <- openFile filename ReadMode
content <- hGetContents handle
putStr . unlines . (map isLong) . lines $ content
hClose handle
isLong :: String -> String
isLong x = if length x > 10 then x ++ " long enough" else x ++ " could be better!"
And it works but when I remove "$" between lines and content a compilation fails.
Could you help me understand why this is wrong?
I was thinking that I compose statements with dots and I get a function (String -> IO ()) and I apply it to the "content" but why is "$" needed here?.
Thanks!
The operator (.) has type (b -> c) -> (a -> b) -> a -> c.... Its first two inputs must be functions.
lines content, however, is of type [String], not a function, hence f . lines content will fail. The compiler treats it as
f . (lines content)
By adding the ($), you change the precedence, and it becomes
f . lines $ content = (f . lines) $ content
which works, because f and lines are both functions.
The dollar sign in haskell is used for application of a function on a value.
Its backround is, that you do not need complicated parentheses in terms.

Haskell: Reading from /proc. Issues with strictness and laziness. Process statistics

I have really strange behaviour while reading files from /proc
If I read /proc/pid/stat lazily with prelude's readFile - it works but not the way I want.
Switching to strict reading with Data.ByteString.readFile gives me an empty string.
I need strict reading here to be able to compare the results of two reads within short interval.
So using System.IO.readFile to read /proc/pid/stat simply does not work. It gives me the same result within 0.5 sec interval. I figure this is due to laziness and half closed handle or something ...
Opening and closing the file handle explicitly works.
h <- openFile "/proc/pid/stat" ReadMode
st1 <- hGetLine h; hClose h
But why do the above if we have the bytestring strict reading. Right?
This is where I got stuck.
import qualified Data.ByteString as B
B.readFile "/proc/pid/stat" >>= print
This always returns an empty string. Also tested in GHCI.
Any suggestions. Thanks.
--- UPDATE ---
Thank you Daniel for suggestions.
This is what I actually need to do. This might help to show my dilemma in full and bring more general suggestions.
I need to calculate process statistics. Here is part of the code (just the CPU usage) as an example.
cpuUsage pid = do
st1 <- readProc $ "/proc" </> pid </> "stat"
threadDelay 500000 -- 0.5 sec
st2 <- readProc $ "/proc" </> pid </> "stat"
let sum1 = (read $ words st1 !! 13) +
(read $ words st1 !! 14)
sum2 = (read $ words st2 !! 13) +
(read $ words st2 !! 14)
return $ round $ fromIntegral (sum2 - sum1) * jiffy / delay * 100
where
jiffy = 0.01
delay = 0.5
readProc f = do
h <- openFile f ReadMode
c <- hGetLine h
hClose h
return c
Prelude.readFile does not work due to the laziness
Strict functions from ByteString don't work. Thank you Daniel for the explanation.
withFile would work (it closes the handle properly) if I stuffed the whole computation in it but then the interval will not be strictly 0.5 as computations take time.
Opening and closing handles explicitly and using hGetContents does not work! For the same reason readFile doesn't.
The only thing that work in this situation is explicitly opening and closing handles with hGetLine in above code snippet. But this is not good enough as some proc files are more then one line like /proc/meminfo.
So I need a function that would read the whole file strictly. Something like hGetContents but strict.
I was trying to do this:
readProc f = do
h <- openFile f ReadMode
c <- hGetContents h
let c' = lines c
hClose h
return c'
Hoping that lines would trigger it to read the file in full. No luck. Still get an empty list.
Any help, suggestion is very appreciated.
The ByteString code is
readFile :: FilePath -> IO ByteString
readFile f = bracket (openBinaryFile f ReadMode) hClose
(\h -> hFileSize h >>= hGet h . fromIntegral)
But /proc/whatever isn't a real file, it's generated on demand, when you stat them to get the file size, you get 0. So ByteString's readFile successfully reads 0 bytes.
Before coding this type of thing, it's usually a good idea to check if something already exists on Hackage. In this case, I found the procstat package, which seems to work nicely:
import System.Linux.ProcStat
cpuUsage pid = do
Just before <- fmap procTotalTime <$> procStat pid
threadDelay 500000 -- 0.5 sec
Just after <- fmap procTotalTime <$> procStat pid
return . round $ fromIntegral (after - before) * jiffy / delay * 100
where
procTotalTime info = procUTime info + procSTime info
jiffy = 0.01
delay = 0.5

Haskell, can i call function without IO output working with monads?

Why i can't do this?
Its forbidden the use of 'do' in this question :/
How i can call words in my list and at same time result an IO?
Thanks.. this is my actual code :/
main :: IO()
main =
putStr "Name of File: " >>
getLine >>=
\st ->
openFile st ReadMode >>=
\handle ->
hGetContents handle >>=
\y ->
words y >>=
\strings ->
strings !! 1 >>=
\string->
putStr string
[Edit] Solution :
main :: IO()
main =
putStr "Name of File: " >>
getLine >>=
\st ->
openFile st ReadMode >>=
\handle ->
hGetContents handle >>=
\y ->
return (words y) >>=
\strings ->
return (strings !! 1) >>=
\string->
putStr string
Use return (words y) instead of just words y. return wraps a pure value (such as the [String] that words returns) into a monad.
From your wording, it sounds like this question is homework. If so, it should be tagged as such.
(This doesn't directly answer the question, but it will make your code more idiomatic and thus easier to read.)
You are using the pattern \x -> f x >>= ... a lot, this can (and should) be eliminated: it is (mostly) unnecessary noise which obscures the meaning of the code. I won't use your code, since it is homework but consider this example (note that I'm using return as suggested by the other answer):
main = getLine >>=
\fname -> openFile fname ReadMode >>=
\handle -> hGetContents handle >>=
\str -> return (lines str) >>=
\lns -> return (length lns) >>=
\num -> print num
(It reads a file name from the user, and then prints the number of lines in that file.)
The easiest optimisation is the section where we count the number of lines (this corresponds to the part where you are separating the words and getting the second one): the number of lines in a string str is just length (lines str) (which is the same as length . lines $ str), so there is no reason for us to have the call to length and the call to lines separate. Our code is now:
main = getLine >>=
\fname -> openFile fname ReadMode >>=
\handle -> hGetContents handle >>=
\str -> return (length . lines $ str) >>=
\num -> print num
Now, the next optimisation is on \num -> print num. This can be written as just print. (This is called eta conversion). (You can think about this as "a function that takes an argument and calls print on it, is the same as print itself"). Now we have:
main = getLine >>=
\fname -> openFile fname ReadMode >>=
\handle -> hGetContents handle >>=
\str -> return (length . lines $ str) >>= print
The next optimisation we can do is based on the monad laws. Using the first one, we can turn return (length . lines $ str) >>= print into print (length . lines $ str) (i.e. "creating a container that contains a value (this is done by return) and then passing that value to print is the same as just passing the value to print"). Again, we can remove the parenthesis, so we have:
main = getLine >>=
\fname -> openFile fname ReadMode >>=
\handle -> hGetContents handle >>=
\str -> print . length . lines $ str
And look! We have an eta-conversion we can do: \str -> print . length . lines $ str becomes just print . length . lines. This leaves:
main = getLine >>=
\fname -> openFile fname ReadMode >>=
\handle -> hGetContents handle >>= print . length . lines
At this point, we can probably stop, since that expression is much simpler than our original one (we could keep going, by using >=> if we wanted to). Since it is so much simpler, it is also easier to debug (imagine if we had forgotten to use lines: in the original main it wouldn't be very clear, in the last one it's obvious.)
In your code, you can and should do the same: you can use things like sections (which mean \x -> x !! 1 is the same as (!! 1)), and the eta conversion and monad laws I used above.

Resources