Haskell: Reading from /proc. Issues with strictness and laziness. Process statistics

I am seeing really strange behaviour while reading files from /proc.
If I read /proc/pid/stat lazily with the Prelude's readFile, it works, but not the way I want.
Switching to strict reading with Data.ByteString.readFile gives me an empty string.
I need strict reading here so that I can compare the results of two reads within a short interval.
Using System.IO.readFile to read /proc/pid/stat simply does not work for this: it gives me the same result for both reads 0.5 sec apart. I figure this is due to laziness and a half-closed handle or something similar.
Opening and closing the file handle explicitly works.
    h <- openFile "/proc/pid/stat" ReadMode
    st1 <- hGetLine h; hClose h
But why do the above when we have strict reading from ByteString, right?
This is where I got stuck.
    import qualified Data.ByteString as B
    B.readFile "/proc/pid/stat" >>= print
This always returns an empty string. I also tested it in GHCi.
Any suggestions? Thanks.
--- UPDATE ---
Thank you Daniel for suggestions.
This is what I actually need to do. This might help to show my dilemma in full and bring more general suggestions.
I need to calculate process statistics. Here is part of the code (just the CPU usage) as an example.
    cpuUsage pid = do
        st1 <- readProc $ "/proc" </> pid </> "stat"
        threadDelay 500000 -- 0.5 sec
        st2 <- readProc $ "/proc" </> pid </> "stat"
        let sum1 = (read $ words st1 !! 13) +
                   (read $ words st1 !! 14)
            sum2 = (read $ words st2 !! 13) +
                   (read $ words st2 !! 14)
        return $ round $ fromIntegral (sum2 - sum1) * jiffy / delay * 100
      where
        jiffy = 0.01
        delay = 0.5

    readProc f = do
        h <- openFile f ReadMode
        c <- hGetLine h
        hClose h
        return c
Prelude.readFile does not work because of laziness.
The strict functions from ByteString don't work. Thank you Daniel for the explanation.
withFile would work (it closes the handle properly) if I put the whole computation inside it, but then the interval would not be exactly 0.5 sec, since the computations take time.
Opening and closing handles explicitly and using hGetContents does not work, for the same reason readFile doesn't.
The only thing that works in this situation is explicitly opening and closing handles with hGetLine, as in the code snippet above. But this is not good enough, because some proc files, such as /proc/meminfo, have more than one line.
So I need a function that reads the whole file strictly. Something like hGetContents, but strict.
I was trying to do this:
    readProc f = do
        h <- openFile f ReadMode
        c <- hGetContents h
        let c' = lines c
        hClose h
        return c'
I was hoping that lines would force the file to be read in full. No luck: I still get an empty list.
Any help or suggestion is much appreciated.

The ByteString code is
    readFile :: FilePath -> IO ByteString
    readFile f = bracket (openBinaryFile f ReadMode) hClose
        (\h -> hFileSize h >>= hGet h . fromIntegral)
But /proc/whatever isn't a real file; its contents are generated on demand. When you stat it to get the file size, you get 0, so ByteString's readFile successfully reads 0 bytes.
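For the "hGetContents but strict" function the question asks for, one option that does not depend on the reported file size is to force the lazy String while the handle is still open. The following is only a minimal sketch: the name readFileStrict is made up for illustration, and it uses nothing beyond System.IO and Control.Exception from base.

```haskell
import Control.Exception (evaluate)
import System.IO

-- | Hypothetical helper: read a whole file into a String before the
-- handle is closed, without consulting the file size.
readFileStrict :: FilePath -> IO String
readFileStrict path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h        -- still lazy at this point
    _ <- evaluate (length contents)   -- walking the list forces the full read
    return contents
```

With something like this, readProc in the question could become readFileStrict, and multi-line files such as /proc/meminfo would be returned in full. Only the read itself happens inside withFile, so the rest of the computation stays outside the timed interval.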

Before coding this type of thing, it's usually a good idea to check if something already exists on Hackage. In this case, I found the procstat package, which seems to work nicely:
    import System.Linux.ProcStat

    cpuUsage pid = do
        Just before <- fmap procTotalTime <$> procStat pid
        threadDelay 500000 -- 0.5 sec
        Just after <- fmap procTotalTime <$> procStat pid
        return . round $ fromIntegral (after - before) * jiffy / delay * 100
      where
        procTotalTime info = procUTime info + procSTime info
        jiffy = 0.01
        delay = 0.5

Related

Why can't I use the (cnt <- hGetContents h) expression instead of cnt?

I am learning Haskell. This works fine:
    import System.IO

    main = do
        h <- openFile "text.txt" ReadMode
        cnt <- hGetContents h
        mapM_ putStrLn $ lines cnt
        hClose h
But this doesn't work:
    import System.IO

    main = do
        h <- openFile "text.txt" ReadMode
        mapM_ putStrLn $ lines (cnt <- hGetContents h)
        hClose h
Why doesn't my second variant work? I expected both variants to be equivalent, because (cnt <- hGetContents h) is an expression and returns the value too.

The problem is that cnt <- hGetContents h is not an expression; it is special syntactic sugar inside do notation. It is a different way of writing the following normal Haskell code:
    hGetContents h >>= \ cnt -> {- rest of do block -}
The part before the {- rest of do block -} is not a whole expression here, because the rest of the do block is needed to complete the lambda's body.
You could desugar it manually to get something like:
    hGetContents h >>= \ cnt -> mapM_ putStrLn (lines cnt)
or the point-free version
    hGetContents h >>= mapM_ putStrLn . lines
You can tell that it is special syntax rather than an expression because it introduces a new identifier (cnt) that you can use in the rest of your code, outside of the binding itself. This is not something that normal Haskell expressions get to do (at least without compile-time magic).
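To make the desugaring concrete, here is a small sketch that writes the same action twice, once with do notation and once with an explicit >>= and lambda. The function names and the upper-casing step are invented for illustration; the input action is passed in as a parameter so the example does not depend on any file.

```haskell
import Data.Char (toUpper)

-- do-notation version: "cnt <- getText" binds the result for the rest of the block
shoutLinesDo :: IO String -> IO [String]
shoutLinesDo getText = do
  cnt <- getText
  return (map (map toUpper) (lines cnt))

-- hand-desugared version: the rest of the block becomes the lambda's body
shoutLinesBind :: IO String -> IO [String]
shoutLinesBind getText =
  getText >>= \cnt -> return (map (map toUpper) (lines cnt))
```

Both functions should behave identically for any input action, which is the sense in which the bind arrow is only notation.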

cnt <- hGetContents h is essentially syntactical sugar for hGetContents h >>= \cnt ->.
It is not an expression, it is sugar intended for its own line in a do-block.
If you still want to keep it on one line, you can do this, though you will not be able to refer to the file's contents later on:
    import System.IO

    main = do
        h <- openFile "text.txt" ReadMode
        hGetContents h >>= mapM_ putStrLn . lines
        hClose h

I am trying to use lazy I/O, but the entire file is consumed

I am a Haskell newbie. I want to read only N characters of a text file into memory. So I wrote this code:
    main :: IO ()
    main = do
        inh <- openFile "input.txt" ReadMode
        transformedList <- Control.Monad.liftM (take 4) $ transformFileToList inh
        putStrLn "transformedList became available"
        putStrLn transformedList
        hClose inh

    transformFileToList :: Handle -> IO [Char]
    transformFileToList h = transformFileToListAcc h []

    transformFileToListAcc :: Handle -> [Char] -> IO [Char]
    transformFileToListAcc h acc = do
        readResult <- tryIOError (hGetChar h)
        case readResult of
            Left e -> if isEOFError e then return acc else ioError e
            Right c -> do let acc' = acc ++ [transformChar c]
                          putStrLn "got char"
                          unsafeInterleaveIO $ transformFileToListAcc h acc'
My input file has several lines, with the first one being "hello world", and when I run this program, I get this output:
got char
transformedList became available
got char
["got char" a bunch of times]
hell
My expectation is that "got char" happens only 4 times. Instead, the entire file is read, one character at a time, and only THEN the first 4 characters are taken.
What am I doing wrong?

I admit I don't understand how unsafeInterleaveIO works, but I suspect the problem here is somehow related to it. Maybe you are using this example to understand unsafeInterleaveIO, but if I were you I'd try to avoid using it directly. Here is how I'd do it in your particular case.
    main :: IO ()
    main = do
        inh <- openFile "input.txt" ReadMode
        charList <- replicateM 4 $ hGetChar inh
        let transformedList = map transformChar charList
        putStrLn "transformedList became available"
        putStrLn transformedList
        hClose inh
This should read just the first 4 characters of the file.
If you are looking for a truly effectful streaming solution, I'd look into pipes or conduit instead of unsafeInterleaveIO.
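As a possible explanation of the behaviour in the question: the accumulator version only returns a list once it hits EOF, so take 4 has nothing to inspect until every deferred step has run. A lazy reader has to return each cons cell as soon as its character is read and defer only the tail. The sketch below shows that shape; lazyChars is a made-up name, and it leaves out the asker's transformChar, which could be applied afterwards with map.

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Hypothetical lazy reader: each cons cell is produced on demand.
-- The handle must stay open for as long as the list may still be consumed.
lazyChars :: Handle -> IO String
lazyChars h = unsafeInterleaveIO $ do
  eof <- hIsEOF h
  if eof
    then return []
    else do
      c <- hGetChar h
      rest <- lazyChars h   -- deferred: runs only when the tail is demanded
      return (c : rest)
```

With this shape, take 4 <$> lazyChars inh followed by printing the result should pull four characters from the handle and leave the rest unread. The usual caveats of lazy I/O still apply, which is part of why the answer above recommends replicateM or a streaming library.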

Portably opening a handle to stdin many times in a single session

Code:
    main = do
        putStrLn "4917 Microprocessor\nEnter the Machine Code to be run: "
        inp <- getContents
        putStrLn "The output of the Program is:"
        fState <- ((runStateT _4917) . construct . parse) inp
        args <- getArgs
        if elem "-v" args then putStrLn ("\nFinal state was: " ++ (show . snd) fState) else return ()
        putStrLn "\n================================ RESTART ================================"
        main
      where parse xs = array (0,15) $
                       zip [0..15] $
                       take 16 $
                       map ((makeArbInt 4) . read) (words (filter ((/=)'.') xs)) ++ repeat (makeArbInt 4 0)
            construct xs = Program xs z z z z 0 False
            z = (makeArbInt 4 0)
There's more but this is the relevant part. Basically line 3 needs to be evaluated multiple times but getContents is closing the stdin handle:
4917: <stdin>: hGetContents: illegal operation (handle is closed)
Is there a way to reopen the handle? Or some way of preventing getContents from doing that? (Maybe I'm sending the wrong signal. I'm sending over a Ctrl-D EOF on Linux. Maybe I should use EOT or something instead?)
Edit: I've managed to get the desired behaviour, but it won't port to Windows.
    mystdin <- openFile "/dev/tty" ReadMode
    inp <- hGetContents mystdin
New question: is there a generic way to portably open a handle to stdin?

You cannot prevent getContents from closing the file, and a closed file stays closed.
It seems to me that you need some other function. Usually, when you read parts of a file, you know when to stop (end of line, certain bytes on the stream). So yes: If you cannot parse the data that you are reading and detect when it is done, you should use some kind of delimiter (possibly EOT, or an empty line, or special text unlikely to occur in the data like __END__).

Have them enter a blank line to end input:
    getContents2 = helper "" where
        helper str = do
            a <- getLine
            if "" == a then return str
                       else helper (str++"\n"++a)
You might also have them signal the end of the session by entering a single blank line.

To open a handle to stdin portably, use the hDuplicate function on the existing stdin handle to get a new one:
    mystdin <- hDuplicate stdin
    inp <- hGetContents mystdin
Make sure never to close the original stdin, so that you can make more duplicates as appropriate. (I'm not sure if this is good Haskell style)
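A slightly fuller sketch of the same idea follows. hDuplicate is exported from GHC.IO.Handle in base; the helper name readViaDuplicate is invented here, and the function takes an arbitrary handle so it can be tried on a file as well as on stdin.

```haskell
import GHC.IO.Handle (hDuplicate)
import System.IO

-- | Hypothetical helper: read everything from a duplicate of the handle,
-- so that hGetContents closes the duplicate rather than the original.
readViaDuplicate :: Handle -> IO String
readViaDuplicate h = do
  dup <- hDuplicate h
  s <- hGetContents dup
  length s `seq` return s   -- force the read before returning
```

Calling readViaDuplicate stdin once per round of input would match the usage above. Whether a second round sees fresh input after an end-of-file on the terminal depends on the platform and terminal, so treat that part as something to test rather than a guarantee.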

Haskell readFile with text encoding as argument

I needed a version of readFile taking text encoding as its argument. I ended up with following:
    readFile' e name = openFile name ReadMode >>= (flip hSetEncoding $ e) >&&> hGetContents
    f >&&> g = \x -> f x >> g x
Is there a better way to do this?
It seems like the thing I defined as >&&> should be something standard but I couldn't find it.
Thanks,
Adam

It's liftM2 (>>), with Control.Monad.Instances imported. There's no more succinct version of it in the standard libraries.
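To see why the two agree, here is a small sketch that defines the question's operator next to the liftM2 form, using the Monad instance for functions. The name andThen is made up for illustration. In recent versions of base that instance is available without importing Control.Monad.Instances.

```haskell
import Control.Monad (liftM2)

-- The operator from the question, with an explicit type.
(>&&>) :: Monad m => (a -> m b) -> (a -> m c) -> a -> m c
f >&&> g = \x -> f x >> g x

-- The same thing via liftM2 in the function monad:
-- liftM2 op f g = \x -> f x `op` g x
andThen :: Monad m => (a -> m b) -> (a -> m c) -> a -> m c
andThen = liftM2 (>>)
```

Both feed the same argument to f and g and sequence the two resulting actions, keeping only the result of g.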

I think the simple "do block" approach reads nicely for this, not that it is any more succinct.
    readFile' e name = do {h <- openFile name ReadMode; hSetEncoding h e; hGetContents h}

Limiting memory usage when reading files

I'm a Haskell beginner and thought this would be a good exercise. I have an
assignment where I need to read a file in thread A, handle the file lines
in threads B_i, and then output the results in thread C.
I have implemented it this far already, but one of the requirements is that we
cannot trust that the entire file fits into memory. I was hoping that lazy
IO and garbage collector would do this for me, but alas the memory usage
keeps rising and rising.
The reader thread (A) reads the file with readFile which is then zipped
with line numbers and wrapped in Just. These zipped lines are then written
to Control.Concurrent.Chan. Each consumer thread B has its own channel.
Each consumer reads their own channel when it has data and if the regex
matches, it's outputted to their own respective output channel wrapped
within Maybe (made of lists).
The printer checks the output channel of each of the B threads. If none of
the results (line) is Nothing, the line is printed. Since at this point
there should be no reference to the older lines, I thought that the garbage
collector would be able to release these lines, but alas I seem to be in
the wrong here.
The .lhs file is in here:
http://gitorious.org/hajautettujen-sovellusten-muodostamistekniikat/hajautettujen-sovellusten-muodostamistekniikat/blobs/master/mgrep.lhs
So the question is, how do I limit the memory usage, or allow the garbage
collector to remove the lines.
Snippets as per requested. Hopefully indenting isn't too badly destroyed :)
    data Global = Global {done :: MVar Bool, consumers :: Consumers}
    type Done = Bool
    type Linenum = Int
    type Line = (Linenum, Maybe String)
    type Output = MVar [Line]
    type Input = Chan Line
    type Consumers = MVar (M.Map ThreadId (Done, (Input, Output)))
    type State a = ReaderT Global IO a

    producer :: [Input] -> FilePath -> State ()
    producer c p = do
        liftIO $ Main.log "Starting producer"
        d <- asks done
        f <- liftIO $ readFile p
        mapM_ (\l -> mapM_
                (liftIO . flip writeChan l) c)
            $ zip [1..] $ map Just $ lines f
        liftIO $ modifyMVar_ d (return . not)

    printer :: State ()
    printer = do
        liftIO $ Main.log "Starting printer"
        c <- (fmap (map (snd . snd) . M.elems)
                (asks consumers >>= liftIO . readMVar))
        uniq' c
      where head' :: Output -> IO Line
            head' ch = fmap head (readMVar ch)
            tail' = mapM_ (liftIO . flip modifyMVar_
                            (return . tail))
            cont ch = tail' ch >> uniq' ch
            printMsg ch = readMVar (head ch) >>=
                            liftIO . putStrLn . fromJust . snd . head
            cempty :: [Output] -> IO Bool
            cempty ch = fmap (any id)
                            (mapM (fmap ((==) 0 . length) . readMVar ) ch)
            {- Return false unless none are Nothing -}
            uniq :: [Output] -> IO Bool
            uniq ch = fmap (any id . map (isNothing . snd))
                            (mapM (liftIO . head') ch)
            uniq' :: [Output] -> State ()
            uniq' ch = do
                d <- consumersDone
                e <- liftIO $ cempty ch
                if not e
                    then do
                        u <- liftIO $ uniq ch
                        if u then cont ch else do
                            liftIO $ printMsg ch
                            cont ch
                    else unless d $ uniq' ch

Concurrent programming offers no defined execution order unless you enforce one yourself with MVars and the like. So it's likely that the producer thread puts all or most of the lines in the Chan before any consumer reads them off and passes them on. Another architecture that should fit the requirements is to have thread A call the lazy readFile and put the result in an MVar. Then each consumer thread takes the MVar, reads a line, and puts the rest back before proceeding to handle the line. Even then, if the output thread can't keep up, the number of matching lines stored on the output Chan can build up arbitrarily.
What you have is a push architecture. To really make it work in constant space, think in terms of demand driven. Find a mechanism such that the output thread signals to the processing threads that they should do something, and such that the processing threads signal to the reader thread that they should do something.
Another way to do this is to have Chans of limited size instead, so that the reader thread blocks when the processor threads haven't caught up, and the processor threads block when the output thread hasn't caught up.
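Base's Control.Concurrent.Chan is unbounded, so a limited-size channel has to come from elsewhere. One way to sketch the idea with only base is to pair a Chan with a counting semaphore. All names below (BoundedChan, newBoundedChan, writeBounded, readBounded) are invented for this illustration, and the sketch ignores asynchronous exceptions. The stm package's TBQueue is an existing library alternative.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)

-- | Hypothetical bounded channel: the semaphore counts free slots.
data BoundedChan a = BoundedChan QSem (Chan a)

newBoundedChan :: Int -> IO (BoundedChan a)
newBoundedChan capacity = BoundedChan <$> newQSem capacity <*> newChan

-- | Blocks while the channel already holds 'capacity' items.
writeBounded :: BoundedChan a -> a -> IO ()
writeBounded (BoundedChan slots ch) x = do
  waitQSem slots
  writeChan ch x

-- | Takes one item and frees a slot for a blocked writer.
readBounded :: BoundedChan a -> IO a
readBounded (BoundedChan slots ch) = do
  x <- readChan ch
  signalQSem slots
  return x
```

If the producer in the question wrote to channels like this, it would stall once a consumer fell a fixed number of lines behind, which bounds how much of the file has to be held in memory at any time.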
As a whole, the problem in fact reminds me of Tim Bray's widefinder benchmark, although the requirements are somewhat different. In any case, it led to a widespread discussion on the best way to implement multicore grep. The big punchline was that the problem is IO bound, and you want multiple reader threads over mmapped files.
See here for more than you'll ever want to know: http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder
