Why does openFile with WriteMode truncate the file? How not to? - haskell

Haskell's openFile has this tidbit in the source:
-- we want to truncate() if this is an open in WriteMode, but only
-- if the target is a RegularFile. ftruncate() fails on special files
-- like /dev/null.
when (iomode == WriteMode && fd_type == RegularFile) $
  setSize fD 0
This was very surprising: I assumed that the IOModes were merely wrappers around O_WRONLY, O_RDONLY and O_RDWR. Why is this the case? How does one open a file for writing only, without truncating it?

I think IOMode was designed like the mode argument in the fopen function.
From man fopen:
w - Truncate file to zero length or create text file for writing. The stream is positioned at the beginning of the file.
If you want to open a file for writing only, without truncating it, you usually want to append to it, so just use AppendMode. If you don't want the position to be at the end of the file (and you usually don't want it at the beginning either), open the file in ReadWriteMode and use hSeek to move to the position you need; as far as I know, GHC treats AppendMode handles as non-seekable, so hSeek will not work on them.

IO modes AppendMode or ReadWriteMode should open the file for writing without truncating it.
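The asker's assumption about the raw open flags can be checked at the system-call level. The following is only a sketch in Python (not Haskell), assuming a POSIX-like system; the helper names overwrite_prefix and demo are made up for illustration. It shows that O_WRONLY on its own leaves the existing contents in place, while a "w"-style open, like fopen's "w", truncates.

```python
import os
import tempfile


def overwrite_prefix(path, data):
    """Write data at offset 0 using O_WRONLY only (no O_TRUNC, no O_APPEND)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Return the contents after an O_WRONLY write and the size after a 'w' open."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "wb") as f:
            f.write(b"hello world")

        # Plain O_WRONLY: only the first byte is replaced, the rest survives.
        overwrite_prefix(path, b"J")
        with open(path, "rb") as f:
            kept = f.read()

        # "w" mode truncates on open, even if nothing is written afterwards.
        with open(path, "wb"):
            pass
        size_after_w = os.path.getsize(path)
        return kept, size_after_w


if __name__ == "__main__":
    print(demo())
```

In Haskell terms, ReadWriteMode is the mode that corresponds most closely to the first case, since it opens the file for writing without the extra truncation step.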

Related

Haskell hGetContents not reading data written with external program

I am opening a file handle with openFile, writing data to it using ByteString.hPutStrLn, and then opening the file in an external editor. I seek the handle back to the beginning of the file with hSeek and expect to read back in the changes made with the external editor. However I am not seeing the changes made in the external editor. Example:
module Main where
import System.Process
import System.IO
import qualified Data.ByteString.Char8 as BS
main :: IO ()
main = do
  handle <- openFile "myfile" ReadWriteMode
  BS.hPutStrLn handle . BS.pack $ "hello"
  hFlush handle
  callProcess "vi" ["myfile"]
  hSeek handle AbsoluteSeek 0
  str' <- BS.hGetContents handle
  BS.putStr str'
All that ever gets printed to my terminal is "hello" no matter what changes I make while in vi. However, if I then examine the file, the changes are there. Why isn't hGetContents seeing the changes?
The editor creates a new file when it saves, so the handle in Haskell keeps reading from the old file. My vi was actually Vim, which has the writebackup option set by default.
Turning off writebackup results in the behavior I expected, as did using ed.
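The effect described here does not depend on Haskell. Below is a rough sketch in Python, assuming a POSIX system (renaming over an open file is not allowed on Windows). The two "save" functions are hypothetical stand-ins for how an editor might write a file; they are not what Vim literally does.

```python
import os
import tempfile


def save_by_rename(path, new_bytes):
    """Hypothetical editor-style save: write a new file, then rename it over the old one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(new_bytes)
    os.replace(tmp_path, path)


def save_in_place(path, new_bytes):
    """Hypothetical save that truncates and rewrites the existing file."""
    with open(path, "wb") as f:
        f.write(new_bytes)


def read_through_old_handle(save):
    """Open a handle, let `save` change the file, then re-read through the same handle."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myfile")
        # buffering=0 gives a raw handle, so no user-space buffer is involved.
        with open(path, "w+b", buffering=0) as handle:
            handle.write(b"hello\n")
            save(path, b"edited\n")
            handle.seek(0)
            return handle.read()


if __name__ == "__main__":
    print(read_through_old_handle(save_by_rename))  # old contents
    print(read_through_old_handle(save_in_place))   # new contents
```

After a rename-style save, the name "myfile" points at a different file, while the already open handle still refers to the original one. That is why reopening the file by name shows the edits but the old handle does not.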

Opening the file in r+ mode but not able to truncate it?

I am failing to truncate a previously created file after opening it in r+ mode. According to the documentation, r+ means: open for reading and writing; the stream is positioned at the beginning of the file.
from sys import argv
script, filename = argv
print(f"We're going to erase {filename}")
print("If you don't want that, hit Ctrl+C (^C).")
print("If you do want that, hit return")
input("?")
print("Opening the file...")
target = open(filename, 'r+')
print(target.read())
print("Truncating the file, Goodbye!")
target.truncate()
The truncate method will resize the file so that it ends at the current file position (if you don't pass it a value for the size argument). When you read the file, you move the current position in the file from the start to the end. That means the truncation does nothing, since you're making the new end of the file the same as the old end position.
If you want to call truncate with no arguments, you need to seek back to the beginning of the file first, or alternatively, call truncate(0) to tell it to make the new file size zero bytes.
The "truncate" method takes an optional size parameter. If you do not specify it, the method uses the current position in the file. Since you have just read the whole file, the current position is at the end of the file, so nothing follows it and nothing gets truncated. Try passing a zero (0) to the truncate method and see what happens. Alternatively, open the file with 'w+', which truncates it on open; note that 'rw+' is not a valid mode string in Python 3 and raises a ValueError.
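The three variants discussed in the answers can be compared side by side. This is a small sketch that uses a temporary five-byte file; the helper names (size_after and the three action functions) are made up for the example.

```python
import os
import tempfile


def size_after(action):
    """Create a 5-byte file, open it 'r+', read it all, apply `action`, return the final size."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as f:
            f.write("hello")
        with open(path, "r+") as target:
            target.read()      # moves the current position to the end of the file
            action(target)
        return os.path.getsize(path)
    finally:
        os.remove(path)


def truncate_at_current_position(f):
    f.truncate()               # new end = current position = old end: no change


def truncate_to_zero(f):
    f.truncate(0)              # explicit size: the file becomes empty


def seek_then_truncate(f):
    f.seek(0)                  # go back to the start first
    f.truncate()               # new end = position 0: the file becomes empty


if __name__ == "__main__":
    for action in (truncate_at_current_position, truncate_to_zero, seek_then_truncate):
        print(action.__name__, size_after(action))
```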

Why does Haskell NoBuffering option still seem to buffer?

I loaded up a file in ghci with the following:
h <- openFile "somefile.txt" ReadMode
hSetBuffering h NoBuffering
I then modified and saved somefile.txt in a text editor. When I call hGetChar several times in ghci, I receive the old characters of the file (as if the entire file had been buffered when I opened it). I expected the calls to hGetChar to return the modified contents. Why is this not the case?
Edit:
The reason it doesn't show the modified contents in the case described above is indeed the text editor. When the cat command is used instead (cat > somefile.txt), the modified file contents are returned.
However, it still seems to be doing some buffering. Say the file contents are as follows:
ABCDEFGHI
123456789
If I run hGetChar I get the 'A' as expected.
Now if I use cat (cat > somefile.txt) to change the contents to the following, and run hGetChar again, I would expect 'Z' but it's returning 'B':
AZZZZZZZZ
BufferMode is only relevant when writing to a handle, not when reading from it.
From [note Buffered Reading] in GHC.IO.Handle.Types:
Note that the buffering mode (haBufferMode) makes no difference when
reading data into a Handle. When reading, we can always just read all
the data there is available without blocking, decode it into the Char
buffer, and then provide it immediately to the caller.
The documentation for input BufferMode seems to be outdated.

How can I get input right when it is typed in Haskell?

I have written a Brainfuck interpreter in Haskell, but it only operates on the input once I hit Ctrl-D to signal EOF. How can I make the program act on each character when it is typed?
Here is the source. To use the program, give a file to interpret as an argument or type your program in the first line of stdin.
It sounds like your input is being buffered. You can modify the buffering mode of a file handle with System.IO.hSetBuffering. If you are reading from standard input, for instance, then you could disable buffering with:
import System.IO
hSetBuffering stdin NoBuffering
getLine waits for a newline character (\n) to be typed, because what if the user typed a bunch of characters but never pressed enter? It would be an error if part of the "line" had already been processed and it then turned out not to be a line after all.
You should use getContents instead which will return everything that is typed at the terminal.
Also, you are using the following line:
then hGetContents =<< openFile (head args) ReadMode
This opens a file and never explicitly closes it (the handle is only closed once the lazily read contents have been consumed in full). This is fine for your short program, but for the future it might be a better idea to get used to doing this:
then readFile $ head args

Empty a file while in use in linux

I'm trying to empty a file on Linux while it is in use; it's a log file, so it is continuously written to.
Right now I've used:
echo -n > filename
or
cat /dev/null > filename
but all of these produce a file that is not really empty: it contains a newline character (or strange characters that show up as ^@^@^@^@^@^@^@^@^@^@^@.. in vi), and I have to remove the first line manually with vi and dd and then save.
If I don't use vi and dd, I'm not able to process the file with grep, but I need an automatic procedure that I can put in a shell script.
Ideas?
This should be enough to empty a file:
> file
However, the other methods you said you tried should also work. If you're seeing weird characters, then they are being written to the file by something else - most probably whatever process is logging there.
What's going on is fairly simple: you are emptying out the file.
Why is it full of ^@s, then, you ask? Well, in a very real sense, it is not. It does not contain those weird characters. It has a "hole".
The program that is writing to the file is writing a file that was opened with O_WRONLY (or perhaps O_RDWR) but not O_APPEND. This program has written, say, 65536 bytes into the file at the point when you empty out the file with cp /dev/null filename or : > filename or some similar command.
Now the program goes to write another chunk of data (say, 4096 or 8192 bytes). Where will that data be written? The answer is: "at the current seek offset on the underlying file descriptor". If the program used O_APPEND the write would be, in effect, preceded by an lseek call that did a "seek to current end-of-file, i.e., current length of file". When you truncate the file that "current end of file" would become zero (the file becoming empty) so the seek would move the write offset to position 0 and the write would go there. But the program did not use O_APPEND, so there is no pre-write "reposition" operation, and the data bytes are written at the current offset (which, again, we've claimed to be 65536 above).
You now have a file that has no data in byte offsets 0 through 65535 inclusive, followed by some data in byte offsets 65536 through 73727 (assuming the write writes 8192 bytes). That "missing" data is the "hole" in the file. When some other program goes to read the file, the OS pretends there is data there: all-zero-byte data.
If the program doing the write operations does not do them on block boundaries, the OS will in fact allocate some extra data (to fit the write into whole blocks) and zero it out. Those zero bytes are not part of the "hole" (they're real zero bytes in the file) but to ordinary programs that do not peek behind the curtain at the Wizard of Oz, the "hole" zero-bytes and the "non-hole" zero bytes are indistinguishable.
What you need to do is to modify the program to use O_APPEND, or to use library routines like syslog that know how to cooperate with log-rotation operations, or perhaps both.
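The scenario above can be reproduced in a few lines. This is only a sketch, assuming a POSIX system and the same illustrative sizes as in the explanation (65536 bytes written before the truncation, 8192 after); the function name write_after_truncate and the file name app.log are invented for the example.

```python
import os
import tempfile


def write_after_truncate(extra_flags=0):
    """Write 64 KiB, let 'someone else' empty the file, write 8 KiB more.

    Returns the final file length and its first four bytes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o644)
        try:
            os.write(fd, b"x" * 65536)
            # Another process empties the file, as `: > app.log` would.
            with open(path, "wb"):
                pass
            # The logger keeps writing through its original descriptor.
            os.write(fd, b"y" * 8192)
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            data = f.read()
        return len(data), data[:4]


if __name__ == "__main__":
    print(write_after_truncate())             # no O_APPEND: old offset is kept
    print(write_after_truncate(os.O_APPEND))  # O_APPEND: write lands at the new end
```

Without O_APPEND the second write lands at offset 65536, so the file is 73728 bytes long and begins with zero bytes. With O_APPEND the write goes to the current end of the (now empty) file, and the result is just the 8192 new bytes.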
[Edit to add: not sure why this suddenly showed up on the front page and I answered a question from 2011...]
Another way is the following:
cp /dev/null the_file
The advantage of this technique is that it is a single command, so in case it needs sudo access only one sudo call is required.
Why not just :>filename?
(: is a bash builtin having the same effect as /bin/true, and both commands don't echo anything)
Proof that it works:
fg@erwin ~ $ du t.txt
4 t.txt
fg@erwin ~ $ :>t.txt
fg@erwin ~ $ du t.txt
0 t.txt
If it's a log file, then the proper way to do this is to use logrotate. As you mentioned, doing it manually does not work.
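I don't have a Linux shell here to try it, but have you tried this?
echo "" > file
(Note that echo "" writes a single newline, so the file ends up one byte long rather than completely empty.)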
