I'm trying to empty a file in Linux while it is in use; it's a log file, so it is continuously being written to.
Right now I've used:
echo -n > filename
or
cat /dev/null > filename
but both of these produce an empty file with a newline character (or strange characters that show up as ^#^#^#^#^#^#^#^#^#^#^#^.. in vi), and I have to remove the first line manually with vi and dd and then save.
If I don't use vi and dd I'm not able to manipulate the file with grep, but I need an automatic procedure that I can write in a shell script.
Ideas?
This should be enough to empty a file:
> file
However, the other methods you said you tried should also work. If you're seeing weird characters, then they are being written to the file by something else - most probably whatever process is logging there.
What's going on is fairly simple: you are emptying out the file.
Why is it full of ^#s, then, you ask? Well, in a very real sense, it is not. It does not contain those weird characters. It has a "hole".
The program that is writing to the file opened it with O_WRONLY (or perhaps O_RDWR) but not O_APPEND. This program has written, say, 65536 bytes into the file at the point when you empty out the file with cp /dev/null filename or : > filename or some similar command.
Now the program goes to write another chunk of data (say, 4096 or 8192 bytes). Where will that data be written? The answer is: "at the current seek offset on the underlying file descriptor". If the program used O_APPEND the write would be, in effect, preceded by an lseek call that did a "seek to current end-of-file, i.e., current length of file". When you truncate the file that "current end of file" would become zero (the file becoming empty) so the seek would move the write offset to position 0 and the write would go there. But the program did not use O_APPEND, so there is no pre-write "reposition" operation, and the data bytes are written at the current offset (which, again, we've claimed to be 65536 above).
You now have a file that has no data in byte offsets 0 through 65535 inclusive, followed by some data in byte offsets 65536 through 73727 (assuming the write writes 8192 bytes). That "missing" data is the "hole" in the file. When some other program goes to read the file, the OS pretends there is data there: all-zero-byte data.
If the program doing the write operations does not do them on block boundaries, the OS will in fact allocate some extra data (to fit the write into whole blocks) and zero it out. Those zero bytes are not part of the "hole" (they're real zero bytes in the file) but to ordinary programs that do not peek behind the curtain at the Wizard of Oz, the "hole" zero-bytes and the "non-hole" zero bytes are indistinguishable.
What you need to do is to modify the program to use O_APPEND, or to use library routines like syslog that know how to cooperate with log-rotation operations, or perhaps both.
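To make this concrete, here is a minimal C sketch (my illustration, not part of the original answer; the file name test.log is made up) showing why a writer opened without O_APPEND leaves a hole after the file is truncated, and why O_APPEND avoids it:

/* Sketch only: demonstrates the offset behaviour described above.
   "test.log" is a hypothetical file name. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "a log line\n";      /* 11 bytes */
    struct stat st;

    /* Writer WITHOUT O_APPEND: the kernel keeps a private file offset. */
    int fd = open("test.log", O_WRONLY | O_CREAT, 0644);
    write(fd, msg, sizeof msg - 1);         /* offset is now 11 */

    truncate("test.log", 0);                /* someone runs ": > test.log" */

    write(fd, msg, sizeof msg - 1);         /* still lands at offset 11;
                                               bytes 0..10 become a hole */
    fstat(fd, &st);
    printf("without O_APPEND: size = %lld\n", (long long)st.st_size); /* 22 */
    close(fd);

    /* Writer WITH O_APPEND: each write seeks to the current end of file. */
    fd = open("test.log", O_WRONLY | O_APPEND);
    truncate("test.log", 0);
    write(fd, msg, sizeof msg - 1);         /* lands at offset 0 */
    fstat(fd, &st);
    printf("with O_APPEND: size = %lld\n", (long long)st.st_size);    /* 11 */
    close(fd);
    return 0;
}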
[Edit to add: not sure why this suddenly showed up on the front page and I answered a question from 2011...]
Another way is the following:
cp /dev/null the_file
The advantage of this technique is that it is a single command, so if it needs sudo access, only one sudo call is required.
Why not just :>filename?
(: is a bash builtin with the same effect as /bin/true, and neither command echoes anything)
Proof that it works:
fg#erwin ~ $ du t.txt
4 t.txt
fg#erwin ~ $ :>t.txt
fg#erwin ~ $ du t.txt
0 t.txt
If it's a log file, then the proper way to do this is to use logrotate. As you mentioned, doing it manually does not work.
I don't have a Linux shell here to try it, but have you tried this?
echo "" > file
I am writing NASM assembly code to read a 2D array of numbers from a file supplied on stdin.
I am running the executable like this: ./abc < input.txt.
After displaying the 2D array on the terminal, I want to read the key codes of the arrow keys (which normally appear in the terminal as special characters). I wrote code for this, but it is not working. (I turned echo off in the termios settings for that.)
It was working when I took the file name as an argument and read it with fopen (with the proper fd) instead of from stdin:
./abc abc.txt
In that case, after displaying the 2D array, I am able to get the arrow key codes in the program, but not in the earlier case.
Please help me with this.
By using input redirection you disconnect stdin from your terminal and instead connect it directly to the input file.
You could use cat input.txt - | ./abc, but you would have to press Enter to flush the line buffer and make cat pipe the current line into your program.
I would suggest not messing with stdin and just taking the input file as an argument, like you already did before.
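For illustration, here is a rough C sketch of that suggestion (the question is about NASM, so this only shows the idea; the file and variable names are made up): take the data file from argv so stdin stays connected to the terminal, then read raw key codes from it.

/* Sketch, not the asker's program: data file comes from argv[1],
   stdin remains the terminal so arrow-key escape sequences can be read. */
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s input.txt\n", argv[0]); return 1; }

    FILE *data = fopen(argv[1], "r");       /* the 2D array is read from here */
    if (!data) { perror("fopen"); return 1; }
    /* ... read and display the array from 'data' ... */
    fclose(data);

    /* fd 0 is still the terminal, so raw/no-echo mode works as expected. */
    struct termios old, raw;
    tcgetattr(STDIN_FILENO, &old);
    raw = old;
    raw.c_lflag &= ~(ICANON | ECHO);
    tcsetattr(STDIN_FILENO, TCSANOW, &raw);

    unsigned char c;
    while (read(STDIN_FILENO, &c, 1) == 1 && c != 'q')
        printf("key byte: 0x%02x\n", c);    /* arrow keys arrive as ESC [ A..D */

    tcsetattr(STDIN_FILENO, TCSANOW, &old); /* restore the terminal settings */
    return 0;
}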
I have read various questions/answers here on unix.stackexchange on how to add or remove lines to/from a file without needing to create a temporary file.
https://unix.stackexchange.com/questions/11067/is-there-a-way-to-modify-a-file-in-place?lq=1
It appears all these answers require one to at least read to the end of the file, which can be time-consuming if the input is a large file. Is there a way around this? I would expect a file system to be implemented like a linked list, so there should be a way to reach the required "lines" and then just add stuff (like a node in a linked list). How do I go about doing this?
Am I correct in thinking so? Or am I missing something?
PS: I need this to be done in C and cannot use any shell commands.
As of Linux 4.1, fallocate(2) supports the FALLOC_FL_INSERT_RANGE flag, which allows one to insert a hole of a given length in the middle of a file without rewriting the following data. However, it is rather limited: the hole must be inserted at a filesystem block boundary, and the size of the inserted hole must be a multiple of the filesystem block size. Additionally, in 4.1, this feature was only supported by the XFS filesystem, with Ext4 support added in 4.2.
For all other cases, it is still necessary to rewrite the rest of the file, as indicated in other answers.
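As a rough sketch of how that flag is used (assuming a new enough kernel/glibc and an XFS or ext4 file system; the file name, offset, and length below are hypothetical):

/* Sketch only: inserts one file-system block of hole at a block-aligned
   offset without rewriting the data that follows.  "big.dump" is made up. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/falloc.h>   /* FALLOC_FL_INSERT_RANGE */

int main(void)
{
    int fd = open("big.dump", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);
    off_t blk = st.st_blksize;   /* usually equals the file-system block size */

    /* Offset and length must both be multiples of the block size,
       and the offset must lie before the current end of file. */
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 4 * blk, blk) < 0)
        perror("fallocate");     /* e.g. EOPNOTSUPP on other file systems */

    close(fd);
    return 0;
}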
The short answer is that yes, it is possible to modify the contents of a file in place, but no, it is not possible to remove or add content in the middle of the file.
UNIX filesystems are implemented using an inode pointer structure which points to whole blocks of data. Each line of a text file does not "know" about its relationship to the previous or next line; they are simply adjacent to each other within the block. To add content between those two lines would require all of the following content to be shifted further "down" within the block, pushing some data into the next block, which in turn would have to shift into the following block, and so on.
In C you can fopen a file for update, read its contents, and overwrite some of the contents, but I do not believe there is (even theoretically) any way to insert new data in the middle, or to delete data (except by overwriting it with nulls).
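To illustrate that point, here is a hedged C sketch of what an "insert" actually has to do: rewrite everything from the insertion point onward. (The function name insert_at is my own, and for brevity it buffers the whole tail in memory; a real implementation would copy it in chunks.)

/* Sketch only: "inserting" means rewriting the rest of the file. */
#include <stdio.h>
#include <stdlib.h>

int insert_at(const char *path, long offset, const char *text, size_t len)
{
    FILE *f = fopen(path, "r+");
    if (!f) return -1;

    /* Save everything after the insertion point. */
    fseek(f, 0, SEEK_END);
    long tail_len = ftell(f) - offset;
    char *tail = malloc(tail_len);
    if (!tail) { fclose(f); return -1; }
    fseek(f, offset, SEEK_SET);
    fread(tail, 1, tail_len, f);

    /* Write the new text, then put the saved tail back after it. */
    fseek(f, offset, SEEK_SET);
    fwrite(text, 1, len, f);
    fwrite(tail, 1, tail_len, f);

    free(tail);
    fclose(f);
    return 0;
}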
You can modify a file in place, for example using dd.
$ echo Hello world, how are you today? > helloworld.txt
$ cat helloworld.txt
Hello world, how are you today?
$ echo -n Earth | dd of=helloworld.txt conv=notrunc bs=1 seek=6
$ cat helloworld.txt
Hello Earth, how are you today?
The problem is that if your change also changes the length, it will not quite work correctly:
$ echo -n Joe | dd of=helloworld.txt conv=notrunc bs=1 seek=6
Hello Joeth, how are you today?
$ echo -n Santa Claus | dd of=helloworld.txt conv=notrunc bs=1 seek=6
Hello Santa Clausare you today?
When you change the length, you have to rewrite the file, if not completely then at least from the point of the change onward.
In C this is the same as with dd. You open the file, you seek, and you write.
You can open a file in read/write mode. You read the file (or use "seek" to jump to the position you want, if you know it), and then write into the file, but you overwrite the data that is there (this is not an insertion).
Then you can choose to truncate the file at the last point you wrote, or keep all the remaining data after that point without ever reading it.
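For example, here is a minimal C sketch of that seek-and-overwrite approach, reusing the helloworld.txt file from the dd example above (it overwrites in place and, like dd, does not insert):

/* Sketch: same effect as the dd example above, done with stdio. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("helloworld.txt", "r+");   /* read/update, no truncation */
    if (!f) { perror("fopen"); return 1; }

    fseek(f, 6, SEEK_SET);                     /* jump to "world" */
    fwrite("Earth", 1, 5, f);                  /* overwrite with same length */

    fclose(f);
    return 0;
}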
I am in the middle of writing a shell script to nullify/truncate a file if it reaches a certain size. The file is also being held open and written to by a process the whole time. Every time I nullify the file, will the file pointer be repositioned to the start of the file, or will it remain at its previous position? Is there a way to reset the file pointer once the file has been truncated?
The position of the file pointer depends on how the file was opened by the process that has it open. If it was opened in append mode, then truncating the file will mean that new data will be written at the end of the file, which is actually the beginning too the first time it writes after the file is truncated. If it was not opened in append mode, then truncating the file will simply mean that there is a series of virtual zero bytes at the start of the file, but the real data will continue to be written at the same point as the last write finished. If the file is being reopened by the other process, rather than being held open, then roughly the same rules apply, but there is a better chance that the file will be written at the beginning. It all depends on how the first process to write to the file after the truncation is managing its file pointer.
You can't reset the file pointer of another process, AFAIK.
A cron job or something like this will do the task; it will find every file bigger than 4096 bytes and then nullify it:
$ find . -type f -size +4096c -print0 | while IFS= read -r -d $'\0' line; do cat /dev/null > "$line"; done
I want to read lines n1 through n2 from file foo.c into the current buffer.
I tried: 147,227r /path/to/foo/foo.c
But I get: "E16: Invalid range", though I am certain that foo.c contains more than 1000 lines.
:r! sed -n 147,227p /path/to/foo/foo.c
You can do it in pure Vimscript, without having to use an external tool like sed:
:put =readfile('/path/to/foo/foo.c')[146:226]
Note that we must subtract one from the line numbers because list indices start at 0 while line numbers start at 1.
Disadvantages: This solution is 7 characters longer than the accepted answer. It will temporarily read the entire file into memory, which could be a concern if that file is huge.
The {range} refers to the destination in the current file, not the range of lines in the source file.
After some experimentation, it seems
:147,227r /path/to/foo/foo.c
means insert the contents of /path/to/foo/foo.c after line 227 in this file. i.e.: it ignores the 147.
Other solutions posted are great for specific line numbers. It's often the case that you want to read from top or bottom of another file. In that case, reading the output of head or tail is very fast. For example -
:r !head -20 xyz.xml
Will read first 20 lines from xyz.xml into current buffer where the cursor is
:r !tail -10 xyz.xml
Will read last 10 lines from xyz.xml into current buffer where the cursor is
The head and tail commands are extremely fast, therefore even combining them can be much faster than other approaches for very large files.
:r !head -700030 xyz.xml | tail -30
Will read lines 700001 to 700030 from file xyz.xml into the current buffer
This operation should complete instantly even for fairly large files.
I just had to do this in a code project of mine and did it this way:
In buffer with /path/to/foo/foo.c open:
:147,227w export.txt
In buffer I'm working with:
:r export.txt
Much easier in my book... It requires having both files open, but if I'm importing a set of lines, I usually have them both open anyway. This method is more general and easier to remember for me, especially if I'm trying to export/import a trickier set of lines using something like :g/<search_criteria>/.w >> export.txt or some other more complicated way of selecting lines.
You will need to:
:r /path/to/foo/foo.c
:228,$d
:1,146d
Three steps, but it will get it done...
A range permits a command to be applied to a group of lines in the current buffer.
So the range given to the read command specifies where to insert the content in the current file, not the range of lines to read from the source file.
I have to import a large MySQL dump (up to 10 GB). However, the SQL dump comes with a predefined database structure, including index definitions. I want to speed up the DB insert by removing the index and table definitions.
That means I have to remove/edit the first few lines of a 10 GB text file. What is the most efficient way to do this on Linux?
Programs that require loading the entire file into RAM would be overkill for me.
Rather than removing the first few lines, try editing them to be whitespace.
The hexedit program can do this: it reads files in chunks, so to it, opening a 10 GB file is no different from opening a 100 KB file.
$ hexedit largefile.sql.dump
tab (switch to ASCII side)
space (repeat as needed until your header is gone)
F2 (save)/Ctrl-X (save and exit)/Ctrl-C (exit without saving)
joe is an editor that works well with large files. I just used it to edit a ~5G SQL dump file. It took about a minute to open the file and a few minutes to save it, with very little use of swap (on a system with 4G RAM).
sed 's/OLD_TEXT/NEW_TEXT/g' < oldfile > newfile
or
cat file | sed 's/OLD_TEXT/NEW_TEXT/g' > newfile
Perl can read the file line by line:
perl -pi.bak -e 's/^create index/-- create index/' largefile.sql.dump