Shell script to nullify a file every time it reaches a specific size - Linux

I am in the middle of writing a shell script to nullify/truncate a file once it reaches a certain size. The file is also being opened/written to by another process the whole time. Every time I nullify the file, will the file pointer be repositioned to the start of the file, or will it remain at its previous position? Is there a way to reset the file pointer once the file has been truncated?

The position of the file pointer depends on how the file was opened by the process that has it open. If it was opened in append mode, truncating the file means new data will be written at the (new) end of the file, which is the same as the beginning the first time the process writes after the truncation. If it was not opened in append mode, truncating the file simply means there is a series of virtual zero bytes at the start of the file, and the real data continues to be written at the point where the last write finished. If the file is being reopened by the other process rather than held open, roughly the same rules apply, but there is a better chance that the file will be written from the beginning. It all depends on how the first process to write to the file after the truncation manages its file pointer.
You can't reset the file pointer of another process, AFAIK.
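A rough way to see the difference for yourself, if you're curious (file names, loop counts and delays are arbitrary):
( for i in $(seq 1 60); do echo "append line $i"; sleep 1; done ) >> append.log &  a_pid=$!   # writer opened in append mode
( for i in $(seq 1 60); do echo "plain line $i"; sleep 1; done ) > plain.log &  p_pid=$!      # writer opened without append mode
sleep 15
: > append.log; : > plain.log       # truncate both while the writers still hold them open
sleep 5
od -c append.log | head -n 3        # starts with fresh "append line N" text: the offset moved back to the new end
od -c plain.log | head -n 3         # starts with NUL bytes (a hole): the writer kept its old offset
kill "$a_pid" "$p_pid"              # stop the background writers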

A cron job or something like this will do the task; it finds every file bigger than 4096 bytes and then nullifies it:
$ find . -type f -size +4096c -print0 | while IFS= read -r -d $'\0' line; do cat /dev/null > "$line"; done
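If you want this to run unattended, a sketch of an equivalent crontab entry (assuming GNU coreutils' truncate is available; the path, name pattern, size and schedule are placeholders):
# m    h  dom mon dow  command
*/10   *  *   *   *    find /var/myapp -type f -name '*.log' -size +4096c -exec truncate -s 0 {} +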

Related

Set line maximum of log file

Currently I am writing a simple logger to log messages from my bash script. The logger works fine: I simply write the date plus the message to the log file. Since the log file will keep growing, I would like to limit the logger to, for example, 1000 lines. After reaching 1000 lines it shouldn't delete or totally clear the log file; it should drop the first line and replace it with the new log line, so the file keeps 1000 lines and doesn't grow further. The latest line should always be at the top of the file. Is there any built-in method? Or how could I solve this?
Why would you want to replace the first line with the new message, thereby causing a jump in the order of messages in your log file, instead of just deleting the first line and appending the new message? E.g., simplistically:
log() {
    tail -n 999 logfile > tmp &&
    { cat tmp && printf '%s\n' "$*"; } > logfile
}
log "new message"
You don't even need a tmp file if your log file's lines are always small: just save the output of the tail in a variable and printf that as well (a sketch of that variant follows the note below).
Note that unlike a sed -i solution, the above will not change the inode, hard links, permissions or anything else for logfile; it's the same file you started with, just with updated content, rather than being replaced with a new file.
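A sketch of that tmp-file-free variant (note that the command substitution strips trailing blank lines from the kept portion):
log() {
    local kept
    kept=$(tail -n 999 logfile)                               # keep at most the last 999 lines
    { printf '%s\n' "$kept"; printf '%s\n' "$*"; } > logfile  # rewrite them, then add the new message
}
log "new message"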
Your chosen example may not be the best. As the comments have already pointed out, logrotate is the best tool to keep log file sizes at bay; furthermore, a line is not the best unit to measure size. Those commenters are both right.
However, I will take your question at face value and answer it.
You can achieve what you want with shell builtins, but it is much faster and simpler to use an external tool like sed. (awk is another option, but it lacks the -i switch which simplifies your life in this case.)
So, suppose your file already exists and is named script.log; then
maxlines=1000
log_msg='Whatever the log message is'
sed -i -e1i"\\$log_msg" -e$((maxlines))',$d' script.log
does what you want.
-i means modify the given file in place.
-e1i"\\$log_msg" means insert $log_msg before the first (1) line.
-e$((maxlines))',$d' means delete each line from line number $((maxlines)) to the last one ($).
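Wrapped up as a small logger function, the same call might look like this (the file name and limit are just examples; as with the one-liner above, messages containing backslashes or embedded newlines would need extra escaping):
maxlines=1000
logfile=script.log
log() {
    # insert the message before line 1, then delete everything from line $maxlines to the end
    sed -i -e1i"\\$*" -e"$maxlines"',$d' "$logfile"
}
log 'Whatever the log message is'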

Is sed line deletion atomic?

Let's say I have a file 'queue'. From what I understand, appending to it using '>>' is atomic as defined by POSIX. But if I have multiple processes appending, is it safe to do the following without data loss?
sed -e '1d' -i queue
I have several different script services on Linux that may want to pass information between them. Each has a 'queue' where new data is pushed on the bottom with the next item to be popped from the top.
Is my use of sed guaranteed not to miss an append operation from another process between the time it reads the file into a buffer and writes it back to disk?
Thanks.
I'll defer to anyone else who makes a stronger claim, but I think it is not safe to use that way. -i works by making a temporary copy of the file and then moving it back into place, which would discard any appends made to the original file in the meantime.
Source: http://www.gnu.org/software/sed/manual/sed.html
-i[SUFFIX]
--in-place[=SUFFIX]
This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.
This option implies -s.
When the end of the file is reached, the temporary file is renamed to the output file's original name. The extension, if supplied, is used to modify the name of the old file before renaming the temporary file, thereby making a backup copy.
This rule is followed: if the extension doesn't contain a *, then it is appended to the end of the current filename as a suffix; if the extension does contain one or more * characters, then each asterisk is replaced with the current filename. This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix, or even to place backup copies of the original files into another directory (provided the directory already exists).
If no extension is supplied, the original file is overwritten without making a backup.
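If what you really need is a queue that several processes can push to and pop from safely, one common approach is to serialize both operations with flock(1); a sketch (the lock file and helper names are made up, and every writer has to go through push so nothing bypasses the lock):
push() {                               # append one line under the lock
    { flock 9; printf '%s\n' "$1" >> queue; } 9> queue.lock
}
pop() {                                # print and remove the first line under the lock
    {
        flock 9
        IFS= read -r first < queue || return 1
        tail -n +2 queue > queue.tmp && mv queue.tmp queue
        printf '%s\n' "$first"
    } 9> queue.lock
}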

How does Linux redirect IO work internally

When we use the IO redirection operator for a shell script, does the operator keep all the data to be written in memory and write it all at once, or does it write to the file line by line?
Here is what I am working on.
I have about 200 small files, ~1000 lines each, in a specific format. I want to process each line in all the files (apply a regex and change the format a little) and collect the transformed lines in a single combined file.
I have a transformscript.sh that takes a single file and applies the transformation. I run it in the following manner:
sh transformscript.sh somefile.txt > newfile.txt
This works fine and fast for a single file.
How do I extend this to all the files? Will it be efficient to change transformscript.sh to take a directory as the argument instead of a filename and add a for loop to transform all the lines of all the files together? Or should I run the above transformscript.sh for each file, creating a new file for each one, and then combine them separately?
Thanks.
The redirect operator simply opens the file for writing and passes that file descriptor to the shell as its standard output. The shell then writes to the file directly.
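A quick way to convince yourself that the data is written incrementally rather than collected in memory and dumped at the end (names and delays are arbitrary):
( for i in 1 2 3 4 5; do echo "line $i"; sleep 2; done ) > out.txt &
sleep 3
wc -c out.txt    # already contains the first lines while the loop is still running
wait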
You probably do NOT want to run the script separately for each file since you will incur the overhead of bash process creation for each pass. For example:
# don't do it this way
for somefile in somefiles*.txt; do
    newfile=${somefile//some/new}
    sh transformscript.sh "$somefile" > "$newfile"
done
The above starts one shell for every file found, which is pretty inefficient. It would be better to rewrite transformscript.sh to handle multiple files if possible. Depending on how complicated your transform is and whether you need to keep the original filenames, you might be able to use a single sed process. For example, assume you have 200 files named test1.txt through test200.txt, all with a "Hello world" line you want to change to "Hello joe". You could do something as simple as this:
sed -i.save 's/Hello world/Hello joe/' test*.txt
The -i tells sed to do an "in place" edit (edit the original file) and the optional ".save" argument to -i makes a backup copy of the original file with a .save extension before editing the original file. Note, this will leave the original contents in the .save files and the new content in the files with the original name which may not be what you want.
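If the transform really is a single substitution like that and you want the combined file the question asked for, one sed process can also write it directly:
sed 's/Hello world/Hello joe/' test*.txt > combined.txt    # one process, all files, one combined output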

Shell script to trigger a command for every new line added to a file?

Is it possible to trigger a command for every new line added to a file?
For example: I have a log file, say maillog. I want to get every new entry in the log file as an email.
If a new entry like "Mail Sent" is added to the maillog file, then my script should grep the new entry and send me an email with the entry (data).
I know it's crazy, but I want to automate my Linux box with this kind of thing.
Regards,
Not so crazy. Check the file periodically (once per hour, once per day, whatever you like) for new parts by storing the original length of the file and comparing it with the current length; in case it has grown, handle the part which was appended:
length=0
while sleep 3600  # use the wanted delay here
do
    new_length=$(find "$file" -printf "%s")
    if [ "$length" -lt "$new_length" ]
    then
        tail --bytes=$((new_length - length)) "$file" | handle_part
    fi
    length=$new_length
done
Now you only have to write that handle_part function which could for instance mail its input somewhere.
Using this approach (instead of the obvious tail -f) has the advantage that you can store the current length in a file and read that length again when restarting your script. So you won't get the whole file again after a restart of your script (e.g. due to a machine reboot).
If you want a faster response you could have a look at inotify which is a facility on Linux to monitor file actions; so that polling could be replaced.
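A sketch of such an inotify-based variant using inotifywait from the inotify-tools package (assuming it is installed), reusing $file and handle_part from above; note that the loop runs in a subshell, so $length is only tracked inside the pipeline:
length=0
inotifywait -m -e modify --format '%w' "$file" | while read -r _
do
    new_length=$(find "$file" -printf "%s")
    if [ "$length" -lt "$new_length" ]
    then
        tail --bytes=$((new_length - length)) "$file" | handle_part
    fi
    length=$new_length
done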
Use tail -f; it watches a file and sends whatever is appended to it to stdout. If you have a script that performs the desired action, say mail_per_line, then you can set it up as
tail -f maillog | mail_per_line
In this case, mail_per_line runs once and gets all the lines. If you want to spawn a separate process each time a line comes in, use the shell built-in read:
tail -f maillog | while IFS='' read -r line; do
    send_a_message "$line"
done
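send_a_message can be whatever you need; a hypothetical version using mail(1), assuming a working local mail setup (the subject and address are placeholders):
send_a_message() {
    printf '%s\n' "$1" | mail -s "New maillog entry" you@example.com
}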
To counter the effect described by Alfe, that a restart of this program will cause all the previous logs to be processed again, consider using logrotate.

Empty a file while in use in Linux

I'm trying to empty a file in Linux while it is in use; it's a log file, so it is continuously being written to.
Right now I've used:
echo -n > filename
or
cat /dev/null > filename
but all of these produce an empty file with a newline character (or strange characters that I can see as ^#^#^#^#^#^#^#^#^#^#^#^.. in vi), and I have to manually remove the first line with vi and dd and then save.
If I don't use vi and dd I'm not able to manipulate the file with grep, but I need an automatic procedure that I can write in a shell script.
Ideas?
This should be enough to empty a file:
> file
However, the other methods you said you tried should also work. If you're seeing weird characters, then they are being written to the file by something else - most probably whatever process is logging there.
What's going on is fairly simple: you are emptying out the file.
Why is it full of ^#s, then, you ask? Well, in a very real sense, it is not. It does not contain those weird characters. It has a "hole".
The program that is writing to the file is writing a file that was opened with O_WRONLY (or perhaps O_RDWR) but not O_APPEND. This program has written, say, 65536 bytes into the file at the point when you empty out the file with cp /dev/null filename or : > filename or some similar command.
Now the program goes to write another chunk of data (say, 4096 or 8192 bytes). Where will that data be written? The answer is: "at the current seek offset on the underlying file descriptor". If the program used O_APPEND the write would be, in effect, preceded by an lseek call that did a "seek to current end-of-file, i.e., current length of file". When you truncate the file that "current end of file" would become zero (the file becoming empty) so the seek would move the write offset to position 0 and the write would go there. But the program did not use O_APPEND, so there is no pre-write "reposition" operation, and the data bytes are written at the current offset (which, again, we've claimed to be 65536 above).
You now have a file that has no data in byte offsets 0 through 65535 inclusive, followed by some data in byte offsets 65536 through 73727 (assuming the write writes 8192 bytes). That "missing" data is the "hole" in the file. When some other program goes to read the file, the OS pretends there is data there: all-zero-byte data.
If the program doing the write operations does not do them on block boundaries, the OS will in fact allocate some extra data (to fit the write into whole blocks) and zero it out. Those zero bytes are not part of the "hole" (they're real zero bytes in the file) but to ordinary programs that do not peek behind the curtain at the Wizard of Oz, the "hole" zero-bytes and the "non-hole" zero bytes are indistinguishable.
What you need to do is to modify the program to use O_APPEND, or to use library routines like syslog that know how to cooperate with log-rotation operations, or perhaps both.
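If you want to check whether a log that was truncated while open has ended up with such a hole, compare its apparent size with the blocks actually allocated (GNU stat and du assumed):
stat -c 'apparent size: %s bytes, allocated blocks: %b' filename
du -h --apparent-size filename    # size including the hole
du -h filename                    # space actually allocated on disk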
[Edit to add: not sure why this suddenly showed up on the front page and I answered a question from 2011...]
Another way is the following:
cp /dev/null the_file
The advantage of this technique is that it is a single command, so in case it needs sudo access only one sudo call is required.
Why not just :>filename?
(: is a bash builtin with the same effect as /bin/true; neither command echoes anything)
Proof that it works:
fg#erwin ~ $ du t.txt
4 t.txt
fg#erwin ~ $ :>t.txt
fg#erwin ~ $ du t.txt
0 t.txt
If it's a log file then the proper way to do this is to use logrotate. As you mentioned doing it manually does not work.
I don't have a Linux shell here to try it, but have you tried this?
echo "" > file
