How does tail -f work internally - Linux

If I run this command:
tail -f file.txt
I will see changes in real time.
My question is: how does it work? Is there a way for a process to be notified each time a file is changed?
Or is it a loop, like the watch command uses?
Thanks

I too had this question, as I wanted to implement a similar feature in Java.
It looks like tail periodically checks the size and modification date of the file against the values it recorded when it last output the file's contents. When it detects a change, it outputs the data at the end of the file (starting where it last left off), then updates the recorded values and repeats the process.
https://github.com/coreutils/coreutils/blob/master/src/tail.c#L1235
There also seems to be code later on to handle the case where the file gets truncated.
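As a rough illustration of that strategy, here is a minimal shell sketch of the polling loop (the file name and interval are illustrative; this is not coreutils' actual implementation, which also uses inotify where available):

#!/bin/sh
# Minimal sketch of tail -f's polling fallback: print the file once,
# then re-check its size on an interval and print only the new bytes.
FILE=file.txt
INTERVAL=1

offset=$(wc -c < "$FILE")
cat "$FILE"

while sleep "$INTERVAL"; do
    size=$(wc -c < "$FILE")
    if [ "$size" -lt "$offset" ]; then
        # The file shrank: it was truncated, so start over.
        offset=0
    fi
    if [ "$size" -gt "$offset" ]; then
        # Print bytes offset+1 .. size, i.e. only what is new.
        tail -c +"$((offset + 1))" "$FILE"
        offset=$size
    fi
done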

With --follow (-f), tail defaults to following the file descriptor, which means that even if a tail'ed file is renamed, tail will continue to track its end. This default behavior is not desirable when you really want to track the actual name of the file, not the file descriptor (e.g., log rotation). Use --follow=name in that case. That causes tail to track the named file in a way that accommodates renaming, removal and creation.
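For example (the log path is illustrative):

# Follow the file descriptor (the default): after a rename, tail keeps
# reading the old file, which is usually wrong across log rotation.
tail -f /var/log/app.log

# Follow the name instead: tail re-opens /var/log/app.log when it is
# recreated; --retry keeps trying while the name is temporarily absent.
tail --follow=name --retry /var/log/app.log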

Related

Unix: create multiple files with the same name in a directory

I am looking for some kind of logic in Linux where I can place files with the same name in a directory or file system.
For example, if I create a file abc.txt, then the next time any process creates abc.txt it should automatically check and create the file as abc.txt.1 instead, then abc.txt.2 the next time, and so on...
Is there a way to achieve this?
Any logic or third-party tools are also welcome.
You ask,
For example, if I create a file abc.txt, then the next time any process
creates abc.txt it should automatically check and create the file as
abc.txt.1 instead
(emphasis added). To obtain such an effect automatically, for every process, without explicit provision by processes, it would have to be implemented as a feature of the filesystem containing the files. Such filesystems are called versioning filesystems, though typically the details are slightly different from what you describe. Most importantly, however, although such filesystems exist for Linux, none of them are mainstream. To the best of my knowledge, none of the major Linux distributions even offers one as a distribution-supported option.
Although it's a bit dated, see also Linux file versioning?
You might be able to approximate that for many programs via a customized version of the C standard library, but that's not foolproof, and you should not expect it to have universal effect.
It would be an altogether different matter for an individual process to be coded for such behavior. It would need to check for existing files and choose an appropriate name when opening each new file. In doing so, some care needs to be taken to avoid related race conditions, but it can be done. Details would depend on the language in which you are writing.
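For instance, in shell, one way to close the check-then-create race is the noclobber option, which makes a > redirection fail atomically if the target already exists. A minimal sketch (abc.txt is from the question; the rest is illustrative):

#!/bin/sh
# Create abc.txt, or abc.txt.1, abc.txt.2, ... if the name is taken.
# set -C (noclobber) makes > fail when the target exists, so the
# existence check and the creation happen as a single atomic open.
set -C
name=abc.txt
n=0
until ( : > "$name" ) 2>/dev/null; do
    n=$((n + 1))
    name=abc.txt.$n
done
echo "created $name"

In bash, at least, noclobber opens the file with O_CREAT|O_EXCL, which is what makes the step race-free.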
You can use Bash brace expansion to achieve this. For example, if I wanted to make 10 files all with the same name, each with a unique number, I would do the following:
# touch my_file{01..10}.txt
This would create 10 files numbered from 01 all the way to 10. This method is also handy for looping over files in a sequence, or if you're also creating directories.
Now, if I am reading your question right, you're asking that when you move or create a file in a directory, a script should automatically pick a new file name for you? If that is the case, then just use a test: if the file is already there, move it aside and mark it. Personally, I use timestamps to do so.
Logic:
# The [ -f ] test checks whether the file is present
if [ -f "$MY_FILE_NAME" ]; then
    # If the file is present, move it aside, appending the shell's PID
    # so the saved name stays unique. Note the braces: without them,
    # $MY_FILE_NAME_$$ would expand a variable named MY_FILE_NAME_.
    mv "$MY_FILE_NAME" "${MY_FILE_NAME}_$$"
    mv "$MY_NEW_FILE" .
else
    # Move or make the file here
    mv "$MY_NEW_FILE" .
fi
As you can see, the logic is very simple. Hope this helps.
Cheers
I don't know about your particular use case, but you may try looking at logrotate:
https://wiki.archlinux.org/index.php/Logrotate

How to stream the content of log files whose names are constantly changing, in Perl?

I have a series of applications on Linux systems whose logs I need to constantly 'stream' out, or even just 'tail' out, but the challenge is that the file names are constantly rolling and changing.
They are all date-encoded (the dates being in different formats), and each then has a different increment format.
Most of them start at one and count up, but one has no extension on the first file and adds an extension from the second file onward, and another increments a number but, after hitting 99, increments a letter and resets the number to 01, then counts up again (it rolls over quite quickly).
I just have OS-level shell scripting, OS command-line utilities, and Perl available to me to handle this situation, for another application to pick up and read these logs.
The new files are always created right when writing to the new file starts, and groups of different logs (some I am reading, some I am not) are being written to the same directory, so I cannot just pick up anything hitting the directory.
If I simply 'tail -n 1000000 -f |' them today, this works fine for the reader application I am using until the file changes. I cannot set up file-list ranges within the reader application, but I can pre-process the logs so they appear as a continuous stream to the reader, rather than the reader directly invoking commands to read them. A simple Perl log reader also works fine for a static file name, but not for dynamic ones. It is critical that I don't re-process any log lines and only capture new lines being written to the logs.
I admit I am not any form of Perl guru, and the best answer/clue I've been able to find so far is the use of Perl's glob function to possibly do this, but the examples I've found basically reprocess all of the files on each run and then seem to stop.
Example file names I am dealing with across the multiple apps I am trying to handle:
appA_YYMMDD.log
appA_YYMMDD_0001.log
appA_YYMMDD_0002.log
WS01APPB_YYMMDD.log
WS02APPB_YYMMDD.log
WS03AppB_YYMMDD.log
APPCMMDD_A01.log
APPCMMDD_B01.log
YYYYMMDD_001_APPD.log
As denoted above, the files do not share an inode, and simply monitoring the directory for changes is not possible, as a lot of things are written there. On the dev system more than 50 logs are being written to the directory, along with thousands of files, and I am only trying to retrieve 5 of them. I am seeing if multitail can be made available so I can try that suggestion, but it is not currently available, and installing any additional RPMs in this environment is generally a multi-month battle.
ls -i
24792 APPA_180901.log
24805 APPA__180902.log
17011 APPA__180903.log
17072 APPA__180904.log
24644 APPA__180905.log
17081 APPA__180906.log
17115 APPA__180907.log
So really, the root of what I am trying to do is simply to get a continuous stream regardless of file name changes, without having to run the extract command repeatedly and without big breaks in the data feed while some script figures out that the file being logged to has changed. I don't need to parse the contents (my other app does that). Is there an easy way of handling this changing file name?
How about monitoring the log directory for changes with Linux inotify, e.g. Linux::inotify2? Then you could detect when new log files are created, stop reading from the old log file, and start reading from the new one.
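If installing the Perl module runs into the same RPM battle, the same idea can be sketched at the shell level with inotifywait from inotify-tools (assuming it can be made available; the directory, pattern, and tail options below are illustrative):

#!/bin/sh
# Watch the log directory for newly created files matching one app's
# pattern; when one appears, stop tailing the old file and follow the
# new one from its beginning.
LOGDIR=/path/to/logs
tailpid=

inotifywait -m -e create --format '%f' "$LOGDIR" |
while read -r newfile; do
    case $newfile in
    appA_*.log)
        [ -n "$tailpid" ] && kill "$tailpid" 2>/dev/null
        tail -n +1 -f "$LOGDIR/$newfile" &
        tailpid=$!
        ;;
    esac
done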
Try tailswitch. I created this script to tail log files that are rotated daily and have YYYY-MM-DD in their names. To use this script, you just say:
% tailswitch '*.log'
The quoting prevents the shell from interpreting the glob pattern. The script re-expands the glob pattern from time to time and switches to a newer file based on its name.

When does a process acquire a file to read

I have a script that takes a list of servers from an input file one by one and executes some commands on each server. I want to be able to update the input file while this script is running, without affecting the input of the first process, and re-run the script with the second list of servers. Can this be done safely?
When you run a command like my_script < file, the contents of file are fed to my_script on its standard input (as an already-open file descriptor). This decouples the contents from the name, meaning you can immediately modify or replace file in another process.
If you instead run a command like my_script file, you're passing the name "file" to my_script, which may read from that file at any point (or write to it, delete it, etc.), so you can't safely change file while the script is running. Notably, this doesn't happen immediately; a long-running process might not read from file until much later, after you've already edited the file.
Therefore if you design your program to read from stdin you can safely modify the input file and re-run the command while the first process is still running.
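A hypothetical demonstration (run_on_servers.sh and the file names are made up for illustration):

# The first run reads servers.txt through its own open descriptor.
./run_on_servers.sh < servers.txt &

# Safe while the first run continues: its descriptor still points at
# the old file even after the name is reused.
mv servers.txt servers.old
cp servers.new servers.txt

# A second run picks up the new list under the same name.
./run_on_servers.sh < servers.txt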
Let's say that your process is running and you want to change the file: just mv the file aside and copy your new input file into place. That way, if the process hasn't completely read the input file into memory, it will still have a file descriptor open to the previous file and will run unaffected. Of course, this all depends on how the process is implemented; if it tries to re-open the file during the course of execution, it will see the new file's contents.
process inputfile &              # process holds an open descriptor
mv inputfile inputfile.running   # rename: the descriptor still points at the old file
mv newinput inputfile            # the new input appears under the original name

Head on svn log doesn't always stop

Consider this:
svn log -r HEAD:1 --search $pattern | head -4
Sometimes this command finds the necessary number of lines (e.g. 4) and stops. But sometimes it just keeps searching (i.e. hangs) even after having found the necessary number of lines.
I don't know what it depends on (whether it keeps searching or stops). I would like to know the reason, and I would like to know how to modify my command so it always stops right after having found the necessary number of lines (I don't want svn log to search the entire svn history, as this might take forever).
Plain svn log will always continue showing the revision history from the HEAD revision down to 0, relevant to your query, unless you kill the process (assuming that you don't use the --limit switch or specify some subtree like /branches/myfeature). Adjust your script to kill the process once it has shown the required number of log messages. (The likely reason head alone doesn't always stop it: head exits after printing its four lines, but svn only receives SIGPIPE the next time it writes to the now-closed pipe; if it finds no further matches, it writes nothing and keeps searching.)
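A minimal sketch of that approach, using the four-line requirement from the question ($pattern as above; the temp-file polling is illustrative, not the only way to do it):

#!/bin/sh
# Collect svn log output in the background, then kill svn as soon as
# enough lines have arrived instead of waiting for SIGPIPE.
tmp=$(mktemp)
svn log -r HEAD:1 --search "$pattern" > "$tmp" &
svnpid=$!

while kill -0 "$svnpid" 2>/dev/null; do
    if [ "$(wc -l < "$tmp")" -ge 4 ]; then
        kill "$svnpid"
        break
    fi
    sleep 1
done
head -4 "$tmp"
rm -f "$tmp"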

Add comments next to files in Linux

I'm interested in simply adding a comment next to my files in Linux (Ubuntu). An example would be:
info user ... my_data.csv Raw data which was sent to me.
info user ... my_data_cleaned.csv Raw data with duplicates filtered.
info user ... my_data_top10.csv Cleaned data with only top 10 values selected for each ID.
So, sort of the way you can comment commits in Git. I don't particularly care about searching on these tags, filtering them, etc., just seeing them when I list files in a directory. Bonus if the comments/tags follow the document around as I copy or move it.
Most filesystem types support extended attributes where you could store comments.
So for example to create a comment on "foo.file":
xattr -w user.comment "This is a comment" foo.file
The attributes can be copied/moved with the file; just be aware that many utilities require special options to copy extended attributes.
Then, to list files with comments, use a script or program that grabs the extended attribute. Here is a simple example to use as a starting point; it just lists the files in the current directory:
#!/bin/sh
# List the current directory, appending the user.comment extended
# attribute to any file that has one set.
ls -1 | while read -r FILE; do
    comment=$(xattr -p user.comment "$FILE" 2>/dev/null)
    if [ -n "$comment" ]; then
        echo "$FILE Comment: $comment"
    else
        echo "$FILE"
    fi
done
The xattr command is really slow and poorly written (it doesn't even return an error status), so I suggest something else if possible: use setfattr and getfattr in a more complex script than what I have provided, or maybe a custom ls command that is aware of the user.comment attribute.
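For reference, the equivalent write and read with the attr package's tools (same foo.file as above):

# Write and read the comment with setfattr/getfattr, which do return
# proper exit statuses on failure.
setfattr -n user.comment -v "This is a comment" foo.file
getfattr -n user.comment --only-values foo.file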
This is a moderately serious challenge. Basically, you want to add attributes to files, keep the attributes when the file is copied or moved, and then modify ls to display the values of these attributes.
So, here's how I would attack the problem.
1) Store the information in a SQLite database. You can probably get away with one table. The table should contain the complete path to the file and your comment. I'd name the database something like ~/.dirinfo/dirinfo.db, and I'd store it in a subfolder because you may find later on that you need other information there. It'd be nice to use inodes rather than pathnames, but they change too frequently. Still, you might be able to do something where you store both the inode and the pathname, and retrieve by pathname only if the retrieval by inode fails, in which case you'd then update the inode information.
2) Write a bash script to create/read/update/delete the comment for a given file.
3) Write another bash function or script that works with ls. I wouldn't call it "ls" though, because you don't want to mess with all the command line options that are available to ls. You're going to be calling ls always as ls -1 in your script, possibly with some sort options, such as -t and/or -r. Anyway, your script will call ls -1 and loop through the output, displaying the file name, and the comment, which you'll look up using the script from 2). You may also want to add file size, but that's up to you.
4) Write functions to replace mv and cp (and ln??). These would be wrapper functions that update the information in your table and then call the regular Unix versions of these commands, passing along any arguments received by the functions (i.e. "$@"); a minimal sketch follows this list. If you're really paranoid, you'd also do it for things like scp, which can be used (inefficiently) to copy files locally. Still, it's unlikely you'll catch all the possibilities. What if someone else does a mv on your file, who doesn't have the function you have? What if some script moves the file by calling /bin/mv? You can't easily get around these kinds of issues.
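As an illustration of step 4, a hypothetical bash wrapper; update_comment_db stands in for the step-2 script and is not a real command:

# Wrapper around mv: record the move in the comment database first,
# then run the real binary, forwarding all arguments with "$@"
# (note: "$#" is the argument count, not the arguments themselves).
mv() {
    update_comment_db mv "$@"   # assumed helper from step 2
    command mv "$@"             # bypass the function, call the real mv
}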
Or if you really wanted to get adventurous, you'd write some C/C++ code to do this. It'd be faster, and honestly not that much more challenging, provided you understand fork() and exec(). SQLite does have a C API (it is itself a C library), so you'd have to tangle with that, too, but since you only have one database and one table, that shouldn't be too challenging.
You could do it in Perl, too, but I'm not sure it would be much easier in Perl than in bash. Your actual code isn't that complex, and you're not likely to be doing any crazy regex or string manipulation. There are just lots of small pieces to fit together.
Doing all of this is much more work than should be expected for a person answering a question here, but I've given you the overall design. Implementing it should be relatively easy if you follow the design above and can live with the constraints.
