Scan directory and get latest matching create time


Under Linux I can open a directory using opendir and then use readdir to get the filenames.
I have been experimenting with scandir and thought "great I can search for the files in this directory that I want by passing in a custom filter", and sort using a custom sort where I want to sort by creation date. But then I realised how limited the dirent structure is. It contains only minimal information.
Is this the only API possible? I.e., do I have to stat every single file to get its size or timestamps for sorting? Is this how ls -t works?

That is, indeed, how ls -t works, as 'strace ls -t' will confirm. Historically, a UNIX directory was just a special file containing a list of file names, and applications were expected to read and parse that "file" themselves. Naturally, that led to problems when newer file systems were developed that expanded the fixed length of file names, so the opendir/readdir/closedir interface was developed to abstract away the filesystem directory implementation. But the limitation on what is directly available in a directory listing remains.
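To make the "stat every file" approach concrete, here is a minimal shell sketch. It assumes GNU coreutils (stat -c %Y prints the modification time in seconds since the epoch), and the function name latest_matching is made up for this example. It compares modification times, which is what ls -t sorts by.

```shell
# latest_matching DIR GLOB
# Print the most recently modified regular file in DIR whose name matches GLOB.
# Hypothetical helper; assumes GNU stat (-c %Y = mtime as epoch seconds).
# The body runs in a subshell so its variables do not leak into the caller.
latest_matching() (
    dir=$1
    pattern=$2
    newest=
    newest_mtime=-1
    # $pattern is deliberately unquoted so the shell expands the glob.
    for f in "$dir"/$pattern; do
        [ -f "$f" ] || continue                 # skip non-files and unmatched globs
        mtime=$(stat -c %Y "$f") || continue    # one stat call per candidate
        if [ "$mtime" -gt "$newest_mtime" ]; then
            newest=$f
            newest_mtime=$mtime
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
)
```

For example, latest_matching /var/log '*.log' would print the newest matching file. Where GNU stat can report a birth time, swapping %Y for %W should sort by that instead; it prints 0 when the filesystem does not record one.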

POSIX does not have any facility for storing creation time, much less retrieving it.

Related

Unix create multiple files with same name in a directory

I am looking for some kind of logic in linux where I can place files with same name in a directory or file system.
For e.g. i create a file abc.txt, so the next time if any process creates abc.txt it should automatically check and make the file named as abc.txt.1 should be created, then next time abc.txt.2 and so on...
Is there a way to achieve this?
Any logic or third-party tools are also welcome.

You ask,
For e.g. i create a file abc.txt, so the next time if any process
creates abc.txt it should automatically check and make the file named
as abc.txt.1 should be created
(emphasis added). To obtain such an effect automatically, for every process, without explicit provision by processes, it would have to be implemented as a feature of the filesystem containing the files. Such filesystems are called versioning filesystems, though typically the details are slightly different from what you describe. Most importantly, however, although such filesystems exist for Linux, none of them are mainstream. To the best of my knowledge, none of the major Linux distributions even offers one as a distribution-supported option.
Although it's a bit dated, see also Linux file versioning?
You might be able to approximate that for many programs via a customized version of the C standard library, but that's not foolproof, and you should not expect it to have universal effect.
It would be an altogether different matter for an individual process to be coded for such behavior. It would need to check for existing files and choose an appropriate name when opening each new file. In doing so, some care needs to be taken to avoid related race conditions, but it can be done. Details would depend on the language in which you are writing.
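As an illustration of the per-process approach, here is a rough shell sketch. The function name create_versioned and the cap of 1000 attempts are arbitrary choices for the example. It relies on the shell's noclobber option, under which a > redirection refuses to overwrite an existing regular file, so the existence check and the creation happen in one step rather than as a separate test followed by an open.

```shell
# create_versioned NAME
# Create NAME if it does not exist, otherwise NAME.1, NAME.2, ...
# Prints the name that was actually created.
# Hypothetical helper; the body runs in a subshell so that noclobber
# and the variables stay local to the function.
create_versioned() (
    base=$1
    candidate=$base
    n=0
    set -o noclobber    # '>' now fails if the target already exists
    until { true > "$candidate"; } 2>/dev/null; do
        n=$((n + 1))
        # Give up eventually, e.g. if the directory is not writable.
        [ "$n" -le 1000 ] || exit 1
        candidate=$base.$n
    done
    printf '%s\n' "$candidate"
)
```

Calling create_versioned abc.txt three times in an empty directory should yield abc.txt, abc.txt.1 and abc.txt.2. Other languages would typically get the same effect by opening with O_CREAT|O_EXCL and retrying with the next suffix on failure.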

You can use Bash brace expansion to achieve this. For example, if I wanted to make 10 files that share a name but each carry a unique number, I would do the following:
# touch my_file{01..10}.txt
This creates 10 files numbered 01 through 10. This method is also handy for looping over files in a sequence, or if you're also creating directories.
Now, if I am reading your question right, you are asking whether, when you move or create a file in a directory, a script could automatically create a new file for you. If that is the case, just test whether the file exists and, if it does, move the existing file aside and mark it. Personally, I use timestamps to do so.
Logic:
# [ -f ] tests whether the file is present
if [ -f "$MY_FILE_NAME" ]; then
    # If the file is present, move it aside and append the PID
    # so the old copy gets a name that is unlikely to collide.
    # The braces matter: $MY_FILE_NAME_$$ would expand the
    # (unset) variable MY_FILE_NAME_ instead.
    mv "$MY_FILE_NAME" "${MY_FILE_NAME}_$$"
    mv "$MY_NEW_FILE" .
else
    # Move or make the file here
    mv "$MY_NEW_FILE" .
fi
As you can see the logic is very simple. Hope this helps.
Cheers

I don't know about your particular use case, but you could look at logrotate:
https://wiki.archlinux.org/index.php/Logrotate

Add comments next to files in Linux

I'm interested in simply adding a comment next to my files in Linux (Ubuntu). An example would be:
info user ... my_data.csv Raw data which was sent to me.
info user ... my_data_cleaned.csv Raw data with duplicates filtered.
info user ... my_data_top10.csv Cleaned data with only top 10 values selected for each ID.
So, sort of the way you can comment commits in Git. I don't particularly care about searching on these tags, filtering them, etc. I just want to see them when I list files in a directory. Bonus if the comments/tags follow the document around as I copy or move it.

Most filesystem types support extended attributes where you could store comments.
So for example to create a comment on "foo.file":
xattr -w user.comment "This is a comment" foo.file
The attributes can be copied or moved with the file, but be aware that many utilities require special options to copy extended attributes.
Then, to list files with comments, use a script or program that grabs the extended attribute. Here is a simple example to use as a starting point; it just lists the files in the current directory:
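#!/bin/sh
# List files in the current directory, appending user.comment when set.
# Globbing instead of parsing ls output keeps odd file names intact.
for FILE in *; do
    [ -e "$FILE" ] || continue
    comment=$(xattr -p user.comment "$FILE" 2>/dev/null)
    if [ -n "$comment" ]; then
        echo "$FILE Comment: $comment"
    else
        echo "$FILE"
    fi
done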
The xattr command is really slow and poorly written (it doesn't even return error status) so I suggest something else if possible. Use setfattr and getfattr in a more complex script than what I have provided. Or maybe a custom ls command that is aware of the user.comment attribute.
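A version built on setfattr and getfattr might look roughly like the following. This is only a sketch: it assumes the attr tools are installed and that the filesystem allows user.* extended attributes, and the function names are invented for the example.

```shell
# Hypothetical helpers around setfattr/getfattr (from the attr package).

# set_comment FILE TEXT: store TEXT in the user.comment attribute of FILE.
set_comment() {
    setfattr -n user.comment -v "$2" "$1"
}

# get_comment FILE: print the user.comment attribute, or nothing if unset.
get_comment() {
    getfattr --only-values -n user.comment "$1" 2>/dev/null
}

# list_with_comments DIR: list the entries of DIR, appending comments.
# The body runs in a subshell so the cd does not affect the caller.
list_with_comments() (
    cd "$1" || exit 1
    for f in *; do
        [ -e "$f" ] || continue
        c=$(get_comment "./$f")
        if [ -n "$c" ]; then
            printf '%s\tComment: %s\n' "$f" "$c"
        else
            printf '%s\n' "$f"
        fi
    done
)
```

Usage would be along the lines of set_comment my_data.csv "Raw data which was sent to me." followed by list_with_comments . to see it next to the file name.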

This is a moderately serious challenge. Basically, you want to add attributes to files, keep the attributes when the file is copied or moved, and then modify ls to display the values of these attributes.
So, here's how I would attack the problem.
1) Store the information in an SQLite database. You can probably get away with one table. The table should contain the complete path to the file, and your comment. I'd name the database something like ~/.dirinfo/dirinfo.db. I'd store it in a subfolder, because you may find later on that you need other information in this folder. It'd be nice to use inodes rather than pathnames, but they change too frequently. Still, you might be able to do something where you store both the inode and the pathname, and retrieve by pathname only if the retrieval by inode fails, in which case you'd then update the inode information.
2) Write a bash script to create/read/update/delete the comment for a given file.
3) Write another bash function or script that works with ls. I wouldn't call it "ls" though, because you don't want to mess with all the command line options that are available to ls. You're going to be calling ls always as ls -1 in your script, possibly with some sort options, such as -t and/or -r. Anyway, your script will call ls -1 and loop through the output, displaying the file name and the comment, which you'll look up using the script from 2). You may also want to add file size, but that's up to you.
4) Write functions to replace mv and cp (and ln??). These would be wrapper functions that would update the information in your table, and then call the regular Unix versions of these commands, passing along any arguments received by the functions (i.e. "$@"). If you're really paranoid, you'd also do it for things like scp, which can be used (inefficiently) to copy files locally. Still, it's unlikely you'll catch all the possibilities. What if someone else, who doesn't have the functions you have, does a mv on your file? What if some script moves the file by calling /bin/mv? You can't easily get around these kinds of issues.
Or, if you really wanted to get adventurous, you'd write some C/C++ code to do this. It'd be faster, and honestly not all that much more challenging, provided you understand fork() and exec(). SQLite is itself a C library, so you'd work with its C API directly; since you only have one database and one table, that shouldn't be too challenging.
You could do it in perl, too, but I'm not sure that it would be that much easier in perl, than in bash. Your actual code isn't that complex, and you're not likely to be doing any crazy regex stuff or string manipulations. There are just lots of small pieces to fit together.
Doing all of this is much more work than should be expected for a person answering a question here, but I've given you the overall design. Implementing it should be relatively easy if you follow the design above and can live with the constraints.
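Steps 1) and 2) of that design could be sketched as follows. This is a rough illustration only: it assumes the sqlite3 command-line tool and GNU realpath are installed, and the table layout, the DIRINFO_DB variable and the function names are all invented here. It keys on the pathname alone and leaves out the inode fallback.

```shell
# Hypothetical location, following the ~/.dirinfo/dirinfo.db suggestion.
DIRINFO_DB=${DIRINFO_DB:-$HOME/.dirinfo/dirinfo.db}

# Double any single quotes so a value can sit inside an SQL string literal.
sql_quote() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# Create the database directory and the single table if needed.
dirinfo_init() {
    mkdir -p "$(dirname "$DIRINFO_DB")" &&
    sqlite3 "$DIRINFO_DB" \
        'CREATE TABLE IF NOT EXISTS comments (path TEXT PRIMARY KEY, comment TEXT NOT NULL);'
}

# dirinfo_set FILE COMMENT: create or update the comment for FILE.
dirinfo_set() (
    path=$(realpath "$1") || exit 1
    sqlite3 "$DIRINFO_DB" \
        "INSERT OR REPLACE INTO comments (path, comment) VALUES ('$(sql_quote "$path")', '$(sql_quote "$2")');"
)

# dirinfo_get FILE: print the stored comment, or nothing if there is none.
dirinfo_get() (
    path=$(realpath "$1") || exit 1
    sqlite3 "$DIRINFO_DB" \
        "SELECT comment FROM comments WHERE path = '$(sql_quote "$path")';"
)
```

A listing script as in step 3) would then call dirinfo_get for each name it prints, and the mv/cp wrappers from step 4) would update the path column before delegating to the real commands.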

Program to list files of a process in Linux

I need a program to list all the files that are accessed/opened by a process in Linux.
It should work like this:
Output: the full path of the files that the process is accessing.
I don't want to use the 'lsof' utility or any other utility.
Is there any way to achieve this programmatically?

If you want just the files that are accessible through the open file descriptors of the process with pid 1234, list the /proc/1234/fd/ directory (most of the entries are symlinks). You'll also get additional details through /proc/1234/fdinfo/
Try
ls -l /proc/self/fd/
to get an idea of what these files contain.
Programmatically, you could use readdir(3) after opendir(3) on these directories (and also readlink(2), at least for entries in /proc/1234/fd/ ....). See also proc(5)
Notice that /proc/ is Linux specific. Some other Unixes have it (e.g. Solaris), with very different contents, properties, semantics.
If you care also about files which have been opened and closed in the past by some process, it is much more difficult. See also inotify(7) and ptrace(2)...
To convert a file path to a "canonical" absolute file path, use realpath(3).
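The same steps (enumerate /proc/PID/fd, then resolve each symlink) can be tried out from the shell before writing the C version. The sketch below is only an illustration with a made-up function name; it uses the readlink utility where a C program would call readlink(2), and it only works on Linux.

```shell
# list_open_files [PID]
# Print the filesystem paths behind the open file descriptors of PID
# (defaults to the current shell). Hypothetical helper; Linux /proc only.
# The body runs in a subshell so its variables stay local.
list_open_files() (
    pid=${1:-$$}
    for fd in /proc/"$pid"/fd/*; do
        # Descriptors can close between the glob and the readlink; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            # Real files resolve to absolute paths; sockets and pipes
            # appear as socket:[...] or pipe:[...] and are skipped.
            /*) printf '%s\n' "$target" ;;
        esac
    done
)
```

Reading another user's process generally requires root, just as with ls -l /proc/1234/fd/.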

multiple file view like DB-view

Is it possible, using bash, to create a view/virtual file that when opened combines 2 files into 1?
example:
FILE_META_1.txt
FILE_META_2.txt
combines into
FILE_META.txt

In general, this is not possible. I assume you mean you want to logically link 2 files without creating a 3rd file that is the sum of the 2 files. I've often wanted this feature also. It would have to be done at the kernel level or via a special file system, maybe use FUSE. UnionFS provides this for directories, but not for files. FuseFile looks like it does what you want. Also take a look at the Logic File System.

You can read them as a single stream with process substitution:
cat <(cat FILE_META_1.txt; cat FILE_META_2.txt;)
Here <(...) expands to a path (typically /dev/fd/N on Linux, or a named pipe on systems without /dev/fd) which you can open and read like a file.

How can you tell what files are currently open by any user?

I am trying to write a script or a piece of code to archive files, but I do not want to archive anything that is currently open. I need to find a way to determine which files in a directory are open. I want to use either Perl or a shell script, but can try other languages if needed. It will be in a Linux environment and I do not have the option to use lsof. I have also had inconsistent results with fuser. Thanks for any help.
I am trying to take log files in a directory and move them to another directory. If the files are open however, I do not want to do anything with them.

You are approaching the problem incorrectly. You wish to keep files from being modified underneath you while you are reading, and cannot do that without operating system support. The best that you can hope for in a multi-user system is to keep your archive metadata consistent.
For example, if you are creating the archive directory, make sure that the number of bytes stored in the archive matches the directory. You can checksum the file contents before and after reading the filesystem and compare that with what you wrote to the archive and perhaps flag it as "inconsistent".
What are you trying to accomplish?
Added in response to comment:
Look at logrotate to steal ideas about how to handle this consistently, or just have it do the work for you. If you are concerned that renaming files will break processes that are currently writing to them, take a look at man 2 rename:
rename() renames a file, moving it
between directories if required. Any
other hard links to the file (as
created using link(2)) are unaffected.
Open file descriptors for oldpath are
also unaffected.
If newpath already exists it will be atomically replaced (subject
to a few conditions; see ERRORS
below), so that there is no point at
which another process attempting to
access newpath will find it missing.

Try ls -l /proc/*/fd/* as root.
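Building on that, a script could check one candidate file against every descriptor it is allowed to see. The following is a sketch under assumptions: the function name is_open is invented, GNU realpath is assumed, and the result is only a snapshot, since a process may open the file right after the check. Without root it only sees the caller's own processes.

```shell
# is_open FILE
# Exit status 0 if some visible process has FILE open, 1 if none does,
# 2 if FILE cannot be resolved. Hypothetical helper; Linux /proc only.
# The body runs in a subshell so its variables stay local.
is_open() (
    target=$(realpath "$1") || exit 2
    for fd in /proc/[0-9]*/fd/*; do
        # Compare the symlink target of each descriptor with the file.
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] && exit 0
    done
    exit 1
)
```

In the archiving loop this would be used as is_open "$f" || mv "$f" "$archive_dir"/ (with archive_dir standing in for the destination directory).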

msw has answered the question correctly, but if you want to find the list of open files, the lsof command will give it to you.
