Finding out if a folder is in a copying process on Linux

Is there any way to find out if a folder is currently being copied?
To be more specific:
I have a folder on a share drive which someone else copies there, and I need to use it. But at the moment I access it (let's assume I have already checked that it exists), the copying process may still be ongoing.
I want to check this from a bash/python script.

Try lsof - list open files
lsof +d /path/to/some/directory
Here is an example with a huge copy:
mkdir /tmp/big
cd /tmp/big
# Create a 1 GB file
perl -e 'for(1..10000000) { print "x"x100 . "\n" }' > huge
# Start the cp process in the background; it will take a few seconds
cp -r /tmp/big /tmp/huge &
$ lsof +d /tmp/big
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
cp 4291 felix 3r REG 8,1 1010000000 2752741 /tmp/big/huge
man lsof
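If you only need a yes/no answer in a script, lsof's exit status is enough: it exits non-zero when it finds no open files. A minimal sketch (the path is a placeholder; note that +D, unlike +d, also descends into subdirectories):
if lsof +D /path/to/some/directory > /dev/null 2>&1; then
    echo "some process still has files open here - the copy may be in progress"
else
    echo "no process has files open under this directory"
fi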

Related

Remove an ext2 rootfs file while it's already mounted

What happens after mounting a filesystem from a file?
Example:
I have a rootfs.ext2 file which is located in a data directory and mounted under the /mnt directory:
mount rootfs.ext2 /mnt
After removing rootfs.ext2, I can still use the files under the /mnt directory: cat files, run binaries, etc.
rm -f rootfs.ext2
I was thinking that the rootfs.ext2 file must still exist in the data directory even though it was deleted. For test purposes, I filled the whole data directory with new data from /dev/urandom (to overwrite whatever data was there before):
cat /dev/urandom > /data/Filling
Even after filling all the space in the data directory, I can still access /mnt and run binaries.
The question is: what happens to the file after mounting it, and why can I still operate through it? Can I delete the rootfs.ext2 file (if it's mounted under /) without undefined behavior of the system (binaries keep running, full access to the filesystem, etc.)?
Links to documentation are appreciated.
Linux (and Unix) filesystems have several features that allow that.
Inodes
Data (the thing you get when you run cat) and metadata (what you get from stat and ls) are stored in inodes ("indexed nodes"), which are like a key-value store. Inodes are indexed in the sense that an inode is referred to by its ID, the inode number.
That means that the data in rootfs.ext2 is stored in an inode.
Hard Links
Files inside directories are represented as directory entries. A directory entry is a pair of name and inode number.
You can think of directories as hashtables, where the key is the name, and the value is the inode number.
The full path that a directory entry represents is called a hard link to that inode.
That means that multiple directory entries, in different directories or even in the same directory, can point to the same inode number.
You can create that by running:
$ echo hello > x1
$ cat x1
hello
$ ls -li x1
1956 -rw-r----- 1 root root 6 2022-09-03 21:26 x1
$ ln -v x1 x2
'x2' => 'x1'
$ cat x2
hello
$ ls -li x1 x2
1956 -rw-r----- 2 root root 6 2022-09-03 21:26 x1
1956 -rw-r----- 2 root root 6 2022-09-03 21:26 x2
ln, by default, creates a hard link.
ls -i prints the inode number, and you can see that in the above example, x1 and x2 have the same inode number, and are therefore both hard links to that inode.
You can also see that the first ls prints 1 before root - that's the number of hard links that inode 1956 has. You see it increasing to 2 after x2 is created.
What this means is that rootfs.ext2 is a hard link that points to the inode that actually holds the filesystem.
Reference Count
Every inode has a reference count.
When the file is not open anywhere, the inode's reference count is equal to its hard link count.
But if the file is opened, the open file is another reference.
For example:
$ exec 8<>x2 # opens x2 for read & write as file descriptor 8
$ cat /proc/self/fd/8
hello
Because this is reference counting, an inode can have 0 hard links, but still have references. Continuing the above example, with the file still open:
$ rm -v x1 x2
removed 'x1'
removed 'x2'
$ ls -li
total 0
$ cat /proc/self/fd/8
hello
The hard links that point to the inode are gone, but the open file still points to the inode, so the inode is not deleted.
(BTW if you check, you'll see that /proc/self/fd/8 is actually not another hard link to that inode, but rather a symbolic link. However, the fact that you can still read the inode's data indicates that the inode wasn't deleted)
Internally Open Files
Opening a file from userspace, like we did above with exec 8<>x2, is just one way to open files.
Many things in the Linux kernel internally open files. For example:
The swap file is internally open
When a program is executed, its executable file is internally open while the program is running, as are the dynamically linked libraries it uses (see the sketch after this list).
As long as a block device is mounted, the inode that represents it is internally open.
When a socket is created, it is internally represented as an open file.
When a block device is set to be a loop device, it keeps the backing file open.
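A quick way to see the point about executables (a sketch; /tmp/mysleep is an arbitrary name): copy a binary, run it, and delete it while it runs - the kernel still holds the inode open:
$ cp /bin/sleep /tmp/mysleep
$ /tmp/mysleep 100 &
$ rm /tmp/mysleep
$ readlink /proc/$!/exe   # prints something like: /tmp/mysleep (deleted)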
Loop Mounts
When you run mount rootfs.ext2 /mnt, what actually happens is that mount creates a block device, e.g. /dev/loop9, then opens rootfs.ext2 and configures /dev/loop9 as a loop device backed by the open file descriptor for rootfs.ext2.
As noted above, that means that as long as the block device is configured as a loop device for that file descriptor, the rootfs.ext2 inode remains open, therefore has a reference count > 0, and is therefore not deleted.
In fact, even if you deleted the loop device itself, the data would still be available, because that block device is also internally open, meaning both the backing regular file (rootfs.ext2) and the block device (/dev/loop9) are kept open:
$ sudo mount rootfs.ext2 /mnt/test/
$ echo hello > /mnt/test/x
$ losetup --list
NAME SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE DIO LOG-SEC
/dev/loop9 0 0 1 0 /tmp/rootfs.ext2 0 512
$ rm -v rootfs.ext2
removed 'rootfs.ext2'
$ sudo rm -v /dev/loop9
removed '/dev/loop9'
$ cat /mnt/test/x
hello
$ sudo umount /mnt/test
$ ls /mnt/test/
$
Extra Credit: Open Directories
Inodes contain whatever data and/or metadata is needed. Regular files, like rootfs.ext2, are represented as inodes. But directories are also inodes, as well as block devices, pipes, sockets, etc.
This means that directories have reference counts too, and that they too are opened. Famously via opendir(), but also internally:
When you call something like open("/etc/passwd"), the inode of the root directory (/) is briefly opened to look up etc, and the inode for /etc is briefly opened to look up passwd.
The working directory of every process is always internally open - if you delete it from another process, the first process can still run ls in it. However, it will not be able to create new files in it (see the sketch after this list).
When a directory is a mount point, it is internally open.
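A sketch of the working-directory point (the directory name is arbitrary):
$ mkdir /tmp/gone && cd /tmp/gone
$ rmdir /tmp/gone   # e.g. from another shell
$ ls                # still works; the directory is just empty
$ touch x
touch: cannot touch 'x': No such file or directory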
You can unmount a mount point that is still in use, because every such "use" is counted as a reference:
$ sudo mount rootfs.ext2 /mnt/test/
$ cd /mnt/test/
$ echo hello > x
$ sudo umount --lazy /mnt/test
$ cat x
hello
$ cd / # reference count of what was mounted on /mnt/test drops to 0
$ cd /mnt/test
$ cat x
cat: x: No such file or directory

Bash: display processes in a specific folder

I need to display the processes that are running in a specific folder.
For example, there are folders "TEST" and "RUN". 3 SQL files are running from TEST, and 2 from RUN. So when I use the command ps xa, I see all processes run from TEST and RUN together. What I want is to see only the processes run from the TEST folder, so only 3. Any commands or solutions to do this?
You can use lsof for this:
lsof | grep '/path/of/RUN'
If you want to include both RUN and TEST in the same command:
lsof | grep -E '/path/of/RUN|/path/of/TEST'
Hope it helps.
You can try fuser to see which processes have particular files open; or, on Linux, examine the /proc/12345/cwd symlink for each of the candidate processes (replace 12345 with the process id of each).
fuser TEST/*.sql
for proc in /proc/[1-9]*; do
    readlink "$proc/cwd" 2>/dev/null | grep -q TEST && echo "$proc"
done
The latter is not portable to other U*xes, though some may offer similar facilities.

How do I find out which process is using a file in Linux?

I tried to remove a file in Linux using rm -rf file_name, but got the error:
rm: file_name not removed. Text file busy
How can I find out which process is using this file?
You can use the fuser command, which is part of the psmisc package, like:
fuser file_name
You will receive a list of processes using the file.
You can use different flags with it to get more detailed output.
You can find more info in fuser's Wikipedia article or in the man pages.
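For example, the -v flag gives verbose, ps-like output (a sketch; file_name as in the question):
fuser -v file_name   # columns: USER, PID, ACCESS, COMMAND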
@jim's answer is correct - fuser is what you want.
Additionally (or alternatively), you can use lsof to get more information, including the username, in case you need permission to kill the process (without having to run an additional command). (Though of course, if killing the process is what you want, fuser can do that with its -k option. You can have fuser send other signals with the -s option - check the man page for details.)
For example, with a tail -F /etc/passwd running in one window:
ghoti#pc:~$ lsof | grep passwd
tail 12470 ghoti 3r REG 251,0 2037 51515911 /etc/passwd
Note that you can also use lsof to find out what processes are using particular sockets. An excellent tool to have in your arsenal.
For users without fuser:
Although we can use lsof, there is another way: we can query the /proc filesystem itself, which lists all files opened by all processes.
# ls -l /proc/*/fd/* | grep filename
Sample output below:
l-wx------. 1 root root 64 Aug 15 02:56 /proc/5026/fd/4 -> /var/log/filename.log
From the output, one can use the process ID with a utility like ps to find the program name:
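For example, with the PID 5026 from the sample output above:
$ ps -p 5026 -o pid,user,comm   # prints the PID, the owning user, and the program name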
You can also grep lsof's output for the folder name, e.g.:
$ lsof | grep MyFold

Cgi-bin script to cat a file owned by a user

I'm using Ubuntu server and I have a cgi-bin script doing the following:
#!/bin/bash
echo Content-type: text/plain
echo ""
cat /home/user/.program/logs/file.log | tail -400 | col -b > /tmp/o.txt
cat /tmp/o.txt
Now, if I run this script while I am su, the script fills o.txt, and then host.com/cgi-bin/script runs, but it only shows output up to the point where I last ran the script from the CLI.
My Apache error log is showing "permission denied" errors, so I know the user Apache runs as cannot cat this file. I tried using chown to no avail. Since this file is in a user directory, what is the best way to either duplicate it, symbolically link it, or something else?
I even considered running the script as root in a crontab to sort of "update" the file in /tmp/, but that did not work for me. How would somebody experienced with cgi-bin handle access to a file in a user's directory?
The Apache user www-data does not have write access to a temporary file owned by another user.
But in this particular case, no temporary file is required.
tail -n 400 logfile | col -b
However, if Apache is running in a restricted chroot, it also has no access to /home.
The log file needs to be chmod o+r and all directories leading down to it should be chmod o+x. Make sure you understand the implications of this! If the user has a reason to want to prevent access to an intermediate directory, having read access to the file itself will not suffice. (Making something have www-data as its group owner is possible in theory, but impractical and pointless, as anybody who finds the CGI script will have access to the file anyway.)
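With the paths from the question, that would look something like this (a sketch):
chmod o+r /home/user/.program/logs/file.log
chmod o+x /home/user /home/user/.program /home/user/.program/logs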
More generally, if you do need a temporary file, the simple fix (not even a workaround) is to generate a unique temporary file name, and remove it afterwards.
temp=$(mktemp -t cgi.XXXXXXXX) || exit $?
trap 'rm -f "$temp"' 0
trap 'exit 127' 1 2 15
tail -n 400 logfile | col -b >"$temp"
The first trap makes sure the file is removed when the script terminates. The second makes sure the first trap runs if the script is interrupted or killed.
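Putting it together with the script from the question, the whole CGI script might look like this (a sketch; paths as in the question):
#!/bin/bash
echo "Content-type: text/plain"
echo ""
temp=$(mktemp -t cgi.XXXXXXXX) || exit $?
trap 'rm -f "$temp"' 0
trap 'exit 127' 1 2 15
tail -n 400 /home/user/.program/logs/file.log | col -b >"$temp"
cat "$temp"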
I would be inclined to change the program that creates the log in the first place and write it to some place visible to Apache - maybe through symbolic links.
For example:
ln -s /var/www/cgi-bin/logs /home/user/.program/logs
So your program continues to write to /home/user/.program/logs but the data actually lands in /var/www/cgi-bin/logs where Apache can read it.

How do I find out what process has a lock on a file in Linux?

Today I had the problem that I couldn't delete a folder because "it was busy".
How can I find out which application to blame for that or can I just delete it with brute force?
Use lsof to find out which processes have which files open.
See man lsof for the details.
The fuser Unix command will give you the PIDs of the processes accessing a file.
lslocks (part of util-linux) lists information about all the currently held file locks in a Linux system. This utility has support for JSON output, which is nice for scripts.
~$ sudo lslocks
COMMAND PID TYPE SIZE MODE M START END PATH
cron 873 FLOCK 4B WRITE 0 0 0 /run/crond.pid
..
..
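The JSON output makes post-processing easy, for example with jq (a sketch, assuming jq is installed and that lslocks puts its rows under a top-level "locks" key):
$ sudo lslocks --json | jq -r '.locks[] | "\(.pid) \(.command) \(.path)"'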
fuser will show you which processes are accessing a file or directory.
