How can I tar a file that is being used by another process?

I'm archiving a directory. This directory contains a file that is being written to by another process. When I tar the directory using Linux tar or the Perl Tar module, the archive contains an entry for the file, but its contents are empty.
Before tarring, the files are:
-rw-r--r-- 1 irraju dba 28 Feb 18 02:22 a
-rw-r--r-- 1 irraju dba 25 Feb 18 02:23 b
-rw-r--r-- 1 irraju dba 29 Feb 18 03:38 c
After untarring
-rw-r--r-- irraju/dba 28 2009-02-18 02:22:58 a
-rw-r--r-- irraju/dba 25 2009-02-18 02:23:17 b
-rw-r--r-- irraju/dba 0 2009-02-18 03:33:12 c
How can I fix this problem? I want the file to be in the archive with the contents it has at the instant it is archived. The file may be a log file, and assume that we cannot close the file handle before tarring.

As you tagged the question with "Linux" there's a chance you're using an LVM partition.
If indeed you're running on an LVM partition, you can use the LVM snapshot feature.
See the relevant LVM documentation on how to perform the operation.
Here's a part of the LVM snapshot intro:
A wonderful facility provided by LVM is 'snapshots'. This allows the administrator to create a new block device which presents an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data. When the snapshot device has been finished with the system administrator can just remove the device. This facility does require that the snapshot be made at a time when the data on the logical volume is in a consistent state - the VFS-lock patch for LVM1 makes sure that some filesystems do this automatically when a snapshot is created, and many of the filesystems in the 2.6 kernel do this automatically when a snapshot is created without patching.
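If you are on LVM, a minimal sketch of the snapshot approach (the volume group vg0, logical volume data, and mount point are placeholders for your actual setup):
# create a temporary copy-on-write snapshot of the volume holding the directory
lvcreate --size 1G --snapshot --name backup-snap /dev/vg0/data
mkdir -p /mnt/snap
mount -o ro /dev/vg0/backup-snap /mnt/snap
# archive the frozen view, then discard the snapshot
tar -cf /tmp/backup.tar -C /mnt/snap .
umount /mnt/snap
lvremove -f /dev/vg0/backup-snap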

Try copying the files first...
cp a a.tmp
cp b b.tmp
cp c c.tmp
...then tarball everything together...
tar -cf abc.tar *.tmp
...and clean up:
rm *.tmp
If that doesn't work then the process holding the file handle doesn't want to share read access...

You may find that this depends on the filesystem used and the application that is accessing the file. The closest to a generic solution is to use a filesystem that supports snapshots and create a snapshot before running tar.

In your listings, c's timestamp after untarring (03:33) is earlier than its timestamp before tarring (03:38), which can't be right if the second listing was made after the first. I'm guessing tar is right here: when it did its job, the file was empty. You may be dealing with a race condition.

As others have said, it depends on the file system and OS being used. sync first (or whatever the equivalent is on your file system), copy the files to a temp directory, and then tar those copies up, as in the sketch below. If the file system won't allow you to copy an open file, then you're SOL; Perl can't get around file system limitations.
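A rough sketch of that approach, using the file names from the question (mktemp is just one way to get a scratch directory):
sync                              # flush pending writes to disk (best effort)
tmpdir=$(mktemp -d)
cp a b c "$tmpdir"                # copy while the writer keeps its own handle
tar -cf abc.tar -C "$tmpdir" .    # archive the stable copies
rm -rf "$tmpdir"
Note that the copy can still catch the log file mid-write; this narrows the race window, it doesn't eliminate it.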

Related

What happens if I create a file using vim in the /dev directory? How will the file be created, given that /dev is not a standard file system?

What happens if I create a file using vim in the /dev directory? How will the file be created, given that /dev is not a standard file system? I can see a file being created, but the standard kernel file operation create was not called. Now I am not sure how this file was created by the kernel. Will it use some udev-bound kernel API to create this file?
Note : I can see the file in /dev after creation. Look at the ls output below.
crw-rw-rw- 1 root tty 5, 0 Aug 24 17:32 tty
-rw-r--r-- 1 root root 35 Aug 24 17:37 abc
-rw-r--r-- 1 root root 0 Aug 24 17:37 ght
-rw-r--r-- 1 root root 0 Aug 24 17:51 ioiu
I want to find this out to determine what will happen if some malicious software forcefully writes to the /dev directory. How can I find that out?
If you try this on macOS it won't work, even as root.
If you try it on CentOS 8 it will work if you're root.
On other Linux flavors, your mileage may vary.
It is a very interesting directory that highlights one important aspect of the Linux filesystem - everything is a file or a directory.
Example
[root]# date > /dev/date
[root]# cat /dev/date
Tue Aug 24 19:13:04 UTC 2021
All that being said, your concern about nefarious software creating a file in this specific directory seems too specific. If the software has the ability to write to /dev, it can write anywhere and hide in plain sight. If you're really concerned about this, install a file integrity monitoring (FIM) package to monitor file CRUD.

What kind of file is the smp_affinity file?

The IRQ affinity can be set by writing a bit mask to /proc/irq/<irqid>/smp_affinity.
I guess there is a kernel module behind smp_affinity; however, ls tells me it is a normal file:
# ls
-rw-r--r-- 1 root root 0 Feb 9 16:06 smp_affinity
So I wonder, what kind of file /proc/irq/<irqid>/smp_affinity is?
Read about procfs - https://man7.org/linux/man-pages/man5/procfs.5.html https://en.wikipedia.org/wiki/Procfs etc.
smp_affinity is a file inside the /proc filesystem. File operations on that file are handled specially by the kernel: instead of storing or retrieving data on some non-volatile medium, reading or writing makes the kernel execute a special function with special semantics.
The file would be created somewhere in kernel/irq/proc.c.
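You can observe the special semantics from the shell; for example (IRQ 30 is an arbitrary example, and writing requires root):
cat /proc/irq/30/smp_affinity      # read the current CPU bitmask, e.g. "f"
echo 2 > /proc/irq/30/smp_affinity # restrict IRQ 30 to CPU 1 (bitmask 0x2)
Neither command touches any disk block; the kernel handler behind the /proc entry services the read and the write directly.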

Buildroot doesn’t run as root and doesn’t want to run as root

I have 2 questions:
I am not sure I understand (from the directories description in the Buildroot manual):
target/ which contains almost the complete root filesystem for the target: everything needed is present except the device files in /dev/ (Buildroot doesn't run as root and doesn't want to run as root)
Why does Buildroot need to be root to create /dev?
What I know is that Buildroot uses target to generate images/rootfs.tar; is it a simple compression with tar or ...? Could you please help me find the make target that generates images/rootfs.tar?
In case of using NFS, why can't we use the target folder directly as rootfs? What makes "untarring" images/rootfs.tar different from target?
Ref: http://free-electrons.com/~thomas/buildroot/manual/html/ch03.html
I am not sure I understand (from the directories description in the Buildroot manual):
Buildroot, a tool for generating a kernel and root filesystem, is executed on your host system as a normal user, without needing superuser privileges.
Why buildroot need to be root to create the /dev
Buildroot does not use superuser privileges.
What I know is that Buildroot uses target to generate images/rootfs.tar; is it a simple compression with tar or ...?
The .tar is an ordinary archive without compression.
You can configure/specify compression (and/or select filesystem images) using the make menuconfig procedure.
Could you please help me find the make target that generates images/rootfs.tar?
You do not specify this in the make shell command.
You can configure/specify tar and/or cpio archives with optional compression (and/or select filesystem images) using the make menuconfig procedure.
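As an illustration, after enabling those options you should see symbols along these lines in the generated .config (exact names may vary between Buildroot versions, so verify against your menuconfig):
$ grep '^BR2_TARGET_ROOTFS_TAR' .config
BR2_TARGET_ROOTFS_TAR=y
BR2_TARGET_ROOTFS_TAR_GZIP=y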
In case of using NFS, why can't we use the target folder directly as rootfs?
Because it is not suitable as a rootfs.
File owners & groups are incorrect (this could be irrelevant for NFS usage).
File permissions may not be correct (e.g. setuid for the busybox binary).
The /dev directory does not have the minimal device nodes that the target kernel requires.
Instead of the required minimal device nodes (e.g. console), the target directory has ordinary files in dev:
buildroot-2015.05/output/target$ ls -l dev
total 4
-rw--w--w- 1 me swdev 0 Sep 15 16:34 console
lrwxrwxrwx 1 me swdev 10 Aug 14 2015 log -> ../tmp/log
drwxrwxr-x 2 me swdev 4096 May 31 2015 pts
$
The target kernel cannot use these files where it expects device nodes. Instead of I/O going through device drivers, ordinary file reads and writes would be attempted on these files.
The actual dev directory should be:
crw--w--w- 1 root root 5, 1 Sep 15 16:34 console
lrwxrwxrwx 1 root root 10 Aug 14 2015 log -> ../tmp/log
drwxr-xr-x 2 root root 4096 May 31 2015 pts
What makes "untarring" images/rootfs.tar different from target?
Buildroot can cleverly create entries for the device nodes and assign the proper owner and group to each filename as it creates the archive (or filesystem image).
This simply means generating binary data in the appropriate format, written into the archive (or the filesystem image) alongside the entries for actual files. Only when the archive is unpacked (or the filesystem image is mounted) is that data properly interpreted as device nodes.
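As a sketch of the general technique (paths are illustrative): Buildroot drives its packaging step under fakeroot, which intercepts chown/mknod at the library level, so tar records root ownership and real device nodes even though nothing runs as root. All commands must run inside the same fakeroot session for the faked metadata to persist:
fakeroot sh -c '
  chown -R root:root target/       # faked, but recorded by tar as real
  mknod target/dev/console c 5 1   # faked device node, major 5 minor 1
  tar -cf images/rootfs.tar -C target/ .
'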

What is there behind a symbolic link?

How are symbolic links managed internally by UNIX/Linux systems? It is known that a symbolic link may exist even without an actual target file (a dangling link). So what represents a symbolic link internally?
In Windows, the answer is a reparse point.
Questions:
Is the answer an inode in UNIX/Linux?
If yes, then will the inode number be same for target and links?
If yes, can the link inode have permissions different from those of the target's inode (if one exists)?
It is not about UNIX/Linux but about filesystem implementation - but yes, Unix/Linux uses inodes at the kernel level, and filesystem implementations have inodes (at least virtual ones).
In general, symbolic links are simply files (btw, directories are also files) that have:
a file-type flag in the inode that tells the system this file is a "symbolic link"
file content: the path to the target - in other words, a symbolic link is simply a file which contains a filename, plus a flag in the inode.
Virtual filesystems can have symbolic links too, so check FUSE or some other filesystem implementation sources (ext2/ext3/ufs, etc.).
So,
Is the answer an inode in UNIX/Linux?
depends on the filesystem implementation, but yes, generally the inode contains a "file-type" (and owners, access rights, timestamps, size, pointers to data blocks). There are filesystems that don't have inodes (in the physical implementation) but have only "virtual inodes" to maintain compatibility with the kernel.
If yes, then will the inode number be same for target and links?
No. Usually, the symlink is a file with its own inode (with its own file-type, data blocks, etc.).
If yes, can the link inode have permissions different from those of the target's inode (if one exists)?
This is about how symlink files are handled. Usually, the kernel doesn't allow changes to symlink permissions - symlinks always have default permissions. You could write your own filesystem that allowed different permissions for symlinks, but you would get into trouble because common programs like chmod don't change permissions on symlinks themselves, so making such a filesystem would be pointless anyway.
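You can check this yourself; on Linux, chmod follows the link rather than changing it (file names are arbitrary):
$ touch a
$ ln -s a b
$ chmod 600 b    # changes the permissions of a, not of the symlink
$ ls -l a b      # b still shows lrwxrwxrwx; a is now rw-------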
To understand the difference between hard links and symlinks, you should understand directories first.
Directories are files (differentiated by a flag in the inode) that tell the kernel, "handle this file as a map of file-name to inode_number". Hard links are simply file names that map to the same inode. So if the directory-file contains:
file_a: 1000
file_b: 1001
file_c: 1000
the above means that in this directory there are 3 files:
file_a described by inode 1000
file_b described by inode 1001 and
file_c again described by inode 1000 (so it is a hard link with file_a, not a hard link to file_a - because it is impossible to tell which filename came first; they are identical).
This is the main difference from symlinks, where the inode of file_b (inode 1001) could have the content "file_a" and a flag meaning "this is a symlink". In this case, file_b would be a symlink pointing to file_a.
You can also easily explore this on your own:
$ touch a
$ ln -s a b
$ ln a c
$ ls -li
total 0
95905 -rw-r--r-- 1 regnarg regnarg 0 Jun 19 19:01 a
96990 lrwxrwxrwx 1 regnarg regnarg 1 Jun 19 19:01 b -> a
95905 -rw-r--r-- 2 regnarg regnarg 0 Jun 19 19:01 c
The -i option to ls shows inode numbers in the first column. You can see that the symlink has a different inode number, while the hard link has the same one. You can also use the stat(1) command:
$ stat a
File: 'a'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 28h/40d Inode: 95905 Links: 2
[...]
$ stat b
File: 'b' -> 'a'
Size: 1 Blocks: 0 IO Block: 4096 symbolic link
Device: 28h/40d Inode: 96990 Links: 1
[...]
If you want to do this programmatically, you can use the lstat(2) system call to find information about the symlink itself (its inode number etc.), while stat(2) shows information about the target of the symlink, if it exists. Example in Python:
>>> import os
>>> os.stat("b").st_ino
95905
>>> os.lstat("b").st_ino
96990

du -skh * in / returns a vastly different size from df on CentOS 5.5

I have a VPS slice running CentOS 5.5. I am supposed to have 15 gigs of disk space, but df reports roughly double the usage I can account for.
When I run du -skh * in / as root I get:
[root#yardvps1 /]# du -skh *
0 aquota.group
0 aquota.user
5.2M bin
4.0K boot
4.0K dev
4.9M etc
2.5G home
12M lib
14M lib64
4.0K media
4.0K mnt
299M opt
0 proc
692K root
23M sbin
4.0K selinux
4.0K srv
0 sys
48K tmp
2.0G usr
121M var
This is consistent with what I have uploaded to the machine, and adds up to about 5 gigs.
BUT when I run df I get:
[root#yardvps1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/simfs 15728640 11659048 4069592 75% /
none 262144 4 262140 1% /dev
It is showing me using almost 12 gigs already.
What is causing this discrepancy, and is there anything I can do about it? I planned the server out based on 15 gigs, but now it is basically only letting me have about 7 gigs of stuff on it.
Thanks.
The most common cause of this effect is open files that have been deleted.
The kernel will only free the disk blocks of a deleted file if it is not in use at the time of its deletion. Otherwise that is deferred until the file is closed, or the system is rebooted.
A common Unix-world trick to ensure that no temporary files are left around is the following:
A process creates and opens a temporary file
While still holding the open file descriptor, the process unlinks (i.e. deletes) the file
The process reads and writes to the file normally using the file descriptor
The process closes the file descriptor when it's done, and the kernel frees the space
If the process (or the system) terminates unexpectedly, the temporary file is already deleted and no clean-up is necessary.
As a bonus, deleting the file reduces the chances of naming collisions when creating temporary files and it also provides an additional layer of obscurity over the running processes - for anyone but the root user, that is.
This behaviour ensures that processes don't have to deal with files that are suddenly pulled from under their feet, and also that processes don't have to consult each other in order to delete a file. It is unexpected behaviour for those coming from Windows systems, though, since there you are not normally allowed to delete a file that is in use.
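You can reproduce the effect in a shell, assuming /tmp is a regular on-disk filesystem (the file name and size are arbitrary, and descriptor 3 stands in for a process holding the file open):
dd if=/dev/zero of=/tmp/bigfile bs=1M count=100   # allocate 100 MB
exec 3< /tmp/bigfile    # hold the file open on descriptor 3
rm /tmp/bigfile         # unlink it; du no longer sees it
df -h /tmp              # ...but df still counts the 100 MB
exec 3<&-               # close the descriptor; the kernel frees the blocks
df -h /tmp              # the space is reclaimed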
The lsof command, when run as root, will show all open files and will specifically mark those that have been deleted:
# lsof 2>/dev/null | grep deleted
bootlogd 2024 root 1w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
bootlogd 2024 root 2w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
Stopping and restarting the guilty processes, or just rebooting the server should solve this issue.
Deleted files could also be held open by the kernel if, for example, it's a mounted filesystem image. In this case unmounting the filesystem or rebooting the server should do the trick.
In your case, judging by the size of the "missing" space I'd look for any references to the file that you used to set up the VPS e.g. the Centos DVD image that you deleted after installing.
Another case which I've come across, although it doesn't appear to be your issue, is when you mount a partition "on top" of existing files.
If you do so you effectively hide existing files that exist in the directory on the mounted-to partition (the mount point) from the mounted partition.
To fix: stop any processes with open files on the mounted partition, unmount partition, find and move/remove any files that now appear in mount point directory.
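If you suspect this case, a bind mount gives you a second view of the filesystem without its submounts, letting you peek under an existing mount point (a sketch; assumes /mnt is an empty, unused directory):
mount --bind / /mnt    # a second view of the root fs, without anything mounted over it
du -sh /mnt/home       # sizes of whatever is hidden beneath the /home mount point
umount /mnt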
I had the same trouble with a FreeBSD server. A reboot helped.
