Detecting a chroot jail from within - Linux

How can one detect being in a chroot jail without root privileges? Assume a standard BSD or Linux system. The best I came up with was to look at the inode value for "/" and to consider whether it is reasonably low, but I would like a more accurate method for detection.
[edit 20080916 142430 EST] Simply looking around the filesystem isn't sufficient, as it's not difficult to duplicate things like /boot and /dev to fool the jailed user.
[edit 20080916 142950 EST] For Linux systems, checking for unexpected values within /proc is reasonable, but what about systems that don't support /proc in the first place?

The inode for / will always be 2 if it's the root directory of an ext2/ext3/ext4 filesystem, but you may be chrooted inside a complete filesystem. If it's just chroot (and not some other virtualization), you could run mount and compare the mounted filesystems against what you see. Verify that every mount point has inode 2.
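A rough sketch of that check (assuming GNU stat, and that only ext2/ext3/ext4 mounts are expected to have inode 2 at their root):
mount | awk '$5 ~ /^ext[234]$/ {print $3}' | while read -r mnt; do
    ino=$(stat -c %i "$mnt" 2>/dev/null)
    [ "$ino" = 2 ] || echo "mount point $mnt has inode $ino, not 2"
done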

On Linux with root permissions, test if the root directory of the init process is your root directory. Although /proc/1/root is always a symbolic link to /, following it leads to the “master” root directory (assuming the init process is not chrooted, but that's hardly ever done). If /proc isn't mounted, you can bet you're in a chroot.
[ "$(stat -c %d:%i /)" != "$(stat -c %d:%i /proc/1/root/.)" ]
# With ash/bash/ksh/zsh
! [ -x /proc/1/root/. ] || [ /proc/1/root/. -ef / ]
This is more precise than looking at /proc/1/exe because that could be different outside a chroot if init has been upgraded since the last boot or if the chroot is on the main root filesystem and init is hard linked in it.
If you do not have root permissions, you can look at /proc/1/mountinfo and /proc/$$/mountinfo (briefly documented in filesystems/proc.txt in the Linux kernel documentation). This file is world-readable and contains a lot of information about each mount point in the process's view of the filesystem. The paths in that file are restricted by the chroot affecting the reader process, if any. If the process reading /proc/1/mountinfo is chrooted into a filesystem that's different from the global root (assuming pid 1's root is the global root), then no entry for / appears in /proc/1/mountinfo. If the process reading /proc/1/mountinfo is chrooted to a directory on the global root filesystem, then an entry for / appears in /proc/1/mountinfo, but with a different mount id. Incidentally, the root field ($4) indicates where the chroot is in its master filesystem. Again, this is specific to Linux.
[ "$(awk '$5=="/" {print $1}' </proc/1/mountinfo)" != "$(awk '$5=="/" {print $1}' </proc/$$/mountinfo)" ]

If you are not in a chroot and the root filesystem is ext2/ext3/ext4, the inode for / will always be 2. You may check that using
stat -c %i /
or
ls -id /
Interesting, but let's try to find the path of the chroot directory. Ask stat which device / is located on:
stat -c %04D /
The first byte is the major number of the device and the last byte is the minor. For example, 0802 means major 8, minor 2. If you check in /dev, you will see this device is /dev/sda2. If you are root, you can directly create the corresponding device node in your chroot:
mknod /tmp/root_dev b 8 2
Now, let's find the inode associated with our chroot. debugfs allows listing the contents of directories by inode number. For example, ls -id / returned 923960:
sudo debugfs /tmp/root_dev -R 'ls <923960>'
923960 (12) . 915821 (32) .. 5636100 (12) var
5636319 (12) lib 5636322 (12) usr 5636345 (12) tmp
5636346 (12) sys 5636347 (12) sbin 5636348 (12) run
5636349 (12) root 5636350 (12) proc 5636351 (12) mnt
5636352 (12) home 5636353 (12) dev 5636354 (12) boot
5636355 (12) bin 5636356 (12) etc 5638152 (16) selinux
5769366 (12) srv 5769367 (12) opt 5769375 (3832) media
The interesting information is the inode of the .. entry: 915821. I can ask for its contents:
sudo debugfs /tmp/root_dev -R 'ls <915821>'
915821 (12) . 2 (12) .. 923960 (20) debian-jail
923961 (4052) other-jail
The directory called debian-jail has inode 923960, so the last component of my chroot dir is debian-jail. Now let's look at the parent directory (inode 2):
sudo debugfs /tmp/root_dev -R 'ls <2>'
2 (12) . 2 (12) .. 11 (20) lost+found 1046529 (12) home
130817 (12) etc 784897 (16) media 3603 (20) initrd.img
261633 (12) var 654081 (12) usr 392449 (12) sys 392450 (12) lib
784898 (12) root 915715 (12) sbin 1046530 (12) tmp
1046531 (12) bin 784899 (12) dev 392451 (12) mnt
915716 (12) run 12 (12) proc 1046532 (12) boot 13 (16) lib64
784945 (12) srv 915821 (12) opt 3604 (3796) vmlinuz
The directory called opt has inode 915821, and inode 2 is the root of the filesystem, so my chroot directory is /opt/debian-jail. Of course, /dev/sda2 may itself be mounted somewhere under another filesystem; you need to check for that (use lsof, or pick the information directly from /proc).
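As a shortcut, debugfs can also map an inode number back to a pathname in one step with its ncheck command (a sketch reusing the device node and inode number from above):
sudo debugfs /tmp/root_dev -R 'ncheck 923960'
This should report the path of the chroot directory relative to the root of that filesystem.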

Preventing stuff like that is the whole point. If it's your code that's supposed to run in the chroot, have it set a flag on startup. If you're hacking, hack: check for several common things in known locations, count the files in /etc, or look at something in /dev.

On BSD systems (check with uname -a), proc should always be present. Check whether the dev/inode pair of /proc/1/exe (use stat on that path; it resolves the link through the underlying hook rather than by its textual target) matches that of /sbin/init.
Checking the root for inode #2 is also a good one.
On most other systems, a root user can find out much faster by attempting the fchdir root-breaking trick. If it gets anywhere, you are in a chroot jail.

I guess it depends why you might be in a chroot, and whether any effort has gone into disguising it.
I'd check /proc; these files are automatically generated system information files. The kernel will populate these in the root filesystem, but it's possible that they don't exist in the chroot filesystem.
If the root filesystem's /proc has been bound to /proc in the chroot, then it is likely that there are some discrepancies between that information and the chroot environment. Check /proc/mounts for example.
Similarly, check /sys.
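For example, a rough check along those lines (paths in /proc/self/mounts are relative to the mount namespace's root, so mount points the kernel lists may not be visible from inside the chroot):
awk '{print $2}' /proc/mounts | while read -r mnt; do
    [ -e "$mnt" ] || echo "listed in /proc/mounts but not visible here: $mnt"
done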

I wanted the same information for a jail running on FreeBSD (as Ansible doesn't seem to detect this scenario).
On the FreeNAS distribution of FreeBSD 11, /proc is not mounted on the host, but it is within the jail. Whether this is also true on regular FreeBSD I don't know for sure, but procfs: Gone But Not Forgotten seems to suggest it is. Either way, you probably wouldn't want to try mounting it just to detect jail status and therefore I'm not certain it can be used as a reliable predictor of being within a jail.
I also ruled out using stat on / as certainly on FreeNAS all jails are given their own file system (i.e. a ZFS dataset) and therefore the / node on the host and in the jail both have inode 4. I expect this is common on FreeBSD 11 in general.
So the approach I settled on was using procstat on pid 0.
[root@host ~]# procstat 0
PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM
0 0 0 0 0 1234 - swapin - kernel
[root@host ~]# echo $?
0
[root@host ~]# jexec guest tcsh
root@guest:/ # procstat 0
procstat: sysctl(kern.proc): No such process
procstat: procstat_getprocs()
root@guest:/ # echo $?
1
I am making an assumption here that pid 0 will always be the kernel on the host, and there won't be a pid 0 inside the jail.
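A sketch of turning this into a shell test (FreeBSD, assuming procstat is installed and relying only on its exit status):
if procstat 0 >/dev/null 2>&1; then
    echo "probably running on the host"
else
    echo "probably running inside a jail"
fi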

If you entered the chroot with schroot, then you can check the value of $debian_chroot.
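A minimal sketch of that check (assuming a Debian-based system where /etc/debian_chroot exists inside the chroot, created by schroot's setup scripts or by hand, and is read into $debian_chroot by the default bashrc):
if [ -n "${debian_chroot:-}" ] || [ -r /etc/debian_chroot ]; then
    echo "inside an schroot-managed chroot"
fi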

Related

Buildroot doesn’t run as root and doesn’t want to run as root

I have 2 questions:
I am not sure I understand this (from the directories description in the Buildroot manual):
target/, which contains almost the complete root filesystem for the target: everything needed is present except the device files in /dev/ (Buildroot doesn't run as root and doesn't want to run as root)
Why does Buildroot need to be root to create /dev?
What I know is that Buildroot uses target to generate images/rootfs.tar; is it a simple compression with tar or ...? Could you please help me find the make target that generates images/rootfs.tar?
In the case of using NFS, why can't we use the target folder directly as the rootfs? What makes "untarring" images/rootfs.tar different from target?
Ref: http://free-electrons.com/~thomas/buildroot/manual/html/ch03.html
I am not sure I understand this (from the directories description in the Buildroot manual):
Buildroot, a tool for generating a kernel and root filesystem, is executed on your host system as a normal user, without the need for superuser privileges.
Why does Buildroot need to be root to create /dev?
Buildroot does not use superuser privileges.
What I know is that Buildroot uses target to generate images/rootfs.tar; is it a simple compression with tar or ...?
The .tar is an ordinary archive without compression.
You can configure/specify compression (and/or select filesystem images) using the make menuconfig procedure.
Could you please help me find the make target that generates images/rootfs.tar?
You do not specify this as a target on the make command line.
You can configure/specify tar and/or cpio archives with optional compression (and/or select filesystem images) using the make menuconfig procedure.
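For instance, the relevant options live under the "Filesystem images" menu of make menuconfig (a sketch; the exact wording may differ between Buildroot versions):
make menuconfig
#   Filesystem images --->
#       [*] tar the root filesystem
#             Compression method (gzip)  --->
#       [ ] cpio the root filesystem (for use as an initial RAM filesystem)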
In the case of using NFS, why can't we use the target folder directly as the rootfs?
Because it is not suitable as a rootfs.
File owners & groups are incorrect (this could be irrelevant for NFS usage).
File permissions may not be correct (e.g. setuid for the busybox binary).
The /dev directory does not have the minimal device nodes that the target kernel requires.
Instead of the required minimal device nodes (e.g. console), the target directory has ordinary files in dev:
buildroot-2015.05/output/target$ ls -l dev
total 4
-rw--w--w- 1 me swdev 0 Sep 15 16:34 console
lrwxrwxrwx 1 me swdev 10 Aug 14 2015 log -> ../tmp/log
drwxrwxr-x 2 me swdev 4096 May 31 2015 pts
$
The target kernel cannot use these files where it expects device nodes. Instead of I/O going through a device driver, ordinary file reads and writes would be attempted on these files.
The actual dev directory should be:
crw--w--w- 1 root root 5, 1 Sep 15 16:34 console
lrwxrwxrwx 1 root root 10 Aug 14 2015 log -> ../tmp/log
drwxr-xr-x 2 root root 4096 May 31 2015 pts
What makes "untarring" images/rootfs.tar different from target?
Buildroot can cleverly create entries for the device nodes and assign the proper owner and group to each filename as it creates the archive (or filesystem image).
This simply means generating metadata in the appropriate format: the device-node entries and the proper ownership information are written into the archive (or into the filesystem image) alongside the actual file data. Only when the archive is unpacked (or the filesystem image is mounted) is that data interpreted as device nodes.
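A rough illustration of the general technique (not Buildroot's actual makefile rules; Buildroot does something similar internally using fakeroot, and the paths below are only examples). fakeroot intercepts mknod and chown, so tar records device nodes and root ownership even though the build never has real root privileges:
fakeroot -- sh -c '
    mknod output/target/dev/console c 5 1   # device node created in the faked environment
    chown -R 0:0 output/target              # pretend everything is owned by root
    tar -C output/target -cf output/images/rootfs.tar .
'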

"ls -lh" directories reported size : linux vs cifs mounted drives

From a linux box, I recently mounted a Windows share using cifs.
The intent was to "locally" use rsync to back up my Windows machine.
The command line used to mount the Windows drive is something like:
mount \\192.168.1.74\share /cifs1 -t cifs -o noserverino,iocharset=utf8,ro
Please also note that the drive attached to the linux box is formatted ntfs.
When doing a sample backup, rsync was always re-copying the directories (names), but not the files. After looking more closely at the "ls -lh" output at both ends, I noticed that on the Linux side the reported size of a directory is always 0:
TTT-Admin@1080-Router:/tmp/mnt/RT-1080/tmp# ls -lh
drwxrwxrwx 1 TTT-Admi root 0 Feb 8 12:14 DeltaCopy
but the size of the directory level on the cifs side was always different from 0:
TTT-Admin@1080-Router:/cifs1/temp/Rsync-Packages# ls -lh
drwxr-xr-x 1 TTT-Admi root 8.0K Feb 8 12:14 DeltaCopy
This difference explains why rsync was always recopying the directories, but not the files (which was correct, file sizes and time stamps being the same on both ends).
EDIT: the rsync command is:
> rsync -av /cifs1/Temp/Rsync-Packages/DeltaCopy /mnt/RT-1080/tmp/rsync
Is a directory supposed to "have a size" or not? What should I do to resolve this discrepancy?

What is there behind a symbolic link?

How are symbolic links managed internally by UNIX/Linux systems? It is known that a symbolic link may exist even without an actual target file (a dangling link). So what is it that represents a symbolic link internally?
In Windows, the answer is a reparse point.
Questions:
Is the answer an inode in UNIX/Linux?
If yes, then will the inode number be the same for the target and the links?
If yes, can the link's inode have permissions different from those of the target's inode (if one exists)?
It is not about UNIX/Linux but about the filesystem implementation - but yes, Unix/Linux uses inodes at the kernel level, and filesystem implementations have inodes (at least virtual ones).
In general, symbolic links are simply files (by the way, directories are also files) that have:
a file-type flag in the inode that tells the system this file is a "symbolic link"
file content: the path to the target. In other words, a symbolic link is simply a file which contains a pathname, plus a flag in the inode.
Virtual filesystems can have symbolic links too, so check FUSE or the sources of some other filesystem implementation (ext2/ext3/ufs, etc.).
So,
Is the answer an inode in UNIX/Linux?
It depends on the filesystem implementation, but yes, generally the inode contains a "file-type" (plus owners, access rights, timestamps, size, and pointers to data blocks). There are filesystems that don't have inodes (in the physical implementation) but have only "virtual inodes" to maintain compatibility with the kernel.
If yes, then will the inode number be the same for the target and the links?
No. Usually the symlink is a file with its own inode (with its own file-type, data blocks, etc.).
If yes, can the link's inode have permissions different from those of the target's inode (if one exists)?
This is about how symlink files are handled. Usually, the kernel doesn't allow changes to symlink permissions - symlinks always have default permissions. You could write your own filesystem that allowed different permissions for symlinks, but you would get into trouble because common programs like chmod don't change permissions on symlinks themselves, so making such a filesystem would be pointless anyway.
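You can see this on a typical Linux system (a quick sketch; the file names are arbitrary):
$ touch target
$ ln -s target link
$ ls -l link          # the symlink itself shows mode lrwxrwxrwx
$ chmod 600 link      # this changes target's permissions, not the symlink's
$ ls -l target link   # target is now -rw-------, link is still lrwxrwxrwx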
To understand the difference between hard links and symlinks, you should understand directories first.
Directories are files (differentiated by a flag in the inode) that tell the kernel, "handle this file as a map of file-name to inode_number". Hard links are simply file names that map to the same inode. So if the directory-file contains:
file_a: 1000
file_b: 1001
file_c: 1000
the above means that, in this directory, there are 3 files:
file_a described by inode 1000
file_b described by inode 1001 and
file_c again described by inode 1000 (so it is a hard link with file_a, not a hard link to file_a, because it is impossible to tell which filename came first; they are identical).
This is the main difference from symlinks, where the inode of file_b (inode 1001) would have the content "file_a" and a flag meaning "this is a symlink". In this case, file_b would be a symlink pointing to file_a.
You can also easily explore this on your own:
$ touch a
$ ln -s a b
$ ln a c
$ ls -li
total 0
95905 -rw-r--r-- 1 regnarg regnarg 0 Jun 19 19:01 a
96990 lrwxrwxrwx 1 regnarg regnarg 1 Jun 19 19:01 b -> a
95905 -rw-r--r-- 2 regnarg regnarg 0 Jun 19 19:01 c
The -i option to ls shows inode numbers in the first column. You can see that the symlink has a different inode number while the hardlink has the same. You can also use the stat(1) command:
$ stat a
File: 'a'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 28h/40d Inode: 95905 Links: 2
[...]
$ stat b
File: 'b' -> 'a'
Size: 1 Blocks: 0 IO Block: 4096 symbolic link
Device: 28h/40d Inode: 96990 Links: 1
[...]
If you want to do this programmatically, you can use the lstat(2) system call to find information about the symlink itself (its inode number etc.), while stat(2) shows information about the target of the symlink, if it exists. Example in Python:
>>> import os
>>> os.stat("b").st_ino
95905
>>> os.lstat("b").st_ino
96990

fsck shows different results when a disk is checked as the system disk and as a non-system disk

I have a problem with an Ubuntu 10.04 filesystem, which reports different results when the drive is mounted and checked with fsck as the system drive and when it is checked from another system drive:
sudo fsck /dev/sdb1
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
57217 inodes used (24.81%)
42 non-contiguous files (0.1%)
65 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 50910/20
293868 blocks used (31.88%)
0 bad blocks
1 large file
43327 regular files
7242 directories
59 character device files
26 block device files
0 fifos
509 links
6549 symbolic links (6187 fast symbolic links)
5 sockets
--------
57717 files
sudo fsck -n -t ext4 /dev/sda1
fsck from util-linux-ng 2.17.2
e2fsck 1.41.11 (14-Mar-2010)
Warning! /dev/sda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/sda1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (628134, counted=628213).
Fix? no
Free inodes count wrong (173391, counted=173379).
Fix? no
/dev/sda1: ********** WARNING: Filesystem still has errors **********
/dev/sda1: 57217/230608 files (0.1% non-contiguous), 293722/921856 blocks
Whenever the drive is mounted as the system/boot drive, running fsck (check, don't fix) shows that the inode and block counts are wrong, yet checking the same drive from another system disk reports that it's fine.
Any ideas ?
This should probably be on SuperUser or ServerFault, not StackOverflow, but anyway:
You can only fsck -n a file system that is mounted read-only. While the file system is mounted read-write, it will be inconsistent until it is cleanly unmounted or the journal is recovered. The important message is
Warning: skipping journal recovery because doing a read-only filesystem check.
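To get a meaningful check of the root filesystem, run fsck while the filesystem is not mounted read-write, e.g. from a live CD, or by forcing a check at the next boot (a sketch; the /forcefsck flag file is honoured by the boot scripts of that Ubuntu generation):
sudo touch /forcefsck
sudo reboot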

du -skh * in / returns a vastly different size from df on CentOS 5.5

I have a VPS slice running CentOS 5.5. I am supposed to have 15 GB of disk space, but according to df it seems to double my disk space usage.
When I run du -skh * in / as root I get:
[root@yardvps1 /]# du -skh *
0 aquota.group
0 aquota.user
5.2M bin
4.0K boot
4.0K dev
4.9M etc
2.5G home
12M lib
14M lib64
4.0K media
4.0K mnt
299M opt
0 proc
692K root
23M sbin
4.0K selinux
4.0K srv
0 sys
48K tmp
2.0G usr
121M var
This is consistent with what I have uploaded to the machine, and adds up to about 5 GB.
But when I run df I get:
[root@yardvps1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/simfs 15728640 11659048 4069592 75% /
none 262144 4 262140 1% /dev
It shows me using almost 12 GB already.
What is causing this discrepancy, and is there anything I can do about it? I planned the server out based on 15 GB, but now it is basically only letting me have about 7 GB of stuff on it.
Thanks.
The most common cause of this effect is open files that have been deleted.
The kernel will only free the disk blocks of a deleted file if it is not in use at the time of its deletion. Otherwise that is deferred until the file is closed, or the system is rebooted.
A common Unix-world trick to ensure that no temporary files are left around is the following (a shell sketch follows the list):
A process creates and opens a temporary file
While still holding the open file descriptor, the process unlinks (i.e. deletes) the file
The process reads and writes to the file normally using the file descriptor
The process closes the file descriptor when it's done, and the kernel frees the space
If the process (or the system) terminates unexpectedly, the temporary file is already deleted and no clean-up is necessary.
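A minimal shell sketch of the same trick (the file name is arbitrary):
exec 3>/tmp/scratch.$$     # open a temporary file on file descriptor 3
rm /tmp/scratch.$$         # unlink it; the blocks stay allocated while fd 3 is open
echo "still writable" >&3  # writes still go to the deleted file
df -h /tmp                 # the space has not been freed yet
exec 3>&-                  # closing the descriptor lets the kernel free the blocks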
As a bonus, deleting the file reduces the chances of naming collisions when creating temporary files and it also provides an additional layer of obscurity over the running processes - for anyone but the root user, that is.
This behaviour ensures that processes don't have to deal with files that are suddenly pulled from under their feet, and also that processes don't have to consult each other in order to delete a file. It is unexpected behaviour for those coming from Windows systems, though, since there you are not normally allowed to delete a file that is in use.
The lsof command, when run as root, will show all open files and will specifically flag the ones that have been deleted:
# lsof 2>/dev/null | grep deleted
bootlogd 2024 root 1w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
bootlogd 2024 root 2w REG 9,3 58 917506 /tmp/init.0W2ARi (deleted)
Stopping and restarting the guilty processes, or just rebooting the server should solve this issue.
Deleted files could also be held open by the kernel if, for example, it's a mounted filesystem image. In this case unmounting the filesystem or rebooting the server should do the trick.
In your case, judging by the size of the "missing" space, I'd look for any references to files that you used to set up the VPS, e.g. the CentOS DVD image that you deleted after installing.
Another case which I've come across, although it doesn't appear to be your issue, is if you mount a partition "on top" of existing files.
If you do so, you effectively hide the files that already exist in the mount-point directory on the underlying partition; they are covered by the newly mounted partition.
To fix: stop any processes with open files on the mounted partition, unmount the partition, then find and move or remove any files that now appear in the mount-point directory.
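A rough sequence for checking that (assuming the partition in question is mounted on /mnt/data):
lsof +D /mnt/data    # see which processes still have files open under the mount
umount /mnt/data     # unmount once those processes are stopped
du -sh /mnt/data     # anything still visible here was hidden under the mount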
I had the same trouble with a FreeBSD server. A reboot helped.
