Nix store is mounted twice - nixos

In the previous generation (a single ext4 partition, no ZFS), /nix/store was mounted once:
$ grep nix /proc/self/mountinfo
30 29 259:3 /nix/store /nix/store ro,relatime shared:14 - ext4 /dev/disk/by-uuid/… rw
After moving the store to a separate ZFS partition, it is now mounted twice:
$ grep nix /proc/self/mountinfo
30 29 0:27 / /nix/store rw,relatime shared:2 - zfs nix rw,xattr,noacl
31 30 0:27 / /nix/store ro,relatime shared:3 - zfs nix rw,xattr,noacl
Is this how it's supposed to be? If not, how can I stop it from being mounted twice?
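For reference, the propagation flags on the stacked mounts can be inspected with findmnt from util-linux; a minimal sketch:
$ findmnt -o TARGET,SOURCE,FSTYPE,PROPAGATION /nix/store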

mkfs.vfat: unable to open {partition}: No such file or directory (command succeeds, but throws this error and blocks rest of script)

Update: I got this working but am still not 100% sure why. I've appended the fully and consistently working script to the end for reference.
I'm trying to script a series of disk partitioning commands using sgdisk and mkfs.vfat. I'm working from a Live USB (NixOS 21pre), have a blank 1TB M.2 SSD, and am creating a 1GB EFI boot partition and a 999GB ZFS partition.
Everything works up until I try to create a FAT32 filesystem on the EFI partition, using mkfs.vfat, where I get the error in the title.
However, the odd thing is, the mkfs.vfat command succeeds, but throws that error anyway and blocks the rest of the script. Any idea why it's doing this and how to fix it?
Starting with an unformatted 1TB M.2 SSD:
$ sudo parted /dev/disk/by-id/wwn-0x5001b448b94488f8 print
Error: /dev/sda: unrecognised disk label
Model: ATA WDC WDS100T2B0B- (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
Script:
$ ls -la
total 4
drwxr-xr-x 2 nixos users 60 May 18 20:25 .
drwx------ 17 nixos users 360 May 18 15:24 ..
-rwxr-xr-x 1 nixos users 2225 May 18 19:59 partition.sh
$ cat partition.sh
#!/usr/bin/env bash
#make gpt partition table and boot & rpool partitions for ZFS on 1TB M.2 SSD
#error handling on
set -e
#wipe the disk with -Z, then create two partitions, a ~1GB (954MiB) EFI boot partition and a ZFS root partition consisting of the rest of the drive, then print the results
DISK=/dev/disk/by-id/wwn-0x5001b448b94488f8
sgdisk -Z $DISK
sgdisk -n 1:0:+954M -t 1:EF00 -c 1:efi $DISK
sgdisk -n 2:0:0 -t 2:BF01 -c 2:zroot $DISK
sgdisk -p /dev/sda
#make a FAT32 filesystem on the EFI partition, then mount it
#mkfs.vfat -F 32 ${DISK}-part1 (troubleshooting with hardcoded version below)
mkfs.vfat -F 32 /dev/disk/by-id/wwn-0x5001b448b94488f8-part1
mkdir -p /mnt/boot
mount ${DISK}-part1 /mnt/boot
Result (everything fine until mkfs.vfat, which throws error and blocks the rest of the script):
$ sudo sh partition.sh
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries in memory.
Setting name!
partNum is 0
The operation has completed successfully.
Setting name!
partNum is 1
The operation has completed successfully.
Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Model: WDC WDS100T2B0B-
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 77ED6A41-E722-4FFB-92EC-975A37DBCB97
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 2048 1955839 954.0 MiB EF00 efi
2 1955840 1953525134 930.6 GiB BF01 zroot
mkfs.fat 4.1 (2017-01-24)
mkfs.vfat: unable to open /dev/disk/by-id/wwn-0x5001b448b94488f8-part1: No such file or directory
Verifying the partitioning and FAT32 creation commands worked:
$ sudo parted /dev/disk/by-id/wwn-0x5001b448b94488f8 print
Model: ATA WDC WDS100T2B0B- (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 1001MB 1000MB fat32 efi boot, esp
2 1001MB 1000GB 999GB zroot
FWIW, the same command works on the command line with no error:
$ sudo mkfs.vfat -F 32 /dev/disk/by-id/wwn-0x5001b448b94488f8-part1
mkfs.fat 4.1 (2017-01-24)
Success. But why is there no error on the command line, yet an error in the script?
Update: fully and consistently working script:
#!/usr/bin/env bash
#make UEFI (GPT) partition table and two partitions (FAT32 boot and ZFS rpool) on 1TB M.2 SSD
#error handling on
set -e
#vars
DISK=/dev/disk/by-id/wwn-0x5001b448b94488f8
POOL='rpool'
#0. if /mnt/boot is mounted, umount it; if any NixOS filesystems are mounted, unmount them
if mount -l | grep -q '/mnt/boot'; then
umount -f /mnt/boot
fi
if mount -l | grep -q '/mnt/nix'; then
umount -fR /mnt
fi
#1. if a zfs pool exists, delete it
if zpool list | grep -q $POOL; then
zfs unmount -a
zpool export $POOL
zpool destroy -f $POOL
fi
#2. wipe the disk
sgdisk -Z $DISK
wipefs -a $DISK
#3. create two partitions, a ~1GB (954MiB) EFI boot partition and a ZFS root partition consisting of the rest of the drive, then print the results
sgdisk -n 1:0:+954M -t 1:EF00 -c 1:efiboot $DISK
sgdisk -n 2:0:0 -t 2:BF01 -c 2:zfsroot $DISK
sgdisk -p /dev/sda
#4. notify the OS of partition updates, and print partition info
partprobe
parted ${DISK} print
#5. make a FAT32 filesystem on the EFI boot partition
mkfs.vfat -F 32 ${DISK}-part1
#6. notify the OS of partition updates, and print new partition info
partprobe
parted ${DISK} print
#mount the partitions in the nixos-zfs-pool-dataset-create.sh script. Make sure to first mount the ZFS root dataset on /mnt before mounting any subdirectories of /mnt.
It may take time for the kernel to be notified about partition changes. Try calling partprobe before mkfs to request that the kernel re-read the partition table.
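For example, a minimal guard between partitioning and formatting could look like this (a sketch only; udevadm settle assumes the live environment runs udev, which NixOS live images do):
sgdisk -n 1:0:+954M -t 1:EF00 -c 1:efi $DISK
partprobe $DISK    # ask the kernel to re-read the partition table
udevadm settle     # wait for the /dev/disk/by-id/...-part1 symlink to appear
mkfs.vfat -F 32 ${DISK}-part1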

Fails to `mkdir /mnt/vzsnap0` for Container Backups with Permission Denied

This is all done as the root user.
The backup script at /usr/share/perl5/PVE/VZDump/LXC.pm sets a default mount point:
my $default_mount_point = "/mnt/vzsnap0";
But regardless of whether I use the GUI or the command line I get the following error:
ERROR: Backup of VM 103 failed - mkdir /mnt/vzsnap0:
Permission denied at /usr/share/perl5/PVE/VZDump/LXC.pm line 161.
And lines 160-161 in that script are:
my $rootdir = $default_mount_point;
mkpath $rootdir;
After the installation, before I created any images or did any backups, I set up two things:
(1) SSHFS mount for /mnt/backups
(2) Added all other drives as Linux LVM
What I did for the drive addition is as simple as:
pvcreate /dev/sdb1
pvcreate /dev/sdc1
pvcreate /dev/sdd1
pvcreate /dev/sde1
vgextend pve /dev/sdb1
vgextend pve /dev/sdc1
vgextend pve /dev/sdd1
vgextend pve /dev/sde1
lvextend pve/data /dev/sdb1
lvextend pve/data /dev/sdc1
lvextend pve/data /dev/sdd1
lvextend pve/data /dev/sde1
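For reference, the resulting layout can be verified with the standard LVM reporting commands (a quick sanity check, not part of the original steps):
pvs    # the new physical volumes should be listed under the pve VG
vgs    # the volume group total should have grown accordingly
lvs    # pve/data should show the extended size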
For the SSHFS instructions see my blog post on it: https://6ftdan.com/allyourdev/2018/02/04/proxmox-a-vm-server-for-your-home/
Here are the relevant files and details for the filesystem and directory permissions.
cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0
df -h
Filesystem Size Used Avail Use% Mounted on
udev 7.8G 0 7.8G 0% /dev
tmpfs 1.6G 9.0M 1.6G 1% /run
/dev/mapper/pve-root 37G 8.0G 27G 24% /
tmpfs 7.9G 43M 7.8G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
/dev/fuse 30M 20K 30M 1% /etc/pve
sshfs#10.0.0.10:/mnt/raid/proxmox_backup 1.4T 725G 672G 52% /mnt/backups
tmpfs 1.6G 0 1.6G 0% /run/user/0
ls -dla /mnt
drwxr-xr-x 3 root root 0 Aug 12 20:10 /mnt
ls /mnt
backups
ls -dla /mnt/backups
drwxr-xr-x 1 1001 1002 80 Aug 12 20:40 /mnt/backups
The command that I need to succeed is:
vzdump 103 --compress lzo --node ProxMox --storage backup --remove 0 --mode snapshot
For the record the container image is only 8GB in size.
Cloning containers does work and snapshots work.
Q & A
Q) How are you running the perl script?
A) Through the GUI: click Backup now, then select your storage (I have backups and local; both produce this error), then select the state of the container (Snapshot, Suspend, and Stop each produce the same error), then the compression type (none, LZO, and gzip each produce the same error). Once all that is set, you click Backup and get the following output.
INFO: starting new backup job: vzdump 103 --node ProxMox --mode snapshot --compress lzo --storage backups --remove 0
INFO: Starting Backup of VM 103 (lxc)
INFO: Backup started at 2019-08-18 16:21:11
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: Passport
ERROR: Backup of VM 103 failed - mkdir /mnt/vzsnap0: Permission denied at /usr/share/perl5/PVE/VZDump/LXC.pm line 161.
INFO: Failed at 2019-08-18 16:21:11
INFO: Backup job finished with errors
TASK ERROR: job errors
From this you can see that the command is vzdump 103 --node ProxMox --mode snapshot --compress lzo --storage backups --remove 0. I've also tried logging in with an SSH shell and running this command, and I get the same error.
Q) It could be that the directory's "immutable" attribute is set. Try lsattr / and see if /mnt has the lower-case "i" attribute set to it.
A) root@ProxMox:~# lsattr /
--------------e---- /tmp
--------------e---- /opt
--------------e---- /boot
lsattr: Inappropriate ioctl for device While reading flags on /sys
--------------e---- /lost+found
lsattr: Operation not supported While reading flags on /sbin
--------------e---- /media
--------------e---- /etc
--------------e---- /srv
--------------e---- /usr
lsattr: Operation not supported While reading flags on /libx32
lsattr: Operation not supported While reading flags on /bin
lsattr: Operation not supported While reading flags on /lib
lsattr: Inappropriate ioctl for device While reading flags on /proc
--------------e---- /root
--------------e---- /var
--------------e---- /home
lsattr: Inappropriate ioctl for device While reading flags on /dev
lsattr: Inappropriate ioctl for device While reading flags on /mnt
lsattr: Operation not supported While reading flags on /lib32
lsattr: Operation not supported While reading flags on /lib64
lsattr: Inappropriate ioctl for device While reading flags on /run
Q) Can you manually create /mnt/vzsnap0 without any issues?
A) root@ProxMox:~# mkdir /mnt/vzsnap0
mkdir: cannot create directory ‘/mnt/vzsnap0’: Permission denied
Q) Can you replicate it in a clean VM ?
A) I don't know. I don't have an extra system to try it on, and I need the containers I have on it. Trying it within a VM in ProxMox… I'm not sure. I suppose I could try, but I'd really rather not have to just yet. Maybe if all else fails.
Q) If you look at drwxr-xr-x 1 1001 1002 80 Aug 12 20:40 /mnt/backups, it looks like there is a user with id 1001 who has access to the backups, so not even root will be able to write. You need to check why it is 1001 and which group is represented by 1002. Then you can add your root as well as the user under which the GUI runs to the group with id 1002.
A) I have no problem writing to the /mnt/backups directory. Just now did a cd /mnt/backups; mkdir test and that was successful.
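A quick way to compare the two directories side by side (a diagnostic sketch, nothing more):
stat -c '%U:%G %a %F' /mnt /mnt/backups   # owner, group, mode, file type
grep ' /mnt' /proc/mounts                 # what is actually mounted at or under /mnt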
From the message
mkdir /mnt/vzsnap0: Permission denied
it is obvious the problem is the permissions on the /mnt directory.
It could be that the directory's "immutable" attribute is set.
Try lsattr / and see if /mnt has the lower-case "i" attribute set to it.
As a reference:
The lower-case i in lsattr output indicates that the file or directory is set as immutable: even root must clear this attribute first before making any changes to it. With root access, you should be able to remove this with chattr -i /mnt, but there is probably a reason why this was done in the first place; you should find out what the reason was and whether or not it's still applicable before removing it. There may be security implications.
So, if this is the case, try:
chattr -i /mnt
to remove it.
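Since lsattr / lists the children of / rather than the directory itself, querying /mnt directly may be clearer (assuming the underlying filesystem supports attributes at all):
lsattr -d /mnt    # -d shows the directory itself, not its contents
chattr -i /mnt    # only needed if the 'i' flag actually shows up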
References
lsattr output
According to the inode flags (attributes) manual page:
FS_IMMUTABLE_FL 'i':
The file is immutable: no changes are permitted to the file contents or metadata (permissions, timestamps, ownership, link count and so on). (This restriction applies even to the superuser.) Only a privileged process (CAP_LINUX_IMMUTABLE) can set or clear this attribute.
As long as the bounty is still up, I'll give it to a legitimate answer that fixes the problem described here.
What I'm writing here for you all is a workaround I've thought of, which works. Note: it is very slow.
Since I am able to write to the /mnt/backups directory, which exists on another system on the network, I went ahead and changed the Perl script to point to /mnt/backups/vzsnap0 instead of /mnt/vzsnap0.
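The change amounts to a one-line edit; as a sketch (the path may differ between PVE versions):
sed -i 's|/mnt/vzsnap0|/mnt/backups/vzsnap0|' /usr/share/perl5/PVE/VZDump/LXC.pm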
Bounty remains for anyone who can get the /mnt directory to work as the mount path, so the backup script can successfully mount vzsnap0.
1)
Perhaps your "/mnt/vzsnap0" is mounted read-only?
A hint may be this line from your fstab:
/dev/pve/root / ext4 errors=remount-ro 0 1
errors=remount-ro means that after an error the partition is remounted read-only. Perhaps this has happened to your filesystem as well.
Can you try remounting the drive as in the following link? https://askubuntu.com/questions/175739/how-do-i-remount-a-filesystem-as-read-write
And if that succeeds, manually create the directory afterwards?
2) If that didn't help:
https://www.linuxquestions.org/questions/linux-security-4/mkdir-throws-permission-denied-error-in-a-directoy-even-with-root-ownership-and-777-permission-4175424944/
There, someone remarked:
What is the filesystem for the partition that contains the directory?
Double check the permissions of the directory, or whether it's a
symbolic link to another directory. If the directory is an NFS mount,
rootsquash can prevent writing by root.
Check for attributes (lsattr). Check for ACLs (getfacl). Check for
SELinux restrictions (ls -Z).
If the filesystem is corrupt, it might be initially mounted RW but
when you try to write to a bad area, change to RO.
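As a concrete checklist, those suggestions translate roughly to the following (assuming the acl package and SELinux-aware coreutils are installed):
getfacl /mnt                 # ACLs
ls -dZ /mnt                  # SELinux context, if enabled
grep ' /mnt ' /proc/mounts   # mount options; for NFS, look for root_squash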
Great, it turns out this is a pretty long-standing issue with Ubuntu Make that many people have faced.
I saw a workaround mentioned by an Ubuntu Developer in the above link.
Just follow the below steps:
sudo -s
unset SUDO_UID
unset SUDO_GID
Then run umake to install your application as normal.
You should now be able to install to any directory you want. It works flawlessly for me.
Try ls -laZ /mnt to review the security context, in case SELinux is enabled; relabeling might be required then. errors=remount-ro should also be investigated (however, it is rather unlikely lsattr would fail unless the /mnt inode itself is corrupted). Creating a new directory inode for these mount points might be worth a try; if it works, one can swap them.
Just change /mnt/backups to /mnt/sshfs/backups, and vzdump will work.

Why can a process launched with `unshare -m` affect the mounts in the host?

Here is what I did:
$ sudo unshare -m --propagation unchanged sh # Run a shell with `unshare` in a separate mount namespace
# cd /tmp
# mkdir foo bar
# mount --bind foo bar # This mount is supposed to be only visible in this separate mount namespace, right?
# exit # Back to the original shell
$ cat /proc/self/mountinfo | grep foo # Why can I see it here???
272 26 8:1 /tmp/foo /tmp/bar rw,relatime shared:1 - ext4 /dev/sda1 rw,errors=remount-ro,data=ordered
I know it will work as expected when I run sudo unshare -m sh, but that is because by default unshare will recursively set all mounts' propagation to private (see code here and here). When I run it with --propagation unchanged, unshare will not set mount propagation at all and will only call the unshare() syscall with CLONE_NEWNS, in which case the mount made by the launched shell is visible in the host mount namespace, as you see in the above example.
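In other words, the default behaviour can be reproduced by hand; a sketch of what unshare effectively does internally, so that binds made in the namespace no longer propagate to the host:
$ sudo unshare -m --propagation unchanged sh
# mount --make-rprivate /    # what plain `unshare -m` would have done for you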
So my question is: since it is mount propagation that isolates mount/umount operations, why do we need CLONE_NEWNS at all? Or is CLONE_NEWNS only used to isolate the setting of mount propagation (rather than mount/umount operations) between different mount namespaces?
The mount propagation type determines whether mounts and unmounts under a mount point are propagated back to the parent namespace. By default, all mount points are marked as shared mounts. If you create a new namespace and leave the propagation type unchanged, new mounts under those mount points will be propagated back to the parent. But if there is a private mount point in the parent namespace, the mount point itself is copied into the new namespace, while new mounts under it are not propagated back to the parent.
Let's make an example:
# mkdir -p /tmp/shared-mount
# mount --bind --make-shared /tmp/shared-mount /tmp/shared-mount
# mkdir -p /tmp/private-mount
# mount --bind --make-private /tmp/private-mount /tmp/private-mount
# grep "/tmp" /proc/self/mountinfo
406 29 8:5 /tmp/shared-mount /tmp/shared-mount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
420 29 8:5 /tmp/private-mount /tmp/private-mount rw,relatime - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
Note the "shared:1", indicating a shared mount point. Now:
# unshare -m --propagation unchanged /bin/bash
# grep "/tmp" /proc/self/mountinfo
551 432 8:5 /tmp/shared-mount /tmp/shared-mount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
552 432 8:5 /tmp/private-mount /tmp/private-mount rw,relatime - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
All good. Now let's create submounts under these mount points in the new namespace:
# mkdir -p /tmp/shared-mount/submount
# mount --bind /tmp/shared-mount/submount /tmp/shared-mount/submount
# mkdir -p /tmp/private-mount/submount
# mount --bind /tmp/private-mount/submount /tmp/private-mount/submount
# grep "/tmp" /proc/self/mountinfo
551 432 8:5 /tmp/shared-mount /tmp/shared-mount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
552 432 8:5 /tmp/private-mount /tmp/private-mount rw,relatime - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
553 551 8:5 /tmp/shared-mount/submount /tmp/shared-mount/submount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
555 432 8:5 /tmp/shared-mount/submount /tmp/shared-mount/submount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
577 552 8:5 /tmp/private-mount/submount /tmp/private-mount/submount rw,relatime - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
You can now observe /proc/self/mountinfo from another shell in the parent namespace and see that the new submount under the private mount point has not propagated. Or exit the new namespace, like you did:
# exit
# grep "/tmp" /proc/self/mountinfo
406 29 8:5 /tmp/shared-mount /tmp/shared-mount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
420 29 8:5 /tmp/private-mount /tmp/private-mount rw,relatime - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
556 406 8:5 /tmp/shared-mount/submount /tmp/shared-mount/submount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
554 29 8:5 /tmp/shared-mount/submount /tmp/shared-mount/submount rw,relatime shared:1 - ext4 /dev/sda5 rw,errors=remount-ro,data=ordered
Inside the new namespace, you can also make a shared mount point private:
# mount --make-private /tmp/shared-mount
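To confirm the effect, a further submount under it should now stay invisible to the parent namespace (continuing the same session; submount2 is a name made up for this sketch):
# mkdir -p /tmp/shared-mount/submount2
# mount --bind /tmp/shared-mount/submount2 /tmp/shared-mount/submount2
# exit
$ grep submount2 /proc/self/mountinfo    # no output: the bind did not propagate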
In real life, it is even more complicated than that. Good additional reading:
https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
http://man7.org/linux/man-pages/man7/mount_namespaces.7.html
http://man7.org/linux/man-pages/man5/proc.5.html (about mountinfo)

Sparse file taking all fallocate()d space at once

I'm trying to create a sparse file (for a QEMU HDD image).
Both qemu-img and fallocate are proving confusing.
$ fallocate -l 100M disk.img
$ ls -lsh disk.img
101M -rw-r--r-- 1 i336 users 100M Jul 22 12:03 disk.img
Note the 101M. strace shows a successful syscall:
$ strace fallocate -l 100M disk.img
open("disk.img", O_RDWR|O_CREAT|O_LARGEFILE, 0666) = 3
fallocate(3, 0, 0, 104857600) = 0
$ ls -lsh disk.img
101M -rw-r--r-- 1 i336 users 100M Jul 22 12:03 disk.img
I'm not sure if stat is the right tool, but just in case:
$ stat disk.img
File: 'disk.img'
Size: 104857600 Blocks: 204808 IO Block: 4096 regular file
Device: 802h/2050d Inode: 549166 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1337/ i336) Gid: ( 100/ users)
A possible (very weird) clue: 104857600/204808 = 511.9800. (File size / block count)
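Decoding that clue: st_blocks is counted in 512-byte units, so 204808 blocks × 512 bytes/block = 104,861,696 bytes, i.e. the full 100 MiB plus about 4 KiB of filesystem overhead. A truly sparse 100M file would report a block count at or near 0.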
qemu-img has similar output. (I found the preallocation option in the manual.)
$ qemu-img create -f raw -o preallocation=falloc disk.img 100M
Formatting 'disk.img', fmt=raw size=104857600 preallocation=falloc
$ ls -lsh disk.img
101M -rw-r--r-- 1 i336 users 100M Jul 22 12:06 disk.img
Here's the annoying bit: the image appears to be using real space on disk.
$ df -h /; fallocate -l 1G disk.img; df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/root 48G 43G 3.5G 93% /
Filesystem Size Used Avail Use% Mounted on
/dev/root 48G 44G 2.5G 95% /
And yet, just like a sparse file, it takes no time to create!
$ time fallocate -l 3.3G disk.img
0.00user 0.57system 0:00.91elapsed 63%CPU (0avgtext+0avgdata 5424maxresident)k
200inputs+0outputs (0major+68minor)pagefaults 0swaps
0.91 seconds, on a 5400RPM HDD. There is no way I'm not creating a sparse file.
And yet no matter what tool I use, it appears to be using 101MB of space right off the bat.
What could I be doing wrong or have misconfigured?
$ cat /etc/fstab
/dev/sda2 / ext4 rw,user_xattr 0 0
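For contrast, a file that really is sparse can be created by setting only its length, without allocating blocks; a minimal sketch (GNU coreutils truncate, or qemu-img with its default preallocation=off):
$ truncate -s 100M sparse.img              # apparent size 100M, ~0 blocks allocated
$ ls -lsh sparse.img                       # the leading block count should now read 0
$ qemu-img create -f raw sparse2.img 100M  # raw images are sparse by default
fallocate, by design, reserves all the blocks up front, which matches the ~101M of real disk usage observed above.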

Backup entire disk in Ubuntu

I would like to make a backup of the entire HDD.
Step by step, what I'm trying to do:
1) Check the storage capacity (that is going to be backed up):
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 455G 157G 275G 37% /
2) Mount an extra, empty HDD at /mnt/backup/:
/dev/sdb 294G 63M 279G 1% /mnt/backup
3) Run the backup (using lzop as the fastest compressor):
dd if=/dev/sda1 bs=4M conv=noerror iflag=noatime,nofollow | lzop -1 > /mnt/backup/dev-sda1.lzo
But the backup fails with error: lzop: No space left on device: <stdout>
The extra HDD gets completely filled by dev-sda1.lzo, yet the used size of /dev/sda1 (157G) is obviously less than the space available on /dev/sdb (279G), even without compression.
In /etc/fstab, /dev/sda1 is mounted on "/":
UUID=8a49b90e-6115-43a6-9702-7620182bbbf5 / ext4 errors=remount-ro 0 1
Is it possible that dd is doing a recursive copy of the /mnt/backup/ folder, and this leads to the failure?
Please advise.
Thanks to Mark Setchell for showing me the correct direction. (dd images the entire 455G partition, free space included, and unzeroed free space compresses poorly, which is why the 279G drive filled up; dump archives only the blocks in use.)
In the end, the solution to dump the whole partition without the unused space is:
dump -0a -z1 -f /mnt/hdd1/dev-sda1.dump.gz /dev/sda1
For a 157G partition of Ubuntu 14.04 plus development and database files, dump took 45 minutes (on a 7200rpm HDD), and the resulting file was 80G (compression level 1).
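For completeness, such a dump would be restored from a rescue environment roughly like this (a sketch assuming the dump/restore package; restore -r unpacks into the current directory):
mkfs.ext4 /dev/sda1    # recreate the target filesystem
mount /dev/sda1 /mnt/target
cd /mnt/target
restore -rf /mnt/hdd1/dev-sda1.dump.gz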
