btrfs ioctl: get file checksums from userspace

I would like to obtain the Btrfs checksums for a specific file, but I have not found an ioctl that does this. Is it possible, and if so, how? I want to reuse the stored checksums to reduce CPU load in rsync-like scenarios.

I just pushed this messy code to my GitHub repo: https://github.com/Lakshmipathi/btrfs-progs/tree/dump_csum
It's not official code. I tested it on files ranging from 100 KB to 50 GB, and the checksums seem to match.
Usage:
./btrfs-debug-tree -f /path/to/file /btrfs/partition
This will create a csumdump file next to the target file.
Example:
sudo ./btrfs-debug-tree -f /btrfs/50gbfile1 /dev/sda4
This will create an output file named '/btrfs/50gbfile1.csumdump' containing the checksums of the file's blocks.
Note: I wrote this for educational/learning purposes, so it comes with all the usual disclaimers. I plan to clean up the code sometime this week.
If you plan to use it, I recommend testing it as follows:
1) Create a 20 GB file (or any file larger than 1 KB) in /tmp/.
2) Mount your btrfs partition on /btrfs and copy the file: cp /tmp/file /btrfs/f1
3) Dump the checksums of f1; this produces /btrfs/f1.csumdump.
4) cp /tmp/file /btrfs/f2 and dump f2's checksums.
5) Compare f1.csumdump with f2.csumdump. If they match, the tool seems to be working. If they don't match, something went wrong.
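The comparison in step 5 can be scripted. The following is only a minimal sketch: it assumes the two dump files should be byte-for-byte identical when the checksums match, and the function name dumps_match is made up for illustration.

```python
import filecmp


def dumps_match(dump_a, dump_b):
    """Return True when the two dump files are byte-for-byte identical."""
    # shallow=False forces a content comparison instead of comparing
    # only the size and modification time reported by stat().
    return filecmp.cmp(dump_a, dump_b, shallow=False)
```

For the test above it would be called as dumps_match("/btrfs/f1.csumdump", "/btrfs/f2.csumdump").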
Update after almost 5 years!
New location: https://github.com/Lakshmipathi/dduper

Related

Need more clarity on file command usage in linux?

I have built a Linux image for ARM on Ubuntu. I was curious to run the file command on the image files created in the arch/arm/boot directory. When I execute the following commands:
balaji@balaji-virtual-machine:~/meraj/linux-stable/arch/arm/boot$ ls
bootp compressed dts Image install.sh Makefile zImage
balaji@balaji-virtual-machine:~/meraj/linux-stable/arch/arm/boot$ file Image
Image: data
balaji@balaji-virtual-machine:~/meraj/linux-stable/arch/arm/boot$ file zImage
zImage: data
balaji@balaji-virtual-machine:~/meraj/linux-stable/arch/arm/boot$
It does not give much information. Is this expected behaviour?

From file manpage:
The type printed will usually contain one of the words...
... "data" meaning anything else (data is usually 'binary' or non-printable).
Exceptions are well-known file formats (core files, tar archives) that
are known to contain binary data.
Also...
Any file that cannot be identified as having been written in any of
the character sets listed above is simply said to be 'data'.
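To illustrate the fallback described in the manpage, here is a heavily simplified sketch of a signature-based classifier. It is not how file(1) is implemented: the two-entry signature table and the function name classify are assumptions for illustration only, whereas the real tool consults a large magic database.

```python
def classify(blob: bytes) -> str:
    """Rough imitation of a magic-number classifier that falls back to 'data'."""
    if not blob:
        return "empty"
    # Hypothetical, tiny signature table (ELF and gzip magic numbers).
    signatures = [
        (b"\x7fELF", "ELF"),
        (b"\x1f\x8b", "gzip compressed data"),
    ]
    for magic, name in signatures:
        if blob.startswith(magic):
            return name
    try:
        # No known signature: check whether it is at least plain ASCII.
        blob.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        # Neither a known format nor a known character set.
        return "data"
```

A raw kernel image that matches no entry in the database and is not text ends up in the last branch, which is consistent with the "data" output shown in the question.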

Regarding lz4mt compression and linux buffering issue

I am using lz4mt, the multi-threaded version of lz4. In my workflow, thousands of large files (620 MB each) are sent from a client to a server. When a file arrives on the server, a rule triggers that compresses the file with lz4mt and then removes the uncompressed file. The problem is that sometimes, after removing the uncompressed file, the compressed file does not have the right size, because lz4mt returns before its output has been written to disk.
Is there any way to make lz4mt remove the uncompressed file itself after compressing, as bzip2 does?
Input: bzip2 uncompress_file
Output: Compressed file only
whereas
Input: lz4mt uncompress_file
Output: (Uncompressed + Compressed) file
The sync command in the script below does not seem to work properly either.
The script that runs when my rule triggers is:
script.sh
/bin/lz4mt uncompressed_file output_file
/bin/sync
/bin/rm uncompressed_file
Please tell me how to solve this issue.
Thanks a lot

Author here. You could try the following:
Chain the commands with && or ;.
Add the lz4mt command-line options -q (suppress prompt) and -f (force overwrite).
Try it with the original lz4.
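The first suggestion (chaining with &&) amounts to removing the input only when the compressor exits successfully. A minimal sketch of that idea follows. The function name compress_then_remove and the extra check that the output is non-empty are assumptions added for illustration; they are not part of lz4mt.

```python
import os
import subprocess


def compress_then_remove(cmd, src, dst):
    """Run cmd (which must write dst) and delete src only if that succeeded."""
    # check=True raises CalledProcessError on a non-zero exit status,
    # so the removal below is never reached when compression fails.
    subprocess.run(cmd, check=True)
    # Extra guard (an assumption, not lz4mt behaviour): refuse to delete
    # the input if the output file is missing or empty.
    if not os.path.isfile(dst) or os.path.getsize(dst) == 0:
        raise RuntimeError(f"{dst} is missing or empty; keeping {src}")
    os.remove(src)
```

With the options from the answer, a call might look like compress_then_remove(["/bin/lz4mt", "-q", "-f", src, dst], src, dst).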

Compressing the core files during core generation

Is there a way to compress core files during core dump generation?
If storage space on the system is limited, is there a way to conserve it by compressing core dumps immediately as they are generated?
Ideally the method would work on older versions of linux such as 2.6.x.

The Linux kernel /proc/sys/kernel/core_pattern file will do what you want: http://www.mjmwired.net/kernel/Documentation/sysctl/kernel.txt#191
Set the filename to something like |/bin/gzip -1 > /var/crash/core-%t-%p-%u.gz and your core files should be saved compressed for you.

For embedded Linux systems, the following script works perfectly for generating compressed core files, in 2 steps:
step 1: create a script
touch /bin/gen_compress_core.sh
chmod +x /bin/gen_compress_core.sh
cat > /bin/gen_compress_core.sh
#!/bin/sh
exec /bin/gzip -f - >"/var/core/core-$1.$2.gz"
ctrl+d
step 2: update the core pattern file
cat > /proc/sys/kernel/core_pattern
|/bin/gen_compress_core.sh %e %p
ctrl+d
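When core_pattern starts with "|", the kernel runs the named program and feeds it the core image on standard input. For systems where Python is available (which may not be the case on a small embedded target), the same helper could be sketched as below. The function names and the argument order (%e, then %p) are assumptions that mirror the shell script above.

```python
import gzip
import shutil


def compress_core(stream, out_path, level=1):
    """Copy a core image from a binary stream into a gzip file."""
    # A low compression level keeps the crashing process waiting
    # for as short a time as possible.
    with gzip.open(out_path, "wb", compresslevel=level) as out:
        shutil.copyfileobj(stream, out)


def main(argv, stream, core_dir="/var/core"):
    """argv[1] and argv[2] are assumed to be %e and %p from core_pattern."""
    exe, pid = argv[1], argv[2]
    out_path = f"{core_dir}/core-{exe}.{pid}.gz"
    compress_core(stream, out_path)
    return out_path
```

A real handler script would end with a call such as main(sys.argv, sys.stdin.buffer) and would have to be executable and referenced by its full path in core_pattern.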

As the other answer suggests, the Linux kernel /proc/sys/kernel/core_pattern file is a good place to start: http://www.mjmwired.net/kernel/Documentation/sysctl/kernel.txt#141
As the documentation says, you can start the pattern with the special character "|", which tells the kernel to pipe the core dump to a program. As suggested, you could use |/bin/gzip -1 > /var/crash/core-%t-%p-%u.gz as the pattern; however, it doesn't work for me. I expect the reason is that the kernel runs the program directly rather than through a shell, so > is not treated as a redirection; it is probably passed to gzip as a parameter.
To avoid this problem, as others suggested, you can create a script in some location. I am using /home/<username>/crashes/core.sh. Create it using the following commands, replacing <username> with your user. Alternatively, you can change the entire path.
echo -e '#!/bin/bash\nexec /bin/gzip -f - >"/home/<username>/crashes/core-$1-$2-$3-$4.gz"' > ~/crashes/core.sh
chmod +x ~/crashes/core.sh
This script takes 4 input parameters and joins them into the core file name. Full paths must be used inside ~/crashes/core.sh, and the script must be executable. The script itself can be placed anywhere. Now let's tell the kernel to run our executable with parameters when generating a core file:
sudo sysctl -w kernel.core_pattern="|/home/<username>/crashes/core.sh %e %p %h %t"
Again, <username> should be replaced (or the entire path changed to match the location and name of the core.sh script). The next step is to crash some program. Let's create an example crashing C++ file:
int main() {
    int *a = nullptr;
    int b = *a;  // dereferencing a null pointer crashes the program
}
After compiling and running it, there are 2 possible outcomes. Either we will see:
Segmentation fault (core dumped)
Or
Segmentation fault
If we see the latter, there are a few possible reasons:
The core size limit is not set; ulimit -c shows the current limit for core files.
apport or your distro's core dump collector is not running; this should be investigated further.
There is an error in the script we wrote. To rule out the other causes, I suggest first trying a basic dump path; the command below should create /tmp/core.dump:
sudo sysctl -w kernel.core_pattern="/tmp/core.dump"
I know there is already an answer to this question; however, it wasn't obvious to me why it doesn't work "out of the box", so I wanted to summarize my findings. Hope it helps someone.

Setting creation or change timestamps

Using utimes, futimes, futimens, etc., it is possible to set the access and modification timestamps on a file.
Modification time is the last time the file data changed. Similarly, "ctime" or change time, is the last time attributes on the file, such as permissions, were changed. (Linux/POSIX maintains three timestamps: mtime and ctime, already discussed, and 'atime', or access time.)
Is there a function to set change timestamps? (Where "change" is the attribute modification or 'ctime', not modification time 'mtime'.) (I understand the cyclic nature of wanting to change the change timestamp, but think archiving software - it would be nice to restore a file exactly as it was.)
Are there any functions at all for creation timestamps? (I realize that ext2 does not support this, but I was wondering if Linux did, for those filesystems that do support it.)
If it's not possible, what is the reasoning behind it not being so?

For ext2/3, and possibly for ext4, you can do this with the debugfs tool. Suppose you want to change the ctime of the file /tmp/foo, which resides on the disk /dev/sda1, to 201001010101, which means 01 January 2010, 01:01:
Warning: The disk must be unmounted before this operation.
# Update ctime
debugfs -w -R 'set_inode_field /tmp/foo ctime 201001010101' /dev/sda1
# Drop vm cache so ctime update is reflected
echo 2 > /proc/sys/vm/drop_caches
Information taken from Command Line Kung Fu blog.

I had a similar issue, and wrote my answer here.
https://stackoverflow.com/a/17066309/391040
There are essentially two options:
Slight change in kernel (code included in link)
Change the system clock to the desired ctime, touch the file, then restore current time. (shell script for that included in link).

According to http://lists.gnu.org/archive/html/coreutils/2010-08/msg00010.html ctime cannot be faked (at least it's not intended to be fakeable):
POSIX says that atime and mtime are user-settable to arbitrary times
via the utimensat() family of syscalls, but that ctime must
unfakeably track the current time of any action that changes a file's
metadata or contents.
If you just need to change a file's ctime for some testing/debugging, bindfs might be helpful. It's a FUSE filesystem which mounts one directory into another place, and can do some transformation on the file attributes. With option --ctime-from-mtime the ctime of each file is the same as its mtime, which you can set with touch -t.
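The POSIX behaviour quoted above can be observed directly. The sketch below sets atime and mtime to a date in the past and then reads the timestamps back. The helper name set_times_and_report is made up for illustration, and the comments describe what is expected on a typical Linux filesystem.

```python
import os
import tempfile


def set_times_and_report(path, atime_s, mtime_s):
    """Set atime/mtime to the given epoch seconds and return the resulting stat."""
    os.utime(path, (atime_s, mtime_s))
    return os.stat(path)


with tempfile.NamedTemporaryFile() as tmp:
    past = 1262307660  # 2010-01-01 01:01:00 UTC
    st = set_times_and_report(tmp.name, past, past)
    print("atime:", st.st_atime)  # the value that was requested
    print("mtime:", st.st_mtime)  # the value that was requested
    print("ctime:", st.st_ctime)  # the time of the utime call, not `past`
```

The call to os.utime is itself a metadata change, so the kernel stamps ctime with the current time, and there is no corresponding argument for choosing it.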

The easiest way:
1) Change the system time.
2) Copy and paste the file to another location.
I tried this on Windows 7 and succeeded in changing all three timestamps.
The stat command on Linux shows that all three timestamps are changed.

The script below automates running debugfs ... set_inode_field ... ctime ... in ismail's answer for many files. It will copy ctime values from files in /media/MYUSER/MYFS/FOO/BAR (recursively) to /media/MYUSER/MYFS2/FOO/BAR, and umount /media/MYUSER/MYFS2 as a side effect. It will work only if the filesystem of /media/MYUSER/MYFS2 is ext2, ext3 or ext4 (because debugfs works only for these filesystems).
mydev2="$(df /media/MYUSER/MYFS2 | perl -ne '$x = $1 if !m#^Filesystem # and m#([^ ]+) #; END { print "$x\n" }')"
cd /media/MYUSER/MYFS
find FOO/BAR -type f | perl -ne 'chomp; my @st = lstat($_); if (@st and -f(_)) { s#"#""#g; print "set_inode_field \"/$_\" ctime \@$st[10]\n" }' >/tmp/sif.out
sudo umount /media/MYUSER/MYFS2 # Repeat until success.
# df already reports the full device path (e.g. /dev/sdb1), so use it as is.
sudo debugfs -w -f /tmp/sif.out "$mydev2"
It handles filenames with whitespace and special characters correctly.
It works independently of time zones. As a limitation of debugfs, its precision is seconds; it ignores anything smaller (e.g. milliseconds, microseconds, nanoseconds). Depending on the version of debugfs used, it may use 32-bit timestamps, in which case it works correctly only with dates before 2038-01-19.
If the current user doesn't have enough read permissions for /media/MYUSER/MYFS, then the commands above should be run as root (sudo bash).

How can I tarball the proc file system?

I would like to take a snapshot of my entire proc file system, and save it in a tarball (or in the worst case concatenate all of the text files together into a single text file).
But when I run:
tar -c /proc
I get a segfault.
What's the best way to do this? Should I set up some kind of recursive walk through each file?
I only have the basic *nix utilities, such as bash, cat, ls, echo, etc. I don't have anything fancy like python or perl or java.

The linux /proc filesystem is actually kernel variables pretending to be a filesystem. There is nothing to save thus nothing to backup. If the system let you, you could rm -rf /proc and it would magically reappear upon the next reboot.
The /dev file system has real i-nodes and they can be backed up. Except they have no contents, just a major and minor number, permissions, and a name. Tools that do backup special device files only record those parameters and never try to open(2) the device. However, since the device major and minor numbers are only meaningful on the precise system they are built for, there is little cause for backing them up.
The reason that trying to tar the /proc pseudo-filesystem causes tar to segfault is because /proc has funny file behavior: things like a write-only pseudo-file may appear to have read permissions, but return an error indication if a program tries to open(2) it for backup. That's sure to drive a naïve tar to get persnickety.
Added in response to comment
It doesn't surprise me that tar had problems reading /proc/kmsg because it has some funny properties:
# strace cat /proc/kmsg
execve("/bin/cat", ["cat", "kmsg"],
open("kmsg", O_RDONLY|O_LARGEFILE) = 3
// ok, no problem opening the file for reading
fstat64(3, { st_mode=S_IFREG|0400, st_size=0,
// looks like a normal file of zero length
// but cat does not pay attention to st_size so it just
// does a blocking read
read(3, "<4>[103128.156051] ata2.00: qc t"..., 32768) = 461
write(1, "<4>[103128.156051] ata2.00: qc t"..., 461) = 461
// ...forever...
read(3, "<6>[103158.228444] ata2.00: conf"..., 32768) = 48
write(1, "<6>[103158.228444] ata2.00: conf"..., 48) = 48
+++ killed by SIGINT +++
Since /proc/kmsg is a running list of kernel messages as they happen, read never returns 0 (EOF); it just keeps going until I get bored and press ^C.
Interestingly, my tar has no trouble with /proc/kmsg:
$ tar --version
tar (GNU tar) 1.22
# tar cf /tmp/junk.tar /proc/kmsg
$ tar tvf /tmp/junk.tar
-r-------- root/root 0 2010-09-01 14:41 proc/kmsg
and if you look at the strace output, GNU tar 1.22 saw that st_size == 0 and didn't even bother opening the file for reading, since there wasn't anything there.
I can imagine that your tar saw the length was 0, allocated that much (none) space using malloc(3) which dutifully handed back a pointer to a zero length buffer. Your tar read from /proc/kmsg, got a non-zero length read, and tried to store it in the zero length buffer and got a segmentation violation.
That is but one rat-hole that awaits tar in /proc. How many more are there? Dunno. Will they behave identically? Probably not. Which of the ~1000 or so files which aren't /proc/<pid> pseudo-files are going to have weird semantics? Dunno.
But perhaps the most telling question: What sense would you make of /proc/sys/vm/lowmem_reserve_ratio, will it be different next week, and will you be able to learn anything from that difference?

While the accepted answer makes sense if you want to debate whether doing this is worthwhile, there is nevertheless an approach that works. Here's a script to duplicate the complete /proc file system into /tmp/proc. This can then be tarred and gzipped. I used this to keep a record of the setup and capabilities (memory, bogomips, default processes, etc.) of my trusty old file server before I replaced it with a new one.
cd /
mkdir /tmp/proc
# Copy every regular file under /proc that has not been copied yet.
find /proc -type f | while read -r F ; do
    D="/tmp/$(dirname "$F")"
    test -d "$D" || mkdir -p "$D"
    test -f "/tmp/$F" || sudo cat "$F" > "/tmp/$F"
done
Notes:
Permissions are not preserved since I have to use cat instead of cp.
cp -a /proc /proccopy doesn't work since it crashes on "kcore" as well. mc (Midnight Commander) succeeds in creating a copy of /proc which you can then tar and gzip, but you have to dismiss thousands of "Cannot read file XYZ" errors and it too crashes on 'kcore' with a Bus Error.
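On a machine where Python happens to be available (unlike the asker's system), the same walk can be sketched with a per-file size cap and a skip list, which sidesteps the 'kcore' crash mentioned above. This is only a sketch: the function name snapshot_tree, the 1 MiB cap, and the default skip list are assumptions, and a read may still block on other streaming pseudo-files that are not listed.

```python
import os


def snapshot_tree(src_root, dst_root, max_bytes=1 << 20, skip=("kcore", "kmsg")):
    """Copy readable regular files under src_root into dst_root, bounded per file."""
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root, followlinks=False):
        rel = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Skip known troublemakers and symlinks.
            if name in skip or os.path.islink(src):
                continue
            try:
                with open(src, "rb") as f:
                    data = f.read(max_bytes)  # never read more than the cap
            except OSError:
                continue  # unreadable pseudo-file: ignore it and move on
            dst_dir = os.path.join(dst_root, rel)
            os.makedirs(dst_dir, exist_ok=True)
            with open(os.path.join(dst_dir, name), "wb") as out:
                out.write(data)
            copied += 1
    return copied
```

It would be called as snapshot_tree("/proc", "/tmp/proc"), after which /tmp/proc can be tarred and gzipped as described.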

A simple answer:
# csh/tcsh syntax
ls -Rd /proc/* > proc.lst
foreach item (`cat proc.lst`)
    echo "proc_file:$item"
    if (-f $item) cat $item
end
Apropos the site advisory protocol:
"But avoid ... Making statements based on opinion..."
(IMHO...based on quite a few years experience) It doesn't require much imagination to think of some good reasons for, at a given point in time, taking a snapshot of selected elements of /proc/* and storing it, or sending it somewhere. Therefore, I would question the usefulness of an 'answer' such as:
The linux /proc filesystem is actually kernel variables pretending to be a filesystem. There is nothing to save thus nothing to backup. If the system let you, you could rm -rf /proc and it would magically reappear upon the next reboot.
...on the grounds that it doesn't answer the question, makes a false assertion, and contains gratuitous information irrelevant to the question.
