Problem with the Zip64 extension when directing standard input content into a zip archive with zip and renaming the stored file with zipnote - Linux

I have an annoying problem with the zip and zipnote programs (both version 3.0) on my Debian stable system.
I wish to create a zip archive that stores (does not compress) data from standard input, without extra attributes/fields, and that gives a name of my choice to the file inside the archive.
My first try was
printf "foodata" | zip -X0 bar.zip -
printf "# -\n#=foofile\n" | zipnote -w bar.zip
where zip creates a bar.zip archive with a stored file "-" containing "foodata", and zipnote renames the file from "-" to "foofile".
First problem (solved): as we can see from zipdetails,
001E Filename '-'
001F Extra ID #0001 0001 'ZIP64'
0021 Length 0010
0023 Uncompressed Size 0000000000000007
002B Compressed Size 0000000000000007
when receiving data from standard input, zip doesn't know the size of the resulting file, so it creates a PKZIP 4.5-compatible zip archive (which can exceed 4 GB) using the Zip64 extension and adds a Zip64 extra field to the entry.
The -X option removes extra file attributes but doesn't remove the Zip64 extra field.
This problem is easily solved by adding the -fz- option, as stated in the zip man page:
printf "foodata" | zip -X0 -fz- bar.zip -
Now bar.zip is a PKZIP 2.0-compatible file with no Zip64 extra field.
Second problem (not solved): zipnote changes the name of the contained file but also adds the Zip64 field back to it.
I don't know why.
According to the zip man page:
zip removes the Zip64 extensions if not needed when archive entries are copied (see the -U (--copy) option).
So I understand that
zip bar.zip --out bar-corrected.zip
should create a new bar-corrected.zip archive where the file foofile is Zip64-free (foofile is very short, so the Zip64 extension isn't needed, I presume).
Unfortunately, this doesn't work: I get the warning
copying: foofile
zip warning: Local Version Needed To Extract does not match CD: foofile
and the resulting file retains the Zip64 extension.
Specifying the filename explicitly or adding the -fz- option doesn't seem to work either: I've tried a lot of combinations, but (maybe it's my fault) without success.
Questions:
(1) Can I prevent zipnote from adding the Zip64 field when it renames a file, and if so, how?
(2) Otherwise, how can I use zip (with --copy? with -fz-?) to create a new zip archive free of the Zip64 extension?

[Edit: Updated to use Store rather than Deflate]
Not sure how to achieve what you want with zip and zipnote, but here is an alternative.
echo abc | perl -MIO::Compress::Zip=zip -e ' zip "-" => "out.zip", Method => 0, Name => "member.txt" '
$ unzip -lv out.zip
Archive: out.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
4 Stored 4 0% 2019-10-10 21:54 4788814e member.txt
-------- ------- --- -------
4 4 0% 1 file
No Zip64 or extra attributes are present in the zip file.
$ zipdetails out.zip
0000 LOCAL HEADER #1 04034B50
0004 Extract Zip Spec 14 '2.0'
0005 Extract OS 00 'MS-DOS'
0006 General Purpose Flag 0008
[Bit 3] 1 'Streamed'
0008 Compression Method 0000 'Stored'
000A Last Mod Time 4F4AAECA 'Thu Oct 10 21:54:20 2019'
000E CRC 00000000
0012 Compressed Length 00000000
0016 Uncompressed Length 00000000
001A Filename Length 000A
001C Extra Length 0000
001E Filename 'member.txt'
0028 PAYLOAD abc.
002C STREAMING DATA HEADER 08074B50
0030 CRC 4788814E
0034 Compressed Length 00000004
0038 Uncompressed Length 00000004
003C CENTRAL HEADER #1 02014B50
0040 Created Zip Spec 14 '2.0'
0041 Created OS 03 'Unix'
0042 Extract Zip Spec 14 '2.0'
0043 Extract OS 00 'MS-DOS'
0044 General Purpose Flag 0008
[Bit 3] 1 'Streamed'
0046 Compression Method 0000 'Stored'
0048 Last Mod Time 4F4AAECA 'Thu Oct 10 21:54:20 2019'
004C CRC 4788814E
0050 Compressed Length 00000004
0054 Uncompressed Length 00000004
0058 Filename Length 000A
005A Extra Length 0000
005C Comment Length 0000
005E Disk Start 0000
0060 Int File Attributes 0000
[Bit 0] 0 'Binary Data'
0062 Ext File Attributes 81A40000
0066 Local Header Offset 00000000
006A Filename 'member.txt'
0074 END CENTRAL HEADER 06054B50
0078 Number of this disk 0000
007A Central Dir Disk no 0000
007C Entries in this disk 0001
007E Total Entries 0001
0080 Size of Central Dir 00000038
0084 Offset to Central Dir 0000003C
0088 Comment Length 0000
Done
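If Python is available, the standard zipfile module gives the same result; here is a minimal sketch (the script name mkzip.py is just an example). Note that it buffers all of standard input in memory, so the sizes are known up front and no Zip64 fields are written for small inputs.
#!/usr/bin/env python3
# mkzip.py (example name): store stdin uncompressed as "foofile" in bar.zip.
# zipfile only emits Zip64 structures when an entry actually needs them.
import sys
import zipfile

data = sys.stdin.buffer.read()
with zipfile.ZipFile("bar.zip", "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("foofile", data)
Usage: printf "foodata" | python3 mkzip.py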

I've created a simple script that wraps the previous answer. See streamzip.
Usage is
printf "foodata" | streamzip -method=store -member-name=foofile -zipfile=/tmp/bar.zip
This is what unzip thinks is in the zip file
unzip -lv /tmp/bar.zip
Archive: /tmp/bar.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
7 Stored 7 0% 2019-10-15 20:25 1b7dd7cd foofile
-------- ------- --- -------
7 7 0% 1 file

Related

build-id data offset in the ELF file

I need to modify the build-id in the ELF notes section. I found out that this is possible here, and that I can do it by modifying this code. What I can't figure out is the data location. Here is what I'm talking about.
$ eu-readelf -S myelffile
Section Headers:
[Nr] Name Type Addr Off Size ES Flags Lk Inf Al
...
[ 2] .note.ABI-tag NOTE 000000000000028c 0000028c 00000020 0 A 0 0 4
[ 3] .note.gnu.build-id NOTE 00000000000002ac 000002ac 00000024 0 A 0 0 4
...
$ eu-readelf -n myelffile
Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x28c:
Owner Data size Type
GNU 16 GNU_ABI_TAG
OS: Linux, ABI: 3.14.0
Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x2ac:
Owner Data size Type
GNU 20 GNU_BUILD_ID
Build ID: d75a086c288c582036b0562908304bc3a8033235
The .note.gnu.build-id section is 36 bytes. The build ID is 20 bytes. What are the other 16 bytes?
I played with the code a bit and read 36 bytes of myelffile at offset 0x2ac. I got the following: 040000001400000003000000474e5500d75a086c288c582036b0562908304bc3a8033235.
Then I decided to use the Elf64_Shdr definition, so I read the data at address 0x2ac + sizeof(Elf64_Shdr.sh_name) + sizeof(Elf64_Shdr.sh_type) + sizeof(Elf64_Shdr.sh_flags) and I got my build ID, d75a086c288c582036b0562908304bc3a8033235. It makes sense why I got it, since sizeof(Elf64_Shdr.sh_name) + sizeof(Elf64_Shdr.sh_type) + sizeof(Elf64_Shdr.sh_flags) = 16 bytes, but according to the Elf64_Shdr definition I should be pointing at Elf64_Addr sh_addr, i.e. the section's virtual address.
So what is not clear to me is: what are the other 16 bytes of the section? What do they represent? I can't reconcile the Elf64_Shdr definition with the results I'm getting from my experiments.
The .note.gnu.build-id section is 36 bytes. The build ID is 20 bytes. What are the other 16 bytes?
Each .note.* section starts with an Elf64_Nhdr (12 bytes), followed by the (4-byte-aligned) note name of variable size (GNU\0 here), followed by the (4-byte-aligned) actual note data. Documentation.
Looking at /bin/date on my system:
eu-readelf -Wn /bin/date
Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x2c4:
Owner Data size Type
GNU 16 GNU_ABI_TAG
OS: Linux, ABI: 3.2.0
Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x2e4:
Owner Data size Type
GNU 20 GNU_BUILD_ID
Build ID: 979ae4616ae71af565b123da2f994f4261748cc9
What are the bytes at offset 0x2e4?
dd bs=1 skip=$((0x2e4)) count=36 < /bin/date | xxd
00000000: 0400 0000 1400 0000 0300 0000 474e 5500 ............GNU.
00000010: 979a e461 6ae7 1af5 65b1 23da 2f99 4f42 ...aj...e.#./.OB
00000020: 6174 8cc9 at..
So we have: .n_namesz == 4, .n_descsz == 20, .n_type == 3 == NT_GNU_BUILD_ID, followed by 4-byte GNU\0 note name, followed by 20 bytes of actual build-id bytes 0x97, 0x9a, etc.
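If you want to pull the build ID out programmatically, here is a minimal Python sketch of the same parsing; the offset and size are hardcoded from the eu-readelf output above, so adjust them for your binary.
#!/usr/bin/env python3
# Parse one ELF note: an Elf64_Nhdr is three 4-byte little-endian words
# (n_namesz, n_descsz, n_type), then the name padded to a 4-byte boundary,
# then n_descsz bytes of descriptor (the build ID for NT_GNU_BUILD_ID).
import struct
import sys

OFFSET, SIZE = 0x2E4, 36   # from `eu-readelf -S`; adjust for your file

with open(sys.argv[1], "rb") as f:
    f.seek(OFFSET)
    note = f.read(SIZE)

n_namesz, n_descsz, n_type = struct.unpack_from("<III", note, 0)
desc_off = 12 + ((n_namesz + 3) & ~3)       # name is 4-byte aligned
name = note[12:12 + n_namesz].rstrip(b"\0")
desc = note[desc_off:desc_off + n_descsz]
print(name.decode(), hex(n_type), desc.hex())
Running it as python3 parse_note.py /bin/date (parse_note.py being whatever you call the file) prints GNU 0x3 followed by the build ID.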

UTF-16 in Python taking 4 bytes [duplicate]

This question already has answers here:
Is the [0xff, 0xfe] prefix required on utf-16 encoded strings?
I am playing with Unicode encodings in Python 3 and noticed a behavior I am not able to understand.
The following cases work as expected:
x = "A"
fo = open("test.txt","w",encoding="utf_8")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 01000001 (1 Byte as expected)
x = "A"
fo = open("test.txt","w",encoding="utf_16_le")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 01000001 00000000 (2 Bytes as expected)
x = "A"
fo = open("test.txt","w",encoding="utf_16_be")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 00000000 01000001 (2 Bytes as expected)
Why 4 bytes with utf_16 encoding?
My understanding is that UTF-16 is a variable-length character encoding that uses either 16 or 32 bits per character. For the character A, it should use only 16 bits. Can someone please help me understand this behavior?
x = "A"
fo = open("test.txt","w",encoding="utf_16")
fo.write(x)
fo.close()
xxd -b test.txt
00000000: 11111111 11111110 01000001 00000000
The first two bytes are the UTF-16 byte order mark (BOM). See http://www.unicode.org/faq/utf_bom.html
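You can see the same thing directly in the interpreter without writing a file (output shown for a little-endian machine, which is why the utf_16 codec emits the FF FE BOM):
>>> "A".encode("utf_8")
b'A'
>>> "A".encode("utf_16_le")
b'A\x00'
>>> "A".encode("utf_16_be")
b'\x00A'
>>> "A".encode("utf_16")      # BOM (FF FE) plus the 2 bytes for 'A'
b'\xff\xfeA\x00'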

Place .text section at the very end of ELF

For a specific reason I need to place the .text section at the very end of my ELF file.
I tried to achieve this as follows:
I took the default linker script and moved the .text output section to the very end of the SECTIONS { ... } block.
$ readelf -S beronew
[ #] Name Type Address Offset
Size Size.Ent Flags - - Alignment
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .data PROGBITS 00000000006000b0 000000b0
000000000000003b 0000000000000000 WA 0 0 1
[ 2] .text PROGBITS 0000000000a000f0 000000f0
00000000000003e9 0000000000000000 AX 0 0 1
[ 3] .shstrtab STRTAB 0000000000000000 000004d9
0000000000000027 0000000000000000 0 0 1
[ 4] .symtab SYMTAB 0000000000000000 00000680
0000000000000438 0000000000000018 5 41 8
[ 5] .strtab STRTAB 0000000000000000 00000ab8
0000000000000258 0000000000000000 0 0 1
What I see is that ld added extra sections after my "ending" section. To get rid of them I used the -nostdlib and -s linker options (to not use the standard library, just in case, and to omit all symbol information).
Running $ readelf -S beronew one more time:
[ #] Name Type Address Offset
Size Size.Ent Flags - - Alignment
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .data PROGBITS 00000000006000b0 000000b0
000000000000003b 0000000000000000 WA 0 0 1
[ 2] .text PROGBITS 0000000000a000f0 000000f0
00000000000003e9 0000000000000000 AX 0 0 1
[ 3] .shstrtab STRTAB 0000000000000000 000004d9
0000000000000017 0000000000000000 0 0 1
The section header string table is still there. I tried $ strip -R .shstrtab beronew; it had no effect, the section is still there.
This section is only 0x17 bytes long, but I still couldn't achieve my goal. Then I looked at a hexdump of my file:
$ hexdump beronew
...
00004d0 0060 c748 bfc6 6000 0000 732e 7368 7274
00004e0 6174 0062 642e 7461 0061 742e 7865 0074
00004f0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000530 000b 0000 0001 0000 0003 0000 0000 0000
0000540 00b0 0060 0000 0000 00b0 0000 0000 0000
0000550 003b 0000 0000 0000 0000 0000 0000 0000
...
What I see is that there is more data after the sections. According to the ELF structure, it is the section header table at the end of the file. So even if I somehow remove the .shstrtab section, this table will still be at the end.
So my question is: how can I place the .text section at the very end of the file? I don't really need to remove all the sections and headers, so if you know a (better) way to achieve this, it would be highly appreciated.
P.S. For those who wonder why I need this: this ELF file (beronew) contains a runtime library. It will be used as a "header" for another file that generates asm instructions with some logic in opcode form. That opcode will be appended to the very end of beronew. Then I'm going to patch the sh_size field in .text's section header to be able to run the newly added 'code'.
(One more question: is this all I need to patch in case the .text section is the last one in the file?)
P.P.S. I know this is a bad architecture, but it is my course project - porting an app that was built this way on Win32 to Linux64 - and now I'm stuck at the point where I merge the runtime library "header" file and the "logic" part, because I can't place the .text section at the end of the ELF.
Thanks one more time!
UPD:
Based on fuz's comment, I tried adding PHDRS to a simple linker script, like this:
PHDRS
{
headers PT_PHDR PHDRS ;
data PT_LOAD ;
bss PT_LOAD ;
text PT_LOAD ;
}
SECTIONS
{
. = 0x200000;
.data : { *(.data) *(COMMON) } :data
.bss : { *(.bss) } :bss
.text : { *(.text) } :text
}
but it doesn't seem to work so far.
For anyone who wonders how I eventually managed to get this working, here is the answer:
I made a linker script that placed the .text section after the data section (here is the script). But there were some auxiliary sections at the end of the file, like the .shstrtab section and so on (picture below). So I converted the file byte-by-byte into a string.
After this I had the ELF from the picture below
in string-of-bytes form. Then I just read some headers and found out where the code section ends (right before the .shstrtab section), so I could split the string into two pieces: the first one contained the data for the loader, the second one the data for the linker.
Then I converted my 'extra' code into opcode form, which is also an array (string) of bytes, and concatenated it to the original .text section. Next I concatenated the ending part to it. So I got a single file with my extra code in it.
To get this working I edited the values shown in the picture below:
The first column is the name of the field that needs to be edited (it corresponds to the picture of the ELF structure).
The second column is the offset from the beginning of the file to the beginning of the field. Let s(xxx) stand for size_of(some_header_structure) and injSize be the size of my injected extra code. Values like 0x18, 0x20, 0x28 are the offsets of the fields inside their structures (section headers, program headers, ELF header).
The third column is the value that should replace the original one.
*Note that the ELF pictured is an ELF64, so the widths of some fields differ from ELF32.
After I made all this, I managed to just run the new ELF file and it worked perfectly! It's surely not the best solution, but it works, and it was nice material for my research work.
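As a rough illustration of the splice-and-patch step described above, here is a Python sketch. TEXT_END and SHDR_OFF are hypothetical placeholders (read the real values with readelf), and only the sh_size patch is shown: e_shoff in the ELF header and the sh_offset of every section behind the splice point need the same injSize shift, as the table of fields above indicates.
#!/usr/bin/env python3
# Splice extra opcodes into the end of .text and patch sh_size to match.
import struct

TEXT_END = 0x4D9    # file offset where .text ends (hypothetical)
SHDR_OFF = 0x530    # file offset of .text's Elf64_Shdr (hypothetical)
extra = b"\xc3"     # the injected opcodes (placeholder: a single ret)

with open("beronew", "rb") as f:
    blob = f.read()

patched = bytearray(blob[:TEXT_END] + extra + blob[TEXT_END:])

# sh_size sits at offset 0x20 inside an Elf64_Shdr; the header itself
# moved by len(extra) because it lies after the splice point.
off = SHDR_OFF + len(extra) + 0x20
(size,) = struct.unpack_from("<Q", patched, off)
struct.pack_into("<Q", patched, off, size + len(extra))

with open("beronew-patched", "wb") as f:
    f.write(bytes(patched))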

Reverse engineer firmware image and rebuild Linux kernel for TI-AR7

I am trying to build my own Linux derivative to run on a TI-AR7 board. I took the board from an old Telekom Speedport W 501V router. To understand how firmware is flashed onto the device, I downloaded the most recent official firmware. Using the Linux file command, I determined that the image is a tar archive, which can be extracted easily.
ubuntu@ip-172-31-23-210:~/reverse$ ls
fw_speedport_w501v_v_28.04.38.image
ubuntu@ip-172-31-23-210:~/reverse$ file fw*
fw_speedport_w501v_v_28.04.38.image: POSIX tar archive (GNU)
ubuntu@ip-172-31-23-210:~/reverse$ tar -xvf fw*
./var/
./var/tmp/
./var/tmp/kernel.image
./var/tmp/filesystem.image
./var/flash_update.ko
./var/flash_update.o
./var/info.txt
./var/install
./var/chksum
./var/regelex
./var/signature
ubuntu@ip-172-31-23-210:~/reverse$
According to a wiki (Firmware-Image) that I found, ./var/tmp/kernel.image contains the actual firmware. During the update process this image is written to the mtd1 device. As stated in the wiki (LZMA-Kernel), the LZMA-compressed kernel starts with the magic number 0xfeed1281. A hexdump of kernel.image contains that number at its beginning.
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ hexdump -n 4 kernel.image
0000000 1281 feed
0000004
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$
The following script given on the last wiki entry should decompress the kernel.
#! /usr/bin/perl
use Compress::unLZMA;
use Archive::Zip;
open INPUT, "<$ARGV[0]" or die "can't open $ARGV[0]: $!";
read INPUT, $buf, 4;
$magic = unpack("V", $buf);
if ($magic != 0xfeed1281) {
die "bad magic";
}
read INPUT, $buf, 4;
$len = unpack("V", $buf);
read INPUT, $buf, 4*2; # address, unknown
read INPUT, $buf, 4;
$clen = unpack("V", $buf);
read INPUT, $buf, 4;
$dlen = unpack("V", $buf);
read INPUT, $buf, 4;
$cksum = unpack("V", $buf);
printf "Archive checksum: 0x%08x\n", $cksum;
read INPUT, $buf, 1+4; # properties, dictionary size
read INPUT, $dummy, 3; # alignment
$buf .= pack('VV', $dlen, 0); # 8 bytes of real size
#$buf .= pack('VV', -1, -1); # 8 bytes of real size
read INPUT, $buf2, $clen;
$crc = Archive::Zip::computeCRC32($buf2);
printf "Input CRC32: 0x%08x\n", $crc;
if ($cksum != $crc) {
die "wrong checksum";
}
$buf .= $buf2;
$data = Compress::unLZMA::uncompress($buf);
unless (defined $data) {
die "uncompress: $#";
}
open OUTPUT, ">$ARGV[1]" or die "can't write $ARGV[1]";
print OUTPUT $data;
#truncate OUTPUT, $dlen;
To use the script you may need to install the Compress::unLZMA and Archive::Zip Perl modules.
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ tar -xvf Compress*
Compress-unLZMA-0.04/
Compress-unLZMA-0.04/Makefile.PL
Compress-unLZMA-0.04/ppport.h
Compress-unLZMA-0.04/Changes
Compress-unLZMA-0.04/lzma_sdk/
[...]
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ cd Compress*
ubuntu@ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Compress::unLZMA
Writing MYMETA.yml and MYMETA.json
ubuntu@ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ make
cp lib/Compress/unLZMA.pm blib/lib/Compress/unLZMA.pm
/usr/bin/perl /usr/share/perl/5.18/ExtUtils/xsubpp -typemap /usr/share/perl/5.18/ExtUtils/typemap unLZMA.xs > unLZMA.xsc && mv unLZMA.xsc unLZMA.c
cc -c -I. -Ilzma_sdk/Source -D_REENTRANT -D_GNU_SOURCE
[...]
ubuntu@ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ sudo make install
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /usr/local/lib/perl/5.18.2/auto/Compress/unLZMA/unLZMA.bs
Installing /usr/local/lib/perl/5.18.2/auto/Compress/unLZMA/unLZMA.so
Installing /usr/local/lib/perl/5.18.2/Compress/unLZMA.pm
Installing /usr/local/man/man3/Compress::unLZMA.3pm
Appending installation info to /usr/local/lib/perl/5.18.2/perllocal.pod
ubuntu@ip-172-31-23-210:~/reverse/var/tmp/Compress-unLZMA-0.04$ # same for Archive::Zip module
After installing these dependencies the script decompressed the kernel successfully.
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ ./decompress.pl kernel.image kernel.decompressed
Archive checksum: 0x29176e12
Input CRC32: 0x29176e12
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$
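As an aside, the same header layout can be parsed with Python's standard library; here is a sketch that mirrors the Perl script (the field layout is taken from that script, nothing else is assumed).
#!/usr/bin/env python3
# 7 little-endian u32 fields (magic, len, address, unknown, clen, dlen,
# cksum), then 5 bytes of LZMA properties + dictionary size, 3 alignment
# bytes, then clen bytes of compressed data.
import lzma
import struct
import sys
import zlib

with open(sys.argv[1], "rb") as f:
    raw = f.read()

magic, length, addr, unknown, clen, dlen, cksum = struct.unpack_from("<7I", raw, 0)
if magic != 0xFEED1281:
    sys.exit("bad magic")

props = raw[28:33]             # LZMA properties + dictionary size
payload = raw[36:36 + clen]    # the 3 alignment bytes at 33..35 are skipped
if zlib.crc32(payload) != cksum:
    sys.exit("wrong checksum")

# The .lzma "alone" container is the 5 header bytes plus an 8-byte
# little-endian uncompressed size, followed by the compressed stream.
data = lzma.decompress(props + struct.pack("<Q", dlen) + payload,
                       format=lzma.FORMAT_ALONE)
with open(sys.argv[2], "wb") as f:
    f.write(data)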
But what kind of file is kernel.decompressed and how do I generate a similar file from my Linux kernel source? I continued analyzing it using file and binwalk.
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ file kernel.decompressed
kernel.decompressed: data
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ binwalk kernel.decompressed
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
1509632 0x170900 Linux kernel version "2.6.13.1-ohio (686) (gcc version 3.4.6) #9 Wed Apr 4 13:48:08 CEST 2007"
1516240 0x1722D0 CRC32 polynomial table, little endian
1517535 0x1727DF Copyright string: "Copyright 1995-1998 Mark Adler "
1549488 0x17A4B0 Unix path: /usr/gnemul/irix/
1550920 0x17AA48 Unix path: /usr/lib/libc.so.1
1618031 0x18B06F Neighborly text, "neighbor %.2x%.2x.%.2x:%.2x:%.2x:%.2x:%.2x:%.2x lost on port %d(%s)(%s)"
1966080 0x1E0000 gzip compressed data, maximum compression, from Unix, last modified: 2007-04-04 11:45:13
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$
So the Linux kernel starts at 1509632 and ends at 1516240. What kind of data is stored in front of the Linux kernel (0 to 1509632)? I extracted the kernel and that piece of unknown data using dd.
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ dd if=kernel.decompressed of=unknown.data bs=1 count=1509632
1509632+0 records in
1509632+0 records out
1509632 bytes (1.5 MB) copied, 1.62137 s, 931 kB/s
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ dd if=kernel.decompressed of=kernel bs=1 skip=1509632 count=6608
6608+0 records in
6608+0 records out
6608 bytes (6.6 kB) copied, 0.0072771 s, 908 kB/s
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$
I need to ask again: What kind of file is kernel and how do I generate a similar file from my Linux kernel source? I used xxd and strings to look at the file more closely.
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ xxd -l 100 kernel
0000000: 4c69 6e75 7820 7665 7273 696f 6e20 322e Linux version 2.
0000010: 362e 3133 2e31 2d6f 6869 6f20 2836 3836 6.13.1-ohio (686
0000020: 2920 2867 6363 2076 6572 7369 6f6e 2033 ) (gcc version 3
0000030: 2e34 2e36 2920 2339 2057 6564 2041 7072 .4.6) #9 Wed Apr
0000040: 2034 2031 333a 3438 3a30 3820 4345 5354 4 13:48:08 CEST
0000050: 2032 3030 370a 0000 0000 0000 0000 0000 2007...........
0000060: 0000 0000 ....
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$ strings kernel
Linux version 2.6.13.1-ohio (686) (gcc version 3.4.6) #9 Wed Apr 4 13:48:08 CEST 2007
do_be
do_bp
do_tr
do_ri
do_cpu
nmi_exception_handler
do_ade
emulate_load_store_insn
do_page_fault
context_switch
__put_task_struct
do_exit
local_bh_enable
run_workqueue
2.6.13.1-ohio gcc-3.4
enable_irq
__free_pages_ok
free_hot_cold_page
prep_new_page
kmem_cache_destroy
kmem_cache_create
pageout
vunmap_pte_range
vmap_pte_range
__vunmap
__brelse
sync_dirty_buffer
bio_endio
queue_kicked_iocb
proc_get_inode
remove_proc_entry
sysfs_get
sysfs_fill_super
kref_get
kref_put
0123456789abcdefghijklmnopqrstuvwxyz
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
vsnprintf
{zt^f
pw0Gm
0cIZ-
68BG+
QC]S%
v,;Zk
ubuntu@ip-172-31-23-210:~/reverse/var/tmp$
This GitHub repository contains the extracted files for further analysis.

Why using conv=notrunc when cloning a disk with dd?

If you look up how to clone an entire disk to another one on the web, you will find something like this:
dd if=/dev/sda of=/dev/sdb conv=notrunc,noerror
While I understand the noerror part, I am having a hard time understanding why people think that notrunc is required for "data integrity" (as the Arch Linux wiki states, for instance).
Indeed, I agree with that if you are copying a partition to another partition on another disk and you do not want to overwrite the entire disk, just one partition. In this case notrunc, according to dd's manual page, is what you want.
But if you're cloning an entire disk, what does notrunc change for you? Just a time optimization?
TL;DR version:
notrunc is only important to prevent truncation when writing into a file. This has no effect on a block device such as sda or sdb.
Educational version
I looked into the coreutils source code which contains dd.c to see how notrunc is processed.
Here's the segment of code that I'm looking at:
int opts = (output_flags
| (conversions_mask & C_NOCREAT ? 0 : O_CREAT)
| (conversions_mask & C_EXCL ? O_EXCL : 0)
| (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC));
/* Open the output file with *read* access only if we might
need to read to satisfy a `seek=' request. If we can't read
the file, go ahead with write-only access; it might work. */
if ((! seek_records
|| fd_reopen (STDOUT_FILENO, output_file, O_RDWR | opts, perms) < 0)
&& (fd_reopen (STDOUT_FILENO, output_file, O_WRONLY | opts, perms) < 0))
error (EXIT_FAILURE, errno, _("opening %s"), quote (output_file));
We can see here that if notrunc is not specified, the output file will be opened with O_TRUNC. Looking below at how O_TRUNC is treated, we can see that a regular file will be truncated when opened for writing.
O_TRUNC
If the file already exists and is a regular file and the open
mode allows writing (i.e., is O_RDWR or O_WRONLY) it will be truncated
to length 0. If the file is a FIFO or terminal device file, the
O_TRUNC flag is ignored. Otherwise the effect of O_TRUNC is
unspecified.
Effects of notrunc / O_TRUNC I
In the following example, we start out by creating junk.txt of size 1024 bytes. Next, we write 512 bytes to the beginning of it with conv=notrunc. We can see that the size stays the same at 1024 bytes. Finally, we try it without the notrunc option and we can see that the new file size is 512. This is because it was opened with O_TRUNC.
$ dd if=/dev/urandom of=junk.txt bs=1024 count=1
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 1024 Dec 11 17:08 junk.txt
$ dd if=/dev/urandom of=junk.txt bs=512 count=1 conv=notrunc
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 1024 Dec 11 17:10 junk.txt
$ dd if=/dev/urandom of=junk.txt bs=512 count=1
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 512 Dec 11 17:10 junk.txt
Effects of notrunc / O_TRUNC II
I still haven't answered your original question of why conv=notrunc is important when doing a disk-to-disk clone. According to the above definition, O_TRUNC seems to be ignored when opening certain special files, and I would expect this to be true for block device nodes too. However, I don't want to assume anything and will attempt to prove it here.
openclose.c
I've written a simple C program here which opens and closes a file given as an argument with the O_TRUNC flag.
#include <stdio.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <time.h>
int main(int argc, char * argv[])
{
if (argc < 2)
{
fprintf(stderr, "Not enough arguments...\n");
return (1);
}
int f = open(argv[1], O_RDWR | O_TRUNC);
if (f >= 0)
{
fprintf(stderr, "%s was opened\n", argv[1]);
close(f);
fprintf(stderr, "%s was closed\n", argv[1]);
} else {
perror("Opening device node");
}
return (0);
}
Normal File Test
We can see below that the simple act of opening and closing a file with O_TRUNC will cause it to lose anything that was already there.
$ dd if=/dev/urandom of=junk.txt bs=1024 count=1
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 1024 Dec 11 17:26 junk.txt
$ ./openclose junk.txt
junk.txt was opened
junk.txt was closed
$ ls -l junk.txt
-rw-rw-r-- 1 akyserr akyserr 0 Dec 11 17:27 junk.txt
Block Device File Test
Let's try a similar test on a USB flash drive. We can see that we start out with a single partition on the USB flash drive. If it gets 'truncated', perhaps the partition will go away (considering it's defined in the first 512 bytes of the disk)?
$ ls -l /dev/sdc*
brw-rw---- 1 root disk 8, 32 Dec 11 17:22 /dev/sdc
brw-rw---- 1 root disk 8, 33 Dec 11 17:22 /dev/sdc1
$ sudo ./openclose /dev/sdc
/dev/sdc was opened
/dev/sdc was closed
$ sudo ./openclose /dev/sdc1
/dev/sdc1 was opened
/dev/sdc1 was closed
$ ls -l /dev/sdc*
brw-rw---- 1 root disk 8, 32 Dec 11 17:31 /dev/sdc
brw-rw---- 1 root disk 8, 33 Dec 11 17:31 /dev/sdc1
It looks like opening either the disk or the disk's first partition with the O_TRUNC option has no effect whatsoever. From what I can tell, the filesystem is still mountable and the files are accessible and intact.
Effects of notrunc / O_TRUNC III
Okay, for my final test I will use dd on my flash drive directly. I will start by writing 512 bytes of random data, then writing 256 bytes of zeros at the beginning. For the final test, we will verify that the last 256 bytes remained unchanged.
$ sudo dd if=/dev/urandom of=/dev/sdc bs=256 count=2
$ sudo hexdump -n 512 /dev/sdc
0000000 3fb6 d17f 8824 a24d 40a5 2db3 2319 ac5b
0000010 c659 5780 2d04 3c4e f985 053c 4b3d 3eba
0000020 0be9 8105 cec4 d6fb 5825 a8e5 ec58 a38e
0000030 d736 3d47 d8d3 9067 8db8 25fb 44da af0f
0000040 add7 c0f2 fc11 d734 8e26 00c6 cfbb b725
0000050 8ff7 3e79 af97 2676 b9af 1c0d fc34 5eb1
0000060 6ede 318c 6f9f 1fea d200 39fe 4591 2ffb
0000070 0464 9637 ccc5 dfcc 3b0f 5432 cdc3 5d3c
0000080 01a9 7408 a10a c3c4 caba 270c 60d0 d2f7
0000090 2f8d a402 f91a a261 587b 5609 1260 a2fc
00000a0 4205 0076 f08b b41b 4738 aa12 8008 053f
00000b0 26f0 2e08 865e 0e6a c87e fc1c 7ef6 94c6
00000c0 9ced 37cf b2e7 e7ef 1f26 0872 cd72 54a4
00000d0 3e56 e0e1 bd88 f85b 9002 c269 bfaa 64f7
00000e0 08b9 5957 aad6 a76c 5e37 7e8a f5fc d066
00000f0 8f51 e0a1 2d69 0a8e 08a9 0ecf cee5 880c
0000100 3835 ef79 0998 323d 3d4f d76b 8434 6f20
0000110 534c a847 e1e2 778c 776b 19d4 c5f1 28ab
0000120 a7dc 75ea 8a8b 032a c9d4 fa08 268f 95e8
0000130 7ff3 3cd7 0c12 4943 fd23 33f9 fe5a 98d9
0000140 aa6d 3d89 c8b4 abec 187f 5985 8e0f 58d1
0000150 8439 b539 9a45 1c13 68c2 a43c 48d2 3d1e
0000160 02ec 24a5 e016 4c2d 27be 23ee 8eee 958e
0000170 dd48 b5a1 10f1 bf8e 1391 9355 1b61 6ffa
0000180 fd37 7718 aa80 20ff 6634 9213 0be1 f85e
0000190 a77f 4238 e04d 9b64 d231 aee8 90b6 5c7f
00001a0 5088 2a3e 0201 7108 8623 b98a e962 0860
00001b0 c0eb 21b7 53c6 31de f042 ac80 20ee 94dd
00001c0 b86c f50d 55bc 32db 9920 fd74 a21e 911a
00001d0 f7db 82c2 4d16 3786 3e18 2c0f 47c2 ebb0
00001e0 75af 6a8c 2e80 c5b6 e4ea a9bc a494 7d47
00001f0 f493 8b58 0765 44c5 ff01 42a3 b153 d395
$ sudo dd if=/dev/zero of=/dev/sdc bs=256 count=1
$ sudo hexdump -n 512 /dev/sdc
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000100 3835 ef79 0998 323d 3d4f d76b 8434 6f20
0000110 534c a847 e1e2 778c 776b 19d4 c5f1 28ab
0000120 a7dc 75ea 8a8b 032a c9d4 fa08 268f 95e8
0000130 7ff3 3cd7 0c12 4943 fd23 33f9 fe5a 98d9
0000140 aa6d 3d89 c8b4 abec 187f 5985 8e0f 58d1
0000150 8439 b539 9a45 1c13 68c2 a43c 48d2 3d1e
0000160 02ec 24a5 e016 4c2d 27be 23ee 8eee 958e
0000170 dd48 b5a1 10f1 bf8e 1391 9355 1b61 6ffa
0000180 fd37 7718 aa80 20ff 6634 9213 0be1 f85e
0000190 a77f 4238 e04d 9b64 d231 aee8 90b6 5c7f
00001a0 5088 2a3e 0201 7108 8623 b98a e962 0860
00001b0 c0eb 21b7 53c6 31de f042 ac80 20ee 94dd
00001c0 b86c f50d 55bc 32db 9920 fd74 a21e 911a
00001d0 f7db 82c2 4d16 3786 3e18 2c0f 47c2 ebb0
00001e0 75af 6a8c 2e80 c5b6 e4ea a9bc a494 7d47
00001f0 f493 8b58 0765 44c5 ff01 42a3 b153 d395
Summary
Through the above experimentation, it seems that notrunc only matters when you are writing into a regular file that you don't want truncated. It has no effect on a block device such as sda or sdb.
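If you'd rather not compile anything, the same open-with-O_TRUNC experiment fits in a few lines of Python (a sketch equivalent to openclose.c above): a regular file passed as the argument is truncated to 0 bytes, while a block device is left untouched.
#!/usr/bin/env python3
# Open the given path with O_TRUNC and close it again, like openclose.c.
import os
import sys

fd = os.open(sys.argv[1], os.O_RDWR | os.O_TRUNC)
os.close(fd)
print(sys.argv[1], "was opened and closed with O_TRUNC")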
