increasing depth of extent tree in ext4

How can I increase the depth of an extent tree? How much space should be allocated? To be more precise I only need the depth to be more than 1.

In ext4, an inode uses 60 bytes to address 4 extents by default (12 bytes for the header + 4*12 bytes for the extents). Every extent can hold up to 128 MiB of data (by default).
If a file is larger than 4*128 MiB (> 512 MiB), ext4 builds an extent tree for it, so you'll reach depth = 1 that way.
Now the question is when you will reach depth = 2. I tried with a file of ~700 MiB and checked the extent depth, which was 1. Then I opened the fs-block referenced by the extent index and analysed the header.
0A F3 06 00 54 01 00 00 00 00 00 00 00
It starts with the usual magic number 0xF30A (stored little-endian as 0A F3), and the next two bytes give the number of valid entries (6 in this case, which makes sense because 700/128 ≈ 5.5, rounded up to 6).
The next two bytes, 54 01 (0x0154), give the maximum number of entries the block can hold, which comes to 340 in my case.
-> so why 340? The default fs-block size of ext4 is 4096 bytes.
Each entry consists of 12 bytes. 340*12 = 4080, plus 12 for the header = 4092, so 340 is the maximum number of extent entries that fit into this fs-block.
If I interpret this correctly, you can store up to 340 extents (or index entries) in this fs-block. Each of these 340 entries refers to an extent (which can hold up to 128 MiB).
-> so it comes to: 340*128 MiB = 43520 MiB = 42.5 GiB
Additionally, each inode can hold up to 4 index entries, so I guess this number needs to be multiplied by 4.
-> so I think your file should be > 4*340*128 MiB = 170 GiB to reach depth = 2
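To make the arithmetic above easy to re-check, here is a minimal Python sketch that decodes the 12-byte header from the dump and recomputes the capacities in binary units. The field layout (magic, entries, max, depth, generation, all little-endian) is my reading of the ext4 extent header; the function and constant names are made up for illustration, and the 128 MiB per extent and 4096-byte block size are the defaults assumed in this answer.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A   # stored on disk as 0A F3 (little-endian)
HEADER_SIZE = 12          # bytes in the extent header
ENTRY_SIZE = 12           # bytes per extent or index entry
MAX_EXTENT_MIB = 128      # assumed maximum size of one extent

def parse_extent_header(raw: bytes) -> dict:
    """Decode the 12-byte header found at the start of an extent node."""
    magic, entries, max_entries, depth, generation = struct.unpack(
        "<HHHHI", raw[:HEADER_SIZE])
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("not an ext4 extent header")
    return {"entries": entries, "max": max_entries,
            "depth": depth, "generation": generation}

def entries_per_block(block_size: int = 4096) -> int:
    """How many 12-byte entries fit after the header in one fs-block."""
    return (block_size - HEADER_SIZE) // ENTRY_SIZE

# The first 12 bytes of the dump above.
header = parse_extent_header(bytes.fromhex("0A F3 06 00 54 01 00 00 00 00 00 00"))

# One full leaf block, and four full leaf blocks hanging off the inode.
leaf_capacity_gib = entries_per_block() * MAX_EXTENT_MIB / 1024
depth1_capacity_gib = 4 * leaf_capacity_gib
```

The depth field in this block is 0 because the block itself is a leaf; the depth of 1 is recorded in the header inside the inode. These capacities are upper bounds: a fragmented file uses more, smaller extents and would need a deeper tree much earlier.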
I have not yet tested this hypothesis, but I will try it soon. Meanwhile, if someone can prove me wrong, I'm happy as well. :)

Related

Why doesn't the QNAME for this DNS q. end with a NULL character?

I was monitoring port 5353 (mDNS) with WireShark and came across the following DNS question:
According to section 4.1.2 of RFC 1035 QNAME is:
a domain name represented as a sequence of labels, where each label consists of a length octet followed by that number of octets. The domain name terminates with the zero length octet for the null label of the root...
This seems to contradict what I'm seeing over-the-wire in the capture above. The last label ends with c0 12 instead of 00. Why is this and how come it isn't documented in the RFC?

Apparently, when a label sequence ends with c0 12, this indicates a compression pointer (RFC 1035 describes it in section 4.1.4, "Message compression"). It is roughly equivalent to stating "go to this offset in the DNS message and continue reading from there".
The top two bits of the first byte are both set (0xc0), and the remaining 14 bits are the offset from the start of the DNS message. In my question, for example, c0 12 indicates that the rest of the QNAME should be read from offset 0x12, i.e. 18 bytes into the message.
05 6c 6f 63 61 6c 00 .local.
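For illustration, here is a small Python sketch of how a reader can follow such pointers while decoding a name. The function name and the sample message are invented for this example (the message is not the capture from the question); only the label and pointer layout comes from RFC 1035.

```python
def read_name(msg: bytes, offset: int):
    """Decode a domain name starting at `offset`, following compression pointers.

    Returns (name, offset of the first byte after the name as it appears in place).
    """
    labels = []
    next_offset = None   # where parsing resumes after the name
    hops = 0
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:
            # Pointer: low 6 bits of this byte plus the next byte form a 14-bit offset.
            pointer = ((length & 0x3F) << 8) | msg[offset + 1]
            if next_offset is None:
                next_offset = offset + 2
            offset = pointer
            hops += 1
            if hops > 64:
                raise ValueError("too many compression pointers (loop?)")
            continue
        if length == 0:
            # Zero-length root label terminates the name.
            if next_offset is None:
                next_offset = offset + 1
            break
        labels.append(msg[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return ".".join(labels) + ".", next_offset

# Hypothetical message: 12-byte header, then "_http.local." at offset 12,
# which places the "local" label at offset 0x12, then "test" + pointer c0 12.
msg = bytes(12) + b"\x05_http" + b"\x05local\x00" + b"\x04test\xc0\x12"
first = read_name(msg, 12)
second = read_name(msg, 25)
```

The hop counter is a defensive choice: a malicious or corrupt packet can contain pointers that refer to each other, so a decoder should not follow them indefinitely.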

Sending image directly to Epson Projector, trouble decoding jpeg image

I have an Epson Ex5220 that doesn't have a linux driver and have been trying to work out communication through wifi. I can connect and send images captured through packet traces from a Windows machine with a driver but cannot create an acceptable image. Here is where the problem lies:
In the data send, a jpeg image is sent with a header attached like this.
00:00:00:01:00:00:00:00:02:70:01:a0:00:00:00:07:90:80:85:00
00:00:00:04 - Number of jpeg images being sent (only the first header)
00:00 - X offset
00:00 - Y offset
02:70 - Width of Jpeg image (624 in this case)
01:a0 - Height of Jpeg image (416 in this case)
00:00:00:07:90 - Unknown (I believe it's a version number perhaps)
80:85:00 - (What I'm after) Some count of data units?
Following the header is a normal jpeg image. If I strip off that header, I can view the image. Here is a screen shot of a partial capture with the 3 bytes highlighted:
I have found what seems to be a baseline by setting those last three bytes to 80:85:00. Anything less and the image will not project. Also, the smallest image size I can send to the projector is 3w x 1h, which correlates with my first two images shown below.
Here are some examples:
1a - All white (RGB565) image 1024x768 - filesize 12915 - 4 blocks
2a - Color (RGB565) image 1024x768 - filesize 58577 - only 3 blocks
I then changed the 3 bytes to 00:b5:80 (incremented the middle one by 0x30)
1b - All white (RGB565) image 1024x768 - filesize 12915 - 22 full rows and 4 blocks.
2b - Color (RGB565) image 1024x768 - filesize 58577 - 7 rows and 22 blocks.
So it seems that the 3 bytes have something to do with data units. I've read lots of stuff about jpeg and am still digesting much of it but I think if I knew what was required to calculate data units, I'd find my mystery 3 bytes.
ADDITIONAL INFO:
The projector only supports the use of RGB565 jpeg images inside the data send.

(Posted on behalf of the OP).
I was able to solve this, but I would like to know why this works. Here is the formula for those last 3 bytes:
int iSize = bImage.length;
baHeader[17] = (byte) ((iSize) | 0x80);
baHeader[18] = (byte) ((iSize >> 7) | 0x80);
baHeader[19] = (byte) ((iSize >> 14));
I got fed up with messing with it, so I just looked at several images, wrote down all the file sizes and the magic bytes, converted everything to binary and hammered away at ANDing, ORing and bit-shifting until I forced out a formula that worked. I would like to know if this is related to calculating JPEG data units. I'm still researching JPEG, but it's not simple stuff!
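Read bit by bit, the formula above packs the image length into 7 bits, 7 bits and 8 bits, low bits first, with the top bit forced on in the first two bytes. That looks more like a size field than a count of JPEG data units, but this is only an inference from the formula, not something confirmed by any Epson documentation. A minimal Python sketch of the same packing and its inverse (function names are made up here):

```python
def encode_size(size: int) -> bytes:
    """Pack a length the way the Java formula above does (7 + 7 + 8 bits)."""
    if not 0 <= size < (1 << 22):
        raise ValueError("size does not fit in 22 bits")
    return bytes([
        (size & 0x7F) | 0x80,          # low 7 bits, top bit forced on
        ((size >> 7) & 0x7F) | 0x80,   # next 7 bits, top bit forced on
        (size >> 14) & 0xFF,           # remaining bits
    ])

def decode_size(field: bytes) -> int:
    """Inverse of encode_size for a 3-byte field."""
    return (field[0] & 0x7F) | ((field[1] & 0x7F) << 7) | (field[2] << 14)

baseline = decode_size(bytes.fromhex("808500"))
```

Under this reading, the baseline value 80:85:00 from the question corresponds to a length of 640 bytes.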

It looks like you're misinterpreting how the SOS marker works. Here are bytes you show in one of your examples:
SOS = 00 0C 03 01 00 02 11 03 11 00 3F 00 F9 FE
This erroneously has two bytes of compressed data (F9 FE) included in the SOS. The length of 12 (00 0C) includes the 2 length bytes themselves, so there are really only 10 bytes of data for this marker.
The 00 byte before the F9 FE is the "successive approximation" bits field and is used for progressive JPEG images. It's actually a pair of 4-bit fields.
The bytes that you see as varying between images are really the first 2 compressed data bytes (which encode the DC value for the first MCU).
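To show where the segment really ends, here is a short Python sketch that splits the bytes quoted above into the SOS fields and the entropy-coded data that follows. The function name and dictionary keys are invented for illustration; the layout is the standard baseline/progressive SOS layout described in this answer.

```python
import struct

def parse_sos(segment: bytes):
    """Split bytes following an FF DA marker into SOS fields and the remaining data."""
    (length,) = struct.unpack(">H", segment[:2])   # length includes these 2 bytes
    body = segment[2:length]
    count = body[0]                                # number of components in the scan
    components = []
    for i in range(count):
        component_id, tables = body[1 + 2 * i], body[2 + 2 * i]
        # High nibble: DC table selector, low nibble: AC table selector.
        components.append((component_id, tables >> 4, tables & 0x0F))
    spectral_start, spectral_end, approx = body[1 + 2 * count:4 + 2 * count]
    fields = {
        "length": length,
        "components": components,
        "spectral_start": spectral_start,
        "spectral_end": spectral_end,
        "approx_high": approx >> 4,    # the pair of 4-bit fields mentioned above
        "approx_low": approx & 0x0F,
    }
    return fields, segment[length:]

fields, compressed = parse_sos(
    bytes.fromhex("00 0C 03 01 00 02 11 03 11 00 3F 00 F9 FE"))
```

With the length taken at face value, F9 FE falls outside the segment, which matches the point that those two bytes are already compressed image data.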

How to properly format a raw DNS query?

I'm creating a Lua library to help process sending and receiving DNS requests and am currently reading this (DNS protocol RFC), but I'm unaware of how to properly format the request. For instance, do I have to specify how long the message is? How do I do this?
I understand, from my Wireshark inspection, that I'm supposed to also include options afterwards. I also see a 0x00 in the response; does this mean that I just simply have to zero-terminate the request name before adding in values?
The section I'm specifically talking about is 4.1.3 of the RFC.
Some notes: I tested this with a personal server and got these values in the query section: 06 61 6c 69 73 73 61 04 69 6e 66 6f 00. The 00 in particular is what interests me, it's highlighted in WireShark, which means it's significant. I assume it means that the values are null-terminated? Then the options about the type and class follow?

When section 4.1.3 refers to a "NAME", it's referring back to the definition in section 3.1, which specifies that a domain name consists of a sequence of labels, each of which consists of a length octet followed by that number of octets. The final label is always the root zone, which has a zero-length name and thus consists only of a length octet with a zero in it. So, yes, the whole name is terminated with a zero octet, but it's not "zero-terminated" in the usual C string sense.
Note also that only the lower six bits of a length octet carry the length; the uppermost two bits are flags.
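As a rough sketch of the encoding described above (in Python rather than Lua, with invented function names), the name is written as length-prefixed labels plus the zero octet, and the question section simply appends QTYPE and QCLASS. The header values used here (ID 0x1234, recursion desired, one question) are arbitrary example choices, not something taken from the question.

```python
import struct

def encode_qname(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in the root label."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:       # length must fit in the lower six bits
            raise ValueError("label must be 1..63 octets long")
        out.append(len(raw))
        out += raw
    out.append(0)                        # zero-length root label
    return bytes(out)

def build_query(name: str, qtype: int = 1, qclass: int = 1,
                query_id: int = 0x1234) -> bytes:
    """Build a one-question query: 12-byte header, QNAME, QTYPE, QCLASS."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    return header + encode_qname(name) + struct.pack(">HH", qtype, qclass)

qname = encode_qname("alissa.info")
query = build_query("alissa.info")
```

On the question of message length: over UDP the datagram itself delimits the message, so no length is sent; over TCP, RFC 1035 (section 4.2.2) prefixes each message with a two-byte length.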

Understanding the ZMODEM protocol

I need to include basic file-sending and file-receiving routines in my program, and it needs to be through the ZMODEM protocol. The problem is that I'm having trouble understanding the spec.
For reference, here is the specification.
The spec doesn't define the various constants, so here's a header file from Google.
It seems to me like there are a lot of important things left undefined in that document:
It constantly refers to ZDLE-encoding, but what is it? When exactly do I use it, and when don't I use it?
After a ZFILE data frame, the file's metadata (filename, modify date, size, etc.) are transferred. This is followed by a ZCRCW block and then a block whose type is undefined according to the spec. The ZCRCW block allegedly contains a 16-bit CRC, but the spec doesn't define on what data the CRC is computed.
It doesn't define the CRC polynomial it uses. I found out by chance that the CRC32 poly is the standard CRC32, but I've had no such luck with the CRC16 poly. Nevermind, I found it through trial and error. The CRC16 poly is 0x1021.
I've looked around for reference code, but all I can find are unreadable, undocumented C files from the early 90s. I've also found this set of documents from the MSDN, but it's painfully vague and contradictory to tests that I've run: http://msdn.microsoft.com/en-us/library/ms817878.aspx (you may need to view that through Google's cache)
To illustrate my difficulties, here is a simple example. I've created a plaintext file on the server containing "Hello world!", and it's called helloworld.txt.
I initiate the transfer from the server with the following command:
sx --zmodem helloworld.txt
This prompts the server to send the following ZRQINIT frame:
2A 2A 18 42 30 30 30 30 30 30 30 30 30 30 30 30 **.B000000000000
30 30 0D 8A 11 00.Š.
Three issues with this:
Are the padding bytes (0x2A) arbitrary? Why are there two here, but in other instances there's only one, and sometimes none?
The spec doesn't mention the [CR] [LF] [XON] at the end, but the MSDN article does. Why is it there?
Why does the [LF] have bit 0x80 set?
After this, the client needs to send a ZRINIT frame. I got this from the MSDN article:
2A 2A 18 42 30 31 30 30 30 30 30 30 32 33 62 65 **.B0100000023be
35 30 0D 8A 50.Š
In addition to the [LF] 0x80 flag issue, I have two more issues:
Why isn't [XON] included this time?
Is the CRC calculated on the binary data or the ASCII hex data? If it's on the binary data I get 0x197C, and if it's on the ASCII hex data I get 0xF775; neither of these are what's actually in the frame (0xBE50). (Solved; it follows whichever mode you're using. If you're in BIN or BIN32 mode, it's the CRC of the binary data. If you're in ASCII hex mode, it's the CRC of what's represented by the ASCII hex characters.)
The server responds with a ZFILE frame:
2A 18 43 04 00 00 00 00 DD 51 A2 33 *.C.....ÝQ¢3
OK. This one makes sense. If I calculate the CRC32 of [04 00 00 00 00], I indeed get 0x33A251DD. But now we don't have ANY [CR] [LF] [XON] at the end. Why is this?
Immediately after this frame, the server also sends the file's metadata:
68 65 6C 6C 6F 77 6F 72 6C 64 2E 74 78 74 00 31 helloworld.txt.1
33 20 32 34 30 20 31 30 30 36 34 34 20 30 20 31 3 240 100644 0 1
20 31 33 00 18 6B 18 50 D3 0F F1 11 13..k.PÓ.ñ.
This doesn't even have a header, it just jumps straight to the data. OK, I can live with that. However:
We have our first mysterious ZCRCW frame: [18 6B]. How long is this frame? Where is the CRC data, and is it CRC16 or CRC32? It's not defined anywhere in the spec.
The MSDN article specifies that the [18 6B] should be followed by [00], but it isn't.
Then we have a frame with an undefined type: [18 50 D3 0F F1 11]. Is this a separate frame or is it part of ZCRCW?
The client needs to respond with a ZRPOS frame, again taken from the MSDN article:
2A 2A 18 42 30 39 30 30 30 30 30 30 30 30 61 38 **.B0900000000a8
37 63 0D 8A 7c.Š
Same issues as with the ZRINIT frame: the CRC is wrong, the [LF] has bit 0x80 set, and there's no [XON].
The server responds with a ZDATA frame:
2A 18 43 0A 00 00 00 00 BC EF 92 8C *.C.....¼ï’Œ
Same issues as ZFILE: the CRC is all fine, but where's the [CR] [LF] [XON]?
After this, the server sends the file's payload. Since this is a short example, it fits in one block (max size is 1024):
48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0A Hello world!.
From what the article seems to mention, payloads are escaped with [ZDLE]. So how do I transmit a payload byte that happens to match the value of [ZDLE]? Are there any other values like this?
The server ends with these frames:
18 68 05 DE 02 18 D0 .h.Þ..Ð
2A 18 43 0B 0D 00 00 00 D1 1E 98 43 *.C.....Ñ.˜C
I'm completely lost on the first one. The second makes as much sense as the ZRINIT and ZDATA frames.

My buddy wonders if you are implementing a time
machine.
I don't know that I can answer all of your questions -- I've never
actually had to implement zmodem myself -- but here are a few answers:
From what the article seems to mention, payloads are escaped with
[ZDLE]. So how do I transmit a payload byte that happens to match the
value of [ZDLE]? Are there any other values like this?
This is explicitly addressed in the document you linked to at the
beginning of your questions, which says:
The ZDLE character is special. ZDLE represents a control sequence
of some sort. If a ZDLE character appears in binary data, it is
prefixed with ZDLE, then sent as ZDLEE.
It constantly refers to ZDLE-encoding, but what is it? When exactly
do I use it, and when don't I use it?
In the Old Days, certain "control characters" were used to control the
communication channel (hence the name). For example, sending XON/XOFF
characters might pause the transmission. ZDLE is used to escape
characters that may be problematic. According to the spec, these are
the characters that are escaped by default:
ZMODEM software escapes ZDLE, 020, 0220, 021, 0221, 023, and 0223.
If preceded by 0100 or 0300 (#), 015 and 0215 are also escaped to
protect the Telenet command escape CR-#-CR. The receiver ignores
021, 0221, 023, and 0223 characters in the data stream.
I've looked around for reference code, but all I can find are
unreadable, undocumented C files from the early 90s.
Does this include the code for the lrzsz package? This is still
widely available on most Linux distributions (and surprisingly handy
for transmitting files over an established ssh connection).
There are a number of other implementations out there, including
several in software listed on freecode, including qodem,
syncterm, MBSE, and others. I believe the syncterm
implementation is written as a library that may be reasonably easy
to use from your own code (but I'm not certain).
You may find additional code if you poke around older collections of
MS-DOS software.

I can't blame you. The user manual is not organized in a user-friendly way.
Are the padding bytes (0x2A) arbitrary?
No, from page 14,15:
A binary header begins with the sequence ZPAD, ZDLE, ZBIN.
A hex header begins with the sequence ZPAD, ZPAD, ZDLE, ZHEX.
.
The spec doesn't mention the [CR] [LF] [XON] at the end, but the MSDN article does. Why is it there?
Page 15
* * ZDLE B TYPE F3/P0 F2/P1 F1/P2 F0/P3 CRC-1 CRC-2 CR LF XON
.
Why does the [LF] have bit 0x80 set?
Not sure. From Tera term I got both control characters XORed with 0x80 (8D 8A 11)
We have our first mysterious ZCRCW frame: [18 6B]. How long is this frame? Where is the CRC data, and is it CRC16 or CRC32? It's not defined anywhere in the spec.
The ZCRCW is not a header or a frame type, it's more like a footer that tells the receiver what to expect next. In this case it's the footer of the data subpacket containing the file name. It's going to be a 32 bit checksum because you're using a "C" type binary header.
ZDLE C TYPE F3/P0 F2/P1 F1/P2 F0/P3 CRC-1 CRC-2 CRC-3 CRC-4
.
Then we have a frame with an undefined type: [18 50 D3 0F F1 11]. Is this a separate frame or is it part of ZCRCW?
That's the CRC for the ZCRCW data subpacket. It's 5 bytes because the first one is 0x10, a control character that needs to be ZDLE escaped. I'm not sure what 0x11 is.
and there's no [XON].
XON is just for Hex headers. You don't use it for a binary header.
ZDLE A TYPE F3/P0 F2/P1 F1/P2 F0/P3 CRC-1 CRC-2
.
So how do I transmit a payload byte that happens to match the value of [ZDLE]?
18 58 (AKA ZDLEE)
18 68 05 DE 02 18 D0
That's the footer of the data subframe. The next 5 bytes are the CRC (last byte is ZDLE encoded)
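The escaping that explains those 5-byte CRCs can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: it covers only the default escape set quoted from the spec earlier (ZDLE, 0x10, 0x90, 0x11, 0x91, 0x13, 0x93), where an escaped byte is sent as ZDLE followed by the byte XORed with 0x40. It deliberately ignores the conditional CR escaping, and the decoder must only be fed escaped data bytes, not frame-end sequences such as ZDLE ZCRCW. The function names are made up here.

```python
ZDLE = 0x18
# Default escape set from the spec (octal 030, 020, 0220, 021, 0221, 023, 0223).
ESCAPED = {0x18, 0x10, 0x90, 0x11, 0x91, 0x13, 0x93}

def zdle_escape(data: bytes) -> bytes:
    """Escape problematic bytes as ZDLE followed by (byte XOR 0x40)."""
    out = bytearray()
    for byte in data:
        if byte in ESCAPED:
            out += bytes([ZDLE, byte ^ 0x40])
        else:
            out.append(byte)
    return bytes(out)

def zdle_unescape(data: bytes) -> bytes:
    """Undo zdle_escape on a run of escaped data bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ZDLE:
            if i + 1 >= len(data):
                raise ValueError("dangling ZDLE at end of data")
            i += 1
            byte = data[i] ^ 0x40
        out.append(byte)
        i += 1
    return bytes(out)

# CRC bytes as seen on the wire in the question, and their decoded form.
filename_crc = zdle_unescape(bytes.fromhex("18 50 D3 0F F1"))
payload_crc = zdle_unescape(bytes.fromhex("05 DE 02 18 D0"))
```

Both wire sequences shrink from five bytes to the four bytes of a CRC-32 once the escape is removed, and a literal ZDLE in the payload comes out as 18 58 (ZDLEE), as stated above.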

The ZDLE + ZBIN (0x18 0x41) means the frame is CRC-CCITT(XMODEM 16) with Binary Data.
ZDLE + ZHEX (0x18 0x42) means CRC-CCITT(XMODEM 16) with HEX data.
The HEX data is tricky, and some people don't understand it at first. Every two ASCII characters represent one binary byte. For example, 0x30 0x45 0x41 0x32 is read as the pairs 0x30 0x45 and 0x41 0x32, i.e. the ASCII characters 0 E and A 2, which decode to the bytes 0x0E and 0xA2. In other words, each binary byte is expanded into two ASCII hex digits, one per nibble. Several dataloggers I have seen use lower case for A~F (a~f); on those you will not find 0x30 0x45 0x41 0x32 (0EA2) but 0x30 0x65 0x61 0x32 (0ea2). It doesn't change anything, it is just a little confusing at first.
And yes, the CRC16 for ZBIN or ZHEX is CRC-CCITT(XMODEM).
The ZDLE ZBIN32 (0x18 0x43) or ZDLE ZBINR32 (0x18 0x44) use CRC-32 calculation.
Note that the ZDLE and the following byte are excluded from the CRC calculation.
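As a concrete check of the hex format and the CRC-16, here is a small Python sketch that decodes the ZRINIT header quoted in the question and verifies its CRC. The CRC is the XMODEM variant (polynomial 0x1021, initial value 0), computed over the five decoded bytes (type plus four flag bytes). The function names are invented for this example, and the trailing CR/LF/XON bytes are simply ignored.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def parse_hex_header(raw: bytes):
    """Decode a ZPAD ZPAD ZDLE ZHEX header into (frame type, 4 flag bytes)."""
    if raw[:4] != b"\x2a\x2a\x18\x42":
        raise ValueError("not a ZMODEM hex header")
    # 14 ASCII hex digits = 7 bytes: type, 4 flag/position bytes, 2 CRC bytes.
    fields = bytes.fromhex(raw[4:18].decode("ascii"))
    frame_type, flags = fields[0], fields[1:5]
    crc = int.from_bytes(fields[5:7], "big")
    if crc16_xmodem(fields[:5]) != crc:
        raise ValueError("CRC mismatch")
    return frame_type, flags

# The ZRINIT frame from the question (upper- or lower-case digits both decode).
zrinit = bytes.fromhex("2A 2A 18 42 30 31 30 30 30 30 30 30 32 33 62 65 35 30 0D 8A")
frame_type, flags = parse_hex_header(zrinit)
```

A table-driven CRC would be faster, but the bitwise loop keeps the polynomial visible and is fast enough for headers.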
I am digging into ZMODEM because I need to "talk" to some elevator door boards, to program a new set of parameters at once instead of using their software to change attribute by attribute. This "talk" could then happen on the bench instead of sitting on the elevator car roof with a notebook. Those boards talk ZMODEM, but as I don't have the packet format they expect, the board keeps rejecting my file transfer. The boards send 0x2A 0x2A 0x18 0x42 0x30 0x31 0x30 (x6) + CRC, and Tera Term, transferring the file in ZMODEM, sends the board 0x2A 0x2A 0x18 0x42 0x30 0x30 ... + CRC. I don't know what this 00 or 01 after the 0x42 means. The PC sends this, then the filename and some file attributes. After a few seconds the board answers with "No File received"...

How does one reclaim zeroed blocks of a sparse file?

Consider a sparse file with 1s written to a portion of the file.
I want to reclaim the actual space on disk for these 1s as I no longer need that portion of the sparse file. The portion of the file containing these 1s should become a "hole" as it was before the 1s were themselves written.
To do this, I cleared the region to 0s. This does not reclaim the blocks on disk.
How do I actually make the sparse file, well, sparse again?
This question is similar to this one but there is no accepted answer for that question.
Consider the following sequence of events run on a stock Linux server:
$ cat /tmp/test.c
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>
int main(int argc, char **argv) {
    int fd;
    char c[1024];
    memset(c,argc==1,1024);
    fd = open("test",O_CREAT|O_WRONLY,0777);
    lseek(fd,10000,SEEK_SET);
    write(fd,c,1024);
    close(fd);
    return 0;
}
$ gcc -o /tmp/test /tmp/test.c
$ /tmp/test
$ hexdump -C ./test
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002710 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 |................|
*
00002b10
$ du -B1 test; du -B1 --apparent-size test
4096 test
11024 test
$ /tmp/test clear
$ hexdump -C ./test
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00002b10
$ du -B1 test; du -B1 --apparent-size test
4096 test
11024 test
# NO CHANGE IN SIZE.... HMM....
EDIT -
Let me further qualify that I don't want to rewrite files, copy files, etc. If it is not possible to somehow free previously allocated blocks in situ, so be it, but I'd like to determine if such is actually possible or not. It seems like "no, it is not" at this point. I suppose I'm looking for sys_punchhole for Linux (discussions of which I just stumbled upon).

It appears that Linux has added a syscall called fallocate for "punching holes" in files. The implementations in individual filesystems seem to focus on the ability to use this for pre-allocating a larger contiguous range of blocks.
There is also the posix_fallocate call, which focuses only on the latter and is not usable for hole punching.
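For what it's worth, here is a hedged sketch of what a hole-punching call can look like from Python via ctypes. It assumes a 64-bit Linux system whose C library exports fallocate and a filesystem that supports the FALLOC_FL_PUNCH_HOLE mode (which must be combined with FALLOC_FL_KEEP_SIZE); the flag values are the ones from the Linux headers, while the wrapper name and its True/False return convention are made up for this example.

```python
import ctypes
import errno
import os

FALLOC_FL_KEEP_SIZE = 0x01    # do not change the apparent file size
FALLOC_FL_PUNCH_HOLE = 0x02   # deallocate the given byte range

def punch_hole(fd: int, offset: int, length: int) -> bool:
    """Deallocate [offset, offset + length) of an open file.

    Returns False if the kernel or filesystem does not support hole punching.
    """
    libc = ctypes.CDLL(None, use_errno=True)   # symbols of the running process, incl. libc
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]  # assumes 64-bit off_t
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))
```

Only whole filesystem blocks inside the range are actually released; partial blocks at either end are zeroed instead. Applied to the example file in the question, the 1024 bytes at offset 10000 sit inside a single 4096-byte block that also holds other data, so du would probably still report 4096 afterwards.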

Right now it appears that only NTFS supports hole-punching. This has been historically a problem across most filesystems. POSIX as far as I know, does not define an OS interface to punch holes, so none of the standard Linux filesystems have support for it. NetApp supports hole punching through Windows in its WAFL filesystem. There is a nice blog post about this here.
For your problem, as others have indicated, the only solution is to move the file, leaving out blocks containing zeroes. Yeah, it's going to be slow. Or write an extension for your filesystem on Linux that does this and submit a patch to the good folks in the Linux kernel team. ;)
Edit: Looks like XFS supports hole-punching. Check this thread.
Another really twisted option can be to use a filesystem debugger to go and punch holes in all indirect blocks which point to zeroed out blocks in your file (maybe you can script that). Then run fsck which will correct all associated block counts, collect all orphaned blocks (the zeroed out ones) and put them in the lost+found directory (you can delete them to reclaim space) and correct other properties in the filesystem. Scary, huh?
Disclaimer: Do this at your own risk. I am not responsible for any data loss you incur. ;)

Ron Yorston offers several solutions; but they all involve either mounting the FS read-only (or unmounting it) while the sparsifying takes place; or making a new sparse file, then copying across those chunks of the original that aren't just 0s, and then replacing the original file with the newly-sparsified file.
It really depends on your filesystem though. We've already seen that NTFS handles this. I imagine that any of the other filesystems Wikipedia lists as handling transparent compression would do exactly the same - this is, after all, equivalent to transparently compressing the file.

After you have "zeroed" some region of the file, you have to tell the file system that this new region is intended to be a sparse region. So in the case of NTFS you have to call DeviceIoControl() for that region again. At least that is how I do it in my utility, "sparse_checker".
For me the bigger problem is how to unset the sparse region again :).
Regards

This way is cheap, but it works. :-P
Read in all the data past the hole you want, into memory (or another file, or whatever).
Truncate the file to the start of the hole (ftruncate is your friend).
Seek to the end of the hole.
Write the data back in.
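The four steps above can be sketched in Python roughly like this. The function name is made up, the tail is held in memory (so this only suits files whose tail fits in RAM), and whether the gap really ends up unallocated depends on the filesystem supporting sparse files and on the hole covering whole filesystem blocks.

```python
import os

def reopen_hole(path: str, hole_start: int, hole_end: int) -> None:
    """Recreate [hole_start, hole_end) as a hole by truncating and rewriting the tail."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        f.seek(hole_end)
        tail = f.read()            # 1. read everything past the hole
        f.truncate(hole_start)     # 2. cut the file at the start of the hole
        f.seek(hole_end)           # 3. seek past the (now missing) region
        f.write(tail)              # 4. write the tail back; the gap is never written
        f.truncate(size)           # restore the original size if there was no tail
```

A crash between the truncate and the write loses the tail, so for anything important it is safer to stash the tail in another file first, as the first step suggests.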

Unmount your filesystem and edit it directly, in a way similar to debugfs or fsck. Usually you need a driver for each filesystem in use.

Seems like writing zeros (as in the referenced question) to the part you're done with is a logical thing to try. Here is a link to an MSDN question for NTFS sparse files that does just that to "release" the "unused" part. YMMV.
http://msdn.microsoft.com/en-us/library/ms810500.aspx