I'm having data corruption issues with an Ethernet driver (for the ST MAC 10/100/1000) that I'm working on. The driver runs on an Allwinner A20 (ARM Cortex-A7).
Some details:
The driver holds a ring of Rx sk_buffs (allocated with __netdev_alloc_skb_ip_align).
The data pointer (rx_skbuff[i]->data) of each sk_buff is mapped for DMA with dma_map_single.
The mapping succeeds (verified with dma_mapping_error).
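For reference, a minimal sketch of the allocation/mapping pattern described above (field names follow the stmmac driver; bfsize stands for the configured Rx buffer size, and error handling is trimmed):

    struct sk_buff *skb;

    /* Allocate one Rx skb and map its ->data for device-to-CPU DMA. */
    skb = __netdev_alloc_skb_ip_align(priv->dev, bfsize, GFP_KERNEL);
    if (!skb)
            return -ENOMEM;
    rx_q->rx_skbuff[i] = skb;
    rx_q->rx_skbuff_dma[i] = dma_map_single(priv->device, skb->data,
                                            bfsize, DMA_FROM_DEVICE);
    if (dma_mapping_error(priv->device, rx_q->rx_skbuff_dma[i]))
            return -EINVAL;         /* mapping failed */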
The problem:
After a while (minutes, hours, days... very random), the kernel panics due to data corruption.
Debugging (EDITED):
After digging a bit more, I found out that sometimes, after a while, one of the sk_buff structures gets corrupted, which can lead the code to do things it shouldn't and thus cause the kernel panic.
After some more digging, I found that the corruption occurs after skb_copy_to_linear_data (which is essentially a memcpy). Keep in mind that the corruption doesn't occur after every call to skb_copy_to_linear_data, but when it does occur, it is always right after a call to skb_copy_to_linear_data.
When the corruption occurs, it doesn't happen on the rx_q->rx_skbuff of the current entry (rx_q->rx_skbuff[entry]). For example, if we perform the skb_copy_to_linear_data on rx_q->rx_skbuff[X], the corrupted sk_buff structure will be rx_q->rx_skbuff[Y] (where X != Y).
It seems that the skb->data that was allocated right before the skb_copy_to_linear_data call has the same physical address as rx_q->rx_skbuff[Y]->end. The first thing I thought was that maybe the driver doesn't know that rx_q->rx_skbuff[Y] has been released, but when this collision occurs I can see that rx_q->rx_skbuff[Y]->users is 1.
How could that be? Any ideas?
Not sure if this is a problem or not, but here is something I've noticed:
The rx_skbuffs are allocated in two places: once when the driver is initialized, where it calls __netdev_alloc_skb_ip_align with GFP_KERNEL, and a second time, when an rx_skbuff has already been freed, where it calls netdev_alloc_skb_ip_align (which internally uses GFP_ATOMIC). Shouldn't these skb allocations use GFP_DMA?
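To be concrete, the two call sites look roughly like this (a sketch, not the driver's exact code):

    /* init path, process context (may sleep): */
    skb = __netdev_alloc_skb_ip_align(priv->dev, bfsize, GFP_KERNEL);

    /* refill path, NAPI/softirq context (must not sleep): */
    skb = netdev_alloc_skb_ip_align(priv->dev, frame_len); /* uses GFP_ATOMIC internally */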
Code:
Here is the part of the code where the corruption occurs.
The full driver code is from mainline Linux kernel 4.19, and it can be found here.
I've pasted only the part between lines 3451-3474.
Does anyone see incorrect behavior regarding the use of the DMA API here?
skb = netdev_alloc_skb_ip_align(priv->dev, frame_len);
if (unlikely(!skb)) {
        if (net_ratelimit())
                dev_warn(priv->device, "packet dropped\n");
        priv->dev->stats.rx_dropped++;
        continue;
}

dma_sync_single_for_cpu(priv->device, rx_q->rx_skbuff_dma[entry],
                        frame_len, DMA_FROM_DEVICE);

// Here I check if data has been corrupted (the answer is ALWAYS NO).
debug_check_data_corruption();

skb_copy_to_linear_data(skb, rx_q->rx_skbuff[entry]->data, frame_len);

// Here I check again if data has been corrupted (the answer is SOMETIMES YES).
debug_check_data_corruption();

skb_put(skb, frame_len);

dma_sync_single_for_device(priv->device, rx_q->rx_skbuff_dma[entry],
                           frame_len, DMA_FROM_DEVICE);
Some last notes:
I tried running the kernel with CONFIG_DMA_API_DEBUG enabled. It isn't always triggered, but when I catch the corruption myself (with my debug function), I sometimes see that /sys/kernel/debug/dma-api/num_errors has increased, and sometimes I also get this log: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000006879f902] [size=61 bytes]
I've also enabled CONFIG_DEBUG_KMEMLEAK, and right after I catch the data corruption event I get this log: kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak). I still don't understand what clue it provides, although it seems to point at the same part of the code I've pasted here (__netdev_alloc_skb is called from __netdev_alloc_skb_ip_align). This is what /sys/kernel/debug/kmemleak displays:
unreferenced object 0xe9ea52c0 (size 192):
comm "softirq", pid 0, jiffies 6171209 (age 32709.360s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 40 4d 2d ea ............@M-.
00 00 00 00 00 00 00 00 d4 83 7c b3 7a 87 7c b3 ..........|.z.|.
backtrace:
[<045ac811>] __netdev_alloc_skb+0x9f/0xdc
[<4f2b009a>] stmmac_napi_poll+0x89b/0xfc4
[<1dd85c70>] net_rx_action+0xd3/0x28c
[<1c60fabb>] __do_softirq+0xd5/0x27c
[<9e007b1d>] irq_exit+0x8f/0xc0
[<beb36a07>] __handle_domain_irq+0x49/0x84
[<67c17c88>] gic_handle_irq+0x39/0x68
[<e8f5dc30>] __irq_svc+0x65/0x94
[<075bc7c7>] down_read+0x8/0x3c
[<075bc7c7>] down_read+0x8/0x3c
[<790c6556>] get_user_pages_unlocked+0x49/0x13c
[<544d56e3>] get_futex_key+0x77/0x2e0
[<1fd5d0e9>] futex_wait_setup+0x3f/0x144
[<8bc86dff>] futex_wait+0xa1/0x198
[<b362fbc0>] do_futex+0xd3/0x9a8
[<46f336be>] sys_futex+0xcd/0x138
I am currently porting code that uses a USB device from Windows to Linux.
I've thoroughly tested the original application and I'm pretty sure that the device works well. I implemented the USB interface on Linux using hidapi-libusb and there are times when the returned data from the device is missing at least a byte.
Once it happens, all the returned values are missing that much data. I more or less have to disconnect and reconnect the USB device to make it read data correctly. I'm starting to think that maybe the first byte is sometimes returned as 00 and Linux ignores it. It usually occurs on successive reads.
For example:
I send a "get register state" command and I expect 10 data packets available for USB read. Byte 5 is the number of the data packet.
Expected:
00 00 01 02 00 08 42 (Data 8)
00 00 01 02 00 09 42 (Data 9)
Actual:
00 00 01 02 00 08 42 (Data 8)
00 00 02 00 09 42 ab (Data 9)
Data 9's packet number becomes wrong because it is missing a byte. I've tried changing to hidapi-hidraw, and it happens significantly less. I've checked the hexdump of the hidraw of the device (/dev/hidraw0), and it is consistent with the data I am getting in my application. I've tried using memory leak detection tools and no leaks/corruption is detected.
Is this a Linux problem (3.2.0-4-amd64) or is it possibly the device?
The pseudo code of my application is just (a minimal hidapi sketch follows the list):
1. Initialize HIDAPI and device-related state
2. Connect to the device using HIDAPI
3. Write USB command
4. Read USB command (done multiple times if the write expects multiple data)
5. Parse data
6. Repeat 3 and 4 until all commands are performed
7. Free memory and close HIDAPI
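Here is a minimal hidapi sketch of that flow. The VID/PID (0x1234/0x5678) and the command bytes are made-up placeholders, and error handling is abbreviated:

    #include <stdio.h>
    #include <hidapi/hidapi.h>

    int main(void)
    {
        unsigned char cmd[8] = { 0x00, 0x01, 0x02 }; /* byte 0 is the report ID
                                                        (0x00 if the device has no
                                                        numbered reports) */
        unsigned char buf[64];

        if (hid_init())                                   /* 1. initialize HIDAPI */
            return 1;
        hid_device *dev = hid_open(0x1234, 0x5678, NULL); /* 2. connect */
        if (!dev)
            return 1;

        /* flush any stray data left over from an earlier exchange */
        while (hid_read_timeout(dev, buf, sizeof buf, 10) > 0)
            ;

        hid_write(dev, cmd, sizeof cmd);                  /* 3. write USB command */
        int n = hid_read_timeout(dev, buf, sizeof buf, 1000); /* 4. read reply */
        if (n > 0)
            printf("got %d bytes, first byte: %02x\n", n, buf[0]); /* 5. parse */

        hid_close(dev);                                   /* 7. clean up */
        hid_exit();
        return 0;
    }

One thing worth checking, given the missing-first-byte suspicion: on reads, hidapi only prepends the report ID byte when the device uses numbered reports, so the expected offset of the payload differs between devices.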
Things I've tried:
Ensuring there is no delay between reads and writes
Flushing read data before writing (this sometimes catches stray data)
Adding a really long timeout (five seconds) to the read-data flush, which significantly reduces the problem but at a big cost.
The question is: how can I tell two files apart, one coded with JPEG and the other with JPEG 2000?
I need format-specific file read/write functions, and I can't determine a file's encoding without reading it.
JPEG works fine right now, but the JPEG function fails to open JPEG 2000 files.
So I need to determine whether my file is JPEG or JPEG 2000.
According to the Digital Formats description at the Library of Congress, all JPEG 2000 files start with the following signature (also known as magic bytes or a magic number):
00 00 00 0C 6A 50 20 20 0D 0A 87 0A
(The IANA record only lists the first 12, so I left the remainder out).
Normal JPEG files, on the other hand, start with:
FF D8 FF E0
Comparing these bytes, you should easily be able to tell them apart.
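A minimal C sketch of that check. Note that only the first three JPEG bytes (FF D8 FF) are common to every variant; the fourth byte is the marker type and varies (e.g. FF E1 for Exif files), so the sketch compares just three bytes for JPEG:

    #include <stdio.h>
    #include <string.h>

    static const unsigned char JP2_SIG[12] = {
        0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20,
        0x0D, 0x0A, 0x87, 0x0A
    };
    static const unsigned char JPEG_SIG[3] = { 0xFF, 0xD8, 0xFF };

    /* Returns 2 for JPEG 2000, 1 for JPEG, 0 for anything else, -1 on error. */
    int image_format(const char *path)
    {
        unsigned char buf[12];
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;
        size_t n = fread(buf, 1, sizeof buf, f);
        fclose(f);
        if (n >= sizeof JP2_SIG && !memcmp(buf, JP2_SIG, sizeof JP2_SIG))
            return 2;
        if (n >= sizeof JPEG_SIG && !memcmp(buf, JPEG_SIG, sizeof JPEG_SIG))
            return 1;
        return 0;
    }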
I want to gather statistics on the memory bytes accessed by programs running on Linux (x86_64 architecture). I use the perf tool to dump annotated output like this:
: ffffffff81484700 <load2+0x484700>:
2.86 : ffffffff8148473b: 41 8b 57 04 mov 0x4(%r15),%edx
5.71 : ffffffff81484800: 65 8b 3c 25 1c b0 00 mov %gs:0xb01c,%edi
22.86 : ffffffff814848a0: 42 8b b4 39 80 00 00 mov 0x80(%rcx,%r15,1),%esi
25.71 : ffffffff814848d8: 42 8b b4 39 80 00 00 mov 0x80(%rcx,%r15,1),%esi
2.86 : ffffffff81484947: 80 bb b0 00 00 00 00 cmpb $0x0,0xb0(%rbx)
2.86 : ffffffff81484954: 83 bb 88 03 00 00 01 cmpl $0x1,0x388(%rbx)
5.71 : ffffffff81484978: 80 79 40 00 cmpb $0x0,0x40(%rcx)
2.86 : ffffffff8148497e: 48 8b 7c 24 08 mov 0x8(%rsp),%rdi
5.71 : ffffffff8148499b: 8b 71 34 mov 0x34(%rcx),%esi
5.71 : ffffffff814849a4: 0f af 34 24 imul (%rsp),%esi
My current method is to parse this output and collect all memory-access instructions, such as mov, cmp, etc., and then compute the bytes accessed by each instruction; for example, mov 0x4(%r15),%edx adds 4 bytes.
I want to know whether it is possible to calculate this from the machine code instead, so that by analyzing "41 8b 57 04" I could also add 4 bytes. Because I am not familiar with x86_64 machine code, could anyone give any clues? Or is there a better way to do these statistics? Thanks in advance!
See https://stackoverflow.com/a/20319753/120163 for information about decoding Intel instructions; in fact, you really need to refer to the Intel reference manuals: http://download.intel.com/design/intarch/manuals/24319101.pdf. If you only want to do this manually for a few instructions, you can just look up the data in these manuals.
If you want to automate the computation of instruction total-memory-access, you will need a function that maps instructions to the amount of data accessed. Since the instruction set is complex, the corresponding function will be complex and take you a long time to write from scratch.
My SO answer https://stackoverflow.com/a/23843450/120163 provides C code that maps x86-32 instructions to their length, given a buffer that contains a block of binary code. Such code is necessary if one is to start at some point in the object-code buffer and simply enumerate the instructions being used. (This code has been used in production; it is pretty solid.) This routine was built basically by reading the Intel reference manual very carefully. For OP, this would have to be extended to x86-64, which shouldn't be very hard; mostly you have to account for the extended-register (REX) prefix byte and some differences from x86-32.
To solve OP's problem, one would also modify this routine to return the number of bytes read by each individual instruction. This latter data also has to be extracted by careful inspection of the Intel reference manuals.
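To give a flavor of what that extension might look like, here is a minimal C sketch that knows only the two opcodes from OP's dump (0x8B mov reg,r/m and 0x80 cmpb imm8,r/m8) and assumes the r/m operand is a memory operand. A real decoder must also handle segment prefixes (such as the %gs access above), the full opcode map, and the ModRM/SIB addressing forms:

    #include <stdint.h>
    #include <stdio.h>

    /* Returns the number of bytes the instruction reads from memory,
     * or -1 if the sketch does not understand the encoding. */
    int mem_read_bytes(const uint8_t *insn)
    {
        int i = 0, opsize = 4;                      /* default operand size: 32 bit */

        if (insn[i] == 0x66) { opsize = 2; i++; }   /* operand-size prefix */
        if ((insn[i] & 0xF0) == 0x40) {             /* REX prefix */
            if (insn[i] & 0x08)
                opsize = 8;                         /* REX.W: 64-bit operand */
            i++;
        }
        switch (insn[i]) {
        case 0x8B: return opsize;  /* mov reg, r/m: reads opsize bytes */
        case 0x80: return 1;       /* cmpb $imm8, r/m8: reads 1 byte */
        default:   return -1;
        }
    }

    int main(void)
    {
        const uint8_t mov[] = { 0x41, 0x8B, 0x57, 0x04 }; /* mov 0x4(%r15),%edx */
        printf("%d\n", mem_read_bytes(mov));              /* prints 4 */
        return 0;
    }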
OP also has to worry about where he gets the object code from; if he doesn't run this routine in the address space of the object code itself, he will need to somehow get the object code from the .exe file. For that, he needs to build or run the equivalent of the Windows loader, and I'll bet that has a bunch of dark corners. Check out the format of object code files.
I need to include basic file-sending and file-receiving routines in my program, and it needs to be through the ZMODEM protocol. The problem is that I'm having trouble understanding the spec.
For reference, here is the specification.
The spec doesn't define the various constants, so here's a header file from Google.
It seems to me like there are a lot of important things left undefined in that document:
It constantly refers to ZDLE-encoding, but what is it? When exactly do I use it, and when don't I use it?
After a ZFILE data frame, the file's metadata (filename, modify date, size, etc.) are transferred. This is followed by a ZCRCW block and then a block whose type is undefined according to the spec. The ZCRCW block allegedly contains a 16-bit CRC, but the spec doesn't define on what data the CRC is computed.
It doesn't define the CRC polynomial it uses. I found out by chance that the CRC32 poly is the standard CRC32, but I've had no such luck with the CRC16 poly. Nevermind, I found it through trial and error. The CRC16 poly is 0x1021.
I've looked around for reference code, but all I can find are unreadable, undocumented C files from the early 90s. I've also found this set of documents from MSDN, but it's painfully vague and contradicts tests that I've run: http://msdn.microsoft.com/en-us/library/ms817878.aspx (you may need to view that through Google's cache)
To illustrate my difficulties, here is a simple example. I've created a plaintext file on the server containing "Hello world!", and it's called helloworld.txt.
I initiate the transfer from the server with the following command:
sx --zmodem helloworld.txt
This prompts the server to send the following ZRQINIT frame:
2A 2A 18 42 30 30 30 30 30 30 30 30 30 30 30 30 **.B000000000000
30 30 0D 8A 11 00.Š.
Three issues with this:
Are the padding bytes (0x2A) arbitrary? Why are there two here, but in other instances there's only one, and sometimes none?
The spec doesn't mention the [CR] [LF] [XON] at the end, but the MSDN article does. Why is it there?
Why does the [LF] have bit 0x80 set?
After this, the client needs to send a ZRINIT frame. I got this from the MSDN article:
2A 2A 18 42 30 31 30 30 30 30 30 30 32 33 62 65 **.B0100000023be
35 30 0D 8A 50.Š
In addition to the [LF] 0x80 flag issue, I have two more issues:
Why isn't [XON] included this time?
Is the CRC calculated on the binary data or the ASCII hex data? If it's on the binary data I get 0x197C, and if it's on the ASCII hex data I get 0xF775; neither of these are what's actually in the frame (0xBE50). (Solved; it follows whichever mode you're using. If you're in BIN or BIN32 mode, it's the CRC of the binary data. If you're in ASCII hex mode, it's the CRC of what's represented by the ASCII hex characters.)
The server responds with a ZFILE frame:
2A 18 43 04 00 00 00 00 DD 51 A2 33 *.C.....ÝQ¢3
OK. This one makes sense. If I calculate the CRC32 of [04 00 00 00 00], I indeed get 0x33A251DD. But now we don't have ANY [CR] [LF] [XON] at the end. Why is this?
Immediately after this frame, the server also sends the file's metadata:
68 65 6C 6C 6F 77 6F 72 6C 64 2E 74 78 74 00 31 helloworld.txt.1
33 20 32 34 30 20 31 30 30 36 34 34 20 30 20 31 3 240 100644 0 1
20 31 33 00 18 6B 18 50 D3 0F F1 11 13..k.PÓ.ñ.
This doesn't even have a header, it just jumps straight to the data. OK, I can live with that. However:
We have our first mysterious ZCRCW frame: [18 6B]. How long is this frame? Where is the CRC data, and is it CRC16 or CRC32? It's not defined anywhere in the spec.
The MSDN article specifies that the [18 6B] should be followed by [00], but it isn't.
Then we have a frame with an undefined type: [18 50 D3 0F F1 11]. Is this a separate frame or is it part of ZCRCW?
The client needs to respond with a ZRPOS frame, again taken from the MSDN article:
2A 2A 18 42 30 39 30 30 30 30 30 30 30 30 61 38 **.B0900000000a8
37 63 0D 8A 7c.Š
Same issues as with the ZRINIT frame: the CRC is wrong, the [LF] has bit 0x80 set, and there's no [XON].
The server responds with a ZDATA frame:
2A 18 43 0A 00 00 00 00 BC EF 92 8C *.C.....¼ï’Œ
Same issues as ZFILE: the CRC is all fine, but where's the [CR] [LF] [XON]?
After this, the server sends the file's payload. Since this is a short example, it fits in one block (max size is 1024):
48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0A Hello world!.
From what the article seems to mention, payloads are escaped with [ZDLE]. So how do I transmit a payload byte that happens to match the value of [ZDLE]? Are there any other values like this?
The server ends with these frames:
18 68 05 DE 02 18 D0 .h.Þ..Ð
2A 18 43 0B 0D 00 00 00 D1 1E 98 43 *.C.....Ñ.˜C
I'm completely lost on the first one. The second makes as much sense as the ZRINIT and ZDATA frames.
My buddy wonders if you are implementing a time machine.

I don't know that I can answer all of your questions -- I've never actually had to implement zmodem myself -- but here are a few answers:
From what the article seems to mention, payloads are escaped with [ZDLE]. So how do I transmit a payload byte that happens to match the value of [ZDLE]? Are there any other values like this?
This is explicitly addressed in the document you linked to at the beginning of your question, which says:

The ZDLE character is special. ZDLE represents a control sequence of some sort. If a ZDLE character appears in binary data, it is prefixed with ZDLE, then sent as ZDLEE.
It constantly refers to ZDLE-encoding, but what is it? When exactly do I use it, and when don't I use it?
In the Old Days, certain "control characters" were used to control the communication channel (hence the name). For example, sending XON/XOFF characters might pause the transmission. ZDLE is used to escape characters that may be problematic. According to the spec, these are the characters that are escaped by default:

ZMODEM software escapes ZDLE, 020, 0220, 021, 0221, 023, and 0223. If preceded by 0100 or 0300 (@), 015 and 0215 are also escaped to protect the Telenet command escape CR-@-CR. The receiver ignores 021, 0221, 023, and 0223 characters in the data stream.
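In C, the sending side of that escaping looks roughly like this (a sketch assuming the usual constants: ZDLE is 0x18, and an escaped byte is the original XORed with 0x40, so ZDLE itself goes out as 0x18 0x58, i.e. ZDLEE):

    #define ZDLE 0x18

    /* Escape set from the spec text quoted above (octal 020, 0220, 021,
     * 0221, 023, 0223, plus ZDLE itself). */
    static int must_escape(unsigned char c)
    {
        switch (c) {
        case ZDLE:
        case 0x10: case 0x90:   /* DLE, with and without the parity bit */
        case 0x11: case 0x91:   /* XON */
        case 0x13: case 0x93:   /* XOFF */
            return 1;
        default:
            return 0;
        }
    }

    static void send_byte(unsigned char c, void (*put)(unsigned char))
    {
        if (must_escape(c)) {
            put(ZDLE);
            put(c ^ 0x40);      /* e.g. 0x18 -> 0x58, 0x10 -> 0x50 */
        } else {
            put(c);
        }
    }

On receive, roughly the reverse applies: ZDLE followed by one of the ZCRCE/ZCRCG/ZCRCQ/ZCRCW bytes marks the end of a data subpacket, while ZDLE followed by anything else is un-escaped by XORing with 0x40.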
I've looked around for reference code, but all I can find are unreadable, undocumented C files from the early 90s.
Does this include the code for the lrzsz package? This is still widely available on most Linux distributions (and surprisingly handy for transmitting files over an established ssh connection).
There are a number of other implementations out there, including several in software listed on freecode, including qodem, syncterm, MBSE, and others. I believe the syncterm implementation is written as a library that may be reasonably easy to use from your own code (but I'm not certain).
You may find additional code if you poke around older collections of MS-DOS software.
I can't blame you. The user manual is not organized in a user-friendly way.
Are the padding bytes (0x2A) arbitrary?
No. From pages 14-15:
A binary header begins with the sequence ZPAD, ZDLE, ZBIN.
A hex header begins with the sequence ZPAD, ZPAD, ZDLE, ZHEX.
The spec doesn't mention the [CR] [LF] [XON] at the end, but the MSDN article does. Why is it there?
Page 15:
* * ZDLE B TYPE F3/P0 F2/P1 F1/P2 F0/P3 CRC-1 CRC-2 CR LF XON
Why does the [LF] have bit 0x80 set?
Not sure. From Tera Term I got both control characters XORed with 0x80 (8D 8A 11).
We have our first mysterious ZCRCW frame: [18 6B]. How long is this frame? Where is the CRC data, and is it CRC16 or CRC32? It's not defined anywhere in the spec.
The ZCRCW is not a header or a frame type, it's more like a footer that tells the receiver what to expect next. In this case it's the footer of the data subpacket containing the file name. It's going to be a 32 bit checksum because you're using a "C" type binary header.
ZDLE C TYPE F3/P0 F2/P1 F1/P2 F0/P3 CRC-1 CRC-2 CRC-3 CRC-4
Then we have a frame with an undefined type: [18 50 D3 0F F1 11]. Is this a separate frame or is it part of ZCRCW?
That's the CRC for the ZCRCW data subpacket. It's 5 bytes because the first one is 0x10, a control character that needs to be ZDLE escaped. I'm not sure what 0x11 is.
and there's no [XON].
XON is just for Hex headers. You don't use it for a binary header.
ZDLE A TYPE F3/P0 F2/P1 F1/P2 F0/P3 CRC-1 CRC-2
So how do I transmit a payload byte that happens to match the value of [ZDLE]?
18 58 (AKA ZDLEE)
18 68 05 DE 02 18 D0
That's the footer of the data subframe. The next 5 bytes are the CRC (the last byte is ZDLE encoded).
The ZDLE + ZBIN (0x18 0x41) means the frame is CRC-CCITT(XMODEM 16) with Binary Data.
ZDLE + ZHEX (0x18 0x42) means CRC-CCITT(XMODEM 16) with HEX data.
The hex data is tricky, since at first some people don't understand it. Every two ASCII characters represent one byte in binary: for example, 0x30 0x45 0x41 0x32 is ASCII "0EA2", which decodes to the two bytes 0x0E 0xA2. The two binary nibbles of each byte are expanded to an ASCII representation. I found in several dataloggers that some devices use lower case to represent A-F (a-f) in hex; it doesn't matter, but on those you will not find 0x30 0x45 0x41 0x32 (0EA2) but 0x30 0x65 0x61 0x32 (0ea2). It doesn't change a thing, it's just a little confusing at first.
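A small sketch of that decoding, accepting both upper and lower case:

    /* Convert one ASCII hex digit to its value, or -1 if invalid. */
    static int hex_nibble(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /* Two ASCII chars -> one byte: "0E" -> 0x0E, "a2" -> 0xA2. */
    static int hex_byte(const char *p)
    {
        int hi = hex_nibble(p[0]), lo = hex_nibble(p[1]);
        return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
    }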
And yes, the CRC16 for ZBIN or ZHEX is CRC-CCITT(XMODEM).
The ZDLE ZBIN32 (0x18 0x43) or ZDLE ZBINR32 (0x18 0x44) use CRC-32 calculation.
Note that the leading ZDLE and the byte that follows it are excluded from the CRC calculation.
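For reference, a bit-by-bit sketch of that CRC (poly 0x1021, initial value 0, no reflection, no final XOR). Run over the type and flag bytes of the ZRINIT hex header quoted earlier (01 00 00 00 23), it yields 0xBE50, matching the be50 in that frame:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* CRC-CCITT (XMODEM): poly 0x1021, init 0, no reflection, no final XOR. */
    uint16_t crc16_xmodem(const uint8_t *data, size_t len)
    {
        uint16_t crc = 0;
        while (len--) {
            crc ^= (uint16_t)(*data++) << 8;
            for (int i = 0; i < 8; i++)
                crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                     : (uint16_t)(crc << 1);
        }
        return crc;
    }

    int main(void)
    {
        /* Type and flag bytes of the ZRINIT header quoted earlier. */
        const uint8_t hdr[] = { 0x01, 0x00, 0x00, 0x00, 0x23 };
        printf("%04x\n", crc16_xmodem(hdr, sizeof hdr));   /* prints be50 */
        return 0;
    }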
I am digging into ZMODEM since I need to "talk" to some elevator door boards to program a whole set of parameters at once, instead of using their software to change attribute by attribute. That way the "talk" can happen on the bench instead of sitting on the elevator car roof with a notebook. Those boards speak ZMODEM, but since I don't have the packet format they expect, the board keeps rejecting my file transfer. The boards send 0x2A 0x2A 0x18 0x42 0x30 0x31 0x30 (x6) + CRC; Tera Term, transferring the file in ZMODEM, sends to the board 0x2A 0x2A 0x18 0x42 0x30 0x30 ... + CRC. I don't know what this 00 or 01 after the 0x42 means. The PC sends this plus the filename and some file attributes. After a few seconds the board answers with "No file received"...