Types' default alignment in Visual C++

Just noticed something that looks strange to me. Visual C++ doesn't align objects on their required boundary by default. For example, long long is aligned on a 4-byte boundary, while __alignof(long long) returns 8 (as far as I can see, __alignof always returns the size of the type). So it looks like the object is not properly aligned. For example:
#include <cstdint> // for uintptr_t

long long a1;
char g;
long long a2;
// alignment check for &a2 fails
if (((uintptr_t)&a2 & (__alignof(long long) - 1)) != 0) // failed
I also tried just printing the pointer; the value of &a2 is 0x0035F8FC (3537148 in decimal), which is not a multiple of 8.
Is there something I'm getting wrong? I need a properly aligned object of type long long. What can I do about that?
I could use the __declspec(align()) Microsoft extension, but it requires a literal number, so I can't write anything like this:
__declspec(align(__alignof(long long))) long long object;

VC doesn't guarantee automatic alignment of stack variables beyond the stack's own alignment (generally 4 bytes on 32-bit systems). If you need stricter alignment, you need to use __declspec(align(x)), just like MSVC's SSE types (such as __m128) do; for heap allocations you'll need _aligned_malloc instead.
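As a minimal sketch (MSVC assumed), here are three ways to get an 8-byte-aligned long long; note that standard C++11 alignas accepts a constant expression such as alignof(long long), which sidesteps the literal-number restriction of __declspec(align()):

#include <cstdint>
#include <malloc.h>  // _aligned_malloc / _aligned_free (MSVC-specific)

__declspec(align(8)) long long a;          // MSVC extension: literal only
alignas(alignof(long long)) long long b;   // standard C++11 equivalent

int main() {
    // Heap allocation with explicit alignment; pair with _aligned_free.
    long long* p = static_cast<long long*>(
        _aligned_malloc(sizeof(long long), alignof(long long)));
    *p = 42;
    _aligned_free(p);
}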

Alignment is meant to minimize memory cycles on RAM access. With a 32-bit data bus, 4-byte alignment already means a long long is fetched in exactly two 32-bit accesses; 8-byte alignment doesn't improve on that. The compiler has a default alignment that can be overridden with the /Zp option.
See also: Configuration Properties > C/C++ > Code Generation > Struct Member Alignment.
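For per-struct control (rather than the project-wide /Zp switch), the same effect can be had with #pragma pack; a minimal sketch, with sizes that assume a typical 8-byte default member alignment:

#include <cstdio>

struct Natural {        // default member alignment
    char c;
    long long x;        // padded so x lands on its preferred boundary
};

#pragma pack(push, 1)   // equivalent to /Zp1 for the structs below
struct Packed {
    char c;
    long long x;        // no padding; x may end up misaligned
};
#pragma pack(pop)

int main() {
    std::printf("%zu %zu\n", sizeof(Natural), sizeof(Packed)); // e.g. 16 9
}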

Related

Does std::ptr::write transfer the "uninitialized-ness" of the bytes it writes?

I'm working on a library that helps transact types that fit in a pointer-sized int across FFI boundaries. Suppose I have a struct like this:
use std::mem::{size_of, align_of};
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}
assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
This struct has 1 data byte and 7 padding bytes (on a 64-bit target). I want to pack an instance of this struct into a usize and then unpack it on the other side of an FFI boundary. Because this library is generic, I'm using MaybeUninit and ptr::write:
use std::ptr;
use std::mem::MaybeUninit;
let data = PaddingDemo { data: 12, force_pad: [] };
// In order to ensure all the bytes are initialized,
// zero-initialize the buffer
let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
let ptr = packed.as_mut_ptr() as *mut PaddingDemo;
let packed_int = unsafe {
    std::ptr::write(ptr, data);
    packed.assume_init()
};
// Attempt to trigger UB in Miri by reading the
// possibly uninitialized bytes
let copied = unsafe { ptr::read(&packed_int) };
Does that assume_init call trigger undefined behavior? In other words, when ptr::write copies the struct into the buffer, does it copy the uninitialized-ness of the padding bytes, overwriting the zero bytes I initialized?
Currently, when this or similar code is run in Miri, it doesn't detect any undefined behavior. However, per the discussion about this issue on GitHub, ptr::write is supposedly allowed to copy those padding bytes, and furthermore to copy their uninitialized-ness. Is that true? The docs for ptr::write don't talk about this at all, nor does the Nomicon section on uninitialized memory.
Does that assume_init call trigger undefined behavior?
Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, next to the usual 0x00 - 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other possible value a byte can have is preserved by copies.
But the details are a bit more complicated.
There are two ways to copy data around in memory in Rust.
Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.
Untyped / byte-wise copy
In general, when a range of bytes is being copied, the source range just overwrites the target range -- so if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.
This is what memcpy/memmove in C behave like (in my interpretation of the standard, which is not very clear here unfortunately). In Rust, ptr::copy{,_nonoverlapping} probably performs a byte-wise copy, but it's not actually precisely specified right now and some people might want to say it is typed as well. This was discussed a bit in this issue.
Typed copy
The alternative is a "typed copy", which is what happens on every normal assignment (=) and when passing values to/from a function. A typed copy interprets the source memory at some type T, and then "re-serializes" that value of type T into the target memory.
The key difference to a byte-wise copy is that information which is not relevant at the type T is lost. This is basically a complicated way of saying that a typed copy "forgets" padding, and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information. Untyped copies preserve the underlying representation, typed copies just preserve the represented value.
So even when you transmute 0usize to PaddingDemo, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding) -- assuming data sits at offset 0, which is not guaranteed (add #[repr(C)] if you want that guarantee).
In your case, ptr::write takes an argument of type PaddingDemo, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily, in particular they may become 0xUU.
Uninitialized usize
Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a usize is UB. The question is, does a (partially) uninitialized range of memory represent some integer? Currently, it does not and thus there is UB. However, whether that should be the case is heavily debated and it seems likely that we will eventually permit it.
Many other details are still unclear, though -- for example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer, i.e., integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers we would have to basically say that an integer has no abstract "value", it is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like /. (Some of this also depends on LLVM decisions around poison and freeze; LLVM might decide that when doing a load at integer type, the result is fully poison if any input byte is poison.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected because the data you want to transfer is being lost.
If you want to transfer raw bytes around, I suggest using a type suited for that, such as MaybeUninit. If you use an integer type, the goal should be to transfer integer values -- i.e., numbers.

String (constant) literals and pointers

I'm relatively new to C++, and I searched for an answer to my question but only got more confused. As I understand it, string literals must be pointed to by const pointers, since they are considered read-only. I also understand that the pointer itself is not constant (and could be changed), but that it points to a string constant, and that the string itself cannot be modified. So in this example:
const char* cstr="string";
*cstr = 'a';
I get an error: "assignment of read-only location."
Now, if I define my C-string as following, and define a pointer to it, I'll be able to change the string:
char str[7]="string";
char* cstr = str;
*cstr = 'a';
cout << cstr <<endl;
the string will be modified (output: atring), meaning the first element of the string has changed. My two questions are:
1. Why am I able to modify the C-string in the second example, but cannot make any changes to the string in the first case?
2. In both cases I am using pointers; why must I use a pointer to const char in the first case?
When you use the syntax
const char* cstr="string";
C++ defines:
An array of 7 characters in the read-only section of memory, with the contents "string\0" in it.
A pointer on the stack (or in the writable global section of memory), holding the address of that array.
However, when you use the syntax:
char str[7]="string";
C++ defines:
An array of 7 characters on the stack (or in the writable global section of memory), with the contents "string\0" in it.
In the first case, the actual values are in read-only memory, so you can't change them. In the second case, they are in writable memory (stack or global).
C++ tries to enforce this semantic: if the definition lives in read-only memory, you should refer to it through a pointer to const.
Note that not all architectures have read-only memory, but because most of them do, and a C++ implementation may want to use it (for better correctness), C++ programmers should assume (for the purposes of pointer types) that string constants will be placed in read-only memory.
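A minimal sketch contrasting the two cases (writing through the first pointer is a compile error; with a C-style cast it would compile but typically crash at run time):

#include <iostream>

int main() {
    const char* lit = "string"; // points into the (read-only) literal
    // *lit = 'a';              // error: assignment of read-only location

    char str[] = "string";      // writable copy of the literal
    char* p = str;
    *p = 'a';                   // fine: modifies the copy, not the literal
    std::cout << str << '\n';   // prints "atring"
}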

What does the IS_ALIGNED macro in the Linux kernel do?

I've been trying to read the implementation of a kernel module, and I'm stumbling on this piece of code.
unsigned long addr = (unsigned long) buf;

if (!IS_ALIGNED(addr, 1 << 9)) {
    DMCRIT("#%s in %s is not sector-aligned. I/O buffer must be sector-aligned.", name, caller);
    BUG();
}
The IS_ALIGNED macro is defined in the kernel source as follows:
#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0)
I understand that data has to be aligned to the size of a datatype to work, but I still don't understand what the code does.
It left-shifts 1 by 9, then subtracts 1, which gives 0b111111111 (nine ones). That value is then bitwise-ANDed with x.
Why does this code work? How is this checking for alignment?
In systems programming it is common to need a memory address to be aligned to a certain number of bytes -- that is, to have its lowest-order bits equal to zero.
Basically, IS_ALIGNED(addr, 1 << 9) checks whether addr is on a 512-byte (2^9) boundary (the last 9 bits are zero), and the ! triggers the error path when it is not. This is a common requirement when erasing flash locations, because flash memory is split into large blocks which must be erased or written as a single unit.
Another application of this I ran into: I was working with a DMA controller which has a modulo feature. Basically, that means you can allow it to change only the last several bits of an address (the destination address in this case). This is useful for protecting memory from mistakes in the way you use a DMA controller. Problem is, I initially forgot to tell the compiler to align the DMA destination buffer to the modulo value. This caused some incredibly interesting bugs (random variables that have nothing to do with the thing using the DMA controller being overwritten... sometimes).
As for how the macro code works: if you subtract 1 from a number that ends in all zeroes, you get a number that ends in all ones. For example, 0b00010000 - 0b1 = 0b00001111. This is a way of creating a binary mask from the integer number of required-alignment bytes. The mask has ones only in the bits we are interested in checking for zero. After we AND the address with this mask, we get 0 if and only if the lowest 9 (in this case) bits are zero.
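A minimal sketch of the same mask trick outside the kernel (the kernel macro uses typeof to match operand types; here a plain function suffices, assuming the alignment is a power of two):

#include <cstdint>
#include <cstdio>

bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment) {
    return (addr & (alignment - 1)) == 0; // low bits must all be zero
}

int main() {
    std::printf("%d\n", is_aligned(0x1200, 1u << 9)); // 1: low 9 bits zero
    std::printf("%d\n", is_aligned(0x1201, 1u << 9)); // 0: bit 0 is set
}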
"Why does it need to be aligned?": This comes down to the internal makeup of flash memory. Erasing and writing flash is a much less straightforward process then reading it, and typically it requires higher-than-logic-level voltages to be supplied to the memory cells. The circuitry required to make write and erase operations possible with a one-byte granularity would waste a great deal of silicon real estate only to be used rarely. Basically, designing a flash chip is a statistics and tradeoff game (like anything else in engineering) and the statistics work out such that writing and erasing in groups gives the best bang for the buck.
At no extra charge, I will tell you that you will be seeing a lot of this type of thing if you are reading driver and kernel code. It may be helpful to familiarize yourself with the contents of this article (or at least keep it around as a reference): https://graphics.stanford.edu/~seander/bithacks.html

How to tell the compiler to pad a specific number of bytes on every C function?

I'm trying to practice some live instrumentation, and I saw there was a linker option -call-nop=prefix-nop, but it has some restrictions: it only works with calls that go through the GOT (I don't know how to force the compiler to generate such calls, and I'm not sure it's a good idea for performance reasons). Also, -call-nop=* cannot pad more than 1 byte.
Ideally, I'd like a compiler option to pad each function with a specific number of bytes while still performing all the normal function alignment.
Once I have this pad area, I can reuse it at run time to store some values or redirect control flow.
P.S. I believe the Linux kernel uses a similar trick to dynamically enable some software tracepoints.
-pg is intended for profiling with gprof (it makes the compiler emit mcount calls), not for reserving patch space. The purpose-built option for this is -fpatchable-function-entry:
-fpatchable-function-entry=N[,M]
Generate N NOPs right at the beginning of each function, with the function entry point before the Mth NOP. If M is omitted, it defaults to 0 so the function entry points to the address just at the first NOP. The NOP instructions reserve extra space which can be used to patch in any desired instrumentation at run time, provided that the code segment is writable. The amount of space is controllable indirectly via the number of NOPs; the NOP instruction used corresponds to the instruction emitted by the internal GCC back-end interface gen_nop. This behavior is target-specific and may also depend on the architecture variant and/or other compilation options.
On x86 it inserts N single-byte 0x90 NOPs and doesn't make use of multi-byte NOPs, so performance isn't as good as it could be, but you probably don't care about that in this case, so the option should work fine.
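The same padding can also be requested per function via an attribute (GCC and Clang); a minimal sketch, with the function name being a placeholder:

// Reserve 5 NOPs for this function: 3 before the entry point, 2 after it,
// the same layout as compiling with -fpatchable-function-entry=5,3.
__attribute__((patchable_function_entry(5, 3)))
void traced_function(void) {
    // A runtime patcher can overwrite the NOPs with a probe or jump.
}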
I achieved this goal by implementing my own mcount function in an assembly file and compiling the code with -pg.

Ada: Variant size in record type

I'm having some trouble with record types in Ada.
I'm using Sequential_IO to read a binary file. To do that I have to use a type whose size evenly divides the file's size. In my case I need a structure of 50 bytes, so I created a type like this ("Vecteur" is an array of 3 Floats):
type Double_Byte is mod 2 ** 16;
for Double_Byte'Size use 16;

type Triangle is
   record
      Normal      : Vecteur(1..3);
      P1          : Vecteur(1..3);
      P2          : Vecteur(1..3);
      P3          : Vecteur(1..3);
      Byte_count1 : Double_Byte;
   end record;
When I use the type Triangle, its size is 52 bytes, but when I sum the sizes of its components separately I get 50 bytes. Because my file's size is not a multiple of 52, I get execution errors. I don't know how to fix this size. I ran some tests and I think it comes from Double_Byte, because when I removed it from the record I got a size of 48 bytes, and when I put it back it was 52 bytes again.
Thank you for your help.
Given Simon's latest comment, it may be impossible to do this portably using Sequential_IO; namely, reading the file on some machines (which don't support unaligned accesses) may leave half its contents unaligned and therefore liable to fail when you access them.
I can't help feeling that a better solution is to divorce the file format (which is fixed by compatibility with other systems) from the machine format (which is not). And therefore moving to Stream_IO and writing your own Read and Write primitives where necessary (e.g. to pack the odd sized Double_Byte component into 2 bytes, whatever its representation in memory) would be a more robust solution.
Then you can guarantee a file format compatible with other systems, and an internal memory format guaranteed to work.
The compiler is in no way obligated to use a specific size for Triangle unless you specify it. As you don't, it chooses whatever size it sees fit for fast access to the data. Even if you specify representation details for every component type of the record, the compiler might still choose to use more space for the record itself than necessary.
Considering the sizes you give, it seems obvious that one component of Vecteur has 4 bytes, which gives a total payload of 50 bytes for Triangle. The compiler now chooses to add 2 bytes padding, so that the record size is a multiple of the size of a 4-byte word. You can override this behavior with:
for Triangle'Size use 50 * 8;
This will force the compiler to use only 50 bytes for the record. As this is a tight fit, there is only one way to represent the record, and no further specification is necessary. If you do need to specify how exactly the record is represented, you can use a record representation clause.
Edit:
The representation clause specifies the size for the type. However, each object of this type may still take up more space unless you additionally specify
pragma Pack (Triangle);
Edit 2:
After Simon's comment, I had a closer look at this and realized that there is a far better and cleaner solution. Instead of setting the 'Size and using pragma Pack, do this:
for Triangle use record at mod 2;
   Normal      at  0 range 0 .. 95;
   P1          at 12 range 0 .. 95;
   P2          at 24 range 0 .. 95;
   P3          at 36 range 0 .. 95;
   Byte_count1 at 48 range 0 .. 15;
end record;
The initial mod 2 defines that the record is to be aligned at a multiple of 2 bytes. This eliminates the padding at the end without the need of pragma Pack (which is not guaranteed to work the same way on every compiler).
