64 bit portability issues - visual-c++

All this originated from me poking at a compiler warning message (C4267) when attempting the following line:
const unsigned int nSize = m_vecSomeVec.size();
size() returns a size_t which, although typedef'd to unsigned int, is not actually an unsigned int. I believe this has to do with 64-bit portability issues, but can someone explain it a bit better for me? (I don't just want to disable the 64-bit warnings.)

It depends on the implementation. std::size_t, for example, has a minimum required size but no upper limit. To avoid this kind of situation, always use the proper typedef:
const std::vector<T>::size_type nSize = m_vecSomeVec.size();
You will always be on the safe side then.

When compiling for a 64-bit platform, size_t will be a 64-bit type. Because of this, Visual Studio gives warnings about assigning size_ts to ints when 'Detect 64-bit Portability Issues' is enabled.
Visual C++ gets this information about size_t through the __w64 token, e.g. __w64 unsigned int.
Refer to the link below for more on 64-bit porting issues:
http://www.viva64.com/en/a/0065/

If size_t is typedef'd to unsigned int, then of course it is an unsigned int on your particular platform. But it is abstracted so that you cannot depend on it always being an unsigned int; it might be larger on some other platform.
Probably it has not been made larger since it would cost too much to do so, and e.g. vectors with more than 2^32 items in them are not very common.

Depending on the compiler, int may be 32-bits in 64-bit land.

Related

Function for unaligned memory access on ARM

I am working on a project where data is read from memory. Some of this data are integers, and there was a problem accessing them at unaligned addresses. My idea would be to use memcpy for that, i.e.
uint32_t readU32(const void* ptr)
{
uint32_t n;
memcpy(&n, ptr, sizeof(n));
return n;
}
The solution from the project source I found is similar to this code:
uint32_t readU32(const uint32_t* ptr)
{
union {
uint32_t n;
char data[4];
} tmp;
const char* cp=(const char*)ptr;
tmp.data[0] = *cp++;
tmp.data[1] = *cp++;
tmp.data[2] = *cp++;
tmp.data[3] = *cp;
return tmp.n;
}
So my questions:
Isn't the second version undefined behaviour? The C standard says in 6.3.2.3 Pointers, paragraph 7:
A pointer to an object or incomplete type may be converted to a pointer to a different
object or incomplete type. If the resulting pointer is not correctly aligned for the
pointed-to type, the behavior is undefined.
As the calling code has, at some point, used a char* to handle the memory, there must be some conversion from char* to uint32_t*. Isn't the result of that undefined behaviour, then, if the uint32_t* is not correctly aligned? And if it is aligned, there is no point to the function, as you could write *(uint32_t*) to fetch the memory. Additionally, I think I read somewhere that the compiler may expect an int* to be aligned correctly and any unaligned int* would mean undefined behaviour as well, so the generated code for this function might take some shortcuts because it may expect the function argument to be aligned properly.
The original code has volatile on the argument and all variables because the memory contents could change (it's a data buffer (no registers) inside a driver). Maybe that's why it does not use memcpy since it won't work on volatile data. But, in which world would that make sense? If the underlying data can change at any time, all bets are off. The data could even change between those byte copy operations. So you would have to have some kind of mutex to synchronize access to this data. But if you have such a synchronization, why would you need volatile?
Is there a canonical/accepted/better solution to this memory access problem? After some searching I have come to the conclusion that you need a mutex, do not need volatile, and can use memcpy.
P.S.:
# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 1581.05
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
This code
uint32_t readU32(const uint32_t* ptr)
{
union {
uint32_t n;
char data[4];
} tmp;
const char* cp=(const char*)ptr;
tmp.data[0] = *cp++;
tmp.data[1] = *cp++;
tmp.data[2] = *cp++;
tmp.data[3] = *cp;
return tmp.n;
}
passes the pointer as a uint32_t *. If it's not actually a uint32_t, that's UB. The argument should probably be a const void *.
The use of a const char * in the conversion itself is not undefined behavior. Per 6.3.2.3 Pointers, paragraph 7 of the C Standard (emphasis mine):
A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
Otherwise, when converted back again, the result shall compare
equal to the original pointer. When a pointer to an object is
converted to a pointer to a character type, the result points to the
lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining
bytes of the object.
As for volatile and the correct way to access memory or registers directly on your particular hardware, there is no canonical/accepted/best solution. Any solution would be specific to your system and beyond the scope of standard C.
Implementations are allowed to define behaviors in cases where the Standard does not, and some implementations may specify that all pointer types have the same representation and may be freely cast among each other regardless of alignment, provided that pointers which are actually used to access things are suitably aligned.
Unfortunately, because some obtuse compilers compel the use of "memcpy" as an escape valve for aliasing issues even when pointers are known to be aligned, the only way compilers can efficiently process code which needs to make type-agnostic accesses to aligned storage is to assume that any pointer of a type requiring alignment will always be aligned suitably for such type. As a result, your instinct that the approach using uint32_t* is dangerous is spot on. It may be desirable to have compile-time checking to ensure that a function is passed either a void* or a uint32_t*, and not something like a uint16_t* or a double*, but there's no way to declare a function that way without allowing a compiler to "optimize" the function by consolidating the byte accesses into a 32-bit load that will fail if the pointer isn't aligned.

Linux struct msghdr :: msg_iovlen type

No practical reason, just wondering.
Why does Linux use the size_t type for the msg_iovlen field of struct msghdr? I find it a bit confusing, as size_t usually means "how many bytes".
By the way, FreeBSD uses u_int for that field, and the POSIX standard specifies int.
See size_t:
size_t is the unsigned integer type of the result of the sizeof operator as well as the sizeof... operator and the alignof operator... size_t can store the maximum size of a theoretically possible object of any type (including array). On many platforms (an exception are systems with segmented addressing)

Incorrect address returned by crypt() on Solaris x64

When debugging a shared library loaded with dlopen(), I found an interesting thing. The address returned by the crypt() function when called from my library is only 32 bits wide; when I try to look at that address in the debugger, it says that it is a bad address. Adding an offset to this address, which in my case is 0xffffffff00000000, gives the correct result. Looking at the crypt sources, it is clear that the string returned by crypt is a static char array, but it is not clear why the address is only 32 bits wide.
Thank you in advance to any ideas and help
Did you #include <unistd.h> or #include <crypt.h> in your code so that it had the function prototype declaring crypt() as returning char *?
If you don't have a function prototype, C defaults to assuming functions return int, even if that's only 32-bits on a 64-bit machine, and this often breaks functions that return pointers (which work by accident on 32-bit systems where int is the same size as a pointer).

printf ptr: can the leading 0x be eliminated?

The Linux printf renders %p arguments as hex digits with a leading 0x. Is there a way to make it not print the 0x? (Needs to work on both 32 and 64 bit.)
You can use the format specifier for uintptr_t from <inttypes.h>:
#include <inttypes.h>
[...]
printf("%"PRIxPTR"\n", (uintptr_t) p);
This works like %x for the uintptr_t type, which is an integer type capable of roundtrip conversion from/to any pointer type.
Use %llx, it will work on 64-bit for sure. Tried and tested.
Use %lx or %08lx. It works with GCC on both 32-bit and 64-bit Linux, because long int is the same width as void * there. It doesn't work for MSVC, because long int is always 32 bits in MSVC.
If you want it to work on all compilers, you can use %llx and cast your pointer to unsigned long long int, although that is not efficient on 32-bit targets.
If you want efficiency as well, define a different macro for each case.

Question about file seeking position

My previous question was about raw data reading and writing, but a new problem has arisen; it seems there is no end to it.
The question is: the offset parameters of functions like lseek() or fseek() are all 4 bytes. If I want to move over a span larger than 4G, that is impossible. I know that in Win32 there is a function SetPointer(...,High, Low,....) which can produce 64-bit file pointers, which is what I want.
But if I want to create an app on Linux or Unix (to create a file or directly write the raw drive sectors), how can I move to a position beyond 4G?
Thanks, waiting for your replies...
The offset parameter of lseek is of type off_t. In 32-bit compilation environments, this type defaults to a 32-bit signed integer - however, if you compile with this macro defined before all system includes:
#define _FILE_OFFSET_BITS 64
...then off_t will be a 64-bit signed type.
For fseek, the fseeko function is identical except that it uses the off_t type for the offset, which allows the above solution to work with it too.
A 4-byte unsigned integer can represent values up to 4294967295, which means that if you want to move more than 4G, you need to use lseek64(). In addition, you can use fgetpos() and fsetpos() to change the position in the file.
On Windows, use _lseeki64(), on Linux, lseek64().
I recommend using lseek64() on both systems by doing something like this:
#ifdef _WIN32
#include <io.h>
#define lseek64 _lseeki64
#else
#include <unistd.h>
#endif
That's all you need.
