What is the use of __iomem in Linux while writing device drivers?

I have seen that __iomem is used for the return type of ioremap(), but I have used u32 on the ARM architecture for it and it works well.
So what difference does __iomem make here? And in which circumstances should I use it exactly?

Lots of type casts are going to just "work well". However, this is not very strict. Nothing stops you from casting a u32 to a u32 * and dereferencing it, but this does not follow the kernel API and is prone to errors.
__iomem is a cookie used by Sparse, a tool used to find possible coding faults in the kernel. If you don't compile your kernel code with Sparse enabled, __iomem will be ignored anyway.
Use Sparse by first installing it, and then adding C=1 to your make call. For example, when building a module, use:
make -C $KPATH M=$PWD C=1 modules
__iomem is defined like this:
# define __iomem __attribute__((noderef, address_space(2)))
Adding (and requiring) a cookie like __iomem for all I/O accesses is a way to be stricter and avoid programming errors. You don't want to read/write from/to I/O memory regions with absolute addresses because you're usually using virtual memory. Thus,
void __iomem *ioremap(phys_addr_t offset, unsigned long size);
is usually called to get the virtual address of an I/O physical address offset, for a specified length size in bytes. ioremap() returns a pointer with an __iomem cookie, so this may now be used with inline functions like readl()/writel() (although it's now preferable to use the more explicit macros ioread32()/iowrite32(), for example), which accept __iomem addresses.
Also, the noderef attribute is used by Sparse to make sure you don't dereference an __iomem pointer. Dereferencing may work on architectures where the I/O really is memory-mapped, but other architectures use special instructions to access I/O, and in that case dereferencing won't work.
Let's look at an example:
void *io = ioremap(42, 4);
Sparse is not happy:
warning: incorrect type in initializer (different address spaces)
expected void *io
got void [noderef] <asn:2>*
Or:
u32 __iomem* io = ioremap(42, 4);
pr_info("%x\n", *io);
Sparse is not happy either:
warning: dereference of noderef expression
In the last example, the first line is correct, because ioremap() returns its value to an __iomem variable. But then we dereference it, and we're not supposed to.
This makes Sparse happy:
void __iomem* io = ioremap(42, 4);
pr_info("%x\n", ioread32(io));
Bottom line: always use __iomem where it's required (as a return type or as a parameter type), and use Sparse to make sure you did so. Also: do not dereference an __iomem pointer.
Edit: Here's a great LWN article about the inception of __iomem and functions using it.
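Putting it all together, here is a minimal sketch of how this typically looks in a driver. The base address, mapping size and register offset are made-up values, used for illustration only:

#include <linux/module.h>
#include <linux/io.h>

#define MY_PHYS_BASE   0x10000000UL   /* hypothetical physical base address */
#define MY_REG_OFFSET  0x04           /* hypothetical register offset */

static void __iomem *base;

static int __init my_init(void)
{
    base = ioremap(MY_PHYS_BASE, 0x100);
    if (!base)
        return -ENOMEM;

    /* Always go through the accessors; never dereference the __iomem pointer. */
    iowrite32(0x1, base + MY_REG_OFFSET);
    pr_info("reg = %x\n", ioread32(base + MY_REG_OFFSET));
    return 0;
}

static void __exit my_exit(void)
{
    iounmap(base);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");

Running Sparse (C=1) over this stays silent, while replacing the accessor calls with a plain cast and dereference would again make Sparse complain.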

Simple, Straight and Short (S3) Explanation.
There is an article https://lwn.net/Articles/653585/ for more details.


simple_read_from_buffer/simple_write_to_buffer vs. copy_to_user/copy_from_user

I recently wrote a module implementing these functions.
What is the difference between the two? From my understanding, the copy_..._user functions are more secure. Please correct me if I'm mistaken.
Furthermore, is it a bad idea to mix the two functions in one program? For example, I used simple_read_from_buffer in my misc dev read function, and copy_from_user in my write function.
Edit: I believe I've found the answer to my question from reading fs/libfs.c (I wasn't aware that this was where the source code was located for these functions); from my understanding the simple_...() functions are essentially a wrapper around the copy_...() functions. I think it was appropriate in my case to use copy_from_user for the misc device write function as I needed to validate that the input matched a specific string before returning it to the user buffer.
I will still leave this question open though in case someone has a better explanation or wants to correct me!
simple_read_from_buffer and simple_write_to_buffer are just convenience wrappers around copy_{to,from}_user for when all you need to do is service a read from userspace from a kernel buffer, or service a write from userspace to a kernel buffer.
From my understanding, the copy_..._user functions are more secure.
Neither version is "more secure" than the other. Whether or not one might be more secure depends on the specific use case.
I would say that simple_{read,write}_... could in general be more secure since they do all the appropriate checks for you before copying. If all you need to do is service a read/write to/from a kernel buffer, then using simple_{read,write}_... is surely faster and less error-prone than manually checking and calling copy_{from,to}_user.
Here's a good example where those functions would be useful:
#define SZ 1024
static char kernel_buf[SZ];
static ssize_t dummy_read(struct file *filp, char __user *user_buf, size_t n, loff_t *off)
{
    return simple_read_from_buffer(user_buf, n, off, kernel_buf, SZ);
}

static ssize_t dummy_write(struct file *filp, const char __user *user_buf, size_t n, loff_t *off)
{
    return simple_write_to_buffer(kernel_buf, SZ, off, user_buf, n);
}
It's hard to tell what exactly you need without seeing your module's code, but I would say that you can either:
Use copy_{from,to}_user if you want to control the exact behavior of your function (a sketch of what that looks like follows below).
Simply return simple_{read,write}_...(...) if you don't need such fine-grained control and are OK with the standard values produced by those wrappers.
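For comparison, doing the same read with copy_to_user directly means performing the offset and length checks yourself. A rough sketch of that "fine-grained control" version, reusing kernel_buf and SZ from the example above, with minimal error handling:

#include <linux/uaccess.h>   /* copy_to_user() */

static ssize_t dummy_read_manual(struct file *filp, char __user *user_buf,
                                 size_t n, loff_t *off)
{
    loff_t pos = *off;

    if (pos < 0)
        return -EINVAL;
    if (pos >= SZ)
        return 0;                 /* EOF */
    if (n > SZ - pos)
        n = SZ - pos;             /* clamp to what is left in the buffer */

    if (copy_to_user(user_buf, kernel_buf + pos, n))
        return -EFAULT;

    *off = pos + n;
    return n;
}

This is essentially what simple_read_from_buffer does for you internally, which is why the wrapper is the less error-prone choice when no extra logic is needed.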

Why does the ICU use this aliasing barrier when doing a reinterpret_cast?

I'm porting code from ICU 58.2 to ICU 59.1, where they changed the character type from uint16_t to char16_t. I was going to just do a straight reinterpret_cast where I needed to convert the types, but found that ICU 59.1 actually provides functions for this conversion. What I don't understand is why they need to use this anti-aliasing barrier before doing a reinterpret_cast.
#elif (defined(__clang__) || defined(__GNUC__)) && U_PLATFORM != U_PF_BROWSER_NATIVE_CLIENT
#   define U_ALIASING_BARRIER(ptr) asm volatile("" : : "rm"(ptr) : "memory")
#endif
...
inline const UChar *toUCharPtr(const char16_t *p) {
#ifdef U_ALIASING_BARRIER
    U_ALIASING_BARRIER(p);
#endif
    return reinterpret_cast<const UChar *>(p);
}
Why wouldn't it be safe just to use reinterpret_cast without calling U_ALIASING_BARRIER?
At a guess, it's to stop any violations of the strict aliasing rule, that might occur in calling code that hasn't been completely cleaned up, from resulting in unexpected behaviour when optimizing (the hint to this is in the comment above: "Barrier for pointer anti-aliasing optimizations even across function boundaries.").
The strict aliasing rule forbids dereferencing pointers that alias the same value when they have incompatible types (a C notion, but C++ says a similar thing with more words). Here's a small gotcha: char16_t and uint16_t aren't required to be compatible. uint16_t is actually an optionally-supported type (in both C and C++); char16_t is equivalent to uint_least16_t, which isn't necessarily the same type. It will have the same width on x86, but a compiler isn't required to have it tagged as actually being the same thing. It might even be intentionally lax with assuming types that typically indicate different intent could alias.
There's a more complete explanation in the linked answer, but basically given code like this:
uint16_t buffer[] = ...
buffer[0] = u'a';
uint16_t * pc1 = buffer;
char16_t * pc2 = (char16_t *)pc1;
pc2[0] = u'b';
uint16_t c3 = pc1[0];
...if for whatever reason the compiler doesn't have char16_t and uint16_t tagged as compatible, and you're compiling with optimizations on including its equivalent of -fstrict-aliasing, it's allowed to assume that the write through pc2 couldn't have modified whatever pc1 points at, and not reload the value before assigning it to c3, possibly giving it u'a' instead.
Code a bit like the example could plausibly arise mid-way through a conversion process where the previous code was happily using uint16_t * everywhere, but now a char16_t * is made available at the top of a block for compatibility with ICU 59, before all the code below has been completely changed to read only through the correctly-typed pointer.
Since compilers don't generally optimize hand-coded assembly, the presence of an asm block will force it to check all of its assumptions about the state of registers and other temporary values, and do a full reload of every value the first time it's dereferenced after U_ALIASING_BARRIER, regardless of optimization flags. This won't protect you from any further aliasing problems if you continue to write through the uint16_t * below the conversion (if you do that, it's legitimately your own fault), but it should at least ensure the state from before the conversion call doesn't persist in a way that could cause writes through the new pointer to be accidentally skipped afterwards.
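To see the effect in plain C, here is the earlier uint16_t/char16_t snippet with the barrier inserted by hand. This is just the expansion of U_ALIASING_BARRIER for GCC/Clang; the aliasing violation itself is still there, the barrier merely stops the compiler from exploiting it across that point:

#include <stdint.h>
#include <uchar.h>   /* char16_t in C11 */

uint16_t demo(void)
{
    uint16_t buffer[1] = { u'a' };
    uint16_t *pc1 = buffer;
    char16_t *pc2 = (char16_t *)pc1;

    pc2[0] = u'b';
    /* Expansion of U_ALIASING_BARRIER(pc2): the "memory" clobber forces the
     * compiler to assume any object may have changed, so the read below is
     * reloaded from memory instead of being folded to the earlier u'a'. */
    asm volatile("" : : "rm"(pc2) : "memory");
    return pc1[0];   /* now reads the freshly stored u'b' */
}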

Function for unaligned memory access on ARM

I am working on a project where data is read from memory. Some of this data consists of integers, and there was a problem accessing it at unaligned addresses. My idea was to use memcpy for that, i.e.
uint32_t readU32(const void* ptr)
{
    uint32_t n;
    memcpy(&n, ptr, sizeof(n));
    return n;
}
The solution from the project source I found is similar to this code:
uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp = (const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}
So my questions:
Isn't the second version undefined behaviour? The C standard says in 6.3.2.3 Pointers, paragraph 7:
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.
As the calling code has, at some point, used a char* to handle the memory, there must be some conversion from char* to uint32_t*. Isn't the result of that undefined behaviour, then, if the uint32_t* is not correctly aligned? And if it is aligned, there is no point to the function, as you could just write *(uint32_t*)ptr to fetch the memory. Additionally, I think I read somewhere that the compiler may expect an int* to be aligned correctly, and that any unaligned int* would mean undefined behaviour as well, so the generated code for this function might take shortcuts because it may expect the function argument to be properly aligned.
The original code has volatile on the argument and all variables because the memory contents could change (it's a data buffer (no registers) inside a driver). Maybe that's why it does not use memcpy since it won't work on volatile data. But, in which world would that make sense? If the underlying data can change at any time, all bets are off. The data could even change between those byte copy operations. So you would have to have some kind of mutex to synchronize access to this data. But if you have such a synchronization, why would you need volatile?
Is there a canonical/accepted/better solution to this memory access problem? After some searching, I came to the conclusion that you need a mutex, do not need volatile, and can use memcpy.
P.S.:
# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 1581.05
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
This code
uint32_t readU32(const uint32_t* ptr)
{
    union {
        uint32_t n;
        char data[4];
    } tmp;
    const char* cp = (const char*)ptr;
    tmp.data[0] = *cp++;
    tmp.data[1] = *cp++;
    tmp.data[2] = *cp++;
    tmp.data[3] = *cp;
    return tmp.n;
}
passes the pointer as a uint32_t *. If it's not actually a uint32_t, that's UB. The argument should probably be a const void *.
The use of a const char * in the conversion itself is not undefined behavior. Per 6.3.2.3 Pointers, paragraph 7 of the C Standard (emphasis mine):
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
The use of volatile with respect to the correct way to access memory/registers directly on your particular hardware would have no canonical/accepted/best solution. Any solution for that would be specific to your system and beyond the scope of standard C.
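As a usage illustration, the memcpy version from the question (with the const void * parameter) can then be called on any byte offset without a cast and without promising any alignment to the compiler. The buffer contents below are made up; the printed value assumes a little-endian target such as the ARMv7 above:

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

static uint32_t readU32(const void *ptr)
{
    uint32_t n;
    memcpy(&n, ptr, sizeof n);   /* compilers typically turn this into a single load */
    return n;
}

int main(void)
{
    unsigned char msg[8] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

    /* Deliberately unaligned: the value starts at byte 1 of the buffer. */
    printf("%08" PRIx32 "\n", readU32(msg + 1));   /* prints 05040302 on little-endian */
    return 0;
}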
Implementations are allowed to define behaviors in cases where the Standard does not, and some implementations may specify that all pointer types have the same representation and may be freely cast among each other regardless of alignment, provided that pointers which are actually used to access things are suitably aligned.
Unfortunately, because some obtuse compilers compel the use of memcpy as an escape valve for aliasing issues even when pointers are known to be aligned, the only way compilers can efficiently process code which needs to make type-agnostic accesses to aligned storage is to assume that any pointer of a type requiring alignment will always be aligned suitably for that type. As a result, your instinct that the approach using uint32_t* is dangerous is spot on. It may be desirable to have compile-time checking to ensure that a function is passed either a void* or a uint32_t*, and not something like a uint16_t* or a double*, but there's no way to declare a function that way without allowing a compiler to "optimize" the function by consolidating the byte accesses into a 32-bit load that will fail if the pointer isn't aligned.

alloc_pages() is paired by __free_pages()

I read the book "Linux Kernel Development" and found some functions that confuse me, listed below:
struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
void __free_pages(struct page *page, unsigned int order)
unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
void free_pages(unsigned long addr, unsigned int order)
The problem is the use of the two underscores in the function names, and how the functions pair up.
1. When does the Linux kernel use two underscores in a function name?
2. Why is alloc_pages paired with __free_pages, and not with free_pages?
As you can notice:
alloc_pages() / __free_pages() take a struct page * (a page descriptor) as argument.
They are usually used internally by infrastructure kernel code, like the page fault handler, which wants to manipulate the page descriptor rather than the contents of the memory block.
__get_free_pages() / free_pages() take an unsigned long (the virtual address of the memory block) as argument.
They can be used by code that wants to use the memory block itself; after allocation, you can read from and write to this memory block.
As for the names and the double underscore "__", you don't need to worry too much. Sometimes kernel functions were named casually, without much consideration, when they were first written. By the time people realize a name is not ideal, the function is already used widely in the kernel, and nobody bothers to change it.
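A minimal sketch contrasting the two pairs (GFP_KERNEL and order 0 are arbitrary choices for illustration):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/string.h>

static void page_api_demo(void)
{
    /* Pair 1: alloc_pages() / __free_pages() work with the page descriptor. */
    struct page *page = alloc_pages(GFP_KERNEL, 0);
    if (page) {
        void *vaddr = page_address(page);   /* only if you need the contents */
        memset(vaddr, 0, PAGE_SIZE);
        __free_pages(page, 0);
    }

    /* Pair 2: __get_free_pages() / free_pages() work with the virtual address. */
    unsigned long addr = __get_free_pages(GFP_KERNEL, 0);
    if (addr) {
        memset((void *)addr, 0, PAGE_SIZE);
        free_pages(addr, 0);
    }
}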

Writing a custom malloc which stores informations in the pointer

I have recently been reading about a family of automatic memory management techniques that rely on storing information in the pointer returned by the allocator, i.e. a few bits of header, e.g. to differentiate between pointers or to store thread-related information (note that I'm not talking about limited-field reference counting here, only immutable information).
I'd like to toy with these techniques. Now, to implement them, I need to be able to return pointers with a specific shape from my allocator. I suppose I could play with the least significant bits, but that would require padding, which looks extremely memory-consuming, so I believe I should play with the most significant bits. However, I have no good idea of how to do this. Is there a way for me to call malloc or malloc_create_zone or some related function and request a pointer that always starts with the given bits?
Thanks everyone!
The amount of information you can actually store in a pointer is pretty limited (typically one or two bits per pointer). And every attempt to dereference the pointer has to first mask out the magic information. The technique is often called tagging, BTW.
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK   0x3
#define CONS_TAG   0x1
#define STRING_TAG 0x2
#define NUMBER_TAG 0x3

typedef uintptr_t value_t;

typedef struct cons {
    value_t car;
    value_t cdr;
} cons_t;

void error(const char *msg);   /* assumed to be provided elsewhere */

value_t
create_cons(value_t t1, value_t t2)
{
    /* malloc returns at least 8-byte-aligned memory, so the two lowest
     * bits of the address are free to hold the tag. */
    cons_t *pair = malloc(sizeof(cons_t));
    value_t addr = (value_t)pair;
    pair->car = t1;
    pair->cdr = t2;
    return addr | CONS_TAG;
}

value_t
car_of_cons(value_t v)
{
    if ((v & TAG_MASK) != CONS_TAG) error("wrong type of argument");
    return ((cons_t *)(v & ~TAG_MASK))->car;
}
One advantage of this technique is that you can directly infer the type of the object from the pointer itself. You don't need to dereference it (say, in order to read a special type field or similar). Many language implementations using this scheme also have a special tag combination for "immediate" numbers and other small values, which can be represented directly using the "pointer".
The disadvantage is that the amount of information which can be stored is pretty limited. Also, as the example code shows, you have to be aware of the tagging on every access to the object, and need to "untag" the pointer before actually using it.
The use of the least significant bits for tagging stems from the observation that, on most platforms, pointers to malloced memory are aligned to a multi-byte boundary (usually 8 bytes), so the least significant bits are always zero.
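A short usage sketch of the code above. The "immediate number" encoding (value shifted left by two to make room for the tag) and the trivial error() helper are assumptions added for the example:

#include <stdio.h>
#include <stdlib.h>

/* A trivial stand-in for the error() placeholder used above. */
void error(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}

int main(void)
{
    /* Tag two small integers as immediate numbers, then store them in a cons. */
    value_t a = ((value_t)41 << 2) | NUMBER_TAG;
    value_t b = ((value_t)42 << 2) | NUMBER_TAG;
    value_t pair = create_cons(a, b);

    printf("car = %lu\n", (unsigned long)(car_of_cons(pair) >> 2));   /* prints 41 */
    return 0;
}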
