Does the stack in this 'ld' linker script overwrite the stored executable?

I have a question about the behavior of the linker script found in this question:
https://stackoverflow.com/a/55193198/2421349
To save you a click, the relevant portion is:
OUTPUT_ARCH(riscv)
MEMORY
{
/* qemu-system-risc64 virt machine */
RAM (rwx) : ORIGIN = 0x80000000, LENGTH = 128M
}
ENTRY(_start)
And in a later section:
PROVIDE (__executable_start = SEGMENT_START("text-segment", ORIGIN(RAM)));
. = SEGMENT_START("text-segment", ORIGIN(RAM)) + SIZEOF_HEADERS;
PROVIDE(__stack_top = ORIGIN(RAM) + LENGTH(RAM));
We set __executable_start to begin at ORIGIN(RAM). Then we use the . command to move the linker output location SIZEOF_HEADERS bytes forward. And finally we set __stack_top = ORIGIN(RAM) + LENGTH(RAM).
Assuming the stack grows down towards ORIGIN(RAM), won't it eventually overwrite __executable_start and whatever SIZEOF_HEADERS is if the stack grows large enough?

Yes, if the stack grows large enough, it will eventually start overwriting memory it should not. But this is not specific to this linker script: memory is ultimately a finite resource, and any stack that grows too much, whether because overly large automatic variables are allocated or because of runaway recursion, will end up causing problems.

Related

Rust Multi-threading Memory Allocation on the RP Pico/RP2040

I'm working with the Raspberry PI Pico to perform the basic task of reading data from a UART signal, modifying it, and writing it back out to a different UART address. However, I need to simultaneously be constantly monitoring an on-board sensor and sending the values it generates as well.
I found a good example at cortexm-threads but it performs some stack allocation like this:
let mut stack1 = [0xDEADBEEF; 512];
let mut stack2 = [0xDEADBEEF; 512];
How do I know (or find out) what memory addresses I can allocate the stacks to on the RP2040/Pico?

In the example, 0xDEADBEEF is the initial per-cell value of the stack1 and stack2 arrays, and can be set to anything. Since those arrays are function-local and neither constant nor static, they will end up on the (main thread) stack.
Just make sure that the arrays are large enough for your use case; otherwise you risk a stack overflow (see How does a "stack overflow" occur and how do you prevent it?).
Regarding where those variables end up in the device memory space: cortex-m-rt sets the initial SP value to the end of the RAM region (0x20040000 on the Pico; RP2040 SRAM is located between 0x20000000 and 0x20040000, 256 kB in size). See https://github.com/rust-embedded/cortex-m/blob/657af97d66b7157d6a6e5704d86dd59b398e7108/cortex-m-rt/link.x.in#L63. Therefore, those variables will be located close to the end of SRAM. See also https://docs.rust-embedded.org/embedonomicon/memory-layout.html
Regarding the multicore use case, see also https://github.com/rp-rs/rp-hal/blob/427344667e9f24f03d132fa08e2dfaa709bc805d/rp2040-hal/src/multicore.rs.
You could also achieve similar functionality (but using only one core) with an interrupt-driven approach, where you store each incoming UART byte in a circular buffer, handle the on-board sensor read in a DMA or timer interrupt, and process the circular buffer contents (and possibly the sensor value) in the idle loop. For more information, see https://en.wikipedia.org/wiki/Circular_buffer and https://rtic.rs/.

Getting the percentage of used space and used inodes in a mount

I need to calculate the percentage of used space and used inodes for a mount path (e.g. /mnt/mycustommount) in Go.
This is my attempt:
var statFsOutput unix.Statfs_t
err := unix.Statfs(mount_path, &statFsOutput)
if err != nil {
return err
}
totalBlockCount := statFsOutput.Blocks // Total data blocks in filesystem
freeSystemBlockCount := statFsOutput.Bfree // Free blocks in filesystem
freeUserBlockCount := statFsOutput.Bavail // Free blocks available to unprivileged user
Now the proportion I need would be something like this:
x : 100 = (totalBlockCount - free[which?]BlockCount) : totalBlockCount
i.e. x : 100 = usedBlockCount : totalBlockCount. But I don't understand the difference between Bfree and Bavail (what's 'unprivileged' user got to do with filesystem blocks?).
For inodes my attempt:
totalInodeCount := statFsOutput.Files
freeInodeCount := statFsOutput.Ffree
// so now the proportion is
// x : 100 = (totalInodeCount - freeInodeCount) : totalInodeCount
How to get the percentage for used storage?
And is the inodes calculation I did correct?

Your comment expression isn't valid Go, so I can't really interpret it without guessing. With guessing, I interpret it as correct, but have I guessed what you actually mean, or merely what I think you mean? In other words, without showing actual code, I can only imagine what your final code will be. If the code I imagine isn't the actual code, the correctness of the code I imagine you will write is irrelevant.
That aside, I can answer your question here:
(what's 'unprivileged' user got to do with filesystem blocks?)
The Linux statfs call uses the same fields as 4.4BSD. The default 4.4BSD file system (the one called the "fast file system") uses a blocks-with-fragmentation approach to allocate blocks in a sort of stochastic manner. This allocation process works very well on an empty file system, and continues to work well, without extreme slowdown, on somewhat-full file systems. Computerized modeling of its behavior, however, showed pathological slowdowns (amounting to linear search, more or less) were possible if the block usage exceeded somewhere around 90%.
(Later, analysis of real file systems found that the slowdowns generally did not hit until the block usage exceeded 95%. But the idea of a 10% "reserve" was pretty well established by then.)
Hence, if a then-popular large-size disk drive of 400 MB [1] gave 10% for inodes and another 10% for reserved blocks, that meant that ordinary users could allocate about 320 MB of file data. At that point the drive was "100% full", but it could go to 111% by using up the remaining blocks. Those blocks were reserved to the super-user though.
These days, instead of a "super user", one can have a capability that can be granted or revoked. However, these days we don't use the same file systems either. So there may be no difference between bfree and bavail on your system.
[1] Yes, the 400 MB Fujitsu Eagle was a large (in multiple senses: it used a 19 inch rack mount setup) drive back then. People are spoiled today with their multi-terabyte SSDs. 😀

realloc function gives SIGABRT due to limited heap size

I am trying to reproduce a problem.
My C code raises SIGABRT; I traced it back to line 3174 of:
https://elixir.bootlin.com/glibc/glibc-2.27/source/malloc/malloc.c
/* Little security check which won't hurt performance: the allocator
never wrapps around at the end of the address space. Therefore
we can exclude some size values which might appear here by
accident or by "design" from some intruder. We need to bypass
this check for dumped fake mmap chunks from the old main arena
because the new malloc may provide additional alignment. */
if ((__builtin_expect ((uintptr_t) oldp > (uintptr_t) -oldsize, 0)
|| __builtin_expect (misaligned_chunk (oldp), 0))
&& !DUMPED_MAIN_ARENA_CHUNK (oldp))
malloc_printerr ("realloc(): invalid pointer");
My understanding is that calloc allocates the memory, and when I then call realloc to grow that area, heap is not available for some reason, which gives SIGABRT.
My other question is: how can I limit the heap area to some number of bytes, say 10 bytes, to replicate the problem? On Stack Overflow, RLIMIT and setrlimit are mentioned, but no sample code is given. Can you provide sample code where the heap size is 10 bytes?

How can I limit the heap area to some bytes say, 10 bytes
Can you provide sample code where heap size is 10 Bytes ?
From How to limit heap size for a c code in linux , you could do:
You could use (inside your program) setrlimit(2), probably with RLIMIT_AS (as cited by Ouah's answer).
#include <sys/resource.h>
int main() {
setrlimit(RLIMIT_AS, &(struct rlimit){10,10});
}
Better yet, make your shell do it. With bash it is the ulimit builtin (note that ulimit -v takes its value in units of 1024 bytes, not bytes).
$ ulimit -v 10
$ ./your_program.out
to replicate the problem
Most probably, limiting the heap size will produce a different problem, one caused by the limit itself. It is most likely unrelated to your SIGABRT and will not help you debug it. Instead, I would suggest looking into AddressSanitizer and Valgrind.

Linux `top` command: how much process memory is physically stored in swap space?

Let's say I run my program on a 64-bit Linux machine with 64 Gb of RAM. In my very small C program immediately after the start I do
void *p = sbrk(1024ull * 1024 * 1024 * 120);
thus moving my data segment break forward by 120 Gb.
After the above sbrk call top entry for my process shows RES at some low value, VIRT at 120g, and SWAP at 120g.
After this operation I write something into the first 90 Gb of the above region
memset(p, 0xAB, 1024ull * 1024 * 1024 * 90);
This causes some changes in the top entry for my process: VIRT expectedly remains at 120g, RES becomes almost 64g, SWAP drops to around 56g.
The common Swap stats in the header of top output show that swap file usage increases, which is expected since my program will have to push about 26 Gb of memory pages into the swap file.
So, according to the above observations, SWAP column simply reports my process's non-RES address space regardless of whether this address space has been "materialized", i.e. regardless of whether I already wrote something into that region of virtual memory.
But is there any way to figure out how much of that SWAP size has actually been "materialized" and backed up by something stored in the swap file? I.e. is there any way to make top display that 26 Gb value for my process?

The behavior depends on the version of procps you are using. For instance, in version 3.0.5 the SWAP value equals:
task->size - task->resident
and that is exactly what you are encountering. The top(1) man page says:
VIRT = SWAP + RES
Procps-ng, however, reads /proc/pid/status (the VmSwap field) and sets SWAP correctly:
https://gitlab.com/procps-ng/procps/blob/master/proc/readproc.c#L383
So you can either update procps or look at /proc/pid/status directly.

malloc/realloc/free capacity optimization

When you have a dynamically allocated buffer that varies its size at runtime in unpredictable ways (for example a vector or a string), one way to optimize its allocation is to only resize its backing store on powers of 2 (or some other set of boundaries/thresholds), and leave the extra space unused. This helps to amortize the cost of searching for new free memory and copying the data across, at the expense of a little extra memory use. For example, the interface specification (reserve vs resize vs trim) of many C++ STL containers has such a scheme in mind.
My question is: does the default implementation of the malloc/realloc/free memory manager on Linux 3.0 x86_64, GLIBC 2.13, GCC 4.6 (Ubuntu 11.10) have such an optimization?
void* p = malloc(N);
... // time passes, stuff happens
void* q = realloc(p,M);
Put another way, for what values of N and M (or in what other circumstances) will p == q?

From the realloc implementation in glibc trunk at http://sources.redhat.com/git/gitweb.cgi?p=glibc.git;a=blob;f=malloc/malloc.c;h=12d2211b0d6603ac27840d6f629071d1c78586fe;hb=HEAD
First, if the memory has been obtained via mmap() instead of sbrk(), which glibc malloc does for large requests, >= 128 kB by default IIRC:
if (chunk_is_mmapped(oldp))
{
void* newmem;
#if HAVE_MREMAP
newp = mremap_chunk(oldp, nb);
if(newp) return chunk2mem(newp);
#endif
/* Note the extra SIZE_SZ overhead. */
if(oldsize - SIZE_SZ >= nb) return oldmem; /* do nothing */
/* Must alloc, copy, free. */
newmem = public_mALLOc(bytes);
if (newmem == 0) return 0; /* propagate failure */
MALLOC_COPY(newmem, oldmem, oldsize - 2*SIZE_SZ);
munmap_chunk(oldp);
return newmem;
}
(Linux has mremap(), so in practice this is what is done).
For smaller requests, a few lines below we have
newp = _int_realloc(ar_ptr, oldp, oldsize, nb);
where _int_realloc is a bit big to copy-paste here, but you'll find it starting at line 4221 in the link above. AFAICS, it does NOT do the constant factor optimization increase that e.g. the C++ std::vector does, but rather allocates exactly the amount requested by the user (rounded up to the next chunk boundaries + alignment stuff and so on).
I suppose the idea is that if the user wants this factor of 2 size increase (or any other constant factor increase in order to guarantee logarithmic efficiency when resizing multiple times), then the user can implement it himself on top of the facility provided by the C library.

Perhaps you can use malloc_usable_size to find the answer experimentally. This function is a GNU extension declared in <malloc.h>, so you will need to check that it is available on your platform.
See also How to find how much space is allocated by a call to malloc()?
