How can I reduce the virtual memory required by a gccgo-compiled executable? - linux

When I compile this simple hello world example using gccgo, the resulting executable uses over 800 MiB of VmData. I would like to know why, and if there is anything I can do to lower that. The sleep is just to give me time to observe the memory usage.
The source:
package main

import (
    "fmt"
    "time"
)

func main() {
    fmt.Println("hello world")
    time.Sleep(1000000000 * 5) // 5 seconds, expressed in nanoseconds
}
The script I use to compile:
#!/bin/bash
TOOLCHAIN_PREFIX=i686-linux-gnu
OPTIMIZATION_FLAG="-O3"
CGO_ENABLED=1 \
CC=${TOOLCHAIN_PREFIX}-gcc-8 \
CXX=${TOOLCHAIN_PREFIX}-g++-8 \
AR=${TOOLCHAIN_PREFIX}-ar \
GCCGO=${TOOLCHAIN_PREFIX}-gccgo-8 \
CGO_CFLAGS="-g ${OPTIMIZATION_FLAG}" \
CGO_CPPFLAGS="" \
CGO_CXXFLAGS="-g ${OPTIMIZATION_FLAG}" \
CGO_FFLAGS="-g ${OPTIMIZATION_FLAG}" \
CGO_LDFLAGS="-g ${OPTIMIZATION_FLAG}" \
GOOS=linux \
GOARCH=386 \
go build -x \
-compiler=gccgo \
-gccgoflags=all="-static -g ${OPTIMIZATION_FLAG}" \
"$1"
The version of gccgo:
$ i686-linux-gnu-gccgo-8 --version
i686-linux-gnu-gccgo-8 (Ubuntu 8.2.0-1ubuntu2~18.04) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The output from /proc/<pid>/status:
VmPeak: 811692 kB
VmSize: 811692 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 5796 kB
VmRSS: 5796 kB
VmData: 807196 kB
VmStk: 132 kB
VmExe: 2936 kB
VmLib: 0 kB
VmPTE: 52 kB
VmPMD: 0 kB
VmSwap: 0 kB
I ask because my device only has 512 MiB of RAM. I know that this is virtual memory but I would like to reduce or remove the overcommit if possible. It does not seem reasonable to me for a simple executable to require that much allocation.

I was able to locate where gccgo is asking for so much memory. It's in the libgo/go/runtime/malloc.go file in the mallocinit function:
// If we fail to allocate, try again with a smaller arena.
// This is necessary on Android L where we share a process
// with ART, which reserves virtual memory aggressively.
// In the worst case, fall back to a 0-sized initial arena,
// in the hope that subsequent reservations will succeed.
arenaSizes := [...]uintptr{
    512 << 20,
    256 << 20,
    128 << 20,
    0,
}

for _, arenaSize := range &arenaSizes {
    // SysReserve treats the address we ask for, end, as a hint,
    // not as an absolute requirement. If we ask for the end
    // of the data segment but the operating system requires
    // a little more space before we can start allocating, it will
    // give out a slightly higher pointer. Except QEMU, which
    // is buggy, as usual: it won't adjust the pointer upward.
    // So adjust it upward a little bit ourselves: 1/4 MB to get
    // away from the running binary image and then round up
    // to a MB boundary.
    p = round(getEnd()+(1<<18), 1<<20)
    pSize = bitmapSize + spansSize + arenaSize + _PageSize
    if p <= procBrk && procBrk < p+pSize {
        // Move the start above the brk,
        // leaving some room for future brk
        // expansion.
        p = round(procBrk+(1<<20), 1<<20)
    }
    p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
    if p != 0 {
        break
    }
}
if p == 0 {
    throw("runtime: cannot reserve arena virtual address space")
}
The interesting part is that it falls back to smaller arena sizes if the larger ones fail. So limiting the virtual memory available to a Go executable will actually limit how much it successfully allocates.
I was able to use ulimit -v 327680 to limit the virtual memory to a smaller number:
VmPeak: 300772 kB
VmSize: 300772 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 5712 kB
VmRSS: 5712 kB
VmData: 296276 kB
VmStk: 132 kB
VmExe: 2936 kB
VmLib: 0 kB
VmPTE: 56 kB
VmPMD: 0 kB
VmSwap: 0 kB
These are still large numbers, but they are about the best a gccgo-compiled executable can achieve. So the answer to the question is: yes, you can reduce the VmData of a gccgo-compiled executable, but you really shouldn't worry about it. (On a 64-bit machine gccgo tries to reserve 512 GB.)

The likely cause is that you are linking libraries into the code. My guess is that you would get a smaller logical address space if you explicitly linked against static libraries, so that only the minimum is added to your executable. In any event, there is minimal harm in having a large logical address space.

Related

While running, is it possible to display the currently allocated buffers managed by LeakSanitizer?

I have a very large program (okay, only 13,000 lines of code according to cloc) which leaks. I know because over time, it uses more and more resident memory.
I have the sanitizer option turned on, but on a clean exit my C++ software properly cleans everything up as expected, so I don't see anything growing in the sanitizer output.
What would be useful in this case is a way to call a function which displays the (large) list of allocated buffers while the code is running. I could then diff two such outputs and see what was newly allocated. The leaked buffers would be in there...
At this point, though, I just don't see any header with sanitizer functions I could call to see such a list. Does it exist?
The LSan interface is available in sanitizer/lsan_interface.h, but AFAIK it has no API to print allocation info. The best you can do is compile your code with ASan (which includes LSan as well) and use __asan_print_accumulated_stats to get basic allocation statistics:
$ cat tmp.c
#include <sanitizer/asan_interface.h>
#include <stdlib.h>

int main() {
    malloc(100);
    __asan_print_accumulated_stats();
    return 0;
}
$ gcc -fsanitize=address -g tmp.c && ./a.out
Stats: 0M malloced (0M for red zones) by 2 calls
Stats: 0M realloced by 0 calls
Stats: 0M freed by 0 calls
Stats: 0M really freed by 0 calls
Stats: 0M (0M-0M) mmaped; 5 maps, 0 unmaps
mallocs by size class: 7:1; 11:1;
Stats: malloc large: 0
Stats: StackDepot: 2 ids; 0M allocated
Stats: SizeClassAllocator64: 0M mapped in 256 allocations; remains 256
07 (112): mapped: 64K allocs: 128 frees: 0 inuse: 128 num_freed_chunks 457 avail: 585 rss: 4K releases: 0
11 (176): mapped: 64K allocs: 128 frees: 0 inuse: 128 num_freed_chunks 244 avail: 372 rss: 4K releases: 0
Stats: LargeMmapAllocator: allocated 0 times, remains 0 (0 K) max 0 M; by size logs:
=================================================================
==15060==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 100 byte(s) in 1 object(s) allocated from:
#0 0x7fdf2194fb40 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdeb40)
#1 0x559ca08a7857 in main /home/yugr/tmp.c:5
#2 0x7fdf214a1bf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
SUMMARY: AddressSanitizer: 100 byte(s) leaked in 1 allocation(s).
Unfortunately there is no way to print exact allocations.

Memory still used after process exits

I have a question about a weird occurrence. I am on a server with 1 GB of RAM, running only a single lightweight application. After a while its memory usage increases (probably a memory leak I have not found yet), but that's another story. The problem is: when it uses too much memory I exit the process, and to my surprise the memory usage is still high. How can that be? The memory is not cached, it is "raw" memory being used, but htop does not seem to know which process is using it either...
Here I attach the image so you can see it; it is sorted by memory usage, descending.
I don't understand how memory usage can be 751 MB when the process using the most memory is only at 1.8%.
I have read some solutions such as disabling swap, but swap is already disabled, as seen in the image.
Update 1
Here I attach the meminfo:
MemTotal: 1004852 kB
MemFree: 108456 kB
MemAvailable: 97392 kB
Buffers: 2768 kB
Cached: 31868 kB
SwapCached: 0 kB
Active: 503268 kB
Inactive: 15100 kB
Active(anon): 491920 kB
Inactive(anon): 2852 kB
Active(file): 11348 kB
Inactive(file): 12248 kB
Unevictable: 18516 kB
Mlocked: 18516 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 20 kB
Writeback: 0 kB
AnonPages: 502288 kB
Mapped: 28200 kB
Shmem: 2856 kB
KReclaimable: 96860 kB
Slab: 308536 kB
SReclaimable: 96860 kB
SUnreclaim: 211676 kB
KernelStack: 5676 kB
PageTables: 11836 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 502424 kB
Committed_AS: 808208 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 12656 kB
VmallocChunk: 0 kB
Percpu: 1772 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 501740 kB
DirectMap2M: 546816 kB
There is an interesting solution about freeing cached memory at https://unix.stackexchange.com/questions/524846/why-does-my-system-use-more-ram-after-an-hour-of-usage, but it does not work here; the problem is not the cached memory...
Update 2
Someone I know told me it was weird to have that many sshd processes. I tried killing all those root@notty sshd processes and it worked. Problem solved. I leave this question here hoping it may be useful to others.
To simplify the process, I used the following Go code:
package main

import (
    "fmt"
    "strings"

    "github.com/shirou/gopsutil/v3/process"
)

func main() {
    processes, err := process.Processes()
    if err != nil {
        panic(err)
    }
    sshCount := 0
    for i := 0; i < len(processes); i++ {
        currentProcess := processes[i]
        name, err := currentProcess.Name()
        if err != nil {
            fmt.Printf("Failed to retrieve the name of process %d\n", currentProcess.Pid)
            continue
        }
        // username, err := currentProcess.Username()
        cmdLine, err := currentProcess.Cmdline()
        if err != nil {
            continue
        }
        // cwd, _ := currentProcess.Cwd()
        // terminalStr, _ := currentProcess.Terminal()
        if strings.Contains(name, "sshd") && cmdLine == "sshd: root@notty" {
            sshCount++
            err = currentProcess.Terminate()
            if err != nil {
                fmt.Printf("Error on terminating %d %s\n", currentProcess.Pid, name)
                err = currentProcess.Kill()
                if err != nil {
                    fmt.Printf("Error on killing %d %s\n", currentProcess.Pid, name)
                } else {
                    fmt.Printf("Killed %d %s\n", currentProcess.Pid, name)
                }
            } else {
                fmt.Printf("Terminated successfully %d %s\n", currentProcess.Pid, name)
            }
        }
    }
    fmt.Printf("We got %d sshd processes\n", sshCount)
}

Reservation of hugepages on a Linux system

My VMWare guest system details:
Linux 2.6.32-358.el6.x86_64 (RH 6.4 - Santiago)
# cat /proc/meminfo
MemTotal: 8058796 kB
MemFree: 5145692 kB
Buffers: 32320 kB
Cached: 291312 kB
SwapCached: 0 kB
Active: 1524652 kB
Inactive: 192444 kB
Active(anon): 1393628 kB
Inactive(anon): 1196 kB
Active(file): 131024 kB
Inactive(file): 191248 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4063224 kB
SwapFree: 4063224 kB
Dirty: 144 kB
Writeback: 0 kB
AnonPages: 1393488 kB
Mapped: 47288 kB
Shmem: 1364 kB
Slab: 52080 kB
SReclaimable: 18572 kB
SUnreclaim: 33508 kB
KernelStack: 3776 kB
PageTables: 15864 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 4063224 kB
Committed_AS: 3101408 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 160248 kB
VmallocChunk: 34359572656 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 240
HugePages_Rsvd: 240
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10240 kB
DirectMap2M: 8378368 kB
grub contents:
transparent_hugepage=never default_hugepagesz=2M hugepagesz=2M hugepages=512
# sysctl -a | grep vm
vm.overcommit_memory = 0
vm.panic_on_oom = 0
vm.oom_kill_allocating_task = 0
vm.extfrag_threshold = 500
vm.oom_dump_tasks = 1
vm.would_have_oomkilled = 0
vm.overcommit_ratio = 0
vm.page-cluster = 3
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.nr_pdflush_threads = 0
vm.swappiness = 60
vm.nr_hugepages = 512
vm.nr_hugepages_mempolicy = 512
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.drop_caches = 0
vm.min_free_kbytes = 2048
vm.extra_free_kbytes = 0
vm.percpu_pagelist_fraction = 0
vm.max_map_count = 65530
vm.laptop_mode = 0
vm.block_dump = 0
vm.vfs_cache_pressure = 100
vm.legacy_va_layout = 0
vm.zone_reclaim_mode = 0
vm.min_unmapped_ratio = 1
vm.min_slab_ratio = 5
vm.stat_interval = 1
vm.mmap_min_addr = 4096
vm.numa_zonelist_order = default
vm.scan_unevictable_pages = 0
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
My application needs to grab as many huge pages as it can when it starts, but HugePages_Free equals HugePages_Rsvd, which means all of the free huge pages are already reserved and my application cannot get any.
What is the reason for this, and how do I disable the reservation of hugepages by other applications, if there are any?
Thanks
What is the reason for this, and how do I disable the reservation of hugepages by other applications, if there are any?
The reason for this is that Linux uses huge pages for allocations and memory mappings. Please have a look at mmap(2) (the MAP_HUGETLB flag), madvise(2) (the MADV_HUGEPAGE flag), and transparent huge pages:
https://www.kernel.org/doc/Documentation/vm/transhuge.txt
So basically, the reason is that any application running on your system might ask the system to use huge pages, or Linux itself might place your allocations in huge pages transparently.
To disable all those implicit usages, you might look into /sys/kernel/mm/transparent_hugepage/enabled.
On the other hand, you might consider allocating huge pages right before your application starts. Run the system without any huge pages, then configure them with:
echo 512 > /proc/sys/vm/nr_hugepages
And run your application immediately. This will increase the chance that the huge pages will be available solely for your application.

How to Get Free Swap Memory for Matrix Computation in Linux Matlab?

Situation: estimating whether you can compute with a big matrix given your RAM and swap in Linux Matlab.
I need the sum of Mem and Swap, i.e. the corresponding values reported by free -m under the heading "total" in Linux:
total used free shared buff/cache available
Mem: 7925 3114 3646 308 1164 4220
Swap: 28610 32 28578
Free Ram memory in Matlab by
% http://stackoverflow.com/a/12350678/54964
[r,w] = unix('free | grep Mem');
stats = str2double(regexp(w, '[0-9]*', 'match'));
memsize = stats(1)/1e6;
freeRamMem = (stats(3)+stats(end))/1e6;
Free Swap memory in Matlab: ...
Relation between Memory requirement and Matrix size of Matlab: ...
Testing Suever's 2nd iteration
Suever's command gives me 29.2 GB, which corresponds to free's output, so it is correct:
$ free
total used free shared buff/cache available
Mem: 8115460 4445520 1956672 350692 1713268 3024604
Swap: 29297656 33028 29264628
System: Linux Ubuntu 16.04 64 bit
Linux kernel: 4.6
Linux kernel options: wl, zswap
Matlab: 2016a
Hardware: Macbook Air 2013-mid
Ram: 8 GB
Swap: 28 GB on SSD (set up as in the thread How to Allocate More Space to Swap and Increase its Size Greater than Ram?)
SSD: 128 GB
You can just make a slight modification to the code that you've posted to get the swap amount.
function freeMem = freeMemory(type)
    [r, w] = unix(['free | grep ', type]);
    stats = str2double(regexp(w, '[0-9]*', 'match'));
    memsize = stats(1)/1e6;
    if numel(stats) > 3
        freeMem = (stats(3)+stats(end))/1e6;
    else
        freeMem = stats(3)/1e6;
    end
end

totalFree = freeMemory('Mem') + freeMemory('Swap')
To figure out how much memory a matrix takes up, multiply the number of elements by the size of the datatype as a first approximation; for example, a 10000-by-10000 double matrix needs 10000 * 10000 * 8 bytes, roughly 0.8 GB.
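Outside Matlab, the same total (free RAM plus free swap) can be read directly from /proc/meminfo rather than parsing free's text output. A Linux-only sketch in Go, using the standard MemAvailable and SwapFree fields:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// meminfo parses /proc/meminfo into a map of field name -> value in kB.
func meminfo() (map[string]int64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return nil, err
	}
	defer f.Close()
	m := make(map[string]int64)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		key := strings.TrimSuffix(fields[0], ":")
		if v, err := strconv.ParseInt(fields[1], 10, 64); err == nil {
			m[key] = v
		}
	}
	return m, sc.Err()
}

func main() {
	m, err := meminfo()
	if err != nil {
		panic(err)
	}
	// Divide kB by 1e6 to get GB, matching the /1e6 convention
	// used in the Matlab snippet above.
	totalFreeGB := float64(m["MemAvailable"]+m["SwapFree"]) / 1e6
	fmt.Printf("available RAM+swap: %.1f GB\n", totalFreeGB)
}
```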

(linux) Why does a non-dynamic struct array use RSS memory when memset is used?

Here is my code:
struct test oops[4][2][3][40960]; // global struct array (maybe .data section)
...
{
    ...
    //memset(oops, 0, sizeof(struct test) * 40960 * 3 * 2 * 4);
    ...
}
I have a question. When I use memset:
cat /proc/PID/smaps
...
Size: 756480 kB
Rss: 721208 kB
Pss: 721208 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 721208 kB
Referenced: 361252 kB
Anonymous: 721208 kB
AnonHugePages: 6144 kB
Swap: 35272 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
...
But when I do not use memset:
Size: 756480 kB
Rss: 2048 kB
Pss: 2048 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 2048 kB
Referenced: 2048 kB
Anonymous: 2048 kB
AnonHugePages: 2048 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Why does memset use RSS?
And what do Pss, Private_Dirty, Referenced, and Anonymous mean?
A global struct may already be set to '0', but I want explicit initialization.
Help me, thanks.
Without memset, you never use the memory referenced by "oops".
The Linux kernel does not reserve memory pages for you until you access the memory for the first time.
Size is the whole "accessible" memory size of your program.
RSS is the memory of your program that really exists in RAM (this doesn't mean the sum of all RSS values equals the total memory used, because RSS also counts shared memory).
memset touches every byte in your array, which forces the kernel to reserve the memory for you and count it in RSS.
memset doesn't reserve more memory; it just uses it for the first time.
Additional info: Getting information about a process' memory usage from /proc/pid/smaps
