I have a container started with docker-compose (file format version 2) which has a memory limit of 32 MB on it.
Whenever I run the container I can monitor the used resources like so:
docker stats 02bbab9ae853
It shows the following:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
02bbab9ae853 client-web_postgres-client-web_1_e4513764c3e7 0.07% 8.078MiB / 32MiB 25.24% 5.59MB / 4.4MB 135GB / 23.7MB 0
What looks really weird to me is the memory part:
8.078MiB / 32MiB 25.24%
If, outside the container, I list the Postgres PIDs, I get:
$ pgrep postgres
23051, 24744, 24745, 24746, 24747, 24748, 24749, 24753, 24761
If I stop the container and re-run the above command, I get no PIDs.
That is clear proof that all of those PIDs were created by the stopped container.
Now, if I re-run the container, take every PID, calculate its RSS memory usage and sum it all up with a Python method, I don't get the ~8 MB Docker is reporting but a much higher value, nowhere near it (~100 MB or so).
This is the python method I'm using to calculate the RSS memory:
from subprocess import check_output
import psutil

def get_process_memory(name):
    """Sum the RSS of every process whose name matches `name`."""
    total = 0.0
    try:
        for pid in map(int, check_output(["pgrep", name]).split()):
            total += psutil.Process(pid).memory_info().rss
    except Exception:
        pass
    return total
Does anybody know why the memory reported by Docker is so different?
This is of course a problem for me, because the applied memory limit doesn't seem to be respected.
I'm using a Raspberry Pi.
That's because Docker is reporting only the RSS value from the cgroup's memory.stat, but you actually need to sum up cache, rss and swap (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt). More info about that in https://sysrq.tech/posts/docker-misleading-containers-memory-usage/
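For example, here is a minimal Python sketch of that sum (assuming cgroup v1, and that it runs inside the container so the path below points at the container's own memory cgroup):

# Minimal sketch: sum cache + rss + swap from the cgroup v1 memory.stat file.
# Assumes cgroup v1 and that this runs inside the container; on the host you
# would point STAT_FILE at the container's cgroup directory instead.

STAT_FILE = "/sys/fs/cgroup/memory/memory.stat"

def container_memory_usage(stat_file=STAT_FILE):
    stats = {}
    with open(stat_file) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    # "swap" is only present when swap accounting is enabled
    return stats["cache"] + stats["rss"] + stats.get("swap", 0)  # bytes

print("%.1f MiB" % (container_memory_usage() / 1024.0 / 1024.0))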
Related
I am working on a per-process memory monitoring (Bash) script, but it turns out to be more of a headache than I thought, especially for forked processes such as PostgreSQL. There are a couple of reasons:
RSS is a potential value to be used as memory usage; however, it also contains shared libraries etc., which are counted in other processes as well
PSS is another potential value which (should) show only the private memory of a process. The problem here is that PSS can only be retrieved from /proc/<pid>/smaps, which requires elevated capability privileges (or root)
USS (calculated as Private_Dirty + Private_Clean, source: How does smem calculate RSS, USS and PSS?) could also be a potential candidate, but here again we need access to /proc/<pid>/smaps
For now I am trying to solve the forked process problem by looping through each PID's smaps (as suggested in https://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/), for example:
for pid in $(pgrep -a -f "postgres" | awk '{print $1}' | tr "\n" " " ); do grep "^Pss:" /proc/$pid/smaps; done
Maybe some of the postgres processes should be excluded, I am not sure.
Using this method to calculate and sum the PSS and USS values results in:
PSS: 4817 MB - USS: 4547 MB - RES: 6176 MB - VIRT: 26851 MB used
Obviously this only works with elevated privileges, which I would prefer to avoid. Whether these values actually represent the truth is unknown, because other tools/commands show yet again different values.
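For reference, a rough Python equivalent of the loop above, extended to actually sum the values (just a sketch; it still needs root to read smaps, and the pgrep pattern is the same one from the loop):

from subprocess import check_output

# Sum Pss and USS (Private_Clean + Private_Dirty) over all matching PIDs.
# Reading /proc/<pid>/smaps still requires root (or equivalent privileges).
def smaps_totals(pattern="postgres"):
    pss_kb = uss_kb = 0
    for pid in check_output(["pgrep", "-f", pattern]).split():
        try:
            with open("/proc/%s/smaps" % pid.decode()) as smaps:
                for line in smaps:
                    parts = line.split()
                    if not parts:
                        continue
                    if parts[0] == "Pss:":
                        pss_kb += int(parts[1])
                    elif parts[0] in ("Private_Clean:", "Private_Dirty:"):
                        uss_kb += int(parts[1])
        except (IOError, OSError):
            pass  # process exited or access denied
    return pss_kb, uss_kb

pss_kb, uss_kb = smaps_totals()
print("PSS: %d MB - USS: %d MB" % (pss_kb // 1024, uss_kb // 1024))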
Unfortunately top and htop are unable to combine the postgres processes. atop is able to do this and seems (from a gut feeling) to be the most accurate, with the following values:
NPROCS SYSCPU USRCPU VSIZE RSIZE PSIZE SWAPSZ RDDSK WRDSK RNET SNET MEM CMD
27 56m50s 16m40s 5.4G 1.1G 0K 2308K 0K 0K 0 0 11% postgres
Now to the question: What is the suggested and best way to retrieve the most accurate memory usage of an application with forked processes, such as PostgreSQL?
And in case atop already does an accurate calculation, how does atop get to the RSIZE value? Note that this value is shown both as root and as a non-root user, which would probably mean that /proc/<pid>/smaps is not used for the calculation.
Please comment if more information is needed.
EDIT: I actually found a bug in the pgrep pattern in my final script, which falsely matched a lot more than just the postgres processes.
The new output now shows the same RES value as seen in atop's RSIZE:
Script output:
PSS: 205 MB - USS: 60 MB - RES: 1162 MB - VIRT: 5506 MB
atop summarized postgres output:
NPROCS SYSCPU USRCPU VSIZE RSIZE PSIZE SWAPSZ RDDSK WRDSK RNET SNET MEM CMD
27 0.04s 0.10s 5.4G 1.1G 0K 2308K 0K 32K 0 0 11% postgres
But of course the question remains, unless summing the RSS (RES) values really is the most accurate way. Let me know your thoughts, thanks :)
I have consumer applications which read (no writes) a database of size ~4 GiB and perform some tasks. To make sure the same database is not duplicated across applications, I've stored it on every node machine of the k8s cluster.
DaemonSet
I've used a DaemonSet which uses a "hostPath" volume. The DaemonSet pod extracts the database onto each node machine (/var/lib/DATABASE).
For the health check of the DaemonSet pod, I've written a shell script which checks the modification time of the database file (using the date command).
The database extraction requires approximately 300 MiB of memory, and 50 MiB is more than sufficient for the health check. Hence I've set the memory request to 100 MiB and the memory limit to 1.5 GiB.
When I run the DaemonSet, I observe that memory usage is high (~300 MiB) for the first 10 seconds (while the database is extracted) and after that it goes down to ~30 MiB. The DaemonSet works fine, as per my expectation.
Consumer Application
Now, the consumer application pods (written in Go) use the same "hostPath" volume (/var/lib/DATABASE) and read the database from that location. The consumer applications do not perform any write operations on the /var/lib/DATABASE directory.
However, when I deploy this consumer application on k8s, I see a huge increase in the memory usage of the DaemonSet pod, from 30 MiB to 1.5 GiB. The memory usage of the DaemonSet pods becomes almost the same as the memory limit.
I am not able to understand this behaviour: why is the consumer application causing memory usage in the DaemonSet pod?
Any help/suggestions/troubleshooting steps would be of great help!
Note: I'm using the "kubectl top" command to measure the memory (working-set-bytes).
I've found this link (Kubernetes: in-memory shared cache between pods), which says:
hostPath by itself poses a security risk, and when used, should be scoped to only the required file or directory, and mounted as ReadOnly. It also comes with the caveat of not knowing who will get "charged" for the memory, so every pod has to be provisioned to be able to absorb it, depending how it is written. It also might "leak" up to the root namespace and be charged to nobody but appear as "overhead"
However, I did not find any reference to this in the official k8s documentation. It would be helpful if someone could elaborate on it.
Following are the contents of the memory.stat file from the DaemonSet pod.
cat /sys/fs/cgroup/memory/memory.stat
cache 1562779648
rss 1916928
rss_huge 0
shmem 0
mapped_file 0
dirty 0
writeback 0
swap 0
pgpgin 96346371
pgpgout 95965640
pgfault 224070825
pgmajfault 0
inactive_anon 0
active_anon 581632
inactive_file 37675008
active_file 1522688000
unevictable 0
hierarchical_memory_limit 1610612736
hierarchical_memsw_limit 1610612736
total_cache 1562779648
total_rss 1916928
total_rss_huge 0
total_shmem 0
total_mapped_file 0
total_dirty 0
total_writeback 0
total_swap 0
total_pgpgin 96346371
total_pgpgout 95965640
total_pgfault 224070825
total_pgmajfault 0
total_inactive_anon 0
total_active_anon 581632
total_inactive_file 37675008
total_active_file 1522688000
total_unevictable 0
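My assumption is that the working-set-bytes value reported by kubectl top is roughly usage_in_bytes minus total_inactive_file, which would mean the large active_file page cache above (from reading the database) is what gets charged to the DaemonSet pod. A quick sketch of that calculation (the formula is my assumption, not taken from the k8s docs):

# Sketch: approximate the pod's working-set value from cgroup v1 files.
# Assumption: working set ~= usage_in_bytes - total_inactive_file, so the
# active_file page cache from reading the database counts against the limit.

CGROUP = "/sys/fs/cgroup/memory"

def read_stat(path):
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats

with open(CGROUP + "/memory.usage_in_bytes") as f:
    usage = int(f.read())

stat = read_stat(CGROUP + "/memory.stat")
working_set = max(0, usage - stat["total_inactive_file"])
print("working set: %.1f MiB" % (working_set / 1024.0 / 1024.0))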
I'm running a node.js application from within a Docker container. I'm trying to retrieve system usage metrics of the container the node.js application is running inside of. Right now I'm using https://www.npmjs.com/package/dockerstats but it consistently shows no CPU or memory usage, while running docker stats shows usage for both.
My code resembles the following:
let dockerId = setUp.getDockerId();
dockerId.then(dockerId => {
  if (dockerId !== null) {
    console.log(`dockerId: ${dockerId}`);
    dockerstats.dockerContainerStats(dockerId, data => {
      console.log(`cpu_percent: ${data.cpu_percent}`);
      console.log(`memPercent: ${data.memPercent}`);
      console.log(`memUsage: ${data.memUsage}`);
    });
  }
});
The setUp class resembles the following and uses https://www.npmjs.com/package/docker-container-id:
const getId = require('docker-container-id');

module.exports = class setUp {
  getDockerId () {
    return getId().then(id => {
      if (!id) {
        return null;
      }
      return id;
    });
  }
}
As you said, you are using the docker-container-id package to obtain the container ID. This package works by inspecting the /proc/self/cgroup file, so it works only from inside the container (i.e. only when getContainerId() is executed from the containerized process). That said, from here on I will assume that you are trying to obtain the metrics from inside the container where your application runs (you did not mention this explicitly).
The problem here is that, as stated in the dockerstats package description, this package uses the Docker API and, as per the package source, the client connects to the Docker socket (/var/run/docker.sock), which is not available inside the container by default. The easy (but dangerous) way to work around this is to mount the host's /var/run/docker.sock into the container by using the following option when starting the container:
-v /var/run/docker.sock:/var/run/docker.sock
E.g.
docker run -v /var/run/docker.sock:/var/run/docker.sock $MY_IMAGE_NAME
However, this is STRONGLY DISCOURAGED, as it creates a serious security risk. Never do this in production. By doing so, you are allowing your container to control Docker, which is essentially the same as giving the container root access to the host system.
But you actually don't need to use the Docker API to access resource consumption metrics. The point is that you may directly read the information from the process's cpuacct and memory control groups (which are responsible for tracking and limiting the CPU and memory consumption, respectively) under /sys/fs/cgroup. For example, reading the /sys/fs/cgroup/memory/memory.usage_in_bytes file will give you the amount of memory used by your container (in bytes):
# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
164823040
And reading the /sys/fs/cgroup/cpuacct/cpuacct.usage file will give you a total CPU usage of your container (in nanoseconds):
# cat /sys/fs/cgroup/cpuacct/cpuacct.usage
2166331144
So, you can read these metrics from your application and process them. You may also use statistics from procfs; refer to this discussion for details.
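As an illustration, here is a minimal sketch of reading both values (in Python for brevity; in your Node.js application the equivalent would be reading the same files with fs.readFile or fs.promises.readFile), assuming the cgroup v1 paths shown above:

# Sketch: read container resource usage straight from the cgroup v1 files.
# Assumes it runs inside the container (same paths as the cat examples above).

def read_int(path):
    with open(path) as f:
        return int(f.read())

mem_bytes = read_int("/sys/fs/cgroup/memory/memory.usage_in_bytes")
cpu_nanos = read_int("/sys/fs/cgroup/cpuacct/cpuacct.usage")

print("memory used: %.1f MiB" % (mem_bytes / 1024.0 / 1024.0))
print("cpu time:    %.3f s" % (cpu_nanos / 1e9))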
As I do not have enough reputation to comment, I would like to complement the answer from @danila-kiver with a quick way to check the memory usage in megabytes:
cat /sys/fs/cgroup/memory/memory.usage_in_bytes | awk '{ byte =$1 /1024/1024; print byte " MB" }'
Or in Gigabytes:
cat /sys/fs/cgroup/memory/memory.usage_in_bytes | awk '{ byte =$1 /1024/1024/1024; print byte " GB" }'
For anyone in need.
In Linux, you can read the value of /proc/sys/fs/aio-nr, which returns the total number of events allocated across all active AIO contexts in the system. The maximum is controlled by /proc/sys/fs/aio-max-nr.
Is there a way to tell which process is responsible for allocating these AIO contexts?
There isn't a simple way. At least, not that I've ever found! However, you can see them being consumed and freed using systemtap.
https://blog.pythian.com/troubleshooting-ora-27090-async-io-errors/
Attempting to execute the complete script in that article produced errors on my CentOS 7 system. But if you just take the first part of it, the part that logs allocations, it may give you enough insight:
stap -ve '
  global allocated, allocatedctx
  probe syscall.io_setup {
    allocatedctx[pid()] += maxevents; allocated[pid()]++;
    printf("%d AIO events requested by PID %d (%s)\n",
           maxevents, pid(), cmdline_str());
  }
'
You'll need to coordinate things such that systemtap is running before your workload kicks in.
Install systemtap, then execute the above command. (Note, I've altered this slightly from the linked article to remove the unused freed symbol.) After a few seconds, it'll be running. Then, start your workload.
Pass 1: parsed user script and 469 library scripts using 227564virt/43820res/6460shr/37524data kb, in 260usr/10sys/263real ms.
Pass 2: analyzed script: 5 probes, 14 functions, 101 embeds, 4 globals using 232632virt/51468res/11140shr/40492data kb, in 80usr/150sys/240real ms.
Missing separate debuginfos, use: debuginfo-install kernel-lt-4.4.70-1.el7.elrepo.x86_64
Pass 3: using cached /root/.systemtap/cache/55/stap_5528efa47c2ab60ad2da410ce58a86fc_66261.c
Pass 4: using cached /root/.systemtap/cache/55/stap_5528efa47c2ab60ad2da410ce58a86fc_66261.ko
Pass 5: starting run.
Then, once your workload starts, you'll see the context requests logged:
128 AIO events requested by PID 28716 (/Users/blah/awesomeprog)
128 AIO events requested by PID 28716 (/Users/blah/awesomeprog)
So, not as simple as lsof, but I think it's all we have!
I am having issues with a large query, which I suspect are due to wrong settings in my postgresql.conf. My setup is PostgreSQL 9.6 on Ubuntu 17.10 with 32 GB RAM and a 3 TB HDD. The query runs pgr_dijkstraCost to create an OD matrix of ~10,000 points in a network of 25,000 links. The resulting table is thus expected to be very big (~100,000,000 rows with columns from, to, costs). However, creating a simple test such as select x,1 as c2,2 as c3 from generate_series(1,90000000) succeeds.
The query plan:
QUERY PLAN
--------------------------------------------------------------------------------------
Function Scan on pgr_dijkstracost (cost=393.90..403.90 rows=1000 width=24)
InitPlan 1 (returns $0)
-> Aggregate (cost=196.82..196.83 rows=1 width=32)
-> Seq Scan on building_nodes b (cost=0.00..166.85 rows=11985 width=4)
InitPlan 2 (returns $1)
-> Aggregate (cost=196.82..196.83 rows=1 width=32)
-> Seq Scan on building_nodes b_1 (cost=0.00..166.85 rows=11985 width=4)
This leads to a crash of PostgreSQL:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
normally and possibly corrupted shared memory.
Running dmesg, I could trace it down to an out-of-memory issue:
Out of memory: Kill process 5630 (postgres) score 949 or sacrifice child
[ 5322.821084] Killed process 5630 (postgres) total-vm:36365660kB,anon-rss:32344260kB, file-rss:0kB, shmem-rss:0kB
[ 5323.615761] oom_reaper: reaped process 5630 (postgres), now anon-rss:0kB,file-rss:0kB, shmem-rss:0kB
[11741.155949] postgres invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
[11741.155953] postgres cpuset=/ mems_allowed=0
When running the query, I can also observe with top that my RAM goes down to 0 before the crash. The amount of committed memory just before the crash:
$ grep Commit /proc/meminfo
CommitLimit: 18574304 kB
Committed_AS: 42114856 kB
I would expect the HDD to be used to write/buffer temporary data when RAM is not enough, but the available space on my HDD does not change during the processing. So I began to dig for missing configuration (expecting issues due to my relocated data directory), following different sites:
https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
https://www.credativ.com/credativ-blog/2010/03/postgresql-and-linux-memory-management
My original settings in postgresql.conf are the defaults, except for the changed data directory:
data_directory = '/hdd_data/postgresql/9.6/main'
shared_buffers = 128MB # min 128kB
#huge_pages = try # on, off, or try
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
#work_mem = 4MB # min 64kB
#maintenance_work_mem = 64MB # min 1MB
#replacement_sort_tuples = 150000 # limits use of replacement selection sort
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB # min 100kB
dynamic_shared_memory_type = posix # the default is the first option
I changed the config:
shared_buffers = 128MB
work_mem = 40MB # min 64kB
maintenance_work_mem = 64MB
I reloaded the configuration with sudo service postgresql reload and tested the same query, but found no change in behavior. Does this simply mean that such a large query cannot be done? Any help appreciated.
I'm having similar trouble, but not with PostgreSQL (which is running happily): what is happening is simply that the kernel cannot allocate more RAM to the process, whichever process it is.
It would certainly help to add some swap to your configuration.
To check how much RAM and swap you have, run: free -h
On my machine, here is what it returns:
total used free shared buff/cache available
Mem: 7.7Gi 5.3Gi 928Mi 865Mi 1.5Gi 1.3Gi
Swap: 9.4Gi 7.1Gi 2.2Gi
You can clearly see that my machine is quite overloaded: about 8 GB of RAM and 9 GB of swap, of which 7 GB are used.
When the RAM-hungry process got killed with an out-of-memory error, I saw both RAM and swap being used at 100%.
So, allocating more swap may alleviate the problem.