How can I get the PPID of an OOM-killed process? - linux

I'm using Docker, and there are more than ten containers running on my machine. I found some OOM log entries in /var/log/messages, but I can't figure out which container the killed processes belong to.
The /var/log/messages entry looks like this:
kernel: Out of memory: Kill process 165480 (java) score 987 or sacrifice child
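One possible way to tie an OOM kill to a container, sketched under assumptions: the log line quoted above carries no container information, but kernel logs around an OOM kill often include a cgroup path (for example the task_memcg= field in the newer log format quoted in the related question further down). Assuming Docker's cgroup path embeds the full 64-character container ID, as in /docker/<id>, a small helper can pull that ID out. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: a full Docker container ID is 64 lowercase hex characters
# and appears somewhere in the cgroup path, e.g. "/docker/<id>".
_CONTAINER_ID = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def container_id_from_cgroup_path(path: str) -> Optional[str]:
    """Return the 64-hex-digit ID embedded in a cgroup path, or None."""
    match = _CONTAINER_ID.search(path)
    return match.group(0) if match else None
```

The returned ID can then be compared with the output of docker ps --no-trunc. For a process that is still alive, the same helper could be applied to each line of /proc/<pid>/cgroup.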

Related

memcheck-amd64- killed by OOM

I'm using Valgrind to track down a segmentation fault in my code, but when execution reaches the point of the segmentation fault, my process is killed.
Searching the /var/log/syslog file, I can see that the process memcheck-amd64- (valgrind?) has been killed:
Sep 7 12:48:34 fabiano-HP kernel: [10154.654505] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-c2.scope,task=memcheck-amd64-,pid=3688,uid=1000
Sep 7 12:48:34 fabiano-HP kernel: [10154.654560] Out of memory: Killed process 3688 (memcheck-amd64-) total-vm:11539708kB, anon-rss:6503332kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:12952kB oom_score_adj:0
Sep 7 12:46:26 fabiano-HP org.freedesktop.thumbnails.Cache1[3661]: message repeated 3 times: [ libpng error: Read Error]
Sep 7 12:48:34 fabiano-HP systemd[1]: session-c2.scope: A process of this unit has been killed by the OOM killer.
Now, the problem is that Valgrind doesn't write the output file, so I can't understand what's going on... how can I avoid this? I mean, what's happening?
EDIT:
I'm running Valgrind with this command:
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --log-file=valgrind-out.txt ./mainCPU -ellpack
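To make the oom-kill line in the log above easier to read, here is a minimal sketch that splits it into its key=value fields. It assumes the fields are comma-separated exactly as in the quoted line; a task name or a mems_allowed list that itself contains a comma would break this naive split. The function name is hypothetical.

```python
from typing import Dict


def parse_oom_kill_fields(line: str) -> Dict[str, str]:
    """Split the part after 'oom-kill:' into a dict of fields."""
    marker = "oom-kill:"
    start = line.find(marker)
    if start == -1:
        return {}
    fields: Dict[str, str] = {}
    for part in line[start + len(marker):].strip().split(","):
        key, sep, value = part.partition("=")
        # Bare flags such as "global_oom" have no "=value" part.
        fields[key] = value if sep else ""
    return fields
```

Applied to the line above, fields["task_memcg"] gives the cgroup of the killed task (here a user session scope) and fields["pid"] gives 3688, the memcheck process.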

How to kill locked Node process in WSL

My web application occasionally gets stuck when running in WSL. I am able to close the script, but the background process stays stuck and keeps my files locked.
Detailed info
The web application runs with webpack dev server watching for changes in the code. During git operations, the files are sometimes locked and I cannot make changes.
I can see the process by running
$ ps aux
The process is using a lot of memory.
I tried killing the process with
$ kill -9 604
$ pkill -f node
$ kill -SIGKILL 604
But none of them work.
I even tried to kill the process from Task Manager, but it's still there.
(Windows Subsystem for Linux running on Windows 10)
Hello, same problem on Windows 11:
sudo kill -9 2769
mike 2769 1.5 5.7 13169872 1914268 ? Z 11:54 7:42 /home/mike/.nvm/versions/node/v14.17.4/bin/node /mnt/d/dev/repo/mikecodeur/react-course-app/example/react-fundamentals/node_modules/react-scripts/scripts/start.js
mike 2907 0.1 0.0 0 0 ? Z 11:54 0:36 [node] <defunct>
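Both processes in the ps output above show Z in the STAT column, which ps uses for zombie (defunct) processes. A zombie has already exited, so signals such as SIGKILL have no effect on it; the entry goes away only when its parent reaps it or the parent itself exits. As a minimal sketch, assuming the standard 11-column ps aux layout, the following lists zombie entries from captured output. The function name is hypothetical.

```python
from typing import List, Tuple


def find_zombies(ps_aux_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for every line whose STAT starts with 'Z'."""
    zombies = []
    for line in ps_aux_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        cols = line.split(None, 10)
        if len(cols) < 11 or not cols[1].isdigit():
            continue  # header or malformed line
        if cols[7].startswith("Z"):
            zombies.append((int(cols[1]), cols[10]))
    return zombies
```

For each PID found, ps -o ppid= -p <pid> prints the parent, which is the process to look at next.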

sudo ./jetty Stop or Start Failure

Jetty on our Linux server is not installed as a service because we run multiple Jetty servers on different ports. We use the commands ./jetty.sh stop and ./jetty.sh start to stop and start Jetty.
However, when I add sudo to the command, the server never stops or starts successfully. When I run sudo ./jetty.sh stop, it shows
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
and the server was not stopped.
When I run sudo ./jetty.sh start, it shows
Starting Jetty: FAILED Tue Apr 23 23:07:15 CST 2019
How could this happen? As I understand it, sudo gives you more power and privilege to run commands. If a command executes successfully without sudo, it should never fail with sudo, since sudo only grants superuser privileges.
As a user it uses $HOME.
As root it uses system paths.
The error you got ...
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
... means that there was a bad pid file sitting around for a process that no longer exists.
Short answer: the processing is different if you are root (a service) vs. a user (just an application).
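The stale pid file described above can be detected with a short check. This is a sketch for Linux, not what jetty.sh or start-stop-daemon actually do: it reads the PID from the pid file's contents and probes it with signal 0, which tests for existence without delivering a signal. All function names are hypothetical.

```python
import os
from typing import Optional


def read_pid(pidfile_text: str) -> Optional[int]:
    """Parse a pid file's contents; return None if it holds no valid PID."""
    try:
        pid = int(pidfile_text.strip())
    except ValueError:
        return None
    return pid if pid > 0 else None


def pid_is_running(pid: int) -> bool:
    """Probe a PID with signal 0 (existence check only, nothing is sent)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def pidfile_is_stale(pidfile_text: str) -> bool:
    """True if the pid file names no process that is currently running."""
    pid = read_pid(pidfile_text)
    return pid is None or not pid_is_running(pid)
```

With the contents of /var/run/jetty.pid passed in as text, a True result would correspond to the "No such process" warning quoted above.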

cgroup limit reached - no space left on device

We have two servers running Ubuntu 14.04 with Docker. Every other month, when starting or building a container, we get this message:
container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused
\"mkdir /sys/fs/cgroup/memory/docker/cf657a58a1382e62976b4d339946f07e8a40f22f18b52822f884834f78830806: no space left on device\""
The disks still have plenty of free space, but cat /proc/cgroups gives this (num_cgroups keeps increasing):
#subsys_name hierarchy num_cgroups enabled
cpuset 1 65805 1
cpu 2 65807 1
cpuacct 3 65803 1
blkio 4 65803 1
memory 5 65535 1
devices 6 65805 1
freezer 7 65803 1
net_cls 8 65803 1
perf_event 9 65803 1
net_prio 10 65803 1
hugetlb 11 65803 1
Restarting the server has always helped so far, but we don't want to restart a server every few months.
So I did some research and found a directory in the /sys/fs/cgroup/*/user path.
/sys/fs/cgroup/systemd/user/998.user alone holds 65662 subdirectories, all named something like 36309.session (the number keeps increasing).
Is there a way to see what process is creating those cgroups?
I thought it was process 998, but that doesn't even exist.
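Since num_cgroups keeps increasing, comparing two snapshots of /proc/cgroups shows which controllers are growing and how fast. The sketch below parses the four-column table quoted above and computes the difference between two snapshots; both function names are hypothetical. Note that the memory row sits at exactly 65535, which is 2^16 - 1; reading that as a hard limit on memory cgroups is an assumption based on the number, not something the output states.

```python
from typing import Dict


def parse_proc_cgroups(text: str) -> Dict[str, int]:
    """Map each subsystem name to its num_cgroups value."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "#subsys_name ..." header
        name, _hierarchy, num_cgroups, _enabled = line.split()
        counts[name] = int(num_cgroups)
    return counts


def growth(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-subsystem increase in num_cgroups between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Taking one snapshot before and one after a burst of logins or container starts would indicate which activity leaves cgroups behind.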
I ran into this same problem with AWS Batch. I have no solution, but I found this discussion: https://github.com/moby/moby/issues/29638. It seems that the problem is some kind of leak in the kernel and/or Docker.
I encountered the same issue. You probably have a lot of dangling images/containers, which is causing Docker's cgroup to run out of space. Check with:
docker images -a
docker ps -a
You need to clean it up. One solution is to remove all images/containers/etc that are not being used at the moment:
docker system prune -a

Cassandra won't start in linux as a service

I have a Debian Linux image running on Google Compute Engine. I can successfully get Cassandra working with "sudo cassandra" or "sudo cassandra -f", but as soon as I log off it stops working. When I try to run it as a service, it prints nothing and doesn't start either. I installed it from the apt-get package, v2.1.
I've tried sudo service cassandra start. It looks like it's doing something and then quits without any logs.
Please help me get this running as a service. I can't even locate where the logs are stored when I run it as a service.
I ran into this issue recently, and as BrianC indicated it can be an out of memory condition. In my case I could successfully start cassandra with sudo cassandra -f but not with /etc/init.d/cassandra start.
For me, the last log entry in /var/log/cassandra/system.log when starting as a service was:
INFO [main] 2015-04-30 10:58:16,234 CassandraDaemon.java (line 248) Classpath: /etc/cassandra:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.0.14.jar:/usr/share/cassandra/apache-cassandra-thrift-2.0.14.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar::/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
And nothing afterwards. If it is a memory problem, you should be able to verify this in your syslog. If it contains something like:
Apr 30 10:53:39 dev kernel: [1173246.957818] Out of memory: Kill process 8229 (java) score 132 or sacrifice child
Apr 30 10:53:39 dev kernel: [1173246.957831] Killed process 8229 (java) total-vm:634084kB, anon-rss:286772kB, file-rss:12676kB
then increase your RAM. In my case I increased it to 2 GB and it started fine.
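To see how much memory the killed process was actually holding, the "Killed process" line from the syslog excerpt above can be parsed into numbers. This is a minimal sketch that assumes the name:<n>kB layout shown in that line; the function name is hypothetical.

```python
import re
from typing import Dict, Optional, Union

_KILLED = re.compile(r"Killed process (\d+) \((.+?)\)")
_SIZE = re.compile(r"([a-z-]+):(\d+)kB")


def parse_killed_line(line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract pid, name and the kB-sized fields from a 'Killed process' line."""
    match = _KILLED.search(line)
    if match is None:
        return None
    info: Dict[str, Union[int, str]] = {
        "pid": int(match.group(1)),
        "name": match.group(2),
    }
    # Size fields follow the process name, e.g. "anon-rss:286772kB".
    for key, kilobytes in _SIZE.findall(line[match.end():]):
        info[key] = int(kilobytes)
    return info
```

For the line above, anon-rss is 286772 kB, roughly 280 MiB of resident anonymous memory at the time of the kill.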

Resources