What killed my process and why? - linux

My application runs as a background process on Linux. It is currently started at the command line in a Terminal window.
Recently a user was executing the application for a while and it died mysteriously. The text:
Killed
was on the terminal. This happened two times. I asked whether someone at a different terminal had used the kill command to kill the process. No.
Under what conditions would Linux decide to kill my process? I believe the shell displayed "Killed" because the process died after receiving the KILL (signal 9). If Linux sent the kill signal, should there be a message in a system log somewhere that explains why it was killed?

If the user or sysadmin did not kill the program, the kernel may have. The kernel would only kill a process under exceptional circumstances, such as extreme resource starvation (think mem+swap exhaustion).

Try:
dmesg -T| grep -E -i -B100 'killed process'
Where -B100 signifies the number of lines before the kill happened.
Omit -T on Mac OS.

This looks like a good article on the subject: Taming the OOM killer (1).
The gist is that Linux overcommits memory. When a process asks for more space, Linux will give it that space, even if it is claimed by another process, under the assumption that nobody actually uses all of the memory they ask for. The process will get exclusive use of the memory it has allocated when it actually uses it, not when it asks for it. This makes allocation quick, and might allow you to "cheat" and allocate more memory than you really have. However, once processes start using this memory, Linux might realize that it has been too generous in allocating memory it doesn't have, and will have to kill off a process to free some up. The process to be killed is based on a score taking into account runtime (long-running processes are safer), memory usage (greedy processes are less safe), and a few other factors, including a value you can adjust to make a process less likely to be killed. It's all described in the article in a lot more detail.
Edit: And here is another article (2) that explains pretty well how a process is chosen (annotated with some kernel code examples). The great thing about this is that it includes some commentary on the reasoning behind the various badness() rules.
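If you want to see how this applies on your own machine, the overcommit policy and each process's current badness score are exposed directly (a quick illustration; 1234 is a placeholder PID):
sysctl vm.overcommit_memory vm.overcommit_ratio   # 0 = heuristic overcommit (the default), 1 = always, 2 = strict
cat /proc/1234/oom_score        # badness score the OOM killer would use (higher = more likely to be killed)
cat /proc/1234/oom_score_adj    # per-process adjustment, from -1000 (never kill) to +1000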

Let me first explain when and why the OOM Killer gets invoked.
Say you have 512MB of RAM + 1GB of swap. So, in theory, your processes have access to a total of 1.5GB of virtual memory.
Now, for some time everything runs fine within that 1.5GB of total memory. But all of a sudden (or gradually) your system starts consuming more and more memory, until it reaches a point where around 95% of the total memory is in use.
Now say a process requests a large chunk of memory from the kernel. The kernel checks the available memory and finds that there is no way it can allocate your process more memory. So it will try to free some memory by invoking the OOM Killer (http://linux-mm.org/OOM).
The OOM Killer has its own algorithm to score every process. Typically, the process using the most memory becomes the victim to be killed.
Where can I find the logs of the OOM Killer?
Typically in the /var/log directory: either /var/log/kern.log or /var/log/dmesg.
Hope this helps.
Some typical solutions:
Increase memory (not swap)
Find the memory leaks in your program and fix them
Restrict the memory any single process can consume (for example, JVM memory can be restricted using JAVA_OPTS; see the sketch after this list)
See the logs and google :)
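As a sketch of the JVM point above (the variable name and jar are placeholders; whether JAVA_OPTS is honoured depends on how your application's start script passes it to java):
export JAVA_OPTS="-Xms256m -Xmx512m"     # -Xmx caps the heap so one Java process cannot grow unbounded
java $JAVA_OPTS -jar myapp.jar           # myapp.jar is a placeholder for your application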

This is the Linux out-of-memory killer (OOM killer). Your process was selected due to 'badness' - a combination of recency, resident size (memory in use, rather than just allocated) and other factors.
To see the kernel's record of the event, run:
sudo journalctl -xb
You'll see a message like:
Jul 20 11:05:00 someapp kernel: Mem-Info:
Jul 20 11:05:00 someapp kernel: Node 0 DMA per-cpu:
Jul 20 11:05:00 someapp kernel: CPU 0: hi: 0, btch: 1 usd: 0
Jul 20 11:05:00 someapp kernel: Node 0 DMA32 per-cpu:
Jul 20 11:05:00 someapp kernel: CPU 0: hi: 186, btch: 31 usd: 30
Jul 20 11:05:00 someapp kernel: active_anon:206043 inactive_anon:6347 isolated_anon:0
active_file:722 inactive_file:4126 isolated_file:0
unevictable:0 dirty:5 writeback:0 unstable:0
free:12202 slab_reclaimable:3849 slab_unreclaimable:14574
mapped:792 shmem:12802 pagetables:1651 bounce:0
free_cma:0
Jul 20 11:05:00 someapp kernel: Node 0 DMA free:4576kB min:708kB low:884kB high:1060kB active_anon:10012kB inactive_anon:488kB active_file:4kB inactive_file:4kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present
Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 968 968 968
Jul 20 11:05:00 someapp kernel: Node 0 DMA32 free:44232kB min:44344kB low:55428kB high:66516kB active_anon:814160kB inactive_anon:24900kB active_file:2884kB inactive_file:16500kB unevictable:0kB isolated(anon):0kB isolated
Jul 20 11:05:00 someapp kernel: lowmem_reserve[]: 0 0 0 0
Jul 20 11:05:00 someapp kernel: Node 0 DMA: 17*4kB (UEM) 22*8kB (UEM) 15*16kB (UEM) 12*32kB (UEM) 8*64kB (E) 9*128kB (UEM) 2*256kB (UE) 3*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 4580kB
Jul 20 11:05:00 someapp kernel: Node 0 DMA32: 216*4kB (UE) 601*8kB (UE) 448*16kB (UE) 311*32kB (UEM) 135*64kB (UEM) 74*128kB (UEM) 5*256kB (EM) 0*512kB 0*1024kB 1*2048kB (R) 0*4096kB = 44232kB
Jul 20 11:05:00 someapp kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 20 11:05:00 someapp kernel: 17656 total pagecache pages
Jul 20 11:05:00 someapp kernel: 0 pages in swap cache
Jul 20 11:05:00 someapp kernel: Swap cache stats: add 0, delete 0, find 0/0
Jul 20 11:05:00 someapp kernel: Free swap = 0kB
Jul 20 11:05:00 someapp kernel: Total swap = 0kB
Jul 20 11:05:00 someapp kernel: 262141 pages RAM
Jul 20 11:05:00 someapp kernel: 7645 pages reserved
Jul 20 11:05:00 someapp kernel: 264073 pages shared
Jul 20 11:05:00 someapp kernel: 240240 pages non-shared
Jul 20 11:05:00 someapp kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jul 20 11:05:00 someapp kernel: [ 241] 0 241 13581 1610 26 0 0 systemd-journal
Jul 20 11:05:00 someapp kernel: [ 246] 0 246 10494 133 22 0 -1000 systemd-udevd
Jul 20 11:05:00 someapp kernel: [ 264] 0 264 29174 121 26 0 -1000 auditd
Jul 20 11:05:00 someapp kernel: [ 342] 0 342 94449 466 67 0 0 NetworkManager
Jul 20 11:05:00 someapp kernel: [ 346] 0 346 137495 3125 88 0 0 tuned
Jul 20 11:05:00 someapp kernel: [ 348] 0 348 79595 726 60 0 0 rsyslogd
Jul 20 11:05:00 someapp kernel: [ 353] 70 353 6986 72 19 0 0 avahi-daemon
Jul 20 11:05:00 someapp kernel: [ 362] 70 362 6986 58 18 0 0 avahi-daemon
Jul 20 11:05:00 someapp kernel: [ 378] 0 378 1621 25 8 0 0 iprinit
Jul 20 11:05:00 someapp kernel: [ 380] 0 380 1621 26 9 0 0 iprupdate
Jul 20 11:05:00 someapp kernel: [ 384] 81 384 6676 142 18 0 -900 dbus-daemon
Jul 20 11:05:00 someapp kernel: [ 385] 0 385 8671 83 21 0 0 systemd-logind
Jul 20 11:05:00 someapp kernel: [ 386] 0 386 31573 153 15 0 0 crond
Jul 20 11:05:00 someapp kernel: [ 391] 999 391 128531 2440 48 0 0 polkitd
Jul 20 11:05:00 someapp kernel: [ 400] 0 400 9781 23 8 0 0 iprdump
Jul 20 11:05:00 someapp kernel: [ 419] 0 419 27501 32 10 0 0 agetty
Jul 20 11:05:00 someapp kernel: [ 855] 0 855 22883 258 43 0 0 master
Jul 20 11:05:00 someapp kernel: [ 862] 89 862 22926 254 44 0 0 qmgr
Jul 20 11:05:00 someapp kernel: [23631] 0 23631 20698 211 43 0 -1000 sshd
Jul 20 11:05:00 someapp kernel: [12884] 0 12884 81885 3754 80 0 0 firewalld
Jul 20 11:05:00 someapp kernel: [18130] 0 18130 33359 291 65 0 0 sshd
Jul 20 11:05:00 someapp kernel: [18132] 1000 18132 33791 748 64 0 0 sshd
Jul 20 11:05:00 someapp kernel: [18133] 1000 18133 28867 122 13 0 0 bash
Jul 20 11:05:00 someapp kernel: [18428] 99 18428 208627 42909 151 0 0 node
Jul 20 11:05:00 someapp kernel: [18486] 89 18486 22909 250 46 0 0 pickup
Jul 20 11:05:00 someapp kernel: [18515] 1000 18515 352905 141851 470 0 0 npm
Jul 20 11:05:00 someapp kernel: [18520] 0 18520 33359 291 66 0 0 sshd
Jul 20 11:05:00 someapp kernel: [18522] 1000 18522 33359 294 64 0 0 sshd
Jul 20 11:05:00 someapp kernel: [18523] 1000 18523 28866 115 12 0 0 bash
Jul 20 11:05:00 someapp kernel: Out of memory: Kill process 18515 (npm) score 559 or sacrifice child
Jul 20 11:05:00 someapp kernel: Killed process 18515 (npm) total-vm:1411620kB, anon-rss:567404kB, file-rss:0kB

As dwc and Adam Jaskiewicz have stated, the culprit is likely the OOM Killer. However, the next question that follows is: How do I prevent this?
There are several ways:
Give your system more RAM if you can (easy if it's a VM)
Make sure the OOM killer chooses a different process.
Disable the OOM Killer
Choose a Linux distro which ships with the OOM Killer disabled.
I found (2) to be especially easy to implement:
Adjust /proc/<PID>/oom_score_adj to -1000 (which automatically takes oom_adj to -17 and oom_score to 0).
See How To Create OOM Exclusions in Linux for more.
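For example (as root; $PID is a placeholder for your process ID):
echo -1000 > /proc/$PID/oom_score_adj    # exempt the process from the OOM killer
echo -17 > /proc/$PID/oom_adj            # equivalent on older kernels that only expose oom_adj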

A tool like SystemTap (or another tracer) can monitor the kernel's signal-transmission logic and report on it, e.g. https://sourceware.org/systemtap/examples/process/sigmon.stp
# stap --example sigmon.stp -x 31994 SIGKILL
SPID SNAME RPID RNAME SIGNUM SIGNAME
5609 bash 31994 find 9 SIGKILL
The filtering if block in that script can be adjusted to taste, or eliminated to trace system-wide signal traffic. Causes can be further isolated by collecting backtraces (add a print_backtrace() and/or print_ubacktrace() to the probe, for kernel space and user space respectively).
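For example, to watch for SIGKILL system-wide and print a kernel backtrace of the sender, something along these lines should work (a sketch based on the signal.send tapset; it needs the usual SystemTap kernel debuginfo):
stap -e 'probe signal.send {
  if (sig_name == "SIGKILL") {
    printf("%s(%d) sent SIGKILL to %s(%d)\n", execname(), pid(), pid_name, sig_pid)
    print_backtrace()   # kernel-space backtrace of the sender
  }
}'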

The PAM module used to limit resources caused exactly the result you describe: my process died mysteriously with the text Killed on the console window. There was no log output, neither in syslog nor in kern.log. The top program helped me discover that my process was killed after exactly one minute of CPU usage.
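For reference, a limits.conf entry of roughly this shape (read by pam_limits) produces exactly that behaviour; "someuser" is a placeholder:
# /etc/security/limits.conf
someuser   hard   cpu   1    # hard CPU-time limit of 1 minute; exceeding it kills the process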

In an LSF environment (interactive or otherwise), if the application's memory utilization exceeds some threshold preset by the admins on the queue, or the resource request made when submitting to the queue, the processes will be killed so other users don't fall victim to a potential runaway. It doesn't always send an email when it does so, depending on how it's set up.
One solution in this case is to find a queue with larger resources, or define larger resource requirements in the submission (see the example below).
You may also want to review man ulimit.
Although I don't remember ulimit resulting in Killed, it's been a while since I needed that.
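A hedged example of the second option (the units of -M depend on your LSF configuration, and ./my_app is a placeholder):
bsub -M 8000 -R "rusage[mem=8000]" ./my_app    # request/limit roughly 8 GB so the job lands on a host that can hold it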

In my case this was happening with a Laravel queue worker. The system logs did not mention any killing, so I looked further, and it turned out that the worker was basically killing itself because of a job that exceeded the memory limit (which is set to 128M by default).
Running the queue worker with --timeout=600 and --memory=1024 fixed the problem for me.
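With the standard artisan worker, that amounts to something like:
php artisan queue:work --timeout=600 --memory=1024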

We have had recurring problems under Linux at a customer site (Red Hat, I think), with the OOM Killer (out-of-memory killer) killing both our principal application (i.e. the reason the server exists) and its database processes.
In each case the OOM Killer simply decided that the processes were using too many resources... the machine wasn't even about to fail for lack of resources. Neither the application nor its database has problems with memory leaks (or any other resource leak).
I am not a Linux expert, but I rather gathered that its algorithm for deciding when to kill something and what to kill is complex. Also, I was told (I can't speak to the accuracy of this) that the OOM Killer is baked into the kernel and you can't simply not run it.

The user has the ability to kill his own programs, using kill or Control+C, but I get the impression that's not what happened, and that the user complained to you.
root has the ability to kill programs of course, but if someone has root on your machine and is killing stuff you have bigger problems.
If you are not the sysadmin, the sysadmin may have set up quotas on CPU, RAM, or disk usage and auto-kills processes that exceed them.
Other than those guesses, I'm not sure without more info about the program.

I encountered this problem lately. Finally, I found that my processes were killed just after the openSUSE zypper update was invoked automatically. Disabling the automatic zypper update solved my problem.

Related

How to unfreeze a user's memory limit?

For the past two days, I have been facing a weird problem.
STAR from https://github.com/alexdobin/STAR is a program used to build suffix array indexes. I have been using this program for years. It worked OK until recently.
These days, when I run STAR, it always gets killed.
root@localhost: STAR --runMode genomeGenerate --runThreadN 10 --limitGenomeGenerateRAM 31800833920 --genomeDir star_GRCh38 --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
.
.
.
Killed
root@localhost: STAR --runMode genomeGenerate --runThreadN 10 --genomeDir star_GRCh38 --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
Jun 03 10:15:08 ..... started STAR run
Jun 03 10:15:08 ... starting to generate Genome files
Jun 03 10:17:24 ... starting to sort Suffix Array. This may take a long time...
Jun 03 10:17:51 ... sorting Suffix Array chunks and saving them to disk...
Killed
A month ago, the same command with the same inputs and parameters ran OK. It does cost some memory, but not a lot.
I have tried 3 recently released versions of this program; all failed. So I do not think it is a problem of the STAR program but of my server configuration.
I have also tried running this program as both root and a normal user, with no luck either way.
I suspect there is a limitation on memory usage on my server.
But I do not know how the memory is limited. I wonder if someone can give me some hints.
Thanks!
Tong
The following is my debug process and system info.
The command dmesg -T | grep -E -i -B5 'killed process' shows it is an out-of-memory problem.
But before the STAR program is killed, the top command shows that only 5% of memory is occupied by this program.
[一 6 1 23:43:00 2020] [40479] 1002 40479 101523 18680 112 487 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] [40480] 1002 40480 101526 18681 112 486 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] [40481] 1002 40481 101529 18682 112 485 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] [40482] 1002 40482 101531 18673 111 493 0 /anaconda2/bin/
[一 6 1 23:43:00 2020] Out of memory: Kill process 33822 (STAR) score 36 or sacrifice child
[一 6 1 23:43:00 2020] Killed process 33822 (STAR) total-vm:23885188kB, anon-rss:10895128kB, file-rss:4kB, shmem-rss:0kB
[三 6 3 10:02:13 2020] [12296] 1002 12296 101652 18681 113 486 0 /anaconda2/bin/
[三 6 3 10:02:13 2020] [12330] 1002 12330 101679 18855 112 486 0 /anaconda2/bin/
[三 6 3 10:02:13 2020] [12335] 1002 12335 101688 18682 112 486 0 /anaconda2/bin/
[三 6 3 10:02:13 2020] [12365] 1349 12365 30067 1262 11 0 0 bash
[三 6 3 10:02:13 2020] Out of memory: Kill process 7713 (STAR) score 40 or sacrifice child
[三 6 3 10:02:13 2020] Killed process 7713 (STAR) total-vm:19751792kB, anon-rss:12392428kB, file-rss:0kB, shmem-rss:0kB
--
[三 6月 3 10:42:17 2020] [ 4697] 1002 4697 101526 18681 112 486 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] [ 4698] 1002 4698 101529 18682 112 485 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] [ 4699] 1002 4699 101532 18680 112 487 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] [ 4701] 1002 4701 101534 18673 110 493 0 /anaconda2/bin/
[三 6月 3 10:42:17 2020] Out of memory: Kill process 21097 (STAR) score 38 or sacrifice child
[三 6月 3 10:42:17 2020] Killed process 21097 (STAR) total-vm:19769500kB, anon-rss:11622928kB, file-rss:884kB, shmem-rss:0kB
The command free -hl shows I have enough memory.
total used free shared buff/cache available
Mem: 251G 10G 11G 227G 229G 12G
Low: 251G 240G 11G
High: 0B 0B 0B
Swap: 29G 29G 0B
Also, as shown by ulimit -a, no virtual memory limit is set.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1030545
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1030545
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Here are the versions of my CentOS and kernel (output of hostnamectl):
hostnamectl
Static hostname: localhost.localdomain
Icon name: computer-server
Chassis: server
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-514.26.2.el7.x86_64
Architecture: x86-64
Here is the content of cat /etc/security/limits.conf:
#* soft core 0
#* hard rss 10000
##student hard nproc 20
##faculty soft nproc 20
##faculty hard nproc 50
#ftp hard nproc 0
##student - maxlogins 4
* soft nofile 65536
* hard nofile 65536
##intern hard as 162400000
##intern hard nproc 150
# End of file
As suggested, I have updated the question with the output of df -h:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 1.3M 126G 1% /dev/shm
tmpfs 126G 4.0G 122G 4% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/cl-root 528G 271G 257G 52% /
/dev/sda1 492M 246M 246M 51% /boot
tmpfs 26G 0 26G 0% /run/user/0
tmpfs 26G 0 26G 0% /run/user/1002
tmpfs 26G 0 26G 0% /run/user/1349
tmpfs 26G 0 26G 0% /run/user/1855
ls -a /dev/shm/
. ..
grep Shmem /proc/meminfo
Shmem: 238640272 kB
Several tmpfs mounts take up 126G of memory. I have been googling this, but am still not sure what should be done.
The problem is shared memory left behind by programs that terminated abnormally.
ipcrm was used to clear all the shared memory, and then STAR runs fine.
$ ipcrm
.....
$ free -h
total used free shared buff/cache available
Mem: 251G 11G 226G 3.9G 14G 235G
Swap: 29G 382M 29G
It looks like the problem is with shared memory: you have 227G of memory eaten up by shared objects.
Shared memory files are persistent. Have a look in /dev/shm and any other tmpfs mounts to see if there are large files that can be removed to free up more physical memory (RAM+swap).
$ ls -l /dev/shm
...
$ df -h | grep '^Filesystem\|^tmpfs'
...
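If the space turns out to be held by stale System V shared memory segments (as in the asker's follow-up) rather than files, a quick sketch of checking and cleaning them up:
ipcs -m              # list System V shared memory segments; look for large, orphaned ones
ipcrm -m <shmid>     # remove a stale segment; <shmid> comes from the ipcs output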
When I run a program called STAR, it always gets killed.
It probably has some memory leak. Even old programs may have residual bugs, and they could appear in some very particular cases.
Check with strace(1) or ltrace(1) and pmap(1). Learn also to query /proc/, see proc(5), top(1), htop(1). See LinuxAteMyRam and read about memory over-commitment and virtual address space and perhaps a textbook on operating systems.
If you have access to the source code of your STAR, consider recompiling it with all warnings and debug info (with GCC, you would pass -Wall -Wextra -g to gcc or g++) then use valgrind and/or some address sanitizer. If you don't have legal access to the source code of STAR, contact the entity (person or organization) which provided it to you.
You could be interested in that draft report, and in the Clang static analyzer or Frama-C (or in coding your own GCC plugin).
So I do not think it is a problem of the STAR program but of my server configuration.
I recommend using valgrind or gdb and inspect your /proc/ to validate that optimistic hypothesis.
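A rough sketch of what that can look like (the STAR arguments are just the ones from the question; adjust as needed):
valgrind --leak-check=full --track-origins=yes \
    STAR --runMode genomeGenerate --runThreadN 10 --genomeDir star_GRCh38 \
         --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
# and/or watch the memory map of the running process (pgrep -n picks the newest STAR PID)
watch -n 5 'pmap -x $(pgrep -n STAR) | tail -n 2'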

mount.nfs: access denied by server while mounting 192.168.8.104:/mnt/sdb/var/lib/glance/images [closed]

Before asking this question, I searched Stack Overflow, but the answers were of no use:
mount: nfs access denied by server
In my VM, I have sda and sdb hard disks.
The VM has the IPs 192.168.8.101 and 192.168.8.104.
When I mount the sdb's directory onto the VM's directory under /var:
[root@ha-node1 sdb]# mount -t nfs 192.168.8.104:/mnt/sdb/var/lib/glance/images /var/lib/glance/images
I get the error below:
mount.nfs: access denied by server while mounting 192.168.8.104:/mnt/sdb/var/lib/glance/images
And the permissions of the directories under /mnt are all fine.
[root@ha-node1 sdb]# ll -d /mnt/
drwxr-xr-x. 4 root root 26 Jul 26 00:43 /mnt/
[root@ha-node1 sdb]# ll -d /mnt/sdb
drwxr-xr-x 4 root root 4096 Jul 26 10:05 /mnt/sdb
[root@ha-node1 sdb]# ll -d /mnt/sdb/var/
drwxr-xr-x 3 root root 4096 Jul 26 10:05 /mnt/sdb/var/
[root@ha-node1 sdb]# ll -d /mnt/sdb/var/lib/
drwxr-xr-x 3 root root 4096 Jul 26 10:05 /mnt/sdb/var/lib/
[root@ha-node1 sdb]# ll -d /mnt/sdb/var/lib/glance/
drwxr-xr-x 3 root root 4096 Jul 26 10:05 /mnt/sdb/var/lib/glance/
[root@ha-node1 sdb]# ll -d /mnt/sdb/var/lib/glance/images/
drwxr-xr-x 2 root root 4096 Jul 26 10:05 /mnt/sdb/var/lib/glance/images/
The network connection is also OK.
[root@ha-node1 sdb]# ping 192.168.8.104
PING 192.168.8.104 (192.168.8.104) 56(84) bytes of data.
64 bytes from 192.168.8.104: icmp_seq=1 ttl=64 time=0.024 ms
64 bytes from 192.168.8.104: icmp_seq=2 ttl=64 time=0.030 ms
64 bytes from 192.168.8.104: icmp_seq=3 ttl=64 time=0.032 ms
64 bytes from 192.168.8.104: icmp_seq=4 ttl=64 time=0.031 ms
The NFS service is working normally:
[root@ha-node1 sdb]# systemctl status nfs.service
● nfs-server.service - NFS server and services
Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; vendor preset: disabled)
Active: active (exited) since Wed 2017-07-26 00:26:23 CST; 11h ago
Process: 1916 ExecStart=/usr/sbin/rpc.nfsd $RPCNFSDARGS (code=exited, status=0/SUCCESS)
Process: 1786 ExecStartPre=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
Main PID: 1916 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/nfs-server.service
Jul 26 00:26:22 ha-node1 systemd[1]: Starting NFS server and services...
Jul 26 00:26:23 ha-node1 systemd[1]: Started NFS server and services.
In the log I grep for mount:
[root@ha-node1 sdb]# cat /var/log/messages | grep mount
Jul 24 14:44:07 ha-node1 systemd: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
Jul 26 00:12:08 ha-node1 systemd: Started dracut pre-mount hook.
Jul 26 00:12:11 ha-node1 systemd: Started dracut mount hook.
Jul 26 00:12:15 ha-node1 systemd: Started Remount Root and Kernel File Systems.
Jul 26 00:23:44 ha-node1 rpc.mountd[4312]: Version 1.3.0 starting
Jul 26 00:26:02 ha-node1 systemd: Started dracut pre-mount hook.
Jul 26 00:26:04 ha-node1 systemd: Started dracut mount hook.
Jul 26 00:26:08 ha-node1 systemd: Started Remount Root and Kernel File Systems.
Jul 26 00:26:22 ha-node1 rpc.mountd[1561]: Version 1.3.0 starting
Jul 26 00:43:03 ha-node1 kernel: EXT4-fs (sdb): mounted filesystem with ordered data mode. Opts: (null)
Jul 26 00:43:13 ha-node1 kernel: EXT4-fs (sdc): mounted filesystem with ordered data mode. Opts: (null)
Jul 26 10:07:03 ha-node1 rpc.mountd[1561]: refused mount request from 192.168.8.104 for /mnt/sdb/var/lib/glance/images (/): not exported
It says "not exported". I tried exportfs -r, but it did not help.
Can someone tell me why I cannot NFS-mount the sdb directory onto the VM's own directory?
The directories exported to the outside world are controlled by the file /etc/exports on Linux.
Add:
/mnt/sdb/var/lib/glance/images *(rw,sync,no_subtree_check)
to that file and try again. This is only an example; the access rights should be modified according to your exact requirements. Please read the manual of exports.
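After editing /etc/exports on the server (192.168.8.104 here), re-export and verify before retrying the mount from the client, for example:
exportfs -ra                   # re-read /etc/exports and apply it
showmount -e 192.168.8.104     # list what the server actually exports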

MTD start and config at runtime

I have an embedded system that I have root shell access to.
I cannot enter the U-Boot boot menu (boot delay = 0).
The device boots from a NOR flash and loads the filesystem from eMMC.
It does not create /dev/mtd devices.
I want to access the NOR flash.
There are MTD drivers on the system, so that seems the best option.
(I have no experience with this at all, so please correct me if I'm wrong.)
drwxrwxr-x 2 1000 root 1024 Jul 29 2013 chips
drwxrwxr-x 2 1000 root 1024 Jul 29 2013 maps
-rw-rw-r-- 1 1000 1000 21544 Jul 29 2013 mtd.ko
-rw-rw-r-- 1 1000 1000 8560 Jul 29 2013 mtd_blkdevs.ko
-rw-rw-r-- 1 1000 1000 6132 Jul 29 2013 mtdblock.ko
-rw-rw-r-- 1 1000 1000 9648 Jul 29 2013 mtdchar.ko
If I load MTD with modprobe, /proc/mtd is created.
There is nothing in dmesg.
root:/proc# cat /proc/mtd
dev: size erasesize name
So there are no partitions.
How can I configure MTD to be able to access the NOR flash?
(The physical addresses are known.)
Thanks
You need to describe your NOR partitions in a board-specific file in the kernel. In U-Boot, you should be able to see them with smeminfo.
In your Linux kernel, you'll need to populate an array of mtd_partition structures.
Find more here: http://free-electrons.com/blog/managing-flash-storage-with-linux/

report memory and cpu usage - matlab - on multicore linux server

We need to know how much memory and CPU time a MATLAB process has used, including all of its spawned threads. If I understand it correctly, all the threads will show up as new processes with new process IDs, but the CMD name will remain the same.
So I thought about creating a daemon which appends the usage every n seconds:
ps -o %cpu,%mem,cmd -C MATLAB | grep "[0-9]+" >> matlab_log
and later counting and summing up the ratios multiplied by the daemon tick time.
I wonder if there is an easier way, whether I am missing something, or whether there is simply a handier tool for this job.
Cheers
If you install the BSD Process Accounting utilities (package acct on Debian and Ubuntu) you can use the sa(8) utility to summarize executions or give you semi-detailed execution logs:
$ lastcomm
...
man F X sarnold pts/3 0.00 secs Fri May 4 16:21
man F X sarnold pts/3 0.00 secs Fri May 4 16:21
vim sarnold pts/3 0.05 secs Fri May 4 16:20
sa sarnold pts/3 0.00 secs Fri May 4 16:20
sa sarnold pts/3 0.00 secs Fri May 4 16:20
bzr sarnold pts/3 0.99 secs Fri May 4 16:19
apt-get S root pts/1 0.44 secs Fri May 4 16:18
dpkg root pts/1 0.00 secs Fri May 4 16:19
dpkg root pts/1 0.00 secs Fri May 4 16:19
dpkg root pts/1 0.00 secs Fri May 4 16:19
apt-get F root pts/1 0.00 secs Fri May 4 16:19
...
$ sa
633 15.22re 0.09cp 0avio 6576k
24 8.51re 0.03cp 0avio 6531k ***other*
2 0.31re 0.02cp 0avio 10347k apt-get
3 0.02re 0.02cp 0avio 9667k python2.7
18 0.04re 0.01cp 0avio 5444k dpkg
2 0.01re 0.01cp 0avio 13659k debsums
...
The format of the acct file is documented in acct(5), so you could write your own programs to parse the files if none of the standard tools lets you express the queries you want.
Probably the largest downside to the BSD process accounting utilities is that the kernel will only update the process accounting log when processes exit, because many of the summary numbers are only available once another process wait(2)s for it -- so currently running processes are completely overlooked by the utilities.
These utilities may be sufficient, though; this is how compute centers billed their clients, back when compute centers were popular...
You can also use top:
top -b -n 1 | grep MATLAB
14226 user 39 19 2476m 1.4g 26m S 337.2 9.2 24:44.60 MATLAB
25878 user 39 19 2628m 1.6g 26m S 92.0 10.6 21:07.36 MATLAB
14363 user 39 19 2650m 1.4g 26m S 79.7 9.1 23:58.38 MATLAB
14088 user 39 19 2558m 1.4g 26m S 61.3 9.1 25:14.53 MATLAB
14648 user 39 19 2629m 1.6g 26m S 55.2 10.5 22:03.20 MATLAB
14506 user 39 19 2613m 1.5g 26m S 49.0 9.4 22:32.47 MATLAB
14788 user 39 19 2599m 1.6g 26m S 49.0 10.3 20:44.78 MATLAB
25650 user 39 19 2608m 1.6g 26m S 42.9 10.2 25:08.38 MATLAB
or to get fieldnames too:
top -b -n 1 | head -n 7 | tail -n 1; top -b -n 1 | grep MATLAB
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14226 user 39 19 2476m 1.4g 26m S 337.2 9.2 24:44.60 MATLAB
25878 user 39 19 2628m 1.6g 26m S 92.0 10.6 21:07.36 MATLAB
14363 user 39 19 2650m 1.4g 26m S 79.7 9.1 23:58.38 MATLAB
14088 user 39 19 2558m 1.4g 26m S 61.3 9.1 25:14.53 MATLAB
14648 user 39 19 2629m 1.6g 26m S 55.2 10.5 22:03.20 MATLAB
14506 user 39 19 2613m 1.5g 26m S 49.0 9.4 22:32.47 MATLAB
14788 user 39 19 2599m 1.6g 26m S 49.0 10.3 20:44.78 MATLAB
25650 user 39 19 2608m 1.6g 26m S 42.9 10.2 25:08.38 MATLAB

Linux Userspace GPIO Interrupts using sysfs

I would like to use GPIO interrupts from userspace using sysfs.
I use these commands :
[root@at91]:gpio109 > echo 109 > export
[root@at91]:gpio109 > cd gpio109/
[root@at91]:gpio109 > ll
-rw-r--r-- 1 root 0 4096 Jan 1 00:17 direction
drwxr-xr-x 2 root 0 0 Jan 1 00:17 power
lrwxrwxrwx 1 root 0 0 Jan 1 00:17 subsystem -> ../../gpio
-rw-r--r-- 1 root 0 4096 Jan 1 00:17 uevent
-rw-r--r-- 1 root 0 4096 Jan 1 00:17 value
The GPIO works well, but I can't use interrupts.
I have read everywhere that I must have an edge file so that I can poll on the value file. But on my system this file doesn't exist.
I have tried a lot of things to find a solution, but remain unsuccessful.
My target is an AT91SAM9263 on linux kernel 2.6.30.
At boot, my board prints this message about interrupts:
AT91: 160 gpio irqs in 5 banks
which shows that the function at91_gpio_irq_setup() is executed.
Do you have any ideas?
The "edge" file only exists if that GPIO pin can be configured as a an interrupt generting pin. See: http://www.mjmwired.net/kernel/Documentation/gpio.txt#634.
Since you don't see it, it means the driver and possibly the hardware do not support using that GPIO pin for interrupt source.
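For reference, on a kernel/driver combination that does expose it, the sysfs interrupt flow looks roughly like this (using gpio109 from the question):
echo 109 > /sys/class/gpio/export
echo in > /sys/class/gpio/gpio109/direction
echo rising > /sys/class/gpio/gpio109/edge     # only present when the pin can generate interrupts
# then poll(2)/select(2) on /sys/class/gpio/gpio109/value (POLLPRI|POLLERR) from your program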
