How do I know if my server has NUMA?

Hopping over from Java garbage collection, I came across JVM settings for NUMA. Out of curiosity, I wanted to check whether my CentOS server has NUMA capabilities. Is there a *nix command or utility that can grab this info?

I'm no expert here, but here's something:
Box 1, no NUMA:
~$ dmesg | grep -i numa
[ 0.000000] No NUMA configuration found
Box 2, some NUMA:
~$ dmesg | grep -i numa
[ 0.000000] NUMA: Initialized distance table, cnt=8
[ 0.000000] NUMA: Node 4 [0,80000000) + [100000000,280000000) -> [0,280000000)
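Note that on a long-running system the dmesg ring buffer may have rotated past the boot messages. On systemd-based distributions you can search the full kernel boot log instead (a sketch, assuming journald is available):
$ journalctl -k -b | grep -i numa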

I think this previous question is similar: How to confirm NUMA?
In particular, you can review the NUMA man page here:
http://man7.org/linux/man-pages/man7/numa.7.html
And from there you'll see:
$ find /proc -name numa_maps
/proc/1/task/1/numa_maps
/proc/1/numa_maps
/proc/2/task/2/numa_maps
/proc/2/numa_maps
/proc/3/task/3/numa_maps
[etc if you have numa]
And you can get more detail like so:
$ grep NUMA=y /boot/config-`uname -r`
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_ACPI_NUMA=y
$ numactl --hardware
available: 2 nodes (0-1)
node 0 size: 18156 MB
node 0 free: 9053 MB
node 1 size: 18180 MB
node 1 free: 6853 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

On Red Hat 4, 5, 6, and 7 systems, you can try the following to determine whether NUMA is disabled:
numactl --show does not show multiple nodes
# numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
cpubind: 0
nodebind: 0
membind: 0
or numactl --hardware does not list multiple nodes
# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 524163 MB
node 0 free: 505253 MB
node distances:
node   0
  0:  10

You can also get this info from the lscpu command:
lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-19,40-59
NUMA node1 CPU(s): 20-39,60-79

You can also just query the information from /sys (this is what tools like numactl do underneath). As others have pointed out, dmesg is unreliable for this: its ring buffer is limited, so the boot-time NUMA messages may already have been overwritten.
To find out how many NUMA nodes are currently available, do:
cat /sys/devices/system/node/online
0-3
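Each node also gets its own directory under /sys, so you can count the nodes or inspect per-node memory directly; a quick sketch:
$ ls -d /sys/devices/system/node/node*                     # one directory per NUMA node
$ grep MemTotal /sys/devices/system/node/node0/meminfo     # per-node memory statistics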

Related

How do I get 4 MB huge pages on Linux?

According to:
$ ls -l /sys/kernel/mm/hugepages
drwxr-xr-x 2 root root 0 Dec 6 10:38 hugepages-1048576kB
drwxr-xr-x 2 root root 0 Dec 6 10:38 hugepages-2048kB
There is a choice of 2 MB and 1 GB huge page sizes on my system, which is running a 5.4.17 kernel.
However, according to:
$ cpuid | grep -i tlb | sort | uniq
   0x03: data TLB: 4K pages, 4-way, 64 entries
   0x63: data TLB: 2M/4M pages, 4-way, 32 entries
   0x76: instruction TLB: 2M/4M pages, fully, 8 entries
   0xb5: instruction TLB: 4K, 8-way, 64 entries
   0xc3: L2 TLB: 4K/2M pages, 6-way, 1536 entries
cache and TLB information (2):
      data TLB: 1G pages, 4-way, 4 entries
   L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
   L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
   L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
   L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
the TLBs on my Skylake also support 4 MB pages. The same information can be found at
https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(server)
So the question is: can I really have 4 MB pages, and if so what do I need to do to set up my system to have that option?
The best answer is probably to install and/or use libhugetlbfs.
If it's already installed, you can check the status of huge pages in the OS with a command like:
$ hugeadm --pool-list
      Size   Minimum   Current   Maximum   Default
   2097152         0         1    257388         *
   4194304         0         0    128694
   8388608         0         0     64347
  16777216         0         0     32173
  33554432         0         0     16086
  67108864         0         0      8043
 134217728         0         0      4021
 268435456         0         0      2010
 536870912         0         0      1005
1073741824         0         0       502
2147483648         0         0       251
The same hugeadm command can also be run with sudo and various options to configure the available huge-memory pools. See the hugeadm man page for details.
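As a hedged example (the page count of 16 is hypothetical; pick one suited to your workload), reserving a static pool of 4 MB pages might look like this:
$ sudo hugeadm --pool-pages-min 4M:16   # reserve a minimum of 16 pages of 4 MB each
$ hugeadm --pool-list                   # verify that the Minimum/Current columns changed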

How to unfreeze a user's memory limit?

For the past two days, I have been running into a weird problem.
STAR from https://github.com/alexdobin/STAR is a program used to build suffix array indexes. I have been using this program for years, and it worked fine until recently.
These days, whenever I run STAR, it always gets killed.
root@localhost: STAR --runMode genomeGenerate --runThreadN 10 --limitGenomeGenerateRAM 31800833920 --genomeDir star_GRCh38 --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
.
.
.
Killed
root@localhost: STAR --runMode genomeGenerate --runThreadN 10 --genomeDir star_GRCh38 --genomeFastaFiles GRCh38.fa --sjdbGTFfile GRCh38.gtf --sjdbOverhang 100
Jun 03 10:15:08 ..... started STAR run
Jun 03 10:15:08 ... starting to generate Genome files
Jun 03 10:17:24 ... starting to sort Suffix Array. This may take a long time...
Jun 03 10:17:51 ... sorting Suffix Array chunks and saving them to disk...
Killed
A month ago, the same command with the same inputs and parameters ran fine. It does use some memory, but not a lot.
I have tried three recently released versions of this program, and all of them failed. So I do not think this is a problem with the STAR program but with my server configuration.
I also tried running the program as both root and a normal user, with no luck either way.
I suspect there is some limit on memory usage on my server, but I do not know how the memory is being limited. I wonder if someone can give me some hints.
Thanks!
Tong
The following is my debug process and system info.
The command dmesg -T | grep -E -i -B5 'killed process' shows that it is an out-of-memory problem.
But before the STAR program is killed, the top command shows that only 5% of memory is occupied by this program.
[Mon Jun 1 23:43:00 2020] [40479] 1002 40479 101523 18680 112 487 0 /anaconda2/bin/
[Mon Jun 1 23:43:00 2020] [40480] 1002 40480 101526 18681 112 486 0 /anaconda2/bin/
[Mon Jun 1 23:43:00 2020] [40481] 1002 40481 101529 18682 112 485 0 /anaconda2/bin/
[Mon Jun 1 23:43:00 2020] [40482] 1002 40482 101531 18673 111 493 0 /anaconda2/bin/
[Mon Jun 1 23:43:00 2020] Out of memory: Kill process 33822 (STAR) score 36 or sacrifice child
[Mon Jun 1 23:43:00 2020] Killed process 33822 (STAR) total-vm:23885188kB, anon-rss:10895128kB, file-rss:4kB, shmem-rss:0kB
[Wed Jun 3 10:02:13 2020] [12296] 1002 12296 101652 18681 113 486 0 /anaconda2/bin/
[Wed Jun 3 10:02:13 2020] [12330] 1002 12330 101679 18855 112 486 0 /anaconda2/bin/
[Wed Jun 3 10:02:13 2020] [12335] 1002 12335 101688 18682 112 486 0 /anaconda2/bin/
[Wed Jun 3 10:02:13 2020] [12365] 1349 12365 30067 1262 11 0 0 bash
[Wed Jun 3 10:02:13 2020] Out of memory: Kill process 7713 (STAR) score 40 or sacrifice child
[Wed Jun 3 10:02:13 2020] Killed process 7713 (STAR) total-vm:19751792kB, anon-rss:12392428kB, file-rss:0kB, shmem-rss:0kB
--
[Wed Jun 3 10:42:17 2020] [ 4697] 1002 4697 101526 18681 112 486 0 /anaconda2/bin/
[Wed Jun 3 10:42:17 2020] [ 4698] 1002 4698 101529 18682 112 485 0 /anaconda2/bin/
[Wed Jun 3 10:42:17 2020] [ 4699] 1002 4699 101532 18680 112 487 0 /anaconda2/bin/
[Wed Jun 3 10:42:17 2020] [ 4701] 1002 4701 101534 18673 110 493 0 /anaconda2/bin/
[Wed Jun 3 10:42:17 2020] Out of memory: Kill process 21097 (STAR) score 38 or sacrifice child
[Wed Jun 3 10:42:17 2020] Killed process 21097 (STAR) total-vm:19769500kB, anon-rss:11622928kB, file-rss:884kB, shmem-rss:0kB
The command free -hl shows I have enough memory.
       total   used   free   shared   buff/cache   available
Mem:    251G    10G    11G     227G         229G         12G
Low:    251G   240G    11G
High:     0B     0B     0B
Swap:    29G    29G     0B
Also, as shown by ulimit -a, no virtual memory limit is set.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1030545
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1030545
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Here are the versions of my CentOS and kernel (output of hostnamectl):
hostnamectl
Static hostname: localhost.localdomain
Icon name: computer-server
Chassis: server
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-514.26.2.el7.x86_64
Architecture: x86-64
Here is the content of /etc/security/limits.conf:
#* soft core 0
#* hard rss 10000
##student hard nproc 20
##faculty soft nproc 20
##faculty hard nproc 50
#ftp hard nproc 0
##student - maxlogins 4
* soft nofile 65536
* hard nofile 65536
##intern hard as 162400000
##intern hard nproc 150
# End of file
As suggested, I have added the output of df -h:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 126G 0 126G 0% /dev
tmpfs 126G 1.3M 126G 1% /dev/shm
tmpfs 126G 4.0G 122G 4% /run
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/mapper/cl-root 528G 271G 257G 52% /
/dev/sda1 492M 246M 246M 51% /boot
tmpfs 26G 0 26G 0% /run/user/0
tmpfs 26G 0 26G 0% /run/user/1002
tmpfs 26G 0 26G 0% /run/user/1349
tmpfs 26G 0 26G 0% /run/user/1855
ls -a /dev/shm/
. ..
grep Shmem /proc/meminfo
Shmem: 238640272 kB
Several tmpfs filesystems are sized at 126G. I am googling this, but I am still not sure what should be done.
It turned out to be a shared-memory problem caused by programs that terminated abnormally.
ipcrm was used to clear all the shared memory, and then STAR ran fine.
$ ipcrm
.....
$ free -h
       total   used   free   shared   buff/cache   available
Mem:    251G    11G   226G     3.9G          14G        235G
Swap:    29G   382M    29G
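For reference, a sketch of how to inspect and remove leftover System V shared-memory segments one at a time instead of clearing everything (take the shmid values from the ipcs listing):
$ ipcs -m           # list segments: key, shmid, owner, perms, bytes, nattch
$ ipcrm -m <shmid>  # remove a single segment by its id
# or remove every listed segment in one go:
$ ipcs -m | awk '$1 ~ /^0x/ {print $2}' | xargs -r -n1 ipcrm -m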
It looks like the problem is with shared memory: you have 227G of memory eaten up by shared objects.
Shared memory files are persistent. Have a look in /dev/shm and any other tmpfs mounts to see if there are large files that can be removed to free up more physical memory (RAM+swap).
$ ls -l /dev/shm
...
$ df -h | grep '^Filesystem\|^tmpfs'
...
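A quick way to spot the biggest consumers on a tmpfs mount (a minimal sketch; repeat for each tmpfs mount point):
$ du -ah /dev/shm 2>/dev/null | sort -rh | head   # largest files and directories first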
When I run a program called STAR, it always gets killed.
It probably has some memory leak. Even old programs may have residual bugs, and they could appear in some very particular cases.
Check with strace(1) or ltrace(1) and pmap(1). Learn also to query /proc/, see proc(5), top(1), htop(1). See LinuxAteMyRam and read about memory over-commitment and virtual address space and perhaps a textbook on operating systems.
If you have access to the source code of your STAR, consider recompiling it with all warnings and debug info (with GCC, you would pass -Wall -Wextra -g to gcc or g++) then use valgrind and/or some address sanitizer. If you don't have legal access to the source code of STAR, contact the entity (person or organization) which provided it to you.
You could be interested in that draft report, in the Clang static analyzer, or in Frama-C (or in coding your own GCC plugin).
So I do not think this is a problem with the STAR program but with my server configuration.
I recommend using valgrind or gdb, and inspecting your /proc/, to validate that optimistic hypothesis.
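For example, a typical valgrind run to test that hypothesis might look like the following (valgrind slows execution considerably, so use a small test input; the STAR arguments are abbreviated from the question):
$ valgrind --leak-check=full --track-origins=yes STAR --runMode genomeGenerate ...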

In ss -s, what is the kernel counter actually counting?

While troubleshooting a problem on an OEL 7 server (3.10.0-1062.9.1.el7.x86_64), I ran the command
sudo ss -s
which gave me the output:
Total: 601 (kernel 1071)
TCP: 8 (estab 2, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0
Transport  Total  IP  IPv6
*          1071   -   -
RAW        2      0   2
UDP        6      4   2
TCP        8      5   3
INET       16     9   7
FRAG       0      0   0
Doing an ss -a | wc -l came back with 225 entries.
It leads me to the question, what is kernel 1071 actually counting?
Looking through the various man pages did not provide an answer.
Using strace, I can see where ss reads:
/proc/net/sockstat
/proc/net/sockstat6
/proc/net/snmp
/proc/slabinfo
Looking through those files and the docs, it looks like the value is coming from /proc/slabinfo.
Grepping through /proc/slabinfo for 1071 came back with one entry:
sock_inode_cache 1071 1071 640 51 8 : tunables 0 0 0 : slabdata 21 21 0
Looking through the files and docs on sock_inode_cache has not helped so far. I am hoping someone here knows what the kernel counter is actually counting, or can point me in the right direction.
what is kernel 1071 actually counting?
sock_inode_cache is a Linux kernel slab cache; the kernel counter reports how many socket inodes (active objects) it currently holds.
struct socket_alloc is the object allocated from the sock_inode_cache slab cache; it contains both a struct socket and a struct inode, which is how sockets are tied into the VFS.
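To watch that slab counter change in real time, you can poll /proc/slabinfo or use slabtop; a sketch (reading /proc/slabinfo typically requires root):
$ sudo slabtop -o | grep sock_inode_cache               # one-shot snapshot of the slab cache
$ sudo watch -n1 'grep sock_inode_cache /proc/slabinfo'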

Linux IRQ affinity setting does not take effect

When I set the IRQ affinity for my Ethernet adapter, it does not take effect (the Ethernet IRQs are 99-119).
$ sudo cat /proc/irq/109/smp_affinity
00,00000400
$ sudo sh -c "echo 0 > /proc/irq/109/smp_affinity"
$ sudo cat /proc/irq/109/smp_affinity
00,00000400
I want to bind all Ethernet IRQs to CPU0, but I have had no luck setting this, and I am not sure what problem I am hitting.
I also noticed that affinity_hint has the following value, and I cannot set it either.
$ sudo cat /proc/irq/109/affinity_hint
00,00000400
$ sudo sh -c "echo 0 > /proc/irq/109/affinity_hint"
sh: line 0: echo: write error: Input/output error
This system has 2 CPU sockets with 10 cores each and hyper-threading enabled, for 40 logical CPUs in total:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
Stepping: 4
CPU MHz: 1201.921
BogoMIPS: 4404.51
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
Please advise how to resolve this. Thanks!
You need to specify a bit mask giving the set of CPUs that may handle the interrupt. For CPU0, the mask value is 1. A mask of 0 would select no CPUs at all, so the kernel rejects that write, which is why the value did not change. (affinity_hint is read-only: it is a hint exported by the driver, hence the I/O error when writing to it.)
I have found the solution. I made a mistake here: it should be echo "1" for CPU0 and echo "2" for CPU1.
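Putting it together, a sketch of pinning IRQ 109 to CPU0 (smp_affinity_list takes CPU numbers instead of a mask; effective_affinity exists only on newer kernels):
$ echo 1 | sudo tee /proc/irq/109/smp_affinity        # bitmask: bit 0 set = CPU0
$ echo 0 | sudo tee /proc/irq/109/smp_affinity_list   # same thing, expressed as a CPU list
$ cat /proc/irq/109/effective_affinity                # verify, where available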

How to find the tasks using the most RAM in Linux

With the command free -g, I can see the total used and free RAM in Linux. But I want to understand which tasks or processes are using the most memory, so that I can free up RAM.
             total   used   free   shared   buffers   cached
Mem:           125    121      4        0         6       94
-/+ buffers/cache:     20    105
Swap:           31      0     31
Use the top command, then press Shift+F and press a for PID information (this chooses the sort field).
Also check:
ps -eo pmem,vsz,pid
and see man ps for the pmem, vsz, and pid fields.
Hope it helps!
You can use the command below to list running processes sorted by memory use:
ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -rn | less
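Alternatively, ps can do the sorting itself; a sketch using the --sort option of procps ps:
$ ps -eo pid,pmem,rss,comm --sort=-rss | head -n 15   # top memory consumers by resident set size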
