How to determine the number of processor sockets on Linux ppc64le

There seems to be a bug in lscpu where it cannot determine the correct number of sockets. There is an open issue for this (https://github.com/karelzak/util-linux/issues/698), but I haven't gotten any response. This is my output:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 32
NUMA node(s): 5
Model: IBM,9119-MHE
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-255
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):
Is there another way to go about getting the number of sockets?

A Linux kernel patch currently in testing (as of Feb 26, 2020) fixes this issue.
The patch is this one
Expect it to come out in the upcoming 5.6 Linux kernel release.
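As a possible cross-check while waiting for the fix, the kernel also exposes per-CPU topology under sysfs. The sketch below counts distinct physical_package_id values. It assumes the standard /sys/devices/system/cpu layout and that -1 means the package is unknown; the function names are made up for illustration. Nothing in this thread confirms that an unpatched ppc64le kernel reports correct values there, so treat the result with the same suspicion as the lscpu output.

```python
from pathlib import Path


def read_package_ids(cpu_root="/sys/devices/system/cpu"):
    """Yield the physical_package_id string of each CPU that exposes one."""
    pattern = "cpu[0-9]*/topology/physical_package_id"
    for path in sorted(Path(cpu_root).glob(pattern)):
        yield path.read_text().strip()


def count_sockets(package_ids):
    """Count distinct package IDs, skipping -1 (assumed to mean 'unknown')."""
    return len({pid for pid in package_ids if pid != "-1"})


if __name__ == "__main__":
    print("Sockets according to sysfs:", count_sockets(read_package_ids()))
```

If this prints the same suspicious number as lscpu, the topology data coming from the kernel is the likely culprit rather than the tool.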

Related

QEMU Linux Kernel Panic With BusyBox

I am following this guide to run a minimal linux kernel on top of QEMU, and keep running into the same issue. I have also tried other similar tutorials and end up with the same panic. Since the panic seems to be related to the /init script, I have tried swapping this out with a (statically linked) binary but still end up with the panic. The only thing I have done differently from the tutorial is using the latest stable version of BusyBox (currently 1.34.1). The text of the panic is here and linked below. If anyone has any insight on this that would be great!
Linux version 5.15.0+ (michael@tree) (gcc (Gentoo Hardened 10.3.0-r2 p3) 10.3.0, GNU ld (G1
Command line: console=ttyS0
x86/fpu: x87 FPU will use FXSAVE
signal: max sigframe size: 1040
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x0000000007fdffff] usable
BIOS-e820: [mem 0x0000000007fe0000-0x0000000007ffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
NX (Execute Disable) protection: active
SMBIOS 2.8 present.
DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
tsc: Fast TSC calibration using PIT
tsc: Detected 3992.482 MHz processor
last_pfn = 0x7fe0 max_arch_pfn = 0x400000000
x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
found SMP MP-table at [mem 0x000f5c80-0x000f5c8f]
RAMDISK: [mem 0x07cf6000-0x07fdffff]
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x0000000007fdffff]
Normal empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x0000000007fdffff]
Initmem setup node 0 [mem 0x0000000000001000-0x0000000007fdffff]
On node 0, zone DMA: 1 pages in unavailable ranges
On node 0, zone DMA: 97 pages in unavailable ranges
On node 0, zone DMA32: 32 pages in unavailable ranges
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: BOCHSCPU
MPTABLE: Product ID: 0.1
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
Processors: 1
[mem 0x08000000-0xfffbffff] available for PCI devices
clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 764551s
Built 1 zonelists, mobility grouping on. Total pages: 31968
Kernel command line: console=ttyS0
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 114084K/130552K available (4097K kernel code, 768K rwdata, 484K rodata, 532K init,)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS: 4352, nr_irqs: 48, preallocated irqs: 16
Console: colour VGA+ 80x25
printk: console [ttyS0] enabled
APIC: Switch to symmetric I/O mode setup
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x73193cf9da3, max_idle_ns: 8s
Calibrating delay loop (skipped), value calculated using timer frequency.. 7984.96 BogoMIP)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
CPU: AMD QEMU Virtual CPU version 2.5+ (family: 0x6, model: 0x6, stepping: 0x3)
Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation availab!
Speculative Store Bypass: Vulnerable
Performance Events: PMU not available due to virtualization, using software events only.
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417851000s
futex hash table entries: 256 (order: 0, 6144 bytes, linear)
clocksource: Switched to clocksource tsc-early
platform rtc_cmos: registered platform RTC device (no PNP device found)
Unpacking initramfs...
workingset: timestamp_bits=62 max_order=15 bucket_order=0
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
random: get_random_bytes called from init_oops_id+0x2f/0x40 with crng_init=0
sched_clock: Marking stable (333996480, 16836404)->(354705829, -3872945)
Freeing initrd memory: 2984K
Freeing unused kernel image (initmem) memory: 532K
Write protecting the kernel read-only data: 8192k
Freeing unused kernel image (text/rodata gap) memory: 2044K
Freeing unused kernel image (rodata/data gap) memory: 1564K
Run /init as init process
Failed to execute /init (error -13)
Run /sbin/init as init process
Run /etc/init as init process
Run /bin/init as init process
Run /bin/sh as init process
Kernel panic - not syncing: No working init found. Try passing init= option to kernel. Se.
CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
Call Trace:
<TASK>
dump_stack+0x20/0x22
panic+0xd0/0x246
? rest_init+0x90/0x90
kernel_init+0xfc/0x100
ret_from_fork+0x1f/0x30
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: No working init found. Try passing init= option to k-
QEMU 6.0.0 monitor - type 'help' for more information
(qemu) quit
Additionally, I have tried very similar steps on a completely different computer, so I'm pretty sure there is something I am missing in general. Both machines are running Gentoo, but I have successfully used QEMU to emulate other existing operating systems on both devices, so I think I can rule out a missing kernel configuration.
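Error -13 is EACCES (permission denied): the kernel found /init but was not allowed to execute it. Add execute permission to the /init file before you build the CPIO initramfs: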
chmod +x <path-to-your-initrd-root>/init
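If the initramfs is assembled by a script, the same check can be automated before packing. This is a minimal sketch, not part of the original answer: the function names are hypothetical, and it adds the execute bit for user, group, and other, which is roughly what chmod a+x does.

```python
import os
import stat

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def is_executable(mode):
    """Return True if any execute bit (user, group, or other) is set."""
    return bool(mode & EXEC_BITS)


def add_execute_bits(mode):
    """Return mode with execute permission added for user, group, and other."""
    return mode | EXEC_BITS


def ensure_init_executable(initrd_root):
    """Make <initrd_root>/init executable; return True if the mode changed."""
    init_path = os.path.join(initrd_root, "init")
    mode = stat.S_IMODE(os.stat(init_path).st_mode)
    if is_executable(mode):
        return False
    os.chmod(init_path, add_execute_bits(mode))
    return True
```

Calling ensure_init_executable on the initrd root directory just before running cpio would catch the case shown in the log above.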

intel SPDK ioat example fail to run

I am new to Intel SPDK and ran into a problem when running the example code.
I set up the BIOS as this page describes:
Intel® Hyper-Threading Technology off
Intel SpeedStep® technology enabled
Intel® Turbo Boost Technology disabled
Then I cloned the repository from this page and ran all the commands. The test command ./test/unit/unittest.sh returns All unit tests passed.
But when I run the example examples/ioat/verify/verify, it returns:
EAL: 24 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
Starting SPDK v18.10-pre / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: verify --no-shconf -c 0x1 --legacy-mem --file-prefix=spdk_pid3170 ]
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid3170/mp_socket
EAL: 24 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
User configuration:
Run time: 10 seconds
Core mask: 0x1
Queue depth: 32
Not enough ioat channels found. Check that ioat channels are bound
to uio_pci_generic or vfio-pci. scripts/setup.sh can help with this.
and scripts/setup.sh status shows
Hugepages
node hugesize free / total
node0 1048576kB 24 / 24
node0 2048kB 0 / 800
node1 1048576kB 0 / 0
node1 2048kB 0 / 224
NVMe devices
BDF Numa Node Driver name Device name
I/OAT DMA
BDF Numa Node Driver Name
virtio
BDF Numa Node Driver Name Device Name
My setup is:
Linux kernel version 4.15.7, with ioatdma compiled as a module
CPU: Intel Xeon E5-2695
Chipset: C612
It would be a great help if somebody could give me some advice or point me to some resources about SPDK!
Thank you!
Run ./scripts/setup.sh (with no parameters). If there are still no ioat devices under the I/OAT DMA section, you can't run this app. Also, there are no hugetlbfs mount points.
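The missing hugetlbfs mount that EAL complains about can be confirmed by looking at /proc/mounts. The sketch below is only an illustration with a hypothetical function name: it lists every mount whose filesystem type is hugetlbfs, together with its options, so you can see whether one exists for the 1 GB page size.

```python
def hugetlbfs_mounts(mounts_text):
    """Return (mount_point, options) for each hugetlbfs line in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "hugetlbfs":
            found.append((fields[1], fields[3]))
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as mounts_file:
        mounts = hugetlbfs_mounts(mounts_file.read())
    if mounts:
        for mount_point, options in mounts:
            print(mount_point, options)
    else:
        print("no hugetlbfs mount points found")
```

An empty result matches the EAL warning in the question.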

Xen PV VM uses max 1 Thread

I am running a CPU benchmark tool (LINPACK) on a PV virtual machine inside Xen 4.5.1 on Ubuntu 15.10 x64 on an IBM x3550 M4 server. This tool should consume all available CPU cycles. I allocate 4 vCPUs by defining this in the Xen PV config (test.cfg). However, LINPACK detects only 1 core and 4 threads, while it should detect at least 4 cores:
CPU frequency: 2.494 GHz
Number of CPUs: 1
Number of cores: 1
Number of threads: 4
This is what lscpu says inside this Xen PV VM:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.062
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-3
Other platforms such as Docker and HVM DO get cores allocated inside the virtual node (see below). These nodes have significantly better performance than the Xen PV virtual node.
CPU frequency: 2.499 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 4
This is lscpu on the Xen host machine (Dom0):
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.062
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: none
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-7
Xen vCPU list:
xl vcpu-list
Domain-0 0 0 7 -b- 10.1 all / all
Domain-0 0 1 2 -b- 6.5 all / all
Domain-0 0 2 4 r-- 2.6 all / all
Domain-0 0 3 0 r-- 3.9 all / all
Domain-0 0 4 3 -b- 4.4 all / all
Domain-0 0 5 6 -b- 2.6 all / all
Domain-0 0 6 5 -b- 4.7 all / all
Domain-0 0 7 7 -b- 2.9 all / all
test 3 0 1 -b- 1.5 0-3 / all
test 3 1 0 -b- 1.8 0-3 / all
test 3 2 0 -b- 0.7 0-3 / all
test 3 3 2 -b- 0.6 0-3 / all
Xen DomU PV VM config:
cat test.cfg
bootloader = '/usr/lib/xen-4.5/bin/pygrub'
vcpus = '4'
memory = '2048'
cpus = "0-3"
Is there any option to give the paravirtualized guest the host CPU topology? In other words, how do I get Xen to expose more cores/vCPUs to the guest?
Thanks!

Which CPUs support NUMA? What are the current server configurations for this kind of CPU?

Which CPUs support NUMA? What are the current server configurations for this kind of CPU? Which Linux commands relate to NUMA, and how do I enable it?
This depends on your server: whether it uses a multicore CPU that supports NUMA affinity. Run numactl --hardware to see the current configuration, for example:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32733 MB
node 0 free: 4027 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 32767 MB
node 1 free: 20898 MB
node distances:
node 0 1
0: 10 21
1: 21 10
If you want to check performance with your application, just make sure that it's using CPUs from the same NUMA node. You can check this using the ps -aux or top commands.
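The "same NUMA node" check can also be scripted. The following sketch assumes the numactl --hardware output format shown above; the function names are hypothetical. It maps each node to its CPUs and reports which nodes a given set of CPUs touches.

```python
import re


def parse_node_cpus(numactl_output):
    """Map NUMA node number -> set of CPU numbers from `numactl --hardware` text."""
    nodes = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line.strip())
        if match:
            cpus = {int(cpu) for cpu in match.group(2).split()}
            nodes[int(match.group(1))] = cpus
    return nodes


def nodes_used(cpus, node_cpus):
    """Return the sorted NUMA nodes that contain at least one of the given CPUs."""
    wanted = set(cpus)
    return sorted(node for node, members in node_cpus.items() if members & wanted)
```

On Linux, os.sched_getaffinity(pid) returns the set of CPUs a process is allowed to run on, so passing that set to nodes_used shows whether the process is confined to a single node (one entry in the result) or may run across several.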

Counting the number of parallel threads in a server

The following is the output of the "lscpu" command on a server machine.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 1200.000
BogoMIPS: 5188.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-31
How many threads can be executed in parallel on this machine? What is the formula to calculate this number from the above parameters?
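The fields in the output above multiply out to the reported CPU(s) value: 2 sockets x 8 cores per socket x 2 threads per core = 32 hardware threads. The sketch below shows that arithmetic on parsed lscpu text. The function names are hypothetical, and it assumes every socket has the same core and thread counts and that all CPUs are online.

```python
def parse_lscpu(text):
    """Turn `lscpu` output into a dict of field name -> value string."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hardware_threads(fields):
    """Sockets x cores per socket x threads per core."""
    return (int(fields["Socket(s)"])
            * int(fields["Core(s) per socket"])
            * int(fields["Thread(s) per core"]))
```

This counts hardware threads, meaning the number of software threads that can be scheduled at the same instant. Two threads sharing one core also share its execution resources, so throughput does not necessarily double with hyper-threading.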
