Counting the number of parallel threads in a server - Linux

The following is the output of the "lscpu" command on a server machine.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 1200.000
BogoMIPS: 5188.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-31
How many threads can be executed in parallel on this machine? What is the formula to calculate this number from the parameters above?
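For reference, the usual formula is Sockets × Cores per socket × Threads per core; for the output above that is 2 × 8 × 2 = 32, which matches the "CPU(s):" line. A minimal shell sketch that computes it from lscpu output:
sockets=$(lscpu | awk -F: '/^Socket\(s\)/ {print $2+0}')
cores=$(lscpu | awk -F: '/^Core\(s\) per socket/ {print $2+0}')
threads=$(lscpu | awk -F: '/^Thread\(s\) per core/ {print $2+0}')
echo $((sockets * cores * threads))   # should agree with `nproc --all`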

Related

QEMU Linux Kernel Panic With BusyBox

I am following this guide to run a minimal Linux kernel on top of QEMU and keep running into the same issue. I have also tried other similar tutorials and end up with the same panic. Since the panic seems to be related to the /init script, I have tried swapping it out for a (statically linked) binary, but I still end up with the panic. The only thing I have done differently from the tutorial is using the latest stable version of BusyBox (currently 1.34.1). The full text of the panic is below. If anyone has any insight on this, that would be great!
Linux version 5.15.0+ (michael@tree) (gcc (Gentoo Hardened 10.3.0-r2 p3) 10.3.0, GNU ld (G1
Command line: console=ttyS0
x86/fpu: x87 FPU will use FXSAVE
signal: max sigframe size: 1040
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x0000000007fdffff] usable
BIOS-e820: [mem 0x0000000007fe0000-0x0000000007ffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
NX (Execute Disable) protection: active
SMBIOS 2.8 present.
DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
tsc: Fast TSC calibration using PIT
tsc: Detected 3992.482 MHz processor
last_pfn = 0x7fe0 max_arch_pfn = 0x400000000
x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
found SMP MP-table at [mem 0x000f5c80-0x000f5c8f]
RAMDISK: [mem 0x07cf6000-0x07fdffff]
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x0000000007fdffff]
Normal empty
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x0000000007fdffff]
Initmem setup node 0 [mem 0x0000000000001000-0x0000000007fdffff]
On node 0, zone DMA: 1 pages in unavailable ranges
On node 0, zone DMA: 97 pages in unavailable ranges
On node 0, zone DMA32: 32 pages in unavailable ranges
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: BOCHSCPU
MPTABLE: Product ID: 0.1
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
Processors: 1
[mem 0x08000000-0xfffbffff] available for PCI devices
clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 764551s
Built 1 zonelists, mobility grouping on. Total pages: 31968
Kernel command line: console=ttyS0
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
mem auto-init: stack:off, heap alloc:off, heap free:off
Memory: 114084K/130552K available (4097K kernel code, 768K rwdata, 484K rodata, 532K init,)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
NR_IRQS: 4352, nr_irqs: 48, preallocated irqs: 16
Console: colour VGA+ 80x25
printk: console [ttyS0] enabled
APIC: Switch to symmetric I/O mode setup
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x73193cf9da3, max_idle_ns: 8s
Calibrating delay loop (skipped), value calculated using timer frequency.. 7984.96 BogoMIP)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
CPU: AMD QEMU Virtual CPU version 2.5+ (family: 0x6, model: 0x6, stepping: 0x3)
Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation availab!
Speculative Store Bypass: Vulnerable
Performance Events: PMU not available due to virtualization, using software events only.
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417851000s
futex hash table entries: 256 (order: 0, 6144 bytes, linear)
clocksource: Switched to clocksource tsc-early
platform rtc_cmos: registered platform RTC device (no PNP device found)
Unpacking initramfs...
workingset: timestamp_bits=62 max_order=15 bucket_order=0
Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
random: get_random_bytes called from init_oops_id+0x2f/0x40 with crng_init=0
sched_clock: Marking stable (333996480, 16836404)->(354705829, -3872945)
Freeing initrd memory: 2984K
Freeing unused kernel image (initmem) memory: 532K
Write protecting the kernel read-only data: 8192k
Freeing unused kernel image (text/rodata gap) memory: 2044K
Freeing unused kernel image (rodata/data gap) memory: 1564K
Run /init as init process
Failed to execute /init (error -13)
Run /sbin/init as init process
Run /etc/init as init process
Run /bin/init as init process
Run /bin/sh as init process
Kernel panic - not syncing: No working init found. Try passing init= option to kernel. Se.
CPU: 0 PID: 1 Comm: swapper Not tainted 5.15.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
Call Trace:
<TASK>
dump_stack+0x20/0x22
panic+0xd0/0x246
? rest_init+0x90/0x90
kernel_init+0xfc/0x100
ret_from_fork+0x1f/0x30
</TASK>
Kernel Offset: disabled
---[ end Kernel panic - not syncing: No working init found. Try passing init= option to k-
QEMU 6.0.0 monitor - type 'help' for more information
(qemu) quit
Additionally, I have tried very similar steps on a completely different computer, so I'm pretty sure there is something I am missing in general. Both machines are running Gentoo, but I have successfully used QEMU to emulate other existing operating systems on both devices, so I think I can exclude a missing kernel configuration.
Add execute permission to the /init file before you build the CPIO initramfs. Error -13 is -EACCES (permission denied), which is why the kernel fails to execute /init and falls through to the panic:
chmod +x <path-to-your-initrd-root>/init
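After fixing the permission, rebuild the initramfs so the archive picks up the new mode. A minimal sketch of a typical rebuild and boot, assuming the initramfs tree lives in ./initramfs and the kernel image is ./bzImage (both paths are assumptions, not from the original post):
cd initramfs
find . -print0 | cpio --null -o --format=newc | gzip -9 > ../initramfs.cpio.gz
cd ..
qemu-system-x86_64 -kernel bzImage -initrd initramfs.cpio.gz -append "console=ttyS0" -nographic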

How to determine number of processor sockets on Linux ppc64le

There seems to be a bug in lscpu where it cannot determine the correct number of sockets. I have opened an issue for this but haven't received any response: https://github.com/karelzak/util-linux/issues/698. This is my output:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 32
NUMA node(s): 5
Model: IBM,9119-MHE
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-255
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):
Is there another way to go about getting the number of sockets?
A Linux kernel patch currently in testing (as of Feb 26, 2020) fixes this issue; expect it to land in the next kernel release, 5.6.
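Until that fix lands, a workaround sketch that bypasses lscpu entirely: every online CPU exposes its package ID in sysfs, so counting the distinct physical_package_id values gives the socket count (assuming the firmware populates the topology correctly):
cat /sys/devices/system/cpu/cpu*/topology/physical_package_id | sort -un | wc -l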

How to fix Cassandra debug.log error LEAK: ByteBuf.release()

I am getting the following error in my debug file. If anybody recognizes this error, please help me solve it; I am getting frustrated with it.
ERROR [epollEventLoopGroup-2-51] 2017-11-09 16:09:21,495 Slf4JLogger.java:176 - LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.
I am using the G1 garbage collector instead of the CMS collector.
I have 4 servers:
x.x.x.1 contains-------------------------------------------
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2000M"
OS: CentOS - 7
RAM: 142 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Core: 40 Core
Disk: 2.5T
x.x.x.2 contains--------------------------------------------
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4000M"
OS: CentOS - 7
RAM: 125 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Core: 40 Core
Disk: 2.2T
x.x.x.3 contains---------------------------------------------
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4000M"
OS: CentOS - 7
RAM: 125 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Core: 40 Core
Disk: 2 TB
x.x.x.4 contains-----------------------------------------
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"
OS: CentOS - 7
RAM: 125 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Core: 12 Core
Disk: 2.7 TB
jvm options are like this----------------------
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=16
log options are like this -----------------------
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
The problem is that one of the servers always goes down.
Thanks and Regards
pavs
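The message itself suggests the first diagnostic step: enable Netty's advanced leak detector so the log records where the leaked ByteBuf was last accessed. A sketch assuming a typical install where extra JVM flags are appended in cassandra-env.sh (the exact file and path vary by installation):
# cassandra-env.sh -- enable Netty's advanced leak reporting, then restart Cassandra
JVM_OPTS="$JVM_OPTS -Dio.netty.leakDetection.level=advanced"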

Xen PV VM uses max 1 Thread

I am running a CPU benchmark tool (LINPACK) on a PV virtual machine inside Xen 4.5.1 on Ubuntu 15.10 x64 on an IBM x3550 M4 server. This tool should consume all available CPU cycles. I allocate 4 vCPUs in the Xen PV config file (test.cfg). However, LINPACK detects only 1 core and 4 threads, while it should detect at least 4 cores:
CPU frequency: 2.494 GHz
Number of CPUs: 1
Number of cores: 1
Number of threads: 4
This is what lscpu says inside this Xen PV VM:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.062
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-3
Other platforms such as Docker and HVM DO get cores allocated inside the virtual node (see below). These nodes have significantly better performance than the Xen PV virtual node.
CPU frequency: 2.499 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 4
This is lscpu on the Xen host machine (Dom0):
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.062
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: none
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-7
Xen vCPU list:
xl vcpu-list
Name ID VCPU CPU State Time(s) Affinity (Hard / Soft)
Domain-0 0 0 7 -b- 10.1 all / all
Domain-0 0 1 2 -b- 6.5 all / all
Domain-0 0 2 4 r-- 2.6 all / all
Domain-0 0 3 0 r-- 3.9 all / all
Domain-0 0 4 3 -b- 4.4 all / all
Domain-0 0 5 6 -b- 2.6 all / all
Domain-0 0 6 5 -b- 4.7 all / all
Domain-0 0 7 7 -b- 2.9 all / all
test 3 0 1 -b- 1.5 0-3 / all
test 3 1 0 -b- 1.8 0-3 / all
test 3 2 0 -b- 0.7 0-3 / all
test 3 3 2 -b- 0.6 0-3 / all
Xen DomU PV VM config:
cat test.cfg
bootloader = '/usr/lib/xen-4.5/bin/pygrub'
vcpus = '4'
memory = '2048'
cpus = "0-3"
Are there any options to give the paravirtualized guest the host CPU topology? In other words, how do I get Xen to use more vCPU cores/vCPUs?
Thanks!
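Not a topology fix, but one thing worth trying while investigating: pin each vCPU of the guest to a distinct physical CPU so the scheduler cannot stack the four vCPUs on one core. A sketch using the standard xl vcpu-pin syntax (the domain name test comes from the config above; the 1:1 mapping is an assumption):
xl vcpu-pin test 0 0
xl vcpu-pin test 1 1
xl vcpu-pin test 2 2
xl vcpu-pin test 3 3
xl vcpu-list test    # verify the new hard affinity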

different CPU cache size reported by /sys/devices/ and dmidecode

I'm trying to get the size of each cache level in my system. I tried two techniques.
a) Using information from /sys/devices. Here is the output:
$ cat /sys/devices/system/cpu/cpu0/cache/index1/size
32K
$ cat /sys/devices/system/cpu/cpu0/cache/index2/size
256K
$ cat /sys/devices/system/cpu/cpu0/cache/index3/size
8192K
b) Using information from dmidecode
$ sudo dmidecode -t cache
Cache Information
Socket Designation: CPU Internal L1
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Through
Location: Internal
Installed Size: 256 KB
Maximum Size: 256 KB
< .... >
Cache Information
Socket Designation: CPU Internal L2
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Through
Location: Internal
Installed Size: 1024 KB
Maximum Size: 1024 KB
< .... >
Cache Information
Socket Designation: CPU Internal L3
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 8192 KB
Maximum Size: 8192 KB
< .... >
The sizes reported for the L1 and L2 caches differ between the two methods (L3 matches at 8192K). Any ideas as to a) why this discrepancy exists, and b) which method gives the correct value?
Other related information:
$ uname -a
Linux 3.0.0-14-generic #23somerville3-Ubuntu SMP Mon Dec 12 09:20:18 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
cpu MHz : 2400.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 6784.23
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
< ... >
A few things:
You have a quad-core CPU.
The index<n> name in /sys/devices/system/cpu/cpu<n>/cache does not correspond to L1/L2/L3 etc. There is a .../index<n>/level file that will tell you the level of each cache.
Your L1 cache is split into two caches (likely index0 and index1), one for data and one for instructions (see .../index<n>/type), per core. 4 cores * 2 caches * 32K matches the 256K that dmidecode reports.
The L2 cache is split per core. 4 cores * 256K (from index2) = 1024K, which matches dmidecode's L2 number. In short, sysfs reports the size of each individual cache, while dmidecode reports package-wide totals; both are correct.
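A small sketch that makes the sysfs layout explicit by printing the level, type, and size of every cache index on cpu0 (level, type, and size are standard sysfs attribute files):
for d in /sys/devices/system/cpu/cpu0/cache/index*; do
    echo "$d: L$(cat $d/level) $(cat $d/type) $(cat $d/size)"
done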
