I am running a CPU benchmark tool (LINPACK) in a PV virtual machine under Xen 4.5.1 on Ubuntu 15.10 x64, on an IBM x3550 M4 server. The tool should consume all available CPU cycles. I allocate 4 vCPUs in the Xen PV guest config (test.cfg). However, LINPACK detects only 1 core with 4 threads, while it should detect at least 4 cores:
CPU frequency: 2.494 GHz
Number of CPUs: 1
Number of cores: 1
Number of threads: 4
This is what lscpu says inside this Xen PV VM:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 4
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.062
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-3
Other platforms such as Docker and HVM DO get cores allocated inside the virtual node (see below). These nodes perform significantly better than the Xen PV virtual node.
CPU frequency: 2.499 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 4
This is the lscpu output on the Xen host machine (Dom0):
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
Stepping: 4
CPU MHz: 2500.062
BogoMIPS: 5000.12
Hypervisor vendor: Xen
Virtualization type: none
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 10240K
NUMA node0 CPU(s): 0-7
Xen vCPU list:
xl vcpu-list
Domain-0 0 0 7 -b- 10.1 all / all
Domain-0 0 1 2 -b- 6.5 all / all
Domain-0 0 2 4 r-- 2.6 all / all
Domain-0 0 3 0 r-- 3.9 all / all
Domain-0 0 4 3 -b- 4.4 all / all
Domain-0 0 5 6 -b- 2.6 all / all
Domain-0 0 6 5 -b- 4.7 all / all
Domain-0 0 7 7 -b- 2.9 all / all
test 3 0 1 -b- 1.5 0-3 / all
test 3 1 0 -b- 1.8 0-3 / all
test 3 2 0 -b- 0.7 0-3 / all
test 3 3 2 -b- 0.6 0-3 / all
Xen DomU PV VM config:
cat test.cfg
bootloader = '/usr/lib/xen-4.5/bin/pygrub'
vcpus = '4'
memory = '2048'
cpus = "0-3"
Are there any options to give the paravirtualized guest the host CPU topology? In other words, how do I get Xen to present the vCPUs to the guest as separate cores instead of threads?
Thanks!
There seems to be a bug in lscpu where it cannot determine the correct number of sockets. I opened an issue for this but have not received a response: https://github.com/karelzak/util-linux/issues/698. This is my output:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 32
NUMA node(s): 5
Model: IBM,9119-MHE
L1d cache: 64K
L1i cache: 32K
NUMA node0 CPU(s): 0-255
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):
Is there another way to go about getting the number of sockets?
A Linux kernel patch currently in testing (as of Feb 26, 2020) fixes this issue.
The patch is this one
Expect it to come out in the upcoming Linux 5.6 kernel release.
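As for another way to count sockets, a minimal sketch is to read the kernel's sysfs topology files directly and count the distinct physical_package_id values. The function name count_sockets and its sysfs_cpu_root parameter are illustrative, not part of any tool. This assumes the kernel exports a correct physical_package_id for every CPU; on a kernel affected by the topology bug above, these values may be just as misleading as the lscpu output.

```python
from pathlib import Path


def count_sockets(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Count distinct physical_package_id values across cpuN directories.

    Assumption: each /sys/devices/system/cpu/cpuN/topology/physical_package_id
    file holds the socket id of that logical CPU.
    """
    package_ids = set()
    # "cpu[0-9]*" matches cpu0, cpu12, ... but skips cpufreq, cpuidle, etc.
    pattern = "cpu[0-9]*/topology/physical_package_id"
    for topo_file in Path(sysfs_cpu_root).glob(pattern):
        package_ids.add(topo_file.read_text().strip())
    return len(package_ids)
```

Calling count_sockets() with no arguments reads the live system; passing another directory is only useful for testing against a copied sysfs tree.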
I am new to Intel SPDK and ran into a problem when running the example code.
I set up the BIOS as this page describes:
Intel® Hyper-Threading Technology off
Intel SpeedStep® technology enabled
Intel® Turbo Boost Technology disabled
Then I cloned the repository from this page and ran all the commands. The test command ./test/unit/unittest.sh returns "All unit tests passed".
But when I run the example examples/ioat/verify/verify, it returns:
EAL: 24 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
Starting SPDK v18.10-pre / DPDK 18.05.0 initialization...
[ DPDK EAL parameters: verify --no-shconf -c 0x1 --legacy-mem --file-prefix=spdk_pid3170 ]
EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/spdk_pid3170/mp_socket
EAL: 24 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
User configuration:
Run time: 10 seconds
Core mask: 0x1
Queue depth: 32
Not enough ioat channels found. Check that ioat channels are bound
to uio_pci_generic or vfio-pci. scripts/setup.sh can help with this.
and scripts/setup.sh status shows
Hugepages
node hugesize free / total
node0 1048576kB 24 / 24
node0 2048kB 0 / 800
node1 1048576kB 0 / 0
node1 2048kB 0 / 224
NVMe devices
BDF Numa Node Driver name Device name
I/OAT DMA
BDF Numa Node Driver Name
virtio
BDF Numa Node Driver Name Device Name
My hardware is:
linux kernel version 4.15.7
with ioatdma compiled as a module
CPU Intel Xeon E5-2695
chipset C612
It would be a great help if somebody could give me some advice or point me to a website about SPDK!
Thank you!
Run ./scripts/setup.sh (with no parameters). If no ioat devices are listed under the I/OAT DMA section, you can't run this app. Also, there are no hugetlbfs mount points.
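To confirm the missing hugetlbfs mount that the EAL warning complains about, a small sketch like the following scans mount-table text for entries of type hugetlbfs. The helper name hugetlbfs_mounts is hypothetical, and the code assumes the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options, ...).

```python
def hugetlbfs_mounts(mounts_text):
    """Return the mount points whose filesystem type is hugetlbfs."""
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "hugetlbfs":
            mount_points.append(fields[1])
    return mount_points
```

On a Linux host it could be called as hugetlbfs_mounts(open("/proc/mounts").read()); an empty list would match the "no mounted hugetlbfs found" message in the log.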
I am getting this error in my debug file. If anybody knows this error, please help me solve it. I am frustrated with it.
ERROR [epollEventLoopGroup-2-51] 2017-11-09 16:09:21,495 Slf4JLogger.java:176 - LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.
I am using G1 garbage collector instead of CMS collector
I have 4 servers
x.x.x.1 contains-------------------------------------------
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2000M"
OS: CentOS - 7
RAM: 142 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Core: 40 Core
Disk: 2.5T
x.x.x.2 contains--------------------------------------------
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4000M"
OS: CentOS - 7
RAM: 125 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Core: 40 Core
Disk: 2.2T
x.x.x.3 contains---------------------------------------------
MAX_HEAP_SIZE="16G"
HEAP_NEWSIZE="4000M"
OS: CentOS - 7
RAM: 125 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Core: 40 Core
Disk: 2 TB
x.x.x.4 contains-----------------------------------------
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="1200M"
OS: CentOS - 7
RAM: 125 GB
Swap: 3 GB
Processor: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Core: 12 Core
Disk: 2.7 TB
jvm options are like this----------------------
-XX:InitiatingHeapOccupancyPercent=70
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=16
log options are like this -----------------------
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:PrintFLSStatistics=1
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
The problem is that one of the servers always goes down.
Thanks and Regards
pavs
Which CPUs support NUMA? What is the current server configuration for this kind of CPU? Which Linux commands relate to NUMA, and how do I enable it?
This depends on your server, namely whether it uses multicore CPUs that support NUMA affinity. Run numactl --hardware to see the current configuration, for example:
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32733 MB
node 0 free: 4027 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 32767 MB
node 1 free: 20898 MB
node distances:
node 0 1
0: 10 21
1: 21 10
If you want to check performance with your application, make sure that it is using CPUs from the same NUMA node. You can check this with the ps -aux or top commands.
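The same-node check can also be scripted. The sketch below parses numactl --hardware output like the example above into a node-to-CPUs map and reports which nodes a given set of CPUs touches. The function names parse_numa_cpus and nodes_used are illustrative, and the parser assumes the "node N cpus: ..." line format shown in that output.

```python
import re


def parse_numa_cpus(numactl_output):
    """Map NUMA node id -> set of CPU ids from `numactl --hardware` text."""
    nodes = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line.strip())
        if match:
            cpu_ids = {int(cpu) for cpu in match.group(2).split()}
            nodes[int(match.group(1))] = cpu_ids
    return nodes


def nodes_used(cpu_ids, nodes):
    """Return the sorted NUMA node ids containing any of the given CPUs."""
    wanted = set(cpu_ids)
    return sorted(node for node, cpus in nodes.items() if cpus & wanted)
```

On Linux, os.sched_getaffinity(pid) returns the set of CPUs a process is allowed to run on; passing that set to nodes_used gives a single node id when the process is confined to one NUMA node.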
Following is an output of the "lscpu" command in a server machine.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 1200.000
BogoMIPS: 5188.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-31
How many threads can execute in parallel on this machine? What is the formula to calculate this number from the parameters above?
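For reference, lscpu's own fields multiply out as Socket(s) × Core(s) per socket × Thread(s) per core, which here is 2 × 8 × 2 = 32 and matches the CPU(s): 32 line. A minimal sketch of that arithmetic follows; the function name logical_cpus is hypothetical, and it assumes a symmetric topology with all CPUs online.

```python
def logical_cpus(lscpu_text):
    """Compute sockets * cores per socket * threads per core from lscpu text."""
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            fields[key.strip()] = value.strip()
    return (
        int(fields["Socket(s)"])
        * int(fields["Core(s) per socket"])
        * int(fields["Thread(s) per core"])
    )
```

This counts hardware threads that can run simultaneously; whether two hyper-threads on one core deliver twice the throughput of one is a separate question that depends on the workload.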