How to check what is causing performance issue - linux

I have inherited a server with some performance issues. It is running node js, nginx, basic MEAN stack. (DB on another server though)
Whenever I copied a file (log file with size of around 150MB) or vim a file with that size, the output of "iostat -x 1" will be like below
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 8137.62 0.00 49.50 0.00 29924.75 604.48 17.32 123.54 16.50 81.68
avg-cpu: %user %nice %system %iowait %steal %idle
1.59 0.00 24.34 0.00 0.00 74.07
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 39.39 0.00 36606.06 929.23 2.42 351.64 1.87 7.37
avg-cpu: %user %nice %system %iowait %steal %idle
2.78 0.00 24.44 0.00 0.00 72.78
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The main reason I am bringing this up is because sometimes a simple RESTFUL API that my nodejs is serving will respond slowly (from 10ms to 500ms) and I am not sure what to look for or check what is causing this.
The same codebase copied to another server will run smoothly without issues, the problem mentioned above is the only lead I can find that there might be something wrong with the server but I am not sure what is it.
The codes are like below:
In one file statistic/index.js:
var tracker = function (data) {
piwik.tracker(data);
};
exports.tracker = tracker;
In another file statistic/piwik.js:
exports.tracker = function (data) {
var params = {};
/** Assign params with data - just static string **/
/** API_URL is another machine in same LAN **/
needle.post(API_URL, params, function (err, response, body) {
if (err || (response.statusCode !== 200 && response.statusCode !== 204)) {
util.error(err);
}
});
};
In the file that is calling the above route/getuser.js:
exports.getUser = function (req, res) {
async.auto({
get_user: function (cb) {
/** Read user data from DB **/
cb();
},
record_statistic: ['get_user', function (cb) {
statistic.tracker({ /** Pass static string data **/});
cb();
}]
}, function (err) {
if (err) {
res.json(err);
} else {
res.json();
}
});
};
The reason I am mentioning the above codes is because when I remarked out statistic.tracker({ /** Pass static string data **/}); The function will response within 50ms, but if I include it most of the time it will respond between 100ms - 500ms. I have even put a timestamp check for the "needle" HTTP post, and it respond within 10 - 20ms.
When I copy a file (cp -p x.txt y.txt) especially when it is a > 100MB file, it will also slow down my node js. But even when I am not doing anything on the server besides nginx and node js listening for request the codes below will respond slowly. (If I didn't remark out the code)
I am suspecting IO but where else to check? or what to look out for?
Below are some info about the server:
[ec2-user#tlp-backend logs]$ uname -a
Linux tlp-backend 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
[ec2-user#tlp-backend logs]$ cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: NECVMWar Model: VMware IDE CDR10 Rev: 1.00
Type: CD-ROM ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: VMware, Model: VMware Virtual S Rev: 1.0
Type: Direct-Access ANSI SCSI revision: 02
[ec2-user#tlp-backend logs]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 30502
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[ec2-user#tlp-backend logs]$ more /proc/meminfo
MemTotal: 3918960 kB
MemFree: 392260 kB
Buffers: 296116 kB
Cached: 1205652 kB
SwapCached: 364 kB
Active: 1725084 kB
Inactive: 1155564 kB
Active(anon): 949492 kB
Inactive(anon): 430528 kB
Active(file): 775592 kB
Inactive(file): 725036 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 4095992 kB
SwapFree: 4092872 kB
Dirty: 28 kB
Writeback: 0 kB
AnonPages: 1378528 kB
Mapped: 29860 kB
Shmem: 1140 kB
Slab: 588628 kB
SReclaimable: 461108 kB
SUnreclaim: 127520 kB
KernelStack: 2296 kB
PageTables: 16940 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 6055472 kB
Committed_AS: 1829320 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 288456 kB
VmallocChunk: 34359446140 kB
HardwareCorrupted: 0 kB
AnonHugePages: 507904 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 10240 kB
DirectMap2M: 4184064 kB
[ec2-user#tlp-backend logs]$ more /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5645 # 2.40GHz
stepping : 2
cpu MHz : 2400.000
cache size : 12288 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch
_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf unfair_
spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lah
f_lm ida arat
bogomips : 4800.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5645 # 2.40GHz
stepping : 2
cpu MHz : 2400.000
cache size : 12288 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch
_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf unfair_
spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lah
f_lm ida arat
bogomips : 4800.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5645 # 2.40GHz
stepping : 2
cpu MHz : 2400.000
cache size : 12288 KB
physical id : 1
siblings : 2
core id : 0
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch
_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf unfair_
spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lah
f_lm ida arat
bogomips : 4800.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5645 # 2.40GHz
stepping : 2
cpu MHz : 2400.000
cache size : 12288 KB
physical id : 1
siblings : 2
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc arch
_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf unfair_
spinlock pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lah
f_lm ida arat
bogomips : 4800.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
top - 18:02:58 up 26 days, 18:49, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 176 total, 1 running, 175 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3918960k total, 3522368k used, 396592k free, 295340k buffers
Swap: 4095992k total, 3120k used, 4092872k free, 1201660k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 19340 1248 1040 S 0.0 0.0 0:03.45 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:02.55 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:01.57 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
6 root RT 0 0 0 0 S 0.0 0.0 0:18.87 migration/1
7 root 20 0 0 0 0 S 0.0 0.0 0:01.34 ksoftirqd/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
9 root RT 0 0 0 0 S 0.0 0.0 0:01.90 migration/2
10 root 20 0 0 0 0 S 0.0 0.0 0:01.30 ksoftirqd/2
11 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/2

Apparently, the main problem is coming from the code itself. When you are mixing async.auto with needle package, you need to explicitly stating { connection: 'keep-alive' } in HTTP header
See more info here: https://github.com/tomas/needle/issues/148

Related

Yocto sato image is running extremely slow, is there a way to improve its performance?

We have a legacy device with following configurations:
Chipset Architecture : Intel NM10 express
OS : Yocto warrior
CPU : Atom D2250 Dual Core
Volatile Memory : 2GB DDR3
CPU core : 4
After installing yocto sato image on device, we are seeing sluggish behavior. We are unsure whether it is because of hardware limitations or anything else.
If this slow behavior is because of yocto sato image then,
Is there something which can be done to improve performance?
Additional information:
Output(only first core) of cat /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 54
model name : Intel(R) Atom(TM) CPU D2550 # 1.86GHz
stepping : 1
microcode : 0x10d
cpu MHz : 1861.758
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm arat
bugs :
bogomips : 3723.54
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
Output of cat /proc/meminfo:
MemTotal: 917184 kB
MemFree: 700044 kB
MemAvailable: 761264 kB
Buffers: 5928 kB
Cached: 74112 kB
SwapCached: 0 kB
Active: 129856 kB
Inactive: 44956 kB
Active(anon): 95096 kB
Inactive(anon): 5720 kB
Active(file): 34760 kB
Inactive(file): 39236 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 102980 kB
HighFree: 220 kB
LowTotal: 814204 kB
LowFree: 699824 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 16 kB
Writeback: 0 kB
AnonPages: 94816 kB
Mapped: 42624 kB
Shmem: 6048 kB
Slab: 17692 kB
SReclaimable: 9124 kB
SUnreclaim: 8568 kB
KernelStack: 976 kB
PageTables: 1676 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 458592 kB
Committed_AS: 348404 kB
VmallocTotal: 122880 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Percpu: 528 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 22520 kB
DirectMap2M: 884736 kB
Output of cat /proc/uptime:
00:44:08 up 29 min, load average: 0.51, 0.78, 0.67
Edit1 : lubuntu is running as expected i.e. No sluggish behavior is seen on device
Edit2 : Graphics gets faster after removing libglamoregl.so. What is there in this library which is causing graphics performance issue.

isolcpus - binding not working

Sorry, this is a duplicate question from ServerFault. I am posting this here since I didn't get any response there.
Question:
I am using isolcpus to isolate cores. I would like to bind specific threads to cores, but it is not working. The threads are moved to different cores after I bind them.
Cores 13, 14, and 15 are isolated:
$ cat /proc/cmdline
ro root=/dev/mapper/vg0-root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg0/swaprd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=137M#0M rd_NO_DM KEYBOARDTYPE=pc KEYTABLE=us rd_LVM_LV=vg0/root rhgb quiet audit=0 intel_idle.max_cstate=0 console=tty0 console=ttyS1,115200 printk.time=1 processor.max_cstate=1 idle=poll biosdevname=0 isolcpus=13-15
top -H -p pgrep -u prusr12 Ser -d 1 shows this: 5017 and 5018 should have been bound to 14 and 15 and 5014 and 5016 should have been on 13.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
5017 prusr12 20 0 1312m 1.1g 1.1g R 99.9 0.9 9:53.93 5 Server-3.10.
5018 prusr12 20 0 1312m 1.1g 1.1g R 99.9 0.9 10:08.88 7 Server-3.10.
5014 prusr12 20 0 1312m 1.1g 1.1g S 0.0 0.9 0:00.40 2 Server-3.10.
5016 prusr12 20 0 1312m 1.1g 1.1g S 0.0 0.9 0:01.04 4 Server-3.10.
The command line is this:
sg devuser "taskset -c 13 /releases/3.10.0/bin/Server-3.10.0 -n X -e DEV -p DEFAULT > /logs/ServerDevPR_DEFAULT.out 2>&1 &"
There are 4 threads in the process. I want the main thread to start on 13, hence taskset -c 13. Then two threads are spawed and will bind them to 14 and 15. I see that the threads were bound to 14 and 15, but then they were moved to other cores. pthread_setaffinity_np() is being used to bind the threads to cores.
Log after I bind the threads to 14 and 15:
CpuSet returned by pthread_getaffinity_np() contained:CPU 14
CpuSet returned by pthread_getaffinity_np() contained:CPU 15
System details:
$ uname -a
Linux host123 2.6.32-573.12.1.el6.x86_64 #1 SMP Mon Nov 23 12:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Stepping: 2
CPU MHz: 3199.847
BogoMIPS: 6399.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
What could be going wrong? Thanks for your time.

how to enable linux perf tool's branch sampling

I use linux perf tool to collect branch info of programs, and the command and result is as follow:
$ sudo perf record -b /bin/ls
Error:
No hardware sampling interrupt available.
No APIC? If so then you can boot the kernel with the "lapic" boot parameter to force-enable it.
the content in /pro/cpuinfo is below:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5405 # 2.00GHz
stepping : 10
microcode : 0xa07
cpu MHz : 1994.921
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 **apic** sep mtrr pge mca cmov pat pse36 clflush dts acpi strong text mmx fxsr sse sse2 ss ht tm
pbe syscall nx lm constant_tsc arch_perfmon pebs **bts** rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm
dca sse4_1 xsave lahf_lm dtherm tpr_shadow vnmi flexpriority
bugs :
bogomips : 3989.84
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
apic and bts in flags entry is strengthened(I want but just encapsuled by "**") and I don't know what else is import for this case. And the other 7 processors are same to processor 0.
The boot parameter "lapic" is added by modifying /boot/grub/grub.cfg:
menuentry 'Ubuntu' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-0ed8a872-4eb7-4339-a0bb-6c0033da582e' {
recordfail
load_video
gfxmode $linux_gfx_mode
insmod gzio
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
if [ x$feature_platform_search_hint = xy ]; then
search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 ced80bc6-08a9-4909-9717-97658cf0c4fd
else
search --no-floppy --fs-uuid --set=root ced80bc6-08a9-4909-9717-97658cf0c4fd
fi
linux /vmlinuz-4.2.0-42-generic root=/dev/mapper/fedora_hustyong-root ro **lapic** quiet splash $vt_handoff
initrd /initrd.img-4.2.0-42-generic
}
just add lapic in linux entry.
But no sense after rebooting.
My questions:
1) What does the error info means?
2) Does the perf tool branch sampling use Intel Branch Trace Store(BTS)? Or Last Branch Record(LBR)?
3) How can I look up the LBR support?
4) what is different of the LBR and BTS support between x86 32bit and 64bit?
My OS is Ubuntu 14.04 64bit:
$ uname -a
Linux user-S5000VSA 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
The perf install instructions:
$ sudo apt-get install linux-tools-common
$ sudo apt-get install linux-tools-4.2.0-27-generic linux-cloud-tools-4.2.0-27-generic
update:
the content of /proc/interrupts:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 127780 52084 127784 126729 127706 128431 127785 126822 IO-APIC 2-edge timer
1: 52 42 3 2 62 49 5 2 IO-APIC 1-edge i8042
8: 0 0 0 0 0 0 0 1 IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 1428 1307 52 47 1424 1324 53 58 IO-APIC 12-edge i8042
14: 0 0 0 0 0 0 0 0 IO-APIC 14-edge ata_piix
15: 0 0 0 0 0 0 0 0 IO-APIC 15-edge ata_piix
17: 47 276 1004 49 52 295 993 50 IO-APIC 17-fasteoi radeon
20: 31062 4201 7533 29935 31080 4297 7540 29824 IO-APIC 20-fasteoi ata_piix
22: 0 0 0 0 0 0 0 0 IO-APIC 22-fasteoi uhci_hcd:usb3, uhci_hcd:usb5
23: 0 0 0 0 0 0 0 0 IO-APIC 23-fasteoi ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb4
25: 2 755654 3 3 1 1 3 6 PCI-MSI 2621440-edge eth0
27: 0 0 0 0 0 0 0 1 PCI-MSI 131072-edge ioat-msi
NMI: 6756 678 6894 5867 861 2168 4994 3700 Non-maskable interrupts
LOC: 343554 578094 1736638 773135 219952 777567 1459249 689292 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 6756 678 6894 5867 861 2168 4994 3700 Performance monitoring interrupts
IWI: 6756 678 6894 5867 861 2168 4994 3700 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 82594 294601 142535 259797 77845 316210 84927 261455 Rescheduling interrupts
CAL: 4749 9296 7358 31330 7560 8564 5751 20364 Function call interrupts
TLB: 5933 2044 12867 11215 6563 4682 8669 8272 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 292 292 292 292 292 292 292 292 Machine check polls
HYP: 0 0 0 0 0 0 0 0 Hypervisor callback interrupts
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
I install ubuntu 16.10 64bit in my PC and run perf record -b successfully. I think maybe it's wrong in kernel or linux-tools-4.2.0-27-generic or linux-cloud-tools-4.2.0-27-generic package.

How do I find out how many CPUs (vcpus) my XEN system has?

If I start top or look into /proc/cpuinfo, I see only two CPUs. If I view the values displayed for my system with virt-manager, that tool shows me 32 vcpus (which is the value I think is correct).
I failed (yet) to find out on a script-level on the hypervisor that correct value (32). I have been looking into /proc/cpuinfo and /sys/devices/system/cpu/ and other things I could think of, but found that value nowhere. Also the shell commands like xen or xm I examined closely, but found no way to display the value I'm looking for.
Does anybody know how I can find out how many vcpus my XEN system provides?
EDIT: lscpu gives me:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2900.086
BogoMIPS: 5800.17
Hypervisor vendor: Xen
Virtualization type: none
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0,1
So, this also does not show the "32" anywhere.
The contents of /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2690 0 # 2.90GHz
stepping : 7
microcode : 0x70d
cpu MHz : 2900.086
cache size : 20480 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat pln pts dtherm
bogomips : 5800.17
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2690 0 # 2.90GHz
stepping : 7
microcode : 0x70d
cpu MHz : 2900.086
cache size : 20480 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat pln pts dtherm
bogomips : 5800.17
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
I found a way now:
IFS=: read _ nr_cpus < <(xm info | grep ^nr_cpus)
echo "$nr_cpus"
This will (finally) print
32
on my system.
This is a nice way:
cat /proc/cpuinfo | grep "core id"
This is an example of the output:
core id : 0
core id : 1
core id : 0
core id : 1
In my case, the system is dual-core, I have only core-id 0 and core-id 1.

VmSize = physical memory + swap?

I have a little question regarding VmSize, in the documentation it's supposed to be the application's usage of memory.
However on my system:
VmSize = physical memory + swap
VmHWM seems more like what the application actually would be using.
[root#sun ~]# free -m
total used free shared buffers cached
Mem: 12012 9223 2788 0 613 1175
-/+ buffers/cache: 7434 4577
Swap: 3967 0 3967
[root#sun ~]# cat /proc/8268/status
Name: mysqld
State: S (sleeping)
Tgid: 8268
Pid: 8268
PPid: 1
TracerPid: 0
Uid: 89 89 89 89
Gid: 89 89 89 89
FDSize: 512
Groups: 89
VmPeak: 15878128 kB
VmSize: 15878128 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 7036312 kB
VmRSS: 7036312 kB
VmData: 15839272 kB
VmStk: 136 kB
VmExe: 10744 kB
VmLib: 6356 kB
VmPTE: 16208 kB
VmSwap: 0 kB
Threads: 265
SigQ: 0/96048
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000087007
SigIgn: 0000000000001000
SigCgt: 00000001800066e9
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: fff
Cpus_allowed_list: 0-11
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 2567
nonvoluntary_ctxt_switches: 77
Any idea of why?
I try to get the usage of memory for this application in particular but this result doesn't really make sense.
Thanks.
VMsize is the "address space" that the process has in use: the number of available adresses. These addresses do not have to have any physical memory attached to them. (Attached physical memory is the RSS figure)
You can verify this by allocating a chunk of memory with p = malloc(4 * 1024 * 1024);, and not doing anything to *p: the VmSize will increase by 1K pages, but the RSS will be (about) the same. (your program will have more adressable memory, but it does not address it, so the memory does not need to be attached )
VmSize is the sum of all mapped memory (/proc/pid/maps)

Resources