How to find the L3 Cache parameters on Intel CPUs?

I would like to know the following L3 cache parameters, but I am not sure how to get them. I have also pasted the /proc/cpuinfo output (4 processors; only the first is shown, since the others repeat it).
CACHE_SIZE
LINE_SIZE
Associativity
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
stepping : 9
microcode : 0x15
cpu MHz : 1200.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 5786.68
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
UPDATE 1: It seems that someone posted the cache size and associativity here:
http://www.cpu-world.com/CPUs/Core_i7/Intel-Core%20i7-3520M%20%28PGA%29%20Mobile%20processor.html
But I still don't know the line size.

Hwloc / lstopo
Hwloc (Portable Hardware Locality) is a small package whose lstopo tool reports the structure of the processor in a neat visual diagram. The diagram shows the cores, hyperthreads and cache sizes. A single diagram tells it all.
$ sudo apt-get install hwloc
$ lstopo
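The line size and associativity asked about in the question can also be read from the kernel's sysfs cache directory. The following is a minimal sketch, assuming a Linux kernel that exposes /sys/devices/system/cpu/cpu0/cache/index*/ with the files named below; the function name read_cache_info is made up for this example.

```python
from pathlib import Path


def read_cache_info(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return one dict per cache that sysfs lists for the given CPU."""
    caches = []
    for index in sorted(Path(cpu_dir, "cache").glob("index*")):

        def field(name):
            # Not every kernel exposes every file, so a missing one becomes None.
            f = index / name
            return f.read_text().strip() if f.exists() else None

        caches.append({
            "level": field("level"),                        # 1, 2, 3, ...
            "type": field("type"),                          # Data, Instruction, Unified
            "size": field("size"),                          # CACHE_SIZE, e.g. "4096K"
            "line_size": field("coherency_line_size"),      # LINE_SIZE in bytes
            "associativity": field("ways_of_associativity"),
            "sets": field("number_of_sets"),
        })
    return caches


if __name__ == "__main__":
    for cache in read_cache_info():
        print(cache)
```

The entry whose "level" is "3" holds the L3 parameters. On a machine without this sysfs directory the function simply returns an empty list.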

Related

Tensorflow 2.4 doesn't work despite my cpu having AVX support

I am running Ubuntu 20.04.1 LTS with TensorFlow 2.4.0, and I cannot use the library because importing it always fails with an error.
This is the only line I run:
import tensorflow as tf
This is the error it produces:
Illegal instruction (core dumped)
These are the processor specs:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i5-3437U CPU @ 1.90GHz
stepping : 9
microcode : 0x21
cpu MHz : 842.451
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 4789.04
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
If more information is needed, I will provide it.

I have the same problem running TensorFlow 2.4.0 on Ubuntu 18.04.4 LTS. I have not found a solution yet, so for now I am using the previous version, which works for me.
pip uninstall tensorflow
pip install tensorflow==2.3.1

It should be fixed now in 2.4.1
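An "Illegal instruction" crash usually means the binary executed an instruction the CPU does not implement, so it can help to compare the flags line against the extensions a build might use. The sketch below only parses /proc/cpuinfo text. The default tuple of required extensions is an illustrative assumption, not a statement about what any particular TensorFlow wheel needs, and the function names are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags listed for the first processor entry."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text, required=("avx", "avx2", "fma")):
    """List the entries of `required` that the CPU does not advertise.

    The default `required` tuple is a guess chosen for illustration.
    """
    flags = cpu_flags(cpuinfo_text)
    return [ext for ext in required if ext not in flags]


# Shortened excerpt of the flags line posted in the question.
sample = "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 avx f16c"
print(missing_extensions(sample))  # ['avx2', 'fma']
```

For the CPU in the question, the full flags line lists avx but neither avx2 nor fma, so a build that used either of those would fail in this way even though AVX itself is present.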

Why do I have no scaling_governor?

I am running V8's benchmark program, and I ran the following command:
./tools/cpu.sh fast
It prints:
Setting CPU frequency governor to "ondemand"
./tools/cpu.sh: line 13: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: no such file or directory
Then I ran:
# ls /sys/devices/system/cpu/cpu0
cache crash_notes crash_notes_size microcode node0 power subsystem topology uevent
and found that there is no "cpufreq" directory.
After some searching, I found that I should install cpufrequtils, so I ran:
yum install cpufrequtils
After that, it still does not work, so I wonder what is wrong here.
My system is:
# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2 (Final)
Release: 7.2
Codename: Final
And my cpuinfo is:
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Xeon(R) Gold 61xx CPU
stepping : 3
microcode : 0x1
cpu MHz : 2499.998
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap xsaveopt xsavec xgetbv1 arat
bogomips : 4999.99
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

Whether the governor interface is exposed at all, and which governors are available, depends on your kernel configuration. I don't know any specifics about CentOS. On Debian/Ubuntu the governors should be available by default (they were the last time I checked).
Maybe https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/power_management_guide/cpufreq_governors helps?
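One more thing worth checking: the flags line in the question contains "hypervisor", which Linux reports when it runs as a virtual machine guest. Guests frequently have no cpufreq driver loaded, which would be consistent with the missing directory, and installing the userspace cpufrequtils package does not create that sysfs directory. This is an inference from the posted output, not something the question confirms. A minimal sketch of the check, with a made-up function name:

```python
from pathlib import Path


def cpufreq_status(cpu_dir="/sys/devices/system/cpu/cpu0",
                   cpuinfo="/proc/cpuinfo"):
    """Report the current governor (if any) and whether a hypervisor is flagged."""
    governor = Path(cpu_dir, "cpufreq", "scaling_governor")
    info = Path(cpuinfo)
    text = info.read_text() if info.exists() else ""

    flags = set()
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            break

    return {
        # None means the kernel exposes no cpufreq interface for this CPU.
        "governor": governor.read_text().strip() if governor.exists() else None,
        # True means the CPU flags say the system runs under a hypervisor.
        "virtualized": "hypervisor" in flags,
    }


if __name__ == "__main__":
    print(cpufreq_status())
```

If "governor" is None and "virtualized" is True, frequency scaling is most likely controlled by the host rather than by the guest.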

What does 'The use of this feature is restricted' mean?

I run XenServer 6.1 on my two servers.
I want to use live migration, but the servers cannot join the same resource pool.
So I tried the CPU masking feature, but that does not work either.
My first server's info is:
[server-1]
cpu_count : 32
vendor: GenuineIntel
speed: 2000.066
modelname: Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz
family: 6
model: 62
stepping: 4
flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat
clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc
nonstop_tsc aperfmperf pni pclmulqdq vmx est ssse3 sse4_1 sse4_2
x2apic popcnt aes hypervisor ida arat tpr_shadow vnmi
flexpriority ept vpid
features: 77bee3ff-bfebfbff-00000001-2c100800
features_after_reboot: 77bee3ff-bfebfbff-00000001-2c100800
physical_features: 77bee3ff-bfebfbff-00000001-2c100800
maskable: no
My second server's info is:
[server-2]
cpu_count : 24
vendor: GenuineIntel
speed: 2000.040
modelname: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
family: 6
model: 45
stepping: 7
flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush
acpi mmx fxsr sse sse2 ss ht nx constant_tsc nonstop_tsc
aperfmperf pni pclmulqdq vmx est ssse3 sse4_1 sse4_2 x2apic
popcnt aes hypervisor ida arat tpr_shadow vnmi flexpriority
ept vpid
features: 17bee3ff-bfebfbff-00000001-2c100800
features_after_reboot: 17bee3ff-bfebfbff-00000001-2c100800
physical_features: 17bee3ff-bfebfbff-00000001-2c100800
maskable: full
I ran this command on server 1:
xe host-set-cpu-features features=17bee3ff-bfebfbff-00000001-2c100800 uuid=6c91e5c8-06b9-4b5c-a41d-ec4d6b2c44aa
The result is 'The use of this feature is restricted'.
And I ran this command on server 2:
xe host-set-cpu-features features=77bee3ff-bfebfbff-00000001-2c100800 uuid=53566e64-a24f-42a4-8a6d-a26e9f740fa8
The result is the same.
What does the message 'The use of this feature is restricted' mean?
How can I use CPU masking in my environment?

I encountered the same error message when trying to join a host with a cpu model E5503 to a pool with two hosts with E5645's. I was not able to get past this error with XenServer 6.1, but after upgrading the pool and the lone host to 6.2 I was able to apply a mask and join the third host to the pool without further issue.
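When comparing hosts before applying a mask, it can help to see exactly where the two feature strings differ. The sketch below treats each string as four 32-bit hexadecimal words and assumes that a mask usable by both hosts is the bitwise AND of the two; that assumption and the function names are illustrative and not taken from XenServer documentation.

```python
def parse_features(text):
    """Split a dash-separated feature string into 32-bit integers."""
    return [int(word, 16) for word in text.split("-")]


def common_features(a, b):
    """Bitwise AND of two feature strings, formatted like the input."""
    words = [x & y for x, y in zip(parse_features(a), parse_features(b))]
    return "-".join(f"{w:08x}" for w in words)


def differing_bits(a, b):
    """Return (word index, bit index) pairs where the two strings differ."""
    diffs = []
    for i, (x, y) in enumerate(zip(parse_features(a), parse_features(b))):
        delta = x ^ y
        diffs.extend((i, bit) for bit in range(32) if (delta >> bit) & 1)
    return diffs


server1 = "77bee3ff-bfebfbff-00000001-2c100800"
server2 = "17bee3ff-bfebfbff-00000001-2c100800"
print(common_features(server1, server2))  # 17bee3ff-bfebfbff-00000001-2c100800
print(differing_bits(server1, server2))   # [(0, 29), (0, 30)]
```

Here the common set equals server 2's features, so under that assumption server 1 is the host that would need masking, and it is also the one reporting "maskable: no".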

For MPI hosts file, how many slots

I have 2 computers that I am using in a small cluster. Each has 2 Intel Xeon quad-core processors.
I just wanted to verify that in my hosts file, I should specify 8 slots per host.
The tail of /proc/cpuinfo looks like the following (with processors 0-6 before this):
.... processors 0-6 above ....
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 6
microcode : 0x606
cpu MHz : 1998.000
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dtherm tpr_shadow vnmi flexpriority
bogomips : 4987.49
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

8 slots per node would certainly be a valid way of doing things.
You could also use fewer if you plan to do something like MPI + threads. It just depends on your application. In general, though, planning for 1 rank per core is a safe way to go.
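To confirm the core count rather than infer it from the tail of the file, the distinct (physical id, core id) pairs in /proc/cpuinfo can be counted. The following is a minimal sketch with made-up function names; the "slots=" hostfile syntax is the Open MPI style, which is an assumption because the question does not say which MPI implementation is in use.

```python
def count_cores(cpuinfo_text):
    """Return (logical CPUs, physical cores) found in /proc/cpuinfo text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value              # socket of the current entry
        elif key == "core id":
            cores.add((physical_id, value))  # one pair per physical core
    return logical, len(cores)


def hostfile_line(host, slots):
    """Format one hostfile entry in the Open MPI 'slots=' style (assumed)."""
    return f"{host} slots={slots}"
```

For the machines in the question (2 sockets with 4 cores each and no hyperthreading, since siblings equals cpu cores) this should report 8 logical CPUs and 8 cores, which gives a line such as `hostfile_line("node1", 8)`, where "node1" is a placeholder host name. If the file lacks the "physical id" and "core id" fields, the core count comes back as 0 and the logical count is the better guide.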

Different CPU cache sizes reported by /sys/devices/ and dmidecode

I'm trying to get the size of each cache level in my system.
I tried two techniques.
a) Using information from /sys/devices. Here is the output.
$ cat /sys/devices/system/cpu/cpu0/cache/index1/size
32K
$ cat /sys/devices/system/cpu/cpu0/cache/index2/size
256K
$ cat /sys/devices/system/cpu/cpu0/cache/index3/size
8192K
b) Using information from dmidecode
$ sudo dmidecode -t cache
Cache Information
Socket Designation: CPU Internal L1
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Through
Location: Internal
Installed Size: 256 KB
Maximum Size: 256 KB
< .... >
Cache Information
Socket Designation: CPU Internal L2
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Through
Location: Internal
Installed Size: 1024 KB
Maximum Size: 1024 KB
< .... >
Cache Information
Socket Designation: CPU Internal L3
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 8192 KB
Maximum Size: 8192 KB
< .... >
The sizes reported for the L1 and L2 caches differ (the L3 size is the same in both). Any ideas as to a) why there is this discrepancy, and b) which method gives the correct value?
Other related information:
$ uname -a
Linux 3.0.0-14-generic #23somerville3-Ubuntu SMP Mon Dec 12 09:20:18 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
stepping : 9
cpu MHz : 2400.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips : 6784.23
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
< ... >

A few things:
You have a quad-core CPU.
The index<n> name in /sys/devices/system/cpu/cpu<n>/cache does not correspond to L1/L2/L3 etc. There is a .../index<n>/level file that tells you the level of the cache.
Your L1 cache is split into two caches per core (likely index0 and index1), one for data and one for instructions (see .../index<n>/type). 4 cores * 2 halves * 32K = 256K, which matches what dmidecode reports for L1.
The L2 cache is split per core. 4 cores * 256K (from index2) = 1024K, which matches dmidecode's L2 number.
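The per-level totals above can be reproduced from sysfs by adding up every distinct cache once. This is a minimal sketch, assuming each index<n> directory provides the level, type, size and shared_cpu_list files and that sizes end in K or M; the function names are made up for this example.

```python
from pathlib import Path


def parse_size_kib(text):
    """Convert a sysfs size string such as '256K' to KiB."""
    text = text.strip()
    if text.endswith("K"):
        return int(text[:-1])
    if text.endswith("M"):
        return int(text[:-1]) * 1024
    raise ValueError(f"unexpected cache size format: {text!r}")


def total_cache_kib(cpu_root="/sys/devices/system/cpu"):
    """Sum cache sizes per level, counting each shared cache only once."""
    seen = set()
    totals = {}
    for index in Path(cpu_root).glob("cpu[0-9]*/cache/index*"):
        level = int((index / "level").read_text())
        kind = (index / "type").read_text().strip()
        shared = (index / "shared_cpu_list").read_text().strip()
        # CPUs that share a cache report the same level, type and CPU list,
        # so this key identifies one physical cache.
        key = (level, kind, shared)
        if key in seen:
            continue
        seen.add(key)
        size = parse_size_kib((index / "size").read_text())
        totals[level] = totals.get(level, 0) + size
    return totals
```

On the quad-core CPU from the question, the arithmetic in the answer suggests a result of 256 for level 1, 1024 for level 2 and 8192 for level 3, in KiB, which is how dmidecode reports them.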
