CPU cache information for Raspberry Pi not shown - linux

I want to know the size of L1 and L2 cache as well as other metrics of my Raspberry Pi 3B I'm using.
However, for whatever reason I cannot get any information regarding the cache. Commands such as getconf -a | grep CACHE give me:
LEVEL1_ICACHE_SIZE 0
LEVEL1_ICACHE_ASSOC 0
LEVEL1_ICACHE_LINESIZE 0
LEVEL1_DCACHE_SIZE 0
LEVEL1_DCACHE_ASSOC 0
LEVEL1_DCACHE_LINESIZE 0
LEVEL2_CACHE_SIZE 0
LEVEL2_CACHE_ASSOC 0
LEVEL2_CACHE_LINESIZE 0
LEVEL3_CACHE_SIZE 0
LEVEL3_CACHE_ASSOC 0
LEVEL3_CACHE_LINESIZE 0
LEVEL4_CACHE_SIZE 0
LEVEL4_CACHE_ASSOC 0
LEVEL4_CACHE_LINESIZE 0
Other tools like lshw also give no cache information.
What is the cause of this and how can I get this cache info?

It says on the Raspberry Pi Wikipedia page:
The Raspberry Pi 4 uses a Broadcom BCM2711 SoC with a 1.5 GHz 64-bit
quad-core ARM Cortex-A72 processor, with 1 MB shared L2 cache.
And yes, I would also like to have these commands working as they do on most other chips/OSes (I don't know exactly who is responsible for that).
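For what it's worth, getconf and lshw can only report what the kernel knows; if the firmware/device tree does not describe the caches, the values come back as 0. A minimal check, assuming your kernel exposes the standard cacheinfo nodes in sysfs (on Pi kernels whose device tree omits the caches, this directory is simply missing):
ls /sys/devices/system/cpu/cpu0/cache/
# if index0, index1, ... exist, read the geometry of each level
for d in /sys/devices/system/cpu/cpu0/cache/index*; do
    echo "$d: L$(cat $d/level) $(cat $d/type) $(cat $d/size), line $(cat $d/coherency_line_size) B"
done
# lscpu reads the same sysfs nodes
lscpu | grep -i cache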

Related

Single ZFS Checksum error on mirror, sounds improbable to me

I have a ZFS pool with the following layout and errors:
config:
  NAME                              STATE     READ WRITE CKSUM
  tank                              ONLINE       0     0     0
    mirror-0                        ONLINE       0     0     0
      wwn-0x5000039ff3d3b114-part2  ONLINE       0     0     0
      wwn-0x5000039ff4d3b513-part1  ONLINE       0     0     0
    mirror-1                        ONLINE       0     0     0
      wwn-0x5000c500a42783bc-part1  ONLINE       0     0     2
      wwn-0x5000c500a426d50b-part1  ONLINE       0     0     2
errors: Permanent errors have been detected in the following files:
        tank/foo/bar#veryOldSnapshot;corruptFile.qcow2
So it looks as if the same record was corrupted on two different devices at the same time. The data in question has been on these disks since 2019 and the pool is scrubbed every week. What are the chances? IMHO this cannot be a real "bits were flipped due to cosmic radiation or HDD failure" case, because the probability that exactly the same blocks get corrupted on both disks and no other block does is really low.
What else could have caused this? I ran memtest86 without problems, and scrubbing again does not find any other errors. However, since the block is used in a long chain of snapshots, even removing the snapshot in question just makes the problem move to the next snapshot.
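If you do end up walking the snapshot chain, here is a hedged sketch with standard OpenZFS commands (pool and dataset names taken from the question):
# list the snapshots of the affected dataset, oldest first
zfs list -t snapshot -o name,creation -s creation -r tank/foo/bar
# after destroying the snapshots that still reference the bad blocks,
# clear the error counters and scrub again to verify
zpool clear tank
zpool scrub tank
zpool status -v tank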

How to set all system IRQs to the first 4 cores

Servers: I have servers with two Intel CPUs, either 10 cores or 8 cores each, so some have 40 logical cores and some have 32 (with Intel HT enabled).
Background: I am running our application, which isolates CPUs. Currently I isolate the last 32 cores (cores 8-39) for that application and 4 cores (cores 4-7) for other use (normally they run at about 50% system CPU). I want to assign cores 0-3 for system IRQ handling, because when I run the application the system response is very slow; I think some IRQs are being dispatched to cores 4-7, which causes the slow response.
Do you think it is possible to use just 4 cores to handle all system IRQs?
If you have more than one socket ("stone"), that means you have a NUMA system.
Here is a link with more info: https://en.wikipedia.org/wiki/Non-uniform_memory_access
Try to use CPUs on the same socket. Below I will explain why and how to do that.
Determine exactly which CPU IDs are located on each socket:
% numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
node 0 size: 24565 MB
node 0 free: 2069 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
node 1 size: 24575 MB
node 1 free: 1806 MB
node distances:
node 0 1
0: 10 20
1: 20 10
Here "node" means "socket" (stone). So 0,2,4,6 CPUs are located on the same node.
And it makes sense to move all IRQs into one node to use L3 cache for set of CPUs.
Isolate all CPUs except 0,2,4,6.
You need to add an argument to the Linux kernel boot line:
isolcpus= cpu_number [, cpu_number ,...]
for example
isolcpus=1,3,5,7-31
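On a typical GRUB-based distro this is persisted in /etc/default/grub (the exact file, variable and regeneration command vary by distribution, so treat this as a sketch):
# /etc/default/grub  (some distros use GRUB_CMDLINE_LINUX_DEFAULT instead)
GRUB_CMDLINE_LINUX="isolcpus=1,3,5,7-31"
# regenerate the GRUB config and reboot
update-grub    # Debian/Ubuntu; on RHEL-likes: grub2-mkconfig -o /boot/grub2/grub.cfg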
Check which IRQs are running on which CPUs:
cat /proc/interrupts
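The step above only inspects the IRQs; to actually pin them you write a CPU mask to /proc/irq/<N>/smp_affinity. A hedged sketch, assuming the target cores are 0-3 as in the question (mask 0xf; use 0x55 instead for cores 0,2,4,6 from the node 0 example above) and that irqbalance is stopped so it does not rewrite the masks:
systemctl stop irqbalance                   # or configure IRQBALANCE_BANNED_CPUS instead
echo f > /proc/irq/default_smp_affinity     # default mask for newly registered IRQs
for irq in /proc/irq/[0-9]*; do
    echo f > "$irq/smp_affinity" 2>/dev/null   # some IRQs (per-CPU timers etc.) will refuse
done
grep . /proc/irq/*/smp_affinity             # verify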
Start your application with the numactl command to align it to CPUs and memory.
(Here you need to understand what NUMA and alignment are. Please follow the link at the beginning of this answer.)
numactl [--membind=nodes] [--cpunodebind=nodes]
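For example (node 0 and the binary name ./myapp are only placeholders here):
numactl --cpunodebind=0 --membind=0 ./myapp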
Your question is much bigger than what I have covered here.
If you see that the system is slow, you need to understand the bottleneck.
Try to gather raw info with top, vmstat and iostat to find out the point of weakness.
Provide some stats from your system and I will help you tune it the right way.

Linux(Ubuntu) load average higher than total-true-utilization?

I have a Dell PD2950 (2x4-core) server running Ubuntu Server 12.04 LTS, and there is a VLC encoder instance running on it. Recently I updated the script (VLM) for VLC to increase quality, which also increases the CPU utilization, so I started tuning the script to avoid exceeding maximum utilization. I use top to monitor the CPU utilization. I found that the load average is higher than 100% (I have 8 cores in total, so 8.00 is 100%), but there is still 20-35% idle, like:
top - 21:41:19 up 2 days, 17:15, 1 user, load average: 9.20, 9.65, 8.80
Tasks: 148 total, 1 running, 147 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.8%us, 0.7%sy, 29.7%ni, 36.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 1982680k total, 1735672k used, 247008k free, 126284k buffers
Swap: 0k total, 0k used, 0k free, 774228k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9715 wilson RT 0 2572m 649m 13m S 499 33.5 13914:44 vlc
11663 wilson 20 0 17344 1328 964 R 2 0.1 0:02.00 top
1 root 20 0 24332 2264 1332 S 0 0.1 0:01.06 init
2 root 20 0 0 0 0 S 0 0.0 0:00.09 kthreadd
3 root 20 0 0 0 0 S 0 0.0 0:27.05 ksoftirqd/0
4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0
5 root 0 -20 0 0 0 S 0 0.0 0:00.00 kworker/0:0H
To confirm my CPUs don't have Hyper-Threading, I tried:
wilson#server:/$ nproc
8
And to reduce the sampling deviation caused by the refresh interval, I also tried:
wilson#server:/$ top -d 0.1
I watched the %id number for a long time; it hasn't gone below 14.
I also tried:
wilson#server:/$ uptime
21:57:20 up 2 days, 17:31, 1 user, load average: 9.03, 9.12, 9.35
The 1-minute load average often reaches 14-15. So I'm wondering: what's wrong with my system? Has anyone ever had this problem?
More information:
I'm using VLC with the x264 codec to encode a live HTTP stream (application/octet-stream). It uses ffmpeg (libavc) to decode and outputs Apple HLS (.ts segments). I found this problem after I added arguments for x264:
level=41,ref=5,b-adapt=2,direct=auto,me=umh,subq=8,rc-lookahead=60,analyse=all
This is almost equivalent to preset=slower. And as you can see, my VLC is running with real-time priority, set with:
wilson#server:/$ chrt -p -f 99 vlc-wrapper
There does not appear to be anything wrong with your system. What is wrong seems to be your understanding of CPU accounting. In particular, load average has nearly nothing at all to do with CPU usage. Load average is based on the number of processes that are ready to run (not waiting on I/O, network, keyboard input, etc.) and that would run if a CPU were available for them to be scheduled on. While it's true that, given an 8-core system with all 8 cores 100% busy running a single CPU-bound thread each, your load average should be around 8.00, it is entirely possible to have a load average of 200.0 with near-0% CPU utilization. All that would indicate is that you have 200 processes that are ready to run, but as soon as they get scheduled, they do almost nothing before they go back to waiting for input of some sort.
Your top output shows that vlc seems to be using roughly the equivalent of 5 of your cores, but it doesn't indicate whether you have 5 cores at 100% each or all 8 cores at 62.5% each. All of the other processes listed by top also contribute to your load average as well as to CPU usage. In particular, top running with a short delay like your example of 0.1 seconds will probably increase your load average by almost 1 by itself, even though, overall, it's not using a lot of CPU time.
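If you want to see which of those two cases it is, per-CPU and per-thread sampling will tell you. A small sketch using the sysstat tools (the PID 9715 is the one from the top output above):
mpstat -P ALL 1        # per-CPU utilization, 1-second samples
pidstat -t -p 9715 1   # per-thread CPU usage of the vlc process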
Read this:
Understanding load average vs. cpu usage
If the load average is at 7, with 4 hyper-threaded processors, shouldn't that mean that the CPU is working at about 7/8 capacity?
No, it just means that you have 7 runnable processes in the job queue on average.
But I think we can't use load average as a reference number to decide whether the system is overloaded or not. So I wonder whether there is a kernel-level CPU utilization statistics tool (kernel-level so as to reduce the performance cost of measuring)?
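For what it's worth, the counters that top and mpstat themselves read live in /proc/stat, so on a reasonably recent kernel you can sample them directly. A minimal sketch (field order per proc(5): user, nice, system, idle, iowait, irq, softirq, steal):
# overall CPU busy% from two /proc/stat samples taken 1 second apart
read -r _ u1 n1 s1 i1 w1 q1 sq1 st1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 q2 sq2 st2 _ < /proc/stat
busy=$(( (u2+n2+s2+q2+sq2+st2) - (u1+n1+s1+q1+sq1+st1) ))
idle=$(( (i2+w2) - (i1+w1) ))
echo "CPU busy: $(( 100 * busy / (busy + idle) ))%"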

Determine average transfer rate on linux system IP interface [closed]

I want to know what the average transfer rate on a particular (VPN) interface of my Linux system is.
I have the following info from netstat:
# netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 264453 0 0 0 145331 0 0 0 BMRU
lo 16436 0 382692 0 0 0 382692 0 0 0 LRU
tun0 1500 0 13158 0 0 0 21264 0 12 0 MOPRU
The VPN interface is tun0. So this interface received 13158 packets and sent 21264 packets. My questions based on this:
What is the time frame during which these stats are collected? Since the computer was started?
# uptime
15:05:49 up 7 days, 20:40, 1 user, load average: 0.19, 0.08, 0.06
How do I convert the 13158 "packets" to kB of data so as to get kbps?
Or should I use a completely different method?
Question 1:
The time frame is from the time the device was brought up until now (maybe days or weeks ago; try to figure it out from the logs!).
This means that to get a practical average kbps number, comparable to what you'd see in a system monitor or what e.g. top or uptime display for the CPU, you will want to read the current value twice (with, say, 1 second in between) and subtract the first value from the second. Then divide by the time (which is not necessary if you have a 1-second delay), multiply by 8, and divide by 1,000 to get kbps.
Question 2:
You don't. There is no way to convert "packets" to "bytes" as packets are variable sized. There is a "bytes" field that you can read.
Test case on my NAS box with some traffic going on:
nas:# grep eth0 /proc/net/dev ; sleep 1 ; grep eth0 /proc/net/dev
eth0:137675373 166558 0 0 0 0 0 0 134406802 41228 0 0 0 0 0 0
eth0:156479566 182767 0 0 0 0 0 0 155912310 44479 0 0 0 0 0 0
The result is: (155912310 - 134406802)*8/1000 = 172044 kbps (172 Mbps usage on a 1Gbps network).
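Building on that test case, here is a small sketch that does the same subtraction automatically (the byte counters are the 1st and 9th fields after the interface name in /proc/net/dev; the interface name is just an example):
IF=tun0   # the interface from the question; eth0 for the test case above
bytes() { awk -v d="$IF" '$0 ~ d":" { sub(/^.*:/, ""); print $1, $9 }' /proc/net/dev; }
read rx1 tx1 < <(bytes)
sleep 1
read rx2 tx2 < <(bytes)
echo "RX: $(( (rx2 - rx1) * 8 / 1000 )) kbps   TX: $(( (tx2 - tx1) * 8 / 1000 )) kbps"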
If you look in /proc/net/dev instead of netstat -i, you can get bytes transmitted/received (also available via ifconfig or netstat -ie, but more easily parsed from /proc/net/dev). The counts are typically since the interface was created, which is usually boot time for "real" interfaces. For a tun interface, it's likely when the tunnel was started, which might be different than system boot, depending on when/how you're creating it...

How do I find out on Linux whether my program is swapping or not?

More specifically: I want to find this information from inside the program, preferably just before it starts swapping so I can react. So far I found:
Information inside /proc, which is not very useful
The mincore syscall, which seems to be available on Linux and BSD, but requires me to pass in all the pages I'm interested in (might be enough, but it's a bit tedious)
Any more ideas?
vmstat
To run every 2 seconds, you say "vmstat 2". It gives you output like:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 16124 431352 439000 0 0 4 2 37 18 0 0 100 0 0
The "si" and "so" columns are "swap-in" and "swap-out". Swapd is how much memory is in the swap device. Swapd should be stable, and si and so zero.
Remember:
You shouldn't really ask "is my program swapping" so much as "is the system swapping". Your program can cause others to swap, and others can cause yours to swap, etc. Either way, when that happens, performance d...i..e...s....
