Single ZFS Checksum error on mirror, sounds improbable to me - linux

I have a ZFS pool with the following layout and errors:
config:

        NAME                              STATE     READ WRITE CKSUM
        tank                              ONLINE       0     0     0
          mirror-0                        ONLINE       0     0     0
            wwn-0x5000039ff3d3b114-part2  ONLINE       0     0     0
            wwn-0x5000039ff4d3b513-part1  ONLINE       0     0     0
          mirror-1                        ONLINE       0     0     0
            wwn-0x5000c500a42783bc-part1  ONLINE       0     0     2
            wwn-0x5000c500a426d50b-part1  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        tank/foo/bar@veryOldSnapshot:/corruptFile.qcow2
So it looks as if the same record has been corrupted on two different devices at the same time. The data in question has been on these disks since 2019 and the pool is scrubbed every week. What are the chances? IMHO this cannot be a real "bits flipped due to cosmic radiation or HDD failure" case, because the probability that exactly the same blocks are corrupted on both disks, and no others, is really low.
What else can have caused this? I ran memtest86 without problems, and scrubbing again does not find any other errors. However, since the block is used in a long chain of snapshots, even removing the snapshot in question just makes the problem move to the next snapshot.
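For reference, a hedged sketch of OpenZFS commands that can help dig further here (zpool events keeps per-event detail such as the affected vdev and offset, which can show whether both legs really reported the same block):
zpool status -v tank     # list the affected file(s) again
zpool events -v tank     # per-event detail: vdev, offset, checksum error specifics
zpool scrub tank         # re-verify the whole pool
zpool clear tank         # reset the error counters once the cause is understood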

Related

CPU cache information for Raspberry Pi not shown

I want to know the size of the L1 and L2 caches, as well as other metrics, of the Raspberry Pi 3B I'm using.
However, for whatever reason I cannot get any information regarding the cache. Commands such as getconf -a | grep CACHE give me:
LEVEL1_ICACHE_SIZE 0
LEVEL1_ICACHE_ASSOC 0
LEVEL1_ICACHE_LINESIZE 0
LEVEL1_DCACHE_SIZE 0
LEVEL1_DCACHE_ASSOC 0
LEVEL1_DCACHE_LINESIZE 0
LEVEL2_CACHE_SIZE 0
LEVEL2_CACHE_ASSOC 0
LEVEL2_CACHE_LINESIZE 0
LEVEL3_CACHE_SIZE 0
LEVEL3_CACHE_ASSOC 0
LEVEL3_CACHE_LINESIZE 0
LEVEL4_CACHE_SIZE 0
LEVEL4_CACHE_ASSOC 0
LEVEL4_CACHE_LINESIZE 0
Other tools like lshw give no cache information either.
What is the cause of this and how can I get this cache info?
It says on the Raspberry Pi Wikipedia page:
The Raspberry Pi 4 uses a Broadcom BCM2711 SoC with a 1.5 GHz 64-bit
quad-core ARM Cortex-A72 processor, with 1 MB shared L2 cache.
And yes, I would also like to have these commands working as they do on most other chips/OSes (I don't know exactly which component is responsible for that).
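As far as I can tell, getconf ultimately reports whatever the kernel exposes, so one thing worth checking is whether cache topology shows up in sysfs at all (a sketch; these paths only exist if the kernel/device tree populates cacheinfo, and lscpu -C needs util-linux 2.34 or newer):
ls /sys/devices/system/cpu/cpu0/cache/
cat /sys/devices/system/cpu/cpu0/cache/index*/size
lscpu -C     # summarizes the same sysfs data, if your util-linux is new enough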

Determine average transfer rate on linux system IP interface [closed]

I want to know what the average transfer rate on a particular (VPN) interface of my Linux system is.
I have the following info from netstat:
# netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 264453 0 0 0 145331 0 0 0 BMRU
lo 16436 0 382692 0 0 0 382692 0 0 0 LRU
tun0 1500 0 13158 0 0 0 21264 0 12 0 MOPRU
The VPN interface is tun0. So this interface received 13158 packets and sent 21264 packets. My question based on this:
what is the time-frame during which these stats are collected? Since the computer was started?
# uptime
15:05:49 up 7 days, 20:40, 1 user, load average: 0.19, 0.08, 0.06
how to convert the 13158 "packets" to kB of data so as to get kbps?
Or should I use a completely different method?
Question 1:
The time frame is from the time the device was brought up until now (maybe days or weeks ago; try to work it out from the logs!).
Which means that to get a practical average kbps number, comparable to what you'd see in a system monitor or what e.g. top or uptime display for the CPU, you will want to read the current value twice (with, say, 1 second in between) and subtract the first value from the second. Then divide by the time (not necessary if you used a 1-second delay), multiply by 8, and divide by 1,000 to get kbps.
Question 2:
You don't. There is no way to convert "packets" to "bytes" as packets are variable sized. There is a "bytes" field that you can read.
Test case on my NAS box with some traffic going on:
nas:# grep eth0 /proc/net/dev ; sleep 1 ; grep eth0 /proc/net/dev
eth0:137675373 166558 0 0 0 0 0 0 134406802 41228 0 0 0 0 0 0
eth0:156479566 182767 0 0 0 0 0 0 155912310 44479 0 0 0 0 0 0
The result is: (155912310 - 134406802)*8/1000 = 172044 kbps (172 Mbps usage on a 1Gbps network).
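The same measurement is easy to script; a minimal sketch using the per-interface byte counters in sysfs (assuming /sys/class/net/<iface>/statistics/ is available, which it is on any modern Linux kernel):
#!/bin/sh
# rough average throughput over a 1-second window
IF=${1:-tun0}                                  # interface name
RX=/sys/class/net/$IF/statistics/rx_bytes
TX=/sys/class/net/$IF/statistics/tx_bytes
rx1=$(cat "$RX"); tx1=$(cat "$TX")
sleep 1
rx2=$(cat "$RX"); tx2=$(cat "$TX")
echo "rx: $(( (rx2 - rx1) * 8 / 1000 )) kbps   tx: $(( (tx2 - tx1) * 8 / 1000 )) kbps"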
If you look in /proc/net/dev instead of netstat -i, you can get bytes transmitted/received (also available via ifconfig or netstat -ie, but more easily parsed from /proc/net/dev). The counts are typically since the interface was created, which is usually boot time for "real" interfaces. For a tun interface, it's likely when the tunnel was started, which might be different from system boot, depending on when/how you're creating it...

Determine TCP payload activity/statistics

I'd like to lookup a counter of the TCP payload activity (total bytes received) either for a given file descriptor or a given interface. Preferably the given file descriptor, but for the interface would be sufficient. Ideally I'd really like to know about any bytes that have been ack-ed, even ones which I have not read into userspace (yet?).
I've seen the TCP_INFO feature of getsockopt() but none of the fields appear to store "Total bytes received" or "total bytes transmitted (acked, e.g.)" so far as I can tell.
I've also seen the netlink IFLA_STATS+RTNL_TC_BYTES and the SIOCETHTOOL+ETHTOOL_GSTATS ioctl() (rx_bytes field) for the interfaces, and those are great, but I don't think they'll be able to discriminate between the overhead/headers of the other layers and the actual payload bytes.
procfs has /proc/net/tcp but this doesn't seem to contain what I'm looking for either.
Is there any way to get this particular data?
EDIT: promiscuous mode has an unbearable impact on throughput, so I can't leverage anything that uses it. Not to mention that implementing large parts of the IP stack to determine which packets are appropriate is beyond my intended scope for this solution.
The goal is to have an overarching / no-trust / second-guess check on the values I store from recvmsg().
The Right Thing™ to do is to keep track of those values correctly, but it would be valuable to have a simple "Hey OS? How many bytes have I really received on this socket?"
One could also use an ioctl() call with SIOCINQ to get the amount of queued, unread data in the receive buffer. Here is the usage from the man page: http://man7.org/linux/man-pages/man7/tcp.7.html
/* SIOCINQ is defined in <linux/sockios.h>; FIONREAD (<sys/ioctl.h>) is a synonym */
int value;
int error = ioctl(tcp_socket_fd, SIOCINQ, &value);   /* bytes queued but not yet read */
For interface-level stats, we can use "netstat -i" to find counters on a per-interface basis; protocol-level TCP statistics are available with "netstat -st".
Do you want this for diagnosis, or for development?
If diagnosis, tcpdump can tell you exactly what's happening on the network, filtered by the port and host details.
If for development, perhaps a bit more information about what you're trying to achieve would help...
ifconfig gives RX and TX totals.
ifconfig gets these details from /proc/net/dev (as you can see via strace ifconfig).
There are also the Send/Receive-Q values given by netstat -t, if that's closer to what you want.
Perhaps the statistics in /proc/net/dev can help. I am not familiar with counting payload versus full packets including headers, so that makes the question harder to answer.
As for statistics on individual file descriptors, I am not aware of any standard means to get that information.
If it's possible to control startup of the programs for which the statistics are needed, you can use an "interceptor" library which implements its own read(), write(), sendto(), and recvfrom() calls, passes them through to the standard C library (or directly to the system call), keeps counters of the activity, and finds a way to publish those values.
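On Linux, the usual way to inject such a library is LD_PRELOAD; a sketch of how it would be wired up, with libiocount.so standing in for a hypothetical interceptor you'd write yourself:
gcc -shared -fPIC -o libiocount.so iocount.c -ldl    # build the (hypothetical) interceptor
LD_PRELOAD=$PWD/libiocount.so ./myprogram            # run the target with the wrappers in place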
In case you don't want to just count total RX/TX per interface (which is already available in ifconfig/iproute2 tools)...
If you look into /proc a bit more, you can get somewhat more information. More specifically /proc/<pid>/net/dev.
Sample output:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
eth0: 12106810846 8527175 0 15842 0 0 0 682866 198923814 1503063 0 0 0 0 0 0
lo: 270255057 3992930 0 0 0 0 0 0 270255057 3992930 0 0 0 0 0 0
sit0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
If you start digging, the information comes from net/core/net-procfs.c in the Linux kernel (procfs just exposes this info). All of this of course means you need a specific process to track.
You can either peruse the information available in /proc or, if you need something more stable, duplicating the net-procfs functionality specifically for your application might make sense.
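If all you want from that file are the byte counters, a quick way to pull them out (a sketch; 1234 is a placeholder pid, and the sed call just detaches the interface name from the first counter):
pid=1234
sed 's/:/ /' "/proc/$pid/net/dev" | awk 'NR>2 {printf "%-8s rx_bytes=%s tx_bytes=%s\n", $1, $2, $10}'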

What do the mod_pagespeed statistics mean?

Here's a dump of the stats provided by mod_pagespeed from one of my sites.
resource_url_domain_rejections: 6105
rewrite_cached_output_missed_deadline: 4801
rewrite_cached_output_hits: 116004
rewrite_cached_output_misses: 934
resource_404_count: 0
slurp_404_count: 0
total_page_load_ms: 0
page_load_count: 0
resource_fetches_cached: 0
resource_fetch_construct_successes: 45
resource_fetch_construct_failures: 0
num_flushes: 947
total_fetch_count: 0
total_rewrite_count: 0
cache_time_us: 572878
cache_hits: 872
cache_misses: 1345
cache_expirations: 242
cache_inserts: 1795
cache_extensions: 50799
not_cacheable: 0
css_file_count_reduction: 0
css_elements: 0
domain_rewrites: 0
google_analytics_page_load_count: 0
google_analytics_rewritten_count: 0
image_inline: 7567
image_rewrite_saved_bytes: 208854
image_rewrites: 34128
image_ongoing_rewrites: 0
image_webp_rewrites: 0
image_rewrites_dropped_due_to_load: 0
image_file_count_reduction: 0
javascript_blocks_minified: 12438
javascript_bytes_saved: 1173778
javascript_minification_failures: 0
javascript_total_blocks: 12439
js_file_count_reduction: 0
converted_meta_tags: 902
url_trims: 54765
url_trim_saved_bytes: 1651244
css_filter_files_minified: 0
css_filter_minified_bytes_saved: 0
css_filter_parse_failures: 2
css_image_rewrites: 0
css_image_cache_extends: 0
css_image_no_rewrite: 0
css_imports_to_links: 0
serf_fetch_request_count: 1412
serf_fetch_bytes_count: 12809245
serf_fetch_time_duration_ms: 28706
serf_fetch_cancel_count: 0
serf_fetch_active_count: 0
serf_fetch_timeout_count: 0
serf_fetch_failure_count: 0
Can someone please explain what all of the stats mean?
There are a lot of stats here. I'm going to describe just a few of them, because this will get long. We should probably add detailed docs. I can follow up with more answers later if these are useful.
resource_url_domain_rejections: 6105: this means that since your server restarted, mod_pagespeed has found 6105 resources that it's not going to rewrite because their domains are not authorized for rewriting with the ModPagespeedDomain directive. This is common and occurs any time someone refreshes a page with a Twitter, Facebook, or Google+ widget.
rewrite_cached_output_missed_deadline: 4801: when a resource (e.g. a JPEG image) is optimized, it happens in a background thread, and the result is cached so that subsequent page views referencing the same resource are fast. On the first view, however, we use a 10 millisecond deadline to avoid slowing down the time-to-first-byte. This stat counts how many times that deadline was exceeded, in which case the resource is left unchanged for that view, but the optimization continues in the background and the cache still gets written.
rewrite_cached_output_hits: 116004: counts the number of times we served an optimized resource from the cache, thus avoiding the need to re-optimize it.
rewrite_cached_output_misses: 934: counts the number of times we looked up a resource in our cache and it wasn't there, forcing us to rewrite it. Note that we would also rewrite a resource that was in the cache but whose origin cache expiration time had passed. E.g. if your images had cache-control:max-age=600 then we would re-fetch them every 10 minutes to see if they've changed. If they have changed we must re-optimize them.
num_flushes: 947: this is the number of times the Apache resource generator for the HTML (e.g. mod_php or WordPress) called the Apache function ap_flush(), which causes partial HTML to be flushed all the way through to the user's browser. This is interesting for mod_pagespeed because it can limit the amount of optimization we can do (e.g. we can't combine CSS files whose elements are separated by a flush).
cache_time_us: 572878 - the total amount of time, in microseconds, spent waiting for mod_pagespeed's HTTP Cache (file + memory) to respond to a lookup request, since the server was started.
I think that's enough for now. Are there specific other statistics you'd like to learn more about?
Most of these were created for us to monitor the health of mod_pagespeed as it's running, and to help diagnose users' issues. I have to admit we haven't used it much for that purpose, but we use them during development.

How do I get information on linux whether my program is swapping or not?

More specifically: I want to find this information from inside the program, preferably just before it starts swapping so I can react. So far I found:
Information inside /proc, which is not very useful
The mincore syscall, which seems to be available on Linux and BSD but requires me to pass in all the pages I'm interested in (might be enough, but it's a bit tedious)
Any more ideas?
vmstat
To run it every 2 seconds, you say "vmstat 2". It gives you output like:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 16124 431352 439000 0 0 4 2 37 18 0 0 100 0 0
The "si" and "so" columns are "swap-in" and "swap-out". Swapd is how much memory is in the swap device. Swapd should be stable, and si and so zero.
Remember:
You shouldn't really ask "is my program swapping"; ask "is the system swapping". Your program can cause others to swap, others can cause yours to swap, and so on. Either way, when that happens, performance d...i..e...s....
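That said, if you do want a per-process view, the kernel does expose one (a sketch; the VmSwap field appears in /proc/<pid>/status on kernels from roughly 2.6.34 onward, and myprog is a placeholder name):
grep VmSwap /proc/$$/status                   # swap currently used by this shell
grep VmSwap "/proc/$(pidof myprog)/status"    # or by a named program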
