Can someone help me understand the SCTP stats captured on Linux?

I am new to the SCTP protocol and am trying to figure out how to interpret the SCTP stats captured in /proc/net/sctp.
The output shows something like this.
2016-04-26 07:21:17
ASSOC SOCK STY SST ST HBKT ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT RPORT LADDRS <-> RADDRS HBINT INS OUTS MAXRT T1X T2X RTXC wmema wmemq sndbuf rcvbuf
ce0f1800 cd7ede00 2 1 3 11661 1783 423 0 77 292723 36422 36423 10.205.8.71 <-> *10.205.8.72 3000 10 10 10 0 0 0 873 704 163840 163840
ca625800 cd7ec000 2 1 3 65210 1 0 0 77 10344 3223 36412 10.205.8.71 <-> *10.205.0.135 3000 2 2 10 0 0 3 1 0 163840 163840
ENDPT SOCK STY SST HBKT LPORT UID INODE LADDRS
ca511d80 cd7ec3c0 2 10 40 36422 77 10345 10.205.8.71
ADDR ASSOC_ID HB_ACT RTO MAX_PATH_RTX REM_ADDR_RTX START
10.205.8.72 1783 1 200 5 0 0
10.205.0.135 1 1 200 15 0 0
SctpCurrEstab 2
SctpActiveEstabs 21
SctpPassiveEstabs 1855
SctpAborteds 272
SctpShutdowns 1808
SctpOutOfBlues 0
SctpChecksumErrors 0
SctpOutCtrlChunks 79214
SctpOutOrderChunks 327396
SctpOutUnorderChunks 0
SctpInCtrlChunks 268038
SctpInOrderChunks 174268
SctpInUnorderChunks 0
SctpFragUsrMsgs 0
SctpReasmUsrMsgs 0
SctpOutSCTPPacks 406626
SctpInSCTPPacks 385959
SctpT1InitExpireds 0
SctpT1CookieExpireds 0
SctpT2ShutdownExpireds 0
SctpT3RtxExpireds 5
SctpT4RtoExpireds 0
SctpT5ShutdownGuardExpireds 0
SctpDelaySackExpireds 9869
SctpAutocloseExpireds 0
SctpT3Retransmits 5
SctpPmtudRetransmits 0
SctpFastRetransmits 14
SctpInPktSoftirq 384346
SctpInPktBacklog 1613
SctpInPktDiscards 0
SctpInDataChunkDiscards 0
Can someone help me understand this, or provide a link where I can get some information?
Thanks,
Vishal

The Linux man page for SCTP (http://linux.die.net/man/7/sctp) has most of them covered. For example:
SctpChecksumErrors
The number of SCTP packets received with an invalid checksum.
SctpOutCtrlChunks
The number of SCTP control chunks sent (retransmissions are not included). Control chunks are those chunks different from DATA.
SctpOutOrderChunks
The number of SCTP ordered data chunks sent (retransmissions are not included).
SctpOutUnorderChunks
The number of SCTP unordered data chunks sent (retransmissions are not included).
If there is a particular one you were wondering about, maybe let us know?
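Most of these are monotonically increasing counters (SctpCurrEstab being a notable exception), so they are easiest to read as rates rather than absolute values. Here is a minimal sketch for that, assuming the counters are exposed in /proc/net/sctp/snmp as on mainline kernels (the 10-second interval is arbitrary):
# Take two snapshots of the SCTP counters 10 seconds apart and
# print the per-second rate of change for each one.
interval=10
paste <(cat /proc/net/sctp/snmp) <(sleep "$interval"; cat /proc/net/sctp/snmp) |
awk -v t="$interval" '{ printf "%-30s %10.1f/s\n", $1, ($4 - $2) / t }'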

Advice on stopping compaction to reduce slowness

I am seeing high CPU and memory usage of Cassandra on the seed node. Is it advisable to stop compaction (nodetool stop) and enable it in off-peak hours? Should I do manual compaction or enable auto-compaction? I see a lot of Native-Transport-Requests. I have three seed nodes; this is the first seed node.
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 54255 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 2 2566 352765 0 0
MutationStage 0 0 2659921760 0 0
MemtableReclaimMemory 0 0 180958 0 0
PendingRangeCalculator 0 0 21 0 0
GossipStage 0 0 338375 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 63 0 0
RequestResponseStage 0 1 1684328696 0 0
Native-Transport-Requests 4 0 1538523706 0 47006391
ReadRepairStage 0 0 2197 0 0
CounterMutationStage 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemtablePostFlush 1 1 216220 0 0
PerDiskMemtableFlushWriter_0 1 1 180958 0 0
ValidationExecutor 0 0 33250 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 1 1 180958 0 0
InternalResponseStage 0 0 141677 0 0
ViewMutationStage 0 0 0 0 0
AntiEntropyStage 0 0 166254 0 0
CacheCleanupExecutor 0 0 0 0 0
Repair#9 0 0 5719 0 0
I do see a lot of compactions. Is it advisable to disable compactions using nodetool stop?
$ nodetool info
ID : ebeda774-cea8-40bb-9322-69c6fcded5a9
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 535.37 GiB
Generation No : 1636316595
Uptime (seconds) : 73152
Heap Memory (MB) : 19542.18 / 32168.00
Off Heap Memory (MB) : 1337.98
Data Center : us-west2
Rack : a
Exceptions : 15
Key Cache : entries 152283, size 23.07 MiB, capacity 100 MiB, 23835 hits, 280738 requests, 0.085 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 6782, size 423.88 MiB, capacity 480 MiB, 23947952 misses, 24381819 requests, 0.018 recent hit rate, 250.977 microseconds miss latency
Percent Repaired : 0.49796724500672584%
Token : (invoke with -T/--tokens to see all 256 tokens)
$ free -h
total used free shared buff/cache available
Mem: 62G 53G 658M 1.0M 8.5G 8.5G
Swap: 0B 0B 0B
~$ nodetool compactionstats
pending tasks: 197
....
id compaction type keyspace table completed total unit progress
5e555610-40b2-11ec-9b5a-27bc920e6e55 Compaction mykeyspace table1 27299674 89930474 bytes 30.36%
5e55f251-40b2-11ec-9b5a-27bc920e6e55 Compaction mykeyspace table2 13922048 74426264 bytes 18.71%
Active compaction remaining time : 0h00m02s
I would definitely not run compaction manually. Many of the compaction thresholds are file-size based, which means that forcing it creates files sized outside of the normal progression. The result is that the chances of compaction running on that table again are extremely slim. Basically, once you start down that path, you'll be running manual compactions forever.
I would also say that compaction is a good thing. You want it to happen, as compacted files are necessary to keep reads performing well. Of course, that's not much of a consolation when the compaction process is affecting operational activity.
tl;dr:
One thing I have done in the past is to lower compaction throughput during the day. I'm not sure what throughput you're running with currently, but you can find out by running nodetool getcompactionthroughput:
% bin/nodetool getcompactionthroughput
Current compaction throughput: 64 MB/s
So at the times when customer/operational traffic is high, you can reduce that significantly:
% bin/nodetool setcompactionthroughput 1
% bin/nodetool getcompactionthroughput
Current compaction throughput: 1 MB/s
1 MB/second is the lowest that compaction throughput can be set. If you set it to zero, it's "un-throttled," which means it will consume all the resources it can get. Setting it to 1 brings its resource use (and speed) down to a trickle.
Once the busy daily traffic subsides, that setting can be turned back up:
% bin/nodetool setcompactionthroughput 256
Current compaction throughput: 256 MB/s
This can be accomplished with a scheduled job for each command.
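For example, a pair of cron entries along these lines would do it (a sketch only; the cron file path, the cassandra user, the nodetool location, and the 08:00/20:00 window are all assumptions to adapt):
# /etc/cron.d/cassandra-compaction-throttle (illustrative)
# Throttle compaction to a trickle when daily traffic ramps up...
0 8 * * *   cassandra   /path/to/cassandra/bin/nodetool setcompactionthroughput 1
# ...and open it back up once the evening peak has passed.
0 20 * * *  cassandra   /path/to/cassandra/bin/nodetool setcompactionthroughput 256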

Is it a bug that I can add a sub-interface by using "ip link add"?

In CentOS 8 (4.18.0-193.28.1.el8_2.x86_64), if I try this command:
ip link add team0:1 type dummy
I get the message: "RTNETLINK answers: Invalid argument".
However, if I do the same thing on CentOS 7 (3.10.0-862.14.4.el7.x86_64), there is no error!
And if I cat /proc/net/dev, I can see there is a sub-interface:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
eth0: 2489766199 690315 0 0 0 0 0 0 96022876 713917 0 0 0 0 0 0
lo: 402582904 916378 0 0 0 0 0 0 402582904 916378 0 0 0 0 0 0
team0:1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
What is wrong? Is there a bug in CentOS 7? Shouldn't /proc/net/dev show only net interfaces, with no sub-interfaces?
I couldn't find an answer after all my searching and googling. To those who have a better understanding of this: please be so kind and help me ~😿

Create new file from two files with a common (unsorted) column

This is probably a very basic problem but I am stumped.
I am attempting to create a new file from two large tab-delimited files with a common column. The heads of the two files are:
file1
k141_1 319 4 0
k141_2 400 9 0
k141_3 995 43 0
k141_4 670 21 0
k141_5 372 8 0
k141_6 359 9 0
k141_7 483 18 0
k141_8 1826 76 0
k141_9 566 15 0
k141_10 462 14 0
file2
U k141_1 0
U k141_11 0
U k141_24 0
U k141_30 0
C k141_32 2 18 77133,212695,487010, 5444279,5444689,68971626, TIEYSSLHACRSTLEDPT, cellular organisms; Bacteria;
C k141_38 1566886 16 1566886, 50380646, ELVMDREAWCAAIHGV, cellular organisms; Bacteria; Terrabacteria group; Actinobacteria; Actinobacteria; Corynebacteriales; Mycobacteriaceae; Mycobacterium; Mycobacterium sp. WCM 7299;
U k141_46 0
C k141_57 186802 23 1496,1776046,1776047, 64601048,64601468,64601628,64603689,64604310,64605360,71436886,71436980,71437249,71437272,71437295, CLLYTSDAADDLLCVDLGGRRII, cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Clostridia; Clostridiales;
U k141_64 0
C k141_73 131567 14 287,305,1496,2209,1483596, 47871795,47873311,47873322,47880313,47880625,53485494,53485498,62558724,71434583,71434608, LSRGLGDVYKRQIL,SCLVGSEMCIRDRY,YLSLIHISEPTRQE, cellular organisms;
I want the new file to contain all 4 columns from file1 and the 8th column of file2 (taxonomic information separated by semicolons).
I have attempted to sort the files based on the common column, but the outputs are not the same despite the columns having the exact same values.
For example,
[user#compute02 Contigs]$ sort -k 1 file1 | head
k141_1000 312 253 0
k141_1001 553 13 0
k141_1002 518 19 0
k141_1003 812 30 0
k141_1004 327 13 0
k141_1005 454 18 0
k141_100 595 20 0
k141_1006 1585 78 0
k141_1007 537 23 0
[user#compute02 Contigs]$ sort -k 2 file2 | head
U k141_1 0
C k141_1000 305 26 305, 62554095,62558735, PVSYTHLRAHETRGNLVCRLLLEKKK, cellular organisms; Bacteria; Proteobacteria; Betaproteobacteria; Burkholderiales; Burkholderiaceae; Ralstonia; Ralstonia solanacearum;
C k141_1001 946362 11 946362, 5059526, SGRNGLPLKVR, cellular organisms; Eukaryota; Opisthokonta; Choanoflagellida; Craspedida; Salpingoecidae; Salpingoeca; Salpingoeca rosetta;
C k141_1002 131567 15 287,305,2209,1483596, 47870166,47873029,47873592,53485045,55518854,62558495, RTCLLYTSPSPRDKR,NLSLIHISEPTRQEA,EPVSYTHLRAHETRG, cellular organisms;
C k141_100 2 14 287,1496,1776047, 53544868,64603691,71437007, SRSSAASDVYKRQV, cellular organisms; Bacteria;
U k141_1003 0
C k141_1004 2 14 518,1776046,1776047, 28571314,64603094,64605737, LFFFNDTATTEIYT, cellular organisms; Bacteria;
U k141_1005 0
C k141_1006 948 13 948, 73024016, QAPLSMGFSRQEY, cellular organisms; Bacteria; Proteobacteria; Alphaproteobacteria; Rickettsiales; Anaplasmataceae; Anaplasma; phagocytophilum group; Anaplasma phagocytophilum;
C k141_1007 287 14 287, 50594737, RRQRQMCIRDRVGS, cellular organisms; Bacteria; Proteobacteria; Gammaproteobacteria; Pseudomonadales; Pseudomonadaceae; Pseudomonas; Pseudomonas aeruginosa group; Pseudomonas aeruginosa;
Any assistance would be greatly appreciated :)
This solution should work:
# Loop over the keys in column 1 of file1, pulling the full line from
# file1 and everything after the first seven fields from file2.
for i in $(awk '{print $1}' file1.txt)
do
    F1=$(grep -w "$i" file1.txt)
    F2=$(grep -w "$i" file2.txt | awk '{$1=$2=$3=$4=$5=$6=$7=""; sub(/^ +/, ""); print}')
    echo "$F1 $F2"
done
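With large files, though, that loop rescans both files once per key, which is slow. A faster alternative is a single awk pass that loads file2 into a lookup table keyed on its second column, then appends the stored taxonomy to each line of file1. A minimal sketch, assuming both files are tab-delimited as described (merged.txt is an illustrative output name):
# First pass (file2): remember column 8 (the taxonomy) for each contig ID.
# Second pass (file1): print each line with the matching taxonomy appended.
awk -F'\t' -v OFS='\t' 'NR==FNR { tax[$2] = $8; next } { print $0, tax[$1] }' file2.txt file1.txt > merged.txt
As an aside, the two sorts in the question disagree because sort -k 1 compares from the first field through the end of the line, and locale-aware collation may skip blanks; LC_ALL=C sort -k1,1 file1 and LC_ALL=C sort -k2,2 file2 give consistent byte-wise orderings (which is also what join would require).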

vmstat -m -> pages less than total?

When I type vmstat -m on the command line, it shows:
Cache Num Total Size Pages
fuse_request 0 0 424 9
fuse_inode 0 0 768 5
pid_2 0 0 128 30
nfs_direct_cache 0 0 200 19
nfs_commit_data 0 0 704 11
nfs_write_data 36 36 960 4
nfs_read_data 0 0 896 4
nfs_inode_cache 8224 8265 1048 3
nfs_page 0 0 128 30
fscache_cookie_jar 2 48 80 48
rpc_buffers 8 8 2048 2
rpc_tasks 8 15 256 15
rpc_inode_cache 17 24 832 4
bridge_fdb_cache 14 59 64 59
nf_conntrack_expect 0 0 240 16
For the nfs_write_data line (line 7), why is "pages" less than "total"?
For some of the others, "total" is always equal to "pages".
Taken from the vmstat man page:
...
The -m switch displays slabinfo.
...
Field Description For Slab Mode
cache: Cache name
num: Number of currently active objects
total: Total number of available objects
size: Size of each object
pages: Number of pages with at least one active object
totpages: Total number of allocated pages
pslab: Number of pages per slab
Thus, "total" is the number of slab objects (objects used by the OS as inodes, buffers, and so on), and a page can contain more than one object, which is why the page count can be lower than the object count.
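As a cross-check, the numbers behind vmstat -m come from /proc/slabinfo, whose header row names each field explicitly (reading it usually requires root; header trimmed here):
$ sudo grep -E '^# name|^nfs_write_data' /proc/slabinfo
# name    <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...
The <objperslab> and <pagesperslab> fields make the packing explicit: at 960 bytes per object, several objects share each 4096-byte page, so the object count can exceed any count of pages.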

/proc/[pid]/stat refresh period

Hi, I am a Linux programmer.
I have a task to monitor process CPU usage, so I use fields 14 and 15 of /proc/[pid]/stat. Those values are called utime and stime.
Example [/proc/[pid]/stat]
30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5274 0 0 0 20 0 1 0 55611251 17408000 386 18446744073709551615 4194304 4260634 140733397159392 140733397158504 4203154 0 0 0 0 0 0 0 17 2 0 0 0 0 0 6360520 6361584 33239040 140733397167447 140733397167457 140733397167457 140733397168110 0
State after 5 seconds:
30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5440 0 0 0 20 0 1 0 55611251 17408000 386 18446744073709551615 4194304 4260634 140733397159392 140733397158504 4203154 0 0 0 0 0 0 0 17 2 0 0 0 0 0 6360520 6361584 33239040 140733397167447 140733397167457 140733397167457 140733397168110 0
In my test environment, this file refreshed every 1-2 seconds, so I assumed the system updates it at least once per second.
So I use this calculation
process_cpu_usage = ((utime - old_utime) + (stime - old_stime))/ period
In the case of the above values:
33.2 = ((5440 - 5274) + (0 - 0)) / 5
But in a commercial server environment, where processes run under high load (CPU and file I/O), the /proc/[pid]/stat update period increases to as much as 20-60 seconds!
So the top/htop utilities can't measure the correct process usage.
Why is this phenomenon occurring?
Our system is CentOS Linux release 7.1.1503 (Core).
Most (if not all) files in the /proc filesystem are special files: their content at any given moment reflects the actual OS/kernel data at that very moment; they're not files whose contents are periodically updated. See the /proc filesystem doc.
In particular, the /proc/[pid]/stat content changes whenever the respective process state changes (for example, after every scheduling event): for processes that are mostly sleeping the file will appear to be "updated" at slower rates, while for active/running processes on lightly loaded systems it appears to update at higher rates. Check, for example, the corresponding files for a shell process which doesn't do anything and for a browser process playing some video stream.
On heavily loaded systems with many processes in the ready state (like the one mentioned in this Q&A, for example) there can be scheduling delays making the file content "updates" appear less often despite the processes being ready/active. Such conditions seem to be more often encountered in commercial/enterprise environments (debatable, I agree).
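To make the sampling concrete, here is a minimal sketch of the measurement described in the question, assuming bash and the proc(5) field layout (the script name and the 5-second default interval are illustrative). utime and stime are counted in clock ticks, so dividing the delta by CLK_TCK times the interval gives a CPU fraction:
#!/bin/bash
# cpu_usage.sh PID [INTERVAL] - sample utime+stime twice and print CPU %.
pid=$1
interval=${2:-5}
hz=$(getconf CLK_TCK)         # clock ticks per second, typically 100

read_ticks() {
    local stat
    stat=$(cat "/proc/$pid/stat")
    stat=${stat##*) }         # strip "pid (comm) "; comm may contain spaces
    set -- $stat              # $1 is now the state field
    echo $(( ${12} + ${13} )) # utime + stime (fields 14 and 15 of the raw file)
}

t0=$(read_ticks)
sleep "$interval"
t1=$(read_ticks)
echo "scale=1; 100 * ($t1 - $t0) / ($hz * $interval)" | bc
With the sample values above, ((5440 - 5274) + (0 - 0)) ticks over 5 seconds at CLK_TCK=100 works out to 33.2 percent of one CPU; with CLK_TCK=100, the ticks-per-second figure in the question reads directly as a percentage.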
