BGP neighbor sometimes loses connection - bgp

Unstable BGP connection
I noticed that BGP lost a neighbor connection:
Neighbhor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd NeighborName
----------- --- ----- --------- --------- -------- ----- ------ --------- -------------- --------------
10.0.0.57 4 64600 117 206 0 0 0 00:00:49 1 ARISTA01T1
10.0.0.59 4 64600 156 237 0 0 0 00:00:33 Connect ARISTA02T1
10.0.0.61 4 64600 161 238 0 0 0 00:00:53 1 ARISTA03T1
10.0.0.63 4 64600 117 184 0 0 0 00:00:02 1 ARISTA04T1
But after a couple of seconds, another neighbor lost the connection
Neighbhor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd NeighborName
----------- --- ----- --------- --------- -------- ----- ------ --------- -------------- --------------
10.0.0.57 4 64600 166 269 0 0 0 00:00:37 1 ARISTA01T1
10.0.0.59 4 64600 178 262 0 0 0 00:00:16 1 ARISTA02T1
10.0.0.61 4 64600 209 301 0 0 0 00:00:35 1 ARISTA03T1
10.0.0.63 4 64600 147 224 0 0 0 00:00:50 Connect ARISTA04T1
The problem is that, for some reason, some neighbors lose their connection during the session.
The connection is unstable. For example, the second neighbor loses its connection and restores it after a couple of seconds, but then the fourth neighbor loses its connection. I am not sure how to debug or fix such an issue.

Normally, a BGP Notification will be sent; that message contains information on why the connection dropped. Typically it is "Hold Timer Expired" due to an unreliable data link. Try running a constant traceroute from peer address to peer address while the connection bounces (it could be a routing issue, e.g. does BGP advertise a path to the peer itself?).
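If the devices run FRR (or a SONiC box that exposes vtysh), one way to catch the reason without watching the CLI by hand is to poll the neighbor detail and log state changes. A minimal sketch in Python; the vtysh command and the exact "BGP state" / "Last reset" wording are assumptions about an FRR-style stack, so adjust them to whatever your platform prints:

import datetime
import subprocess
import time

PEERS = ["10.0.0.57", "10.0.0.59", "10.0.0.61", "10.0.0.63"]

def neighbor_lines(peer):
    # assumes an FRR-style vtysh is available on the device
    out = subprocess.run(["vtysh", "-c", f"show ip bgp neighbors {peer}"],
                         capture_output=True, text=True, check=False).stdout
    # keep only the lines that describe the state and the reason for the last reset
    return [l.strip() for l in out.splitlines()
            if "BGP state" in l or "Last reset" in l or "Notification" in l]

last = {}
while True:
    for peer in PEERS:
        lines = neighbor_lines(peer)
        if lines != last.get(peer):
            stamp = datetime.datetime.now().isoformat(timespec="seconds")
            print(stamp, peer, " | ".join(lines))
            last[peer] = lines
    time.sleep(1)

Correlating those timestamps with a continuous traceroute (or mtr) between the peer addresses should show whether the flaps line up with the data link dropping.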

Related

How to reliably detect dropped netlink requests or responses

I'm interested in using netlink for a straightforward application (reading cgroup stats at high frequency).
The man page cautions that the protocol is not reliable, hinting that the application needs to be prepared to handle dropped packets:
However, reliable transmissions from kernel to user are impossible in
any case. The kernel can't send a netlink message if the socket buffer
is full: the message will be dropped and the kernel and the user-space
process will no longer have the same view of kernel state. It
is up to the application to detect when this happens (via the ENOBUFS
error returned by recvmsg(2)) and resynchronize.
Since my requirements are simple, I'm fine with just destroying the socket and creating a new one whenever anything unexpected happens. But I can't find any documentation on what is expected of my program; the man page for recvmsg(2) doesn't even mention ENOBUFS, for example.
What all do I need to worry about in order to make sure I can tell that a request from my application or a response from the kernel has been dropped, so that I can reset everything and start over? It's clear to me that I could do so whenever I receive an error from any of the syscalls involved, but for example what happens if my request is dropped on the way to the kernel? Will I just never receive a response? Do I need to build a timeout mechanism where I wait only so long for a response?
I found the following in Communicating between the kernel and user-space in Linux using Netlink sockets by Ayuso, Gasca, and Lefevre:
If Netlink fails to deliver a message that goes from kernel to user-space, the recvmsg() function returns the No buffer space available (ENOBUFS) error. Thus, the user-space process knows that it is losing messages [...]
On the other hand, buffer overruns cannot occur in communications from user to kernel-space since sendmsg() synchronously passes the Netlink message to the kernel subsystem. If blocking sockets are used, Netlink is completely reliable in communications from user to kernel-space since memory allocations would wait, so no memory exhaustion is possible.
Regarding acks, it looks like worrying about them is optional:
NLM_F_ACK: the user-space application requested a confirmation message from
kernel-space to make sure that a given request was successfully performed. If this
flag is not set, the kernel-space reports the error synchronously via sendmsg() as errno value.
So it sounds like for my simplistic use case I can just use sendmsg and recvmsg naively, reacting to any error (except for EINTR) by starting the whole thing over, perhaps with backoff. My guess is that since I only get one response per request and the responses are tiny, I should never even see ENOBUFS as long as I have only one request in flight at a time.
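For what it's worth, that pattern can be sketched with a raw AF_NETLINK socket in Python. This uses an RTM_GETLINK dump over NETLINK_ROUTE purely as a stand-in request (not your cgroup-stats protocol), and the timeout value and restart loop are my own assumptions; the point is only that any ENOBUFS, netlink error message, or timeout tears the socket down and starts over:

import errno
import socket
import struct
import time

NETLINK_ROUTE = 0
RTM_GETLINK = 18
NLM_F_REQUEST, NLM_F_DUMP = 0x1, 0x300     # DUMP = ROOT | MATCH
NLMSG_ERROR, NLMSG_DONE = 2, 3

def dump_links(timeout=2.0):
    """One request/response cycle; raises on anything unexpected."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE)
    sock.settimeout(timeout)                 # guard against a silently lost request
    try:
        seq = int(time.monotonic())
        payload = struct.pack("Bxxx", socket.AF_UNSPEC)     # struct rtgenmsg, padded
        hdr = struct.pack("=LHHLL", 16 + len(payload), RTM_GETLINK,
                          NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
        sock.sendto(hdr + payload, (0, 0))                  # (pid=0, groups=0) -> kernel
        while True:
            data = sock.recv(65536)
            off = 0
            while off < len(data):
                mlen, mtype = struct.unpack_from("=LH", data, off)
                if mlen < 16:
                    raise OSError("truncated netlink message")
                if mtype == NLMSG_ERROR:
                    raise OSError("kernel returned NLMSG_ERROR")
                if mtype == NLMSG_DONE:
                    return
                off += (mlen + 3) & ~3                      # NLMSG_ALIGN
    finally:
        sock.close()

# react to any failure (ENOBUFS, timeout, NLMSG_ERROR) by starting over with a new socket
while True:
    try:
        dump_links()
    except OSError as exc:
        dropped = getattr(exc, "errno", None) == errno.ENOBUFS
        print("netlink exchange failed" + (" (messages lost)" if dropped else ""), exc)
        time.sleep(0.5)                      # crude backoff before retrying
    time.sleep(1.0)                          # poll interval

The settimeout() call covers the case where a request never gets a response; per the quoted article, sendmsg() to the kernel is synchronous with blocking sockets, so in practice the timeout should only fire in genuinely exceptional situations.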
As a side note, we can see the dropped netlink packets in the Drops column of /proc/net/netlink. For example:
# cat /proc/net/netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
76f0e0ed 0 1966 00080551 0 0 0 2 0 36968
36a83ab1 0 1431 00000001 0 0 0 2 0 30297
d7d5db8e 0 563 00000440 0 0 0 2 0 19572
a10eb5c0 0 795 00000515 704 0 0 2 0 23584
c52bbce9 0 474 00000001 0 0 0 2 0 17511
3c5a89a5 0 989856248 00000001 0 0 0 2 0 31686
051108c1 0 0 00000000 0 0 0 2 0 25
ed401538 0 562 00000440 0 0 0 2 0 19576
38699987 0 469 00000557 0 0 0 2 0 19806
d7bbb203 0 728 00000113 0 0 0 2 0 22988
4d31126f 2 795 40000000 0 0 0 2 0 31092
febb9674 2 2100 00000001 0 0 0 2 0 37904
8c18eb5b 2 728 40000000 0 0 0 2 0 22989
922a7fcf 4 0 00000000 0 0 0 2 0 8681
16cfa740 7 0 00000000 0 0 0 2 0 7680
4e55a095 9 395 00000000 0 0 0 2 0 15142
0b2c5994 9 1 00000000 0 0 0 2 0 10840
94fe571b 9 0 00000000 0 0 0 2 0 7673
a7a1d82c 9 396 00000000 0 0 0 2 0 14484
b6a3f183 10 0 00000000 0 0 0 2 0 7517
1c2dc7e3 11 0 00000000 0 0 0 2 0 640
6dafb596 12 469 00000007 0 0 0 2 0 19810
2fe2c14c 12 3676872482 00000007 0 0 0 2 0 19811
3f245567 12 0 00000000 0 0 0 2 0 8682
0da2ddc4 12 3578344684 00000000 0 0 0 2 0 43683
f9720247 15 489 00000001 0 0 0 2 11 17781
e84b6d30 15 519 00000002 0 0 0 2 0 19071
c7d75154 15 1550 ffffffff 0 0 0 2 0 31970
02c1c3db 15 4070855316 00000001 0 0 0 2 0 10852
e0d7b09a 15 1 00000002 0 0 0 2 0 10821
78649432 15 0 00000000 0 0 0 2 0 30
8182eaf3 15 504 00000002 0 0 0 2 0 22047
40263df1 15 1858 ffffffff 0 0 0 2 0 34001
49283e31 16 0 00000000 0 0 0 2 0 696

How to interpret such a value of the time column in /proc/self/mountstats - does it indicate a performance issue?

I have a bladefs volume, and I just checked /proc/self/mountstats, where I see per-operation statistics:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
about my bladefs. I am interested in the READ operation statistics. As far as I know, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How should I interpret this? Is it the average time of each read operation, regardless of the block size? I have a very strong suspicion that I have problems with NFS: am I right? A value of 0.1 s looks bad to me, but I am not sure how exactly to interpret this time: an average, some kind of sum...?
After reading the kernel source: the statistics are printed by rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics seems to be printed by _print_rpc_iostats(), which prints the om_execute member of struct rpc_iostats. (The newest kernels have 9 columns, with errors in the last column.)
That member appears to be referenced/actually changed only in rpc_count_iostats_metrics(), with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
Assuming ktime_add does what it says, the value of om_execute only ever increases. So the 8th column of mountstats would be the cumulative sum of the execution times of all operations of this type, not a per-operation average.
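To turn that cumulative counter into a per-operation figure, take two samples a known interval apart and divide the delta of column 8 (execute) by the delta of column 1 (ops). A rough sketch in Python, assuming the per-op column layout shown above; I leave the unit as whatever the column is in (the documentation quoted in the question says microseconds, while current kernels appear to print it via ktime_to_ms, i.e. milliseconds):

import re
import time

MOUNTSTATS = "/proc/self/mountstats"

def read_op_stats(op="READ", path=MOUNTSTATS):
    """Return (ops, execute) from the first matching per-op line, e.g. 'READ: ...'."""
    # note: this grabs the first matching mount; filter by device if several NFS mounts exist
    with open(path) as f:
        for line in f:
            m = re.match(rf"\s*{op}:\s+(.*)", line)
            if m:
                cols = [int(c) for c in m.group(1).split()]
                return cols[0], cols[7]      # column 1 = ops, column 8 = cumulative execute
    raise ValueError(f"{op} not found in {path}")

ops1, exe1 = read_op_stats("READ")
time.sleep(10)                               # sample interval
ops2, exe2 = read_op_stats("READ")

d_ops, d_exe = ops2 - ops1, exe2 - exe1
if d_ops:
    print(f"average execute per READ over the interval: {d_exe / d_ops:.1f} (column units)")
else:
    # no new READs during the interval; fall back to the lifetime average since mount
    print(f"lifetime average execute per READ: {exe2 / ops2:.1f} (column units)")

Applied to the lifetime numbers in the question, 97924 / 3425 ≈ 28.6 per READ since mount, i.e. roughly 29 ms per read if the column is in milliseconds.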

Why does Cassandra major compaction fail to clear expired tombstones?

We have deployed a global Apache Cassandra cluster (nodes: 12, RF: 3, version: 3.11.2) in our production environment. We are running into an issue where running a major compaction on a column family fails to clear tombstones from one node (out of 3 replicas), even though the metadata shows that the minimum timestamp has passed the gc_grace_seconds set on the table.
Here is the sstable metadata output:
SSTable: mc-4302-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1
Maximum timestamp: 1560326019515476
SSTable min local deletion time: 1560233203
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.8808303792058351
TTL min: 0
TTL max: 0
First token: -9201661616334346390 (key=bca773eb-ecbb-49ec-9330-cc16da310b58:::)
Last token: 9117719078924671254 (key=7c23b975-5354-4c82-82e5-1762bac75a8d:::)
minClustringValues: [00000f8f-74a9-4ce3-9d87-0a4dabef30c1]
maxClustringValues: [ffffc966-a02c-4e1f-bdd1-256556624288]
Estimated droppable tombstones: 46.31761624099541
SSTable Level: 0
Repaired at: 0
Replay positions covered: {}
totalColumnsSet: 0
totalRows: 618382
Estimated tombstone drop times:
1560233680: 353
1560234658: 237
1560235604: 176
1560236803: 471
1560237652: 402
1560238342: 195
1560239166: 373
1560239969: 356
1560240586: 262
1560241207: 247
1560242037: 387
1560242847: 357
1560243742: 280
1560244469: 283
1560245095: 353
1560245957: 357
1560246773: 362
1560247956: 449
1560249034: 217
1560249849: 310
1560251080: 296
1560251984: 304
1560252993: 239
1560253907: 407
1560254839: 977
1560255761: 671
1560256486: 317
1560257199: 679
1560258020: 703
1560258795: 507
1560259378: 298
1560260093: 2302
1560260869: 2488
1560261535: 2818
1560262176: 2842
1560262981: 1685
1560263708: 1830
1560264308: 808
1560264941: 1990
1560265753: 1340
1560266708: 2174
1560267629: 2253
1560268400: 1627
1560269174: 2347
1560270019: 2579
1560270888: 3947
1560271690: 1727
1560272446: 2573
1560273249: 1523
1560274086: 3438
1560275149: 2737
1560275966: 3487
1560276814: 4101
1560277660: 2012
1560278617: 1198
1560279680: 769
1560280441: 1337
1560281033: 608
1560281876: 2065
1560282546: 2926
1560283128: 6305
1560283836: 824
1560284574: 71
1560285166: 140
1560285828: 118
1560286404: 83
1560295835: 72
1560296951: 456
1560297814: 670
1560298496: 271
1560299333: 473
1560300159: 284
1560300831: 127
1560301551: 536
1560302309: 425
1560303302: 860
1560304064: 465
1560304782: 319
1560305657: 323
1560306552: 236
1560307454: 368
1560308409: 320
1560309178: 210
1560310091: 177
1560310881: 85
1560311970: 147
1560312706: 76
1560313495: 88
1560314847: 687
1560315817: 1618
1560316544: 1245
1560317423: 5361
1560318491: 2060
1560319595: 5853
1560320587: 5390
1560321473: 3868
1560322644: 5784
1560323703: 6861
1560324838: 7200
1560325744: 5642
Count Row Size Cell Count
1 0 3054
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
10 0 0
12 0 0
14 0 0
17 0 0
20 0 0
24 0 0
29 0 0
35 0 0
42 0 0
50 0 0
60 98 0
72 49 0
86 46 0
103 2374 0
124 39 0
149 36 0
179 43 0
215 18 0
258 26 0
310 24 0
372 18 0
446 16 0
535 19 0
642 27 0
770 17 0
924 12 0
1109 14 0
1331 23 0
1597 20 0
1916 12 0
2299 11 0
2759 11 0
3311 11 0
3973 12 0
4768 5 0
5722 8 0
6866 5 0
8239 5 0
9887 6 0
11864 5 0
14237 10 0
17084 1 0
20501 8 0
24601 2 0
29521 2 0
35425 3 0
42510 2 0
51012 2 0
61214 1 0
73457 2 0
88148 3 0
105778 0 0
126934 3 0
152321 2 0
182785 1 0
219342 0 0
263210 0 0
315852 0 0
379022 0 0
454826 0 0
545791 0 0
654949 0 0
785939 0 0
943127 0 0
1131752 0 0
1358102 0 0
1629722 0 0
1955666 0 0
2346799 0 0
2816159 0 0
3379391 1 0
4055269 0 0
4866323 0 0
5839588 0 0
7007506 0 0
8409007 0 0
10090808 1 0
12108970 0 0
14530764 0 0
17436917 0 0
20924300 0 0
25109160 0 0
30130992 0 0
36157190 0 0
43388628 0 0
52066354 0 0
62479625 0 0
74975550 0 0
89970660 0 0
107964792 0 0
129557750 0 0
155469300 0 0
186563160 0 0
223875792 0 0
268650950 0 0
322381140 0 0
386857368 0 0
464228842 0 0
557074610 0 0
668489532 0 0
802187438 0 0
962624926 0 0
1155149911 0 0
1386179893 0 0
1663415872 0 0
1996099046 0 0
2395318855 0 0
2874382626 0
3449259151 0
4139110981 0
4966933177 0
5960319812 0
7152383774 0
8582860529 0
10299432635 0
12359319162 0
14831182994 0
17797419593 0
21356903512 0
25628284214 0
30753941057 0
36904729268 0
44285675122 0
53142810146 0
63771372175 0
76525646610 0
91830775932 0
110196931118 0
132236317342 0
158683580810 0
190420296972 0
228504356366 0
274205227639 0
329046273167 0
394855527800 0
473826633360 0
568591960032 0
682310352038 0
818772422446 0
982526906935 0
1179032288322 0
1414838745986 0
Estimated cardinality: 3054
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1560233203
EncodingStats minTimestamp: 1
KeyType: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
ClusteringTypes: [org.apache.cassandra.db.marshal.UUIDType]
StaticColumns: {}
RegularColumns: {}
So far, here is what we have tried:
1) major compaction with lower gc_grace_seconds
2) nodetool garbagecollect
3) nodetool scrub
None of the above methods has helped. Again, this is only happening on one node (out of the 3 replicas).
The tombstone markers generated during your major compaction are just that, markers. The data has been removed, but a delete marker is left in place so that the other replicas can have gc_grace_seconds to process it too. The tombstone markers are fully dropped the next time the SSTable is compacted. Unfortunately, because you've run a major compaction (rarely ever recommended), it may be a long time until there are suitable SSTables to compact with it and clean up the tombstones. Remember that the tombstone drop will also only happen after local_delete_time + gc_grace_seconds, as defined on the table.
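That last condition is simple arithmetic you can check against the sstablemetadata output above. A small sketch, assuming the default gc_grace_seconds of 864000 (10 days); substitute the value actually set on your table:

from datetime import datetime, timezone

GC_GRACE_SECONDS = 864_000                 # assumed default of 10 days; use your table's setting
min_local_deletion_time = 1560233203       # "SSTable min local deletion time" from sstablemetadata

droppable_at = min_local_deletion_time + GC_GRACE_SECONDS
print("earliest tombstones in this SSTable become droppable at",
      datetime.fromtimestamp(droppable_at, tz=timezone.utc).isoformat())
# Even after that time they are only purged when this SSTable takes part in a
# later compaction, which a prior major compaction makes much less likely.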
If you're interested in learning more about how tombstones and compaction work together in the context of delete operations, I suggest reading the following articles:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html
https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

Benchmarking CPU and File IO for an application running on Linux

I wrote two programs to run on Linux, each using a different algorithm, and I want to find a way (preferably using benchmarking software) to compare the CPU usage and IO operations between these two programs.
Is there such a thing? And if so, where can I find it? Thanks.
You can try hardinfo.
There are also any number of tools that measure system performance, if measuring the system while your app runs solves your purpose.
You can also check this thread.
You might try the vmstat command:
vmstat 2 20 > vmstat.txt
This takes 20 samples at 2-second intervals.
bi = KB in, bo = KB out, and wa = time waiting for I/O.
I/O can also increase cache demands.
%CPU utilisation = us (user) + sy (system)
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 277504 17060 82732 0 0 91 87 1432 236 11 3 84 1
0 0 0 277372 17068 82732 0 0 0 24 1361 399 23 8 59 10
test start
0 1 0 275240 17068 82732 0 0 0 512 1342 305 24 4 69 4
2 1 0 275232 17068 82780 0 0 24 10752 4176 216 7 8 0 85
1 1 0 275240 17076 82732 0 0 12288 2590 5295 243 15 8 0 77
0 1 0 275240 17076 82748 0 0 8 11264 4329 214 6 12 0 82
0 1 0 275240 17076 82780 0 0 16 11264 4278 233 15 10 0 75
0 1 0 275240 17084 82780 0 0 19456 542 6563 255 10 7 0 83
0 1 0 275108 17084 82748 0 0 5128 3072 3501 265 16 37 0 47
3 1 0 275108 17084 82748 0 0 924 5120 8369 3845 12 33 0 55
0 1 0 275116 17092 82748 0 0 1576 85 11483 6645 5 50 0 45
1 1 0 275116 17092 82748 0 0 0 136 2304 689 3 9 0 88
2 1 0 275084 17100 82732 0 0 0 352 2374 800 14 26 0 61
0 0 0 275076 17100 82732 0 0 546 118 2408 1014 35 17 47 1
0 1 0 275076 17104 82732 0 0 0 62 1324 76 3 2 89 7
1 1 0 275076 17108 82732 0 0 0 452 1879 442 8 13 66 12
0 0 0 275116 17108 82732 0 0 800 352 2456 1195 19 17 56 8
0 1 0 275116 17112 82732 0 0 0 54 1325 76 4 1 88 8
test end
1 1 0 275116 17116 82732 0 0 0 510 1717 286 6 10 72 11
1 0 0 275076 17116 82732 0 0 1600 1152 3087 1344 23 29 41 7
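If you want per-program numbers rather than system-wide vmstat samples, another option is to run each program as a child process and read the rusage counters the kernel keeps for it. A minimal sketch in Python; the ./algo1 and ./algo2 paths are placeholders for your two programs, and on Linux ru_inblock/ru_oublock count block I/Os that actually hit the filesystem:

import resource
import subprocess

def measure(cmd):
    """Run cmd to completion and return its CPU time and block I/O deltas."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "user_cpu_s":   after.ru_utime   - before.ru_utime,
        "system_cpu_s": after.ru_stime   - before.ru_stime,
        "blocks_in":    after.ru_inblock - before.ru_inblock,
        "blocks_out":   after.ru_oublock - before.ru_oublock,
    }

for cmd in (["./algo1"], ["./algo2"]):      # placeholder program names
    print(cmd[0], measure(cmd))

If you need byte-level detail instead, /proc/<pid>/io (read_bytes, write_bytes) gives per-process I/O counters while the program is still running.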

FreeBSD iostat - How to tell if there's a problem? [closed]

I run a FreeBSD NFS server and recently I've been having odd issues throughout the cluster (the Apache servers are hanging in "lockf" state when loading files from the NFS share, etc).
I'm fairly new to this, so my question is how can I tell if a server's IO is getting overloaded?
Here is my current iostat:
[root@host ~]# iostat 1 10
tty mfid0 cpu
tin tout KB/t tps MB/s us ni sy in id
0 55 16.03 194 3.04 0 0 5 0 95
0 490 21.73 238 5.05 0 0 5 0 95
0 43 20.09 402 7.88 0 0 7 0 93
0 407 12.58 531 6.53 0 0 5 0 94
0 43 15.69 416 6.37 0 0 8 1 91
0 437 30.23 287 8.46 0 0 9 1 91
0 43 23.50 109 2.50 0 0 2 0 98
0 273 11.58 76 0.86 0 0 2 0 98
0 43 15.70 243 3.72 0 0 5 0 95
0 320 20.35 248 4.92 0 0 3 0 96
[root@host ~]#
Do any of the values seem high? Are there any other tests I can do to see if the system is handling the load efficiently?
Thanks!
Try using gstat or systat -iostat, but (like iostat) it will only show you IO usage, not what causes it. You are probably more interested in trying:
procstat -f $ApachePIDinLockfState
or ktrace -p $ApachePIDinLockfState followed by kdump -R | less. Remember to run ktrace -C when you have finished.

Resources