chrony: Find the most recent sync time - linux

I am trying to find out the most recent time chrony checked to see if we are in sync. I can see chrony's sources with the following command:
~]$ chronyc sources
210 Number of sources = 3
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* a.b.c                         1   10   377    32m  -8638us[-8930us] +/- 103ms
^- d.e.f                         1   10   366    77m  +2928us[+1960us] +/- 104ms
^+ g.h.i                         2   10   377    403    -14ms[  -14ms] +/- 137ms
The * symbol indicates the server that we synced to, and the + symbol indicates "acceptable" servers. I can see that chrony is indeed polling both a.b.c and g.h.i.
So what I want to know is whether I can use the minimum of the two LastRx values as the most recent time that we confirmed we are in sync. In this example we are synced to a.b.c, but 403 s < 32 m, so does that mean that 403 seconds ago chrony checked against g.h.i and determined we are in sync? In other words, is 403 seconds ago the last time I know system time was in sync, or is it 32 minutes ago?
There is some documentation on the chronyc command here: https://chrony.tuxfamily.org/doc/3.5/chronyc.html
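If you want to script this, here is a minimal sketch of extracting LastRx programmatically. It assumes chronyc's -c flag, which prints each source as a CSV row with the selection state in the second field and LastRx (in seconds) in the seventh:
# Print the smallest LastRx, in seconds, among the selected (*) and
# acceptable (+) sources; field positions assume chronyc -c CSV output.
chronyc -c sources | awk -F, '
  ($2 == "*" || $2 == "+") { if (min == "" || $7+0 < min+0) min = $7 }
  END { print min }'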

Related

MDNS causes local HTTP requests to block for 2 seconds

It takes 0.02 seconds to send a message via the Python call requests.post("http://172.16.90.18:8080", files=files), but it takes 2 seconds via requests.post("http://sdss-server.local:8080", files=files).
The Wireshark capture I took (packets 62 through 107) shows that it took 2 seconds for mDNS to resolve the domain name.
My system is Ubuntu 18.04. Following Mac OS X slow connections - mdns 4-5 seconds - bonjour slow, I edited the /etc/hosts file and changed this line to
127.0.0.1 localhost sdss-server.local
After this modification it still takes 2 seconds to send the message via requests.post("http://sdss-server.local:8080", files=files).
Normally it should take 0.02 to 0.03 seconds. What should I do to fix this and reduce the time from 2 seconds to 0.02 seconds?
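One way to confirm where the delay comes from is to time the name lookup on its own, outside of Python (a diagnostic sketch; assumes the stock Ubuntu 18.04 resolver stack):
# If this takes ~2 s, the stall is in name resolution, not in the HTTP POST.
time getent hosts sdss-server.local
# Show how the "hosts" line routes lookups (mdns4_minimal handles .local names):
grep '^hosts' /etc/nsswitch.conf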

run script when chrony steps clock

I need to start a certain service after the system clock has been correctly stepped by chrony.
System time is maintained by chrony (chronyd (chrony) version 3.5 (+CMDMON +NTP +REFCLOCK +RTC -PRIVDROP -SCFILTER -SIGND +ASYNCDNS -SECHASH +IPV6 -DEBUG)).
Chrony setup, if relevant, is:
server 192.168.100.1 trust minpoll 2 maxpoll 4 polltarget 30
refclock PPS /dev/pps0 refid KPPS trust lock GNSS maxdispersion 3 poll 2
refclock SOCK /var/run/chrony.sock refid GNSS maxdispersion 0.2 noselect
makestep 0.1 -1
driftfile /var/lib/chrony/drift
rtcsync
An example of a "normal, tracking" status is:
/ # chronyc tracking
Reference ID    : C0A86401 (192.168.100.1)
Stratum         : 2
Ref time (UTC)  : Wed Dec 01 11:52:08 2021
System time     : 0.000004254 seconds fast of NTP time
Last offset     : +0.000000371 seconds
RMS offset      : 0.000011254 seconds
Frequency       : 17.761 ppm fast
Residual freq   : +0.001 ppm
Skew            : 0.185 ppm
Root delay      : 0.000536977 seconds
Root dispersion : 0.000051758 seconds
Update interval : 16.2 seconds
Leap status     : Normal
while "unsynchronized" (initial) status is:
/ # chronyc tracking
Reference ID    : 00000000 ()
Stratum         : 0
Ref time (UTC)  : Thu Jan 01 00:00:00 1970
System time     : 0.000000000 seconds fast of NTP time
Last offset     : +0.000000000 seconds
RMS offset      : 0.000000000 seconds
Frequency       : 0.000 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.000 ppm
Root delay      : 1.000000000 seconds
Root dispersion : 1.000000000 seconds
Update interval : 0.0 seconds
Leap status     : Not synchronised
I seem to remember chrony can call a script whenever the stratum level changes, but I was unable to find references.
In any case:
Is there any way to instruct chrony to run a script/program, or otherwise send some signal, whenever it acquires/loses tracking against a valid server?
I am currently relying on a rather ugly while chronyc tracking | grep -q "Not synchronised"; do sleep 1; done loop, but proactive signalling by chronyd would be preferred.
Details:
System is a (relatively) small IoT device running Linux (Yocto)
It has no RTC (it always starts with clock set to Epoch).
System has no connection to the Internet (initially).
System has connection to a device having a GNSS receiver, and correct time is derived from there.
There may be a (sometimes 'very') long time before GNSS acquires a fix and thus can propagate time.
At a certain point chrony finally gets the right time and steps the system clock. After this is done I need to start a service (or run a script or whatever).
I am currently polling chronyc tracking and parsing status, but that is not really nice.
I was looking to do the same and came up empty-handed.
I did, however, find chronyc waitsync, which appears to be a built-in way to do the polling, without the need to parse and sleep explicitly. This works well enough for my case, since I only need to delay a single start-up action.
The existence of this command also hints (albeit by no means proves) that direct triggering may not be supported. If triggering is a hard requirement, rsyslogd can help.
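For reference, a minimal sketch of the waitsync approach (the try count, offset threshold, and poll interval are illustrative, and the start command is a placeholder for whatever init mechanism the device uses):
# Block until chronyd reports synchronisation: up to 60 tries, remaining
# correction within 0.1 s, skew unchecked, polling every 2 seconds.
chronyc waitsync 60 0.1 0 2 && /etc/init.d/my-service start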
BTW, one can only admire the enthusiasm of systemd fans, spreading the love even when their purported answer is obviously and completely irrelevant.
Clearly, the target system does NOT use systemd. The question is about chronyd, not about systemd-timesyncd, while systemd-time-wait-sync.service applies only to the latter.
Another suggestion is to investigate systemd-time-wait-sync.service.
The suggested technique is a systemd service unit that waits for systemd-time-wait-sync.service to synchronise the kernel clock, using the After= directive in the service unit file.
These techniques are described here and here.
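For completeness, a minimal sketch of that unit-file approach (only meaningful on systems that do run systemd with systemd-timesyncd; the description and ExecStart path are illustrative):
[Unit]
Description=Start only after the clock has been synchronised
Wants=systemd-time-wait-sync.service
After=systemd-time-wait-sync.service

[Service]
ExecStart=/usr/local/bin/my-time-sensitive-service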

Connect exception in Gatling - what does this mean?

I ran the configuration below in Gatling from my local machine to verify 20K requests per second:
scn.inject(
  atOnceUsers(20000)
)
It gave the errors below in the report. What does this mean in Gatling?
j.n.ConnectException: Can't assign requested address: /xx.xx.xx:xxxx     3648  83.881 %
j.n.ConnectException: connection timed out: /xx.xx.xx:xxxx                416   9.565 %
status.find.is(200), but actually found 500                               201   4.622 %
j.u.c.TimeoutException: Request timeout to not-connected after 60000ms     84   1.931 %
Are these timeouts happening because the server is not processing the requests, or because the requests never leave my local machine?
Most probably yes, that's the reason.
It seems your simulation was compiled successfully and started.
If you look at the error messages you will see percentages after each line (83.881 %, 9.565 %, 1.931 %). This means the requests actually were generated and sent, and some of them failed; the percentages are counted against the total number of failures. Note that j.n.ConnectException: Can't assign requested address typically means the client machine itself ran out of ephemeral ports while opening 20000 connections at once, so part of the failures originate on your side rather than on the server.
If some of the requests are OK and you get these errors, then Gatling did its job. It stress tested your application.
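If client-side port exhaustion is indeed part of the problem, a Linux-side tuning sketch may help (values are illustrative, and the sysctl settings apply system-wide):
# Widen the ephemeral port range and allow reuse of TIME_WAIT sockets
# (run as root on the injector machine).
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1
# Raise the open-file (socket) limit for the shell running Gatling.
ulimit -n 65536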
Try the simulation with a lower number of users, for example:
scn.inject(
  rampUsers(20) over (10 seconds)
)
If that works, then your application is simply not capable of handling 20000 requests at once.
For more info on how to set up a simulation see here.

Configuring Snap for performance

I'm just playing with the Snap framework and wanted to see how it performs against other frameworks (under completely artificial circumstances).
What I have found is that my Snap application tops out at about 1500 requests/second (the app is simply snap init; snap build; ./dist/app/app, i.e., no code changes to the default app created by snap):
$ ab -n 20000 -c 500 http://127.0.0.1:8000/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Completed 20000 requests
Finished 20000 requests
Server Software: Snap/0.9.5.1
Server Hostname: 127.0.0.1
Server Port: 8000
Document Path: /
Document Length: 721 bytes
Concurrency Level: 500
Time taken for tests: 12.845 seconds
Complete requests: 20000
Failed requests: 0
Total transferred: 17140000 bytes
HTML transferred: 14420000 bytes
Requests per second: 1557.00 [#/sec] (mean)
Time per request: 321.131 [ms] (mean)
Time per request: 0.642 [ms] (mean, across all concurrent requests)
Transfer rate: 1303.07 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   44  287.6      0    3010
Processing:     6  274  153.6    317    1802
Waiting:        5  274  153.6    317    1802
Total:         20  318  346.2    317    3511
Percentage of the requests served within a certain time (ms)
50% 317
66% 325
75% 334
80% 341
90% 352
95% 372
98% 1252
99% 2770
100% 3511 (longest request)
I then fired up a Grails application, and it seems like Tomcat (once the JVM warms up) can take a bit more load:
$ ab -n 20000 -c 500 http://127.0.0.1:8080/test-0.1/book
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 127.0.0.1 (be patient)
Completed 2000 requests
Completed 4000 requests
Completed 6000 requests
Completed 8000 requests
Completed 10000 requests
Completed 12000 requests
Completed 14000 requests
Completed 16000 requests
Completed 18000 requests
Completed 20000 requests
Finished 20000 requests
Server Software: Apache-Coyote/1.1
Server Hostname: 127.0.0.1
Server Port: 8080
Document Path: /test-0.1/book
Document Length: 722 bytes
Concurrency Level: 500
Time taken for tests: 4.366 seconds
Complete requests: 20000
Failed requests: 0
Total transferred: 18700000 bytes
HTML transferred: 14440000 bytes
Requests per second: 4581.15 [#/sec] (mean)
Time per request: 109.143 [ms] (mean)
Time per request: 0.218 [ms] (mean, across all concurrent requests)
Transfer rate: 4182.99 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   67  347.4      0    3010
Processing:     1   30   31.4     21     374
Waiting:        0   26   24.4     20     346
Total:          1   97  352.5     21    3325
Percentage of the requests served within a certain time (ms)
50% 21
66% 28
75% 35
80% 42
90% 84
95% 230
98% 1043
99% 1258
100% 3325 (longest request)
I'm guessing that part of this could be the fact that Tomcat seems to reserve a lot of RAM and can keep/cache some methods. During this experiment Tomcat was using in excess of 700 MB of RAM while Snap barely approached 70 MB.
Questions I have:
Am I comparing apples and oranges here?
What steps would one take to optimise Snap for throughput/speed?
Further experiments:
Then, as suggested by mightybyte, I started experimenting with the +RTS -A4M -N4 options. The app was able to serve just over 2000 requests per second (about a 25% increase).
I also removed the nested templating and served a document (same size as before) from the top-level tpl file. This increased the performance to just over 7000 requests a second. The memory usage went up to about 700 MB.
I'm by no means an expert on the subject so I can only really answer your first question, and yes you are comparing apples and oranges (and also bananas without realizing it).
First off, it looks like you are attempting to benchmark different things, so naturally, your results will be inconsistent. One of these is the sample Snap application and the other is just "a Grails application". What exactly is each of these things doing? Are you serving pages? Handling requests? The difference in applications will explain the differences in performance.
Secondly, the difference in RAM usage also shows the difference in what these applications are doing. Haskell web frameworks are very good at handling large instances without much RAM, whereas other frameworks, like Tomcat as you saw, will be limited in their performance when RAM is limited. Try limiting both applications to 100 MB and see what happens to your performance difference.
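A sketch of what that cap could look like (the paths are from the question, the flags are standard, but treat the exact invocations as illustrative; GHC's -M flag requires the app to have been built with -rtsopts):
# Cap the Snap app's heap at ~100 MB via the GHC runtime system:
./dist/app/app +RTS -M100m -RTS
# Cap the JVM heap of the Tomcat instance hosting the Grails app:
CATALINA_OPTS="-Xmx100m" ./bin/catalina.sh run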
If you want to compare the different frameworks, you really need to run a standard application to do that. Snap did this with a Pong benchmark. The results of an old test (from 2011 and Snap 0.3) can be seen here. This paragraph is extremely relevant to your situation:
If you’re comparing this with our previous results you will notice that we left out Grails. We discovered that our previous results for Grails may have been too low because the JVM had not been given time to warm up. The problem is that after the JVM warms up for some reason httperf isn’t able to get any samples from which to generate a replies/sec measurement, so it outputs 0.0 replies/sec. There are also 1000 connreset errors, so we decided the Grails numbers were not reliable enough to use.
As a comparison, the Yesod blog has a Pong benchmark from around the same time that shows similar results. You can find that here. They also link to their benchmark code if you would like to try to run a more similar benchmark, it is available on Github.
The answer by jkeuhlen makes good observations relevant to your first question. As to your second question, there are definitely things you can play with to tune performance. If you look at Snap's old raw result data, you can see that we were running the application with +RTS -A4M -N4. The -N4 option tells the GHC runtime to use 4 threads. (Note that you have to build the application with -threaded to do this.) The -A4M option sets the size of the garbage collector's allocation area. Our experiments showed that these two seemed to have the biggest impact on performance. But that was done a long time ago and GHC has changed a lot since then, so you probably want to play around with them and find what works best for you. This page has in-depth information about other command line options available to control GHC's runtime if you wish to do more experimentation.
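As a concrete starting point, here is a sketch of building and launching with those options (the flag values are the ones from the old benchmark, not tuned recommendations):
# Build with the threaded runtime and allow RTS flags at startup:
cabal build --ghc-options="-threaded -rtsopts"
# Run on 4 capabilities with a 4 MB garbage-collector allocation area:
./dist/app/app +RTS -N4 -A4M -RTS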
A little work was done last year on updating the benchmarks. If you're interested in that, look around the different branches in the snap-benchmarks repository. It would be great to get more help on a new set of benchmarks.

How do I stress test a web form file upload?

I need to test a web form that takes a file upload.
The filesize in each upload will be about 10 MB.
I want to test if the server can handle over 100 simultaneous uploads, and still remain responsive for the rest of the site.
Repeated form submissions from our office will be limited by our local DSL line.
The server is offsite with higher bandwidth.
Answers based on experience would be great, but any suggestions are welcome.
Use the ab (ApacheBench) command-line tool that is bundled with Apache (I have just discovered this great little tool). Unlike cURL or wget, ApacheBench was designed for performing stress tests on web servers (any type of web server!). It generates plenty of statistics too. The following command will send an HTTP POST request including the file test.jpg to http://localhost/ 100 times, with up to 4 concurrent requests. (Note that -p sends the file as the raw request body; to imitate a real browser form upload you would need to build a multipart/form-data body yourself and set the matching Content-Type with -T.)
ab -n 100 -c 4 -p test.jpg http://localhost/
It produces output like this:
Server Software:
Server Hostname: localhost
Server Port: 80
Document Path: /
Document Length: 0 bytes
Concurrency Level: 4
Time taken for tests: 0.78125 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Non-2xx responses: 100
Total transferred: 2600 bytes
HTML transferred: 0 bytes
Requests per second: 1280.00 [#/sec] (mean)
Time per request: 3.125 [ms] (mean)
Time per request: 0.781 [ms] (mean, across all concurrent requests)
Transfer rate: 25.60 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    2.6      0      15
Processing:     0    2    5.5      0      15
Waiting:        0    1    4.8      0      15
Total:          0    2    6.0      0      15
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 15
95% 15
98% 15
99% 15
100% 15 (longest request)
Automate Selenium RC using your favorite language. Start 100 threads of Selenium, each typing the path of the file into the input and clicking submit.
You could generate 100 sequentially named files to make looping over them easy, or just use the same file over and over again.
I would perhaps guide you towards using cURL and submitting just random stuff (like reading 10 MB out of /dev/urandom and encoding it into base32) through a POST request, manually fabricating the body to be a file upload (it's not rocket science).
Fork that script 100 times, perhaps over a few servers. Just make sure the sysadmins don't think you are doing a DDoS or something :)
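A rough sketch of that idea (the upload URL and the form field name "file" are placeholders; adjust them to match the real form, and note that base32 grows the payload by a factor of 8/5):
# Build a ~10 MB random payload (6 MB of entropy, ~9.6 MB after base32):
head -c 6M /dev/urandom | base32 > upload.dat
# Fire 100 concurrent multipart uploads and wait for all of them to finish.
for i in $(seq 1 100); do
  curl -s -o /dev/null -F "file=@upload.dat" http://upload.example.com/form &
done
wait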
Unfortunately, this answer remains a bit vague, but hopefully it helps by nudging you onto the right track.
Continued as per Liam's comment:
If the server receiving the uploads is not in the same LAN as the clients connecting to it, it would be better to use nodes as remote as possible for stress testing, if only to simulate behaviour as authentic as possible. But if you don't have access to computers outside the local LAN, the local LAN is always better than nothing.
Stress testing from the same hardware would not be a good idea, as it puts double load on the server: generating the random data, packing it, and sending it through the TCP/IP stack (although probably not over Ethernet) all compete with the work the server actually has to do. If the sending part is outsourced, the receiving end gets roughly double the capacity (take that with an arbitrarily sized grain of salt).
