I have some hosts that come up on demand in EC2 and when they do the service that starts them creates an A record in Route53 under an existing zone.
The A records are of the form: randomid.example.com.
So it's not an update or change of an existing name/IP pair; it's a completely new entry. There shouldn't be any propagation delay.
What I'm seeing is that after the entry has been added and is available for lookup via DNS on any of the Amazon servers, my own client PC can't resolve the name for what seems like 5-10 minutes. If I ping it, I'd expect to see an IP for it, but I simply get "no such host".
If I change my /etc/resolv.conf nameserver entry from my local nameserver to 8.8.8.8 (Google DNS), it resolves. If I switch back, it doesn't resolve. This doesn't seem to have anything to do with Route53, given that Google answers.
What would cause this? Shouldn't my local resolver be querying the relevant nameservers and eventually the nameserver for example.com which should get an answer for randomid.example.com?
There shouldn't be any propagation delay.
Yes, there should be.
All DNS configuration has a "propagation delay."¹
In the case of new records, a lookup of a hostname before the record is actually available from the authoritative name servers results in negative caching: when a resolver looks up a non-existent record, the NXDOMAIN response is cached by the resolver for a period of time, and this response is returned for subsequent requests until the negative-caching TTL elapses and the response is evicted from the resolver's cache.
Negative caching is useful as it reduces the response time for negative answers. It also reduces the number of messages that have to be sent between resolvers and name servers hence overall network traffic.
https://www.rfc-editor.org/rfc/rfc2308
When you use dig to query the new record, you'll see the TTL of the cached negative answer counting down to 0. Once that happens, you start seeing the expected answer. On Linux the watch utility is handy for this, as in watch -n 1 'dig randomid.example.com'.
The timer should be set from the minimum TTL, which is found in your hosted zone's SOA record:
The minimum time to live (TTL). This value helps define the length of time that an NXDOMAIN result, which indicates that a domain does not exist, should be cached by a DNS resolver. Caching this negative result is referred to as negative caching. The duration of negative caching is the lesser of the SOA record's TTL or the value of the minimum TTL field. The default minimum TTL on Amazon Route 53 SOA records is 900 seconds.
http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/SOA-NSrecords.html
There's the source of your 5-10 minutes. It's actually a worst case of 15 minutes (900 seconds).
Reducing this timer will reduce the amount of time that well-behaved resolvers will cache the fact that the record does not (yet) exist.
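If you want to see what negative-caching TTL your zone currently advertises, here is a quick sketch using dnspython (the zone name is a placeholder):

import dns.resolver

zone = "example.com"  # placeholder: substitute your hosted zone

answer = dns.resolver.resolve(zone, "SOA")
soa = answer[0]
# Negative answers are cached for min(TTL of the SOA record, SOA minimum field).
print("SOA record TTL:", answer.rrset.ttl)
print("SOA minimum   :", soa.minimum)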
"Great," you object, "but I didn't query the hostname before it existed. What now?"
You probably did, because Route 53 does not immediately make records visible. There's a brief lag between the time a change is made to a hosted zone and the time Route 53 begins returning the records.
The Route 53 API supports the GetChange action, which should not return INSYNC until the authoritative servers for your hosted zone are returning the expected answer for the change (and of course this uses "change" in the sense that both "insert" and "update" are a "change").
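A minimal polling sketch with boto3, assuming change_id holds the Id returned by change_resource_record_sets, could look like this:

import time
import boto3

route53 = boto3.client("route53")
change_id = "CHANGE_ID_FROM_YOUR_CHANGE_BATCH"  # placeholder

# Poll until Route 53 reports the change is visible on the zone's authoritative servers.
while True:
    status = route53.get_change(Id=change_id)["ChangeInfo"]["Status"]
    if status == "INSYNC":
        break
    time.sleep(5)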
You can also determine this by directly querying one of the servers specifically assigned to your hosted zone (as seen in the console, among other places).
$ dig @ns-xxxx.awsdns-yy.com randomid.example.com
Because you are querying an authoritative server directly, you'll see the result of the change as soon as the server has it available, because there is no resolver in the path that will cache responses.
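If you prefer scripting that check, the same idea can be sketched with dnspython (the server and record names below are placeholders for your zone's actual values):

import dns.resolver

ns_name = "ns-xxxx.awsdns-yy.com"   # placeholder: one of your zone's assigned servers
record = "randomid.example.com"     # placeholder: the record you just created

ns_ip = dns.resolver.resolve(ns_name, "A")[0].address

resolver = dns.resolver.Resolver(configure=False)  # no /etc/resolv.conf, no cache in between
resolver.nameservers = [ns_ip]
try:
    print([rr.address for rr in resolver.resolve(record, "A")])
except dns.resolver.NXDOMAIN:
    print("record not visible on this authoritative server yet")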
¹For the purposes of this answer, I'm glossing over the fact that what is commonly referred to as "propagation delay" in DNS is nothing of the sort -- it's actually a TTL-based cache-eviction delay for existing records.
I'm wondering how modern DNS servers deal with millions of queries per second, given that the txnid field is a uint16?
Let me explain. There is an intermediate server: on one side, clients send it DNS requests, and on the other side the server itself sends requests to an upstream DNS server (8.8.8.8, for example). According to the DNS protocol there is a txnid field in the DNS header, which should remain unchanged between request and response. Obviously, an intermediate DNS server with multiple clients replaces this value with its own txnid value (which is a counter), sends the request to the external DNS server, and after resolving replaces it with the client's original value. All of this works fine for up to 65,535 simultaneous requests because of the uint16 field type. But what if we have hundreds of millions of them, like Google's DNS servers?
Going from your Google DNS server example:
In mid-2018 their servers were handling 1.2 trillion queries per day; extrapolating that growth suggests their service is currently handling ~20 million queries per second
They say that successful resolution of a cache-miss takes ~130ms, but taking timeouts into account pushes the average time up to ~400ms
I can't find any numbers on what their cache-hit rates are like, but I'd assume it's more than 90%. And presumably it increases with the popularity of their service
Putting the above together (2e7 * 0.4 * (1-0.9)) we get ~1M transactions active at any one time, so you have to find at least 20 bits of state somewhere. 16 bits come for free because of the txnid field. As Steffen points out, you can also use source port numbers, which might give you another ~15 bits of state. Just these two sources give you more than enough state to run something orders of magnitude bigger than Google's DNS system.
That said, you could also just relegate transaction IDs to preventing cache-poisoning attacks, i.e. reject any answer whose txnid doesn't match the in-flight query for that question. If this check passes, then add the answer to the cache and resume any waiting clients.
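For illustration only (my own sketch, not from any particular resolver), the bookkeeping for that check might look like:

import os
import struct

# In-flight queries keyed by (txnid, source port, upstream server, question), so a
# response is accepted only when every element of the tuple matches.
inflight = {}

def register_query(qname, qtype, server, source_port):
    txnid = struct.unpack("!H", os.urandom(2))[0]  # random 16-bit transaction ID
    inflight[(txnid, source_port, server, qname, qtype)] = []  # clients waiting on this answer
    return txnid

def accept_response(txnid, source_port, server, qname, qtype):
    key = (txnid, source_port, server, qname, qtype)
    waiters = inflight.pop(key, None)
    if waiters is None:
        return False  # mismatch: drop the packet (possible spoofing/poisoning attempt)
    # here: add the answer to the cache and resume the waiting clients
    return True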
I am hosting a RESTful API and my problem is that every first inbound request after a certain time will take about three seconds, compared to the normal ~100ms.
What I find most interesting is that it always takes between roughly 3100 and 3250 milliseconds, not more and not less, so it seems pretty intentional to me.
I've already debugged the API and everything runs pretty much instantly except for one thing, and that is this three-second delay before my API even starts to receive the request.
My best guess is that something went wrong either in Apache or the DNS resolution but I don't know what exactly causes it (that's why I'm asking this question).
I am using the Apache ProxyPass like this:
ProxyRequests off
Timeout 54
ProxyTimeout 5400
ProxyPass /jokeapi http://localhost:8079
ProxyPassReverse /jokeapi http://localhost:8079
I'm using the Cloudflare/APNIC DNS gateway servers 1.1.1.1 and 0.0.0.0
Additionally, all my requests get routed through a Cloudflare SSL proxy before even reaching my network.
I've even partially rewritten the API so it responds with ReadStreams instead of loading the files into RAM and serving them all at once, but that didn't fix the problem.
My question is how I can fully debug the route a request takes and see precisely where this 3 second delay comes from.
Thanks!
PS: the server runs on NodeJS
I think the key is not network activity, but your note that after a period of idle time the first response from the API takes slightly over 3 seconds. I am assuming that follow-up requests are back in the 100ms window.
As you are using localhost, this is not a routing issue. If you want, you can just as easily use loopback, 127.0.0.1, to avoid a name resolution hit, but such a hit on a reserved hostname would be microseconds.
I suspect that the compiled version of your RESTful function has aged out of the cache for your system. The first hit after a period of non-use then requires a recompile, and so long as the compiled instructions are exercised for a period of time they will remain in cache and continue to respond in the 100ms range. We observe this condition quite often in multiuser performance testing after cold boots of systems (setting initial conditions). Ramp-ups of the test users take the hit for the recompiles of common code before hitting the time under full load.
One more item to rule out the network side of the house: DNS timeouts and BIND cache entries tend to be quite long, usually significant portions of a day or even longer. Even so, the odds are that a DNS lookup for an item which has aged out of the BIND cache would not add three seconds to your initial connection time.
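To address the "how do I debug where the delay comes from" part of the question: you can time the DNS, connect, and response phases separately from a client. A rough sketch in Python (hostname and path are placeholders for your own):

import socket
import time
import http.client

host = "api.example.com"  # placeholder hostname, substitute your own
path = "/jokeapi"         # the proxied path from the question

t0 = time.perf_counter()
socket.getaddrinfo(host, 443)          # DNS resolution only
t1 = time.perf_counter()

conn = http.client.HTTPSConnection(host, timeout=10)
conn.connect()                         # TCP + TLS handshake (repeats the lookup, usually cached)
t2 = time.perf_counter()

conn.request("GET", path)
conn.getresponse().read()              # time spent waiting on Cloudflare/Apache/Node
t3 = time.perf_counter()

print(f"dns      : {t1 - t0:.3f}s")
print(f"connect  : {t2 - t1:.3f}s")
print(f"response : {t3 - t2:.3f}s")

If the first number is near three seconds only on the first call after an idle period, the resolver path is the culprit; if the last one is, the delay is on the server side.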
I have some questions to better understand the DNS mechanism:
1) I know that between clients and the authoritative DNS server there are some intermediate DNS servers, like the ISP's. What and where are the other types?
2) After the TTL of an NS record expires in intermediate DNS servers, when do they refresh the addresses of names? On a client's request, or right away after expiration?
Thanks.
Your question is off-topic here, as it is not related to programming.
But:
I know that between clients and the authoritative DNS server there are some intermediate DNS servers, like the ISP's. What and where are the other types?
There are only two types of DNS servers (we will put aside the stub case for now): either an authoritative nameserver (holding information about some domains and being the source of truth for them) or a recursive one, attached to a cache, which basically starts with no data and then progressively, based on the queries it receives, issues its own queries to gather information.
Technically, a single server could do both, but it is a bad idea, for at least two reasons: the cache, and the different populations of clients. An authoritative nameserver is normally open to any client, as it needs to "broadcast" its data everywhere, while a recursive nameserver is normally only for a selected list of clients (like the ISP's own customers).
There exist open public recursive nameservers today, run by big organizations: Cloudflare, Google, Quad9, etc. However, they have the hardware, links, and manpower to handle all the issues that come with public recursive nameservers, like DDoS with amplification.
Technically you can have a farm of recursive nameservers, as big ISPs (or the big public ones above) need to do, because no single instance could sustain all client queries. They can either share a single cache or work in a hierarchy, the bottom ones forwarding their queries to another upstream recursive nameserver, and so on.
After the TTL of an NS record expires in intermediate DNS servers, when do they refresh the addresses of names? On a client's request, or right away after expiration?
The historic, naïve way could be summarized as: a request arrives; do I have it in my cache? If not, query outside for it and cache the result. If yes, has it expired in my cache? If not, ship it to the client; if it has expired, remove it from the cache and then proceed as if it had never been cached in the first place.
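A minimal sketch of that naive lookup logic, with query_upstream standing in for "go ask the outside world":

import time

cache = {}  # (qname, qtype) -> (expires_at, answer)

def lookup(qname, qtype, query_upstream):
    # query_upstream is a placeholder callable that performs the outside
    # resolution and returns (answer, ttl).
    key = (qname, qtype)
    entry = cache.get(key)
    if entry is not None:
        expires_at, answer = entry
        if time.time() < expires_at:
            return answer            # still fresh: serve from cache
        del cache[key]               # expired: behave as if it had never been cached
    answer, ttl = query_upstream(qname, qtype)
    cache[key] = (time.time() + ttl, answer)
    return answer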
You then have various variations:
some caches do not honor TTLs exactly: some clamp values that are too low or too high, based on their own local policies. The most widely agreed reading of the specification is that the TTL is an indication of the maximum amount of time to keep the record in cache, which means the client is free to discard it earlier. However, it should not rewrite it to a higher value if it thinks it is too low.
caches can be kept across reboots/restarts, and can be prefetched, especially for "popular" records; in a way, the list of root NS servers is prefetched at boot and compared to the internal hardcoded list, in order to update it
caches, especially in RAM, may need to be trimmed, typically on an "oldest removed first" basis, in order to make room for new records coming along the way.
so, depending on how the cache is managed and which features it is expected to have, there may be a background task that monitors expirations and refreshes records.
I recommend you have a look at Unbound as a recursive nameserver, as it has various settings around TTL handling, so you could learn things from it, and then read the code itself (which brings us back on-topic, kind of).
You can also read this document: https://www.ietf.org/archive/id/draft-wkumari-dnsop-hammer-03.txt, an IETF Internet-Draft about:
The principle is that popular RRset in the cache are fetched, that is to say resolved before their TTL expires and flushed. By fetching RRset before they are being queried by an end user, that is to say prefetched, HAMMER is expected to improve the quality of experience of the end users as well as to optimize the resources involved in large DNSSEC resolving platforms.
Make sure to read Appendix A with a lot of useful examples, such as:
Unbound already does this (they use a percentage of TTL, instead of a number of seconds).
OpenDNS also implements something similar.
BIND as of 9.10, around Feb 2014, now implements something like this (https://deepthought.isc.org/article/AA-01122/0/Early-refresh-of-cache-records-cache-prefetch-in-BIND-9.10.html), and enables it by default.
A number of recursive resolvers implement techniques similar to the techniques described in this document. This section documents some of these and tradeoffs they make in picking their techniques.
And to take one example, the BIND one, you can read:
BIND 9.10 prefetch works as follows. There are two numbers that control it. The first number is the "eligibility". Only records that arrive with TTL values bigger than the configured eligibility will be considered for prefetch. The second number is the "trigger". If a query arrives asking for data that is cached with fewer than "trigger" seconds left before it expires, then in addition to returning that data as the reply to the query, BIND will also ask the authoritative server for a fresh copy. The intention is that the fresh copy would arrive before the existing copy expires, which ensures a uniform response time.
BIND 9.10 prefetch values are global options. You cannot ask for different prefetch behavior in different domains. Prefetch is enabled by default. To turn it off, specify a trigger value of 0. The following command specifies a trigger value of 2 seconds and an eligibility value of 9 seconds, which are the defaults.
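The statement that excerpt refers to (missing above) is, if I recall the BIND syntax correctly, prefetch 2 9; in the options block of named.conf, trigger first and eligibility second. The logic itself can be sketched like this (my own illustration, not ISC code; refresh_async stands in for a background re-query of the authoritative server):

import time

TRIGGER = 2      # seconds of remaining TTL that trigger an early refresh (BIND default)
ELIGIBILITY = 9  # only records that arrived with a TTL above this are considered (BIND default)

def serve_with_prefetch(key, cache, refresh_async):
    # cache maps key -> (expires_at, original_ttl, answer); refresh_async is a placeholder.
    expires_at, original_ttl, answer = cache[key]
    remaining = expires_at - time.time()
    if original_ttl > ELIGIBILITY and remaining < TRIGGER:
        refresh_async(key)   # fresh copy should arrive before the cached one expires
    return answer            # the still-valid cached answer is returned immediately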
As simple as that. I went through quite a lot of articles on the internet and all of them just go on about how updated/modified DNS records take time to propagate and so on. I may be stupid (most likely I am), but the whole situation is not very clear. Especially the following:
Do new (absolutely new) records propagate?
Example: we have an old domain, with propagated nameservers, IP, etc., and add a TXT record to it. No TXT records existed previously. Is it applied immediately, after some time, or after the TTL?
Is there any influence on this from local DNS, cache, ISP or anything else?
Thank you.
There are at least two things being mixed under the term "propagation" here.
One is various caches of local resolvers and recursing name servers remembering information for a set amount of time before they go out and ask an authoritative server again. This has no relevance to your question, but it is what many of those articles you read were talking about.
The other is data moving from a master name server to its secondary name servers. This is relevant to your question. A master name server is where data gets injected into DNS from outside, so that's where your new records begin their lives. Secondary servers check with the master server for new data when they think enough time has passed or when they get prodded to do so (usually, the master server is set to prod them when its information is updated). The way they tell if they need to re-fetch a zone from the master or not is by comparing the serial number in the zone's SOA record between what they have stored locally and what the server has. If the number at the master is higher, the secondary will fetch the whole zone again (usually, other options exist). If the number at the master is not higher, the secondary will assume the information it has is up to date, and do nothing.
The most common reason, by far, for new records not propagating to secondaries is that whoever added the new records forgot to increase the serial number in the SOA record.
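One way to check for exactly that is to ask each listed name server for the zone's SOA and compare serial numbers. A sketch with dnspython (the zone name is a placeholder):

import dns.resolver

zone = "example.com"  # placeholder zone name

for ns in dns.resolver.resolve(zone, "NS"):
    ns_ip = dns.resolver.resolve(ns.target.to_text(), "A")[0].address
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [ns_ip]
    soa = r.resolve(zone, "SOA")[0]
    print(ns.target, "serial", soa.serial)  # a secondary lagging behind shows an older serial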
Does a caching-nameserver usually cache the negative DNS response SERVFAIL?
EDIT:
To clarify the question: I can see the caching nameserver caching negative responses (NXDOMAIN, NODATA), but it does not do this for SERVFAIL responses. Is this intentional?
SERVFAIL is covered by §7.1 of RFC2308:
Server failures fall into two major classes. The first is where a server can determine that it has been misconfigured for a zone. This may be where it has been listed as a server, but not configured to be a server for the zone, or where it has been configured to be a server for the zone, but cannot obtain the zone data for some reason. This can occur either because the zone file does not exist or contains errors, or because another server from which the zone should have been available either did not respond or was unable or unwilling to supply the zone.
The second class is where the server needs to obtain an answer from elsewhere, but is unable to do so, due to network failures, other servers that don't reply, or return server failure errors, or similar.
In either case a resolver MAY cache a server failure response. If it does so it MUST NOT cache it for longer than five (5) minutes, and it MUST be cached against the specific query tuple <query name, type, class, server IP address>.
So basically, it's dependent on the implementation of your name server.
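For implementations that do cache it, the bookkeeping the RFC describes might look roughly like this (my own sketch, not taken from any particular resolver):

import time

SERVFAIL_CAP = 300  # RFC 2308: MUST NOT cache a server failure for longer than 5 minutes

servfail_cache = {}  # (qname, qtype, qclass, server_ip) -> do-not-retry-before timestamp

def note_servfail(qname, qtype, qclass, server_ip, hold=SERVFAIL_CAP):
    # Cache the failure against the specific query tuple, capped at 5 minutes.
    servfail_cache[(qname, qtype, qclass, server_ip)] = time.time() + min(hold, SERVFAIL_CAP)

def recently_failed(qname, qtype, qclass, server_ip):
    retry_after = servfail_cache.get((qname, qtype, qclass, server_ip))
    return retry_after is not None and time.time() < retry_after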
RFC 1034 described how to cache negative responses, but did not define a mechanism for returning those cached results to peer resolvers. RFC 2308 defines that mechanism.
Negative caching was an optional part of the DNS Specifications...
One of the timeout fields in the SOA is a "negative timeout". It is usually set to a short time, such as 30 or 60 seconds. So, yes, but for a shorter time than a "positive" response.