I'm trying to write a script where part of its functionality depends on whether a user-provided domain is a zone apex (example.com) or not (www.example.com). What is a robust way to determine whether I'm dealing with a zone apex or a subdomain? I'm hoping for a purely pattern-based approach, but that seems tricky (example: xx.yy.co is not a root domain, but xx.co.uk is).
Are there any tried and true approaches to determine if a zone is a root domain or not?
The Public Suffix List lists the top-level and second-level domains under which one can register a domain name. If a name has exactly one more level beyond its matching entry on this list, then it's what you are looking for.
(Note that "subdomains" as you call them can be DNS zones in their own right and have independent nameservers from the parent zone. These can generally be detected by the presence of an SOA record for that fully qualified name, and nameservers for that name in the parent zone.)
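For a quick check of that rule, here is a minimal sketch using the third-party tldextract package, which ships with a copy of the Public Suffix List (the package choice is an assumption; any PSL-backed library works the same way, and by default tldextract may fetch a fresh copy of the list on first use):

import tldextract  # third-party; bundles/fetches the Public Suffix List

def is_registrable_apex(name: str) -> bool:
    """True if `name` is exactly one label below its public suffix
    (example.com, xx.co.uk), False for anything deeper (www.example.com)."""
    ext = tldextract.extract(name)
    return bool(ext.domain) and not ext.subdomain

print(is_registrable_apex("xx.co.uk"))         # True  (co.uk is the public suffix)
print(is_registrable_apex("xx.yy.co"))         # False (yy.co is the registrable name)
print(is_registrable_apex("www.example.com"))  # False

Note that this answers "is this a registrable, apex-level name?" in the registry sense; as the parenthetical above says, a deeper name can still be a DNS zone apex in its own right, which only a DNS query can tell you.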
Yes, I had to write a script that performed this recently.
Run a non-recursive query (i.e. dig +norecurse) against the authoritative nameserver for the entity you're examining. Use a query type of SOA. Do not use a recursive server; the behavior becomes much less predictable. If it's a server that mixes authoritative and recursive answers, make sure you're checking for the AA (authoritative answer) flag on the reply.
If the response code is NOERROR, examine the leftmost component of the returned ANSWER section (if present). Otherwise, check the AUTHORITY section. One of the two will be present. The upshot of preferring the ANSWER is that it ensures your result is an SOA record instead of an NS record. It keeps the type of your result consistent, which can be useful if you're writing something against a resolver library.
If the response code is NXDOMAIN, examine the leftmost component of the returned AUTHORITY section. Obviously this won't be the apex, but this will tell you what the apex is.
Anything other than those response codes indicates that the server does not consider itself authoritative.
The result will be the apex. Your request is not the apex if your result is less specific, and it is the apex if they're identical.
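A rough sketch of that procedure with dnspython (the library choice and the hard-coded server address are assumptions; error handling is minimal):

import dns.flags
import dns.message
import dns.query
import dns.rcode
import dns.rdatatype

def apex_for(name: str, auth_server: str) -> str:
    """Ask an authoritative server, non-recursively, what the zone apex
    for `name` is, following the NOERROR/NXDOMAIN logic described above."""
    q = dns.message.make_query(name, dns.rdatatype.SOA)
    q.flags &= ~dns.flags.RD                      # like dig +norecurse
    resp = dns.query.udp(q, auth_server, timeout=5)

    if not (resp.flags & dns.flags.AA):
        raise RuntimeError("response was not authoritative (no AA flag)")

    rcode = resp.rcode()
    if rcode == dns.rcode.NOERROR:
        section = resp.answer or resp.authority   # prefer ANSWER (SOA), else AUTHORITY
    elif rcode == dns.rcode.NXDOMAIN:
        section = resp.authority                  # SOA of the enclosing zone
    else:
        raise RuntimeError("server not authoritative here: " + dns.rcode.to_text(rcode))

    return section[0].name.to_text()              # owner name of the RRset = the apex

# hypothetical usage; the IP is assumed to be one of example.com's nameservers
print(apex_for("www.example.com.", "199.43.135.53"))   # expected: "example.com."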
www.example.com (assuming it isn't a zone of its own, e.g., there are no foo.www.example.com entries) will not have a DNS SOA RR. However, example.com may have other subdomains, e.g., xyz.example.com, which contains foo.xyz.example.com and bar.xyz.example.com, so I don't know if this helps you.
Walk the name backwards component by component checking for NS records.
Example: www.example.com
Does www.example.com. have a NS record? No.
Does example.com. have a NS record? Yes.
Does com. have a NS record? Yes.
Make your determination accordingly, based on whatever definition you use for "is a zone apex" (it's not 100% clear to me from your question).
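Here is a minimal sketch of that walk with dnspython (the library choice is an assumption); it returns the most specific name that has its own NS RRset, i.e. the nearest enclosing zone cut:

import dns.resolver

def enclosing_zone(name: str):
    """Walk the name from most to least specific and return the first
    label boundary that has NS records (the zone containing `name`)."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        try:
            dns.resolver.resolve(candidate, "NS")
            return candidate                       # first hit walking downward
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            continue                               # no NS here; strip a label and retry
    return None

print(enclosing_zone("www.example.com"))           # expected: "example.com."

Whether "has its own NS RRset" matches your definition of "is a zone apex" is the determination left to you; a name that equals its enclosing zone is an apex under the usual definition.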
I am writing an application that queries DNS SRV records to find an internal service for the domain obtained from an email address. Is it correct to do the following?
Let's say the email domain is test.example.com.
Query SRV record _service._tcp.test.example.com
No SRV record is returned
Now query SRV record _service._tcp.example.com
An SRV record is returned; use that record to connect.
Is the above approach right? Assuming it's not, are there any RFCs or standards that prevent an application from doing this?
Is the above approach right?
No, it is not. You should not "climb" to the root.
There is nothing in the RFCs explicitly telling you not to do that, and you will even find some specifications telling you to climb to the root; see the CAA specifications (but they had to be changed over the years precisely because of ambiguity around the part about climbing to the root).
Most of the time, such climbing creates more problems than solutions, and it all comes down to "finding the administrative boundaries", which looks far simpler than it really is.
If we go back to your example: you say to use _service._tcp.test.example.com and then _service._tcp.example.com, and then I suppose you stop there, because you "obviously" know that you shouldn't go to _service._tcp.com as the next step; you "know" that example.com and com are not under the same administrative boundary, so you shouldn't cross that limit.
OK, yes, in that specific example (and TLD) things seem simple. But imagine an arbitrary name, say www.admin.santé.gouv.fr: how do you know where to stop climbing?
It is a difficult problem in full generality. Attempts were made to solve it (see the IETF DBOUND working group) and failed. If you need to pursue this, you basically have only two avenues: either find delegations (zone cuts) with DNS calls (not every delegation is a new administrative boundary, but a change of administration should mean a delegation; and there is not necessarily a delegation at each dot, so you cannot find this by looking at the string alone, you need to do live DNS queries), or use the Mozilla Public Suffix List, which has a lot of drawbacks.
This is all basically a rehash of what you can read in "§4. Zone Boundaries are Invisible to Applications" of RFC5507, quoting the core part here:
The false assumption has lead to an approach called "tree climbing",
where a query that does not receive a positive response (either the
requested RRSet was missing or the name did not exist) is retried by
repeatedly stripping off the leftmost label (climbing towards the
root) until the root domain is reached. Sometimes these proposals
try to avoid the query for the root or the TLD level, but still this
approach has severe drawbacks:
[..]
o For reasons similar to those outlined in RFC 1535 [RFC1535],
querying for information in a domain outside the control of the
intended entity may lead to incorrect results and may also put
security at risk. Finding the exact policy boundary is impossible
without an explicit marker, which does not exist at present. At
best, software can detect zone boundaries (e.g., by looking for
SOA Resource Records), but some TLD registries register names
starting at the second level (e.g., CO.UK), and there are various
other "registry" types at second, third, or other level domains
that cannot be identified as such without policy knowledge
external to the DNS.
Note also the example given for MX: a naive view would apply the same climbing algorithm there, but as the RFC says:
To restate, the zone boundary is purely a boundary that exists in the
DNS for administrative purposes, and applications should be careful
not to draw unwarranted conclusions from zone boundaries. A
different way of stating this is that the DNS does not support
inheritance, e.g., an MX RRSet for a TLD will not be valid for any
subdomain of that particular TLD.
There are various examples of people having tried to climb to the root... and creating a lot of problems:
in the past, Microsoft and wpad.dat: https://news.softpedia.com/news/wpad-protocol-bug-puts-windows-users-at-risk-504443.shtml
more recently, Microsoft again about email autodiscover: https://www.zdnet.com/article/design-flaw-in-microsoft-autodiscover-abused-to-leak-windows-domain-credentials/
So, in short, without a solid understanding of DNS, please do not create anything that "climbs" to the root. Do note that RFC 2782 about SRV gives "Usage Rules" with no case of climbing to the root.
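To make the "no climbing" advice concrete, here is a sketch with dnspython (a third-party library; its use here is an assumption) that queries the SRV record only at the domain taken from the email address and stops there. The _service label and test.example.com are the question's placeholders, and the priority/weight handling is simplified compared to the full RFC 2782 selection rules:

import dns.resolver

domain = "test.example.com"                     # domain taken from the email address
qname = "_service._tcp." + domain               # "_service" is the question's placeholder

try:
    answers = dns.resolver.resolve(qname, "SRV")
    # Simplified pick: lowest priority, then highest weight (RFC 2782 actually says
    # to select randomly, proportionally to weight, within a priority group).
    best = sorted(answers, key=lambda r: (r.priority, -r.weight))[0]
    print("connect to %s:%d" % (best.target, best.port))
except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
    # No SRV record at this exact name: the service is simply not advertised.
    # Do NOT retry at _service._tcp.example.com or higher.
    print("no SRV record at %s; giving up rather than climbing" % qname)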
You have not fully explained why you are considering this. I suggest you have a look at the new HTTPS/SVCB DNS records (the RFCs are not published yet, but the RR type codepoints are already assigned by IANA and already in use by Apple, Cloudflare, and Google), as they may provide a feature set similar to SRV while being more relevant for your use case.
I am trying to extract all domain names out of the COM and NAME DNS zone files. Those zone files contain all DNS entries, and there seems to be a lack of information about the structure of zone files.
Do all registered domains have NS entries, even those which are not actively used? Which record or records should I use to extract domain names?
Zone files are very large, and sorting them would be a stupid idea. So if I can use one DNS record type to extract the domain names, it would be easier.
I found this Python script (I don't know Python) on GitHub which uses only NS entries. Is it logically correct?
Someone with experience please comment.
The format of the DNS zone file is defined in RFC 1035 (section 5) and RFC 1034 (section 3.6.1). You can find many details on Wikipedia: https://en.wikipedia.org/wiki/Zone_file
It contains only the published domain names, that is, those having at least one nameserver and not being under clientHold or serverHold status (see http://www.icann.org/epp#clientHold and http://www.icann.org/epp#serverHold), which in short means it is NOT all registered domain names.
The .COM zone file is indeed huge. In any case, you need to match on NS record lines and deduplicate the domain names. There are multiple strategies to do that, depending on your constraints.
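A minimal sketch of that matching in Python (the file name is an assumption, and real zone files can add wrinkles this ignores: $ORIGIN and relative owner names, records continued across lines with parentheses, optional TTL/class fields):

# Collect unique owner names of NS records from a master-format zone file.
seen = set()
with open("com.zone") as zone:                        # path is an assumption
    for line in zone:
        line = line.split(";", 1)[0].strip()          # strip comments and whitespace
        if not line or line.startswith("$"):          # skip blanks and directives
            continue
        fields = line.split()
        # Look for the record-type token between the owner name and the RDATA.
        if any(tok.upper() == "NS" for tok in fields[1:-1]):
            seen.add(fields[0].rstrip(".").lower())   # normalized owner name

for name in sorted(seen):
    print(name)

Deduplication matters because most domains list several NS records, one per nameserver, so each owner name appears more than once in the file.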
Note that many online providers already do this work for you and can provide the domain names directly, if that is all you are interested in. Some also provide differential content from one day to the next.
A legacy domain has two SPF records that are identical except for ~all vs -all.
A domain should have a single SPF record, correct?
Which record takes precedence?
Which one should I delete?
In short, keep the HardFail -all one only. Consider also looking at this nice record syntax guide from openspf.org.
Multiple records, RFC section 3.2:
A domain name MUST NOT have multiple records that would cause an
authorization check to select more than one record.
This basically means that you shouldn't have more than one v=spf1 TXT record in your DNS. If your SPF record is too long, you can split it into multiple character-strings within that single TXT record; they will be concatenated.
Selecting records, RFC section 4.5:
If the resultant record set includes
more than one record, check_host() produces the "permerror" result.
PermError policy, RFC section G.3:
As with all results, implementers have a choice to make regarding
what to do with a message that yields this result.
This means that the mail will either be refused or accepted without any consistency between servers. This is not the intended result, and you shouldn't keep more than one.
Fail (HardFail) and SoftFail, RFC sections 2.6.4 and 2.6.5:
2.6.4 (-) Fail or HardFail
A "fail" result is an explicit statement that the client is not
authorized to use the domain in the given identity.
2.6.5 (~) SoftFail
A "softfail" result is a weak statement by the publishing ADMD that
the host is probably not authorized. It has not published a
stronger, more definitive policy that results in a "fail".
Unless you know what you're doing, you generally want to use HardFail to block unauthorized servers. SoftFail will pass in the majority of cases.
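If you want to check a domain for this condition before cleaning it up, here is a small sketch with dnspython (the library choice and the example domain are assumptions); it joins the character-strings of each TXT record, the way receivers do, and counts the v=spf1 records:

import dns.resolver

def spf_records(domain: str):
    """Return every TXT record at `domain` whose joined value starts with v=spf1."""
    try:
        answers = dns.resolver.resolve(domain, "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    joined = [b"".join(r.strings).decode("ascii", "replace") for r in answers]
    return [t for t in joined if t.lower().startswith("v=spf1")]

records = spf_records("example.com")              # domain is a placeholder
if len(records) > 1:
    print("more than one SPF record published; receivers will return permerror")
for r in records:
    print(r)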
Yes, you can only have one SPF record, though you can have two if one is a Type 99 (SPF) record and the other a Type 16 (TXT) record. Type 99 is now obsolete, but it's still OK to have.
But they both have to be the same; if they are not, most ESPs will fail the SPF check. If you have two SPF records of the same type, most ESPs will fail the SPF check; neither takes precedence.
Read about: SPF Records
~all = means treat the email as a "Soft Fail" if SPF Fails
-all = means treat the email as a "Hard Fail" if SPF Fails
Personally, I like -all; that's the whole point of SPF, which is to tell the receiving mail server that you didn't send the email, so it shouldn't be delivered. But you should use a mail tester to make sure your SPF works properly after deleting one. Keep in mind that if you ever send through a third-party mailer, you'll have to change your SPF to allow them to send on your behalf.
Is it possible to (and if so, how would I), given a domain name for a particular website, look up all other domain names that redirect to that same site? I'm thinking not, though if it were possible, I'd break the problem down into two parts:
1) Get the IP address that corresponds to the original domain name (there seem to be a lot of web services that do this - although they provide me with ~4 IP addresses for the one site, any idea what that's about?)
2) Do some kind of reverse DNS lookup on those IP addresses - this yields results of the form any-in-XXXX.1e100.net (where XXXX is a 4-digit number)
So, I'm guessing this doesn't work because of redirects and things, and any-in-XXXX.1e100.net is some sort of intermediate server in between me and the domain name I'm looking up? So the task I've described above should be impossible, then, right? Can someone who knows a bit more about how DNS works confirm (or refute) this and correct any wrong assumptions I've made? Thanks!
It will only work if sites set up their reverse DNS that way. Which, I can pretty much assure you they haven't for whatever site you're considering. However, here's an example of how to do it using bind's dig utility:
Get the original address:
# dig www.google.com a
...
www.google.com. 145 IN A 74.125.239.114
www.google.com. 145 IN A 74.125.239.115
www.google.com. 145 IN A 74.125.239.113
www.google.com. 145 IN A 74.125.239.116
www.google.com. 145 IN A 74.125.239.112
Now that we have the addresses, you can issue a reverse query for one of them and attempt to see how it's registered:
# dig -x 74.125.239.114
...
114.239.125.74.in-addr.arpa. 656 IN PTR nuq05s01-in-f18.1e100.net.
So in this case, you can see it was at least registered. But that name certainly doesn't match the forward name we looked up. So they added a reverse entry for their "service node", but not for the website's hostname (i.e., they didn't add a PTR record pointing back to www.google.com).
This will be so common that you'll be hard pressed to find a site where the reverse name actually matches, at least for the web. For mail servers, on the other hand, it's actually much more common. Though even they don't frequently match exactly (but at least there is almost always a PTR record in the first place).
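For completeness, the same forward-then-reverse lookup as the dig commands above, sketched with dnspython (the library choice is an assumption):

import dns.resolver
import dns.reversename

# Forward lookup: one name can map to several addresses (load balancing).
for a in dns.resolver.resolve("www.google.com", "A"):
    addr = a.address
    rev = dns.reversename.from_address(addr)       # e.g. 114.239.125.74.in-addr.arpa.
    try:
        ptrs = dns.resolver.resolve(rev, "PTR")
        names = ", ".join(str(p.target) for p in ptrs)
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        names = "(no PTR record published)"
    print(addr, "->", names)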
I need a script to find out the lowest available domain name with a given TLD (say .com, .info, or .net).
For example, 1000423.com is free but 1000.com is taken.
Probably my spammiest question so far.
NOTE
I mean "lowest" domain name numerically (i.e. 1.com, 2.com, 3.com, ..., n.com, n+1.com, ...) and not shortest as in String.length.
In your web-capable language of choice:
1. Ask the user for a top-level domain name.
2. i <- 0.
3. Send out an HTTP GET to a registrar to see if "i.(tld)" is taken.
4. If it's not taken, notify the user and quit.
5. i <- i + 1.
6. Go to step 3.
You may need to add a loop delay to avoid the registrar thinking that you're mounting a DoS attack.
Unfortunately, as far as I know, there's no central repository saying whether a certain domain name is or isn't for sale. You'll have to look up each domain name with one of the major domain name vendors (GoDaddy, etc.) and see whether it's taken.
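Here is a rough sketch of that loop using a DNS query instead of a registrar HTTP API (a swapped-in shortcut, not what the answer literally describes). Be aware that NXDOMAIN only means the name is not published in DNS; as noted in the zone-file answer above, a domain can be registered without being published, so treat a hit as a candidate and confirm it with whois/RDAP or a registrar before relying on it. The dnspython dependency is an assumption.

import time
import dns.resolver

def first_unpublished(tld: str, limit: int = 100000):
    """Walk 0.tld, 1.tld, ... and return the first name that is NXDOMAIN.
    This is only a heuristic for availability; confirm with the registry."""
    for i in range(limit):
        name = "%d.%s" % (i, tld)
        try:
            dns.resolver.resolve(name, "SOA")
        except dns.resolver.NXDOMAIN:
            return name                            # candidate, pending a registry check
        except (dns.resolver.NoAnswer, dns.resolver.NoNameservers):
            pass                                   # name exists (or is broken); keep going
        time.sleep(0.1)                            # loop delay, per the note above
    return None

print(first_unpublished("com"))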