Why are major DNS systems designed to write the subdomain before the root domain and not the reverse? - dns

We write app.example.com instead of com.example.app in major DNS-based systems, including WWW, FTP, and email. What is the reason behind this design? Why not the reverse order?

Hostnames existed before the DNS and even before TLDs.
Structures to the names have been added through RFC 921 "Domain Name System Implementation Schedule - Revised" (October 1984)
This document explains the change from simple names (no dot) to hierarchical ones, and this was needed because the Internet was growing at that time and a single flat list of names was not enough to describe every hosts.
Some excerpts:
The names are being changed from simple names, or globally unique
strings, to structured names, where each component name is unique
only with respect to the superior component name.
...
The elements (or components) of the structured names are separated
with periods, and the elements are written from the most
specific on the left to the most general on the right.
For example: USC-ISIF.ARPA
RFC 882 "DOMAIN NAMES - CONCEPTS and FACILITIES" (November 1983) just says it is a convention:
The domain name of a node or leaf is the path from
the root of the tree to the node or leaf. By convention, the
labels that compose a domain name are read left to right, from the
most specific (lowest) to the least specific (highest).
A clue may come from RFC 1034 "DOMAIN NAMES - CONCEPTS AND FACILITIES" (November 1987) that repeats the above with some details:
The domain name of a node is the list of the labels on the path from
the node to the root of the tree. By convention, the labels that
compose a domain name are printed or read left to right, from the most
specific (lowest, farthest from the root) to the least specific
(highest, closest to the root).
RFCs (see RFC 1166) have the tradition to use the "MSB 0" bit numbering: it means that when you write down a byte, with 8 bits, you start with the most significant one, the bit with highest value (the encode encoding for the 128 decimal value).
This was then extended with the concent of network byte order, where the most significant one is firsts.
I guess that the idea of starting with the most specific label of the name comes directly from this idea of the most specific bit first, which means starting with the label farthest from the root and hence finally having the root at the top right side and then reading a full name in a kind of right to left pattern.

Related

Using parent domain to query DNS SRV for sub domain

I am writing an application to query the DNS SRV record to find out an internal service for a domain obtained from the email address. Is it correct to do the following.
Lets say the email domain is test.example.com
Query SRV record _service._tcp.test.example.com
No SRV record is returned
Now query SRV record _service._tcp.example.com
A record is returned. Hence use this record to connect
Is the above approach right? Assuming its not, are there any RFCs or standards that prevents an application from doing it?
Is the above approach right?
No, it is not. You should not "climb" to the root.
There is nothing explicitly telling you not to do that in RFCs and you will even find some specifications telling you to climb to the root, see CAA specifications (but they had to be changed over the year because of some unclarity exactly around the part about climbing to the root).
Most of the time, such climbing creates more problems than solution, and it all come from "finding the administrative boundaries" which looks far more simple than what it is really.
If we go back to you example, you say, use _service._tcp.test.example.com and then _service._tcp.example.com and then I suppose you stay there, because you "obviously" know that you shouldn't go to _service._tcp.com as next step, because you "know" that example.com and com are not under the same administrative boundaries, so you shouldn't cross that limit.
Ok, yes, in that specific example (and TLD) things seem simple. But imagine an arbitrary name, let us say www.admin.santé.gouv.fr, how do you know where to stop climbing?
It is a difficult problem in all generality. Attempts were made to solve it (see IETF DBOUND working group) and failed. You have only basically two venues if you need to pursue: either find delegations (zone cuts) by DNS calls (not all delegations are new administrative boundaries, but a change of administration should mean a delegation; and obviously there is not necessarily a delegation at each dot, so you can not find all of this by just looking at the string, you need to do live DNS queries) OR using Mozilla Public Suffix List, which has a lot of drawbacks.
This is all basically a rehash of what you can read in "§4. Zone Boundaries are Invisible to Applications" of RFC5507, quoting the core part here:
The false assumption has lead to an approach called "tree climbing",
where a query that does not receive a positive response (either the
requested RRSet was missing or the name did not exist) is retried by
repeatedly stripping off the leftmost label (climbing towards the
root) until the root domain is reached. Sometimes these proposals
try to avoid the query for the root or the TLD level, but still this
approach has severe drawbacks:
[..]
o For reasons similar to those outlined in RFC 1535 [RFC1535],
querying for information in a domain outside the control of the
intended entity may lead to incorrect results and may also put
security at risk. Finding the exact policy boundary is impossible
without an explicit marker, which does not exist at present. At
best, software can detect zone boundaries (e.g., by looking for
SOA Resource Records), but some TLD registries register names
starting at the second level (e.g., CO.UK), and there are various
other "registry" types at second, third, or other level domains
that cannot be identified as such without policy knowledge
external to the DNS.
Note indeed also the example given for MX because a naive view you apply the same algorithm there, but as the RFC says:
To restate, the zone boundary is purely a boundary that exists in the
DNS for administrative purposes, and applications should be careful
not to draw unwarranted conclusions from zone boundaries. A
different way of stating this is that the DNS does not support
inheritance, e.g., an MX RRSet for a TLD will not be valid for any
subdomain of that particular TLD.
There are various examples of people having tried to climb to the root... and creating a lot of problems:
in the past, Microsoft and wpad.dat: https://news.softpedia.com/news/wpad-protocol-bug-puts-windows-users-at-risk-504443.shtml
more recently, Microsoft again about email autodiscover: https://www.zdnet.com/article/design-flaw-in-microsoft-autodiscover-abused-to-leak-windows-domain-credentials/
So, in short, without a solid understanding of DNS, please do not create anything "climbing" to the root. Do note that RFC2782 about SRV gives "Usage Rules" without a case of climbing to the root.
You are not explaining fully why you are thinking about this. I suggest you have a look at the newest HTTPS/SVCB DNS records (RFCs not published yet, but RR type codepoint assigned by IANA already, and in use by Apple, Cloudflare and Google already), as they may provide similar features set as SRV but may be more relevant for your use case.

How is a monotree organized with git?

I've recently came across an article by Greg Kroah-Hartman on why the Linux Kernel has not a stable API and how the Kernel repository is organized as a monotree. When I discussed the article with a friend it became clear that we had a different understanding of what the term tree applied to:
tree refers to different sub-folders of a project.
It refers to the different forks of the git master branch.
In the first case contributors would not checkout the complete project, e.g. the Linux Kernel, but only a sub-folder. These could then be combined with e.g. git-subtree.
In the second case contributors would have to checkout the complete project and basically create fork of a monorepo.
So what does tree in monotree refer to and how can a project be organized as a monotree with git?
Let's make a few notes here:
The phrase monotree, or even the partial word mono, never appears in the referenced article.
The article has seven occurrences of the word tree.
In six of these seven occurrences, the entire phrase here is the main kernel tree. The one reference that does not use this full phrase just says the tree but clearly has the same intent as the other six.
You have tagged this with git linux monorepo (in case the tags change).
Your question amounts to either: What does the author mean by the phrase "the main kernel tree"? or What do people in general mean when they refer to a tree? These are valid questions but not particularly relevant to Git.
Tree in computer science tends to refer to the data structure, which is also pretty loosely defined; see the wikipedia entry. We have some collection of nodes and edges—mathematically, a graph G defined by its set of vertices V and edges E, where each vertex connects by edges to other vertices—and there are constraints on the graph so that it is minimally connected, or equivalently, maximally acyclic. (See https://en.wikiversity.org/wiki/Introduction_to_graph_theory/Proof_of_Theorem_4 and the answers to What's the difference between the data structure Tree and Graph?)
A tree object in Git specifically refers to the stored Git object of Git-type "tree" (one of four Git object types that are stored in the repository database—the other three are commit, blob, and annotated tag). Such an object stores <mode, name, hash-ID> triples, where the mode and hash-ID identify additional Git objects to associate with the name, which is an arbitrary1 string of bytes excluding NUL and slash (codes 0 and 0x2f or 47 respectively). A commit object stored in Git includes the hash ID of a single tree object. Reading the tree object and locating the sub-objects it lists, then recursively reading their own sub-objects if those objects are trees, results in constructing the minimally-connected graph that is a CS-style tree.
1There's a length limit due to the cache entry ce_namelen field, which has a 32-bit integer type. So no name component can exceed about 4 GB in length. Practically speaking, none should probably exceed 255 bytes, but tree objects in Git don't enforce any particular limit, as far as I know.
A file system tree in Linux is really just a string identifying an entity within the file system, though naming anything other than a directory results in a degenerate tree with just one node in it. By naming a directory, though, you can imply that anyone interpreting this string should read the directory's contents, which are names that (by being concatenated with the string identifying the directory itself) name another Linux file system tree, possibly a degenerate one with a single file or device node or whatever. This kind of recursive enumeration leads to building up a minimally-connected graph, just as with the Git tree object. (Perhaps unsurprisingly, the Linux directory objects have essentially the same constraints on names as the Git tree objects, though they usually have a much smaller maximum component name length, typically 255 bytes or fewer.)
Finally, the way the phrase the main kernel tree is used in the article refers to the Linux kernel repository—Linus Torvald's Git repository for the Linux kernel—and the entire ecosystem around it. There is a lot of room for arguments about the details. Here, I will just include a link to this particular InfoWorld article, which seems like a reasonable summary of the state of affairs as of the time it was written (August 2016).

How many number of subdomains can a domain have?

I am planning to build a service based on subdomain like wordpress or tumblr.
I want to know what is the maximum number of subdomains a domain can have.
Well. The absolute theoretical maximum is 2^504, but that assumes no limitations on the octets making up the names. If you want the names limited to ASCII letters and digits, the answer is 111444219848545291112918149658401217019177846881717006276548100629318214534968256903948922840416256 (that is, 36^63).
In another sense the answer is "Far, far more than you will ever need".
This would be more dependant on the DNS server than on the standard. BIND allows for a maximum of 16'777'216 objects per zone file while Microsoft DNS is reported to be stable up to 20'000 objects per zone. This does not mean however that you will be able to max out the DNS object limits as your average website owner is going to want to have a meaningful object name for their sites sub domain, additionally the maximum character count for fully qualified domain names is 255 characters with no individual segment (between dots) being longer than 63 characters as per the DNS specifications.
Effectively what this means is that while there are restrictions and limitations the practical answer is that you are unlikely to encounter limitations due to DNS specifications in any reasonable timeframe.

DBPedia: What's the meaning of '__1' (double underscores) in URIs?

On DBPedia you can find a lot of URIs that containing double underscores and a number at the end, eg.:
http://dbpedia.org/resource/Eric_Cheney__1
http://dbpedia.org/resource/Eli_Wallach__1
http://dbpedia.org/resource/Ed_Wood__1
http://dbpedia.org/resource/Francis_Ford_Coppola__1
Mostly these items are of the type PersonFunction, but I can't find any documentation on why these objects exist (and a person's function isn't an ObjectProperty?)...
So why are these created?
After reading this DBPedia discussion on blank nodes storing, it seems the purpose is to avoid clashing with WikiPedia's URIs.
This, presumably, is used for nodes that don't have a corresponding article on WikiPedia, but rather points to a close article on the subject. As DBPedia tries to create a URI for everything, this URI is assembled according to specific rules (more on this can be found at the above linked discussion).
From the discussion:
Note that intermediate node URIs always contain double underscores,
e.g. 1. Wikipedia doesn't allow consecutive underscores in page
titles, so we can be sure that these URIs will not clash with DBpedia
URIs for Wikipedia pages. We pick a name from the arguments of the
template from which the intermediate node is extracted, append two
underscores, that name and a number to the URI of the main page and
use that as the URI for the intermediate node. If there are multiple
intermediate nodes on one page for which we pick the same name, we use
different numbers, e.g. see 1 and [2].

Is it possible to have one (single) character top level domain name?

I'm writing a Regex to validate email. The only one thing confuse me is:
Is it possible to have single character for top level domain name? (e.g.: lockevn.c)
Background: I knew top level domain name can be from 2 characters to anything (.uk, .us to .canon, .museum). I read some documents but I can't figure out does it allow 1 character or not.
It is technically possible, however, there are no single character tlds that have been accepted into the root (as of the moment) so the answer is:
Yes, it is possible to have single character for top level domain name, however, there are currently no single character TLDs in the root.
You can see the list of TLDs that are currently in the root at this URL:
http://data.iana.org/TLD/tlds-alpha-by-domain.txt
RFC-952 shows what a "name" is, this includes what is valid as a top level domain:
A "name" (Net, Host, Gateway, or Domain name) is a text string up
to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus
sign (-), and period (.).
Additionally, the grammar from RFC-952 shows:
<name> ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
RFC-1123 section 2.1 specifically allowed single letter domains & subdomains, changing the initial grammar of RFC-952 from starting with just a letter to being more relaxed, so now you are allowed to have single letter top level domains that are a number:
2.1 Host Names and Numbers
The syntax of a legal Internet host name was specified in RFC-952.
One aspect of host name syntax is hereby changed: the
restriction on the first character is relaxed to allow either a
letter or a digit. Host software MUST support this more liberal
syntax.
EDIT: As per #mr.spuratic's comment, RFC-3696 section 2 tightened the rules for top level domains, stating:
There is an additional rule that essentially requires
that top-level domain names not be all-numeric.
This means that:
a. is a valid top level domain
1. is not a valid top level domain
A very unscientific test of this shows that if I add "a" into my hosts file pointing to my local machine, going to http://a in my address bar does show my Apache welcome page.
I'm not sure about the internet standard, but in practice, no.
See,
http://www.norid.no/domenenavnbaser/domreg.html
and,
http://sqa.fyicenter.com/Online_Test_Tools/Domain_Name_Format_Validator.php
You should DEFINITELY allow 1-character domains since some registries allow them not by accident (and I speak of quite big registries like UK, Germany, Poland, Ireland too - so important contributors to the Internet community, not oney exotic small exceptions). Since I also plan using such domains, that definitely work also with all e-mail services I used, letters AND numbers, I really would give the hint to allow this, else your script might need later correction.
Also some of the biggest internet companies use such domains - one of the most famous examples is Twitters t.co for shortening. Other companies I know of who have such domains are Facebook, Google, PayPal, Deutsche Telekom. But the list is longer and also some bigger investors hold them as assets.
By the way as proof there is a website trading this kind of domains online if You search for "1 letter domain names" :)

Resources