January 23, 2022
When you typed
blog.bithole.dev into your browser, it sent a request to my server at 220.127.116.11, which replied with this blogpost. But how did it know where to send the request? The answer is the Domain Name System, or DNS. Let's see how it works.
Back in the days of ARPANET, there was no DNS. Instead, there was a single central registry: the HOSTS.TXT file maintained by the Network Information Center at SRI, which mapped every host name to its address and which hosts periodically downloaded. This worked fine for some time, but it quickly became apparent that as the nascent Internet grew, a system which scaled beyond a single node would be necessary. Thus, the present incarnation of the Domain Name System was born.
DNS is best described as a decentralized database of resource records. You can think of a resource record like a database row; there are numerous types of resource records which can associate a domain with everything from IP addresses to email delivery instructions, administrator contacts, and a litany of other properties.
Domain names are grouped into zones. For each zone, there is an authoritative nameserver responsible for answering queries about domains contained within that zone. Each zone belongs to a parent zone, culminating in the root zone (a special zone administered by ICANN).
This hierarchical structure is reflected in the structure of a domain name. Consider
blog.bithole.dev, for example. This domain, along with
bithole.dev and all its other subdomains, belongs to the same zone. All queries for those domains will ultimately be answered by DigitalOcean's DNS servers, which I have configured as the authoritative nameservers for my zone.
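To make the hierarchy concrete, here is a minimal Python sketch that lists the names a resolver could pass through on the way from the root down to a domain. The function name `ancestor_zones` is my own invention, and the sketch assumes every label is a potential zone boundary; in practice not every name is its own zone (blog.bithole.dev lives in the bithole.dev zone, for instance).

```python
def ancestor_zones(domain: str) -> list[str]:
    """List candidate zones for a domain, from the root down to the name itself."""
    labels = [label for label in domain.rstrip(".").split(".") if label]
    zones = ["."]  # the root zone sits at the top of the hierarchy
    # Walk the labels right to left, since the rightmost label is closest to the root.
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(ancestor_zones("blog.bithole.dev"))
# ['.', 'dev.', 'bithole.dev.', 'blog.bithole.dev.']
```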
That raises a question, though: how does a client figure out which nameserver is authoritative for a given domain? The answer is that nameservers help clients walk down the hierarchy of zones. If an authoritative nameserver receives a query for a domain that does not belong to its own zone but falls within one of its child zones, it responds with NS records naming the nameservers the client should query next. Essentially, the server is saying, "I can't answer your question, but if you ask this nameserver, it might be able to." This process is known as delegation.
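Delegation is easier to see in a toy model. The sketch below fakes three nameservers with plain dictionaries and follows referrals until one of them answers. Everything here is made up for illustration: the server names, the `query` and `resolve` helpers, and the address (192.0.2.0/24 is reserved for documentation). Real nameservers speak the DNS wire protocol rather than returning Python tuples.

```python
# Each fake nameserver holds the A records it is authoritative for
# and the child zones it delegates to other servers.
SERVERS = {
    "root-ns.example": {"records": {}, "delegations": {"com.": "com-ns.example"}},
    "com-ns.example": {"records": {}, "delegations": {"example.com.": "ns1.example.com"}},
    "ns1.example.com": {"records": {"docs.example.com.": "192.0.2.10"}, "delegations": {}},
}


def query(server: str, name: str):
    """Return ("answer", ip), ("referral", next_server), or ("nxdomain", None)."""
    data = SERVERS[server]
    if name in data["records"]:
        return ("answer", data["records"][name])
    # Find delegated child zones containing the name and prefer the most specific one.
    matches = [z for z in data["delegations"] if name == z or name.endswith("." + z)]
    if matches:
        return ("referral", data["delegations"][max(matches, key=len)])
    return ("nxdomain", None)


def resolve(name: str, start: str = "root-ns.example", max_hops: int = 10):
    """Follow referrals from the starting server until something other than a referral comes back."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = query(server, name)
        if kind != "referral":
            return value, path
        server = value
    raise RuntimeError("too many referrals")


print(resolve("docs.example.com."))
# ('192.0.2.10', ['root-ns.example', 'com-ns.example', 'ns1.example.com'])
```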
To get a better understanding of how DNS works, let's resolve a domain ourselves by querying the nameservers. We can use dig to make DNS requests from the command line.
We'll start from the root zone. There are 13 logical servers in the root zone, with domain names a.root-servers.net through m.root-servers.net. In reality, requests to the root nameservers are spread out among hundreds of physical servers using anycast routing, but this process is totally transparent, so we don't need to worry about it. Let's ask the J root server if it knows where docs.google.com is:
$ dig A docs.google.com @j.root-servers.net

; <<>> DiG 9.16.1-Ubuntu <<>> A docs.google.com @j.root-servers.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54547
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;docs.google.com. IN A

;; AUTHORITY SECTION:
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.

;; ADDITIONAL SECTION:
e.gtld-servers.net. 172800 IN A 18.104.22.168
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
b.gtld-servers.net. 172800 IN A 22.214.171.124
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
j.gtld-servers.net. 172800 IN A 126.96.36.199
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
m.gtld-servers.net. 172800 IN A 188.8.131.52
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
i.gtld-servers.net. 172800 IN A 184.108.40.206
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
f.gtld-servers.net. 172800 IN A 220.127.116.11
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
a.gtld-servers.net. 172800 IN A 18.104.22.168
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
g.gtld-servers.net. 172800 IN A 22.214.171.124
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
h.gtld-servers.net. 172800 IN A 126.96.36.199
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
l.gtld-servers.net. 172800 IN A 188.8.131.52
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
k.gtld-servers.net. 172800 IN A 184.108.40.206
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
c.gtld-servers.net. 172800 IN A 220.127.116.11
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
d.gtld-servers.net. 172800 IN A 18.104.22.168
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30

;; Query time: 4 msec
;; SERVER: 2001:503:c27::2:30#53(2001:503:c27::2:30)
;; WHEN: Wed Feb 02 12:12:41 PST 2022
;; MSG SIZE rcvd: 840
Wow. That's a lot to take in. Let's break down what just happened.
First, the command basically tells
dig to contact
j.root-servers.net via the DNS protocol and ask if it has an A record for the domain
docs.google.com. An A record associates a domain with an IPv4 address; similarly, AAAA records are the IPv6 counterpart of A records.
Unsurprisingly, the J root nameserver does not have the requested record. Instead, it referred us to the nameservers for the com zone:
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
In this case, we have 13 nameservers to choose from. Let's just go with the first one, e.gtld-servers.net.
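Each record that dig prints follows the same column layout: owner name, TTL in seconds, class, type, and record data. As a rough sketch of how you might pull those fields apart in a script, here is a small parser; the `ResourceRecord` type and `parse_record` function are names I made up, and the code assumes the whitespace-separated presentation format shown above rather than the binary wire format.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    name: str    # owner name, e.g. "com."
    ttl: int     # time-to-live in seconds
    rclass: str  # almost always "IN" (Internet)
    rtype: str   # record type, e.g. "NS", "A", "AAAA"
    data: str    # type-specific payload


def parse_record(line: str) -> ResourceRecord:
    """Split one line of dig-style output into its five columns."""
    # Limit the split to 4 so record data containing spaces stays in one piece.
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rclass, rtype, data.strip())


record = parse_record("com. 172800 IN NS e.gtld-servers.net.")
print(record.rtype, record.data)
# NS e.gtld-servers.net.
```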
Now, let's ask e.gtld-servers.net for the A record associated with docs.google.com:
$ dig A docs.google.com @e.gtld-servers.net

; <<>> DiG 9.16.1-Ubuntu <<>> A docs.google.com @e.gtld-servers.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35138
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;docs.google.com. IN A

;; AUTHORITY SECTION:
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.

;; ADDITIONAL SECTION:
ns2.google.com. 172800 IN AAAA 2001:4860:4802:34::a
ns2.google.com. 172800 IN A 22.214.171.124
ns1.google.com. 172800 IN AAAA 2001:4860:4802:32::a
ns1.google.com. 172800 IN A 126.96.36.199
ns3.google.com. 172800 IN AAAA 2001:4860:4802:36::a
ns3.google.com. 172800 IN A 188.8.131.52
ns4.google.com. 172800 IN AAAA 2001:4860:4802:38::a
ns4.google.com. 172800 IN A 184.108.40.206

;; Query time: 4 msec
;; SERVER: 2001:502:1ca1::30#53(2001:502:1ca1::30)
;; WHEN: Wed Feb 02 12:17:24 PST 2022
;; MSG SIZE rcvd: 292
The nameserver for the
.com TLD (top-level domain) doesn't have an A record for
docs.google.com, either, but it does know the authoritative nameservers for the
google.com zone, bringing us closer to an authoritative answer. Let's query ns2.google.com:
$ dig A docs.google.com @ns2.google.com

; <<>> DiG 9.16.1-Ubuntu <<>> A docs.google.com @ns2.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21549
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;docs.google.com. IN A

;; ANSWER SECTION:
docs.google.com. 300 IN A 220.127.116.11

;; Query time: 20 msec
;; SERVER: 2001:4860:4802:34::a#53(2001:4860:4802:34::a)
;; WHEN: Wed Feb 02 12:20:20 PST 2022
;; MSG SIZE rcvd: 60
Success! We got an answer to our query. More importantly, take a look at the
flags section; the
aa flag means that according to the nameserver, the response we received is authoritative. The process we just did is called an iterative query, where we followed the chain of DNS delegations until we reached the source of truth for the zone in question.
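If you wanted to check for the aa flag in a script rather than by eye, one rough approach is to pick the flags line out of dig's text output. The helper below is a sketch of my own (`header_flags` is not part of dig or any library), and it assumes the flags line keeps the ";; flags: ...;" shape shown in the transcripts above.

```python
import re


def header_flags(dig_output: str) -> set[str]:
    """Extract the header flags (qr, aa, rd, ...) from dig's text output."""
    match = re.search(r";; flags: ([a-z ]*);", dig_output)
    return set(match.group(1).split()) if match else set()


referral = ";; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27"
answer = ";; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1"

print("aa" in header_flags(referral))  # False
print("aa" in header_flags(answer))    # True
```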
One interesting takeaway from this process is that the
.com portion of a domain is no different from a regular domain. ICANN has stated that TLDs should never have A or AAAA records, but this hasn't stopped some TLDs like .ai from doing so.
Recursive Resolvers
Generally speaking, your computer never performs iterative queries by itself. That job is instead offloaded to a special server called a recursive resolver. When your computer needs to resolve a domain, it contacts the resolver through the DNS protocol; in turn, the resolver contacts the appropriate nameservers and returns an answer to your machine. By serving many clients from the same cache of resource records, recursive resolvers reduce the number of queries that need to be handled by authoritative nameservers.
Caching is regulated by the time-to-live (TTL) value of each record, which indicates the maximum age of the cached record before it should be considered expired and re-retrieved from the authoritative nameserver. Setting the TTL involves a tradeoff: a value which is too low will result in more unnecessary requests, potentially increasing latency for users and creating scalability issues for providers. On the other hand, if the TTL is set very high, it could take hours or even days for changes to the DNS records to propagate globally, since it would take a long time before the old cached records expired.
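To illustrate how a TTL governs a cached record, here is a bare-bones sketch of the kind of cache a resolver might keep. The `TTLCache` class is hypothetical and far simpler than what real resolver software does; the clock is passed in as a parameter so that expiry can be demonstrated without actually waiting.

```python
import time


class TTLCache:
    """A tiny record cache where every entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rtype) -> (value, expires_at)

    def put(self, name, rtype, value, ttl):
        # Remember the moment at which this record stops being valid.
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None  # never cached: the resolver must ask a nameserver
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]
            return None  # expired: the resolver must ask again
        return value


# Use a fake clock so time can be advanced by hand.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("docs.example.com.", "A", "192.0.2.10", ttl=300)

now[0] = 299
print(cache.get("docs.example.com.", "A"))  # 192.0.2.10
now[0] = 300
print(cache.get("docs.example.com.", "A"))  # None
```

With a 300-second TTL like the one on the docs.google.com record above, a change to the record could take up to five minutes to reach clients that are served from a cache like this.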
In recent times sysadmins appear to have opted for the former disadvantage in the name of increased flexibility and faster failover. A 2019 analysis found that nearly half of all domains had TTLs of under a minute.
To check who your DNS servers are, run
cat /etc/resolv.conf on Linux or
ipconfig /all on Windows. Odds are, your computer is configured to use the resolver hosted by your ISP for its customers. However, there are also similar, publicly available services like Cloudflare's 1.1.1.1 and Google's 8.8.8.8 resolvers.
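On Linux, the resolver addresses live on the `nameserver` lines of /etc/resolv.conf. Here is a minimal sketch of reading them from the file's text; the function name `nameservers` is my own, and the sample contents use documentation addresses rather than any real resolver.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Collect the addresses listed on 'nameserver' lines of a resolv.conf file."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, which start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


sample = """\
# sample contents, not taken from a real machine
nameserver 192.0.2.53
search example.com
nameserver 2001:db8::53
"""
print(nameservers(sample))
# ['192.0.2.53', '2001:db8::53']
```

To run it against your own machine, you could pass in the result of `open("/etc/resolv.conf").read()`.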
Reverse DNS
In some situations, one might want a service that does the opposite of what regular ("forward") DNS does; that is, discovering a domain name for a given IP address. This is possible using a system known as reverse DNS.
Reverse DNS queries are virtually identical to regular DNS queries, except they use a special domain known as
in-addr.arpa. Suppose I have the IP address 126.96.36.199; to look up the domain associated with it, it first needs to be transformed into a domain name by reversing the octets and appending in-addr.arpa, which yields
199.188.96.126.in-addr.arpa. Next, I can look up the PTR record associated with that domain, which will contain the regular domain name which maps to 126.96.36.199.
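The octet-reversing step is mechanical enough to sketch in a few lines of Python. The helper name `ipv4_reverse_name` is mine, and 192.0.2.25 is a documentation address used purely as an example; Python's standard `ipaddress` module exposes the same transformation as the `reverse_pointer` property.

```python
import ipaddress


def ipv4_reverse_name(address: str) -> str:
    """Build the in-addr.arpa name used to look up the PTR record for an IPv4 address."""
    # Parsing through IPv4Address rejects malformed input before we split it.
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(ipv4_reverse_name("192.0.2.25"))
# 25.2.0.192.in-addr.arpa

# The standard library produces the same name.
print(ipaddress.ip_address("192.0.2.25").reverse_pointer)
# 25.2.0.192.in-addr.arpa
```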
Reverse DNS is powered by delegations, just like forward DNS. The
in-addr.arpa zone, which is run by IANA, refers clients to an RIR's reverse DNS nameservers based on the first octet of the IP address. This works out neatly since all IANA allocations are /8. However, this scheme tends to break down for classless allocations (i.e. ones which do not fall along an octet boundary) because reverse DNS only splits up each address into four 8-bit sections, meaning that reverse DNS only natively supports blocks of size /8, /16, /24, and /32. If you want finer than 8 bits in granularity, you'll need multiple delegations. For example, if you had a /20, a /16 delegation would be too big and overlap with other allocations, so you would need to register 2^(24 - 20) = 16 /24 delegations to service your allocation.
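The arithmetic for a /20 can be checked with Python's `ipaddress` module. This sketch rounds a block's prefix up to the next octet boundary and lists the reverse zones that would need to be delegated. The function name `reverse_zones_for` and the 10.20.16.0/20 block are illustrative assumptions, and the code only considers prefixes between /1 and /32.

```python
import ipaddress


def reverse_zones_for(block: str) -> list[str]:
    """List the in-addr.arpa zones needed to cover an IPv4 block."""
    network = ipaddress.IPv4Network(block)
    # Round the prefix length up to the next multiple of 8 (8, 16, 24 or 32).
    boundary = -(-network.prefixlen // 8) * 8
    zones = []
    for subnet in network.subnets(new_prefix=boundary):
        # Keep only the octets fixed by the boundary, then reverse them.
        octets = str(subnet.network_address).split(".")[: boundary // 8]
        zones.append(".".join(reversed(octets)) + ".in-addr.arpa")
    return zones


zones = reverse_zones_for("10.20.16.0/20")
print(len(zones))           # 16
print(zones[0], zones[-1])  # 16.20.10.in-addr.arpa 31.20.10.in-addr.arpa
```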
IPv6 reverse DNS uses the
ip6.arpa domain, and unlike IPv4 every single hex digit within the address counts as a separate zone. This yields rather unwieldy domain names like
220.127.116.11.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.e.18.104.22.168.0.2.6.2.ip6.arpa, but it offers a much higher granularity of 4 bits. However, the huge number of IPv6 addresses does present a scaling challenge to organizations that manage large numbers of IPv6 networks (such as ISPs); approaches towards managing reverse DNS for many IPv6 networks are discussed in RFC 8501.
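The IPv6 version of the transformation works one hex digit at a time. Below is a small sketch, with `ipv6_reverse_name` as a made-up helper name and 2001:db8::1 taken from the range reserved for documentation.

```python
import ipaddress


def ipv6_reverse_name(address: str) -> str:
    """Build the ip6.arpa name used to look up the PTR record for an IPv6 address."""
    # Expand the address to all 32 hex digits, dropping the colons.
    nibbles = ipaddress.IPv6Address(address).exploded.replace(":", "")
    # Each hex digit becomes its own label, least significant digit first.
    return ".".join(reversed(nibbles)) + ".ip6.arpa"


name = ipv6_reverse_name("2001:db8::1")
print(name)
print(name.count("."))  # 33: 32 single-digit labels, then "ip6" and "arpa"
```

Since every label covers 4 bits, a delegation can be cut at any multiple of 4 bits, such as a /48 or a /56.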
At the time of writing, DNS is 37 years old—nearly twice as old as me! Over its long lifespan, it has grown to become a highly complex and robust component of the Internet protocol suite. The information in this article doesn't even begin to approach all there is to know about DNS; some of the more glaring omissions include:
- the low-level details of the DNS protocol
- the role played by DNS in other services like SMTP
If you want to learn more, I would strongly encourage you to buy a domain and try configuring DNS. It's a very hands-on experience that could definitely come in handy one day!