This is the second article in the Demystifying System Design interview series. In it, I’ll go into detail about what the Domain Name System (DNS) is and how it works.
Let's explore the origin of DNS. To illustrate, consider the case of mobile phones, where each user is assigned a unique number. Initially, we may memorize a few numbers to make calls to friends. However, as our contact list expands, we rely on a phone book to store all the numbers. This way, whenever we need to make a call, we consult the phone book and dial the desired number.
To access a website hosted on a machine, we need its IP address. However, humans find it difficult to remember IP addresses, such as 142.250.180.14 for google.com, and far easier to remember domain names. Therefore, we require a repository that functions like a phone book and maintains the mappings of domain names to IP addresses.
DNS is an internet service that converts user-friendly domain names into machine-readable IP addresses. This system operates transparently, without users noticing it, as their browser automatically requests the corresponding IP address from the DNS infrastructure upon entering a domain name. The final IP address is then used to forward the user's request to the intended web server.
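To make the "transparent" lookup concrete, here is a minimal sketch of asking the operating system's resolver for the addresses behind a name, much as a browser would before connecting. The helper name `resolve` is made up for this illustration; `socket.getaddrinfo` is the standard-library call that hands the name to the system resolver.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    # getaddrinfo passes the name to the OS resolver, which consults its
    # local configuration and the configured DNS resolver as needed.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of the socket address is the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a connected machine, `resolve("google.com")` returns one or more addresses. The exact values vary by location and over time, so none are shown here.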
The DNS is not a single server but rather a comprehensive infrastructure with multiple servers. The name servers are the DNS servers responsible for responding to users' queries.
The DNS database stores the mappings between domain names and IP addresses as resource records (RRs), which are the smallest units of information that users can request from the name servers. Each RR has a type, a name, and a value, and the meaning of the name and value depends on the type. The table below outlines the common types of RRs.
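As a rough illustration of the type, name, and value fields, the sketch below models a resource record as a small data class and looks records up by name and type. The class, the `lookup` helper, and all names and addresses are assumptions made for this example (the addresses come from the reserved documentation range); A, NS, CNAME, and MX are real DNS record types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    rtype: str  # record type, e.g. "A", "NS", "CNAME", "MX"
    name: str   # what the record describes; its meaning depends on rtype
    value: str  # what the name maps to; its meaning depends on rtype


# Illustrative records only: every name and address here is made up.
RECORDS = [
    ResourceRecord("A", "web.example.com", "192.0.2.10"),           # hostname -> IP address
    ResourceRecord("NS", "example.com", "ns1.example.com"),         # domain -> authoritative name server
    ResourceRecord("CNAME", "www.example.com", "web.example.com"),  # alias -> canonical hostname
    ResourceRecord("MX", "example.com", "mail.example.com"),        # domain -> mail server
]


def lookup(records, name, rtype):
    """Return the values of all records that match both name and type."""
    return [r.value for r in records if r.name == name and r.rtype == rtype]
```

A query is then just a (name, type) pair: `lookup(RECORDS, "example.com", "NS")` finds the name server record, while the same name with type `"MX"` finds the mail server.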
To minimize request latency, DNS implements caching at multiple levels. Caching is a critical feature that alleviates the load on DNS infrastructure, which has to handle requests from the entire internet.
DNS name servers are arranged in a hierarchical structure that facilitates scalability. This tree-like organization enables DNS to handle the growing size and query load effectively. In the following lesson, we will delve into how this structure manages the entire DNS database.
Let's look in more detail at how DNS works.
As noted earlier, DNS is not a single server that responds to user queries but an infrastructure of name servers arranged in a hierarchy. There are four types of servers in the DNS hierarchy.
DNS resolvers: These initiate the querying sequence and forward requests to the other DNS name servers. They are also called local or default servers.
Root-level name servers: These receive requests from the local servers and keep track of the name servers for each top-level domain, such as .com, .edu, and .us. For example, when someone requests the IP address of google.com, the root name servers return a list of the top-level domain (TLD) servers responsible for the .com domain.
Top-level domain (TLD) name servers: These servers hold the IP addresses of the authoritative name servers. When queried, they return a list of IP addresses belonging to the authoritative servers of the organization.
Authoritative name servers: These are the DNS name servers specific to the organization that provide the IP addresses of their web or application servers.
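The way these tiers hand a query down the hierarchy can be sketched with three small lookup tables. Everything here is a hypothetical in-memory stand-in: the server labels, the domain, and the address are invented for the illustration, and real name servers exchange DNS messages over the network rather than sharing dictionaries.

```python
# Root tier: top-level domain -> TLD name server responsible for it.
ROOT = {"com": "tld-com"}

# TLD tier: registered domain -> authoritative name server for that domain.
TLD_SERVERS = {"tld-com": {"example.com": "ns1.example.com"}}

# Authoritative tier: hostname -> IP address of the web or application server.
AUTHORITATIVE = {"ns1.example.com": {"www.example.com": "192.0.2.10"}}


def resolve(hostname):
    """Walk root -> TLD -> authoritative; return (ip, list of servers asked)."""
    labels = hostname.split(".")
    tld = labels[-1]                 # e.g. "com"
    domain = ".".join(labels[-2:])   # e.g. "example.com"

    path = ["root"]
    tld_server = ROOT[tld]                          # root points to the TLD server
    path.append(tld_server)
    auth_server = TLD_SERVERS[tld_server][domain]   # TLD points to the authoritative server
    path.append(auth_server)
    ip = AUTHORITATIVE[auth_server][hostname]       # authoritative server gives the answer
    return ip, path
```

Calling `resolve("www.example.com")` visits the root, then the .com TLD server, then the organization's authoritative server, which is the role the DNS resolver plays on the user's behalf.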
There are two methods for conducting a DNS query:
Iterative: The local server queries the root, TLD, and authoritative servers in sequence, each time receiving a referral to the next server until it obtains the IP address.
Recursive: The end user submits a request to the local server, which then queries the root DNS name servers. The root name servers subsequently forward the request to the other name servers, and the answer travels back along the same path.
Typically, an iterative query is preferred to reduce query load on DNS infrastructure.
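One way to see why the iterative style is lighter on the shared infrastructure is to count how many onward queries each party has to issue. The toy model below follows the two descriptions above. The tier labels, function names, and address are assumptions for illustration, and no real DNS traffic is involved.

```python
from collections import Counter

CHAIN = ["root", "tld", "authoritative"]  # tiers in the order they are consulted
ANSWER = "192.0.2.10"                     # made-up address returned by the last tier


def iterative_query(outbound: Counter) -> str:
    """The local server contacts every tier itself; the tiers only reply."""
    for _server in CHAIN:
        outbound["local"] += 1  # one query from the local server per tier
    return ANSWER


def recursive_query(outbound: Counter, chain=CHAIN, asker="local") -> str:
    """Each server passes the query on to the next tier and waits for the reply."""
    outbound[asker] += 1        # the asker sends one query to chain[0]
    if len(chain) == 1:
        return ANSWER           # the last tier answers directly
    # chain[0] now becomes the asker towards the remaining tiers
    return recursive_query(outbound, chain[1:], asker=chain[0])
```

In the iterative run, all three onward queries are issued by the local server. In the recursive run, the root and TLD tiers each have to issue a query of their own and hold the request open until the reply comes back, which is extra work for servers that are shared by everyone.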
The DNS hierarchy enables the decentralized Internet we use today, but it is also a distributed system with several benefits, including:
There are 13 logical root name servers, designated by the letters A through M, with multiple instances distributed globally. These servers are managed by 12 distinct organizations.
Caching involves temporarily storing frequently requested resource records. A record is a unit of data in the DNS database that represents a name-to-value binding. Caching decreases network traffic and the response time seen by the user.
Caching at different levels of the hierarchy considerably reduces the number of queries that reach the DNS infrastructure. Browsers, operating systems, local name servers within the user's network, and ISP DNS resolvers can all implement caching.
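A minimal sketch of layered caching, assuming each layer is a plain dictionary and `query_dns` is a hypothetical callable standing in for the full DNS lookup: the layers are checked from the one closest to the user outward, and only a miss in every layer reaches the DNS infrastructure.

```python
def lookup(name, layers, query_dns):
    """Check each cache layer in order; on a full miss, query DNS and fill every layer.

    layers is a list of (label, dict) pairs ordered from closest to the user
    outward. Returns (ip, label of the layer that answered, or "dns").
    """
    for label, cache in layers:
        if name in cache:
            return cache[name], label   # served from a cache, no DNS query needed
    ip = query_dns(name)                # every layer missed: ask the DNS infrastructure
    for _label, cache in layers:
        cache[name] = ip                # remember the answer at every layer
    return ip, "dns"
```

With layers such as `[("browser", {}), ("os", {}), ("resolver", {})]`, the first lookup of a name goes out to DNS and every later lookup is answered by the browser layer. This sketch ignores expiry, so entries never go stale.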
DNS can handle heavy traffic because of its hierarchical structure. About 1,000 replicated instances of the 13 root-level servers are placed in different parts of the world to serve user requests. The work of answering a query is divided among the root, TLD, and authoritative servers. Each organization manages the authoritative servers that provide the requested information for its own domains, so different parties take care of different parts of the system, which makes it easier to manage and to expand when needed.
DNS uses various methods to update and share information between servers in the hierarchy. Because DNS is read far more often than it is written, it favors performance over strong consistency and propagates updates lazily, which makes it eventually consistent. It can take up to three days for an update to reach all servers on the Internet, depending on the size of the update and the structure of the DNS infrastructure.
Cached records can also cause consistency issues. For example, when a server fails and the organization updates its resource records, the cached records on local and ISP servers might not reflect the latest information. To limit this problem, each record has an expiration time called time-to-live (TTL), after which a cache must discard the record and fetch a fresh copy.
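The TTL idea can be sketched as a small cache that stores an expiry time next to each value and treats expired entries as misses. The class name `TTLCache` and the injectable `clock` parameter are assumptions made for this illustration (the clock makes the behavior easy to test), not part of any DNS library.

```python
import time


class TTLCache:
    """Cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable time source, in seconds
        self._entries = {}    # name -> (value, expires_at)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached value, or None if it is absent or has expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]   # stale: drop it so the caller re-queries DNS
            return None
        return value
```

A short TTL lets a changed record take effect quickly at the cost of more queries, while a long TTL does the opposite. That is the trade-off an organization makes when it sets the TTL on its records.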