It is common in software to migrate an existing service to new infrastructure, for example by moving it to the cloud. The business logic remains the same, but the change is still drastic because the entire service stack runs on new infrastructure.
We can use extensive integration tests, load tests, and so on to check that the new service works as expected. For critical services, however, it is still safer to shift production traffic to the new service stack gradually. That way the service owner can verify, safely and incrementally, that the new service is robust and scalable, and clients get time to adjust if necessary.
You may think there are already tools for this "gradual shifting" scenario. For example, many feature flag tools, built in-house or offered by third-party vendors, probably have functionality like this, and they can be the right solution in some cases, especially for experimenting with a new feature.
But for other scenarios, you have to consider whether a feature flag is the right solution. Will it add latency that could break your service's SLA? Can the tool handle all of the traffic that goes to the service? How much cleanup will the feature flag need after all traffic has been migrated?
More often, if there is no plan to maintain the existing service stack, it is common to "redirect" the traffic through DNS resolution, and that is what this article focuses on. (I've also seen people do a one-time flip using DNS, but this is usually not recommended because of the risk.)
Before we get into the details of using AWS Route53, we need to understand a bit more about DNS and AWS Route 53:
Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) service. It lets you customize DNS routing policies, for example to reduce latency.
So, how does DNS (the Domain Name System) work?
In a nutshell, DNS is the phone book that translates a human-friendly domain name such as example.com into a machine-readable IP address such as 192.0.2.244.
When a request is initiated, the DNS lookup goes through a hierarchical name resolution architecture in which different name servers each resolve part of the DNS name.
For example, in the above diagram, the query for the domain name www.example.com is answered first by the DNS root name server, then by the name server for the .com TLD, and finally by the Route53 name server, which has the record for www.example.com.
The Route53 name server then returns the machine-readable IP address so that the client can make a request to the host. The resolution result is heavily cached along this path.
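To make the lookup path concrete, here is a minimal sketch of the hierarchy as plain Python dictionaries. It is a toy model, not a real resolver: the server names (`com-tld-ns`, `route53-ns`) and the tables are made up for illustration, and nothing goes over the network. It only shows the root, TLD, authoritative order and why a cached answer skips the walk.

```python
# Toy model of hierarchical DNS resolution. Every server name and table
# below is illustrative; real resolvers exchange DNS messages over the network.

# The root name server only knows which server handles each TLD.
ROOT_REFERRALS = {"com.": "com-tld-ns"}

# Each TLD name server knows the authoritative server for each domain.
TLD_REFERRALS = {"com-tld-ns": {"example.com.": "route53-ns"}}

# The authoritative server (Route53 in this article) holds the actual records.
ZONE_RECORDS = {"route53-ns": {"www.example.com.": "192.0.2.244"}}


def resolve(name, cache):
    """Return (ip_address, servers_consulted) for a fully qualified name."""
    if name in cache:
        # A cached answer short-circuits the whole hierarchy.
        return cache[name], ["cache"]

    labels = name.rstrip(".").split(".")
    tld = labels[-1] + "."                  # "com."
    domain = ".".join(labels[-2:]) + "."    # "example.com."

    tld_server = ROOT_REFERRALS[tld]                  # step 1: ask the root
    auth_server = TLD_REFERRALS[tld_server][domain]   # step 2: ask the TLD server
    address = ZONE_RECORDS[auth_server][name]         # step 3: ask the authoritative server

    cache[name] = address
    return address, ["root", tld_server, auth_server]


cache = {}
print(resolve("www.example.com.", cache))  # walks the full hierarchy
print(resolve("www.example.com.", cache))  # answered from the cache
```

The cache matters for traffic shifting: a change made at the authoritative name server only reaches a client once its cached answer has expired.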
Now that we know how DNS works, let's see how to implement gradual traffic shifting with AWS Route53:
If it's easy to ask clients to use a new API URL, then it is relatively straightforward to add records with a weighted routing policy. You delegate a new domain from either your company's internal infrastructure or a third-party service provider, then create a new (public) hosted zone in Route53:
Click on the hosted zone line; you should see an NS (name server) record and an SOA (start of authority) record that were created automatically.
For this scenario, you don't need to change these records; just keep them as they are and know that they hold administrative information for DNS resolution. We will talk more about them in another scenario.
The next step is to create records with a weighted routing policy:
After creating a record that points to the new service (with 155 as the weight), we can create another record that points to the existing service with a weight of 100; then we should see these two records in the hosted zone as below:
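With weighted routing, each record receives a share of DNS answers equal to its weight divided by the sum of all weights for that name, so weights of 155 and 100 work out to roughly 61% and 39%. The sketch below computes those shares and builds the two records as a change batch in the JSON shape that the Route53 ChangeResourceRecordSets API accepts. The record name, targets, set identifiers, TTL, and the choice of CNAME records are assumptions for illustration; your records may well be A or alias records instead.

```python
import json


def traffic_share(weights):
    """Map each record identifier to its expected fraction of DNS answers."""
    total = sum(weights.values())
    if total <= 0:
        # The all-zero case is not modeled in this sketch.
        raise ValueError("this sketch needs at least one positive weight")
    return {name: weight / total for name, weight in weights.items()}


def weighted_record(name, set_identifier, weight, target, ttl=60):
    """Build one UPSERT change for a weighted CNAME record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": set_identifier,  # must be unique per weighted record
            "Weight": weight,                 # Route53 accepts 0 to 255
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }


# Hypothetical names: replace with your own domain and service endpoints.
change_batch = {
    "Comment": "Weighted records for gradual traffic shifting",
    "Changes": [
        weighted_record("api.example.com", "new-stack", 155, "new-stack.example.net"),
        weighted_record("api.example.com", "existing-stack", 100, "existing-stack.example.net"),
    ],
}

print(traffic_share({"new-stack": 155, "existing-stack": 100}))
print(json.dumps(change_batch, indent=2))
```

If you prefer the CLI to the console, the printed JSON can be saved to a file and passed to `aws route53 change-resource-record-sets` with `--hosted-zone-id` and `--change-batch file://...`.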
Once this is set up, you can gradually change the weight config so that eventually, the new service stack can get all the traffic.
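As a rough illustration of what "gradually" can look like, the sketch below steps through a hypothetical rollout schedule and simulates how many lookups each stack would answer at every step. The schedule (1%, 5%, 25%, 50%, 100%), the record identifiers, and the use of weights that add up to 100 are all assumptions; the simulation only models the weighted choice itself and ignores DNS caching, so real traffic will lag behind each weight change.

```python
import random


def weights_for_percent(percent_to_new):
    """Translate a target percentage for the new stack into two weights."""
    if not 0 <= percent_to_new <= 100:
        raise ValueError("percent_to_new must be between 0 and 100")
    return {"new-stack": percent_to_new, "existing-stack": 100 - percent_to_new}


def simulate(weights, lookups=10_000, seed=0):
    """Estimate each record's share by sampling weighted answers."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=lookups)
    return {n: picks.count(n) / lookups for n in names}


# Hypothetical schedule: hold each step until the new stack looks healthy.
ROLLOUT_STEPS = [1, 5, 25, 50, 100]

for percent in ROLLOUT_STEPS:
    weights = weights_for_percent(percent)
    print(percent, weights, simulate(weights))
```

In practice you would pause at each step, watch error rates and latency on the new stack, and only then move to the next weight; rolling back is the same change in reverse.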
In reality, production services often serve a wide range of clients, and it could be challenging to ask every client to use the new API domain/URL. Luckily, we can still control which endpoints clients are using behind the scenes.
To understand how this approach works, we need to understand the role of the name server:
An NS record (or name server record) is a DNS record that contains the name of an authoritative name server for a domain or DNS zone. When a client looks up an IP address, the NS records tell the resolver which name servers to ask for the records of the intended destination.
In other words, a name server or DNS server contains all of the DNS zone files and records for a domain. As we mentioned in scenario A, when you create a hosted zone in Route53, by default, an NS record and SOA record will be created. Basically, these name servers know all the records you create in this hosted zone.
How do we make the clients use the weighted records we created in Route53 without any changes on their side? For example, suppose the clients are using the endpoint medium.com.
This domain is managed by some internal infra or a third-party tool, and you can get the corresponding name server record with this command:
dig medium.com +noall +answer NS
; <<>> DiG 9.10.6 <<>> medium.com +noall +answer NS
;; global options: +cmd
medium.com. 86400 IN NS alina.ns.cloudflare.com.
medium.com. 86400 IN NS kip.ns.cloudflare.com.
In this case, we want to use the name server in Route53 instead of the Cloudflare name server so that the DNS resolution will flow through Route53 instead of Cloudflare; then the weighted records we set up in Route53 hosted zone will be effective.
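If you want to check the delegation from a script rather than by eye, a small parser over the `dig` output is enough. The sketch below extracts the NS targets from output like the one above and flags whether they look like Route53 name servers. The check is a heuristic based on the `ns-<number>.awsdns-<number>.<tld>` naming that Route53 name servers typically use, so treat it as an assumption, not an official test.

```python
import re

# Heuristic: Route53 name servers are typically named like ns-2048.awsdns-64.com.
ROUTE53_NS_PATTERN = re.compile(r"^ns-\d+\.awsdns-\d+\.")


def parse_ns_records(dig_output):
    """Return the name server host names found in `dig ... NS` answer lines."""
    servers = []
    for line in dig_output.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and dig's comment lines
        fields = line.split()
        # Answer lines look like: <name> <ttl> <class> <type> <rdata>
        if len(fields) >= 5 and fields[3] == "NS":
            servers.append(fields[4].rstrip(".").lower())
    return servers


def is_route53_name_server(host):
    """Guess whether a name server host belongs to Route53 (naming heuristic)."""
    return bool(ROUTE53_NS_PATTERN.match(host))


SAMPLE_OUTPUT = """\
; <<>> DiG 9.10.6 <<>> medium.com +noall +answer NS
;; global options: +cmd
medium.com. 86400 IN NS alina.ns.cloudflare.com.
medium.com. 86400 IN NS kip.ns.cloudflare.com.
"""

for server in parse_ns_records(SAMPLE_OUTPUT):
    print(server, "route53" if is_route53_name_server(server) else "not route53")
```

Once the delegation has been switched, rerunning the same check against fresh `dig` output should report Route53 name servers, keeping in mind that the old NS answer can stay cached for as long as its TTL (86400 seconds in the sample).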
If you are interested in learning more about DNS or networking in general, this Coursera course from Google and this course from the University of Colorado give you an overview of computer networking. If you prefer books, there is a classic computer networking book Computer Networking: A Top-down Approach. Hope this helps!