279 reads

Instrumention and Monitoring API in Node.js

by CorentinJuly 29th, 2020

Too Long; Didn't Read

Node.js lets you automatically log requests, monitor APIs, handle problems with failure remediations, and more. The concept of instrumentation is easier in Node.JS thanks to the ecosystem's reliance on the underlying http module. Nearly every API client and request library used by node developers relies on this module. In this post, we'll look at the building blocks needed to add observability to every request your application makes. This means direct insight into your application without the need to configure logging for each request individually.

Company Mentioned

featured image - Instrumention and Monitoring API in Node.js

The concept of instrumentation often refers to tracing where events happen in an application. Many application performance monitoring (APM) tools use it to provide metrics on the inner workings of your application. But sometimes, all you really need are details about API calls.

Adding a hook into every HTTP request your application makes will allow you to automatically log requests, monitor APIs, handle problems with failure remediations, and more. This holds true for both internal requests to your own services, but more importantly, it works with any request to external third-party APIs. Even those that use their own client SDK.

Creating a full system to manage this is a bit harder. You need a dashboard to view the metrics, storage to handle the logs, and a way to capture the data. Fortunately, the concept of HTTP instrumentation is easier in Node.js thanks to the ecosystem's reliance on the underlying http module. Nearly every API client and request library used by node developers relies on this module.

In this post, we'll look at the building blocks needed to add observability to every request your application makes. This means direct insight into your API calls, without the need to configure logging for each individually.

How it works

To make our instrumentation layer, we will patch the core methods of the http/https module. This means overriding their functionality, performing some action, then calling the original method. It sounds more complex than it is. To start, let's look at a minimum example, without any new features. We can create a module in hijack.js as follows:

// hijack.js
const http = require("http")

function hijack() {
  override(http)
}

function override(module) {
  let original = module.request

  function wrapper(outgoing) {
    // Store a call to the original in req
    let req = original.apply(this, arguments)
    // return the original call
    return req
  }

  module.request = wrapper
}

module.exports = hijack

Let's break down what this code is doing. We import the http module at the top. For a more complete version, we'd also need coverage for the https module. Next, the hijack function sets up the patches for http by calling an override function. The override function does three things:

It makes a reference to the original request method.
It creates a wrapper that accepts the original request parameters and returns the original, with the apply method called (more on this shortly).
It overrides the original module's request, module.request with our new wrapper function. This means http.request is now set to the wrapper function.

What is that apply method?

The .apply method exists on all functions in javascript. It allows you to "call" the function, and pass it the this context and an array of arguments. It is similar to .call, except for the way it structures the arguments.

For our purposes, we are using it to call the original request, essentially unchanged. We aren't modifying the this context or the arguments. This is what allows us to step in-between the request and the destination without interfering.

Logging the request

One common use case for http instrumentation is to automatically log parts of the request. This provides observability and metrics about the request. In the wrapper function above, notice that we accept an argument called req. This is somewhat misleading, as it may not be the request. What it actually accepts are the arguments passed to http.request.

Depending on the client that is using http under the hood, this may be different. For most, like axios, our example will work fine. For others, you'll need to write more logic to handle edge cases. For these examples, we can assume that the first argument, arguments[0], maps to the request.

Let's make a function that handles logging. For this example, it will parse the request and log to the console. You could also send this data to a logging service, write it to a file, or whatever suits your needs.

// hijack.js
function logger(req) {
  let log = {
    method: req.method || "GET",
    host: req.host || req.hostname || "localhost",
    port: req.port || "443",
    path: req.pathname || req.path || "/",
    headers: req.headers || {},
  }
  console.log(log)
}

While mosts request should be uniform, we add in some optional fallbacks to avoid any undefined values. We capture a limited amount of information from the request in this implementation. You can view all the available options by inspecting or logging out req.

When combined with our code from earlier, we get something like the following:

function override(module) {
  let original = module.request

  function wrapper(outgoing) {
    let req = original.apply(this, arguments)
    logger(outgoing) // NEW
    return req
  }

  module.request = wrapper
}

Capturing the response

We captured the details about the request, but not the body of the request. More on that later. For now, let's look at capturing the response that comes back. This requires using the same technique we used earlier of overriding the default functionality of a method. Our goal is to listen to the response events that fire when an http request comes back. To recap, we need to:

Make a reference to the original method.
Create our own function to use in its place.
Return the original, with .apply used to call it as normal.

In this case, we want information about the events that http.request emits when a response comes in. Specifically, the data and end events. To do this, we will patch the functionality of request.emit. An HTTP request emits events, such as the response event, that themselves emit events. If you're familiar with making HTTP calls using the standard http library, this will look similar.

To start, let's override emit. We will only focus on code inside of the wrapper.

function wrapper(outgoing) {
  let req = original.apply(this, arguments)
  let emit = req.emit // New

  // New
  req.emit = function (eventName, response) {
    switch (eventName) {
      case "response": {
        response.on("data", (d) => {
          // build body from chunks of data
        })

        response.on("end", () => {
          // handle final response
        })
      }
    }
    return emit.apply(this, arguments)
  }

  logger(outgoing)
  return req
}

In this code, we capture a reference to req.emit in emit. This allows us to do the next step, where we override req.emit with our own function. It takes the same two arguments that a node event emitter would take. The name of the event, and any specific arguments (in this case, the response).

We then set up some listeners when a response event comes in. You could add additional switch cases for error, abort, timeout, or any events emitted by the http.clientRequest. For now, we'll only focus on responses that successfully came back.

Finally, we return emit.apply(this, arguments) similar to how we did earlier. This ensures that the "real" request still emits events as expected.

Let's fill in the switch block to handle the response body, and log some details about the response to the console:

function wrapper(outgoing) {
  let req = original.apply(this, arguments)
  let emit = req.emit
  let body = ""

  req.emit = function (eventName, response) {
    switch (eventName) {
      case "response": {
        response.on("data", (d) => {
          // NEW: Collect data chunks
          body += d
        })

        response.on("end", () => {
          // NEW: Complete response
          let res = {
            statusCode: response.statusCode,
            headers: response.headers,
            message: response.statusMessage,
            body,
          }
          console.log(res)
        })
      }
    }
    return emit.apply(this, arguments)
  }

  logger(outgoing)
  return req
}

The two main changes here are:

We define a variable, body, to hold the data from the response.
We build the data each time the event fires. (You could alternately do this as an array and concat the chunks to a Buffer).
We log out some useful data and the body to the console.

This is another case where creating a dedicated logging method would be useful. You can also combine the request and response details into a single log, attach timestamps or use process.hrtime() to time the requests, and more.

That covers the bulk of our introductory implementation. The final code looks like this:

// hijack.js
const http = require("http")
const https = require("https")

function hijack() {
  override(http)
  override(https)
}

function override(module) {
  let original = module.request
  function wrapper(outgoing) {
    let req = original.apply(this, arguments)
    let emit = req.emit
    let body = ""

    req.emit = function (eventName, response) {
      switch (eventName) {
        case "response": {
          response.on("data", (d) => {
            // NEW: Collect data chunks
            body += d
          })

          response.on("end", () => {
            // NEW: Complete response
            let res = {
              statusCode: response.statusCode,
              headers: response.headers,
              message: response.statusMessage,
              body,
            }
            console.log(res)
          })
        }
      }
      return emit.apply(this, arguments)
    }

    logger(outgoing)
    return req
  }

  function logger(req) {
    let log = {
      method: req.method || "GET",
      host: req.host || req.hostname || "localhost",
      port: req.port || "443",
      path: req.pathname || req.path || "/",
      headers: req.headers || {},
    }
    console.log(log)
  }

  module.request = wrapper
}

module.exports = hijack

To get it up and running in your application, you would then include the following at the start of your application logic:

const hijack = require("./hijack")

hijack()

// ...

Now the hijack module will pick up every outbound HTTP request in your application. Metrics and instrumentation, all with a small library. This is just the beginning!

What more can you do?

So far we've logged the request and the response, without interrupting the original requests too much. Some additional functionality could include:

Modifying requests: With full access to the arguments that get passed back to original.apply, you can modify any data before sending it. This allows you to modify URLs, insert additional headers, and more. For example, if an API changes versions, you can replace parts of the path.
Capture the request body: In our example we only capture details about the request, but by overriding the write method you could also capture the body payload in a way similar to how we captured the response.
React to failures automatically: Since we step in-between the request and the server, we can make changes on-the-fly. This means things like retrying failed requests, changing target URLs during downtimes, forcing timeouts, and more.
Automatically catch http vs https usage: Right now the implementation above requires manually setting the module. If you're using a mixed codebase, this could cause problems. Instead, write a way to patch both modules.
Handle .get: The http module includes a shorthand helper for GET requests. It uses http.request under the hood, but it isn't affected by our logic above.
Wrap the logic in error handling: If any of our interception code fails, so will the original request. To get around this, you'll want to make sure to wrap the necessary parts in try/catch blocks and ensure that the apply calls fire no matter what happens.

Wrapping up

HTTP instrumentation is much easier in Node.js than many languages due to Node's use of the http module under the hood. In other languages, you'd need to handle individual libraries on a case-by-case basis. You also need to take into account how to process all this data, how to obfuscate sensitive content, and more.

Fortunately, that's what we're solving at Bearer. We have built clients for multiple languages that work similarly to the core idea of this post (albeit far more complex). They automatically log the data you want (and nothing you don't want shared), and more importantly react when failures happen through circuit breakers and other remediation techniques.

Previously published at https://blog.bearer.sh/http-api-instrumentation-nodejs/