paint-brush
Dubugging Cypress Seeds: 3 Ways to Speed Up Your Seed Databy@alto

Dubugging Cypress Seeds: 3 Ways to Speed Up Your Seed Data

by Alto EngineeringJanuary 26th, 2021
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

If seeding the database for your end-to-end tests is taking too long—or not finishing at all—here are three ways to speed things up.

Company Mentioned

Mention Thumbnail
featured image - Dubugging Cypress Seeds: 3 Ways to Speed Up Your Seed Data
Alto Engineering HackerNoon profile picture

By Amy Chen, Software Engineer at Alto Pharmacy

At Alto, we use Cypress to test our Rails application end-to-end. As our application grows, so does our end-to-end test suite — along with the seed data it requires. This led us recently to a perplexing problem:  when run in CircleCI, our end-to-end tests would mysteriously time out during the seed step, causing them to fail and block deploys.

Seeds can be painful to debug. They’re typically set up through a single request and, depending on how thorough the tests are, call upon most of the models involved in the application. If a model throws, say, a validation error because of bad seed data, you'll see it in your Cypress logs. If a model hangs, the request often doesn't show up in the logs at all. There's no status code, no error message—all you know is that somewhere, somehow, some process is failing to finish.

In this post, I’ll share how we identified the problem, what the culprit was, and three ways to make seeding test data more robust.

To Identify the Problem, Focus on Your Logs

We started by comparing logs from our failing CircleCI runs with those from local test runs. Turns out, our CircleCI logs in failed runs stopped after a specific log message.

For example, when our test runs succeeded, the logs looked something like this:

...
LOG: creating model
WARN: foo with id=123 not found
WARN: foo with id=456 not found
WARN: bar with id=789 not found
ERROR: Invalid parameter to function `foobar`
LOG: model created
... rest of logs

But when they failed, they instead looked like this:

. . .
LOG: creating model
WARN: foo with id=123 not found
WARN: foo with id=456 not found
[end of logs]

This let us know where to focus: after the function that looks for 

foo
, but before or during the one that looks for 
bar
. Of course, this only works if you have enough log statements scattered around in the first place. I’d recommend logging the time it takes to create each model in seeds, which has the added benefit of telling you where to start if your seeds are becoming annoyingly slow.

At the top of our file in

db/seeds
, we declare the following function to add timings to our CI logs:

@last_time = Time.now
def log_time(label)
    Rails.logger.info("Time [#{label}]: #{(Time.now - @last_time)}")
    @last_time = Time.now
end

We then call the function after each

create
statement in our seeds:

Product.create!({...params})
log_time(“Product Create”)

This prints logs in the following format:

...log metadata message=Time [Product Create]: 12345
...log metadata message=Time [User Create]: 1102
...log metadata message=Time [Address Create]: 105
...

These logs will help you figure out problematic model 

create
 calls, but each model may have a lot going on in its callbacks (
before_save
before_commit
after_save
, etc).

You can follow a similar approach to time those callbacks:


# in the model file
Class Model < ApplicationRecord
	
...

before_save :foo

...

def foo()
   start_time = Time.now
   [ ... things that happen inside the function ]
   Rails.logger.info("Time [foo]: #{(Time.now - start_time)}")
   [ function return statement, if applicable ]
end

It’s important to note that the most time-consuming statements in your seeds often cause the most trouble, but that’s not always the case. If your seeds are hanging due to external API calls, it’s entirely possible for those calls to usually take a short time to complete.

If you have hanging tests, I would recommend logging test runs in CircleCI, since you may not be able to find the problem locally. 

Our Culprit: Third-Party API Calls

Through timing our seeds, we narrowed down our search to five of our most time-intensive models. We were then able to time the individual callbacks within those models. One callback  in particular—containing a call to an external database—raised the most suspicion. It took up most of the time it took to create the model, didn’t return any information needed in our test, and was not mocked or stubbed out. 

Requests to this test database would usually work, however, since we call on it multiple times on every CI run, with concurrent CI runs, these requests would sometimes take a long time to complete, leading to a test timeout. 

Three Ways to Speed Up Your Seed Data

In our case, since we were making calls to a hosted database in our tests, we changed our test environment to create and refer to an isolated instance of the database. While this was the best approach for our specific issue, we also explored some other ways of stubbing out API calls and skipping callbacks from your seeds file, which we describe below. Use these approaches with caution, as the more you stub, the less true-to-life your tests will be. 

1. Stub Third-Party API calls

Ideally, you would have already stubbed out third-party API calls in your code using environment variables. An example approach:

def do_the_thing()
    if Rails.env.production?
    	[ ...do the thing ]
    else
	[ ...do some fake thing if needed ]
       Rails.logger.info("We theoretically would have done the thing.")
end

However, you can also use Typhoeus to mock third-party api calls in seeds, similar to how you might in a test. Imagine that your Product model makes a call to "https://slack.com/api/foo”. We can then mock that call out as follows:

mock_response = Typhoeus::Response.new({code:500’, body: {
    “ok”: false,
    “error”: ‘not_authed’
}.to_json})
Typhoeus.stub("https://slack.com/api/foo”).and_return(mock_response)
Product.create!({...params})

Again, I would advise you to stub check for environment variables in your code rather than stub out individual API calls in your seeds and tests. But if for some reason you need to do the latter, it’s possible!

2. Stub Your Own Methods

Similarly, you might want to stub out a function you’ve written if it takes a long time to run and doesn’t affect the outcome of your tests. This can’t be stubbed out in the ways we mentioned above since (1) you might need it  for — and hopefully are testing it in  — other tests and (2) it doesn’t have a url. minitest has a tool for this. Imagine you’re trying to stub out some functions from a 

Photograph
 model.

require ‘minitest/autorun’

Photograph.stub :fetch_product_photo_url, ‘’ do
   Product.create!({...params})
end

This will return an empty string when 

Photograph.fetch_product_photo_url
 is called within the
Product
model’s callbacks. You can also replace the empty string with anything else you might want the function to return.

3. Skip Validation

Last but not least, it’s possible to skip entire callbacks and validation. I wouldn’t recommend skipping all of them, as that can cause models in your tests to stray very far from how your models actually work, but you may find it useful to skip specific callbacks.

While all the methods that I’ve suggested above are possible and can be useful, remember that stubbing and skipping from your seeds may just be band-aids for issues you have elsewhere. In our case, we could have stubbed out a model function or created an isolated database for testing.

While both methods would have been functionally the same, doing the latter allows us to have to worry less about that database in the future—if, for example, we write more functions that depend on it, or update the mocked functions to do more than they currently do. Keeping tests in line with how the product works also keeps it more in line with the actual user experience: if it’s taking a long time to create a model in your seeds, it probably takes a long time for your users, too. 

Since adding timing logs and creating an isolated database to our end-to-end tests, we have a regular source of truth for the efficiency of our models, are able to develop more smoothly with fewer spontaneous test failures, and have more confidence in our ability to quickly pinpoint and resolve seed timeouts in the future.