One of the main goals behind micro services is to build systems or subsystems that are isolated from each other, each with a single responsibility, so that bigger and more complex systems can be built by having them work together.
In most cases, we need communication between different services. Communication should be done in a resilient and fault-tolerant way. It is important to prevent cascading failures by stopping errors from propagating across services.
Let’s compare Lagom to other widely adopted libraries.
Defining micro services in Lagom is an easy task. First, we need to define our service interface or contracts. These contracts are what is going to be exposed to clients.
For simplicity, let’s implement a small micro service that returns a list of countries and then use it to show some important constructs.
As we can see, we are exposing /countries, which should return a list of Country objects.
I don’t intend to go over how Lagom works, but you can look at the Lagom docs.
Now, we need to implement our service.
Notice that the list of countries is coming from the countriesRepository and that the service uses the underlying data access implementation through the ICountriesRepository interface.
This service can be used by other code, potentially another service. Whoever uses the CountriesService implementation is a client of the service.
At this point, we have introduced at least two points of failure: the first is CountriesService itself, and the second is the underlying data access that CountriesService is using. The question is how to manage failures of these dependencies so that they don't cascade and cause entire subsystems to fail.
Circuit Breakers are a standard way to avoid cascading failures by stopping downstream calls once a failure has occurred on the downstream dependency.
There are a few things we need from a circuit breaker.
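To make the idea concrete, here is a minimal sketch of a circuit breaker in plain Java. It is purely illustrative and is not how Lagom or Hystrix implement theirs. The class name SimpleCircuitBreaker, its parameters and the failure-counting policy are all assumptions made for this example. It stops calling the downstream dependency after a number of consecutive failures, returns a fallback while it is open, and lets a trial call through once a reset timeout has passed.

```java
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker for illustration only.
final class SimpleCircuitBreaker {
    private final int maxFailures;          // consecutive failures before the breaker opens
    private final long resetTimeoutMillis;  // how long to stay open before allowing a trial call
    private int failures = 0;
    private long openedAt = -1L;            // -1 means the breaker is closed

    SimpleCircuitBreaker(int maxFailures, long resetTimeoutMillis) {
        this.maxFailures = maxFailures;
        this.resetTimeoutMillis = resetTimeoutMillis;
    }

    // The breaker is open while the reset timeout has not yet elapsed.
    private synchronized boolean isOpen(long now) {
        return openedAt >= 0 && now - openedAt < resetTimeoutMillis;
    }

    private synchronized void onSuccess() {
        failures = 0;
        openedAt = -1L;   // close the breaker again
    }

    private synchronized void onFailure(long now) {
        failures++;
        if (failures >= maxFailures) {
            openedAt = now;   // open (or re-open) the breaker
        }
    }

    // Calls the downstream dependency unless the breaker is open.
    // Returns the fallback when the breaker is open or the call fails.
    <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (isOpen(now)) {
            return fallback.get();   // fail fast, the dependency is not called
        }
        try {
            T result = downstream.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure(now);
            return fallback.get();
        }
    }
}
```

With maxFailures set to 2, for example, the third and later calls return the fallback without touching the failing dependency until the reset timeout has elapsed.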
The way to go for Circuit Breakers seems to be Hystrix, an open source library created by Netflix. Hystrix provides a solid implementation of circuit breakers that has been widely adopted by the industry.
Even though Hystrix provides a great number of circuit breaker patterns, there is a cost associated with adding such a library to an existing, ongoing project. Code complexity tends to increase, and each new component imposes a learning curve on the development team.
Let’s do a comparison between what Hystrix offers and what we already have in the Lagom framework.
From the point of view of CountriesService, the underlying implementation of the data access to retrieve the list of countries is a potential point of failure. Network call timeouts, connection drops or degradation of the data access subsystem are only a few examples of what can go wrong.
By using Hystrix we can avoid degradation of our service when there is a problem with the data access layer.
From the point of view of CountriesService, ICountriesRepository is a dependency. Since circuit breakers are a client's concern, we should manage them on the client side, which is CountriesService in this case.
We can create a Hystrix command that wraps the call to ICountriesRepository in the following way.
Our service needs to be changed so it uses the command instead of calling the repository directly.
If there is a problem accessing the underlying data that ICountriesRepository uses, the circuit breaker (the Hystrix command) will be activated as expected.
Notice that the complexity of this pattern increases rapidly, since we will need to create a Hystrix command for every call to the ICountriesRepository.
Based on the micro service principle that each micro service must own its own data, should we say that if a micro service fails to access its underlying storage (Cassandra, Kafka, MySQL, etc.), the micro service itself has failed?
Lagom takes this into consideration: exceptions in the data layer of a service can be propagated through the service, which activates a circuit breaker at the service level that protects the clients of our service. This implies that the extra protection we can get from Hystrix is redundant at this point.
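The propagation part of this argument can be sketched with plain Java futures. This is not Lagom code. PropagationSketch, loadCountries and countCountries are hypothetical names, and the sketch only shows that an unhandled data-layer failure reaches the caller of the service. It does not show the circuit breaker itself.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class PropagationSketch {
    // Hypothetical data access call that fails, e.g. because the storage is unreachable.
    static CompletionStage<List<String>> loadCountries() {
        CompletableFuture<List<String>> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("storage unavailable"));
        return failed;
    }

    // Hypothetical service call built on top of the data access call.
    // It does not handle the error, so the returned stage also completes exceptionally
    // and the caller (the client of the service) gets to see the failure.
    static CompletionStage<Integer> countCountries() {
        return loadCountries().thenApply(List::size);
    }
}
```

Because the failure surfaces at the service boundary, whatever guards that boundary on the client side is in a position to react to it.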
Let’s suppose our service’s data is coming from another service. In this case, our service is a client of the data service. Let’s look at an alternative implementation of our service.
In this case, our service is almost identical. We are wrapping the call to the dependency within a Hystrix command that manages the circuit breaker logic if there is something wrong with the dependency.
In some cases, we want to provide a fallback result if our dependency is down. We can do that within the Hystrix command by implementing getFallback() in the following way.
If there are errors on the ContentService dependency, the fallback value will be provided until the breaker is closed, which means our dependency is ready to be used again.
On the other hand, Lagom offers the same functionality already without introducing new concepts. Let’s see how.
Notice that we are using .exceptionally to provide the fallback value. Also, Lagom activates a circuit breaker when the service dependency malfunctions, which is the same behavior we can get from Hystrix. At the same time, we don't need to create a Hystrix command for each of the calls to the dependencies, which reduces the complexity of our code.
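The fallback mechanism itself is plain Java and can be sketched without Lagom. In the sketch below, FallbackSketch, workingCall, failingCall and the sample values are hypothetical stand-ins for calls to a dependency. Only CompletionStage.exceptionally is the real API being illustrated.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class FallbackSketch {
    // Hypothetical stand-in for a successful call to a dependency.
    static CompletionStage<List<String>> workingCall() {
        return CompletableFuture.completedFuture(Arrays.asList("France", "Japan"));
    }

    // Hypothetical stand-in for a call to a dependency that is down.
    static CompletionStage<List<String>> failingCall() {
        CompletableFuture<List<String>> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("dependency unavailable"));
        return failed;
    }

    // Returns the result of the call, or an empty list if the call failed.
    static CompletionStage<List<String>> withFallback(CompletionStage<List<String>> call) {
        return call.exceptionally(error -> Collections.<String>emptyList());
    }
}
```

Here withFallback(failingCall()) completes normally with an empty list, while withFallback(workingCall()) passes the original result through unchanged. The empty list is just one possible fallback. A cached or default value would work the same way.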
From the point of view of our client (a service that calls our service in order to obtain certain functionality), it does not matter which circuit breaker library we use, since circuit breakers are a client responsibility, not a service responsibility. Our clients should have a way to monitor exceptions coming from our service so they can take the required steps to avoid cascading failures.
If the client of our service is another Lagom service, this functionality is already built in, as we saw when using ContentService. If the client of our service is another kind of application, it can use whatever circuit breaker strategy and library it likes. Again, this is a client concern, not a service concern.
.exceptionally function.
If you really need circuit breakers around your data access calls, such as calls to Cassandra and other stores, Hystrix seems to be a good choice.