In this section of the tutorial we will discuss a very important topic for microservice apps: circuit breakers.

One of the inherent problems when you have a distributed app like a microservice app is cascading failures across services.  Since the application is composed of several distributed services, there are more chances for failure when making remote calls.  In addition, distributed apps often tend to chain together service calls, so any problem with a service at the end of the chain can cause problems for all the services further up the chain.

As service owners, we want to insulate ourselves from problems in the services we depend on, so how do we do that?  One solution is to use the circuit breaker pattern, which was described by Michael Nygard in his book Release It!.

In the circuit breaker pattern, calls to remote services are protected by what is called a circuit breaker.  As long as there are no errors, the circuit stays closed and the remote calls are made as normal.  If the circuit breaker detects a problem with the remote call, then it is tripped and the circuit is opened, stopping the remote call from being made.  Once the circuit breaker detects that the remote call can be made successfully again, the circuit is closed and remote calls are allowed through once more.
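
The state changes described above can be sketched in plain Java.  This is a minimal illustration and not how Hystrix is implemented: the class name SimpleCircuitBreaker, the consecutive-failure threshold, and the retry delay are all assumptions made up for this example.

```java
import java.util.function.Supplier;

/** Illustrative circuit breaker (hypothetical class, not part of Hystrix). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before the circuit trips
    private final long retryAfterMillis;  // how long to stay open before allowing a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    synchronized State getState() {
        return state;
    }

    /**
     * Runs the remote call through the breaker and returns the fallback value
     * when the call fails or is skipped. Holding the lock during the call keeps
     * the sketch short; a real implementation would not do that.
     */
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < retryAfterMillis) {
                return fallback.get();     // circuit open: skip the remote call
            }
            state = State.HALF_OPEN;       // delay elapsed: allow one trial call
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;
            state = State.CLOSED;          // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;        // trip (or re-trip) the circuit
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }
}
```

For example, a breaker created with new SimpleCircuitBreaker(2, 60000) opens after two consecutive failures and then answers from the fallback for the next minute without calling the remote service at all.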

In the Netflix and Spring Cloud world, the tool for implementing circuit breakers is called Hystrix.  Let's look at how we protect the remote service calls in our app using Hystrix.

Adding Circuit Breakers To Our Code

There are 2 places in our app where one service is calling another service.  The first place that should come to mind is where we are using the Zuul proxy (from part 2 of this tutorial).  The Zuul proxy is used in our web app to proxy the JavaScript calls from our web app to the other microservices.  Luckily Spring Cloud automatically protects all these calls with circuit breakers, so there is nothing we really have to do :)

The other remote call we make in our app involves the API we added for mobile clients in the last blog post.  In this API we make a request to our Participants service from our Races service.  If for whatever reason our Participants service is down or not responding fast enough, our Races service will suffer.  Rather than have our Races service break because of issues with the Participants service, we can protect the remote call with a circuit breaker and actually return something to our clients even if there is a problem with the Participants service.  What we respond with might not be ideal or have all the information our clients need, but it is better than the call failing or timing out.  Let's take a look at how we use Hystrix in our Races service.

In order to use Hystrix we need to add the Hystrix dependency to our Races service.  Open the POM file for the Races service and add the following dependency in the dependencies section.

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-hystrix</artifactId>
</dependency>

Now that we have our dependency in place, we can start using Hystrix.  In our last blog post we created a Feign client to make remote calls to the Participants service and added a new REST API at /participants which used the Feign client to get participant information and add it to the race data.  We need to protect this call from potential failures, so the obvious solution would be to add a circuit breaker around the REST API.  Unfortunately, right now we cannot wrap a REST Controller in a circuit breaker (see this GitHub issue).  Because of this limitation we will need to break out the call to the Feign client into its own bean.  Open OcrRacesApplication.java, update the OcrRacesApplication class, and add a new bean called ParticipantsBean.

 

// New imports for this step; the imports from earlier parts of the tutorial stay as they are.
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.stereotype.Component;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;

@SpringBootApplication
@RestController
@EnableEurekaClient
@EnableFeignClients
@EnableCircuitBreaker // turns on circuit breaker support for this application
public class OcrRacesApplication implements CommandLineRunner {

	private static List<Race> races = new ArrayList<Race>();

	// Wraps the Feign client so the remote call can be protected by Hystrix
	@Autowired
	private ParticipantsBean participantsBean;

	public static void main(String[] args) {
		SpringApplication.run(OcrRacesApplication.class, args);
	}

	@Override
	public void run(String... arg0) throws Exception {
		races.add(new Race("Spartan Beast", "123", "MA", "Boston"));
		races.add(new Race("Tough Mudder RI", "456", "RI", "Providence"));
	}

	@RequestMapping("/")
	public List<Race> getRaces() {
		return races;
	}

	@RequestMapping("/participants")
	public List<RaceWithParticipants> getRacesWithParticipants() {
		List<RaceWithParticipants> returnRaces = new ArrayList<RaceWithParticipants>();
		for(Race r : races) {
			// Goes through the bean (and its circuit breaker) instead of the Feign client directly
			returnRaces.add(new RaceWithParticipants(r, participantsBean.getParticipants(r.getId())));
		}
		return returnRaces;
	}
}

@Component
class ParticipantsBean {
	@Autowired
	private ParticipantsClient participantsClient;

	// The remote call, protected by a circuit breaker
	@HystrixCommand(fallbackMethod = "defaultParticipants")
	public List<Participant> getParticipants(String raceId) {
		return participantsClient.getParticipants(raceId);
	}

	// Fallback: same parameters as getParticipants, returns an empty list
	public List<Participant> defaultParticipants(String raceId) {
		return new ArrayList<Participant>();
	}
}

In the OcrRacesApplication class we have added the @EnableCircuitBreaker annotation to enable circuit breakers in our application.  The next change to this class is in our /participants API, where we now call our new bean instead of the Feign client directly.  In the new bean we just wrap the call to the Feign client in a method called getParticipants.  This is the method we wrap in a circuit breaker since it is the one using the remote service.  We enable the circuit breaker functionality by using the @HystrixCommand annotation on the method.  In the annotation we specify a fallback method, which is called when the remote call fails or times out, or when the circuit is open and the remote call is skipped entirely.  Our fallback is the method in our bean called defaultParticipants, which just returns an empty list.  You can do whatever you want in your fallback method, and generally it would be more sophisticated than returning an empty list, but for this sample an empty list is good enough.  In a production application, maybe our Races service would cache participants data so we have something to return if the circuit is ever open.

That is all we have to do.  Our remote call to the Participants service is now protected by a circuit breaker.
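
The caching idea mentioned above could look something like the following sketch.  It is plain Java with no Hystrix on the classpath so that it runs on its own: the class name CachingParticipantsLookup is hypothetical, participants are represented as plain strings, the Function stands in for the Feign client, and the try/catch stands in for what Hystrix does when it routes a failed call to the fallback method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper that remembers the last successful result per race ID. */
class CachingParticipantsLookup {
    // Stands in for the Feign client call to the Participants service (assumption).
    private final Function<String, List<String>> remoteLookup;
    private final Map<String, List<String>> lastKnown = new ConcurrentHashMap<>();

    CachingParticipantsLookup(Function<String, List<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Main path: make the remote call and remember the result. */
    List<String> getParticipants(String raceId) {
        try {
            List<String> fresh = remoteLookup.apply(raceId);
            lastKnown.put(raceId, new ArrayList<>(fresh)); // store a copy of the good result
            return fresh;
        } catch (RuntimeException e) {
            // In the tutorial's bean, Hystrix would invoke the fallback method here.
            return defaultParticipants(raceId);
        }
    }

    /** Fallback path: last known participants, or an empty list if we have none. */
    List<String> defaultParticipants(String raceId) {
        return lastKnown.getOrDefault(raceId, Collections.emptyList());
    }
}
```

With a fallback like this, clients keep seeing the most recent participant list for races that were fetched successfully at least once, and an empty list only for races that were never fetched.  The trade-off is that the data may be stale, so whether this is acceptable depends on the API.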

Hystrix Dashboard

Having circuit breakers in our services is nice, but how do we know the state of the circuits?  Luckily Netflix and Spring Cloud provide a web application called the Hystrix Dashboard that gives us the information we need.  This dashboard gives developers and operations insight into various statistics about the circuits in their applications such as success and failure rates.  In addition to the Hystrix Dashboard, Netflix and Spring Cloud also offer another tool called Turbine.  Turbine helps aggregate various streams of Hystrix data into a single stream so you don’t have to continuously switch streams in the dashboard to view data from different instances of a service.

To take advantage of these tools in our application, let's add a new service to our app to host them.  Go to start.spring.io and create a new project based on the following image.

 

Screen Shot 2015-10-07 at 9.16.21 AM

 

Make sure you add the Hystrix Dashboard and Turbine starters.  Once you have the form filled out, click Generate Project to download the zip and import the project into your workspace.  To enable the Hystrix Dashboard, we need to add a single annotation in com.ryanjbaxter.spring.cloud.ocr.hystrix.HystrixDashboardApplication.  Open this class file and add @EnableHystrixDashboard to the class file.

package com.ryanjbaxter.spring.cloud.ocr.hystrix;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.dashboard.EnableHystrixDashboard;

@SpringBootApplication
@EnableHystrixDashboard 
public class HystrixDashboardApplication {

    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardApplication.class, args);
    }
}

The only other thing we have to do now is a little configuration.  We want to change the port our Hystrix Dashboard and Turbine services run on, so go to src/main/resources in the new project and rename application.properties to application.yml.  Then add the following properties to the YAML file.

server:
  port: 8383

Start the application, which will be running on port 8383, and go to http://localhost:8383/hystrix.  You should see a page that looks like this.

Screen Shot 2015-10-07 at 9.28.44 AM

The one required field in this form is a URL to either a Hystrix or Turbine stream.  We have not yet configured Turbine, so let's try a Hystrix stream.  Start up all the other services for the app (Eureka, Races, Participants, and Web) and wait for everything to register itself with Eureka.

Once everything is registered, go to the web app at http://localhost:8080 and click on a race to view the participants.  This step is necessary in order to see any interesting statistics regarding the circuit breakers in Zuul.  Now go back to your Hystrix Dashboard, enter the URL http://localhost:8080/hystrix.stream, and click Monitor Stream.  The resulting dashboard should look something like the screenshot below.

Screen Shot 2015-10-07 at 9.55.14 AM

You will notice we have 2 circuit breakers, one for the call to proxy requests to the Races service, and the other for the call to proxy requests to the Participants service.  If you start refreshing the web app you will notice the dashboard change as it monitors requests through the circuit breakers.  However, you typically cannot do a very good job simulating load on a service by refreshing a page manually in a browser.  One tool that can better simulate load on our services is called Siege.  You can install Siege via your favorite package manager (Homebrew, Yum, Apt-Get, etc.).  Once installed it is pretty easy to use.  For example, to hit the Races service through our Zuul proxy you would run:

$ siege http://localhost:8080/races

Once you do this, take a look at the Hystrix Dashboard and you will notice some more interesting data.

Screen Shot 2015-10-07 at 10.05.21 AM

 

For information about what all the numbers mean in the dashboard take a look at the Hystrix wiki page on GitHub.

What about monitoring the circuit breaker we added in the Races service?  First, let's make sure we hit the API so we have some data.  Go to http://localhost:8282/participants.  Then back on the Hystrix Dashboard landing page (http://localhost:8383/hystrix) enter the URL http://localhost:8282/hystrix.stream and click Monitor Stream.  When you do this you should get an error in the dashboard.  Why?

Screen Shot 2015-10-07 at 10.38.23 AM

This is because the stream functionality hasn’t been enabled in the Races service yet (Zuul has it enabled automatically, which is why it worked out of the box in our Web service).  To add the Hystrix stream to the app we need to add the Spring Boot Actuator starter to the Races service (as specified in the Spring Cloud Netflix documentation).  Open the POM file for the Races service and add the following dependency in the dependencies section.

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Save the POM and restart the Races service.  Again hit the /participants API in your browser by going to http://localhost:8282/participants.  Now back on the Hystrix Dashboard landing page enter http://localhost:8282/hystrix.stream and click Monitor Stream.  Now you should see information about our getParticipants method protected by our circuit breaker.

Screen Shot 2015-10-07 at 10.53.26 AM

Again, if you put the API under siege you will start to see more interesting data.  But what happens in the failure case?  If you shut down the Participants service, just by stopping it from running, and then hit the API or put the API under siege, you should see the number of failures in the dashboard go up and the circuit open.

Screen Shot 2015-10-07 at 1.06.21 PM

In the above screenshot we see that a number of requests have started failing (the purple number) and our error rate is starting to go up; however, the circuit is still closed.  In the screenshot below, the circuit has finally opened due to the number of failing requests.

 

Screen Shot 2015-10-07 at 10.57.39 AM

The number in blue (548) is the number of calls that have been short-circuited, meaning they went straight to the fallback method defined in our @HystrixCommand.  Since the circuit is open, if we hit the API in the browser we should see empty lists come back for the participants data, since that is the behavior we defined in our fallback method.  Give it a shot and go to http://localhost:8282/participants.  The data returned will look like this:

[
   {
      "name":"Spartan Beast",
      "id":"123",
      "state":"MA",
      "city":"Boston",
      "participants":[

      ]
   },
   {
      "name":"Tough Mudder RI",
      "id":"456",
      "state":"RI",
      "city":"Providence",
      "participants":[

      ]
   }
]

No participant data as expected.
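
Why did the circuit stay closed for a while even though requests were already failing?  A circuit breaker typically trips only once enough requests have been seen and a large enough share of them have failed.  The sketch below illustrates that decision in plain Java.  The class and method names are hypothetical, and the values 20 and 50 used in the usage note mirror what the Hystrix configuration documentation lists as defaults for circuitBreaker.requestVolumeThreshold and circuitBreaker.errorThresholdPercentage; treat them as assumptions and check the documentation for your version.

```java
/** Illustrative trip decision (hypothetical class, not the Hystrix source). */
class TripDecision {

    /** Percentage of failed requests in the current window, rounded down. */
    static int errorPercentage(int successes, int failures) {
        int total = successes + failures;
        return total == 0 ? 0 : (failures * 100) / total;
    }

    /**
     * The circuit opens only when the window holds at least
     * requestVolumeThreshold requests AND the error percentage has
     * reached errorThresholdPercentage.
     */
    static boolean shouldOpen(int successes, int failures,
                              int requestVolumeThreshold,
                              int errorThresholdPercentage) {
        int total = successes + failures;
        if (total < requestVolumeThreshold) {
            return false; // too little traffic to judge the service
        }
        return errorPercentage(successes, failures) >= errorThresholdPercentage;
    }
}
```

With thresholds of 20 requests and 50 percent, shouldOpen(30, 10, 20, 50) is false because only 25 percent of the requests failed, while shouldOpen(10, 10, 20, 50) is true.  This matches what we saw in the dashboard: failures accumulate first, and the circuit opens only after the error rate climbs high enough.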

Now if we start the Participants service back up, the circuit should close and our requests should succeed again.  But how does the circuit know the service is back up?  Periodically the circuit will let a trial request through to see whether it succeeds.  Notice the 2 failures (in red) in the screenshot below.  When these requests start succeeding, the circuit will be closed and the requests will be let through.

Screen Shot 2015-10-07 at 1.10.50 PM

Now that the service is back up everything gets back to normal and we see the circuit close.

Screen Shot 2015-10-07 at 1.17.17 PM

 

Using Turbine

This is all nice, but switching between various Hystrix streams can be a pain.  It would be nice to be able to see streams based on service ID.  This is where Turbine comes in.  When we created our Hystrix Dashboard project on start.spring.io we added the Turbine starter, so we already have all the dependencies we need in our POM.  To enable Turbine we need to add a single annotation to com.ryanjbaxter.spring.cloud.ocr.hystrix.HystrixDashboardApplication.  Open the class file and add @EnableTurbine.

package com.ryanjbaxter.spring.cloud.ocr.hystrix;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.dashboard.EnableHystrixDashboard;
import org.springframework.cloud.netflix.turbine.EnableTurbine;

@SpringBootApplication
@EnableHystrixDashboard
@EnableTurbine
public class HystrixDashboardApplication {

    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardApplication.class, args);
    }
}

Now we need to tell Turbine which services we want it to aggregate the information for.  Open src/main/resources/application.yml and add the following configuration properties.

server:
  port: 8383
  
turbine:
  appConfig: web,races
  aggregator:
    clusterConfig: WEB,RACES

The turbine.appConfig property specifies the names of the services you would like to aggregate.  These values just need to match the service IDs we have already configured with Eureka.  The turbine.aggregator.clusterConfig property lists the cluster names, which need to match the service IDs as well, the only difference being that they MUST be uppercase.

After adding the new annotation and the additional config, restart the Hystrix Dashboard app.  Once the app is back up, head to the Hystrix Dashboard again (http://localhost:8383/hystrix).  Now instead of entering URLs to the individual Hystrix streams of our services, let's use Turbine.  Enter the URL http://localhost:8383/turbine.stream?cluster=WEB and click Monitor Stream.  This should bring up all the circuit breakers in the Web service (the one using our Zuul proxy).  You should see the circuit breakers for the Participants and Races routes displayed in the dashboard, just as if you were monitoring the Hystrix stream of the individual service.

What if we want to look at the circuit breakers for the Races service?  All we have to do is adjust the cluster query parameter in the Turbine URL to point to the Races cluster.  Go back to the Hystrix landing page by either clicking the back button in your browser or going back to http://localhost:8383/hystrix.  Now enter the same Turbine URL but with the Races cluster as the query parameter, http://localhost:8383/turbine.stream?cluster=RACES.  Again, what you see should be the same as if we were pointing at the Races Hystrix stream.  The obvious benefit here is that we can monitor Hystrix streams based on service ID as opposed to URL.  The other benefit, which is less obvious, is that if we were to scale these services horizontally we would see statistics about all instances of that service instead of just an individual instance.  This will be more useful when our application is running in the cloud…don’t worry we will get there eventually :)
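
The relationship between a service ID and its Turbine stream URL is simple enough to capture in a small helper.  This is only a sketch: the class TurbineUrls and its methods are hypothetical names invented for this example, and the base URL is the one our dashboard service uses in this tutorial.

```java
import java.util.Locale;

/** Hypothetical helper that builds Turbine stream URLs from Eureka service IDs. */
class TurbineUrls {

    /** Cluster names are the service IDs in uppercase, as in turbine.aggregator.clusterConfig. */
    static String clusterName(String serviceId) {
        return serviceId.toUpperCase(Locale.ROOT);
    }

    /** Builds the URL to paste into the Hystrix Dashboard for the given service. */
    static String streamUrl(String dashboardBaseUrl, String serviceId) {
        return dashboardBaseUrl + "/turbine.stream?cluster=" + clusterName(serviceId);
    }
}
```

Calling TurbineUrls.streamUrl("http://localhost:8383", "races") gives the RACES URL used above, and passing "web" gives the WEB one.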

As you can see, circuit breakers play a very important role in a microservices application, and as usual Spring Cloud makes it easy not only to use them in your application but also to monitor their state.


Ryan J Baxter

Husband, Father, Software Engineer