Zuul/Ribbon/Hystrix not retrying on different instance

Background

I'm using Spring cloud Brixton.RC2, with Zuul and Eureka.

I have one gateway service with @EnableZuulProxy and a book-service with a status method. Via configuration I can emulate work on the status method by sleeping a defined amount of time.

The Zuul route is simple

zuul.routes.foos.path=/foos/**
zuul.routes.foos.serviceId=reservation-service

I run two instances of the book-service . When I set the sleeping time below the Hystrix timeout threshold (1000ms) I can see requests going to both instance of the book services. This works well.

Problem

I understand that if the Hystrix command fails, it should be possible for Ribbon to retry the command on a different server. This should make the failure transparent to the client.

I read the Ribbon configuration and added the following configuration in Zuul:

zuul.routes.reservation-service.retryable=true //not sure which one to try
zuul.routes.foos.retryable=true //not sure which one to try

ribbon.MaxAutoRetries=0 // I don't want to retry on the same host, I also tried with 1 it doesn't work either
ribbon.MaxAutoRetriesNextServer=2
ribbon.OkToRetryOnAllOperations=true

Now I update the configuration so that only one service sleeps for more than 1s, which means that I have one health service, and one bad one.

When I call the gateways the calls get send to both instances, and half the calls returns a 500. In the gateway I see the Hystrix timeout:

com.netflix.zuul.exception.ZuulException: Forwarding error
    [...]
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: reservation-service timed-out and no fallback available.
    [...]
Caused by: java.util.concurrent.TimeoutException: null

Why isn't Ribbon retrying the call on the other instance?

Am I Missing something here?


References

  • Relates to this question (not solved)
  • Ribbon configuration
  • According to this commit Zuul should support retries for Ribbon

  • By default Zuul uses SEMAPHORE isolation strategy which doesn't allow to set a timeout. I haven't been able to use load balancing with this stragegy. What worked for me was (following your example):

    1) Changing Zuul's isolation to THREAD:

    hystrix:
      command:
        reservation-service:
          execution:
            isolation:
              strategy: THREAD
              thread:
                timeoutInMilliseconds: 100000
    

    IMPORTANT : timeoutInMilliseconds= 100000 is like saying no HystrixTimeout. Why? Because if Hystrix times out there won't be any load balancing (I just tested it playing with timeoutInMilliseconds)

    Then, configure Ribbon's ReadTimeout to the desired value:

    reservation-service:
      ribbon:
        ReadTimeout: 800
        ConnectTimeout: 250
        OkToRetryOnAllOperations: true
        MaxAutoRetriesNextServer: 2
        MaxAutoRetries: 0
    

    In this case after the 1sec service times out in Ribbon it'll retry with the 500ms service

    Below you have the log I got in my zuul instance:

    o.s.web.servlet.DispatcherServlet        : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/api/stories]
    o.s.web.servlet.DispatcherServlet        : Last-Modified value for [/api/stories] is: -1
    c.n.zuul.http.HttpServletRequestWrapper  : Path = null
    c.n.zuul.http.HttpServletRequestWrapper  : Transfer-Encoding = null
    c.n.zuul.http.HttpServletRequestWrapper  : Content-Encoding = null
    c.n.zuul.http.HttpServletRequestWrapper  : Content-Length header = -1
    c.n.loadbalancer.ZoneAwareLoadBalancer   : Zone aware logic disabled or there is only one zone
    c.n.loadbalancer.LoadBalancerContext     : storyteller-api using LB returned Server: localhost:7799 for request /api/stories
    
    ---> ATTEMPTING THE SLOW SERVICE
    
    com.netflix.niws.client.http.RestClient  : RestClient sending new Request(GET: ) http://localhost:7799/api/stories
    c.n.http4.MonitoredConnectionManager     : Get connection: {}->http://localhost:7799, timeout = 250
    com.netflix.http4.NamedConnectionPool    : [{}->http://localhost:7799] total kept alive: 1, total issued: 0, total allocated: 1 out of 200
    com.netflix.http4.NamedConnectionPool    : No free connections [{}->http://localhost:7799][null]
    com.netflix.http4.NamedConnectionPool    : Available capacity: 50 out of 50 [{}->http://localhost:7799][null]
    com.netflix.http4.NamedConnectionPool    : Creating new connection [{}->http://localhost:7799]
    com.netflix.http4.NFHttpClient           : Attempt 1 to execute request
    com.netflix.http4.NFHttpClient           : Closing the connection.
    c.n.http4.MonitoredConnectionManager     : Released connection is not reusable.
    com.netflix.http4.NamedConnectionPool    : Releasing connection [{}->http://localhost:7799][null]
    com.netflix.http4.NamedConnectionPool    : Notifying no-one, there are no waiting threads
    
    --- HERE'S RIBBON'S TIMEOUT
    
    c.n.l.reactive.LoadBalancerCommand       : Got error com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out when executed on server localhost:7799
    c.n.loadbalancer.ZoneAwareLoadBalancer   : Zone aware logic disabled or there is only one zone
    c.n.loadbalancer.LoadBalancerContext     : storyteller-api using LB returned Server: localhost:9977 for request /api/stories
    
    ---> HERE IT RETRIES
    
    com.netflix.niws.client.http.RestClient  : RestClient sending new Request(GET: ) http://localhost:9977/api/stories
    c.n.http4.MonitoredConnectionManager     : Get connection: {}->http://localhost:9977, timeout = 250
    com.netflix.http4.NamedConnectionPool    : [{}->http://localhost:9977] total kept alive: 1, total issued: 0, total allocated: 1 out of 200
    com.netflix.http4.NamedConnectionPool    : Getting free connection [{}->http://localhost:9977][null]
    com.netflix.http4.NFHttpClient           : Stale connection check
    com.netflix.http4.NFHttpClient           : Attempt 1 to execute request
    com.netflix.http4.NFHttpClient           : Connection can be kept alive indefinitely
    c.n.http4.MonitoredConnectionManager     : Released connection is reusable.
    com.netflix.http4.NamedConnectionPool    : Releasing connection [{}->http://localhost:9977][null]
    com.netflix.http4.NamedConnectionPool    : Pooling connection [{}->http://localhost:9977][null]; keep alive indefinitely
    com.netflix.http4.NamedConnectionPool    : Notifying no-one, there are no waiting threads
    o.s.web.servlet.DispatcherServlet        : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
    o.s.web.servlet.DispatcherServlet        : Successfully completed request
    o.s.web.servlet.DispatcherServlet        : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/favicon.ico]
    o.s.w.s.handler.SimpleUrlHandlerMapping  : Matching patterns for request [/favicon.ico] are [/**/favicon.ico]
    o.s.w.s.handler.SimpleUrlHandlerMapping  : URI Template variables for request [/favicon.ico] are {}
    o.s.w.s.handler.SimpleUrlHandlerMapping  : Mapping [/favicon.ico] to HandlerExecutionChain with handler [ResourceHttpRequestHandler [locations=[ServletContext resource [/], class path resource [META-INF/resources/], class path resource [resources/], class path resource [static/], class path resource [public/], class path resource []], resolvers=[org.springframework.web.servlet.resource.PathResourceResolver@a0d875d]]] and 1 interceptor
    o.s.web.servlet.DispatcherServlet        : Last-Modified value for [/favicon.ico] is: -1
    o.s.web.servlet.DispatcherServlet        : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
    o.s.web.servlet.DispatcherServlet        : Successfully completed request
    
    链接地址: http://www.djcxy.com/p/34260.html

    上一篇: LINQ OrderBy具有投影比较器的匿名对象

    下一篇: Zuul / Ribbon / Hystrix不会在不同的实例上重试