How to gracefully restart delayed_job consumers

I'm wondering what's the best way to gracefully restart delayed_job consumers after a new code push. I deploy with Capistrano, and I know there are commands to restart the consumers, but if jobs are currently running, the command either hangs (and my deploy takes forever) or it forcefully kills the running job and I lose data.

Ideally I'd like my deploy to happen like this:

  • The existing delayed_job consumer is running version 1 code
  • I run cap deploy and version 2 code is pushed out to the servers
  • During the deploy, we touch a file that tells the delayed_job consumer to restart once it finishes the job it is currently processing. This could be done a number of ways, but I was thinking it would work like Passenger's graceful restart (see the sketch after this list)
  • The existing consumer finishes the current job with version 1 code
  • When the job completes, the consumer sees that it needs to restart itself before processing more jobs
  • The consumer automatically restarts, now running version 2 code, and continues to process jobs
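
For reference, here's a minimal sketch of what that touch-file check could look like, assuming delayed_job 3's plugin API and some supervisor (monit, runit, cron, etc.) that respawns the worker after it exits; the file path and plugin name are just illustrative:

    # config/initializers/graceful_restart.rb
    class GracefulRestartPlugin < Delayed::Plugin
      RESTART_FILE = Rails.root.join('tmp', 'restart_dj.txt') # touched by the deploy
      BOOTED_AT    = Time.now

      callbacks do |lifecycle|
        # fires after each job completes, so a running job is never cut off
        lifecycle.after(:perform) do |worker, _job|
          if File.exist?(RESTART_FILE) && File.mtime(RESTART_FILE) > BOOTED_AT
            worker.say 'Restart file touched; exiting after current job.'
            exit 0 # the supervisor starts a fresh worker with the new code
          end
        end
      end
    end

    Delayed::Worker.plugins << GracefulRestartPlugin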
I've tried inserting some code to restart the consumer before a job runs by checking the current revision of the code, but every time I do that it just dies and doesn't actually restart anything. Sample code below:

    def before(job)
      # check to make sure that the version of code here is the right version of code
      live_git_hash = LIVE_REVISION
      local_git_hash = LOCAL_REVISION
    
      if live_git_hash != local_git_hash
        # get environment to reload in
        environment = Rails.env # production, development, staging
    
        # restart the delayed job system
        %x("export RAILS_ENV=#{environment} && ./script/delayed_job restart")
      end
    end
    

It detects the version mismatch just fine, but it dies on the shell call. Any ideas?

Thanks!


I came up with a solution that works.

I have a base class that all of my delayed jobs inherit from, called BaseJob:

    class BaseJob
      attr_accessor :live_hash

      def before(job)
        # ask the live app which revision it is currently serving
        resp = HTTParty.get("#{Rails.application.config.root_url}/revision")
        self.live_hash = resp.body.strip
      end

      def should_perform
        # only do work if this consumer was booted from the live revision
        live_hash == GIT_REVISION
      end

      def perform
        safe_perform if should_perform
      end

      def safe_perform
        # override this method in subclasses
      end

      def success(job)
        if should_perform
          # log stats here about a success
        else
          # log stats here about a failure

          # re-enqueue a copy of the job so a fresh consumer picks it up
          new_job = Delayed::Job.new
          new_job.priority = job.priority
          new_job.handler = job.handler
          new_job.queue = job.queue
          new_job.run_at = job.run_at
          new_job.save
          job.delete

          # stop this consumer; a new one gets spawned with the new code (see below)
          %x(RAILS_ENV=#{Rails.env} ./script/delayed_job stop)
        end
      end
    end
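
For concreteness, a subclass might look like the sketch below; NewsletterJob and Newsletter.deliver_pending are hypothetical placeholders:

    class NewsletterJob < BaseJob
      def safe_perform
        # the actual work; BaseJob#perform only calls this when this
        # consumer's revision matches the live revision
        Newsletter.deliver_pending
      end
    end

    # enqueued like any other delayed_job payload
    Delayed::Job.enqueue(NewsletterJob.new)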
    

All job classes inherit from BaseJob and override safe_perform to do their actual work, as in the sketch above. A few assumptions about the above code:

  • Rails.application.config.root_url points to the root of your app (i.e., www.myapp.com)
  • There is a route exposed at /revision (i.e., www.myapp.com/revision)
  • There is a global constant called GIT_REVISION that your app knows about

What I ended up doing was putting the output of git rev-parse HEAD in a file and shipping that file with the code. It gets loaded on startup, so the revision is available to the web app as well as to the delayed_job consumers.
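
A minimal sketch of that plumbing, assuming the deploy drops the hash into a REVISION file in the app root (Capistrano writes one per release); the route and controller names are illustrative:

    # config/initializers/git_revision.rb -- loaded at boot by both the
    # web app and the delayed_job consumers
    GIT_REVISION = File.read(Rails.root.join('REVISION')).strip rescue 'unknown'

    # config/routes.rb
    get '/revision' => 'revisions#show'

    # app/controllers/revisions_controller.rb
    class RevisionsController < ApplicationController
      def show
        render plain: GIT_REVISION # older Rails: render :text => GIT_REVISION
      end
    end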

When we deploy code via Capistrano, we no longer stop, start, or restart the delayed_job consumers. Instead, we install a cronjob on the consumer nodes that runs every minute and checks whether a delayed_job process is running; if one isn't, a new one is spawned.
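
A sketch of such a crontab entry, assuming the app lives at /var/www/app/current and delayed_job writes tmp/pids/delayed_job.pid (both paths are illustrative):

    # respawn a delayed_job consumer if none is running, checked every minute
    * * * * * cd /var/www/app/current && kill -0 $(cat tmp/pids/delayed_job.pid 2>/dev/null) 2>/dev/null || RAILS_ENV=production ./script/delayed_job start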

As a result of all of this, all of the following conditions are met:

  • Pushing code no longer waits on delayed_job to restart or force-kills it. Jobs that are already running are left alone when new code is pushed.
  • When a job begins, we can detect whether the consumer is running old code; if so, the job is requeued and the consumer kills itself.
  • When a delayed_job consumer dies, a new one is spawned by the cronjob, and by the nature of starting fresh it runs the new code.
  • If you're paranoid about delayed_job consumers dying and staying down, install a Nagios check that does the same thing as the cronjob but alerts you when a delayed_job process hasn't been running for five minutes (a sketch follows this list).
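
A sketch of that check with the stock check_procs plugin over NRPE; hostnames, paths, and thresholds are illustrative, and retry_interval x max_check_attempts gives the five-minute window:

    # on the consumer node (nrpe.cfg): critical when no delayed_job process exists
    command[check_delayed_job]=/usr/lib/nagios/plugins/check_procs -c 1: -a delayed_job

    # on the Nagios server: alert after ~5 minutes of consecutive failures
    define service {
        use                   generic-service
        host_name             worker01
        service_description   delayed_job running
        check_command         check_nrpe!check_delayed_job
        check_interval        1
        retry_interval        1
        max_check_attempts    5
    }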