Running Selenium headless with multiple spiders

I have many Scrapy spiders that run in parallel under scrapyd. What I am doing is roughly the following code.

My question is: do I really need to start a Display for every spider, and how does each driver know which display to use? Or should I start a single Display globally and run multiple webdriver instances within it?

from pyvirtualdisplay import Display
from scrapy import Spider, signals
from scrapy.selector import Selector
from scrapy.xlib.pydispatch import dispatcher
from selenium import webdriver


class MySpider(Spider):
    # name, start_urls, etc. omitted

    def __init__(self):
        # initialize to None so spider_closed is safe even if parse never ran
        self.display = None
        self.driver = None
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        # tear down the browser first, then the virtual display
        if self.driver:
            self.driver.quit()

        if self.display:
            self.display.stop()

    def parse(self, response):
        # start a virtual display and a browser for this spider
        self.display = Display(visible=0, size=(1024, 768))
        self.display.start()
        self.driver = webdriver.Firefox()

        self.driver.get(response.url)
        page = Selector(text=self.driver.page_source)

        # doing all parsing etc

I suggest using splinter instead; it is a browser-automation wrapper around Selenium. It solves your problem exactly, since the package takes care of the Display handling for you.
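
As a minimal sketch (assuming the same Selector-based parsing as in your spider), the parse method could look like this with splinter:

from scrapy.selector import Selector
from splinter import Browser

def parse(self, response):
    # splinter creates and tears down the browser itself;
    # no manual Display or driver management is needed
    with Browser('firefox') as browser:
        browser.visit(response.url)
        page = Selector(text=browser.html)
        # doing all parsing etc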

With a few more packages installed, you can also remove the need for a Display altogether, making splinter fully headless (no browser window opens, and it is much faster). Check out the Splinter docs for how to run it headless. I personally suggest the PhantomJS driver, even though you'll have to install the non-Python PhantomJS binary.
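
For example, assuming the PhantomJS binary is installed and on your PATH, switching to the headless driver is just a matter of naming it (a standalone sketch with a placeholder URL, not spider-specific code):

from splinter import Browser

# PhantomJS is headless by design, so no Xvfb/Display is needed at all
with Browser('phantomjs') as browser:
    browser.visit('http://example.com')
    print(browser.html)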
