How do Scrapy crawler class methods work?

I need to add the class method described here to my existing pipeline: http://doc.scrapy.org/en/latest/faq.html#im-getting-an-error-cannot-import-name-crawler

I am not sure how to have two of these class methods in my class. This is my current pipeline:

from twisted.enterprise import adbapi
import MySQLdb.cursors

class MySQLStorePipeline(object):
    """A pipeline to store the item in a MySQL database.
    This implementation uses Twisted's asynchronous database API.
    """
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbargs = dict(
            host=settings['DB_HOST'],
            db=settings['DB_NAME'],
            user=settings['DB_USER'],
            passwd=settings['DB_PASSWD'],
            charset='utf8',
            use_unicode=True,
        )
        dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
        return cls(dbpool)

    def process_item(self, item, spider):
        pass

From my understanding of class methods, having several class methods in a Python class should be just fine. It only depends on which one the caller requires. However, until now I have only seen from_crawler in Scrapy pipelines. From there you can get access to the settings via crawler.settings.
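
As a minimal sketch of that point, the toy class below defines both class methods side by side. FakeCrawler and DemoPipeline are hypothetical stand-ins invented for this illustration (they are not Scrapy classes), and a plain dict is assumed in place of the real settings object.

```python
class FakeCrawler(object):
    """Hypothetical stand-in that exposes only a ``settings`` attribute."""

    def __init__(self, settings):
        self.settings = settings


class DemoPipeline(object):
    """Toy pipeline with two alternative constructors."""

    def __init__(self, host, built_by):
        self.host = host
        self.built_by = built_by  # records which class method built the object

    @classmethod
    def from_settings(cls, settings):
        return cls(settings['DB_HOST'], 'from_settings')

    @classmethod
    def from_crawler(cls, crawler):
        # The settings are reachable through the crawler object.
        return cls(crawler.settings['DB_HOST'], 'from_crawler')


settings = {'DB_HOST': 'localhost'}  # assumed plain dict, not a Scrapy Settings object
a = DemoPipeline.from_settings(settings)
b = DemoPipeline.from_crawler(FakeCrawler(settings))
print('%s -> %s' % (a.built_by, a.host))
print('%s -> %s' % (b.built_by, b.host))
```

Both methods coexist without conflict; which one runs is decided purely by the caller.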

Are you sure that from_settings is required? I did not check all occurrences, but in middleware.py a priority order seems to apply: if a crawler object is available and a from_crawler method exists, that one is used. Otherwise, if there is a from_settings method, that is used. Otherwise, the raw constructor is called.

if crawler and hasattr(mwcls, 'from_crawler'):
    mw = mwcls.from_crawler(crawler)
elif hasattr(mwcls, 'from_settings'):
    mw = mwcls.from_settings(settings)
else:
    mw = mwcls()

I admit, I do not know if this is also the place where pipelines get created (I guess not, but there is no pipelines.py), but the implementation seems very reasonable.
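
To see that priority in action without installing Scrapy, here is a rough sketch that wraps the same three-way check in a helper and runs it against toy classes. build_component, FakeCrawler, Both, SettingsOnly and Plain are all hypothetical names made up for this example; only the if/elif/else logic is taken from the snippet quoted above.

```python
def build_component(mwcls, settings, crawler=None):
    """Apply the quoted priority: from_crawler, then from_settings, then cls()."""
    if crawler and hasattr(mwcls, 'from_crawler'):
        return mwcls.from_crawler(crawler)
    elif hasattr(mwcls, 'from_settings'):
        return mwcls.from_settings(settings)
    return mwcls()


class FakeCrawler(object):
    """Hypothetical stand-in that exposes only a ``settings`` attribute."""

    def __init__(self, settings):
        self.settings = settings


class Both(object):
    """Defines both alternative constructors."""

    def __init__(self, source):
        self.source = source

    @classmethod
    def from_crawler(cls, crawler):
        return cls('from_crawler')

    @classmethod
    def from_settings(cls, settings):
        return cls('from_settings')


class SettingsOnly(object):
    """Defines only from_settings."""

    def __init__(self, source):
        self.source = source

    @classmethod
    def from_settings(cls, settings):
        return cls('from_settings')


class Plain(object):
    """Defines neither class method, so the raw constructor is used."""

    def __init__(self):
        self.source = 'constructor'


settings = {}
crawler = FakeCrawler(settings)

with_crawler = build_component(Both, settings, crawler).source
without_crawler = build_component(Both, settings).source
settings_only = build_component(SettingsOnly, settings, crawler).source
plain = build_component(Plain, settings, crawler).source
print(with_crawler, without_crawler, settings_only, plain)
```

Under these assumptions, a class that defines both methods is built through from_crawler whenever a crawler is passed, so an existing from_settings is only a fallback.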

So, I'd just either:

  • reimplement the whole method as from_crawler and only use that one, or
  • add a from_crawler method and keep both.

In the second case, the new method could look as follows (to duplicate as little code as possible):

    @classmethod
    def from_crawler(cls, crawler):
        obj = cls.from_settings(crawler.settings)
        obj.do_something_on_me_with_crawler(crawler)
        return obj
    

    Of course this depends a bit on what you need.
