Recommended strategies for backing up appengine datastore

Right now I use remote_api and appcfg.py download_data to take a snapshot of my database every night. It takes a long time (6 hours) and is expensive. Without rolling my own change-based backup (I'd be too scared to do something like that), what's the best option for making sure my data is safe from failure?

PS: I recognize that Google's data is probably way safer than mine. But what if one day I accidentally write a program that deletes it all?


I think you've pretty much identified all of your choices.

  • Trust Google not to lose your data, and hope you don't accidentally instruct them to destroy it.
  • Perform full backups with download_data, perhaps less frequently than once per night if it is prohibitively expensive.
  • Roll your own incremental backup solution.

    Option 3 is actually an interesting idea. You'd need a modification timestamp on all entities, and you wouldn't catch deleted entities, but otherwise it's very doable with remote_api and cursors.
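The selection step for option 3 is simple to state: with a modification timestamp on every entity, an incremental pass only needs the records changed since the previous run. A minimal sketch of that criterion, using plain dicts as a stand-in for entities (the record shape and function name here are illustrative, not App Engine's API):

```python
from datetime import datetime

def modified_since(records, last_backup):
    """Return only the records changed after the previous backup run.

    Each record is assumed to carry an 'updated_at' datetime, mirroring
    the updated_at property the incremental downloader relies on.
    """
    return [r for r in records if r['updated_at'] > last_backup]
```

Note the caveat stands: a record deleted after the last run simply never shows up in `records`, so deletions go unnoticed.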

    Edit:

    Here's a simple incremental downloader for use with remote_api. Again, the caveats are that it won't notice deleted entities, and it assumes all entities store the last modification time in a property named updated_at. Use it at your own peril.

    # Incremental datastore downloader for use with remote_api (Python 2).
    import os
    import hashlib
    import gzip
    from google.appengine.api import app_identity
    from google.appengine.ext.db.metadata import Kind
    from google.appengine.api.datastore import Query
    from google.appengine.datastore.datastore_query import Cursor
    
    INDEX = 'updated_at'   # property holding each entity's last-modified time
    BATCH = 50             # entities fetched per query
    DEPTH = 3              # levels of hash directories to fan files across
    
    path = ['backups', app_identity.get_application_id()]
    for kind in Kind.all():
      kind = kind.kind_name
      if kind.startswith('__'):
        continue  # skip built-in metadata kinds
      while True:
        print 'Fetching %d %s entities' % (BATCH, kind)
        # Resume from the cursor saved by the previous run, if any.
        path.extend([kind, 'cursor.txt'])
        try:
          cursor = open(os.path.join(*path)).read()
          cursor = Cursor.from_websafe_string(cursor)
        except IOError:
          cursor = None
        path.pop()
        query = Query(kind, cursor=cursor)
        query.Order(INDEX)
        entities = query.Get(BATCH)
        for entity in entities:
          # Fan files out across DEPTH levels of single-character
          # directories derived from the SHA-1 of the entity key.
          digest = hashlib.sha1(str(entity.key())).hexdigest()
          for i in range(DEPTH):
            path.append(digest[i])
          try:
            os.makedirs(os.path.join(*path))
          except OSError:
            pass  # directory already exists
          path.append('%s.xml.gz' % entity.key())
          print 'Writing', os.path.join(*path)
          archive = gzip.open(os.path.join(*path), 'wb')
          archive.write(entity.ToXml())
          archive.close()
          path = path[:-1 - DEPTH]
        if entities:
          # Persist the cursor so the next run picks up where this one left off.
          path.append('cursor.txt')
          checkpoint = open(os.path.join(*path), 'w')
          checkpoint.write(query.GetCursor().to_websafe_string())
          checkpoint.close()
          path.pop()
        path.pop()
        if len(entities) < BATCH:
          break
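The hash fan-out in the script is worth isolating: spreading files across DEPTH levels of single-character directories keeps any one directory from accumulating millions of entries. A standalone sketch of the same layout logic (the function name is mine, not from the script above):

```python
import hashlib
import os

DEPTH = 3  # levels of single-character hash directories, as in the script

def shard_path(root, key):
    # Derive DEPTH directory levels from the SHA-1 of the key,
    # mirroring the layout the backup script writes to disk.
    digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
    parts = [digest[i] for i in range(DEPTH)]
    return os.path.join(root, *parts + ['%s.xml.gz' % key])
```

Because the SHA-1 digest is deterministic, re-running the backup writes each entity to the same path, so unchanged entities are simply overwritten in place rather than duplicated.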
    