GAE datastore admin copy failing on MapReduce model to JSON conversion

I am trying to copy my app's datastore to another app using the datastore admin console, according to this documentation. Since my app uses the Java runtime, I installed the datastore admin Python sample as instructed. I set up the sample to whitelist the other app's ID and deployed it as instructed. I used this same method to copy the datastore a couple of months ago, and while the process didn't go entirely smoothly, it did end up working.

The tasks created by the datastore admin copy operation are not completing. There are 9 tasks in the default queue (one for each of the entity kinds I'm trying to copy). The tasks' method/URL is POST /_ah/mapreduce/kickoffjob_callback. They retry continuously and fail every time. The tasks' headers are each something like:

X-AppEngine-Current-Namespace   
content-type    application/x-www-form-urlencoded
Referer         https://ah-builtin-python-bundle-dot-mysourceappid.appspot.com/_ah/datastore_admin/copy.do
Content-Length  970
Host            ah-builtin-python-bundle-dot-mysourceappid.appspot.com
User-Agent      AppEngine-Google; (+http://code.google.com/appengine)

The tasks' previous run results are each something like:

Dispatched time (UTC)       2013/05/26 08:02:47
Seconds late                0.00
Seconds to process task     0.50
Last http response code     500
Reason to retry             App Error

Under the destination app, the only indication I'm getting of there being any incoming copy operation is the log:

2013-05-26 01:55:37.798 /_ah/remote_api?rtok=66767762443
200 1832ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~mysourceappid)
0.1.0.40 - - [26/May/2013:00:55:37 -0700] "GET /_ah/remote_api?rtok=66767762443 HTTP/1.1" 200 137 - "AppEngine-Google;
(+http://code.google.com/appengine; appid: s~mysourceappid)" "datastore-admin.mydestinationappid.appspot.com" ms=1833
cpu_ms=1120 cpm_usd=0.000015 loading_request=1 app_engine_release=1.8.0 instance=00c61b117c9beacd101ff92c542598f549f755cc
I 2013-05-26 01:55:37.797
This request caused a new process to be started for your application, and thus caused your application code to be loaded
for the first time. This request may thus take longer and use more CPU than a typical request for your application.

So the requests are at least causing an app instance to be spun up, but other than that, nothing is happening and the source app is just getting 500 server errors.

I've tried with writes enabled and disabled on both the source and destination datastores. I've double, triple and quadruple checked that the correct app IDs are registered in the Python datastore admin sample and uploaded the code to both app servers, even though it is only necessary on the destination server (they each whitelist the other's ID). I've tried with both HTTPS and HTTP URLs.

ah-builtin-python-bundle-dot-mysourceappid.appspot.com/_ah/mapreduce/status doesn't give any relevant information other than that there isn't any progress or activity on any of the tasks. If I try to abort the jobs from there, the abort fails as well. To stop the jobs, I have to delete the tasks from the queue directly. I then have to manually clean up the entities left behind: the _AE_DatastoreAdmin_Operation entity, which otherwise causes the datastore admin to keep showing the copy job as active, and a bunch of _GAE_MR_MapreduceControl, _GAE_MR_MapreduceState and _GAE_MR_ShardState entities.

What is going wrong? I can't believe there isn't any more relevant log data or info about where the process is failing as well.

UPDATE: I must have been tired last night, because I didn't think to look in the logs under the source app's ah-builtin-python-bundle version, which is where the datastore admin operations run. This is the log output I'm getting there:

2013-05-27 00:49:11.967 /_ah/mapreduce/kickoffjob_callback 500 320ms 1kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.2 - - [26/May/2013:23:49:11 -0700] "POST /_ah/mapreduce/kickoffjob_callback HTTP/1.1" 500 1608
"https://ah-builtin-python-bundle-dot-mysourceappid.appspot.com/_ah/datastore_admin/copy.do"
"AppEngine-Google; (+http://code.google.com/appengine)" "ah-builtin-python-bundle-dot-mysourceappid.appspot.com" ms=320 cpu_ms=80
cpm_usd=0.000180 queue_name=default task_name=706762757133111420 app_engine_release=1.8.0
instance=00c61b117c5825670de2531f27693bdc2ffb71
E 2013-05-27 00:49:11.966
super(type, obj): obj must be an instance or subtype of type
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 716, in __call__
    handler.post(*groups)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/base_handler.py", line 83, in post
    self.handle()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1087, in handle
    spec, input_readers, queue_name, self.base_path(), state)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 1159, in _schedule_shards
    output_writer=output_writer))
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/handlers.py", line 718, in _state_to_task
    params=tstate.to_dict(),
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/model.py", line 805, in to_dict
    "input_reader_state": self.input_reader.to_json_str(),
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/model.py", line 165, in to_json_str
    json = self.to_json()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/mapreduce/input_readers.py", line 2148, in to_json
    json_dict = super(DatastoreKeyInputReader, self).to_json()
TypeError: super(type, obj): obj must be an instance or subtype of type

It looks like the kickoff task is crashing while converting the MapReduce shard state to JSON: the super(DatastoreKeyInputReader, self) call in input_readers.py raises a TypeError because the input reader object isn't an instance of DatastoreKeyInputReader (or of a subclass of it). This must be a bug introduced either in version 1.8.0 or in another release since 1.7.5, which was the current SDK version the last time I ran a datastore copy operation.
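
For illustration, the same TypeError can be reproduced outside App Engine with a few lines of plain Python. This is only a minimal sketch of one way the error can arise, namely a to_json() method that names the wrong class in its super() call. The class names BaseReader, KeyReader and OtherReader and the helper try_to_json are hypothetical, and I have not confirmed that this is what the SDK code actually does.

```python
class BaseReader(object):
    """Hypothetical stand-in for a MapReduce input reader base class."""

    def to_json(self):
        return {"kind": type(self).__name__}


class KeyReader(BaseReader):
    """Correct usage: super() names the class the method is defined in."""

    def to_json(self):
        json_dict = super(KeyReader, self).to_json()
        json_dict["keys_only"] = True
        return json_dict


class OtherReader(BaseReader):
    """Assumed mistake: super() names KeyReader, but self is not a KeyReader."""

    def to_json(self):
        return super(KeyReader, self).to_json()


def try_to_json(reader):
    """Return the JSON dict, or the error text if serialization fails."""
    try:
        return reader.to_json()
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_to_json(KeyReader()))
print(try_to_json(OtherReader()))
```

The first call serializes normally. The second fails inside super() before BaseReader.to_json is ever reached, which matches the last frame of the traceback above.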


For reference, this has been fixed and will be out soon.

https://code.google.com/p/googleappengine/issues/detail?id=9388
