Core Data: Import large dataset

I'm facing a situation where I need to import a possibly large (20,000+ records) dataset into Core Data. The data is retrieved from a web service in JSON format. The import itself is a simple update-or-create affair, and the data represents a hierarchical structure, so for every entity I set a parent entity (except at the top level, of course). Currently the process runs too slowly and may be using too much memory, so I have to optimize, and I have some questions about best practices for doing so.

First of all, I use a separate thread with a child NSManagedObjectContext to do the importing so my UI thread doesn't get blocked. The basic principle is working.
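
For illustration, here is a minimal sketch of how such a child context can be set up (assuming self.moc is the main-queue context that backs the UI; self.childmoc is the import context):

    // A private-queue child context for the import; self.moc is the main-queue parent.
    self.childmoc = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    self.childmoc.parentContext = self.moc;

    [self.childmoc performBlock:^{
        // ... parse and import here, off the main queue ...
    }];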

My plan is to process the data in batches. Probably the best approach is to parse only a portion of the JSON response into objects and then process them. I would then implement the efficient find-or-create pattern described in https://developer.apple.com/library/mac/DOCUMENTATION/Cocoa/Conceptual/CoreData/Articles/cdImporting.html.
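
For reference, a rough sketch of that find-or-create pass for a single batch, following the pattern from the linked guide; the entity name "Item" and the attribute "remoteID" are placeholders for whatever the real model uses:

    // Import one batch of parsed JSON dictionaries, each assumed to carry a "remoteID" key.
    - (void)importBatch:(NSArray *)batch
    {
        // 1. Sort the incoming records and collect their IDs.
        NSSortDescriptor *byID = [NSSortDescriptor sortDescriptorWithKey:@"remoteID" ascending:YES];
        NSArray *sorted = [batch sortedArrayUsingDescriptors:@[byID]];
        NSArray *ids = [sorted valueForKey:@"remoteID"];

        // 2. A single fetch for all objects in this batch that already exist, sorted the same way.
        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
        request.predicate = [NSPredicate predicateWithFormat:@"remoteID IN %@", ids];
        request.sortDescriptors = @[byID];
        NSArray *existing = [self.childmoc executeFetchRequest:request error:NULL];

        // 3. Walk both sorted lists in parallel: update matches, insert everything else.
        NSUInteger matchIndex = 0;
        for (NSDictionary *record in sorted) {
            NSManagedObject *item;
            if (matchIndex < existing.count &&
                [[existing[matchIndex] valueForKey:@"remoteID"] isEqual:record[@"remoteID"]]) {
                item = existing[matchIndex++];                       // update existing object
            } else {
                item = [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                                     inManagedObjectContext:self.childmoc];
            }
            [item setValue:record[@"remoteID"] forKey:@"remoteID"];
            // ... copy the remaining attributes from the JSON record ...
        }
    }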

My questions are:

  • What would a good batch size be? 1000?

  • As I need to find and set a parent entity for each imported entity, my approach would be to do this in a second pass after the batch has been processed without parents; that way I could batch-fetch the parents as well (a sketch of this second pass follows these questions). Is this a good idea, or is there a better way?

  • After each batch I would reset the child MOC and save the parent MOC. Is this enough? Do I need to do more?

    // (the child context is assumed to have been saved for this batch already)
    [self.childmoc reset];
    dispatch_async(dispatch_get_main_queue(), ^(void) {
        NSError *error = nil;
        if (![self.moc save:&error]) {
            NSLog(@"Error saving parent context: %@", error);
        }
    });
    
  • Currently I load the data via AFNetworking, which can parse the JSON automatically. When refactoring, what would be the best way to split the received response into separate files (one per batch) without breaking JSON objects? Which JSON parser does AFNetworking use (AFJSONResponseSerializer), and can I use it when loading from a file too? (One possible slicing approach is sketched after these questions.)

  • Are there any pitfalls I need to watch out for in particular?

Thanks for helping!
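
To make the second question above concrete, here is a minimal sketch of the parent-linking pass. It assumes each JSON record carries a "parentID", that the entity has a "remoteID" attribute and a "parent" relationship, and that the first pass kept a dictionary of the imported objects keyed by remote ID; all of these names are placeholders:

    // Second pass over one batch: batch-fetch the needed parents, then wire them up.
    - (void)linkParentsForBatch:(NSArray *)batch importedObjects:(NSDictionary *)objectsByID
    {
        NSArray *parentIDs = [batch valueForKey:@"parentID"];

        NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
        request.predicate = [NSPredicate predicateWithFormat:@"remoteID IN %@", parentIDs];
        NSArray *parents = [self.childmoc executeFetchRequest:request error:NULL];

        // Index the fetched parents by remote ID for constant-time lookup.
        NSMutableDictionary *parentsByID = [NSMutableDictionary dictionaryWithCapacity:parents.count];
        for (NSManagedObject *parent in parents) {
            parentsByID[[parent valueForKey:@"remoteID"]] = parent;
        }

        for (NSDictionary *record in batch) {
            id parentID = record[@"parentID"];
            if (parentID == nil || parentID == [NSNull null]) continue;   // top-level entity
            NSManagedObject *child = objectsByID[record[@"remoteID"]];
            [child setValue:parentsByID[parentID] forKey:@"parent"];
        }
    }

On the JSON question: AFJSONResponseSerializer is built on top of NSJSONSerialization, which can just as well parse data loaded from a file. Rather than writing separate files per batch, one option is to parse the response once and slice the resulting top-level array into batch-sized ranges in memory. A rough sketch, assuming the top-level JSON value is an array; path and importBatch: are placeholders:

    // Parse once with NSJSONSerialization, then hand out batch-sized slices.
    NSData *data = [NSData dataWithContentsOfFile:path];   // or the raw response data
    NSArray *records = [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];

    NSUInteger batchSize = 500;                            // tune against memory and time
    for (NSUInteger offset = 0; offset < records.count; offset += batchSize) {
        NSRange range = NSMakeRange(offset, MIN(batchSize, records.count - offset));
        [self importBatch:[records subarrayWithRange:range]];
        // save the child context, push to the parent and reset, as discussed above
    }

If the full response were too large to hold in memory at once, a streaming JSON parser would be needed instead; otherwise this keeps the slicing logic trivial.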


    This is just my two cents, but your problem isn't so much importing data into Core Data as importing it into the data store that Core Data abstracts.

    With this in mind, you might have other alternatives depending on your particular use case (e.g. if your data is imported on first launch), such as:

  • Not using Core Data for the import, but straight-up SQLite, and then (re-)initializing the Core Data stack once it's done.
  • If you control the service and it's not meant to be a public API, maybe adding an endpoint that lets you stream in the seed .sqlite file directly (a rough sketch of installing such a seed store follows this list). That's probably not the greatest idea if you need to do create-or-update, though; again, it depends on your use case.
    Just a thought...
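
    As a rough illustration of the seed-store idea (only a sketch of one possible setup; the store and resource names are placeholders, and the seed file is shown as shipped in the app bundle, though it could equally be downloaded): the .sqlite file is copied into place before the persistent store is added.

        // Install a pre-built seed store before the Core Data stack is set up.
        NSURL *documentsURL = [[[NSFileManager defaultManager] URLsForDirectory:NSDocumentDirectory
                                                                      inDomains:NSUserDomainMask] lastObject];
        NSURL *storeURL = [documentsURL URLByAppendingPathComponent:@"Model.sqlite"];

        if (![[NSFileManager defaultManager] fileExistsAtPath:storeURL.path]) {
            NSURL *seedURL = [[NSBundle mainBundle] URLForResource:@"Seed" withExtension:@"sqlite"];
            [[NSFileManager defaultManager] copyItemAtURL:seedURL toURL:storeURL error:NULL];
        }
        // ...then add the store at storeURL to the persistent store coordinator as usual.
        // Note: with WAL journaling (the Core Data default on newer OS versions) the
        // -wal/-shm sidecar files must be handled as well, or the seed store saved
        // with rollback journaling.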
