iPad - Parsing an Extremely Huge JSON - File (Between 50 and 100 Mb)

1) Write your data to a file, then use NSData's dataWithContentsOfFile:options:error: and specify the NSDataReadingMappedAlways and NSDataReadingUncached flags. This will tell the system to use mmap() to reduce the memory footprint, and not to burden the file system cache with blocks of memory (that makes it slower, but much less of a burden to iOS).

2) You can use the YAJL SAX style JSON parser to get objects as they decode.

Note: I have not done 2) but have used the techniques embodied in 1).

3) I ended up needing such a thing myself, and wrote SAX-JSON-Parser-ForStreamingData, which can be tied to any asynchronous downloader (including my own).

Huge memory consumption while parsing JSON and creating NSManagedObjects

See Apple's Core Data documentation on Efficiently Importing Data, particularly "Reducing Peak Memory Footprint".

You will need to make sure you don't have too many new entities in memory at once, which involves saving and resetting your context at regular intervals while you parse the data, as well as using autorelease pools well.

The general pseudocode would be something like this:

    while (there is new data) {
        @autoreleasepool {
            importAnItem();
            if (we have imported more than 100 items) {
                [context save:...];
                [context reset];
            }
        }
    }

So basically, put an autorelease pool around your main loop or parsing code. Count how many NSManagedObject instances you have created, and periodically save and reset the managed object context to flush these out of memory. This should keep your memory footprint down. The number 100 is arbitrary and you might want to experiment with different values.

Because you are saving the context for each batch, you may want to import into a temporary copy of your store in case something goes wrong and leaves you with a partial import. When everything is finished you can overwrite the original store.

iOS Huge JSON (30MB) handling

For anybody who has the same problem, here's how I solved it:
1. Download JSON file to local storage using AFHTTPRequestOperation's output stream.
2. Parse little chunks of NSData using YAJLParser.

Result: I tested this with a 50 MB JSON file on an iPad 1 without any memory warnings (memory usage stayed around 10 MB).
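Step 1 can be sketched like this, assuming AFNetworking's AFHTTPRequestOperation (the class named above); the URL and file path here are placeholders:

```objc
NSURL *url = [NSURL URLWithString:@"https://example.com/huge.json"];
NSURLRequest *request = [NSURLRequest requestWithURL:url];
AFHTTPRequestOperation *operation =
    [[AFHTTPRequestOperation alloc] initWithRequest:request];

// Stream the response body straight to disk instead of accumulating it in RAM.
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"huge.json"];
operation.outputStream = [NSOutputStream outputStreamToFileAtPath:path append:NO];

[operation setCompletionBlockWithSuccess:^(AFHTTPRequestOperation *op, id responseObject) {
    // The file at `path` is now on disk, ready to be memory-mapped
    // and fed to YAJL in chunks (step 2).
} failure:^(AFHTTPRequestOperation *op, NSError *error) {
    NSLog(@"Download failed: %@", error);
}];
[operation start];
```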

Example:

    NSError *error = nil;
    NSData *data = [NSData dataWithContentsOfFile:path
                                          options:NSDataReadingMappedAlways | NSDataReadingUncached
                                            error:&error];

    YAJLParser *parser = [[YAJLParser alloc] initWithParserOptions:YAJLParserOptionsAllowComments];
    parser.delegate = self;
    [parser parse:data];
    parser.delegate = nil;
    parser = nil;

YAJLParser delegate:

    // First declare NSMutableArray *stack and NSString *mapKey in the header file
    - (void)parserDidStartDictionary:(YAJLParser *)parser
    {
        NSString *dictName = mapKey;
        if (mapKey == nil)
        {
            dictName = (stack.count == 0) ? @"" : [stack lastObject];
        }
        [stack addObject:dictName];
    }

    - (void)parserDidEndDictionary:(YAJLParser *)parser
    {
        mapKey = nil;
        [stack removeLastObject];
    }

    - (void)parserDidStartArray:(YAJLParser *)parser
    {
        NSString *arrayName = mapKey;
        if (mapKey == nil)
        {
            arrayName = (stack.count == 0) ? @"" : [stack lastObject];
        }
        [stack addObject:arrayName];

        if ([mapKey isEqualToString:@"something"])
        {
            // do something
        }
    }

    - (void)parserDidEndArray:(YAJLParser *)parser
    {
        if ([mapKey isEqualToString:@"some1"])
        {
            // do something
        }

        mapKey = nil;
        [stack removeLastObject];
    }

    - (void)parser:(YAJLParser *)parser didMapKey:(NSString *)key
    {
        mapKey = key;
    }

    - (void)parser:(YAJLParser *)parser didAdd:(id)value
    {
        if ([mapKey isEqualToString:@"id"])
        {
            // do something
        }
    }

iOS NSJSONSerialization returning null

If it is really a size problem, you might break it into pieces at some natural boundaries. I have done that with large XML files in the past with good results.

But as @Alladinian mentions in the comments, do verify that it is a valid JSON file.
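A quick validity check looks like this, assuming the payload is already in an NSData; when parsing fails, NSJSONSerialization returns nil and describes the failure in the error parameter:

```objc
NSError *error = nil;
id json = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
if (json == nil) {
    // The data is not valid JSON; the error explains where parsing broke down.
    NSLog(@"Invalid JSON: %@", error.localizedDescription);
}
```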

parsing OBJ files

Check this source: http://code.google.com/p/iphonewavefrontloader/
It works pretty fast, so you can learn how it is implemented.

Iphone XML file parsing preference, but what is Big what is Small?

We have an app that downloads a configuration file from a server that is frequently in excess of 1 MB. We use GDataXML to parse it, and it's relatively fast. 1 MB is fairly large for an XML file, but then again I'm sure large companies like WalMart, Tyson, etc. have apps that use massive XML files (possibly 50 MB). That really is a massive amount of text data, though, and JSON may be a better alternative since it uses far fewer characters. Additionally, you can read the data straight from the file and load it into an NSDictionary that you can then query. If you have control of the file output, consider JSON.

generate 100 milions record with monetdb

I would suggest importing either by switching off autocommit (START TRANSACTION; ... COMMIT;) or by using the COPY INTO method.

decoding a HUGE NSString, running out of memory

Edit:

When dealing with a file of this size, you probably do not want to load the entire multi-megabyte file into memory at one time, neither the huge input file nor the almost-as-huge output. You should parse it in a streaming fashion, decoding the data in your foundCharacters callback as you go, without holding any significant portion in memory.

The traditional techniques, though, may hold your entire XML file in memory at three phases of the process:

  1. As you download the XML file from the server;

  2. As the XML parser parses that file; and

  3. As you do the Base64-decode of the file.

The trick is to employ a streaming technique that performs these three processes at once, on small chunks of the single, large XML file. Bottom line: as you download the entire 50 MB file, grab a few KB, parse the XML, and if you're parsing the Base64-encoded field, perform the Base64 decode for those few KB, then proceed to the next chunk of data.
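The chunked Base64 decode can be sketched like this (the method and ivar names are illustrative, not from any sample project). Base64 decodes in 4-character quanta, so any trailing partial quantum has to be carried over to the next chunk:

```objc
// Assumes an ivar initialized before parsing: NSMutableString *_carry = [NSMutableString new];
- (void)decodeBase64Chunk:(NSString *)chunk toStream:(NSOutputStream *)out
{
    [_carry appendString:chunk];
    NSUInteger usable = (_carry.length / 4) * 4;   // whole 4-character quanta only
    if (usable == 0) return;                       // not enough characters yet

    NSData *decoded =
        [[NSData alloc] initWithBase64EncodedString:[_carry substringToIndex:usable]
                                            options:NSDataBase64DecodingIgnoreUnknownCharacters];
    [out write:(const uint8_t *)decoded.bytes maxLength:decoded.length];

    // Keep the leftover partial quantum for the next chunk.
    [_carry deleteCharactersInRange:NSMakeRange(0, usable)];
}
```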

For an example of this (at least the streaming XML downloading-and-parsing, not including the Base64 decoding), please see Apple's XMLPerformance sample project. You'll see that it demonstrates two XML parsers: the NSXMLParser we're all familiar with, and the less familiar LibXML parser. The issue with NSXMLParser is that, left to its own devices, it will load the entire XML file into memory before it starts parsing, even if you use initWithContentsOfURL.

In my previous answer, I mistakenly claimed that by using initWithContentsOfURL, the NSXMLParser would parse the URL's contents in nice little packets as they were being downloaded. The foundCharacters method of NSXMLParserDelegate protocol seems so analogous to the NSURLConnectionDelegate method, didReceiveData, that I was sure that NSXMLParser was going to handle the stream just like NSURLConnection does, namely returning information as the download was in progress. Sadly, it doesn't.

By using LibXML, though, like the Apple XMLPerformance sample project, you can actually use the NSURLConnection ability of streaming, and thus parse the XML on the fly.

I have created a little test project, but I might suggest that you go through Apple's XMLPerformance sample project in some detail. In my experiment, a 56 MB XML file consumed well over 100 MB of memory when parsed and converted via NSXMLParser, but only about 2 MB when using LibXML2.


In your comments, you describe the desire to download the Base64-encoded data to a file and then decode it. That approach seems a lot less efficient, but it certainly could work. By the way, that initial download has the same memory problem addressed above. I urge you to make sure that your initial download of the Base64-encoded data does not blithely load it all into RAM like most routines do. Assuming you're using NSURLConnection, you want to write the data to an NSOutputStream as you receive it in didReceiveData, not hold it in RAM.

See didReceiveResponse in AdvancedGetController.m of Apple's AdvancedURLConnections example for an example of how to write a file as it's being received, rather than the typical pattern of appending everything to an NSMutableData (most of these routines just assume you're dealing with a reasonably sized file). (Ignore all the material in that AdvancedURLConnections sample about authentication and the like; focus on understanding how it writes to the NSOutputStream as it goes.) This technique addresses the first of the three problems listed at the top of this answer, but not the latter two. For those, you'll have to contemplate using LibXML2 as illustrated in Apple's XMLPerformance sample project, or other similar techniques.
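The didReceiveData side of that pattern can be sketched roughly like this (simplified; assumes _fileStream is an NSOutputStream ivar that was opened in didReceiveResponse):

```objc
- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data
{
    const uint8_t *bytes = data.bytes;
    NSUInteger remaining = data.length;

    // NSOutputStream may accept fewer bytes than offered, so loop until
    // the whole chunk is on disk; nothing accumulates in RAM.
    while (remaining > 0) {
        NSInteger written = [_fileStream write:bytes maxLength:remaining];
        if (written <= 0) {
            // Handle the stream error (e.g. cancel the connection).
            break;
        }
        bytes += written;
        remaining -= written;
    }
}
```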


