Validating a large XML file ~400MB in PHP

erkarl

New Member
I have a large XML file (around 400MB) that I need to ensure is well-formed before I start processing it.

The first thing I tried was something similar to the code below, which is great because I can find out whether the XML is not well-formed and which parts of it are 'bad':

[code]
// Collect parse errors instead of emitting PHP warnings
libxml_use_internal_errors(true);

$doc = simplexml_load_string($xmlstr);
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        echo display_xml_error($error); // formatting helper defined elsewhere
    }
    libxml_clear_errors();
}
[/code]

I also tried:

[code]
$doc = new DOMDocument();
$doc->load($tempFileName, LIBXML_DTDLOAD | LIBXML_DTDVALID);
[/code]

I tested this with a file of about 60MB and it worked, but anything much larger (~400MB) causes something that is new to me, the "oom killer", to kick in and terminate the script after what always seems to be about 30 seconds.

I thought I might need to increase the memory available to the script, so I worked out the peak usage when processing the 60MB file and scaled the limit up for the larger file. I also turned the script time limit off, just in case that was the cause:

[code]
set_time_limit(0);
ini_set('memory_limit', '512M');
[/code]

Unfortunately this didn't work, as the oom killer appears to be a Linux thing that kicks in if memory load (is that even the right term?) is consistently high.

It would be great if I could load the XML in chunks somehow, as I imagine this would reduce the memory load so that the oom killer doesn't stick its fat nose in and kill my process.

Does anyone have any experience validating a large XML file and capturing where it is badly formed? A lot of the posts I've read point to SAX and XMLReader as things that might solve my problem.
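
For reference, the XMLReader approach mentioned above might look roughly like the sketch below. It is a minimal illustration, not a tested solution for the 400MB file: the function name `checkWellFormed` and the message format are made up for this example. The idea is that XMLReader pulls one node at a time from the file instead of building the whole tree in memory as SimpleXML and DOMDocument do, so memory use should stay roughly flat regardless of file size. One caveat: a well-formedness error is fatal to the parser, so this typically reports where the first problem is and then stops.

```php
<?php
/**
 * Stream through an XML file and return any libxml error messages.
 * An empty array means the parser reached the end without complaint.
 * (Hypothetical helper name, for illustration only.)
 */
function checkWellFormed(string $path): array
{
    if (!is_readable($path)) {
        return ["Cannot read file: $path"];
    }

    // Collect libxml errors internally instead of emitting PHP warnings,
    // remembering the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $messages = [];
    $reader = new XMLReader();

    if ($reader->open($path)) {
        // read() advances one node at a time and returns false at the
        // end of the document or when the parser hits a fatal error.
        while ($reader->read()) {
            // Nothing to do per node: we only care whether parsing succeeds.
        }
        $reader->close();
    } else {
        $messages[] = "Could not open file: $path";
    }

    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}
```

Usage would be something like `foreach (checkWellFormed($tempFileName) as $msg) { echo $msg, PHP_EOL; }`. Note that this reads straight from the file, so there is no need to load the contents into a string like `$xmlstr` first. If DTD validation is also wanted, XMLReader has a `XMLReader::VALIDATE` parser property that can be set with `setParserProperty()` after `open()` and before the first `read()`.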
 