For a project, I need to convert a Wikipedia XML dump into a single plain-text corpus file with one document per line. I have found several tools that split the XML dump into many separate files, but that is not the format I need, and I fear that managing millions of small files would put unnecessary load on my already slow HDD. Can anyone suggest a good program for this?
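
To make the target format concrete, here is a minimal sketch of the kind of output I am after. It assumes the dump follows the usual MediaWiki export layout (`<page>` elements containing a `<title>` and a `<revision>`/`<text>`), and the function names are my own, not from any existing tool. It only streams the pages and flattens each one onto a single line; it does not strip wiki markup, which is the part I would hope a proper program handles.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that namespaced XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def dump_to_lines(xml_source, out):
    """Stream <page> elements from xml_source and write one line per page to out.

    xml_source is a binary file-like object (or a path); out is a text file-like
    object. Returns the number of pages written.
    """
    count = 0
    # iterparse reads the input incrementally instead of loading it all at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        # Collapse all whitespace (including newlines) so the page fits on one line.
        line = re.sub(r"\s+", " ", f"{title} {text}").strip()
        out.write(line + "\n")
        count += 1
        # Drop the page's text and children to limit memory use.
        elem.clear()
    return count
```

With a compressed dump this could be called as `dump_to_lines(bz2.open("dump.xml.bz2", "rb"), open("corpus.txt", "w", encoding="utf-8"))` after `import bz2`; both file names are placeholders. The result is a single output file, which avoids the many-small-files problem, but each line still contains raw wikitext.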