R XML package: dealing with HTML entities encountered in an XML file

Captchas1CEAF

Hello R XML package users,

I am encountering a weird bug while parsing XML. It has to do with encountering HTML entities like mdash and ndash while parsing XML files.

This is the code I use:

[code]
InText = readLines(xmlFileName, n = -1)
Text = xmlValue(xmlRoot(xmlTreeParse(InText, trim = FALSE)))
[/code]

I am currently eliminating entities like mdash and ndash using the following:

[code]
InText = gsub("\\&mdash", " ", InText);
InText = gsub("\\&ndash", " ", InText);
[/code]

But this can get really tedious, given how long the list of possible HTML 4.0 entities is. Any ideas on how I can eliminate these while parsing the XML files?

Thanks a lot for your help and ideas.

Shivani
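For illustration, the per-entity gsub calls above could be collapsed into a single pattern. This is only a sketch in base R under a few assumptions: the helper name stripHtmlEntities is hypothetical (it is not part of the XML package), every named entity is written with its trailing semicolon, and the five entities that XML itself predefines (amp, lt, gt, quot, apos) should be left alone because the parser already understands them. Unlike the gsub calls above, it also removes the semicolon.

```r
# Hypothetical helper, not part of the XML package: replace every named
# entity of the form &name; (for example &mdash; or &ndash;) with a
# replacement string, while keeping the five entities predefined by XML.
stripHtmlEntities <- function(lines, replacement = " ") {
  # Negative lookahead skips &amp; &lt; &gt; &quot; &apos;
  # Numeric references such as &#8212; are not matched and stay untouched.
  pattern <- "&(?!(?:amp|lt|gt|quot|apos);)[A-Za-z][A-Za-z0-9]*;"
  gsub(pattern, replacement, lines, perl = TRUE)
}

# Small made-up input standing in for readLines(xmlFileName, n = -1)
InText <- c("<doc>", "<p>1990&ndash;1995 &mdash; AT&amp;T</p>", "</doc>")
CleanText <- stripHtmlEntities(InText)
```

The cleaned character vector could then be passed to xmlTreeParse in the same way as InText in the original code. Note that this drops the entities rather than translating them to the characters they stand for.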
 