R XML package weird bug while parsing xml and html files

Sezar

New Member
I am using R's XML package to extract all available text from a wide variety of HTML and XML files. These files are mostly documentation, build properties, or readme files. Here is a sample:

\[code\]
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE chapter PUBLIC '-//OASIS//DTD DocBook XML V4.1.2//EN' 'http://www.oasis-open.org/docbook/xml/4.0 docbookx.dtd'>
<chapter lang="en">
  <chapterinfo>
    <author><firstname>Jirka</firstname><surname>Kosek</surname></author>
    <copyright><year>2001</year><holder>Ji&rcaron;&iacute; Kosek</holder></copyright>
    <releaseinfo>$Id: htmlhelp.xml,v 1.1 2002/05/15 17:22:31 isberg Exp $</releaseinfo>
  </chapterinfo>
  <title>Using XSL stylesheets to generate HTML Help</title>
  <?dbhtml filename="htmlhelp.html"?>
  <para>HTML Help (HH) is help-format used in newer versions of MS Windows and applications written for this platform. This format allows to pack several HTML files together with images, table of contents and index into single file. Windows contains browser for this file-format and full-text search is also supported on HH files. If you want know more about HH and its capabilities look at <ulink url="http://msdn.microsoft.com/library/tools/htmlhelp/chm/HH1Start.htm">HTML Help pages</ulink>.</para>
  <section>
    <title>How to generate first HTML Help file from DocBook sources</title>
    <para>Working with HH stylesheets is same as with other XSL DocBook stylesheets. Simply run your favorite XSLT processor on your document with stylesheet suited for HH:</para>
  </section>
</chapter>
\[/code\]

My goal is simply to call xmlValue on the tree after parsing it with htmlTreeParse or xmlTreeParse, using something like this (for XML files):

\[code\]
Text = xmlValue(xmlRoot(xmlTreeParse(XMLFileName)))
\[/code\]

However, I hit the same problem with both XML and HTML files. If a node has descendants two or more levels deep, their text fields are pasted together with no space in between. For example, in the sample above, xmlValue of the chapterinfo node is:

\[code\]
JirkaKosek2001JiKosek$Id: htmlhelp.xml,v 1.1 2002/05/15 17:22:31 isberg Exp 
\[/code\]

The values of all child nodes are concatenated recursively, without any whitespace between them. How can I get xmlValue to insert whitespace between the pieces while extracting this data?

Thanks a lot for your help in advance,
Shivani
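
A minimal sketch of the behaviour described above, together with one possible workaround. It assumes a trimmed copy of the chapterinfo element held in a string (the name `sample_xml` and the other variable names are hypothetical, and the entity references are left out so the string parses without the DTD). The idea, offered as an assumption rather than a confirmed fix, is to parse into an internal document with `xmlParse`, collect every text node with the XPath expression `//text()`, and join the pieces with a space.

```r
library(XML)

# Hypothetical inline sample: a trimmed copy of the chapterinfo element
sample_xml <- paste0(
  "<chapterinfo>",
  "<author><firstname>Jirka</firstname><surname>Kosek</surname></author>",
  "<copyright><year>2001</year></copyright>",
  "</chapterinfo>"
)

# xmlParse builds an internal document, which the XPath functions need
doc <- xmlParse(sample_xml, asText = TRUE)

# xmlValue on the root joins all descendant text with no separator
joined <- xmlValue(xmlRoot(doc))

# Workaround sketch: take each text node on its own, then paste with a space
pieces <- xpathSApply(doc, "//text()", xmlValue)
spaced <- paste(pieces, collapse = " ")

print(joined)
print(spaced)
```

With real files, whitespace-only text nodes between elements may also show up in `pieces`, so trimming and dropping empty strings before pasting is probably worthwhile.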
 