Nokogiri generating invalid HTML?

inam

New Member
I need to process an HTML document and insert some nodes in a few places. The content I'm processing is not valid, but Nokogiri is smart enough to figure out what it should be. The problem is that I don't want to change the original document's formatting, other than the pieces I'm inserting.

Here is an example:

\[code\]
require 'nokogiri'

orig_html = '
<html>
<meta name="Generator" content="Microsoft Word 97 O.o">
<body>
1
<b><p>2</p></b>
3
</body></html>'

puts Nokogiri::HTML(orig_html).inner_html
# >> <html>
# >> <head>
# >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
# >> <meta name="Generator" content="Microsoft Word 97 O.o">
# >> </head>
# >> <body>
# >> 1
# >> <b></b><p>2</p>
# >> 3
# >> </body>
# >> </html>
\[/code\]

I'd like the output to be the same as the input. The problem is that the HTML parser won't let me keep \[code\]<p>\[/code\] inside of \[code\]<b>\[/code\]: it closes the \[code\]<b>\[/code\] early and moves the paragraph out of it. My inclination is to switch to XML, but then there are invalid tags such as the \[code\]<meta>\[/code\] tag, which is not closed off. The HTML parser is smart enough to recognize this, but the XML parser is not.
 