Exit and reconstruct element(s) at <pgBreak>

freee

New Member
I am trying to do this with XSLT, though I might fall back to Python since I am more familiar with it than with XSL. I'd still like to accomplish this in XSL if possible. I asked a similar question a few months ago and received a great answer. However, using that answer with more data, I am running into trouble, so I thought I'd ask a new question.

I have an XML file, and I am trying to run an XSL transformation to produce an HTML5 file. Most of the transformation is straightforward: a <table.cals> becomes a <table>, a <quote> becomes a <blockquote>, and a <p> stays a <p>. However, a <pgBreak> can occur within these elements. When this happens I need to close the open element(s), traversing up the hierarchy, and then reconstruct the hierarchy on the other side of the <pgBreak>. The samples below should make this clear.

Input XML

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <pgBreak pgId="i"/>
  <p id="doc-001">
    <highlight rend="italic">Bacon ipsum dolor sit amet</highlight> bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail.
    <note.ref id="0001"><super>1</super></note.ref>
    <note id="0001">
      <p> You may need to consult a <highlight rend="italic">latin</highlight> butcher. Good Luck. </p>
    </note>
    Pork loin <pgBreak pgId="ii"/> ribeye bacon pastrami drumstick sirloin, shoulder pig jowl. Salami brisket rump ham, tail hamburger strip steak pig ham hock short ribs jerky shank beef spare ribs. Capicola short ribs swine beef meatball jowl pork belly. Doner leberkas short ribs, flank chuck pancetta bresaola bacon ham hock pork hamburger fatback.
  </p>
  <p id="doc-002">
    Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail.
  </p>
  <pgBreak pgId="01"/>
  <p id="doc-003">
    Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip tongue.
    <quote>
      <p> 1. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
      <p> 2. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin <pgBreak pgId="02"/>turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
      <p> 3. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
    </quote>
  </p>
  <p id="doc-004">
    <figure>
      <title>Table 1.1</title>
      <table.cals>
        <tgroup cols="3">
          <colspec colnum="1" colname="col1" align="right"/>
          <colspec colnum="2" colname="col2" align="center"/>
          <colspec colnum="3" colname="col2" align="left"/>
        </tgroup>
        <thead>
          <row>
            <entry> <p>Animal</p> </entry>
            <entry> <p>Sandwhich</p> </entry>
            <entry> <p>Cost</p> </entry>
          </row>
        </thead>
        <tbody>
          <row>
            <entry> <p>Cow</p> </entry>
            <entry> <p>Brisket</p> </entry>
            <entry> <p>$4.99</p> </entry>
          </row>
          <pgBreak pgId="3"/>
          <row>
            <entry> <p>Pig</p> </entry>
            <entry> <p>Pulled Pork</p> </entry>
            <entry> <p>$4.99</p> </entry>
          </row>
        </tbody>
      </table.cals>
    </figure>
  </p>
</root>

Output HTML

<?xml version="1.0" encoding="utf-8"?>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Test</title>
  </head>
  <body>
    <div id="pg-i">Page: i</div>
    <p data-id="doc-001">
      <span class="highlight-italic">Bacon ipsum dolor sit amet</span> bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail.
      <span class="noteRef" id="0001"> <sup>1</sup> </span>
    </p>
    <div id="note-0001">
      <p> You may need to consult a <span class="highlight-italic">latin</span> butcher. Good Luck. </p>
    </div>
    <p data-id="doc-002"> Pork loin </p>
    <div id="pg-ii">Page: ii</div>
    <p data-id="doc-002"> ribeye bacon pastrami drumstick sirloin, shoulder pig jowl. Salami brisket rump ham, tail hamburger strip steak pig ham hock short ribs jerky shank beef spare ribs. Capicola short ribs swine beef meatball jowl pork belly. Doner leberkas short ribs, flank chuck pancetta bresaola bacon ham hock pork hamburger fatback. </p>
    <div id="pg-01">Page: 01</div>
    <p data-id="doc-003"> Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip tongue. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
    <p data-id="doc-004">
      Bacon ipsum dolor sit amet bacon chuck pastrami swine pork rump, shoulder beef ribs doner tri-tip tongue.
      <blockquote>
        <p> 1. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin </p>
      </blockquote>
    </p>
    <div id="pg-02">Page: 02</div>
    <p data-id="doc-004">
      <blockquote>
        <p>turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
        <p> 2. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
        <p> 3. Tri-tip ground round short ribs capicola meatloaf shank drumstick short loin pastrami t- bone. Sirloin turducken short ribs t-bone andouille strip steak pork loin corned beef hamburger bacon filet mignon pork chop tail. </p>
      </blockquote>
    </p>
    <p data-id="doc-005">
      <div class="figure">
        <h3> Table 1.1</h3>
        <table>
          <thead>
            <tr> <th>Animal</th> <th>Sandwhich</th> <th>Price</th> </tr>
          </thead>
          <tbody>
            <tr> <td>Cow</td> <td>Brisket</td> <td>$4.99</td> </tr>
          </tbody>
        </table>
      </div>
    </p>
    <div id="pg-03">Page: 03</div>
    <p data-id="doc-005">
      <div class="figure">
        <table>
          <tbody>
            <tr> <td>Pig</td> <td>Pulled Pork</td> <td>$4.99</td> </tr>
          </tbody>
        </table>
      </div>
    </p>
  </body>
</html>

I was using the following XSL file. Besides not handling the <table.cals>, it breaks at each paragraph: on page 2, instead of 1 <blockquote> containing 3 <p>'s, I wound up with 3 <blockquote>'s and 3 <p>'s.

XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <html>
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <title>Test</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="p | div">
    <xsl:variable name="breaks" select="note | pgBreak | quote" />
    <xsl:variable name="firstNonBreak" select="node()[count(. | $breaks) != count($breaks)][2]" />
    <xsl:variable name="nonBreaksAfterBreak" select="$breaks/following-sibling::node()[1][count(. | $breaks) != count($breaks)]" />
    <xsl:apply-templates select="$breaks | $firstNonBreak | $nonBreaksAfterBreak" mode="sectChild" />
  </xsl:template>

  <!-- Splitting types - notes, page breaks, quotes -->
  <xsl:template match="pgBreak" mode="sectChild">
    <div id="pg-{@pgId}">
      <xsl:value-of select="concat('Page ', @pgId)"/>
    </div>
  </xsl:template>

  <xsl:template match="quote | note" mode="sectChild">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Receives the first node of each block of content outside of the splitting types and passes processing onto itself and siblings within its block-->
  <xsl:template match="text() | highlight | note.ref | super" mode="sectChild">
    <xsl:variable name="content">
      <xsl:apply-templates select="." mode="buildContent" />
    </xsl:variable>
    <xsl:if test="normalize-space($content)">
      <xsl:call-template name="Nest">
        <xsl:with-param name="hierarchy" select="ancestor::*[not(self::root)]" />
        <xsl:with-param name="content" select="$content" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Recursive template to output nodes from the top level down to content -->
  <xsl:template name="Nest">
    <xsl:param name="topLevel" select="true()"/>
    <xsl:param name="hierarchy" />
    <xsl:param name="content" />
    <xsl:variable name="top" select="$hierarchy[1]" />
    <xsl:variable name="remainder" select="$hierarchy[position() > 1]" />
    <!-- If there's a quote or note yet to come, don't output tags until we get there -->
    <xsl:variable name="skipTags" select="boolean($remainder[self::quote or self::note])" />
    <!-- Recursive output is captured in a variable, to be output later in this template -->
    <xsl:variable name="inside">
      <xsl:if test="$hierarchy">
        <xsl:call-template name="Nest">
          <xsl:with-param name="topLevel" select="$topLevel and $skipTags" />
          <xsl:with-param name="hierarchy" select="$remainder" />
          <xsl:with-param name="content" select="$content" />
        </xsl:call-template>
      </xsl:if>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="not($hierarchy)">
        <xsl:copy-of select="$content" />
      </xsl:when>
      <xsl:when test="$top/self::quote">
        <blockquote>
          <xsl:copy-of select="$inside"/>
        </blockquote>
      </xsl:when>
      <xsl:when test="$top/self::note">
        <div id="note-{$top/@id}">
          <xsl:copy-of select="$inside"/>
        </div>
      </xsl:when>
      <xsl:when test="not($skipTags)">
        <xsl:element name="{name($top)}">
          <xsl:if test="$topLevel"> </xsl:if>
          <xsl:copy-of select="$inside"/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="$inside"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="node()" mode="buildContent">
    <xsl:if test="not(self::note or self::quote or self::pgBreak)">
      <!-- output this node -->
      <xsl:apply-templates select="self::node()[normalize-space(.)]" mode="contentOutput" />
      <!-- pass processing onto next sibling -->
      <xsl:apply-templates select="following-sibling::node()[1]" mode="buildContent" />
    </xsl:if>
  </xsl:template>

  <!-- Bottom level content - text, note refs, superscript, highlight-->
  <xsl:template match="text()" mode="contentOutput">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="note.ref" mode="contentOutput">
    <span class="noteRef" id="{@id}">
      <xsl:apply-templates mode="contentOutput"/>
    </span>
  </xsl:template>

  <xsl:template match="super" mode="contentOutput">
    <sup>
      <xsl:apply-templates mode="contentOutput"/>
    </sup>
  </xsl:template>

  <xsl:template match="highlight" mode="contentOutput">
    <xsl:variable name="class" select="concat(name(.),'-',string(@rend))"/>
    <span class="{$class}">
      <xsl:apply-templates mode="contentOutput"/>
    </span>
  </xsl:template>
</xsl:stylesheet>

Finally, I should note that I have an XSL for the <table.cals>, but it's rather long, and since it is a common transformation I am linking to it here. Note that this too isn't handled correctly by the current XSL. Thanks.
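
For comparison, the Python fallback mentioned at the top could be sketched roughly as follows. This is only an illustration of the "close everything, emit the break, reopen everything" idea, not a finished converter: the function name split_at_breaks and the helper _shell are made up for this sketch, it uses only the standard-library xml.etree.ElementTree, it keeps the source tag names instead of mapping them to HTML, and it does not prune empty fragments left behind when a break sits at the very start or end of an element.

```python
import xml.etree.ElementTree as ET


def _shell(elem):
    """Copy an element's tag and attributes, without text or children."""
    return ET.Element(elem.tag, dict(elem.attrib))


def split_at_breaks(elem, break_tag="pgBreak"):
    """Split elem at every descendant break_tag.

    Returns a list in document order that alternates between fragments
    (partial copies of elem) and copies of the break elements. The list
    always starts and ends with a fragment.
    """
    pieces = []
    current = _shell(elem)
    current.text = elem.text
    for child in elem:
        if child.tag == break_tag:
            # Close the fragment built so far, emit the break, reopen.
            pieces.append(current)
            pieces.append(_shell(child))
            current = _shell(elem)
            current.text = child.tail  # text after the break starts the new fragment
            continue
        # Recurse: a break deeper down splits this level as well.
        for part in split_at_breaks(child, break_tag):
            if part.tag == break_tag:
                pieces.append(current)
                pieces.append(part)
                current = _shell(elem)
            else:
                current.append(part)
        # The text following the child belongs after its last fragment.
        current[-1].tail = child.tail
    pieces.append(current)
    return pieces


if __name__ == "__main__":
    sample = ('<p id="doc-003">intro <quote><p>2. Sirloin '
              '<pgBreak pgId="02"/>turducken</p><p>3. Tri-tip</p></quote></p>')
    for piece in split_at_breaks(ET.fromstring(sample)):
        print(ET.tostring(piece, encoding="unicode"))
```

Each fragment repeats the full ancestor chain (here p, then quote, then p), so the quote is reopened once after the break instead of once per paragraph. Mapping the resulting fragments to <blockquote>, <table> and so on would be a separate pass.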
 