Tricky XSLT recursive tree transformation

xslt

(Note: If you do well in a functional programming language other than XSLT, this problem might still be for you – it's not necessarily the XSLT that makes this tricky for me.)

How would I transform this:

<!-- note that empty cells are implicit and therefore nonexistent -->
<tables>
  <table pos="1">
    <row pos="1">
      <cell pos="1">Category A</cell>
      <cell pos="10">Category B</cell>
    </row>
    <row pos="2">
      <cell pos="1">Sub-Category 1</cell>
      <cell pos="4">Sub-Category 2</cell>
      <cell pos="6">Sub-Category 3</cell>
      <cell pos="10">Sub-Category 1</cell>
    </row>
  </table>
  <table pos="2">
    <row pos="1">
      <cell pos="1">Category A</cell>
      <cell pos="11">Category B</cell>
    </row>
    <row pos="2">
      <cell pos="1">Sub-Category 1</cell>
      <cell pos="2">Sub-Category 2</cell>
      <cell pos="4">Sub-Category 3</cell>
      <cell pos="10">Sub-Category 4</cell>
      <cell pos="11">Sub-Category 1</cell>
      <cell pos="12">Sub-Category 2</cell>
    </row>
  </table>
</tables>

To this:

<tree>
  <node label="Category A">
    <positions>
      <pos table="1" row="1" cell="1" />
      <pos table="2" row="1" cell="1" />
    </positions>
    <node label="Sub-Category 1">
      <positions>
        <pos table="1" row="2" cell="1" />
        <pos table="2" row="2" cell="1" />
      </positions>
    </node>
    <node label="Sub-Category 2">
      <positions>
        <pos table="1" row="2" cell="4" />
        <pos table="2" row="2" cell="2" />
      </positions>
    </node>
    <node label="Sub-Category 3">
      <positions>
        <pos table="1" row="2" cell="6" />
        <pos table="2" row="2" cell="4" />
      </positions>
    </node>
    <node label="Sub-Category 4">
      <positions>
        <pos table="2" row="2" cell="10" />
      </positions>
    </node>
  </node>
  <node label="Category B">
    <positions>
      <pos table="1" row="1" cell="10" />
      <pos table="2" row="1" cell="11" />
    </positions>
    <node label="Sub-Category 1">
      <positions>
        <pos table="1" row="2" cell="10" />
        <pos table="2" row="2" cell="11" />
      </positions>
    </node>
    <node label="Sub-Category 2">
      <positions>
        <pos table="2" row="2" cell="12" />
      </positions>
    </node>
  </node>
</tree>

Note that each table represents an arbitrarily deep 2D category tree. Each row represents the children of the previous row, where child categories are attached to parents by their respective positions – they are a child if their position is >= the parent to the left and < the next parent to the right (if there is one).

I'd like the output to be a tree, grouped by label (only within the described parent-child relationship, but through tables). For example, "Sub-Category 1" exists separately for both "Category A" and "Category B".

There are n tables with n rows with n cells each. You could imagine the input data structure as a 3D cube, each table representing roughly the same data for a different year.

I can do the above if it was just one table ("year"), but with n tables I have a big problem finding a way of applying the thing, due to the varying positions of the otherwise equal parents.

In the end I'm interested in an extension function free XSLT 1.0 solution, though any (algorithmic) help is greatly appreciated. This problem is haunting me for quite a while now and I'm seemingly unable to wrap my head around it. I feel there must be a clean solution for this, I just can't see it. I'm confident this can be done with a single recursive template, a few keys and some really clever XPath.

I suppose this question is material for the tumbleweed badge, though. 🙂

Best Solution

So the cells in each row are actually the children of the cells in the previous row where the "pos" value of the child cell is >= the "pos" value of the parent cell and < the "pos" value of the subsequent parent (if any) on the previous row?

If you're hell-bent on using XSL for this then take a look at the following-sibling axis. However, it's going to be one messy selection path. Probably easier to do two transforms: one that organizes the flat data into a hierarchical format, and one that produces the final output.

Personally, I think I'd write a program that explicitly processes the DOM tree, particularly if this is a one-off operation. I suspect that it would be just as fast to write, and not have the chance for strange corner cases in the selection logic.

Related Question