A proposal for encoding stemmata codicum in XML


David J. Birnbaum, University of Pittsburgh

Department of Slavic Languages and Literatures
1417 Cathedral of Learning
University of Pittsburgh
Pittsburgh, PA 15260
Email: djbpitt@pitt.edu
URL: http://clover.slavic.pitt.edu/~djb/

Generated: 2008-08-16T14:36:15.203-04:00


Abstract: The present report offers a proposal for encoding a stemma codicum, or tree representation of manuscript transmission, in XML.


Contents

1. What is a stemma codicum?
1.1. General
1.2. Nodes
1.3. Descent
1.4. Contamination
2. The problem
3. A proposal
3.1. General
3.2. Nodes
3.3. Descent
3.4. Contamination
3.5. Faking ID/IDREF validation
3.6. TEI Conformance
4. Rendering
5. Evaluation of variation
6. Conclusions


1. What is a stemma codicum?

1.1. General

A stemma codicum (sometimes called just stemma) is a tree-like graphic structure that has become traditional in manuscript studies for representing textual transmission. Consider the following hypothetical stemma:

[Figure: hypothetical stemma (stemma_beige.png)]

1.2. Nodes

The nodes in this stemma represent extant manuscripts (upper-case Latin letters or words beginning with upper-case Latin letters, e.g., “L”; colored aqua in this example), lost manuscripts (lower-case Latin letters, e.g., “t”; colored magenta in this example), or hypothetical stages in the textual transmission (lower-case Greek letters, e.g., “α”; colored gold in this example). Extant manuscripts are existing physical manuscripts. Lost manuscripts are those that once existed but no longer do, so their readings are typically taken from editions or notes made from them before their loss. Hypothetical stages do not necessarily correspond to real manuscripts. In the example above, scholars might conclude from similarities in the readings of the extant and lost manuscripts that “L” and “t” share textual material not shared with other manuscripts. They would represent that shared stage as “δ”, even though no physical manuscript attesting this stage in the textual transmission has ever been identified.

1.3. Descent

Manuscripts are copied from other manuscripts. In the preceding stemma, we hypothesize that all manuscripts go back to a common ancestor (“α”), that the tradition split after that stage into two (“β” and “γ”), and so on. Descent by copying is indicated with a solid line. According to this model, “α” is the earliest common hypothetical stage that can be reconstructed. Every node below “α” has a single parent, that is, it was copied from a single other stage in the tradition (see the discussion of the dotted line between “γ” and “A” immediately below).

1.4. Contamination

This familiar tree model is complicated because manuscripts sometimes show the influence of more than one ancestor. They may have been copied from one manuscript by a scribe who checked the text in a different manuscript of the same work as he was copying, or perhaps made changes from his memory of a slightly different version of the text that he had read elsewhere. Alternatively, perhaps scribe A copied a manuscript from one source, scribe B made changes in it in the margins or between the lines (either by consulting another source directly or from memory), and another scribe then copied that manuscript, incorporating the changes into the body. Whatever the specific scenario, the result is that it is not uncommon for a manuscript to be based primarily on one source, but to incorporate features of another branch of the tradition. This mixed result is called contamination, and it is reflected in a stemma by a dotted line. Thus, the example above asserts that “A” is copied within the “ε” tradition, but is also contaminated from the “γ” tradition.

The utility of a stemma as a visualization tool is inversely proportional to the degree of contamination in the manuscript tradition. A tradition completely without contamination (called a closed tradition) yields a classic tree, which is easily modeled graphically by a stemma. An open tradition, with substantial contamination, yields a spaghetti-like stemma characterized by crossing dotted lines, which is both difficult to read and not very informative.

2. The problem

Because stemmata are usually imagined graphically, it is tempting to encode a stemma in XML [eXtensible Markup Language] using a graphic description vocabulary, such as SVG [Scalable Vector Graphics]. The problem with this approach is that it encodes the appearance of the stemma but fails to describe its conceptual structure. This makes it difficult to do anything with the information other than render it graphically in a single, particular way. Users who wish to query the stemma to find, for example, all instances of contamination cannot easily do so if all that has been encoded is letters and lines and circles and points in coordinate space. The principal goal of this report is to provide a schema for encoding a stemma according to its descriptive properties, without regard to presentation. At the same time, because users are likely to want to render stemmata, the schema should be amenable to conversion to SVG or some other graphic description framework.

3. A proposal

3.1. General

A stemma model must be capable of encoding nodes (with typing information), descent, and contamination, as described above. A model for the stemma illustrated above might be the following:

<node n="α" type="hypothetical">
    <node n="β" type="hypothetical">
        <node n="δ" type="hypothetical">
            <node n="L" type="extant"/>
            <node n="t" type="lost"/>
        </node>
        <node n="ε" type="hypothetical">
            <node n="R" type="extant"/>
            <node n="A" type="extant"/>
        </node>
    </node>
    <node n="γ" type="hypothetical">
        <contaminates target="A"/>
        <node n="I" type="extant"/>
        <node n="X" type="extant"/>
    </node>
</node>

A Relax NG compact syntax schema for this model is:

start = node
node = element node { n, type, (node | contaminates)* }
n = attribute n { text }
type = attribute type { "hypothetical" | "extant" | "lost" }
contaminates = element contaminates { target, empty }
target = attribute target { text }
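Because the model is descriptive, the query mentioned in Section 2 reduces to a short tree walk. That query is finding all instances of contamination. The following is a minimal sketch in Python that uses only the standard-library xml.etree.ElementTree module. The helper name contamination_pairs is our own illustration and is not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The sample stemma from Section 3.1, inlined so the sketch is self-contained.
SAMPLE = """\
<node n="α" type="hypothetical">
  <node n="β" type="hypothetical">
    <node n="δ" type="hypothetical">
      <node n="L" type="extant"/>
      <node n="t" type="lost"/>
    </node>
    <node n="ε" type="hypothetical">
      <node n="R" type="extant"/>
      <node n="A" type="extant"/>
    </node>
  </node>
  <node n="γ" type="hypothetical">
    <contaminates target="A"/>
    <node n="I" type="extant"/>
    <node n="X" type="extant"/>
  </node>
</node>"""


def contamination_pairs(root):
    """Hypothetical helper: (source, target) sigla for every <contaminates>.

    ElementTree elements have no parent pointer, so we visit each <node>
    (the root included) and inspect its direct <contaminates> children.
    """
    return [
        (node.get("n"), c.get("target"))
        for node in root.iter("node")
        for c in node.findall("contaminates")
    ]


root = ET.fromstring(SAMPLE)
print(contamination_pairs(root))  # [('γ', 'A')]
# The same style of query answers other questions, e.g. which witnesses are lost:
print([n.get("n") for n in root.iter("node") if n.get("type") == "lost"])  # ['t']
```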

3.2. Nodes

In this model, nodes are encoded as <node> elements. <node> elements are typed with a @type attribute, which has the possible values extant, lost, and hypothetical. <node> elements also have a unique identifier attribute @n, which can be used to record the manuscript’s siglum. If one wished, one could use a Relax NG pattern facet or a Schematron assertion to verify that the names are consistent with one’s editorial policy. For example, one could verify that the names of hypothetical nodes are always lower-case Greek letters. The following Relax NG compact syntax schema imposes three requirements on the value of the @n attribute. For a node representing a hypothetical stage in the transmission, it must be a single lower-case Greek letter. For a node representing an extant manuscript, it must be a single upper-case Latin letter. For a node representing a lost manuscript, it must be a single lower-case Latin letter. These naming conventions are not universal in manuscript studies. They are intended only as an example of how users can adjust the patterns to suit the needs of their local projects.

start = node
node = element node { atts_node, (node | contaminates)* }
atts_node =
    (attribute type { "hypothetical" },
     attribute n {
         xsd:token { pattern="\p{IsGreek}" pattern="\p{Ll}"} # lower-case Greek
     })
    | (attribute type { "extant" },
       attribute n {
           xsd:token { pattern="\p{IsBasicLatin}" pattern="\p{Lu}"} # upper-case Latin
       })
    | (attribute type { "lost" },
       attribute n {
           xsd:token { pattern="\p{IsBasicLatin}" pattern="\p{Ll}"} # lower-case Latin
       })
contaminates = element contaminates { target, empty }
target = attribute target { text }
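The same naming policy can be checked outside a Relax NG validator, for example in a preprocessing script. Python's re module has no \p{...} property classes, so this minimal sketch uses a different approach. It tests the Unicode block by code-point range (U+0370 to U+03FF for the block XSD calls IsGreek) and the case with unicodedata.category. The function siglum_ok and the list SAMPLE_NODES are hypothetical illustrations.

```python
import unicodedata


def siglum_ok(siglum: str, node_type: str) -> bool:
    """Hypothetical check mirroring the sample naming policy of the schema above."""
    # The sample policy allows exactly one character per siglum.
    if len(siglum) != 1:
        return False
    # Python's re lacks \p{...}, so use the Unicode general category instead:
    # 'Ll' = lower-case letter, 'Lu' = upper-case letter.
    cat = unicodedata.category(siglum)
    if node_type == "hypothetical":
        # Greek block, U+0370..U+03FF, lower-case
        return "\u0370" <= siglum <= "\u03ff" and cat == "Ll"
    if node_type == "extant":
        # Basic Latin block (ASCII), upper-case
        return siglum.isascii() and cat == "Lu"
    if node_type == "lost":
        # Basic Latin block (ASCII), lower-case
        return siglum.isascii() and cat == "Ll"
    return False  # unknown node type


# (siglum, type) pairs taken from the sample stemma
SAMPLE_NODES = [
    ("α", "hypothetical"), ("β", "hypothetical"), ("γ", "hypothetical"),
    ("δ", "hypothetical"), ("ε", "hypothetical"),
    ("L", "extant"), ("R", "extant"), ("A", "extant"), ("I", "extant"), ("X", "extant"),
    ("t", "lost"),
]

print(all(siglum_ok(s, t) for s, t in SAMPLE_NODES))              # True
print(siglum_ok("Ω", "hypothetical"), siglum_ok("LL", "extant"))  # False False
```

As in the Relax NG version, a project with different conventions would change only the individual tests, not the structure of the check.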

Although the value of the @n attribute may be used directly (for example, to represent the manuscript identifier when rendering using SVG), its main function is to serve simultaneously as a pointer and as a target for pointing (see below). Because the single attribute needs to perform both of these functions, it is not practical to make its type either ID or IDREF, which means that any validation similar to that provided by ID/IDREF checking must be handled separately. This issue is discussed below.

3.3. Descent

Because both stemmata and XML documents are basically trees, the most natural way of modeling descent in a stemma is to use the XML hierarchy. Accordingly, <node> elements may contain other <node> elements, with the understanding that when <node> element A directly contains <node> element B, <node> element A is the parent of <node> element B in the stemma. Because the leaf nodes of the tree are also <node> elements, <node> elements may also be empty.
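Because descent is carried entirely by containment, a program can recover the stemmatic parent of every node without any pointers. The following minimal Python sketch builds a parent map from the sample stemma and follows it from a witness up to the archetype. The helper names parent_map and lineage are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<node n="α" type="hypothetical">
  <node n="β" type="hypothetical">
    <node n="δ" type="hypothetical">
      <node n="L" type="extant"/>
      <node n="t" type="lost"/>
    </node>
    <node n="ε" type="hypothetical">
      <node n="R" type="extant"/>
      <node n="A" type="extant"/>
    </node>
  </node>
  <node n="γ" type="hypothetical">
    <contaminates target="A"/>
    <node n="I" type="extant"/>
    <node n="X" type="extant"/>
  </node>
</node>"""


def parent_map(root):
    """Hypothetical helper: map each siglum to its stemmatic parent's siglum.

    A <node> that directly contains another <node> is that node's parent;
    the root (the archetype) maps to None.
    """
    parents = {root.get("n"): None}
    for node in root.iter("node"):
        for child in node.findall("node"):  # direct <node> children only
            parents[child.get("n")] = node.get("n")
    return parents


def lineage(siglum, parents):
    """Follow parent links from a witness up to the archetype."""
    chain = []
    while siglum is not None:
        chain.append(siglum)
        siglum = parents[siglum]
    return chain


root = ET.fromstring(SAMPLE)
parents = parent_map(root)
print(lineage("A", parents))  # ['A', 'ε', 'β', 'α']
# Leaves are the <node> elements that have no <node> children.
print([n.get("n") for n in root.iter("node") if n.find("node") is None])
# ['L', 't', 'R', 'A', 'I', 'X']
```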

3.4. Contamination

Because contamination interferes with the basic tree structure of the stemma, it must be modeled by means other than XML containment. Accordingly, where <node> element A contaminates <node> element B, this model places an empty <contaminates> element inside A. That element has a @target attribute whose value is the value of the @n attribute of <node> element B. The @target attribute takes a single value. If A contaminates more than one node, each target is recorded in its own <contaminates> element. The value of the @target attribute on a <contaminates> element is also the value of the @n attribute of the <node> element that is the target of the contamination. In this sense, the @n attribute functions similarly to an XML ID and the @target attribute similarly to an XML IDREF.

In this model, true children and targets of contamination are both recorded on the source node through XML containment. A child in the stemma is a child <node> element, and a target of contamination is named by a child <contaminates> element. A node (manuscript or hypothetical stage in the transmission) may contaminate more than one other node, and a node may also be contaminated by more than one other node.
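Since a node may contaminate several nodes and be contaminated by several, a convenient derived view indexes contamination by target. The following minimal Python sketch builds that index and also tests whether a tradition is closed in the sense of Section 1.4, that is, whether it contains no contamination at all. The helper names and the extra δ-to-A contamination added in the usage are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """\
<node n="α" type="hypothetical">
  <node n="β" type="hypothetical">
    <node n="δ" type="hypothetical">
      <node n="L" type="extant"/>
      <node n="t" type="lost"/>
    </node>
    <node n="ε" type="hypothetical">
      <node n="R" type="extant"/>
      <node n="A" type="extant"/>
    </node>
  </node>
  <node n="γ" type="hypothetical">
    <contaminates target="A"/>
    <node n="I" type="extant"/>
    <node n="X" type="extant"/>
  </node>
</node>"""


def contaminated_by(root):
    """Hypothetical helper: {target siglum: [contaminating sigla]}.

    A list per target keeps the case where one node is contaminated by several
    others; a source with several targets appears once per <contaminates>.
    """
    index = defaultdict(list)
    for node in root.iter("node"):
        for c in node.findall("contaminates"):
            index[c.get("target")].append(node.get("n"))
    return dict(index)


def is_closed(root):
    """A closed tradition has no <contaminates> anywhere: a pure tree."""
    return root.find(".//contaminates") is None


root = ET.fromstring(SAMPLE)
print(contaminated_by(root))  # {'A': ['γ']}
print(is_closed(root))        # False

# Hypothetical variant: let δ also contaminate A, so A has two contaminators.
delta = next(n for n in root.iter("node") if n.get("n") == "δ")
ET.SubElement(delta, "contaminates", target="A")
print(contaminated_by(root))  # {'A': ['δ', 'γ']}

# Hypothetical closed tradition: no <contaminates> at all.
closed = ET.fromstring('<node n="ω" type="hypothetical">'
                       '<node n="B" type="extant"/><node n="C" type="extant"/></node>')
print(is_closed(closed))      # True
```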

3.5. Faking ID/IDREF validation

A stemma may be used in systems where full manuscript descriptions (such as TEI [Text Encoding Initiative] <msDesc> elements) are also encoded, either in the same document or elsewhere. In that case the @n attribute on a <node> element can function as a pointer to an <msDesc> element with the same value for its @n attribute. In this sense the @n attribute on the <node> element functions similarly to an IDREF, while the corresponding @n attribute on the <msDesc> element functions similarly to an ID. Because the @n attribute on the <node> element may also function as an ID (see the discussion of Contamination immediately above), neither the ID nor the IDREF datatype is appropriate. Note also that IDs must be unique in a document, and a <node> element in a stemma may have the same @n attribute value as an <msDesc> element elsewhere in the document. For that reason, the use of ID attributes could make the document invalid.

The decision to use plain text for the value of the @n and @target attributes is thus based on three considerations:

  1. The values of @n attributes might not conform to the XML rules concerning name characters and name start characters (a siglum might begin with a digit, for example).
  2. @n attributes may function simultaneously as both pointers (like IDREFs) and targets of pointing (like IDs).
  3. @n attributes might not be unique in a document if they are used to identify both <node> elements in a stemma and <msDesc> elements elsewhere in that same document.

Stemmata involve two kinds of pointing: from <contaminates> elements to the <node> elements that are the targets of contamination, and from <node> elements to <msDesc> elements elsewhere. It might appear that the decision not to use ID/IDREF attribute types for either kind sacrifices a useful feature of XML. In practice, ID/IDREF validation is so limited that little is lost. For example, ID/IDREF validation of the pointing of a <contaminates> element to a <node> element can confirm only that the document contains an element (any element) with a corresponding ID. It cannot ensure that the element in question is a <node>. Meaningful validation requires a rules-based system like Schematron and cannot be expressed in a grammar-based schema language. Examples of such validation are checking that <contaminates> elements point only to <node> elements and that <node> elements point only to <msDesc> elements. The following Relax NG XML syntax schema uses embedded Schematron rules to ensure two things. First, the values of the @n attributes of all <node> elements in the document are unique. Second, the value of the @target attribute of every <contaminates> element matches the @n attribute of some <node> element in the document:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" xmlns:s="http://www.ascc.net/xml/schematron"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="node"/>
  </start>
  <define name="node">
    <element name="node">
      <s:pattern name="Value of @n attribute must be unique in document">
        <s:rule context="node">
          <s:report test="count(//node[@n = current()/@n]) &gt; 1">The value of an @n attribute of 
            an element of type <s:name/> must be unique in the document.</s:report>
        </s:rule>
      </s:pattern>
      <ref name="atts_node"/>
      <zeroOrMore>
        <choice>
          <ref name="node"/>
          <ref name="contaminates"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
  <define name="atts_node">
    <choice>
      <!-- hypothetical stages are in lower-case Greek -->
      <group>
        <attribute name="type">
          <value>hypothetical</value>
        </attribute>
        <attribute name="n">
          <data type="token">
            <param name="pattern">\p{IsGreek}</param>
            <param name="pattern">\p{Ll}</param>
          </data>
        </attribute>
      </group>
      <!-- extant manuscripts are in upper-case Latin -->
      <group>
        <attribute name="type">
          <value>extant</value>
        </attribute>
        <attribute name="n">
          <data type="token">
            <param name="pattern">\p{IsBasicLatin}</param>
            <param name="pattern">\p{Lu}</param>
          </data>
        </attribute>
      </group>
      <!-- lost manuscripts are in lower-case Latin  -->
      <group>
        <attribute name="type">
          <value>lost</value>
        </attribute>
        <attribute name="n">
          <data type="token">
            <param name="pattern">\p{IsBasicLatin}</param>
            <param name="pattern">\p{Ll}</param>
          </data>
        </attribute>
      </group>
    </choice>
  </define>
  <define name="contaminates">
    <element name="contaminates">
      <s:pattern name="Value of @target attribute must point to node/@n in document">
        <s:rule context="contaminates">
          <s:assert test="@target=//node/@n">The value of a @target attribute of an element of
            type <s:name/> must point to the @n attribute of an element of type node in the
            document.</s:assert>
        </s:rule>
      </s:pattern>
      <ref name="target"/>
      <empty/>
    </element>
  </define>
  <define name="target">
    <attribute name="target"/>
  </define>
</grammar>
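Where a Schematron-aware validator is not available, the two embedded rules can be approximated in a script. The following minimal Python sketch reports the same two conditions. The function name check_stemma and its message strings are hypothetical. Like the Schematron rules, it checks that a target names a <node> specifically, which IDREF checking cannot guarantee.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_stemma(root):
    """Hypothetical stand-in for the two embedded Schematron rules above.

    Returns a list of messages; an empty list means both rules pass.
    """
    errors = []
    sigla = [node.get("n") for node in root.iter("node")]
    # Rule 1: node/@n must be unique among <node> elements only
    # (an <msDesc> elsewhere may reuse the value, unlike a true ID).
    for siglum, count in Counter(sigla).items():
        if count > 1:
            errors.append(f"duplicate node/@n: {siglum}")
    # Rule 2: every contaminates/@target must equal the @n of some <node>.
    known = set(sigla)
    for c in root.iter("contaminates"):
        if c.get("target") not in known:
            errors.append(f"dangling contaminates/@target: {c.get('target')}")
    return errors


GOOD = ('<node n="α" type="hypothetical">'
        '<node n="β" type="hypothetical"><contaminates target="C"/>'
        '<node n="A" type="extant"/><node n="B" type="extant"/></node>'
        '<node n="C" type="extant"/></node>')
BAD = ('<node n="α" type="hypothetical">'
       '<node n="A" type="extant"><contaminates target="Q"/></node>'
       '<node n="A" type="extant"/></node>')

print(check_stemma(ET.fromstring(GOOD)))  # []
print(check_stemma(ET.fromstring(BAD)))
# ['duplicate node/@n: A', 'dangling contaminates/@target: Q']
```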

3.6. TEI Conformance

TEI P5 already includes several mechanisms for describing arbitrary directed and nondirected graphs (http://www.tei-c.org/release/doc/tei-p5-doc/html/GD.html), most of which are substantially different from the mechanism proposed here. For example, the existing TEI <node> element does not nest inside other <node> elements, and is intended to be used together with <arc> elements to describe a graph. The existing TEI <eTree> (for embedding tree) element, however, behaves similarly to the <node> element proposed in this report (it nests inside other elements of the same type as a way of representing hierarchy and it may serve as a leaf node, although there is also a separate <eLeaf> element). None of the existing TEI models support anything comparable to the <contaminates> element proposed here.

If the present proposal is to be incorporated into the TEI guidelines, one way of achieving this end would be the following:

  1. Use the existing TEI <eTree> element in the function of the <node> element proposed here. The existing TEI <eTree> element already has most of the necessary syntactic properties: it may nest in other elements of the same type, may be empty (may serve as a leaf node), does not require a containing <graph> element, and has a global @n attribute.
  2. Use the existing global @n attribute of <eTree> in the same function as the @n attribute proposed here.
  3. Add a @type attribute to the <eTree> element to support node typing.
  4. Add a <contaminates> element with a @target attribute, as proposed here.
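The four steps above amount to a mechanical renaming, so existing documents could be converted automatically. The following is a minimal Python sketch under these assumptions: output goes in the TEI namespace (http://www.tei-c.org/ns/1.0), @type is copied onto <eTree> as step 3 proposes, and <contaminates> is carried over as the new element proposed in step 4. Neither @type on <eTree> nor <contaminates> is assumed to be part of the published TEI schema. The function name to_etree is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def to_etree(node):
    """Hypothetical converter from the proposal's <node> to a TEI-namespaced <eTree>.

    Steps 1-2: <node> becomes <eTree>, keeping @n (a TEI global attribute).
    Step 3: @type is copied (an addition proposed in this report).
    Step 4: <contaminates> is carried over as a proposed, non-TEI element.
    """
    tei = ET.Element(f"{{{TEI_NS}}}eTree", {"n": node.get("n"), "type": node.get("type")})
    for child in node:
        if child.tag == "node":
            tei.append(to_etree(child))
        elif child.tag == "contaminates":
            ET.SubElement(tei, f"{{{TEI_NS}}}contaminates", {"target": child.get("target")})
    return tei


src = ET.fromstring('<node n="α" type="hypothetical">'
                    '<node n="γ" type="hypothetical"><contaminates target="A"/>'
                    '<node n="I" type="extant"/></node>'
                    '<node n="A" type="extant"/></node>')
print(ET.tostring(to_etree(src), encoding="unicode"))
```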

A more generally TEI-consistent approach might use @xml:id and @corresp attributes (pointing to <msDesc> elements or a witness list elsewhere, which would contain the sigla) instead of the @n attribute used here. Whether this consistency justifies the cost of maintaining separate resources for pointing, labeling, and being pointed to depends on one’s priorities.

4. Rendering

Converting descriptive graph markup to a rendering format is computationally demanding, since it involves, among other things, choosing node orders that minimize crossing lines. It may therefore be achieved more effectively through specialized tools than through direct conversion of descriptive XML to SVG with XSLT [eXtensible Stylesheet Language Transformations]. Examples of such tools are Graphviz [http://www.graphviz.org] and the TeX-based PGF and TikZ [http://sourceforge.net/projects/pgf/]. For example, the following XSLT stylesheet transforms the sample stemma above into the dot format used by Graphviz. Graphviz can then generate output in several graphics formats (including SVG and the popular raster formats in use on the World Wide Web).

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output encoding="UTF-8" method="text"/>
    <xsl:template match="/">
        <xsl:text>digraph stemma {&#x0a;</xsl:text>
        <xsl:text>edge [arrowhead=none]&#x0a;</xsl:text>
        <!-- nodes are colored according to type -->
        <xsl:for-each select="//node">
            <xsl:value-of select="@n"/>
            <xsl:text> [style=filled fillcolor=</xsl:text>
            <xsl:choose>
                <xsl:when test="@type='hypothetical'">
                    <xsl:text>goldenrod2</xsl:text>
                </xsl:when>
                <xsl:when test="@type='extant'">
                    <xsl:text>cyan1</xsl:text>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:text>orchid1</xsl:text>
                </xsl:otherwise>
            </xsl:choose>
            <xsl:text>]&#x0a;</xsl:text>
        </xsl:for-each>
        <!-- draw an arc to every node except the root from its parent -->
        <xsl:for-each select="//node/node">
            <xsl:value-of select="../@n"/>
            <xsl:text>-></xsl:text>
            <xsl:value-of select="@n"/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:for-each>
        <!-- draw a dashed arc from each contaminating node to the node it contaminates -->
        <xsl:for-each select="//contaminates">
            <xsl:value-of select="../@n"/>
            <xsl:text>-></xsl:text>
            <xsl:value-of select="@target"/>
            <xsl:text> [style=dashed]&#x0a;</xsl:text>
        </xsl:for-each>
        <!-- put all leaves at the same rank -->
        <xsl:text>{rank=same; </xsl:text>
        <xsl:for-each select="//node[not(node)]">
            <xsl:value-of select="@n"/>
            <xsl:text> </xsl:text>
        </xsl:for-each>
        <xsl:text>}&#x0a;</xsl:text>
        <!-- if a node has a hypothetical child, put all its hypothetical children on the same rank -->
        <xsl:for-each select="//node[node[@type='hypothetical']]">
            <xsl:text>{rank=same; </xsl:text>
            <xsl:for-each select="node[@type='hypothetical']">
                <xsl:value-of select="@n"/>
                <xsl:text> </xsl:text>
            </xsl:for-each>
            <xsl:text>}&#x0a;</xsl:text>
        </xsl:for-each>
        <xsl:text>}</xsl:text>
    </xsl:template>
    <!-- don't do anything with any nodes directly; everything gets processed within
    the root, with no calls to other templates-->
    <xsl:template match="node"/>
    <xsl:template match="contaminates"/>
</xsl:stylesheet>

The output of this transformation is the following dot file:

digraph stemma {
edge [arrowhead=none]
α [style=filled fillcolor=goldenrod2]
β [style=filled fillcolor=goldenrod2]
δ [style=filled fillcolor=goldenrod2]
L [style=filled fillcolor=cyan1]
t [style=filled fillcolor=orchid1]
ε [style=filled fillcolor=goldenrod2]
R [style=filled fillcolor=cyan1]
A [style=filled fillcolor=cyan1]
γ [style=filled fillcolor=goldenrod2]
I [style=filled fillcolor=cyan1]
X [style=filled fillcolor=cyan1]
α->β
β->δ
δ->L
δ->t
β->ε
ε->R
ε->A
α->γ
γ->I
γ->X
γ->A [style=dashed]
{rank=same; L t R A I X }
{rank=same; β γ }
{rank=same; δ ε }
}
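For comparison, the same transformation can be sketched outside XSLT. The following Python version is our own illustration, and the names to_dot and FILL are hypothetical. It emits the same dot statements, in the same order, as the stylesheet above. Each dashed edge runs from the contaminating node (the parent of the <contaminates> element) to its @target.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<node n="α" type="hypothetical">
  <node n="β" type="hypothetical">
    <node n="δ" type="hypothetical">
      <node n="L" type="extant"/>
      <node n="t" type="lost"/>
    </node>
    <node n="ε" type="hypothetical">
      <node n="R" type="extant"/>
      <node n="A" type="extant"/>
    </node>
  </node>
  <node n="γ" type="hypothetical">
    <contaminates target="A"/>
    <node n="I" type="extant"/>
    <node n="X" type="extant"/>
  </node>
</node>"""

# Fill colors by node type; anything else falls through to orchid1,
# like the xsl:otherwise branch of the stylesheet.
FILL = {"hypothetical": "goldenrod2", "extant": "cyan1"}


def to_dot(root):
    """Hypothetical Python port of the dot-generating stylesheet."""
    parent = {child: p for p in root.iter() for child in p}  # child -> parent element
    nodes = list(root.iter("node"))  # document order, archetype first
    out = ["digraph stemma {", "edge [arrowhead=none]"]
    for node in nodes:
        out.append(node.get("n") + " [style=filled fillcolor="
                   + FILL.get(node.get("type"), "orchid1") + "]")
    for node in nodes[1:]:  # every node except the root, from its parent
        out.append(parent[node].get("n") + "->" + node.get("n"))
    for c in root.iter("contaminates"):  # dashed arc from the contaminating node
        out.append(parent[c].get("n") + "->" + c.get("target") + " [style=dashed]")
    leaves = [n.get("n") for n in nodes if n.find("node") is None]
    out.append("{rank=same; " + "".join(s + " " for s in leaves) + "}")
    for node in nodes:  # hypothetical siblings share a rank
        hyps = [c.get("n") for c in node.findall("node") if c.get("type") == "hypothetical"]
        if hyps:
            out.append("{rank=same; " + "".join(s + " " for s in hyps) + "}")
    out.append("}")
    return "\n".join(out)


print(to_dot(ET.fromstring(SAMPLE)))
```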

Graphviz and similar graphing programs are designed to rearrange the tree to optimize the graphic representation. From a stemmatic perspective, the children of a parent node are unordered, but in cases involving contamination, the rendering order of the children determines whether the stemma will require crossing lines (which compromise legibility). For example, in the sample stemma above, witnesses “R” and “A” could be rendered in either order, but putting “A” to the right of “R” means that the dotted contamination line between “γ” and “A” does not need to cross the solid line between “ε” and “R”. One of the advantages of using software designed specifically to render graphs, such as Graphviz, is that it performs this type of reordering automatically; regardless of whether “A” appears before or after “R” in the input dot file, Graphviz will render “A” to the right of “R”.

The following XSLT stylesheet generates SVG directly from the XML document instance, without using Graphviz or any other program as an intermediary. It may be possible to implement in XSLT the sort of reordering logic that is built into Graphviz, but the script below does not include any such routine. If the user wishes to transform the XML document instance directly into SVG, bypassing Graphviz (or something similar), the user is therefore responsible for ensuring that sibling nodes are ordered as they should be rendered. The graph will be drawn correctly in any case: solid and dotted lines will reflect descent and contamination accurately. However, if the user does not order the nodes attentively, the resulting SVG file may include crossing lines that a different sibling order would have avoided, at some cost to legibility.

From a more general perspective, then, sibling nodes are logically unordered, but an order must be imposed for rendering, and the order chosen can affect the number of crossing lines (which should be minimized to improve legibility). The rendering order must be determined either by the user or by the software. Graphviz happily assumes this responsibility; the more direct but less powerful script below requires that the user determine the desired output order during input. Similarly, should a contamination line cross a node, Graphviz will deform (bend) the line around the node. The script below will simply draw the line through (behind) the node.
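The effect of sibling order on crossings can be checked numerically. The following Python sketch lays out nodes by the same rules as the direct SVG stylesheet in this section, using unit spacing. Leaves are evenly spaced in document order on the bottom tier. Every other node sits on the tier given by its depth, centered over its first and last leaf. The sketch then reports each dashed contamination line that properly crosses a solid line of descent. All names are hypothetical. The sketch is a simplification: it tests straight segments only and ignores lines passing behind ellipses.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<node n="α" type="hypothetical">
  <node n="β" type="hypothetical">
    <node n="δ" type="hypothetical">
      <node n="L" type="extant"/><node n="t" type="lost"/>
    </node>
    <node n="ε" type="hypothetical">
      <node n="R" type="extant"/><node n="A" type="extant"/>
    </node>
  </node>
  <node n="γ" type="hypothetical">
    <contaminates target="A"/>
    <node n="I" type="extant"/><node n="X" type="extant"/>
  </node>
</node>"""


def layout(root):
    """Return {siglum: (x, y)}: unit spacing, tier 0 at the top."""
    pos = {}
    next_x = 0  # next free leaf slot, left to right in document order
    bottom = 0  # deepest leaf depth; every leaf is drawn on this tier

    def walk(node, depth):
        nonlocal next_x, bottom
        kids = node.findall("node")
        if not kids:
            x = next_x
            next_x += 1
            bottom = max(bottom, depth)
            pos[node.get("n")] = (x, None)  # tier filled in after the walk
            return x, x
        spans = [walk(kid, depth + 1) for kid in kids]
        first, last = spans[0][0], spans[-1][1]
        pos[node.get("n")] = ((first + last) / 2, depth)  # centered over its leaves
        return first, last

    walk(root, 0)
    return {n: (x, bottom if y is None else y) for n, (x, y) in pos.items()}


def crosses(p1, p2, q1, q2):
    """True if the segments cross at a point interior to both (shared endpoints don't count)."""
    def orient(a, b, c):
        v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (v > 0) - (v < 0)
    return (orient(p1, p2, q1) * orient(p1, p2, q2) < 0
            and orient(q1, q2, p1) * orient(q1, q2, p2) < 0)


def crossings(root):
    """List (dashed edge, solid edge) pairs that cross in this layout."""
    pos = layout(root)
    nodes = list(root.iter("node"))
    solid = [(p.get("n"), c.get("n")) for p in nodes for c in p.findall("node")]
    dashed = [(p.get("n"), c.get("target")) for p in nodes for c in p.findall("contaminates")]
    return [(d, s) for d in dashed for s in solid
            if crosses(pos[d[0]], pos[d[1]], pos[s[0]], pos[s[1]])]


root = ET.fromstring(SAMPLE)
print(crossings(root))  # []

# Hypothetical reordering: move R after A under ε.
eps = next(n for n in root.iter("node") if n.get("n") == "ε")
r = eps.find("node")  # first child, R
eps.remove(r)
eps.append(r)
print(crossings(root))  # [(('γ', 'A'), ('ε', 'R'))]
```

With “A” to the right of “R” there are no crossings. Swapping them produces exactly the crossing between the dotted “γ”–“A” line and the solid “ε”–“R” line described earlier in this section.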

In general, then, Graphviz incorporates sophisticated and useful layout logic, but requires an intermediary stage in transformation. The script below avoids that intermediate stage, but requires that the user assume responsibility for some of the layout details that Graphviz would otherwise have handled. This responsibility is not onerous, but users of the following script must be aware that sibling nodes will be rendered in the order in which they appear in the source document instance, even when this may not be optimal for legible rendering.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output encoding="UTF-8" indent="yes" doctype-public="-//W3C//DTD SVG 1.0//EN"
        doctype-system="http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"/>
    <xsl:template match="/">
        <!-- global variables
            leafCount = number of leaf nodes
            height = number of tiers in tree
            xRadius = x radius of ellipse
            yRadius = y radius of ellipse
            xDisplace = horizontal displacement of entire image
            yDisplace = vertical displacement of entire image
            xSpacing = spacing between centers of leaf node ellipses
            ySpacing = spacing between centers of nodes on consecutive tiers
            labelShift = nudge added to vertical position of ellipse to center label
        -->
        <xsl:variable name="leafCount" select="count(//node[not(node)])"/>
        <xsl:variable name="height"
            select="max(for $item in //node[not(node)] return count($item/ancestor::node)) + 1"/>
        <xsl:variable name="xRadius" select="50"/>
        <xsl:variable name="yRadius" select="30"/>
        <xsl:variable name="xDisplace" select="70"/>
        <xsl:variable name="yDisplace" select="-80"/>
        <xsl:variable name="xSpacing" select="120"/>
        <xsl:variable name="ySpacing" select="130"/>
        <xsl:variable name="labelShift" select="5"/>
        <svg width="100%" height="100%" version="1.1" xmlns="http://www.w3.org/2000/svg">
            <!-- regular arcs for non-leaves -->
            <!--
                x1,y1 are the parent
                x2,y2 are the child (current)
                -->
            <xsl:for-each select="/node//node[node]">
                <xsl:variable name="x1"
                    select="(count(parent::node/descendant::node[not(node)]) -1) * $xSpacing div 2 +
                    count(parent::node/preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="y1"
                    select="(count(parent::node/ancestor::node)+ 1) * $ySpacing + $yDisplace"/>
                <xsl:variable name="x2"
                    select="(count(descendant::node[not(node)]) -1) * $xSpacing div 2 +
                    count(preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="y2"
                    select="(count(ancestor::node) +1) * $ySpacing + $yDisplace"/>
                <line x1="{$x1}" y1="{$y1}" x2="{$x2}" y2="{$y2}"
                    style="stroke:black;stroke-width:1"/>
            </xsl:for-each>
            <!-- regular arcs for leaves -->
            <!--
                x1,y1 are the parent
                x2,y2 are the child (current)
            -->
            <xsl:for-each select="//node[not(node)]">
                <xsl:variable name="x1"
                    select="(count(parent::node/descendant::node[not(node)]) -1) * $xSpacing div 2 +
                    count(parent::node/preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="y1"
                    select="(count(parent::node/ancestor::node)+ 1) * $ySpacing + $yDisplace"/>
                <xsl:variable name="x2"
                    select="count(preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="y2"
                    select="$height * $ySpacing + $yDisplace"/>
                <line x1="{$x1}" y1="{$y1}" x2="{$x2}" y2="{$y2}"
                    style="stroke:black;stroke-width:1"/>
            </xsl:for-each>
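            <!-- contamination (dotted) arcs whose target is a non-leaf:
                none in the sample stemma, so not implemented; the loop below
                assumes that every target of contamination is a leaf -->
            <!-- contamination (dotted) arcs whose target is a leaf -->
            <!--
                x1,y1 are the contaminating node (parent of the current contaminates element)
                x2,y2 are the target of contamination
            -->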
            <xsl:for-each select="//contaminates">
                <xsl:variable name="x1"
                    select="(count(parent::node/descendant::node[not(node)]) -1) * $xSpacing div 2 +
                    count(parent::node/preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="y1"
                    select="(count(parent::node/ancestor::node)+ 1) * $ySpacing + $yDisplace"/>
                <xsl:variable name="x2"
                    select="count(//node[@n = current()/@target]/preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="y2"
                    select="$height * $ySpacing + $yDisplace"/>
                <line x1="{$x1}" y1="{$y1}" x2="{$x2}" y2="{$y2}"
                    style="fill:none; stroke-dasharray: 9, 5; stroke:black;stroke-width:1"/>
            </xsl:for-each>
            <!-- leaves -->
            <xsl:for-each select="//node[not(node)]">
                <!-- variables for leaf placement
                    xCenter = x coordinate of the center of each ellipse
                    yCenter = y coordinate of the center (all leaves share the bottom tier)
                    yLabelCenter = y position for label
                    fillColor = color of ellipse
                -->
                <xsl:variable name="xCenter"
                    select="$xSpacing * count(preceding::node[not(node)]) +
                    $xDisplace"/>
                <xsl:variable name="yCenter" select="$height * $ySpacing + $yDisplace"/>
                <xsl:variable name="yLabelCenter" select="$yCenter + $labelShift"/>
                <xsl:variable name="fillColor">
                    <xsl:choose>
                        <xsl:when test="@type='hypothetical'">
                            <xsl:text>goldenrod</xsl:text>
                        </xsl:when>
                        <xsl:when test="@type='extant'">
                            <xsl:text>cyan</xsl:text>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:text>orchid</xsl:text>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:variable>
                <ellipse cx="{$xCenter}" cy="{$yCenter}" rx="{$xRadius}" ry="{$yRadius}"
                    style="fill:{$fillColor};
                    stroke:black;stroke-width:2"/>
                <text x="{$xCenter}" y="{$yLabelCenter}" fill="black" text-anchor="middle"
                    font-family="Bookman Old Style">
                    <xsl:value-of select="@n"/>
                </text>
            </xsl:for-each>
            <!-- root node is centered over leaves-->
            <!-- additional variables for root placement (use for-each to make variables local)
                xCenter = center of ellipse (y value is $ySpacing + $yDisplace, since we're at the top)
                yLabelPos = y position for label
            -->
            <xsl:for-each select="/node">
                <xsl:variable name="xCenter" select="($leafCount -1) * $xSpacing div 2 + $xDisplace"/>
                <xsl:variable name="yCenter" select="$ySpacing + $yDisplace"/>
                <xsl:variable name="yLabelPos" select="$yCenter + $labelShift"/>
                <ellipse cx="{$xCenter}" cy="{$yCenter}" rx="{$xRadius}" ry="{$yRadius}"
                    style="fill:goldenrod;
                stroke:black;stroke-width:2"/>
                <text x="{$xCenter}" y="{$yLabelPos}" fill="black" text-anchor="middle"
                    font-family="Bookman Old Style">
                    <xsl:value-of select="@n"/>
                </text>
            </xsl:for-each>
            <!-- other intermediate nodes -->
            <xsl:for-each select="/node//node[node]">
                <xsl:variable name="xCenter"
                    select="$xSpacing *
                    (count(descendant::node[not(node)]) -1) div 2 +
                    count(preceding::node[not(node)]) * $xSpacing + $xDisplace"/>
                <xsl:variable name="yCenter" select="$ySpacing * (count(ancestor::node) + 1) + $yDisplace"/>
                <xsl:variable name="yLabelCenter" select="$yCenter + $labelShift"/>
                <xsl:variable name="fillColor">
                    <xsl:choose>
                        <xsl:when test="@type='hypothetical'">
                            <xsl:text>goldenrod</xsl:text>
                        </xsl:when>
                        <xsl:when test="@type='extant'">
                            <xsl:text>cyan</xsl:text>
                        </xsl:when>
                        <xsl:otherwise>
                            <xsl:text>orchid</xsl:text>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:variable>
                <ellipse cx="{$xCenter}" cy="{$yCenter}" rx="{$xRadius}" ry="{$yRadius}"
                    style="fill:{$fillColor};
                    stroke:black;stroke-width:2"/>
                <text x="{$xCenter}" y="{$yLabelCenter}" fill="black" text-anchor="middle"
                    font-family="Bookman Old Style">
                    <xsl:value-of select="@n"/>
                </text>
            </xsl:for-each>
        </svg>
    </xsl:template>
</xsl:stylesheet>

The output of the preceding script, when applied to the sample file, is shown below (it will be visible only to users of browsers that include SVG support, either natively or through a plugin):

[SVG is rendered here in those browsers that support it]

For the convenience of those whose browsers do not include SVG support, the following is a screen shot of the SVG rendering:

./stemma5_beige.png
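The coordinate arithmetic in the root and intermediate-node loops of the stylesheet above can be checked outside XSLT. The following Python sketch reproduces only those formulas: a non-leaf node is centered over the leaves it dominates, the number of leaves that precede it (count(preceding::node[not(node)])) fixes its left edge, and its depth (count(ancestor::node)) fixes its vertical position. The root formula is the special case with no preceding leaves and depth 0. The spacing and displacement values are hypothetical stand-ins for $xSpacing, $ySpacing, $xDisplace, and $yDisplace, and leaf placement is not reproduced.

```python
import xml.etree.ElementTree as ET

# The example stemma, without contamination (only the labels matter here).
STEMMA = """<node n="α">
  <node n="β">
    <node n="δ"><node n="L"/><node n="t"/></node>
    <node n="ε"><node n="R"/><node n="A"/></node>
  </node>
  <node n="γ"><node n="I"/><node n="X"/></node>
</node>"""

# Hypothetical stand-ins for $xSpacing, $ySpacing, $xDisplace, $yDisplace.
X_SPACING, Y_SPACING = 100, 100
X_DISPLACE, Y_DISPLACE = 50, 0


def internal_positions(root):
    """Return {label: (x, y)} for the root and every intermediate node."""
    positions = {}
    leaves_seen = 0  # leaves already visited = count(preceding::node[not(node)])

    def visit(node, depth):  # depth = count(ancestor::node)
        nonlocal leaves_seen
        children = node.findall("node")
        if not children:
            leaves_seen += 1
            return
        preceding_leaves = leaves_seen
        for child in children:
            visit(child, depth + 1)
        leaf_count = leaves_seen - preceding_leaves  # count(descendant::node[not(node)])
        x = X_SPACING * (leaf_count - 1) / 2 + preceding_leaves * X_SPACING + X_DISPLACE
        y = Y_SPACING * (depth + 1) + Y_DISPLACE
        positions[node.get("n")] = (x, y)

    visit(root, 0)
    return positions


for label, (x, y) in internal_positions(ET.fromstring(STEMMA)).items():
    print(label, x, y)
```

With these values the root “α” lands at x = 300, midway between the first leaf (x = 50) and the sixth (x = 550), which is the centering the stylesheet's comment describes.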

5. Evaluation of variation

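The model proposed above is suitable for automated querying as part of the evaluation of variants in a set of manuscripts. Assume the stemma above and a situation in which there are only two variant readings. (Since the stemma has six manuscripts, up to six different readings are possible, but the present exercise is limited to two.) Each manuscript then has one of the two readings, which yields 2⁶ = 64 possible assignments; because an assignment and its mirror image (the same assignment with the two readings swapped) group the manuscripts identically, there are thirty-two different ways the manuscripts can be grouped.1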

For this exercise we ignore the trivial case in which all manuscripts agree. We assume that at each point of variation every manuscript reads either “Chocolate” or “Peanut butter,” and we concentrate on the patterns of agreement rather than on the specific readings. For example, if “R” reads “Chocolate” and all other manuscripts read “Peanut butter,” deciding whether to prefer the reading of “R” or that of the other manuscripts involves the same reasoning as in the mirror-image case, where “R” reads “Peanut butter” and the others all read “Chocolate.”
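The thirty-one non-trivial groupings can be generated mechanically. The Python sketch below uses a hypothetical helper, two_reading_splits, that is not part of the original tool chain. It takes every group of one or two manuscripts, plus only those groups of three that contain the first manuscript, so that each grouping and its mirror image are counted once. With the manuscripts listed in the order L, t, R, A, I, X, it happens to produce the groupings in the same order as the <app> elements of the source document given later in this section.

```python
from itertools import combinations

WITNESSES = ["L", "t", "R", "A", "I", "X"]


def two_reading_splits(witnesses):
    """Every non-trivial split of `witnesses` into two groups, each split once."""
    n = len(witnesses)
    splits = []
    for size in range(1, n // 2 + 1):
        for group in combinations(witnesses, size):
            # In an even split (3:3 here) a group and its complement are
            # mirror images; keep only the version containing the first witness.
            if 2 * size == n and witnesses[0] not in group:
                continue
            rest = tuple(w for w in witnesses if w not in group)
            splits.append((group, rest))
    return splits


splits = two_reading_splits(WITNESSES)
print(len(splits))  # 31; adding the trivial case gives 32 = 2**6 // 2
```

The counts break down as six 5:1 splits, fifteen 4:2 splits, and ten 3:3 splits.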

One of the principles of stemmatic textual criticism is that variant readings should be evaluated in a way that minimizes the need for coincidence. For example, if “L” and “t” read “Chocolate” and the other witnesses all read “Peanut butter”, the explanation that requires the fewest coincidences is that “Peanut butter” was the reading in “α” and “Chocolate” was introduced in “δ”. The alternatives require either that “Chocolate” be introduced separately in “L” and “t” or, if we assign “Chocolate” to “α”, that “Peanut butter” be introduced independently at least twice (in “ε” and “γ”).2
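One way to make “fewest coincidences” concrete is to count, for each candidate reading in “α”, the smallest number of changes that copying must have introduced to produce the attested readings. The Python sketch below does this with a standard unit-cost small-parsimony recursion (in the style of Sankoff's algorithm). It is offered only as an illustration of the principle, not as the procedure used in this report, and the CHILDREN map is a hypothetical encoding of the example stemma that ignores contamination.

```python
import math

# Hypothetical parent -> children encoding of the example stemma (no contamination).
CHILDREN = {
    "α": ["β", "γ"],
    "β": ["δ", "ε"],
    "δ": ["L", "t"],
    "ε": ["R", "A"],
    "γ": ["I", "X"],
}


def min_changes(node, reading, readings):
    """Fewest changes needed below `node`, assuming `node` had `reading`.

    `readings` maps each manuscript (leaf) to its attested reading.
    """
    if node not in CHILDREN:
        # A manuscript: its reading is given by the evidence.
        return 0 if readings[node] == reading else math.inf
    total = 0
    for child in CHILDREN[node]:
        # The child either keeps its parent's reading or changes it (cost 1).
        total += min(
            min_changes(child, r, readings) + (r != reading)
            for r in set(readings.values())
        )
    return total


readings = {w: "Peanut butter" for w in ["L", "t", "R", "A", "I", "X"]}
readings["L"] = readings["t"] = "Chocolate"
print(min_changes("α", "Peanut butter", readings))  # 1 change (Chocolate introduced once, in δ)
print(min_changes("α", "Chocolate", readings))      # 2 changes (Peanut butter in ε and in γ)
```

For the crux pattern “LtRA:IX” the same function returns 1 for either reading in “α”, which is exactly what makes that pattern undecidable.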

If we ignore contamination for the moment, according to these principles each of the thirty-one possible patterns points to one of the following three outcomes:

  1. One specific reading was in “α” and the other was introduced later during copying.
  2. The stemma is balanced, so that either reading could have stood in “α”. This situation is called a crux. The only opportunity for a crux in the stemma above is “LtRA:IX” (that is, “L”, “t”, “R”, and “A” share one reading and “I” and “X” share the other), where one reading stood in “α” and the other was introduced either in “β” (where it was inherited by “LtRA”) or in “γ” (where it was inherited by “IX”).
  3. The stemma does not indicate which reading stood in “α”.

The stemmatic principles leading to the three alternatives above can be applied mechanically, as follows:3

  1. In 5:1 patterns, the reading shared by the five manuscripts stood in “α”, and the one manuscript that disagrees reflects a single change.
  2. Otherwise, if we take the youngest common ancestor of the nodes that share a reading, we need to examine whether all descendants of that ancestor share that reading, or whether some of its descendants have the other reading.
    1. If the youngest common ancestor of each reading has no descendants with the other reading, there is a crux.
    2. If the youngest common ancestor of one reading has no descendants with the other reading, but that situation does not obtain for the second reading, the second reading stood in “α”.
    3. If the youngest common ancestors of both readings have descendants that attest the other reading, the stemma cannot be used to decide which reading stood in “α”.
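These rules are mechanical enough to express in a few lines of code. The following Python sketch applies them to a single split; it was written for this discussion and is not the XSLT that generated the tables below. Like the djb:wits-end function in that stylesheet, it takes the youngest common ancestor of a group to be the deepest node that dominates every member, and it treats that ancestor as “mixed” when some manuscript below it is not in the group. Contamination is ignored.

```python
import xml.etree.ElementTree as ET

STEMMA = """<node n="α">
  <node n="β">
    <node n="δ"><node n="L"/><node n="t"/></node>
    <node n="ε"><node n="R"/><node n="A"/></node>
  </node>
  <node n="γ"><node n="I"/><node n="X"/></node>
</node>"""


def index_stemma(root):
    """Map each label to its ancestors (root first) and to the manuscripts below it."""
    ancestors, leaves = {}, {}

    def walk(node, path):
        label = node.get("n")
        ancestors[label] = path
        children = node.findall("node")
        leaves[label] = set() if children else {label}
        for child in children:
            walk(child, path + [label])
            leaves[label] |= leaves[child.get("n")]

    walk(root, [])
    return ancestors, leaves


ANCESTORS, LEAVES = index_stemma(ET.fromstring(STEMMA))


def youngest_common_ancestor(group):
    """Deepest node that is a proper ancestor of every manuscript in `group`."""
    first, *others = (ANCESTORS[w] for w in group)
    common = [a for a in first if all(a in path for path in others)]
    return common[-1]


def is_mixed(group):
    """True if the group's youngest common ancestor has manuscripts outside the group."""
    return not LEAVES[youngest_common_ancestor(group)] <= set(group)


def stemmatic_reading(chocolate, peanut_butter):
    """Apply the numbered rules above to one split of the manuscripts."""
    if len(chocolate) == 1:
        return "Peanut butter"
    if len(peanut_butter) == 1:
        return "Chocolate"
    choc_mixed, pb_mixed = is_mixed(chocolate), is_mixed(peanut_butter)
    if not choc_mixed and not pb_mixed:
        return "Crux"
    if choc_mixed and not pb_mixed:
        return "Chocolate"
    if pb_mixed and not choc_mixed:
        return "Peanut butter"
    return "Non-stemmatic pattern"


print(stemmatic_reading(["L", "t"], ["R", "A", "I", "X"]))  # Peanut butter
print(stemmatic_reading(["I", "X"], ["L", "t", "R", "A"]))  # Crux
```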

If we ignore contamination for the moment, applying these principles to the thirty-one meaningful possible distributions of two readings over six manuscripts (I omit the case where all manuscripts agree on a reading) yields the following results:

     Reading                    Where introduced
No.  Chocolate  Peanut butter   Chocolate  Peanut butter   Stemmatic reading
1    L          t R A I X       L          α (mixed)       Peanut butter
2    t          L R A I X       t          α (mixed)       Peanut butter
3    R          L t A I X       R          α (mixed)       Peanut butter
4    A          L t R I X       A          α (mixed)       Peanut butter
5    I          L t R A X       I          α (mixed)       Peanut butter
6    X          L t R A I       X          α (mixed)       Peanut butter
7    L t        R A I X         δ          α (mixed)       Peanut butter
8    L R        t A I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
9    L A        t R I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
10   L I        t R A X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
11   L X        t R A I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
12   t R        L A I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
13   t A        L R I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
14   t I        L R A X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
15   t X        L R A I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
16   R A        L t I X         ε          α (mixed)       Peanut butter
17   R I        L t A X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
18   R X        L t A I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
19   A I        L t R X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
20   A X        L t R I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
21   I X        L t R A         γ          β               Chocolate or Peanut butter (Crux)
22   L t R      A I X           β (mixed)  α (mixed)       (Non-stemmatic pattern)
23   L t A      R I X           β (mixed)  α (mixed)       (Non-stemmatic pattern)
24   L t I      R A X           α (mixed)  α (mixed)       (Non-stemmatic pattern)
25   L t X      R A I           α (mixed)  α (mixed)       (Non-stemmatic pattern)
26   L R A      t I X           β (mixed)  α (mixed)       (Non-stemmatic pattern)
27   L R I      t A X           α (mixed)  α (mixed)       (Non-stemmatic pattern)
28   L R X      t A I           α (mixed)  α (mixed)       (Non-stemmatic pattern)
29   L A I      t R X           α (mixed)  α (mixed)       (Non-stemmatic pattern)
30   L A X      t R I           α (mixed)  α (mixed)       (Non-stemmatic pattern)
31   L I X      t R A           α (mixed)  β (mixed)       (Non-stemmatic pattern)

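If we include contamination, we have to change the evaluation metric. In the case of only two possible readings, a contaminated witness (here “A”, which is contaminated from “γ”) could have taken either reading from either of its sources, so it must be eliminated from consideration when we ask whether a common ancestor is mixed; at the same time, when we look for the youngest common ancestor of a reading, “A” counts as a descendant of “γ” as well as of “ε”. This modification yields the following results. The important detail is that “LtR:AIX”, which was a non-stemmatic pattern when contamination was ignored, is now a crux (row 22); rows 4, 19, and 20 also change in the “Where introduced” columns, but not in their stemmatic readings: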

     Reading                    Where introduced
No.  Chocolate  Peanut butter   Chocolate  Peanut butter   Stemmatic reading
1    L          t R A I X       L          α (mixed)       Peanut butter
2    t          L R A I X       t          α (mixed)       Peanut butter
3    R          L t A I X       R          α (mixed)       Peanut butter
4    A          L t R I X       A          α               Peanut butter
5    I          L t R A X       I          α (mixed)       Peanut butter
6    X          L t R A I       X          α (mixed)       Peanut butter
7    L t        R A I X         δ          α (mixed)       Peanut butter
8    L R        t A I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
9    L A        t R I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
10   L I        t R A X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
11   L X        t R A I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
12   t R        L A I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
13   t A        L R I X         β (mixed)  α (mixed)       (Non-stemmatic pattern)
14   t I        L R A X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
15   t X        L R A I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
16   R A        L t I X         ε          α (mixed)       Peanut butter
17   R I        L t A X         α (mixed)  α (mixed)       (Non-stemmatic pattern)
18   R X        L t A I         α (mixed)  α (mixed)       (Non-stemmatic pattern)
19   A I        L t R X         γ (mixed)  α (mixed)       (Non-stemmatic pattern)
20   A X        L t R I         γ (mixed)  α (mixed)       (Non-stemmatic pattern)
21   I X        L t R A         γ          β               Chocolate or Peanut butter (Crux)
22   L t R      A I X           β          γ               Chocolate or Peanut butter (Crux)
23   L t A      R I X           β (mixed)  α (mixed)       (Non-stemmatic pattern)
24   L t I      R A X           α (mixed)  α (mixed)       (Non-stemmatic pattern)
25   L t X      R A I           α (mixed)  α (mixed)       (Non-stemmatic pattern)
26   L R A      t I X           β (mixed)  α (mixed)       (Non-stemmatic pattern)
27   L R I      t A X           α (mixed)  α (mixed)       (Non-stemmatic pattern)
28   L R X      t A I           α (mixed)  α (mixed)       (Non-stemmatic pattern)
29   L A I      t R X           α (mixed)  α (mixed)       (Non-stemmatic pattern)
30   L A X      t R I           α (mixed)  α (mixed)       (Non-stemmatic pattern)
31   L I X      t R A           α (mixed)  β (mixed)       (Non-stemmatic pattern)
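The contamination-aware evaluation can be sketched the same way. In the following Python illustration (again a sketch, using a hypothetical dictionary encoding of the stemma rather than the XML), the <contaminates target="A"/> element inside “γ” is recorded in CONTAMINATES. When the youngest common ancestor is computed, a contaminated witness counts as a descendant of its own ancestors and also of the contaminating node and that node's ancestors, mirroring the nodesByDescendantContaminated key in the stylesheet given below; when an ancestor is tested for being mixed, contaminated witnesses are left out. The youngest common ancestor is taken as the last candidate in document order, as djb:wits-end-contaminated does.

```python
# Hypothetical encoding of the example stemma; ORDER lists the non-leaf
# nodes in XML document order.
CHILDREN = {
    "α": ["β", "γ"],
    "β": ["δ", "ε"],
    "δ": ["L", "t"],
    "ε": ["R", "A"],
    "γ": ["I", "X"],
}
ORDER = ["α", "β", "δ", "ε", "γ"]
CONTAMINATES = {"γ": ["A"]}  # <contaminates target="A"/> inside γ
CONTAMINATED = {t for targets in CONTAMINATES.values() for t in targets}


def descendants(node):
    found = set()
    for child in CHILDREN.get(node, []):
        found |= {child} | descendants(child)
    return found


def candidate_ancestors(wit):
    """Real ancestors of `wit`, plus each node that holds, or has a descendant
    holding, a <contaminates> pointing at `wit`."""
    real = {n for n in ORDER if wit in descendants(n)}
    contaminating = {
        n for n in ORDER
        if any(wit in CONTAMINATES.get(d, []) for d in {n} | descendants(n))
    }
    return real | contaminating


def youngest_common_ancestor(group):
    common = set.intersection(*(candidate_ancestors(w) for w in group))
    return [n for n in ORDER if n in common][-1]  # last in document order


def is_mixed(group):
    """Ignore contaminated witnesses when checking for outside manuscripts."""
    below = {d for d in descendants(youngest_common_ancestor(group)) if d not in CHILDREN}
    return not (below - CONTAMINATED) <= set(group)


def stemmatic_reading(chocolate, peanut_butter):
    if len(chocolate) == 1:
        return "Peanut butter"
    if len(peanut_butter) == 1:
        return "Chocolate"
    choc_mixed, pb_mixed = is_mixed(chocolate), is_mixed(peanut_butter)
    if not (choc_mixed or pb_mixed):
        return "Crux"
    if choc_mixed != pb_mixed:
        return "Chocolate" if choc_mixed else "Peanut butter"
    return "Non-stemmatic pattern"


print(stemmatic_reading(["L", "t", "R"], ["A", "I", "X"]))  # Crux (row 22)
```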

The preceding tables were generated automatically from XML source that encoded the stemma as described above. The XML source document instance is:

<test>
    <node n="α" type="hypothetical">
        <node n="β" type="hypothetical">
            <node n="δ" type="hypothetical">
                <node n="L" type="extant"/>
                <node n="t" type="lost"/>
            </node>
            <node n="ε" type="hypothetical">
                <node n="R" type="extant"/>
                <node n="A" type="extant"/>
            </node>
        </node>
        <node n="γ" type="hypothetical">
            <contaminates target="A"/>
            <node n="I" type="extant"/>
            <node n="X" type="extant"/>
        </node>
    </node>
    <app>
        <rdg wit="L">Chocolate</rdg>
        <rdg wit="t R A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="t">Chocolate</rdg>
        <rdg wit="L R A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="R">Chocolate</rdg>
        <rdg wit="L t A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="A">Chocolate</rdg>
        <rdg wit="L t R I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="I">Chocolate</rdg>
        <rdg wit="L t R A X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="X">Chocolate</rdg>
        <rdg wit="L t R A I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L t">Chocolate</rdg>
        <rdg wit="R A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L R">Chocolate</rdg>
        <rdg wit="t A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L A">Chocolate</rdg>
        <rdg wit="t R I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L I">Chocolate</rdg>
        <rdg wit="t R A X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L X">Chocolate</rdg>
        <rdg wit="t R A I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="t R">Chocolate</rdg>
        <rdg wit="L A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="t A">Chocolate</rdg>
        <rdg wit="L R I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="t I">Chocolate</rdg>
        <rdg wit="L R A X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="t X">Chocolate</rdg>
        <rdg wit="L R A I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="R A">Chocolate</rdg>
        <rdg wit="L t I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="R I">Chocolate</rdg>
        <rdg wit="L t A X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="R X">Chocolate</rdg>
        <rdg wit="L t A I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="A I">Chocolate</rdg>
        <rdg wit="L t R X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="A X">Chocolate</rdg>
        <rdg wit="L t R I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="I X">Chocolate</rdg>
        <rdg wit="L t R A ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L t R">Chocolate</rdg>
        <rdg wit="A I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L t A">Chocolate</rdg>
        <rdg wit="R I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L t I">Chocolate</rdg>
        <rdg wit="R A X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L t X">Chocolate</rdg>
        <rdg wit="R A I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L R A">Chocolate</rdg>
        <rdg wit="t I X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L R I">Chocolate</rdg>
        <rdg wit="t A X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L R X">Chocolate</rdg>
        <rdg wit="t A I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L A I">Chocolate</rdg>
        <rdg wit="t R X ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L A X">Chocolate</rdg>
        <rdg wit="t R I ">Peanut butter</rdg>
    </app>
    <app>
        <rdg wit="L I X">Chocolate</rdg>
        <rdg wit="t R A ">Peanut butter</rdg>
    </app>
</test>
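Outside XSLT, the same source document can be loaded with a general-purpose XML library. The Python sketch below uses the standard xml.etree.ElementTree module on a shortened copy of the document (the full stemma but only two <app> elements); the function name read_source is hypothetical. It also shows that str.split() with no arguments does the work of tokenize(normalize-space(@wit), ' ') in the stylesheet, including discarding the trailing space in values such as "t R A I X ".

```python
import xml.etree.ElementTree as ET

# Shortened copy of the source document above: full stemma, two <app> elements.
SOURCE = """<test>
  <node n="α" type="hypothetical">
    <node n="β" type="hypothetical">
      <node n="δ" type="hypothetical">
        <node n="L" type="extant"/>
        <node n="t" type="lost"/>
      </node>
      <node n="ε" type="hypothetical">
        <node n="R" type="extant"/>
        <node n="A" type="extant"/>
      </node>
    </node>
    <node n="γ" type="hypothetical">
      <contaminates target="A"/>
      <node n="I" type="extant"/>
      <node n="X" type="extant"/>
    </node>
  </node>
  <app>
    <rdg wit="L">Chocolate</rdg>
    <rdg wit="t R A I X ">Peanut butter</rdg>
  </app>
  <app>
    <rdg wit="I X">Chocolate</rdg>
    <rdg wit="L t R A ">Peanut butter</rdg>
  </app>
</test>"""


def read_source(text):
    """Return the stemma root, the contamination links, and the readings per <app>."""
    doc = ET.fromstring(text)
    stemma = doc.find("node")
    contamination = {
        node.get("n"): [c.get("target") for c in node.findall("contaminates")]
        for node in stemma.iter("node")
        if node.find("contaminates") is not None
    }
    apps = [
        # str.split() plays the role of tokenize(normalize-space(@wit), ' ').
        [(rdg.text, rdg.get("wit").split()) for rdg in app.findall("rdg")]
        for app in doc.findall("app")
    ]
    return stemma, contamination, apps


stemma, contamination, apps = read_source(SOURCE)
print(stemma.get("n"), contamination)  # α {'γ': ['A']}
print(apps[0])  # [('Chocolate', ['L']), ('Peanut butter', ['t', 'R', 'A', 'I', 'X'])]
```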

The XSLT script that generated the HTML (Hypertext Markup Language) tables above is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:djb="djb" exclude-result-prefixes="#all">
    <xsl:output encoding="UTF-8" method="xml" indent="yes"
        doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
    <xsl:key name="nodesByDescendant" match="node" use="descendant::node/@n"/>
    <xsl:key name="nodesByDescendantContaminated" match="node"
    use="descendant-or-self::contaminates/@target"/>
    <xsl:key name="nodesByName" match="node" use="@n"/>
    <xsl:variable name="root" select="/"/>
    <xsl:template match="*"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Automating stemmatic textual criticism in XML</title>
            </head>
            <body>
                <h1>Automating stemmatic textual criticism in XML</h1>
                <p>David J. Birnbaum<br/>
                    <a href="mailto:djbpitt@pitt.edu">djbpitt@pitt.edu</a><br/>
                    <xsl:text>Generated: </xsl:text>
                    <xsl:value-of select="current-dateTime()"/></p>
                <hr/>
                <h2>Stemma codicum</h2>
                <p>
                    <img src="stemma.png" alt="stemma.png"/>
                </p>
                <hr/>
                <h2>Table 1: Two readings, ignores contamination</h2>
                <p>Notes:</p>
                <ol>
                    <li>The table contains all possible combinations of 5-1, 4-2, and 3-3 readings.</li>
                    <li>“Mixed” in the “Where introduced” column means that the children of the
                        ancestor have both the indicated reading and the other reading.</li>
                    <li>“Non-stemmatic pattern” in the “Stemmatic reading” column means that the
                        stemma does not provide an unambiguous indication of the reading in
                    “α.”</li>
                </ol>
                <table border="1">
                    <tr>
                        <th rowspan="2">Number</th>
                        <th colspan="2">Reading</th>
                        <th colspan="2">Where introduced</th>
                        <th rowspan="2">Stemmatic reading</th>
                    </tr>
                    <tr>
                        <th>Chocolate</th>
                        <th>Peanut butter</th>
                        <th>Chocolate</th>
                        <th>Peanut<br/>butter</th>
                    </tr>
                    <xsl:for-each select="/test/app">
                        <!-- variables
                            witsChocolateAncestor = youngest common ancestor of all mss listed in rdg[1]/@wit
                            witsChocolateCount = number of mss listed in rdg[1]/@wit
                            witsChocolateAncestorMixed (boolean) = does witsChocolateAncestor have
                                leaf descendants that are not listed in rdg[1]/@wit (true if it is mixed)
                            witsPbAncestor = youngest common ancestor of all mss listed in rdg[2]/@wit
                            witsPbCount = number of mss listed in rdg[2]/@wit
                            witsPbAncestorMixed (boolean) = does witsPbAncestor have
                                leaf descendants that are not listed in rdg[2]/@wit (true if it is mixed)
                        -->
                        <xsl:variable name="witsChocolateAncestor" as="xs:string"
                            select="djb:wits-end(tokenize(normalize-space(rdg[1]/@wit),' '), //node)"/>
                        <xsl:variable name="witsChocolateCount"
                            select="count(tokenize(normalize-space(rdg[1]/@wit),' '))"/>
                        <xsl:variable name="witsChocolateAncestorMixed" as="xs:boolean"
                            select="not(every $i in 
                            key('nodesByName', $witsChocolateAncestor)//node[not(node)]/@n
                            satisfies $i =tokenize(normalize-space(rdg[1]/@wit),' '))"/>
                        <xsl:variable name="witsPbAncestor" as="xs:string"
                            select="djb:wits-end(tokenize(normalize-space(rdg[2]/@wit),' '), //node)"/>
                        <xsl:variable name="witsPbCount"
                            select="count(tokenize(normalize-space(rdg[2]/@wit),' '))"/>
                        <xsl:variable name="witsPbAncestorMixed" as="xs:boolean"
                            select="not(every $i in 
                            key('nodesByName', $witsPbAncestor)//node[not(node)]/@n
                            satisfies $i =tokenize(normalize-space(rdg[2]/@wit),' '))"/>
                        <tr>
                            <xsl:if
                                test="not(
                                $witsChocolateCount=1 or
                                $witsPbCount=1 or
                                not($witsChocolateAncestorMixed and $witsPbAncestorMixed)
                                )">
                                <xsl:attribute name="style">background-color:
                                lightgray</xsl:attribute>
                            </xsl:if>
                            <td style="text-align:right">
                                <xsl:number/>
                            </td>
                            <td>
                                <xsl:value-of select="rdg[1]/@wit"/>
                            </td>
                            <td>
                                <xsl:value-of select="rdg[2]/@wit"/>
                            </td>
                            <td>
                                <xsl:choose>
                                    <!-- if there's a singleton, it's the source of its own reading
                                        -->
                                    <xsl:when test="$witsChocolateCount=1">
                                        <xsl:value-of
                                            select="tokenize(normalize-space(rdg[1]/@wit),' ')"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:value-of select="$witsChocolateAncestor"/>
                                        <xsl:if test="$witsChocolateAncestorMixed">
                                            <xsl:text> (mixed) </xsl:text>
                                        </xsl:if>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </td>
                            <td>
                                <xsl:choose>
                                    <xsl:when test="$witsPbCount=1">
                                        <xsl:value-of
                                            select="tokenize(normalize-space(rdg[2]/@wit),' ')"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:value-of select="$witsPbAncestor"/>
                                        <xsl:if test="$witsPbAncestorMixed">
                                            <xsl:text> (mixed) </xsl:text>
                                        </xsl:if>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </td>
                            <!-- Possible results:
                                Chocolate
                                Peanut butter
                                Chocolate or Peanut butter (crux)
                                Non-stemmatic pattern
                                How to get there:
                                Singleton -> alpha for other reading
                                One is mixed = alpha for mixed reading
                                Both are mixed = non-stemmatic pattern (stemma cannot decide)
                                Neither is mixed = crux
                            -->
                            <td>
                                <xsl:choose>
                                    <xsl:when test="$witsChocolateCount=1">
                                        <xsl:value-of select="rdg[2]"/>
                                    </xsl:when>
                                    <xsl:when test="$witsPbCount=1">
                                        <xsl:value-of select="rdg[1]"/>
                                    </xsl:when>
                                    <xsl:when
                                        test="not($witsChocolateAncestorMixed or
                                        $witsPbAncestorMixed)">
                                        <xsl:value-of select="rdg[1]"/>
                                        <xsl:text> or </xsl:text>
                                        <xsl:value-of select="rdg[2]"/>
                                        <xsl:text> (Crux)</xsl:text>
                                    </xsl:when>
                                    <xsl:when
                                        test="$witsChocolateAncestorMixed and
                                        not($witsPbAncestorMixed)">
                                        <xsl:value-of select="rdg[1]"/>
                                    </xsl:when>
                                    <xsl:when
                                        test="$witsPbAncestorMixed and not($witsChocolateAncestorMixed)">
                                        <xsl:value-of select="rdg[2]"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:text>(Non-stemmatic pattern)</xsl:text>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </td>
                        </tr>
                    </xsl:for-each>
                </table>
                <hr/>
                <h2>Table 2: Two readings, includes contamination</h2>
                <p>Notes:</p>
                <ol>
                    <li>The table contains all possible combinations of 5-1, 4-2, and 3-3 readings.</li>
                    <li>“Mixed” in the “Where introduced” column means that the children of the
                        ancestor have both the indicated reading and the other reading.</li>
                    <li>“Non-stemmatic pattern” in the “Stemmatic reading” column means that the
                        stemma does not provide an unambiguous indication of the reading in
                        “α.”</li>
                </ol>
                <table border="1">
                    <tr>
                        <th rowspan="2">Number</th>
                        <th colspan="2">Reading</th>
                        <th colspan="2">Where introduced</th>
                        <th rowspan="2">Stemmatic reading</th>
                    </tr>
                    <tr>
                        <th>Chocolate</th>
                        <th>Peanut butter</th>
                        <th>Chocolate</th>
                        <th>Peanut<br/>butter</th>
                    </tr>
                    <xsl:for-each select="/test/app">
                        <!-- variables
                            witsChocolateAncestor = youngest common ancestor of all mss listed in rdg[1]/@wit
                            witsChocolateCount = number of mss listed in rdg[1]/@wit
                            witsChocolateAncestorMixed (boolean) = does witsChocolateAncestor have
                                uncontaminated leaf descendants that are not listed in rdg[1]/@wit
                                (true if it is mixed)
                            witsPbAncestor = youngest common ancestor of all mss listed in rdg[2]/@wit
                            witsPbCount = number of mss listed in rdg[2]/@wit
                            witsPbAncestorMixed (boolean) = does witsPbAncestor have
                                uncontaminated leaf descendants that are not listed in rdg[2]/@wit
                                (true if it is mixed)
                        -->
                        <xsl:variable name="witsChocolateAncestor" as="xs:string"
                            select="djb:wits-end-contaminated(tokenize(normalize-space(rdg[1]/@wit),' '), //node)"/>
                        <xsl:variable name="witsChocolateCount" as="xs:integer"
                            select="count(tokenize(normalize-space(rdg[1]/@wit),' '))"/>
                        <xsl:variable name="witsChocolateAncestorMixed" as="xs:boolean"
                            select="not(every $i in 
                            (key('nodesByName', $witsChocolateAncestor)//node[not(node)] except
                            //node[@n=//contaminates/@target])/@n 
                            satisfies $i =tokenize(normalize-space(rdg[1]/@wit),' '))"/>
                        <xsl:variable name="witsPbAncestor" as="xs:string"
                            select="djb:wits-end-contaminated(tokenize(normalize-space(rdg[2]/@wit),' '), //node)"/>
                        <xsl:variable name="witsPbCount" as="xs:integer"
                            select="count(tokenize(normalize-space(rdg[2]/@wit),' '))"/>
                        <xsl:variable name="witsPbAncestorMixed" as="xs:boolean"
                            select="not(every $i in 
                            (key('nodesByName', $witsPbAncestor)//node[not(node)] except
                            //node[@n=//contaminates/@target])/@n 
                            satisfies $i =tokenize(normalize-space(rdg[2]/@wit),' '))"/>
                        <tr>
                            <xsl:if
                                test="not(
                                $witsChocolateCount=1 or
                                $witsPbCount=1 or
                                not($witsChocolateAncestorMixed and $witsPbAncestorMixed)
                                )">
                                <xsl:attribute name="style">background-color:
                                    lightgray</xsl:attribute>
                            </xsl:if>
                            <td style="text-align:right">
                                <xsl:number/>
                            </td>
                            <td>
                                <xsl:value-of select="rdg[1]/@wit"/>
                            </td>
                            <td>
                                <xsl:value-of select="rdg[2]/@wit"/>
                            </td>
                            <td>
                                <xsl:choose>
                                    <!-- if there's a singleton, it's the source of its own reading
                                    -->
                                    <xsl:when test="$witsChocolateCount=1">
                                        <xsl:value-of
                                            select="tokenize(normalize-space(rdg[1]/@wit),' ')"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:value-of select="$witsChocolateAncestor"/>
                                        <xsl:if test="$witsChocolateAncestorMixed">
                                            <xsl:text> (mixed) </xsl:text>
                                        </xsl:if>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </td>
                            <td>
                                <xsl:choose>
                                    <xsl:when test="$witsPbCount=1">
                                        <xsl:value-of
                                            select="tokenize(normalize-space(rdg[2]/@wit),' ')"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:value-of select="$witsPbAncestor"/>
                                        <xsl:if test="$witsPbAncestorMixed">
                                            <xsl:text> (mixed) </xsl:text>
                                        </xsl:if>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </td>
                            <!-- Possible results:
                                Chocolate
                                Peanut butter
                                Chocolate or Peanut butter (crux)
                                Non-stemmatic pattern
                                How to get there:
                                Singleton -> alpha for other reading
                                One is mixed = alpha for mixed reading
                                Both are mixed = non-stemmatic pattern (stemma cannot decide)
                                Neither is mixed = crux
                            -->
                            <td>
                                <xsl:choose>
                                    <xsl:when test="$witsChocolateCount=1">
                                        <xsl:value-of select="rdg[2]"/>
                                    </xsl:when>
                                    <xsl:when test="$witsPbCount=1">
                                        <xsl:value-of select="rdg[1]"/>
                                    </xsl:when>
                                    <xsl:when
                                        test="not($witsChocolateAncestorMixed or
                                        $witsPbAncestorMixed)">
                                        <xsl:value-of select="rdg[1]"/>
                                        <xsl:text> or </xsl:text>
                                        <xsl:value-of select="rdg[2]"/>
                                        <xsl:text> (Crux)</xsl:text>
                                    </xsl:when>
                                    <xsl:when
                                        test="$witsChocolateAncestorMixed and
                                        not($witsPbAncestorMixed)">
                                        <xsl:value-of select="rdg[1]"/>
                                    </xsl:when>
                                    <xsl:when
                                        test="$witsPbAncestorMixed and not($witsChocolateAncestorMixed)">
                                        <xsl:value-of select="rdg[2]"/>
                                    </xsl:when>
                                    <xsl:otherwise>
                                        <xsl:text>(Non-stemmatic pattern)</xsl:text>
                                    </xsl:otherwise>
                                </xsl:choose>
                            </td>
                        </tr>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>
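    <!-- Narrow $candidates to the nodes that have every witness in $wit as a
         descendant (via the nodesByDescendant key), then return the @n of the
         last survivor in document order, which, given the containment encoding,
         is the innermost of the shared ancestors. -->
    <xsl:function name="djb:wits-end" as="xs:string">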
        <xsl:param name="wit" as="xs:string*"/>
        <xsl:param name="candidates" as="node()*"/>
        <xsl:choose>
            <xsl:when test="not(exists($wit))">
                <xsl:sequence select="$candidates[last()]/@n"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="favorites"
                    select="$candidates intersect key('nodesByDescendant', $wit[1], $root)"/>
                <xsl:sequence select="djb:wits-end(remove($wit, 1), $favorites)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>
    <!-- Like djb:wits-end, except that, for every witness in $wit, a node also
         qualifies when it is linked to that witness through contamination
         (via the nodesByDescendantContaminated key). -->
    <xsl:function name="djb:wits-end-contaminated" as="xs:string">
        <xsl:param name="wit" as="xs:string*"/>
        <xsl:param name="candidates" as="node()*"/>
        <xsl:choose>
            <xsl:when test="not(exists($wit))">
                <xsl:sequence select="$candidates[last()]/@n"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:variable name="favorites"
                    select="$candidates intersect (key('nodesByDescendant', $wit[1], $root) |
                    key('nodesByDescendantContaminated', $wit[1], $root))"/>
                <xsl:sequence select="djb:wits-end-contaminated(remove($wit, 1), $favorites)"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>
</xsl:stylesheet>
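The recursion in djb:wits-end narrows a candidate set one witness at a time and then takes the last survivor in document order, which, because ancestors contain their descendants in this encoding, is the innermost shared ancestor. The Python sketch below restates that narrowing outside XSLT for readers who want to experiment with it. It is an illustration under stated assumptions, not part of the author's toolchain: the stemma is a hypothetical toy stemma stored as a child-to-parent dictionary (so it cannot express contamination), the names PARENT, ancestors, and wits_end are invented for this example, and a witness is not counted as its own ancestor.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy stemma (not from the article): each label maps to its parent.
# Following the article's conventions, Greek letters are hypothetical stages,
# upper-case Latin letters are extant manuscripts, and "t" is a lost manuscript.
PARENT: Dict[str, Optional[str]] = {
    "α": None,  # archetype
    "β": "α",
    "γ": "α",
    "L": "β",
    "t": "β",
    "M": "γ",
    "N": "γ",
}


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the proper ancestors of node, nearest first (node itself excluded)."""
    chain: List[str] = []
    current = parent[node]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def wits_end(wits: Sequence[str], parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the nearest node that is an ancestor of every witness, or None."""
    if not wits:
        return None
    # Start with the ancestors of the first witness, then, like the XSLT
    # recursion, keep only candidates that are also ancestors of each later witness.
    candidates = ancestors(wits[0], parent)
    for wit in wits[1:]:
        shared = set(ancestors(wit, parent))
        candidates = [c for c in candidates if c in shared]
    # The list stays nearest-first, so the first survivor is the innermost
    # common ancestor (the XSLT reaches the same node via document order and [last()]).
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(wits_end(["L", "t"], PARENT))  # β
    print(wits_end(["L", "M", "N"], PARENT))  # α
```

Because a contaminated manuscript has more than one parent, a single-parent dictionary cannot represent it; extending this sketch toward djb:wits-end-contaminated would require storing a list of sources for each node and collecting ancestors through all of them.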

6. Conclusions

  1. A stemma codicum is a particular kind of directed acyclic graph, one whose nodes are typed. It is easily encoded in XML by using XML containment to represent parent/child relationships in the stemma. Contamination, which gives a node more than one parent and therefore cannot be expressed by containment alone, may be modeled by adding one further element to the schema.
  2. A single attribute (@n) can fulfill three functions: it labels the node, serves as the target of pointers to the node (in cases of contamination), and points to additional information about the manuscript that the node represents. A second attribute (@type) is required for node typing.
  3. Relax NG is able to validate node naming conventions.
  4. Schematron can validate the uniqueness of the identifiers and the relationships among elements, such as whether a pointer used to record contamination refers to an existing node.
  5. The model described here is easily transformed (by XSLT) into a Graphviz dot file, which can be used to generate output in a variety of graphic formats.
  6. This model can also be transformed directly into SVG, although the transformation script presented here requires the user to take responsibility for ordering sibling nodes in the input (which is not necessary when Graphviz is used as an intermediary).
  7. This model can be used to support the semi-automated evaluation of variant readings, as it is much more amenable to querying than a presentationally oriented representation.

Acknowledgements: I am grateful to James Cummings, Matthew Driscoll, Wendell Piez, and Mark Weixel for discussion, comments, and suggestions.


Notes

1 There is one 6:0 pattern, six 5:1 patterns (6 choose 1), fifteen 4:2 patterns (6 choose 2), and ten 3:3 patterns (6 choose 3 = 20, halved because choosing one group of three also fixes the other group of three, so each split would otherwise be counted twice).
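The tally in this note can be cross-checked in one line. Each pattern is an unordered split of six witnesses between two readings, so the total should be half of the 2^6 ways of assigning each witness to one of two readings, since swapping the two readings yields the same pattern:

```latex
\binom{6}{0} + \binom{6}{1} + \binom{6}{2} + \frac{1}{2}\binom{6}{3}
  = 1 + 6 + 15 + 10 = 32 = \frac{2^{6}}{2}
```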

2 The best description of the principles of stemmatic textual criticism is Maas 1958.

3 It should be noted that the stemmatic evaluation of variation is a probabilistic activity based on finding the explanation that requires the least coincidence. It is the responsibility of the textual critic to determine, through the application of philological expertise, which instances of variation are suitable for stemmatic analysis. Variation that could easily have arisen by chance cannot be explained by stemmatic principles.


Works Cited