Using the TEI Writing System Declaration (WSD)

1. Introduction

This directory contains all the Cyrillic-related files that accompany David J. Birnbaum, Mavis Cournane, and Peter Flynn's "Using the TEI Writing System Declaration (WSD)," Computers and the Humanities 00: 1-9, 1998. The Greek- and Hebrew-related files are available at http://imbolc.ucc.ie/~pflynn/wsd/.

The present system replaces SDATA character entities in the original source SGML file with the UCS and AFII numerical values taken from the WSD, rather than with the replacement text strings in the SDATA entity files associated with the source document. The replacements are numerical identifiers, rather than raw characters or glyphs, for two reasons: there is no readily available, system-independent way to render UCS-2 characters or AFII glyphs, and numerical identifiers provide a system-independent way to verify that the WSD is being processed and used properly. An eventual production system would need to map these identifiers to actual UCS-2 characters or AFII glyphs so that they are rendered properly.
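
As an illustration only (the accompanying scripts are written in Omnimark, not Python), the substitution described above can be sketched as follows. The entity names, the code values in the table, and the bracketed output format are assumptions made for the example; the actual names come from chsl.ent and the actual values from chsl.wsd.

```python
import re

# Hypothetical mapping of entity names to UCS code values, standing in for
# values that would be read from the WSD. Names and values are illustrative.
UCS_VALUES = {
    "acy": 0x0430,  # CYRILLIC SMALL LETTER A
    "bcy": 0x0431,  # CYRILLIC SMALL LETTER BE
}

# Matches a general entity reference such as &acy;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def replace_entities(text, values, scheme="UCS"):
    """Replace each known entity reference with a bracketed numeric identifier."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown references untouched
        return "[{} {:04X}]".format(scheme, values[name])

    return ENTITY_REF.sub(substitute, text)


if __name__ == "__main__":
    print(replace_entities("&acy;&bcy; &zzz;", UCS_VALUES))
```

Because the output is a printable identifier rather than a rendered character, it can be compared across systems without any font or encoding support, which is the verification property the paragraph above relies on.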

2. Files

2.1. Basic Files Other Than Omnimark Scripts

chsl.ent
    Early Cyrillic character entity files developed by David J. Birnbaum and Ralph Cleminson.
chsl.wsd
    TEI Writing System Declaration for early Cyrillic developed by David J. Birnbaum and Ralph Cleminson.
chsl.sgml
    Sample SGML text file, to be parsed against the TEI DTDs.
om.cat
    Omnimark library catalog, mapped to the system identifiers in use on my system.
chsl.cmd
    OS/2 cmd script (like a DOS bat file or Unix shell script) to drive the conversions. Executing this script performs all the Omnimark transformations described in the text of the article.

2.2. Omnimark Scripts

2.2.1. General Comments

The system uses four basic Omnimark scripts (*.xom) to generate four output files (*.out). The two scripts that create new SDATA entity set files on disk create these as temporary files (*.tmp). The source files are then rewritten as new SGML (*.sgml) files that use the new SDATA entity sets, after which a special null Omnimark script (null.xom) is used to parse the new source files against the temporary SDATA entity files. Omnimark errors are written to log (*.log) files; there should be none. In tabular form:

2.2.2. Omnimark Scripts for In-Memory Processing

File Type          Character (ucs)      Glyph (afii)
Script (*.xom)     chsl_ucs_memory.xom  chsl_afii_memory.xom
Output (*.out)     chsl_ucs_memory.out  chsl_afii_memory.out
Error Log (*.log)  chsl_ucs_memory.log  chsl_afii_memory.log

2.2.3. Omnimark Scripts for Disk-Based Processing

File Type                                 Character (ucs)     Glyph (afii)
Script (*.xom)                            chsl_ucs_disk.xom   chsl_afii_disk.xom
Temporary SDATA Entity File (*.tmp)       chsl_ucs_disk.tmp   chsl_afii_disk.tmp
Temporary SGML Source File (*.sgml)       chsl_ucs_disk.sgml  chsl_afii_disk.sgml
Temporary File Generation Script (*.xom)  use_ucs.xom         use_afii.xom
Reparsing Script (*.xom)                  null.xom            null.xom
Output (*.out)                            chsl_ucs_disk.out   chsl_afii_disk.out
Error Log (*.log)                         chsl_ucs_disk.log   chsl_afii_disk.log
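
The first step of the disk-based workflow described in 2.2.1, writing a new SDATA entity set whose replacement text is a numerical identifier, might look roughly like the following Python sketch. It is not a translation of the Omnimark scripts: the entity names, the code values, and the bracketed replacement-text format are assumptions chosen for illustration.

```python
def sdata_entity_set(values, scheme="UCS"):
    """Return SGML entity declarations mapping each name to a numeric identifier.

    `values` is a hypothetical name-to-code mapping standing in for data
    that would be extracted from the WSD.
    """
    lines = []
    for name in sorted(values):
        # One SDATA entity declaration per entity name
        lines.append(
            '<!ENTITY {} SDATA "[{} {:04X}]">'.format(name, scheme, values[name])
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; real ones would come from chsl.wsd.
    sample = {"bcy": 0x0431, "acy": 0x0430}
    print(sdata_entity_set(sample), end="")
```

The returned text would be saved as the temporary entity file (for example chsl_ucs_disk.tmp), after which the rewritten source file is parsed against it.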

3. Notes

3.1. Temporary Files

Temporary files, which would normally be deleted at the end of the script, are retained here for examination.

3.2. Basic TEI Files

The full TEI P3 distribution, including a modified SGML declaration (here sgmldecl.tei) and files added after the original release to support formal public identifiers, is required. TEI support for FPIs requires additional modification to the TEI distribution, as described in my separate Notes on TEI Support for Formal Public Identifiers (FPIs).

3.3. To Be Done

The present scripts are hard-coded for specific filenames. A general production system would generate filenames dynamically from the input files (SGML source, SDATA entities, WSDs).
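
A minimal sketch of such dynamic filename generation, in Python rather than Omnimark, is given below. It assumes that the naming pattern visible in the tables of section 2.2 (source stem, then ucs or afii, then memory or disk) would be kept; the function name and the dictionary keys are hypothetical.

```python
from pathlib import PurePath


def derived_filenames(source, scheme, mode):
    """Derive script, output, and log names from an SGML source filename.

    scheme is "ucs" or "afii"; mode is "memory" or "disk". The pattern
    follows the filenames listed in section 2.2 and is an assumption
    about how a production system might name its files.
    """
    stem = PurePath(source).stem  # "chsl.sgml" -> "chsl"
    base = "{}_{}_{}".format(stem, scheme, mode)
    names = {
        "script": base + ".xom",
        "output": base + ".out",
        "log": base + ".log",
    }
    if mode == "disk":
        # Disk-based processing also produces two temporary files
        names["entities"] = base + ".tmp"
        names["sgml"] = base + ".sgml"
    return names


if __name__ == "__main__":
    for key, value in sorted(derived_filenames("chsl.sgml", "ucs", "disk").items()):
        print(key, value)
```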


Last modified 1998-12-02 by David J. Birnbaum (djb@clover.slavic.pitt.edu)