SGML Conversion


General

After character conversion, the *.SHW files that were identified for preservation were converted to SGML with custom SNOBOL4 scripts. The SNOBOL4 implementation was Phil Budne's SNOBOL-in-C (C-MAINBOL version 0.99.3).

Modification: The SIL source for SNOBOL sets an input record length limit of 132 (originally 80!), which caused problems with some of the scripts used in this project. The limit was raised by changing 132 to 1024 in the CARDSZ define in equ.h, so that it reads

# define CARDSZ (1024)

and then recompiling.
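
To see which files are affected by the record length limit before rebuilding, a check along the following lines may help. This is only a sketch in Python, not part of the project's SNOBOL4 tooling. The function name overlong_records is hypothetical, and it assumes that the limit counts bytes per line (excluding the line terminator) and that only records longer than the limit cause trouble.

```python
def overlong_records(lines, limit=132):
    """Return (line_number, length) for each record longer than `limit` bytes.

    `lines` is any iterable of bytes objects, e.g. a file opened in "rb" mode.
    Assumption: the limit applies to the record without its line terminator.
    """
    hits = []
    for number, raw in enumerate(lines, start=1):
        length = len(raw.rstrip(b"\r\n"))
        if length > limit:
            hits.append((number, length))
    return hits


# Example use on a script or data file (path supplied by the caller):
#     with open(path, "rb") as handle:
#         print(overlong_records(handle))
```

An empty result would suggest that the stock limit of 132 is sufficient for that file.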


Frequency Tables

The only frequency file identified for conversion was FREQSHAW.SHW, a space-delimited report with three columns, consisting of a wordform and two sometimes-differing frequency counts (exact meaning to be determined). Sample input looks like:

A                                  815     808
AVGUSTOV                             1       1
AVGUSTOM                             1       1

The output of character conversion with the shaw-sgml.rus translit filter looks like:

а                                     815     808
августов    1       1
августом    1       1

The DTD designed for this report was:

<!doctype pfreq [
<!element pfreq - - (entry)+>
<!element entry - - (lexeme,frequency,remainder)>
<!element (lexeme | frequency | remainder) - - (#PCDATA)>
<!entity % ISOcyr1 SYSTEM 'ISOcyr1.ent'>
%ISOcyr1;
]>

<frequency> represents the middle column. Two-column frequency files include the second column but not the third, which suggests that the second is the one that represents frequency. <remainder> contains the third field. These labels will be changed once the meaning of the fields has been determined conclusively.
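
The content model above (lexeme, frequency, remainder, in that order, all with required start and end tags) can be spot-checked on individual output lines. The following Python sketch is a rough line-level check, not an SGML parser and not part of the project. The name parse_entry is hypothetical, and requiring digits in the two count fields is an assumption stricter than the DTD, which allows any #PCDATA.

```python
import re

# One <entry> per line, children in the order the DTD requires.
# Assumption: both count fields hold only digits (the DTD itself says #PCDATA).
ENTRY_RE = re.compile(
    r"<entry><lexeme>(?P<lexeme>[^<]+)</lexeme>"
    r"<frequency>(?P<frequency>\d+)</frequency>"
    r"<remainder>(?P<remainder>\d+)</remainder></entry>"
)


def parse_entry(line):
    """Return (lexeme, frequency, remainder), or None if the line does not fit."""
    match = ENTRY_RE.fullmatch(line.strip())
    if match is None:
        return None
    return (
        match.group("lexeme"),
        int(match.group("frequency")),
        int(match.group("remainder")),
    )
```

A line that lacks one of the three child elements, or has them in a different order, yields None.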

Sample SGML output from frequency.sno looks like:

<entry><lexeme>&acy;</lexeme><frequency>815</frequency><remainder>808</remainder></entry>
<entry><lexeme>&acy;&vcy;&gcy;&ucy;&scy;&tcy;&ocy;&vcy;</lexeme><frequency>1</frequency><remainder>1</remainder></entry>
<entry><lexeme>&acy;&vcy;&gcy;&ucy;&scy;&tcy;&ocy;&mcy;</lexeme><frequency>1</frequency><remainder>1</remainder></entry>
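
For readers without a SNOBOL4 interpreter, the transformation shown above can be approximated in Python. This is a sketch under stated assumptions, not a port of frequency.sno, whose source is not reproduced here. The names ENTITY, to_entities, and convert_line are hypothetical. Only the lowercase Russian letters are mapped to their ISOcyr1 entity names, and any other character is passed through unchanged.

```python
# ISOcyr1 entity names for lowercase U+0430..U+044F, in code-point order.
_NAMES = (
    "acy bcy vcy gcy dcy iecy zhcy zcy icy jcy kcy lcy mcy ncy ocy pcy "
    "rcy scy tcy ucy fcy khcy tscy chcy shcy shchcy hardcy ycy softcy "
    "ecy yucy yacy"
).split()

ENTITY = {chr(0x430 + i): "&%s;" % name for i, name in enumerate(_NAMES)}
ENTITY["\u0451"] = "&iocy;"  # lowercase io, which sits outside the main range


def to_entities(word):
    """Replace each mapped Cyrillic letter with its entity reference."""
    return "".join(ENTITY.get(ch, ch) for ch in word)


def convert_line(line):
    """Turn one whitespace-delimited three-column record into an <entry>."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected 3 columns, got %d" % len(fields))
    lexeme, frequency, remainder = fields
    return (
        "<entry><lexeme>%s</lexeme><frequency>%s</frequency>"
        "<remainder>%s</remainder></entry>"
        % (to_entities(lexeme), frequency, remainder)
    )
```

Splitting on runs of whitespace accommodates the varying column spacing in the sample input. A fuller version would also need uppercase letters and escaping of markup characters such as & and <.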
    

David J Birnbaum
Last modified: Mon Jun 30 17:10:20 EDT