The Pushkin files recovered from tapes b8402, b8410, and b8411 use two character mappings: a mnemonic one designed by Tom Shaw and used for *.SHW files, and a printer-specific one used to support the original *.OUT printed-output files.
Character conversion was implemented with Jan Labanowski's translit program and a locally created mapping file, shaw-sgml.rus, that maps between the Shaw encoding and ISOcyr1 entities. The conversion script assumes that all alphabetic text is Cyrillic, so it will need to be modified to handle the concordance files, which mark off Pushkin's Latin interpolations with _{SL} and {SC}+ delimiters. (The same is probably true of the original poetry files, which have not been found.) An improved script will also handle angle brackets and will convert the pipes (|) that precede stressed vowels into SGML tags. The current mapping file was designed for the frequency table, where the text is entirely upper-case, and does not handle lower-case letters at all; it will have to be extended for the poetry and concordance files, which use mixed case.
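The planned changes for the concordance files might take roughly the following shape. This is a minimal Python sketch, not the project's script, and it rests on assumptions that are also flagged in the comments: that _{SL} opens a Latin span and {SC}+ closes it, that handling angle brackets means escaping them as &lt; and &gt;, and that the stressed vowel is the single character or single entity that follows the pipe. The tag name stress and the function names convert_outside_latin and mark_stress are hypothetical.

```python
import re

# Assumption (hypothetical): "_{SL}" opens a Latin interpolation and "{SC}+"
# closes it; the real concordance markup may differ. DOTALL lets a span
# cross line breaks.
LATIN_SPAN = re.compile(r"(_\{SL\}.*?\{SC\}\+)", re.DOTALL)

# Assumption: the stressed vowel is whatever follows the pipe, either one
# character or one SGML entity such as &icy; once transliteration has run.
STRESSED = re.compile(r"\|(&[A-Za-z0-9]+;|.)")


def convert_outside_latin(text, convert):
    """Apply `convert` to the Cyrillic stretches only, leaving Latin spans as-is."""
    parts = LATIN_SPAN.split(text)
    # With one capturing group, re.split puts the Latin spans at odd indices.
    return "".join(part if i % 2 else convert(part) for i, part in enumerate(parts))


def mark_stress(text, tag="stress"):
    """Escape angle brackets, then replace each |vowel with <tag>vowel</tag>.

    Meant to run after transliteration, so that the inserted tag names are
    never treated as Cyrillic text by the converter.
    """
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    return STRESSED.sub(lambda m: f"<{tag}>{m.group(1)}</{tag}>", text)
```

In use, the two steps would be chained as mark_stress(convert_outside_latin(line, convert)), where convert stands for whatever performs the Shaw-to-entity step. Escaping the brackets before inserting the tags keeps the new tags intact, and tagging after transliteration keeps their letters away from a converter that treats all alphabetic text as Cyrillic.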
Sample input looks like:
A 815 808 AVGUSTOV 1 1 AVGUSTOM 1 1
The output of character conversion with the shaw-sgml.rus translit filter looks like:
а 815 808 августов 1 1 августом 1 1
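The substitution that translit performs in this step can be approximated in a few lines of Python. The sketch is hypothetical: the table covers only the eight letters that occur in the sample line (the real shaw-sgml.rus is not reproduced here), and greedy longest-match is a common strategy for multi-character transliteration tables rather than a description of translit's own matching rules. Resolving the ISOcyr1 entities with Python's html.unescape, which also recognizes these entity names, gives the Cyrillic text shown in the sample output.

```python
import html

# Hypothetical partial table: only the Shaw letters that occur in the sample
# line, mapped to ISOcyr1 entity names. The real shaw-sgml.rus covers the
# whole alphabet and may use multi-character Shaw sequences.
SHAW_TO_ISOCYR1 = {
    "A": "&acy;", "G": "&gcy;", "M": "&mcy;", "O": "&ocy;",
    "S": "&scy;", "T": "&tcy;", "U": "&ucy;", "V": "&vcy;",
}


def transliterate(text, table):
    """Greedy longest-match replacement; unmapped characters pass through."""
    longest = max(map(len, table))
    out, i = [], 0
    while i < len(text):
        # Try the longest possible key first, then progressively shorter ones.
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No key matched: copy the character (digits, spaces, lower case).
            out.append(text[i])
            i += 1
    return "".join(out)


def shaw_to_cyrillic(text):
    """Convert Shaw text to ISOcyr1 entities, then resolve them for display."""
    return html.unescape(transliterate(text, SHAW_TO_ISOCYR1))
```

Because unmapped characters pass through unchanged, lower-case Shaw input would come out untouched, which mirrors the upper-case-only limitation of the current mapping file.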
*.OUT files were not intended to be read by humans, and were set aside at the outset of the project in the hope that all information in the *.OUT files would also be available in the *.SHW files. When I later decided that it might be important to examine some *.OUT files that did not appear to have *.SHW counterparts (at least on the three tapes that we were able to read at the University of Pittsburgh), I developed a rudimentary out-shaw.rus translit script for conversion from the *.OUT encoding to the regular Shaw encoding. This script was created by using unoutify.sno to reverse a sed script I found on tape b8402 and then making a small number of modifications by hand. It did not convert formatting codes properly (some *.SHW codes were conflated during conversion to *.OUT, and undoing the conflation would have required more context analysis than seemed justified), but it did render the textual content legible.
Sample input looks like:
[j3][it1m]Nfv, ult K. cnhe;prbncz[cf1][lfC3.250.3[ep [j3][it1m]Rby;fk K;oa=];ea nty==w M;oa=];ea[cf1][lfTJ.10.9.9[ep
The output of character conversion with the out-shaw.rus translit filter looks like:
{5}Tam, gde L. stru|its$a{M}{6}S3.250.3{7}
{5}Kinhal L<?> ten$i B<?>{M}{6}EO.10.9.9{7}
The original *.SHW counterpart of the preceding example looks like:
{5}Tam, gde L. stru|its$a{6}S3.250.3{7}
{5}Kinhal L<?> ten$i B<?>{6}EO.10.9.9{7}
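The reversal step can be illustrated with a hypothetical Python sketch. The original work was done with unoutify.sno, and the format of the sed script on tape b8402 is not reproduced here, so the code assumes that each rule is a literal s/SHW/OUT/g substitution with no regular-expression metacharacters. Reversing the rules also shows why formatting codes could not be fully recovered: when several *.SHW strings were rewritten to the same *.OUT string, a reversed table has to pick one of them. The names reverse_sed and unescape_sed are illustrative only.

```python
import re

# Assumption: every useful line of the forward (SHW -> OUT) sed script is a
# literal substitution "s/SHW/OUT/g"; regex metacharacters are not handled.
SED_SUBST = re.compile(r"^s/((?:[^/\\]|\\.)*)/((?:[^/\\]|\\.)*)/g?$")


def unescape_sed(s):
    """Drop sed backslash escapes, e.g. turn a backslash-slash into a slash."""
    return re.sub(r"\\(.)", r"\1", s)


def reverse_sed(script):
    """Return (OUT -> SHW pairs, conflated outputs) from a forward sed script."""
    sources = {}
    for line in script.splitlines():
        m = SED_SUBST.match(line.strip())
        if m is None:
            continue  # comments, blank lines, and non-literal commands
        shw, out = (unescape_sed(g) for g in m.groups())
        sources.setdefault(out, []).append(shw)
    # Where several SHW strings produced the same OUT string the reverse is
    # ambiguous: keep the first source and report all of them for hand-editing.
    pairs = [(out, shws[0]) for out, shws in sources.items()]
    conflated = {out: shws for out, shws in sources.items() if len(shws) > 1}
    return pairs, conflated
```

For example, given the two rules s/x/1/g and s/y/1/g, reverse_sed keeps the pair ("1", "x") and reports {"1": ["x", "y"]} as conflated, which is the kind of case that would call for the context analysis described above.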