A Standard Specification for a Braille Translation System (continued)

(This page concludes the previous page.)

Intermezzo

The previous page described the first part of a new XML format for specifying braille translation systems. That first part contains basically the same data that appears in a typical "translation table" but organizes it in a more useful way. This page describes the remaining part of this new XML format. This remaining part contains new data that makes it possible for a braille translation application to carry out a variety of new functions including providing specialized translation for tagged items.

II.5 Sets File

The sets.xml file is the first major new feature of the new XML format. This file provides the user with a simple mechanism for specifying arbitrary subsets of the replacement rules in the signs.xml file and, optionally, for including the information in the files listed in the exceptions.xml file. The ability to specify a particular subset of replacement rules is intended to support applications which can create and use different tables for translating special items such as proper names without requiring the user to supply these tables directly. The sets.xml file can also be used in conjunction with the rules.xml file to support translating certain items with special algorithms as well as with specially-constructed translation tables.

Sometimes using a different algorithm can be an alternative to using a different table. For example, uncontracted braille can be produced from the same table as contracted braille by using a special algorithm that is restricted to replacing a single print character at a time. Alternatively, uncontracted braille can be produced using the same algorithm as for contracted braille simply by using a special table which doesn't have any contractions.
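
The two alternatives just described can be sketched in a few lines of Python. This is only an illustration, not XTrans code: the table contents are ASCII-style placeholder values rather than data from signs.xml, and the names longest_match, uncontracted_by_algorithm and uncontracted_by_table are hypothetical.

```python
# Illustrative table only: the braille values are ASCII-style placeholders,
# not data taken from signs.xml.
CONTRACTED_TABLE = {
    "the": "!", "and": "&", "ing": "+",
    "a": "a", "d": "d", "e": "e", "g": "g", "h": "h",
    "i": "i", "n": "n", "s": "s", "t": "t",
}


def longest_match(word, table, max_len=None):
    """Replace the longest print sequence found in `table` at each position.

    `max_len` optionally caps how many print characters one replacement
    may consume.
    """
    out, i = [], 0
    while i < len(word):
        limit = min(max_len or len(word), len(word) - i)
        for size in range(limit, 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            raise KeyError(f"no replacement for {word[i]!r}")
    return "".join(out)


# Option 1: same table, algorithm restricted to one print character at a time.
def uncontracted_by_algorithm(word):
    return longest_match(word, CONTRACTED_TABLE, max_len=1)


# Option 2: same algorithm, table stripped of its contractions.
LETTERS_ONLY_TABLE = {p: b for p, b in CONTRACTED_TABLE.items() if len(p) == 1}


def uncontracted_by_table(word):
    return longest_match(word, LETTERS_ONLY_TABLE)
```

Under these assumptions both options produce the same uncontracted output for any word whose letters are in the table, while longest_match with the full table and no cap produces the contracted form.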

Sets File DTD

Here is the DTD for the sets.xml file. Sets of replacement rules can be defined by referencing one or more groups of sign or altsign replacement rules by the common value of their type attribute and/or by referencing individual replacement rules by the contents of their print elements. Also, for convenience where not all of the rules in a given group are needed, the DTD includes elements for removing individual replacement rules which had been included as part of a group.


<!ENTITY % shared1 SYSTEM "setSigns.dtd" >
%shared1;
<!ENTITY % ruleTypes SYSTEM "ruleTypes.dtd" >
%ruleTypes;

<!ELEMENT sets (set+, setWFiles+)> 

<!ELEMENT set (signType*, 
               addSign*, removeSign*,
               addAltSign*, removeAltSign*)>
<!ATTLIST set
     name   (justBrlWords
            |%startingPW;
            |%midEndPW;
            |%interiorPW;
            |%endPW;
            |%specialWordSet;
            |%singleChars;) #REQUIRED>

<!ELEMENT setWFiles (fileName+, signType*, 
                     addSign*, removeSign*,
                     addAltSign*, removeAltSign*)>
<!ATTLIST setWFiles
     name   (%words;) #REQUIRED>


<!ELEMENT fileName (#PCDATA)>
<!ELEMENT signType EMPTY>

<!ATTLIST signType
   typeId (%signtype;) #REQUIRED >
<!ELEMENT addSign (#PCDATA)>
<!ELEMENT removeSign (#PCDATA)>
<!ELEMENT addAltSign (#PCDATA)>
<!ELEMENT removeAltSign (#PCDATA)>
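
As a rough sketch of how an application might turn one set element into a working replacement table, the following Python resolves signType groups first, then applies addSign and removeSign. The SIGNS dictionary is a hypothetical stand-in for the contents of signs.xml (its real layout is described on the previous page), the set name exampleSet is made up and is not one of the names the DTD permits, the braille values are placeholders, and altsign handling is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for signs.xml: replacement rules grouped by the
# value of their type attribute (print sequence -> braille placeholder).
SIGNS = {
    "letter": {"a": "a", "b": "b"},
    "anywherePartWord": {"ch": "*", "sh": "%"},
    "shortform": {"about": "ab"},
}

# Made-up set definition using the elements declared in the DTD.
SET_XML = """
<set name="exampleSet">
  <signType typeId="letter"/>
  <signType typeId="anywherePartWord"/>
  <addSign>about</addSign>
  <removeSign>sh</removeSign>
</set>
"""


def resolve_set(set_elem, signs):
    """Build a print -> braille table from one <set> element."""
    # Flat view of every rule, used to look up individually added signs.
    all_rules = {p: b for group in signs.values() for p, b in group.items()}
    table = {}
    # 1. Pull in whole groups of rules by type.
    for sign_type in set_elem.findall("signType"):
        table.update(signs[sign_type.get("typeId")])
    # 2. Add individual rules, referenced by their print text.
    for added in set_elem.findall("addSign"):
        table[added.text] = all_rules[added.text]
    # 3. Remove individual rules that came in as part of a group.
    for removed in set_elem.findall("removeSign"):
        table.pop(removed.text, None)
    return table


table = resolve_set(ET.fromstring(SET_XML), SIGNS)
```

Here the resulting table holds the two letters, "ch" and the individually added "about", while "sh" has been removed even though its group was included.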

Sets File for Standard Translation

Typically in a braille translation system all of the replacements are in a single list with each replacement somehow flagged to identify the contexts where that replacement may be used. While this strategy may save a bit of computer memory, it isn't needed for that purpose on modern computers. Here we see that many of the same replacements are included in the two example replacement sets, one which includes all replacements that can be used at the start of an ordinary word and one which includes all replacements that can be used in the interior of an ordinary word.

 <set name="defaultStartingPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="oneSyllableShortform"/>
  <signType typeId="shortform"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="beginningPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
  <addSign>-</addSign>
 </set>

 <set name="defaultInteriorPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="finalLetterContraction"/>
  <signType typeId="oneSyllableShortform"/>
  <signType typeId="shortform"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="midPartWord"/>
  <signType typeId="midEndPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
  <addSign>-</addSign>
 </set>

As an example of a specialized set, compare the following set of replacements which can be used at the start of proper names with the corresponding set for ordinary words.

 <set name="namesStartingPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="beginningPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
 </set>
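
The comparison between the ordinary-word set and the proper-name set can also be made mechanically. The Python sketch below copies the two starting-position sets shown above into a string and reports which sign types the proper-name set leaves out. It does not validate against the DTD (the wrapper here has no setWFiles element), and the helper name sign_types is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two starting-position sets from the examples above.
SETS_XML = """
<sets>
 <set name="defaultStartingPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="oneSyllableShortform"/>
  <signType typeId="shortform"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="beginningPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
  <addSign>-</addSign>
 </set>
 <set name="namesStartingPW">
  <signType typeId="largesign"/>
  <signType typeId="initialLetterContraction"/>
  <signType typeId="anywherePartWord"/>
  <signType typeId="beginningPartWord"/>
  <signType typeId="letter"/>
  <signType typeId="accentedLetter"/>
 </set>
</sets>
"""


def sign_types(set_elem):
    """Return the typeId values referenced by one <set> element."""
    return {st.get("typeId") for st in set_elem.findall("signType")}


root = ET.fromstring(SETS_XML)
by_name = {s.get("name"): s for s in root.findall("set")}

default_types = sign_types(by_name["defaultStartingPW"])
names_types = sign_types(by_name["namesStartingPW"])

# Sign types allowed at the start of ordinary words but not proper names.
dropped = sorted(default_types - names_types)
```

For these two sets, dropped contains the two shortform types; the proper-name set also omits the individually added hyphen sign.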

II.6 Rules File

The rules.xml file is perhaps the most significant feature of the new XML format. A translation rule is an association between a set of replacement rules and a translation algorithm. The rules file provides the user with a mechanism for associating appropriate sets of replacement rules with any of the translation rules or translation algorithms that are implemented in the target braille translation application.

In typical documents, the majority of words can be correctly translated to contracted braille by using the standard translation algorithm or rule which is the one that translates a word from left to right by continually replacing the longest possible print sequence with its locally eligible braille replacement. This "longest eligible" algorithm is used in EBAE for translating ordinary words, proper names, and the component parts of compound words albeit with different sets of replacements for proper names and the non-leading parts of compound words than for ordinary words and the leading parts of compound words. We saw earlier how the sets.xml file supports the creation of various sets of replacements for use with this standard rule.
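
A minimal Python sketch of a "longest eligible" pass follows. It is not the XTrans implementation. It assumes that eligibility depends only on where a candidate print sequence falls in the word (one table each for whole words, the start, the interior and the end), and all table contents are ASCII-style placeholders rather than data from signs.xml.

```python
import string

# Placeholder tables; real ones would be built from sets.xml and signs.xml.
LETTERS = {c: c for c in string.ascii_lowercase}
WHOLE_WORD = {"the": "!"}
START_PW = {**LETTERS, "be": "2", "in": "9"}
MID_PW = {**LETTERS, "ing": "+", "in": "9"}
END_PW = {**LETTERS, "ing": "+", "in": "9"}


def translate_longest_eligible(word, whole_word, start_pw, mid_pw, end_pw):
    """Translate left to right, always taking the longest eligible match."""
    if word in whole_word:
        return whole_word[word]
    out, i = [], 0
    while i < len(word):
        # Try the longest remaining print sequence first.
        for size in range(len(word) - i, 0, -1):
            chunk = word[i:i + size]
            # Assumption: the position of the chunk selects the table.
            if i == 0:
                table = start_pw
            elif i + size == len(word):
                table = end_pw
            else:
                table = mid_pw
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            raise KeyError(f"no eligible replacement at {word[i:]!r}")
    return "".join(out)


def translate(word):
    return translate_longest_eligible(word, WHOLE_WORD, START_PW, MID_PW, END_PW)
```

With these placeholder tables, translate("being") uses the start-of-word "be" sign followed by the end-of-word "ing" sign, whereas in translate("ingot") the "ing" sign is not eligible at the start of the word and the shorter "in" sign is used instead.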

However, most documents contain a few special types of words such as letter words, homonyms, and hesitations that cannot be translated correctly by the standard translation algorithm even with a special translation table; correct translations of these words require special translation algorithms as well as special translation tables. It isn't feasible to specify actual algorithms in an XML input file but if an application does implement certain special algorithms, it is certainly feasible to specify the replacements those algorithms should employ and the situations under which they should be used.

Rules file DTD for use with XTrans

The XTrans translator implements six different braille translation algorithms. These are supported by the following DTD which specifies the information required by each of these algorithms.

<!ENTITY % shared SYSTEM "shared.dtd" >
%shared;
<!ENTITY % ruleTypes SYSTEM "ruleTypes.dtd" >
%ruleTypes;

<!ELEMENT rules (longestEligible+, specialWords*, byCharacters+, 
                 bySyllables*, hesitate*, homographs*)> 

<!ELEMENT longestEligible (wholeWord, startPW, midPW, endPW)>
<!ATTLIST longestEligible
     name (%longestEligibleRuleNames;) #REQUIRED >

<!ELEMENT specialWords (wordSet)> 
<!ATTLIST specialWords
     name (%specialWordsRuleNames;) #REQUIRED >

<!ELEMENT byCharacters (charSet+)> 
<!ATTLIST byCharacters
     name (%byCharactersRuleNames;) #REQUIRED >

<!ELEMENT bySyllables (startPW, midendPW)> 
<!ATTLIST bySyllables
     name (%bySyllablesRuleNames;) #REQUIRED >

<!ELEMENT hesitate (startPW, startPWnoLow, midPW, midPWnoLow, endPW)> 
<!ATTLIST hesitate
     name  (%hesitateRuleNames;) #REQUIRED
     identified  CDATA #IMPLIED>

<!ELEMENT homographs (wholeWord)> 
<!ATTLIST homographs
     name  (%homographsRuleNames;) #REQUIRED>

<!ELEMENT wholeWord EMPTY>
 <!ATTLIST wholeWord set (%words;) #REQUIRED >   
<!ELEMENT startPW EMPTY>
 <!ATTLIST startPW set (%startingPW;) #REQUIRED >
<!ELEMENT midPW EMPTY>
 <!ATTLIST midPW set (%interiorPW;) #REQUIRED >
<!ELEMENT endPW EMPTY>
 <!ATTLIST endPW set (%endPW;) #REQUIRED >
<!ELEMENT wordSet EMPTY>
 <!ATTLIST wordSet set (%specialWordSet;) #REQUIRED > 
<!ELEMENT charSet EMPTY>
 <!ATTLIST charSet set (%singleChars;) #REQUIRED > 
<!ELEMENT midendPW EMPTY>
 <!ATTLIST midendPW set (%midEndPW;) #REQUIRED > 
<!ELEMENT startPWnoLow EMPTY >
 <!ATTLIST startPWnoLow set (%startingPW;) #REQUIRED > 
<!ELEMENT midPWnoLow EMPTY>
 <!ATTLIST midPWnoLow set (%interiorPW;) #REQUIRED >

Sample rules used with XTrans

Here are three examples of rules that can be used with XTrans. The first two examples use the same standard translation algorithm but with different sets of replacement rules. The last example uses a specialized translation algorithm with the required sets of specialized replacement rules.

 <longestEligible name="default">
  <wholeWord set="defaultWords"/>
  <startPW set="defaultStartingPW"/>
  <midPW set="defaultInteriorPW"/>
  <endPW set="defaultEndingPW"/>
 </longestEligible>

 <longestEligible name="properNames">
  <wholeWord set="defaultNames"/>
  <startPW set="namesStartingPW"/>
  <midPW set="defaultInteriorPW"/>
  <endPW set="defaultEndingPW"/>
 </longestEligible>

 <bySyllables name="syllabified">
  <startPW set="syllabifiedPWStart"/>
  <midendPW set="syllabifiedPWMidEnd"/>
 </bySyllables>

(The reason that both the default and properNames rules can use some of the same replacement sets is that XTrans doesn't allow shortforms to be used as replacements other than at the start of a word. Shortforms that occur elsewhere in a word are handled as individual exceptions via one of the exceptions files.)

II.7 Semantic Tags File

Finally we come to how BrailleSpec interfaces with print document markup such as ZedAI. ZedAI provides for a lot of markup intended to support braille production. But, of course, this markup isn't useful unless the targeted braille production application knows what to do when it encounters the markup.

A BrailleSpec file uses a very simple mechanism for communicating how a BrailleSpec-enabled translation system should translate marked-up words. It simply matches the text used as markup, specified here as the text contents of an element named userTag, with one of the named, user-specified, translation rules defined in the just-described rules file.


<!ENTITY % shared SYSTEM "shared.dtd" >
%shared;

<!ELEMENT semanticTags (semanticTag+)>
<!ELEMENT semanticTag (userTag, rule)>
<!ELEMENT userTag (#PCDATA)>
<!ELEMENT rule EMPTY>
<!ATTLIST rule 
          name  (%longestEligibleRuleNames;
                |%byCharactersRuleNames;
                |%bySyllablesRuleNames;
                |%hesitateRuleNames;
                |%homographsRuleNames;
                ) #REQUIRED >
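
To illustrate the matching step, here is a small Python sketch that loads a semantic tags file into a lookup from markup text to rule name. The userTag values are invented for the example, the rule names are taken from the sample rules shown earlier, and falling back to the default rule for unlisted markup is an assumption about how an application might behave, not something the format specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic tags file; the userTag texts are made up.
SEMANTIC_TAGS_XML = """
<semanticTags>
  <semanticTag>
    <userTag>name</userTag>
    <rule name="properNames"/>
  </semanticTag>
  <semanticTag>
    <userTag>syllabified</userTag>
    <rule name="syllabified"/>
  </semanticTag>
</semanticTags>
"""


def load_tag_rules(xml_text):
    """Map each userTag's text to the name of its translation rule."""
    root = ET.fromstring(xml_text)
    return {
        entry.findtext("userTag").strip(): entry.find("rule").get("name")
        for entry in root.findall("semanticTag")
    }


def rule_for(tag, tag_rules, fallback="default"):
    """Pick the rule for a marked-up word; unlisted markup uses the fallback."""
    return tag_rules.get(tag, fallback)


tag_rules = load_tag_rules(SEMANTIC_TAGS_XML)
```

A translator built along these lines would call rule_for with the markup found around a word and then run the algorithm and replacement sets that the named rule specifies in the rules file.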

Other Files

The modular structure of the BrailleSpec format makes it easy to add additional information to support the features of a particular braille translator. For example, the BrailleSpec fileset used with the latest version of XTrans includes the specification for the BANA Computer Braille Code which is a simple auxiliary braille code often used in conjunction with EBAE.

As another possibility, it would be straightforward to specify a file containing the information describing a system of graded or beginner braille such as US Patterns or UK Learner Braille.

Summary

Braille systems are unavoidably complex because their goal is to convey print documents as meaningfully and unambiguously as possible within the constraints imposed by use of a limited character set and by the terseness necessary for efficient tactile reading. In fact, braille systems are becoming more complex as print documents become more complex.

The ZedAI specification for print source documents is intended to support braille production via adequate markup. This article describes one way to design the braille translation function of a braille production application so as to take advantage of ZedAI markup to produce accurate braille translations.


First posted September 2, 2010. Contact info at dotlessbraille dot org
Updated version posted September 23, 2010