A Proposal for a Standard Specification of Braille Translation Systems for Input to Braille Production Software

Executive Summary

There is currently no standard electronic format for specifying the translation component of a braille system for use by braille transcribing software or for related purposes. This article details BrailleSpec, a proposed electronic format that has been tested for specifying print-to-braille translation according to English Braille American Edition (EBAE). This format has a number of advantages, which are outlined in the Overview.

Note that the present version of BrailleSpec does not address the difficult issue of specifying how braille indicators are used. Achieving this would require designing some sort of language for representing indicator use.

Overview

The version of the BrailleSpec format as described here accommodates the translation rules of English Braille American Edition (EBAE) but will likely need extensions for other braille systems. I welcome feedback as to needed improvements.

This new format is not simply an XML-tagged version of a standard translation table. It is more complete: it supports improved translation accuracy and new features by providing considerably more information than is present in current translation tables. As an example of support for improved accuracy, the user only needs to add a few lines for a translation application to have the information needed to automatically produce specialized translation tables, such as those used by some braille systems to translate proper names. Examples of new features easily supported by the BrailleSpec format are user-specified systems of graded braille and summary reports on contraction usage in translated documents.

Since this proposed XML format is quite simple in comparison with ZedAI, I've chosen to use DTDs rather than Schemas to define it. These DTDs are detailed below.

I. Master File

The current braille specification format uses a single simple master file which references a number of supporting files via XML include statements. Here is its DTD:

<!ELEMENT brailleSystem (identifier, xi:include+)> 
<!ATTLIST brailleSystem xmlns:xi CDATA #IMPLIED>
<!ELEMENT identifier (#PCDATA)> 
<!ELEMENT xi:include EMPTY>
<!ATTLIST xi:include href CDATA #REQUIRED>         
And here are the first few lines of the master file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE brailleSystem SYSTEM "brailleSystem.dtd" [	
]>
<brailleSystem xmlns:xi="http://www.w3.org/2001/XInclude">
 <identifier>English Braille American Edition, 2007 Update</identifier>
 <xi:include href="cells.xml"/>
 <xi:include href="signs.xml"/>
 ...
</brailleSystem>
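
As a rough illustration of how an application might read the master file, here is a minimal Python sketch using only the standard library. It assumes an abbreviated copy of the master file shown above (the XML declaration and DOCTYPE are left out for brevity), and the function name list_supporting_files is hypothetical, not part of the proposal. It simply collects the identifier and the href of each xi:include element; it does not resolve the includes.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the master file's xmlns:xi declaration.
XI_NS = "http://www.w3.org/2001/XInclude"

# Abbreviated master file (declaration and DOCTYPE omitted for brevity).
MASTER = """<brailleSystem xmlns:xi="http://www.w3.org/2001/XInclude">
 <identifier>English Braille American Edition, 2007 Update</identifier>
 <xi:include href="cells.xml"/>
 <xi:include href="signs.xml"/>
</brailleSystem>"""


def list_supporting_files(master_xml):
    """Return (identifier, [href, ...]) for a master file (hypothetical helper)."""
    root = ET.fromstring(master_xml)
    identifier = root.findtext("identifier")
    # ElementTree spells namespaced tags as {namespace-uri}localname.
    hrefs = [el.get("href") for el in root.findall(f"{{{XI_NS}}}include")]
    return identifier, hrefs


identifier, hrefs = list_supporting_files(MASTER)
print(identifier)
print(hrefs)
```

An application would then load each listed supporting file in turn, in the order given.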

II. Supporting Files

The individual supporting files are modular with each specifying one particular type of data.

Typical translation tables combine the data that is represented here in two separate files, the Signs File and the Restrictions File. However, these tables do not typically include the additional data represented in the numerous other files.

II.1 Cells File

The first supporting file specifies the representation of the braille cells used in the other supporting files by reference to the corresponding Unicode Braille Patterns. Use of this approach makes it possible for the user to employ any desired representation. Here I've chosen to use North American ASCII Braille as I find it the easiest to verify.

This is the complete DTD for the cells file:

<!ELEMENT cells (cell+)>
<!ELEMENT cell (ABrl, Unicode)>
<!ELEMENT ABrl (#PCDATA)>
<!ELEMENT Unicode (#PCDATA)>
<!ATTLIST Unicode dots CDATA #IMPLIED>
Here are the first few lines of the cells.xml file:
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE brailleSystem SYSTEM "brailleSystem.dtd" [	
]>

<cells>
 <cell><ABrl>A</ABrl><Unicode dots="1">&#x2801;</Unicode></cell>
 <cell><ABrl>B</ABrl><Unicode dots="12">&#x2803;</Unicode></cell>
 <cell><ABrl>C</ABrl><Unicode dots="14">&#x2809;</Unicode></cell>
 <cell><ABrl>D</ABrl><Unicode dots="145">&#x2819;</Unicode></cell>
...
Note that if a Unicode-compatible simulated braille font is installed, the simulated braille glyphs will be displayed when a cells.xml file is viewed in a browser.
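
The following Python sketch suggests one way an implementation might load the cells file into a lookup table and cross-check the optional dots attribute against the Unicode character. The check relies on the standard layout of the Unicode Braille Patterns block, in which dot n sets bit n-1 above U+2800. The function names and the error handling are assumptions made for illustration; the sample data is the four-cell fragment shown above, without the DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Four-cell fragment of cells.xml (DOCTYPE omitted for brevity).
CELLS = """<cells>
 <cell><ABrl>A</ABrl><Unicode dots="1">&#x2801;</Unicode></cell>
 <cell><ABrl>B</ABrl><Unicode dots="12">&#x2803;</Unicode></cell>
 <cell><ABrl>C</ABrl><Unicode dots="14">&#x2809;</Unicode></cell>
 <cell><ABrl>D</ABrl><Unicode dots="145">&#x2819;</Unicode></cell>
</cells>"""


def dots_to_unicode(dots):
    """Convert a dot string such as "145" to its Unicode Braille Pattern."""
    mask = 0
    for d in dots:
        mask |= 1 << (int(d) - 1)  # dot n corresponds to bit n-1
    return chr(0x2800 + mask)


def load_cells(xml_text):
    """Map each ABrl representation to its Unicode braille character."""
    table = {}
    for cell in ET.fromstring(xml_text).findall("cell"):
        ascii_cell = cell.findtext("ABrl")
        uni = cell.find("Unicode")
        dots = uni.get("dots")  # optional (#IMPLIED) in the DTD
        if dots is not None and dots_to_unicode(dots) != uni.text:
            raise ValueError(f"dots attribute disagrees for cell {ascii_cell!r}")
        table[ascii_cell] = uni.text
    return table


cell_table = load_cells(CELLS)
print(cell_table)
```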

II.2 Signs File

The second supporting file specifies the official print-to-braille replacements specified by the braille system. This is in contrast to a typical undifferentiated translation table, which contains both official and ad hoc replacements. This file should only be modified to fix errors or to reflect changes in the official rules of the targeted braille system.

Signs File DTD

This is the DTD for the signs file, with the exception of the full specification of the optional choiceSign section, which is used to simplify the application to American English braille. (The type attribute of a sign element plays a role similar to what is called an opcode in liblouis and similar translation software.)

<!ENTITY % shared1 SYSTEM "setSigns.dtd" >
%shared1;
<!ELEMENT signs (sign+, altsign*, indicatorSign+, choicesign*)> 

<!ELEMENT sign (print, braille, DotlessBraille* )> 
<!ATTLIST sign
     type   (%signtype;)
            #REQUIRED
     uname CDATA   #IMPLIED
     unique (no)   #IMPLIED> 
   
<!ELEMENT altsign (print, braille, DotlessBraille*)> 
<!ATTLIST altsign
     type  (numeric|smartApos|lower|altPrimes) #REQUIRED>

<!ELEMENT print (#PCDATA)>
<!ELEMENT braille (#PCDATA)>
<!ELEMENT DotlessBraille (#PCDATA)>

<!ELEMENT indicatorSign ( braille, DotlessBraille* )>
<!ATTLIST indicatorSign
          name   (%indicatorNames;) #REQUIRED
          unique (no)   #IMPLIED> 
Signs File Example

Since the signs.xml file is a significant part of the specification, it is worth examining an example in detail. Note that there can be up to four different kinds of top-level elements.

First, from the standpoint of implementation it is desirable to be able to use the contents of the print elements as keys to the corresponding replacements. It turns out that, at least for English braille, this requires two separate types of sign elements, sign and altsign, because there are a few cases, including the print period, where English braille uses a different replacement depending on the semantics of the character. Here is a fragment of the file with examples of both types of elements.

...
 <sign type="accentedLetter" uname="cap A with grave">
  <print>À</print>
  <braille>A</braille>
  <DotlessBraille>À</DotlessBraille>
 </sign>
...
 <sign type="initialLetterContraction">
  <print>day</print>
  <braille>"D</braille>
  <DotlessBraille>&#x00E3;&#x00E4;</DotlessBraille>
 </sign>
 <sign type="initialLetterContraction">
  <print>there</print>
  <braille>"!</braille>
  <DotlessBraille>čĎ</DotlessBraille>
 </sign>
...
 <sign type="postPunc">
  <print>.</print>
  <braille>4</braille>
  <DotlessBraille>.</DotlessBraille>
 </sign>
...
<!-- Decimal Point, not period -->
 <altsign type="numeric">
  <print>.</print>
  <braille>.</braille>
  <DotlessBraille>&#x0223;</DotlessBraille>
 </altsign> 

Now let's examine the elements in more detail. Each sign element must have a type attribute with its value chosen from the specified list in order that groups of replacement elements with the same value for their type attribute can be referenced elsewhere. The actual attribute values could be arbitrary text but, as these examples illustrate, it can be quite useful to choose text with mnemonic significance to persons familiar with the braille system being specified. The user can employ the optional uname attribute to better identify an unfamiliar replacement. (The unique attribute is used to help address some inconsistencies between different specifications for American English braille.)

The print and braille elements together form the actual replacement rule, with the braille cell or cells represented according to the specification in the cells.xml file. The optional DotlessBraille element specifies the corresponding glyph code(s) in the DotlessBraille font. This element can also be used to provide unique character codes for each distinct use of the same braille cell.
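
To make the keying idea concrete, here is a hedged Python sketch of how an implementation might load sign and altsign elements into separate lookup tables, so that the print period can map to one replacement as punctuation and another as a decimal point. The sample is an abbreviated fragment of the example above (DotlessBraille and indicatorSign elements are omitted, so it would not validate against the DTD), and the function name and the shape of the returned tables are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated fragment of signs.xml, for illustration only.
SIGNS = '''<signs>
 <sign type="initialLetterContraction">
  <print>day</print>
  <braille>"D</braille>
 </sign>
 <sign type="initialLetterContraction">
  <print>there</print>
  <braille>"!</braille>
 </sign>
 <sign type="postPunc">
  <print>.</print>
  <braille>4</braille>
 </sign>
 <altsign type="numeric">
  <print>.</print>
  <braille>.</braille>
 </altsign>
</signs>'''


def load_signs(xml_text):
    """Return (signs, altsigns, by_type) lookup tables (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    signs = {}                   # print text -> braille
    by_type = defaultdict(list)  # type attribute -> list of print texts
    for el in root.findall("sign"):
        print_text = el.findtext("print")
        signs[print_text] = el.findtext("braille")
        by_type[el.get("type")].append(print_text)
    # Alternate replacements are keyed by (type, print) so that they never
    # collide with the primary replacement for the same print text.
    altsigns = {
        (el.get("type"), el.findtext("print")): el.findtext("braille")
        for el in root.findall("altsign")
    }
    return signs, altsigns, dict(by_type)


signs, altsigns, by_type = load_signs(SIGNS)
print(signs["."], altsigns[("numeric", ".")])
print(by_type["initialLetterContraction"])
```

Grouping the print texts by type attribute is what would let other files refer to a whole class of replacements by name.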

The third type of signs element is the indicatorSign element. Although braille indicators are not actually replacements but, rather, markup unique to braille, it is again desirable from the standpoint of implementation to include their representations here. (One important feature currently missing from this braille specification is a generic method for encoding the rules for the use of braille indicators.)

II.3 Restrictions File

The next supporting file is the optional restrictions file. With contracted braille it can be necessary to restrict the use of certain contractions in certain words in order to enhance readability. A still common way of implementing these restrictions, which was originally proposed in 1970 by Dr. Jonathan Millen of Mitre Corporation, is to add additional ad hoc replacement rules similar to the official replacement rules. (This and an alternative approach are described in more detail in a separate article.)

It is typical to include these ad hoc replacement rules together with the official rules in a single translation table. Separating them makes it easier for the user to identify which rules can be changed as necessary to improve translation accuracy. It also makes it easier for the ad hoc rules to be represented in terms of the official rules.

Having both official and ad hoc rules in a single undifferentiated table has other disadvantages in addition to inconveniencing the user. It has, for example, led to confusion for persons otherwise unfamiliar with the braille system, in that they may incorrectly believe the ad hoc rules to be part of the official system. Also, having all of the rules in a single table (or file) makes it difficult to utilize translation algorithms that don't employ the ad hoc rules. (Note that my experience is that a person who is familiar with a braille system can, in an hour or so of concentrated effort, edit an undifferentiated translation table containing both types of rules so as to separate the official rules from the ad hoc ones. Of course, once this has been done, one can easily develop a simple application to reconstruct the original table as necessary.)

Here is the DTD:

<!ENTITY % shared1 SYSTEM "setssigns.dtd" >
%shared1;
<!ENTITY % signEl "sign|altsign">

<!ELEMENT restrictions (restriction+)> 

<!ELEMENT restriction (input, use)> 
<!ATTLIST restriction
     type  (%signtype;) #REQUIRED 
     example CDATA #IMPLIED> 

<!ELEMENT input (#PCDATA)>
<!ELEMENT use (print+)>
<!ELEMENT print (#PCDATA)>
<!ATTLIST print type (%signEl;) "sign" >
Here are two examples of actual restriction elements used in American English braille:
<restriction type="beginningPartWord" example="dispirit">
 <input>dispirit</input>
 <use>
  <print type="sign">d</print>
  <print>i</print>
  <print>spirit</print>
 </use>
</restriction>
<restriction type="midEndPartWord" example="dunghill">
 <input>ghill</input>
 <use>
  <print>g</print>
  <print>h</print>
  <print>i</print>
  <print>l</print>
  <print>l</print>
 </use>
</restriction>
Note that the intent of the DTD for restriction elements is that each of the individual replacements used for the restrictions be identical to an official replacement identified by a print element of a sign or altsign element specified in the signs.xml file. (Ensuring that this is the case has to be done by the implementing software.) The approach of referencing the official replacements is necessary for keeping track of which contractions are used and how often they are used, and also for supporting graded braille.
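
Since that check is left to the implementing software, the following Python sketch shows one way it might be done. It uses the two restriction elements above as input. The sets of official print keys are hypothetical stand-ins for data that would really be loaded from signs.xml, and the function name and the form of the returned problem list are likewise assumptions.

```python
import xml.etree.ElementTree as ET

RESTRICTIONS = """<restrictions>
<restriction type="beginningPartWord" example="dispirit">
 <input>dispirit</input>
 <use>
  <print type="sign">d</print>
  <print>i</print>
  <print>spirit</print>
 </use>
</restriction>
<restriction type="midEndPartWord" example="dunghill">
 <input>ghill</input>
 <use>
  <print>g</print>
  <print>h</print>
  <print>i</print>
  <print>l</print>
  <print>l</print>
 </use>
</restriction>
</restrictions>"""

# Hypothetical stand-ins for the print keys loaded from signs.xml.
OFFICIAL_SIGN_PRINTS = {"d", "g", "h", "i", "l", "spirit"}
OFFICIAL_ALTSIGN_PRINTS = {"."}


def check_restrictions(xml_text, sign_prints, altsign_prints):
    """Return (input, piece) pairs whose piece is not an official replacement."""
    problems = []
    for restriction in ET.fromstring(xml_text).findall("restriction"):
        input_text = restriction.findtext("input")
        for piece in restriction.find("use").findall("print"):
            # ElementTree does not apply DTD attribute defaults, so the
            # DTD's default of "sign" is supplied by hand here.
            kind = piece.get("type", "sign")
            pool = sign_prints if kind == "sign" else altsign_prints
            if piece.text not in pool:
                problems.append((input_text, piece.text))
    return problems


problems = check_restrictions(
    RESTRICTIONS, OFFICIAL_SIGN_PRINTS, OFFICIAL_ALTSIGN_PRINTS
)
print(problems)
```

An empty list means every piece of every restriction refers to an official replacement. Because each piece is resolved to an official replacement, the same lookup could also feed a tally of contraction usage.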

II.4 Exception Files File

An alternative or extension to the use of ad hoc replacements for handling exceptions to contraction rules is to use one or more dictionary files containing user-specified print-to-braille translations such as those that typically appear in an appendix of braille transcription manuals. The proposed XML format uses this simple DTD for the data file which specifies the names of the dictionary files:

<!ENTITY % dictionaryFormat "oldStyle" >
<!ELEMENT exceptions (file+)>
<!ELEMENT file (#PCDATA)>
<!ATTLIST file
     format (%dictionaryFormat;) #REQUIRED>
Files that use the "oldStyle" format specify the translations in terms of the official braille replacements in the signs file. As is the case for restrictions, the approach of requiring that the official replacements be referenced has the advantage of making it possible to keep track of contraction usage.
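
A short Python sketch of reading this data file follows. The dictionary file name in the sample is purely hypothetical, as is the function name; only the element names and the "oldStyle" format value come from the DTD above. The sketch rejects any format value it does not recognize, since ElementTree does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# The file name below is a made-up placeholder, not part of the proposal.
EXCEPTIONS = """<exceptions>
 <file format="oldStyle">exampleDictionary.txt</file>
</exceptions>"""

# Format values permitted by the dictionaryFormat entity in the DTD.
KNOWN_FORMATS = {"oldStyle"}


def list_dictionary_files(xml_text):
    """Return a list of (file name, format) pairs (hypothetical helper)."""
    files = []
    for el in ET.fromstring(xml_text).findall("file"):
        fmt = el.get("format")
        if fmt not in KNOWN_FORMATS:
            raise ValueError(f"unsupported dictionary format: {fmt!r}")
        files.append((el.text.strip(), fmt))
    return files


dictionary_files = list_dictionary_files(EXCEPTIONS)
print(dictionary_files)
```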


First posted September 2, 2010. Contact info at dotlessbraille dot org
Updated version posted September 23, 2010.