Proposed Extensions to the Portable Embosser Format

Researchers at the Swedish Library of Talking Books and Braille (TPB) have proposed a Portable Embosser Format (PEF) which avoids many of the deficiencies of ASCII Braille .brf files. However, the PEF specification notes several limitations in scope including a lack of support for:

The present article starts with a brief overview of PEF and then proposes several possible extensions to PEF which would avoid certain limitations and provide a technical basis for one or more of the following new features:

  1. braille contents can be represented by an arbitrary encoding scheme which is documented with respect to Unicode Braille Patterns within the extended PEF file
  2. braille contents can be read directly by sighted persons with no knowledge of braille
  3. braille contents can be automatically reverse translated or backtranslated to the equivalent print for interlining or proofing
  4. a PEF file can be reformatted to support a different braille page size
  5. contracted braille contents can be repurposed by grade-relaxing to intermediate or learner braille per any desired specification

Note that achieving the second and third items requires only straightforward changes to the braille encoding, while achieving the last two items may be more difficult.

I. PEF 1.0

This is a copy of a sample Portable Embosser Format (PEF) electronic braille file. Note that it includes useful (meta) information in addition to the braille contents. The electronic braille, which is encoded using Unicode Braille Patterns, is formatted row-by-row, i.e. line-by-line, with the insertion of the empty Braille Pattern, Unicode character 2800, to represent leading and embedded spaces.

Note that the Braille Pattern characters are displayed here as six-dot braille glyphs with the use of a Duxbury simulated braille font which associates the six-dot glyphs with the Braille Pattern code points.

<?xml version="1.0" encoding="UTF-8"?>
<pef version="2008-1" xmlns="http://www.daisy.org/ns/2008/pef">
	<head>
		<meta xmlns:dc="http://purl.org/dc/elements/1.1/">
			<dc:title>Om våren</dc:title>
			<dc:creator>Nils Ferlin</dc:creator>
			<dc:date>2008-09-26</dc:date>
			<dc:format>application/x-pef+xml</dc:format>
			<dc:identifier>org.pef.00002</dc:identifier>
			<dc:description>
				A PEF 1.0 example. 
				The transcribed text is a poem by the Swedish poet Nils Ferlin called "Om våren".
			</dc:description>
			<dc:language>sv</dc:language>
		</meta>
	</head>
	<body>
		<volume cols="32" rows="29" rowgap="0" duplex="true">
			<section>
				<page>
					<row>⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠼⠁</row>
					<row>⠀⠠⠙⠑⠞⠀⠎⠊⠞⠞⠑⠗⠀⠑⠝⠀⠛⠥⠍⠍⠁⠀⠍⠑⠙</row>
					<row>⠀⠃⠇⠕⠍⠍⠕⠗⠀⠌⠀⠏⠡⠀⠓⠥⠅⠀⠊⠀⠑⠝⠀⠛⠁⠞⠥⠧⠗⠡</row>
					<row>⠀⠤⠤⠀⠌⠀⠕⠉⠓⠀⠛⠗⠕⠎⠎⠓⠁⠝⠙⠇⠁⠗⠀⠠⠃⠑⠗⠛⠤</row>
					<row>⠀⠍⠁⠝⠀⠠⠚⠒⠗⠀⠌⠀⠎⠞⠁⠝⠝⠁⠗⠀⠕⠉⠓⠀⠞⠜⠝⠅⠑⠗</row>
					<row>⠀⠎⠡⠒⠀⠌⠌⠀⠠⠙⠥⠀⠛⠁⠍⠇⠁⠀⠕⠉⠓⠀⠎⠞⠊⠇⠇⠤</row>
					<row>⠀⠎⠁⠍⠍⠁⠀⠛⠥⠍⠍⠁⠀⠌⠀⠍⠑⠙⠀⠪⠛⠕⠝⠀⠎⠅⠥⠍⠍⠁</row>
					<row>⠀⠁⠧⠀⠎⠕⠗⠛⠂⠀⠌⠀⠙⠥⠀⠎⠜⠇⠚⠑⠗⠀⠚⠥⠀⠃⠇⠡⠤</row>
					<row>⠀⠎⠊⠏⠏⠕⠗⠂⠀⠎⠑⠗⠀⠚⠁⠛⠀⠌⠀⠤⠤⠀⠠⠝⠥⠀⠅⠪⠤</row>
					<row>⠀⠏⠑⠗⠀⠚⠁⠛⠀⠓⠑⠇⠁⠀⠙⠊⠝⠀⠅⠕⠗⠛⠖</row>
				</page>
			</section>
		</volume>
	</body>
</pef>

I.1 Converting a PEF file to an embosser-ready file

The <body> section of a PEF 1.0 file is designed so that it can be converted in a very simple fashion to an embosser-ready file targeted at a particular braille embosser or braille display. The various tags, including <page> and <row>, must be replaced with the appropriate embosser-specific whitespace character(s) as necessary to produce the page breaks and line breaks. Finally, if the embosser or display doesn't recognize the Unicode Braille Patterns, the text contents of each <row> element (including, of course, the whitespace represented by the empty Braille Patterns) have to be transliterated to a supported encoding.
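The conversion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the target is assumed to accept North American ASCII Braille, and the line and page terminators (carriage return plus line feed, and form feed) are placeholders for whatever a given embosser actually expects.

```python
import xml.etree.ElementTree as ET

PEF_NS = "{http://www.daisy.org/ns/2008/pef}"

# North American ASCII Braille, indexed by the offset of a six-dot
# Braille Pattern from U+2800 (dot 1 = 0x01, dot 2 = 0x02, ... dot 6 = 0x20).
ASCII_BRAILLE = " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)="


def to_ascii_braille(row_text):
    """Transliterate one row of six-dot Unicode Braille Patterns."""
    out = []
    for ch in row_text:
        offset = ord(ch) - 0x2800
        if not 0 <= offset < 64:
            raise ValueError("not a six-dot Braille Pattern: %r" % ch)
        out.append(ASCII_BRAILLE[offset])
    return "".join(out)


def pef_to_embosser_text(pef_xml, line_end="\r\n", page_end="\f"):
    """Replace <row> and <page> markup with assumed line and page breaks."""
    root = ET.fromstring(pef_xml)
    pages = []
    for page in root.iter(PEF_NS + "page"):
        rows = [to_ascii_braille(row.text or "")
                for row in page.iter(PEF_NS + "row")]
        pages.append(line_end.join(rows) + page_end)
    return "".join(pages)
```

A real converter would also have to honor the cols, rows, rowgap and duplex attributes of the <volume> element, which this sketch ignores.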

I.2 Proposed Change to Allow Reformatting

PEF files are generated from DTBook via a multi-step process with the braille formatting designed to be appropriate for a particular braille page size such as 29 lines per page and 32 cells per line as in the previous example.

There is not enough information in a PEF 1.0 file to reformat it for a different braille page size. Note, however, that if the tags used to generate the formatting and some indication of hyphenation were to be left in the PEF file, it should be possible to reformat individual PEF files (volumes) with an appropriate transform. [Hyphenation is seldom used in American English braille documents since hyphenation can change the usage of contractions and thus negatively impact readability. However, if hyphenation is desired, it would only be necessary to retain an auxiliary list with the hyphenation pattern for each unique word in the given document for which hyphenation is an option.]

I am aware that providing for the reformatting of a complex document, such as a multi-volume textbook with a Table of Contents, an Index, and perhaps complex planar layouts such as charts, may well be impractical. However, it seems as though providing for the reformatting of brailled leisure reading such as simple short articles and works of fiction should be possible and of benefit to braille readers.

II. Introduction to Extended PEF

TPB has made a persuasive case for the PEF format. However, I believe there is a strategy for enhancing the specification such that it retains all of the current advantages but also makes possible additional benefits.

II.1 An Overly Simple Extended PEF Example

In order to better illustrate the proposed enhancement we begin with a short example based on English Braille American Edition (EBAE), which is highly contracted, rather than on Swedish braille, which is only slightly contracted. Here, although it may not be obvious, the braille cells are transliterated using North American ASCII Braille, which is again displayed as six-dot braille glyphs because of the use of a Duxbury simulated braille font designed to associate the six-dot glyphs with the ASCII characters. (This undocumented use of ASCII Braille rather than Unicode Braille violates the intent of PEF to avoid ambiguity, but see the next section for a solution.)

<?xml version="1.0" encoding="UTF-8"?>
<pef version="????1" xmlns="http://www.daisy.org/ns/2008/pef">
	<head>
		<meta xmlns:dc="http://purl.org/dc/elements/1.1/">
			<dc:title>Mary's Braille Lamb</dc:title>
		</meta>

	</head>
	<body>
		<volume cols="32" rows="29" rowgap="0" duplex="true">
			<section>
				<page>
					<row>   ,M>Y'S ,BRL ,LAMB</row>
				</page>
			</section>
		</volume>
	</body>
</pef>

II.2 Explicitly Specifying the Transliteration or Encoding Scheme in Extended PEF Files

Here, as in the previous example, the braille cells are represented with the producer's choice of the North American ASCII Braille transliteration scheme instead of the Unicode Braille Patterns as in PEF 1.0. However, the transliteration has been clearly specified in terms of the Braille Patterns by using a construct similar to that of tei:character elements. There is thus sufficient information to easily convert the extended PEF file to a standard Unicode-based PEF file.

<?xml version="1.0" encoding="UTF-8"?>
<pef version="2010X" xmlns="http://www.daisy.org/ns/2008/pef">
	<head>
		<meta xmlns:dc="http://purl.org/dc/elements/1.1/">
			<dc:title>Mary's Braille Lamb</dc:title>
		</meta>
		<meta xmlns:tei="http://www.tei-c.org/release/doc/tei-p4-doc/html/WD.html">
			
                          <tei:character id='0' class='lexical'>
                            <form string=' ' entityStd='#x2800'/>
                          </tei:character>
                          <tei:character id='3' class='lexical'>
                            <form string="'" entityStd='#x2804'/>
                          </tei:character>
                          <tei:character id='6' class='lexical'>
                            <form string=',' entityStd='#x2820'/>
                          </tei:character>
                          <tei:character id='345' class='lexical'>
                            <form string='>' entityStd='#x281C'/>
                          </tei:character>
                          <tei:character id='1' class='lexical'>
                            <form string='A' entityStd='#x2801'/>
                          </tei:character>
                          <tei:character id='12' class='lexical'>
                            <form string='B' entityStd='#x2803'/>
                          </tei:character>
                          <tei:character id='123' class='lexical'>
                            <form string='L' entityStd='#x2807'/>
                          </tei:character>
                          <tei:character id='134' class='lexical'>
                            <form string='M' entityStd='#x280D'/>
                          </tei:character>
                          <tei:character id='1235' class='lexical'>
                            <form string='R' entityStd='#x2817'/>
                          </tei:character>
                          <tei:character id='234' class='lexical'>
                            <form string='S' entityStd='#x280E'/>
                          </tei:character>
                          <tei:character id='13456' class='lexical'>
                            <form string='Y' entityStd='#x283D'/>
                          </tei:character>

		</meta>
	</head>
	<body>
		<volume cols="32" rows="29" rowgap="0" duplex="true">
			<section>
				<page>
					<row>   ,M>Y'S ,BRL ,LAMB</row>
				</page>
			</section>
		</volume>
	</body>
</pef>
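To illustrate how little is needed for that conversion, here is a minimal Python sketch. It assumes a well-formed extended PEF file in which every <form> element carries a single-character string attribute and an entityStd attribute of the form #x2820; the function names are hypothetical and are not part of any PEF tool.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def load_mapping(root):
    """Collect string -> Braille Pattern pairs from the <form> elements."""
    mapping = {}
    for el in root.iter():
        if local_name(el.tag) == "form" and "entityStd" in el.attrib:
            # Accept '#x2820' as well as a stray trailing ';'.
            code = el.get("entityStd").lstrip("#xX").rstrip(";")
            mapping[el.get("string")] = chr(int(code, 16))
    return mapping


def to_unicode_pef(pef_xml):
    """Rewrite every <row> using the mapping declared in the same file."""
    root = ET.fromstring(pef_xml)
    mapping = load_mapping(root)
    for el in root.iter():
        if local_name(el.tag) == "row" and el.text:
            # A character missing from the table raises KeyError.
            el.text = "".join(mapping[ch] for ch in el.text)
    return root
```

Failing on an unmapped character is deliberate in this sketch: a file whose table does not cover its own contents is not unambiguous and should not be silently converted.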

II.3 The Advantages of using Extended Braille in PEF files

The previous section shows how it is possible to use any desired scheme for representing the braille cells while maintaining the unambiguous and archival nature of the PEF format simply by explicitly specifying the representation in terms of the Unicode Braille Patterns.

Nonetheless, there is at most only a slight advantage to using a one-for-one transliteration where the information content is the same as that of Unicode Braille Patterns. The problem is that in most braille codes the braille cells have context-sensitive meanings which can be difficult for persons unfamiliar with braille to interpret and difficult to automatically back-translate to print. In the example, the letters b and l are used to represent themselves and also as part of the shortform contraction for the word braille while the single-cell contraction for ar is arbitrarily represented as the ASCII greater-than character.

In the next example the braille cells are represented using a many-to-one representation called extended braille. Extended braille uses a unique character code for each semantically distinct use of a braille cell and can thus be automatically backtranslated to print.

The particular extended braille transliteration used in this example is appropriate for EBAE and is supported by the DotlessBraille font. Use of this font makes it possible for sighted persons with no knowledge of braille to read EBAE extended braille directly. Here the DotlessBraille glyph for the capitalization indicator is a custom symbol similar in appearance to the arrow on a Shift key; the glyph for the ar contraction resembles the two individual letters represented by the contraction squashed together; and the three glyphs for the brl shortform representing braille together resemble all seven letters squashed together.

Translating from print to extended braille is straightforward and no more difficult than translating from print to standard braille. All that is required is to employ print-to-braille translation tables which have been modified such that the braille cells are represented as extended braille rather than standard braille.

<?xml version="1.0" encoding="UTF-8"?>
<pef version="2008-1" xmlns="http://www.daisy.org/ns/2008/pef">
	<head>
		<meta xmlns:dc="http://purl.org/dc/elements/1.1/">
			<dc:title>Mary's Braille Lamb</dc:title>
		</meta>
		<meta xmlns:tei="http://www.tei-c.org/release/doc/tei-p4-doc/html/WD.html">
			
                          <tei:character id='0' class='lexical'>
                            <form string=' ' entityStd='#x2800' pr=' '/>
                          </tei:character>
                          <tei:character id='3' class='lexical'>
                            <form string="'" entityStd='#x2804' pr="'"/>
                          </tei:character>
                          <tei:character id='6' class='lexical'>
                            <form string='Ƞ' entityStd='#x2820' pr='IND'/>
                          </tei:character>
                          <tei:character id='345' class='lexical'>
                            <form string='°' entityStd='#x281C' pr='ar'/>
                          </tei:character>
                          <tei:character id='12' class='lexical'>
                            <form string='Ʃ' entityStd='#x2803' pr='br'/>
                          </tei:character>
                          <tei:character id='1235' class='lexical'>
                            <form string='ƪ' entityStd='#x2817' pr='ai'/>
                          </tei:character>
                          <tei:character id='123' class='lexical'>
                            <form string='ƫ' entityStd='#x2807' pr='lle'/>
                          </tei:character>
                          <tei:character id='1' class='lexical'>
                            <form string='a' entityStd='#x2801' pr='a'/>
                          </tei:character>
                          <tei:character id='12' class='lexical'>
                            <form string='b' entityStd='#x2803' pr='b'/>
                          </tei:character>
                          <tei:character id='123' class='lexical'>
                            <form string='l' entityStd='#x2807' pr='l'/>
                          </tei:character>
                          <tei:character id='134' class='lexical'>
                            <form string='m' entityStd='#x280D' pr='m'/>
                          </tei:character>
                          <tei:character id='234' class='lexical'>
                            <form string='s' entityStd='#x280E' pr='s'/>
                          </tei:character>
                          <tei:character id='13456' class='lexical'>
                            <form string='y' entityStd='#x283D' pr='y'/>
                          </tei:character>

		</meta>
	</head>
	<body>
		<volume cols="32" rows="29" rowgap="0" duplex="true">
			<section>
				<page>
					<row>   Ƞm°y's ȠƩƪƫ Ƞlamb</row>
				</page>
			</section>
		</volume>
	</body>
</pef>

II.4 Backtranslation

Backtranslation of braille to print is useful for interlining, for synchronizing generated braille with the original print source, and for error checking. However, since the context-sensitive grammars of many braille codes make backtranslation difficult, standard braille software is forced to either use some sort of cumbersome scheme to align the original print source with the generated braille translation or to maintain hidden information to facilitate backtranslation of standard braille to print.

Backtranslation of extended braille is, by contrast to that of standard braille, straightforward and avoids any need to maintain additional information. Since each extended braille character corresponds to exactly one print character or one sequence of print characters, backtranslation consists primarily of simple table lookup and can be applied successfully to an extended PEF file. (Note however that, if, as in the example of DotlessBraille, the extended braille characters representing letters and letter sequences are not case-sensitive, it is also necessary to keep track of the scope of the braille capitalization indicators in order to determine the correct case for backtranslated letters. Alternatively an extended braille representation could include separate lower case, title case, and upper case variants for each letter and letter sequence.)
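The table lookup and the tracking of capitalization described above might look like the following Python sketch. The table is hard-coded here from the Mary's Braille Lamb example rather than read from the file, the ar contraction is assumed to be the character that appears in the example row, and the capitalization indicator is assumed to apply only to the next letter; word-level and passage-level capitalization are not handled.

```python
# Capitalization indicator glyph used in the example row (U+0220).
CAP = "\u0220"

# Extended braille character -> print equivalent (the 'pr' values).
PRINT_FOR = {
    " ": " ",
    "'": "'",
    "\u00b0": "ar",    # single-cell contraction for "ar"
    "\u01a9": "br",    # first cell of the shortform for "braille"
    "\u01aa": "ai",    # second cell of the shortform
    "\u01ab": "lle",   # third cell of the shortform
    "a": "a", "b": "b", "l": "l", "m": "m", "s": "s", "y": "y",
}


def backtranslate(row):
    """Backtranslate one row of extended braille to print."""
    out = []
    capitalize_next = False
    for ch in row:
        if ch == CAP:
            capitalize_next = True   # applies to the next letter only
            continue
        text = PRINT_FOR[ch]
        if capitalize_next and text[:1].isalpha():
            text = text[0].upper() + text[1:]
            capitalize_next = False
        out.append(text)
    return "".join(out)
```

Applied to the <row> of the example, this lookup yields the print title with its three leading spaces preserved.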

II.5 Grade Relaxing And Other Customizations of Contracted Braille

Despite the official braille rules, there are many situations where customized braille is preferred. Customization can involve changes to formatting as well as to contracting. For example, persons reading braille on braille displays may want simpler formatting. Also, there are many situations where partially-contracted braille is more appropriate than official, fully-contracted braille.

II.5.1 Intermediate or Grade-relaxed Braille

In practice there is considerable need to produce intermediate or grade-relaxed contracted braille which only uses a subset of available braille contractions. For example, it is common when teaching contracted braille to beginning braille readers to teach only a few contractions at a time. Also some braille readers prefer partially-contracted braille while others may be unable to read fully-contracted braille.

Most braille software produces intermediate braille by maintaining a number of special translation tables corresponding to various official intermediate braille schemes, such as RNIB Fingerprints and TSBVI Clusters, which are intended for teaching. However, the use of extended braille would make possible a much simpler and more flexible protocol. First the print would be translated to extended braille according to the official rules of the contracted braille code. Then an appropriate table would be used to backtranslate any unwanted contractions to the corresponding letter sequences. Since grade-relaxing changes the number of braille cells, grade-relaxing does, of course, have to be done prior to formatting unless there is some method of reformatting.
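The second step of that protocol, spelling out unwanted contractions, reduces to a character substitution when the text is in extended braille. The Python sketch below assumes the extended braille characters of the earlier example; the table and function name are hypothetical, and reformatting after the cell count changes is left out.

```python
# Extended braille contraction -> the letters it stands for, written
# with the extended braille characters for the individual letters.
SPELLED_OUT = {
    "\u00b0": "ar",    # single-cell contraction for "ar"
    "\u01a9": "br",    # the three cells of the "braille" shortform
    "\u01aa": "ai",
    "\u01ab": "lle",
}


def grade_relax(row, unwanted):
    """Spell out only the contractions listed in `unwanted`."""
    return "".join(SPELLED_OUT[ch] if ch in unwanted else ch
                   for ch in row)
```

For a learner who has been taught the ar contraction but not the shortform for braille, the set of unwanted contractions would hold just the three shortform characters, and the ar contraction would pass through unchanged.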

II.5.2 Producing Customized Braille from Extended PEF Files

If PEF files were to incorporate the information necessary for reformatting and also to represent braille as extended braille, then a PEF file based on an official contracted braille code could be customized for an individual user by spelling out the user's selection of contractions and then reformatting as necessary.

Summary

Extended PEF files where the only changes from the current standard for PEF are (1) the braille cells are encoded as extended braille and (2) the mapping from the extended braille representation to Unicode Braille Patterns is included as part of the extended PEF file have the following advantages:

However, if the PEF specification could be modified such that the braille contents were reformattable, then PEF files could be repurposed both with respect to formatting and the use of braille contractions.


First posted July 5, 2010.