Braille Representation of Print Characters

The Problem of the Braille Representation of Print Characters

Executive Summary
Which characters should a braille system represent?
The character problem in BANA's current braille codes
How does the modern print world handle the character problem?
How do sighted persons recognize characters?
How should braille systems ensure that braille readers have correct character information?
Conclusion

Executive Summary. The Braille Authority of North America (BANA) has recently published a three-part article that points out that braille readers need correct knowledge of print characters in order to fully participate in the modern print world. Braille users can't even read or write the word BrailleNote using BANA's standard literary braille code (English Braille American Edition or EBAE) because EBAE assigns another meaning to what needs to be read as an embedded capital letter N. Braille users can't read or write the joke spelling Micro$oft or the name Ke$ha because EBAE assigns another meaning to what would otherwise be read as an embedded dollar sign. Even more significantly, there are numerous commonly encountered print characters, such as the plus sign, that EBAE cannot represent at all.

While it turns out that the embedded capital N problem is easily solved by eliminating a single EBAE rule that fortuitously doesn't impact any other rules, this is not true of the embedded dollar sign problem nor of the problem of representing a greater number of print characters. BANA is thus faced with the need to revamp its braille codes to address the character problem (as well as other deficiencies) while at the same time having an obligation to minimize change so as to minimize the impact on braille readers, transcribers, teachers and other persons familiar with its current codes.

BANA is currently considering various solutions including adopting Unified English Braille (UEB). However, if BANA wants to ensure the best possible solution to its braille problems, it needs to clarify its goals and to better leverage the print world's solutions for closely related problems. A good starting place for understanding the particular problem of print characters is the Unicode Consortium, which has already identified more than 100,000 print characters. More importantly, it has defined an unambiguous method for representing each identified character and for representing any new characters identified in the future.

Given the almost universal acceptance of Unicode, the only reasonable approach that can provide absolute certainty as to a print character's identity is for that character either to be encoded in a standard Unicode-based electronic format or to be convertible to such a format. This means that any braille system with the goal of providing such certainty for every Unicode-identified character a braille reader might possibly encounter must make it possible for the reader to extract the Unicode identification from the braille representation.

It is not clear whether or not UEB meets the requirement for consistency with Unicode in any sort of adequate or useful way. UEB does provide a mechanism whereby a braille transcription can make use of arbitrary placeholder symbols for print characters for which it has no explicit braille representation and can include separate documentation stating which print characters the placeholders are intended to represent. (Note that UEB's use of "upper numbers", i.e. the same braille cells for the decimal digits and the letters a-j, complicates its ability to represent Unicode's hexadecimal numerical codes.)

On the other hand, it may be that BANA will end up choosing a braille system that can't represent every known and possible future Unicode character but simply one that can be counted on not to mislead the braille reader. This would be the case for any braille system that documents which print characters it can represent and what the braille reader should expect of a braille transcription of a print source that includes "missing" characters, i.e. characters that cannot be translated by that braille system. However, given that the liblouis braille translator already supplies Unicode character codes for missing characters, that approach is certainly a feasible solution.

Which characters should a braille system represent?

Since braille documents need to avoid verbosity in order to provide an efficient reading experience for braille readers, it is not possible for a braille system to provide easily-read braille equivalents for the more than 100,000 Unicode characters. This means that any general braille system has to make a selection of which print characters to represent in its most easily-read manner. (This is, of course, why there are different braille systems for different languages.)

One way to select the characters to be represented in a general braille system is to consider how electronic file formats represent characters. This consideration underlines the importance of the 94 ASCII characters and of familiarizing braille readers with Unicode and Unicode protocols. Unicode identifies all characters with unique hexadecimal character codes represented using ASCII characters. Hexadecimal numbers can of course be represented directly in standard braille codes including UEB. However, because UEB uses upper numbers, the UEB representation of a hexadecimal number, which is standardly written using the digits 0-9 and the letters a-f, would have to be backtranslated before being entered into a Unicode search engine. The need for backtranslation could make it more difficult for braille readers to determine the identity of unfamiliar characters.
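The cell sharing behind this backtranslation concern can be illustrated with a minimal sketch. It assumes only the traditional convention that the digits 1-9 and 0 reuse the cells of the letters a-j; the function names are hypothetical, and the sketch does not model the indicators that UEB or any other code uses to switch between letters and digits.

```python
# Traditional "upper" braille numbers reuse the cells of the letters a-j:
# 1=a, 2=b, ..., 9=i, 0=j. Only that sharing is modeled here, not the
# indicators a real braille code uses to tell digits and letters apart.
DIGIT_TO_LETTER = dict(zip("1234567890", "abcdefghij"))


def shared_cells(hex_code: str) -> str:
    """Show each character of hex_code as the letter whose cell it shares."""
    return "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in hex_code.lower())


def mixes_digits_and_letters(hex_code: str) -> bool:
    """True if the code contains both decimal digits and hex letters a-f."""
    code = hex_code.lower()
    has_letter = any(ch in "abcdef" for ch in code)
    has_digit = any(ch.isdigit() for ch in code)
    return has_letter and has_digit


print(shared_cells("20ac"))              # code for U+20AC EURO SIGN
print(mixes_digits_and_letters("20ac"))
print(mixes_digits_and_letters("0024"))  # code for U+0024 DOLLAR SIGN
```

A code such as 20ac mixes digits with the letters a and c, whose cells are also the cells for 1 and 3, so the reader depends on extra indicators to recover the print form.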

Braille codes intended for general readers need not only to represent the ASCII characters but also to provide some convenient way of representing the other characters that the targeted reader is most likely to encounter. The contractions in contracted English braille are effective because they were chosen based on word frequencies in an appropriate variety of English texts. A similar approach could provide a basis for the choice of print characters to be represented explicitly in a braille code.

It's an open question how many of Unicode's 100,000 characters a braille system might be able to represent with convenient braille equivalents. I've read that in order to be considered literate in Chinese one must learn about 3000 Chinese characters. I would be very surprised if there are very many sighted non-Chinese persons who can recognize any 3000 characters, so this would seem to be an upper limit.

Braille systems such as UEB or NUBS create braille equivalents for print characters by using a multi-cell prefix followed by a single-cell root. UEB permits eight of the 63 six-dot braille cells to be used in a prefix and the remaining 55 to function either as stand-alone equivalents or as a root. [Braille shortform contractions constructed as a sequence of roots are an exception to the single-cell root specification.] Thus the mathematical upper limit to the number of braille equivalents in UEB is 55 one-cell symbols, 440 (eight times 55) two-cell symbols, 3,520 (eight times eight times 55) three-cell symbols, etc. However, there are additional practical limits. Contracted braille systems already use some of the possible symbols for braille contractions, and all braille systems need to use braille symbols for braille-specific purposes such as markup. Also, symbols chosen as braille equivalents for print characters should be easy to remember and should not introduce tactile ambiguities.
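The counts in the preceding paragraph follow from simple multiplication, which the short sketch below reproduces. The figures of 8 prefix cells and 55 root cells are taken from the text; the constant and function names are chosen here only for illustration.

```python
PREFIX_CELLS = 8   # cells the text says UEB permits in a prefix
ROOT_CELLS = 55    # the remaining cells of the 63 six-dot cells


def symbols_of_length(n_cells: int) -> int:
    """Count symbols made of (n_cells - 1) prefix cells plus one root cell."""
    return PREFIX_CELLS ** (n_cells - 1) * ROOT_CELLS


for n in (1, 2, 3):
    print(n, symbols_of_length(n))

# Running total for symbols of one, two, or three cells.
print(sum(symbols_of_length(n) for n in (1, 2, 3)))
```

The running total for symbols of up to three cells is 4,015, before any of the practical limits described above are applied.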

The character problem in BANA's current braille codes

Braille codes need to be changed if the character identification problem is to be addressed. BANA's three-part article gives the example of the singer Ke$ha, who spells her name with a dollar sign, in order to illustrate a limitation of EBAE rules, which assume that a dollar sign cannot be embedded in the middle of a word. It could be embarrassing for a braille reader not to know this spelling. (The article didn't mention not getting the joke or misunderstanding when Microsoft is spelled Micro$oft, but this is also a relevant example.)

The BANA article goes on to give a more serious example illustrating the importance of correct character identification.

If a company uses nonstandard symbols in its name and a blind person misspells the company name on a cover letter for a job application because she did not get accurate information from the braille, what are the chances that person will get the job? Should she have to check the spelling using audio or relying on a sighted person to tell her how it is spelled or should braille, the primary literacy tool for people who are blind, be capable of giving the most accurate information?

I think that most of us would agree that the braille reader should at the very least know when the braille is not providing accurate information and that preferably braille should be accurate. However, how a braille system could best accomplish this needs careful consideration.

How does the modern print world handle the character problem?

Before considering how a braille system might use the limited number of braille cells to represent a large number of different print characters, let's review how it is done in an XML-based electronic file format.

We can divide the Unicode characters into two groups: those that appear on a computer keyboard and those that don't. Actually that last statement is misleading as there are dozens of different keyboard layouts. However, if we restrict ourselves to the characters that can be typed directly on a standard U.S. computer keyboard, there are 94 characters, typically referred to as ASCII characters. These are the 26 lowercase letters of the Latin alphabet, the 26 uppercase letters of the Latin alphabet, the 10 ASCII (decimal) digits, and 32 ASCII punctuation marks and symbols.
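The breakdown of the 94 characters can be checked with a few lines of Python. This is only a sketch using the constants of the standard string module; the group labels are chosen here for illustration.

```python
import string

groups = {
    "lowercase letters": string.ascii_lowercase,
    "uppercase letters": string.ascii_uppercase,
    "decimal digits": string.digits,
    "punctuation and symbols": string.punctuation,
}

# Count the characters in each group and in all groups together.
counts = {name: len(chars) for name, chars in groups.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(name, count)
print("total", total)

# The same 94 characters are the code points 0x21 through 0x7E.
printable = {chr(cp) for cp in range(0x21, 0x7F)}
print(printable == set("".join(groups.values())))
```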

Persons familiar with the ASCII characters are fortunate in that these characters were the original electronic characters and their use still dominates electronic communication. This situation gives an advantage to people who read English or other languages that utilize the Latin alphabet or related alphabets.

The only characters permitted for direct use in the majority of common electronic file formats are the ASCII characters. When there is a need to reference other characters, they must be referenced indirectly. The most widely accepted modern way to reference non-ASCII characters is by their Unicode numbers. There are also various conventions that use shortened versions of the official Unicode character names, but these conventions can lead to difficulties since they aren't standard.
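As an illustration of referencing a character by its Unicode number, the sketch below writes the euro sign as an XML-style numeric character reference and then converts it back. The sample text is invented for the example; the encode error handler and html.unescape are part of the Python standard library.

```python
import html

text = "Price: 5 \u20ac"  # \u20ac is the euro sign, a non-ASCII character

# Replace each non-ASCII character with a decimal numeric reference.
ascii_only = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_only)

# The same character written as a hexadecimal numeric reference.
hex_ref = "&#x{:X};".format(ord("\u20ac"))
print(hex_ref)

# Either form of reference resolves back to the original character.
print(html.unescape(ascii_only) == text)
print(html.unescape(hex_ref) == "\u20ac")
```

The file itself then contains only ASCII characters, while the reference still identifies the euro sign without ambiguity.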

How do sighted persons recognize characters?

When electronic files are rendered for sighted viewing by an application such as a browser or word processor, the rendering requires the use of one or more display fonts. Fonts are associated with glyph tables that match character numbers in the electronic file to visual representations known as glyphs. Most sighted persons have had the experience of attempting to display an electronic file and seeing that some characters are replaced with little squares. These squares mean that the selected font didn't have a match for the character numbers of those characters. However, if all goes well and there is a glyph and the sighted person recognizes it, then the sighted person likely has accurate information as to the character. If there are just little squares or the sighted person isn't confident, their only option may be to examine the electronic file directly and attempt to determine the character's Unicode identifier.

Of course, sighted people don't always recognize a properly displayed character. As one example, non-mathematicians often have to ask someone the name of a less-common mathematical symbol when they are editing or processing technical material. In fact, I'd be very surprised if even those of us with a background in mathematics could recognize all of the Unicode mathematical symbols. The STIX Fonts for math, science, and engineering have approximately 8500 glyphs.

How should braille systems ensure that braille readers have correct character information?

To go back to Ke$ha, the BANA article continued as follows.

For clarity, should the name Ke$ha simply be brailled with an s instead of a dollar sign? That solution might work as far as 'readability,' but it does not provide the braille reader the same information that the print reader has. A transcriber encountering this name may spell it Kesha, but include a transcriber's note indicating that the s is shown as a dollar sign in print. Of course, this solution is clear, but it requires the involvement of a transcriber rather than the name automatically and correctly displaying on a braille device.

This is a tall order. It seems unreasonable to expect a single braille system to have a simple way to represent more than 100,000 different characters. No braille system I'm aware of attempts this. If a braille reader needs to be absolutely sure what print character is used, their only current option may be to examine the electronic source file containing that character and hope that the file is correctly marked up. If the character in question is an ASCII character, then they can read it directly as computer braille on an eight-dot braille display or as translated by the six-dot Computer Braille Code (CBC). If it is not an ASCII character, they can find out the name of the character by entering the character's Unicode number into a search engine such as the one on the Unicode charts page.
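The lookup from a Unicode number to a character name can also be done offline. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is hypothetical.

```python
import unicodedata


def describe(hex_code: str) -> str:
    """Return the U+ notation and official name for a hexadecimal code."""
    ch = chr(int(hex_code, 16))
    name = unicodedata.name(ch, "<no name>")
    return "U+{:04X} {}".format(ord(ch), name)


print(describe("0024"))  # the character embedded in Ke$ha
print(describe("002B"))  # the plus sign
```

The two calls print U+0024 DOLLAR SIGN and U+002B PLUS SIGN respectively.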

If a braille reader doesn't have an electronic source file but only has a braille transcription of a document, there is always a chance that there is an error. However, if the braille transcription was generated automatically from a properly marked-up electronic file, one can envision a braille system that either has some means of representing the correct characters or, at the very least, has a braille equivalent of the little squares that warn sighted persons of a missing character. It is certainly wrong for a braille transcription to mislead the braille reader into thinking they have correct knowledge when they don't.
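One way to picture such a warning is the toy sketch below. Everything in it is an assumption made for illustration: the set of representable characters, the function name, and the bracketed U+ marker format are hypothetical and are not taken from liblouis, UEB, or any BANA code. Letters are passed through unchanged as a stand-in for real braille output.

```python
# Hypothetical set of print characters this toy system can represent.
KNOWN = set("abcdefghijklmnopqrstuvwxyz ")


def transcribe(text: str) -> str:
    """Pass known characters through; flag every other character."""
    out = []
    for ch in text:
        if ch.lower() in KNOWN:
            out.append(ch)  # stand-in for a real braille equivalent
        else:
            # Missing character: emit its Unicode number rather than
            # silently dropping it or substituting a look-alike.
            out.append("[U+{:04X}]".format(ord(ch)))
    return "".join(out)


print(transcribe("Ke$ha"))
print(transcribe("Kesha"))
```

With this approach the reader of Ke[U+0024]ha knows that a character is missing and has the number needed to look it up, instead of being given a plain s.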

Conclusion

If BANA really wants to come up with a general braille code that solves the print character problem, it needs to go back to the drawing board and clearly define the goal in a manner consistent with the latest technology and with the character frequencies in the types of print documents the targeted braille reader is expected to read. Only then can it design a braille system that achieves its goal.

Also, given that one of BANA's stated goals is for "the braille reader to have the same information that the print reader has," BANA needs to reconsider whether any braille system that uses the same braille cells for the decimal digits as for certain letters of the alphabet is consistent with that goal. After all, letters and digits are not only distinct ASCII characters but, as Stephen Wolfram's article on the History of mathematical notation points out, the print notation for numbers as distinct from letters probably arose around 1000 years ago.

This article first posted February 3, 2012.
A slightly revised version was posted February 10, 2012.
Version with revised Executive Summary posted February 14, 2012.
Contact author: info at dotlessbraille dot org