Transcribing Print to Braille: An Introduction

Introduction
Generic Problems

Introduction.

Transcribing is the process of converting a printed text to braille. Transcribing is sometimes called translating but this latter term may have the misleading connotation that braille is a different language rather than merely a different system. You can read more about the various Braille systems here.

If you are interested in becoming a certified transcriber or braillist, you should read the information from the Library of Congress. The purpose of this page is to provide some background information on the range of problems that professional transcribers have to deal with.

The single most important issue that anyone interested in braille transcribing must appreciate is that braille transcriptions must be essentially error-free. The standards are much higher than for print. This level of accuracy is necessary because braille uses the same cells for different purposes in different contexts. As a consequence, even slight errors can cause extreme difficulties in interpretation.

In the past, braille was produced by a coupled process of transcribing and embossing. A trained braillist (a braille transcriber) would look at the original print, mentally transcribe it to braille, and then produce the braille using either a braille slate and stylus or a six-key braille keyboard to operate either a Perkins-style single-copy embosser or a more complex production-oriented machine. These methods of producing braille by hand, character by character, were very time-consuming and prone to error. To produce a perfect braille page, a transcriber might have to rewrite the entire page many times.

Most braille is now transcribed to an electronic file in which the braille cells are coded numerically, generally using Braille ASCII. Such transcription may be carried out either directly by a braillist or by means of a computer application such as Duxbury Braille Translator (DBT), MegaDots, or Braille 2000. (Note, there is a description on this site of how these programs work in terms of the method used in the open-source NFBTRANS application.)

Transcribing email or simple literary text that has been created in electronic form with braille transcription in mind is now fairly routine using a braille transcribing application. Some braille notetakers even have built-in software that provides realtime forward and backwards translation, generally using the standard contracted literary braille code. When a braille display is attached to a computer and interfaced with appropriate software, the braille reader may even have realtime transcription of accessible material on the Internet. WARNING! This does not mean that all ASCII text, in particular mathematics, will be automatically converted to meaningful braille. See explanation below.

Generic problems.

Despite the availability of computer applications for braille translation as described in the previous section, considerable human involvement is still required for most braille transcriptions. Much of the complexity is a result of the tremendous increase in the variety of print formats that have been made possible by computer typesetting.

Some of the resulting issues are:

Problems requiring real or artificial "intelligence"

Choosing an appropriate code
Judgement calls
Ambiguities

Legal requirements

Transcribing
Proofreading

Lack of suitable electronic sources

Consistent formatting
Inadequate indication of document structure
Inadequacy of syntactical sources like TeX

Full discussion of all the problems of braille transcription is beyond the scope of this site. However, we will approach this topic by giving examples that should at least clarify the nature of the various problems in case you are interested in working on them.

Problems requiring real or artificial "intelligence"

Choosing a code.

The first thing to appreciate about transcribing is that it is necessary for a human to pick a code that is appropriate to the material and in accord with the standards for such texts. It is obvious that music would require a different code than literary texts but many people are surprised to discover that a special braille code is required to transcribe even simple mathematics. In other words, there are some things that just cannot be expressed in a given code. However, there is a proposal for a Unified English Braille Code (UEBC) that would encompass the capabilities of the literary braille, Nemeth, and Computer Braille Codes as well as new signs.

Even if a particular braille code—such as the literary code—seemingly has braille cells that correspond to the needed print characters, this does not mean that anything that can be expressed with these print characters would be meaningful if automatically transcribed using these cells.

As an illustration of this last point, consider an example of spoken mathematics. The mathematical symbol for a function of one variable, written as the letter eff followed by the letter ex enclosed in parentheses, is commonly spoken aloud as "eff of ex". However, if this spoken form were transcribed using the actual letters eff and ex, that is, "f of x", the literary braille reader would, quite correctly under the rules of the code, read it as "from of it", because in contracted literary braille a letter standing alone is a whole-word contraction (f for "from", x for "it"). In computer terminology, there needs to be an "escape" character in front of the letters used as mathematical symbols to indicate that they are "just letters" and are not being used in their default sense as whole-word contractions. The correct escape symbol for this case is dots 5-6, called the letter sign in literary braille and the English Letter Indicator in Nemeth.
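The effect of the missing escape character can be sketched in a few lines of Python. This is only a toy model of how a reader decodes contracted literary braille: it knows the single-letter whole-word contractions and nothing else, the function name read_back is our own invention, and the semicolon stands for the letter sign (dots 5-6) as it does in Braille ASCII. The word "of", which has its own one-cell contraction in braille, is simply left spelled out here.

```python
# Single-letter whole-word contractions of contracted literary braille.
WORDSIGNS = {
    "b": "but", "c": "can", "d": "do", "e": "every", "f": "from",
    "g": "go", "h": "have", "j": "just", "k": "knowledge", "l": "like",
    "m": "more", "n": "not", "p": "people", "q": "quite", "r": "rather",
    "s": "so", "t": "that", "u": "us", "v": "very", "w": "will",
    "x": "it", "y": "you", "z": "as",
}

LETTER_SIGN = ";"  # Braille ASCII character for dots 5-6


def read_back(braille_text):
    """Expand a space-separated braille string the way a reader would."""
    words = []
    for word in braille_text.split():
        if len(word) == 2 and word.startswith(LETTER_SIGN):
            # Escaped: the cell after the letter sign is "just a letter".
            words.append(word[1])
        elif word in WORDSIGNS:
            # A letter standing alone is read as a whole word.
            words.append(WORDSIGNS[word])
        else:
            words.append(word)
    return " ".join(words)


print(read_back("f of x"))    # from of it
print(read_back(";f of ;x"))  # f of x
```

Deciding where the letter sign is needed is the hard part. The sketch only shows what the reader sees once that decision has, or has not, been made.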

Judgement calls. Judgement calls arise in situations, such as certain types of presentation, that have no braille analog. A transcriber is often required to use his or her judgement in these cases.

For example, how should colored type in print be indicated in braille? If the color is merely decorative and doesn't add information, it may be best to omit it as being merely distracting to the braille reader. On the other hand, if the transcriber feels that the color adds information, markup symbols can be used to indicate color just as they are used to indicate other forms of print emphasis. Another alternative is for the transcriber to add a transcriber's note explaining the situation so that the braille student would understand a reference to color coding in a textbook.

A similar problem is whether boldface needs to be distinguished from ordinary emphasis since standard literary braille only uses one type of emphasis composition markup, typically referred to as "italics". Of course, many, if not all, issues of this type should be ultimately resolvable by using modern methods to indicate document structure. The slow rate of progress in this area can be appreciated by reading a good background article on Standard Generalized Markup Language (SGML) that was written in 1993 by Joseph Sullivan, the President of Duxbury Systems, Inc. This article also provides an excellent introduction to the concept of structure markup and gives examples of braille-specific situations that are unlikely to be solved by SGML-like approaches.

Ambiguities.

There are numerous situations, like the following example, where it is difficult for a computer program to make a decision that is fairly obvious to a human. Volunteer opportunity. This is an example of a problem that might be soluble using fuzzy logic or artificial intelligence. Determining which braille problems would be amenable to such an approach would be an interesting research area. One of the stated purposes of the proposed Unified English Braille Code (UEBC) is to create a code with rules that reduce such ambiguities although the result has been a considerable increase in the number of cells required to transcribe a given source.

Transcribing the letter sign "x"

Dimensions. Substitute the word by (dots 3-5-6) for the print sign x or X that is printed between numbers to indicate dimensions, e. g., print "9 x 12 ft." should be transcribed as "#9 by #12 ft."
Degree of magnification. When the x sign is used to show degree of magnification, substitute the letter x preceded by the letter indicator and unspaced from the number, e. g., "a 10x lens" should be transcribed as "a #10|x lens" where the vertical bar has been used to represent the letter indicator, dots 5-6.

Multiplication cross. In general literature, the multiplication cross or times sign is transcribed by spelling out the word "times".
Multiplication cross. When using the Nemeth code, the multiplication cross is transcribed by the two-cell symbol, dot 4 followed by dots 1-6.
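The first two rules above can be written as pattern substitutions, which also shows where pattern matching runs out. The following Python sketch is an illustration only, not the method used by any of the translation programs mentioned on this page. The function name transcribe_x is hypothetical, and the number sign and letter indicator are shown as "#" and "|" exactly as in the examples above.

```python
import re

LETTER_INDICATOR = "|"  # stands in for dots 5-6, as in the examples above


def transcribe_x(text):
    """Apply the dimension and magnification rules for the print sign x."""
    # Dimensions: an x between two numbers becomes the word "by".
    text = re.sub(r"(?<=\d)\s*[xX]\s*(?=\d)", " by ", text)
    # Magnification: an x attached to the end of a number stays a letter,
    # preceded by the letter indicator and unspaced from the number.
    text = re.sub(r"(?<=\d)[xX]\b", LETTER_INDICATOR + "x", text)
    # Every run of digits is preceded by the number sign.
    return re.sub(r"\d+", lambda m: "#" + m.group(), text)


print(transcribe_x("9 x 12 ft."))  # #9 by #12 ft.
print(transcribe_x("a 10x lens"))  # a #10|x lens
print(transcribe_x("2 x 3 = 6"))   # #2 by #3 = #6
```

The last line is the ambiguity in miniature: in general literature "2 x 3 = 6" should read "times", not "by", but nothing in the characters themselves distinguishes a multiplication from a pair of dimensions. A human sees the difference at once; the program needs context that the pattern does not contain.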

Legal requirements

Transcribing. The Americans with Disabilities Act (ADA) has mandated that publicly funded schools and universities make braille textbooks available. This has created a paradoxical shortage in braille materials. Many well-meaning organizations choose to require that textbooks be transcribed by Library of Congress (LOC) certified transcribers. While this would seem the right approach to ensure the necessary quality, there is a great shortage of transcribers. The current certification process is long, doesn't necessarily lead to a well-paying career and is somewhat anachronistic.

There is an article on the LOC transcribing service by Mary Lou Stark in the recent book, Braille into the Next Millennium, published, in the year 2000, by the Library of Congress' National Library Service (NLS) for the Blind and Physically Handicapped. In the article, titled "Braille Transcribing in the United States: Past, Present, and Future", Ms. Stark—who is Head of the Braille Development Section of the NLS—writes,

Discussion of a course leading to certification of persons using braille translation software began in 1996. Using this type of software, information is entered into a word processing program in the usual manner and then translated into braille. Questions raised included how thoroughly a person using translation software should know the actual dot assignments for the various braille symbols. Also, should the certificate read the same for a person using translation software as that received by a person using six-key input on a computer or Perkins brailler, or using a slate and stylus. Two advisory committee meetings were held, one in 1996 and one in 1999. Upon resolution of certification criteria it is anticipated that the first persons to take the course will begin during late 2000. (p. 269) [Note, as of July 2001, this course has still not been announced.]

Proofreading. Because of the high standards required for braille texts, it is customary to require that texts not only be transcribed by LOC certified transcribers but also proofed by LOC certified proofreaders. Certified proofreaders are almost always blind and there is a serious shortage of proofreaders, especially in technical areas.

Nemeth proofreading certification has only been available since 1991 and few blind mathematicians and computer scientists—who might otherwise wish to provide some service as proofreaders—are willing to spend the considerable time and effort required to prove that they can read material that they have been reading for years. Another problem is bootstrapping: as accessibility brings blind persons into advanced technical areas, there may simply not be blind persons with the appropriate background available for proofing certain types of technical material.

The primary purpose of this website is to solve the proofreading problem by developing a new print representation of braille and corresponding certification procedures that will allow sighted persons to proofread braille transcriptions. Volunteer opportunity. Consider solutions to this problem.

An example

ax² + bx + c = 0
Quadratic equation in print.

As an example of a readable braille equivalent in print, consider the quadratic equation shown above, for which the Nemeth transcription is given in the next figure using Braille ASCII for the braille cells.

AX^2"+BX+C .K 0
Quadratic equation in Nemeth displayed in Braille ASCII.

The cell represented by the caret, dots 4-5, is the superscript composition indicator and the one by the quotation mark, dot 5, is the corresponding return-to-baseline indicator. Nemeth does not use spaces between the terms in a sum and the addition operator or plus sign. However, the two-cell equals mark, represented by the ".K", which is dots 4-6 followed by dots 1-3, is always space-delimited.
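For readers who want to check dot numbers like these for themselves, here is a minimal Python sketch of the Braille ASCII correspondence. It relies on the fact that the 64 Braille ASCII characters can be listed in the order of the Unicode braille patterns, so that a character's position in the list encodes its dots (bit 0 for dot 1 through bit 5 for dot 6). The names BRAILLE_ASCII and dots are our own.

```python
# The 64 Braille ASCII characters in Unicode braille-pattern order:
# the character at position i stands for the cell whose raised dots are
# the set bits of i (bit 0 = dot 1, bit 1 = dot 2, ..., bit 5 = dot 6).
BRAILLE_ASCII = (
    " A1B'K2L@CIF/MSP"
    "\"E3H9O6R^DJG>NTQ"
    ",*5<-U8V.%[$+X!&"
    ";:4\\0Z7(_?W]#Y)="
)


def dots(ch):
    """Return the raised dots (numbered 1-6) of the cell that ch denotes."""
    index = BRAILLE_ASCII.index(ch.upper())
    return tuple(d for d in range(1, 7) if index & (1 << (d - 1)))


# The four cells discussed in the paragraph above.
for ch in '^".K':
    print(repr(ch), dots(ch))
# '^' (4, 5)
# '"' (5,)
# '.' (4, 6)
# 'K' (1, 3)
```

Adding the position of a character in BRAILLE_ASCII to 0x2800 gives the code point of the matching Unicode braille pattern, should a dotted display be wanted.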

ax^²↓+bx+c = 0
Quadratic equation in Nemeth displayed in DotlessBraille™.

The last figure shows a DotlessBraille™ type display of the Nemeth transcription of the quadratic equation. You can see by comparison with the previous figure that the layout is the same as the braille; there is one print character for one braille cell. This display uses lowercase letters as in the original print. The caret, which is commonly used to indicate superscripts in linear presentations such as computer codes, has been retained while the baseline indicator is shown as a grey "down arrow". (There are, of course, other options.) The two-cell equals mark symbol has been replaced by a (non-breaking) space followed by an actual print equals mark.
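The substitutions just described are simple enough to sketch in Python. This is not the actual DotlessBraille™ software, only an illustration of the mapping for this one example; the function name dotless_display is hypothetical, and the grey color of the arrow and any other styling are not reproduced.

```python
def dotless_display(braille_ascii):
    """Map a Braille ASCII Nemeth string to a print-like display,
    keeping one print character for each braille cell."""
    # Two-cell equals sign (.K): a non-breaking space plus a print "=".
    shown = braille_ascii.replace(".K", "\u00a0=")
    # Return-to-baseline indicator (dot 5): a down arrow.
    shown = shown.replace('"', "\u2193")
    # Lowercase letters as in the original print; the caret is kept as is.
    return shown.lower()


source = 'AX^2"+BX+C .K 0'
shown = dotless_display(source)
print(shown)
print(len(shown) == len(source))  # True: the layout matches cell for cell
```

Because every replacement has the same length as what it replaces, a character position in the display always corresponds to the same cell position in the braille, which is what allows a sighted proofreader to point to the exact location of an error.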

Volunteer opportunity! Our hypothesis is that the additional effort required to proofread appropriately displayed print versions of braille transcriptions of technical material over and above the ordinary effort required to assimilate such material should be minimal. In other words, effective displays, such as our proposed DotlessBraille™ display, would allow proofreading to be a concurrent add-on activity to the reading of technical material which a sighted reader desires to read primarily for other purposes.

For example, sighted students could proofread brailled versions of textbooks—for which blind students have a future need—while simultaneously using these textbooks as study material. Similarly, a professor reviewing familiar material in the course of preparing lecture notes might be able to proofread a brailled version of that material concurrently.

Just so there is no possibility for misunderstanding, we are in no way advocating that untrained persons can adequately proofread braille. What we believe is that, given appropriate display methods, it should be possible to certify individuals with respect to the proper transcription rules in limited technical areas in a significantly shorter time, weeks instead of a year, than is currently the case. Such training would, of course, emphasize recognition of the types of errors that are possible with computer-based transcription.

Lack of suitable electronic sources

Many persons are surprised to learn that electronic files are typically not available even for recently published books or to discover that—even when an electronic file is available or has been generated by scanning or retyping—considerable effort is still required to prepare the file for input to an "automated" transcription application. The previously cited article on Braille Transcribing describes this situation.

In 1999, at meetings and conferences, the question is again raised, "Will computer software replace the skilled transcriber?" As translation software becomes more sophisticated, the role of the transcriber will change. There will be less focus on the "dots"—the actual composition of the braille symbols—and more time invested in structuring the computer files that will be translated.

Circumstances have changed in the past. The code will continue to be modified and changed to accommodate the changes in the print world. The future may bring new and different methods of producing braille; however, the limitations inherent in a code with a finite number of symbols ensure that there will always be a need for human intervention to produce a final product that fully conveys the meaning of the print author. (p. 269)

Similar thoughts were expressed by Joseph Sullivan in his 1993 article mentioned above.

Perhaps this is the place to mention that, from what I have observed, a saving of human labor in braille production generally translates into more and better braille, that is to more productive and interesting jobs for those who work with braille, not lost jobs!...Human experience and judgment remain a valued part of the braille transcription process, all the more so because an increased volume of automated transcription can only be accompanied by an increased incidence of the "hard" problems that only people can solve.

Consistent formatting. As stated previously, braille transcriptions must be essentially error-free; this statement applies to format or layout as well as text. Proper formatting is a large part of the difficulty of transcribing and a large part of what a braillist needs to know. Braille transcribing programs, just like browsers, require their input (source) files to contain embedded formatting codes so that a transcription will be correct in the large as well as the small.

There are well-defined rules for choosing the correct braille code for the given material as well as special braille formats for literary texts, technical articles, recipes, menus, poems, essays, tables of contents, songs, plays, computer code, cartoon strips, tactile graphics and so forth, all of which must be adhered to. The official reference for formats, produced by the Braille Authority of North America (BANA), is Braille Formats: Principles of Print to Braille Transcription, 1997. The latest errata to this publication are on the BANA site.

Inadequate indication of document structure. One key to understanding the formatting problem is to appreciate the significance of indicating document structure to understandability in print. The plethora of ridiculous print formatting options made possible by computer-based typesetting have made most people more aware of the importance of the proper presentation of document structure and more appreciative of the value of good design in this arena.

The problem of embedding format codes into existing source to be automatically transcribed to braille is essentially the same problem that is well-known to anyone who has been responsible for changing the style of or inserting HyperText Markup Language (HTML) tags into a document prepared by another person and has discovered that—rather than using a style sheet or template—the author or typist had simply hand-formatted each structural item such as a heading.

It turns out that, perhaps surprisingly, often the most efficient solution nowadays is to scan a printed version of such a document since modern Optical Character Recognition (OCR) software has very sophisticated algorithms for recognizing document structure. OCR also has the advantage of supplying a more uniform representation of document structure than would typically occur when using electronic files from a variety of sources.

HTML and the proprietary codes used by modern word processing software are intended to show the structure, rather than the typesetting, style, or presentation, of a document. HTML source, for example, contains tags that indicate whether text is a heading of a certain level, a list element, and so forth. Text that is to be emphasized is properly tagged by "em", for emphasis, or "strong", for strong emphasis, rather than by "italic" or "bold". And, although there are applications that are supposed to create properly marked-up HTML source from ordinary text, much of this work is still done by hand just as it is for material that is to be transcribed to braille. This issue, as it relates to braille transcription, is discussed further in the previously mentioned article on SGML.

Inadequacy of syntactical sources like TeX. What structure is attempting to accomplish is to distinguish semantics or meaning from syntax or typesetting. Syntax is commonly the basis of formatting systems, such as TeX, for presenting mathematics.

For example, the concept of superscript is a typesetting convention whereas the concept of exponentiation, which is ordinarily represented by using a superscript, is a semantic construct. Similarly, the concept of subscript is a typesetting convention whereas the concept of indexing, which is ordinarily represented by using a subscript, is a semantic construct. We can see the importance of this distinction because text in which an author has marked up the semantic concepts but that happens to use unconventional typesetting, such as raising indices above the baseline as is sometimes done, could be automatically converted to valid computer code whereas text based on typesetting could not.
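A small Python sketch may make the distinction concrete. The three classes below are hypothetical markup nodes of our own devising: Power and Index record what the author means, while Raised records only where the ink goes. Converting the semantic nodes to computer code is mechanical; converting the presentational one is not, because the information needed was never written down.

```python
from dataclasses import dataclass


@dataclass
class Power:
    """Semantic markup: base raised to an exponent."""
    base: str
    exponent: str


@dataclass
class Index:
    """Semantic markup: an indexed quantity."""
    base: str
    index: str


@dataclass
class Raised:
    """Presentation markup: some text set above the baseline."""
    base: str
    raised: str


def to_code(node):
    """Convert a markup node to a computer-code expression."""
    if isinstance(node, Power):
        return f"{node.base}**{node.exponent}"
    if isinstance(node, Index):
        return f"{node.base}[{node.index}]"
    # A raised "i" may be an exponent or an index typeset unconventionally;
    # the typesetting alone does not say which.
    raise ValueError("presentation markup does not carry the meaning")


print(to_code(Power("x", "2")))  # x**2
print(to_code(Index("a", "i")))  # a[i]
```

Calling to_code(Raised("a", "i")) raises the ValueError, which is exactly the position an automatic converter is in when its only source is typesetting.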

This semantic approach may appear to be at odds with the syntactic one taken by Dr. Abraham Nemeth, the originator of the Nemeth braille code for mathematics, who felt that the code would have an advantage if it were based on presentation rather than semantics since a braille transcriber could then transcribe mathematical texts with little knowledge of the underlying mathematics. His article, titled "The Nemeth Code", on the history and philosophy of the Nemeth code in the book Braille into the Next Millennium, states, "A Nemeth Code transcriber need not be proficient in mathematics; all that is required is to look up the symbols and follow the rules. That is what has attracted so many transcribers and what accounts for such a large collection of braille books in math and other natural sciences."

This is a reflection of what Dr. Nemeth calls "The Principle of Meaning Versus Notation" which states, "...it is the transcriber's function to supply only notation, not meaning, in an accessible form (speech or braille.) It is the reader's function to extract the meaning from the notation the transcriber supplies."

It remains to be seen what the long-term solution to braille math transcribing will be. It has already been shown (R. Arrabito and H. Jürgensen, "Computerized Braille typesetting: another view of Mark-up standards", Electronic Publishing, 1(2), 117-131 (September 1988)) that it is not possible to develop a general automated process for converting from TeX to Nemeth even though, or actually because, TeX is also primarily syntactic. (Note, however, that Duxbury has some capability for translating to Nemeth from the restricted TeX output generated by ScientificNotebook in the case of input prepared by a Nemeth-knowledgeable editor. This is really no more than the capability already built into Duxbury's MegaDots in a different form.)

These are some of the problems in converting from a typesetting source, like TeX, to Nemeth.

One of the reasons that automatic transcription from a TeX source is not possible is that TeX, like ordinary text, does not always provide sufficient information to distinguish those portions of a text that should be transcribed according to the literary code from those that should be transcribed according to the Nemeth code. This is also a problem with the two Duxbury applications and is one reason why human editors are needed.

A fundamental reason that general TeX cannot be automatically converted to Nemeth is that TeX uses syntax differently from Nemeth. The advantages of syntax-based transcribing for braille math were discussed above; unfortunately different persons don't always agree on syntax. Dr. Nemeth surely did not anticipate that Dr. Knuth, the inventor of TeX, would use its caret "raise text" indicator not only for exponents but also to raise text as in references to footnotes whereas Nemeth distinguishes these as separate constructs. Of course, this aspect of TeX doesn't only impact Nemeth but is also creating problems for TeX-to-MathML converters.

Another problem that may be unique to TeX is that its flexibility in allowing an author to create or "draw" symbols using low-level drawing primitives makes symbols defined in this manner unrecognizable even though they may, in fact, be standard mathematical symbols that have Nemeth equivalents.

This page was last modified May 31, 2002

Send questions to info@dotlessbraille.org