UnicodeMathML

This repository is a fork of Noah Doersing’s UnicodeMathML repository with added commits by Murray Sargent III. The changes are summarized at the end of this document. The facility is discussed in the help file.

The repository provides a JavaScript-based translation of UnicodeMath to MathML 4.0, hence the name “UnicodeMathML”. In addition, the facility supports dictation, speech, Nemeth braille, and LaTeX. The interactive playground lets you experiment with UnicodeMath, LaTeX, speech, and braille and gives insight into the translation pipeline.

UnicodeMath is a linear representation of math that often resembles math notation and is easy to enter. It works well in Microsoft desktop apps such as Word, PowerPoint, Outlook, and OneNote, but it hasn’t been widely available elsewhere. See also Plurimath.

Methodology

Conversion of UnicodeMath to MathML starts by parsing the input with a PEG grammar, which produces an abstract syntax tree (AST). This AST is then recursively preprocessed (via preprocess()) to make a new AST that adds some intent attributes and applies fix-ups not easily accomplished in the grammar. Originally the idea was to create an AST useful for generating not only MathML but also other formats such as LaTeX, but it turned out that creating LaTeX, speech, and Nemeth braille was more easily accomplished from a MathML DOM. The preprocessed AST is then recursively converted into a MathML AST (via mtransform()) with additional intent attributes. Finally, the MathML AST is run through a prettifier (pretty()) that eliminates superfluous mrow elements and compensates for limitations in MathML Core’s table functionality.
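To make the prettifier stage more concrete, here is a minimal sketch of one kind of cleanup it describes: dropping an mrow that carries no attributes and wraps a single child. The node shape, the rule for what counts as superfluous, and the names serialize and dropSuperfluousMrows are assumptions made for illustration. They are not the repository's actual AST or its pretty() implementation.

```javascript
// Hypothetical node shape (an assumption, not the repository's MathML AST):
//   { tag: "mrow", attrs: {}, children: [ ... ] }   element with children
//   { tag: "mi",   attrs: {}, text: "x" }           token element

// Serialize a node to a MathML string so the result is easy to inspect.
function serialize(node) {
  const attrs = Object.entries(node.attrs || {})
    .map(([name, value]) => ` ${name}="${value}"`)
    .join("");
  const body = node.children
    ? node.children.map(serialize).join("")
    : node.text || "";
  return `<${node.tag}${attrs}>${body}</${node.tag}>`;
}

// Recursively replace every attribute-free, single-child mrow by its child.
// An mrow with attributes (for example an intent) is kept, since removing
// it would lose information.
function dropSuperfluousMrows(node) {
  if (!node.children) return node; // token element: nothing to simplify
  const children = node.children.map(dropSuperfluousMrows);
  const hasAttrs = Object.keys(node.attrs || {}).length > 0;
  if (node.tag === "mrow" && !hasAttrs && children.length === 1) {
    return children[0];
  }
  return { ...node, children };
}

// x squared, with two redundant mrow wrappers.
const tree = {
  tag: "math", attrs: {}, children: [
    { tag: "mrow", attrs: {}, children: [
      { tag: "msup", attrs: {}, children: [
        { tag: "mrow", attrs: {}, children: [
          { tag: "mi", attrs: {}, text: "x" },
        ] },
        { tag: "mn", attrs: {}, text: "2" },
      ] },
    ] },
  ],
};

console.log(serialize(dropSuperfluousMrows(tree)));
// <math><msup><mi>x</mi><mn>2</mn></msup></math>
```

Replacing a single-child mrow by its child keeps the argument count of parents such as msup unchanged, which is why this particular rule is safe to apply everywhere in the tree.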

LaTeX, dictation, and Nemeth braille inputs are converted to UnicodeMath, which is converted, in turn, to MathML. Since LaTeX, speech, and Nemeth braille outputs are derived from a MathML DOM, a MathML parser would be needed to produce those outputs in Node.js environments.

UnicodeMath entered into the output window, i.e., in-place editing, is handled by autobuildup routines that manipulate the MathML DOM.

Testing

There are two test pages, ./dist/example.html and ./test/MmlToUM.html, which are used to test conversions and UI behavior.

example.html contains text with myriad UnicodeMath or LaTeX math zones that are converted to MathML and compared to known results. The tests pass if the console reports 0 failures.
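The compare-and-count pattern described above can be sketched as follows. This is an illustration only: runTests, toyConvert, and the test cases are hypothetical stand-ins, not the code or data used by example.html.

```javascript
// Run each case through `convert`, compare with the known result,
// and report the number of mismatches to the console.
function runTests(convert, cases) {
  let failures = 0;
  for (const { input, expected } of cases) {
    let actual;
    try {
      actual = convert(input);
    } catch (err) {
      actual = `error: ${err.message}`; // a thrown error counts as a failure
    }
    if (actual !== expected) {
      failures++;
      console.log(`FAIL ${input}\n  expected: ${expected}\n  actual:   ${actual}`);
    }
  }
  console.log(`${failures} failures`);
  return failures;
}

// Hypothetical stand-in converter: wraps its input in an mi element.
const toyConvert = (input) => `<mi>${input}</mi>`;

// Hypothetical cases pairing an input with its known result.
const cases = [
  { input: "x", expected: "<mi>x</mi>" },
  { input: "y", expected: "<mi>y</mi>" },
];

runTests(toyConvert, cases);
```

Comparing serialized strings is simple but sensitive to attribute order and whitespace, so known results have to be regenerated whenever the output format changes intentionally.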

MmlToUM.html has a set of buttons for testing UI behavior and conversions other than UnicodeMath/LaTeX to MathML. Clicking the buttons runs the tests, and the results are reported in the console window.

The tests are numerous but not exhaustive. Still, they help reduce regressions.

Integration

Documentation on how to include UnicodeMathML in Node.js environments is coming soon.

License

You may use this repository’s contents under the terms of the MIT License.

However, the subdirectories lib/ and playground/assets/lib/ contain some third-party software with its own licenses:

Lastly, Noah Doersing’s master’s thesis is located at docs/doersing-unicodemath-to-mathml.pdf and is included in this repository as a reference for some implementation details. It’s not intended (or relevant) for general distribution.

Changes in this forked version

Murray Sargent’s forked version is located at https://github.com/MurrayIII/UnicodeMathML/tree/main.

New features

MathML intent-attribute support: