UnicodeMathML

This is a fork of Noah Doersing’s UnicodeMathML repository with added commits by Murray Sargent III. The changes are summarized at the end of this document. To see the changed code, look at the main branch (https://github.com/MurrayIII/UnicodeMathML/tree/main), not the master branch.

This repository provides a JavaScript-based translation of UnicodeMath to MathML (hence “UnicodeMathML”). An interactive “playground” allows for experimentation with UnicodeMath’s syntax and insight into the translation pipeline. UnicodeMathML can be easily integrated into arbitrary HTML or Markdeep documents.

🎮 Get familiar with the syntax via the playground!

📑 Learn how to integrate UnicodeMathML into your website or Markdeep document.

UnicodeMath is an easy-to-read linear format for mathematics initially developed as an input method and interchange representation for Microsoft Office. Its author, Murray Sargent III, has published a Unicode Technical Note detailing the format, based on which this UnicodeMath to MathML translator was built. More in the FAQ section below.

The initial development of UnicodeMathML was part of my Master’s thesis.

Status

The implementation is generally consistent with version 3.1 of Sargent’s tech note, but some edge cases that aren’t unambiguously specified (or that, as UnicodeMath is not wholly context-free, are impossible to parse with a PEG-based approach) might differ from the canonical implementation in Microsoft Office. Abstract boxes are largely unimplemented due to insufficient specification.

Getting Started

For a first look, check out…

Depending on whether you’d like to write UnicodeMath in a Markdeep document or use UnicodeMathML on your website, there are two paths. But first:

  1. Clone this repository or download a ZIP.

     git clone https://github.com/doersino/UnicodeMathML.git
    
  2. Before moving on, note that, by default, UnicodeMathML only transforms math surrounded by the UnicodeMath delimiters ⁅ and ⁆. For example, a typical sentence might read like this:

     Given a function ⁅f⁆ of a real variable ⁅x⁆ and an interval ⁅[a, b]⁆ of the real line, the **definite integral**
    
     ⁅∫_a^b f(x) ⅆx⁆
    
     can be interpreted informally as the signed area of the region in the ⁅xy⁆-plane that is bounded by the graph of ⁅f⁆, the ⁅x⁆-axis and the vertical lines ⁅x = a⁆ and ⁅x = b⁆.
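
To make the delimiter convention concrete, here is a minimal sketch of how the spans between ⁅ and ⁆ could be located in a string. The helper name findUnicodeMath is hypothetical, and this is not the code UnicodeMathML itself uses; it only illustrates that everything outside the delimiters is left alone.

```javascript
// Hypothetical helper, not part of UnicodeMathML: collect the contents of
// every ⁅…⁆ span in a string, ignoring the surrounding prose.
function findUnicodeMath(text) {
    const spans = [];
    // ⁅ is U+2045 and ⁆ is U+2046; [^⁆]* keeps each match inside one span
    const pattern = /⁅([^⁆]*)⁆/g;
    for (const match of text.matchAll(pattern)) {
        spans.push(match[1]);
    }
    return spans;
}

const sentence = "Given a function ⁅f⁆ of a real variable ⁅x⁆ and an interval ⁅[a, b]⁆.";
console.log(findUnicodeMath(sentence)); // [ 'f', 'x', '[a, b]' ]
```

Anything not wrapped in the delimiters, such as the Markdeep bold markers in the sentence above, never shows up in the result.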
    

HTML

Open dist/example.html in a text editor of your choice and scroll to the bottom. There, you’ll see the following lines:

<script>
    var unicodemathmlOptions = {
        resolveControlWords: true,
    };
</script>
<script src="unicodemathml.js"></script>
<script src="unicodemathml-parser.js"></script>
<script src="unicodemathml-integration.js"></script>
<script>
    document.body.onload = renderUnicodemath();
</script>

You’ll need to include the same lines (modulo path changes) at the bottom of your own HTML document or website (but before the closing </body> tag).
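
As a rough illustration of what “modulo path changes” means, the sketch below builds the same lines for a site that keeps the three UnicodeMathML files in a different directory. The helper integrationSnippet and the assets/js/ path are assumptions made up for this example; the file names and the renderUnicodemath() call are the ones from dist/example.html.

```javascript
// Hypothetical helper: produce the integration lines for a given base path.
// Only the src attributes change; everything else mirrors dist/example.html.
function integrationSnippet(basePath) {
    const files = [
        "unicodemathml.js",
        "unicodemathml-parser.js",
        "unicodemathml-integration.js",
    ];
    const tags = files.map((file) => `<script src="${basePath}${file}"></script>`);
    return [
        "<script>",
        "    var unicodemathmlOptions = {",
        "        resolveControlWords: true,",
        "    };",
        "</script>",
        ...tags,
        "<script>",
        "    document.body.onload = renderUnicodemath();",
        "</script>",
    ].join("\n");
}

// "assets/js/" is an assumed location, adjust it to your own layout
console.log(integrationSnippet("assets/js/"));
```

The order of the three script tags is kept exactly as in the example file.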

Markdeep

UnicodeMathML comes with a lightly modified variant of Morgan McGuire’s Markdeep that kicks off the translation at the correct point in the document rendering process. Open dist/example.md.html in a text editor of your choice and scroll to the bottom. There, you’ll see the following lines:

<script>
    var unicodemathmlOptions = {
        resolveControlWords: true,
    };
</script>
<script src="unicodemathml.js"></script>
<script src="unicodemathml-parser.js"></script>
<script src="unicodemathml-integration.js"></script>
<script src="markdeep-1.11.js" charset="utf-8"></script>

Replace the Markdeep loading code at the bottom of your document with this code (modulo path changes).

Node

While I haven’t tested server-side translation of UnicodeMath into MathML, there shouldn’t be any problems integrating the core of UnicodeMathML into a Node project – it’s all vanilla JavaScript. If you run into any trouble, or if you would prefer an officially supported NPM package or something, don’t hesitate to file an issue!

Configuration

The unicodemathmlOptions variable must be a dictionary containing one or more of the key-value pairs described below. If you’re happy with the defaults, you can leave unicodemathmlOptions undefined.

var unicodemathmlOptions = {

    // whether a progress meter should be shown in the bottom right of the
    // viewport during translation (you can probably disable this in most cases,
    // but it should remain enabled for large documents containing more than
    // 1000 UnicodeMath expressions where translation might take more than a
    // second or two)
    showProgress: true,

    // whether to resolve control words like "\alpha" to "α"; this also
    // includes unicode escapes like "\u1234"
    resolveControlWords: false,

    // a dictionary defining a number of custom control words, e.g.:
    // customControlWords: {'playground': '𝐏𝓁𝔞𝚢𝗴𝑟𝖔𝓊𝙣𝕕'},
    // which would make the control word "\playground" available – this is handy
    // in documents where certain expressions or subexpressions are repeated
    // frequently
    customControlWords: undefined,

    // how to display double-struck symbols (which signify differentials,
    // imaginary numbers, etc.; see section 3.11 of the tech note):
    // "us-tech" (ⅆ ↦ 𝑑), "us-patent" (ⅆ ↦ ⅆ), or "euro-tech" (ⅆ ↦ d)
    doubleStruckMode: "us-tech",

    // a function that will run before the translation is kicked off
    before: Function.prototype,

    // a function that will run after the translation has finished (and after
    // MathJax, if loaded, has been told to render the generated MathML)
    after: Function.prototype
};
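
As an illustration, a configuration for a small document might look like the following sketch. All option names come from the list above; the "defint" control word and the console messages are made up for this example.

```javascript
// Example configuration using only the options documented above.
// The "defint" control word and the log messages are illustrative assumptions.
var unicodemathmlOptions = {
    showProgress: false,            // small document, no progress meter needed
    resolveControlWords: true,      // resolve "\alpha" to "α"
    customControlWords: {
        defint: "∫_a^b f(x) ⅆx"     // makes "\defint" available
    },
    doubleStruckMode: "euro-tech",  // ⅆ ↦ d
    before: function () { console.log("UnicodeMathML: starting translation"); },
    after: function () { console.log("UnicodeMathML: translation finished"); }
};
```

Any key left out of the dictionary keeps its default value.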

FAQ

Got further questions that aren’t answered below, or ideas for potential improvements, or found a bug? Feel free to file an issue!

What’s this UnicodeMath you’re talking about?

UnicodeMath is a linear format for mathematics initially developed as an input method and interchange representation for Microsoft Office. Its author, Murray Sargent III, has published a Unicode Technical Note (a copy of which is included at docs/sargent-unicodemathml-tech-note.pdf) describing its syntax and semantics.

By using Unicode symbols in lieu of keywords wherever possible, it’s significantly more readable than, say, LaTeX in plain text:

UnicodeMath, much like MathML, was designed with accessibility in mind, taking cues from Nemeth braille and other preceding math encodings.

How does its syntax compare to AsciiMath, (La)TeX, and MathML?

Here’s a table showing a few expressions as you’d formulate them in UnicodeMath, AsciiMath, and LaTeX:

There are many subtleties as you get into the nitty-gritty, of course, but you’ll see that UnicodeMath consistently makes for the most readable and concise plaintext. LaTeX, in contrast, is significantly more verbose – but since it’s been around forever, you might find it to be more versatile in practice.

To summarize, here’s a totally-not-biased-and-super-scientific evaluation of these notations:

Does UnicodeMath support colors, monospaced text and comments?

Not in its canonical form as described in Sargent’s tech note – in Section 1, he mentions that such properties should be delegated to a “higher layer”, which is perfectly reasonable in GUI-based environments like Microsoft Office – but there is no such layer in HTML/Markdeep.

Update: In late 2021, Murray Sargent adopted part of the notation described below into mainline UnicodeMathML; this fix was published as part of UnicodeMath version 3.2.

To remedy this, UnicodeMathML introduces a few non-standard constructs:

For your copy-and-pasting pleasure, that’s , , , , and . You can use any color name or specification supported by CSS.

Cool, but I can’t find any of these fancy Unicode symbols on my keyboard!

Nobody’s keeping you from adapting Tom Scott’s emoji keyboard idea for math.

More realistically, there’s a bunch of tooling and text editor plugins that can help out here:

Additionally, you can configure UnicodeMathML to automatically translate keywords like \infty into their respective symbols before processing proper commences – see the “Configuration” section above.
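
To give a feel for what this preprocessing step does, here is a simplified sketch of control word resolution. It is not UnicodeMathML’s actual implementation: the function name resolveControlWordsSketch and the two-entry table are assumptions standing in for the real list, and unicode escapes like "\u1234" are not handled here.

```javascript
// Illustrative sketch only; UnicodeMathML's real resolver may work differently.
// This tiny table is a stand-in for the full list of control words.
const controlWords = { alpha: "α", infty: "∞" };

function resolveControlWordsSketch(input, customWords = {}) {
    // Custom control words are merged on top of the built-in ones
    const table = { ...controlWords, ...customWords };
    // Match a backslash followed by letters, e.g. "\infty"
    return input.replace(/\\([A-Za-z]+)/g, (whole, name) => {
        const known = Object.prototype.hasOwnProperty.call(table, name);
        // Unknown control words are left exactly as written
        return known ? table[name] : whole;
    });
}

console.log(resolveControlWordsSketch("\\alpha + \\infty")); // α + ∞
console.log(resolveControlWordsSketch("\\playground", { playground: "𝐏𝓁𝔞𝚢𝗴𝑟𝖔𝓊𝙣𝕕" }));
```

The second call mirrors the customControlWords example from the “Configuration” section.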

Alright, that’s not as big of a problem as I feared. What’s MathML, then?

You could describe MathML as “HTML, but for math”. It’s an XML-based markup language for mathematical expressions that was first released as a W3C recommendation in 1998 – it’s been around for a while!

Einstein’s famous E=mc² can be expressed as follows:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow>
    <mi>E</mi>      <!-- identifier -->
    <mo>=</mo>      <!-- operator -->
    <mrow>          <!-- grouping, similar to <span> in HTML -->
      <mi>m</mi>
      <msup>        <!-- superscript -->
        <mi>c</mi>
        <mn>2</mn>  <!-- number -->
      </msup>
    </mrow>
  </mrow>
</math>

UnicodeMath’s Sargent notes: “MathML has been designed for machine representation of mathematics and is useful for interchange between mathematical applications as well as for rendering the math in technical documents. While very good for these purposes, MathML is awkward for direct human input. Hence it’s desirable to have more user friendly ways of inputting mathematical expressions and equations.”

Isn’t browser support for MathML really lackluster?

Sort of – according to caniuse.com, native support for MathML was available for only around 21% of users as of late 2020, since only Firefox and Safari supported MathML.

However, Igalia added MathML rendering support to Chromium, and as of spring 2023, MathML support has arrived in Chrome, Edge, and Opera, reaching a total of 90% of users. To see how it looks with your browser, click here.

All of this isn’t really an issue: MathJax, which you’d probably use to render LaTeX math on the web anyway, provides a polyfill for MathML rendering.

But LaTeX seems much more established in various workflows than MathML, and KaTeX is so much faster than MathJax!

Can’t argue with that! Which is why I’ve been experimenting with extending UnicodeMathML to emit LaTeX code, too – most but not all UnicodeMath features are supported at a basic level. You can take a look at the current state of this feature in the playground by enabling the “Enable EXPERIMENTAL LaTeX output” setting.

I’m not actively working on completing LaTeX code generation at the moment, but feel free to file an issue if this feature is important to you.

Tell me more about the playground.

Sure thing – I originally built it as a parser development aid. Before learning about it in detail, take a gander at this screenshot of its interface:

The playground is designed to keep its state in local storage, so you shouldn’t lose any data if you reload it.

Development

This section is largely a reminder to myself and other potential contributors.

UnicodeMathML is intentionally kept simple and doesn’t have any dependencies beyond PEG.js – that way, it’s easier to maintain and extend.

Local development

Depending on how your browser implements its same-origin policy, you might not be able to serve the playground from the file system (i.e. with a URL like file:///⋯/UnicodeMathML/playground/index.html) during development:

You can work around this by running a static web server that’s serving the root directory of your local clone of this repository. Many programming environments, one of which is surely installed on your system, provide one-liners for this purpose – see here. If you’ve got Python installed, simply run python3 -m http.server 8000 and point your browser at localhost:8000/playground/.

Bundling

The contents of dist/ are generated as follows:

  1. Run the bash script utils/bundle.sh from the root directory of this repository.
  2. Open utils/generate-parser.html in any web browser (the caveats discussed in the “Local development” section above apply) and move the file that will be downloaded into dist/.

License

You may use this repository’s contents under the terms of the MIT License; see LICENSE.

However, the subdirectories lib/ and playground/assets/lib/ contain some third-party software with its own licenses:

Lastly, the docs/ subdirectory contains two PDF files:

Changes in Murray Sargent’s forked version

Murray Sargent’s forked version is located at https://github.com/MurrayIII/UnicodeMathML/tree/main.

New features

MathML intent-attribute support:

To do: