MathJSON

MathJSON: a lightweight data interchange format for mathematical notation.

| Math | MathJSON |
| --- | --- |
| \displaystyle\frac{n}{1+n} | ["Divide", "n", ["Add", 1, "n"]] |
| {\sin^{-1}}^\prime(x) | ["Apply", ["Derivative", ["InverseFunction", "Sin"]], "x"] |

MathJSON is built on the JSON format. Its focus is on interoperability between software programs to facilitate the exchange of mathematical data and the building of scientific software through the integration of software components communicating with a common format.

It is human-readable, while being easy for machines to generate and parse. It is simple enough that it can be generated, consumed and manipulated using any programming language.

MathJSON can be transformed from (parsing) and to (serialization) other formats.

The Cortex Compute Engine library provides an implementation in JavaScript/TypeScript of utilities that parse LaTeX to MathJSON, serialize MathJSON to LaTeX, and provide a collection of functions for symbolic manipulation and numeric evaluations of MathJSON expressions.

Mathematical notation is used in a broad array of fields, from elementary school arithmetic to engineering, applied mathematics, physics and more. New notations are invented regularly, and MathJSON endeavors to be flexible and extensible to account for those notations.

The Compute Engine includes a standard library of functions and symbols which can be extended with custom libraries.

MathJSON is not intended to be suitable as a visual representation of arbitrary mathematical notations, and as such is not a replacement for LaTeX or MathML.

Structure of a MathJSON Expression​

A MathJSON expression is a combination of numbers, symbols, strings and functions.

Number

3.14
314e-2
{"num": "3.14159265358979323846264338327950288419716939937510"}
{"num": "-Infinity"}

Symbol

"x"
"Pi"
"🍎"
"半径"
{"sym": "Pi", "wikidata": "Q167"}

String

"'Diameter of a circle'"
{"str": "Srinivasa Ramanujan"}

Function

["Add", 1, "x"]
{"fn": [{"sym": "Add"}, {"num": "1"}, {"sym": "x"}]}

Numbers, symbols, strings and functions are expressed either as object literals with a "num", "sym", "str" or "fn" key, respectively, or using a shorthand notation as a JSON number, string or array.

The shorthand notation is more concise and easier to read, but it cannot include metadata properties.
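
As a rough illustration of the two notations, the sketch below classifies an expression by kind, whether it is written as an object literal or as a shorthand. The function name `kindOf` is hypothetical and not part of any published API, and the sketch ignores the string shorthands for `["Dictionary"]` and `["List"]` expressions.

```javascript
// Classify a MathJSON expression as "number", "symbol", "string" or "function".
// Hypothetical helper, for illustration only.
function kindOf(expr) {
  // Shorthand notations
  if (typeof expr === "number") return "number";
  if (Array.isArray(expr)) return "function";
  if (typeof expr === "string") {
    // A JSON string wrapped in apostrophes is a MathJSON string
    if (expr.length >= 2 && expr.startsWith("'") && expr.endsWith("'")) {
      return "string";
    }
    // A JSON string starting with +, - or a digit is a number
    if (/^[+\-0-9]/.test(expr)) return "number";
    // Any other JSON string is a symbol
    return "symbol";
  }
  // Object literal notations
  if (expr !== null && typeof expr === "object") {
    if ("num" in expr) return "number";
    if ("sym" in expr) return "symbol";
    if ("str" in expr) return "string";
    if ("fn" in expr) return "function";
  }
  throw new TypeError("Not a MathJSON expression");
}
```

For instance, `kindOf("Pi")` and `kindOf({ sym: "Pi", wikidata: "Q167" })` both report a symbol, while `kindOf("'Diameter of a circle'")` reports a string.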

Numbers​

A MathJSON number is either:

  • an object literal with a "num" key
  • a JSON number
  • a JSON string starting with +, - or the digits 0-9. Using a string is useful to represent numbers with a higher precision or greater range than JSON numbers.

Numbers as Object Literals​

Numbers may be represented as an object literal with a "num" key. The value of the key is a string representation of the number.

{
"num": <string>
}

The string representing a number follows the JSON syntax for number, with the following differences:

  • The range or precision of MathJSON numbers may be greater than the range and precision supported by IEEE 754 64-bit float.
{ "num": "1.1238976755823478721365872345683247563245876e-4567" }
  • The string values "NaN", "+Infinity" and "-Infinity" are used to represent, respectively, an undefined result as per IEEE 754, +\infty, and -\infty.
{ "num": "+Infinity" }
  • If the string includes the pattern /\([0-9]+\)/, that is a series of one or more digits enclosed in parentheses, that pattern is interpreted as repeating digits.
{ "num": "1.(3)" }
{ "num": "0.(142857)" }
{ "num": "0.(142857)e7" }
  • The following characters in a string representing a number are ignored:
U+0009 TAB
U+000A LINE FEED
U+000B VERTICAL TAB
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0020 SPACE
U+00A0 UNBREAKABLE SPACE
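
The rules above can be sketched as a small conversion routine. This is a minimal sketch, assuming that an approximate JavaScript number is an acceptable result: the name `approximateNum` is hypothetical, and any precision or range beyond a 64-bit float is lost (a very small value underflows to 0, a very large one overflows to Infinity).

```javascript
// Characters ignored in a "num" string:
// TAB, LINE FEED, VERTICAL TAB, FORM FEED, CARRIAGE RETURN, SPACE, U+00A0
const IGNORED_CHARACTERS = /[\u0009-\u000D\u0020\u00A0]/g;

// Convert the string value of a "num" key to an approximate JavaScript number.
// Hypothetical helper, for illustration only.
function approximateNum(numString) {
  let s = numString.replace(IGNORED_CHARACTERS, "");
  // Expand repeating digits far enough to exceed the precision of a 64-bit
  // float, e.g. "1.(3)" becomes "1.33333333333333333333"
  s = s.replace(/\(([0-9]+)\)/, (_match, digits) =>
    digits.repeat(Math.ceil(20 / digits.length))
  );
  // Number() accepts "NaN", "+Infinity" and "-Infinity" as written
  return Number(s);
}
```

An implementation that needs to preserve the full precision would keep the string and hand it to an arbitrary-precision library instead of calling `Number()`.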

Numbers as Number Literals​

When a number is compatible with the JSON representation of numbers and has no metadata, a JSON number literal may be used.

Specifically:

  • the number fits in a 64-bit binary floating point, as per IEEE 754-2008, with a 53-bit significand (about 15 digits of precision) and 11-bit exponent. If negative, its range is from -1.797693134862315 \cdot 10^{+308} to -2.225073858507201\cdot 10^{-308} and if positive from 2.225073858507201\cdot 10^{-308} to 1.797693134862315\cdot 10^{+308}
  • the number is finite: it is not +Infinity, -Infinity or NaN.
0
-234.534e-46

The numeric values below may not be represented as JSON number literals:

// Exponent out of bounds
{ "num": "5.78e400" }

// Too many digits
{ "num": "3.14159265358979323846264338327950288419716" }

// Non-finite numeric value
{ "num": "-Infinity" }

Numbers as String Literals​

An alternate representation of a number with no extra metadata is as a string following the format described above.

This allows for a shorthand representation of numbers with a higher precision or greater range than JSON numbers.

"3.14159265358979323846264338327950288419716"
"+Infinity"

Strings​

A MathJSON string is either:

  • an object literal with a "str" key
  • a JSON string that starts and ends with U+0027 ' APOSTROPHE.

Strings may contain any character represented by a Unicode scalar value (a codepoint in the [0...0x10FFFF] range, except for [0xD800...0xDFFF]), but the following characters must be escaped as indicated:

| Codepoint | Name | Escape Sequence |
| --- | --- | --- |
| U+0000 to U+001F | | \u0000 to \u001f |
| U+0008 | BACKSPACE | \b or \u0008 |
| U+0009 | TAB | \t or \u0009 |
| U+000A | LINE FEED | \n or \u000a |
| U+000C | FORM FEED | \f or \u000c |
| U+000D | CARRIAGE RETURN | \r or \u000d |
| U+0027 | APOSTROPHE | \' or \u0027 |
| U+005C | REVERSE SOLIDUS (backslash) | \\ or \u005c |

The encoding of the string follows the encoding of the JSON payload: UTF-8, UTF-16LE, UTF-16BE, etc.

"'Alan Turing'"

Functions​

A MathJSON function expression is either:

  • an object literal with a "fn" key.
  • a JSON array

Function expressions in the context of MathJSON may be used to represent mathematical functions but are more generally used to represent the application of a function to some arguments.

The function expression ["Add", 2, 3] applies the function named Add to the arguments 2 and 3.

Functions as Object Literal​

The default representation of function expressions is an object literal with a "fn" key. The value of the fn key is an array representing the function operator (its name) and its arguments (its operands).

{
"fn": [Operator, ...Operands[]]
}

For example:

  • 2+x: { "fn": ["Add", 2, "x"] }
  • \sin(2x+\pi): { "fn": ["Sin", ["Add", ["Multiply", 2, "x"], "Pi"]] }
  • x^2-3x+5: { "fn": ["Add", ["Power", "x", 2], ["Multiply", -3, "x"], 5] }

Functions as JSON Arrays​

If a function expression has no extra metadata it may be represented as a JSON array.

For example, these two expressions are equivalent:

{ "fn": ["Cos", ["Add", "x", 1]] }

["Cos", ["Add", "x", 1]]

NOTE

An array representing a function must have at least one element, the operator of the function. Therefore [] is not a valid expression.

Function Operator​

The operator of the function expression is the first element in the array. Its presence is required. It indicates the name of the function: this is what the function is about.

The operator is an identifier following the conventions for function names (see below).

// Apply the function "Sin" to the argument "x"
["Sin", "x"]

// Apply "Cos" to a function expression
["Cos", ["Divide", "Pi", 2]]

Following the operator are zero or more arguments (or operands), which are expressions.

CAUTION

The arguments of a function are expressions. To represent an argument which is a list, use a ["List"] expression, do not use a JSON array.

The expression corresponding to \sin^{-1}(x) is:

["Apply", ["InverseFunction", "Sin"], "x"]

The operator of this expression is "Apply" and its arguments are the expressions ["InverseFunction", "Sin"] and "x".
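
As an illustration, the operator and the operands can be extracted the same way from either representation of a function expression. The helper names below (`functionParts`, `operatorOf`, `operandsOf`) are hypothetical and only sketch the idea.

```javascript
// Return the array [operator, ...operands] of a function expression,
// or null if the expression is not a function expression.
function functionParts(expr) {
  if (Array.isArray(expr)) return expr;
  if (expr !== null && typeof expr === "object" && Array.isArray(expr.fn)) {
    return expr.fn;
  }
  return null;
}

// The operator is the first element of the array.
function operatorOf(expr) {
  const parts = functionParts(expr);
  // An empty array is not a valid expression
  if (parts === null || parts.length === 0) return null;
  return parts[0];
}

// The operands are the zero or more elements following the operator.
function operandsOf(expr) {
  const parts = functionParts(expr);
  return parts === null ? [] : parts.slice(1);
}
```

With `["Apply", ["InverseFunction", "Sin"], "x"]`, `operatorOf` returns `"Apply"` and `operandsOf` returns the two expressions `["InverseFunction", "Sin"]` and `"x"`.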

Shorthands​

The following shorthands are allowed:

  • A ["Dictionary"] expression may be represented as a string starting with U+007B { LEFT CURLY BRACKET and ending with U+007D } RIGHT CURLY BRACKET. The string must be a valid JSON object literal.
  • A ["List"] expression may be represented as a string starting with U+005B [ LEFT SQUARE BRACKET and ending with U+005D ] RIGHT SQUARE BRACKET. The string must be a valid JSON array.
"{\"x\": 2, \"y\": 3}"
// ➔ ["Dictionary", ["Tuple", "x", 2], ["Tuple", "y", 3]]

"[1, 2, 3]"
// ➔ ["List", 1, 2, 3]
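
A minimal sketch of how these two shorthands could be expanded is shown below. The function name `expandShorthand` is hypothetical, and the sketch only expands the outermost level: values nested inside the parsed JSON are left untouched.

```javascript
// Expand the string shorthand for a ["Dictionary"] or ["List"] expression.
// Any other string is returned unchanged. Hypothetical helper.
function expandShorthand(s) {
  if (s.startsWith("{") && s.endsWith("}")) {
    // The string must be a valid JSON object literal
    const entries = Object.entries(JSON.parse(s));
    return ["Dictionary", ...entries.map(([key, value]) => ["Tuple", key, value])];
  }
  if (s.startsWith("[") && s.endsWith("]")) {
    // The string must be a valid JSON array
    return ["List", ...JSON.parse(s)];
  }
  return s;
}
```

`JSON.parse` throws a `SyntaxError` when the string is not valid JSON, which matches the requirement that the shorthand be a valid JSON object literal or array.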

Symbols​

A MathJSON symbol is either:

  • an object literal with a "sym" key
  • a JSON string

Symbols are identifiers that represent the name of variables, constants and wildcards.

Identifiers​

Identifiers are JSON strings that represent the names of symbols, variables, constants, wildcards and functions.

Before they are used, JSON escape sequences (such as \u sequences, \\, etc.) are decoded.

The identifiers are then normalized to the Unicode Normalization Form C (NFC). They are stored internally and compared using the Unicode NFC.

For example, these four JSON strings represent the same identifier:

  • "Å"
  • "A\u030a" U+0041 A LATIN CAPITAL LETTER A + U+030A ̊ COMBINING RING ABOVE
  • "\u00c5" U+00C5 Å LATIN CAPITAL LETTER A WITH RING ABOVE
  • "\u0041\u030a" U+0041 A LATIN CAPITAL LETTER A + U+030A ̊ COMBINING RING ABOVE
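
The decoding and normalization steps can be checked with standard JavaScript: `JSON.parse` decodes the escape sequences and `String.prototype.normalize` applies NFC. The helper name `normalizeIdentifier` is only an illustration.

```javascript
// Normalize an identifier to Unicode Normalization Form C (NFC).
function normalizeIdentifier(id) {
  return id.normalize("NFC");
}

// The four JSON strings from the list above, as they would appear in a payload.
const payloads = ['"Å"', '"A\\u030a"', '"\\u00c5"', '"\\u0041\\u030a"'];

// Step 1: decode the JSON escape sequences. Step 2: normalize to NFC.
const identifiers = payloads.map((json) => normalizeIdentifier(JSON.parse(json)));

// All four normalize to the single code point U+00C5
const allSame = identifiers.every((id) => id === "\u00c5");
console.log(allSame); // true
```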

Identifiers conform to a profile of UAX31-R1-1 with the following modifications:

  • The character U+005F _ LOW LINE is added to the Start character set
  • The characters should belong to a recommended script
  • An identifier can be a sequence of one or more emojis. Characters that have both the Emoji and XIDC property are only considered emojis when they are followed by emoji modifiers. The definition below is based on Unicode TR51 but modified to exclude invalid identifiers.

Identifiers match either the NON_EMOJI_IDENTIFIER or the EMOJI_IDENTIFIER patterns below:

const NON_EMOJI_IDENTIFIER = /^[\p{XIDS}_]\p{XIDC}*$/u;

(from Unicode TR51)

or

const VS16 = "\\u{FE0F}"; // Variation Selector-16, forces emoji presentation
const KEYCAP = "\\u{20E3}"; // Combining Enclosing Keycap
const ZWJ = "\\u{200D}"; // Zero Width Joiner

const FLAG_SEQUENCE = "\\p{RI}\\p{RI}";

const TAG_MOD = `(?:[\\u{E0020}-\\u{E007E}]+\\u{E007F})`;
const EMOJI_MOD = `(?:\\p{EMod}|${VS16}${KEYCAP}?|${TAG_MOD})`;
const EMOJI_NOT_IDENTIFIER = `(?:(?=\\P{XIDC})\\p{Emoji})`;
const ZWJ_ELEMENT = `(?:${EMOJI_NOT_IDENTIFIER}${EMOJI_MOD}*|\\p{Emoji}${EMOJI_MOD}+|${FLAG_SEQUENCE})`;
const POSSIBLE_EMOJI = `(?:${ZWJ_ELEMENT})(${ZWJ}${ZWJ_ELEMENT})*`;
const EMOJI_IDENTIFIER = new RegExp(`^(?:${POSSIBLE_EMOJI})+$`, "u");

In summary, when using Latin characters, identifiers can start with a letter or an underscore, followed by zero or more letters, digits and underscores.
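
As a quick check of the non-emoji rule, the sketch below validates a few candidate identifiers. It uses the long property names `XID_Start` and `XID_Continue`, which are the same Unicode properties as `XIDS` and `XIDC`. The function name `isNonEmojiIdentifier` is hypothetical, and the emoji pattern is deliberately left out.

```javascript
// Same pattern as NON_EMOJI_IDENTIFIER, written with the long property names.
const NON_EMOJI_PATTERN = /^[\p{XID_Start}_]\p{XID_Continue}*$/u;

// Check an identifier against the non-emoji pattern after NFC normalization.
// Emoji identifiers are not handled by this sketch.
function isNonEmojiIdentifier(id) {
  return NON_EMOJI_PATTERN.test(id.normalize("NFC"));
}

const candidates = ["x", "_1", "newDeterminant", "半径", "1x", "a b", "x-y"];
for (const id of candidates) {
  console.log(JSON.stringify(id), isNonEmojiIdentifier(id));
}
```

The first four candidates are accepted. The last three are rejected because they start with a digit, contain a space, or contain a hyphen.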

Carefully consider when to use non-Latin characters. Use non-Latin characters for whole words, for example: "半径" (radius), "מְהִירוּת" (speed), "直徑" (diameter) or "सतह" (surface).

Avoid mixing Unicode characters from different scripts in the same identifier.

Do not include bidi markers such as U+200E LTR or U+200F RTL in identifiers. LTR and RTL marks should be added as needed by the client displaying the identifier. They should be ignored when parsing identifiers.

Avoid visual ambiguity issues that might arise with some Unicode characters. For example:

  • prefer using "gamma" rather than U+0263 ɣ LATIN SMALL LETTER GAMMA or U+03B3 γ GREEK SMALL LETTER GAMMA
  • prefer using "Sum" rather than U+2211 ∑ N-ARY SUMMATION, which can be visually confused with U+03A3 Σ GREEK CAPITAL LETTER SIGMA.

The following naming conventions for wildcards, variables, constants and function names are recommendations.

Wildcards Naming Convention​

Symbols that begin with U+005F _ LOW LINE (underscore) should be used to denote wildcards and other placeholders.

For example, they may be used to denote the positional parameter in a function expression. They may also denote placeholders and captured expressions in patterns.

| Wildcard | Description |
| --- | --- |
| "_" | Wildcard for a single expression or for the first positional argument |
| "_1" | Wildcard for a positional argument |
| "__" | Wildcard for a sequence of 1 or more expressions |
| "___" | Wildcard for a sequence of 0 or more expressions |
| "_a" | Capturing an expression as a wildcard named a |

Variables Naming Convention​

  • If a variable is made of several words, use camelCase. For example "newDeterminant"

  • Prefer clarity over brevity and avoid obscure abbreviations.

    Use "newDeterminant" rather than "newDet" or "nDet"

Constants Naming Convention​

  • If using Latin characters, the first character of a constant should be an uppercase letter A-Z
  • If a constant name is made up of several words, use camelCase. For example "SpeedOfLight"

Function Names Naming Convention​

  • The names of the functions in the MathJSON Standard Library start with an uppercase letter A-Z. For example "Sin", "Fold".
  • The names of your own functions can start with a lowercase or uppercase letter.
  • If a function name is made up of several words, use camelCase. For example "InverseFunction"

LaTeX Rendering Conventions​

The following recommendations may be followed by clients displaying MathJSON identifiers with LaTeX, or parsing LaTeX to MathJSON identifiers.

These recommendations do not affect computation or manipulation of expressions following these conventions.

  • An identifier may be composed of a main body, some modifiers, some style variants, some subscripts and superscripts. For example:
    • "alpha_0__prime" \alpha_0^\prime
    • "x_vec" \vec{x}
    • "Re_fraktur" \mathfrak{Re}
  • Subscripts are indicated by an underscore _ and superscripts by a double-underscore __. There may be more than one superscript or subscript, but they get concatenated. For example "a_b__c_q__p" -> a_{b, q}^{c, p}.
  • Modifiers after a superscript or subscript apply to the closest preceding superscript or subscript. For example "a_b_prime" -> a_{b^{\prime}}

Modifiers include:

| Modifier | LaTeX | Example |
| --- | --- | --- |
| _deg | \degree | \( x\degree \) |
| _prime | {}^\prime | \( x^{\prime} \) |
| _dprime | {}^\doubleprime | \( x^{\doubleprime} \) |
| _ring | \mathring{} | \( \mathring{x} \) |
| _hat | \hat{} | \( \hat{x} \) |
| _tilde | \tilde{} | \( \tilde{x} \) |
| _vec | \vec{} | \( \vec{x} \) |
| _bar | \overline{} | \( \overline{x} \) |
| _underbar | \underline{} | \( \underline{x} \) |
| _dot | \dot{} | \( \dot{x} \) |
| _ddot | \ddot{} | \( \ddot{x} \) |
| _tdot | \dddot{} | \( \dddot{x} \) |
| _qdot | \ddddot{} | \( \ddddot{x} \) |
| _operator | \operatorname{} | \( \operatorname{x} \) |
| _upright | \mathrm{} | \( \mathrm{x} \) |
| _italic | \mathit{} | \( \mathit{x} \) |
| _bold | \mathbf{} | \( \mathbf{x} \) |
| _doublestruck | \mathbb{} | \( \mathbb{x} \) |
| _fraktur | \mathfrak{} | \( \mathfrak{x} \) |
| _script | \mathscr{} | \( \mathscr{x} \) |
  • The following common names, when they appear as the body or in a subscript/superscript of an identifier, may be replaced with a corresponding LaTeX command:
| Common Name | LaTeX |
| --- | --- |
| alpha | \alpha |
| beta | \beta |
| gamma | \gamma |
| delta | \delta |
| epsilon | \epsilon |
| epsilonSymbol | \varepsilon |
| zeta | \zeta |
| eta | \eta |
| theta | \theta |
| thetaSymbol | \vartheta |
| iota | \iota |
| kappa | \kappa |
| kappaSymbol | \varkappa |
| mu | \mu |
| nu | \nu |
| xi | \xi |
| omicron | \omicron |
| piSymbol | \varpi |
| rho | \rho |
| rhoSymbol | \varrho |
| sigma | \sigma |
| finalSigma | \varsigma |
| tau | \tau |
| phi | \phi |
| phiLetter | \varphi |
| upsilon | \upsilon |
| chi | \chi |
| psi | \psi |
| omega | \omega |
| Alpha | \Alpha |
| Beta | \Beta |
| Gamma | \Gamma |
| Delta | \Delta |
| Epsilon | \Epsilon |
| Zeta | \Zeta |
| Eta | \Eta |
| Theta | \Theta |
| Iota | \Iota |
| Kappa | \Kappa |
| Lambda | \Lambda |
| Mu | \Mu |
| Nu | \Nu |
| Xi | \Xi |
| Omicron | \Omicron |
| Pi | \Pi |
| Rho | \Rho |
| Sigma | \Sigma |
| Tau | \Tau |
| Phi | \Phi |
| Upsilon | \Upsilon |
| Chi | \Chi |
| Psi | \Psi |
| Omega | \Omega |
| digamma | \digamma |
| aleph | \aleph |
| lambda | \lambda |
| bet | \beth |
| gimel | \gimel |
| dalet | \dalet |
| ell | \ell |
| turnedCapitalF | \Finv |
| turnedCapitalG | \Game |
| weierstrass | \wp |
| eth | \eth |
| invertedOhm | \mho |
| hBar | \hbar |
| hSlash | \hslash |
| blacksquare | \blacksquare |
| bottom | \bot |
| bullet | \bullet |
| circle | \circ |
| diamond | \diamond |
| times | \times |
| top | \top |
| square | \square |
| star | \star |
  • The following names, when used as a subscript or superscript, may be replaced with a corresponding LaTeX command:
| Subscript/Superscript | LaTeX | Example |
| --- | --- | --- |
| plus | {}_{+} / {}^{+} | \( x_{+} x^+ \) |
| minus | {}_{-} / {}^{-} | \( x_{-} x^- \) |
| pm | {}_\pm / {}^\pm | \( x_{\pm} x^\pm \) |
| ast | {}_\ast / {}^\ast | \( {x}_\ast x^\ast \) |
| dag | {}_\dag / {}^\dag | \( {x}_\dag x^\dag \) |
| ddag | {}_\ddag / {}^\ddag | \( {x}_\ddag x^\ddag \) |
| hash | {}_\# / {}^\# | \( {x}_\# x^\# \) |
  • Multi-letter identifiers may be rendered with a \mathit{}, \mathrm{} or \operatorname{} command.

  • Identifier fragments ending in digits may be rendered with a corresponding subscript.

| Identifier | LaTeX |
| --- | --- |
| time | \mathrm{time} |
| speed_italic | \mathit{speed} |
| P_blackboard__plus | \mathbb{P}^{+} |
| alpha | \alpha |
| mu0 | \mu_{0} |
| m56 | m_{56} |
| c_max | \mathrm{c_{max}} |
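
The subscript, superscript and modifier rules above can be sketched as a small converter. This is an illustration only, under several assumptions: the name `identifierToLatex` is hypothetical, only a handful of modifiers and common names are included, and the rules for multi-letter identifiers, trailing digits and special subscript names such as `plus` are not implemented.

```javascript
// A few of the modifiers listed above; each one wraps a LaTeX fragment.
const MODIFIERS = new Map([
  ["prime", (s) => `${s}^{\\prime}`],
  ["hat", (s) => `\\hat{${s}}`],
  ["vec", (s) => `\\vec{${s}}`],
  ["bar", (s) => `\\overline{${s}}`],
]);

// A few of the common names listed above.
const COMMON_NAMES = new Map([
  ["alpha", "\\alpha"],
  ["beta", "\\beta"],
  ["gamma", "\\gamma"],
]);

function identifierToLatex(id) {
  // "a_b__c" is split into ["a", "_", "b", "__", "c"]
  const [body, ...rest] = id.split(/(__|_)/);
  const toLatexName = (s) => COMMON_NAMES.get(s) ?? s;
  const fragments = { body: [toLatexName(body)], sub: [], sup: [] };
  // The fragment that a following modifier applies to
  let last = { list: fragments.body, index: 0 };

  for (let i = 0; i < rest.length; i += 2) {
    const separator = rest[i];
    const text = rest[i + 1];
    if (separator === "_" && MODIFIERS.has(text)) {
      // A modifier applies to the closest preceding fragment
      last.list[last.index] = MODIFIERS.get(text)(last.list[last.index]);
    } else {
      // "_" introduces a subscript, "__" a superscript
      const list = separator === "_" ? fragments.sub : fragments.sup;
      list.push(toLatexName(text));
      last = { list, index: list.length - 1 };
    }
  }

  // Multiple subscripts or superscripts are concatenated
  let latex = fragments.body[0];
  if (fragments.sub.length > 0) latex += `_{${fragments.sub.join(", ")}}`;
  if (fragments.sup.length > 0) latex += `^{${fragments.sup.join(", ")}}`;
  return latex;
}
```

With this sketch, `"a_b__c_q__p"` becomes `a_{b, q}^{c, p}`, `"a_b_prime"` becomes `a_{b^{\prime}}` and `"x_vec"` becomes `\vec{x}`.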

Metadata​

MathJSON object literals may be annotated with supplemental information.

A number represented as a JSON number literal, a symbol or string represented as a JSON string literal, or a function represented as a JSON array must be transformed into the equivalent object literal to be annotated.

The following metadata keys are recommended:

| Key | Note |
| --- | --- |
| wikidata | A short string indicating an entry in a wikibase. This information can be used to disambiguate the meaning of an identifier. Unless otherwise specified, the entry in this key refers to an entry in the wikidata.org wikibase |
| comment | A human readable plain string to annotate an expression, since JSON does not allow comments in its encoding |
| documentation | A Markdown-encoded string providing documentation about this expression. |
| latex | A visual representation in LaTeX of the expression. This can be useful to preserve non-semantic details, for example parentheses in an expression or styling attributes |
| sourceUrl | A URL to the source of this expression |
| sourceContent | The source from which this expression was generated. It could be a LaTeX expression, or some other source language. |
| sourceOffsets | A pair of character offsets in sourceContent or sourceUrl from which this expression was produced |
| hash | A string representing a digest of this expression. |

{
"sym": "Pi",
"comment": "The ratio of the circumference of a circle to its diameter",
"wikidata": "Q167",
"latex": "\\pi"
}

{
"sym": "Pi",
"comment": "The greek letter ∏",
"wikidata": "Q168"
}
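
Since a shorthand cannot carry metadata, annotating an expression means first promoting it to its object literal form. A minimal sketch, assuming a hypothetical helper named `annotate`, and ignoring the string shorthands for `["Dictionary"]` and `["List"]` expressions:

```javascript
// Return an object literal equivalent to `expr`, annotated with `metadata`.
// Hypothetical helper, for illustration only.
function annotate(expr, metadata) {
  if (typeof expr === "number") return { num: String(expr), ...metadata };
  if (Array.isArray(expr)) return { fn: expr, ...metadata };
  if (typeof expr === "string") {
    // A string wrapped in apostrophes is a MathJSON string
    if (expr.length >= 2 && expr.startsWith("'") && expr.endsWith("'")) {
      return { str: expr.slice(1, -1), ...metadata };
    }
    // A string starting with +, - or a digit is a number
    if (/^[+\-0-9]/.test(expr)) return { num: expr, ...metadata };
    return { sym: expr, ...metadata };
  }
  // Already an object literal: add (or overwrite) the metadata keys
  return { ...expr, ...metadata };
}

const pi = annotate("Pi", { wikidata: "Q167", latex: "\\pi" });
console.log(JSON.stringify(pi));
// {"sym":"Pi","wikidata":"Q167","latex":"\\pi"}
```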

MathJSON Standard Library​

This document defines the structure of MathJSON expressions. The MathJSON Standard Library defines a recommended vocabulary to use in MathJSON expressions.

Before considering inventing your own vocabulary, check if the MathJSON Standard Library already provides relevant definitions.

The MathJSON Standard Library includes definitions for:

| Topic | Functions and symbols |
| --- | --- |
| Arithmetic | Add Multiply Power Exp Log ExponentialE ImaginaryUnit... |
| Calculus | D Derivative Integrate... |
| Collections | List Reverse Filter... |
| Complex | Real Conjugate ComplexRoots... |
| Control Structures | If Block Loop ... |
| Core | Declare Assign Error LatexString... |
| Functions | Function Apply Return ... |
| Logic | And Or Not True False ForAll ... |
| Sets | Union Intersection EmptySet RealNumbers Integers ... |
| Special Functions | Gamma Factorial... |
| Statistics | StandardDeviation Mean Erf... |
| Styling | Delimiter Style... |
| Trigonometry | Pi Cos Sin Tan... |

When defining a new function, avoid using a name already defined in the Standard Library.