Andrew Lock

~9 min read

Creating a .NET AsciiMath parser and using it in a Markdig extension

In this post I describe how and why I ported an AsciiMath to MathML converter to .NET, and how I created a small extension to Markdig that uses the new AsciiMath parser.

AsciiMath, MathML and libraries

In my previous couple of posts I've been discussing AsciiMath as a simpler way to write MathML code that is rendered as math in the browser. For example, AsciiMath that looks like this:

int_(-1)^1 sqrt(1-x^2)dx = pi/2

is converted to MathML that looks like this:

<math display="block"><mrow><msubsup><mo>∫</mo><mrow><mo>-</mo><mn>1</mn></mrow><mn>1</mn></msubsup></mrow><msqrt><mrow><mn>1</mn><mo>-</mo><msup><mi>x</mi><mn>2</mn></msup></mrow></msqrt><mrow><mi>d</mi><mi>x</mi></mrow><mo>=</mo><mfrac><mi>π</mi><mn>2</mn></mfrac></math>

which renders like this:

$$\int_{-1}^{1} \sqrt{1-x^2}\,dx = \frac{\pi}{2}$$

In my previous post I described the various AsciiMath parsers available, and how there were three main JavaScript implementations and a Ruby implementation. In that post, I then showed how you could use Jint along with JavaScriptEngineSwitcher to run a JavaScript AsciiMath converter inside your .NET app for doing the conversion.

This approach got the job done, but it was a little bit janky. The JavaScript library I ended up using worked, but it was relatively out of date, didn't support a huge number of symbols, and ultimately felt like a bit of a hack. Unfortunately I'd not had a lot of success working with the maintained-but-more-complex JavaScript alternatives, so my eye turned again to the Ruby implementation.

As I mentioned in my last post, I'd already investigated running Ruby code inside a .NET app, and ultimately discounted it. So instead, I started eyeing up what it would take to port the library to .NET.

Porting the Ruby AsciiDoctor implementation to .NET

I'm not exactly sure why I chose to port the Ruby implementation rather than the canonical JavaScript implementation, but I think it's partly due to the simplicity of the Ruby API. The "getting started" for the library shows that converting an AsciiMath string to MathML is as simple as:

require 'asciimath'

parsed_expression = AsciiMath.parse(asciimath)
math_ml = parsed_expression.to_mathml

That's exactly the sort of API I wanted! In addition to that simplicity:

  • The whole "core" definition, including parsing and conversion to MathML is implemented in less than 10 files (Yes, some of those files are almost 1,000 lines unfortunately 😅)
  • It had a bunch of tests
  • It supports converting to LaTeX too (though I didn't bother investigating that part)

The fact it was in Ruby (with which I have zero experience, other than reading Seven Languages in Seven Weeks) was a bit of a turn-off, but I decided to just chip away at it in my evenings for a couple of weeks, until finally I had something that worked!

Interestingly, this is the first time I've actually used ChatGPT to do anything more than mess about. It was brilliant for being able to paste in some syntax and have it provide an understandable description of the results. Normally I prefer googling for stuff and reading an authoritative source, but searching for syntax is really hard to do, so this massively sped up the process.

I didn't use ChatGPT for the conversion per se, just to understand what the original Ruby code was doing. I then used that as a basis for writing the .NET code.

The final result was a small NuGet package called AsciiMath 🎉

Introducing the AsciiMath NuGet package

Much like the Ruby implementation, this NuGet package provides a very simple interface. First add the package to your application using:

dotnet add package AsciiMath --prerelease

I've only released a pre-release version so far, as I've been iterating very quickly to make it useful.

The current interface is literally a static method that you pass AsciiMath as a string and get back MathML as a string:

var asciiMath = "int_-1^1 sqrt(1-x^2)dx = pi/2";

string converted = Parser.ToMathMl(asciiMath); 

Console.WriteLine(converted); // <math><msub><mo>&#x222B;</mo><mo>&#x2212;</mo></msub><msup><mn>1</mn><mn>1</mn></msup><msqrt><mrow><mn>1</mn><mo>&#x2212;</mo><msup><mi>x</mi><mn>2</mn></msup></mrow></msqrt><mi>dx</mi><mo>=</mo><mfrac><mi>&#x3C0;</mi><mn>2</mn></mfrac></math>

And that's all there is to it! There's also an optional settings object you can pass into the ToMathMl function:

/// <summary>
/// Options that control the generated MathML
/// </summary>
public class MathMlOptions
{
    internal static readonly MathMlOptions Defaults = new();

    public MathMlDisplayType DisplayType { get; set; } = MathMlDisplayType.None;

    public bool IncludeTitle { get; set; } = false;
}

  • DisplayType controls whether the final <math> element has a display="inline" or display="block" attribute
  • IncludeTitle controls whether the AsciiMath string should be added as a title attribute on the final <math> element
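To make those two options a bit more concrete, here's a rough sketch of how they could shape the opening <math> tag. This isn't the package's code: OpeningMathTag is a made-up helper, the string display parameter is a stand-in for MathMlDisplayType, and the attribute order is an assumption on my part.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper, NOT taken from the AsciiMath package.
// display: "" (stand-in for MathMlDisplayType.None), "inline", or "block".
string OpeningMathTag(string asciiMath, string display, bool includeTitle)
{
    var sb = new StringBuilder("<math");
    if (display.Length > 0)
    {
        sb.Append(" display=\"").Append(display).Append('"');
    }
    if (includeTitle)
    {
        // The AsciiMath source ends up inside an attribute, so HTML-encode it
        sb.Append(" title=\"").Append(WebUtility.HtmlEncode(asciiMath)).Append('"');
    }
    return sb.Append('>').ToString();
}

Console.WriteLine(OpeningMathTag("pi/2", "", false));       // <math>
Console.WriteLine(OpeningMathTag("pi/2", "block", true));   // <math display="block" title="pi/2">
Console.WriteLine(OpeningMathTag("a < b", "inline", true)); // <math display="inline" title="a &lt; b">
```

The encoding step is the only subtle part: AsciiMath input can contain characters like < that would otherwise break the attribute.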

And that's it. A simple .NET library for converting AsciiMath to MathML.

After I published my JavaScriptEngine-based post, Sébastien Ros suggested using Parlot to create a native .NET version. Had I been aware of it, I almost certainly would have done that instead of porting the Ruby code 😅 Maybe I'll update the internals in the future to use it as an experiment!

Differences from the Ruby and JavaScript implementations

My initial port of the AsciiDoctor/AsciiMath implementation was very faithful to the original. I didn't want to try and do much refactoring or anything until I had something working and passing the included tests. When I was initially writing the parsing code I decided to be notionally cognizant of performance, and used Span<T> throughout the core of the parsing.

To be clear, I've done no proper performance analysis or benchmarking. I'm sure there are a lot of areas that could be improved or rewritten, but that hasn't been my focus yet at all.
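If you haven't used Span<T> for parsing before, the idea is to slice views over the input instead of allocating substrings for every token. The following is only a sketch of that idea, not code from the package: NumberLength and ReadNumber are hypothetical helpers, and the number format they accept (digits with an optional fractional part) is an assumption for illustration.

```csharp
using System;

// Hypothetical tokenizer step: counts the leading characters of `input`
// that form a number such as "12" or "3.14"
int NumberLength(ReadOnlySpan<char> input)
{
    var i = 0;
    while (i < input.Length && input[i] is >= '0' and <= '9') i++;

    // Accept a fractional part only when a digit follows the '.'
    if (i > 0 && i + 1 < input.Length && input[i] == '.' && input[i + 1] is >= '0' and <= '9')
    {
        i += 2;
        while (i < input.Length && input[i] is >= '0' and <= '9') i++;
    }
    return i;
}

// Splits the text into a leading number token and the rest by slicing the span
(string Number, string Rest) ReadNumber(string text)
{
    ReadOnlySpan<char> remaining = text;
    var length = NumberLength(remaining);
    var number = remaining.Slice(0, length); // a view over the token, no copy
    remaining = remaining.Slice(length);     // advance past it, still no copy

    // Strings are only materialised here so the result can be printed
    return (number.ToString(), remaining.ToString());
}

var result = ReadNumber("3.14x^2");
Console.WriteLine(result.Number); // 3.14
Console.WriteLine(result.Rest);   // x^2
```

A real parser would keep working with the sliced spans rather than calling ToString() on each token, which is where the allocation savings come from.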

Another potentially controversial choice was to target only .NET 8. The main reason for that choice was simply that it's all I needed for my use case. I don't think I used any .NET 8-specific APIs, but also I'm not expecting (or trying) for this library to be widely used, so I didn't want to artificially hamstring myself.

Once I had the initial port complete I decided to make a couple of tweaks. The main change is in how "known" operators like sin and log are rendered. AsciiDoctor renders those using the <mi> element, whereas some other AsciiMath implementations use <mo>. I decided to change my implementation to use <mo> too, mostly because of how it renders in the browser.

For example, if you take the AsciiMath expression d ln n and render it using <mi> for all the elements, the browser runs the identifiers together with no spacing: dlnn. In contrast, if we use <mo> for the known ln function, the browser adds operator spacing around it, so the result reads much better: d ln n.
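In code, the tweak boils down to a lookup when rendering an identifier. This is a simplified sketch rather than the package's implementation: RenderIdentifier and the knownFunctions set are hypothetical, and the set only contains a handful of names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A small illustrative subset; the real list of known functions is longer
var knownFunctions = new HashSet<string> { "sin", "cos", "tan", "ln", "log" };

// Hypothetical rendering step: known functions are emitted as operators (<mo>),
// every other identifier is emitted as <mi>
string RenderIdentifier(string name) =>
    knownFunctions.Contains(name) ? $"<mo>{name}</mo>" : $"<mi>{name}</mi>";

var tokens = new[] { "d", "ln", "n" };
Console.WriteLine(string.Concat(tokens.Select(t => RenderIdentifier(t))));
// <mi>d</mi><mo>ln</mo><mi>n</mi>
```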

The canonical JavaScript implementation also includes a suite of unit tests which I tried running against my implementation. Only about 50% of the tests passed, and the rest failed for a variety of different reasons. I reported my findings to AsciiDoctor, but many of the failures are differences of opinion as opposed to direct failures. There's also a matter of philosophy: the grammar for AsciiMath is very simple, so it depends how many special cases and workarounds you want to add to give a "better" experience, and what you consider "better".

The other main difference is that the library only converts from AsciiMath to MathML, whereas AsciiDoctor supports outputting Word-compatible MathML, LaTeX, and HTML. I don't have any intention to add support for these formats at this point.

Using the AsciiMath NuGet in a Markdig pipeline

As I mentioned in my original post about MathML and AsciiMath, my original motivation for looking at AsciiMath was that I wanted a nice way to add math formulas in my blog posts. My blog uses a custom .NET static site generator that takes in a bunch of Markdown and Razor files and uses Markdig to generate the final HTML.

Markdig already has support for detecting $..$ and $$..$$ patterns and rendering them as LaTeX/TeX expressions, but I wanted to use AsciiMath instead. So I decided to hack together support for AsciiMath by leveraging the built-in math support!

While this should work for you, I consider this a pretty hacky solution, so I have no intention of creating a NuGet package or anything like that. It's only 4 files though, so just copy-paste it if you want to use it!😄

The core part of the Markdig AsciiMath support is to implement an IMarkdownExtension, and to add renderers for both inline AsciiMath expressions and fenced/block AsciiMath expressions. The following code is based on the built-in MathExtension and is essentially identical. The first Setup() method is literally identical, in that it uses the built-in MathInlineParser and MathBlockParser:

using Markdig;
using Markdig.Extensions.Mathematics;
using Markdig.Renderers;

/// <summary>
/// Extension for adding inline mathematics $...$ using ascii-math
/// </summary>
/// <seealso cref="IMarkdownExtension" />
public class AsciiMathExtension : IMarkdownExtension
{
    public void Setup(MarkdownPipelineBuilder pipeline)
    {
        // Adds the inline parser
        if (!pipeline.InlineParsers.Contains<MathInlineParser>())
        {
            pipeline.InlineParsers.Insert(0, new MathInlineParser());
        }

        // Adds the block parser
        if (!pipeline.BlockParsers.Contains<MathBlockParser>())
        {
            // Insert before EmphasisInlineParser to take precedence
            pipeline.BlockParsers.Insert(0, new MathBlockParser());
        }
    }

    public void Setup(MarkdownPipeline pipeline, IMarkdownRenderer renderer)
    {
        if (renderer is HtmlRenderer htmlRenderer)
        {
            if (!htmlRenderer.ObjectRenderers.Contains<HtmlAsciiMathInlineRenderer>())
            {
                htmlRenderer.ObjectRenderers.Insert(0, new HtmlAsciiMathInlineRenderer());
            }
            if (!htmlRenderer.ObjectRenderers.Contains<HtmlAsciiMathBlockRenderer>())
            {
                htmlRenderer.ObjectRenderers.Insert(0, new HtmlAsciiMathBlockRenderer());
            }
        }
    }
}

The second method, which adds the renderers, defines two new types, the HtmlAsciiMathInlineRenderer and HtmlAsciiMathBlockRenderer. I wrote very simple implementations of these compared to the originals, again, because I didn't need anything more:

using Markdig.Extensions.Mathematics;
using Markdig.Renderers;
using Markdig.Renderers.Html;

public class HtmlAsciiMathBlockRenderer : HtmlObjectRenderer<MathBlock>
{
    protected override void Write(HtmlRenderer renderer, MathBlock obj)
    {
        // grab all the lines inside the fenced $$..$$ block
        var asciiMath = obj.Lines.ToString();

        // Pass it to the converter
        var converted = AsciiMathConverter.Convert(asciiMath, displayInline: false);

        // Write the rendered MathML
        renderer.Write(converted);
    }
}

public class HtmlAsciiMathInlineRenderer : HtmlObjectRenderer<MathInline>
{
    protected override void Write(HtmlRenderer renderer, MathInline obj)
    {
        var asciiMath = obj.Content.ToString();
        // did we use one $ or two $$?
        // If one, use display="inline", otherwise use display="block"
        // This differs from the "default" Markdig behaviour, but better suits my needs
        var displayInline = obj.DelimiterCount == 1;

        var converted = AsciiMathConverter.Convert(asciiMath, displayInline: displayInline);
        renderer.Write(converted);
    }
}

These renderers are very simple: they grab the contents, feed them to the AsciiMathConverter, and render the resulting string. The AsciiMathConverter.Convert() function has the same signature as the JavaScript version I showed in the previous post, but now we're using the AsciiMath NuGet package:

using AsciiMath;

public static class AsciiMathConverter
{
    public static string Convert(string asciiMath, bool displayInline)
    {
        return Parser.ToMathMl(asciiMath, new MathMlOptions()
        {
            DisplayType = displayInline ? MathMlDisplayType.Inline : MathMlDisplayType.Block,
            IncludeTitle = true,
        });
    }
}

The final piece is a small extension method to make it easier to add the Markdig extension to a pipeline:

using Markdig;

public static class PipelineExtensions
{
    /// <summary>
    /// Uses the ASCIIMath extension.
    /// </summary>
    /// <param name="pipeline">The pipeline.</param>
    /// <returns>The modified pipeline</returns>
    public static MarkdownPipelineBuilder UseAsciiMath(this MarkdownPipelineBuilder pipeline)
    {
        pipeline.Extensions.AddIfNotAlready<AsciiMathExtension>();
        return pipeline;
    }
}

Putting it all together, we can use this extension when building the Markdig pipeline:

var markdownPipeline = new MarkdownPipelineBuilder()
                .UseYamlFrontMatter()
                .UsePipeTables()
                .UseAutoLinks()
                .UseAsciiMath() // 👈 Use AsciiMath parser
                .Build();

And there we have it: a .NET AsciiMath parser that works with Markdig!

As I've hopefully made clear, this was mostly just a curiosity for me as to how difficult it would be to do. I don't want to vouch for the AsciiMath code being particularly production-ready, but it works fine on my blog from what I've seen so far, so feel free to take it for a spin if it interests you!

Summary

In this post I described how I ported the AsciiDoctor/AsciiMath Ruby implementation to .NET, and produced the AsciiMath NuGet package for converting an AsciiMath string into a MathML string. Finally, I showed how I created a small Markdig extension for working with AsciiMath. It partially uses the built-in Markdig math extension, but instead of converting TeX input to MathML, it uses the AsciiMath NuGet package.

Andrew Lock | .Net Escapades