In the previous site update I mentioned that the ‘right’ fix would be to update the rendering pipeline for this site to use a different flavor of Markdown, preferably CommonMark.
That has happened! And it wasn’t too bad of an adjustment.
I use Pandoc as the core of how the content on this site gets created, and its default ‘Pandoc Flavored Markdown’ has served me well.
However, CommonMark-flavored Markdown seems to have taken over in the spaces I operate in day-to-day: GitLab and GitHub both use a variation on CommonMark for their Markdown rendering, as does the in-IDE Markdown preview in VS Code (which is where I typically do my editing).
I can generate this site locally to preview what’s going to go out, but having this site’s renderer match the other renderers I use day-to-day makes for a smoother content-editing experience.
At the most basic level, I changed the ‘from’ parameter I pass to pandoc when building content for this site:
```diff
- PANDOC_OPTS := -f markdown -t html5 -s --template $(template) --shift-heading-level-by=1
+ PANDOC_OPTS := -f commonmark_X -t html5 -s --template $(template) --shift-heading-level-by=1
```
The first thing that happened was a flood of warnings from pandoc that the template I’m using requires a ‘title’ that wasn’t available, and the ‘recent posts’ section of this site broke. That’s no good!
Pandoc supports placing metadata about a document inline in the document itself, which is then available to the output template. There are two ways an author can format it:
With a %-prefixed header block at the top of the document:
```markdown
% Post Title
% Post Author
% 2022-12-21

Post body with *Markdown* syntax
```
Or as a block of YAML, with arbitrary fields expected by your template:
```markdown
---
title: Post title
date: 2022-12-21
...

Post body with *Markdown* syntax
```
(I’ve left out the ‘author’ field because my template doesn’t use it.)
All of my posts used the first method, but the commonmark_x parser in Pandoc only supports the second (which makes sense, since it’s more flexible).
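A rewrite like this can be scripted. The Go sketch below is illustrative only: convertTitleBlock is a hypothetical helper, and it assumes each field fits on a single % line (Pandoc also allows continuation lines, which this ignores). Title and author are quoted with strconv.Quote, whose escapes are also valid in a YAML double-quoted string, so a title containing a colon won't break the YAML; the date is assumed to be a plain value like 2022-12-21 and is written unquoted.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// convertTitleBlock rewrites a leading Pandoc "%" header block
// (% title, % author, % date) as a YAML metadata block. It returns the
// document unchanged and false if the first line does not start with "%".
// Hypothetical helper: continuation lines inside a field are not handled.
func convertTitleBlock(doc string) (string, bool) {
	lines := strings.Split(doc, "\n")
	keys := []string{"title", "author", "date"}

	// Count up to three leading lines that start with "%".
	i := 0
	for i < len(keys) && i < len(lines) && strings.HasPrefix(lines[i], "%") {
		i++
	}
	if i == 0 {
		return doc, false
	}

	var b strings.Builder
	b.WriteString("---\n")
	for j := 0; j < i; j++ {
		v := strings.TrimSpace(strings.TrimPrefix(lines[j], "%"))
		switch {
		case v == "":
			// An empty "%" line means that field was left out.
		case keys[j] == "date":
			// Assumes a plain date such as 2022-12-21, written unquoted.
			fmt.Fprintf(&b, "date: %s\n", v)
		default:
			// strconv.Quote's escapes are valid in a YAML double-quoted
			// string, so titles containing ':' or '#' stay intact.
			fmt.Fprintf(&b, "%s: %s\n", keys[j], strconv.Quote(v))
		}
	}
	b.WriteString("...\n")
	b.WriteString(strings.Join(lines[i:], "\n"))
	return b.String(), true
}

func main() {
	doc := "% Post Title\n% Post Author\n% 2022-12-21\n\nPost body with *Markdown* syntax\n"
	out, ok := convertTitleBlock(doc)
	fmt.Println(ok)
	fmt.Print(out)
}
```

The author line is carried over even though a template may ignore it; dropping the "author" key from keys would skip it.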
The update was tedious but straightforward.
The new parser is brutal but effective:
```go
// r is an io.Reader holding the raw markdown content.
// title, date and err are assumed to be named results of the enclosing function.
s := bufio.NewScanner(r)

// Format:
// ---
// title: some-title
// date: some-date
// ...
if !s.Scan() {
	return
}
if !bytes.Equal(s.Bytes(), []byte("---")) {
	return
}

var done bool
var lines [][]byte
// Clone because the slice from s.Bytes() may be overwritten by the next Scan.
lines = append(lines, slices.Clone(s.Bytes()))
for s.Scan() {
	lines = append(lines, slices.Clone(s.Bytes()))
	// "..." closes the metadata block.
	if bytes.Equal(s.Bytes(), []byte("...")) {
		done = true
		break
	}
}
if !done {
	return
}

var parsed struct {
	Title string               `yaml:"title"`
	Date  *localdate.LocalDate `yaml:"date"`
}
data := bytes.Join(lines, []byte("\n"))
err = yaml.Unmarshal(data, &parsed)
if err != nil {
	return
}
title = parsed.Title
date = parsed.Date
return
```
(This parser assumes that if a document starts with --- it will find ... quickly. If this parser were exposed to an adversary I would need to be more defensive and make sure the look-ahead/memory-consumption is bounded.)
I also lost some formatting: Pandoc Markdown allows placing a \ at the end of a line to insert a line break but also keep the rendered lines together (I think this is called ‘keep with next’ in MS Word, for example):

```markdown
Line one\
Line two
```
The CommonMark spec claims to support this syntax, but it doesn’t work in the contexts I’m using it in, so I stripped it out by hand. I might be able to do something similar with <br> tags, but it’s not crucial.
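If the <br> route ever becomes worth it, a rough Go sketch of the rewrite might look like this. backslashBreaksToBR is a hypothetical helper, and it assumes the renderer passes raw HTML through. It skips lines inside triple-backtick fences, where a trailing backslash is literal (a shell line continuation, say), but it does not understand indented code blocks, ~~~ fences, or longer runs of backslashes.

```go
package main

import (
	"fmt"
	"strings"
)

// backslashBreaksToBR replaces a trailing "\" hard line break with "<br>".
// Simplified: lines inside triple-backtick fences are left alone, but
// indented code blocks and ~~~ fences are not detected.
func backslashBreaksToBR(doc string) string {
	lines := strings.Split(doc, "\n")
	inFence := false
	for i, line := range lines {
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inFence = !inFence
			continue
		}
		if inFence {
			continue
		}
		// A trailing `\\` is an escaped backslash, not a line break.
		if strings.HasSuffix(line, `\`) && !strings.HasSuffix(line, `\\`) {
			lines[i] = strings.TrimSuffix(line, `\`) + "<br>"
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Prints:
	// Line one<br>
	// Line two
	fmt.Print(backslashBreaksToBR("Line one\\\nLine two\n"))
}
```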