[tor-dev] Proposal 345: Migrating the tor specifications to mdbook

Nick Mathewson nickm at torproject.org
Tue Oct 3 12:45:14 UTC 2023


```
Filename: 345-specs-in-mdbook.md
Title: Migrating the tor specifications to mdbook
Author: Nick Mathewson
Created: 2023-10-03
Status: Open
```

# Introduction

I'm going to propose that we migrate our specifications to a
set of markdown files, specifically using the [`mdbook`]
tool.

This proposal does _not_ propose a bulk rewrite of our specs;
it is meant to be a low-cost step forward that will produce
better output, and make it easier to continue working on our
specs going forward.

That said, I think that this change will enable rewrites
in the future.  I'll explain more below.

# What is mdbook?

Mdbook is a tool developed by members of the Rust community to
create books with [Markdown].  Each chapter is a single
markdown file; the files are organized into a book using a
[`SUMMARY.md`] file.

Have a look at the [`mdbook` documentation][`mdbook`]; this is
what the output looks like.

Have a look at this [source tree]: that's the input that
produces the output above.
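
To give a feel for the `SUMMARY.md` format, here is a minimal sketch that renders a nested chapter outline as the bulleted list of links mdbook reads. The chapter titles, file names, and the helpers `render_summary` and `summary_text` are made up for illustration; they are not our real layout.

```python
# Hypothetical chapter outline: (title, path, nested children).
# Titles and file names are illustrative only.
OUTLINE = [
    ("Introduction", "intro.md", []),
    ("Tor Protocol Specification", "tor-spec/index.md", [
        ("Preliminaries", "tor-spec/preliminaries.md", []),
        ("Cell format", "tor-spec/cells.md", []),
    ]),
]


def render_summary(outline, depth=0):
    """Render nested chapters as the indented link list used in SUMMARY.md."""
    lines = []
    for title, path, children in outline:
        indent = "    " * depth  # nested chapters are indented list items
        lines.append(f"{indent}- [{title}]({path})")
        lines.extend(render_summary(children, depth + 1))
    return lines


def summary_text(outline):
    """Return the full SUMMARY.md text, including its title line."""
    return "# Summary\n\n" + "\n".join(render_summary(outline)) + "\n"


if __name__ == "__main__":
    print(summary_text(OUTLINE), end="")
```

Moving a chapter then means moving one line of the outline; nothing is tied to a section number.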

Mdbook is extensible: it can use numerous [plugins] to
enhance the semantics of the markdown input, add diagrams,
output in more formats, and so on.

# What would using mdbook get us _immediately_?

There are a bunch of changes that we could get immediately via
even the simplest migration to mdbook.  These immediate
benefits aren't colossal, but they are things we've wanted for
quite a while.

* We'll have a document that's easier to navigate (via the sidebars).

* We'll finally have good HTML output.

* We'll have all our specifications organized into a single
  "document", able to link to one another and cross reference
  one another.

* We'll have perma-links to sections.

* We'll have a built-in text search function.  (Go to the [`mdbook`]
  documentation and hit "`s`" to try it out.)

## How will mdbook help us _later on_ as we reorganize?

Many of the benefits of mdbook will come later down the line as we
improve our documentation.

* Reorganizing will become much easier.

  * Our links will no longer be based on section number, so we
    won't have to worry about renumbering when we add new sections.
  * We'll be able to create redirects from old section filenames
    to new ones if we need to rename a file completely.
  * It will be far easier to break up our files into smaller files
    when we find that we need to reorganize material.

* We will be able to make our documents even _easier_ to navigate.

  * As we improve our documentation, we'll be able to use links
    to cross-reference our sections.

* We'll be able to include real diagrams and tables.

  * We're already using the [`mermaid`] tool in Arti to
    generate [protocol diagrams] and [architecture diagrams];
    we can use this in our specs too, instead of hand-drawn
    ASCII art.

* We'll be able to integrate proposals more easily.

  * New proposals can become new chapters in our specification
    simply by copying them into a new 'md' file or files; we won't
    have to decide between integrating them into existing files
    or creating a new spec.

  * Implemented but unmerged proposals can become additional chapters
    in an appendix to the spec. We can refer to them with permalinks
    that will still work when they move to another place in the specs.


# How should we do this?


## Strategy

My priorities here are:
 * no loss of information,
 * decent-looking output,
 * a quick automated conversion process that won't cost us a bunch of time,
 * a process that we can run experimentally until we are satisfied
   with the results.

With that in mind, I'm writing a simple set of [`torspec-converter`]
scripts to convert our old torspec.git repository into its new format.
We can tweak the scripts until we like the output that they produce.

After running a recent `torspec-converter` on a fairly recent
torspec.git, here is how the branch looks:

https://gitlab.torproject.org/nickm/torspec/-/tree/spec_conversion?ref_type=heads

And here's the example output when running mdbook on that branch:

https://people.torproject.org/~nickm/volatile/mdbook-specs/index.html

> Note: this is not a permanent URL; we won't keep the example output
> forever.  When we actually merge the changes, they will move into
> whatever final location we provide.

The conversion script isn't perfect. It only recognizes three kinds of
content: headings, text, and "other".  Content classified as "other" is
wrapped in \`\`\` to render it verbatim.
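
To make that three-way split concrete, here is a minimal sketch of this kind of classifier. It is _not_ the actual `grinder.py` code: the heading pattern (`1.2. Title`), the rule that indented lines count as "other", and the names `classify` and `convert` are all assumptions for illustration.

```python
import re

FENCE = "`" * 3  # markdown code fence, built here to avoid a literal fence

# Assumption: headings in the old text specs look like "1.2. Title".
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.\s+(\S.*)$")


def classify(line):
    """Label one input line as 'blank', 'heading', 'text', or 'other'."""
    if not line.strip():
        return "blank"
    if HEADING_RE.match(line):
        return "heading"
    # Assumption: flush-left lines are ordinary prose.
    if not line[0].isspace():
        return "text"
    # Anything indented (diagrams, tables, grammars) is "other".
    return "other"


def convert(lines):
    """Convert plain-text spec lines to markdown, fencing runs of 'other'."""
    out = []
    in_fence = False
    for line in lines:
        kind = classify(line)
        if kind == "other" and not in_fence:
            out.append(FENCE)  # open a verbatim block
            in_fence = True
        elif kind != "other" and in_fence:
            out.append(FENCE)  # close it at the first non-"other" line
            in_fence = False
        if kind == "heading":
            number, title = HEADING_RE.match(line).groups()
            depth = number.count(".") + 1  # "1.2" becomes a level-2 heading
            out.append("#" * depth + " " + title)
        else:
            out.append(line.rstrip())
    if in_fence:
        out.append(FENCE)
    return out
```

A classifier this crude will mislabel some prose (for example, a flush-left numbered list item looks like a heading), which is why we need to be able to re-run the conversion and tweak it.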

The choice of which sections to split up and which to keep as a single
page is up to us; I made some initial decisions in the file above, but
we can change it around as we please.  See the [configuration section]
at the end of the `grinder.py` script for details on how it's set up.

## Additional work that will be needed

Assuming that we make this change, we'll want to build an automated CI
process to build it as a website, and update the website whenever
there is a commit to the specifications.

(This automated CI process might be as simple as `git clone && mdbook
build && rsync -avz book/ $TARGET`.)

We'll want to go through our other documentation and update links,
especially the permalinks in spec.torproject.org.

It might be a good idea to use spec.torproject.org as the new location
of this book, assuming weasel (who maintains spec.tpo) also thinks
it's reasonable.
If we do that, we need to decide on what we want the landing page
to look like, and we need
_very much_ to get our permalink story correct.  Right now I'm
generating a .htaccess file as part of the conversion.
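
As an illustration of the idea (not the converter's real output), a permalink table can be turned into Apache `Redirect` directives with a few lines of Python. The paths in `PERMALINKS` and the helper `htaccess_lines` are hypothetical, and the sketch assumes paths contain no whitespace.

```python
# Hypothetical permalink map: old URL path -> page in the generated book.
# These entries are examples only, not the real spec.torproject.org table.
PERMALINKS = {
    "/tor-spec.txt": "/tor-spec/index.html",
    "/dir-spec.txt": "/dir-spec/index.html",
}


def htaccess_lines(permalinks, status=301):
    """Render one mod_alias `Redirect` directive per permalink.

    `Redirect` matches by path prefix, so an old path must not be a
    prefix of its own target, or the redirect would loop.
    """
    return [
        f"Redirect {status} {old} {new}"
        for old, new in sorted(permalinks.items())
    ]


if __name__ == "__main__":
    print("\n".join(htaccess_lines(PERMALINKS)))
```

Keeping the table as data means the same mapping could later feed a different redirect mechanism if we move away from Apache.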


## Stuff we shouldn't do.

I think we should continue to use the existing torspec.git repository
for the new material, and just move the old text specs into a new
archival location in torspec. (We *could* make a new repository
entirely, but I don't think that's the best idea.  In either case, we
shouldn't change the text specifications after the initial
conversion.)


We'll want to figure out our practices for keeping links working as we
reorganize these documents.  Mdbook has decent redirect support, but
it's up to us to actually create the redirects as necessary.
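
For page renames inside the book, mdbook reads redirects from an `[output.html.redirect]` table in `book.toml`. Here is a small sketch that renders such a table from a rename map; the file names in `RENAMES` and the helper `redirect_table` are hypothetical, and it assumes plain ASCII paths.

```python
import json

# Hypothetical rename: old page path (as served) -> new location,
# given relative to the old page.
RENAMES = {
    "/tor-spec/cells.html": "cell-format.html",
}


def redirect_table(renames):
    """Render an `[output.html.redirect]` table for book.toml."""
    lines = ["[output.html.redirect]"]
    for old, new in sorted(renames.items()):
        # For plain ASCII paths, a JSON string is also a valid TOML string.
        lines.append(f"{json.dumps(old)} = {json.dumps(new)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(redirect_table(RENAMES), end="")
```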



# The transition, in detail

* Before the transition:
    - Work on the script until it produces output we like.
    - Finalize this proposal and determine where we are hosting everything.
    - Develop the CI process as needed to keep the site up to date.
    - Get approval and comment from necessary stakeholders.
    - Write documentation as needed to support the new way of doing things.
    - Decide on the new layout we want for torspec.git.

* Staging the transition:
    - Make a branch to try out the transition; explicitly allow
      force-pushing that branch.  (Possibly nickm/torspec.git in a branch
      called mdbook-demo, or torspec.git in a branch called mdbook-demo
      assuming it is not protected.)
    - Make a temporary URL to target with the transition (possibly
      spec-demo.tpo).
    - Once we want to do the transition, shift the scripts to
      tpo/torspec.git:main and spec.tpo, possibly?

* The transition:
    - Move existing specs to a new subdirectory in torspec.git.
    - Run the script to produce an mdbook instance in torspec.git with
      the right layout.
    - Install the CI process to keep the site up to date.

* Post-transition:
    - Update links elsewhere.
    - Continue to improve the specs.

# Integrating proposals

We could make all of our proposals into a separate book, like Rust
does at https://rust-lang.github.io/rfcs/ .  We could also leave them
as they are for now.

(I don't currently think we should make all proposals part of the spec
automatically.)


# Timing

I think the right time to do this, if we decide to move ahead, is
before November.  That way we have this issue as something people can
work on during the docs hackathon.

# Alternatives

I've tried experimenting with Docusaurus here, which is even more
full-featured and generates pretty React sites
[like this](https://docusaurus.io/).
(We're likely to use it for managing the Arti documentation and
website.)

For the purposes we have here, it seems slightly overkill, but I do
think a migration is feasible down the road if we decide we _do_ want
to move to Docusaurus.  The important thing is the ability to keep our
URLs working, and I'm confident we could do that.

The main differences for our purposes here seem to be:

 * The markdown implementation in Docusaurus is extremely picky about
   stuff that looks like HTML but isn't; it rejects it, rather than
   passing it on as text.  Thus, using it would require a more
   painstaking conversion process before we could include text like
   `"<state:on>"` or `"A <-> B"` as our specs do in a few places.

 * Instead of organizing our documents in a `SUMMARY.md` with a Markdown
   outline format, we'd have to organize them in a `sidebars.js` with
   JavaScript syntax.

 * Docusaurus seems to be far more flexible and have a lot more
   features, but also seems trickier to configure.



<!-- References -->

[`mdbook`]: https://rust-lang.github.io/mdBook/
[source tree]: https://github.com/rust-lang/mdBook/tree/master/guide/src/
[Markdown]: https://en.wikipedia.org/wiki/Markdown
[`SUMMARY.md`]: https://rust-lang.github.io/mdBook/format/summary.html
[plugins]: https://github.com/rust-lang/mdBook/wiki/Third-party-plugins
[`mermaid`]: https://mermaid.js.org/
[architecture diagrams]:
https://gitlab.torproject.org/tpo/core/arti/-/blob/main/doc/dev/Architecture.md
[protocol diagrams]:
https://gitlab.torproject.org/tpo/core/arti/-/blob/main/doc/dev/hs-overview.md
[`torspec-converter`]: https://gitlab.torproject.org/nickm/torspec-converter
[configuration section]:
https://gitlab.torproject.org/nickm/torspec-converter/-/blob/main/grinder.py?ref_type=heads#L310

