[tor-dev] Proposal 285: Directory documents should be standardized as UTF-8

Alex Xu alex_y_xu at yahoo.ca
Wed Jan 10 01:36:22 UTC 2018


Quoting teor (2018-01-10 00:19:54)
> These are called "Unicode Scalar Values".
> https://www.unicode.org/glossary/#unicode_scalar_value
> 
> Let's reference that.

"Unicode Scalar Value" includes U+0000 (NUL), which I think we probably
want to exclude.
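
A minimal sketch of that distinction in Python, for illustration only. The
function names are hypothetical and do not come from the proposal or from
any Tor code; the ranges are the standard Unicode definition of a scalar
value (any code point except the surrogates U+D800..U+DFFF).

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value (hypothetical helper).

    Scalar values are all code points U+0000..U+10FFFF except the
    surrogate range U+D800..U+DFFF.
    """
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_allowed_scalar(cp: int) -> bool:
    """Scalar value with U+0000 additionally excluded (assumed rule)."""
    return cp != 0 and is_scalar_value(cp)
```

Under this sketch, U+0000 passes is_scalar_value() but fails
is_allowed_scalar(), which is the gap being pointed out.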

> >        * each encoded with the shortest possible encoding.
> >        * without any BOM
> > 
> > Are there other restrictions we should make?  If so, how should we phrase them?
> 
> These seem fine, and not tied to a particular unicode version.
> 
> But I don't know enough about Unicode to know if there is anything else we should
> specify.

Skimming through
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt, I think it
might be good to additionally forbid the code points listed at the end:
U+nFFF{E,F} for n = 0x0..0x10 (the last two code points of each plane),
and U+FDD0 through U+FDEF.
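
As a rough illustration of what such a check could look like, here is a
Python sketch. The names is_noncharacter() and find_noncharacters() are
hypothetical, and this is not how any Tor implementation does it. It
relies on Python's strict "utf-8" codec, which already rejects overlong
encodings and encoded surrogates.

```python
def is_noncharacter(cp: int) -> bool:
    """True if cp is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for each of the 17 planes (n = 0x0..0x10).
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(data: bytes) -> list[tuple[int, int]]:
    """Return (character index, code point) for each noncharacter found.

    Raises UnicodeDecodeError if data is not well-formed UTF-8.
    """
    text = data.decode("utf-8")  # strict error handling by default
    return [(i, ord(ch)) for i, ch in enumerate(text) if is_noncharacter(ord(ch))]
```

For example, find_noncharacters("ok\uffff".encode("utf-8")) reports the
U+FFFF at index 2, while plain ASCII input yields an empty list.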

