Re: [tor-dev] [prop-meeting] [prop#285] "Directory documents should be standardized as UTF-8"

14 Feb 2018

...
On 14 Feb 2018, at 11:03, Damian Johnson <atagar@torproject.org> wrote:
...
For the metrics tools there are some guidelines on this we can follow:
https://docs.oracle.com/javase/tutorial/i18n/text/design.html
The other language would be Python (for Stem), but Python developers have
probably got a good understanding of Unicode/str/bytes by now. (In Python 3,
when decoding as UTF-8, a BOM is not stripped and is interpreted as data,
and a str can contain a NUL.)
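
A minimal sketch of the Python 3 behaviour described above. The byte string is a made-up example, not taken from a real descriptor; "utf-8-sig" is the standard codec that does strip a leading BOM, shown only for contrast.

```python
# Hypothetical input: UTF-8 BOM, a contact-like line with a non-ASCII
# letter, an embedded NUL, and a trailing newline.
raw = b"\xef\xbb\xbfcontact Zo\xc3\xab\x00\n"

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
as_utf8 = raw.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM if one is present.
as_utf8_sig = raw.decode("utf-8-sig")

print(as_utf8.startswith("\ufeff"))      # True: BOM survives as data
print(as_utf8_sig.startswith("\ufeff"))  # False: BOM was stripped
print("\x00" in as_utf8)                 # True: a str may contain NUL
```

So a parser that wants neither a BOM nor NULs in its input has to reject or strip them itself after decoding.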

Hi Iain. Actually, for Stem I'm really looking forward to this too.
Stem has special handling for the contact and platform fields (iirc
the only spots where non-ASCII content can presently appear). Stem's
parsers and API will be simplified once everything is uniformly UTF-8. :P

Possibly a stupid question, but is there any reason not to require the
whole descriptor document to be printable characters?

Requiring printable ASCII throughout the document means that people
can't spell their names and email addresses correctly in contact lines.

Requiring printable Unicode introduces a dependency on a particular
Unicode version, because we don't know whether currently unallocated
blocks will be printable or not.
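
To illustrate the version dependency, here is a rough Python sketch. The helper name unprintable_chars and the sample strings are made up for this example; the answer for any given code point comes from whatever Unicode tables the running interpreter ships, which is exactly the problem.

```python
import unicodedata


def unprintable_chars(text):
    """Return the characters Python considers non-printable, ignoring newlines.

    str.isprintable() consults the interpreter's bundled Unicode database,
    so the result for unassigned code points can change between versions.
    """
    return [ch for ch in text if ch != "\n" and not ch.isprintable()]


# The Unicode version this interpreter was built with.
print(unicodedata.unidata_version)

# Non-ASCII letters in a contact line are printable.
print(unprintable_chars("contact Zo\u00eb <zoe@example.org>\n"))

# A NUL is not.
print(unprintable_chars("platform Tor\x00\n"))

# U+0378 is assumed to be unassigned in the tables at hand; a later
# Unicode version could allocate it and change both values printed here.
candidate = "\u0378"
print(unicodedata.category(candidate), candidate.isprintable())
```

Two implementations built against different Unicode versions could therefore disagree about whether the same document is valid.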

I think we could make platform lines printable ASCII without losing
much. Unless there are platforms that have non-ASCII names?

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n