[tor-dev] [prop-meeting] [prop#285] "Directory documents should be standardized as UTF-8"

teor teor2345 at gmail.com
Wed Feb 14 00:17:50 UTC 2018



> On 14 Feb 2018, at 11:03, Damian Johnson <atagar at torproject.org> wrote:
> 
>> For the metrics tools there are some guidelines on this we can follow:
>> https://docs.oracle.com/javase/tutorial/i18n/text/design.html. The other
>> language would be Python (for stem), but Python developers have probably
>> got a good understanding of unicode/str/bytes by now. (In Python 3: when
>> using UTF-8, BOM will not be stripped and will be interpreted as data,
>> and you can have a NUL in a str).
> 
> Hi Iain. Actually, for Stem I'm really looking forward to this too.
> Stem has special handling for the contact and platform fields (iirc
> the only spot non-ascii content can presently appear). Stem's parsers
> and API will be simplified once everything is uniformly utf-8. :P
> 
> Possibly a stupid question but any reason not to require the whole
> descriptor document to be printable characters?
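To illustrate the Python 3 behaviour Iain describes, here is a minimal sketch using only the standard library. The helper names `decode_keep_bom` and `decode_strip_bom` are hypothetical. It shows that the strict `utf-8` codec keeps a leading BOM as U+FEFF, that `utf-8-sig` drops it, and that NUL is a legal character in a decoded `str`.

```python
# Sketch (assumption: Python 3 stdlib only; helper names are hypothetical).
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def decode_keep_bom(raw: bytes) -> str:
    """Decode strictly as UTF-8; a leading BOM survives as U+FEFF."""
    return raw.decode("utf-8")


def decode_strip_bom(raw: bytes) -> str:
    """Decode as UTF-8, dropping a single leading BOM if one is present."""
    return raw.decode("utf-8-sig")


if __name__ == "__main__":
    line = BOM + b"contact Jos\xc3\xa9 <jose@example.com>\x00"
    kept = decode_keep_bom(line)
    stripped = decode_strip_bom(line)
    print(repr(kept[0]))                   # '\ufeff'
    print(stripped.startswith("contact"))  # True
    print("\x00" in stripped)              # True
```

Invalid byte sequences (for example `b"\xff"`) raise `UnicodeDecodeError` with either codec. A parser therefore has to choose explicitly whether a BOM is data or is stripped.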

Requiring printable ASCII throughout the document means that people
can't spell their names and email addresses correctly in contact lines.

Requiring printable Unicode introduces a dependency on a particular
Unicode version, because we don't know whether currently unallocated
code points will be printable once they are assigned.
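As a rough illustration in Python, a "printable Unicode" rule ends up being evaluated against whatever Unicode database the parser happens to ship with. The helper `printable_report` below is hypothetical. It relies only on the standard `unicodedata` module and `str.isprintable()`, both of which follow `unicodedata.unidata_version`.

```python
import unicodedata


def printable_report(text: str) -> dict:
    """Summarise how this interpreter's Unicode database classifies text.

    Code points with category "Cn" (unassigned) are treated as
    non-printable today, but a later Unicode version may assign some of
    them printable characters, so the verdict is version-dependent.
    """
    unassigned = [
        f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cn"
    ]
    return {
        "unicode_version": unicodedata.unidata_version,
        "printable": text.isprintable(),
        "unassigned": unassigned,
    }


if __name__ == "__main__":
    print(printable_report("contact José"))
    print(printable_report("\uffff"))
```

U+FFFF is a permanent noncharacter, so it stays "Cn" in every version. A code point that is merely unallocated today could flip to printable after an upgrade, which is the version dependency described above.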

I think we could make platform lines printable ASCII without losing
much, unless there are platforms that have non-ASCII names?
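A minimal sketch of that split, assuming a hypothetical per-line policy: every descriptor line must decode as UTF-8, and `platform` lines must additionally be printable ASCII (0x20 to 0x7E). The function names are illustrative and not part of stem or tor.

```python
def is_printable_ascii(value: str) -> bool:
    """True if every character is in the printable ASCII range 0x20-0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def check_descriptor_line(raw_line: bytes) -> str:
    """Hypothetical check: UTF-8 everywhere, printable ASCII for 'platform'."""
    text = raw_line.decode("utf-8")  # raises UnicodeDecodeError if invalid
    keyword = text.split(" ", 1)[0]
    if keyword == "platform" and not is_printable_ascii(text):
        raise ValueError(f"non-ASCII platform line: {text!r}")
    return text


if __name__ == "__main__":
    print(check_descriptor_line(b"platform Tor 0.3.2.9 on Linux"))
    print(check_descriptor_line(b"contact Jos\xc3\xa9 <jose@example.com>"))
```

Contact lines keep full UTF-8, so names and addresses can be spelled correctly, while platform lines stay trivially parseable.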

T

--
Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
------------------------------------------------------------------------