On 13 Feb 2018, at 21:55, Iain Learmonth <irl@torproject.org> wrote:
Hi,
On 12/02/18 23:55, isis agora lovecruft wrote:
- What passes for "canonicalised" "utf-8" in C will be different to what passes for "canonicalised" "utf-8" in Rust. In C, the following will not be allowed (whereas they are allowed in Rust):
  - NUL (0x00)
  - Byte Order Mark (0xFEFF)
Much of the metrics software is written in Java. Java strings allow for NUL to appear, but assume that there is no BOM. If a BOM appears, then this would be interpreted as data and, I assume, parsing would probably fail. Should the whole document be rejected if it contains a NUL or BOM, or should these values be stripped, with parsing then carrying on as if nothing had happened?
Directory authorities and bridge clients already reject descriptors that contain NUL. (This is an artefact of the C implementation: the descriptor is seen as truncated, so it won't parse.)
We should specify rejection for BOM as well.
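As a rough illustration of what "reject" could mean for a parser written in Python, here is a minimal sketch. The names `decode_strict` and `DocumentRejected` are hypothetical, and it assumes that a BOM anywhere in the document (not only at the start) is grounds for rejection, which the proposal would still need to pin down.

```python
BOM = "\ufeff"  # U+FEFF, encoded in UTF-8 as the bytes EF BB BF


class DocumentRejected(ValueError):
    """Hypothetical error type: the whole document is refused."""


def decode_strict(raw: bytes) -> str:
    # Python's "utf-8" codec is strict by default: malformed input raises.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        raise DocumentRejected("invalid UTF-8") from err
    # The "utf-8" codec keeps NUL and BOM as ordinary characters,
    # so both have to be checked for explicitly.
    if "\x00" in text:
        raise DocumentRejected("document contains NUL")
    if BOM in text:
        raise DocumentRejected("document contains a byte order mark")
    return text
```

Nothing is stripped here: the caller either gets the decoded text or an exception, which matches what the C implementation effectively does for NUL today.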
- Directory document keywords MUST be printable ASCII.
This can be validated. Should a single document keyword containing printable non-ASCII be enough to reject the document, or should a parser try to recover?
If parsers want to be consistent with the Tor implementation, they should reject.
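A sketch of that rejection in Python might look like the following. It assumes that "printable ASCII" means the visible range 0x21 to 0x7E and that the keyword is everything up to the first space on a line; both are assumptions for illustration, and the function names are hypothetical.

```python
def is_printable_ascii(keyword: str) -> bool:
    # Non-empty, and every character is in the visible ASCII range.
    return bool(keyword) and all(0x21 <= ord(c) <= 0x7E for c in keyword)


def check_keyword_line(line: str) -> str:
    # Assumption: the keyword runs up to the first space on the line.
    keyword = line.split(" ", 1)[0]
    if not is_printable_ascii(keyword):
        # No recovery attempt: one bad keyword rejects the document.
        raise ValueError(f"rejecting document: bad keyword {keyword!r}")
    return keyword
```

A caller would run this over every keyword line and let the exception propagate, rather than skipping the offending line.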
I'd really like to see a section in the proposal about how parsers should react when they find something unexpected, otherwise all the parsers may end up doing different things.
+1
- This change may break some descriptor/consensus/document parsers. If you are the maintainer of a parser, you may want to start thinking about this now.
For the metrics tools there are some guidelines on this we can follow: https://docs.oracle.com/javase/tutorial/i18n/text/design.html. The other language would be Python (for stem), but Python developers have probably got a good understanding of unicode/str/bytes by now. (In Python 3: when using UTF-8, BOM will not be stripped and will be interpreted as data, and you can have a NUL in a str).
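The Python 3 behaviour described above can be checked with a few lines; the sample bytes are made up for illustration. Note that the separate `utf-8-sig` codec does strip a leading BOM, so a parser that wants to reject BOMs should not decode with it.

```python
# Hypothetical input: a UTF-8 BOM, some text, and a trailing NUL.
raw = b"\xef\xbb\xbfrouter\x00"

# The plain "utf-8" codec keeps the BOM as U+FEFF and accepts NUL.
text = raw.decode("utf-8")
bom_kept = text.startswith("\ufeff")
nul_kept = "\x00" in text

# "utf-8-sig" silently drops a leading BOM, hiding it from later checks.
stripped = raw.decode("utf-8-sig")
```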
Python for txtorcon
Rust for Tor's experimental protover implementation
And perhaps others:
https://stem.torproject.org/faq.html#are-there-any-other-controller-librarie...
https://trac.torproject.org/projects/tor/wiki/doc/ListOfTorImplementations
T