[tor-bugs] #11821 [metrics-lib]: DescriptorReader gets confused by files containing multiple descriptors if they contain non-ASCII characters

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu May 8 15:11:39 UTC 2014


#11821: DescriptorReader gets confused by files containing multiple descriptors if
they contain non-ASCII characters
-------------------------+-------------------------
 Reporter:  karsten      |          Owner:  karsten
     Type:  defect       |         Status:  new
 Priority:  normal       |      Milestone:
Component:  metrics-lib  |        Version:
 Keywords:               |  Actual Points:
Parent ID:               |         Points:
-------------------------+-------------------------
 When DescriptorReader parses a file containing multiple server
 descriptors, it first splits the file content into parts starting with
 `"router "` and then parses each part.  However, it seems we have an
 encoding problem there:

 If a server descriptor contains non-ASCII characters, for example in its
 platform or contact line, we don't cut at the right position, so the next
 descriptor seems to start with `"\nrouter "`.  Empty lines are not
 allowed in descriptors, so we reject that descriptor.
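
 One plausible way to end up with a leading `"\n"` is mixing character
 indexes with byte offsets.  The sketch below is only an illustration under
 that assumption, not the actual metrics-lib code: the class and method
 names are made up, and the input is assumed to be UTF-8.  It finds the
 second `"router "` in the decoded String and then uses that character
 index to cut the raw bytes.

```java
import java.nio.charset.StandardCharsets;

/* Hypothetical demo class, not part of metrics-lib. */
public class OffsetMismatchDemo {

  /* Returns { character index, UTF-8 byte offset } of the second
   * "router " line.  Assumes the input contains such a line. */
  public static int[] secondRouterOffsets(String twoDescriptors) {
    int charIndex = twoDescriptors.indexOf("\nrouter ", 1) + 1;
    int byteOffset = twoDescriptors.substring(0, charIndex)
        .getBytes(StandardCharsets.UTF_8).length;
    return new int[] { charIndex, byteOffset };
  }

  /* Cuts the UTF-8 bytes at the *character* index, which is the
   * mistake being illustrated, and decodes the remainder. */
  public static String cutBytesAtCharIndex(String twoDescriptors) {
    byte[] raw = twoDescriptors.getBytes(StandardCharsets.UTF_8);
    int charIndex = secondRouterOffsets(twoDescriptors)[0];
    return new String(raw, charIndex, raw.length - charIndex,
        StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    /* Made-up, heavily shortened descriptors; the contact line
     * contains one non-ASCII character (u with diaeresis). */
    String input = "router a 1.2.3.4 9001 0 0\n"
        + "contact J\u00fcrgen\n"
        + "router b 5.6.7.8 9001 0 0\n";
    int[] offsets = secondRouterOffsets(input);
    System.out.println("char index:  " + offsets[0]);
    System.out.println("byte offset: " + offsets[1]);
    System.out.println("starts with newline: "
        + cutBytesAtCharIndex(input).startsWith("\nrouter "));
  }
}
```

 With one two-byte character in the first descriptor, the byte offset is
 one larger than the character index, so cutting the bytes at the
 character index leaves the preceding newline at the start of the second
 part.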

 What's funny is that this problem only happens when run on the console,
 not when run in Eclipse.  It's probably related to different locale
 settings, which would change the default charset.

 Not sure what the right fix is.  Maybe we should split the input while
 it's still contained in a byte[], before converting it to a String.
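
 A minimal sketch of that idea, assuming descriptors start with `"router "`
 at the beginning of a line.  `ByteLevelSplitter` is a hypothetical helper,
 not an existing metrics-lib class.  Since `"router "` is pure ASCII, it
 can be matched on the raw bytes without knowing the file's charset, so
 the locale no longer affects where we cut.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/* Hypothetical helper, not part of metrics-lib. */
public class ByteLevelSplitter {

  private static final byte[] TOKEN =
      "router ".getBytes(StandardCharsets.US_ASCII);

  /* Splits raw file content into chunks that each start with
   * "router " at the beginning of a line.  Bytes before the first
   * such line are dropped in this simplified sketch. */
  public static List<byte[]> split(byte[] raw) {
    List<Integer> starts = new ArrayList<>();
    for (int i = 0; i + TOKEN.length <= raw.length; i++) {
      boolean atLineStart = i == 0 || raw[i - 1] == '\n';
      if (atLineStart && matchesAt(raw, i)) {
        starts.add(i);
      }
    }
    List<byte[]> chunks = new ArrayList<>();
    for (int k = 0; k < starts.size(); k++) {
      int from = starts.get(k);
      int to = k + 1 < starts.size() ? starts.get(k + 1) : raw.length;
      chunks.add(Arrays.copyOfRange(raw, from, to));
    }
    return chunks;
  }

  /* Returns true if TOKEN occurs in raw at the given offset. */
  private static boolean matchesAt(byte[] raw, int offset) {
    for (int j = 0; j < TOKEN.length; j++) {
      if (raw[offset + j] != TOKEN[j]) {
        return false;
      }
    }
    return true;
  }
}
```

 Each chunk could then be handed to the parser as a byte[], and decoding
 to a String would happen per descriptor rather than before splitting.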

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11821>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

