[tor-bugs] #24384 [Metrics/Onionoo]: Decode percent-encoded characters in qualified search terms

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Nov 27 21:49:19 UTC 2017


#24384: Decode percent-encoded characters in qualified search terms
-----------------------------+--------------------------------
 Reporter:  karsten          |          Owner:  metrics-team
     Type:  defect           |         Status:  needs_revision
 Priority:  High             |      Milestone:  Onionoo-1.8.0
Component:  Metrics/Onionoo  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:                   |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+--------------------------------
Changes (by karsten):

 * status:  merge_ready => needs_revision


Comment:

 Okay, I looked deeper into this issue. I'm writing down here what I found,
 though I admit that some facts are only based on my memory, as I have a
 few dozen browser tabs open here and have totally lost control over my
 ~~mind~~desktop. Anyway, here's what I found, to be treated with care:
  - Relay Search encodes the space character as `%20`. So far so good, we
 handled that before, and we handle that with the patch.
  - Relay Search encodes `"` as `%22`, `'` as `%27`, `<` as `%3C`, and `>`
 as `%3E`. We'll have to decode these in Onionoo, which we don't do yet.
  - Relay Search does not encode `#` as `%23` but instead cuts off anything
 starting at that character. This means we cannot search for `#` in contact
 lines or other, future qualified search terms. I believe it should
 percent-encode the `#` character, but I could be wrong.
  - Relay Search does not encode `%` as `%25`. This leads to an
 `IllegalArgumentException` in the newly patched (and not yet merged)
 Onionoo code. I'm not entirely certain, but I'd think that this is illegal
 on the Relay Search side. And we'll have to catch this in Onionoo and send
 back a 400 status code.
  - Relay Search does not encode `&` as `%26`. I think it'll have to do
 this, because we're looking for the `&` character as delimiter to the next
 parameter. I think the current Onionoo patch doesn't handle this very
 well, by first decoding the entire query string and then splitting it up
 at `&` characters which may have been `%26`'s that we decoded ourselves.
  - Relay Search does not encode `+` as `%2B`. And to be clear, this is not
 strictly required by the Onionoo specification. On the Onionoo side we
 should treat the unencoded `+` the same regardless of whether it appears
 in, say, `search=contact:bs+tor` or `contact=bs+tor`. But we don't:
    - In the first case, `search=contact:bs+tor`, we treat the `+` as `+`
 and not as space character and only return relays with `bs+tor` in the
 contact line. This seems reasonable to me, even though we specify in the
 "contact" parameter that `+` must be encoded as `%2B`. But this is the
 "search" parameter.
    - In the second case, `contact=bs+tor`, Jetty converts the `+` to a
 space character, so that we return all relays with `bs` ''and'' with `tor`
 in the contact line, which clearly outnumber those with `bs+tor` in the
 contact line. I think we should fix this second case by doing the
 conversion of all parameter values ourselves and not leaving that to
 Jetty. If we do, we can remove the requirement to encode `+` in the
 "contact" parameter. But let's focus on the Relay Search side first.
  - Relay Search does not encode `/` as `%2F`. Random example: The search
 for `AJQ2YvMOznqi0SYrNk/AQx94+Aw` does not even trigger a query to
 Onionoo. Only if I truncate at the `/` by searching for
 `AJQ2YvMOznqi0SYrNk`, I get a result. This is a bug in Relay Search that
 we need to fix.
  - Our unit tests in `ResponseServletTest` treat the `+` character
 different than Jetty by not decoding it to a space character.
  - Last but not least, when we attempted to support space characters in
 the qualified search term, like `search=contact:"a b c"`, we overlooked
 that this does not match only relays with `"a b c"` as contact line part
 but all contact lines containing `"a"`, `"b"`, and `"c"` ''somewhere'' in
 the contact line. Potentially useful, but not the use case we had in mind.
 Hilarious!

 How about we take this out of 1.8.0 and resolve all related issues with
 more time on our hands? And I don't mean to ignore this ticket until a few
 days before the next planned release. But let's take the time we need to
 fix everything that needs fixing, and let's include it in the next release
 when it's ready. Sound okay?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24384#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list