[tor-bugs] #28868 [Core Tor/sbws]: best_priority() can starve the worker threads of good relays

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Dec 17 07:49:13 UTC 2018


#28868: best_priority() can starve the worker threads of good relays
---------------------------+-----------------------------------
 Reporter:  teor           |          Owner:  (none)
     Type:  defect         |         Status:  new
 Priority:  Medium         |      Milestone:  sbws: 1.0.x-final
Component:  Core Tor/sbws  |        Version:  sbws: 1.0.2
 Severity:  Normal         |     Resolution:
 Keywords:                 |  Actual Points:
Parent ID:  #28663         |         Points:
 Reviewer:                 |        Sponsor:
---------------------------+-----------------------------------

Comment (by juga):

 Replying to [ticket:28868 teor]:
 > `best_priority()` tries to measure unmeasured and failing relays first.
 >
 > But if `fraction_relays` or `min_relays` always fail, those relays will
 always end up first in the priority queue. (More precisely, those relays
 will end up first in the priority queue, until the results of the good
 relays ~~time out~~ are discarded for being too old.)
 >
 > Thinking about starvation is complicated, because of the
 `freshness_reduction_factor` on some errors.
 >
 > Here's a very simple algorithm that avoids starving good relays for
 failed relays:
 > 1. Count the number of times that sbws has attempted to get a result
 from each relay.

 This is already done when writing the results: ResultError and
 ResultSuccess keep that.

 > 2. Test the relays with the lowest number of attempts first. (Don't
 check if the attempt succeeded or failed.)

 This's what i was proposing by commenting the part where it prioritizes
 ResultError measurements.
 >
 > For this priority rule to work, every time a relay is queued, it must
 get a result. Here's how we can make that happen"
 > * Modify `result_putter_error()` to store an error result to the queue.

 result_putter already writes ResultError.

 Here there're two other bugs, result_putter_error, only happens when:
 1. The relay being measured, doesn't have a descriptor (#28870)
 2. The operator hits KeyboardInterrupt (#28869)

 AFAIK, there're no other cases where the error callback is called.
 > * Make sure timeouts store an error result to the queue.
 > * Add a unit test and integration test that makes sure every queued
 relay has a result.

 Testing this is hard, but i'll see.

 > Here's an alternative that might be simpler to implement:
 > * before a relay is queued using `pool.apply_async()` in
 `run_speedtest()`, store a `ResultAttempt` to the queue
 > * only count `ResultAttempt`s when prioritising relays

 I don't see this easier. I'll evaluate after other changes has been made
 in #28663

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28868#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list