[tor-bugs] #28663 [Core Tor/sbws]: sbws stops accumulating, silently

Thu Dec 13 14:02:02 UTC 2018

#28663: sbws stops accumulating, silently
---------------------------+-----------------------------------
 Reporter:  stefani        |          Owner:  (none)
     Type:  defect         |         Status:  new
 Priority:  Medium         |      Milestone:  sbws: 1.0.x-final
Component:  Core Tor/sbws  |        Version:  sbws: 1.0.2
 Severity:  Major          |     Resolution:
 Keywords:                 |  Actual Points:
Parent ID:  #28639         |         Points:
 Reviewer:                 |        Sponsor:
---------------------------+-----------------------------------

Comment (by juga):

 Replying to [comment:10 teor]:

 > * having sbws not stall when this fix is applied:

 I can confirm that enabling predicted circuits does not solve the problem.
 (https://trac.torproject.org/projects/tor/ticket/28639#comment:20)
 I can also confirm that commenting out the lines
 (https://trac.torproject.org/projects/tor/ticket/28663#comment:9) (either
 both or just one of them) solves the problem.
 However, it seems to me that there's an underlying problem with threads
 and locks, and it may be just a coincidence that sbws doesn't stall when
 those lines are commented out.

 Trying to figure out what's happening, I noticed:
 - relaylist: `self._refresh_lock = Lock()`. While it's instantiated, it's
 never acquired or released. It's run every 5 min.
 - resultdump: it locks the directory before writing the result, but it
 doesn't lock the Result itself.
 - scanner, `measure_relay()`:
   - first calls `Destination.next()`, which instantiates an `RLock`,
 acquires it, then calls `Destination.is_usable`, then calls
 `connect_destination_over_circuit()`, which instantiates an `RLock`, then
 releases the first `RLock`.
   - a few lines later, it calls `Destination.is_usable` again.
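
 A minimal sketch of why a lock created inside a call would matter, assuming
 the `RLock` really is instantiated on every call as it appears above: each
 thread then gets its own lock, so nothing is serialized. The helper
 `both_inside()` is hypothetical and not part of sbws; it only checks
 whether two threads can hold "the" lock at the same time.

```python
import threading


def both_inside(make_lock):
    """Return True if two threads were inside the locked section at once."""
    barrier = threading.Barrier(2)
    met = []

    def worker():
        lock = make_lock()  # how the lock is obtained is what we compare
        with lock:
            try:
                # Only succeeds if the other thread is also inside `with`.
                barrier.wait(timeout=0.5)
                met.append(True)
            except threading.BrokenBarrierError:
                met.append(False)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(met)


shared_lock = threading.RLock()

# A new RLock per call: both threads enter together, no mutual exclusion.
per_call_overlaps = both_inside(threading.RLock)
# One RLock shared by all callers: the second thread has to wait.
shared_overlaps = both_inside(lambda: shared_lock)

print(per_call_overlaps, shared_overlaps)
```

 If this matches what the code does, the lock would need to be created once
 (for example in `__init__`) and shared by every thread that touches the
 same data.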

 So, things I could do to fix this:
 - lock in the prioritization loop, and maybe in resultdump
 - ensure that locks are acquired and released
 - fix the double locking in the usability tests, but here there's an
 additional problem.
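
 One way to make sure a lock is always acquired and released is the `with`
 statement, which releases it even when the body raises. The sketch below
 is hypothetical: `RelayListSketch`, `_fetch()` and the relay names are
 made up for illustration and are not the sbws classes; only the
 `_refresh_lock` name and the 5 min interval come from the notes above.

```python
import threading
import time


class RelayListSketch:
    """Hypothetical stand-in for a relay list refreshed every 5 min."""

    def __init__(self, refresh_interval=300):
        self._refresh_lock = threading.Lock()  # created once, shared
        self._interval = refresh_interval
        self._last_refresh = None
        self._relays = []

    def _fetch(self):
        # Placeholder for whatever really reads the relays.
        return ["relayA", "relayB"]

    @property
    def relays(self):
        # `with` acquires the lock and always releases it on exit.
        with self._refresh_lock:
            now = time.monotonic()
            if (self._last_refresh is None
                    or now - self._last_refresh >= self._interval):
                self._relays = self._fetch()
                self._last_refresh = now
            return list(self._relays)  # copy, so callers can't mutate ours


class FailingRelayList(RelayListSketch):
    def _fetch(self):
        raise RuntimeError("consensus not available")


relay_list = RelayListSketch()
relays = relay_list.relays

failing = FailingRelayList()
try:
    failing.relays
except RuntimeError:
    pass
# The lock is free again even though the refresh raised.
released_after_error = not failing._refresh_lock.locked()
```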

 I've observed that sbws sometimes fails the usability test several times
 in a row (while the destination is actually alive), which blocks sbws for
 5 min.
 Some solutions to this:
 - don't perform the usability test; instead, count how many relays fail to
 be measured in a row and, after some number, warn that maybe it's the
 destination that is down. I prefer to try this; I think it would remove a
 lot of complexity in the code.
 - perform the usability test not through Tor
 - if the usability test fails, the time to perform a new one could be
 different (bigger) than the one it sleeps (smaller).
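
 The first option could look roughly like the sketch below. Everything in
 it is an assumption: the class name `ConsecutiveFailureCounter`, the
 `warn_after` threshold and the log message are invented for illustration,
 and the real scanner would call `record()` wherever a measurement ends.

```python
import logging
import threading

log = logging.getLogger(__name__)


class ConsecutiveFailureCounter:
    """Count measurement failures in a row, shared between threads."""

    def __init__(self, warn_after=10):
        self._lock = threading.Lock()
        self._warn_after = warn_after
        self._failures = 0

    def record(self, success):
        """Record one measurement.

        Return True only when the threshold has just been reached.
        """
        with self._lock:
            if success:
                self._failures = 0  # any success resets the streak
                return False
            self._failures += 1
            reached = self._failures == self._warn_after
        if reached:
            log.warning(
                "%d relays in a row could not be measured; "
                "maybe the destination is down.", self._warn_after)
        return reached


counter = ConsecutiveFailureCounter(warn_after=3)
outcomes = [False, False, True, False, False, False, False]
warned = [counter.record(ok) for ok in outcomes]
```

 With this, no separate usability test is needed: a streak of failed
 measurements is itself the hint that the destination may be down.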

 Before changing any code, and because there might be additional complexity
 I haven't spotted yet, I'm going to look at all the calls that a thread
 passes through and which ones read/write the same memory.
 Hopefully I'll get to understand why commenting out those lines keeps sbws
 from stalling.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28663#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online