[tor-bugs] #13209 [Metrics/Analysis]: Write a hidden service hsdir health measurer

Tue Dec 20 16:23:49 UTC 2016

#13209: Write a hidden service hsdir health measurer
------------------------------------------------+--------------------------
 Reporter:  arma                                |          Owner:
     Type:  project                             |         Status:  closed
 Priority:  High                                |      Milestone:
Component:  Metrics/Analysis                    |        Version:  Tor:
                                                |  0.2.7
 Severity:  Normal                              |     Resolution:  fixed
 Keywords:  SponsorR, tor-hs, 027-triaged-1-in  |  Actual Points:
Parent ID:                                      |         Points:
                                                |  medium/large
 Reviewer:                                      |        Sponsor:
------------------------------------------------+--------------------------
Changes (by dgoulet):

 * status:  assigned => closed
 * resolution:   => fixed
 * severity:   => Normal

Comment:

 OK, I will close this ticket, but first here are some conclusions and
 possible future work. I'm attaching to this ticket the raw results
 collected from May 29th, 2015 to June 14th, 2016. You can find the CSV
 file specification at
 https://gitlab.com/hs-health/hs-health/blob/master/analyze-csv.py#L93

 This experiment showed us a few things. With a client always using the
 latest consensus, here are the results for the 6 stable .onion services
 we've monitored (output from analyze-csv.py).

 {{{
 Log health.csv period is from 29 May 2015 16:36:03 to 15 Jun 2016 00:08:15
 (9175 hours)
 --> 2.721% failed fetch (3958/145435).
     On average once we fail to fetch once on a specific HSDir, the
 descriptor was missing for 01:14:31 (4471 seconds).

 [+] wlupld3ptjvsgwqw.onion
     3.35% of failed fetch (913/27270) for an average time of 01:29:09
 minutes (5349 seconds)
     After first fail on an HSDir, we have 7.55 failed attempt(s) before
 success
     Churn happened 1.319% of the time (121 times)

 [+] 3g2upl4pq6kufc4m.onion
     1.80% of failed fetch (524/29099) for an average time of 00:50:19
 minutes (3019 seconds)
     After first fail on an HSDir, we have 3.94 failed attempt(s) before
 success
     Churn happened 1.450% of the time (133 times)

 [+] agorahooawayyfoe.onion
     5.07% of failed fetch (596/11744) for an average time of 01:21:02
 minutes (4862 seconds)
     After first fail on an HSDir, we have 6.77 failed attempt(s) before
 success
     Churn happened 0.959% of the time (88 times)

 [+] 4cjw6cwpeaeppfqz.onion
     3.11% of failed fetch (886/28495) for an average time of 01:28:32
 minutes (5312 seconds)
     After first fail on an HSDir, we have 7.38 failed attempt(s) before
 success
     Churn happened 1.308% of the time (120 times)

 [+] zti6p7h6spbtx5xr.onion
     3.05% of failed fetch (497/16289) for an average time of 01:18:47
 minutes (4727 seconds)
     After first fail on an HSDir, we have 6.54 failed attempt(s) before
 success
     Churn happened 0.828% of the time (76 times)

 [+] facebookcorewwwi.onion
     1.90% of failed fetch (542/28580) for an average time of 01:04:11
 minutes (3851 seconds)
     After first fail on an HSDir, we have 4.93 failed attempt(s) before
 success
     Churn happened 1.199% of the time (110 times)
 }}}

 As we can see, it's pretty stable. The churn rate is very low and
 _always_ affects only a single HSDir out of the set of 6 (see the .csv
 results; this is not printed in the output). On average, a client with
 the latest consensus will fail to fetch the descriptor on one HSDir out
 of the six ~2.72% of the time.
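
 As a sanity check, the percentages above can be recomputed from the raw
 counts. This is a minimal Python sketch, not part of analyze-csv.py: the
 counts are copied from the output above, and the names REPORTED, OVERALL,
 failure_rate and hms are made up for illustration.

```python
# (failed fetches, total fetches) copied from the analyze-csv.py output above.
REPORTED = {
    "wlupld3ptjvsgwqw.onion": (913, 27270),
    "3g2upl4pq6kufc4m.onion": (524, 29099),
    "agorahooawayyfoe.onion": (596, 11744),
    "4cjw6cwpeaeppfqz.onion": (886, 28495),
    "zti6p7h6spbtx5xr.onion": (497, 16289),
    "facebookcorewwwi.onion": (542, 28580),
}
# Overall counts from the first line of the summary.
OVERALL = (3958, 145435)


def failure_rate(failed, attempts):
    """Return the failed-fetch percentage."""
    return 100.0 * failed / attempts


def hms(seconds):
    """Format a duration given in seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


if __name__ == "__main__":
    for onion, (failed, attempts) in REPORTED.items():
        print("%s %.2f%%" % (onion, failure_rate(failed, attempts)))
    print("overall %.3f%%" % failure_rate(*OVERALL))
    print("average outage %s" % hms(4471))
```

 The overall rate comes out at 2.721% and 4471 seconds formats as
 01:14:31, both matching the summary line of the output.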

 The number of fetches varies because unfortunately the tool is not
 entirely "stable": it sometimes crashed, so for some periods we went
 without fetching some .onion while others were still running (python
 threading is ... something....).

 === Conclusion ===

 1. This experiment is not ideal as it '''only''' considers the latest
 consensus on the client side, which does not reflect reality. An improved
 version of this tool would run 12 clients, each with a consensus from a
 different hour, spanning 12 hours. Each client would then try to fetch
 the descriptor and note down churn and failures.

 2. One key aspect of this tool is that once a fetch failed, it went into
 "recover mode", that is, retrying every 15 minutes until the descriptor
 could be fetched again. This gives us the interesting statistics of how
 many failed attempts happen before success and how much time is spent
 waiting until success. This gets a bit more complicated with clients on
 different consensuses because they need to update their consensus at some
 point, and deciding which consensus to update to (latest, or 2 hours in
 the past, or ...) might affect the results and also creates LOTS of cases
 to test.

 3. A simpler, but I think better, version of this tool would, instead of
 taking the latest consensus all the time, simply use the normal tor
 client behavior and monitor the .onion with it. However, the HS
 client-side behavior has changed across tor stable versions and might
 change again, so this should be done for each maintained tor version,
 which would also show us any regression or performance improvement
 between them.

 4. Network load considerations. It is all well and good, but if we
 decide to improve this tool (or write a new one), we should consider how
 much load it puts on the network. HS fetches aren't that heavy, but if
 you multiply this by 12 clients times 6 HSDirs and then run it every X
 minutes, let's not forget what it can do to the Guard in front.
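
 To put rough numbers on the load concern in item 4, here is a
 back-of-the-envelope estimate in Python. It assumes every client fetches
 from every responsible HSDir of every monitored .onion once per interval;
 the function name and the interval values are illustrative only, since X
 is not fixed anywhere in this ticket.

```python
def fetches_per_hour(clients, hsdirs_per_onion, onions, interval_minutes):
    """Estimate descriptor fetches per hour for a measurement setup.

    Assumption: each client fetches from each HSDir of each .onion exactly
    once per interval. Recover-mode retries are not counted.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    rounds_per_hour = 60.0 / interval_minutes
    return clients * hsdirs_per_onion * onions * rounds_per_hour


if __name__ == "__main__":
    # 12 clients, 6 HSDirs per .onion, 6 monitored .onion services.
    for minutes in (60, 15):
        print(minutes, fetches_per_hour(12, 6, 6, minutes))
```

 With these assumptions, an hourly round is 432 fetches per hour and a
 15-minute round is 1728, before counting any retries or the cost of
 building circuits through the Guard.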

 So all things considered, there is much room for improvement with this
 tool, and the results could be useful to have on our metrics website, but
 we need to make it a bit smarter and at the very least change it as
 described in 3.
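
 For anyone picking this up, the "recover mode" from item 2 can be
 sketched roughly as below. This is not the code from the hs-health
 repository: fetch is a hypothetical callable that returns True when the
 descriptor was retrieved, and counting the initial failure as the first
 failed attempt is an assumption.

```python
import time

RETRY_INTERVAL = 15 * 60  # seconds; the 15-minute retry period from item 2


def recover(fetch, sleep=time.sleep, clock=time.monotonic,
            retry_interval=RETRY_INTERVAL):
    """Retry fetch() after a first failure until it succeeds.

    fetch is a hypothetical callable returning True on success.
    Returns (failed_attempts, seconds_waited). failed_attempts includes
    the initial failure that triggered recover mode (an assumption).
    """
    started = clock()
    failed_attempts = 1  # the failure that put us into recover mode
    while True:
        sleep(retry_interval)
        if fetch():
            return failed_attempts, clock() - started
        failed_attempts += 1
```

 The sleep and clock parameters exist only so the loop can be exercised
 with a fake clock instead of waiting for real 15-minute intervals.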

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13209#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online