[tor-dev] Understanding the HS subsystem
desnacked at riseup.net
Wed Feb 4 16:39:10 UTC 2015
[Declassifying this discussion and posting on [tor-dev]]
David Goulet <dgoulet at ev0ke.net> writes:
> Hello HS elves!
> I wrote a document to organize my thoughts and also list what we have in
> the bug tracker right now about HS behaviours that we want to
> It's a bit long but you can pass the first section describing the
> tickets and go right into the How and The Work to be done.
> Nick, you will see there is a SponsorS component but I didn't go into
> hard details there. We all know we need a testing network but for now
> I'm more focused on making sure we can collect the right data (for HS).
> Very important part I would like feedback on is the "HS health service"
> for which I would like us all to agree on its usefulness and on the way to
> do it properly.
> This document describes the methodology and technical details of a hidden
> service measurement framework/tool/<insert what this is>.
> NOTE: This is NOT intended to be run in the real Tor network. ONLY testing.
> Why and What
> The goal is to answer some questions we have regarding HS behaviours. Most
> of them have a ticket assigned to them but need an experiment and/or added
> feature(s) so we can measure what we need.
> - Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?
> In order to address this, it seems we need a way to measure all the
> interactions with the cache of an HSDir and a client. We need to assess
> the rend cache cleanup timing values which will also help with the upload
> and refetch timings.
> - What's the average number of hsdir fetches before we get the hsdesc?
> Using the control port for that is trivial but this needs a testing
> network to be set up with actual load on it.
> It could also be set up as a feature of an "HS health measurement tool"
> with a client fetching over and over the same .onion address randomly over
> - Write a hidden service hsdir health measurer
> This is a useful one, being able to correlate relay churn and HS desc.
> fetch. This one needs more brainstorming on how we could set up some sort
> of client or service that reports/logs the results of crunching the
> consensus for HSDir for a specific .onion address that we know and
> - Refactor rend_client_refetch_v2_renddesc()
> Ensure correctness of this very important function that does fetches for the
> client. This is where the HSDirs (with replicas) are looped over so the
> descriptor can be fetched.
> - Maybe we want three preemptive internal circs for hidden services?
> That's pretty trivial to measure and quantify with the tracing
> instrumentation added in Tor. No need for a new feature but an experiment
> has to be designed to measure 2 internal circuits versus 3.
> - rend_consider_services_upload() sets initial next_upload_time which is
> clobbered when first intro point established?
> Is the RendPostPeriod option working correctly? What's the exact
> relation in time of service->desc_is_dirty and upload time of a new
> - Do we have edge cases with rend_consider_descriptor_republication()? Can
> we refactor it to be cleaner?
> This is a core function that is called every second so we should make sure
> it is behaving as expected and not trying to do unneeded uploads.
Nice list of tickets. Here are some more ideas if you are looking for
more brainstorming action.
There is #3733 which is about a behavior that affects performance and
could benefit from a testing network.
And there is #8950 which is about the number of IPs per hidden
service. It's very unclear whether this functionality works as
intended or whether it's a good privacy idea.
And there is also #13222 but it's probably easier to hack the solution
here than to measure its severity.
> Here are some steps I think are needed to be able to measure and answer the
> Why section.
> 1) Dump the uploaded/fetched HS descriptors in a human-readable way.
> * Allows us to track descriptors over time while testing and analyse them
> afterwards by correlating events with a readable desc. This kind of
> feature will also be useful for people crawling HS on SponsorR.
> * Should be a control event, for instance (client side ONLY):
> > setconf HSDESC_DUMP /tmp/my/dir
> 2) How many HSDirs (including replicas) have been probed for one
> single .onion request. (Which should be repeated a lot for significant
> * Why have we probed 1 or 5?
> * What made us retry? Failure code?
> * Was the descriptor actually alive on the HSDir? If not, when did
> it move? (Correlate timings between HSdir and client in a testing network)
> 3) HS desc cache tracker. We want to know, very precisely, how things are
> moving in the cache especially on the HSDir cache side.
> * When and why is an HS desc removed?
> * Why hasn't it been stored in the cache?
> * How often, and when, is a descriptor requested?
> 4) Track the HS descriptor upload. Log at what time it was done. Use this
> to correlate with RendPostPeriod or when desc_is_dirty is set. This should
> also be correlated with the actual state of the HSDir. Did it already have it?
> Is the HSDir gone?
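For item 2), part of this can probably be pulled from the HS_DESC events the
control port already emits. A minimal sketch, assuming each event line has the
shape "650 HS_DESC <ACTION> <address> <authtype> <hsdir> [<descid>]
[REASON=<reason>]"; the function name is made up for illustration and the
parsing has not been checked against a live tor:

```python
from collections import defaultdict


def summarize_hs_desc_events(lines):
    """Count HSDir requests and failure reasons per onion address.

    Assumed (not verified against a live tor) event line shape:
      650 HS_DESC <ACTION> <address> <authtype> <hsdir> [<descid>] [REASON=<reason>]
    """
    stats = defaultdict(lambda: {"attempts": 0, "failures": [], "received": False})
    for line in lines:
        parts = line.split()
        if len(parts) < 6 or parts[0] != "650" or parts[1] != "HS_DESC":
            continue  # not an HS_DESC event line
        action, address = parts[2], parts[3]
        entry = stats[address]
        if entry["received"]:
            continue  # only count up to the first successful fetch
        if action == "REQUESTED":
            entry["attempts"] += 1
        elif action == "FAILED":
            reasons = [p[len("REASON="):] for p in parts[6:] if p.startswith("REASON=")]
            entry["failures"].append(reasons[0] if reasons else "UNKNOWN")
        elif action == "RECEIVED":
            entry["received"] = True
    return dict(stats)
```

Feeding it the event lines logged by a test client gives, per address, how
many HSDirs were asked before the first RECEIVED and which failure reasons
showed up on the way. It says nothing about timing, which would still need
the tracing side.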
> What to be done
> * Collect data
> "Collect it all" --> https://i.imgur.com/tVXAcGGl.jpg
> It's clear that we have to collect more data from the HS subsystem. Most of
> it can be collected through the control port but some is missing.
> Measuring precise timing of HS actions (for instance, let's say, descriptor
> storage) is not possible with the control port right now and also might not be
> that relevant since the job of this feature is to report high level events
> and push commands to the tor daemon.
> Tracing should be used here with a set of events added to the HS subsystem
> to collect the information we need so it can be analyzed after the
> experiment is run. This is only for performance measurement, the rest should
> as much as possible use the control port.
> * Testing network (much SponsorS)
> Once we are able to extract all the data we need, it is time to design experiments
> that allow us to run scenarios and collect/analyze what we want. A scenario
> could be this example with a set of questions we want to answer going with
> * 50 clients randomly accessing an HS in a busy tor network.
> - What is the failure rate of desc. fetch, RP establishment, ...?
> - What are the timings of each component of the HS subsystem?
> - What are the outliers of the whole process of establishing a connection
> to the HS?
> - How much did relay churn affect HS reachability?
> And dump a human-readable report/graphs, whatever is useful for us to
> investigate or assess the HS functionalities.
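The failure-rate and outlier questions above could be answered from per-trial
records with something like the following. A rough sketch only: the record
format (success flag, seconds to connect), the function name and the
1.5 x IQR outlier rule are all assumptions, not anything decided in this
thread.

```python
import statistics


def summarize_trials(trials):
    """Summarize connection attempts.

    trials: list of (succeeded, seconds) tuples, one per client attempt.
    Returns the failure rate, the median time of successful attempts and
    the successful attempts slower than Q3 + 1.5 * IQR.
    """
    total = len(trials)
    ok_times = sorted(seconds for succeeded, seconds in trials if succeeded)
    failure_rate = (total - len(ok_times)) / total if total else 0.0
    summary = {"trials": total, "failure_rate": failure_rate,
               "median": None, "outliers": []}
    if len(ok_times) >= 2:  # statistics.quantiles needs at least two points
        q1, median, q3 = statistics.quantiles(ok_times, n=4)
        fence = q3 + 1.5 * (q3 - q1)
        summary["median"] = median
        summary["outliers"] = [t for t in ok_times if t > fence]
    return summary
```

The same function can be run per component (desc. fetch, RP establishment,
...) if the trial records are split that way.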
> * HS health service
> ref: https://trac.torproject.org/projects/tor/ticket/13209
> What about a web page that prints the result of:
> 1) Fetch last 3 consensuses (thus 3 hours)
> 2) Find the union of all HSDirs responsible for a.onion (we control that
> HS and it should be up at all times, else the results are meaningless.)
> 3) Fetch the descriptor on each of them
> 4) Graph/log how many of them had it thus giving us a probability of
> reaching the HS within a time period.
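Steps 2) and 4) are plain set arithmetic once the per-HSDir fetch results
from 3) exist. A minimal sketch, assuming the responsible HSDirs per
consensus and the fetch outcomes have already been obtained by some other
means; the function name and input shapes are hypothetical, and the fraction
is only a rough proxy for the probability of reaching the HS:

```python
def hsdir_availability(responsible_per_consensus, has_descriptor):
    """Fraction of responsible HSDirs that served the descriptor.

    responsible_per_consensus: list of sets of HSDir fingerprints, one set
        per consensus (e.g. the last three).
    has_descriptor: dict mapping fingerprint -> True/False fetch result.
    Returns (union of responsible HSDirs, those that had it, fraction).
    """
    union = set().union(*responsible_per_consensus)
    holders = {fp for fp in union if has_descriptor.get(fp, False)}
    fraction = len(holders) / len(union) if union else 0.0
    return union, holders, fraction
```

An HSDir with no recorded fetch result is counted as not having the
descriptor, which errs on the pessimistic side.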
> So 3) is the tricky one. There are possibly multiple ways of achieving that:
> i) New SOCKS command to tor that a client could use.
> - Command would carry an onion address and the reply should be 0
> or 1 (successful attempt or not) along with the HSDir fingerprint.
> ii) Control event.
> > setconf HSDESC_FETCH_ALL <this_is_a.onion>
> Prints out the results as they come in with the HSDir information.
> iii) A weird way of doing this with an option "tor --fetch-on-all-hs-dir
> this_address.onion", print out the results and quit.
> I much prefer i) and ii) here. Not sure which one is best though.
Hm, I think I like (ii) here. It doesn't seem to be much more work
than (i) and a few researchers have been asking for such functionality