Hello,
this is an attempt to collect tasks that should be done for SponsorR. You can find the SponsorR page here: https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
I'm going to focus only on the subset of those categories that Roger/David told me are the most important for the sponsor. These are:
- Safe statistics collection
- Tor controller API improvements
- Performance improvements
- Opt-in HS indexing service
I haven't yet split projects into deliverables; this is a middle step to getting there. Next step is to filter and then ticketify what we have. After that we need to prioritize and pick the projects that will become deliverables.
In each category, I have loosely ordered the items (so more important items will usually be at the top, but that's not always true). I have also tried to include all the tickets that are marked as SponsorR in trac.
So, let's go:
== Safe statistics collection ==
We've discussed this quite a bit over the past year and I think we all pretty much agree on which stats are safe to collect and which not.
I think we all agree that collecting the number of HS circuits and traffic volume from RPs (#13192) is harmless [0] and useful information to have. We need to clean up Roger's patch to add that information in extra-info descriptors, and then do some visualisations. That would give us a good idea of how much HSes are used.
OTOH, other statistics like "# of HS descriptors" are not that harmless and the upcoming HS redesign will block us from getting this information anyway.
For now, I think we should focus on #13192 for this project.
== Tor controller API improvements ==
To better refine this project, we should think about what we want to get out of it. Here are some outcomes:
a) A better control API allows us to perform better performance measurements for HSes.
Karsten in #1944 worked on performance measurements of HS circuit establishment. You can find his very useful results here: http://ec2-54-92-231-52.compute-1.amazonaws.com/
We should understand exactly how Karsten is gathering those events, and see whether we can improve the timing accuracy or if we are missing any events. We need to also figure out how to do useful measurements in causal events like the race between the INTRODUCE_ACK cell and the RENDEZVOUS2. We also need to find a way to match rendezvous circuits with introduction circuits: https://trac.torproject.org/projects/tor/ticket/1944#comment:35
All in all, this seems like a project worth doing right because it will be useful in the future. It can even act as an automated regression test.
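To make the matching problem more concrete, here is a minimal sketch (not Karsten's actual setup) that groups control port CIRC events by their REND_QUERY field and measures the time from the first introduction-circuit event to the rendezvous circuit reaching HSCR_JOINED. The arrival timestamps, circuit IDs and onion address in the sample are made up, and the function names are hypothetical. Grouping by REND_QUERY is only a heuristic: it becomes ambiguous as soon as a client makes several concurrent requests to the same service, which is exactly the gap discussed in the #1944 comment.

```python
from collections import defaultdict

def parse_circ_event(line):
    """Parse one '650 CIRC ...' line into a dict, or return None."""
    parts = line.split()
    if len(parts) < 4 or parts[:2] != ["650", "CIRC"]:
        return None
    event = {"id": parts[2], "status": parts[3]}
    for token in parts[4:]:
        # Skip the optional path, whose entries start with '$'.
        key, sep, value = token.partition("=")
        if sep and not token.startswith("$"):
            event[key] = value
    return event

def group_by_onion(timed_lines):
    """Group intro and rend circuit events by REND_QUERY.

    timed_lines: iterable of (arrival_time_in_seconds, raw_event_line).
    """
    groups = defaultdict(lambda: {"intro": [], "rend": []})
    for arrival, line in timed_lines:
        event = parse_circ_event(line)
        if event is None or "REND_QUERY" not in event:
            continue
        purpose = event.get("PURPOSE")
        if purpose == "HS_CLIENT_INTRO":
            groups[event["REND_QUERY"]]["intro"].append((arrival, event))
        elif purpose == "HS_CLIENT_REND":
            groups[event["REND_QUERY"]]["rend"].append((arrival, event))
    return groups

def intro_to_join_seconds(group):
    """Seconds from the first intro-circuit event to HSCR_JOINED, or None."""
    joined = [t for t, ev in group["rend"] if ev.get("HS_STATE") == "HSCR_JOINED"]
    if not group["intro"] or not joined:
        return None
    return min(joined) - min(t for t, _ in group["intro"])

# Illustrative input only: timestamps, circuit IDs and address are invented.
SAMPLE = [
    (0.0, "650 CIRC 12 LAUNCHED PURPOSE=HS_CLIENT_INTRO "
          "HS_STATE=HSCI_CONNECTING REND_QUERY=exampleexample22"),
    (0.5, "650 CIRC 13 LAUNCHED PURPOSE=HS_CLIENT_REND "
          "HS_STATE=HSCR_CONNECTING REND_QUERY=exampleexample22"),
    (2.5, "650 CIRC 13 BUILT PURPOSE=HS_CLIENT_REND "
          "HS_STATE=HSCR_JOINED REND_QUERY=exampleexample22"),
]

groups = group_by_onion(SAMPLE)
elapsed = intro_to_join_seconds(groups["exampleexample22"])
```

Note that the sketch timestamps events on arrival at the controller, so its accuracy is bounded by control port latency; that is one of the things we would want to improve.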
b) This might also be a good time to start working on automated integration tests for HSes.
It should be possible to spin up private Chutney networks and test that particular HSes are reachable. Or perform regression tests; for example, Roger recently suggested writing a regression test to make sure that clocks don't need to be synchronized to build HS circuits (#13494).
We should also make testing networks better for HS testing:
- #13401 TestingTorNetwork should crank down RendPostPeriod too?
c) Tor should better expose error messages of failed operations. For example, this could allow TBB to inform users whether they mistyped the onion address or the HS is actually down, and it would also let us do #13208. Proposal 229 and ticket #13212 are related to this. We should see whether the PT team is planning to implement proposal 229 and how we can synchronise.
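As a small illustration of the "mistyped address" half of this, here is a sketch of a purely syntactic check that a client-side application could run before asking Tor anything. It assumes current-style onion addresses (16 base32 characters followed by ".onion"); the function name and return values are hypothetical. It can only catch malformed input: a typo that still yields a well-formed address looks identical to a service that is down, which is why better error reporting from Tor itself is needed.

```python
import re

# Current-style onion address: 16 base32 characters (a-z, 2-7) plus ".onion".
ONION_RE = re.compile(r"[a-z2-7]{16}\.onion")

def classify_onion_input(address):
    """Return 'malformed' or 'well-formed' for a user-typed onion address.

    'well-formed' does NOT mean the service exists or is reachable.
    """
    candidate = address.strip().lower()
    if ONION_RE.fullmatch(candidate):
        return "well-formed"
    return "malformed"

# Made-up addresses for illustration.
results = [
    classify_onion_input("abcdefghijklmnop.onion"),   # 16 valid characters
    classify_onion_input("abcdefghijklmno1.onion"),   # '1' is not base32
    classify_onion_input("abcdefghijklmno.onion"),    # only 15 characters
]
```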
d) There are various projects that are using HSes these days (TorChat, Pond, GlobaLeaks, Ricochet, etc.). We should think about whether we want to support these use cases and how we can make their lives easier. For example, Fabio has been asking for a way to spin up HSes using the control port (#5976). What other features do people want from the control port?
And here are some more tickets marked as SponsorR from this category:
- #8993 Better hidden service support on Tor control interface
- #13206 Write up walkthrough of control port events when accessing a hidden service
- #2554 extend torperf to record hidden service time components
== Performance Improvements ==
This is the most juicy section. How can we make HS performance better? IIUC, we are mainly interested in client-side performance, but if a change makes both sides faster that's even better.
Some projects:
a) Looking at Karsten's #1944 results http://ec2-54-92-231-52.compute-1.amazonaws.com/ we see that fetching HS descriptors takes much more time than it should. I wonder why this is the case. Is there another ntohl bug there?
We should perform measurements and get a good understanding of what's going on in this step. Here are some tickets that Roger opened to do exactly that:
- #13208 What's the average number of hsdir fetches before we get the hsdesc?
- #13209 Write a hidden service hsdir health measurer
And here is a ticket with a potential issue:
- #13207 Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?
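For #13208, the arithmetic itself is simple once we can observe individual fetch attempts. Here is a minimal sketch, assuming we somehow have, for each descriptor lookup, the ordered list of HSDir fetch outcomes (True for success). The input format and function name are hypothetical; how to actually collect this data is the open part of the ticket.

```python
def average_fetches_until_success(lookups):
    """Average number of HSDir fetches needed before a descriptor arrives.

    lookups: list of lists of booleans, one inner list per lookup, in the
    order the fetches were attempted. Lookups that never succeed are counted
    separately instead of being folded into the average.
    Returns (average_or_None, number_of_failed_lookups).
    """
    counts = []
    failed = 0
    for attempts in lookups:
        if True in attempts:
            # Index of the first success, converted to a 1-based fetch count.
            counts.append(attempts.index(True) + 1)
        else:
            failed += 1
    average = sum(counts) / len(counts) if counts else None
    return average, failed

# Made-up observations: three lookups succeed after 1, 2 and 3 fetches,
# and one lookup fails on every HSDir it tried.
observations = [
    [True],
    [False, True],
    [False, False, True],
    [False, False],
]
average, failed = average_fetches_until_success(observations)
```

Keeping failed lookups out of the average matters here: a lookup that exhausts all HSDirs tells us about service availability, not about how many fetches a successful lookup needs.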
b) Improving the other parts of the circuit establishment process is also important:
- #8239 Hidden services should try harder to reuse their old intro points
- #3733 Tor should abandon rendezvous circuits that cause a client request to time out
- #13222 Clients accessing a hidden service can establish their rend point in parallel to fetching the hsdesc
Furthermore, an area of Tor that might give us better performance but we haven't really explored yet is preemptive circuits. #13239 is about building more internal circuits for HSes.
And here is a ticket suggesting more measurements:
- #13194 Track time between ESTABLISH_RENDEZVOUS and RENDEZVOUS1 cell
c) Another important project in this area is parallelizing HS crypto. I haven't looked at what this would actually entail, but it will probably involve implementing the undone parts of proposal 220/224.
d) This might be the time to implement Encrypted Services? Many people have been asking for this feature and this might be the right time to do it: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-enc...
e) Following the trail of #13207, we should look at all the magic numbers currently used by HSes, document them and see if they make sense. This includes the number of IPs (#8950), the number of HSDirs/replicas, the intro point expiration date, etc.
Also, we should revisit the flags used when doing path selection for RPs, IPs, etc.
f) On a more researchy tone, this might also be a good point to start poking at the HS scalability project since it will really affect HS performance.
We should look at Christopher Baines' ideas and write a Tor proposal out of them:
https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html
https://lists.torproject.org/pipermail/tor-dev/2014-May/006812.html
Last time I looked, Christopher's ideas required implementing proposal 225 and #8239.
g) All the projects above are aiming at improving circuit establishment performance, but none of them are dealing with performance improvements after the HS circuit has been established.
On an even more researchy tone, Qingping Hou et al wrote a proposal to reduce the length of HS circuits to 5 hops (down from 6). You can find their proposal here: https://lists.torproject.org/pipermail/tor-dev/2014-February/006198.html
The project is crazy and dangerous and needs lots of analysis, but it's something worth considering. Maybe this is a good time to do this analysis?
h) Back to the community again. A few messaging protocols have recently appeared that inherently use HSes to provide link layer confidentiality and anonymity [1]. Examples include Pond, Ricochet and TorChat.
Some of these applications are creating one or more HSes per user, with the assumption that HSes are something easy to make and there is no problem in having lots of them. People are wondering how well these applications scale and whether they are using the Tor network the right way. See John Brooks' mail for a small analysis: https://moderncrypto.org/mail-archive/messaging/2014/000434.html
It might be worth researching these use cases to see how well Tor supports them and how they can be supported better (or whether they are a bad idea entirely).
== Opt-in HS indexing service ==
This seems like a fun project that can be used in various ways in the future. Of course, the feature must remain opt-in so that only services that want to be public will surface.
For this project, we could make some sort of 'HS authority' which collects HS information (the HS descriptor?) from volunteering HSes. It's unclear who will run an HS authority; maybe we can work with ahmia so that they integrate it in their infrastructure?
If we are more experimental, we can even build a basic petname system using the HS authority [2]. Maybe just a "simple" NAME <-> PUBKEY database where HSes can register themselves in a FIFO fashion. This might cause tons of domain camping and attempts for dirty sybil attacks, but it might develop into something useful. Worst case we can shut it down and call the experiment done? AFAIK, I2P has been doing something similar at https://geti2p.net/en/docs/naming
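To show how little machinery the "simple" version needs, here is a minimal in-memory sketch of a first-come-first-served NAME <-> PUBKEY registry. The class and method names are hypothetical, and it assumes one name per key and case-insensitive names, neither of which has been decided. It deliberately has no defence against camping or sybil registrations; it only illustrates the data structure.

```python
class NameRegistry:
    """First-come-first-served mapping between names and HS public keys."""

    def __init__(self):
        self._key_by_name = {}
        self._name_by_key = {}

    @staticmethod
    def _normalize(name):
        return name.strip().lower()

    def register(self, name, pubkey):
        """Register name for pubkey; return False if either is already taken."""
        name = self._normalize(name)
        if not name or name in self._key_by_name or pubkey in self._name_by_key:
            return False
        self._key_by_name[name] = pubkey
        self._name_by_key[pubkey] = name
        return True

    def lookup_key(self, name):
        """Return the public key registered under name, or None."""
        return self._key_by_name.get(self._normalize(name))

    def lookup_name(self, pubkey):
        """Return the name registered for pubkey, or None."""
        return self._name_by_key.get(pubkey)

# Made-up names and keys for illustration.
registry = NameRegistry()
first = registry.register("example", "PUBKEY-A")
second = registry.register("Example", "PUBKEY-B")  # name already taken
```

A real deployment would at least need registrations to be signed by the HS key, so that only the service itself can claim a name for its own key.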
== Security / Miscellaneous ==
I also noticed that some tickets on trac were assigned to SponsorR but I couldn't fit them in the above categories. They are mainly security enhancements or code improvements. Here is a dump of the tickets:
Security:
- #13214 HS clients don't validate descriptor-id returned by HSDir
- #7803 Clients shouldn't send timestamps in INTRODUCE1 cells
- #8243 Getting the HSDir flag should require more effort
- #2715 Is rephist-calculated uptime the right metric for HSDir assignment?
Miscellaneous:
- #13223 Refactor rend_client_refetch_v2_renddesc()
- #13287 Investigate mysterious 24-hour lump in hsdir desc fetches
- #8902 Rumors that hidden services have trouble scaling to 100 concurrent connections
== Epilogue ==
What useful projects/tickets did I forget here?
Which of the tasks above should we not do? I just went ahead and wrote down all the projects I could think of, with the idea that we will filter stuff later.
Thanks!
Footnotes:
[0]: since RPs are picked at random by the client and not by the HS.
[1]: see https://moderncrypto.org/mail-archive/messaging/2014/000434.html
[2]: or if someone is more crazy, try to integrate GNUnet's GNS: https://gnunet.org/gns