Hi all,
NRL is effectively partnered with the Tor Project Inc. (TPI) for the SponsorR efforts. Our (NRL's) tasking largely overlaps with, and is somewhat complementary to, that of TPI. As such, I thought it would be good to mention the basics of what we are working on, to better inform and coordinate the planning that George et al. have begun discussing in this thread.
Our tasks are:
1. to identify which statistics about hidden services can be collected and reported without harming user security.
This is also directly part of TPI's tasking, and I expect we will be collaborating on it closely. We will probably start working on this in about a month.
2. to develop passive measurement techniques for gathering information about hidden services. This would allow, for example, collecting information about the relative popularity of different types of hidden services, such as what fraction of hidden service connections are highly interactive vs. large data downloads, and so on. It also includes developing techniques to infer global activity from local observations.
Some of this has already begun. About a month ago Roger deployed code on a few relays to test whether a connection was for HSes or for something else. And we did some initial analysis of the global projection based on estimates of how much bandwidth those relays saw, which varied wildly, although there are lots of potential explanations for that. Roger has also already touched in this thread on some statistics that are interesting but require thought before deciding how, or whether, to collect them.
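To make the idea of projecting global activity from local observations concrete, here is a minimal sketch. It assumes, as a simplification, that a relay sees HS traffic in proportion to its share of total network traffic, so a local count can be scaled up by dividing by that share. The function name, the observation tuples, and all of the numbers are made up for illustration; they are not measurements from the deployed relays.

```python
from statistics import mean, pstdev


def project_global(observed_hs_bytes: float, traffic_fraction: float) -> float:
    """Scale one relay's observed HS bytes up to a network-wide estimate.

    Assumption: the relay sees HS traffic in proportion to its share of
    all network traffic (traffic_fraction, a value in (0, 1]).
    """
    if not 0 < traffic_fraction <= 1:
        raise ValueError("traffic_fraction must be in (0, 1]")
    return observed_hs_bytes / traffic_fraction


# Hypothetical observations: (HS bytes seen, assumed share of network traffic).
observations = [(2.0e9, 0.001), (9.0e9, 0.002), (1.5e9, 0.0005)]

# One network-wide estimate per relay.
estimates = [project_global(b, f) for b, f in observations]

# Relative spread of the per-relay estimates (coefficient of variation).
spread = pstdev(estimates) / mean(estimates)
```

If the proportionality assumption held exactly, every relay would give the same estimate and the spread would be zero. A large spread suggests either noise or that the assumed traffic shares are off, which is one reason accurate bandwidth measurement matters here.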
A primary focus of NRL's work between now and the end of the year has been, and will be, devising a secure and accurate relay bandwidth measurement scheme. The emphasis is on something that should be much better than what is now available but also practical and compatible enough that it could be rolled out in Tor within about a year (we will also be considering designs that are less directly implementable but more theoretically solid). Bandwidth measurement is currently one of Tor's biggest vulnerabilities. It is pretty easy to get fake, inflated BW numbers so as to have a consensus weight that allows you to observe amounts of traffic quite disproportionate to the amount you have actually been carrying in the past. There have been many published attacks based on bandwidth inflation, and Tor's current torflow design was not intended to be secure; it could use some attention to accuracy as well.
This also becomes important in the context of gathering HS statistics. If we are going to deploy statistics-gathering code in a way that is safe for users and hidden services, it is not enough to say which statistics are safe to collect honestly. We also need to make Tor's system of data gathering for those statistics robust to abuse. One of the easiest ways to abuse statistics gathering to undermine user and service security is to manipulate BW attribution to increase the raw data that is available to malicious entities. Of course, any statistics that rely on accurate BW measurement will benefit from this work as well.
3. to design and test HS performance improvements, particularly as they affect the crawling and measuring activities on HSes that SponsorR is interested in.
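As a rough illustration of why bandwidth inflation matters, the sketch below shows how a relay's share of traffic grows when its weight is inflated. It assumes a deliberately simplified model in which a relay's share is just its consensus weight divided by the total; real path selection also depends on flags and position weights, which are ignored here. The relay names, weights, and inflation factor are hypothetical.

```python
def traffic_share(weights: dict, relay: str) -> float:
    """Fraction of traffic a relay expects to see, assuming selection
    is proportional to consensus weight (a simplification)."""
    return weights[relay] / sum(weights.values())


def inflate(weights: dict, relay: str, factor: float) -> dict:
    """Return a copy of the weights with one relay's weight multiplied
    by factor, modelling a relay whose BW numbers are inflated."""
    inflated = dict(weights)
    inflated[relay] = weights[relay] * factor
    return inflated


# Hypothetical three-relay network.
weights = {"honest_a": 5000, "honest_b": 3000, "adversary": 2000}

# Share before inflation: 2000 / 10000.
before = traffic_share(weights, "adversary")

# Share after a tenfold inflation: 20000 / 28000.
after = traffic_share(inflate(weights, "adversary", 10), "adversary")
```

In this toy network the adversary goes from a fifth of the traffic to roughly five sevenths without carrying any more of it, and anything it collects, including HS statistics, grows accordingly.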
Again we expect lots of collaboration in this area, although our focus will be on the above first.
4. to evaluate planned and future changes to HSes for security and performance, particularly to see how the intended SponsorR measuring, crawling, and indexing techniques for HSes may be affected. For example, a technique that assumes directories can know when a new HS is listed would be affected by the design changes in proposal 224.
Same comment as for task 3.