[tor-dev] CollecTor data: mapping bridge-network-status to bridge-server-descriptor to bridge-extra-info

Karsten Loesing karsten at torproject.org
Thu Jul 9 08:26:55 UTC 2015

On 09/07/15 05:39, Roger Dingledine wrote:
> On Wed, Jul 08, 2015 at 07:45:04PM -0700, David Fifield wrote:
>> I'm trying to use CollecTor data to find out how much bandwidth
>> is offered by different pluggable transports over time. I.e., I
>> want to be able to say something like, "On July 1, bridges with
>> obfs3 offered X MB/s, bridges with obfs4 offered Y MB/s," etc.
> Great!
>> I'm having trouble because sometimes, a router digest listed in
>> a bridge-network-status document is not found in the same
>> tarball.
> [snip]
>> Here's an example of where it goes wrong. 
>> bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
> Yeah, I'm not surprised it goes wrong, since the descriptor from 
> 0701-06:01 was likely published in the previous month.
>> However, I did find it in the previous month's tarball,
> Yep.

I think you picked the wrong example for something going wrong,
because that descriptor is actually included in the 2015-07 tarball.

But there are indeed cases when a status published in 2015-07
references a server descriptor that was published in 2015-06, and that
server descriptor would be contained in the 2015-06 tarball.  Example
from the same status:


contains a line:

r Unnamed ABQ4ZADwj8WkfgApkhVTFalGweU GqjwHG/sFpFzY4sx9SWuzVTcHag 2015-06-30 12:59:03 443 0

which references the following server descriptor:


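To look up the referenced server descriptor yourself, you need the
digest from the "r" line in hex rather than unpadded base64.  Here is a
minimal Python sketch of that conversion.  The helper names
(b64_to_hex, parse_r_line) are made up for illustration, and the
field positions assume that the nickname, identity, and descriptor
digest are the first three fields after "r", as in the line above.

```python
import base64
import binascii


def b64_to_hex(value):
    """Decode an unpadded base64 field from an "r" line to lowercase hex."""
    # Status documents omit the trailing "=" padding, so add it back.
    padded = value + "=" * (-len(value) % 4)
    return binascii.hexlify(base64.b64decode(padded)).decode("ascii")


def parse_r_line(line):
    """Return nickname, identity, and descriptor digest from an "r" line.

    Assumption: fields 1 to 3 after the "r" keyword are nickname,
    identity, and server descriptor digest, in that order.
    """
    parts = line.split()
    if len(parts) < 4 or parts[0] != "r":
        raise ValueError("not an r line: %r" % (line,))
    return {
        "nickname": parts[1],
        "identity": b64_to_hex(parts[2]),
        "descriptor_digest": b64_to_hex(parts[3]),
    }


entry = parse_r_line(
    "r Unnamed ABQ4ZADwj8WkfgApkhVTFalGweU GqjwHG/sFpFzY4sx9SWuzVTcHag "
    "2015-06-30 12:59:03 443 0"
)
print(entry["descriptor_digest"])
```

The hex digest is what you would compare against the digests of the
server descriptors you have loaded from the tarballs.
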
>> It seems rare that the bridge-server-descriptor is missing. In
>> the 2015-07 tarball, it happened for 5891/477496 relays (1.2%).
> [snip]
>> How do you handle cases like this? I had a browse through the
>> Onionoo source code, but did not quickly understand it.

Onionoo typically reads descriptors from CollecTor's recent/ directory,
which contains descriptors published in the past 72 hours, not from the
tarballs in the archive/ directory, which are organized by publication
month.

>> Should I just always include the month preceding the earliest
>> month I want to process?

Yes, you should do that.
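
A rough sketch of that rule in Python, in case it helps: given the
first and last month you want to analyze, it returns the list of months
to load, starting one month early.  The function name tarball_months
is hypothetical, and the "bridge-descriptors-YYYY-MM" prefix is simply
taken from the path quoted above.

```python
def tarball_months(first, last):
    """Return "YYYY-MM" strings from the month before `first` through `last`."""
    year, month = (int(x) for x in first.split("-"))
    last_year, last_month = (int(x) for x in last.split("-"))

    # Step back one month so that descriptors published shortly before
    # the first status of interest are included.
    month -= 1
    if month == 0:
        year, month = year - 1, 12

    months = []
    while (year, month) <= (last_year, last_month):
        months.append("%04d-%02d" % (year, month))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return months


# Tarball name prefixes to load when analyzing July 2015 only.
names = ["bridge-descriptors-" + m for m in tarball_months("2015-07", "2015-07")]
print(names)
```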

> How many of the 5891 cases does that resolve?

If you happen to find cases which are not explained by that, please
let me know.
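
One way to check for such cases is to collect the descriptor digests
referenced by the statuses and the digests of all server descriptors
you loaded, and then look at the difference.  A small sketch, assuming
both collections hold hex digest strings in the same case; the function
name unresolved_digests is made up, and it counts distinct digests, not
status entries, so its share is not directly comparable to a per-entry
count.

```python
def unresolved_digests(referenced, available):
    """Return (sorted missing digests, share of distinct references missing)."""
    referenced = set(referenced)
    missing = sorted(referenced - set(available))
    share = len(missing) / len(referenced) if referenced else 0.0
    return missing, share


# Toy digests for illustration only, not real data.
missing, share = unresolved_digests(["aa", "bb", "cc", "aa"], {"aa", "cc"})
print(missing, share)
```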

All the best,


