-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 09/07/15 05:39, Roger Dingledine wrote:
On Wed, Jul 08, 2015 at 07:45:04PM -0700, David Fifield wrote:
I'm trying to use CollecTor data to find out how much bandwidth is offered by different pluggable transports over time. I.e., I want to be able to say something like, "On July 1, bridges with obfs3 offered X MB/s, bridges with obfs4 offered Y MB/s," etc.
Great!
I'm having trouble because sometimes, a router digest listed in a bridge-network-status document is not found in the same tarball.
[snip]
Here's an example of where it goes wrong:
bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
Yeah, I'm not surprised it goes wrong, since the descriptor from 0701-06:01 was likely published in the previous month.
However, I did find it in the previous month's tarball,
Yep.
I think you picked the wrong example for something going wrong, because that descriptor is actually included in the 2015-07 tarball.
But there are indeed cases when a status published in 2015-07 references a server descriptor that was published in 2015-06, and that server descriptor would be contained in the 2015-06 tarball. Example from the same status:
bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
contains a line:
r Unnamed ABQ4ZADwj8WkfgApkhVTFalGweU GqjwHG/sFpFzY4sx9SWuzVTcHag 2015-06-30 12:59:03 10.135.171.161 443 0
which references the following server descriptor:
bridge-descriptors-2015-06/server-descriptors/1/a/1aa8f01c6fec169173638b31f525aecd54dc1da8
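To make the mapping in this example explicit: the second base64 field of the "r" line, decoded to hex, matches the file name of the server descriptor, and the first two hex characters match the subdirectories. Here is a minimal Python sketch of that lookup. The function name digest_to_archive_path is made up for illustration, and taking the tarball month from the publication date in the "r" line is an assumption based on this one example, not a documented rule.

```python
import base64

def digest_to_archive_path(b64_digest, month):
    """Map a base64 descriptor digest from an "r" line to a tarball path.

    Hypothetical helper: the path layout is inferred from the example above.
    """
    # The digest in the status has no trailing "=" padding, so restore it.
    padded = b64_digest + "=" * (-len(b64_digest) % 4)
    hex_digest = base64.b64decode(padded).hex()
    # The first two hex characters name the two subdirectory levels.
    return "bridge-descriptors-{}/server-descriptors/{}/{}/{}".format(
        month, hex_digest[0], hex_digest[1], hex_digest)

r_line = ("r Unnamed ABQ4ZADwj8WkfgApkhVTFalGweU GqjwHG/sFpFzY4sx9SWuzVTcHag "
          "2015-06-30 12:59:03 10.135.171.161 443 0")
fields = r_line.split()
# fields[3] is the descriptor digest, fields[4] the publication date.
# Assumption: the tarball month equals the publication month.
path = digest_to_archive_path(fields[3], fields[4][:7])
print(path)
```

For the line above this yields the 2015-06 path, not a 2015-07 one, which is why a lookup restricted to the 2015-07 tarball fails.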
It seems rare that the bridge-server-descriptor is missing. In the 2015-07 tarball, it happened for 5891/477496 relays (1.2%).
[snip]
How do you handle cases like this? I had a browse through the Onionoo source code, but did not quickly understand it.
Onionoo typically reads descriptors from CollecTor's recent/ directory, which contains descriptors published in the past 72 hours, rather than from the tarballs in the archive/ directory, which are organized by publication month.
Should I just always include the month preceding the earliest month I want to process?
Yes, you should do that.
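As a rough sketch of that rule in Python: given the first and last month you want to analyze, compute the list of tarball months to load, starting one month earlier. The function name months_to_load and the "YYYY-MM" string format are assumptions for illustration only; adapt them to however your script names the tarballs.

```python
def months_to_load(first_month, last_month):
    """Return "YYYY-MM" strings from the month before first_month
    through last_month, inclusive."""
    def to_index(year_month):
        # Count months since year 0 so that year boundaries need no special case.
        year, month = year_month.split("-")
        return int(year) * 12 + int(month) - 1

    start = to_index(first_month) - 1  # include the preceding month
    end = to_index(last_month)
    return ["{:04d}-{:02d}".format(i // 12, i % 12 + 1)
            for i in range(start, end + 1)]

print(months_to_load("2015-07", "2015-07"))
print(months_to_load("2015-01", "2015-02"))
```

The first call returns 2015-06 and 2015-07; the second also covers the year boundary and starts at 2014-12. Server descriptors from the extra month only serve to resolve references; statuses from that month can be skipped.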
How many of the 5891 cases does that resolve?
If you happen to find cases which are not explained by that, please let me know.
All the best, Karsten