On Wed, Jul 08, 2015 at 07:45:04PM -0700, David Fifield wrote:
> I'm trying to use CollecTor data to find out how much bandwidth is offered by different pluggable transports over time. I.e., I want to be able to say something like, "On July 1, bridges with obfs3 offered X MB/s, bridges with obfs4 offered Y MB/s," etc.
Great!
> I'm having trouble because sometimes, a router digest listed in a bridge-network-status document is not found in the same tarball.
[snip]
> Here's an example of where it goes wrong:
> bridge-descriptors-2015-07/statuses/01/20150701-060138-4A0CCD2DDC7995083D73F5D667100C8A5831F16D
Yeah, I'm not surprised it goes wrong, since the descriptor from 0701-06:01 was likely published in the previous month.
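The publication time is right there in each status entry, so you can tell up front which month's tarball to look in. Below is a minimal sketch. It assumes the dir-spec "r" line layout (r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort, where digest is the unpadded base64 SHA-1 of the server descriptor). It also assumes, as in your example, that a descriptor lands in the tarball for the month it was published. The helper names parse_r_line and tarball_month are hypothetical.

```python
import base64
from datetime import datetime


def parse_r_line(r_line):
    """Split a status "r" line into (uppercase hex digest, publication time).

    Assumes the dir-spec layout:
      r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort
    """
    fields = r_line.split()
    if len(fields) < 9 or fields[0] != "r":
        raise ValueError("not an r line: %r" % r_line)
    # The status document strips base64 padding; put it back before decoding.
    b64 = fields[3] + "=" * (-len(fields[3]) % 4)
    digest_hex = base64.b64decode(b64).hex().upper()
    published = datetime.strptime(fields[4] + " " + fields[5], "%Y-%m-%d %H:%M:%S")
    return digest_hex, published


def tarball_month(r_line):
    """Hypothetical helper: the YYYY-MM tarball expected to hold the descriptor,
    assuming descriptors are archived by their own publication month."""
    _, published = parse_r_line(r_line)
    return published.strftime("%Y-%m")
```

An entry in a 2015-07-01 06:01 status whose descriptor was published on 2015-06-30 maps to "2015-06", i.e. the previous month's tarball.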
> However, I did find it in the previous month's tarball,
Yep.
> It seems rare that the bridge-server-descriptor is missing. In the 2015-07 tarball, it happened for 5891/477496 relays (1.2%).
[snip]
> How do you handle cases like this? I had a browse through the Onionoo source code, but did not quickly understand it. Should I just always include the month preceding the earliest month I want to process?
How many of the 5891 cases does that resolve?
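To measure that, here is a rough sketch. It takes the hex digests referenced by status entries and the digests of descriptors present in the current and previous months' tarballs. It counts how many referenced entries are missing, how many of those the previous month resolves, and how many are still unaccounted for. The function name count_missing and the plain-set inputs are assumptions. Building those sets from the tarballs is left out. Entries are counted individually rather than as unique digests, on the assumption that this matches how the 5891/477496 figure was computed.

```python
def count_missing(referenced, this_month, prev_month):
    """Hypothetical helper.

    referenced: iterable of descriptor digests, one per status entry
    this_month / prev_month: sets of descriptor digests found in each tarball
    Returns (missing, resolved_by_prev_month, still_missing), counted per entry.
    """
    # Entries whose descriptor is absent from the same month's tarball.
    missing = [d for d in referenced if d not in this_month]
    # Of those, entries whose descriptor shows up in the previous month.
    resolved = sum(1 for d in missing if d in prev_month)
    return len(missing), resolved, len(missing) - resolved
```

If the third number comes out near zero, always loading the preceding month is probably enough. Otherwise, the leftover entries are worth looking at separately.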
--Roger