[tor-dev] CollecTor data: mapping bridge-network-status to bridge-server-descriptor to bridge-extra-info
david at bamsoftware.com
Thu Jul 9 02:45:04 UTC 2015
I'm trying to use CollecTor data to find out how much bandwidth is
offered by different pluggable transports over time. I.e., I want to be
able to say something like, "On July 1, bridges with obfs3 offered X MB/s,
bridges with obfs4 offered Y MB/s," etc. To do this, I'm mapping through
three types of CollecTor documents:
bridge-network-status (where the bandwidth is, and which links to router digests)
bridge-server-descriptor (which links to extra-info digests)
bridge-extra-info (where the transports are)
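The three-step mapping above can be sketched as a join over in-memory lookups. This is only an illustration: the function name, the argument shapes (a list of (router digest, bandwidth) pairs and two dicts keyed by digest), and the toy digests are assumptions, not anything CollecTor provides, and parsing the documents is left out.

```python
from collections import defaultdict

def bandwidth_by_transport(status_entries, server_descs, extra_infos):
    """Sum bandwidth per transport by following the digest chain.

    status_entries: iterable of (router_digest, bandwidth) pairs
    server_descs:   dict mapping router_digest -> extra_info_digest
    extra_infos:    dict mapping extra_info_digest -> iterable of transport names
    (All three shapes are hypothetical, chosen for this sketch.)
    """
    totals = defaultdict(int)
    unresolved = 0
    for router_digest, bandwidth in status_entries:
        extra_digest = server_descs.get(router_digest)
        transports = extra_infos.get(extra_digest) if extra_digest is not None else None
        if transports is None:
            # A link in the chain is missing, e.g. the descriptor is not in this tarball.
            unresolved += 1
            continue
        # set() so that a transport listed twice (IPv4 and IPv6) is counted once.
        for transport in set(transports):
            totals[transport] += bandwidth
    return dict(totals), unresolved

# Made-up toy data, not taken from any real tarball.
entries = [("R1", 100), ("R2", 50), ("R3", 70)]
descs = {"R1": "E1", "R2": "E2"}
infos = {"E1": ["obfs3", "obfs4"], "E2": ["obfs4", "obfs4"]}
print(bandwidth_by_transport(entries, descs, infos))
```

Note that in this sketch a bridge offering several transports has its full bandwidth counted under each of them, so the per-transport totals can add up to more than the overall total.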
I'm having trouble because sometimes the bridge-server-descriptor for a
router digest listed in a bridge-network-status document is not found in
the same tarball.
Here is an example of what I'm doing, using the above tarball.
This is a bridge-network-status document. One of its entries is:
r starman qgM+62FgGytzEtibYqqiPcPtijQ mdOOBxVOTpw8loBezhSDZxLIcXs 2015-07-03 21:39:31 10.174.163.60 9002 0
s Fast Guard Running Stable Valid
p reject 1-65535
The second base64-encoded string is the router digest.
base64decode("mdOOBxVOTpw8loBezhSDZxLIcXs") = 99D38E07154E4E9C3C96805ECE14836712C8717B
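As a minimal sketch of the decoding step, the following Python converts the digest field to hex. The function name is made up for illustration; the one detail worth noting is that the digest in the "r" line carries no "=" padding, so it has to be restored before decoding.

```python
import base64
import binascii

def digest_b64_to_hex(b64_digest):
    """Convert an unpadded base64 digest to an uppercase hex string."""
    # Restore the "=" padding that the network status entry omits.
    padded = b64_digest + "=" * (-len(b64_digest) % 4)
    raw = base64.b64decode(padded)
    return binascii.hexlify(raw).decode("ascii").upper()

print(digest_b64_to_hex("mdOOBxVOTpw8loBezhSDZxLIcXs"))
# prints 99D38E07154E4E9C3C96805ECE14836712C8717B
```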
Now we go looking for a bridge-server-descriptor with router
digest 99D38E07154E4E9C3C96805ECE14836712C8717B, which is in the
above file. It has a line:
Now we find a bridge-extra-info with digest
D69106C8BAF5C0044F7331F24DF77E85BBF84027 in the above file. It
tells us what transports the bridge supports (there are two, one
for IPv4 and one for IPv6):
Here's an example of where it goes wrong.
r Unnamed ABk0wg4j6BLCdZKleVtmNrfzJGI eGIOW1mGM/Dbw+t5bXnR8jdnsoY 2015-07-01 05:56:14 10.123.124.91 443 0
s Fast Running Stable Valid
p reject 1-65535
We are looking for router digest 78620E5B598633F0DBC3EB796D79D1F23767B286:
base64decode("eGIOW1mGM/Dbw+t5bXnR8jdnsoY") = 78620E5B598633F0DBC3EB796D79D1F23767B286
But there is no file bridge-descriptors-2015-07/server-descriptors/7/8/78620e5b598633f0dbc3eb796d79d1f23767b286.
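The lookup path can be built from the hex digest along these lines. The layout (lowercase digest, with its first two hex characters as nested directories) is inferred from the single path quoted above and is an assumption about the tarball structure in general; the function name is hypothetical.

```python
import posixpath

def server_descriptor_path(tarball_dir, hex_digest):
    """Build the expected path of a bridge-server-descriptor inside a tarball.

    Assumes the layout <tarball_dir>/server-descriptors/<c1>/<c2>/<digest>,
    where <c1> and <c2> are the first two characters of the lowercase digest.
    """
    digest = hex_digest.lower()
    return posixpath.join(tarball_dir, "server-descriptors",
                          digest[0], digest[1], digest)

print(server_descriptor_path("bridge-descriptors-2015-07",
                             "78620E5B598633F0DBC3EB796D79D1F23767B286"))
# prints bridge-descriptors-2015-07/server-descriptors/7/8/78620e5b598633f0dbc3eb796d79d1f23767b286
```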
However, I did find it in the previous month's tarball,
It seems rare that the bridge-server-descriptor is missing. In the
2015-07 tarball, it happened for 5891/477496 relays (1.2%). An
additional 4/477496 (under 0.001%) had a bridge-server-descriptor but were
How do you handle cases like this? I had a browse through the Onionoo
source code, but could not quickly make sense of it. Should I just always
include the month preceding the earliest month I want to process?
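For concreteness, the "include the preceding month" idea could look like the sketch below, which lists the months whose tarballs would be loaded. The function name and the "YYYY-MM" string format are assumptions for illustration. Whether a single month of lookback is always enough is exactly the open question, and this sketch does not answer it.

```python
def months_to_load(first, last):
    """Return 'YYYY-MM' strings from the month before `first` through `last`."""
    year, month = (int(part) for part in first.split("-"))
    end_year, end_month = (int(part) for part in last.split("-"))
    # Step back one month so descriptors published just before `first` are available.
    month -= 1
    if month == 0:
        year, month = year - 1, 12
    months = []
    while (year, month) <= (end_year, end_month):
        months.append("%04d-%02d" % (year, month))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return months

print(months_to_load("2015-07", "2015-07"))
# prints ['2015-06', '2015-07']
```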