
On 06/12/14 00:26, Anna Kornfeld Simpson wrote:
Thanks all for the responses!
On Fri, Nov 21, 2014 at 4:53 PM, Sebastian Hahn <sebastian@torproject.org> wrote:
Hi there,
On 21 Nov 2014, at 23:44, Damian Johnson <atagar@torproject.org> wrote:
In other words, if I sorted the descriptors by "measured" value, what would that order mean?
I *think* that would be the ordering of 'relays who receive the most tor client traffic due to having a more highly weighted heuristic for relay selection'.
that would be accurate, is my understanding
Is there documentation of why this "heuristic for relay selection" does not correlate that well with "bandwidth" in the descriptor? I've attached a couple of scatter plots pulled from moria1's "measured" and "bandwidth" values for each descriptor a couple hours ago (and the plots look similar from the other bwauths). One shows all values, the other shows the bottom 75% of values (sorted by measurements), and neither shows as much of a correlation as I would expect. Are there factors other than bandwidth that contribute to this "heuristic for relay selection"?
Hi Anna,

I don't have answers, but maybe ideas for further investigations:

- Not sure if this was mentioned before, but did you take a look at the spec?
  https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/R...

- Maybe try removing bandwidth values close to 10000, or just values exactly at 10000. IIRC, values are capped at that value. (Removing just those values may be more accurate than removing the top 25%.)

- Very small bandwidth values might result from newly started or restarted relays. (Advertised) bandwidth values are "the volume of traffic, both incoming and outgoing, that a relay is willing to sustain, as configured by the operator and claimed to be observed from recent data transfers." If a relay didn't observe larger data transfers, the reported bandwidth value will be small, but the (past) measurements might still be large. Maybe compare this for single relays over time.

- There's an interesting pattern at 1024 (?) kB/s. Maybe there are more at 512 kB/s and other values. Can you reduce the amount of overplotting in the graph? In R/ggplot2, you'd set the "alpha" value to something smaller than 1, so that dots become somewhat transparent. Could be that these patterns are normal, because operators tend to pick certain bandwidth rates more often than others.

All the best,
Karsten
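
In case it helps, here is a rough sketch of how two of the ideas above could be checked without any plotting: drop the values sitting exactly at the suspected cap, compute a rank correlation on what remains, and count which bandwidth values occur most often. Everything in it is an assumption for illustration: the function names, the `(measured, bandwidth)` pair layout, the choice of Spearman rank correlation, and the sample numbers, which are made up and not taken from moria1. The cap of 10000 is the value recalled above (IIRC) and is applied here to the "bandwidth" column; adjust if it turns out to apply elsewhere.

```python
from collections import Counter


def drop_capped(pairs, cap=10000):
    """Keep (measured, bandwidth) pairs whose bandwidth is not exactly at the cap."""
    return [(m, b) for m, b in pairs if b != cap]


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; needs at least two distinct values in each list."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(pairs):
    """Spearman rank correlation between the two columns of the pairs."""
    measured_ranks = ranks([m for m, _ in pairs])
    bandwidth_ranks = ranks([b for _, b in pairs])
    return pearson(measured_ranks, bandwidth_ranks)


def common_values(values, top=3):
    """Most frequent values, e.g. to spot rates that operators pick often."""
    return Counter(values).most_common(top)


# Hypothetical (measured, bandwidth) pairs, invented for this sketch.
sample = [
    (120, 100), (900, 512), (700, 512), (1500, 1024),
    (1100, 1024), (2500, 1024), (9000, 10000), (300, 10000),
]

uncapped = drop_capped(sample)
print("rank correlation, all pairs:     ", spearman(sample))
print("rank correlation, cap removed:   ", spearman(uncapped))
print("most common bandwidth values:    ", common_values([b for _, b in uncapped]))
```

A rank correlation is used here only because it is less sensitive to a few very large relays than a plain linear correlation. Whether it is the right measure for this comparison is an open question.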