[metrics-bugs] #21315 [Obfuscation/Snowflake]: publish some realtime stats from the broker?

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Apr 11 20:39:04 UTC 2019


#21315: publish some realtime stats from the broker?
-----------------------------------+---------------------------
 Reporter:  arma                   |          Owner:  (none)
     Type:  enhancement            |         Status:  new
 Priority:  Medium                 |      Milestone:
Component:  Obfuscation/Snowflake  |        Version:
 Severity:  Normal                 |     Resolution:
 Keywords:                         |  Actual Points:
Parent ID:  #29461                 |         Points:
 Reviewer:                         |        Sponsor:  Sponsor19
-----------------------------------+---------------------------

Comment (by irl):

 Replying to [comment:5 cohosh]:
 > It sounds like we have a few things we want to achieve/learn from
 collected metrics:
 > - Detect censorship events
 > - Allow current or potential proxies to see if they are needed
 > - Allow clients to see whether their connection issues are due to
 censorship or proxy availability
 > - Help us figure out whether we should be doing something different in
 distributing proxies to clients

 These all seem like good goals.

 > We currently collect and "publish" information on:
 > - how many snowflakes are currently available, along with their SIDs
 (available at the broker /debug handler). This is good for more detailed
 monitoring of censorship events. Even though we collect bridge usage
 metrics, collecting broker usage metrics will narrow down where the
 censorship is happening.
 > - country stats of domain-fronted client connections (logged, most
 recent snapshot at broker /debug)
 > - the round-trip time from a client's request until it receives a
 snowflake proxy answer (available at broker /debug)
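 The country stats above are raw counts. If they are archived or published, one mitigation, borrowed from how dir-spec describes bridge statistics, is to round each count up to a multiple of 8 before it leaves the broker. A minimal Go sketch, assuming the broker keeps a map from country code to connection count; `binCount` and `formatCountryStats` are hypothetical helpers, not broker code, and a bin size of 8 is an assumption for Snowflake:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// binCount rounds n up to the next multiple of binSize so that small
// differences between raw counts are not published exactly.
func binCount(n, binSize int) int {
	if n <= 0 {
		return 0
	}
	if binSize <= 0 {
		return n // no binning requested
	}
	return ((n + binSize - 1) / binSize) * binSize
}

// formatCountryStats renders per-country connection counts as a
// "cc=count,cc=count" string, sorted by country code, with every count
// binned first.
func formatCountryStats(counts map[string]int, binSize int) string {
	codes := make([]string, 0, len(counts))
	for cc := range counts {
		codes = append(codes, cc)
	}
	sort.Strings(codes)
	parts := make([]string, 0, len(codes))
	for _, cc := range codes {
		parts = append(parts, fmt.Sprintf("%s=%d", cc, binCount(counts[cc], binSize)))
	}
	return strings.Join(parts, ",")
}

func main() {
	// Hypothetical raw counts of domain-fronted client connections.
	counts := map[string]int{"cn": 13, "ir": 1, "us": 16}
	fmt.Println(formatCountryStats(counts, 8)) // cn=16,ir=8,us=16
}
```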

 Should we already be archiving this data?

 > Some of the metrics mentioned above will be easier to implement than
 others. The best place to collect statistics is at the broker, but some of
 the data mentioned would require proxies to report metrics to the broker
 for collection. We have to be a bit careful with this since anyone can run
 a proxy. It will also impact the decisions we make for #29207.

 We collect a lot of statistics at relays and bridges, which anyone can
 run. We are working on methods of improving robustness against these
 statistics being manipulated, but so far have not detected anyone
 reporting abnormal values. It helps to have criteria, based on the
 statistics others report, for what you would expect to see, so that
 anomalies can be detected. For example, we would expect each relay's
 bandwidth usage to be proportional to its consensus weight.

 > > I would also be interested in stats about users and usage (including
 e.g. number of users being handled divided by number of snowflakes
 handling them)
 >
 > This is a bit tricky. The broker knows which proxies it hands out to
 users but doesn't know the state of the clients' connections to those
 proxies (e.g., when they have been closed). It's also worth noting that
 different "types" of proxies (standalone vs. browser-based) can handle a
 different number of users at once. Perhaps a more useful metric would be
 for snowflake proxies to advertise to the broker how many available
 "slots/tokens" they have when they poll for clients. This could be added
 to the broker--proxy WebSocket protocol. It would also avoid collecting
 more data on clients, which is generally safer.

 This sounds like a reasonable approach. You might want to take a look at:

 * https://research.torproject.org/techreports/countingusers-2010-11-30.pdf
 * https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf

 This will give you an idea of how we do this for other parts of Tor.

 > > How many times are you giving snowflakes out? How many times did you
 stop giving a snowflake out because you've given it out so many times
 already? These questions tie into the address distribution algorithm
 question.

 Can this also be an indirect measurement of number of users?
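 One technique in the spirit of those reports counts requests rather than users and divides by an assumed average number of requests per user per day. Applied to the broker, snowflake hand-outs could stand in for requests. A minimal Go sketch; `estimateDailyUsers` is hypothetical, and the per-user request rate is not known for Snowflake, so the value in the example is a placeholder:

```go
package main

import (
	"errors"
	"fmt"
)

// estimateDailyUsers converts a daily count of requests into an approximate
// number of daily users, assuming each user makes requestsPerUser requests
// per day on average. The result is only as good as that assumption.
func estimateDailyUsers(requests int, requestsPerUser float64) (float64, error) {
	if requests < 0 || requestsPerUser <= 0 {
		return 0, errors.New("requests must be >= 0 and requestsPerUser > 0")
	}
	return float64(requests) / requestsPerUser, nil
}

func main() {
	// Hypothetical: 2400 snowflake hand-outs in one day, and a placeholder
	// assumption of 4 broker requests per user per day.
	users, err := estimateDailyUsers(2400, 4)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(users) // 600
}
```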

 > The above comment addresses this as well. The broker doesn't really
 decide whether it has given a snowflake out too many times. I think
 what matters more for deciding whether we are distributing proxies well
 is measuring how "reliable" individual proxies have been in the past.
 This is related to setting up persistent identifiers (#29260).

 For relays, directory authorities track the mean time between failures,
 and we track this in Tor Metrics too.

 > It might also be interesting to have some kind of proxy diversity metric
 (e.g., whether 90% of all connections are handled by the same proxy). We
 can get some idea with persistent identifiers (#29260), but of course
 using a persistent identifier will always be optional. We can also
 collect GeoIP country stats of proxies.

 We don't really have this metric for relays yet, so if you have ideas that
 would be applicable to relays too then that would be great. We know about
 country/AS distribution, but we haven't quantified the diversity using any
 particular formula.

 > - Log all of the statistics in a reasonable format

 This would ideally be a format that Tor Metrics is already handling. If it
 could be based on the Tor directory protocol meta-format (§1.2 of dir-spec)
 then that would be great. We don't want to bring in dependencies for
 parsing yaml/toml/etc. if we can help it.

 > - coordinate with the metrics team to get these metrics collected and
 visualized somewhere

 Please also coordinate on what you want to collect, so we can consider if
 that information already comes from somewhere, if we already had a plan
 for it, and if it is safe or not.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21315#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

