Scaling Tor Metrics

25 Nov 2015

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello devs,

The Tor Metrics website [0] claims to be "the primary place to learn
interesting facts about the Tor network" and invites its visitors who
"come across something that is missing" to contact the website authors
about it.  That's a bold statement I put there! :)

Yet, there's a considerable product backlog of possible enhancements
[1] that never seems to get any shorter.  Even worse, the backlog can
be expected to refill quickly once the community notices that feature
requests are suddenly being considered.  The main reason for this
unfortunate situation is that Tor Metrics contains many moving parts,
including some heavy database lifting that takes place below the
surface, that all need to be maintained.  Adding more parts just makes
the whole thing even more likely to break.  At the same time, it's
painful to know that Tor Metrics has become almost closed to
contributions.

This posting discusses possible solutions.  The goal is to let Tor
Metrics grow in a healthy fashion that encourages contributions from
the community.  These solutions are not mutually exclusive, and the
best one may combine parts of several of the solutions sketched out
here.

 1 Make Tor Metrics better and bigger, internally

The obvious solution is that the maintainers of Tor Metrics could just
work harder to overcome the problems stated above.  Let's think this
through.

 1.1 Add more development resources

If only the current Tor Metrics maintainers had more time to devote to
cleaning up existing parts and adding new parts, that would solve our
problem.  They could refactor parts that are hard to maintain, and
they could work off the serious backlog that has piled up.  Of course,
this means dropping or handing over responsibilities for other
products, and it may mean finding (and paying) new developers to help
maintain Tor Metrics.  It's unclear whether anything like this would
fit into Tor's budget, and whether the changed priorities would upset
users of the tools that had to be dropped or handed over.

 1.2 Rewrite internal parts of Tor Metrics to encourage external
contributions

Most of Tor Metrics would have run 10 or 15 years ago with only minor
modifications.  It's not necessarily a bad thing to use established
technologies.  But maybe, if we rewrite it using modern
data-processing, web, and visualization frameworks, it becomes more
attractive to other developers to contribute code and help maintain
existing (well, then rewritten) code.  The result would be a larger
Tor Metrics website that is easier to maintain and hopefully
maintained by more people.  It's unclear how realistic this plan is,
though, and it requires attention from the Tor Metrics maintainers to
get the code into good enough shape for external contributors to get
involved.

 2 Add more ways to contribute to Tor Metrics externally

It may be possible to further grow Tor Metrics without adding more
code to it, hence not making it any harder to maintain.  However, if
code to generate visualizations is run elsewhere, there's a certain
risk that results are not perceived to be as trustworthy as if that
code were run as part of Metrics.  This is primarily a problem of
setting user expectations right.  We could add different ways to
contribute to Tor Metrics, depending on the level of commitment that
contributors are willing to make.  Possible new ways (in addition to
filing a Trac ticket, which is already possible, though not very
effective) are:

 2.1 Accept contribution of static data or static graphs

Somebody might contribute data (in a tarball, download link, etc.) or
a static graph (static as in "doesn't break, ever", not "static HTML
with a tiny amount of JavaScript that will surely never break").  The
Tor Metrics team reviews that and puts it on the Tor Metrics website,
together with a short description, author information, license, etc.
There are plenty of visualizations on Trac and on the mailing lists,
so we'll have to define criteria for what we add and what we don't,
and we'll need a good process for making that happen.

 2.2 Link to external websites

Somebody might write a website that visualizes Tor network data.  The
Tor Metrics team reviews the idea behind it, but doesn't necessarily
look at its code, and adds a link to the external website on Tor
Metrics.  This makes it obvious that the authors remain responsible
for their visualization, so there's no risk involved for Tor Metrics,
but users may not trust it as much, because it doesn't carry the Tor
Metrics label.  Note that we're already taking this approach by
linking to the visualizations showing "Tor users as percentage of
larger Internet population" [2] and "Data flow in the Tor network"
[3].  Also note that we could just as well have hosted the former
directly on Tor Metrics with appropriate attribution, because it's a
static image.  This is not the case with the latter.

 2.3 Run an externally developed website as if it were part of Tor Metrics

Let's imagine that somebody produces a visualization of Tor network
data and would like to make it part of Tor Metrics but without
limiting themselves to the technology used by Tor Metrics.  We could
let them write their visualization as a website and integrate it into
Tor Metrics after reviewing its code.

Technically, part of this integration would be to "redress" the
website by applying the Tor Metrics design (which has lots of room for
improvement, but let's just say the result will look as seamlessly
integrated into Tor Metrics as the "Network bubble graphs" [4]).
Another part would probably be to rewrite web requests, so that users
still think they're talking to https://metrics.torproject.org/, but
really they're talking to another webserver behind that.
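
To make the request rewriting idea a bit more concrete, here's a
minimal sketch of the mapping step only, not of a full reverse proxy.
The path prefixes, the backend addresses, and the function name
rewrite() are all made up for illustration; nothing like this exists
in Tor Metrics today.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from public path prefixes on the Metrics site to
# internal backends.  Both entries are assumptions for illustration.
BACKENDS = {
    "/bubbles": "http://127.0.0.1:8081",
    "/data-flow": "http://127.0.0.1:8082",
}


def rewrite(public_url):
    """Return the backend URL for a public URL, or None if not proxied."""
    parts = urlsplit(public_url)
    for prefix, backend in BACKENDS.items():
        # Match the prefix itself or anything below it, but not a path
        # like "/bubbles-old" that merely shares the leading characters.
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            target = urlsplit(backend)
            rest = parts.path[len(prefix):] or "/"
            # Keep the query string, drop the fragment (never sent to
            # servers anyway).
            return urlunsplit(
                (target.scheme, target.netloc, rest, parts.query, "")
            )
    return None
```

In practice this mapping would more likely live in the front-end
webserver's proxy configuration than in application code; the sketch
only shows that the public URL stays the same while the request is
answered by a different server.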

Regarding hosting and maintenance, in theory, the website could be
hosted by the original creators, but that effectively means that the
Tor Metrics team gives up part of its control over what's on the Tor
Metrics website.  The creators of the external website could change
parts or add new parts that wouldn't be reviewed by Tor Metrics
developers, yet those changes would be perceived as part of Metrics,
which seems bad.  Alternatively, the Tor Metrics team could run the
externally developed website on a separate host or on the same host as
Tor Metrics.  We could imagine variants where the original creators
stay around to fix any issues as they come up, or where they donate
their visualization, which the Tor Metrics people then maintain.  We
could even imagine that the Tor Metrics maintainers some day decide to
integrate the originally external website into Tor Metrics proper, but
that would not be required for this model to work.

All these ideas require writing down guidelines, criteria, and
processes.  In particular, they require more thought and input from
other people who are not currently involved in Tor Metrics maintenance
and who can be expected to be more objective.  And once these ideas
are implemented, we'll need more than just one Tor Metrics maintainer.

What are your thoughts?

All the best,
Karsten

[0] https://metrics.torproject.org/

[1]
https://trac.torproject.org/projects/tor/query?status=!closed&component=Metr...

[2] https://metrics.torproject.org/oxford-anonymous-internet.html

[3] https://metrics.torproject.org/uncharted-data-flow.html

[4] https://metrics.torproject.org/bubbles.html

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJWVdmJAAoJEJD5dJfVqbCrmlEH/j6IjRNEkXzRVJBIBcFMKIwR
eAA958Zg+DCzKPHI6Y7KZ/jGCHMP21r+YGIFevbJV4LDos9D2G0RP681/5PK7/dW
if4Pz0xhl/LqbhLoOSqU2wGJG+GdWgjuTxO1TBMPUhYK71lwJsZ0cUzbNee7iAqJ
zjdl3W3o4XJma0ZjZFH/gVVPbemvQHftrO8d0v7L2gFHfXmJg0kEwSZ4lJW0hfOx
wIizEadFSx9/7CrINvbcHbyck0N+DRSfRYyMNSjpyFbnYI7HjYk7/+jvLWALEYZT
0MZbuL/zl/PFpkTDNY0jzE5fPaKjtS2pEoZn85Wn1+0kCjeFLE/Hvulvn6ZFxsA=
=n5p1
-----END PGP SIGNATURE-----

Karsten Loesing
