Hello devs,
the Tor Metrics website [0] claims to be "the primary place to learn interesting facts about the Tor network" and invites its visitors who "come across something that is missing" to contact the website authors about it. That's a bold statement I put there! :)
Yet, there's considerable product backlog with possible enhancements [1] that doesn't seem to ever become shorter. Even worse, it can be expected that the backlog will refill quickly once the community notices that feature requests are suddenly considered. The main reason for this unfortunate situation is that Tor Metrics contains many moving parts, including some heavy database lifting that takes place below the surface, that all want to be maintained. Adding more parts just makes the whole thing even more likely to break. At the same time, knowing about the situation that Tor Metrics has become almost closed to contributions is painful.
This posting shall discuss possible solutions. The goal is to let Tor Metrics grow in a healthy fashion that encourages contributions from the community. These solutions are not mutually exclusive, and the best solution may use parts of more than one solution sketched out here.
1 Make Tor Metrics better and bigger, internally
The obvious solution is that the maintainers of Tor Metrics could just work harder to overcome the problems stated above. Let's think this through.
1.1 Add more development resources
If only the current Tor Metrics maintainers had more time to devote to cleaning up existing parts and to add new parts, that would solve our problem. They could refactor parts that are hard to maintain, and they could work off the serious backlog that has piled up. Of course, this means dropping or handing over responsibilities for other products, and it may mean finding (and paying) new developers to help maintain Tor Metrics. It's unclear whether anything like this would fit into Tor's budget, and whether these changed priorities would make users of tools that had to be dropped or handed over unhappy.
1.2 Rewrite internal parts of Tor Metrics to encourage external contributions
Most of Tor Metrics would have run 10 or 15 years ago with only minor modifications. It's not necessarily a bad thing to use established technologies. But maybe, if we rewrite it using modern data-processing, web, and visualization frameworks, it becomes more attractive to other developers to contribute code and help maintain existing (well, then rewritten) code. The result would be a larger Tor Metrics website that is easier to maintain and hopefully maintained by more people. It's unclear how realistic this plan is, though, and it requires attention by Tor Metrics maintainers to bring it enough into shape for external contributors to get involved.
2 Add more ways to contribute to Tor Metrics externally
It may be possible to further grow Tor Metrics without adding more code to it, hence not making it any harder to maintain. However, if code to generate visualizations is run elsewhere, there's a certain risk that results are not perceived as trustworthy as if that code were run as part of Metrics. This is primarily a problem of setting user expectations right. We could add different ways for contributing to Tor Metrics, depending on the level of commitment that contributors are willing to make. Possible new ways (in addition to filing a Trac ticket, which is already possible, though not very effective) are:
2.1 Accept contribution of static data or static graphs
Somebody might contribute data (in a tarball, download link, etc.) or a static graph (static as in "doesn't break, ever", not "static HTML with a tiny amount of JavaScript that will surely never break"). The Tor Metrics team reviews that and puts it on the Tor Metrics website, together with a short description, author information, license, etc. There are plenty of visualizations on Trac and on the mailing lists, so we'll have to define criteria what we add and what not, and we'll need a good process for making that happen.
2.2 Link to external websites
Somebody might write a website that visualizes Tor network data. The Tor Metrics team reviews the idea behind it, but does not necessarily look at its code, and adds an external link to Tor Metrics. It becomes obvious that the authors remain responsible for their visualization, so there's no risk involved for Tor Metrics, but users may not trust it as much, because it doesn't have the Tor Metrics label. Note that we're already taking this approach by linking to the visualizations showing "Tor users as percentage of larger Internet population" [2] and "Data flow in the Tor network" [3]. Also note that we could as well have hosted the former directly on Tor Metrics with appropriate attribution, because it's a static image. This is not the case with the latter.
2.3 Run an externally developed website as if it were part of Tor Metrics
Let's imagine that somebody produces a visualization of Tor network data and would like to make it part of Tor Metrics but without limiting themselves to the technology used by Tor Metrics. We could let them write their visualization as a website and integrate it into Tor Metrics after reviewing its code.
Technically, part of this integration would be to "redress" the website by applying the Tor Metrics design (which has lots of room for improvement, but let's just say the result will look as seamlessly integrated into Tor Metrics as the "Network bubble graphs" [4]). Another part would probably be to rewrite web requests, so that users still think they're talking to https://metrics.torproject.org/, but really they're talking to another webserver behind that.
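The request rewriting mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of how Tor Metrics is or would be deployed: the path prefixes, the backend addresses, and the function name `rewrite` are all hypothetical, and a real setup would more likely express this in the web server's own proxy configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from public path prefixes to internal backends.
# Neither the prefixes nor the addresses come from the real Metrics setup.
BACKENDS = {
    "/external/viz-a/": "http://127.0.0.1:8081/",
    "/external/viz-b/": "http://127.0.0.1:8082/",
}


def rewrite(public_url, backends=BACKENDS):
    """Map a public URL to the internal URL that should serve it.

    Returns None when no prefix matches, meaning the request is
    handled by the main website itself.
    """
    parts = urlsplit(public_url)
    # Try longer prefixes first so that nested prefixes resolve correctly.
    for prefix in sorted(backends, key=len, reverse=True):
        if parts.path.startswith(prefix):
            backend = urlsplit(backends[prefix])
            remainder = parts.path[len(prefix):]
            return urlunsplit((
                backend.scheme,
                backend.netloc,
                backend.path + remainder,
                parts.query,
                "",  # fragments are never sent to the server
            ))
    return None
```

The user only ever sees the public host name; which backend answers is decided by the path prefix alone, so reviewing the mapping table is enough to know what is reachable under the Tor Metrics name.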
Regarding hosting and maintenance, in theory, the website could be hosted by the original creators, but that effectively means that the Tor Metrics team gives up part of the control about what's on the Tor Metrics website. The creators of the external website could change parts or add new parts that wouldn't be reviewed by Tor Metrics developers, but they would be perceived as part of Metrics, which seems bad. The Tor Metrics team could run the externally developed website on a separate host or on the same host as Tor Metrics. We could imagine variants where the original creator stays around to fix any issues as they come up, or we could imagine that they donate their visualization that the Tor Metrics people will then maintain. We could even imagine that the Tor Metrics maintainers some day decide to integrate the originally external website into Tor Metrics proper, but that would not be required for this model to work.
All these ideas require writing down guidelines, criteria, and processes. In particular, they require more thoughts and input from other people who are not currently involved in Tor Metrics maintenance and who can be expected to be more objective. And once these ideas are implemented, we'll need more than just one Tor Metrics maintainer.
What are your thoughts?
All the best, Karsten
[0] https://metrics.torproject.org/
[1] https://trac.torproject.org/projects/tor/query?status=!closed&component=...
[2] https://metrics.torproject.org/oxford-anonymous-internet.html
[3] https://metrics.torproject.org/uncharted-data-flow.html
[4] https://metrics.torproject.org/bubbles.html
Karsten Loesing karsten@torproject.org writes:
Hello devs,
the Tor Metrics website [0] claims to be "the primary place to learn
<snip>
2.2 Link to external websites
Somebody might write a website that visualizes Tor network data. The Tor Metrics team reviews the idea behind it, but does not necessarily look at its code, and adds an external link to Tor Metrics. It becomes obvious that the authors remain responsible for their visualization, so there's no risk involved for Tor Metrics, but users may not trust it as much, because it doesn't have the Tor Metrics label. Note that we're already taking this approach by linking to the visualizations showing "Tor users as percentage of larger Internet population" [2] and "Data flow in the Tor network" [3]. Also note that we could as well have hosted the former directly on Tor Metrics with appropriate attribution, because it's a static image. This is not the case with the latter.
2.3 Run an externally developed website as if it were part of Tor Metrics
Let's imagine that somebody produces a visualization of Tor network data and would like to make it part of Tor Metrics but without limiting themselves to the technology used by Tor Metrics. We could let them write their visualization as a website and integrate it into Tor Metrics after reviewing its code.
Technically, part of this integration would be to "redress" the website by applying the Tor Metrics design (which has lots of room for improvement, but let's just say the result will look as seamlessly integrated into Tor Metrics as the "Network bubble graphs" [4]). Another part would probably be to rewrite web requests, so that users still think they're talking to https://metrics.torproject.org/, but really they're talking to another webserver behind that.
Regarding hosting and maintenance, in theory, the website could be hosted by the original creators, but that effectively means that the Tor Metrics team gives up part of the control about what's on the Tor Metrics website. The creators of the external website could change parts or add new parts that wouldn't be reviewed by Tor Metrics developers, but they would be perceived as part of Metrics, which seems bad. The Tor Metrics team could run the externally developed website on a separate host or on the same host as Tor Metrics. We could imagine variants where the original creator stays around to fix any issues as they come up, or we could imagine that they donate their visualization that the Tor Metrics people will then maintain. We could even imagine that the Tor Metrics maintainers some day decide to integrate the originally external website into Tor Metrics proper, but that would not be required for this model to work.
I find this idea of external graphs interesting and fun with a small potential for disaster.
If the external graphs are added with a strong indication of being "unofficial graphs made by third parties" or "experimental graphs", I think it might help in making them look less official.
Also, even if the graphs are hosted on a third party server, you can always remove the link from metrics, if they end up replacing the graph with a rickroll video or something. Of course, if we don't trust the third parties here and they are malicious, they could do this selectively in a way that we never notice.
Do we have a list of graphs and figures that we would like to include in Metrics but currently can't because they are hard to integrate into the current system? I can imagine the Uncharted graphs showing network activity being one of them. What else?
In any case, I liked the thread and I really appreciate we are thinking of scaling metrics for the future. It's really important!
On 25 Nov (16:53:45), Karsten Loesing wrote:
Hello devs,
the Tor Metrics website [0] claims to be "the primary place to learn interesting facts about the Tor network" and invites its visitors who "come across something that is missing" to contact the website authors about it. That's a bold statement I put there! :)
Yet, there's considerable product backlog with possible enhancements [1] that doesn't seem to ever become shorter. Even worse, it can be expected that the backlog will refill quickly once the community notices that feature requests are suddenly considered. The main reason for this unfortunate situation is that Tor Metrics contains many moving parts, including some heavy database lifting that takes place below the surface, that all want to be maintained. Adding more parts just makes the whole thing even more likely to break. At the same time, knowing about the situation that Tor Metrics has become almost closed to contributions is painful.
This posting shall discuss possible solutions. The goal is to let Tor Metrics grow in a healthy fashion that encourages contributions from the community. These solutions are not mutually exclusive, and the best solution may use parts of more than one solution sketched out here.
1 Make Tor Metrics better and bigger, internally
The obvious solution is that the maintainers of Tor Metrics could just work harder to overcome the problems stated above. Let's think this through.
1.1 Add more development resources
If only the current Tor Metrics maintainers had more time to devote to cleaning up existing parts and to add new parts, that would solve our problem. They could refactor parts that are hard to maintain, and they could work off the serious backlog that has piled up. Of course, this means dropping or handing over responsibilities for other products, and it may mean finding (and paying) new developers to help maintain Tor Metrics. It's unclear whether anything like this would fit into Tor's budget, and whether these changed priorities would make users of tools that had to be dropped or handed over unhappy.
1.2 Rewrite internal parts of Tor Metrics to encourage external contributions
Most of Tor Metrics would have run 10 or 15 years ago with only minor modifications. It's not necessarily a bad thing to use established technologies. But maybe, if we rewrite it using modern data-processing, web, and visualization frameworks, it becomes more attractive to other developers to contribute code and help maintain existing (well, then rewritten) code. The result would be a larger Tor Metrics website that is easier to maintain and hopefully maintained by more people. It's unclear how realistic this plan is, though, and it requires attention by Tor Metrics maintainers to bring it enough into shape for external contributors to get involved.
I'm not 100% familiar with the whole process of adding a graph to metrics, but I know a bit about the needed Java code and data source setup. For the graphs I work with (see http://ygzf7uqcusp4ayjs.onion), I decided to go with Munin for two reasons. First, the data sources for those graphs are on different machines (three for now), and Munin offers a _super_ easy way to add a remote node: the server just learns what has been deployed, gets the data out of it, and graphs it automatically without any added configuration. Second, I can use whatever language I want to generate those data points. In my case, I use stem extensively with Python.
So two things to consider here:
1) _easy_ way to add and deploy new graphs. By that I mean not requiring half a day from a metrics.tpo maintainer.
2) Have a way to decouple the data source collection from the graphing mechanism. I think metrics is quite good at that: it pulls CSV from collector.tpo (?), and then some Java/R programs graph it and generate an HTML page. I think Onionoo is a good tool in that direction (data source).
If we can turn that "Java/R" step into auto-discovery like Munin does, or into a very simple one-liner in a config file, or into a new script in a directory, it would be amazing. Furthermore, if a super epic graph developer wants to contribute, having a way to run the metrics.tpo framework locally on a dev machine so it's easy to test would be even more epic.
There are plenty of tools nowadays that can help us do that without reinventing all the things. Food for thought :).
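To make the "new script in a directory" idea a bit more concrete, here is a minimal sketch of what such auto-discovery could look like. It is an assumption-laden illustration, not how Munin or metrics.tpo actually work: the function names `discover` and `collect` are made up, plugins are assumed to be plain Python scripts, and the `field.value number` output format is only loosely modeled on what Munin plugins print.

```python
import subprocess
import sys
from pathlib import Path


def discover(plugin_dir):
    """Return every *.py script in plugin_dir, sorted by file name."""
    return sorted(Path(plugin_dir).glob("*.py"))


def collect(plugin_dir):
    """Run each discovered script and parse its 'field.value number' lines.

    Returns a dict mapping the script name (without .py) to a dict of
    field names and float values. Scripts that fail or time out are skipped.
    """
    results = {}
    for script in discover(plugin_dir):
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=60,
            )
        except subprocess.TimeoutExpired:
            continue  # a hanging plugin must not block the others
        if proc.returncode != 0:
            continue  # a broken plugin must not break the others
        fields = {}
        for line in proc.stdout.splitlines():
            key, _, value = line.partition(" ")
            if not key.endswith(".value"):
                continue  # ignore anything that is not a data point
            try:
                fields[key[: -len(".value")]] = float(value)
            except ValueError:
                pass  # skip malformed numbers instead of failing
        results[script.stem] = fields
    return results
```

With something like this, adding a data source means dropping one script into the directory; no configuration file has to be touched, and the same `collect` call can be run on a developer's laptop to test a new script before submitting it.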
2 Add more ways to contribute to Tor Metrics externally
It may be possible to further grow Tor Metrics without adding more code to it, hence not making it any harder to maintain. However, if code to generate visualizations is run elsewhere, there's a certain risk that results are not perceived as trustworthy as if that code were run as part of Metrics. This is primarily a problem of setting user expectations right. We could add different ways for contributing to Tor Metrics, depending on the level of commitment that contributors are willing to make. Possible new ways (in addition to filing a Trac ticket, which is already possible, though not very effective) are:
I would always have graphs generated on the metrics.tpo side. The data source for a graph could be a remote machine, but then you end up in the "security/authentication/trust" nightmare :S ...
If the entry bar for new graphs is super low, that is, if it is technically very easy to add a new one (both data source and graph), then someone could submit a new visualization (Trac ticket), and the metrics team reviews and merges it.
Adding a graph as a "patch" would greatly help avoid more work for the metrics team, but it needs to be easy and documented, and it must not require a complicated framework to run (or at least to test that the graph works for metrics.tpo).
2.1 Accept contribution of static data or static graphs
Somebody might contribute data (in a tarball, download link, etc.) or a static graph (static as in "doesn't break, ever", not "static HTML with a tiny amount of JavaScript that will surely never break"). The Tor Metrics team reviews that and puts it on the Tor Metrics website, together with a short description, author information, license, etc. There are plenty of visualizations on Trac and on the mailing lists, so we'll have to define criteria what we add and what not, and we'll need a good process for making that happen.
+1.
2.2 Link to external websites
Somebody might write a website that visualizes Tor network data. The Tor Metrics team reviews the idea behind it, but does not necessarily look at its code, and adds an external link to Tor Metrics. It becomes obvious that the authors remain responsible for their visualization, so there's no risk involved for Tor Metrics, but users may not trust it as much, because it doesn't have the Tor Metrics label. Note that we're already taking this approach by linking to the visualizations showing "Tor users as percentage of larger Internet population" [2] and "Data flow in the Tor network" [3]. Also note that we could as well have hosted the former directly on Tor Metrics with appropriate attribution, because it's a static image. This is not the case with the latter.
It comes down to trust here I would say. Like George said in his previous email, we always have the luxury of removing the link if some crazy shit appears after a while but also it could be a sneaky way to deliver malware to users :).
So I would argue for putting our effort into making metrics contributions so easy that we only need to link to external websites for insane stuff like https://torflow.uncharted.software (which we helped them with).
2.3 Run an externally developed website as if it were part of Tor Metrics
Let's imagine that somebody produces a visualization of Tor network data and would like to make it part of Tor Metrics but without limiting themselves to the technology used by Tor Metrics. We could let them write their visualization as a website and integrate it into Tor Metrics after reviewing its code.
Technically, part of this integration would be to "redress" the website by applying the Tor Metrics design (which has lots of room for improvement, but let's just say the result will look as seamlessly integrated into Tor Metrics as the "Network bubble graphs" [4]). Another part would probably be to rewrite web requests, so that users still think they're talking to https://metrics.torproject.org/, but really they're talking to another webserver behind that.
Regarding hosting and maintenance, in theory, the website could be hosted by the original creators, but that effectively means that the Tor Metrics team gives up part of the control about what's on the Tor Metrics website. The creators of the external website could change parts or add new parts that wouldn't be reviewed by Tor Metrics developers, but they would be perceived as part of Metrics, which seems bad. The Tor Metrics team could run the externally developed website on a separate host or on the same host as Tor Metrics. We could imagine variants where the original creator stays around to fix any issues as they come up, or we could imagine that they donate their visualization that the Tor Metrics people will then maintain. We could even imagine that the Tor Metrics maintainers some day decide to integrate the originally external website into Tor Metrics proper, but that would not be required for this model to work.
It goes back a bit to the third-party discussion above.
All these ideas require writing down guidelines, criteria, and processes. In particular, they require more thoughts and input from other people who are not currently involved in Tor Metrics maintenance and who can be expected to be more objective. And once these ideas are implemented, we'll need more than just one Tor Metrics maintainer.
I would be very interested in hearing from people who actually use or develop visualization tools nowadays, and in how we could make a transition to something much more fit for external contributions.
What about also a blog post on all of this?
Cheers! David
Since the Berlin dev meeting I’ve been working on setting up and feeding an "analytics server" that will provide a Big Data infrastructure filled with raw metrics descriptors ready for consumption by anybody who’s interested in finding what’s in the Tor Metrics data. That obviously affects my take on Scaling Tor Metrics.
The whole
You haven’t mentioned the backend very much - the data gathering machinery, the handful of parsers (in as many different languages), the storage. Are you content with all that? You seem to assume that if the frontend is in better shape it will attract more people to the backend. Maybe, but I’m not sure. At least I found your little Antlr excursion recently pretty neat.
So far I haven’t succeeded in getting an overview of all the parts in the backend that actually make up the Metrics infrastructure today. The number of projects listed in the Roadmap working draft is overwhelming. I see suspiciously many projects with some overlap here and there, a handful of parsers, a lot of attributes gathered. I wonder how much streamlining these offerings alone would reduce the workload. I’d like to understand which parts are the essential core, which provide valuable services to lots of people, and which are special interest, nice to have, or all but abandoned (usage-wise). I have an idea of how a Big Data infrastructure could help metrics scale better in several directions, but I can’t say how it would fit into the whole machinery. Having a clearer overview of the essential parts, the languages they are written in, and the amount of maintenance they require would help.
Data offerings
The main asset of Tor Metrics is the data. Making that data easily available should be paramount. That means:
1) downloadable from the website
2) in a format that's ready for consumption in popular tools (e.g. JSON)
3) raw (as it comes from CollecTor) as well as pre-aggregated for common use cases
4) well documented and tastefully arranged
5) live queryable (with a graphical interface)
Metrics provides 1) but until now neither 2) nor 3). A JSON converter is in the works, and a side effect of the analytics server should be the generation of ‘popular’ aggregates. But the analytics server is a project that will require some effort in administration and maintenance if it is to become more than an experiment. 4) is not out of reach: the spec actually goes a long way (but not all of it). 5) may not be achievable in the short term - BUT would be great!
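As a rough illustration of points 2) and 3), the following sketch turns a small CSV file into JSON that contains both the raw rows and one pre-aggregated view. Everything specific in it is an assumption made for the example: the column names `date`, `country` and `users`, the function name `csv_to_json`, and the shape of the output are not taken from the actual Metrics or CollecTor formats.

```python
import csv
import io
import json
from collections import defaultdict


def csv_to_json(csv_text):
    """Convert CSV text into a JSON document with raw and aggregated data.

    Assumes hypothetical columns 'date', 'country' and 'users'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Pre-aggregate for a common question: total users per day.
    totals = defaultdict(int)
    for row in rows:
        totals[row["date"]] += int(row["users"])

    document = {
        "raw": rows,                      # unchanged rows, values as strings
        "users_per_day": dict(totals),    # one ready-made aggregate
    }
    return json.dumps(document, sort_keys=True)
```

A web developer could feed the `users_per_day` part straight into a charting library, while anybody who needs a different breakdown still has the `raw` rows in the same file.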
PostgreSQL seems to be an important backbone of the Tor Metrics infrastructure. Like any RDBMS it is optimized for retrieving specific single datasets whereas Big Data analytics solutions are optimized for finding patterns and structures in large amounts of data. Tor Metrics' problem space clearly belongs to the latter category although data volume is on the edge between the realms of RDBMS and Big Data. The tools that we will deploy on the analytics server provide interfaces in R, Java, Python, Scala and even SQL. They are also available as standalone versions for your laptop - no need to set up a Hadoop cluster to work with them locally on some downloaded chunk of JSON. If the experiment with the analytics server plays out well a switch from RDBMS/PostgreSQL to Big Data/Hadoop should be seriously considered. There is one problem though: all that BigData/Hadoop stuff is _not_ in Debian stable.
I would suggest investing some time in better documentation of the workings of Tor and how the data gathered reflects its workings. I also suspect that some of the data gathered is not used any more, or never was, or is used so seldom that it could be moved into a “misc” section, clearing the view on central aspects. This may sound arrogant and I sure was a little lazy in my research on this matter, but I also started thinking about documentation for the analytics server recently - because new users will probably need some advice on where to find what and how to find something else - and the prospect of documenting all those fields and attributes is slightly scary. New users might feel scared (away) too if we don’t manage to prioritize information and move a good part of it into the background (of the documentation, not of the database).
Visualizations
Visualizations are just one way of providing access to the data, both very selective and very succinct (if done right). Preproduced visualizations are very useful as answers to common questions (“How many users…”) and to illustrate not so obvious procedures (“How many Authorities agree on which server flags…”), but they can only provide an entry point. I tried to construct a rather generic visualization tool with “Visionion” but so far I failed - hubris probably, and I’ll have to re-adjust. There are tools around in the data analytics world that work on the raw data interactively (provided that it’s in a popular format like JSON or CSV - so CollecTor data has to be converted first) and generate graphs on the fly, in a slice ’n dice way. ‘Tableau’ is such a tool; these tools are not cheap and there is still no open source offering (but there probably will be some day). And then there are one-off visualizations, handcrafted in a visually fine-tuned arrangement, made to illustrate specific aspects or prove a point - possibly static, with no live data or interactivity whatsoever.
Successful visualizations live mostly on the edges: they are either very generic or very specific, or the tools are expensive. There is very little middle ground, or rather: this middle ground is really hard to achieve.
Tor Metrics should produce some of the “most popular” and generally useful graphs and guarantee both their correctness and that they are up to date. That is low-hanging fruit because it is technically not very hard and brings a lot of benefit in perception. Beyond that it should concentrate on providing the data that others can use to drive their visualizations, in easy-to-consume formats (pre-aggregated JSON, again). When working on “Visionion” I spent much more time on aggregating the data than I had expected.
I very much like David Goulet’s approach of curating (and eventually integrating on the metrics website) visualization scripts that work on the data we provide and that are provided to us as patches. I think that is the right way to go. We will have to provide some essential graphs ourselves and can then wait for contributions of more interesting or experimental stuff. We will then “just” have to check that the code does indeed do what it claims to do. I’m not familiar with Munin though and I don’t even know what those .tpo’s he mentions are. As a web developer I would never choose the Java path but go with D3.js, which expects JSON, which Metrics would have to provide. D3 is the quasi-standard for visualizations on the web and a very solid solution. It is very close to the metal of web technologies but requires some work on the data beforehand, whereas Munin seems to be very close to the backend but to require some Java prowess. I don’t know which approach works out better in the long run and for the majority of people here.
To sum this up:
- Technically, some entry-level visualizations (number of users, bandwidth consumed) are low-hanging fruit and should be provided by us. It would just be too bad if they were totally missing from the website. And only we can provide them with the required authority.
- Providing the data in easily consumable formats and pre-aggregated aspects is key to spurring contributions. Visualizations can then be provided to us as patches.
- Supporting the right tools to encourage further contributions is a tricky question. Web developers are best served with JSON data and some D3 templates. For backend developers David Goulet’s proposal might be a good choice. But which group is more important? Can both be served? Is it worth the effort?
- Linking to some work others did is okay as long as we make clear that it's not “tested/approved/guaranteed” by us and as long as we do not include more than a small screenshot/appetizer.
Web Frontend
10 or 15 years with the same web framework - hmm ;-) For JavaScript frameworks that’s ages, but given that very few Tor developers are using JavaScript, why not choose one of the candidates in Java or Python, ‘Play!’ for example? But then again: isn’t a static site generator like Jekyll combined with a decent CSS framework (no, I’m not speaking of Bootstrap) enough? “Many contributors” smells a bit like a CMS, which I’m under the impression nobody wants. Cristobal/clv has made a proposal about how to combine a static site generator with git to support a multi-user scenario with different roles etc. Anyway: the main problem in this space is choices. The questions are: what exactly is needed? Who will maintain it? What language do they favor?
Ciao Thomas
On 25 Nov 2015, at 16:53, Karsten Loesing karsten@torproject.org wrote:
Signed PGP part Hello devs,
the Tor Metrics website [0] claims to be "the primary place to learn interesting facts about the Tor network" and invites its visitors who "come across something that is missing" to contact the website authors about it. That's a bold statement I put there! :)
Yet, there's considerable product backlog with possible enhancements [1] that doesn't seem to ever become shorter. Even worse, it can be expected that the backlog will refill quickly once the community notices that feature requests are suddenly considered. The main reason for this unfortunate situation is that Tor Metrics contains many moving parts, including some heavy database lifting that takes place below the surface, that all want to be maintained. Adding more parts just makes the whole thing even more likely to break. At the same time, knowing about the situation that Tor Metrics has become almost closed to contributions is painful.
This posting shall discuss possible solutions. The goal is to let Tor Metrics grow in a healthy fashion that encourages contributions from the community. These solutions are not mutually exclusive, and the best solution may use parts of more than one solution sketched out here.
1 Make Tor Metrics better and bigger, internally
The obvious solution is that the maintainers of Tor Metrics could just work harder to overcome the problems stated above. Let's think this through.
1.1 Add more development resources
If only the current Tor Metrics maintainers had more time to devote to cleaning up existing parts and to add new parts, that would solve our problem. They could refactor parts that are hard to maintain, and they could work off the serious backlog that has piled up. Of course, this means dropping or handing over responsibilities for other products, and it may mean finding (and paying) new developers to help maintain Tor Metrics. It's unclear whether anything like this would fit into Tor's budget, and whether these changed priorities would make users of tools that had to be dropped or handed over unhappy.
1.2 Rewrite internal parts of Tor Metrics to encourage external contributions
Most of Tor Metrics would have run 10 or 15 years ago with only minor modifications. It's not necessarily a bad thing to use established technologies. But maybe, if we rewrite it using modern data-processing, web, and visualization frameworks, it becomes more attractive to other developers to contribute code and help maintain existing (well, then rewritten) code. The result would be a larger Tor Metrics website that is easier to maintain and hopefully maintained by more people. It's unclear how realistic this plan is, though, and it requires attention by Tor Metrics maintainers to bring it enough into shape for external contributors to get involved.
2 Add more ways to contribute to Tor Metrics externally
It may be possible to further grow Tor Metrics without adding more code to it, hence not making it any harder to maintain. However, if code to generate visualizations is run elsewhere, there's a certain risk that results are not perceived as being as trustworthy as if that code were run as part of Metrics. This is primarily a problem of setting user expectations right. We could add different ways of contributing to Tor Metrics, depending on the level of commitment that contributors are willing to make. Possible new ways (in addition to filing a Trac ticket, which is already possible, though not very effective) are:
2.1 Accept contribution of static data or static graphs
Somebody might contribute data (in a tarball, download link, etc.) or a static graph (static as in "doesn't break, ever", not "static HTML with a tiny amount of JavaScript that will surely never break"). The Tor Metrics team reviews that and puts it on the Tor Metrics website, together with a short description, author information, license, etc. There are plenty of visualizations on Trac and on the mailing lists, so we'll have to define criteria for what we add and what we don't, and we'll need a good process for making that happen.
2.2 Link to external websites
Somebody might write a website that visualizes Tor network data. The Tor Metrics team reviews the idea behind it, but doesn't necessarily look at its code, and adds an external link to Tor Metrics. It is obvious to users that the authors remain responsible for their visualization, so there's no risk involved for Tor Metrics, but users may not trust it as much, because it doesn't have the Tor Metrics label. Note that we're already taking this approach by linking to the visualizations showing "Tor users as percentage of larger Internet population" [2] and "Data flow in the Tor network" [3]. Also note that we could as well have hosted the former directly on Tor Metrics with appropriate attribution, because it's a static image. This is not the case with the latter.
2.3 Run an externally developed website as if it were part of Tor Metrics
Let's imagine that somebody produces a visualization of Tor network data and would like to make it part of Tor Metrics but without limiting themselves to the technology used by Tor Metrics. We could let them write their visualization as a website and integrate it into Tor Metrics after reviewing its code.
Technically, part of this integration would be to "redress" the website by applying the Tor Metrics design (which has lots of room for improvement, but let's just say the result will look as seamlessly integrated into Tor Metrics as the "Network bubble graphs" [4]). Another part would probably be to rewrite web requests, so that users still think they're talking to https://metrics.torproject.org/, but really they're talking to another webserver behind that.
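The request-rewriting part could be sketched roughly as follows. This is only an illustration of the idea, not a description of how Tor Metrics is set up: the path prefixes, the backend hosts, and the function name `rewrite_target` are all hypothetical, and a real deployment would more likely do this in the web server's reverse-proxy configuration than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical routing table: public path prefix -> internal backend base URL.
# Neither the prefixes nor the hosts come from the real Tor Metrics setup.
BACKENDS = {
    "/bubbles/": "http://viz-host.internal:8080/",
    "/dataflow/": "http://other-host.internal:9000/app/",
}

# Hypothetical backend for everything that is part of Tor Metrics proper.
DEFAULT_BACKEND = "http://localhost:8000/"


def rewrite_target(public_url):
    """Map a public URL to the backend URL that should actually serve it."""
    parts = urlsplit(public_url)
    path = parts.path or "/"
    for prefix, backend in BACKENDS.items():
        if path.startswith(prefix):
            # Strip the public prefix and append the rest to the backend base.
            target = backend + path[len(prefix):]
            break
    else:
        # No external visualization matched: serve from the main site.
        target = DEFAULT_BACKEND + path.lstrip("/")
    if parts.query:
        target += "?" + parts.query
    return target


if __name__ == "__main__":
    print(rewrite_target("https://metrics.torproject.org/bubbles/index.html?x=1"))
    print(rewrite_target("https://metrics.torproject.org/graphs.html"))
```

The user only ever sees the public URL; which backend answers is decided by the table, which the Tor Metrics team would control.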
Regarding hosting and maintenance, in theory, the website could be hosted by the original creators, but that effectively means that the Tor Metrics team gives up part of the control over what's on the Tor Metrics website. The creators of the external website could change parts or add new parts that wouldn't be reviewed by Tor Metrics developers, but they would be perceived as part of Metrics, which seems bad. The Tor Metrics team could run the externally developed website on a separate host or on the same host as Tor Metrics. We could imagine variants where the original creator stays around to fix any issues as they come up, or we could imagine that they donate their visualization, which the Tor Metrics people will then maintain. We could even imagine that the Tor Metrics maintainers some day decide to integrate the originally external website into Tor Metrics proper, but that would not be required for this model to work.
All these ideas require writing down guidelines, criteria, and processes. In particular, they require more thoughts and input from other people who are not currently involved in Tor Metrics maintenance and who can be expected to be more objective. And once these ideas are implemented, we'll need more than just one Tor Metrics maintainer.
What are your thoughts?
All the best, Karsten
[0] https://metrics.torproject.org/
[1] https://trac.torproject.org/projects/tor/query?status=!closed&component=...
[2] https://metrics.torproject.org/oxford-anonymous-internet.html
[3] https://metrics.torproject.org/uncharted-data-flow.html
[4] https://metrics.torproject.org/bubbles.html
+ thomas lörtsch + hospitalstr. 95 + d 22767 hamburg +49 173 202 71 99 + tl@rat.io + tomlurge@someOtherServices
On 28 Nov 2015, at 01:22, thomas lörtsch tl@rat.io wrote:
I don’t even know what those .tpo’s he mentions are.
.tpo is an abbreviation for .torproject.org (http://torproject.org/): a server under the Tor Project's domain.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
teor at blah dot im OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
hi, i want to share some thoughts from a person who is mostly into data visualization and not the backend part.
# making data easy to access
that is mandatory! for me that means not just having access to a csv or json file with the data for a limited time period. it also means that there is information about how to access a dataset via an api call (rest and streaming). another great advantage for the graph pages on the metrics site would be to show the api call for the data behind each graph. new contributors would then have an easy entry point for exploring the possibilities of metrics data and data sources.
let's take the example of relays and bridges in the network.
* one possible graph (like the existing graph) is just the distribution over time, maybe reduced to the period of time around an event that occurred and that you want to explain in a blogpost or whatever. you will be fine with the csv/json file.
* another visualization would be a map with all bridges and relays located in a country (csv or rest api call). as a nice feature you might want to make relays/bridges visible that went offline or came online in real time; then you need a streaming api that provides that information.
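A minimal sketch of the two cases above, under stated assumptions: the two snapshots are made-up sample data, the field names (`fingerprint`, `country`, `running`) are only loosely modeled on Onionoo details documents, and the helper names are hypothetical. Counting per country is what a map needs; comparing two snapshots is a poor man's substitute for a streaming api.

```python
from collections import Counter

# Made-up snapshots of the relay list at two points in time.
earlier = [
    {"fingerprint": "AAAA", "country": "de", "running": True},
    {"fingerprint": "BBBB", "country": "us", "running": True},
    {"fingerprint": "CCCC", "country": "de", "running": False},
]
later = [
    {"fingerprint": "AAAA", "country": "de", "running": False},
    {"fingerprint": "BBBB", "country": "us", "running": True},
    {"fingerprint": "CCCC", "country": "de", "running": True},
    {"fingerprint": "DDDD", "country": "fr", "running": True},
]


def running_set(snapshot):
    """Fingerprints of all relays marked as running in a snapshot."""
    return {relay["fingerprint"] for relay in snapshot if relay["running"]}


def per_country(snapshot):
    """Number of running relays per country code, e.g. for a map."""
    return Counter(relay["country"] for relay in snapshot if relay["running"])


def changes(before_snapshot, after_snapshot):
    """Relays that came online or went offline between two snapshots."""
    before = running_set(before_snapshot)
    after = running_set(after_snapshot)
    return {
        "came_online": sorted(after - before),
        "went_offline": sorted(before - after),
    }


if __name__ == "__main__":
    print(per_country(later))
    print(changes(earlier, later))
```

With a real streaming api the second function would not be needed, because the events would arrive directly instead of being reconstructed from periodic downloads.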
limiting the access to data also means limiting the possible outcome of a visualization and the creativity of people who want to contribute new things or improvements. The value of good documentation was pointed out by Thomas (I really like the clear structure of the onionoo protocol [0]).
# Visualizations and Metrics site in general
I'm (currently) just developing visualizations that are interactive and on the web. That means using html, css, js (d3js) and a lot of preprocessing of the data in nodejs on my local machine (as Thomas also pointed out).
I absolutely agree with Thomas about the low-hanging fruit of nice data visualizations for important and popular graphs. Is there an existing guideline about what kind of technologies are allowed and used for essential parts of the Metrics site? Looks like the page is generated from Java?
The current Metrics site is also more like a list of links and 'hides' the graphs. I think a redesign would be helpful, something like the example gallery of d3js [1]. You could promote the graphs better and add some categories, e.g. 'Metrics Data on External Sites', or even give the pictures a badge 'contains javascript' so that it's clear to users that they leave the tor website or may need javascript for the interactive version of a graph.
Another thought about contributing visualizations or data could be a github repository with all the files. Someone can review everything and choose whether this should be a part of metrics or a link to an external site. The gallery could then easily be updated/extended via a pull request with information about the new visualization (screenshot, description, github, link to the site). The pull request can also work in the other direction if the site is no longer trusted. I also like the idea of using github gist [2] for contributing and sharing visualizations. The inventor of d3js built a site for showing these gists in a gallery [3,4]. But i don't know how difficult that is to reimplement; the code for the server is not online, i think.
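To make the pull-request idea a bit more concrete, here is a small sketch of how a gallery entry could be checked before review. Everything in it is an assumption for illustration: the entry is a plain dictionary, the four required fields are simply the ones listed above, `contains_javascript` is a hypothetical flag for the badge idea, and the example values are placeholders.

```python
# Fields every gallery entry should carry, taken from the list above.
REQUIRED_FIELDS = ("screenshot", "description", "github", "link")


def missing_fields(entry):
    """Return the required fields that are absent or empty in an entry."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(entry.get(field, "")).strip()
    ]


def badge(entry):
    """Text of the badge shown next to the screenshot, if any."""
    # 'contains_javascript' is a hypothetical flag, not an existing format.
    return "contains javascript" if entry.get("contains_javascript") else ""


if __name__ == "__main__":
    # Placeholder entry; the values are not real files or sites.
    entry = {
        "screenshot": "relay-map.png",
        "description": "Map of relays and bridges per country",
        "github": "example-user/relay-map",
        "link": "https://example.org/relay-map/",
        "contains_javascript": True,
    }
    print(missing_fields(entry))
    print(badge(entry))
```

A reviewer would still have to look at the visualization itself; a check like this only makes sure that the gallery never shows an entry without a description or a link.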
I would also like to offer my help with more visualizations (especially the low-hanging fruit) and with the redesign of the website, if that would be an option.
Letty
[0] https://onionoo.torproject.org/protocol.html
[1] https://github.com/mbostock/d3/wiki/Gallery
[2] https://gist.github.com/
[3] http://bl.ocks.org/
[4] http://bl.ocks.org/mbostock