Hi Kostas. Now that we no longer need to worry about accidentally leaking the GSoC selection, we can talk more openly about your project. Below is an exchange between me and Karsten - thoughts?
---------- Forwarded message ----------
From: Karsten Loesing karsten@torproject.org
Date: Thu, May 23, 2013 at 11:37 AM
Subject: Re: Metrics Plans
To: Damian Johnson atagar@torproject.org
Cc: Tor Assistants tor-assistants@lists.torproject.org
On 5/23/13 7:22 PM, Damian Johnson wrote:
Hi Karsten. I just finished reading over Kostas' proposal and while it looks great, I'm not sure if I fully understand the plan. A few clarifying questions...
- What descriptor information will his backend contain? Complete descriptor attributes (i.e., all the attributes from the documents), or only what we need? His proof-of-concept importer [1] only contains a subset but that's, of course, not necessarily where we're going.
If we're aiming for this to be the 'grand unifying backend' for Onionoo, ExoneraTor, Relay Search, etc., then it seems like we might as well aim for it to be complete. But that naturally means more work with schema updates as descriptors change...
This GSoC idea started a year back as a descriptor search application, totally unrelated to Onionoo. It was when I read Kostas' proposal that I started thinking about an integration with Onionoo. That's why the plan is still a bit vague. We should work together with Kostas very soon to clarify the plan.
- The present relay search renders raw router status entries. Does it actually store the text of the router status entries within the database? With the new relay search I suppose we'll be retrieving the attributes rather than raw descriptor text, is that right?
The present relay search and ExoneraTor store raw text of router status entries in their databases. But that doesn't mean that the new relay search needs to do that, too.
- Kostas' proposal includes both the backend importing/datastore and also a Flask frontend for rendering the search results. In terms of the present tools diagram [2] I suppose that would mean replacing metrics-web-R and having a Python counterpart of metrics-db-R (with the aim of later deprecating the old metrics-db-R). Is that right?
Not quite. We cannot replace metrics-db-R yet, because that's the tool that downloads relay descriptors for all other services. It needs to run very reliably. Replacing metrics-db-R would be a different project. The good thing though is that metrics-db-R offers its files via rsync, so that's a very clean interface for services using its data.
In terms of the tools diagram, Kostas would write a second tool in the "Process" column above Onionoo that would feed two replacement tools for metrics-web-R and metrics-web-E. His processing tool would use data from metrics-db-R and metrics-db-E.
If his tool is supposed to replace more parts of Onionoo and not only replace relay search and ExoneraTor, it would use data from metrics-db-B and metrics-db-P, too.
Maybe we should focus on a 'grand unified backend' rather than splitting Kostas' summer between both a backend and frontend? If he could replace the backends of the majority of our metrics services then that would greatly simplify the metrics ecosystem.
I'm mostly interested in the back-end, too. But I think it won't be as much fun for Kostas if he can't also work on something that's visible to users. I don't know what he prefers though.
In my imagination, here's what the tools diagram looks like by the end of summer:
- Kostas has written an Onionoo-like back-end that allows searches for relays or bridges in our archives since 2007 and provides details for any point in the past. Maybe his tool will implement the existing Onionoo interface, so that Atlas and Compass can switch to using it instead of Onionoo.
- We'll still keep using Onionoo for aggregating bandwidth and weights statistics per relay or bridge, but Kostas' tool would give out that data.
- Thomas has written Visionion and replacements for metrics-web-N and metrics-web-U. You probably saw the long discussion on this list. This is a totally awesome project on its own, but it's sufficiently separate from Kostas' project (Kostas is only interested in single relays/bridges, whereas Thomas is only interested in aggregates).
I'm aware that not all of this may happen in one summer. That's why I'm quite flexible about plans. There are quite a lot of missing puzzle pieces in the overall picture, so people can start wherever they want and contribute something useful.
I was very, very tempted to start up a thread on tor-dev@ to discuss this but couldn't figure out a way of doing so without letting Kostas know that we're taking him on. If you can think of a graceful way of including him or tor-dev@ then feel free.
Let's wait four more days, if that's okay for you. Starting a new discussion there about this together with Kostas sounds like a fine plan.
This will be an exciting summer! :)
Best, Karsten
[1] https://github.com/wfn/torsearch/blob/master/tsweb/importer.py#L16
[2] https://metrics.torproject.org/tools.html