Karsten Loesing karsten.loesing at gmx.net
Thu Nov 17 06:51:40 UTC 2011

Hi Qiang,

I'm cc'ing tor-dev, because I think your original request to work on
Java codebases went there.

On 11/17/11 2:13 AM, Qiang Wang wrote:
> This is Qiang, and I am very interested in Tor project, and want to
> contribute something. I was wondering if I can provide any help about
> metrics project?
> Could you please let me know your recommendations?

Sure thing.  Three ideas come to mind:

- Improve the consensus-health script: We have a Java program that
downloads the network statuses from all eight directory authorities and
compares them to each other to detect problems.  One of the outputs is a
web page [0], another output is a status email [1], and a third is a
message sent to an IRC bot [2].  The code [3] needs some love.

- Answer the question of what fraction of exit relays exit from an IP
address other than the one in their descriptor.  This is a typical
question that can be answered using the metrics data [4] we have.
We'd want to publish the
analysis code in the metrics-tasks repository [5], so that others can
reproduce the results or refine the analysis.  There's already a lot of
Java code in that repository, because that's what I use when analyzing
metrics data.  We have plenty of other analysis questions similar to
this one, so if you don't like this one, take a look at the Analysis
component in our bug tracker [6].

- Implement an efficient relay-search database.  We have a web site for
searching relays by IP, nickname, or fingerprint [7], but it's really
slow.  The main problem is that the database was originally designed for
aggregating statistics about relays, not for searching relays.  I have
two ideas here: The first is to design a separate PostgreSQL database
[8] and go crazy with indexes, the second is to try out CouchDB for this
[9].  This task isn't really that Java-specific, except for the fact
that I'm using Java to import data and send queries.
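
To make the second idea a bit more concrete, here is a minimal sketch of
the core computation.  It assumes the descriptor addresses and the
observed exit addresses have already been parsed into maps keyed by relay
fingerprint.  The class name, method name, and input format are made up
for illustration; this is not code from the metrics-tasks repository, and
parsing the actual metrics data is left out entirely.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; names and input format are assumptions. */
final class ExitAddressAnalysis {

    /**
     * Returns the fraction of relays that have at least one observed exit
     * address differing from the address in their descriptor.  Relays
     * without any observed exit address are skipped, so the denominator
     * only counts relays we have exit observations for.
     */
    static double fractionWithDifferentExitAddress(
            Map<String, String> descriptorAddresses,
            Map<String, Set<String>> observedExitAddresses) {
        int considered = 0;
        int different = 0;
        for (Map.Entry<String, String> entry : descriptorAddresses.entrySet()) {
            Set<String> observed = observedExitAddresses.get(entry.getKey());
            if (observed == null || observed.isEmpty()) {
                continue; // no exit observations for this relay
            }
            considered++;
            for (String address : observed) {
                if (!address.equals(entry.getValue())) {
                    different++;
                    break; // count each relay at most once
                }
            }
        }
        return considered == 0 ? 0.0 : (double) different / considered;
    }

    public static void main(String[] args) {
        // Made-up fingerprints and documentation-range addresses.
        Map<String, String> descriptors = Map.of(
                "RELAY_A", "192.0.2.1",
                "RELAY_B", "192.0.2.2");
        Map<String, Set<String>> exits = Map.of(
                "RELAY_A", Set.of("192.0.2.1"),
                "RELAY_B", Set.of("198.51.100.7"));
        System.out.println(
                fractionWithDifferentExitAddress(descriptors, exits));
    }
}
```

Whether a relay with several exit addresses, only some of which differ,
should count as "different" is a design choice; the sketch counts it as
soon as one address differs.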

If anything sounds interesting to you, please let me know.  I have more
information about these tasks and can help you get started.  Also look
at the research section of the metrics website [10] to learn more.


[0] https://metrics.torproject.org/consensus-health.html


[2] See the IRC bot nsa in #tor-bots on OFTC, e.g., "< nsa> or:
[consensus-health] The following directory authorities set conflicting
or invalid consensus parameters: ides bwauthbestratio=1 bwauthcircs=0
bwauthdescbw=1 bwauthkp=10000 bwauthpid=1 bwauthtd=0 bwauthti=0
bwauthtidecay=5000 cbtnummodes=3 refuseunknownexits=1"


[4] https://metrics.torproject.org/data.html

[5] https://gitweb.torproject.org/metrics-tasks.git


[7] https://metrics.torproject.org/relay-search.html

[8] https://trac.torproject.org/projects/tor/ticket/2922

[9] https://trac.torproject.org/projects/tor/ticket/4440

[10] https://metrics.torproject.org/research.html

