Hi Qiang,
I'm cc'ing tor-dev, because I think your original request to work on Java codebases went there.
On 11/17/11 2:13 AM, Qiang Wang wrote:
> This is Qiang, and I am very interested in Tor project, and want to contribute something. I was wondering if I can provide any help about metrics project?
> Could you please let me know your recommendations?
Sure thing. Three ideas come to mind:
- Improve the consensus-health script: We have a Java program that downloads the network statuses from all eight directory authorities and compares them to each other to detect problems. One of the outputs is a web page [0], another output is a status email [1], and a third is a message sent to an IRC bot [2]. The code [3] needs some love.
- Answer the question of what fraction of exit relays exit from a different IP address than the one listed in their descriptor. This is a typical question that can be answered using the metrics data [4] we have. We'd want to publish the analysis code in the metrics-tasks repository [5], so that others can reproduce the results or refine the analysis. There's already a lot of Java code in that repository, because that's what I use when analyzing metrics data. We have plenty of other analysis questions similar to this one, so if you don't like this one, take a look at the Analysis component in our bug tracker [6].
- Implement an efficient relay-search database. We have a web site for searching relays by IP, nickname, or fingerprint [7], but it's really slow. The main problem is that the database originally was designed for aggregating statistics about relays, not for searching relays. I have two ideas here: The first is to design a separate PostgreSQL database [8] and go crazy with indexes, the second is to try out CouchDB for this [9]. This task isn't really that Java-specific, except for the fact that I'm using Java to import data and send queries.
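To make the second idea a bit more concrete, here is a minimal Java sketch of what the core of that analysis might look like. It assumes the descriptor addresses and the observed exit addresses have already been parsed into maps keyed by relay fingerprint; the parsing step, the class name ExitAddressMismatch, and the method name fractionWithDifferentExitAddress are all hypothetical and not taken from the metrics-tasks repository. Counting a relay as "different" as soon as any observed exit address differs from its descriptor address is also just one possible definition.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/* Hypothetical sketch; names and input format are assumptions. */
final class ExitAddressMismatch {

  /**
   * Returns the fraction of relays that were observed exiting from at
   * least one address other than the one in their descriptor.  Relays
   * without any observed exit address are skipped entirely.
   *
   * @param descriptorAddresses fingerprint -> address from the descriptor
   * @param observedExitAddresses fingerprint -> addresses seen when exiting
   */
  static double fractionWithDifferentExitAddress(
      Map<String, String> descriptorAddresses,
      Map<String, Set<String>> observedExitAddresses) {
    int compared = 0;
    int different = 0;
    for (Map.Entry<String, String> e : descriptorAddresses.entrySet()) {
      Set<String> observed = observedExitAddresses.get(e.getKey());
      if (observed == null || observed.isEmpty()) {
        continue; // nothing to compare against for this relay
      }
      compared++;
      for (String address : observed) {
        if (!address.equals(e.getValue())) {
          different++;
          break; // one mismatch is enough to count this relay
        }
      }
    }
    return compared == 0 ? 0.0 : (double) different / compared;
  }

  public static void main(String[] args) {
    /* Made-up example data using documentation address ranges. */
    Map<String, String> descriptors = new HashMap<>();
    descriptors.put("AAAA", "192.0.2.1");
    descriptors.put("BBBB", "192.0.2.2");
    descriptors.put("CCCC", "192.0.2.3");

    Map<String, Set<String>> observed = new HashMap<>();
    Set<String> a = new HashSet<>();
    a.add("192.0.2.1");       // same as descriptor
    observed.put("AAAA", a);
    Set<String> b = new HashSet<>();
    b.add("198.51.100.7");    // differs from descriptor
    observed.put("BBBB", b);
    // CCCC has no observation and is skipped.

    System.out.println("Fraction with different exit address: "
        + fractionWithDifferentExitAddress(descriptors, observed));
  }
}
```

A real analysis would also have to decide how to match descriptors and exit observations in time, which this sketch leaves out.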
If anything sounds interesting to you, please let me know. I have more information about these tasks and can help you get started. Also look at the research section of the metrics website [10] to learn more.
Best,
Karsten
[0] https://metrics.torproject.org/consensus-health.html
[1] https://lists.torproject.org/pipermail/tor-consensus-health/2011-November/00...
[2] See the IRC bot nsa in #tor-bots on OFTC, e.g., "< nsa> or: [consensus-health] The following directory authorities set conflicting or invalid consensus parameters: ides bwauthbestratio=1 bwauthcircs=0 bwauthdescbw=1 bwauthkp=10000 bwauthpid=1 bwauthtd=0 bwauthti=0 bwauthtidecay=5000 cbtnummodes=3 refuseunknownexits=1"
[3] https://gitweb.torproject.org/metrics-web.git/tree/HEAD:/src/org/torproject/...
[4] https://metrics.torproject.org/data.html
[5] https://gitweb.torproject.org/metrics-tasks.git
[6] https://trac.torproject.org/projects/tor/query?status=!closed&component=...
[7] https://metrics.torproject.org/relay-search.html
[8] https://trac.torproject.org/projects/tor/ticket/2922