[tor-dev] Incorporating your torsearch changes into Onionoo

Kostas Jakeliunas kostas at jakeliunas.com
Fri Oct 11 14:05:52 UTC 2013


On Fri, Oct 11, 2013 at 12:00 PM, Karsten Loesing <karsten at torproject.org> wrote:

> Hi Kostas,
>
> should we move this thread to tor-dev@?
>

Hi Karsten!

Sure.

From our earlier conversation about your GSoC project:
> > In particular, we should discuss how to integrate your project into
> > Onionoo.  I could imagine that we:
> >
> >  - create a database on the Onionoo machine;
> >  - run your database importer cronjob right after the current Onionoo
> > cronjob;
> >  - make your code produce statuses documents and store them on disk,
> > similar to details/weights/bandwidth documents;
> >  - let the ResourceServlet use your database to return the
> > fingerprints to return documents for; and
> >  - extend the ResourceServlet to support the new statuses documents.
> >
> > Maybe I'm overlooking something and you have a better plan?  In any
> > case, we should take the path that implies writing as little code as
> > possible to integrate your code in Onionoo.
>
> Let me know what you think!
>

Sounds good. Responding to particular points:

>  - create a database on the Onionoo machine;
>  - run your database importer cronjob right after the current Onionoo
> cronjob;

These should be no problem and make perfect sense. I think it's best to use
raw SQL table-creation routines to make sure the database looks exactly
like the one on the dev machine, rather than using SQLAlchemy abstractions
to do that (which I did before).
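To illustrate the raw-SQL approach, here is a minimal sketch (not the actual torsearch importer) of applying a schema script verbatim, so every database comes out identical. It uses stdlib sqlite3 for self-containedness; on the Onionoo machine this would run against the real backend, and the table and columns below are placeholders, not the real schema from [1].

```python
# Sketch: create the schema from one raw SQL script, rather than letting
# an ORM generate DDL that might differ between machines.
# The table/columns here are hypothetical placeholders.
import sqlite3

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS descriptor (
    fingerprint TEXT NOT NULL,
    nickname    TEXT,
    published   TEXT
);
CREATE INDEX IF NOT EXISTS idx_descriptor_fpr ON descriptor (fingerprint);
"""

def apply_schema(conn, schema_sql=SCHEMA_SQL):
    # executescript runs the whole statement batch as written
    conn.executescript(schema_sql)

conn = sqlite3.connect(":memory:")
apply_schema(conn)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
```

Because the same script text is executed everywhere, there is no risk of the ORM emitting slightly different DDL on different versions.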

The current SQL script to do that is at [1]; I'll look it over. For example,
I'd still like to generate some plots showing the probability of two
fingerprints having the same substring (this is for the intermediate
fingerprint table): one axis would be substring length, the other the
probability in (portions of) %. As of now we still use
substr(fingerprint, 0, 12), and that is reflected in the schema.
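As a back-of-the-envelope version of that plot: under the assumption that fingerprints are uniformly random hex strings, the chance that any two of n fingerprints share a k-character prefix follows the standard birthday approximation. The relay counts below are made-up examples, not measured figures.

```python
# Birthday-bound sketch: probability that at least two of n uniformly
# random fingerprints share the same k-character hex prefix.
import math

def collision_prob(n, k):
    m = 16 ** k  # number of distinct k-character hex prefixes
    # birthday approximation: 1 - exp(-n(n-1) / 2m)
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * m))

# e.g. a million historical fingerprints with the current 12-char prefix:
p = collision_prob(1_000_000, 12)
```

With 16^12 ≈ 2.8 * 10^14 possible prefixes, even a million fingerprints keep the collision probability well under a percent, which is why substr(fingerprint, 0, 12) looks safe; the plot would just sweep k to make that visible.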

Overall though, no particular snags here.

>  - make your code produce statuses documents and store them on disk,
> similar to details/weights/bandwidth documents;

Right, so if we are planning to support all v3 network statuses for all
fingerprints, how are we to store all the status documents? The idea is to
preprocess and serve static JSON documents, as in the current Onionoo,
correct? (Compare the idea of simply caching documents: if we serve a
particular status document, it gets cached, and depending on the query
parameters (a date-range restriction, e.g.) it may be set not to expire at
all.)
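The caching idea could look roughly like this (names are hypothetical, not the Onionoo/torsearch API): choose a Cache-Control value for a statuses response based on the query's date range. A range that ends in the past can never change, so it can be cached near-indefinitely; a range that includes the present should expire quickly.

```python
# Sketch: pick an HTTP Cache-Control header depending on whether the
# requested date range is entirely historical. Function and parameter
# names are illustrative only.
from datetime import datetime, timezone

def cache_header(range_end=None, long_max_age=31536000, short_max_age=3600):
    now = datetime.now(timezone.utc)
    if range_end is not None and range_end < now:
        # historical, immutable data: cache for up to a year
        return f"public, max-age={long_max_age}"
    # open-ended range, or one reaching into the present: re-fetch hourly
    return f"public, max-age={short_max_age}"
```

That way the expensive preprocessing only ever happens once per historical document.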

Or should we try and actually store all the statuses (the condensed status
document version [2], of course)?

>  - let the ResourceServlet use your database to return the
> fingerprints to return documents for; and
>  - extend the ResourceServlet to support the new statuses documents.

Sounds good. I assume you are very busy with other things as well, so
ideally you had in mind that I could try to do the Java part? :)
Though, since you are much more familiar with (your own) code, you could
probably do it faster than I could; I'm not sure.
Are there any particular technical issues/nuances here (re: ResourceServlet)?

cheerio
Kostas.

[1]: https://github.com/wfn/torsearch/blob/master/db/db_create.sql
[2]:
https://github.com/wfn/torsearch/blob/master/docs/onionoo_api.md#network-status-entry-documents
(e.g.
http://ts.mkj.lt:5555/statuses?lookup=9695DFC35FFEB861329B9F1AB04C46397020CE31&condensed=true
 )