[tor-dev] Incorporating your torsearch changes into Onionoo

Kostas Jakeliunas kostas at jakeliunas.com
Fri Oct 25 13:29:45 UTC 2013


On Wed, Oct 23, 2013 at 2:32 PM, Karsten Loesing <karsten at torproject.org> wrote:

> On 10/11/13 4:05 PM, Kostas Jakeliunas wrote:
>
> Oops!  Sorry for the delay in responding!  Responding now.
>
> > On Fri, Oct 11, 2013 at 12:00 PM, Karsten Loesing
> > <karsten at torproject.org> wrote:
> >
> >> Hi Kostas,
> >>
> >> should we move this thread to tor-dev@?
> >>
> >
> > Hi Karsten!
> >
> > sure.
> >
> >> From our earlier conversation about your GSoC project:
> >>> In particular, we should discuss how to integrate your project into
> >>> Onionoo.  I could imagine that we:
> >>>
> >>>  - create a database on the Onionoo machine;
> >>>  - run your database importer cronjob right after the current Onionoo
> >>> cronjob;
> >>>  - make your code produce statuses documents and store them on disk,
> >>> similar to details/weights/bandwidth documents;
> >>>  - let the ResourceServlet use your database to return the
> >>> fingerprints to return documents for; and
> >>>  - extend the ResourceServlet to support the new statuses documents.
> >>>
> >>> Maybe I'm overlooking something and you have a better plan?  In any
> >>> case, we should take the path that implies writing as little code as
> >>> possible to integrate your code in Onionoo.
> >>
> >> Let me know what you think!
> >>
> >
> > Sounds good. Responding to particular points:
> >
> >>  - create a database on the Onionoo machine;
> >>  - run your database importer cronjob right after the current Onionoo
> >> cronjob;
> >
> > These should be no problem and make perfect sense. It's probably best
> > to use raw SQL table-creation scripts to make sure the database looks
> > exactly like the one on the dev machine, rather than generating the
> > schema through SQLAlchemy abstractions (which I did before).
> >
> > The current SQL script to do that is at [1]. I'll look it over. For
> > example, I'd (still) like to generate some plots showing the chances
> > of two fingerprints sharing the same substring (this is for the
> > intermediate fingerprint table). (One axis would be substring length,
> > the other the collision probability in percent.) As of now, we still
> > use substr(fingerprint, 0, 12), and this is reflected in the schema.
> >
> > Overall, though, no particular snags here.
>
> I don't follow.  But before we get into details here, I must admit that
> I was too optimistic about running your code on the current Onionoo
> machine.  I ran a few benchmark tests on it last week to compare it to
> new hardware, and those tests almost made it fall over.  We should not
> even think about adding new load to the current machine.
>
> New plan: can you run an Onionoo instance with your changes on a
> different machine?  (If you need anything from me, like a tarball of the
> status/ and out/ directories, I'm happy to provide them to you.)  I
> think we should run this instance for a while to see how reliable it is.
>  And once we're confident enough, we'll likely have new hardware for the
> new Onionoo, so that we can move it there.
>

This sounds like a very good idea; I can try to do this. Sorry for the
delayed response on my end as well. I'll follow up with whatever I need
(if anything).

> >>  - make your code produce statuses documents and store them on disk,
> >> similar to details/weights/bandwidth documents;
> >
> > Right, so if we are planning to support all V3 network statuses for
> > all fingerprints, how are we to store all the status documents? The
> > idea is to preprocess and serve static JSON documents, correct (as in
> > the current Onionoo)? (Cf. the idea of simply caching documents: if
> > we serve a particular status document, it gets cached, and depending
> > on the query parameters (e.g. a date-range restriction) it may be set
> > not to expire at all.)
> >
> > Or should we try to actually store all the statuses (the condensed
> > status document version [2], of course)?
>
> Let's do it as the current Onionoo does it.  This code does not exist,
> right?
>

I've done some small testing on a local system, and the Onionoo approach
seems plausible, since generating all of the old(er) status documents
only needs to happen once. (Obvious in hindsight, but now I understand
that the number of resulting status documents and their total size is
not such a big deal after all.) I don't have good code for it yet.
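
For the record, the kind of thing I've been testing looks roughly like
the sketch below. The directory layout, sharding scheme, and field names
are my own guesses for illustration, not Onionoo's actual on-disk
scheme:

```python
import json
from pathlib import Path

def write_status_document(out_dir, fingerprint, entries):
    """Write one condensed status document per relay fingerprint,
    mirroring the general idea of how Onionoo keeps pre-built
    details/weights/bandwidth documents on disk.  'entries' would be the
    condensed network status entries for this relay."""
    doc = {"fingerprint": fingerprint, "entries": entries}
    # Shard by the first two hex chars so no single directory gets huge
    # (a guess at a layout; the real scheme may differ).
    path = Path(out_dir) / "statuses" / fingerprint[:2] / (fingerprint + ".json")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc))
    return path
```

The point being: this runs once per relay over the archived consensuses,
and afterwards serving is just reading static files.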


> >>  - let the ResourceServlet use your database to return the
> >> fingerprints to return documents for; and
> >>  - extend the ResourceServlet to support the new statuses documents.
> >
> > Sounds good. I assume you are very busy with other things as well,
> > so maybe you had in mind that I could try to do the Java part? :)
> > Though, since you are much more familiar with (your own) code, you
> > could probably do it faster than I could. Not sure.
> > Any particular technical issues/nuances here (re: ResourceServlet)?
>
> Can you give it a try?  Happy to help with specific questions about
> ResourceServlet, and I'll try hard to reply faster this time.  Again,
> sorry for the delay!
>

Okay! I've been tinkering a bit, actually. I'll see if I can produce
something decent and reliable.
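
To make sure I understand the lookup step: the flow I have in mind is
sketched below in Python against a throwaway SQLite table (the real
thing would of course be Java inside ResourceServlet against PostgreSQL;
table and column names here are illustrative, not the actual schema):

```python
import sqlite3

def make_demo_db(fingerprints):
    """Throwaway in-memory table standing in for the real status-entry
    table (hypothetical schema for this sketch only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statusentry (fingerprint TEXT)")
    conn.executemany("INSERT INTO statusentry VALUES (?)",
                     [(fp,) for fp in fingerprints])
    return conn

def lookup_fingerprints(conn, lookup):
    """Step 1 of the servlet flow: resolve the lookup parameter to the
    full fingerprints whose 12-hex-char prefix matches.  The servlet
    would then stream the matching pre-built status documents from
    disk, as it does for details/weights/bandwidth documents."""
    cur = conn.execute(
        "SELECT fingerprint FROM statusentry "
        "WHERE substr(fingerprint, 1, 12) = substr(?, 1, 12)",
        (lookup.upper(),))
    return sorted(row[0] for row in cur)
```

(Note that SQLite's substr() is 1-indexed, hence the 1 rather than the 0
in the quoted schema discussion above.)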

Best wishes
Kostas.

>
> > [1]: https://github.com/wfn/torsearch/blob/master/db/db_create.sql
> > [2]:
> >
> https://github.com/wfn/torsearch/blob/master/docs/onionoo_api.md#network-status-entry-documents
> > (e.g.
> >
> http://ts.mkj.lt:5555/statuses?lookup=9695DFC35FFEB861329B9F1AB04C46397020CE31&condensed=true
> >  )
> >
>