[tor-bugs] #9204 [Tor Check]: Modularize check.torproject.org

Thu Jul 4 22:10:12 UTC 2013

#9204: Modularize check.torproject.org
-----------------------+----------------------------------------------------
 Reporter:  arma       |          Owner:     
     Type:  project    |         Status:  new
 Priority:  normal     |      Milestone:     
Component:  Tor Check  |        Version:     
 Keywords:             |         Parent:     
   Points:             |   Actualpoints:     
-----------------------+----------------------------------------------------

Comment(by atagar):

 Discussion between Roger and me concerning 'A' (alternative for
 tordnsel/torbel)...

 {{{
 14:20 < armadev> atagar: speaking of migrating stuff to stem, you might
                  find some of the sub-tasks of #9204 are things that stem
                  wants to learn how to do.
 14:20 < armadev> or things that are made much easier by stem
 14:27 < atagar> For 'A' I'm a little surprised that the directory
                 authorities already have the exit address. How do they get
                 it? Doesn't that largely obsolete the need for tordnsel and
                 torbel?
 14:28 < atagar> It sounds like something that should go into an extra
                 descriptor type. Maybe there should be an 'extra' counterpart
                 for router status entries for attributes that aren't
                 self-published?
 14:29 < atagar> 'a Tor consensus document, and a destination port, and
                 outputs a list of exit IPs that allow connections to that
                 port.' => stem already does that, if you give it an exit
                 policy (normal or micro) it can tell you if a given
                 destination is allowed or not
 14:30 < atagar> https://stem.torproject.org/api/exit_policy.html
 14:31  * atagar puts this in the ticket
 14:31 < armadev> for 'A', the directory authorities have it in the @source
                  annotation of the descriptor
 14:31 < armadev> since the authorities write down what IP they received the
                  descriptor from
 14:31 < armadev> sometimes it's another authority, in which case we don't
                  know. but if it isn't another authority, it's the outbound
                  IP of the relay.
 14:33 < atagar> ahh, gotcha
 14:34 < armadev> e.g.
 14:34 < armadev> @source "89.110.12.236"
 14:34 < armadev> router lumag 91.122.9.198 9001 0 9030
 14:34 < atagar> My two cents is that it should definitely be published in
                 some fashion. Considering that two services have been
                 written to get the exit address it's evidently kinda-sorta
                 useful. :)
 14:34 < armadev> do the descriptor annotations get archived? does stem know
                  what to do with them already?
 14:35 < armadev> it sounds like stem is nearly ready to be this script
 14:35 < atagar> Yup, stem handles annotations (mostly for the cached-*
                 files in your data directory). I'm not sure if metrics has
                 those annotations or not. I'm guessing that they're on the
                 vote documents but not the consensus, right?
 14:36 < armadev> they're not even on the vote documents
 14:36 < armadev> they're just in the cached-descriptors file that
                  authorities have
 14:36 < armadev> we could probably add them to the vote documents. maybe
                  that would be helpful.
 14:36 < armadev> i don't think we want to add them to the consensus though,
                  because it's not something every client needs every hour
 14:37 < armadev> i was imagining we'd run a stem-like thing on each
                  authority, monitoring each descriptor it gets
 14:37 < atagar> Definitely. That's why I suggested an 'extra' document for
                 router status entries (like extrainfo descriptors are for
                 server descriptors)
 14:37 < armadev> heck, i think there's even a newdescriptor event or
                  something, which weasel put in
 14:37 < armadev> the advantage of running it on authorities is that it
                  learns answers before the consensus is even made
 14:37 < armadev> so the "oops that relay isn't in the consensus yet sorry"
                  false positives go away
 14:38 < atagar> agreed, that would be nice
 14:39 < armadev> i think most authorities don't hook up to a controller though
 14:39 < armadev> and that's probably wise
 14:39 < atagar> it wouldn't need to if they're cached to disk
 14:39 < armadev> so it would be more flexible for this thing to just
                  monitor their cached-descriptor* files
 14:39 < armadev> right
 14:40 < atagar> Yes, stem already does that. The DescriptorReader keeps
                 track of the last modified timestamps so it can pick up new
                 files as they're added. Something karsten wanted for metrics
                 use cases...
 14:40 < atagar> https://stem.torproject.org/api/descriptor/reader.html
 14:42 < armadev> gosh
 14:42 < armadev> cool :)
 14:42 < atagar> glad you like :)
 14:42 < armadev> so, here's one architecture approach:
 14:43 < armadev> each authority runs one of these things that watches its
                  cached descriptor files. periodically that thing writes
                  out an exitaddresses file in some format, and soon after
                  it exports it to some central place
 14:43 < armadev> the central place aggregates them, meaning basically cats
                  them together but also handles conflicts in some way
 14:43 < armadev> then we have our exit-addresses file. item A is done, and
                  we throw out tordnsel forever.
 14:44 < atagar> I like. In essence this is a feed rather than a
                 periodically published document, yes?
 14:44 < armadev> then there's another script-that-uses-stem to take in that
                  exit-addresses file, a pile of consensus stanzas (i.e. one
                  or more consensus docs, catted together), and produce the
                  output for B.
 14:45 < atagar> ie, 'fingerprint X, address/port Y added/removed'?
 14:45 < armadev> is it easy for stem to take in descriptors as well as
                  consensuses for B, so we can input an IP:port destination
                  rather than just a port destination?
 14:45 < armadev> yes, it is a feed. is there a better way than 'periodic
                  export'?
 14:46 < atagar> I was thinking of a service where you could give it a
                 timestamp and batch size, and it would give you events.
                 Then the caller keeps a high water mark.
 14:47 < armadev> we need to authenticate the stuff we export. i'd been
                  figuring ssh or the like. if it's an online service, that
                  would seem to get messier.
 14:47 < armadev> (but even ssh is messy)
 14:51 < atagar> I'd be tempted to do an ssl endpoint and simply either buy
                 a cert or use a self-signed one we provide a pgp sig for.
                 But meh, the rest of you certainly have stronger feelings
                 about that aspect than me.
 14:51 < armadev> i just know that i hate all the solutions
 14:51 < armadev> so i am open to whatever other people want to do :)
 14:52 < armadev> self-signed is probably better. no need to buy it if
                  browsers aren't going to be using it. and we'll be pinning
                  the cert, not its signer, anyway.
 14:56 < armadev> do you like the current format of the exit-addresses file,
                  or is there something it lacks?
 14:59 < atagar> So to summarize we're thinking of a service with something
                 like the following API:
 14:59 < atagar> * Periodic document of all relay address/ports. This could
                 be generated, say, once a day. get_last_state(): [timestamp,
                 (address, port), (address, port)...]
 14:59 < atagar> * Feed that gets the changes since a given timestamp.
                 get_changes_since(timestamp, batch_size) => [(timestamp,
                 added/removed, address, port)...]
 14:59 < atagar> Yes? I still think it might be better for authorities to
                 publish this in a document of some kind. Then Onionoo could
                 surface this API without hacking up authorities to publish
                 this information on their own someplace.
 15:00 < atagar> I haven't looked at the exit lists. There's a backlog to
                 add support for them at some point:
                 https://trac.torproject.org/projects/tor/ticket/8255
 15:00 < atagar> (happy to do it when somebody has a use for the
                 functionality)
 15:01 < atagar> armadev: Shall I add this backlog to the ticket? If there's
                 something you'd like from stem's front I'd be happy to
                 discuss it, but this sounds like there's not much needed
                 from me presently.
 15:04 < armadev> atagar: feel free to post backlog
 15:04 < armadev> atagar: if the document is published daily, it will not be
                  quick enough for a service like check.
 15:06 < atagar> I don't think you understand. It publishes a daily document
                 so sweepers using it can get a snapshot when bootstrapping,
                 then poll the feed.
 15:06 < atagar> Otherwise new callers would have a feed, but no idea of the
                 relays before that point.
 }}}
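
For 'B' as quoted at 14:29, a minimal pure-Python sketch, assuming the input is the text of a v3 consensus whose router status entries carry "r" and "p" (port summary) lines. This is hypothetical illustration code, not stem's API; stem's ExitPolicy and MicroExitPolicy classes (see the exit_policy link above) do this matching properly. Note that the "r" line address is the relay's advertised address, which is exactly why 'A' needs the observed exit address.

{{{
#!python
# Sketch of 'B': consensus text + destination port -> exit IPs whose "p"
# (port summary) line allows that port. Hypothetical hand-rolled parser that
# only understands "r" and "p" lines; it is not stem's API.


def parse_port_list(spec):
    """Turn '80,443,1000-2000' into [(80, 80), (443, 443), (1000, 2000)]."""
    ranges = []
    for item in spec.split(","):
        low, _, high = item.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def summary_allows(action, ranges, port):
    """'p accept LIST' allows the listed ports; 'p reject LIST' allows the rest."""
    listed = any(low <= port <= high for low, high in ranges)
    return listed if action == "accept" else not listed


def exits_for_port(consensus_text, port):
    exits = []
    address = None
    for line in consensus_text.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[0] == "r":
            # r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort
            address = fields[6]
        elif len(fields) == 3 and fields[0] == "p" and address is not None:
            if summary_allows(fields[1], parse_port_list(fields[2]), port):
                exits.append(address)
            address = None  # each router status entry has one "p" line
    return exits
}}}

Because a port summary ignores address-specific rules, it can only answer port-level questions; an IP:port destination needs the full descriptor policies that armadev asks about at 14:45.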
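
armadev's @source example (14:34) can be turned into exit-address data by pairing each annotation with the router line that follows it. A minimal sketch, assuming the input is the text of an authority's cached-descriptors file. The authority address set is a hypothetical placeholder (a TEST-NET address), since a descriptor relayed by another authority carries that authority's IP rather than the relay's. In the example above the two addresses differ (91.122.9.198 advertised, 89.110.12.236 observed), which is the case check cares about.

{{{
#!python
# Sketch of 'A': pair each @source annotation in an authority's
# cached-descriptors file with the router line that follows it.
# KNOWN_AUTHORITY_IPS is a hypothetical placeholder, not the real
# authority list.

KNOWN_AUTHORITY_IPS = frozenset({"192.0.2.10"})


def exit_addresses(cached_descriptors_text, authority_ips=KNOWN_AUTHORITY_IPS):
    """Return (nickname, advertised_ip, source_ip) for relay-uploaded descriptors."""
    results = []
    source = None
    for line in cached_descriptors_text.splitlines():
        if line.startswith("@source "):
            # e.g. @source "89.110.12.236"
            source = line.split(" ", 1)[1].strip().strip('"')
        elif line.startswith("router "):
            # router nickname address ORPort SOCKSPort DirPort
            nickname, advertised = line.split()[1:3]
            # A descriptor relayed by another authority says nothing about
            # the relay's outbound address, so skip it.
            if source is not None and source not in authority_ips:
                results.append((nickname, advertised, source))
            source = None
    return results
}}}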
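
The mtime-tracking approach atagar describes at 14:40 needs no controller connection, which matches armadev's point that authorities rarely expose one. This is not stem's DescriptorReader; CachedFileWatcher is a hypothetical class, and the sketch assumes the authority's data directory is readable and that a changed modification time means the file has new content. A production version would also need to cope with mtime granularity and with files being replaced mid-read.

{{{
#!python
# Hypothetical watcher (not stem code): report cached-descriptor* files that
# are new or whose modification time changed since the previous call.

import glob
import os


class CachedFileWatcher:
    def __init__(self, pattern):
        self.pattern = pattern
        self.seen = {}  # path -> modification time when we last reported it

    def changed_files(self):
        """Return paths that are new or modified since the previous call."""
        changed = []
        for path in sorted(glob.glob(self.pattern)):
            mtime = os.stat(path).st_mtime
            if self.seen.get(path) != mtime:
                self.seen[path] = mtime
                changed.append(path)
        return changed
}}}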
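
One way the central place at 14:43 might "cat them together but also handle conflicts": the record layout (fingerprint, exit IP, observation time) and the conflict rule below (keep every distinct address per relay and, for duplicates, the latest time any authority saw it) are assumptions, not an agreed format. The output reuses the ExitNode/ExitAddress lines of the existing exit-addresses file but omits its Published and LastStatus lines.

{{{
#!python
# Hypothetical aggregator: merge per-authority (fingerprint, exit_ip,
# observed_at) records into exit-addresses style ExitNode/ExitAddress lines.

from datetime import datetime


def aggregate(per_authority_records):
    """Keep each distinct (fingerprint, exit_ip) with its newest observation."""
    latest = {}
    for records in per_authority_records:
        for fingerprint, exit_ip, observed_at in records:
            key = (fingerprint, exit_ip)
            if key not in latest or observed_at > latest[key]:
                latest[key] = observed_at
    return latest


def format_exit_addresses(latest):
    """Emit ExitNode/ExitAddress lines; Published/LastStatus are omitted."""
    lines = []
    current = None
    for (fingerprint, exit_ip), seen in sorted(latest.items()):
        if fingerprint != current:
            lines.append("ExitNode %s" % fingerprint)
            current = fingerprint
        lines.append("ExitAddress %s %s" % (exit_ip, seen.strftime("%Y-%m-%d %H:%M:%S")))
    return "\n".join(lines)
}}}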
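
For the IP:port variant of 'B' raised at 14:45, each relay's full exit policy from its server descriptor has to be evaluated instead of the consensus port summary. A minimal sketch, assuming IPv4-only accept/reject lines such as "accept 18.0.0.0/8:22-25" or "reject *:*", evaluated in order with the first match winning. The function names are hypothetical; stem's ExitPolicy class covers the full grammar.

{{{
#!python
# Sketch of first-match exit policy evaluation for an IP:port destination.
# IPv4 only; rules look like "accept 18.0.0.0/8:22-25" or "reject *:*".

import ipaddress


def parse_rule(rule):
    action, target = rule.split()
    addrspec, portspec = target.rsplit(":", 1)
    network = None if addrspec == "*" else ipaddress.ip_network(addrspec, strict=False)
    if portspec == "*":
        ports = (1, 65535)
    else:
        low, _, high = portspec.partition("-")
        ports = (int(low), int(high or low))
    return action, network, ports


def can_exit_to(policy_lines, address, port):
    ip = ipaddress.ip_address(address)
    for line in policy_lines:
        action, network, (low, high) = parse_rule(line)
        if (network is None or ip in network) and low <= port <= high:
            return action == "accept"
    # dir-spec treats an unmatched destination as accepted; descriptors
    # normally end with an explicit *:* rule anyway.
    return True
}}}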
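
A sketch of the pinning armadev describes at 14:52, assuming the exporter serves a self-signed certificate whose SHA-256 fingerprint is distributed out of band (for example with a PGP signature, as atagar suggests). Host, port and fingerprint are placeholders, and pin_matches/open_pinned are hypothetical names. Checking the certificate of the same connection that carries the data avoids fetching a certificate over one connection and then trusting another.

{{{
#!python
# Sketch of certificate pinning: trust the peer only if the SHA-256 digest of
# its DER certificate equals a pin distributed out of band.

import hashlib
import socket
import ssl


def pin_matches(der_cert, pinned_sha256):
    """Compare a DER certificate with a hex pin (colons and case ignored)."""
    expected = pinned_sha256.replace(":", "").lower()
    return hashlib.sha256(der_cert).hexdigest() == expected


def open_pinned(host, port, pinned_sha256, timeout=30):
    """Open a TLS connection that is trusted only if the peer cert matches the pin."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # the pin replaces CA validation
    sock = socket.create_connection((host, port), timeout=timeout)
    tls = ctx.wrap_socket(sock, server_hostname=host)
    if not pin_matches(tls.getpeercert(binary_form=True), pinned_sha256):
        tls.close()
        raise ssl.SSLError("peer certificate does not match the pinned fingerprint")
    return tls
}}}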
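
The snapshot-plus-feed API summarized at 14:59, together with the bootstrapping argument at 15:06, can be sketched as follows. get_last_state() and get_changes_since(timestamp, batch_size) are the names from the chat; the in-memory storage, integer timestamps, the "added"/"removed" strings and the sync() client are hypothetical. The sketch assumes event timestamps are unique and increasing: with ties, a batch boundary could split events that share a timestamp and the caller's high water mark would skip the rest.

{{{
#!python
# Hypothetical in-memory version of the proposed snapshot + feed service.

from bisect import bisect_right


class ExitAddressFeed:
    def __init__(self):
        self.events = []          # (timestamp, "added"/"removed", address, port), time-ordered
        self.snapshot = (0, [])   # (timestamp, [(address, port), ...])

    def record(self, timestamp, change, address, port):
        self.events.append((timestamp, change, address, port))

    def publish_snapshot(self, timestamp):
        """Rebuild the periodic (e.g. daily) document from the event history."""
        current = set()
        for ts, change, address, port in self.events:
            if ts > timestamp:
                break
            if change == "added":
                current.add((address, port))
            else:
                current.discard((address, port))
        self.snapshot = (timestamp, sorted(current))

    def get_last_state(self):
        return self.snapshot

    def get_changes_since(self, timestamp, batch_size):
        """Events strictly after timestamp, at most batch_size of them."""
        start = bisect_right([event[0] for event in self.events], timestamp)
        return self.events[start:start + batch_size]


def sync(feed, state=None, batch_size=100):
    """Bootstrap from the snapshot if state is None, then drain the feed."""
    if state is None:
        high_water, relays = feed.get_last_state()
        relays = set(relays)
    else:
        high_water, relays = state
    while True:
        batch = feed.get_changes_since(high_water, batch_size)
        if not batch:
            return high_water, relays
        for ts, change, address, port in batch:
            if change == "added":
                relays.add((address, port))
            else:
                relays.discard((address, port))
            high_water = ts
}}}

A new sweeper calls sync(feed) once to bootstrap from the daily snapshot, then keeps calling sync(feed, state) with the returned high water mark, so the daily document never limits how fresh the answers are.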

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9204#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online