[tor-bugs] #9204 [Tor Check]: Modularize check.torproject.org
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Jul 4 22:10:12 UTC 2013
#9204: Modularize check.torproject.org
Reporter: arma | Owner:
Type: project | Status: new
Priority: normal | Milestone:
Component: Tor Check | Version:
Keywords: | Parent:
Points: | Actualpoints:
Discussion between Roger and me concerning 'A' (alternative for
14:20 < armadev> atagar: speaking of migrating stuff to stem, you might
find some of the sub-tasks of #9204 are things that stem wants to learn
how to do.
14:20 < armadev> or things that are made much easier by stem
14:27 < atagar> For 'A' I'm a little surprised that the directory
authorities already have the exit address. How do they get it? Doesn't that
largely obsolete the need for tordnsel and torbel?
14:28 < atagar> It sounds like something that should go into an extra
descriptor type. Maybe there should be an 'extra' counterpart for router
status entries, for attributes that aren't self-published?
14:29 < atagar> 'a Tor consensus document, and a destination port, and
outputs a list of exit IPs that allow connections to that port.' => stem
already does that; if you give it an exit policy (normal or micro) it can
tell you whether a given destination is allowed or not
14:30 < atagar> https://stem.torproject.org/api/exit_policy.html
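Stem's own classes for this live in stem.exit_policy (ExitPolicy, MicroExitPolicy, and their can_exit_to method). As a rough stdlib-only illustration of the "consensus plus destination port in, exit IPs out" idea, the sketch below evaluates the port-only policy summaries that router status entries carry on their "p" line. The (address, summary) input shape and the helper names are hypothetical, not stem's API.

```python
from typing import List, Tuple


def parse_port_ranges(spec: str) -> List[Tuple[int, int]]:
    """Turn '80,443,1024-65535' into [(80, 80), (443, 443), (1024, 65535)]."""
    ranges = []
    for item in spec.split(","):
        low, _, high = item.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def summary_allows(summary: str, port: int) -> bool:
    """Evaluate a port-only policy summary such as 'accept 80,443' or 'reject 25'."""
    action, spec = summary.split(" ", 1)
    listed = any(low <= port <= high for low, high in parse_port_ranges(spec))
    # 'accept' names the allowed ports; 'reject' names the only blocked ones.
    return listed if action == "accept" else not listed


def exits_allowing_port(entries: List[Tuple[str, str]], port: int) -> List[str]:
    """entries: hypothetical (address, policy summary) pairs pulled from a consensus."""
    return [address for address, summary in entries if summary_allows(summary, port)]
```

Summaries only describe ports, so this answers "who exits to port N" but not "who exits to IP:port N".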
14:31 * atagar puts this in the ticket
14:31 < armadev> for 'A', the directory authorities have it in the @source
annotation of the descriptor
14:31 < armadev> since the authorities write down what IP they received
the descriptor from
14:31 < armadev> sometimes it's another authority, in which case we don't
know. but if it isn't another authority, it's the outbound IP of the
14:33 < atagar> ahh, gotcha
14:34 < armadev> e.g.
14:34 < armadev> @source "188.8.131.52"
14:34 < armadev> router lumag 184.108.40.206 9001 0 9030
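A minimal sketch of pulling the outbound IP out of a cached-descriptors file, assuming each descriptor is preceded by its own annotation lines (as in the example above) and that the set of authority IPs is supplied by the caller. The function name and the authority_ips parameter are hypothetical; descriptors relayed by another authority are skipped because, as noted above, their true source is unknown.

```python
import re
from typing import AbstractSet, List, Tuple

SOURCE_ANNOTATION = re.compile(r'^@source "([^"]+)"$')


def source_addresses(cached_descriptors: str,
                     authority_ips: AbstractSet[str] = frozenset()) -> List[Tuple[str, str, str]]:
    """Return (nickname, advertised address, @source address) per descriptor.

    Descriptors whose @source is another authority are skipped, since then
    the relay's outbound IP is unknown.
    """
    results = []
    pending_source = None
    for line in cached_descriptors.splitlines():
        match = SOURCE_ANNOTATION.match(line.strip())
        if match:
            pending_source = match.group(1)
        elif line.startswith("router "):
            # 'router <nickname> <address> <ORPort> <SOCKSPort> <DirPort>'
            nickname, address = line.split()[1:3]
            if pending_source is not None and pending_source not in authority_ips:
                results.append((nickname, address, pending_source))
            pending_source = None  # annotations apply to one descriptor only
    return results
```

When the advertised address and the @source address differ, the latter is the candidate exit address.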
14:34 < atagar> My two cents is that it should definitely be published in
some fashion. Considering that two services have been written to get the
exit address, it's evidently kinda-sorta useful. :)
14:34 < armadev> do the descriptor annotations get archived? does stem
know what to do with them already?
14:35 < armadev> it sounds like stem is nearly ready to be this script
14:35 < atagar> Yup, stem handles annotations (mostly for the cached-*
files in your data directory). I'm not sure if metrics has those
annotations or not. I'm guessing that they're on the vote documents but
not the consensus.
14:36 < armadev> they're not even on the vote documents
14:36 < armadev> they're just in the cached-descriptors file that
14:36 < armadev> we could probably add them to the vote documents. maybe
that would be helpful.
14:36 < armadev> i don't think we want to add them to the consensus
though, because it's not something every client needs every hour
14:37 < armadev> i was imagining we'd run a stem like thing on each
authority, monitoring each descriptor it gets
14:37 < atagar> Definitely. That's why I suggested an 'extra' document for
router status entries (like extrainfo descriptors are for server
14:37 < armadev> heck, i think there's even a newdescriptor event or
something, which weasel put in
14:37 < armadev> the advantage of running it on authorities is that it
learns answers before the consensus is even made
14:37 < armadev> so the "oops that relay isn't in the consensus yet sorry"
false positives go away
14:38 < atagar> agreed, that would be nice
14:39 < armadev> i think most authorities don't hook up to a controller
14:39 < armadev> and that's probably wise
14:39 < atagar> it wouldn't need to if they're cached to disk
14:39 < armadev> so it would be more flexible for this thing to just
monitor their cached-descriptor* files
14:39 < armadev> right
14:40 < atagar> Yes, stem already does that. The DescriptorReader keeps
track of the last modified timestamps so it can pick up new files as
they're added. Something Karsten wanted for metrics use cases...
14:40 < atagar> https://stem.torproject.org/api/descriptor/reader.html
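The idea behind that timestamp tracking can be sketched with the standard library alone. This is not stem's DescriptorReader; it is a hypothetical stand-in that remembers each file's last-seen mtime and reports only files that are new or have changed since the previous poll, so a watcher on an authority would re-parse cached-descriptor* files only when needed.

```python
import os
from typing import Dict, List


class ModifiedFileTracker:
    """Remembers each file's last-seen mtime and reports only new or changed files."""

    def __init__(self, paths: List[str]):
        self.paths = paths
        self.last_seen: Dict[str, float] = {}

    def changed_files(self) -> List[str]:
        changed = []
        for path in self.paths:
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # e.g. cached-descriptors.new may not exist yet
            if self.last_seen.get(path) != mtime:
                self.last_seen[path] = mtime
                changed.append(path)
        return changed
```

A polling loop would call changed_files() periodically and hand the returned paths to a descriptor parser.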
14:42 < armadev> gosh
14:42 < armadev> cool :)
14:42 < atagar> glad you like :)
14:42 < armadev> so, here's one architecture approach:
14:43 < armadev> each authority runs one of these things that watches its
cached descriptor files. periodically that thing writes out an
exitaddresses file in some format, and
soon after it exports it to some central place
14:43 < armadev> the central place aggregates them, meaning basically cats
them together but also handles conflicts in some way
14:43 < armadev> then we have our exit-addresses file. item A is done, and
we throw out tordnsel forever.
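A minimal sketch of the central aggregation step, assuming each authority exports stanzas in the existing exit-list style ("ExitNode <fingerprint>" followed by "ExitAddress <ip> <YYYY-MM-DD HH:MM:SS>" lines). The conflict rule here is an assumption: a relay may legitimately have several exit addresses, so all are kept, and when authorities report the same address at different times the most recent observation wins.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
Report = Dict[str, Dict[str, datetime]]  # fingerprint -> {exit address: seen at}


def parse_exit_addresses(text: str) -> Report:
    """Parse ExitNode/ExitAddress stanzas; other lines (Published, ...) are ignored."""
    result: Report = defaultdict(dict)
    fingerprint = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if parts[0] == "ExitNode" and len(parts) > 1:
            fingerprint = parts[1]
        elif parts[0] == "ExitAddress" and fingerprint is not None and len(parts) == 3:
            result[fingerprint][parts[1]] = datetime.strptime(parts[2], TIME_FORMAT)
    return dict(result)


def merge_reports(reports: Iterable[Report]) -> Report:
    """Union all authorities' reports, keeping the latest sighting per address."""
    merged: Report = defaultdict(dict)
    for report in reports:
        for fingerprint, addresses in report.items():
            for address, seen in addresses.items():
                previous = merged[fingerprint].get(address)
                if previous is None or seen > previous:
                    merged[fingerprint][address] = seen
    return dict(merged)
```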
14:44 < atagar> I like. In essence this is a feed rather than a
periodically published document, yes?
14:44 < armadev> then there's another script-that-uses-stem to take in
that exit-addresses file, a pile of consensus stanzas (i.e. one or more
consensus docs, catted together),
and produce the output for B.
14:45 < atagar> i.e., 'fingerprint X, address/port Y added/removed'?
14:45 < armadev> is it easy for stem to take in descriptors as well as
consensuses for B, so we can input an IP:port destination rather than just
a port destination?
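An IP:port destination needs the full exit policy from the server descriptor, since the consensus summary only lists ports. Stem's ExitPolicy.can_exit_to covers this for real; the sketch below is a simplified stdlib stand-in (IPv4 only, no "private" keyword) that applies the rules in order and lets the first match decide. The rule strings and function names are illustrative.

```python
import ipaddress
from typing import List


def _port_matches(spec: str, port: int) -> bool:
    if spec == "*":
        return True
    low, _, high = spec.partition("-")
    return int(low) <= port <= int(high or low)


def _address_matches(spec: str, address: str) -> bool:
    if spec == "*":
        return True
    # A bare IP becomes a /32 network; "18.0.0.0/8" is used as given.
    return ipaddress.ip_address(address) in ipaddress.ip_network(spec, strict=False)


def can_exit_to(policy: List[str], address: str, port: int) -> bool:
    """policy: descriptor-style rules such as 'accept 18.0.0.0/8:80' or 'reject *:*'."""
    for rule in policy:
        action, target = rule.split(" ", 1)
        address_spec, _, port_spec = target.rpartition(":")
        if _address_matches(address_spec, address) and _port_matches(port_spec, port):
            return action == "accept"  # first matching rule wins
    return True  # no rule matched; real descriptor policies usually end with a catch-all
```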
14:45 < armadev> yes, it is a feed. is there a better way than 'periodic
14:46 < atagar> I was thinking of a service where you could give it a
timestamp and batch size, and it would give you events. Then the caller
keeps a high water mark.
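A minimal sketch of that feed, assuming an in-memory, time-ordered change log; the Change record and both function names are hypothetical. One design point worth noting: if a batch boundary falls between events that share a timestamp, a client advancing its high water mark to that timestamp would silently skip the rest, so this sketch extends a batch to include every event with its final timestamp. An opaque sequence number as the cursor would avoid the issue entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    timestamp: int    # e.g. seconds since the epoch
    action: str       # "added" or "removed"
    fingerprint: str
    address: str


def get_changes_since(log: List[Change], timestamp: int, batch_size: int) -> List[Change]:
    """Server side: up to batch_size changes strictly newer than timestamp."""
    newer = [change for change in log if change.timestamp > timestamp]
    batch = newer[:batch_size]
    if batch:
        # Don't split same-timestamp events across batches (see note above).
        last = batch[-1].timestamp
        batch += [change for change in newer[batch_size:] if change.timestamp == last]
    return batch


def poll_all(log: List[Change], high_water_mark: int, batch_size: int) -> Tuple[List[Change], int]:
    """Client side: drain the feed, advancing the high water mark after each batch."""
    seen: List[Change] = []
    while True:
        batch = get_changes_since(log, high_water_mark, batch_size)
        if not batch:
            return seen, high_water_mark
        seen.extend(batch)
        high_water_mark = batch[-1].timestamp
```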
14:47 < armadev> we need to authenticate the stuff we export. i'd been
figuring ssh or the like. if it's an online service, that would seem to
14:47 < armadev> (but even ssh is messy)
14:51 < atagar> I'd be tempted to do an SSL endpoint and either buy a cert
or use a self-signed one that we provide a PGP sig for. But meh, the rest
of you certainly have stronger feelings about that aspect than me.
14:51 < armadev> i just know that i hate all the solutions
14:51 < armadev> so i am open to whatever other people want to do :)
14:52 < armadev> self-signed is probably better. no need to buy it if
browsers aren't going to be using it. and we'll be pinning the cert, not
its signer, anyway.
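Pinning the certificate itself (rather than trusting a CA) might look like the sketch below, using Python's ssl module: CA verification is switched off and trust comes solely from comparing the SHA-256 of the DER-encoded certificate with a pinned fingerprint, checked on the same connection that will carry the data. The function names and the pin format (hex, optionally colon-separated) are assumptions.

```python
import hashlib
import hmac
import socket
import ssl


def certificate_matches_pin(der_cert: bytes, pinned_sha256_hex: str) -> bool:
    """Compare the SHA-256 of a DER-encoded certificate with a pinned fingerprint."""
    digest = hashlib.sha256(der_cert).hexdigest()
    normalized = pinned_sha256_hex.replace(":", "").lower()
    return hmac.compare_digest(digest, normalized)


def open_pinned_connection(host: str, port: int, pinned_sha256_hex: str) -> ssl.SSLSocket:
    """Open a TLS connection that is trusted only if the peer presents the pinned cert."""
    context = ssl.create_default_context()
    context.check_hostname = False       # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE  # trust comes from the pin, not a CA
    sock = socket.create_connection((host, port))
    tls = context.wrap_socket(sock, server_hostname=host)
    if not certificate_matches_pin(tls.getpeercert(binary_form=True), pinned_sha256_hex):
        tls.close()
        raise ssl.SSLError("server certificate does not match the pinned fingerprint")
    return tls
```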
14:56 < armadev> do you like the current format of the exit-addresses
file, or is there something it lacks?
14:59 < atagar> So to summarize we're thinking of a service with something
like the following API:
14:59 < atagar> * Periodic document of all relay address/ports. This could
be generated, say, once a day. get_last_state(): [timestamp, (address,
port), (address, port)...]
14:59 < atagar> * Feed that gets the changes since a given timestamp.
get_changes_since(timestamp, batch_size) => [(timestamp, added/removed,
14:59 < atagar> Yes? I still think it might be better for authorities to
publish this in a document of some kind. Then Onionoo could surface this
API without hacking up authorities to publish this information on their own.
15:00 < atagar> I haven't looked at the exit lists. There's a backlog to
add support for them at some point:
15:00 < atagar> (happy to do it when somebody has a use for the
15:01 < atagar> armadev: Shall I add this backlog to the ticket? If
there's something you'd like from stem's front I'd be happy to discuss it,
but this sounds like there's not much needed from me presently.
15:04 < armadev> atagar: feel free to post backlog
15:04 < armadev> atagar: if the document is published daily, it will not
be quick enough for a service like check.
15:06 < atagar> I don't think you understand. It publishes a daily
document so sweepers using it can get a snapshot when bootstrapping, then
poll the feed.
15:06 < atagar> Otherwise new callers would have a feed, but no idea of
the relays before that point.
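A minimal sketch of that bootstrap on the caller's side: start from the daily snapshot (shaped like the get_last_state() output above), then replay feed entries newer than the snapshot and keep the newest timestamp as the high water mark for later polls. The feed entry shape (timestamp, "added"/"removed", (address, port)) is an assumption, since the get_changes_since signature in the log is cut off.

```python
from typing import Iterable, Set, Tuple

Relay = Tuple[str, int]                # (address, port)
FeedEntry = Tuple[int, str, Relay]     # (timestamp, "added" or "removed", relay)


def bootstrap(snapshot_time: int, snapshot: Iterable[Relay],
              changes: Iterable[FeedEntry]) -> Tuple[Set[Relay], int]:
    """Apply feed entries newer than the snapshot on top of it."""
    state = set(snapshot)
    high_water_mark = snapshot_time
    for timestamp, action, relay in changes:
        if timestamp <= snapshot_time:
            continue  # already reflected in the snapshot
        if action == "added":
            state.add(relay)
        elif action == "removed":
            state.discard(relay)
        high_water_mark = max(high_water_mark, timestamp)
    return state, high_water_mark
```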
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9204#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online