[tor-dev] Running a Separate Tor Network

Roger Dingledine arma at mit.edu
Wed Oct 22 10:48:28 UTC 2014


Hi Tom!

Neat stuff. Let me try to point you in useful directions.

On Wed, Oct 15, 2014 at 08:39:12PM -0500, Tom Ritter wrote:
> One of the first things I ran into was a problem where I could not get
> any routers to upload descriptors. [...]
> I imagine what
> would actually happen is Noisebridge and TorServers and a few other
> close friends would set the flag, they would get into the consensus,
> and then the rest of the network would start coming back...

Yep -- that seems like an adequate plan. Given that the Tor network has
been running for the last 12 or 13 years with exactly zero downtime,
and we have a plausible way of easily getting it back going if we need
to, I'm not worried.

> What I had to do was make one of my Directory Authorities an exit -
> this let the other nodes start building circuits through the
> authorities and upload descriptors.

This part seems surprising to me -- directory authorities always publish
their dirport whether they've found it reachable or not, and relays
publish their descriptors directly to the dirport of each directory
authority (not through the Tor network).

So maybe there's a bug that you aren't describing, or maybe you are
misunderstanding what you saw?

See also https://trac.torproject.org/projects/tor/ticket/11973

> Another problem I ran into was that nodes couldn't conduct
> reachability tests when I had exits that were only using the Reduced
> Exit Policy - because it doesn't list the ORPort/DirPort!  (I was
> using nonstandard ports actually, but indeed the reduced exit policy
> does not include 9001 or 9030.)  Looking at the current consensus,
> there are 40 exits that exit to all ports, and 400-something exits
> that use the ReducedExitPolicy.  It seems like 9001 and 9030 should
> probably be added to that for reachability tests?

The reachability tests for the ORPort involve extending the circuit to
the ORPort -- which doesn't use an exit stream. So your relays should
have been able to find themselves reachable, and to publish a descriptor,
even with no exit relays in the network.

But I think you're right that they would have opted to list their dirport
as 0, since they would not have been able to verify that it's reachable.
And that in turn would have caused clients to skip over them and ask
their questions to the directory authorities, since they're the only
ones advertising (with a non-zero dirport) that they know how to answer
directory questions.

So it would work, but it would be non-ideal from a scalability
perspective.
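To make the scalability point concrete, here is a minimal sketch of the two decisions described above: what DirPort a relay publishes, and how many directory sources a client is then left with. The `relay_t` record and both function names are hypothetical simplifications, not tor's own structures; real client selection also involves weights, flags and fallbacks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relay record (not tor's routerstatus_t). */
typedef struct {
    const char *nickname;
    int is_authority;            /* 1 for a directory authority */
    uint16_t advertised_dirport; /* 0 if the DirPort could not be verified */
} relay_t;

/* Authorities publish their DirPort whether or not it was found reachable;
 * an ordinary relay publishes 0 if its DirPort test never succeeded. */
uint16_t dirport_to_publish(int is_authority, int dirport_reachable,
                            uint16_t configured_dirport)
{
    if (is_authority || dirport_reachable)
        return configured_dirport;
    return 0;
}

/* Count the relays a client could ask directory questions. Entries that
 * advertise DirPort 0 are skipped, so if every ordinary relay publishes 0,
 * all directory load falls on the authorities. */
size_t count_usable_dir_sources(const relay_t *relays, size_t n,
                                size_t *authorities_out)
{
    size_t usable = 0, auths = 0;
    for (size_t i = 0; i < n; i++) {
        if (relays[i].advertised_dirport == 0)
            continue;
        usable++;
        if (relays[i].is_authority)
            auths++;
    }
    if (authorities_out)
        *authorities_out = auths;
    return usable;
}
```

In a network where no relay could test its DirPort through an exit, every usable directory source returned by this count is an authority.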

And once https://trac.torproject.org/projects/tor/ticket/12538 is
resolved it will work more smoothly anyway.

> Continuing in this thread, another problem I hit was that (I believe)
> nodes expect the 'Stable' flag when conducting certain reachability
> tests.  I'm not 100% certain - it may not prevent the relay from
> uploading a descriptor, but it seems like if no acceptable exit node
> is Stable - some reachability tests will be stuck.  I see these sorts
> of errors when there is no stable Exit node (the node generating the
> errors is in fact a Stable Exit though, so it clearly uploaded its
> descriptor and keeps running):

In consider_testing_reachability() we call

    circuit_launch_by_extend_info(CIRCUIT_PURPOSE_TESTING, ei,
                            CIRCLAUNCH_NEED_CAPACITY|CIRCLAUNCH_IS_INTERNAL);

So the ORPort reachability test doesn't require the Stable flag.

The DirPort reachability test just launches a new stream that attaches
to circuits like normal, so whether it prefers the Stable flag will be
a function of whether the destination DirPort is in the LongLivedPorts
set -- usually not I think.
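A rough sketch of the port check behind that answer, assuming the default LongLivedPorts list from the tor man page of roughly this era (21, 22, 706, 1863, 5050, 5190, 5222, 5223, 6523, 6667, 6697, 8300). A torrc can override the list, so check the man page for your version. The helper name `stream_wants_stable` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed default LongLivedPorts; a torrc may replace this list. */
static const uint16_t long_lived_ports[] = {
    21, 22, 706, 1863, 5050, 5190, 5222, 5223, 6523, 6667, 6697, 8300
};

/* Hypothetical helper: returns 1 if a stream to `port` would be limited to
 * circuits of high-uptime (Stable) relays, 0 otherwise. */
int stream_wants_stable(uint16_t port)
{
    size_t n = sizeof(long_lived_ports) / sizeof(long_lived_ports[0]);
    for (size_t i = 0; i < n; i++) {
        if (long_lived_ports[i] == port)
            return 1;
    }
    return 0;
}
```

Under these defaults neither 9030 nor the 25030 DirPort in the log below is in the set. That fits the reading that the failure was a plain DirPort reachability test, not a missing Stable flag.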

> Oct 13 14:49:46.000 [warn] Making tunnel to dirserver failed.
> Oct 13 14:49:46.000 [warn] We just marked ourself as down. Are your
> external addresses reachable?
> Oct 13 14:50:47.000 [notice] No Tor server allows exit to
> [scrubbed]:25030. Rejecting.

That sure looks like a failed dirport reachability test. Nothing
necessarily to do with the Stable flag.

> Getting a BWAuth running was... nontrivial.
[...]
> There is a tremendous amount of
> code complexity buried beneath the statement 'Scan the nodes and see
> how fast they are', and a tremendous amount of informational
> complexity behind 'Weight the nodes so users can pick a good stream'.

Yeah, no kidding. And worse, its voodoo is no longer correctly tuned
for the current network. And it's not robust to intentional lying
attacks either. See e.g. the paragraph at the very end of
https://lists.torproject.org/pipermail/tor-reports/2014-October/000675.html

> dirvote_add_signatures_to_pending_consensus(): Added -1 signatures to
> consensus.

This one looks like a simple (harmless) bug. The code is

  r = networkstatus_add_detached_signatures(pc->consensus, sigs,
                                            source, severity, msg_out);
  log_info(LD_DIR,"Added %d signatures to consensus.", r);

and it shouldn't be logging that if r is < 0.
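One possible shape for the fix is to branch on the return value before logging, so a negative result is never reported as a signature count. The sketch below is a standalone stand-in: it formats into a buffer instead of calling tor's `log_info()`, and `format_add_signatures_result` is a hypothetical name, not a function in the tor tree.

```c
#include <stdio.h>

/* Hypothetical stand-in for the logging that follows
 * networkstatus_add_detached_signatures(): r < 0 signals failure and must
 * not be printed as "Added -1 signatures". */
void format_add_signatures_result(int r, const char *msg,
                                  char *buf, size_t buflen)
{
    if (r < 0)
        snprintf(buf, buflen, "Unable to add signatures to consensus: %s",
                 msg ? msg : "unknown error");
    else
        snprintf(buf, buflen, "Added %d signatures to consensus.", r);
}
```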

> I then added auth5 to a second DirAuth (auth2) as a trusted DirAuth.
> This results in a consensus for auth1, auth2, and auth5 - but auth3
> and auth4 did not sign it or produce a consensus.  Because the
> consensus was only signed by 2 of the 4 Auths (i.e., not a majority) -
> it was rejected by the relays (which did not list auth5).

Right -- when you change the set of directory authorities, you need to
get a sufficient clump of them to change all at once. This coordination
has been a real hassle as we grow the number of directory authorities,
and it's one of the main reasons we don't have more currently.
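The acceptance rule in Tom's scenario can be sketched as follows: count only signatures from authorities the node already knows, and require more than half of those known authorities. This is a simplification, since real tor also verifies each signature, its key certificate and the digest algorithm. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a consensus signed by `signers` would be accepted by a node
 * that knows the authorities in `known`: it needs signatures from more than
 * half of the known set. Unknown signers are simply ignored.
 * Assumes the entries in `signers` are distinct. */
int consensus_has_enough_signatures(const char *const *known, size_t n_known,
                                    const char *const *signers,
                                    size_t n_signers)
{
    size_t n_required = n_known / 2 + 1;
    size_t n_good = 0;
    for (size_t i = 0; i < n_signers; i++) {
        for (size_t j = 0; j < n_known; j++) {
            if (strcmp(signers[i], known[j]) == 0) {
                n_good++;
                break;
            }
        }
    }
    return n_good >= n_required;
}
```

With four known authorities the threshold is three, so the auth1/auth2/auth5 consensus carries only two countable signatures and is rejected by relays that never heard of auth5. The same rule is why an extra signature from an unrecognized authority is harmless.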

>  At this
> point something interesting and unexpected happened:
> 
> The other 2 DirAuths (not knowing about auth5) did not have a
> consensus.  This tricked dirvote_recalculate_timing into thinking we
> should use the TestingV3AuthInitialVotingInterval parameters, so they
> got out of sync with the other 3 DirAuths (that did know about auth5).
> That if/else statement seems very odd, and the parameters seem odd as
> well.  First off, I'm not clear what the parameters are intended to
> represent.  The man page says:
> 
> TestingV3AuthInitialVotingInterval N minutes|hours
>   Like V3AuthVotingInterval, but for initial voting interval before
> the first consensus has been created. Changing this requires that
> TestingTorNetwork is set. (Default: 30 minutes)
> TestingV3AuthInitialVoteDelay N minutes|hours
>   Like TestingV3AuthInitialVoteDelay, but for initial voting interval
> before the first consensus has been created. Changing this requires
> that TestingTorNetwork is set. (Default: 5 minutes)
> TestingV3AuthInitialDistDelay N minutes|hours
>   Like TestingV3AuthInitialDistDelay, but for initial voting interval
> before the first consensus has been created. Changing this requires
> that TestingTorNetwork is set. (Default: 5 minutes)

Basically, if you didn't make a consensus, you try to make one every
half hour rather than every hour, on the theory that the network should
recover faster if a lot of authorities are in this boat.
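A simplified sketch of that fallback, assuming interval boundaries are aligned to multiples of the interval counted from midnight UTC and ignoring any voting offset. The `schedule_t` struct and both functions are hypothetical; the 30-minute and 5-minute fallback values are the TestingV3AuthInitial* defaults quoted above.

```c
#include <time.h>

/* Hypothetical, simplified schedule (not tor's own struct). */
typedef struct {
    time_t voting_starts;
    time_t interval_starts;
} schedule_t;

/* Next boundary strictly after `now`, aligned to multiples of `interval`
 * seconds since midnight UTC. */
static time_t next_interval_start(time_t now, int interval)
{
    time_t midnight = now - (now % 86400);
    time_t since = now - midnight;
    return midnight + (since / interval + 1) * interval;
}

/* With a consensus, use its interval and delays; without one, fall back to
 * the TestingV3AuthInitial* defaults (30 min interval, 5 min delays). */
schedule_t recalc_schedule(time_t now, int have_consensus,
                           int consensus_interval, int vote_delay,
                           int dist_delay)
{
    int interval = have_consensus ? consensus_interval : 30 * 60;
    int vdelay = have_consensus ? vote_delay : 5 * 60;
    int ddelay = have_consensus ? dist_delay : 5 * 60;
    schedule_t s;
    s.interval_starts = next_interval_start(now, interval);
    s.voting_starts = s.interval_starts - ddelay - vdelay;
    return s;
}
```

At 00:16:40 UTC an authority holding an hourly consensus next votes toward 01:00, while one without a consensus votes toward 00:30. That is the kind of split Tom observed when two authorities lacked a consensus.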

> Notice that the first says "Like V3AuthVotingInterval", but the other
> two just repeat their name?

This was fixed in git commit c03cfc05, and I think the fix went into
Tor 0.2.4.13-alpha. What ancient version is your man page from?

>  And how there _is no_
> V3AuthInitialVotingInterval?  And that you can't modify these
> parameters without turning on TestingTorNetwork (despite the fact
> that they will be used without TestingTorNetwork?)  And also,
> unrelated to the naming, these parameters are a fallback case for when
> we don't have a consensus, but if they're not kept in sync with
> V3AuthVotingInterval and their kin - the DirAuth can wind up
> completely out of sync and be unable to recover (except by luck).

Yeah, don't mess with them unless you know what you're doing.

As for the confusing names, you're totally right:
https://trac.torproject.org/projects/tor/ticket/11967

> Other notes:
>  - I was annoyed by TestingAuthDirTimeToLearnReachability several
> times (as I refused to turn on TestingTorNetwork) - I wanted to
> override it. I thought maybe that should be an option, but ultimately
> convinced myself that in the event of a network reboot, the 30 minutes
> would likely still be needed.

Right. If you want to crank down this '30 minutes' value while
actually relying on the reachability tests, you will also need to
crank up the fraction of the network that gets tested on each call to
dirserv_test_reachability() -- i.e. REACHABILITY_MODULO_PER_TEST and
REACHABILITY_TEST_INTERVAL.
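The relationship between those two constants can be sketched as follows. The sketch assumes each call tests the relays whose identity digest's first byte, taken modulo REACHABILITY_MODULO_PER_TEST, equals a counter that advances once per call. The values 128 and 10 seconds are assumptions about tor of this era; check the source for your version.

```c
#include <stdint.h>

/* Assumed values; verify against your tor source. */
#define REACHABILITY_MODULO_PER_TEST 128
#define REACHABILITY_TEST_INTERVAL 10 /* seconds between calls */

/* Seconds until every relay has been tested once: one bucket per call. */
int full_sweep_seconds(int modulo, int interval_sec)
{
    return modulo * interval_sec;
}

/* Is the relay whose identity digest starts with `digest_first_byte`
 * tested on the call whose counter value is `ctr`? */
int tested_this_round(uint8_t digest_first_byte, int modulo, int ctr)
{
    return (digest_first_byte % modulo) == ctr;
}
```

Under those assumptions a full pass takes 1280 seconds, a bit over 21 minutes. That is why shortening the 30-minute wait alone isn't enough. With a smaller modulo, say 32, each call tests a larger slice of the network and the pass finishes in 320 seconds.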

>  - The Directory Authority information is a bit out of date.
> Specifically, I was most confused by V1 vs V2 vs V3 Directories.  I am
> not sure if the actual network's DirAuths set V1AuthoritativeDirectory
> or V2AuthoritativeDirectory - but I eventually convinced myself that
> only V3AuthoritativeDirectory was needed.

Correct. Can you submit a ticket to fix this, wherever you found it?
Assuming it wasn't from your ancient man page that is? :)

>  - It seems like an Authority will not vote for itself as an HSDir or
> Stable... but I couldn't find precisely where that was in the code.
> (It makes sense to not vote itself Stable, but I'm not sure why
> HSDir...)

I think this is a bug. Mostly a harmless one in practice, but it might
be surprising in a tiny test network.

>  - The networkstatus-bridges file is not included in the tor man page

Yep. Please file a ticket.

>  - I feel like the log message "Consensus includes unrecognized
> authority" (currently info) is worthy of being upgraded to notice.

I don't think this is wise -- it is fine to have a consensus that has
been signed by a newer authority than you know about, so long as it has
enough signatures from ones you do know about.

If we made this a notice, then every time we added a new authority,
all the users running stable would see scary-sounding log messages and
report them to us over and over.

>  - I wanted the https://consensus-health.torproject.org/ page for my
> network, but didn't want to run the java code, so I ported it to
> python.  This project is growing, and right now I've been editing
> consensus_health_checker.py as well.
> https://github.com/tomrittervg/doctor/commits/python-website  I have a
> few more TODOs for it (like download statistics), but it's coming
> along.

Neat! Karsten has been wanting to get rid of the consensus-health page
for a while now. Maybe you want to run the replacement?

> Finally, something I wanted to ask after was the idea of a node (an
> OR, not a client) belonging to two or more Tor networks.  From the POV
> of the node operator, I would see it as a node would add some config
> lines (maybe 'AdditionalDirServer' to add to, rather than redefining,
> the default DirServers), and it would upload its descriptors to those
> as well, fetch a consensus from all AdditionalDirServers, and allow
> connections from and to nodes in either.  I'm still reading through
> the code to see which areas would be particularly confusing in the
> context of multiple consensuses, but I thought I'd throw it out there.

This idea should work in theory. In fact, back when Ironkey was
running their own Tor network, I joked periodically about just
dumping the cached-descriptors file from their network into moria1's
cached-descriptors file. I think that by itself would have been sufficient
to add all of those relays into our Tor network.

We're slowly accumulating situations where we want all the relays to
know about all the relays (e.g. RefuseUnknownExits), but I don't think
the world ends when it isn't quite true.

Thanks!
--Roger


