Restricted entry (helper) nodes in 0.1.1.x

Tue Dec 20 10:48:53 UTC 2005

Hi folks,

I've gotten the codebase to the point that I'm going to start trying
to make helper nodes work well. With luck they will be on by default in
the final 0.1.1.x release.

For background on helper nodes, read
http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ#RestrictedEntry

First order of business: the phrase "helper node" sucks. We always have
to define it after we say it to somebody. Nick likes the phrase "contact
node", because they are your point-of-contact into the network. That is
better than phrases like "bridge node". The phrase "fixed entry node"
doesn't seem to work with non-math people, because they wonder what was
broken about it. I'm sort of partial to the phrase "entry node" or maybe
"restricted entry node". In any case, if you have ideas on names, please
mail me off-list and I'll collate them.

Right now the code exists to pick helper nodes, store our choices to
disk, and use them for our entry nodes. But there are three topics
to tackle before I'm comfortable turning them on by default. First,
how to handle churn: since Tor nodes are not always up, and sometimes
disappear forever, we need a plan for replacing missing helpers in a
safe way. Second, we need a way to distinguish "the network is down"
from "all my helpers are down", also in a safe way. Lastly, we need to
examine the situation where a client picks three crummy helper nodes
and is forever doomed to a lousy Tor experience. Here's my plan:

How to handle churn.
  - Keep track of whether you have ever actually established a
    connection to each helper. Any helper node in your list that you've
    never used is ok to drop immediately, and we never save such a
    helper to disk.
  - If all our helpers are down, we need more helper nodes: add a new
    one to the *end* of our list. Only remove dead ones when they have
    been gone for a very long time (months).
  - Pick from the first n (by default 3) helper nodes in your list
    that are up (according to the network-statuses) and reachable
    (according to your local firewall config).
    - This means that order matters when writing/reading them to disk.
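
To make the ordering rule concrete, here is a rough sketch in Python
(not Tor code; the Helper class, the function names, and the is_up /
is_reachable callbacks are all made up for illustration). It assumes
"up" comes from the network-statuses and "reachable" from the local
firewall config, as described above.

```python
class Helper:
    """One entry in the ordered helper list (hypothetical structure)."""
    def __init__(self, name, ever_connected=False):
        self.name = name
        # True once we have actually established a connection to it.
        self.ever_connected = ever_connected


def choose_entry_candidates(helpers, is_up, is_reachable, n=3):
    """Return the first n helpers, in list order, that are up and reachable."""
    usable = [h for h in helpers if is_up(h.name) and is_reachable(h.name)]
    return usable[:n]


def helpers_to_save(helpers):
    """Only helpers we have really used get written to disk, in list order."""
    return [h for h in helpers if h.ever_connected]


def add_helper(helpers, name):
    """New helpers go on the end, so older choices keep their priority."""
    helpers.append(Helper(name))
```

Because selection always walks the list from the front, a helper that
comes back up after an outage regains its old place automatically.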

How to deal with network down.
  - While all helpers are down/unreachable and there are no established
    or on-the-way testing circuits, launch a testing circuit. (Do this
    periodically in the same way we try to establish normal circuits
    when things are working normally.)
    (Testing circuits are a special type of circuit that streams won't
    attach to by accident.)
  - When a testing circuit succeeds, mark all helpers up and hold
    the testing circuit open.
  - If a connection to a helper succeeds, close all testing circuits.
    Else mark that helper down and try another.
  - If the last helper is marked down and we already have a testing
    circuit established, then add the first hop of that testing circuit
    to the end of our helper node list, close that testing circuit,
    and go back to square one. (Actually, rather than closing the
    testing circuit, can we get away with converting it to a normal
    circuit and beginning to use it immediately?)
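
The steps above amount to a small state machine. Here is a sketch of
it in Python, just to check that the rules fit together (this is not
Tor code; the class, its method names, and the string returned by
tick() are invented for illustration, and it simply closes the testing
circuit rather than converting it).

```python
class HelperState:
    """Tracks helper liveness and at most one testing circuit."""

    def __init__(self, helpers):
        self.helpers = list(helpers)   # ordered helper list
        self.down = set()              # helpers currently marked down
        self.testing_pending = False   # a testing circuit is on the way
        self.testing_first_hop = None  # first hop of an established one

    def all_down(self):
        return all(h in self.down for h in self.helpers)

    def tick(self):
        """Periodic check: launch a testing circuit if we need one."""
        if (self.all_down() and not self.testing_pending
                and self.testing_first_hop is None):
            self.testing_pending = True
            return "launch_testing_circuit"
        return None

    def testing_circuit_succeeded(self, first_hop):
        # The network is evidently reachable: mark all helpers up and
        # hold the testing circuit open while we retry them.
        self.testing_pending = False
        self.testing_first_hop = first_hop
        self.down.clear()

    def helper_connect_succeeded(self, helper):
        self.down.discard(helper)
        self.testing_first_hop = None  # close all testing circuits

    def helper_connect_failed(self, helper):
        self.down.add(helper)
        if self.all_down() and self.testing_first_hop is not None:
            # Every helper failed while the network was up: adopt the
            # testing circuit's first hop as a new helper, at the end.
            if self.testing_first_hop not in self.helpers:
                self.helpers.append(self.testing_first_hop)
            self.testing_first_hop = None
```

In this sketch, "go back to square one" falls out naturally: the newly
appended helper is not marked down, so the next connection attempt
goes to it.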

How to pick non-sucky helpers.
  - When we're picking new helper nodes, don't use ones which aren't
    reachable according to our local ReachableAddresses configuration.
  (There's an attack here: if I pick my helper nodes in a very
   restrictive environment, say "ReachableAddresses 18.0.0.0/255.0.0.0:*",
   then somebody watching me use the network from another location will
   guess where I first joined the network. But let's ignore it for now.)
  - Right now we choose new helpers just like we'd choose any entry
    node: they must be "stable" (claim >1day uptime) and "fast" (advertise
    >10kB capacity). In 0.1.1.11-alpha, clients let dirservers define
    "stable" and "fast" however they like, and they just believe them.
    So the next step is to make them a function of the current network:
    e.g. line up all the 'up' nodes in order and declare the top
    three-quarters to be stable, fast, etc, as long as they meet some
    minimum too.
  - If that's not sufficient (it won't be), dirservers should introduce
    a new status flag: in addition to "stable" and "fast", we should
    also describe certain nodes as "entry", meaning they are suitable
    to be chosen as a helper. The first difference would be that we'd
    demand the top half rather than the top three-quarters. Another
    requirement would be to look at "mean time between returning" to
    ensure that these nodes spend most of their time available. (Up for
    two days straight, once a month, is not good enough.)
  - Lastly, we need a function, given our current set of helpers and a
    directory of the rest of the network, that decides when our helper
    set has become "too crummy" and we need to add more. For example,
    this could be based on currently advertised capacity of each of
    our helpers, and it would also be based on the user's preferences
    of speed vs. security.
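
As a rough illustration of the "function of the current network" idea,
here is a Python sketch of how a dirserver might assign the flags (not
Tor code; the dict field names, the function names, and the choice of
one day and 10 kB as the minimums are assumptions on my part, and the
"mean time between returning" check is left out entirely).

```python
def top_fraction(nodes, key, fraction, minimum):
    """Names of nodes in the top `fraction` by `key` that meet `minimum`."""
    ranked = sorted(nodes, key=lambda n: n[key], reverse=True)
    keep = int(len(ranked) * fraction)
    return {n["name"] for n in ranked[:keep] if n[key] >= minimum}


def assign_flags(nodes, min_uptime=86400, min_bw=10_000):
    """Compute stable/fast/entry sets relative to the nodes that are up.

    min_uptime is in seconds and min_bw in bytes; both defaults are
    assumed values, not anything the dirservers actually use.
    """
    up = [n for n in nodes if n["up"]]
    stable = top_fraction(up, "uptime", 0.75, min_uptime)
    fast = top_fraction(up, "bandwidth", 0.75, min_bw)
    # "entry" demands the top half on both measures instead of the
    # top three-quarters.
    entry = (top_fraction(up, "uptime", 0.5, min_uptime)
             & top_fraction(up, "bandwidth", 0.5, min_bw))
    return {"stable": stable, "fast": fast, "entry": entry}
```

Requiring the top half on both measures at once is only one reading of
"the top half"; a combined score would be another.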

Thoughts? Guesses on what I've left out, or security problems with
the above plans?

--Roger