The fallback system is designed to gracefully degrade as fallback
directory mirrors fail. Failures shift load to directory authorities,
and cause brief delays during client bootstrap.

We expect the system to operate well, even if all the fallbacks have
failed. But we try to keep the fallback failure rate below 20-30%.
When the failure rate gets too high, we rebuild the fallback list.
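
The rebuild decision above can be sketched as a simple threshold check.
This is only an illustration: the function names are hypothetical, and
the 20% cut-off is an assumption taken from the lower end of the range
mentioned above, not a value used by any existing script.

```python
# Assumed threshold: the lower end of the 20-30% range mentioned above.
REBUILD_THRESHOLD = 0.20


def fallback_failure_rate(total, failed):
    """Return the fraction of fallbacks that have failed (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return failed / total


def needs_rebuild(total, failed, threshold=REBUILD_THRESHOLD):
    """Return True when the failure rate has reached the threshold."""
    return fallback_failure_rate(total, failed) >= threshold
```

For example, with 150 fallbacks of which 45 have failed, the failure
rate is 30%, so needs_rebuild(150, 45) is true.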

Regular Tasks

This ticket is the parent ticket for the next fallback rebuild:

This ticket contains the "offer list" changes that relay operators have
requested. I usually commit them all at once, but you should feel free
to do them incrementally:

Sometimes, we don't have enough relays on the offer list, and we have
to ask relay operators to opt in to the list. Ideally, we want at least
100 fallbacks; we usually have between 120 and 160.

Future Work

It's hard to verify changes to the offer list. Changes are usually sent by
email or through trac tickets. There's no reliable trust path from the
relay key to the email or ticket.

The opt-in process is also manual, and it can be time-consuming.

To resolve these issues, I had planned to add a signed fallback offer line
to relay descriptors:

Instead of checking the list in the fallback-scripts repository, the script
can check relay descriptors. (Or check both, during the transition
period.)
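
One way the transition could work is sketched below. Everything here is
hypothetical: the OfferSource modes, the function name, and the choice
to treat "check both" as the union of the two sources are assumptions
for illustration. Parsing and verifying the signed descriptor line is
left out entirely; the sketch starts from two sets of relay
fingerprints that have already been extracted.

```python
from enum import Enum


class OfferSource(Enum):
    """Hypothetical modes for where the script looks for offers."""
    LIST_ONLY = "list"              # current behaviour: repository list
    DESCRIPTOR_ONLY = "descriptor"  # after the transition
    BOTH = "both"                   # during the transition period


def offered_fallbacks(list_offers, descriptor_offers, source):
    """Return the set of relay fingerprints treated as having opted in.

    Assumption: during the transition, an offer from either source
    counts, so BOTH returns the union of the two sets.
    """
    list_offers = set(list_offers)
    descriptor_offers = set(descriptor_offers)
    if source is OfferSource.LIST_ONLY:
        return list_offers
    if source is OfferSource.DESCRIPTOR_ONLY:
        return descriptor_offers
    return list_offers | descriptor_offers
```

A stricter transition could require the two sources to agree (an
intersection), at the cost of dropping relays that have only updated
one of them.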

Unresolved Issues

Fallbacks eventually see the entire set of clients. Clients that are active
all the time may only ever contact one fallback. (Clients re-use the same
fallback for authority keys, and then switch to the consensus as soon as
possible.) But clients whose consensuses have expired will choose new
fallbacks at random.

Ideally, clients should select fallback (and maybe authority) guards.
That is, they should retry previously-selected fallbacks. There are some
tradeoffs here: a bad fallback guard can continue to manipulate its
client's view of the network. We can avoid this issue by selecting multiple
fallback guards.

Clients will need persistent state to remember their guards, so transient
systems like TAILS won't benefit from this change.
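
The fallback guard idea can be sketched as follows. This is not how Tor
clients behave today. The function names, the JSON state file, and the
default of three guards are all assumptions made for illustration. The
sketch keeps previously selected fallbacks while they remain on the
list, and tops up at random only when it has too few.

```python
import json
import random


def load_guards(state_path):
    """Return the previously saved guard list, or [] if there is no state."""
    try:
        with open(state_path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return []


def choose_fallback_guards(fallbacks, state_path, num_guards=3, rng=random):
    """Reuse saved guards that are still listed, then top up at random."""
    current = set(fallbacks)
    # Keep earlier choices that are still on the fallback list.
    guards = [g for g in load_guards(state_path) if g in current][:num_guards]
    # Fill any remaining slots with fallbacks not already chosen.
    candidates = sorted(current - set(guards))
    needed = min(num_guards - len(guards), len(candidates))
    guards += rng.sample(candidates, needed)
    # Persist the selection so the next run retries the same fallbacks.
    with open(state_path, "w") as f:
        json.dump(guards, f)
    return guards
```

On a system without persistent storage, the state file is missing at
every start, so each run falls through to a fresh random selection.
That is the same behaviour as having no guards at all.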

teor
