[tor-commits] [torspec/master] Four new proposals based on experiments with download size

nickm at torproject.org nickm at torproject.org
Fri Feb 24 16:26:31 UTC 2017


commit 2e5e0cb3f87f6813b789f09459daea6ebcaa4eb4
Author: Nick Mathewson <nickm at torproject.org>
Date:   Fri Feb 24 11:23:31 2017 -0500

    Four new proposals based on experiments with download size
---
 proposals/000-index.txt                      |   8 ++
 proposals/274-rotate-onion-keys-less.txt     | 113 +++++++++++++++++++++++++
 proposals/275-md-published-time-is-silly.txt | 119 +++++++++++++++++++++++++++
 proposals/276-lower-bw-granularity.txt       |  70 ++++++++++++++++
 proposals/277-detect-id-sharing.txt          |  59 +++++++++++++
 5 files changed, 369 insertions(+)

diff --git a/proposals/000-index.txt b/proposals/000-index.txt
index 4e400c8..d3a4100 100644
--- a/proposals/000-index.txt
+++ b/proposals/000-index.txt
@@ -194,6 +194,10 @@ Proposals by number:
 271  Another algorithm for guard selection [CLOSED]
 272  Listed routers should be Valid, Running, and treated as such [FINISHED]
 273  Exit relay pinning for web services [DRAFT]
+274  Rotate onion keys less frequently [OPEN]
+275  Stop including meaningful "published" time in microdescriptor consensus [OPEN]
+276  Report bandwidth with lower granularity in consensus documents [OPEN]
+277  Detect multiple relay instances running with same ID [OPEN]
 
 
 Proposals by status:
@@ -249,6 +253,10 @@ Proposals by status:
    256  Key revocation for relays and authorities
    261  AEZ for relay cryptography
    262  Re-keying live circuits with new cryptographic material
+   274  Rotate onion keys less frequently [for 0.3.1.x-alpha]
+   275  Stop including meaningful "published" time in microdescriptor consensus [for 0.3.1.x-alpha]
+   276  Report bandwidth with lower granularity in consensus documents [for 0.3.1.x-alpha]
+   277  Detect multiple relay instances running with same ID [for 0.3.??]
  ACCEPTED:
    140  Provide diffs between consensuses
    172  GETINFO controller option for circuit information
diff --git a/proposals/274-rotate-onion-keys-less.txt b/proposals/274-rotate-onion-keys-less.txt
new file mode 100644
index 0000000..0d61d5d
--- /dev/null
+++ b/proposals/274-rotate-onion-keys-less.txt
@@ -0,0 +1,113 @@
+Filename: 274-rotate-onion-keys-less.txt
+Title: Rotate onion keys less frequently.
+Author: Nick Mathewson
+Created: 20-Feb-2017
+Status: Open
+Target: 0.3.1.x-alpha
+
+1. Overview
+
+   This document proposes that, in order to limit the bandwidth needed
+   for microdescriptor listing and transmission, we reduce the onion key
+   rotation rate from the current value (7 days) to something closer to
+   28 days.
+
+   Doing this will reduce the total microdescriptor download volume
+   by approximately 70%.
+
+2. Motivation
+
+   Currently, clients must download a networkstatus consensus document
+   once an hour, and must download every unfamiliar microdescriptor
+   listed in that document.  Therefore, we can reduce client directory
+   bandwidth if we can cause microdescriptors to change less often.
+
+   Furthermore, we are planning (in proposal 140) to implement a
+   diff-based mechanism for clients to download only the parts of each
+   consensus that have changed.  If we do that, then by having the
+   microdescriptor for each router change less often, we can make these
+   consensus diffs smaller as well.
+
+3. Analysis
+
+   I analyzed microdescriptor changes over the month of January
+   2017, and found that 94.5% of all microdescriptor transitions
+   were changes in onion key alone.
+
+   Therefore, we could reduce the number of changed "m" lines in
+   consensus diffs by approximately 94.5% * (3/4) =~ 70%,
+   if we were to rotate onion keys one-fourth as often.
+
+   The number of microdescriptors to actually download should
+   decrease by a similar fraction.
+
+   This amounts to a significant reduction: currently, by
+   back-of-the-envelope estimates, an always-on client that downloads
+   all the directory info in a month downloads about 449 MB of compressed
+   consensuses and something around 97 MB of compressed
+   microdescriptors.  This proposal would save that user about 12% of
+   their total directory bandwidth.
+
+   If we assume that consensus diffs are implemented (see proposal 140),
+   then the user's compressed consensus downloads fall to something
+   closer to 27 MB.  Under that analysis, the microdescriptors will
+   dominate again at 97 MB -- so lowering the number of microdescriptors
+   to fetch would save more like 55% of the remaining bandwidth.
+
+   [Back-of-the-envelope technique: assume every consensus is
+   downloaded, and every microdesc is downloaded, and microdescs are
+   downloaded in groups of 61, which works out to a constant rate.]
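The arithmetic above can be checked with a quick sketch (the 94.5%, 449 MB, 97 MB, and 27 MB figures are the ones quoted in this section; everything else is back-of-the-envelope):

```python
# Back-of-the-envelope check of the savings estimates in this section.
onion_key_only = 0.945      # fraction of md transitions that are onion-key-only
rotation_cut = 3.0 / 4.0    # rotating 1/4 as often removes 3/4 of those changes

md_reduction = onion_key_only * rotation_cut        # ~= 0.71

consensus_mb, md_mb, diffed_mb = 449, 97, 27        # monthly, compressed
saved_mb = md_reduction * md_mb

share_today = saved_mb / (consensus_mb + md_mb)     # ~= 12% of all dir traffic
share_with_diffs = saved_mb / (diffed_mb + md_mb)   # ~= 55% of remaining traffic

print(round(md_reduction, 2), round(share_today, 3), round(share_with_diffs, 2))
```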
+
+   We'll need to do more analysis to assess the impact on clients that
+   connect to the network infrequently enough to miss microdescriptors:
+   nonetheless, the 70% figure above ought to apply to clients that connect
+   at least weekly.
+
+   (XXXX Better results pending feedback from ahf's analysis.)
+
+4. Security analysis
+
+   The onion key is used to authenticate a relay to a client when the
+   client is building a circuit through that relay.  The only reason to
+   limit their lifetime is to limit the impact if an attacker steals an
+   onion key without being detected.
+
+   If an attacker steals an onion key and is detected, the relay can
+   issue a new onion key ahead of schedule, with little disruption.
+
+   But if the onion key theft is _not_ detected, then the attacker
+   can use that onion key to impersonate the relay until clients
+   start using the relay's next key.  In order to do so, the
+   attacker must also impersonate the target relay at the link
+   layer: either by stealing the relay's link keys, which rotate
+   more frequently, or by compromising the previous relay in the
+   circuit.
+
+   Therefore, onion key rotation provides a small amount of
+   protection only against an attacker who can compromise relay keys
+   very intermittently, and who controls only a small portion of the
+   network.  Against an attacker who can steal keys regularly it
+   does little, and an attacker who controls a lot of the network
+   can already mount other attacks.
+
+5. Proposal
+
+   I propose that we move the default onion key rotation interval
+   from 7 days to 28 days, as follows.
+
+   There should be a new consensus parameter, "onion-key-rotation-days",
+   measuring the key lifetime in days.  Its minimum should be 1, its
+   maximum should be 90, and its default should be 28.
+
+   There should also be a new consensus parameter,
+   "onion-key-grace-period-days", measuring the interval for which
+   older onion keys should still be accepted.  Its minimum should be
+   1, its maximum should be onion-key-rotation-days, and its default
+   should be 7.
+
+   Every relay should list each onion key it generates for
+   onion-key-rotation-days days after generating it, and then
+   replace it.  Relays should continue to accept their most recent
+   previous onion key for an additional onion-key-grace-period-days
+   days after it is replaced.
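A minimal sketch of these lifetime rules (a hypothetical helper; the parameter names and defaults mirror the proposed consensus parameters):

```python
from datetime import datetime, timedelta

def onion_key_accepted(generated_at, now,
                       onion_key_rotation_days=28,
                       onion_key_grace_period_days=7):
    """True if a key generated at `generated_at` should still be accepted:
    it is listed for rotation_days, then honored for grace_period_days more."""
    cutoff = timedelta(days=onion_key_rotation_days
                       + onion_key_grace_period_days)
    return now - generated_at <= cutoff
```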
+
diff --git a/proposals/275-md-published-time-is-silly.txt b/proposals/275-md-published-time-is-silly.txt
new file mode 100644
index 0000000..b23e747
--- /dev/null
+++ b/proposals/275-md-published-time-is-silly.txt
@@ -0,0 +1,119 @@
+Filename: 275-md-published-time-is-silly.txt
+Title: Stop including meaningful "published" time in microdescriptor consensus
+Author: Nick Mathewson
+Created: 20-Feb-2017
+Status: Open
+Target: 0.3.1.x-alpha
+
+1. Overview
+
+   This document proposes that, in order to limit the bandwidth needed
+   for networkstatus diffs, we remove the "published" part of the "r" lines
+   in microdescriptor consensuses.
+
+   The more extreme, compatibility-breaking version of this idea will
+   reduce ed-style consensus diff download volume by approximately 55-75%.  A
+   less-extreme interim version would still reduce volume by
+   approximately 5-6%.
+
+2. Motivation
+
+   The current microdescriptor consensus "r" line format is:
+     r Nickname Identity Published IP ORPort DirPort
+   as in:
+     r moria1 lpXfw1/+uGEym58asExGOXAgzjE 2017-01-10 07:59:25 \
+        128.31.0.34 9101 9131
+
+   As I'll show below, there's not much use for the "Published" part
+   of these lines.  By omitting them or replacing them with
+   something more compressible, we can save space.
+
+   What's more, changes in the Published field are one of the most
+   frequent changes between successive networkstatus consensus
+   documents.  If we were to remove this field, then networkstatus diffs
+   (see proposal 140) would be smaller.
+
+3. Compatibility notes
+
+   Above I've talked about "removing" the published field.  But of
+   course, doing this would make all existing consensus consumers
+   stop parsing the consensus successfully.
+
+   Instead, let's look at how this field is used currently in Tor,
+   and see if we can replace the value with something else.
+
+      * Published is used in the voting process to decide which
+        descriptor should be considered.  But that value is taken
+        from vote networkstatus documents, not consensuses.
+
+      * Published is used in mark_my_descriptor_dirty_if_too_old()
+        to decide whether to upload a new router descriptor.  If the
+        published time in the consensus is more than 18 hours in the
+        past, we upload a new descriptor.  (Relays are potentially
+        looking at the microdesc consensus now, since #6769 was
+        merged in 0.3.0.1-alpha.)  Relays have plenty of other ways
+        to notice that they should upload new descriptors.
+
+      * Published is used in client_would_use_router() to decide
+        whether a routerstatus is one that we might possibly use.
+        We say that a routerstatus is not usable if its published
+        time is more than OLD_ROUTER_DESC_MAX_AGE (5 days) in the
+        past, or if it is not at least
+        TestingEstimatedDescriptorPropagationTime (10 minutes) in
+        the future. [***] Note that this is the only case where anything
+        is rejected because it comes from the future.
+
+          * client_would_use_router() decides whether we should
+            download a router descriptor (not a microdescriptor)
+            in routerlist.c
+
+          * client_would_use_router() is used from
+            count_usable_descriptors() to decide which relays are
+            potentially usable, thereby forming the denominator of
+            our "have descriptors / usable relays" fraction.
+
+   So we have fairly limited constraints on which Published values
+   we can safely advertise with today's Tor implementations.  If we
+   advertise anything more than 10 minutes in the future,
+   client_would_use_router() will consider routerstatuses unusable.
+   If we advertise anything more than 18 hours in the past, relays
+   will upload their descriptors far too often.
+
+4. Proposal
+
+   Immediately, in 0.2.9.x-stable (our LTS release series), we
+   should stop caring about published_on dates in the future.  This
+   is a two-line change.
+
+   As an interim solution: We should add a new consensus method number
+   that changes the process by which Published fields in consensuses are
+   generated.  Under it, all Published fields in the consensus should
+   be set to the same value.  That value should rotate every 15 hours:
+   take the consensus valid-after time and round it down to the
+   nearest multiple of 15 hours since the epoch.
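A sketch of the proposed computation, assuming the valid-after time is available as a UTC datetime:

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 15 * 3600  # 15 hours

def synthetic_published(valid_after):
    """Round the consensus valid-after time down to the nearest
    multiple of 15 hours since the Unix epoch."""
    ts = int(valid_after.replace(tzinfo=timezone.utc).timestamp())
    return datetime.fromtimestamp(ts - ts % BUCKET_SECONDS, tz=timezone.utc)
```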
+
+   As a longer-term solution: Once all Tor versions earlier than 0.2.9.x
+   are obsolete (in mid 2018), we can update with a new consensus
+   method, and set the published_on date to some safe time in the
+   future.
+
+5. Analysis
+
+   To consider the impact on consensus diffs: I analyzed consensus
+   changes over the month of January 2017, using scripts at [1].
+
+   With the interim solution in place, compressed diff sizes fell by
+   2-7% at all measured intervals except 12 hours, where they increased
+   by about 4%.  Savings of 5-6% were most typical.
+
+   With the longer-term solution in place, and all published times held
+   constant permanently, the compressed diff sizes were uniformly at
+   least 56% smaller.
+
+   With this in mind, I think we might want to only plan to support the
+   longer-term solution.
+
+    [1] https://github.com/nmathewson/consensus-diff-analysis
+
+
+
diff --git a/proposals/276-lower-bw-granularity.txt b/proposals/276-lower-bw-granularity.txt
new file mode 100644
index 0000000..4d3735c
--- /dev/null
+++ b/proposals/276-lower-bw-granularity.txt
@@ -0,0 +1,70 @@
+Filename: 276-lower-bw-granularity.txt
+Title: Report bandwidth with lower granularity in consensus documents
+Author: Nick Mathewson
+Created: 20-Feb-2017
+Status: Open
+Target: 0.3.1.x-alpha
+
+1. Overview
+
+   This document proposes that, in order to limit the bandwidth needed for
+   networkstatus diffs, we lower the granularity with which bandwidth is
+   reported in consensus documents.
+
+   Making this change will reduce the total compressed ed diff download
+   volume by around 10%.
+
+2. Motivation
+
+   Consensus documents currently report bandwidth values as the median
+   of the measured bandwidth values in the votes.  (Or as the median of
+   all votes' values if there are not enough measurements.)  And when
+   voting, in turn, authorities simply report whatever measured value
+   they most recently encountered, clipped to 3 significant base-10
+   figures.
+
+   This means that, from one consensus to the next, these weights change
+   very often and with little significance: a large fraction of
+   bandwidth transitions are under 2% in magnitude.
+
+   As we begin to use consensus diffs, each change will take space to
+   transmit.  So lowering the number of changes will lower client
+   bandwidth requirements significantly.
+
+3. Proposal
+
+   I propose that we round the bandwidth values, as they are placed in
+   the votes, to no more than two significant digits.  In addition, for
+   values beginning with decimal "2" through "4", we should round the
+   first two digits to the nearest multiple of 2.  For values beginning
+   with decimal "5" through "9", we should round to the nearest multiple
+   of 5.
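A sketch of that rounding rule, assuming integer bandwidth values as they appear in votes:

```python
def round_bandwidth(bw):
    """Clip to at most two significant digits, then snap the leading two
    digits to a multiple of 2 (first digit 2-4) or 5 (first digit 5-9)."""
    if bw < 10:
        return bw
    scale = 10 ** (len(str(bw)) - 2)
    lead = bw // scale              # leading two digits, 10..99
    if lead >= 50:
        lead = 5 * round(lead / 5)
    elif lead >= 20:
        lead = 2 * round(lead / 2)
    return lead * scale
```

No value moves by more than a few percent, consistent with the analysis below.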
+
+   This change does not require a consensus method; it will take effect
+   once enough authorities have upgraded.
+
+4. Analysis
+
+   The rounding proposed above will not round any value by more than
+   5%, so the overall impact on bandwidth balancing should be small.
+
+   In order to assess the bandwidth savings of this approach, I
+   smoothed the January 2017 consensus documents' Bandwidth fields,
+   using scripts from [1].  I found that if clients download
+   consensus diffs once an hour, they can expect 11-13% mean savings
+   after xz or gz compression.  For two-hour intervals, the savings
+   is 8-10%; for three-hour or four-hour intervals, the savings is
+   only 6-8%.  After that point, we start seeing diminishing returns,
+   with only 1-2% savings on a 72-hour interval's diff.
+
+    [1] https://github.com/nmathewson/consensus-diff-analysis
+
+5. Open questions
+
+   Is there a greedier smoothing algorithm that would produce better
+   results?
+
+   Is there any reason to think this amount of smoothing would not
+   be safe?
+
+   Would a time-aware smoothing mechanism work better?
diff --git a/proposals/277-detect-id-sharing.txt b/proposals/277-detect-id-sharing.txt
new file mode 100644
index 0000000..dee7f6e
--- /dev/null
+++ b/proposals/277-detect-id-sharing.txt
@@ -0,0 +1,59 @@
+Filename: 277-detect-id-sharing.txt
+Title: Detect multiple relay instances running with same ID.
+Author: Nick Mathewson
+Created: 20-Feb-2017
+Status: Open
+Target: 0.3.??
+
+1. Overview
+
+   This document proposes that we detect multiple relay instances running
+   with the same ID, and block them all, or block all but one of each.
+
+2. Motivation
+
+   While analyzing microdescriptor and relay status transitions (see
+   proposal XXXX), I found that something like 16/10631 router
+   identities from January 2017 were apparently shared by two or
+   more relays, based on their excessive number of onion key
+   transitions.  This is probably accidental: and if intentional,
+   it's probably not achieving whatever the relay operators
+   intended.
+
+   Sharing identities causes all the relays in question to "flip" back
+   and forth onto the network, depending on which one uploaded its
+   descriptor most recently.  One relay's address will be listed; and
+   so will that relay's onion key.  Routers connected to one of the
+   other relays will believe its identity, but be suspicious of its
+   address.  Attempts to extend to the relay will fail because of the
+   incorrect onion key.  No more than one of the relays' bandwidths will
+   actually get significant use.
+
+   So clearly, it would be best to prevent this.
+
+3. Proposal 1: relay-side detection
+
+   Relays should themselves try to detect whether another relay is using
+   its identity.  If a relay, while running, finds that it is listed in
+   a fresh consensus using an onion key other than its current or
+   previous onion key, it should tell its operator about the problem.
+
+   (This proposal borrows from Mike Perry's ideas related to key theft
+   detection.)
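As a sketch of this check (field names are illustrative, not Tor's actual data structures):

```python
def identity_is_shared(my_fingerprint, my_onion_keys, consensus_entries):
    """True if a fresh consensus lists our identity with an onion key we
    never generated -- a hint that another relay is using our ID."""
    for entry in consensus_entries:
        if (entry["fingerprint"] == my_fingerprint
                and entry["onion_key"] not in my_onion_keys):
            return True  # warn the operator
    return False
```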
+
+4. Proposal 2: offline detection
+
+   Any relay that has a large number of onion-key transitions over time,
+   but only a small number of distinct onion keys, is probably two or
+   more relays in conflict with one another.
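The offline heuristic could be sketched like this (the thresholds are illustrative assumptions, not part of the proposal):

```python
def probably_shared(onion_key_history, min_transitions=10):
    """Many key transitions but few distinct keys suggests two or more
    relays alternately uploading descriptors under one identity."""
    transitions = sum(a != b for a, b in
                      zip(onion_key_history, onion_key_history[1:]))
    distinct = len(set(onion_key_history))
    return transitions >= min_transitions and transitions > 2 * distinct
```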
+
+   In this case, the operators can be contacted, or the relay
+   blacklisted.
+
+   We could build support for blacklisting all but one of the addresses,
+   but it's probably best to treat this as a misconfiguration serious
+   enough that it needs to be resolved.
+
+
+
+


