[or-cvs] [metrics/master] Bring TODO list up-to-date.

karsten at seul.org
Thu Jul 2 17:27:52 UTC 2009


Author: Karsten Loesing <karsten.loesing at gmx.net>
Date: Thu, 2 Jul 2009 19:27:11 +0200
Subject: Bring TODO list up-to-date.
Commit: 3c2a632d509bfc0ee8c89b1105a3759e42a6f2eb

---
 TODO |  298 ++++++++++++++++++++++++++++++++++++------------------------------
 1 files changed, 164 insertions(+), 134 deletions(-)

diff --git a/TODO b/TODO
index 40bb0bd..63ea1f3 100644
--- a/TODO
+++ b/TODO
@@ -11,43 +11,38 @@ Legend:
 
 Tasks for July:
 
- - Client requests to directories
+ . Client requests to directories
 *  o May 31: Change timing so that geoip-stats are written at the end of
 *    24-hours periods; get patch into master.
    o Ask folks to run with ./configure --enable-geoip-stats
-   - Also count number of IPs and requests that cannot be resolved by the
-     GeoIP database as country '??'.
 *  o June 30: Analyze geoip-stats files from folks. Come up with a more
 *    automated way of analyzing these files than before.
+   . Also count number of IPs and requests that cannot be resolved by the
+     GeoIP database as country '??'.
+   - Determine v2/v3 share as average over complete period, not only at the
+     end of a period.
+   - Try to find out why some directories write empty geoip-stats files.
+   - Try to find out what happened to trusted on June 22.
+   - Add a remark that distributions of clients to countries in Figure 11
+     are from different measurement intervals.
+   - Fix bug that directories see more IP addresses than requests (like
+     moria5 did on the first day of measurements). If we don't count
+     requests, we shouldn't count IP addresses, either. But first of all,
+     why are we not counting requests?
    - Figure out why estimations are that far off. Where is the flaw in the
      math?
-   d Find out why some directories write empty geoip-stats files.
-   d Find out what happened to trusted on June 22.
-*  - June 30: Write proposal for including client request statistics in
-*    extra-info documents. Consider adding the GeoIP database version
-*    string, too. Make sure that we don't miss a day in case directories
-*    fail to upload their descriptor.
-   d Unskew v2 requests by finding out what fraction of requests results in
-     503 depending on the directory's advertised bandwidth.
-   d Figure out if there are better GeoIP databases available that focus
-     more on small countries and that are still affordable.
-   d Try to estimate the number of concurrent Tor users from active
-     circuits and the probability of clients picking a relay for their
-     circuits.
-   d Investigate the algorithm in global_write_bucket_low() that contains
-     the priorization of some directory requests over others. This
-     algorithm was written when v1 was popular and v2 was new. Do the
-     conditions in that function require an update?
-   d Consider using a dynamic IP database to determine how many users are
-     on dynamic IP addresses.
-   d Investigate assumption that 1 IP address is equivalent to 1 user;
-     consider dynamic IP addresses and NAT, too.
-   d Fix bug that directories see more IP addresses than requests (like
-     moria5 did on the first day of measurements).
-   d Determine v2/v3 share as average over complete period, not as end
-     value.
+   - Add numbers about failed directory requests and timeouts to the stats.
+   - Gather statistics about download times, so that we can infer the
+     distribution of available bandwidth at clients, possibly per country.
+     For a first estimate, measure transmission times of directory
+     downloads using Apache as proxy. In the long term, measure
+     transmission times directly in Tor to include clients downloading via
+     the ORport and include aborted downloads (with a threshold of 50% or
+     2 minutes?).
+   - Analyze directory requests by end of July which are running for about
+     two months by then.
 
- - Cells in circuit queues
+ . Cells in circuit queues
    o Consider circuits that were active in the past 15 minutes rather than
      only those that were closed recently; otherwise long-running curcits
      are not considered adequately.
@@ -60,23 +55,153 @@ Tasks for July:
      Java, and combine app-ward and exit-ward parts of a circuit to a
      single value in Java. Does this result in similar results produced by
      Java as by Tor?
-*  - May 31: Prepare buffer-stats patch for inclusion in master.
    . Run on gabelmoo and have some other folks run the patch; analyze
      results for "1.2, new circuit window sizes. make the default package
      window lower" and "2.1, squeeze loud circuits" in performance roadmap.
+*  - May 31: Prepare buffer-stats patch for inclusion in master.
 *  - June 30: Decide if buffer statistics should be measured in the future
 *    in aggregate form.
-   d Write proposal to include measured network data in extra-info
+
+ . Clients connecting to entry nodes
+*  . May 31: Implement entry-guard statistics in Tor.
+   . Run patch on gabelmoo and a few other places.
+   - Get patch into master.
+*  - June 30: Decide if statistics should be measured in the future in
+*    aggregate form.
+
+ . Exiting traffic by port
+   o Write outgoing traffic by port to logs.
+   o Include patch in master.
+   - Ask 1 or 2 volunteers suggested by Roger to run the patch.
+*  - May 31: Run on one of Jake's nodes.
+   - Evaluate data together with Steven.
+*  - June 30: Decide if statistics should be measured in the future in
+*    aggregate form.
+
+ . Measure throughput
+   o Evaluate torperf run and see if such runs should be performed
+     regularly on gabelmoo.
+   o Handle timeouts better by catching INT signals and writing an error
+     message to stdout.
+   - Set up second torperf installation somewhere, possibly on moria.
+
+ . Directory archives
+   . Set up dedicated database server somewhere; goals are better
+     performance for complex queries and shared access for Steven et al.
+
+---------------------------------------------------------------------------
+
+Tasks for August:
+
+ - PETS 2009
+   - Prepare talk.
+*  - Aug 5-7: Attend conference, plus 4 days travelling.
+
+ - HAR 2009
+*  - Aug 13-16: Attend conference, plus 2 days travelling.
+
+ - Exiting traffic by port
+   - Write proposal to include measured network data in extra-info
+     documents.
+
+ - Alternative requirements for flags
+   - Display actual MTBF/WFU requirements for weakened requirements.
+     Evaluation has finished. Look at results and include them in report.
+*  - June 30: Write proposal for weakened requirements for being a Guard;
+*    see TODO.022.
+
+ - Client requests to directories
+*  - June 30: Write proposal for including client request statistics in
+*    extra-info documents. Consider adding the GeoIP database version
+*    string, too. Make sure that we don't miss a day in case directories
+*    fail to upload their descriptor.
+
+ - Cells in circuit queues
+   - Write proposal to include measured network data in extra-info
      documents.
-   d Extend statistics to medians, and 1st/9th deciles; problematic as
+
+ - Clients connecting to entry nodes
+   - Write proposal to include measured network data in extra-info
+     documents.
+
+---------------------------------------------------------------------------
+
+Tasks for September:
+
+ - Work organization
+   - Add medium and low priority items from Roger's performance mail to
+     this list, too. This list contains only the two high priority items.
+
+ - Directory archives
+   - Look at entropy of directory over the years. Right now the relay
+     choices are not uniform. It is way more likely that clients choose
+     fast relays than slow ones. If we re-normalize it, what is the
+     equivalent number of uniformly-weighted relays in the network?
+     mikeperry has some equations for this in his torflow, but it would be
+     interesting to see whether that number is going up over time, and how
+     it compares to number-of-relays and amount-of-bandwidth.
+   - How many of the German relays that have disappeared in 2008 were set
+     up at the end of 2007?
+   - Is the (major) reason for disappearing nodes in France in mid-2008
+     that OVH stopped supporting Tor relay operation?
+   - Examine bandwidth-per-relay ratios for various countries. Do changes
+     in bandwidth per country result from a few or a lot of relays joining
+     or leaving?
+   - Investigate very old Tor versions. Do these nodes have their contact
+     info set? A possible explanation for these nodes not being updated is
+     that they might run on nodes without knowledge of their owners.
+   - Compare descriptors collected on gabelmoo with those collected by
+     tor26. What fraction of descriptors is missing? Is it worth combining
+     both archives?
+   - Investigate whether the loss of German relays in 2008 was due to the
+     pervasive dynamic IP reachability testing bugs. How?
+   - Compare observed/history bandwidth by time of day to see if traffic is
+     underutilized at night and saturated during the day.
+   - For comparison of relays on dynamic IP addresses, don't count relays
+     that were up for only a short time; consider using a dynamic IP
+     database.
+   - Consider recording bandwidth usage on relays by putting 1 random
+     second of every 15-minute interval into extra-info documents, rather
+     than the sum of transported bytes. Suggestion by Roger/Steven.
+
+ - Tor exit list
+   - Permit queries whether a certain IP was an exit for a certain target
+     at a certain time.
+
+ - Client requests to directories
+   - Why do authorities (at least moria1 and moria2) see such a high
+     request-to-address ratio? Shouldn't clients ask at most once? A
+     possible explanation is that people are running Tor in a way where
+     their cache doesn't survive, maybe old-school Torpark variants or
+     something. Another explanation are people running relays that aren't
+     reachable so aren't ignored in the geoip stats. Further investigate.
+   - Unskew v2 requests by finding out what fraction of requests results in
+     503 depending on the directory's advertised bandwidth.
+   - Figure out if there are better GeoIP databases available that focus
+     more on small countries and that are still affordable.
+   - Try to estimate the number of concurrent Tor users from active
+     circuits and the probability of clients picking a relay for their
+     circuits. This only requires that we know how many circuits users
+     build on average. Hmm.
+   - Investigate the algorithm in global_write_bucket_low() that contains
+     the priorization of some directory requests over others. This
+     algorithm was written when v1 was popular and v2 was new. Do the
+     conditions in that function require an update?
+   - Consider using a dynamic IP database to determine how many users are
+     on dynamic IP addresses.
+   - Investigate assumption that 1 IP address is equivalent to 1 user;
+     consider dynamic IP addresses and NAT, too.
+
+ - Cells in circuit queues
+   - Extend statistics to medians, and 1st/9th deciles; problematic as
      these statistics require keeping more history on Tor relays.
-   d Also extend statistics to outbuffer sizes.
-   d Investigate classification of circuits on a relay: Do most circuits
+   - Also extend statistics to outbuffer sizes.
+   - Investigate classification of circuits on a relay: Do most circuits
      stay inactive, but a few become active, send their cells, become
      inactive, get new cells and become active, and keep oscillating? Or
      are there active circuits that just stay active for seconds at a time
      because they cannot clear their queue?
-   d Investigate timing of circuits flushing their queues: For relays that
+   - Investigate timing of circuits flushing their queues: For relays that
      rate limit, what fraction of each second do they spend with empty
      write buckets? The theory from earlier analyses is that for most
      relays that rate limit, they have a full second's worth of data queued
@@ -87,33 +212,12 @@ Tasks for July:
      reducing the granularity of the token bucket refills, so it sends
      bytes more regularly throughout the second; but first the theory needs
      to be confirmed.
-   d Another theory is that some relays refuse to read from a relay for
+   - Another theory is that some relays refuse to read from a relay for
      a period of multiple seconds. Can this be confirmed by the
      measurements?
-   d Instrument edge streams and how they add cells to their circuits, and
+   - Instrument edge streams and how they add cells to their circuits, and
      how they flush them on the socks side.
 
- - Clients connecting to entry nodes
-*  . May 31: Implement entry-guard statistics in Tor.
-*  - June 30: Decide if statistics should be measured in the future in
-*    aggregate form.
-   d Write proposal to include measured network data in extra-info
-     documents.
-
- . Exiting traffic by port
-   o Write outgoing traffic by port to logs.
-*  - May 31: Run on one of Jake's nodes, compare the results.
-*  - June 30: Decide if statistics should be measured in the future in
-*    aggregate form.
-   d Write proposal to include measured network data in extra-info
-     documents.
-
- - Alternative requirements for flags
-   - Display actual MTBF/WFU requirements for weakened requirements.
-     Evaluation has finished. Look at results and include them in report.
-*  - June 30: Write proposal for weakened requirements for being a Guard;
-*    see TODO.022.
-
  - Directory archives
 *  - June 30: Do guards that have had the guard flag for a long time (weeks
 *    or months) have more load than guards that just got their guard flag?
@@ -121,17 +225,6 @@ Tasks for July:
 *    the time a relay spent in the network with the Guard flag. (see 4.5 in
 *    performance roadmap)
 
- o Measure throughput
-   o Evaluate torperf run and see if such runs should be performed
-     regularly on gabelmoo.
-   o Handle timeouts better by catching INT signals and writing an error
-     message to stdout.
-   d Set up second installation somewhere.
-
- - Work organization
-   - Add medium and low priority items from Roger's performance mail to
-     this list, too. This list contains only the two high priority items.
-
  - Bridge archives
 *  - July 31: Investigate bridge churn to determine how many bridges users
 *    need.
@@ -155,7 +248,7 @@ Tasks for July:
  - Measure throughput
 *  - June 30: Evaluate speedracer results.
 *  - June 30: Passively measure throughput in Tor clients when configured.
-   d Improve usability so that non-developer users in countries like
+   - Improve usability so that non-developer users in countries like
      Tunesia can measure throughput themselves. This can be speedracer,
      torperf, or some other tool. Consider implementing as Vidalia plugin
      once the plugin infrastructure is in place.
@@ -178,73 +271,10 @@ Tasks for July:
 *    circuit window, and b) because Tor has on-average-better circuits
 *    based on Mike Perry's plans.
 
- - Client requests to directories
-   - Analyze directory requests running for more than two months by then.
-
----------------------------------------------------------------------------
-
-Tasks for August:
-
- - PETS 2009
-   - Prepare talk.
-*  - Aug 5-7: Attend conference, plus 4 days travelling.
-
- - HAR 2009
-*  - Aug 13-16: Attend conference, plus 2 days travelling.
-
- d Metrics portal
+ - Metrics portal
    - Write down architecture for TorStatus extension.
    - Implement extensions.
    - Load directory archives into MySQL database and optimize database
      schema so that evaluations are executed quickly.
 *  - August 31: Set up extended TorStatus.
 
----------------------------------------------------------------------------
-
-Tasks for September:
-
- - Directory archives
-   d Look at entropy of directory over the years. Right now the relay
-     choices are not uniform. It is way more likely that clients choose
-     fast relays than slow ones. If we re-normalize it, what is the
-     equivalent number of uniformly-weighted relays in the network?
-     mikeperry has some equations for this in his torflow, but it would be
-     interesting to see whether that number is going up over time, and how
-     it compares to number-of-relays and amount-of-bandwidth.
-   - How many of the German relays that have disappeared in 2008 were set
-     up at the end of 2007?
-   - Is the (major) reason for disappearing nodes in France in mid-2008
-     that OVH stopped supporting Tor relay operation?
-   - Examine bandwidth-per-relay ratios for various countries. Do changes
-     in bandwidth per country result from a few or a lot of relays joining
-     or leaving?
-   - Investigate very old Tor versions. Do these nodes have their contact
-     info set? A possible explanation for these nodes not being updated is
-     that they might run on nodes without knowledge of their owners.
-   - Compare descriptors collected on gabelmoo with those collected by
-     tor26. What fraction of descriptors is missing? Is it worth combining
-     both archives?
-   - Set up dedicated database server somewhere; goals are better
-     performance for complex queries and shared access for Steven et al.
-   - Investigate whether the loss of German relays in 2008 was due to the
-     pervasive dynamic IP reachability testing bugs. How?
-   - Compare observed/history bandwidth by time of day to see if traffic is
-     underutilized at night and saturated during the day.
-   - For comparison of relays on dynamic IP addresses, don't count relays
-     that were up for only a short time; consider using a dynamic IP
-     database.
-   - Consider recording bandwidth usage on relays by putting 1 random
-     second of every 15-minute interval into extra-info documents, rather
-     than the sum of transported bytes. Suggestion by Roger/Steven.
-
- - Client bandwidths
-   . For a first estimate, measure transmission times of directory
-     downloads using Apache as proxy.
-   - Measure transmission times directly in Tor to include clients
-     downloading via the ORport and to include aborted downloads (with a
-     threshold of 50% or 2 minutes?).
-
- - Tor exit list
-   - Permit queries whether a certain IP was an exit for a certain target
-     at a certain time.
-
-- 
1.5.6.5


