[tor-commits] [stem/master] Rewriting the descriptor tutorial

atagar at torproject.org atagar at torproject.org
Mon Mar 4 08:17:57 UTC 2013


commit a86d250b6702f7dae9ed74778a6e0a5cb2f9358a
Author: Damian Johnson <atagar at torproject.org>
Date:   Sun Mar 3 23:11:30 2013 -0800

    Rewriting the descriptor tutorial
    
    Replacing the 'Mirror, Mirror' tutorial with a new one that gives a better
    overview of the various descriptors and how to get/use them. This keeps the old
    example (listing the fastest exits), but otherwise is a full rewrite.
---
 docs/tutorial/mirror_mirror_on_the_wall.rst   |  167 +++++++++++++++++++------
 docs/tutorial/the_little_relay_that_could.rst |    2 +-
 2 files changed, 132 insertions(+), 37 deletions(-)

diff --git a/docs/tutorial/mirror_mirror_on_the_wall.rst b/docs/tutorial/mirror_mirror_on_the_wall.rst
index d9a4dd8..3679ec1 100644
--- a/docs/tutorial/mirror_mirror_on_the_wall.rst
+++ b/docs/tutorial/mirror_mirror_on_the_wall.rst
@@ -1,47 +1,159 @@
 Mirror Mirror on the Wall
--------------------------
+=========================
 
-A script that tells us our contributed bandwidth is neat and all, but now let's figure out who the *biggest* exit relays are.
+* :ref:`what-is-a-descriptor`
+* :ref:`where-can-i-get-the-current-descriptors`
+* :ref:`where-can-i-get-past-descriptors`
+* :ref:`putting-it-together`
 
-Information about the Tor relay network come from documents called **descriptors**. Descriptors can come from a few things...
+.. _what-is-a-descriptor:
 
-1. The Tor control port with GETINFO options like **desc/all-recent** and **ns/all**.
-2. Files in Tor's data directory, like **cached-descriptors** and **cached-consensus**.
-3. The descriptor archive on `Tor's metrics site <https://metrics.torproject.org/data.html>`_.
+What is a descriptor?
+---------------------
 
-We've already used the control port, so for this example we'll use the cached files directly. First locate Tor's data directory. If your torrc has a DataDirectory line then that's the spot. If not then check Tor's man page for the default location.
+Tor is made up of two parts: the application and a distributed network of a few
+thousand volunteer relays. Information about these relays is public, and is
+published in documents called **descriptors**.
 
-Tor has several descriptor types. For bandwidth information we'll go to the server descriptors, which are located in the **cached-descriptors** file. These have somewhat infrequently changing information published by the relays themselves.
+There are several different kinds of descriptors, the most common ones being...
 
-To read this file we'll use the :class:`~stem.descriptor.reader.DescriptorReader`, a class designed to read descriptor files. The **cached-descriptors** is full of server descriptors, so the reader will provide us with :class:`~stem.descriptor.server_descriptor.RelayDescriptor` instances (a :class:`~stem.descriptor.server_descriptor.ServerDescriptor` subclass for relays).
+====================================================================== ===========
+Descriptor Type                                                        Description
+====================================================================== ===========
+`Server Descriptor <../api/descriptor/server_descriptor.html>`_        Information that relays publish about themselves. Tor clients once downloaded this information, but now they use microdescriptors instead.
+`ExtraInfo Descriptor <../api/descriptor/extrainfo_descriptor.html>`_  Relay information that tor clients do not need in order to function. This is self-published, like server descriptors, but not downloaded by default.
+`Microdescriptor <../api/descriptor/microdescriptor.html>`_            Minimalistic document that just includes the information necessary for tor clients to work.
+`Network Status Document <../api/descriptor/networkstatus.html>`_      Though tor relays are decentralized, the directories that track the overall network are not. These central points are called **directory authorities**, and every hour they publish a document called a **consensus** (aka, network status document). The consensus in turn is made up of **router status entries**.
+`Router Status Entry <../api/descriptor/router_status_entry.html>`_    Relay information provided by the directory authorities including flags, heuristics used for relay selection, etc.
+====================================================================== ===========
+
+.. _where-can-i-get-the-current-descriptors:
+
+Where can I get the current descriptors?
+----------------------------------------
+
+To work, tor needs to have up-to-date information about relays within the
+network. As such, getting current descriptors is easy: *just run tor*.
+
+Tor only gets the descriptors that it needs by default, so if you're scripting
+against tor you may want to set some of the following in your `torrc
+<https://www.torproject.org/docs/faq.html.en#torrc>`_. Keep in mind that these
+add a small burden to the network, so don't set them in a widely distributed
+application. And, of course, please consider `running tor as a relay
+<https://www.torproject.org/docs/tor-doc-relay.html.en>`_ so you give back to
+the network!
+
+::
+
+  # Descriptors have a range of time during which they're valid. To get the
+  # most recent descriptor information, regardless of whether tor needs it,
+  # set the following.
+
+  FetchDirInfoExtraEarly 1
+
+  # If you aren't actively using tor as a client then tor will eventually stop
+  # downloading descriptor information that it doesn't need. To prevent this
+  # from happening set...
+
+  FetchUselessDescriptors 1
+
+  # Tor no longer downloads server descriptors by default, opting for
+  # microdescriptors instead. If you want tor to download server descriptors
+  # then set...
+
+  UseMicrodescriptors 0
+
+  # Tor doesn't need extrainfo descriptors to work. If you want tor to download
+  # them anyway then set...
+
+  DownloadExtraInfo 1
+
+Now that tor is happily chugging along, up-to-date descriptors are available
+through tor's control socket...
+
+::
+
+  from stem.control import Controller
+
+  with Controller.from_port(control_port = 9051) as controller:
+    controller.authenticate()
+
+    for desc in controller.get_network_statuses():
+      print "found relay %s (%s)" % (desc.nickname, desc.fingerprint)
+
+... or by reading directly from tor's data directory...
+
+::
+
+  from stem.descriptor import parse_file
+
+  for desc in parse_file("/home/atagar/.tor/cached-consensus"):
+    print "found relay %s (%s)" % (desc.nickname, desc.fingerprint)
+
+.. _where-can-i-get-past-descriptors:
+
+Where can I get past descriptors?
+---------------------------------
+
+Descriptor archives are available on `Tor's metrics site
+<https://metrics.torproject.org/data.html>`_. These archives can be read with
+the `DescriptorReader <api/descriptor/reader.html>`_...
 
 ::
 
-  import sys
   from stem.descriptor.reader import DescriptorReader
+
+  with DescriptorReader(["/home/atagar/server-descriptors-2013-03.tar"]) as reader:
+    for desc in reader:
+      print "found relay %s (%s)" % (desc.nickname, desc.fingerprint)
+
+.. _putting-it-together:
+
+Putting it together...
+----------------------
+
+As discussed above there are three methods for reading descriptors...
+
+* With the :class:`~stem.control.Controller` via methods like :func:`~stem.control.Controller.get_server_descriptors` and :func:`~stem.control.Controller.get_network_statuses`.
+* By reading the file directly with :func:`~stem.descriptor.__init__.parse_file`.
+* Reading with the `DescriptorReader <api/descriptor/reader.html>`_. This is best if you want to read everything from a directory or archive.
+
+Now let's say you want to figure out who the *biggest* exit relays are. You
+could use any of the methods above, but for this example we'll use the
+:class:`~stem.control.Controller`. This uses server descriptors, so keep in
+mind that you'll likely need to set "UseMicrodescriptors 0" in your torrc for
+this to work.
+
+::
+
+  import sys
+
+  from stem.control import Controller
   from stem.util import str_tools
-  
+
   # provides a mapping of observed bandwidth to the relay nicknames
   def get_bw_to_relay():
     bw_to_relay = {}
-    
-    with DescriptorReader(["/home/atagar/.tor/cached-descriptors"]) as reader:
-      for desc in reader:
+
+    with Controller.from_port(control_port = 9051) as controller:
+      controller.authenticate()
+
+      for desc in controller.get_server_descriptors():
         if desc.exit_policy.is_exiting_allowed():
           bw_to_relay.setdefault(desc.observed_bandwidth, []).append(desc.nickname)
-    
+
     return bw_to_relay
-  
+
   # prints the top fifteen relays
-  
+
   bw_to_relay = get_bw_to_relay()
   count = 1
-  
+
   for bw_value in sorted(bw_to_relay.keys(), reverse = True):
     for nickname in bw_to_relay[bw_value]:
       print "%i. %s (%s/s)" % (count, nickname, str_tools.get_size_label(bw_value, 2))
       count += 1
-      
+
       if count > 15:
         sys.exit()
 
@@ -64,20 +176,3 @@ To read this file we'll use the :class:`~stem.descriptor.reader.DescriptorReader
   14. politkovskaja2 (24.93 MB/s)
   15. wau (24.72 MB/s)
 
-This can be easily done through the controller too...
-
-::
-
-  def get_bw_to_relay():
-    bw_to_relay = {}
-    
-    with Controller.from_port(control_port = 9051) as controller:
-      controller.authenticate()
-      
-      for desc in controller.get_server_descriptors():
-        if desc.exit_policy.is_exiting_allowed():
-          bw_to_relay.setdefault(desc.observed_bandwidth, []).append(desc.nickname)
-    
-    return bw_to_relay
-
-
diff --git a/docs/tutorial/the_little_relay_that_could.rst b/docs/tutorial/the_little_relay_that_could.rst
index e3f42d0..96216d1 100644
--- a/docs/tutorial/the_little_relay_that_could.rst
+++ b/docs/tutorial/the_little_relay_that_could.rst
@@ -1,5 +1,5 @@
 The Little Relay that Could
----------------------------
+===========================
 
 Let's say you just set up your very first `Tor relay
 <https://www.torproject.org/docs/tor-doc-relay.html.en>`_ (thank you!), and now
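
A note on the ranking step in the rewritten tutorial: the example groups relay nicknames by observed bandwidth, walks the bandwidths from largest to smallest, and stops after fifteen entries by calling sys.exit(). The same logic can be sketched without a running tor instance, which may make it easier to experiment with. This is only an illustration under assumptions: top_relays() and the sample list are hypothetical names and made-up values, not part of stem, and the slice replaces the sys.exit() call so the function can be reused.

```python
from collections import defaultdict


def top_relays(relays, limit=15):
    """Return up to `limit` (nickname, bandwidth) pairs, fastest first.

    `relays` is an iterable of (nickname, observed_bandwidth) pairs. This
    helper is hypothetical and only mirrors the tutorial's grouping logic.
    """
    # Group nicknames by bandwidth, like the tutorial's setdefault() call.
    bw_to_relay = defaultdict(list)
    for nickname, bandwidth in relays:
        bw_to_relay[bandwidth].append(nickname)

    # Flatten from the highest bandwidth down, keeping insertion order
    # among relays that share the same value.
    ranked = [
        (nickname, bw_value)
        for bw_value in sorted(bw_to_relay, reverse=True)
        for nickname in bw_to_relay[bw_value]
    ]
    return ranked[:limit]


# Made-up sample data, not real relay measurements.
sample = [("alpha", 300), ("beta", 500), ("gamma", 300), ("delta", 100)]

for position, (nickname, bandwidth) in enumerate(top_relays(sample, limit=3), 1):
    print("%i. %s (%i B/s)" % (position, nickname, bandwidth))
```

With real data, the pairs would presumably come from the nickname and observed_bandwidth attributes of the server descriptors that pass the exit policy check in the tutorial.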




