Proposal: GETINFO controller option for connection information

Tue Jul 27 14:05:09 UTC 2010

Hi, I've made another addition to the get-info proposal for a piece of
information Sebastian pointed out at the tor dev meeting. It consists of a
GETINFO option and corresponding events for daily aggregated statistics for
the ports we're exiting to (only for exit relays). This would use the
feature we already have for the torrc option:

ExitPortStatistics
When this option is enabled, Tor writes statistics on the number of relayed
bytes and opened streams per exit port to disk every 24 hours. Cannot be
changed while Tor is running. (Default: 0)

Cheers! -Damian

On Sat, Jul 10, 2010 at 2:43 PM, Damian Johnson <atagar1 at gmail.com> wrote:

> Here's an alternate proposal...
>>
>
> Yup, that removes a lot of ambiguity and works perfectly for this purpose,
> so I've changed it. Thanks!
>
>
> But fetching our own descriptor is kind of needless; we generated it
>> ourself, after all.  Is it not accessible via the control port?  We do
>> *have* it; it's a static variable in router.c.  What am I missing?
>>
>
> I'm not aware of it being available via GETINFO, with the exception of
> "desc/id/<my fingerprint>", which I assume gives a stale version of our own
> descriptor if we're not fetching new descriptors.
>
> Regardless, let's drop any addition concerning descriptors. With the
> addition of micro-descriptors, any usage I have for them will soon be
> irrelevant. Currently I'm just using descriptors for two purposes:
>
> - Our own descriptor for the observed bandwidth if...
>     ... our tor version is too old, lacking consensus entries with the
> measured bandwidth (a more meaningful stat)
>     ... or to figure out what the measured bandwidth stat represents. This
> usage is just a hack and will go away with:
> https://trac.torproject.org/projects/tor/ticket/1690
>
> - Descriptors of other relays for the exit policy, platform, and tor
> version. Micro-descriptors are providing the first of these, and the rest
> hardly warrant downloading descriptors unless specifically requested.
>
> In other words, I'll soon drop usage of descriptors except for fallback
> behavior in old tor versions... for which any new GETINFO options are
> useless anyway ;)
>
> Cheers! -Damian
>
>
> On Fri, Jul 9, 2010 at 6:38 PM, Nick Mathewson <nickm at freehaven.net> wrote:
>
>> On Tue, Jun 29, 2010 at 11:00 AM, Damian Johnson <atagar1 at gmail.com>
>> wrote:
>> >> Hm.  But we don't necessarily know this.  Our "are we client-facing"
>> >> tests are approximate, not certain, and the only way to tell whether
>> >> we're intermediate or exiting is to wait and see if we're told to
>> >> exit.
>> >
>> > Understood that client-facing tests fail by design when dealing with
>> > bridges. In the case of the outbound connection component it sounds like
>> > we'll need to wait until we're either asked to extend the circuit or exit
>> > before counting it as an 'established circuit' and reporting it.
>>
>> Or we could drop the false notion that "middle" or "exit" or "entry"
>> make a partition of established relay or_circuits. (They aren't a
>> partition because: first, they don't cover all or_circuits. A circuit
>> that has just been extended to us can't be called a middle or an exit.
>> Second, they aren't exclusive: a circuit that has been extended from
>> us and used as an exit can be called both a middle _and_ an exit.)
>> See below for another possibility.
>>
>> >> In fact, the leaky-pipe topology means that we're potentially
>> >> intermediate _and_ exiting on a single circuit.
>> >
>> > (To save other readers the googling, this means that clients can exit the
>> > circuit prematurely, such as at a middle node if the exit policy permits
>> > it.)
>> >
>> > You're right, that would mess with the classification. If/when this is
>> > implemented I'd suggest adding another classification for middle hops when
>> > they're first used to exit traffic.
>>
>> It *IS* implemented server-side.  The clients just don't use it.
>>
>> (Since the point of this design is to safely expose accurate circuit
>> status info via the control port, we might as well try to expose the
>> possible states of current circuits, including states that don't
>> typically occur.  Otherwise, if some future weird client started using
>> them, you'd need to upgrade Tor *and* arm to get an accurate report.)
>>
>> > The goal of these type flags is to indicate to controllers which circuits
>> > are sensitive and which are less so. In arm, for instance, most information
>> > for client/exit connections is scrubbed. Indicating via a change of the
>> > circuit's status (an UPDATE event) when it begins exiting traffic seems
>> > good enough to me.
>>
>> Here's an alternate proposal.  The idea of type flags is good, but
>> instead of the ones in your proposal, let's only use circuit type
>> flags that have an unambiguous meaning from the point of view of Tor's
>> spec and implementation.
>>
>> For instance,
>>   (E)ntry : a connection from a node that doesn't appear to be a Tor server.
>>   E(X)it : has been used for at least one exit stream
>>   (R)elay : has been extended.
>>   Rende(Z)vous : is being used for a rendezvous point
>>   (I)ntroduction : is being used for a hidden service introduction
>>   (N)one of the above: none of the above have happened yet.
>>
>> These all have nice, clear-cut, easy-to-evaluate meanings, some of
>> which are mutually exclusive, and some of which aren't.
>>
>>  [...]
>> >> It sounds like it should be a torrc option saying "Don't stop
>> >> refetching descriptors when there's no network activity."  Actually,
>> >> do we have one of those already?
>> >
>> > Yup, we have FetchUselessDescriptors. However, setting this causes an extra
>> > load on the directory authorities, hence the desire to be able to do this a
>> > bit more selectively (for instance just fetching our own descriptor every
>> > hour but letting the rest go stale).
>>
>> But fetching our own descriptor is kind of needless; we generated it
>> ourself, after all.  Is it not accessible via the control port?  We do
>> *have* it; it's a static variable in router.c.  What am I missing?
>>
>> yrs,
>> --
>> Nick
>>
>
>
-------------- next part --------------
Filename: xxx-circ-getinfo-option.txt
Title: GETINFO controller option for circuit information
Author: Damian Johnson
Created: 03-June-2010
Status: Draft

Overview:

    This details an additional GETINFO option that would provide information
    concerning a relay's current circuits.

Motivation:

    The original proposal was for connection-related information, but Jake
    made two excellent points about any connection information retrieved from
    the control port:

      1. It is completely ineffectual for auditing purposes, since either (a)
      these results can already be fetched from netstat or (b) the information
      would only be provided via tor and can't be validated.

      2. The more useful applications of connection information can be
      achieved with much less (and safer) information.

    Hence the proposal is now for circuit-based rather than connection-based
    information. This would strip the most controversial and sensitive data
    entirely (IP addresses, ports, and per-connection bandwidth breakdowns)
    while still being useful for the following purposes:

    - Basic Relay Usage Questions
    How is the bandwidth I'm contributing broken down? Is it being evenly
    distributed or is someone hogging most of it? Do these circuits belong to
    the hidden service I'm running or something else? Now that I'm using exit
    policy X, am I desirable as an exit, or are most people just using me as a
    relay?

    - Debugging
    Say a relay operator has a restrictive firewall policy for outbound
    connections with the ORPort whitelisted, but doesn't realize that tor needs
    to reach random high ports. Tor would report success ("your orport is
    reachable - excellent") yet the relay would be nonfunctional. This proposed
    information would reveal numerous RELAY -> YOU -> UNESTABLISHED circuits,
    giving a good indicator of what's wrong.

    - Visualization
    A nice benefit of visualizing tor's behavior is that it becomes a helpful
    tool in puzzling out how tor works. For instance, tor spawns numerous
    client connections at startup (even if unused as a client). When I was new
    to tor, these asymmetric (outbound only) connections mystified me for quite
    a while until Roger explained their use to me. The proposed TYPE flags
    would let controllers clearly label them as client related, making their
    purpose a bit clearer.

    At the moment connection data can only be retrieved via commands like
    netstat, ss, and lsof. However, providing an alternative via the control
    port offers several advantages:

      - scrubbing for private data
          Raw connection data has no notion of what's sensitive and what is
          not. The relay's flags and cached consensus can be used to take
          educated guesses concerning which connections could possibly belong
          to client or exit traffic, but this is both difficult and inaccurate.
          Anything provided via the control port can be scrubbed to make sure
          we aren't providing anything we think relay operators should not see.

      - additional information
          All connection querying commands provide only the IP address and
          port of connections, and nothing else. However, for the uses listed
          above the far more interesting attributes are the circuit's type,
          bandwidth usage and uptime.

      - improved performance
          Querying connection data is an expensive activity, especially for
          busy relays or low-end processors (such as mobile devices). Tor
          already internally knows its circuits, allowing for vastly quicker
          lookups.

      - cross platform capability
          The connection querying utilities mentioned above not only aren't
          all available under Windows, but also differ widely among *nix
          platforms. FreeBSD in particular takes a unique approach, dropping
          important options from netstat and assigning ss to a spreadsheet
          application instead. A controller interface, however, would provide
          a uniform means of retrieving this information.

Security Implications:

    This is an open question. This proposal omits the most controversial pieces
    of information (IP addresses and ports), but insight into any potential
    threats it would pose would be very welcome!

Specification:

   The following addition would be made to the control-spec's GETINFO section:

  "rcirc/id/<Circuit identity>" -- Provides entry for the associated relay
    circuit, formatted as:
      CIRC_ID=<circuit ID> CREATED=<timestamp> UPDATED=<timestamp> TYPE=<flags>
        READ=<bytes> WRITE=<bytes>

    None of the parameters contain whitespace, and additional results must be
    ignored to allow for future expansion. Parameters are defined as follows:
      CIRC_ID - Unique numeric identifier for the circuit this belongs to.
      CREATED - Unix timestamp (as seconds since the Epoch) for when the
          circuit was created.
      UPDATED - Unix timestamp for when this information was last updated.
      TYPE - Single character flags indicating attributes of the circuit:
          (E)ntry : has a connection that doesn't belong to a known Tor server,
            indicating that this is either the first hop or bridged
          E(X)it : has been used for at least one exit stream
          (R)elay : has been extended
          Rende(Z)vous : is being used for a rendezvous point
          (I)ntroduction : is being used for a hidden service introduction
          (N)one of the above: none of the above have happened yet.
      READ - Total bytes transmitted toward the exit over the circuit.
      WRITE - Total bytes transmitted toward the client over the circuit.

  "rcirc/all" -- The 'rcirc/id/*' output for all current circuits, joined by
    newlines.
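    As a rough illustration of how a controller might consume these entries,
    the Python sketch below parses 'rcirc/id/*' and 'rcirc/all' output and
    answers the "is someone hogging my bandwidth?" question from the
    Motivation section. It assumes that multiple TYPE flags are concatenated
    into one token (e.g. TYPE=RX), which the format above doesn't spell out,
    and its class and function names are hypothetical, not part of any
    existing controller library.

```python
from dataclasses import dataclass

# Letters from the TYPE definition above. Assumption: several flags are
# concatenated into one token, e.g. TYPE=RX.
TYPE_FLAGS = {
    "E": "entry",
    "X": "exit",
    "R": "relay",
    "Z": "rendezvous",
    "I": "introduction",
    "N": "none",
}


@dataclass
class RelayCircuit:
    circ_id: int
    created: int  # Unix timestamp
    updated: int  # Unix timestamp
    flags: set
    read: int     # bytes toward the exit
    write: int    # bytes toward the client


def parse_rcirc_entry(line: str) -> RelayCircuit:
    """Parse one 'rcirc/id/*' entry of space-separated KEY=VALUE pairs."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    # Keys and flag letters we don't know are skipped, so entries from a
    # newer Tor with extra results still parse.
    flags = {TYPE_FLAGS[c] for c in fields["TYPE"] if c in TYPE_FLAGS}
    return RelayCircuit(
        circ_id=int(fields["CIRC_ID"]),
        created=int(fields["CREATED"]),
        updated=int(fields["UPDATED"]),
        flags=flags,
        read=int(fields["READ"]),
        write=int(fields["WRITE"]),
    )


def parse_rcirc_all(text: str) -> list:
    """Parse 'rcirc/all' output: one entry per line, blank lines skipped."""
    return [parse_rcirc_entry(line) for line in text.splitlines() if line.strip()]


def bandwidth_shares(circuits: list) -> dict:
    """Fraction of all relayed bytes (READ + WRITE) carried by each circuit."""
    total = sum(c.read + c.write for c in circuits)
    if total == 0:
        return {c.circ_id: 0.0 for c in circuits}
    return {c.circ_id: (c.read + c.write) / total for c in circuits}
```

    Because unrecognized keys are never looked up, an entry carrying extra
    parameters still parses, which is the forward-compatibility behavior the
    specification asks controllers for.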

   The following event would be added to report relay circuit updates.

4.1.X. Relay circuit status changed

  The syntax is:
     "650" SP "RCIRC" SP CircID SP Notice [SP Created SP Updated SP Type SP
          Read SP Write] CRLF

     Notice =
            "NEW"    / ; first information being provided for this circuit
            "UPDATE" / ; update for a previously reported circuit
            "CLOSED"   ; notice that the circuit no longer exists

  Notice indicating that queryable information on a relay-related circuit has
  changed. If the Notice parameter is either "NEW" or "UPDATE", then this
  provides the same fields that would be given by calling "GETINFO rcirc/id/"
  with the CircID.
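  A controller subscribed to RCIRC events could keep a local table of relay
  circuits current without repeatedly polling 'rcirc/all'. The Python sketch
  below assumes that the optional fields are sent as KEY=VALUE pairs in the
  same form as the GETINFO output (the syntax above lists them positionally,
  so this is an assumption), and its names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RcircEvent:
    circ_id: str
    notice: str
    # Optional KEY=VALUE fields; expected for NEW and UPDATE notices only.
    fields: dict = field(default_factory=dict)


def parse_rcirc_event(line: str) -> RcircEvent:
    """Parse a '650 RCIRC <CircID> <Notice> [...]' event line."""
    tokens = line.rstrip("\r\n").split()
    if len(tokens) < 4 or tokens[0] != "650" or tokens[1] != "RCIRC":
        raise ValueError("not an RCIRC event: %r" % line)
    fields = {}
    for token in tokens[4:]:
        key, sep, value = token.partition("=")
        if sep:  # skip anything that isn't KEY=VALUE
            fields[key] = value
    return RcircEvent(circ_id=tokens[2], notice=tokens[3], fields=fields)


def apply_event(table: dict, event: RcircEvent) -> None:
    """Update a {circ_id: fields} table from one event."""
    if event.notice == "CLOSED":
        table.pop(event.circ_id, None)
    elif event.notice in ("NEW", "UPDATE"):
        table[event.circ_id] = event.fields
    # Unrecognized notices are ignored so a newer Tor doesn't break us.
```

  Ignoring unknown notices rather than raising mirrors the "additional
  results must be ignored" rule given for the GETINFO output.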
-------------- next part --------------
Filename: xxx-getinfo-option-expansion.txt
Title: GETINFO Option Expansion
Author: Damian Johnson
Created: 02-June-2010
Status: Draft

Overview:

    Over the course of developing arm there have been numerous hacks and
    workarounds to glean pieces of basic, desirable information about the tor
    process. As per Roger's request I've compiled a list of these pain points
    to try and improve the control protocol interface.

Motivation:

    The purpose of this proposal is to expose additional process and relay
    related information that is currently unavailable in a convenient,
    dependable, and/or platform independent way. Examples of this are...

      - The relay's total contributed bandwidth. This is a highly requested
        piece of information and, based on the following patch from pipe, looks
        trivial to include.
        http://www.mail-archive.com/or-talk@freehaven.net/msg13085.html

      - The process ID of the tor process. There is a high degree of guesswork
        in obtaining this. Arm, for instance, uses pidof, netstat, and ps yet
        still fails on some platforms, and Orbot recently got a ticket about
        its own attempt to fetch it with ps:
        https://trac.torproject.org/projects/tor/ticket/1388

    This just includes the pieces of missing information I've noticed
    (suggestions, or questions about their usefulness, are welcome!).

Security Implications:

    None that I'm aware of. From a security standpoint this seems fairly
    innocuous.

Specification:

    The following addition would be made to the control-spec's GETINFO section:

    "relay/bw-limit" -- Effective relayed bandwidth limit.

    "relay/burst-limit" -- Effective relayed burst limit.

    "relay/read-total" -- Total bytes relayed (download).

    "relay/write-total" -- Total bytes relayed (upload).

    "relay/flags" -- Space separated listing of flags currently held by the
    relay as reported by the currently cached consensus.

    "process/user" -- Username under which the tor process is running,
    providing an empty string if none exists.

    "process/pid" -- Process id belonging to the main tor process, -1 if none
    exists for the platform.

    "process/uptime" -- Total uptime of the tor process (in seconds).

    "process/uptime-reset" -- Time since last reset (startup, sighup, or RELOAD
    signal, in seconds).

    "process/descriptors-used" -- Count of file descriptors used.

    "process/descriptor-limit" -- File descriptor limit (getrlimit results).

    "ns/authority" -- Router status info (v2 directory style) for all
    recognized directory authorities, joined by newlines.

    "state/names" -- A space-separated list of all the keys supported by this
    version of Tor's state.

    "state/val/<key>" -- Provides the current state value belonging to the
    given key. If undefined, this provides the key's default value.

    "status/ports-seen" -- A summary of which ports we've seen connections
    made to by exiting circuits recently, formatted the same as the EXITS_SEEN
    status event described in Section 4.1.XX. This GETINFO option is currently
    available only for exit relays.
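    All of the keys above would come back through the existing GETINFO reply
    format (single-line "250-key=value" entries, multi-line "250+key=" entries
    whose data ends with a lone ".", then "250 OK"), so a controller would
    need no new parsing logic for them. The Python sketch below builds one
    request for several of the proposed keys and parses a reply; the reply
    text in its test is made up for illustration, and the function names are
    hypothetical.

```python
def build_getinfo(keys) -> str:
    """Build a GETINFO command line requesting several keys at once."""
    return "GETINFO " + " ".join(keys) + "\r\n"


def parse_getinfo_reply(reply: str) -> dict:
    """Parse a GETINFO reply into a {key: value} dict.

    Handles single-line '250-key=value' entries and multi-line
    '250+key=' entries whose data ends with a line holding only '.'.
    """
    lines = reply.split("\r\n")
    result = {}
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "250 OK":
            return result
        if len(line) < 4 or not line.startswith("250"):
            raise ValueError("unexpected reply line: %r" % line)
        kind = line[3]
        key, _, value = line[4:].partition("=")
        i += 1
        if kind == "-":
            result[key] = value
        elif kind == "+":
            data = []
            while i < len(lines) and lines[i] != ".":
                # Undo the extra '.' prepended to data lines starting with '.'.
                data.append(lines[i][1:] if lines[i].startswith(".") else lines[i])
                i += 1
            i += 1  # skip the terminating '.'
            result[key] = "\n".join(data)
        else:
            raise ValueError("unexpected reply line: %r" % line)
    raise ValueError("reply ended without '250 OK'")
```

    A multi-valued key such as "rcirc/all" (entries joined by newlines) would
    arrive in the "250+" form, while scalar keys like "relay/read-total" or
    "process/pid" fit on a single "250-" line.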

4.1.XX. Per-port exit stats

  The syntax is:
     "650" SP "EXITS_SEEN" SP TimeStarted SP PortSummary CRLF

  We just generated a new summary of which ports we've seen exiting circuits
  connecting to recently. The controller could display this for the user, e.g.
  in their "relay" configuration window, to give them a sense of how they're
  being used (popularity of the various ports they exit to). Currently only
  exit relays will receive this event.

  TimeStarted is a quoted string indicating when the reported summary
  counts from (in GMT).

  The PortSummary keyword has as its argument a comma-separated, possibly
  empty set of "port=count" pairs. For example (without linebreak),
  650-EXITS_SEEN TimeStarted="2008-12-25 23:50:43"
  PortSummary=80=16,443=8
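
  To show how little a controller needs in order to display this summary, the
  Python sketch below parses an EXITS_SEEN line into a start time and a
  {port: count} mapping. It accepts both the "650 " form from the syntax line
  and the "650-" form used in the example, treats TimeStarted as UTC (GMT),
  and ignores anything after PortSummary; the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Accept "650 EXITS_SEEN" (syntax line) and "650-EXITS_SEEN" (example).
# No trailing anchor, so extra keywords after PortSummary are ignored.
_EXITS_SEEN = re.compile(
    r'^650[ -]EXITS_SEEN TimeStarted="([^"]*)" PortSummary=(\S*)'
)


def parse_exits_seen(line: str):
    """Return (start time as an aware UTC datetime, {port: count})."""
    match = _EXITS_SEEN.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not an EXITS_SEEN event: %r" % line)
    started = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    counts = {}
    summary = match.group(2)
    if summary:  # the pair list may be empty
        for pair in summary.split(","):
            port, _, count = pair.partition("=")
            counts[int(port)] = int(count)
    return started, counts
```

  A "relay" configuration window could then list the most popular exit ports
  with something like sorted(counts.items(), key=lambda kv: kv[1],
  reverse=True).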