Proposal: GETINFO controller option for connection information

Damian Johnson atagar1 at gmail.com
Thu Jun 24 05:34:58 UTC 2010


Hi Nick. Thanks for the comments!

> * IN_TYPE/OUT_TYPE talk about the type of an inbound/outbound
> "connection."  Do you mean circuits, or connections on the circuits?
> Either way I'm confused.  For example, a control connection is never
> attached to a circuit at all.
>

Yea, that isn't really appropriate and was making the spec messier than it
needed to be. Replaced with a single TYPE parameter to indicate the
placement in the circuit (guard/bridge, relay, exit, or one-hop in case
they're allowing them).

This was a bad attempt to shoehorn certain connection information I thought
would be interesting, such as:
- control connections
- client circuits
- directory mirroring
- hidden service hosting

so users could tell the bandwidth usage on these sorts of connections.
However, while interesting this really has nothing to do with acting as a
relay, so dropped. Oh well... maybe in another proposal some day...

Dropped the split from the bandwidth measurements too so they're simpler.
Also, do you think something like this notice...
"These represent logical circuits, not necessarily network resources (which
might be shared between circuits via connection multiplexing)."
would help, or not?

> * The spec addition isn't clear whether we mean all circuits, OR
> circuits, or what.  I believe it's "all circuits", but it should say
> so.
>

The "circ/all" entry already said "all current circuits" so I'm guessing you
meant the inclusiveness of "circ/id/*". Added 'relay' to the description,
but I'm not quite sure what sort of clarification you're looking for here.

> * IN_TYPE/OUT_TYPE are specified as being an empty string if there are
> no connections.  That's kind of fragile for parsing, since it would
> mean that users could no longer fold multiple spaces into one like
> they can for all the other control protocol formats .
>

No longer relevant with the changes above.

> * There's nothing specifying that "all" is not a valid identifier, so
> "circ/all" is ambiguous. To be consistent with the rest of the getinfo
> formats, let's say that the GETINFO key for a single circuit is
> circ/id/<circid>, and the GETINFO for all circuits is circ/all.
>

Good point! Changed. Also specified that the circuit ID is numeric.

> * You clarified in your email that circ/all is a list of data in the
> form given by circ/<ID> , but you didn't make that clarification in
> your spec.
>

Changed (not sure about the wording though...).

> * There's no way to get notifications about new OR circuits, so any
> program that wants to keep track of OR circuit state will need to
> repeatedly poll circ/all.  Won't that be expensive on a busy server?
> You wouldn't want to do it, say, once-per-second.  For consistency
> with the rest of the controller protocol, OR circuit state changes
> would probably want to be some kind of event.
>

Very good idea! Added an update parameter to entries so staleness can be
tracked. Also added an event for updates to the information.

Now on to the other proposal...

> * By "relay/flags", what do you mean?  The flags held by the relay in
> the current consensus, or something else?
>

Yup. Clarified the entry.

> * On "desc/time", why is it important to know the latest time we
> fetched a server descriptor?
>

There's some interesting information here such as your relay's observed
bandwidth. However, there's no way of telling how stale the information
provided by "desc/id/*" is, and since most relays stop fetching descriptors
after a time it's often hideously ancient.

What I'd really like is an option to manually refresh descriptors (both
individually and as a batch call). However, since I was aiming to keep this
proposal limited to GETINFO options it seemed inappropriate. Do you have any
thoughts on this sort of functionality? What section of the control-spec
would it go under?

> * On "desc/time", 'unix timestamp' is ambiguous; do you mean 'a
> decimal integer expressing the current time in seconds since the Unix
> epoch' or what?  I think the only other places in the control protocol
> that express dates do so in ISO8601 format (YYYY-MM-DD HH:MM:SS,
> relative to UTC).
>

Ick! Why use that ISO8601 format? Seconds are much simpler and more easily
comparable.

Yes, I meant the decimal integer. Added clarification there and to the other
spec.
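
For what it's worth, converting between the two representations is cheap on
the controller side. A minimal Python sketch, assuming the ISO8601 form is the
"YYYY-MM-DD HH:MM:SS" UTC layout mentioned above (the function names are made
up for illustration and aren't part of any spec or library):

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%d %H:%M:%S"  # layout quoted above, always relative to UTC


def iso_to_epoch(stamp):
    """Convert 'YYYY-MM-DD HH:MM:SS' (UTC) to integer seconds since the Epoch."""
    parsed = datetime.strptime(stamp, ISO_FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def epoch_to_iso(seconds):
    """Convert integer seconds since the Epoch to 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(ISO_FORMAT)
```

With integer seconds, staleness is a plain subtraction, which is the main
reason I prefer them here.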

> * On "process/user", the spec needs to say what you mean by the
> "user".  On Unix, is it a UID or a username?  What is it on windows?
> The spec needs to say.  (You can't just have the controller tell from
> context; on Unix at least, every valid decimal UID is also a valid
> username.  If process/user tells me '0', am I running as root, or as a
> user named "0"?)
>

Specified that it's the username. Doesn't windows xp on up have users now?
If not, it's an empty string.

> * Also on process/user: If you meant "username", then there  should
> indeed be an option to get all the *id stuff discussed upthread.
>

I'm sure this is a dumb question but... why? I've never had use for the
numeric uid, let alone the three other varieties Jake mentioned (actually,
never heard of them before...). If you can think of use cases in which
they're useful then by all means include them. However, I don't particularly
care about that information.

... that said, I can see the appeal of including them simply for
completeness.

> * Probably the spec needs to clarify that "process/pid" is the pid of
> the _main_ Tor process, but that Tor might launch other processes from
> time to time and you shouldn't be surprised if it does.
>

Done. A pox upon the complexities of identifying the damn process...

> * Is "process/uptime-reset" affected by the SIGNAL command from the
> controller?  The spec should  say.
>

Done.

Thanks again! -Damian

On Wed, Jun 23, 2010 at 10:28 AM, Nick Mathewson <nickm at freehaven.net> wrote:

> On Fri, Jun 4, 2010 at 12:10 AM, Damian Johnson <atagar1 at gmail.com> wrote:
> > Hi Nick, thanks for the feedback!
>
> Hi, Damian!  We're almost there, I think, with just a few points to
> clarify.
>
> On xxx-circ-getinfo-option.txt:
>
> * The spec addition isn't clear whether we mean all circuits, OR
> circuits, or what.  I believe it's "all circuits", but it should say
> so.
>
> * IN_TYPE/OUT_TYPE talk about the type of an inbound/outbound
> "connection."  Do you mean circuits, or connections on the circuits?
> Either way I'm confused.  For example, a control connection is never
> attached to a circuit at all.
>
> * IN_TYPE/OUT_TYPE are specified as being an empty string if there are
> no connections.  That's kind of fragile for parsing, since it would
> mean that users could no longer fold multiple spaces into one like
> they can for all the other control protocol formats .
>
> * There's nothing specifying that "all" is not a valid identifier, so
> "circ/all" is ambiguous. To be consistent with the rest of the getinfo
> formats, let's say that the GETINFO key for a single circuit is
> circ/id/<circid>, and the GETINFO for all circuits is circ/all.
>
> * You clarified in your email that circ/all is a list of data in the
> form given by circ/<ID> , but you didn't make that clarification in
> your spec.
>
> * There's no way to get notifications about new OR circuits, so any
> program that wants to keep track of OR circuit state will need to
> repeatedly poll circ/all.  Won't that be expensive on a busy server?
> You wouldn't want to do it, say, once-per-second.  For consistency
> with the rest of the controller protocol, OR circuit state changes
> would probably want to be some kind of event.
>
>
> On  xxx-getinfo-option-expansion.txt :
>
> * By "relay/flags", what do you mean?  The flags held by the relay in
> the current consensus, or something else?
>
> * On "desc/time", why is it important to know the latest time we
> fetched a server descriptor?
>
> * On "desc/time", 'unix timestamp' is ambiguous; do you mean 'a
> decimal integer expressing the current time in seconds since the Unix
> epoch' or what?  I think the only other places in the control protocol
> that express dates do so in ISO8601 format (YYYY-MM-DD HH:MM:SS,
> relative to UTC).
>
> * On "process/user", the spec needs to say what you mean by the
> "user".  On Unix, is it a UID or a username?  What is it on windows?
> The spec needs to say.  (You can't just have the controller tell from
> context; on Unix at least, every valid decimal UID is also a valid
> username.  If process/user tells me '0', am I running as root, or as a
> user named "0"?)
>
> * Also on process/user: If you meant "username", then there  should
> indeed be an option to get all the *id stuff discussed upthread.
>
> * Probably the spec needs to clarify that "process/pid" is the pid of
> the _main_ Tor process, but that Tor might launch other processes from
> time to time and you shouldn't be surprised if it does.
>
> * Is "process/uptime-reset" affected by the SIGNAL command from the
> controller?  The spec should  say.
>
> yrs,
> --
> Nick
>
-------------- next part --------------
Filename: xxx-circ-getinfo-option.txt
Title: GETINFO controller option for circuit information
Author: Damian Johnson
Created: 03-June-2010
Status: Draft

Overview:

    This details an additional GETINFO option that would provide information
    concerning a relay's current circuits.

Motivation:

    The original proposal was for connection related information, but Jake made
    the excellent point that any information retrieved from the control port
    is...
    
      1. completely ineffectual for auditing purposes since either (a) these
      results can be fetched from netstat already or (b) the information would
      only be provided via tor and can't be validated.
      
      2. The more useful uses for connection information can be achieved with
      much less (and safer) information.
    
    Hence the proposal is now for circuit based rather than connection based
    information. This would strip the most controversial and sensitive data
    entirely (ip addresses, ports, and connection based bandwidth breakdowns)
    while still being useful for the following purposes:

    - Basic Relay Usage Questions
    How is the bandwidth I'm contributing broken down? Is it being evenly
    distributed or is someone hogging most of it? Do these circuits belong to
    the hidden service I'm running or something else? Now that I'm using exit
    policy X am I desirable as an exit, or are most people just using me as a
    relay?

    - Debugging
    Say a relay has a restrictive firewall policy for outbound connections,
    with the ORPort whitelisted but doesn't realize that tor needs random high
    ports. Tor would report success ("your orport is reachable - excellent")
    yet the relay would be nonfunctional. This proposed information would
    reveal numerous RELAY -> YOU -> UNESTABLISHED circuits, giving a good
    indicator of what's wrong.

    - Visualization
    A nice benefit of visualizing tor's behavior is that it becomes a helpful
    tool in puzzling out how tor works. For instance, tor spawns numerous
    client connections at startup (even if unused as a client). As a newcomer
    to tor these asymmetric (outbound only) connections mystified me for quite
    a while until Roger explained their use to me. The proposed
    TYPE_FLAGS would let controllers clearly label them as being client
    related, making their purpose a bit clearer.

    At the moment connection data can only be retrieved via commands like
    netstat, ss, and lsof. However, providing an alternative via the control
    port provides several advantages:

      - scrubbing for private data
          Raw connection data has no notion of what's sensitive and what is
          not. The relay's flags and cached consensus can be used to take
          educated guesses concerning which connections could possibly belong
          to client or exit traffic, but this is both difficult and inaccurate.
          Anything provided via the control port can be scrubbed to make sure we
          aren't providing anything we think relay operators should not see.
     
      - additional information
          All connection querying commands strictly provide the ip address and
          port of connections, and nothing else. However, for the uses listed
          above the far more interesting attributes are the circuit's type,
          bandwidth usage and uptime.
     
      - improved performance
          Querying connection data is an expensive activity, especially for
          busy relays or low end processors (such as mobile devices). Tor
          already internally knows its circuits, allowing for vastly quicker
          lookups.
     
      - cross platform capability
          The connection querying utilities mentioned above not only aren't
          available under Windows, but differ widely among different *nix
          platforms. FreeBSD in particular takes a very unique approach,
          dropping important options from netstat and assigning ss to a
          spreadsheet application instead. A controller interface, however,
          would provide a uniform means of retrieving this information.

Security Implications:

    This is an open question. This proposal lacks the most controversial pieces
    of information (ip addresses and ports), and insight into potential threats
    this would pose would be very welcome!

Specification:

   The following addition would be made to the control-spec's GETINFO section:

  "circ/id/<Circuit identity>" -- Provides entry for the associated relay
    circuit, formatted as:
      CIRC_ID CREATED UPDATED TYPE READ WRITE

    None of the parameters contain whitespace, and additional results must be
    ignored to allow for future expansion. Parameters are defined as follows:
      CIRC_ID - Unique numeric identifier for the circuit this belongs to.
      CREATED - Unix timestamp (as seconds since the Epoch) for when the
          circuit was created.
      UPDATED - Unix timestamp for when this information was last updated.
      TYPE - Single character flag indicating the positioning in the circuit:
          C: client facing (first hop / bridge)
          M: intermediate
          E: exiting
          B: both client facing and exiting
      READ - Total bytes transmitted toward the exit over the circuit.
      WRITE - Total bytes transmitted toward the client over the circuit.

  "circ/all" -- The 'circ/id/*' output for all current circuits, joined by
    newlines.
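
A controller could consume these entries roughly as follows. This is only a
sketch under the format proposed above; the CircEntry class and both function
names are hypothetical, and rejecting unknown TYPE flags is an assumption
rather than something the draft requires.

```python
from dataclasses import dataclass

# TYPE flags as listed in the proposed "circ/id/*" format.
TYPE_LABELS = {
    "C": "client facing",
    "M": "intermediate",
    "E": "exiting",
    "B": "client facing and exiting",
}


@dataclass
class CircEntry:
    circ_id: int
    created: int    # seconds since the Epoch
    updated: int    # seconds since the Epoch
    circ_type: str  # one of the TYPE_LABELS keys
    read: int       # bytes transmitted toward the exit
    written: int    # bytes transmitted toward the client


def parse_circ_entry(line):
    """Parse one 'CIRC_ID CREATED UPDATED TYPE READ WRITE' entry."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 fields, got %d" % len(fields))
    # Anything past the sixth field is ignored to allow for future expansion.
    circ_id, created, updated, circ_type, read, written = fields[:6]
    if circ_type not in TYPE_LABELS:
        raise ValueError("unknown TYPE flag: %r" % circ_type)
    return CircEntry(int(circ_id), int(created), int(updated),
                     circ_type, int(read), int(written))


def parse_circ_all(reply):
    """Parse a 'circ/all' reply: one entry per line, blank lines skipped."""
    return [parse_circ_entry(l) for l in reply.splitlines() if l.strip()]
```

Splitting on arbitrary whitespace keeps the parser tolerant of folded spaces,
which works because no parameter may be empty or contain whitespace.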

   The following would be included for circ info update events.

4.1.X. Relay circuit status changed

  The syntax is:
     "650" SP "RCIRC" SP CircID SP Notice [SP Created SP Updated SP Type SP
          Read SP Write] CRLF
     
     Notice =
            "NEW"    / ; first information being provided for this circuit
            "UPDATE" / ; update for a previously reported circuit
            "CLOSED"   ; notice that the circuit no longer exists
    
  Notice indicating that queryable information on a relay related circuit has
  changed. If the Notice parameter is either "NEW" or "UPDATE" then this
  provides the same fields that would be given by calling "GETINFO circ/id/"
  with the CircID.
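
To illustrate how a controller might track circuit state from these events
instead of polling "circ/all", here is a rough sketch. The function names and
the dictionary layout are hypothetical; only the line syntax comes from the
draft above.

```python
NOTICES = {"NEW", "UPDATE", "CLOSED"}


def parse_rcirc_event(line):
    """Parse a '650 RCIRC CircID Notice [Created Updated Type Read Write]' line."""
    parts = line.strip().split()
    if len(parts) < 4 or parts[0] != "650" or parts[1] != "RCIRC":
        raise ValueError("not an RCIRC event: %r" % line)
    circ_id, notice = int(parts[2]), parts[3]
    if notice not in NOTICES:
        raise ValueError("unknown notice: %r" % notice)
    event = {"circ_id": circ_id, "notice": notice}
    if notice in ("NEW", "UPDATE"):
        # NEW and UPDATE carry the same fields as "GETINFO circ/id/<CircID>".
        if len(parts) < 9:
            raise ValueError("missing circuit fields: %r" % line)
        created, updated, circ_type, read, written = parts[4:9]
        event.update(created=int(created), updated=int(updated),
                     type=circ_type, read=int(read), written=int(written))
    return event


def apply_event(table, event):
    """Update a circ_id -> latest event mapping in place."""
    if event["notice"] == "CLOSED":
        table.pop(event["circ_id"], None)
    else:
        table[event["circ_id"]] = event
```

A controller would seed the table once from "circ/all" and then feed every
RCIRC line through apply_event.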
-------------- next part --------------
Filename: xxx-getinfo-option-expansion.txt
Title: GETINFO Option Expansion
Author: Damian Johnson
Created: 02-June-2010
Status: Draft

Overview:

    Over the course of developing arm there have been numerous hacks and
    workarounds to glean pieces of basic, desirable information about the tor
    process. As per Roger's request I've compiled a list of these pain points
    to try and improve the control protocol interface.

Motivation:

    The purpose of this proposal is to expose additional process and relay
    related information that is currently unavailable in a convenient,
    dependable, and/or platform independent way. Examples of this are...
    
      - The relay's total contributed bandwidth. This is a highly requested
        piece of information and, based on the following patch from pipe, looks
        trivial to include.
        http://www.mail-archive.com/or-talk@freehaven.net/msg13085.html
      
      - The process ID of the tor process. There is a high degree of guess work
        in obtaining this. Arm for instance uses pidof, netstat, and ps yet
        still fails on some platforms, and Orbot recently got a ticket about
        its own attempt to fetch it with ps:
        https://trac.torproject.org/projects/tor/ticket/1388
    
    This just includes the pieces of missing information I've noticed
    (suggestions or questions of their usefulness are welcome!).

Security Implications:

    None that I'm aware of. From a security standpoint this seems decently
    innocuous.

Specification:

    The following addition would be made to the control-spec's GETINFO section:
    
    "relay/bw-limit" -- Effective relayed bandwidth limit.
    
    "relay/burst-limit" -- Effective relayed burst limit.
    
    "relay/read-total" -- Total bytes relayed (download).
    
    "relay/write-total" -- Total bytes relayed (upload).
    
    "relay/flags" -- Space separated listing of flags currently held by the
    relay as reported by the currently cached consensus.
    
    "desc/time" -- Unix timestamp (as seconds since the Epoch) for when the
    latest server descriptor was fetched.
    
    "process/user" -- Username under which the tor process is running,
    providing an empty string if none exists.
    
    "process/pid" -- Process id belonging to the main tor process, -1 if none
    exists for the platform.
    
    "process/uptime" -- Total uptime of the tor process (in seconds).
    
    "process/uptime-reset" -- Time since last reset (startup, sighup, or RELOAD
    signal, in seconds).
    
    "process/descriptors-used" -- Count of file descriptors used.
    
    "process/descriptor-limit" -- File descriptor limit (getrlimit results).
    
    "ns/authority" -- Router status info (v2 directory style) for all
    recognized directory authorities, joined by newlines.
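
As a rough illustration of how a controller might interpret a few of these
values, including the sentinel cases (an empty "process/user" and a
"process/pid" of -1), consider the sketch below. It assumes the raw reply
strings have already been collected into a dictionary keyed by option name;
the function name and the returned layout are hypothetical.

```python
def interpret_process_info(values):
    """Turn raw reply strings for the proposed keys into Python values.

    values: dict mapping GETINFO key -> reply string (assumed already fetched).
    """
    pid = int(values["process/pid"])
    used = int(values["process/descriptors-used"])
    limit = int(values["process/descriptor-limit"])
    return {
        # -1 means the platform has no pid for the main tor process.
        "pid": None if pid == -1 else pid,
        # An empty string means no username exists.
        "user": values["process/user"] or None,
        # Flags are a space separated listing.
        "flags": values.get("relay/flags", "").split(),
        # Fraction of the file descriptor limit currently in use.
        "fd_usage": used / limit if limit > 0 else None,
    }
```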

