Patch: Adding country codes to *Nodes config options

Nick Mathewson nickm at freehaven.net
Mon Sep 22 20:02:41 UTC 2008


{Please take pity on us old-school purists and wrap email to under 75 colums,
so that it doesn't overflow our terminals when we reply}

Hi, Robert!  This patch is a good start.  I can clean it up if you
would like, or you can do another version; let me know what you'd
prefer.

On Mon, Sep 22, 2008 at 07:20:02PM +0100, Robert Hogan wrote:
> This patch allows users to add country codes to the following configuration 
> options: EntryNodes, ExitNodes, ExcludeNodes, ExcludeExitNodes.
> 
> As a side effect, it adds the ability to specify IP address patterns for 
> EntryNodes and ExitNodes.
> 

> Country codes are specified as: {cc} and are not case sensitive. So
> for example:
> 
> EntryNodes {gb},{DE},$AAAAAAAAAAAAAAAAA, Unnamed, 233.233.0.0./8
> 
> How the patch works:
> 
> - EntryNodes and ExitNodes become routerset_t's.

Nice.  I hope that wasn't too hard.

> - The element 'int country' is added to routerinfo_t.

This should probably be a uint16_t; we will not have 2**32 countries
(or 2**64 on some CPUs) any time soon unless some of the more
interesting science-fiction novels of the 90s turn out to be
unexpectedly prophetic.

This should probably go into routerstatus_t?  Actually, maybe it
doesn't need to be cached at all; is IP-to-country lookup so
expensive?  I think it's O(log n) in our code.

> - The pattern {cc}, where cc is a valid country code, is added to
> routerset_t.

Keen.

> - Invalid country codes will cause the entire config option to be rejected.
>
> - When a {cc} is specified in a *Nodes option, routerlist is iterated
> and the identity digest of all routers with a matching country is
> added to the routerset_t digest list. If the particular digest already
> exists, it is added anyway.
>
> - Every time Tor parses a new router it checks to see if the
> router's country is specified in any *Nodes option. If so, it adds
> the router to the option's
> routerset. (router_parse_entry_from_string is the only place where a
> router's routerinfo_t is created/updated right?)

Hm.  This is a pretty big shift in how the code handles nodes in the
include/exclude lists.  Previously, there hasn't been any cacheing of
whether a router is excluded, and from what roles: this info was
recalculated lots.

Now, whenever the {Excluded}(Entry/Exit)Nodes list changes, or we get
a new routerinfo, or whenever we change GeoIPFile or reparse the
GeoIPFile, we need to recalculate this information, and we need to
make sure we haven't messed it up.  It looks like we look at the list
of country codes when we're adding routers, but we look at the set of
digests when we're excluding routers.  To me, this seems like a bug
waiting to happen.



> - I think _ExcludeExitNodesUnion was leaking because it was not part
> of option_vars. I've added it option_vars for now - but it looks as
> though config_var_t should be updated to allow internal options that
> can be memory managed in config_var_t but not set/unset/updated by
> the user. I can't see an unintrusive way of doing that, so have left
> it alone for now.

The option_vars list is a list of variables that are accessible as
options; they are NOT a fields in or_options_t that should get freed.
It probably makes sense to add a new or_options_free() function to
wrap config_free() and free extra, non-configurable members of
options, and to call that in lieu of config_free(&options_format,...).


> - routerset_equal is an unfortunate duplication of opt_streq. Maybe
> there's a better way?

Hm.  It seems like we might be happier doing semantic comparisons.


Also, what happens when there is no geoip file?


> Index: src/or/config.c
  [...]
>    /* Check if we need to parse and add the EntryNodes config option. */
>    if (options->EntryNodes &&
>        (!old_options ||
> -       !opt_streq(old_options->EntryNodes, options->EntryNodes)))
> +      (!routerset_equal(old_options->ExitNodes,options->ExitNodes))))
>      entry_nodes_should_be_added();

Did you mean to change this from a comparison of EntryNodes to a
comparison of ExitNodes?



> Index: src/or/routerlist.c
> ===================================================================
> --- src/or/routerlist.c	(revision 16912)
> +++ src/or/routerlist.c	(working copy)
> @@ -4721,6 +4721,54 @@
>    return result;
>  }
>  
> +/** Helper.  Add <b>router</b> to any routersets which contain its country. */
> +static void
> +routerset_add_routerdigest(routerset_t *target, const routerinfo_t *router,
> +                           const char *description)
> +{
> +  log_debug(LD_CONFIG, "Adding identity for router %s to %s",
> +            router->nickname, description);
> +  digestmap_set(target->digests, router->cache_info.identity_digest,
> +                        (void*)1);
> +}

The comment for this function does describe what it does: it adds the
router's digest indiscriminately to a single routerset_t.

Also, maybe we should move the description field into the routerset_t
structure.

> +/** Add all routers in country <b>c</b> to <b>routerstoadd</b>. Return 0
> + * if <b>c</b> is not a valid country.*/
> +static int
> +routerset_add_routers_by_country(smartlist_t *routerstoadd, const char *c,
> +                                 int running_only)
> +{
> +  char *country;
> +
> +  if (!strchr(c,'{') && !strchr(c,'}'))
> +    return 0;
> +
> +  country=tor_strdup(c);
> +  tor_strstrip(country,"}");
> +  tor_strstrip(country,"{");
> +  tor_strlower(country);

So we not only accept {GB} but also }GB{ and {{GB}} and }{{G}{}{{}B}}?
That seems wrong.


>  /** Parse the string <b>s</b> to create a set of routerset entries, and add
>   * them to <b>target</b>.  In log messages, refer to the string as
>   * <b>description</b>.  Return 0 on success, -1 on failure.
> @@ -4733,6 +4781,7 @@
>  routerset_parse(routerset_t *target, const char *s, const char *description)
>  {
>    int r = 0;
> +  smartlist_t *cc_routers=smartlist_create();
>    smartlist_t *list = smartlist_create();
>    smartlist_split_string(list, s, ",",
>                           SPLIT_SKIP_SPACE | SPLIT_IGNORE_BLANK, 0);
> @@ -4748,6 +4797,14 @@
>        } else if (is_legal_nickname(nick)) {
>          log_debug(LD_CONFIG, "Adding nickname %s to %s", nick, description);
>          strmap_set_lc(target->names, nick, (void*)1);
> +      } else if (routerset_add_routers_by_country(cc_routers, nick, 1)) {
> +        log_debug(LD_CONFIG, "Adding identities in %s to %s", nick,
> +                  description);
> +        SMARTLIST_FOREACH(cc_routers, routerinfo_t *, router,
> +        {
> +          routerset_add_routerdigest(target, router, description);
> +        });
> +        smartlist_clear(cc_routers);

This shouldn't happen during the parse phase; we should probably just
have a list or set of countries.  As written, this code treats {cc} as
being an alias for "all the routers in cc, when we parsed cc".

 [...]
>  /** Helper.  Return true iff <b>set</b> contains a router based on the other
>   * provided fields. */
>  static int
> @@ -4827,9 +4898,11 @@
>  routerset_get_all_routers(smartlist_t *out, const routerset_t *routerset,
>                            int running_only)
>  {
> +  smartlist_t *cc_routers=NULL;
>    tor_assert(out);
>    if (!routerset || !routerset->list)
>      return;
> +  cc_routers=smartlist_create();
>    if (!warned_nicknames)
>      warned_nicknames = smartlist_create();
>    SMARTLIST_FOREACH(routerset->list, const char *, name, {
> @@ -4837,6 +4910,10 @@
>      if (router) {
>        if (!running_only || router->is_running)
>          smartlist_add(out, router);
> +    }else if (routerset_add_routers_by_country(cc_routers, name,
> +                                               running_only)) {
> +      smartlist_add_all(out, cc_routers);
> +      smartlist_clear(cc_routers);
>      }

This is the exact same calculation as above; there is no need to do
this both during "parse" and "get_all". 

Also, the approach here seems to make routerset_get_all_routers into
an O(c*n) calculation, where c is the number of countries and r the
number of routers.  If we represented the countries in the routerset
as a strmap_t or a bitset, then the calculation could be O(n).

 [...]
> +/** Return 1 if <b>old</b> and <b>new</b> match, otherwise return 0. */
> +int
> +routerset_equal(const routerset_t *old, const routerset_t *new)

It seems you've gone for a strange order-dependent equality, where
"a,b" is the same as "a , b", but not the same as "b,a".  You might
want to put this in the comment.


 [...]
> Index: src/or/or.h
> ===================================================================
 [...]
> +  struct routerset_t *ExitNodes; /**< Comma-separated list of nicknames of ORs
> +                                  * to consider as exits. */
> +  struct routerset_t *EntryNodes; /**< Comma-separated list of nicknames of ORs
> +                                   * to consider as entry
points. */

These comments are no longer accurate.

-- 
Nick



More information about the tor-dev mailing list