[tor-dev] Update on 259

Fri Apr 8 11:59:15 UTC 2016

Ola Bini <obini at thoughtworks.com> writes:

> Hey,
>
>> > - OrPort vs DirPort
>> > ORPort is used for regular circuits, while DirPort is used when getting directory information. We need to interpret reachable stuff
>> > differently depending on the purpose.
>> >
>>
>> I'm not actually sure what the comment means here.
> This was more for our own benefit. The OrPort vs DirPort distinction
> has been a bit complicated so far. The comment basically means, when
> we are looking up directory information, we should use the DirPort to
> decide reachability and so on instead, correct?
>
>> Ensuring a min percentage of dirguards in our sampled set could work. Then,
>> when we need a directory guard, we could filter the sampled set and only
>> examine guards that can do directory requests.
> Yeah, we talked about this yesterday and our current thinking is to
> have a sampled set that contains every kind of thing, and then we
> dynamically filter it based on config and so on during START.
>
>> Hm, are you talking about the guardlists here? What's the question?
>>
>> BTW, if we have the ability to do "ensure a min percentage of X in our sampled
>> set", couldn't we just ensure a min percentage of dystopic guards in our
>> sampled set?  And forget about the two separate guardlists?
>>
>> In this case we can have the percentage value be the actual portion of the
>> network that is dystopic guards. So if 20% of the total guard bandwidth is
>> dystopic, we could ensure that at least 20% of our sampled set is
>> dystopic.
> Well, the problem is really that the idea of dystopic doesn't
> necessarily make sense, since it's so dependent on the current network
> position of the client. Our current thinking is to do away with that
> concept as well. =)
>
>> > - DYSTOPIC - is there value in trying 80 and 443?
>> > Probably not.
>> >
>>
>> What does "trying" mean in this case?
> Falling back to guards with 80 and 443.
>
>> Restart pending guard selection algorithms on a SIGHUP? Plausible, but I don't
>> know how hard it would be to implement this.
> Well, the alternative is to just finish the running guard selections
> with the old settings, but use the new settings for new algorithm instances.
>
>> That's not very nice because the USED_GUARDS set that was created when
>> ClientUseIPv6 or FascistFirewall were on will have reduced diversity. Then
>> even if we switch off those options, we are still stuck with reduced diversity.
>>
>> I'm not sure what's the right way to do this here!
>>
>> We could imagine having multiple USED_GUARDS sets, where we make a new set for
>> each possible filter. This might be worth considering, but I imagine there will
>> be technical difficulties. e.g. when a guard goes down, you need to update its
>> state in all the USED_GUARDS sets that it's in. Also, a person who toggles the
>> FascistFirewall option frequently, will end up using two different sets of
>> guards all the time which is suboptimal.
> Well, one thing you could do is hash the settings (and maybe also
> reachable ports) and use that as a key to differentiate the different
> USED_GUARDS. That would solve the problem, but might lead to a single
> client using lots of different guards in different locations. Might
> that be OK?
>
>> > - Can we make the lists smaller?
>> > Probably. Maybe a sampled set of 30 guards? Or 1.5%?
>> >
>>
>> Plausible. However, if we take the filtering approach but use a small sampled
>> guards list, it could happen that the list is not able to satisfy some of our
>> filtering restrictions.
>>
>> e.g. maybe in our 30 guards there are no IPv6 guards at all, and the user just
>>      turned on ClientUseIPv6. What to do now?
>>
>> This is important to understand, because currently there is no mechanism to add
>> stuff to the sampled guards list if a restriction cannot be satisfied. So what
>> will Tor do, if a user enables ClientUseIPv6 _and_ FascistFirewall but there
>> are no IPv6+80/443 guards in our sampled guards list?
> Yeah, we talked about that yesterday. Our suggestion is to do
> something like this:
> - If the filtered/reduced sample-set contains fewer than X (5?) guards,
> expand SAMPLED guards using the regular process.
> - If SAMPLED guards reach SAMPLED_MAX (50?) size, we fail closed with
> an error saying something like "your current network settings make it
> impossible for us to safely choose an entry guard. If you really need
> to connect under these circumstances, consider explicitly setting the
> EntryGuards configuration option"

Hello,

Just to give some perspective, I wrote a small script that calculates the
bandwidth percentage of various types of guards. Here are the results:

    - Directory guards are 85% of total current guard bandwidth.
    - Guards with ORPorts on 80/443 are 42% of total current guard bandwidth.
    - Guards on IPv6 are 20% of total current guard bandwidth.
    - Guards both on 80/443 and on IPv6 are 8% of total current guard bandwidth.

I include the script at the end of this email.

Thinking about this, I wonder if we should have a minimum filtered sample size
as was suggested in previous emails. So let's say we have
MINIMUM_FILTERED_SAMPLE_SIZE set to 5, the user has ClientPreferIPv6 set, but
their sampled set has only one IPv6 guard. In this case, the user will keep on
sampling till they get four additional IPv6 guards.

With the above probabilities, every time you sample a guard you have a 20%
probability of it being an IPv6 guard. To get 4 of them, you will first need
to sample a lot of guards (about 20 on average, since 4 / 0.20 = 20, though the
exact number follows a negative binomial distribution) and add them to your
sampled guard set. I wonder if that's worth it.
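
To put rough numbers on this, here is a minimal sketch. It assumes each sampled
guard is independently an IPv6 guard with probability 0.20 (the bandwidth share
above, treated as a per-draw probability) and ignores that real sampling is
without replacement. The function names are made up for illustration.

```python
from math import comb

def expected_draws(needed, p):
    """Mean number of draws to collect `needed` successes (negative binomial mean)."""
    return needed / p

def prob_enough(draws, needed, p):
    """Probability that `draws` independent samples contain at least `needed` successes."""
    too_few = sum(comb(draws, k) * p**k * (1 - p)**(draws - k)
                  for k in range(needed))
    return 1.0 - too_few

if __name__ == "__main__":
    p_ipv6 = 0.20  # assumed per-draw probability of hitting an IPv6 guard
    print("expected draws for 4 IPv6 guards: %.1f" % expected_draws(4, p_ipv6))
    for n in (10, 20, 30, 40):
        print("P(at least 4 IPv6 guards in %d draws) = %.3f"
              % (n, prob_enough(n, 4, p_ipv6)))
```

Under these assumptions, 20 draws only give roughly a 59% chance of having
collected the four extra IPv6 guards, so the sampled set could easily grow
well past 20 before the minimum is met.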

Maybe if we have a single guard that satisfies our filters in our sampled guard
list, we should use that guard instead of sampling for more? I don't really
see the value in sampling and exposing ourselves to additional guards if we
have one that we like (and that we might even have connected to in the past).
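
As a rough sketch of how these options fit together (this is not Tor code, and
every name here is hypothetical): the function below grows the sampled set
until enough guards pass the filter, and fails closed at a cap. With
min_filtered=5 and sampled_max=50 it behaves like the proposal quoted above,
and with min_filtered=1 it behaves like the "one matching guard is enough"
idea. Sampling is uniform here, not bandwidth-weighted, to keep it short.

```python
import random

class NoUsableGuards(Exception):
    """Raised when the sampled set cannot satisfy the filter within its cap."""

def ensure_filtered_guards(sampled, all_guards, matches,
                           min_filtered, sampled_max, rng):
    """Grow `sampled` in place until at least `min_filtered` of its guards
    pass `matches`, then return those guards. Fails closed once `sampled`
    holds `sampled_max` guards or there is nothing left to sample."""
    remaining = [g for g in all_guards if g not in sampled]
    rng.shuffle(remaining)
    while True:
        filtered = [g for g in sampled if matches(g)]
        if len(filtered) >= min_filtered:
            return filtered
        if len(sampled) >= sampled_max or not remaining:
            raise NoUsableGuards("cannot satisfy filter within the sampled-set cap")
        sampled.append(remaining.pop())

if __name__ == "__main__":
    # Toy network: 100 guards, every fifth one is IPv6 (20%).
    network = [("g%d" % i, i % 5 == 0) for i in range(100)]
    sampled = [network[0]]  # one IPv6 guard already in the sampled set
    usable = ensure_filtered_guards(sampled, network, lambda g: g[1],
                                    min_filtered=1, sampled_max=50,
                                    rng=random.Random())
    print("usable guards: %d, sampled set size: %d" % (len(usable), len(sampled)))
```

With min_filtered=1 and one matching guard already sampled, the sampled set is
not touched at all, which is the behaviour argued for above.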

---

Symmetrically, I'm also not sure what's the right thing to do if we have zero
guards that satisfy our filters in our sampled guard set. Should we start
sampling randomly till we hit a guard that satisfies us, or should we sample
directly from the correct set (e.g. only from the set of IPv6 guards)? I'm
still not sure about this.
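
To make the two options concrete, here is a small sketch with hypothetical
names. It uses uniform sampling instead of bandwidth weighting, and a toy
guard list, so it only illustrates how many guards each strategy adds to the
sampled set, not what Tor would actually do.

```python
import random

def sample_until_match(guards, matches, rng):
    """Strategy 1: draw from all guards (without replacement) until one passes
    the filter. Every guard drawn on the way joins the sampled set. If nothing
    matches, all guards end up in the returned list."""
    pool = list(guards)
    rng.shuffle(pool)
    added = []
    for guard in pool:
        added.append(guard)
        if matches(guard):
            break
    return added

def sample_from_matching(guards, matches, rng):
    """Strategy 2: draw one guard directly from those that pass the filter."""
    candidates = [g for g in guards if matches(g)]
    return [rng.choice(candidates)] if candidates else []

if __name__ == "__main__":
    # Toy network: 100 guards, every fifth one is IPv6 (20%).
    network = [("g%d" % i, i % 5 == 0) for i in range(100)]
    is_ipv6 = lambda g: g[1]
    rng = random.Random()
    print("strategy 1 added %d guards" % len(sample_until_match(network, is_ipv6, rng)))
    print("strategy 2 added %d guards" % len(sample_from_matching(network, is_ipv6, rng)))
```

Strategy 1 exposes us to every non-matching guard drawn along the way, while
strategy 2 adds only the one guard we need but makes the sampled set depend on
the filter that was active at the time.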

---

from stem.descriptor import parse_file

def main():
    total_guard_bw = 0
    dirguard_bw = 0
    antifa_bw = 0        # guards with an ORPort on 80 or 443
    ipv6_bw = 0
    super_antifa_bw = 0  # guards on 80/443 that are also on IPv6

    for desc in parse_file('/home/user/.tor/cached-microdesc-consensus2'):
        is_antifa = False

        # Only relays with the Guard flag are counted.
        if "Guard" not in desc.flags:
            continue

        total_guard_bw += desc.bandwidth

        if "V2Dir" in desc.flags:
            dirguard_bw += desc.bandwidth

        if desc.or_port in (80, 443):
            #print('Found guard %s (%s)' % (desc.nickname, desc.or_port))
            antifa_bw += desc.bandwidth
            is_antifa = True

        # or_addresses entries are (address, port, is_ipv6) tuples; only the
        # first listed address is checked here.
        if desc.or_addresses:
            if desc.or_addresses[0][2]:
                ipv6_bw += desc.bandwidth
                if is_antifa:
                    super_antifa_bw += desc.bandwidth

    # Note: these values are fractions of total guard bandwidth (0.0 to 1.0).
    print("Dirguard percentage = %f" % (dirguard_bw / float(total_guard_bw)))
    print("Antifa (80/443) percentage = %f" % (antifa_bw / float(total_guard_bw)))
    print("IPv6 percentage = %f" % (ipv6_bw / float(total_guard_bw)))
    print("Antifa + IPv6 percentage = %f" % (super_antifa_bw / float(total_guard_bw)))

if __name__ == '__main__':
    main()