Hey,
- ORPort vs DirPort
ORPort is used for regular circuits, while DirPort is used when getting directory information. We need to interpret reachable stuff differently depending on the purpose.
I'm not actually sure what the comment means here.
This was more for our own benefit. The ORPort vs DirPort distinction has been a bit complicated so far. The comment basically means that when we are looking up directory information, we should use the DirPort instead to decide reachability and so on, correct?
Ensuring a min percentage of dirguards in our sampled set could work. Then, when we need a directory guard, we could filter the sampled set and only examine guards that can do directory requests.
Yeah, we talked about this yesterday and our current thinking is to have a sampled set that contains every kind of thing, and then we dynamically filter it based on config and so on during START.
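A minimal sketch of "one sampled set containing every kind of guard, filtered dynamically at START". The guard attributes and config flags are hypothetical simplifications, not Tor's data structures. In particular, `require_ipv6` models the thread's reading of ClientUseIPv6 as a restriction to IPv6 guards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guard:
    # Hypothetical per-guard attributes, as they might be derived from the consensus.
    nickname: str
    has_ipv6: bool
    on_80_443: bool
    is_dir_cache: bool


@dataclass(frozen=True)
class FilterConfig:
    # Hypothetical stand-ins for the relevant torrc options.
    require_ipv6: bool = False      # the thread's reading of ClientUseIPv6
    fascist_firewall: bool = False  # only ports 80/443 are reachable


def filter_sampled(sampled, config, for_directory=False):
    """Return the guards in the sampled set that are usable under `config`.

    The sampled set itself is never modified; the filtered view is
    recomputed each time the algorithm starts.
    """
    usable = []
    for guard in sampled:
        if config.require_ipv6 and not guard.has_ipv6:
            continue
        if config.fascist_firewall and not guard.on_80_443:
            continue
        if for_directory and not guard.is_dir_cache:
            continue
        usable.append(guard)
    return usable
```

In this sketch, changing FascistFirewall only changes which sampled guards are visible, not which guards were sampled.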
Hm, are you talking about the guardlists here? What's the question?
BTW, if we have the ability to do "ensure a min percentage of X in our sampled set", couldn't we just ensure a min percentage of dystopic guards in our sampled set? And forget about the two separate guardlists?
In this case we can have the percentage value be the actual portion of the network that is dystopic guards. So if 20% of the total guard bandwidth is dystopic, we could ensure that at least 20% of our sampled set is dystopic.
Well, the problem is really that the idea of dystopic doesn't necessarily make sense, since it's so dependent on the current network position of the client. Our current thinking is to do away with that concept as well. =)
- DYSTOPIC - is there value in trying 80 and 443?
Probably not.
What does "trying" mean in this case?
Falling back to guards with 80 and 443.
Restart pending guard selection algorithms on a SIGHUP? Plausible, but I don't know how hard it would be to implement this.
Well, the alternative is to just finish the running guard selections with the old settings, but use the new settings for new algorithm instances.
That's not very nice because the USED_GUARDS set that was created when ClientUseIPv6 or FascistFirewall were on will have reduced diversity. Then even if we switch off those options, we are still stuck with reduced diversity.
I'm not sure what's the right way to do this here!
We could imagine having multiple USED_GUARDS sets, where we make a new set for each possible filter. This might be worth considering, but I imagine there will be technical difficulties, e.g. when a guard goes down, you need to update its state in all the USED_GUARDS sets that it's in. Also, a person who frequently toggles the FascistFirewall option will end up using two different sets of guards all the time, which is suboptimal.
Well, one thing you could do is hash the settings (and maybe also reachable ports) and use that as a key to differentiate the different USED_GUARDS. That would solve the problem, but might lead to a single client using lots of different guards in different locations. Might that be OK?
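A sketch of keying USED_GUARDS by a hash of the filter-relevant settings, as suggested above. The class and helper names are hypothetical, and the settings dict stands in for whatever subset of the config (and reachable ports) would be hashed. `distinct_guards()` shows the cost raised here: the union of all lists is every guard the client has exposed itself to.

```python
import hashlib
import json


def settings_key(settings: dict) -> str:
    """Derive a stable key from filter-relevant settings (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UsedGuardsBySettings:
    """One USED_GUARDS list per distinct combination of settings."""

    def __init__(self):
        self._lists = {}

    def used_guards(self, settings: dict) -> list:
        # The first time a combination is seen, it gets a fresh, empty list.
        return self._lists.setdefault(settings_key(settings), [])

    def distinct_guards(self) -> set:
        # Every guard the client has used under any settings combination.
        return {g for guards in self._lists.values() for g in guards}
```

Toggling FascistFirewall back and forth here switches between two lists instead of rebuilding one, but `distinct_guards()` grows with every new combination of settings.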
- Can we make the lists smaller?
Probably. Maybe a sampled set of 30 guards? Or 1.5%?
Plausible. However, if we take the filtering approach but use a small sampled guards list, it could happen that the list is not able to satisfy some of our filtering restrictions.
e.g. maybe in our 30 guards there are no IPv6 guards at all, and the user just turned on ClientUseIPv6. What to do now?
This is important to understand, because currently there is no mechanism to add guards to the sampled guards list if a restriction cannot be satisfied. So what will Tor do if a user enables ClientUseIPv6 _and_ FascistFirewall but there are no IPv6+80/443 guards in our sampled guards list?
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards, expand SAMPLED guards using the regular process.
- If SAMPLED guards reach SAMPLED_MAX (50?) in size, we fail closed with an error saying something like "your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option".
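The two rules above, plus teor's alternative from later in the thread (keep adding only guards that satisfy the restrictions, with no cap), can be sketched as follows. MIN_FILTERED and SAMPLED_MAX use the tentative values 5 and 50. Guards are drawn uniformly here for simplicity, whereas the regular sampling process would presumably be bandwidth-weighted. All names are hypothetical.

```python
import random

MIN_FILTERED = 5   # "X (5?)"
SAMPLED_MAX = 50   # "SAMPLED_MAX (50?)"


class NoSafeGuardError(Exception):
    """Raised when the fail-closed rule is hit."""


def ensure_filtered(sampled, candidates, usable, rng, fail_closed=True):
    """Grow `sampled` until at least MIN_FILTERED of its guards pass `usable`.

    sampled:    guards already in the sampled set (extended in place)
    candidates: consensus guards not yet sampled (shrunk in place)
    usable:     predicate implementing the current filter
    """
    filtered = [g for g in sampled if usable(g)]
    while len(filtered) < MIN_FILTERED:
        if fail_closed:
            if len(sampled) >= SAMPLED_MAX:
                raise NoSafeGuardError(
                    "your current network settings make it impossible for us "
                    "to safely choose an entry guard")
            pool = candidates  # regular expansion: sample from the whole network
        else:
            pool = [g for g in candidates if usable(g)]  # only add usable guards
        if not pool:
            break  # nothing left to add; return what we have
        guard = rng.choice(pool)
        candidates.remove(guard)
        sampled.append(guard)
        if usable(guard):
            filtered.append(guard)
    return filtered
```

The fail-closed branch bounds how many guards a client is ever exposed to; the other branch never fails, but lets a network that firewalls every other guard eventually push the client onto one it controls.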
I think it asks "What happens when guards in our sampled set drop out of the consensus and get marked as bad?" (see bad_since in entrynodes.c).
This is also a great question. Especially when combined with the planned "ensure a min percentage of X in our sampled set" logic.
Like, what happens if suddenly most of our sampled IPv6 guards drop out of the consensus when we have ClientUseIPv6 on? Should we replace them? And if yes, don't we need to replace them with other IPv6 guards to maintain the minimum percentage?
Well, I think if we replace, we should just replace randomly, just like we always expand the sampled set. If most IPv6 guards drop out, we will have fewer IPv6 guards in our sampled set, but that also reflects the Tor network.
I suspect a plausible thing to do is to wait a few consensus rounds before expanding the sampled set to replace "bad" guards - they might come back, and under most circumstances we shouldn't need to use the sampled set anyway.
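A sketch of waiting a few consensus rounds before replacing guards that dropped out, loosely modelled on `bad_since` in entrynodes.c. Tor stores a time there; this sketch counts consensuses instead, and GRACE_CONSENSUSES = 3 is an arbitrary placeholder for "a few". Bad guards stay in memory, so a guard that comes back simply has its mark cleared.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_CONSENSUSES = 3  # hypothetical value for "a few consensus rounds"


@dataclass
class SampledGuard:
    fingerprint: str
    bad_since: Optional[int] = None  # consensus number when first seen missing


def update_on_consensus(sampled, listed, consensus_no):
    """Update bad marks for a new consensus; return how many guards need replacing."""
    to_replace = 0
    for guard in sampled:
        if guard.fingerprint in listed:
            guard.bad_since = None  # back in the consensus: clear the mark
            continue
        if guard.bad_since is None:
            guard.bad_since = consensus_no  # first consensus without this guard
        if consensus_no - guard.bad_since >= GRACE_CONSENSUSES:
            to_replace += 1  # still remembered, but now eligible for replacement
    return to_replace
```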
The current proposal says the following about SAMPLED_UTOPIC_GUARDS:
It will be filled in by the algorithm if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD guards after winnowing out older guards.
which I think is a good suggestion. However, what should we do if we end up going with the "ensure minimum percentage" logic?
Yeah, my suggestion is not to have a minimum percentage of X-type guards - instead just fill it randomly, and use the expanding process if we can't find enough guards for a specific purpose.
- EntryNodes
If this is set, never use the algorithm for regular circuits - we should still use it for directory server connections though.
If this is set we should not use our algorithm, but we should instead pick one of the guards in the EntryNodes list. This is for people who want to hardcode their guard. It's used a lot by people currently.
Yeah. Is the guard picked randomly from this list, or using something more complicated?
- UseEntryGuardsAsDirGuards
I don't understand exactly what this setting does.
I'm not sure either. I'd just let it keep the exact same semantics it currently has.
Yeah, except we don't exactly understand what those semantics are, and whether we need to change something in our code to match them. =)
Thanks for all the feedback! Hopefully we're getting close to the final iteration of this spec. =)
Cheers
On 6 Apr 2016, at 23:08, Ola Bini obini@thoughtworks.com wrote:
Hey,
- ORPort vs DirPort
ORPort is used for regular circuits, while DirPort is used when getting directory information. We need to interpret reachable stuff differently depending on the purpose.
I'm not actually sure what the comment means here.
This was more for our own benefit. The ORPort vs DirPort distinction has been a bit complicated so far. The comment basically means that when we are looking up directory information, we should use the DirPort instead to decide reachability and so on, correct?
No, clients typically tunnel directory requests over the ORPort when they can. This is better for anonymity.
But they will fall back to the DirPort in some circumstances. And relays use the DirPort all the time.
Hm, are you talking about the guardlists here? What's the question?
BTW, if we have the ability to do "ensure a min percentage of X in our sampled set", couldn't we just ensure a min percentage of dystopic guards in our sampled set? And forget about the two separate guardlists?
In this case we can have the percentage value be the actual portion of the network that is dystopic guards. So if 20% of the total guard bandwidth is dystopic, we could ensure that at least 20% of our sampled set is dystopic.
Well, the problem is really that the idea of dystopic doesn't necessarily make sense, since it's so dependent on the current network position of the client. Our current thinking is to do away with that concept as well. =)
This would make things simpler, and would address many of my issues with the proposal.
That's not very nice because the USED_GUARDS set that was created when ClientUseIPv6 or FascistFirewall were on will have reduced diversity. Then even if we switch off those options, we are still stuck with reduced diversity.
I'm not sure what's the right way to do this here!
We could imagine having multiple USED_GUARDS sets, where we make a new set for each possible filter. This might be worth considering, but I imagine there will be technical difficulties, e.g. when a guard goes down, you need to update its state in all the USED_GUARDS sets that it's in. Also, a person who frequently toggles the FascistFirewall option will end up using two different sets of guards all the time, which is suboptimal.
Well, one thing you could do is hash the settings (and maybe also reachable ports) and use that as a key to differentiate the different USED_GUARDS. That would solve the problem, but might lead to a single client using lots of different guards in different locations. Might that be OK?
It's worse for the risk of guard compromise. But it's better because it prevents the client being fingerprinted using its guard set.
- Can we make the lists smaller?
Probably. Maybe a sampled set of 30 guards? Or 1.5%?
Plausible. However, if we take the filtering approach but use a small sampled guards list, it could happen that the list is not able to satisfy some of our filtering restrictions.
e.g. maybe in our 30 guards there are no IPv6 guards at all, and the user just turned on ClientUseIPv6. What to do now?
This is important to understand, because currently there is no mechanism to add guards to the sampled guards list if a restriction cannot be satisfied. So what will Tor do if a user enables ClientUseIPv6 _and_ FascistFirewall but there are no IPv6+80/443 guards in our sampled guards list?
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards, expand SAMPLED guards using the regular process.
- If SAMPLED guards reach SAMPLED_MAX (50?) in size, we fail closed with an error saying something like "your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option".
Oh, wow, I don't think failing closed is a good idea. It means users that move around a lot (and clients which have a longer state history) could fail at some arbitrary time. Why not simply continue to add guards that satisfy the restrictions?
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
Hi,
No, clients typically tunnel directory requests over the ORPort when they can. This is better for anonymity.
But they will fall back to the DirPort in some circumstances. And relays use the DirPort all the time.
Ah, thanks - that's helpful.
It's worse for the risk of guard compromise. But it's better because it prevents the client being fingerprinted using its guard set.
Right, so between a rock and a hard place? =)
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards, expand SAMPLED guards using the regular process.
- If SAMPLED guards reach SAMPLED_MAX (50?) in size, we fail closed with an error saying something like "your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option".
Oh, wow, I don't think failing closed is a good idea. It means users that move around a lot (and clients which have a longer state history) could fail at some arbitrary time. Why not simply continue to add guards that satisfy the restrictions?
Well, users that move around a lot will only have an expanded sampled set if they move between several different networks with severe but mutually exclusive restrictions. And we would only ever fail closed if we can't find anything in the sampled set that matches the currently needed restrictions. If we keep adding guards, the idea of the sampled set as a way to limit exposure to too many guards flies out the window.
The problem really comes down to this: if a network is actively firewalling every guard that is not under its control, and we keep expanding, we will sooner or later be forced to use a guard under adversary control. By failing closed, we can avoid that eventuality. However, it seems you don't like that idea - there seems to be some dissent among the Tor devs about which approach is best for this situation.
Cheers
On 6 Apr 2016, at 23:42, Ola Bini ola@olabini.se wrote:
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards, expand SAMPLED guards using the regular process.
- If SAMPLED guards reach SAMPLED_MAX (50?) in size, we fail closed with an error saying something like "your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option".
I suggest we offer "reinstall Tor Browser" or "delete your state file" as a recommended action, because users are more likely to complete those actions successfully than to correctly set "UseEntryGuards".
Oh, wow, I don't think failing closed is a good idea. It means users that move around a lot (and clients which have a longer state history) could fail at some arbitrary time. Why not simply continue to add guards that satisfy the restrictions?
Well, users that move around a lot will only have an expanded sampled set if they move between several different networks with severe but mutually exclusive restrictions. And we would only ever fail closed if we can't find anything in the sampled set that matches the currently needed restrictions. If we keep adding guards, the idea of the sampled set as a way to limit exposure to too many guards flies out the window.
The problem really comes down to this: if a network is actively firewalling every guard that is not under its control, and we keep expanding, we will sooner or later be forced to use a guard under adversary control. By failing closed, we can avoid that eventuality. However, it seems you don't like that idea - there seems to be some dissent among the Tor devs about which approach is best for this situation.
We have to balance the risk of users being funnelled towards a malicious guard, with the risk of denying users access to non-malicious guards.
I think that the more complex the scheme, the more likely it is to have non-obvious failure modes. I'm particularly concerned about failure modes where a user switches between several locations with mutually exclusive firewalling. How many locations would be needed to cause this issue? 50/3 ≈ 16? That's probably ok.
What about guard churn in the consensus? Does that reduce the number of possible locations over time? Because that's a property I think we should avoid. I'm concerned about these kinds of failure modes that only occur in state files that are months or years old. These are very hard to find in testing, and hard to discover during modelling.
Using a malicious guard has similar consequences to Tor failing closed, and users switching to a non-tor browser. I'm not sure which is worse. It probably depends on the user. But we should try to avoid both scenarios.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B
Ola Bini obini@thoughtworks.com writes:
Hey,
<snip>
That's not very nice because the USED_GUARDS set that was created when ClientUseIPv6 or FascistFirewall were on will have reduced diversity. Then even if we switch off those options, we are still stuck with reduced diversity.
I'm not sure what's the right way to do this here!
We could imagine having multiple USED_GUARDS sets, where we make a new set for each possible filter. This might be worth considering, but I imagine there will be technical difficulties, e.g. when a guard goes down, you need to update its state in all the USED_GUARDS sets that it's in. Also, a person who frequently toggles the FascistFirewall option will end up using two different sets of guards all the time, which is suboptimal.
Well, one thing you could do is hash the settings (and maybe also reachable ports) and use that as a key to differentiate the different USED_GUARDS. That would solve the problem, but might lead to a single client using lots of different guards in different locations. Might that be OK?
Hmm, it could be that in some cases the USED_GUARDS list of settings A would also work fine under settings B. For example, an 80/443 guard might be our top guard under vanilla settings, which would also work fine under FascistFirewall settings.
In those cases (which are not all that rare), it would be suboptimal to use a different guardlist, since you are going to expose yourself to more guards than required.
- Can we make the lists smaller?
Probably. Maybe a sampled set of 30 guards? Or 1.5%?
Plausible. However, if we take the filtering approach but use a small sampled guards list, it could happen that the list is not able to satisfy some of our filtering restrictions.
e.g. maybe in our 30 guards there are no IPv6 guards at all, and the user just turned on ClientUseIPv6. What to do now?
This is important to understand, because currently there is no mechanism to add guards to the sampled guards list if a restriction cannot be satisfied. So what will Tor do if a user enables ClientUseIPv6 _and_ FascistFirewall but there are no IPv6+80/443 guards in our sampled guards list?
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards, expand SAMPLED guards using the regular process.
- If SAMPLED guards reach SAMPLED_MAX (50?) in size, we fail closed with an error saying something like "your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option".
I share the same concerns as teor here.
As you said, I think it's important to keep the property of restricting the total number of guards we connect to, so as to avoid the attacks you mentioned.
Of course, the above property also carries the inherent risk of potentially failing closed. For example, consider the unlikely scenario where _all_ the relays in your sampled set suddenly go offline while your network is fine.
That said, we should be very careful of failing closed like that just because the user changed some torrc options and moved around a bit.
I will try to think more about this problem and write something tomorrow or the day after.
I think it asks "What happens when guards in our sampled set drop out of the consensus and get marked as bad?" (see bad_since in entrynodes.c) .
This is also a great question. Especially when combined with the planned "ensure a min percentage of X in our sampled set" logic.
Like, what happens if suddenly most of our sampled IPv6 guards drop out of the consensus when we have ClientUseIPv6 on? Should we replace them? And if yes, don't we need to replace them with other IPv6 guards to maintain the minimum percentage?
Well, I think if we replace, we should just replace randomly, just like we always expand the sampled set. If most IPv6 guards drop out, we will have fewer IPv6 guards in our sampled set, but that also reflects the Tor network.
I suspect a plausible thing to do is to wait a few consensus rounds before expanding the sampled set to replace "bad" guards - they might come back, and under most circumstances we shouldn't need to use the sampled set anyway.
I think that's a good idea. We shouldn't just drop guards from memory the moment they leave the consensus.
For example, the current Tor codebase keeps them around forever, but marked as bad.
The current proposal says the following about SAMPLED_UTOPIC_GUARDS:
It will be filled in by the algorithm if it's empty, or if it contains less than SAMPLE_SET_THRESHOLD guards after winnowing out older guards.
which I think is a good suggestion. However, what should we do if we end up going with the "ensure minimum percentage" logic?
Yeah, my suggestion is not to have a minimum percentage of X-type guards - instead just fill it randomly, and use the expanding process if we can't find enough guards for a specific purpose.
- EntryNodes
If this is set, never use the algorithm for regular circuits - we should still use it for directory server connections though.
If this is set we should not use our algorithm, but we should instead pick one of the guards in the EntryNodes list. This is for people who want to hardcode their guard. It's used a lot by people currently.
Yeah. Is the guard picked randomly from this list, or using something more complicated?
The guard is picked randomly from the list.
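A sketch of the EntryNodes behaviour described above: when EntryNodes is set, regular circuits pick a guard uniformly at random from that list, while directory connections still go through the normal algorithm. `choose_entry` and `run_guard_algorithm` are hypothetical names, not Tor functions.

```python
import random


def choose_entry(purpose, entry_nodes, run_guard_algorithm, rng=random):
    """Pick an entry for `purpose` ("circuit" or "directory").

    entry_nodes:         the parsed EntryNodes list (empty when unset)
    run_guard_algorithm: callable running the normal guard selection
    """
    if entry_nodes and purpose == "circuit":
        # EntryNodes overrides the algorithm for regular circuits.
        return rng.choice(entry_nodes)
    return run_guard_algorithm(purpose)
```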
- UseEntryGuardsAsDirGuards
I don't understand exactly what this setting does.
I'm not sure either. I'd just let it keep the exact same semantics it currently has.
Yeah, except we don't exactly understand what those semantics are, and whether we need to change something in our code to match them. =)
OK, I looked a bit more at should_use_directory_guards().
I think that config option basically enables the directory guard feature. The option name is a bit confusing, I agree.
If that config option is turned off, Tor will just choose a random directory node from the network, instead of using the entry guard list.
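A simplified sketch of this reading of UseEntryGuardsAsDirGuards. The real should_use_directory_guards() may check further conditions that are omitted here, and Tor does not necessarily pick uniformly from the guard list; the function and parameter names are hypothetical.

```python
import random


def pick_directory_source(use_entry_guards_as_dir_guards, entry_guards,
                          all_dir_servers, rng=random):
    """Return the node to fetch directory information from."""
    if use_entry_guards_as_dir_guards and entry_guards:
        # Directory guard feature on: stay within the entry guard list.
        return rng.choice(entry_guards)
    # Feature off: any directory server in the network will do.
    return rng.choice(all_dir_servers)
```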
Thanks for all the feedback! Hopefully we're getting close to the final iteration of this spec. =)
Cheers
Ola Bini (https://olabini.se)
"Yields falsehood when quined" yields falsehood when quined.
George Kadianakis desnacked@riseup.net writes:
Ola Bini obini@thoughtworks.com writes:
Hey,
<snip>
That's not very nice because the USED_GUARDS set that was created when ClientUseIPv6 or FascistFirewall were on will have reduced diversity. Then even if we switch off those options, we are still stuck with reduced diversity.
I'm not sure what's the right way to do this here!
We could imagine having multiple USED_GUARDS sets, where we make a new set for each possible filter. This might be worth considering, but I imagine there will be technical difficulties, e.g. when a guard goes down, you need to update its state in all the USED_GUARDS sets that it's in. Also, a person who frequently toggles the FascistFirewall option will end up using two different sets of guards all the time, which is suboptimal.
Well, one thing you could do is hash the settings (and maybe also reachable ports) and use that as a key to differentiate the different USED_GUARDS. That would solve the problem, but might lead to a single client using lots of different guards in different locations. Might that be OK?
Hmm, it could be that in some cases the USED_GUARDS list of settings A would also work fine under settings B. For example, an 80/443 guard might be our top guard under vanilla settings, which would also work fine under FascistFirewall settings.
In those cases (which are not all that rare), it would be suboptimal to use a different guardlist, since you are going to expose yourself to more guards than required.
On second thought, I think using a single USED_GUARDS list here should be OK for now. That's also what Tor is doing right now, so this behavior can't be worse than the status quo.
On this note, we should add a small "Discussion" section to the proposal and briefly mention these issues, which we might want to solve in the future but don't yet know how to.
- Can we make the lists smaller?
Probably. Maybe a sampled set of 30 guards? Or 1.5%?
Plausible. However, if we take the filtering approach but use a small sampled guards list, it could happen that the list is not able to satisfy some of our filtering restrictions.
e.g. maybe in our 30 guards there are no IPv6 guards at all, and the user just turned on ClientUseIPv6. What to do now?
This is important to understand, because currently there is no mechanism to add guards to the sampled guards list if a restriction cannot be satisfied. So what will Tor do if a user enables ClientUseIPv6 _and_ FascistFirewall but there are no IPv6+80/443 guards in our sampled guards list?
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards, expand SAMPLED guards using the regular process.
- If SAMPLED guards reach SAMPLED_MAX (50?) in size, we fail closed with an error saying something like "your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option".
I share the same concerns as teor here.
As you said, I think it's important to keep the property of restricting the total number of guards we connect to, so as to avoid the attacks you mentioned.
Of course, the above property also carries the inherent risk of potentially failing closed. For example, consider the unlikely scenario where _all_ the relays in your sampled set suddenly go offline while your network is fine.
That said, we should be very careful of failing closed like that just because the user changed some torrc options and moved around a bit.
I will try to think more about this problem and write something tomorrow or the day after.
I'm still not sure what's the right thing to do here.
The approach that you and Tania describe might make sense for now. I imagine that we will want to tweak the exact numbers after testing in the real world.
I wonder if there is something smart we can do here...
Here is a non-smart thing we could do: we could prepopulate our sampled guards list with all the possible guard types. So we include an 80/443 guard, an IPv6 guard, an IPv6 guard that is also on 80/443, and anything else we can think of. Unfortunately, this would greatly reduce the diversity of our guard list, since there can't be many guards that are both IPv6 and on 80/443, and in the end most clients would end up using the same guards.
It might be a good idea to enumerate the guards for each possible filter we will add, and then calculate their guard probabilities, to see how likely it is to randomly choose a guard of that type. If we have filters where there is only a 1% probability of picking a guard of the right type, then these "your current network settings make it impossible for us to safely choose an entry guard" messages might appear more frequently than we would like.
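Computing the "guard probability" of a filter amounts to the share of total guard bandwidth held by guards that pass it. A sketch, assuming relay records with a bandwidth value, flags, IPv6 support and an ORPort; the `RelayInfo` fields and filter names are hypothetical, and real records could be loaded from a consensus with a library such as Stem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayInfo:
    # Hypothetical fields; real values would come from the consensus.
    bandwidth: int
    is_guard: bool
    is_dir: bool
    has_ipv6: bool
    or_port: int


def guard_bw_fraction(relays, predicate):
    """Fraction of total Guard-flagged bandwidth held by guards matching predicate."""
    guards = [r for r in relays if r.is_guard]
    total = sum(r.bandwidth for r in guards)
    if total == 0:
        return 0.0
    return sum(r.bandwidth for r in guards if predicate(r)) / total


# One entry per filter we might add; extend as new filters appear.
FILTERS = {
    "directory": lambda r: r.is_dir,
    "orport_80_443": lambda r: r.or_port in (80, 443),
    "ipv6": lambda r: r.has_ipv6,
    "ipv6_and_80_443": lambda r: r.has_ipv6 and r.or_port in (80, 443),
}
```

Filters derived from free-form options such as ReachableAddresses would need a predicate built from the configured address policy, which is much harder to enumerate up front.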
Hey,
On second thought, I think using a single USED_GUARDS list here should be OK for now. That's also what Tor is doing right now, so this behavior can't be worse than the status quo.
On this note, we should add a small "Discussion" section to the proposal and briefly mention these issues, which we might want to solve in the future but don't yet know how to.
Both things sound good to me.
It might be a good idea to enumerate the guards for each possible filter we will add, and then calculate their guard probabilities, to see how likely it is to randomly choose a guard of that type. If we have filters where there is only a 1% probability of picking a guard of the right type, then these "your current network settings make it impossible for us to safely choose an entry guard" messages might appear more frequently than we would like.
I'm not sure we can do this - a lot of the filters will be based on backwards compatibility with the existing Tor configuration options, things such as ReachableAddresses - I'm not sure how to reasonably enumerate all possibilities in a useful way.
Cheers
On 7 Apr 2016, at 23:53, George Kadianakis desnacked@riseup.net wrote:
Here is a non-smart thing we could do: we could prepopulate our sampled guards list with all the possible guard types. So we include an 80/443 guard, an IPv6 guard, an IPv6 guard that is also on 80/443, and anything else we can think of. Unfortunately, this would greatly reduce the diversity of our guard list, since there can't be many guards that are both IPv6 and on 80/443, and in the end most clients would end up using the same guards.
It might be a good idea to enumerate the guards for each possible filter we will add, and then calculate their guard probabilities, to see how likely it is to randomly choose a guard of that type. If we have filters where there is only a 1% probability of picking a guard of the right type, then these "your current network settings make it impossible for us to safely choose an entry guard" messages might appear more frequently than we would like.
This sounds very much like ticket #17849. On that ticket, I suggest we use the current IPv4 FascistFirewall proportion as a guide to when we should warn the user. But we never considered failing closed in these circumstances: what if the user just wants circumvention, and not anonymity? https://trac.torproject.org/projects/tor/ticket/17849
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n
Ola Bini obini@thoughtworks.com writes:
[ text/plain ] Hey,
- OrPort vs DirPort
ORPort is used for regular circuits, while DirPort is used when getting directory information. We need to interpret reachable stuff differently depending on the purpose.
I'm not actually sure what the comment means here.
This was more for our own benefit. The OrPort vs DirPort distinction has been a bit complicated so far. The comment basically means, when we are looking up directory information, we should use the DirPort to decide reachability and so on instead, correct?
Ensuring a min percentage of dirguards in our sampled set could work. Then, when we need a directory guard, we could filter the sampled set and only examine guards that can do directory requests.
Yeah, we talked about this yesterday and our current thinking is to have a sampled set that contains every kind of thing, and then we dynamically filter it based on config and so on during START.
Hm, are you talking about the guardlists here? What's the question?
BTW, if we have the ability to do "ensure a min percentage of X in our sampled set", couldn't we just ensure a min percentage of dystopic guards in our sampled set? And forget about the two separate guardlists?
In this case we can have the percentage value be the actual portion of the network that is dystopic guards. So if 20% of the total guard bandwidth is dystopic, we could ensure that at least 20% of our sampled set is dystopic".
Well, the problem is really that the idea of dystopic doesn't necessarily make sense, since it's so dependent on the current network position of the client. Our current thinking is to do away with that concept as well. =)
- DYSTOPIC - is there value in trying 80 and 443?
Probably not.
What does "trying" mean in this case?
Falling back to guards with 80 and 443.
Restart pending guard selection algorithms on a SIGHUP? Plausible, but I don't know how hard it would be to implement this.
Well, the alternative is to just finish the running guard selections with the old settings, but use the new settings for new algorithm instances.
That's not very nice because the USED_GUARDS set that was created when ClientsUseIPv6 or FascistFirewall were on will have reduced diversity. Then even if we switch off those options, we are still stuck with reduced diversity.
I'm not sure what's the right way to do this here!
We could imagine having multiple USED_GUARDS sets, where we make a new set for each possible filter. This might be worth considering, but I imagine there will be technical difficulties. e.g. when a guard goes down, you need to update its state in all the USED_GUARDS sets that it's in. Also, a person who toggles the FascistFirewall option frequently, will end up using two different sets of guards all the time which is suboptimal.
Well, one thing you could do is hash the settings (and maybe also reachable ports) and use that as a key to differentiate the different USED_GUARDS. That would solve the problem, but might lead to a single client using lots of different guards in different locations. Might that be OK?
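A minimal sketch of the hashing idea, assuming the relevant settings can be collected into a dict. The option names are Tor's, but `used_guards_key`, `used_guards_for` and the in-memory `used_guards_by_key` store are hypothetical, not Tor code: canonicalize the settings and reachable ports, hash them, and use the digest to pick which USED_GUARDS list to use.

```python
import hashlib
import json

# Hypothetical in-memory store: one USED_GUARDS list per configuration key.
used_guards_by_key = {}


def used_guards_key(settings, reachable_ports=()):
    """Stable key for a USED_GUARDS set derived from the filtering settings.

    sort_keys and sorted() make the key independent of ordering.
    """
    canonical = json.dumps(
        {"settings": settings, "ports": sorted(reachable_ports)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def used_guards_for(settings, reachable_ports=()):
    """Return (creating it if needed) the USED_GUARDS list for these settings."""
    key = used_guards_key(settings, reachable_ports)
    return used_guards_by_key.setdefault(key, [])


# The same settings given in a different order map to the same list...
firewalled = used_guards_for({"ClientUseIPv6": True, "FascistFirewall": True}, [443, 80])
firewalled.append("guardA")
assert used_guards_for({"FascistFirewall": True, "ClientUseIPv6": True}, [80, 443]) == ["guardA"]
# ...while toggling FascistFirewall switches to a separate list.
assert used_guards_for({"ClientUseIPv6": True, "FascistFirewall": False}, [80, 443]) == []
```

The downside raised above shows up directly: every distinct combination gets its own list, so a client that moves between configurations accumulates guards across all of them.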
- Can we make the lists smaller?
Probably. Maybe a sampled set of 30 guards? Or 1.5%?
Plausible. However, if we take the filtering approach but use a small sampled guards list, it could happen that the list is not able to satisfy some of our filtering restrictions.
For example, maybe there are no IPv6 guards at all among our 30, and the user has just turned on ClientUseIPv6. What do we do then?
This is important to understand, because currently there is no mechanism for adding guards to the sampled guards list if a restriction cannot be satisfied. So what will Tor do if a user enables ClientUseIPv6 _and_ FascistFirewall but there are no IPv6+80/443 guards in our sampled guards list?
Yeah, we talked about that yesterday. Our suggestion is to do something like this:
- If the filtered/reduced sample set contains fewer than X (5?) guards,
expand the SAMPLED guards set using the regular process.
- If the SAMPLED guards set reaches SAMPLED_MAX (50?) guards, fail closed with
an error saying something like "Your current network settings make it impossible for us to safely choose an entry guard. If you really need to connect under these circumstances, consider explicitly setting the EntryGuards configuration option."
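A rough sketch of those two rules, assuming guards are plain identifiers, the network is a list of (guard, bandwidth weight) pairs, and the filter is a predicate built from the current config. `MIN_FILTERED_SAMPLE` and `SAMPLED_MAX` use the tentative 5 and 50 from above; the function and exception names are made up for illustration.

```python
import random

MIN_FILTERED_SAMPLE = 5  # the tentative "X (5?)"
SAMPLED_MAX = 50         # the tentative "SAMPLED_MAX (50?)"


class GuardSelectionError(Exception):
    """Raised when no entry guard can be chosen safely (fail closed)."""


def filtered_sample(sampled, network, passes_filter, rng=random):
    """Return the guards in `sampled` that pass the filter, growing `sampled`
    with regular (unfiltered, bandwidth-weighted) sampling while too few pass.
    """
    filtered = [g for g in sampled if passes_filter(g)]
    while len(filtered) < MIN_FILTERED_SAMPLE:
        if len(sampled) >= SAMPLED_MAX:
            raise GuardSelectionError(
                "current network settings make it impossible to safely "
                "choose an entry guard")
        # Regular sampling step over the whole network, skipping duplicates.
        candidates = [(g, w) for g, w in network if g not in sampled]
        if not candidates:
            raise GuardSelectionError("no more guards left to sample")
        guards, weights = zip(*candidates)
        new_guard = rng.choices(guards, weights=weights, k=1)[0]
        sampled.append(new_guard)
        if passes_filter(new_guard):
            filtered.append(new_guard)
    return filtered
```

Sampling stays unfiltered on purpose; the filter only decides when to stop growing the set and when to give up.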
Hello,
Just to give some perspective, I wrote a small script that calculates the bandwidth percentage of various types of guards. Here are the results:
- Directory guards are 85% of total current guard bandwidth.
- Guards with ORPorts on 80/443 are 42% of total current guard bandwidth.
- Guards on IPv6 are 20% of total current guard bandwidth.
- Guards both on 80/443 and on IPv6 are 8% of total current guard bandwidth.
I've included the script at the end of this email.
Thinking about this, I wonder whether we should have a minimum filtered sample size, as was suggested in previous emails. Say we have MINIMUM_FILTERED_SAMPLE_SIZE set to 5, the user has ClientPreferIPv6 set, but their sampled set has only one IPv6 guard. In this case, the user will keep sampling until they get four more IPv6 guards.
With the above probabilities, every time you sample a guard you have a 20% probability of it being an IPv6 guard. To get 4 of them, you will first need to sample a lot of guards (like 20 or so, but some binomial distribution magic is required to get the exact probabilities) and add them to your sampled guard set. I wonder if that's worth it.
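To put a number on the "binomial distribution magic": if each sample is treated as an independent draw that is IPv6 with probability p = 0.2 (an approximation, since real sampling is bandwidth-weighted and without replacement), the number of samples needed to collect k = 4 IPv6 guards follows a negative binomial distribution with mean k/p = 20. A small sketch under those assumptions:

```python
from math import comb


def expected_samples(k, p):
    """Mean number of draws needed to collect k matching guards (negative binomial)."""
    return k / p


def prob_within(n, k, p):
    """Probability that n draws already contain at least k matching guards,
    i.e. P(Binomial(n, p) >= k)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(expected_samples(4, 0.2))
for n in (10, 20, 30, 40):
    print(n, round(prob_within(n, 4, 0.2), 3))
```

So 20 samples suffice only a bit more than half the time (about 0.59), and it takes around 30 samples to get to roughly 0.88.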
Maybe if we have a single guard that satisfies our filters in our sampled guard list, we should use that guard instead of sampling for more? I don't exactly see value in sampling and exposing ourselves to additional guards if we have one that we like (and we might have even connected to in the past).
---
Symmetrically, I'm also not sure what's the right thing to do if we have zero guards that satisfy our filters in our sampled guard set. Should we start sampling randomly till we hit a guard that satisfies us, or should we sample directly from the correct set (e.g. only from the set of IPv6 guards). I'm still not sure about this.
---
from stem.descriptor import parse_file

CONSENSUS = '/home/user/.tor/cached-microdesc-consensus2'


def main():
    total_guard_bw = 0
    dirguard_bw = 0
    antifa_bw = 0        # guards with an ORPort on 80/443 (usable under FascistFirewall)
    ipv6_bw = 0          # guards with an IPv6 ORPort
    super_antifa_bw = 0  # guards that are both

    # Pass the descriptor type explicitly rather than relying on stem's
    # file-name-based detection, which may not recognize the "2" suffix.
    for desc in parse_file(CONSENSUS, 'network-status-microdesc-consensus-3 1.0'):
        if 'Guard' not in desc.flags:
            continue

        total_guard_bw += desc.bandwidth

        if 'V2Dir' in desc.flags:
            dirguard_bw += desc.bandwidth

        is_antifa = desc.or_port in (80, 443)
        if is_antifa:
            antifa_bw += desc.bandwidth

        # or_addresses holds (address, port, is_ipv6) tuples from the "a" lines;
        # check all of them rather than only the first.
        if any(is_ipv6 for _, _, is_ipv6 in desc.or_addresses):
            ipv6_bw += desc.bandwidth
            if is_antifa:
                super_antifa_bw += desc.bandwidth

    total = float(total_guard_bw)
    print('Dirguard percentage = %f' % (dirguard_bw / total))
    print('Antifa (80/443) percentage = %f' % (antifa_bw / total))
    print('IPv6 percentage = %f' % (ipv6_bw / total))
    print('Antifa + IPv6 percentage = %f' % (super_antifa_bw / total))


if __name__ == '__main__':
    main()
- Directory guards are 85% of total current guard bandwidth.
- Guards with ORPorts on 80/443 are 42% of total current guard bandwidth.
- Guards on IPv6 are 20% of total current guard bandwidth.
- Guards both on 80/443 and on IPv6 are 8% of total current guard bandwidth.
Useful info, thanks. This tells us it's very unlikely that someone will end up with no dirguards, no dystopic guards, or no IPv6 guards at all. And if we make the sampled set large enough (something like 50 guards), we are very likely to have every capability covered.
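A back-of-the-envelope check, assuming each sampled guard independently falls into a category with probability equal to that category's share of guard bandwidth (an approximation that ignores sampling without replacement): the chance that a sample of n guards contains none of a category with share f is (1 - f)^n.

```python
# Bandwidth shares from the script results above.
SHARES = {
    "dirguard": 0.85,
    "80/443": 0.42,
    "IPv6": 0.20,
    "80/443 + IPv6": 0.08,
}


def prob_missing(share, sample_size):
    """Chance that none of `sample_size` independent draws has the capability."""
    return (1 - share) ** sample_size


for size in (30, 50):
    for name, share in SHARES.items():
        print("%d guards, no %s guard: %.1e" % (size, name, prob_missing(share, size)))
```

Under these assumptions even the rarest combination, 80/443 plus IPv6, is missing from a 50-guard sample only about 1.5% of the time (about 8% with 30 guards).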
With the above probabilities, every time you sample a guard you have a 20% probability of it being an IPv6 guard. To get 4 of them, you will first need to sample a lot of guards (like 20 or so, but some binomial distribution magic is required to get the exact probabilities) and add them to your sampled guard set. I wonder if that's worth it.
Maybe, maybe not. I don't actually think that sampling until we have enough guards to match the filter is a big problem.
Maybe if we have a single guard that satisfies our filters in our sampled guard list, we should use that guard instead of sampling for more? I don't exactly see value in sampling and exposing ourselves to additional guards if we have one that we like (and we might have even connected to in the past).
True, if we have one and it works, that's fine with me. The problem is that IF that guard doesn't work, we have no fallback mechanism for sampling more guards, so circuit building will never recover.
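One way to get both behaviours, sketched under the assumption that the caller supplies `passes_filter`, `is_reachable` and a `sample_one` function that performs one regular, unfiltered sampling step (all hypothetical names): keep using any filtered guard we already have that still looks reachable, and only grow the sampled set once none of them do.

```python
def pick_guard(sampled, passes_filter, is_reachable, sample_one, sampled_max=50):
    """Return a usable guard, sampling more only when no filtered guard works.

    Returns None once `sampled` reaches `sampled_max` (fail closed).
    """
    while True:
        for guard in sampled:
            if passes_filter(guard) and is_reachable(guard):
                return guard
        if len(sampled) >= sampled_max:
            return None
        sampled.append(sample_one())  # regular, unfiltered sampling step


# Toy example: names ending in "6" stand for IPv6 guards; "b6" is currently down.
sampled = ["a4", "b6"]
fresh = iter(["c4", "d6", "e4"])
guard = pick_guard(sampled, lambda g: g.endswith("6"), lambda g: g != "b6",
                   lambda: next(fresh))
assert guard == "d6"
assert sampled == ["a4", "b6", "c4", "d6"]
```

As long as the one matching guard works, nothing new is sampled; the set only grows when it stops working, which is the fallback that is missing otherwise.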
Symmetrically, I'm also not sure what's the right thing to do if we have zero guards that satisfy our filters in our sampled guard set. Should we start sampling randomly till we hit a guard that satisfies us, or should we sample directly from the correct set (e.g. only from the set of IPv6 guards). I'm still not sure about this.
I'm much more in favor of sampling in general, rather than trying to hit the specific thing. If we sample in general, at least we won't skew the sampled set.
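A toy simulation of the skew argument, under simplifying assumptions (1000 equally weighted guards, 20% of them IPv6, draws with replacement; none of this is Tor's actual sampling code): grow the sampled set until it contains an IPv6 guard, either by drawing from the whole network or only from IPv6 guards, and measure what share of the added guards are IPv6.

```python
import random


def ipv6_share_after_growth(targeted, trials=2000, seed=0):
    """Share of IPv6 guards among all guards added while growing the sampled
    set until it contains an IPv6 guard, averaged over `trials` runs."""
    rng = random.Random(seed)
    network = [("guard%d" % i, i < 200) for i in range(1000)]  # (name, is_ipv6)
    pool = [g for g in network if g[1]] if targeted else network
    added = ipv6_added = 0
    for _ in range(trials):
        while True:
            _, is_ipv6 = rng.choice(pool)
            added += 1
            if is_ipv6:
                ipv6_added += 1
                break
    return ipv6_added / added


print("general sampling: ", ipv6_share_after_growth(targeted=False))
print("targeted sampling:", ipv6_share_after_growth(targeted=True))
```

With general sampling the share of added guards stays close to the network's 20%, so the sampled set remains a representative draw; with targeted sampling every added guard is IPv6, so the set drifts away from the network's actual composition.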
Cheers