Hello there,
I'm not sure what kind of statistics we get out of the current guard simulator.
In general, we are interested in security and performance. For security, we are trying to minimize our exposure to the network. For performance, we want to minimize our downtime when our current guard becomes unreachable or after our network comes back up.
Here are some concrete statistics that we could gather in the simulator:
Security statistics:
- Number of unique guards we connected to during the course of the simulation.
- Time spent connected to lower-priority guards while a primary guard was online.
- Time spent connected to lower-priority guards while a higher-priority guard was online and the network was up.
Performance statistics:
- Time spent cycling through guards.
- Time spent cycling through guards while the network is up.
- Time spent in dystopic mode.
- Time spent in dystopic mode while the network was utopic.
Is it possible to collect those statistics? I'm curious to learn how the current guard algorithm compares to the new prop259 in these respects.
What other stats do you think are important here?
On Wed, Feb 17, 2016 at 8:29 AM, George Kadianakis <desnacked@riseup.net> wrote:
Hello there,
I'm not sure what kind of statistics we get out of the current guard simulator.
The simulation creates a network with 1000 relays (all guards), each with 96% reliability, and runs on simulated time:
- every 20 seconds: creates a new circuit
- every 2 minutes: updates node connectivity based on its reliability
- every 20 minutes: removes and adds new relays to the network
By default, we recreate the client (OP) every 2 minutes (which makes it bootstrap, and so on). We can configure it to simulate a long-lived client, in which case it fetches a new consensus every hour.
We're also able to run this simulation in multiple network scenarios: fascist firewall, flaky network, evil network, sniper network, down network, and a scenario that switches between these networks. See --help and [1] for an explanation of the terms.
Each simulation runs for 30 hours (in simulated time), for a total of 5400 circuits. Time is discrete with increments of 20 seconds. Everything in the simulation happens at no cost to simulated time. We are experimenting with adding a time cost to connections (2 seconds for successes, 4 for failures) just to get a feeling for how it would impact the algorithms.
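For reference, the skeleton of the event loop looks roughly like this (a simplified sketch; the identifiers are illustrative and don't match the actual names in the repo):

    TICK = 20                    # seconds of simulated time per step
    SIM_LENGTH = 30 * 60 * 60    # 30 hours of simulated time

    def run_simulation(network, client):
        for now in range(0, SIM_LENGTH, TICK):
            client.build_circuit(now)          # every 20 seconds
            if now % 120 == 0:
                network.update_connectivity()  # every 2 minutes
            if now % (20 * 60) == 0:
                network.churn_relays()         # every 20 minutes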
We currently have the following metrics:
- success rate
- avg bandwidth capacity
- exposure to guards (how many different guards we connected to) over time (after hours 1, 15, and 30)
- number of guards we tried until the first successful circuit
- time until the first successful circuit is built
A successful circuit is one for which we succeeded in finding a guard using the algorithm AND succeeded in connecting to it.
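In other words, something like this (again with illustrative names):

    def try_circuit(algo, network):
        guard = algo.choose_guard()    # the algorithm must yield a guard...
        if guard is None:
            return False
        return network.connect(guard)  # ...AND the connection must succeed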
In general, we are interested in security and performance. For security, we are trying to minimize our exposure to the network. For performance, we want to minimize our downtime when our current guard becomes unreachable or after our network comes back up.
Here are some concrete statistics that we could gather in the simulator:
Security statistics:
- Number of unique guards we connected to during the course of the simulation.
We have this as "exposure after 30 hours".
- Time spent connected to lower-priority guards while a primary guard was online.
- Time spent connected to lower-priority guards while a higher-priority guard was online and the network was up.
We don't have these. Also, I'm not sure how we should detect network conditions: we could try to infer them from the algorithm's behavior, or look at which network scenario we are using at the moment.
Performance statistics:
- Time spent cycling through guards.
- Time spent cycling through guards while the network is up.
Since time is stopped while we're choosing guards, we have to come up with a different metric for this. And it also requires detecting the network time.
- Time spent in dystopic mode.
- Time spent in dystopic mode while the network was utopic.
These should be easy as long as we have defined how to detect the network type.
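If we go with looking at the scenario, I imagine something like the following (a sketch with made-up names, not code from the repo):

    # Let each network scenario report its ground truth, and accumulate
    # per-tick counters from it.
    class FascistFirewall:
        def network_is_up(self):
            return True    # up, but only some ports get through
        def is_utopic(self):
            return False   # port-filtered, so dystopic conditions

    class DownNetwork:
        def network_is_up(self):
            return False
        def is_utopic(self):
            return False

    def record_tick(stats, scenario, client, tick=20):
        if client.in_dystopic_mode():
            stats["dystopic_time"] += tick
            if scenario.is_utopic():
                stats["dystopic_while_utopic"] += tick
        if scenario.network_is_up() and client.on_lower_priority_guard():
            stats["low_priority_while_up"] += tick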
Is it possible to collect those statistics? I'm curious to learn how the current guard algorithm compares to the new prop259 in these respects.
We have tooling to generate graphs with success rate and exposure taken from a round of ~500 simulations. I can send them to you when they finish running ;)
What other stats do you think are important here?
We have discussed counting how many network connections we make over time. For now, we have been comparing success and exposure.
I guess we can add these stats; we just need to come up with an approach to determine the network condition.
All the code is in https://github.com/twstrike/tor_guardsim (branch develop).
[1] doc/stuff-to-test.txt
On Wed, Feb 17, 2016 at 2:16 PM, Reinaldo Junior <rjunior@thoughtworks.com> wrote:
[...]
By default, we recreate the client (OP) every 2 minutes (which makes it bootstrap, and so on). We can configure it to simulate a long-lived client, in which case it fetches a new consensus every hour.
Actually, it's the other way around: by default we use a long-lived client which fetches a new consensus every hour, but we can simulate a short-lived client to compare the bootstrap behavior of both algorithms.
Reinaldo Junior <rjunior@thoughtworks.com> writes:
- number of guards we tried until the first successful circuit
- time until the first successful circuit is built
A successful circuit is one for which we succeeded in finding a guard using the algorithm AND succeeded in connecting to it.
In general, we are interested in security and performance. For security, we are trying to minimize our exposure to the network. For performance, we want to minimize our downtime when our current guard becomes unreachable or after our network comes back up.
Here are some concrete statistics that we could gather in the simulator:
Security statistics:
- Number of unique guards we connected to during the course of the simulation.
We have this as "exposure after 30 hours".
- Time spent connected to lower-priority guards while a primary guard was online.
- Time spent connected to lower-priority guards while a higher-priority guard was online and the network was up.
We don't have these. Also, I'm not sure how we should detect network conditions: we could try to infer them from the algorithm's behavior, or look at which network scenario we are using at the moment.
With the above statistics I'm trying to find out how well our guard-picking algorithms cope with unreliable networks. For example:
- Alice goes to a coffee shop with a FascistFirewall. Her guard (position #2 on her guard list) is on port 9001, so it stops working. Tor runs the guard-picking algorithm and finds a new guard on port 80 that works, but it is at position #6 on her guard list.
Now imagine that the primary guard in position #1 was on port 80, so it _could_ actually work behind a FascistFirewall, but because the guard-picking algorithm goes down the list, Alice ended up with guard #6. This is suboptimal behavior. An optimal guard algorithm would switch directly to the guard in position #1.
- Alice travels a lot, and over the day she works on her laptop without Internet 70% of the time. Even while she is offline, Tor is active (because who shuts down the system tor), so Tor keeps cycling through guards continuously. At some point she reaches a coffee shop, goes online, and successfully makes a circuit to a guard G. Depending on the guard-picking algorithm, this guard G might be at position #1 or #6 or #12 on the guard list. In one of the latter two cases, a good guard algorithm will realize that it did not connect to a high-priority guard and would somehow go back to #1 (for example, prop259 has the 3-second retry trigger for primary guards). By gathering the two statistics suggested above we learn how well a guard-picking algorithm copes with such scenarios.
Do you have any scenarios like the above in guardsim? I think particularly the travelling Alice scenario will be very useful for stress testing algorithms. Note that it's different from the FlakyNetwork scenario in tornet.py, because FlakyNetwork just returns "circuit failed" based on some independent probability for each circuit, whereas in the TravellingAlice scenario we want to always return "circuit failed" _for some time_ before we start returning "circuit success" again.
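To sketch what I mean (the interface here is made up; adapt it to however tornet.py structures its scenarios):

    import random

    # A network that is offline for long stretches of simulated time and
    # online for short ones (roughly the 70%-offline pattern above).
    class TravellingAlice:
        def __init__(self, now=0):
            self.online = False
            # start with an offline stretch
            self.flip_at = now + random.randint(1 * 3600, 3 * 3600)

        def update(self, now):
            if now >= self.flip_at:
                self.online = not self.online
                if self.online:
                    # short online stints: 20-60 minutes
                    self.flip_at = now + random.randint(20 * 60, 60 * 60)
                else:
                    # long offline stretches: 1-3 hours
                    self.flip_at = now + random.randint(1 * 3600, 3 * 3600)

        def connect(self, guard):
            # every circuit fails while offline, regardless of the guard
            return self.online and guard.is_up()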
The way you should actually "detect" these network conditions in your codebase depends on the architecture of the simulator. I might have some time tomorrow to take a look at the code and suggest some approaches. Would that be helpful?
I'm only going to touch on this subject for now because of lack of time. Will reply at more length tomorrow...
Performance statistics:
- Time spent cycling through guards.
- Time spent cycling through guards while the network is up.
Since time is stopped while we're choosing guards, we have to come up with a different metric for this. And it also requires detecting the network time.
- Time spent in dystopic mode.
- Time spent in dystopic mode while the network was utopic.
These should be easy as long as we have defined how to detect the network type.
Is it possible to collect those statistics? I'm curious to learn how the current guard algorithm compares to the new prop259 in these respects.
We have tooling to generate graphs with success rate and exposure taken from a round of ~500 simulations. I can send them to you when they finish running ;)
What other stats do you think are important here?
We have discussed counting how many network connections we make over time. For now, we have been comparing success and exposure.
I guess we can add these stats; we just need to come up with an approach to determine the network condition.
All the code is in https://github.com/twstrike/tor_guardsim (branch develop).
[1] doc/stuff-to-test.txt
Reinaldo Junior <rjunior@thoughtworks.com> writes:
On Wed, Feb 17, 2016 at 8:29 AM, George Kadianakis <desnacked@riseup.net> wrote:
Hello there,
I'm not sure what kind of statistics we get out of the current guard simulator.
The simulation creates a network with 1000 relays (all guards), each with 96% reliability, and runs on simulated time:
- every 20 seconds: creates a new circuit
- every 2 minutes: updates node connectivity based on its reliability
- every 20 minutes: removes and adds new relays to the network
By default, we recreate the client (OP) every 2 minutes (which makes it bootstrap, and so on). We can configure it to simulate a long-lived client, in which case it fetches a new consensus every hour.
We're also able to run this simulation in multiple network scenarios: fascist firewall, flaky network, evil network, sniper network, down network, and a scenario that switches between these networks. See --help and [1] for an explanation of the terms.
Each simulation runs for 30 hours (in simulated time), for a total of 5400 circuits. Time is discrete with increments of 20 seconds. Everything in the simulation happens at no cost to simulated time. We are experimenting with adding a time cost to connections (2 seconds for successes, 4 for failures) just to get a feeling for how it would impact the algorithms.
We currently have the following metrics:
- success rate
- avg bandwidth capacity
- exposure to guards (how many different guards we connected to) over time (after hours 1, 15, and 30)
- number of guards we tried until the first successful circuit
- time until the first successful circuit is built
A successful circuit is one for which we succeeded in finding a guard using the algorithm AND succeeded in connecting to it.
In general, we are interested in security and performance. For security, we are trying to minimize our exposure to the network. For performance, we want to minimize our downtime when our current guard becomes unreachable or after our network comes back up.
Here are some concrete statistics that we could gather in the simulator:
Security statistics:
- Number of unique guards we connected to during the course of the simulation.
We have this as "exposure after 30 hours".
Ah great!
- Time spent connected to lower-priority guards while a primary guard was online.
- Time spent connected to lower-priority guards while a higher-priority guard was online and the network was up.
We don't have these. Also, I'm not sure how we should detect network conditions: we could try to infer them from the algorithm's behavior, or look at which network scenario we are using at the moment.
Performance statistics:
- Time spent cycling through guards.
- Time spent cycling through guards while the network is up.
Since time is stopped while we're choosing guards, we have to come up with a different metric for this. And it also requires detecting the network time.
Hm, what do you mean by "detecting the network time"?
I think the approach you mentioned above, where you add some time cost to connections (2 seconds for successes and 4 for failures), should work for quantifying the time here, right?
FWIW, I have no idea if 2 and 4 seconds are good numbers. They could be. To make sure, you could try launching Tor and actually measure how much time it spends on dead guards and how much time it spends on alive guards.
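For example, the accounting could look along these lines (a sketch; the names are made up):

    # Charge simulated time per connection attempt, so that "time spent
    # cycling through guards" falls out of the accounting.  The 2s/4s
    # costs are the experimental values you mentioned above.
    SUCCESS_COST = 2   # seconds charged for a successful connection
    FAILURE_COST = 4   # seconds charged for a failed attempt

    def timed_connect(network, guard, stats):
        ok = network.connect(guard)
        cost = SUCCESS_COST if ok else FAILURE_COST
        stats["time_cycling"] += cost
        if network.is_up():
            stats["time_cycling_while_up"] += cost
        return ok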
On Thu, Feb 18, 2016 at 6:10 AM, George Kadianakis <desnacked@riseup.net> wrote:
Since time is stopped while we're choosing guards, we have to come up with a different metric for this. And it also requires detecting the network time.
Hm, what do you mean by "detecting the network time"?
Should be "detecting the network type" (for knowing when the network is up). I blame muscle-memory.
I think the approach you mentioned above, where you add some time cost to connections (2 seconds for successes and 4 for failures), should work for quantifying the time here, right?
I guess so, at least as a ballpark estimate.
FWIW, I have no idea if 2 and 4 seconds are good numbers. They could be. To make sure, you could try launching Tor and actually measure how much time it spends on dead guards and how much time it spends on alive guards.
Neither do we. We just wanted to make failures more expensive than successes.
On Wed, Feb 17, 2016 at 2:16 PM, Reinaldo Junior <rjunior@thoughtworks.com> wrote:
[...] We have tooling to generate graphs with success rate and exposure taken from a round of ~500 simulations. I can send them to you when they finish running ;)
There's a comparison of both simulations here: https://github.com/twstrike/tor_guardsim/issues/1 (A github issue was the easiest place to host all the graphs by dragging and dropping)
Reinaldo Junior <rjunior@thoughtworks.com> writes:
On Wed, Feb 17, 2016 at 2:16 PM, Reinaldo Junior <rjunior@thoughtworks.com> wrote:
[...] We have tooling to generate graphs with success rate and exposure taken from a round of ~500 simulations. I can send them to you when they finish running ;)
There's a comparison of both simulations here: https://github.com/twstrike/tor_guardsim/issues/1 (A github issue was the easiest place to host all the graphs by dragging and dropping)
Hello Reinaldo et al.,
I notice that things are happening on github, but I'm a bit lost.
Do we have any new graphs of the simulations since the dev meeting? :)
Cheers!