==Guardiness: Yet another external dirauth script==
====Introduction====
One well-known problem with Tor relays is that guards suffer a big loss of traffic as soon as they get the Guard flag. This happens because clients only pick new guards every 2-3 months, so young guards will not get picked by old clients and mainly attract new clients. This is documented in 'phase three' of Roger's blog post: https://blog.torproject.org/blog/lifecycle-of-a-new-relay
The problem gets even worse if we extend the guard lifetime to 8-9 months.
The plan to solve this problem is to make client load balancing a bit smarter by prioritizing guards that suffer this traffic loss as middle relays.
The reason I'm sending this email is that this feature is by far the trickiest part of prop236 (guard node security), and I wanted to inform all dirauths of our plan and ask for feedback on the deployment procedure.
====How guardiness works====
Authorities calculate, for each relay, in how many of the past 2-3 months' consensuses it has been a guard, and then they note that fraction down in the consensus.
Then clients parse the consensus and if they see a guard that has been a guard for 55% of the past consensuses, they will consider that relay as 55% guard and 45% non-guard (that's 100% - 55%).
You can find more information at: https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/236-single-gu...
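To make the arithmetic concrete, here is a minimal sketch of that computation, assuming each consensus has already been reduced to the set of fingerprints that carried the Guard flag (the function name is illustrative, not the actual script's API):

# guard_sets: one set of Guard-flagged fingerprints per parsed consensus.
# Assumes at least one consensus was parsed.
def guard_fraction(fingerprint, guard_sets):
    seen = sum(1 for guards in guard_sets if fingerprint in guards)
    return (100 * seen) // len(guard_sets)  # integer percentage, e.g. 55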
The idea was that the guardiness script would be an external script that is run by Tor in a similar fashion to the bandwidth auth scripts. We chose that because we could write the script in a high-level language, and because it could be modular and we could change the algorithm in the future if we wanted. Unfortunately, it seems that external scripts in dirauths are a PITA to maintain, as can be seen by the lack of bwauth operators.
====How the guardiness script works====
The guardiness script is supposed to parse 2-3 months' worth of consensuses (but should also be able to do the same for 9 months' worth), calculate the guard fraction of each guard, save it to a file, and have the dirauth read it to update its routerstatuses.
One problem I encountered early on is that stem takes about 30 minutes to parse 3 months of consensuses (~2000 consensuses). Since this script should ideally be run every hour before each authority votes, such a long parsing time is unacceptable.
I mentioned this problem at https://trac.torproject.org/projects/tor/ticket/9321#comment:19 and stated a few possible solutions.
I received some feedback from Nick, and the solution I decided to take in the end is to have another script that is called first and summarizes consensuses to summary files. Summary files are then saved to disk, and parsed by the guardiness script to produce an output file that is read by dirauths.
Summary files are designed to be quick to parse (even with Python) and contain all the necessary information for guardiness. For example, parsing 2000 summary files in my laptop takes about 10 seconds.
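For illustration, a summary file could be as simple as the consensus's valid-after time on the first line, followed by one Guard fingerprint per line. The sketch below parses that hypothetical format (the actual format in the branch may differ):

# Hypothetical summary file layout:
#   line 1: valid-after timestamp of the consensus
#   rest:   one fingerprint per line, Guard-flagged relays only
def parse_summary_file(path):
    with open(path) as f:
        valid_after = f.readline().strip()
        guards = set(line.strip() for line in f if line.strip())
    return valid_after, guards

Skipping all the descriptor and flag parsing is what makes this roughly two orders of magnitude faster than re-parsing full consensuses.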
FWIW, the guardiness scripts are ready for review and can be found here: https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardines...
====How the guardiness script will be deployed====
The idea is that dirauths will add another script to their crontab that is called every hour (before or after the bwauth scripts).
The script first calls the summarizer script, which goes to the consensus/ directory, summarizes all consensuses it finds, and puts the summaries in the summary/ directory. The summarizer script then deletes all the consensuses that got summarized.
Then the script calls the guardiness script, which goes to the summary/ directory, parses all summary files it finds, and writes a guardiness output file that gets parsed by the dirauth prior to voting.
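In other words, the hourly cron job is roughly the following (script names, arguments, and paths are placeholders, not the exact CLI of the scripts in the branch):

# Rough shape of the hourly job; names and flags are illustrative.
import subprocess

def hourly_guardiness_job():
    # Step 1: summarize any new consensuses into summary files.
    subprocess.check_call(["./summarizer.py", "consensus/", "summary/"])
    # Step 2: compute guard fractions from the summaries and write the
    # output file that the dirauth reads before voting.
    subprocess.check_call(["./guardiness.py", "summary/", "guardiness.output"])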
That should be all. Easy, eh? :)
Now I will start a FAQ section where I state my doubts and fears.
====FAQ====
- Q: Where do dirauths find all those old consensuses?
There are various ways for dirauths to populate their consensus/ directory. They could fetch consensuses from metrics, or they could add a cron job that copies cached-consensus to a directory every hour.
However, I think the cleanest solution is to use Daniel Martí's upcoming consensus diff changes. Daniel will add a torrc option that allows Tor to save consensuses to a directory. My idea was to get dirauths to use Daniel's code to populate their consensus/ directory for two or three months, and then, after those two or three months, enable the guardiness scripts.
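As a sketch of the cron-copy alternative, assuming a standard DataDirectory layout (the paths are examples):

# Archive the current cached-consensus under its valid-after time, so
# re-copying the same consensus just overwrites the same file.
import re, shutil

def archive_cached_consensus(datadir="/var/lib/tor", outdir="consensus"):
    src = datadir + "/cached-consensus"
    with open(src) as f:
        m = re.search(r"^valid-after (\S+) (\S+)$", f.read(), re.MULTILINE)
    if m is None:
        raise ValueError("no valid-after line in %s" % src)
    stamp = m.group(1) + "-" + m.group(2).replace(":", "")
    shutil.copy(src, "%s/consensus-%s" % (outdir, stamp))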
To make sure that this is indeed the best approach, I need to learn from Nick when he plans to merge Daniel's code to Tor.
- Q: What does guardiness look like in the consensus?
Here is how a guard with a guardiness (GuardFraction) of 10% looks in the consensus:
r test006r HyS1DRHzEojbQVPZ1B3zAHc/HY0 9St4yWfV4huz5V86mt24HL3Yi2I 2014-09-06 13:44:28 127.0.0.1 5006 7006
s Exit Fast Guard HSDir Running Stable V2Dir Valid
v Tor 0.2.6.0-alpha-dev
w Bandwidth=111 Unmeasured=1 GuardFraction=10
- Q: What are you afraid of?
I'm mainly afraid of misconfiguration problems. This guardiness system is a bit complex, and I'm not expecting dirauths to learn how to use it and debug it, so it should just work, easily and well...
Here are some specific issues:
-- File management
For example, I'm afraid of the file management mess that summary files cause. We need to make sure that we don't leave old consensus/summary files rotting in the filesystem, or summarize the same consensuses over and over again. To address that, I added some optional cleanup switches to both scripts:
Specifically, the summarizer script can delete consensus files that already got summarized and can also delete consensus files older than 3 months (or N months). Similarly, the guardiness.py script can delete summary files older than 3 months (or N months).
The idea is that every time the cron job triggers, both scripts will auto-delete the oldest summary/consensus file, keeping in disk only the useful files.
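A minimal sketch of that age-based cleanup, using file modification times and the 3-month example cutoff from above:

import os, time

def delete_older_than(directory, max_age_days=90):
    cutoff = time.time() - max_age_days * 24 * 3600
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)  # expired consensus or summary file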
-- Incomplete consensus data set
I'm afraid that a directory authority might not have a properly populated consensus directory and hence advertise wrong guard fractions. For example, maybe it only has 10 consensuses in its consensus directory instead of 1900. Since the authorities only state the guardiness percentage in the consensus, it's not possible to learn how many consensuses were in their dataset. Maybe we need to add a "guardiness-consensus-parsed" line to their votes, to make such issues easier to debug?
Also, 3 months' worth of consensuses is 2160 consensuses. Because dirauths sometimes misbehave, not all 2160 consensuses will actually have been issued, and that's normal. But how do we tell whether a dirauth has a sufficiently good consensus data set? Is 2000 out of 2160 consensuses an OK data set? What about 1000 out of 2160? One possible sanity check is sketched below.
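The script could refuse to emit guard fractions when the data set is too thin; the 90% threshold here is an arbitrary example, not a number we have agreed on:

def check_dataset(num_parsed, window_months=3, min_fraction=0.9):
    # One consensus per hour: ~30 days * 24 hours per month.
    expected = window_months * 30 * 24  # ~2160 for 3 months
    if num_parsed < min_fraction * expected:
        raise RuntimeError("only %d of ~%d expected consensuses; "
                           "refusing to output GuardFraction"
                           % (num_parsed, expected))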
Furthermore, we need to make sure that dirauths don't consider old consensuses in their GuardFraction calculations. To achieve this, both scripts have a mandatory switch that lets operators specify the maximum acceptable consensus age. So, for example, if you call the summarizer script with a maximum age of 3 months, it will not parse consensuses older than 3 months. There is also a CLI switch that allows the scripts to delete expired consensuses.
- Q: Why do you use slow stem instead of parsing consensuses with Python on your own?
This is another part where I might have taken the wrong design decision, but I decided not to get into the consensus parsing business and just rely on stem.
This is also because I was hoping to use stem to verify consensus signatures. However, now that we might use Daniel's patch to populate our consensus database, maybe we don't need to treat consensuses as untrusted anymore.
If you think that I should try to parse the consensuses on my own, please tell me and I will give it a try. Maybe it will be fast. Definitely not as fast as summary files, but maybe we can parse 3 months' worth of consensuses in 15 to 40 seconds.
- Q: Why do you mess with multiple summary files instead of just having *one* summary file?
Because of the rolling nature of guardiness (we always want to consider the past 3 months), every hour we need to _discard_ the oldest observations (the consensus from 3 months ago) and start considering the newest consensus.
Because we need to discard that oldest consensus, it's hard to keep the information about every consensus in a single summary file, and that's why I chose to have one summary file per consensus. Maybe it's the wrong decision though...
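With one file per consensus, the hourly "discard the oldest" step reduces to filtering filenames by date. A sketch, assuming summary files are named after their consensus's valid-after time as in the earlier sketches:

import os
from datetime import datetime, timedelta

def summaries_in_window(summary_dir, months=3):
    cutoff = datetime.utcnow() - timedelta(days=months * 30)
    keep = []
    for name in sorted(os.listdir(summary_dir)):
        if not name.startswith("summary-"):
            continue
        # e.g. "summary-2014-09-06-130000"
        stamp = datetime.strptime(name.split("-", 1)[1], "%Y-%m-%d-%H%M%S")
        if stamp >= cutoff:
            keep.append(os.path.join(summary_dir, name))
    return keep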
- Q: What's up with the name "guardiness"?
It's quite terrible, I know, but it's the name we've used since quite early on in this project.
I think before finalizing this task I'm going to rename everything to 'GuardFraction' since it's more self-explanatory. I'm also considering names like "GuardTrafficLoadBalancer" etc.
On 16 Sep 2014, at 16:15, George Kadianakis desnacked@riseup.net wrote:
====How guardiness works==== The idea was that the guardiness script will be an external script that is run by Tor in a similar fashion to the bandwidth auth scripts. We chose that because we could write the script in a high-level language and because it could be modular and we could change the algorithm in the future if we wanted. Unfortunately, it seems that external scripts in dirauths are a PITA to maintain as can be seen by the lack of bwauth operators.
The problem isn't so much that this is an external script, the problem is that there are never dedicated maintainers for these things (tho it being an external script is additionally problematic). If they are in Tor proper, we have someone who cares about them when new stuff gets introduced, and it gets updated with the rest of Tor, etc.
Summary files are designed to be quick to parse (even with Python) and contain all the necessary information for guardiness. For example, parsing 2000 summary files in my laptop takes about 10 seconds.
Does this scale linearly? 9 months would be ~6500 files.
FWIW, the guardiness scripts are ready for review and can be found here: https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardines...
====How the guardiness script will be deployed====
The idea is that dirauths will add another script to their crontab that is called every hour (before or after the bwauth scripts).
Only 4/9 have such scripts, if at all - it is possible to run a bwauth on a different host, and scp the file over. I don't know if any of the dirauth ops actually do this currently.
The script first calls the summarizer script, which goes to the consensus/ directory and summarizes all consensuses it finds and puts them in the summary/ directory. The summarizer script then deletes all the consensuses that got summarized.
You must not delete files which you did not create. It's not cool to delete consensuses which tor decided to put somewhere if you're not tor. This is relevant to the plan to use mvdan's script.
Then the script calls the guardiness script, which goes to the summary/ directory, parses all summary files it finds, and writes a guardiness output file that gets parsed by the dirauth prior to voting.
That should be all. Easy, eh? :)
What are the failure modes? Are there version strings included, does Tor notice if the guardiness file is outdated, etc? What happens when different dirauths use different versions of the guardiness file generation script?
Cheers Sebastian
Sebastian Hahn sebastian@torproject.org writes:
On 16 Sep 2014, at 16:15, George Kadianakis desnacked@riseup.net wrote:
====How guardiness works==== The idea was that the guardiness script will be an external script that is run by Tor in a similar fashion to the bandwidth auth scripts. We chose that because we could write the script in a high-level language and because it could be modular and we could change the algorithm in the future if we wanted. Unfortunately, it seems that external scripts in dirauths are a PITA to maintain as can be seen by the lack of bwauth operators.
The problem isn't so much that this is an external script, the problem is that there are never dedicated maintainers for these things (tho it being an external script is additionally problematic). If they are in Tor proper, we have someone who cares about them when new stuff gets introduced, and it gets updated with the rest of Tor, etc.
I understand and I can maintain the guardiness script.
However, I'm also hoping for other people to review the script before deployment (#13125). It's a pretty small script so it shouldn't be too wrong.
Summary files are designed to be quick to parse (even with Python) and contain all the necessary information for guardiness. For example, parsing 2000 summary files in my laptop takes about 10 seconds.
Does this scale linearly? 9 months would be ~6500 files.
Yes, it should scale linearly. 6500 files should take about 30 to 40 seconds.
(Each consensus parsing is an independent event.)
FWIW, the guardiness scripts are ready for review and can be found here: https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardines...
====How the guardiness script will be deployed====
The idea is that dirauths will add another script to their crontab that is called every hour (before or after the bwauth scripts).
Only 4/9 have such scripts, if at all - it is possible to run a bwauth on a different host, and scp the file over. I don't know if any of the dirauth ops actually do this currently.
This should also be possible with the guardiness stuff, as long as you can arrange fresh consensuses to arrive on the second host.
The script first calls the summarizer script, which goes to the consensus/ directory, summarizes all consensuses it finds, and puts the summaries in the summary/ directory. The summarizer script then deletes all the consensuses that got summarized.
You must not delete files which you did not create. It's not cool to delete consensuses which tor decided to put somewhere if you're not tor. This is relevant to the plan to use mvdan's script.
Yes, you are absolutely right on that. FWIW, the --delete-summarized switch of the summarizer script is disabled by default.
If we go ahead with using Daniel's tor feature, we should _not_ delete those consensuses.
Then the script calls the guardiness script, which goes to the summary/ directory, parses all summary files it finds, and writes a guardiness output file that gets parsed by the dirauth prior to voting.
That should be all. Easy, eh? :)
What are the failure modes? Are there version strings included, does Tor notice if the guardiness file is outdated, etc? What happens when different dirauths use different versions of the guardiness file generation script?
There are various failure modes I think.
Failure modes include dirauths considering 5 consensuses when they should be considering 3000 of them. Or dirauths considering 12000 consensuses when they should be considering 3000.
As you pointed out, in the future if the guardiness algorithm changes, failure modes could also include dirauths using different guardiness versions.
Based on your comment, I think it might be a good idea to publish the guardiness script version somewhere in the dirauth votes, along with the number of consensuses considered and the maximum consensus age considered.
It might even be smart to make a consensus-health module that ensures that guardiness versions are the same on all dirauths, and that they all derive approximately the same guardiness for each relay (unlike with bwauths, two perfectly configured dirauths should derive the exact same guardiness value).
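A consensus-health check along those lines could look like the following, assuming the votes have already been fetched and parsed into per-dirauth {fingerprint: guardfraction} maps (the data plumbing is omitted):

def guardfraction_mismatches(votes_by_dirauth):
    # votes_by_dirauth: {dirauth_name: {fingerprint: guardfraction}}
    # Unlike bandwidth values, these should match *exactly*; a relay
    # missing from one vote (value None) also counts as a mismatch.
    fingerprints = set()
    for vote in votes_by_dirauth.values():
        fingerprints.update(vote)
    mismatches = []
    for fpr in fingerprints:
        values = dict((name, vote.get(fpr))
                      for name, vote in votes_by_dirauth.items())
        if len(set(values.values())) > 1:
            mismatches.append((fpr, values))
    return mismatches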
- Q: Why do you use slow stem instead of parsing consensuses with Python on your own?
This is another part where I might have taken the wrong design decision, but I decided not to get into the consensus parsing business and just rely on stem.
This is also because I was hoping to use stem to verify consensus signatures. However, now that we might use Daniel's patch to populate our consensus database, maybe we don't need to treat consensuses as untrusted anymore.
If you think that I should try to parse the consensuses on my own, please tell me and I will give it a try. Maybe it will be fast. Definitely not as fast as summary files, but maybe we can parse 3 months' worth of consensuses in 15 to 40 seconds.
I'm not sure why you think it was the wrong choice. If Stem isn't providing the performance you want, then it seems like speeding it up is the right option rather than writing your own parser. That is, of course, unless you're looking for something highly specialized, in which case have fun.
Nick improved parsing performance by around 30% in response to this...
https://trac.torproject.org/projects/tor/ticket/12859
Between that and turning off validation I'd be a little curious where the time is going if it's still too slow for you.
Damian Johnson atagar@torproject.org writes:
- Q: Why do you use slow stem instead of parsing consensuses with Python on your own?
This is another part where I might have taken the wrong design decision, but I decided not to get into the consensus parsing business and just rely on stem.
This is also because I was hoping to use stem to verify consensus signatures. However, now that we might use Daniel's patch to populate our consensus database, maybe we don't need to treat consensuses as untrusted anymore.
If you think that I should try to parse the consensuses on my own, please tell me and I will give it a try. Maybe it will be fast. Definitely not as fast as summary files, but maybe we can parse 3 months' worth of consensuses in 15 to 40 seconds.
I'm not sure why you think it was the wrong choice. If Stem isn't providing the performance you want, then it seems like speeding it up is the right option rather than writing your own parser. That is, of course, unless you're looking for something highly specialized, in which case have fun.
Nick improved parsing performance by around 30% in response to this...
https://trac.torproject.org/projects/tor/ticket/12859
Between that and turning off validation I'd be a little curious where the time is going if it's still too slow for you.
Indeed, our use case is quite specialized. The only thing the guardiness script cares about is whether relays have the guard flag. No other consensus parsing actually needs to happen.
However, you have a point that stem performance could be improved and I will look a bit more into stem parsing and see what I can do.
That said, currently stem parses (with validation enabled) 24 consensuses in 25 seconds. That's about one consensus per second. If we are aiming for 7000 consensuses in less than a minute, we need to parse ~120 consensuses a second. That will probably require quite some optimization in stem, I think.
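For reference, since the only thing we need is the Guard flag per relay, a special-purpose pass over the consensus text can skip everything else stem does. A minimal sketch (no signature validation, so it only makes sense on trusted local consensuses):

def guards_in_consensus(path):
    # Pair each "r " line (which carries the identity) with the
    # "s " flags line that follows it.
    guards = set()
    identity = None
    with open(path) as f:
        for line in f:
            if line.startswith("r "):
                identity = line.split()[2]  # base64 identity digest
            elif line.startswith("s ") and identity is not None:
                if "Guard" in line.split():
                    guards.add(identity)
                identity = None
    return guards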
George Kadianakis desnacked@riseup.net writes:
Damian Johnson atagar@torproject.org writes:
- Q: Why do you use slow stem instead of parsing consensuses with Python on your own?
This is another part where I might have taken the wrong design decision, but I decided not to get into the consensus parsing business and just rely on stem.
This is also because I was hoping to use stem to verify consensus signatures. However, now that we might use Daniel's patch to populate our consensus database, maybe we don't need to treat consensuses as untrusted anymore.
If you think that I should try to parse the consensuses on my own, please tell me and I will give it a try. Maybe it will be fast. Definitely not as fast as summary files, but maybe we can parse 3 months' worth of consensuses in 15 to 40 seconds.
I'm not sure why you think it was the wrong choice. If Stem isn't providing the performance you want, then it seems like speeding it up is the right option rather than writing your own parser. That is, of course, unless you're looking for something highly specialized, in which case have fun.
Nick improved parsing performance by around 30% in response to this...
https://trac.torproject.org/projects/tor/ticket/12859
Between that and turning off validation I'd be a little curious where the time is going if it's still too slow for you.
Indeed, our use case is quite specialized. The only thing the guardiness script cares about is whether relays have the guard flag. No other consensus parsing actually needs to happen.
However, you have a point that stem performance could be improved and I will look a bit more into stem parsing and see what I can do.
That said, currently stem parses (with validation enabled) 24 consensuses in 25 seconds. That's about one consensus per second. If we are aiming for 7000 consensuses in less than a minute, we need to parse ~120 consensuses a second. That will probably require quite some optimization in stem, I think.
FWIW, turning off validation helps a bit, but not too much. For example, on my laptop parsing 24 consensuses with validation takes 25 seconds, and with validation disabled it takes 22 seconds.
This means that to reach a rate of ~120 consensuses a second with parse_file(), we need to make it 100 times faster or so. That sounds much harder than a 30% performance increase :/
FWIW, turning off validation helps a bit, but not too much. For example, on my laptop parsing 24 consensuses with validation takes 25 seconds, and with validation disabled it takes 22 seconds.
This means that to reach a rate of ~120 consensuses a second with parse_file(), we need to make it 100 times faster or so. That sounds much harder than a 30% performance increase :/
Yup. I'm increasingly coming around to your earlier sentiment that your use case (bulk processing hundreds of consensuses per second) is a bit specialized. I've occasionally thought that if I were to write Stem's parsers again I'd make them lazy-load attributes. That would give you what you're after here.
But as they're written nowadays they're eager, and parsing all the consensus attributes takes time. So, assuming you don't care about validation or about having yet another thing to maintain, a small shell script might serve you just as well.
Cheers! -Damian
George Kadianakis desnacked@riseup.net writes:
==Guardiness: Yet another external dirauth script==
====Introduction====
One well-known problem with Tor relays is that guards suffer a big loss of traffic as soon as they get the Guard flag. This happens because clients only pick new guards every 2-3 months, so young guards will not get picked by old clients and mainly attract new clients. This is documented in 'phase three' of Roger's blog post: https://blog.torproject.org/blog/lifecycle-of-a-new-relay
The problem gets even worse if we extend the guard lifetime to 8-9 months.
The plan to solve this problem is to make client load balancing a bit smarter by prioritizing guards that suffer this traffic loss as middle relays.
The reason I'm sending this email is that this feature is by far the trickiest part of prop236 (guard node security), and I wanted to inform all dirauths of our plan and ask for feedback on the deployment procedure.
<snip>
====How the guardiness script works====
The guardiness script is supposed to parse 2-3 months' worth of consensuses (but should also be able to do the same for 9 months' worth), calculate the guard fraction of each guard, save it to a file, and have the dirauth read it to update its routerstatuses.
One problem I encountered early on is that stem takes about 30 minutes to parse 3 months of consensuses (~2000 consensuses). Since this script should ideally be run every hour before each authority votes, such a long parsing time is unacceptable.
I mentioned this problem at https://trac.torproject.org/projects/tor/ticket/9321#comment:19 and stated a few possible solutions.
I received some feedback from Nick, and the solution I decided to take in the end is to have another script that is called first and summarizes consensuses to summary files. Summary files are then saved to disk, and parsed by the guardiness script to produce an output file that is read by dirauths.
Summary files are designed to be quick to parse (even with Python) and contain all the necessary information for guardiness. For example, parsing 2000 summary files in my laptop takes about 10 seconds.
FWIW, the guardiness scripts are ready for review and can be found here: https://gitweb.torproject.org/user/asn/hax.git/shortlog/refs/heads/guardines...
FWIW, a weasel suggested to me a potentially better solution than the iffy summary files.
He suggested parsing consensuses and putting them in an sqlite3 database. Each time we have a new consensus, parse it and import it in the database. Then query the database when creating the guardiness output file.
The good thing with this approach is that you only need to parse consensuses once, instead of every hour. Also, we don't need to do file management for summary files etc. I like this approach and I will be looking into it in the following days.
I think that the weasel suggested that the database should have an entry for each guard, and for each guard it should note down when that guard was observed in a consensus (the precise date is required so that we can discard expired observations).
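A rough sqlite3 sketch of that design (the schema and names are mine, not necessarily what the rewritten script uses):

import sqlite3

def open_db(path="guardiness.db"):
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS guard_observations (
                    fingerprint TEXT NOT NULL,
                    valid_after TEXT NOT NULL,
                    UNIQUE(fingerprint, valid_after))""")
    return db

def import_consensus(db, valid_after, guard_fingerprints):
    # Importing the same consensus twice is a no-op thanks to UNIQUE.
    db.executemany("INSERT OR IGNORE INTO guard_observations VALUES (?, ?)",
                   [(fpr, valid_after) for fpr in guard_fingerprints])
    db.commit()

def guard_fraction(db, fingerprint, cutoff, total_consensuses):
    # cutoff: oldest acceptable valid-after timestamp; expired
    # observations are excluded by the WHERE clause (and can be
    # DELETEd from the table periodically).
    (count,) = db.execute(
        "SELECT COUNT(*) FROM guard_observations "
        "WHERE fingerprint = ? AND valid_after >= ?",
        (fingerprint, cutoff)).fetchone()
    return 100 * count // total_consensuses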
George Kadianakis desnacked@riseup.net writes:
George Kadianakis desnacked@riseup.net writes:
==Guardiness: Yet another external dirauth script==
<snipped>
FWIW, a weasel suggested to me a potentially better solution than the iffy summary files.
He suggested parsing consensuses and putting them in an sqlite3 database. Each time we have a new consensus, parse it and import it in the database. Then query the database when creating the guardiness output file.
The good thing with this approach is that you only need to parse consensuses once, instead of every hour. Also, we don't need to do file management for summary files etc. I like this approach and I will be looking into it in the following days.
I think that the weasel suggested that the database should have an entry for each guard, and for each guard it should note down when that guard was observed in a consensus (the precise date is required so that we can discard expired observations).
So, the script has been rewritten to support the above architecture. It has already received some review in: https://trac.torproject.org/projects/tor/ticket/13125
I now wanted to show you a sample cron script that will be called every hour: https://gitweb.torproject.org/user/asn/hax.git/blob/refs/heads/guardfraction...
Do you think that's a sensible script to cron? So far, I've been running it on my system for a while and it seems to work.
Some notes on the script:
a) I'm wgetting from moria instead of copying from the DataDirectory, because I don't know what kind of user permissions and setup most dirauth operators have. This will probably need to change in the future.
b) I'm keeping imported consensuses around. Maybe people don't want that but I didn't bother making a configurable shell script. If people want me to do that, I can.
c) I should probably do stuff like WGET=/usr/bin/wget instead of calling wget/python/etc. directly. I will fix this before deployment. Also, the dependency on torsocks.