Hello everyone,
Apologies for the delay; I'm a couple of days late.
This is the fourth status report of my Google Summer of Code project, which is to implement consensus diffs for Tor. My mentors, Sebastian and Nick, and I usually hold meetings on IRC on Wednesdays at 16:00 UTC.
During the first week I finished implementing the consensus diff headers as described in the updated proposal #140 [1], including full SHA256 hashes of each of the two consensuses involved: the base one and the resulting one. I also finished the tests for that part.
Moreover, I thought it would make sense to ensure that the diff generation process never takes too long. In theory it should always run in linear time given any two realistic consensus files, since it walks the router entries and takes advantage of them being sorted alphabetically. But if no matching router entries are found between the two consensuses, the algorithm runs in quadratic time.
Running in quadratic time over two sets of 25K lines takes about 20 seconds on my laptop, which is pretty bad. So I made the algorithm fail with an error if 10K lines are found on both sides without any matching router entry id. This limit makes sense for any pair of consensuses: even if you take consensuses months or years apart, they would never share so few router ids. And if they did, generating a diff between them would make no sense.
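To illustrate the kind of guard described above, here is a minimal Python sketch. It is not the code from the branch: the names `match_router_ids`, `NoCommonRoutersError` and `limit` are hypothetical, and for simplicity it counts router entries instead of consensus lines. It only shows the idea of walking two sorted id lists and giving up once both sides have gone too long without a match.

```python
class NoCommonRoutersError(Exception):
    """Raised when both inputs go too long without a shared router id."""


def match_router_ids(base_ids, target_ids, limit=10_000):
    """Return the ids present in both alphabetically sorted lists.

    Hypothetical sketch: `limit` is the number of unmatched entries
    tolerated on *both* sides since the last match before giving up.
    """
    shared = []
    i = j = 0
    skipped_base = skipped_target = 0

    while i < len(base_ids) and j < len(target_ids):
        if base_ids[i] == target_ids[j]:
            # Match found: record it and reset both counters.
            shared.append(base_ids[i])
            i += 1
            j += 1
            skipped_base = skipped_target = 0
        elif base_ids[i] < target_ids[j]:
            # Entry only in the base consensus.
            i += 1
            skipped_base += 1
        else:
            # Entry only in the target consensus.
            j += 1
            skipped_target += 1

        if skipped_base >= limit and skipped_target >= limit:
            raise NoCommonRoutersError(
                "no matching router id within %d entries on both sides" % limit
            )

    return shared
```

With realistic inputs the counters reset at almost every step, so the limit is never reached; it only triggers for inputs that share essentially no router ids.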
Unfortunately, I wasn't able to do much else during that week, since I spent all of it at RMLL running a Free Your Android workshop. And just when I got home on Saturday, my laptop broke down, which kept me busy for a few days as well.
Nevertheless, I am still on schedule. What I must do now is start merging my code into a Tor feature branch, so that I can write the bits of code that will glue everything together. To do that, I've begun rewriting my git history so that it is simpler to read and understand.
If you have a look at my new repo on GitHub [2], you'll notice that the number of commits went down from 122 to 95, and quite a few were rewritten as well. I'm open to suggestions about how to improve it further.
By the time the next report is due, on August 1st, I should have completed the integration of my code into Tor. After that comes writing some more tests and testing Tor with the new feature in place.
[1] https://gitweb.torproject.org/torspec.git/blob_plain/refs/heads/master:/prop...
[2] https://github.com/mvdan/tor/branches