(Resent because I left off tor-dev.)
On 29 Apr 2016, at 16:56, Xiaofan Li xli2@andrew.cmu.edu wrote:
Tim: Sorry for not being specific enough on my questions. I'll try to give more detailed questions later instead of higher-level problems.
Thanks! It will help me help you.
Regarding the frequency of my emails, I apologize for the long intervals. I'm not full-time on this project, and I often have exams, so I can only work on the QUIC TOR project for a couple of days every week. Fortunately, I'm now nearly done with all my finals for this semester and I can put more time into this project from now on.
That's ok, I don't mind long gaps, but issues are easier to diagnose and fix if the bug reports are small and specific.
Right now, I have two specific questions:
- We just switched from chutney to testing on EmuLab (a testing framework where each node is a standalone machine). After the switch, a particular bug on chutney disappeared: on chutney, some nodes used to crash mysteriously with no log output (the logs simply stop, with no stack trace or anything). This bug only occurs when there's an existing cache (the first run after chutney configure is fine). After porting to EmuLab, using an almost identical torrc file, the bug disappeared and everything runs fine for now. Right now we are ignoring this bug. Have you seen similar issues on chutney?
I generally use a fresh directory with chutney every time, so I haven't run into this bug. I've logged it as https://trac.torproject.org/projects/tor/ticket/18932 and I'd encourage you to use a fresh config with chutney every time until it's fixed.
- The circuit building process is taking too long and many circuits expire. We have 4 relays, 2 of which are also authorities. From the logs, I'm seeing a lot of the following lines:
• circuit_expire_building(): Abandoning circ XX XXXX:XX:12345 (state 0,0: doing handshakes, purpose 5, len 3)
• router_choose_random_node(): We found 3 running nodes.
This should be 4 based on your description of your setup. One of your nodes isn't launching, or doesn't have the right combination of flags to be part of the path. If it only affects 1 node, and you have 2 pairs of 2 identically configured nodes, this is likely a problem with your configuration.
Are you using the 'TestingDirAuthVoteGuard *' option in your authorities' configs?
router_choose_random_node(): We removed 1 excludednodes, leaving 2 nodes.
This is normal. A relay refuses to build a circuit through itself.
router_choose_random_node(): We removed 2 excludedsmartlist, leaving 0 nodes.
This is normal. A relay refuses to build a circuit through the relays already in the path.
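To make the two exclusion steps in those log lines concrete, here is a rough Python sketch. It is not tor's actual code: the function name and relay nicknames are hypothetical, and it only mirrors the counts in your logs (3 running nodes, minus the relay itself, minus the relays already in the path).

```python
def remaining_candidates(running, self_node, path_so_far):
    """Apply the two exclusion steps and return the list after each one.

    Illustrative only: tor's router_choose_random_node() does more than this.
    """
    # Step 1: a relay refuses to build a circuit through itself.
    after_self = [n for n in running if n != self_node]
    # Step 2: a relay refuses to reuse relays already in the path.
    after_path = [n for n in after_self if n not in path_so_far]
    return after_self, after_path


# Hypothetical nicknames matching "We found 3 running nodes".
running = ["relay1", "relay2", "relay3"]
after_self, after_path = remaining_candidates(
    running, self_node="relay1", path_so_far=["relay2", "relay3"]
)
print(len(after_self), len(after_path))
```

With only 3 running nodes, nothing is left to choose from once both steps have run, which matches the "leaving 0 nodes" line.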
The first line happens when we have connected to the first node and are waiting for a response from the second or sometimes the third relay. The second log line happens when we are trying to choose the path to use for a circuit. What could I do to increase the number of available nodes? Should I increase the frequency of reachability tests?
You have too few nodes in your network. You need at least 8 for reliable path building from each node to each other node:
* 1 is excluded because it's the current node,
* 4 are needed to support cannibalised paths, which can be 4 hops long,
* 3 are needed to support testing circuits, which don't use guards.
In 0.2.6.2-alpha (commit 22a1e9cac), I added a fix for this last issue when TestingTorNetwork is 1. Given the line number you're using below, it looks like you're using a version of tor without this fix. If you're using a private network, I'm guessing you have the default 3 entry guards.
So please use at least 8 working nodes in your network.
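The arithmetic behind that figure, as a small Python sketch. The constant and function names are made up for illustration and are not tor options or identifiers.

```python
# Node budget from the breakdown above (names are illustrative only).
SELF_EXCLUDED = 1            # the current node never picks itself
CANNIBALISED_PATH_HOPS = 4   # cannibalised paths can be 4 hops long
TESTING_CIRCUIT_NODES = 3    # testing circuits don't use guards

MINIMUM_NODES = SELF_EXCLUDED + CANNIBALISED_PATH_HOPS + TESTING_CIRCUIT_NODES


def has_enough_nodes(working_nodes):
    """Return True if the network meets the suggested minimum size."""
    return working_nodes >= MINIMUM_NODES


print(MINIMUM_NODES)        # 8
print(has_enough_nodes(4))  # False: the 4-relay network described above
print(has_enough_nodes(8))  # True
```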
After looking at the code, there's a circuitbuild.c line 2172 describing why some nodes are excluded, which I don't quite understand. Specifically, the comment says: "XXXX025 use the using_as_guard flag to accomplish this." where can I find more information on this XXXX025 issue (committed here)? Why are these routers being excluded?
This looks like a note to change the implementation in a future version (it didn't happen in 0.2.5). Although it could be fixed and made more accurate, it's unlikely to be the source of the issues you're having.
I also can't see how the issue you describe relates to the commit you linked to: 62fb209d837f3f5510075ef8bdb6e231ebdfa9bc. If it still concerns you, can you check you have the right commit, or explain further?
Please let me know if you want more specific information on those issues.
It will help if you say which version of tor your code is based on, otherwise I have to guess from the line numbers.
Tim
Tim Wilson-Brown (teor)
teor2345 at gmail dot com PGP 968F094B ricochet:ekmygaiu4rzgsk6n