[tor-relays] Problem encountered with Bandwidth Authority algorithm (was : "bandwidth authority algorithm is cracked")

julien.robin28 at free.fr julien.robin28 at free.fr
Thu Jan 30 13:02:00 UTC 2014


Hi everybody,


15 days, 16 hours and 15 minutes have passed (without any restart) since the 2 relays involved in this problem were restarted (both started at the same time).

And there is something very clear and interesting to see.
Both restarted from nearly zero (~100 kB/s) in the first days, and the problem is still completely present on 1 of the 2 identities:

-> The first one (also the older one) is stuck at less than 1 MB/s (5,510 in consensus weight) - ArachnideFR94
-> The second one keeps growing and growing, and is now around 9 MB/s (more than 75,000 in consensus weight) - ArachnideFR94v2

This can be seen very clearly in the graphs on Tor Atlas.
(But there is no more bandwidth earthquake like the one I had in December.)

On this machine, the first daemon is launched by /etc/init.d, while the second one is launched manually by another non-admin user.

No other differences: same nice value, and exactly the same torrc file apart from Nickname, DirPort, ORPort and MyFamily.
The max bandwidth is pretty large for both of them:

RelayBandwidthRate 25600 KB
RelayBandwidthBurst 122070 KB
ShutdownWaitLength 90
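
As a quick sanity check on the two bandwidth values above, here is a minimal Python sketch of the unit conversion. It assumes that tor reads "KB" as 1024 bytes; the helper name kb_to_bytes is only illustrative and is not part of tor.

```python
# Assumption: tor interprets "KB" in torrc as 1024 bytes.
BYTES_PER_KB = 1024


def kb_to_bytes(kb: int) -> int:
    """Convert a torrc value given in KB to bytes (hypothetical helper)."""
    return kb * BYTES_PER_KB


rate_bytes = kb_to_bytes(25600)     # RelayBandwidthRate
burst_bytes = kb_to_bytes(122070)   # RelayBandwidthBurst

# Rate expressed in MiB/s, burst expressed in Gbit/s.
rate_mib_per_s = rate_bytes / 2**20
burst_gbit_per_s = burst_bytes * 8 / 1e9

print(f"rate:  {rate_bytes} bytes/s = {rate_mib_per_s} MiB/s")
print(f"burst: {burst_bytes} bytes/s = {burst_gbit_per_s:.3f} Gbit/s")
```

Under that assumption the rate is 25 MiB/s and the burst is very close to 1 Gbit/s, so both values are far above what either relay is currently pushing.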

-- In the log files:

The only difference was this (I just corrected it), and it was on the relay that is working perfectly fine (ArachnideFR94v2):

Jan 30 06:07:03.000 [warn] Failing because we have 4063 connections already. Please raise your ulimit -n. [6978 similar message(s) suppressed in last 21600 seconds]
Jan 30 12:07:17.000 [warn] Failing because we have 4063 connections already. Please raise your ulimit -n. [20004 similar message(s) suppressed in last 21600 seconds]

-> Oops :s I know this problem very well and I just forgot to prevent it this time. Sorry for that. I corrected it by editing /etc/security/limits.conf and adding some lines for the user that runs my second tor daemon. I then restarted that daemon with SIGINT, waited, and launched it again (logged in with a much better ulimit!) today at 12:35 UTC.
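
For anyone who wants to check the same limit from inside a process, here is a minimal Python sketch (Unix only). It reads the open-file limit with the standard resource module and flags a connection count that is close to it. The MARGIN value and the function names are hypothetical choices for illustration, not anything taken from tor.

```python
import resource

# Hypothetical safety margin, in file descriptors; not a value used by tor.
MARGIN = 64


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_near_limit(connections, soft, margin=MARGIN):
    """True when the connection count is within `margin` of the soft limit."""
    if soft == resource.RLIM_INFINITY:
        return False  # no limit configured, so nothing to hit
    return connections >= soft - margin


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # 4063 is the connection count reported in the warnings above.
    print("4063 connections near the limit?", is_near_limit(4063, soft))
```

Run as the user that launches the daemon, this reports the limit that user's processes actually get, which is the value changed through /etc/security/limits.conf.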

Absolutely nothing else is abnormal in the logs of the 2 daemons. And the ulimit problem was on the one that runs perfectly well!
--

Any idea why the first one is completely stuck at such a low value?

The next step would be to run both tor daemons from /etc/init.d on this machine, using the Multiple Tor Processes example at www.torservers.net, and to watch whether ArachnideFR94v2 still works as well as it does today when launched by /etc/init.d.

If no solution appears, it may be interesting to re-install the entire system and watch what happens.
If there is still no solution, it will be complicated to understand!

Best regards, and thank you in advance for your ideas
Julien ROBIN

