On Fri, Mar 07, 2014 at 02:39:18PM +0000, infinity0@torproject.org wrote:
commit 05b9c101ba9afe4653d1eff6f5414f90f22ef042
Author: Ximin Luo <infinity0@torproject.org>
Date:   Fri Mar 7 13:39:31 2014 +0000

    remove failed connections from proxy_pairs as well - this is a pretty
    major fix, as the previous behaviour effectively disabled a proxy
    after 5 failed connections (= max_num_proxy_pairs / 2)
Can you say some more about the circumstances of this fix? What behavior changed for you after making it? I can't reproduce what's described in the commit message.
With 9dea0c6c90dedc868dfaa84add2bfa19a2039281, I set up a phony facilitator with

    ncat -v -lk 7000 --sh-exec "./facilitator.py"

where facilitator.py has the following contents. (Ncat runs a fresh copy of the script for each incoming connection, so every poll returns a new random client port.)
    #!/usr/bin/env python
    import random
    import sys

    port = random.randint(10000,11000)
    print "HTTP/1.0 200 OK\r"
    print "Content-Type: application/x-www-form-urlencoded\r"
    print "Cache-Control: no-cache\r"
    print "Access-Control-Allow-Origin: *\r"
    print "\r"
    sys.stdout.write("check-back-in=10&client=127.0.0.1:" + str(port) + "&relay=173.255.221.44:9901")
I then browse to file:///home/david/flashproxy/proxy/embed.html?debug&facilitator=http://127.0.0.1:7000&initial_facilitator_poll_interval=1&unsafe_log...

All the "client" connections are going to closed ports, but the "Client: error" lines still appear and clients don't accumulate in the request URL. The same happens if I use an open port with an Ncat listener that just closes the connection, and likewise with a non-listening Internet address like 1.1.1.1.
2014-03-10T16:28:48.173Z | Starting; will contact facilitator in 1 seconds.
2014-03-10T16:28:49.179Z | Facilitator: connecting to http://127.0.0.1:7000?r=1&transport=websocket.
2014-03-10T16:28:49.203Z | Next check in 10 seconds.
2014-03-10T16:28:49.204Z | Facilitator: got client:{ host: "127.0.0.1", port: 10782 } relay:{ host: "173.255.221.44", port: 9901 }.
2014-03-10T16:28:49.205Z | 127.0.0.1:10782|173.255.221.44:9901 : Client: connecting.
2014-03-10T16:28:49.206Z | 127.0.0.1:10782|173.255.221.44:9901 : Client: connecting.
2014-03-10T16:28:49.207Z | 127.0.0.1:10782|173.255.221.44:9901 : Client: error.
2014-03-10T16:28:49.208Z | 127.0.0.1:10782|173.255.221.44:9901 : Client: closed.
2014-03-10T16:28:49.209Z | Complete.
2014-03-10T16:28:49.209Z | 127.0.0.1:10782|173.255.221.44:9901 : Client: error.
2014-03-10T16:28:49.209Z | 127.0.0.1:10782|173.255.221.44:9901 : Client: closed.
2014-03-10T16:28:49.210Z | Complete.
2014-03-10T16:28:59.205Z | Facilitator: connecting to http://127.0.0.1:7000?r=1&transport=websocket.
2014-03-10T16:28:59.226Z | Next check in 10 seconds.
2014-03-10T16:28:59.227Z | Facilitator: got client:{ host: "127.0.0.1", port: 10221 } relay:{ host: "173.255.221.44", port: 9901 }.
2014-03-10T16:28:59.227Z | 127.0.0.1:10221|173.255.221.44:9901 : Client: connecting.
2014-03-10T16:28:59.227Z | 127.0.0.1:10221|173.255.221.44:9901 : Client: connecting.
2014-03-10T16:28:59.229Z | 127.0.0.1:10221|173.255.221.44:9901 : Client: error.
2014-03-10T16:28:59.229Z | 127.0.0.1:10221|173.255.221.44:9901 : Client: closed.
2014-03-10T16:28:59.230Z | Complete.
2014-03-10T16:28:59.230Z | 127.0.0.1:10221|173.255.221.44:9901 : Client: error.
2014-03-10T16:28:59.230Z | 127.0.0.1:10221|173.255.221.44:9901 : Client: closed.
2014-03-10T16:28:59.231Z | Complete.
2014-03-10T16:29:09.227Z | Facilitator: connecting to http://127.0.0.1:7000?r=1&transport=websocket.
2014-03-10T16:29:09.249Z | Next check in 10 seconds.
2014-03-10T16:29:09.250Z | Facilitator: got client:{ host: "127.0.0.1", port: 10069 } relay:{ host: "173.255.221.44", port: 9901 }.
Now what *does* seem to be happening, and what had me confused for a while, is that Firefox seems to be artificially delaying repeated WebSocket connections to the same port with some kind of exponential delay. I originally had a hardcoded port in facilitator.py, and the delay made it appear that something like what the commit message describes was happening. Since WebSocket connections eventually start waiting a long time to fail, it looks like they are not being closed. https://people.torproject.org/~dcf/graphs/firefox-exponential-delay.png is a screenshot showing what looks like exponential growth in the Network panel. Chromium doesn't appear to have the same issue; the requests are linearly spaced in the network pane.
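The delay is easy to see from the browser console with a sketch along these lines (standard WebSocket API only; the port and the attempt cap are arbitrary):

    // Time repeated WebSocket failures against the same closed port.
    // In Firefox the reported times grow with each attempt; in
    // Chromium they stay roughly constant. Nothing listens on 10782.
    var attempts = 0;
    function probe() {
        if (attempts >= 10)
            return;
        attempts++;
        var start = Date.now();
        var ws = new WebSocket("ws://127.0.0.1:10782/");
        ws.onerror = function () {
            console.log("attempt " + attempts + " failed after " +
                        (Date.now() - start) + " ms");
            probe();
        };
    }
    probe();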
Are you able to confirm any of the above? When you discovered the bug, were you testing with Firefox or Chromium? How did you find the bug, and what did you do (differently from the above) to reproduce it? In any case, what's going on must be different than what's described in the commit message, because even before the fix the proxy continues running for me with max_clients=2.
David
On 10/03/14 17:02, David Fifield wrote:
[...]
I get a different behaviour from yours. I was using node-flashproxy with flashproxy.js from tag 1.6. Unlike in your logs above, I would get something like this, paraphrased:
Client: connecting
Client: connecting
# after 3 minutes, the TCP timeout
Client: error
# no "Client: closed" and no "Complete"
Then the failed proxy_pairs stay in the list, and I can see subsequent "connecting to" lines grow longer and longer, until they hit a length of 10 (= max_num_proxy_pairs).
I think it is caused by the websocket implementation setting ws.readyState to WebSocket.CLOSED in the case of a socket error. In this case, the in-code comment I added to 05b9c101 would apply, namely:
+ // we can't rely on onclose_callback to cleanup, since one common error
+ // case is when the client fails to connect and the relay never starts.
+ // in that case close() is a NOP and onclose_callback is never called.
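In other words, the error handler has to do the cleanup itself. A minimal sketch of the pattern (proxy_pairs is the real list from flashproxy.js, but the handler wiring here is simplified, not the verbatim patch):

    // proxy_pairs is the global list of active pairs in flashproxy.js;
    // declared here only to make the sketch self-contained.
    var proxy_pairs = [];

    // On a socket error, drop the pair from proxy_pairs directly
    // instead of waiting for onclose_callback, which may never fire
    // when readyState is already CLOSED (close() becomes a no-op).
    function make_onerror(proxy_pair) {
        return function () {
            var i = proxy_pairs.indexOf(proxy_pair);
            if (i !== -1)
                proxy_pairs.splice(i, 1);
        };
    }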
Perhaps Firefox/Chrome don't set ws.readyState to WebSocket.CLOSED. I am not sure if this is a bug in node-ws, or if the WebSocket spec allows it, etc. When I get some more time, I will:
- read any WebSocket specs
- do some more testing with Firefox/Chrome, perhaps logging ws.readyState at various points, along the lines of the sketch below
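For the readyState logging, a one-off snippet like this should be enough (standard browser WebSocket API; the same checks would need porting to whatever WebSocket library node-flashproxy uses, and the address is just an unreachable endpoint):

    // Log ws.readyState at each stage of a failing connection, to
    // compare what different implementations report. The constants
    // are 0 CONNECTING, 1 OPEN, 2 CLOSING, 3 CLOSED.
    var ws = new WebSocket("ws://1.1.1.1:9901/");
    console.log("after construction: " + ws.readyState);
    ws.onerror = function () {
        console.log("in onerror: " + ws.readyState);
    };
    ws.onclose = function () {
        console.log("in onclose: " + ws.readyState);
    };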
On Mon, Mar 10, 2014 at 06:12:19PM +0000, Ximin Luo wrote:
[...]

I think it is caused by the websocket implementation setting ws.readyState to WebSocket.CLOSED in the case of a socket error.
Oh, good. That explains it. And node-flashproxy is exactly where this issue would matter most.
Your patch seems correct, so you don't have to look into the WebSocket implementations unless you want to.
David Fifield