[ooni-dev] Tunisian censorship lab

Nathan Wilcox nathan at leastauthority.com
Sun Jun 16 17:59:56 UTC 2013


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I'm not very familiar with the source tree yet, but I notice one bit
of low hanging fruit, one set of distinguishing hypothesis, and one
concern.  Responses inline:


On Sat, Jun 15, 2013 at 8:03 AM, Jacob Appelbaum <jacob at appelbaum.net> wrote:
> Heya,
>
> Greetings from #404Lab in Tunis, Tunisia!
>
> I'm testing the ooniprobe in the censorship lab. I'm making a running
> list of issues and I'll outline them below.
>
> When running the following test:
>
>  ./bin/ooniprobe -l tunisia-404lab-netcache-000.log
> nettests/blocking/http_requests.py;
>
> I find that the test hangs for ~10 minutes:
>
> Log opened.
> [D] No test deck detected
> Starting Tor...
> [D] Setting control port as 9251
> [D] Setting SOCKS port as 9250
> [D] 10%: Finishing handshake with directory server
> [D] 15%: Establishing an encrypted directory connection
> [D] 20%: Asking for networkstatus consensus
> [D] 25%: Loading networkstatus consensus
> [D] 45%: Asking for relay descriptors
> [D] 80%: Connecting to the Tor network
> [D] 90%: Establishing a Tor circuit
> [D] 100%: Done
> [D] Building a TorState
> Successfully bootstrapped Tor
> [D] We now have the following circuits:
> [D]  * <Circuit 1 BUILT [154.35.32.5] for GENERAL>
> [D]  * <Circuit 2 BUILT [31.172.30.3] for GENERAL>
> [D]  * <Circuit 3 BUILT [198.211.114.130] for GENERAL>
> [D]  * <Circuit 5 BUILT [194.132.32.43] for GENERAL>
> [D]  * <Circuit 6 BUILT [31.172.30.3 46.166.174.209 93.115.241.2] for
> GENERAL>
> [D]  * <Circuit 7 EXTENDED [198.211.114.130 74.120.13.132] for GENERAL>
> [D]  * <Circuit 8 EXTENDED [] for GENERAL>
> [D]  * <Circuit 9 EXTENDED [] for GENERAL>
> [D]  * <Circuit 10 EXTENDED [] for GENERAL>
> [D]  * <Circuit 11 EXTENDED [] for GENERAL>
> [D]  * <Circuit 12 EXTENDED [] for GENERAL>
> [D]  * <Circuit 13 EXTENDED [] for GENERAL>
> [D]  * <Circuit 14 EXTENDED [] for GENERAL>
> [D]  * <Circuit 15 EXTENDED [] for GENERAL>
> [D]  * <Circuit 16 EXTENDED [] for GENERAL>
> [D] Obtained our IP address from a Tor Relay None
> Found your IP via Tor 41.231.22.232
> We will include some geo data in the report
> Your AS number is: AS2609
> Reporting using collector: httpo://nkvphnp3p6agi5qq.onion
> [D] adding callback for startNetTest
> [D] Creating ./report-http_requests_test-2013-06-15T100736Z.yamloo
> [D] Writing report with YAML reporter
> [D] Created report with <ooni.reporter.YAMLReporter object at 0x334ca90>
> Reporting httpo://nkvphnp3p6agi5qq.onion/report
> [D] Sending {"software_name": "ooniprobe", "software_version": "0.0.11",
> "content": "---\noptions: []\nprobe_asn: AS2609\nprobe_cc: TN\nprobe_ip:
> 41.231.22.232\nsoftware_name: ooniprobe\nsoftware_version:
> 0.0.11\nstart_time: 1371290856.723735\ntest_name:
> http_requests_test\ntest_version: 0.2.3\n...\n", "test_version":
> "0.2.3", "test_name": "http_requests_test", "probe_asn": "AS2609"}
> Creating report with OONIB Reporter. Please be patient.
> This may take up to 1-2 minutes...
> [D] Created report with id
> 2013-06-15T100948Z_AS2609_UIsgF34hqK8qn2GaxQtaynMnlIQedKCNaH2fNVoY5voh1I4Kdw
> [D] Created report with <ooni.reporter.OONIBReporter object at 0x334cc90>
> [D] Starting this task <generator object generateMeasurements at 0x3c78870>
> [D] Running <class 'nettests.blocking.http_requests.HTTPRequestsTest'>
> test_get
> [D] Finished test setup
>
> ...
>
> I'm not clear what will happen next - it just hangs. I killed it after
> around ten minutes.
>


Some categorical distinctions of the cause:

Here are my intuitions about how to distinguish different causes of
this ten minute "hang":  First, the actual network behavior may have
been such that "correct" code would have taken longer than ten minutes
to complete.  Second, there may have been an event handled in a
silent, non-progressing manner (more below).  Third, there may have
been a non-terminating process (more below).

If the actual networking events added up to more than ten minutes,
then perhaps more fine grained logging of the events would help the
operator distinguish a hang versus a slow network.  (Also, slowness
could be network interference!)

A common case of "silent, non-progressing" processing in twisted
applications is unhandled Failures in a deferred chain.   Deferreds
are designed such that they cannot predict if new callbacks will be
added someday, so if there's a failure that reaches the end of the
chain, Deferred will silently hang onto it assuming some code will
later add an errback.  Aside from scouring the code looking for
missing errbacks (which is difficult), you might try asking in
freenode's #twisted, #tahoe-lafs, or #leastauthority, to ask for a
general solution to detecting/debugging this.

As for non-termination, in twisted this can be hard to spot because
the loop may be spread throughout deferred callbacks.  (For example,
an infinite retry sequence could happen by design or if there is a bug
in the retry limiting.)  I recommend logging at all retry decision
points to detect common cases.



My Concern:  The test name includes "blocking".

I overheard a bit of conversation that sounds like the only source of
blocking networking currently has to do with accessing SSL certificate
details around the time of a handshake.  I believe it's possible to
access certificate details in a somewhat twisted-friendly manner by
using custom verify callbacks.  Here's an example I wrote:

This hooks up the verifier to a twisted SSL listener:

https://bitbucket.org/0x0000/saturnalia/src/00c286319033711059d63ee81ba177b7dd4cd8e4/src/python/saturnalia/sclienttask.py?at=default#cl-454

The "tlsContextFactory" (constructed a few lines above) is what
implements a custom verifier.  It's definitions is here:

https://bitbucket.org/0x0000/saturnalia/src/00c286319033711059d63ee81ba177b7dd4cd8e4/src/python/saturnalia/sclienttask.py?at=default#cl-812


Several strong caveats:

1. I don't know if I understand the SSL / blocking IO issue well
enough to know if the example linked above is useful.

2. I have low confidence in this custom verifier: there may either be
validation logic errors, or API misuse bugs, so please minimize your
security assumptions about this example.

3. I don't know if this is "The Right Way" to get details about a
certificate / certificate chain from python's SSL wrapper.  It's *a*
way which occurs prior to the twisted Protocol layer, during the
handshake.

4. The python SSL bindings here are super sketch.  I have a vague
memory of calling something like a signature verification method on a
cert only to have a core dump.  (Tantalizing...)

5. I *assume* that the verify callback runs on the reactor's thread,
so if it blocks, the entire reactor thread blocks, but it is
(hopefully) safe to twiddle any python variables (owned on the reactor
thread).  Also, the return value is used for SSL validation, so it is
not possible to use twisted deferreds to make a validation decision.

6. There's an obnoxious layering violation gotcha that stems from the
lower SSL abstraction, which gets reified in twisted.  I don't
remember the exact details, but it's something like "connectionMade"
is called *prior* to certificate validation, but some other event
handlers are called after.  Be careful and double check any
authentication assumptions in each event handler separately.

7. The codebase I pointed to uses a different event management
framework (called task) that is then mashed together with twisted, so
please don't emulate the larger code structure there without
understanding it.  The two points I linked to are the useful bits.

I recommend verifying any code you write with #twisted.  There will be
folks there who have much more expertise on this than I do.

(I can repeat these details in a ticket if someone points me to it.)


> I followed up with the following test:
>
> ./bin/ooniprobe -l tunisia-404lab-netcache-000.log
> nettests/manipulation/http_invalid_request_line.py;
>
> It worked without issues.
>
> The following test goes on for a while:
>
> ./bin/ooniprobe -l tunisia-404lab-netcache-000.log
> nettests/blocking/http_requests.py -f inputs/top-1000.txt
>
> It starts throwing the following errors:
> Performing GET request to google.com.ar
> [D] Performing request google.com.ar GET {'User-Agent': ['Mozilla/5.0
> (Windows; U; Windows NT 5.1; de; rv:1.9.2) Gecko/20100115 Firefox/3.6']}
> [!] Unsupported scheme: ''
> [!] Task <ooni.tasks.Measurement object at 0x3302d90> has failed 6 times
> Traceback (most recent call last):
>   File "/ooni-probe/ooni/managers.py", line 59, in _fillSlots
>     self._run(task)
>   File "/ooni-probe/ooni/managers.py", line 70, in _run
>     d = task.start()
>   File "/ooni-probe/ooni/tasks.py", line 91, in start
>     return BaseTask.start(self)
>   File "/ooni-probe/ooni/tasks.py", line 38, in start
>     self._running = defer.maybeDeferred(self.run)
> --- <exception caught here> ---
>   File
> "/ooniprobe/lib/python2.7/site-packages/twisted/internet/defer.py", line
> 138, in maybeDeferred
>     result = f(*args, **kw)
>   File "/ooni-probe/ooni/tasks.py", line 129, in run
>     d = self.netTestMethod()
>   File "/ooni-probe/nettests/blocking/http_requests.py", line 117, in
> test_get
>     headers=headers)
>   File "/ooni-probe/ooni/templates/httpt.py", line 313, in doRequest
>     body_producer)
>   File "/ooni-probe/ooni/utils/txagentwithsocks.py", line 179, in request
>     return client.Agent.request(self, method, uri, headers, bodyProducer)
>   File "/lib/python2.7/site-packages/twisted/web/client.py", line 1218,
> in request
>     parsedURI.port)
>   File "/ooni-probe/ooni/utils/txagentwithsocks.py", line 199, in
> _getEndpoint
>     raise SchemeNotSupported("Unsupported xxx scheme: %r" % (scheme,))
> exceptions.NameError: global name 'SchemeNotSupported' is not defined

Here's the Low Hanging Fruit:

Run pyflakes on the ooni codebase.  Require pyflakes in your official
merge process.  Require it in your official release candidate process.
 Require it during package building it.  Run pyflakes on all python
code you ever touch.  Configure your editor to run pyflakes as soon as
you save a python file.  Add pre-commit hooks that abort your commits
if pyflakes fails.  Run pyflakes in your sleep.  Eat pyflakes for
breakfast.

Then, after you've sprinkled pyflakes everywhere you possibly can,
take a moment of silence and ponder the tragedy of total time lost due
to the fact that python doesn't simply run the pyflakes checks every
time it ever imports any module.


> [!] Permanent failure for <ooni.tasks.Measurement object at 0x3302d90>
> Failed doing measurement: <ooni.tasks.Measurement object at 0x3302d90>
> [D] Starting this task <ooni.tasks.ReportEntry object at 0xbf0c950>
> [D] Starting this task <ooni.tasks.ReportEntry object at 0xbf0ca50>
>
> It then hangs like so:
>
> [D] Checking all tasks for completion 102 == 133
> [D] Successfully performed report <ooni.tasks.ReportEntry object at
> 0xc1405d0>
> [D] None
> [D] Checking all tasks for completion 103 == 133
> [D] Successfully performed report <ooni.tasks.ReportEntry object at
> 0xc1408d0>
> [D] None
>
> I killed it - I'm not sure why it hangs. The host list is just the top
> alexa hosts.
>
>  % ./bin/ooniprobe -l tunisia-404lab-netcache-000.log
> nettests/blocking/tcpconnect.py -f inputs/top-1000-port80.txt
>
> This worked without issue.
>
> I tried:
>
>  % ./bin/ooniprobe -l tunisia-404lab-netcache-000.log
> nettests/blocking/http_requests.py -f inputs/top-1000.txt
>
> I had similar hangs - so the tests never finished:
>
> [D] Checking all tasks for completion 100 == 133
> [D] Successfully performed report <ooni.tasks.ReportEntry object at
> 0xcc7d190>
> [D] None
> [D] Checking all tasks for completion 101 == 133
> [D] Successfully performed report <ooni.tasks.ReportEntry object at
> 0xcc7d490>
> [D] None
> [D] Checking all tasks for completion 102 == 133
> [D] Successfully performed report <ooni.tasks.ReportEntry object at
> 0xcc7d790>
> [D] None
> [D] Checking all tasks for completion 103 == 133
> [D] Successfully performed report <ooni.tasks.ReportEntry object at
> 0xcc7da90>
> [D] None
>
> Thoughts?
>

I have little understanding of this log format, but the "None" entries
seem suspicious.  Could there be a function missing a "return"
statement, that implicitly returns None?


> All the best,
> Jake
> _______________________________________________
> ooni-dev mailing list
> ooni-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-dev



- --
Nathan Wilcox
Least Authoritarian

GPG id: CBA48BDD
GPG fingerprint: 9309 EB72 87D4 192F 0D7E  0F16 1191 AF31 CBA4 8BDD
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIcBAEBAgAGBQJRvfvRAAoJEBGRrzHLpIvdjcAQAKN41WKEmpxqCDEVPav5B5pl
V0nt16VwxJuWMNy96Sipg7RezHKfXfxEyY4K8QJz6JWeDEa9U368UM57F/yj8s36
66+HJsPacAuBeExjM+UGNgSPMD9w2G0c9+i55P42wBizYMmpCcECqiTWN2A0iq/O
vTvr/tjLPyOrsYuivygYW0VU8Ez03zIlRo02gelf5+yq6A/h5kP4QC2ExGgzrtjS
bMWtGqM7Xf2d+y4SJR6lfEYT+CIHnby3uPKfzhODpNQDdocS0t65OJm8ZrqFth04
rkRXlblhbTZ5DSyD/MoEaQ+2TG6/7B9luHM+cPVfdN75DOSFRBHSTwE1A1r27jw4
tKt3uscE6VVLhLpyc3zpRfZ9MYu2J3LY2reeSJfW1exsOsEgekEMRZvpAMd+pJ+O
Tb0qj3td9BQaP0zUEiLa86v4i4yN2U10aCqFy9mRr7FrsBuxaET8Zxfl4rbvGo4F
LXHmoftHcSE89LF4gaPj1uR4coIA8EizcDa/d9bkhHJapX4YCqb8rk1tXkIUilwa
StOcwWngel2D5/v+42yWOMe6qlD3pRnR+M06lJHu8nOdC1yu73lbfUtXXE/iY30V
85TA7O/rDqK0tp5QqBmVGnCKMARasRgLTqTfRLxR9qJQUr/v9i3OGRmHS9jWHDaL
RX2J+0N+sRC0FpvZQdKf
=YJ0l
-----END PGP SIGNATURE-----


More information about the ooni-dev mailing list