[tor-bugs] #7460 [Ooni]: Make list of tests that we feel confident deploying on M-Lab and test them

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Nov 12 13:56:40 UTC 2012


#7460: Make list of tests that we feel confident deploying on M-Lab and test them
----------------------------------------+-----------------------------------
 Reporter:  hellais                     |          Owner:  hellais
     Type:  task                        |         Status:  new    
 Priority:  normal                      |      Milestone:         
Component:  Ooni                        |        Version:         
 Keywords:  ooni_tests, SponsorH201208  |         Parent:         
   Points:                              |   Actualpoints:         
----------------------------------------+-----------------------------------
 In the Brussels meeting we took these notes on the tests that we can
 deploy on M-Lab:

 HTTPHost:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/HTTPHost
 (the linked page is inaccurate as to what the test does)
 How does it work, briefly?
 Uses the Host header to identify a transparent proxy when the back-end
 server is under our control. Connect to an HTTP server running on M-Lab.
 Send a GET request for /, setting the Host header field to a site that
 we’re trying to determine is/isn’t censored. If what you get back is a
 block page, that is a clear indication of censorship. The block page
 could also reveal a vendor signature.
 Pending questions for future work:

 What if we cache data (e.g., store the FB homepage)? Would that be a
 copyright problem?
 What data must be collected on the M-Lab server and published?
 HTTP requests being made and the responses.
 What data does it gather from the client?
 The requests the client makes.
 What data does it gather on the server?
 The requests the server receives.
 What is logged on the client?
 NA
 Who has access to client’s logs?
 NA
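
 A minimal sketch of the client side of this test, under stated
 assumptions: the function names and the block-page fingerprint list are
 hypothetical and not taken from the ooni-probe code. It only shows how
 the Host header carries the site under test and how a response body
 might be matched against known block-page markers.

```python
def build_probe_request(tested_host: str, path: str = "/") -> bytes:
    """Build a raw HTTP/1.1 GET whose Host header names the site under test.

    The request is meant to be sent to the backend we control, not to the
    site named in the Host header.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {tested_host}",
        "Connection: close",
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def looks_like_block_page(body: str, fingerprints: list[str]) -> bool:
    """Return True if the body contains any known block-page marker.

    `fingerprints` is an assumed, caller-supplied list of substrings.
    """
    lowered = body.lower()
    return any(marker.lower() in lowered for marker in fingerprints)
```

 Since the backend is ours, any response that differs from what the
 backend actually sent was produced by something in the path.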

 Two Way Traceroute:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/TwoWayTraceroute
 How does it work, briefly?
 Multiprotocol multiport traceroute to detect discrepancies in paths based
 on source/destination port and protocol. It is performed both from the
 client to the backend and from the backend to the client, with one
 traceroute per protocol/port pair.
 What data must be collected on M-Lab and published?
 Traceroute per protocol/port pair.
 What data does it gather from the client?
 Traceroutes.
 What data does it gather on the server?
 Traceroutes.
 What is logged?
 NA
 Who has access to logs?
 NA
 Note: we may need to mask the IP address and the first and last hops.
 Also, the client IP address will be present in the IP header embedded in
 the ICMP Time Exceeded messages sent by each intervening router. This is
 a TCP/UDP/ICMP traceroute: it performs multi-protocol traces
 simultaneously on the client side.
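
 The discrepancy check could be sketched as follows. This is an
 illustration only: the data layout (a dict keyed by (protocol, port)
 with a list of hop addresses as value) and the function names are
 assumptions, and the addresses in the usage note are documentation
 ranges.

```python
from collections import defaultdict


def group_by_path(traces):
    """Group (protocol, port) pairs by the hop sequence they observed.

    `traces` maps (protocol, port) -> list of hop addresses. A hop that
    did not answer can be recorded as None.
    """
    groups = defaultdict(list)
    for pair, hops in traces.items():
        groups[tuple(hops)].append(pair)
    return dict(groups)


def has_discrepancy(traces):
    """True if at least two protocol/port pairs took different paths."""
    return len(group_by_path(traces)) > 1
```

 For example, if ("tcp", 80) and ("tcp", 443) both see hops 192.0.2.1,
 192.0.2.9 while ("udp", 53) sees 192.0.2.1, 203.0.113.5, the traces fall
 into two groups and has_discrepancy returns True.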

 Keyword Filtering:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/KeywordFiltering
 How does it work, briefly?
 We establish a connection to the backend and over either HTTP or other
 plain-text protocols we send a set of keywords to be tested for filtering.
 Censorship may be detected either because the keyword does not reach the
 backend or because the client and/or backend receive a RST packet.
 The client has a secure channel with the backend to signal the keywords
 being sent.
 What data must be collected on M-Lab and published?
 Keywords that are sent and received, and blocked.
 What data does it gather from the client?
 Keywords that were sent. What event triggered censorship (did the packet
 not arrive at its destination? Was a RST packet received? Was it received
 by the client, the server, or both?).
 What data does it gather on the server?
 Keywords that were blocked.
 What is logged?
 NA
 Who has access to logs?
 NA
 Note: we must be very careful to anonymize this.
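
 One way to record the per-keyword outcome described above is sketched
 here. The record type, its field names and the outcome labels are
 hypothetical; the sketch only assumes that the client and backend can
 compare notes over the secure channel.

```python
from dataclasses import dataclass


@dataclass
class KeywordObservation:
    keyword: str
    arrived_at_backend: bool  # backend reported seeing the keyword
    rst_at_client: bool       # client saw a RST on the test connection
    rst_at_backend: bool      # backend saw a RST on the test connection


def classify(obs: KeywordObservation) -> str:
    """Map one observation to a coarse outcome label."""
    if obs.rst_at_client and obs.rst_at_backend:
        return "rst_both"
    if obs.rst_at_client:
        return "rst_client"
    if obs.rst_at_backend:
        return "rst_backend"
    if not obs.arrived_at_backend:
        return "dropped"
    return "ok"
```

 A keyword that never arrives and triggers no RST is labelled "dropped";
 only a keyword that arrives with no RST on either side is "ok".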

 RST Packet Detection:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/RSTPacketDetection
 How does it work, briefly?
 This test is a dependent test, in the sense that it requires some other
 test to trigger it. We listen for RST packets on both the client and
 server and try to trigger the censor to block via RST packets. This can be
 done by injecting keywords into a TCP connection or by contacting certain
 sites.
 Open questions:

 Is it possible on M-Lab to ignore RST packets?
 What data must be collected on M-Lab and published?
 RST packets. Requests made that triggered the RST packet.
 What data does it gather from the client?
 RST packets. What kind of client request was made to trigger the RST
 packet.
 What data does it gather on the server?
 RST packets. What server side request was done to trigger the RST packet.
 What is logged?
 NA
 Who has access to logs?
 NA
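
 As an illustration of the detection step only, the sketch below checks
 the RST bit in a raw TCP header. How the header bytes are captured
 (raw socket, pcap, etc.) is left out, and both function names are
 hypothetical. The flag position follows the standard TCP header layout:
 the flags live in byte 13 and RST is bit 0x04.

```python
import struct

TCP_RST = 0x04  # RST bit within the TCP flags byte


def is_rst(tcp_header: bytes) -> bool:
    """Return True if the RST flag is set in a raw TCP header."""
    if len(tcp_header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    return bool(tcp_header[13] & TCP_RST)


def make_tcp_header(flags: int, src_port: int = 12345, dst_port: int = 80) -> bytes:
    """Build a minimal 20-byte TCP header for testing (checksum left at 0)."""
    data_offset = 5 << 4  # header length of 5 32-bit words, no options
    return struct.pack(
        "!HHIIBBHHH",
        src_port, dst_port,
        0, 0,               # sequence and acknowledgement numbers
        data_offset, flags,
        65535, 0, 0,        # window, checksum, urgent pointer
    )
```

 For example, is_rst(make_tcp_header(flags=0x14)) is true for a RST+ACK
 segment, while a SYN+ACK (0x12) is not flagged.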

 Daphn3: https://speakerdeck.com/u/hellais/p/ooni-and-daphn3
 How does it work, briefly?
 This is a dependent test. We have a packet that we know is being censored
 and our objective is to figure out where in the packet the censor is
 matching to trigger censorship. We create a state machine of client sent
 messages and server side messages. We start by mutating the first byte and
 walk through the state machine until a certain mutation does not trigger
 the censor. We have therefore discovered that the censor is fingerprinting
 on that byte.
 What data must be collected on M-Lab and published?
 The censored packet capture. The fingerprint that the censor is matching
 against.
 What data does it gather from the client?
 What (packet, mutation) pair it received and sent.
 What data does it gather on the server?
 What (packet, mutation) pair it received and sent.
 What is logged?
 NA
 Who has access to logs?
 NA
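
 The mutation walk can be sketched for a single packet as below. This is
 a simplification, not the Daphn3 implementation: it ignores the
 client/server state machine, and is_censored is an assumed callback
 that replays the bytes and reports whether the censor reacted.

```python
from typing import Callable, List


def find_fingerprint_offsets(
    packet: bytes, is_censored: Callable[[bytes], bool]
) -> List[int]:
    """Return the byte offsets whose mutation stops the censor triggering.

    Each offset is mutated on its own, starting from the first byte, and
    the mutated packet is handed to `is_censored`.
    """
    if not is_censored(packet):
        raise ValueError("the unmodified packet must trigger the censor")
    offsets = []
    for i in range(len(packet)):
        mutated = bytearray(packet)
        mutated[i] ^= 0xFF  # flip every bit so the byte always changes
        if not is_censored(bytes(mutated)):
            offsets.append(i)
    return offsets
```

 With a simulated censor that matches the substring b"tor", calling
 find_fingerprint_offsets(b"hello tor!", lambda p: b"tor" in p) returns
 the offsets of those three bytes, [6, 7, 8].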

 Header Field Manipulation:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/HeaderFieldManipulation
 How does it work, briefly?
 You establish a connection to the backend system. Send some specially
 crafted HTTP header fields (varying capitalization, adding some special
 HTTP headers, etc.) and detect on the backend whether these arrive as
 they were created by the client.
 What data must be collected on M-Lab and published?
 The HTTP requests being created by the client and the requests being
 received by the server.
 What data does it gather from the client?
 The requests it sends.
 What data does it gather on the server?
 The requests it receives.
 What is logged?
 NA
 Who has access to logs?
 NA
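
 A small sketch of the two halves of this test, assuming headers are
 represented as ordered lists of (name, value) tuples. The function
 names are hypothetical. The client randomizes header-name
 capitalization; the comparison is exact, so any normalization or
 reordering by a middlebox shows up as a difference.

```python
import random
from itertools import zip_longest


def randomize_case(name: str, rng: random.Random) -> str:
    """Return `name` with each character randomly upper- or lower-cased."""
    return "".join(rng.choice((c.lower(), c.upper())) for c in name)


def diff_headers(sent, received):
    """Return (index, sent, received) for every position that differs.

    The comparison is case- and order-sensitive; a header missing on one
    side appears as None.
    """
    return [
        (i, s, r)
        for i, (s, r) in enumerate(zip_longest(sent, received))
        if s != r
    ]
```

 For example, if the client sends ("hOsT", "example.com") and the backend
 receives ("Host", "example.com"), diff_headers reports position 0 as
 modified even though the two names are equivalent under HTTP rules.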

 Captive Portal:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/CaptivePortal
 How does it work, briefly?
 Mimics vendor captive portal tests (e.g. Chrome, Apple, Firefox, MS,
 etc.). It attempts to detect a captive portal via DNS lookups that should
 return a bad value but instead return a login page (in the case of
 Chrome, etc.).
 What data must be collected on M-Lab and published?
 Domains tested and the results.
 Reverse resolutions.
 What data does it gather from the client?
 The resolved domain as seen by the client, and a boolean for whether or
 not this matched the control result. If it doesn’t exactly match, fuzzy
 matching and reverse resolutions will be used (this is useful in the case
 of geolocalized services), and this is specified in the returned results
 as well.
 What data does it gather on the server?
 None, although it could be useful to run the control resolver on M-Lab,
 which can also be done over TCP over Tor with unbound.
 What is logged?
 The domains tested, the sets of resolved IPs from both the experimental
 DNS resolver and the control resolver, and a boolean for whether or not
 there was an intersection between the two sets (True, a non-empty
 intersection, means that DNS was not tampered with; False, an empty
 intersection, means that DNS was tampered with).
 Who has access to logs?
 Depending on the Captive Portal test - sometimes upstream servers (MS,
 Apple, etc) may have log data.
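
 The logged boolean can be computed with a plain set intersection, as in
 this sketch. The function names and the shape of the log record are
 assumptions, and the fuzzy matching and reverse resolution mentioned
 above are not covered.

```python
def dns_consistent(experiment_ips, control_ips) -> bool:
    """True if the two resolvers share at least one address.

    True means no sign of tampering; False means the answer sets are
    disjoint, which is treated as tampering.
    """
    return bool(set(experiment_ips) & set(control_ips))


def log_entry(domain, experiment_ips, control_ips) -> dict:
    """Build one log record for a tested domain."""
    return {
        "domain": domain,
        "experiment": sorted(set(experiment_ips)),
        "control": sorted(set(control_ips)),
        "consistent": dns_consistent(experiment_ips, control_ips),
    }
```

 Requiring only a non-empty intersection, rather than equal sets,
 tolerates services that return different subsets of their addresses to
 different resolvers.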

 Network Latency
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/Networklatency
 How does it work, briefly?
 Measures the difference in RTT between different protocols at sub-100-ms
 granularity to attempt to identify inline manipulation and/or inspection
 of data.
 NOT YET IMPLEMENTED
 BISmark related tests seem relevant here?
 Potentially we contact Nick directly.
 What data must be collected on M-Lab and published?
 Packets sent and received and the timings of those.
 What data does it gather from the client?
 The exact timing of received and sent packets. The packets being sent and
 received.
 What data does it gather on the server?
 The exact timing of received and sent packets. The packets being sent and
 received.
 What is logged?
 Who has access to logs?
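
 Since this test is not yet implemented, the following is only a sketch
 of one possible comparison: take the median RTT per protocol and flag
 pairs whose medians differ by more than a threshold. The function name,
 the input layout and the choice of the median are all assumptions, and
 the threshold is left to the caller.

```python
from itertools import combinations
from statistics import median


def suspicious_pairs(rtts_ms, threshold_ms):
    """Return (proto_a, proto_b, gap) for pairs with a large median RTT gap.

    `rtts_ms` maps a protocol label to a non-empty list of RTT samples in
    milliseconds. Pairs are reported in sorted label order.
    """
    medians = {proto: median(samples) for proto, samples in rtts_ms.items()}
    flagged = []
    for a, b in combinations(sorted(medians), 2):
        gap = abs(medians[a] - medians[b])
        if gap > threshold_ms:
            flagged.append((a, b, gap))
    return flagged
```

 A large gap between, say, ICMP and TCP port 443 on the same path would
 be a hint, not proof, that one of them is being handled by an inline
 device.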

 DNS Lookup:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/DNSLookup
 How does it work, briefly?
 DNS lookups of the hostnames to be tested, comparing the answers from the
 DNS resolver under test with lookups against a known-good DNS server
 (e.g., Google 8.8.8.8).
 Additionally we may want to do DNS queries over a known “clean” channel.
 What data must be collected on M-Lab and published?
 Domain names requested and results from experiment and control resolvers.
 Possibly whether the query was seen by the name server if that’s under our
 control.
 What data does it gather from the client?
 Domain names requested.
 Results from experiment resolver.
 Results from control resolver.
 What data does it gather on the server?
 If running on a slice, gather which queries are seen by the server.
 What is logged?
 NA
 Who has access to logs?
 NA
 Note: This could run on a slice if the slice is running a resolver. This
 is not necessary, however.

 Tests for which we can store data, but don’t need to run on a slice:


 Bridge Tor:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/BridgeT
 How does it work, briefly?
 A Tor client uses a geoIPdb to determine likelihood of connections to the
 Tor network being blocked, and then automatically iterates through a set
 of types of connections to Tor bridges, ranging from ICMP Echo (Ping) to a
 full Tor protocol connection.
 What data must be collected on M-Lab and published?
 Status per bridge per connection type. What kind of measurements were made
 and their result.
 What data does it gather from the client?
 Which Tor bridges were reachable/unreachable using which connection type.
 The application-level view of what request was made (e.g. did a ping, did
 a Tor connection).
 In the case of the Tor test we want to collect the info-level log of Tor.
 What data does it gather on the server?
 None, there is no server component. However, it could be useful to run a
 “canary” Tor bridge on a server along with the triggering mechanism for a
 DPI probe, and then log connections by probes to canaries.
 What is logged?
 Which Tor bridges were reachable/unreachable using which connection type.
 Who has access to logs?

 HTTP scan:
 https://trac.torproject.org/projects/tor/wiki/doc/OONI/Tests/HTTPScan
 How does it work, briefly?
 Queries a list of sites to detect blocking. It collects the request being
 made to the website and the content of the website, and it takes
 measurements to determine how different the structure of the page is from
 the expected page
 (https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/plugins/domclass.py).
 What data must be collected on M-Lab and published?
 The content of the sites being contacted, the request being made and the
 HTTP headers in the response. The eigenvector and eigenvalue.
 What data does it gather from the client?
 The request being made and the response.
 What data does it gather on the server?
 None.
 What is logged?
 NA
 Who has access to logs?
 NA
 Note: this doesn’t need to run on an M-Lab server, but the results could
 be sent to the server and stored in the M-Lab repository. If we need
 something running on the server, we could implement it.


 Based on this list we should figure out which ones we feel confident
 deploying on their machines and update the shared Google doc, providing a
 link to the implementation of the backend component and the client nettest
 component.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/7460>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

