[tor-bugs] #30716 [Circumvention/Obfs4]: Improve the obfs4 obfuscation protocol

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Aug 22 22:17:17 UTC 2019


#30716: Improve the obfs4 obfuscation protocol
-------------------------------------------------+-------------------------
 Reporter:  phw                                  |          Owner:  phw
     Type:  task                                 |         Status:
                                                 |  assigned
 Priority:  High                                 |      Milestone:
Component:  Circumvention/Obfs4                  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  sponsor28, anti-censorship-roadmap-  |  Actual Points:
  august                                         |
Parent ID:                                       |         Points:  20
 Reviewer:                                       |        Sponsor:
                                                 |  Sponsor28-must
-------------------------------------------------+-------------------------
Description changed by phw:

Old description:

> As part of our work for Sponsor 28, we will evaluate and improve the
> obfs4 obfuscation protocol, which may result in obfs5.
>
> Roger started the discussion [https://lists.torproject.org/pipermail
> /anti-censorship-team/2019-May/000015.html on our anti-censorship-team
> mailing list]. Relevant reading is the CCS'15 paper
> [https://censorbib.nymity.ch/#Wang2015a Seeing through Network-Protocol
> Obfuscation] and the S&P'16 paper
> [https://censorbib.nymity.ch/#Tschantz2016a SoK: Towards Grounding
> Censorship Circumvention in Empiricism].
>
> Let's use this ticket to keep track of this effort. Below is a list of
> ideas that we may or may not want to incorporate in obfs5.
>
> == Randomisation
>
> Obfs4 already implements randomisation for packet lengths and inter-
> arrival times but there are other protocol aspects that we can randomise.
> Note that the adoption of these strategies may complicate censorship
> analysis: if obfs5 instance X looks very different from obfs5 instance Y,
> then X may end up getting blocked while Y still works. Instead of saying
> "obfs5 is blocked," one may then have to be more specific and say "the
> obfs5 instances that rely on UDP are blocked."
>
> * **Payload**: All bytes that obfs4 writes to the wire are randomly
> distributed. These high-entropy packets may or may not be common on the
> Internet. We could evade a "high-entropy filter" by having obfs4 servers
> derive a formal language from the shared secret. This language could,
> say, use dummy clear-text headers.
>
> * **Cover traffic**: [https://lists.torproject.org/pipermail/tor-
> dev/2017-June/012310.html dcf] explains that obfs4 only sends data when
> it's given data to send. To improve on this, as dcf suggests, we could
> make obfs5 send data even when the application has nothing to send.
>
> * **Packet directions**: An obfs4 flow begins with the client sending
> data to the server. We could randomise packet directions and have, say,
> the server talk first with a server-specific probability.
>
> * **Transport protocol**: An obfs4 server could talk either TCP or UDP or
> SCTP. This may very well not be worth the effort.
>
> == Lessons learned from [https://censorbib.nymity.ch/#Wang2015a CCS'15
> paper]
>
> * DPI boxes tend to classify flows by only inspecting the first N packets
> of a flow. Keeping state is expensive, after all. We could exploit this
> by relaxing our obfuscation techniques after N packets to increase
> throughput.
>
> * The paper's data set may not be representative of what countries or
> ISPs would see:
>   * It's "only" a university uplink. Universities typically have policies
> that prohibit file sharing such as BitTorrent. BitTorrent's "message
> stream encryption" may look similar to obfs3 and obfs4.
>   * The data sets are from 2014, 2012, and 2010, respectively. That's a
> long time in Internet years.
>   * The detectors' false positive rates are non-trivial and, as the
> authors point out themselves, would be problematic for a censor given
> that non-obfuscated traffic significantly outweighs obfuscated traffic.
>   * Does the data set only contain one obfs4 server instance? This may
> have affected their results.
>
> == Miscellaneous
>
> * [https://trac.torproject.org/projects/tor/ticket/30716#comment:1
> yawning writes] that obfs4 doesn't easily support backward incompatible
> protocol alterations.
>
> * [https://trac.torproject.org/projects/tor/ticket/30716#comment:3
> yawning writes] that the framing could use better cryptography.
>
> * Crazy idea: Use a modified TCP stack that ignores RST and FIN segments,
> so the GFW's on-path devices cannot tear down the connection. Instead,
> the obfs5 protocol could signal the end of the connection in an
> authenticated control frame. We could ignore RST and FIN segments by
> using firewall rules, or to get more crazy, by shipping a user space TCP
> stack (this may be easy to fingerprint, though).

New description:

 As part of our work for Sponsor 28, we will evaluate and improve the obfs4
 obfuscation protocol, which may result in obfs5.

 Roger started the discussion [https://lists.torproject.org/pipermail/anti-
 censorship-team/2019-May/000015.html on our anti-censorship-team mailing
 list]. Relevant reading is the CCS'15 paper
 [https://censorbib.nymity.ch/#Wang2015a Seeing through Network-Protocol
 Obfuscation] and the S&P'16 paper
 [https://censorbib.nymity.ch/#Tschantz2016a SoK: Towards Grounding
 Censorship Circumvention in Empiricism].

 Let's use this ticket to keep track of this effort. Below is a list of
 ideas that we may or may not want to incorporate in obfs5.

 == Randomisation

 Obfs4 already randomises packet lengths and inter-arrival times, but there
 are other protocol aspects that we can randomise.
 Note that the adoption of these strategies may complicate censorship
 analysis: if obfs5 instance X looks very different from obfs5 instance Y,
 then X may end up getting blocked while Y still works. Instead of saying
 "obfs5 is blocked," one may then have to be more specific and say "the
 obfs5 instances that rely on UDP are blocked."

 * **Payload**: All bytes that obfs4 writes to the wire are meant to look
 uniformly random. Such high-entropy packets may or may not be common on the
 Internet. We could evade a "high-entropy filter" by having obfs4 servers
 derive a formal language from the shared secret. This language could, say,
 use dummy clear-text headers. The [http://libfte.org/ LibFTE] library may
 be helpful here.

 * **Cover traffic**: [https://lists.torproject.org/pipermail/tor-
 dev/2017-June/012310.html dcf] explains that obfs4 only sends data when
 it's given data to send. To improve on this, as dcf suggests, we could
 make obfs5 send data even when the application has nothing to send.

 * **Packet directions**: An obfs4 flow begins with the client sending data
 to the server. We could randomise packet directions and have, say, the
 server talk first with a server-specific probability.

 * **Transport protocol**: An obfs4 server could speak TCP, UDP, or SCTP.
 This may very well not be worth the effort.
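
 The "server-specific probability" idea above can be sketched as follows.
 This is only an illustration, not part of obfs4 or any obfs5 design: the
 use of HMAC-SHA256, the label string, and both function names are
 assumptions made for the example. The point is that client and server can
 derive the same per-server parameter from the shared secret, while each
 connection still makes an independent random choice.

```python
import hashlib
import hmac
import random


def derive_unit_float(shared_secret: bytes, label: bytes) -> float:
    """Deterministically map (secret, label) to a float in [0, 1).

    HMAC-SHA256 and the label scheme are assumptions for this sketch.
    Only 48 bits are used so the result is exactly representable as a
    float and always strictly below 1.0.
    """
    digest = hmac.new(shared_secret, label, hashlib.sha256).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def server_talks_first(shared_secret: bytes, rng=None) -> bool:
    """Decide, per connection, whether the server sends the first bytes.

    The probability is fixed per server (derived from the secret); the
    coin flip itself is fresh for every connection.
    """
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness by default
    p = derive_unit_float(shared_secret, b"server-first-probability")
    return rng.random() < p
```

 Other per-server parameters (for example, a cover-traffic rate) could be
 derived the same way by changing the label, so that two servers with
 different secrets behave differently on the wire.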

 == Lessons learned from [https://censorbib.nymity.ch/#Wang2015a CCS'15
 paper]

 * DPI boxes tend to classify flows by only inspecting the first N packets
 of a flow. Keeping state is expensive, after all. We could exploit this by
 relaxing our obfuscation techniques after N packets to increase
 throughput.

 * The paper's data set may not be representative of what countries or ISPs
 would see:
   * It's "only" a university uplink. Universities typically have policies
 that prohibit file sharing such as BitTorrent, so BitTorrent's "message
 stream encryption", which may look similar to obfs3 and obfs4, is likely
 underrepresented in the data.
   * The data sets are from 2014, 2012, and 2010, respectively. That's a
 long time in Internet years.
   * The detectors' false positive rates are non-trivial and, as the
 authors point out themselves, would be problematic for a censor given that
 non-obfuscated traffic significantly outweighs obfuscated traffic.
   * Does the data set only contain one obfs4 server instance? This may
 have affected their results.
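
 The false positive point above is a base-rate argument, and a short
 calculation makes it concrete. The numbers below are made up for
 illustration and are not taken from the paper; the function name is also
 just an assumption for this sketch.

```python
def flagged_flow_precision(base_rate: float, tpr: float, fpr: float) -> float:
    """Fraction of flagged flows that really are obfuscated traffic.

    base_rate: share of all flows that are obfuscated
    tpr:       detector's true positive rate
    fpr:       detector's false positive rate
    """
    true_positives = base_rate * tpr
    false_positives = (1.0 - base_rate) * fpr
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 in 10,000 flows is obfuscated, the detector
# catches all of them, and it misfires on 0.2% of everything else.
precision = flagged_flow_precision(base_rate=1e-4, tpr=1.0, fpr=0.002)
```

 With these hypothetical inputs, fewer than 5% of the flagged flows are
 actually obfuscated, so a censor acting on the detector alone would
 mostly block unrelated traffic.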

 == Miscellaneous

 * [https://trac.torproject.org/projects/tor/ticket/30716#comment:1 yawning
 writes] that obfs4 doesn't easily support backward-incompatible protocol
 alterations.

 * [https://trac.torproject.org/projects/tor/ticket/30716#comment:3 yawning
 writes] that the framing could use better cryptography.

 * Crazy idea: Use a modified TCP stack that ignores RST and FIN segments,
 so the GFW's on-path devices cannot tear down the connection. Instead, the
 obfs5 protocol could signal the end of the connection in an authenticated
 control frame. We could ignore RST and FIN segments by using firewall
 rules or, to get even crazier, by shipping a user-space TCP stack (this may
 be easy to fingerprint, though).
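
 To make the "authenticated control frame" idea a bit more concrete, here
 is a minimal sketch. The frame layout (one type byte, an 8-byte sequence
 number, an HMAC-SHA256 tag), the constant values, and the function names
 are all assumptions for illustration; none of this is the obfs4 framing
 or a proposed obfs5 format.

```python
import hashlib
import hmac
import struct

FRAME_TYPE_CLOSE = 0x01  # hypothetical type value for "end of connection"
HEADER_LEN = 9           # 1-byte type + 8-byte big-endian sequence number
TAG_LEN = 32             # full HMAC-SHA256 output


def make_close_frame(key: bytes, seq: int) -> bytes:
    """Build a close frame bound to this session key and sequence number."""
    header = struct.pack(">BQ", FRAME_TYPE_CLOSE, seq)
    tag = hmac.new(key, header, hashlib.sha256).digest()
    return header + tag


def is_authentic_close(key: bytes, expected_seq: int, frame: bytes) -> bool:
    """Accept a close only if type, sequence number, and tag all check out."""
    if len(frame) != HEADER_LEN + TAG_LEN:
        return False
    header, tag = frame[:HEADER_LEN], frame[HEADER_LEN:]
    frame_type, seq = struct.unpack(">BQ", header)
    if frame_type != FRAME_TYPE_CLOSE or seq != expected_seq:
        return False  # wrong frame type, or a replayed/out-of-order frame
    expected_tag = hmac.new(key, header, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected_tag)  # constant-time compare
```

 An endpoint that drops RST and FIN segments would then tear down the
 connection only after `is_authentic_close` returns true. An injected RST
 carries no valid tag, so under these assumptions it could not end the
 session.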

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30716#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online