[tor-bugs] #30986 [Circumvention]: Understand the "long tail" of unclassifiable network traffic

Wed Jun 26 16:21:29 UTC 2019

#30986: Understand the "long tail" of unclassifiable network traffic
---------------------------+--------------------------------
 Reporter:  phw            |          Owner:  phw
     Type:  project        |         Status:  assigned
 Priority:  Medium         |      Milestone:
Component:  Circumvention  |        Version:
 Severity:  Normal         |     Resolution:
 Keywords:                 |  Actual Points:
Parent ID:  #30716         |         Points:  5
 Reviewer:                 |        Sponsor:  Sponsor28-must
---------------------------+--------------------------------
Changes (by phw):

 * parent:  #29285 => #30716

Old description:

> The obfs family of obfuscation protocols strives to "look like nothing"
> and falls into the long tail of network traffic that is meant to be
> unclassifiable. That is, if an ISP is monitoring its uplink, it shouldn't
> be able to figure out that one of its users is talking obfs4 to a Tor
> bridge. Instead, the obfs4 connection should show up as "unknown" in the
> log files.
>
> We know next to nothing about this long tail that the obfs family hides
> in. What fraction of flows does it constitute? What fraction of bytes?
> What kind of protocols and implementations are difficult to classify? How
> does the long tail differ across uplinks?
>
> Over at #29285 we're brainstorming features for obfs4's successor, but
> before moving forward with obfs5, we should get a better understanding of
> this long tail because it allows us to make informed design decisions.
> Packet traces from the [http://mawi.wide.ad.jp/mawi/ WIDE backbone] are
> one of the data sets that may be helpful here.
>
> Let's use this ticket to track progress and collect insights.

New description:

 The obfs family of obfuscation protocols strives to "look like nothing"
 and falls into the long tail of network traffic that is meant to be
 unclassifiable. That is, if an ISP is monitoring its uplink, it shouldn't
 be able to figure out that one of its users is talking obfs4 to a Tor
 bridge. Instead, the obfs4 connection should show up as "unknown" in the
 log files.

 We know next to nothing about this long tail that the obfs family hides
 in. What fraction of flows does it constitute? What fraction of bytes?
 What kind of protocols and implementations are difficult to classify? How
 does the long tail differ across uplinks?

 Over at #30716 we're brainstorming features for obfs4's successor, but
 before moving forward with obfs5, we should get a better understanding of
 this long tail because it allows us to make informed design decisions.
 Packet traces from the [http://mawi.wide.ad.jp/mawi/ WIDE backbone] are
 one of the data sets that may be helpful here.

 Let's use this ticket to track progress and collect insights.
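
 As a rough illustration of the first two questions above (fraction of
 flows, fraction of bytes), here is a minimal sketch. It assumes a traffic
 classifier has already been run over a packet trace and produced one
 (label, byte count) pair per flow. The sample records, the "unknown"
 label, and the function name long_tail_share are all hypothetical and not
 taken from any existing tool or data set.

```python
# Hypothetical flow records: (classifier_label, byte_count).
# In practice these would come from a classifier run over a packet trace;
# the values below are made up purely for illustration.
flows = [
    ("http", 1200),
    ("tls", 5300),
    ("unknown", 800),
    ("dns", 100),
    ("unknown", 2600),
]


def long_tail_share(flows, tail_label="unknown"):
    """Return (fraction of flows, fraction of bytes) labeled tail_label."""
    total_flows = len(flows)
    total_bytes = sum(size for _, size in flows)
    if total_flows == 0 or total_bytes == 0:
        # Avoid division by zero on an empty or zero-byte trace.
        return 0.0, 0.0
    tail_flows = sum(1 for label, _ in flows if label == tail_label)
    tail_bytes = sum(size for label, size in flows if label == tail_label)
    return tail_flows / total_flows, tail_bytes / total_bytes


flow_frac, byte_frac = long_tail_share(flows)
print(f"unclassified: {flow_frac:.1%} of flows, {byte_frac:.1%} of bytes")
```

 Computing the same two numbers per uplink would be one way to compare how
 the long tail differs across vantage points.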

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30986#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online