[Taking this discussion to tor-dev.]

On Sun, Sep 1, 2013 at 6:32 AM, George Kadianakis <desnacked@riseup.net> wrote:
Kevin P Dyer <kpdyer@gmail.com> writes:

> Hi George/David,
>

Hi Kevin,

> I spoke with Roger at USENIX. He said you're the pluggable transport (PT)
> gatekeepers. Please bear with me while I get up to speed.
>
> My goals:
> 1. I want Format-Transforming Encryption (FTE) [4] to be a "deployed" PT in
> the PT TBB.
> 2. I want FTE to be integrated seamlessly with your existing deployment
> process.
>
> My initial roadblocks:
>
> === Building/Testing Tor on Linux/OSX/Windows
> I'm trying to understand exactly how the current build/release process
> works for tor. With regard to the PT TBB, it seems like there are a few
> resources [1,2,3]. However, is there canonical documentation on how the
> release process works? I'm especially interested in what you guys are doing
> to produce builds on Windows. Are you using virtualization or do you have a
> few physical build machines?
>
> Prior to doing anything with FTE, I'd love to be able to create my own
> build environment that produces the current obfs2+obfs3+flash_proxy
> bundles [6] across all 4 OS/architecture configurations.
>

Building PTTBBs is mainly done by David these days. He has documented
his process here:
https://gitweb.torproject.org/pluggable-transports/bundle.git
(For example for Windows you would look here:
https://gitweb.torproject.org/pluggable-transports/bundle.git/blob/HEAD:/bundle-windows.txt)

The release process is not standardized. David is doing PTTBB releases
in a best-effort manner.

Right. At first glance, it looks like this process will grow in complexity (and take more of David's time) as the number of PTs increases.

I need to better understand the build process, then.
 
> === Implementing Managed Mode in FTE
> I've implemented preliminary functionality for "managed" mode in FTE.
> However, I think I'm confused about the role of managed mode.
>
> Say I add "Bridge fte IP:port" to torrc. Is "IP:port" supposed to be a tor
> bridge, or a server-side PT service? If it's the former, then it seems that
> the PT is completely responsible for managing a list of its own PT servers.
> If it is the latter, I can't figure out how to dynamically determine, via
> the "managed" environmental variables, how to capture user-entered
> "IP:port" information in Vidalia.
>

It is the latter. <IP:port> points to the server-side PT service. It
does *not* point to the ORPort of the bridge (we don't care about it).
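
For reference, here is a rough sketch of what a client-side managed proxy exchanges with Tor at startup. It assumes the environment variable names (TOR_PT_MANAGED_TRANSPORT_VER, TOR_PT_CLIENT_TRANSPORTS) and the output lines (VERSION, CMETHOD, CMETHODS DONE) described in pt-spec.txt. The function name and the `listeners` mapping are hypothetical, made up for illustration. Note that no bridge address appears anywhere in this exchange.

```python
import os


def managed_client_handshake(env, listeners):
    """Return the stdout lines a client-side managed proxy would emit.

    env: mapping of environment variables (for example os.environ).
    listeners: hypothetical mapping of transport name -> (host, port)
        of the local SOCKS listener started for that transport.
    """
    # Tor advertises the managed-proxy protocol versions it supports.
    versions = env.get("TOR_PT_MANAGED_TRANSPORT_VER", "").split(",")
    if "1" not in versions:
        return ["VERSION-ERROR no-version"]

    lines = ["VERSION 1"]

    # Tor names the transports it wants; it does not name any bridges.
    requested = env.get("TOR_PT_CLIENT_TRANSPORTS", "").split(",")
    for name in requested:
        if name in listeners:
            host, port = listeners[name]
            lines.append("CMETHOD %s socks5 %s:%d" % (name, host, port))
        else:
            lines.append("CMETHOD-ERROR %s no such transport" % name)

    lines.append("CMETHODS DONE")
    return lines


if __name__ == "__main__":
    # Assumed local listener address; a real proxy would bind a socket first.
    for line in managed_client_handshake(os.environ, {"fte": ("127.0.0.1", 4999)}):
        print(line)
```

The only address in the output is the proxy's own local SOCKS listener, which Tor then connects to.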

> Concretely, if I have "Bridge fte IP1:port" and "Bridge fte IP2:port" in
> my torrc, how do "IP1:port" and "IP2:port" get propagated to my PT via
> the managed interface?
>

The <IP:port> is *not* passed to your transport using the managed
mode. Instead, <IP:port> is passed to your transport using the SOCKS
protocol. That is, when Tor wants to connect to the bridge, it does a
SOCKS handshake to your transport, and asks your transport to connect
to <IP:port>.
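
The SOCKS step described above can be sketched as follows. This assumes the transport advertised a SOCKS5 listener and that the method negotiation has already completed; it only parses the CONNECT request defined in RFC 1928. The function name is hypothetical, and a real transport would also have to send a SOCKS reply and handle partial reads.

```python
import socket
import struct


def parse_socks5_connect(request):
    """Parse a SOCKS5 CONNECT request and return (host, port).

    request: the raw bytes of the request Tor sends after method
        negotiation: VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT.
    """
    if len(request) < 4:
        raise ValueError("request too short")
    version, command, _reserved, addr_type = struct.unpack("!BBBB", request[:4])
    if version != 5:
        raise ValueError("not a SOCKS5 request")
    if command != 1:
        raise ValueError("only CONNECT is handled in this sketch")

    if addr_type == 1:  # IPv4 address, 4 bytes
        addr_len, family = 4, socket.AF_INET
    elif addr_type == 4:  # IPv6 address, 16 bytes
        addr_len, family = 16, socket.AF_INET6
    else:
        raise ValueError("unsupported address type in this sketch")

    body = request[4:]
    if len(body) < addr_len + 2:
        raise ValueError("truncated address or port")

    host = socket.inet_ntop(family, body[:addr_len])
    (port,) = struct.unpack("!H", body[addr_len:addr_len + 2])
    # (host, port) is the <IP:port> from the "Bridge fte IP:port" line.
    return host, port
```

With two Bridge lines in torrc, the transport would simply see two such requests over time, each carrying a different destination.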

Is this documented anywhere?
 
> === How do we invoke PTs?
> I had this discussion with Roger, but I don't see any open tickets or clear
> discussion on this already. If we have N>1 PTs and at least one bridge per
> PT, how do we select which PT (and which bridge associated with that PT) to
> use? Determinism is bad because then only one PT is used. Booting up all
> PTs is bad, especially if (say) the PTs make network connections prior to
> any incoming SOCKS connections. Selecting a random PT is potentially bad,
> too, depending upon how hostile and persistent and stateful the adversary
> is.
>

That's an interesting question. I'm not sure if the process of Tor
picking bridges is deterministic or not. I should test it out. David
might know.

(A good scenario would be that Tor treats bridges like guards and
selects some at random to build circuits.)

We should definitely try to flesh this out.
 
> We should probably chat about this. (Maybe you already have and I'm out of
> the loop?) It is especially important as the number of PTs increases.
>
> I'm happy to take this discussion (or a subset of it) public, if you think
> it'll help others. I just didn't want to spam tor-dev/tor-assistants with
> this initial email.
>

Yes, let's take it public.
Feel free to CC tor-dev in your next reply.

Done.
 
BTW, on the topic of deploying your PT, have you seen:
https://lists.torproject.org/pipermail/tor-dev/2013-August/005231.html ?

FTE seems to be missing out on the code quality front.
The Python code is quite complex and undocumented. There is also some
C++ code in the codebase that is also complex and undocumented. We
have decided that we won't ship C/C++ code in PTs, except if it's dead
easy to review or if it has gotten heavily scrutinized. Any chance
that the C++ code could be written in a memory-safe language?

Roughly, FTE has an offline mode (building DFAs, which needs to be done once) and an online mode (transporting data, which is deployed to everyone). The online mode uses ~300 lines of C++ code that implement performance-critical algorithms. Implementing this code in Python would slow FTE down by at least an order of magnitude. I'll document why I made this decision.

Alternatively, any suggestions for a memory-safe language that allows hooks for Python and affords the same performance as C++?

As for reducing the complexity of the Python code and improving its documentation, do you have concrete suggestions? It would be great if you could raise a few issues on FTE's GitHub [7]. My time is limited and I would prefer to focus on the things you care about.

Thanks,
Kevin 

[7] https://github.com/redjack/FTE