[tor-dev] GSoC 2017 - Questions

Thu Mar 30 18:04:18 UTC 2017

On Mon, Mar 27, 2017 at 9:17 PM, Krishna Shukla
<karatekrishna at hotmail.com> wrote:
> Hey,
>
> I'm Krishna Shukla, I'm studying a bachelors of computer science at the
> University of Queensland.
> I guess the relevant subjects I've studied so far cover C and Unix
> programming, Computer Networks, Algorithms and Data Structures, and
> Programming in the large. (got a high distinction in all the above)
>
> My most important question is if I could work on a project but not actually
> be a part of GSoC? - I am unfortunately ineligible as my brother works as a
> Security Engineer at Google Sydney. And if the above is okay would it also
> be okay to not have to strictly abide by their timeline as I don't actually
> have holidays during this time in Australia but I'd like to contribute in my
> free time nonetheless!

Sure; we are always happy to accept volunteers!

You might want to try something simpler than this for your first patch
or two -- it's generally better to get practice with smaller things
before you move to something big and complex.  The documents in
doc/HACKING inside the Tor git repository are a good place to look
for starting advice.

> As for projects themselves I'm really interested in the relay crypto
> parallelism and the hidden service crypto parallelism. And I have a couple
> of questions regarding them.
>
> For the relay crypto parallelism I wanted to know what is there left to be
> done? When I looked at the ticket #1749 someone called towelenee made a few
> patches that already made it multi threaded, were these changes just not
> accepted?

So, those patches just can't work as they're written now.  To begin
with, they launch a new thread for every hop in an outgoing circuit,
and then they wait for every such thread to finish.  They also have a
pretty serious race condition in their handling of the payload they're
supposed to be encrypting. And finally, they only handle client-side
circuit crypto -- not relay-side crypto at all.

A better approach, and the one we were hoping for, would be to
parallelize crypto by circuit, not by hop, and to use long-lived
worker threads rather than one thread per circuit or (worse) one
thread per cell per hop.
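
To make the shape of that concrete, here is a minimal sketch of a
fixed pool of long-lived workers with jobs assigned by circuit.  It is
not Tor code: every name in it (job_t, worker_t, pool_start,
pool_submit, pool_stop, toy_crypt, N_WORKERS, QUEUE_CAP) is
hypothetical, the XOR in toy_crypt() is only a stand-in for a real
cipher, and picking a worker by circuit ID modulo the pool size is one
assumed way to keep a circuit's cells in order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_WORKERS 4   /* assumed pool size */
#define QUEUE_CAP 64  /* assumed per-worker queue capacity */

/* Hypothetical job: crypto for one cell on one circuit. */
typedef struct {
  uint32_t circ_id;
  uint8_t *payload;  /* owned by the caller until the job completes */
  size_t len;
} job_t;

typedef struct {
  pthread_t thread;
  pthread_mutex_t lock;
  pthread_cond_t nonempty;
  job_t queue[QUEUE_CAP];  /* ring buffer, protected by lock */
  size_t head, count;
  int stopping;
  unsigned processed;
} worker_t;

static worker_t workers[N_WORKERS];

/* Stand-in for the real cipher; NOT actual crypto. */
static void toy_crypt(job_t *job) {
  for (size_t i = 0; i < job->len; i++)
    job->payload[i] ^= 0x5a;
}

static void *worker_main(void *arg) {
  worker_t *w = arg;
  pthread_mutex_lock(&w->lock);
  for (;;) {
    while (w->count == 0 && !w->stopping)
      pthread_cond_wait(&w->nonempty, &w->lock);
    if (w->count == 0)
      break;  /* stopping, and the queue is drained */
    job_t job = w->queue[w->head];
    w->head = (w->head + 1) % QUEUE_CAP;
    w->count--;
    pthread_mutex_unlock(&w->lock);
    toy_crypt(&job);  /* the expensive part runs without the lock */
    pthread_mutex_lock(&w->lock);
    w->processed++;
  }
  pthread_mutex_unlock(&w->lock);
  return NULL;
}

/* Start the long-lived workers once, at startup. */
void pool_start(void) {
  for (int i = 0; i < N_WORKERS; i++) {
    worker_t *w = &workers[i];
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->nonempty, NULL);
    w->head = w->count = 0;
    w->stopping = 0;
    w->processed = 0;
    int rc = pthread_create(&w->thread, NULL, worker_main, w);
    assert(rc == 0);
    (void)rc;
  }
}

/* Queue one job.  All jobs for a circuit land on the same worker, so
 * they run in submission order.  Returns 0, or -1 if the queue is full. */
int pool_submit(uint32_t circ_id, uint8_t *payload, size_t len) {
  worker_t *w = &workers[circ_id % N_WORKERS];
  int rc = -1;
  pthread_mutex_lock(&w->lock);
  if (w->count < QUEUE_CAP) {
    job_t *slot = &w->queue[(w->head + w->count) % QUEUE_CAP];
    slot->circ_id = circ_id;
    slot->payload = payload;
    slot->len = len;
    w->count++;
    rc = 0;
    pthread_cond_signal(&w->nonempty);
  }
  pthread_mutex_unlock(&w->lock);
  return rc;
}

/* Let every worker drain its queue, join it, and return the job total. */
unsigned pool_stop(void) {
  unsigned total = 0;
  for (int i = 0; i < N_WORKERS; i++) {
    worker_t *w = &workers[i];
    pthread_mutex_lock(&w->lock);
    w->stopping = 1;
    pthread_cond_signal(&w->nonempty);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    total += w->processed;
  }
  return total;
}
```

The point of the sketch is only that the thread count is fixed at
startup and does not grow with the number of circuits, cells, or hops.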

>   Also wanted to know if specific knowledge about circuit
> cryptography was required? As I know of it, but I certainly cannot make my
> own fully homomorphic cryptosystem, is it more in the steps of the system
> has already been made, it just needs to be parallelised correctly?

Right; all our crypto is implemented in Tor right now.  I'm not sure
why you're mentioning homomorphic cryptosystems; we don't need one of
those here.

> It also states the code is written to expect immediate responses, I'm not
> sure what you mean by that, after all there is always a slight delay, and if
> it becomes multi threaded we can never know what is running what when, so is
> it more someone is waiting at the other end of a socket and needs it ASAP,
> or is it internally things want the answer quickly (in which case I don't
> know how to solve it other than uses mutexes which is probably not so okay)?

Maybe have a look at how we use the function relay_crypt() to see what
we mean here?  A more precise thing to say would have been that the
calls to relay_crypt() are written to block until relay_crypt() is
finished.  If the work of relay_crypt() is instead to be done in
another thread, then the functions that call it today need to queue
the work for that thread ... and then continue safely.
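
As a rough illustration of "queue the work and continue", the sketch
below replaces a blocking call with a queued job plus a completion
callback that runs later on the main thread.  None of it is Tor's
actual API: work_t, queue_crypt(), handle_replies(), start_worker(),
stop_worker() and mark_sent() are hypothetical names, the XOR is a
stand-in for the real crypto, and a real event loop would be woken up
when a reply is ready rather than blocking in handle_replies().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unit of work: one payload plus a completion callback. */
typedef struct work_s {
  struct work_s *next;
  uint8_t *payload;
  size_t len;
  void (*on_done)(struct work_s *);  /* runs on the main thread */
} work_t;

typedef struct { work_t *head, *tail; } fifo_t;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t have_work = PTHREAD_COND_INITIALIZER;
static pthread_cond_t have_reply = PTHREAD_COND_INITIALIZER;
static fifo_t pending, done;  /* both protected by lock */
static int stopping;
static pthread_t worker;
static int cells_sent;        /* touched by the main thread only */

static void fifo_push(fifo_t *q, work_t *w) {
  w->next = NULL;
  if (q->tail) q->tail->next = w; else q->head = w;
  q->tail = w;
}

static work_t *fifo_pop(fifo_t *q) {
  work_t *w = q->head;
  if (w) {
    q->head = w->next;
    if (!q->head) q->tail = NULL;
  }
  return w;
}

static void *crypto_worker(void *unused) {
  (void)unused;
  pthread_mutex_lock(&lock);
  for (;;) {
    work_t *w;
    while (!(w = fifo_pop(&pending)) && !stopping)
      pthread_cond_wait(&have_work, &lock);
    if (!w)
      break;  /* stopping, and nothing left to do */
    pthread_mutex_unlock(&lock);
    for (size_t i = 0; i < w->len; i++)
      w->payload[i] ^= 0x5a;  /* stand-in for the real crypto */
    pthread_mutex_lock(&lock);
    fifo_push(&done, w);
    pthread_cond_signal(&have_reply);
  }
  pthread_mutex_unlock(&lock);
  return NULL;
}

/* Replaces the blocking call: hand the work off and return at once. */
void queue_crypt(work_t *w) {
  pthread_mutex_lock(&lock);
  fifo_push(&pending, w);
  pthread_cond_signal(&have_work);
  pthread_mutex_unlock(&lock);
}

/* Main thread: run callbacks for finished work and return how many ran.
 * If wait is nonzero, block until at least one reply is available. */
int handle_replies(int wait) {
  int n = 0;
  work_t *w;
  pthread_mutex_lock(&lock);
  while (wait && !done.head)
    pthread_cond_wait(&have_reply, &lock);
  while ((w = fifo_pop(&done))) {
    pthread_mutex_unlock(&lock);
    w->on_done(w);  /* the "continue" half of the old blocking code */
    n++;
    pthread_mutex_lock(&lock);
  }
  pthread_mutex_unlock(&lock);
  return n;
}

/* Example callback: whatever used to follow the blocking call. */
void mark_sent(work_t *w) { (void)w; cells_sent++; }

void start_worker(void) {
  int rc = pthread_create(&worker, NULL, crypto_worker, NULL);
  assert(rc == 0);
  (void)rc;
}

void stop_worker(void) {
  pthread_mutex_lock(&lock);
  stopping = 1;
  pthread_cond_signal(&have_work);
  pthread_mutex_unlock(&lock);
  pthread_join(worker, NULL);
}
```

The hard part in practice is the "safely": between queue_crypt() and
the callback, the caller must not touch the payload, and whatever the
callback refers to must still exist when it runs.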

> I am interested in the hidden service crypto parallelism in its own right,
> but I was also thinking whether it would be a feasible idea to combine the
> two projects and create a multi-threaded decryption library that could be
> linked to both the tor relay and the hidden services (could release it as a
> cryptosystem library, all the fully homomorphic cryptosystem libraries I found
> used GPL licenses and thus not compatible with Tor's), or are their
> requirements too far apart?

So, the "hidden service" crypto in question here is a set of public
key operations.  But having a separate library for this probably isn't
the right design, IMO.  The idea is not to split _each operation_
across multiple CPU cores, but rather to handle _multiple operations_
by doing them on multiple cores.  The code for each individual
operation could remain single-threaded.

If you'd like to see an example of how we do this in Tor today for our
server-side circuit extension handshakes, have a look at workqueue.c
and onion.c in the Tor source code, to get a sense of how they work
together.  You'll notice that the crypto operations themselves are
handled in regular single-threaded code (e.g., in onion_ntor.c), and that
the parallelism happens on a higher level than a single crypto
operation.

> Also I was wondering how the Ahmia automated blacklisting was planned to
> work?

This isn't something I've been working on; maybe somebody else can
answer this question.  (Ahmia folks?)

best wishes,
-- 
Nick