[tor-talk] High-latency hidden services

The Doctor drwho at virtadpt.net
Tue Jul 8 20:21:34 UTC 2014


On 07/03/2014 03:16 PM, Seth David Schoen wrote:

> That's great, but in the context of this thread I would want to
> imagine a future-generation version that does a much better job of
> hiding who is downloading which pages -- by high-latency mixing,
> like an anonymous remailer chain.

I realized that too late; thank you for pointing that out.

I've been thinking a bit about this lately.

A while back I chanced across a description of how Richard Stallman
browses the Net much of the time.  He uses a Perl script which is
executed by Postfix via an e-mail alias.  If the sender's e-mail
address matches one hardcoded in the config file, the script parses
the e-mail for URLs, downloads each one with LWP::UserAgent, and
e-mails the results back to the script's owner.
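A rough Python analogue of that mechanism (the sender address is an
assumption standing in for the hardcoded config value, and the
mail-back step is stubbed out; the original uses Perl and
LWP::UserAgent) might look like:

```python
import email
import re
import urllib.request

# Assumption: the allowed sender is hardcoded, as in the Perl original.
ALLOWED_SENDER = "user@example.com"
URL_RE = re.compile(r"https?://[^\s>\"]+")

def extract_urls(raw_message):
    """Parse a raw e-mail and return any URLs found in the body, but
    only if the From: header matches the hardcoded sender."""
    msg = email.message_from_string(raw_message)
    if msg.get("From", "") != ALLOWED_SENDER:
        return []
    body = msg.get_payload()
    return URL_RE.findall(body) if isinstance(body, str) else []

def fetch(url):
    """Stand-in for the LWP::UserAgent step; a real version would
    e-mail the downloaded content back to the script's owner."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

Postfix would pipe the raw message to the script's stdin via the
alias, so the glue around these functions is a few lines at most.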

The Git repo with the implementation:


So... I've been toying with this idea but haven't had time to sit down
and implement it yet:

It would be possible to write a relatively simple utility that runs as
a hidden service; perhaps on the user's (virtual) machine, perhaps on
a known Tor hidden service node.  Perhaps it doesn't use a hidden
service for itself but only listens on the loopback interface on a
high port, and the user connects to http://localhost:9393/ from within
the TBB.  Perhaps any of those options, selectable via a command-line
switch or configuration file setting.  The user connects to the
application and types or pastes a URL into a field.  The utility
accepts the URL, verifies that it's a well formed URL, and records it
internally, perhaps in a queue.  Every once in a while, on a
pseudorandom basis (computers, 'true' randomness, we've all seen the
mailing list threads), the utility wakes up, picks the oldest URL out
of its queue, and tries to download whatever it points to through the
Tor network.
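A minimal sketch of that validate-queue-and-wake loop (the wait
bounds, port, and SOCKS proxy details are assumptions, not part of
the original idea):

```python
import queue
import random
import time
import urllib.parse

def is_well_formed(url):
    """Accept only absolute http(s) URLs that actually name a host."""
    parts = urllib.parse.urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

class UrlQueue:
    """FIFO of URLs to fetch; the oldest entry comes out first."""
    def __init__(self):
        self._q = queue.Queue()

    def submit(self, url):
        if not is_well_formed(url):
            return False
        self._q.put(url)
        return True

    def next_url(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

def fetch_loop(urls, fetch, min_wait=30.0, max_wait=600.0):
    """Wake on a pseudorandom schedule and fetch the oldest queued URL.
    `fetch` would do the actual download through Tor's SOCKS port,
    e.g. a requests session configured with
    proxies={'http': 'socks5h://127.0.0.1:9050', ...}."""
    while True:
        time.sleep(random.uniform(min_wait, max_wait))
        url = urls.next_url()
        if url is not None:
            fetch(url)
```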

If it successfully acquires an HTML page it could then attempt to
parse it (using something like Beautiful Soup, maybe) to verify that
it was a fully downloaded and validated HTML page.  It would also pick
through the parsed tags for things like CSS or images, construct URLs
to download them using the original URL (if no full URLs to them are
in the HTML), and add them to the queue of things to get.  It doesn't
seem unreasonable to rewrite the HTML to make links to those
additional resources local instead of remote (./css/foo.css instead
of http://example.com/css/foo.css) so the downloaded copies would be
the ones referenced by the browser.  It also doesn't seem unreasonable that a particular
instance of this utility could be configured to ignore certain kinds
of resources (no .js files, no images, no CSS files) and snip tags
that reference them from the HTML entirely.  When the resources for
the page in question are fully downloaded (none are left in the queue)
the user is alerted somehow (which suggests a personal application but
there are other ways of notifying users).
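That parse-and-enqueue step could look something like the sketch
below.  It uses the standard library's html.parser instead of
Beautiful Soup so it stands alone, and the tag/attribute table is
only a starting point:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs that reference sub-resources worth queuing.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}

class ResourceCollector(HTMLParser):
    """Walk a downloaded page and collect absolute URLs of the extra
    resources (CSS, images, scripts) it references."""
    def __init__(self, base_url, ignore=()):
        super().__init__()
        self.base_url = base_url
        self.ignore = tuple(ignore)  # e.g. (".js",) to skip JavaScript
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name != wanted or not value:
                continue
            # Resolve relative references against the page's own URL.
            url = urljoin(self.base_url, value)
            if self.ignore and url.endswith(self.ignore):
                continue
            self.resources.append(url)
```

Everything in `resources` would go back into the download queue, and
the same tag table tells you which references to snip from the HTML
when a resource type is configured to be ignored.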

The timeframe for downloading an entire page could be extremely
long: from seconds between requests, to requiring a new circuit for
each request, to weeks or months to grab an entire page.
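For the new-circuit-per-request end of that spectrum, one approach
(assuming a stock Tor client with the default stream isolation
settings) is to hand each request unique SOCKS5 credentials, since
Tor isolates streams presenting different credentials onto separate
circuits:

```python
def isolated_proxies(request_id):
    """Build a per-request proxy mapping for, say, a requests session.
    Tor isolates streams that present different SOCKS5 credentials
    onto separate circuits (IsolateSOCKSAuth is on by default), so a
    unique username per request effectively asks for a fresh circuit.
    9050 is Tor's default SOCKS port; the credential scheme here is
    just an illustrative convention."""
    proxy = "socks5h://req-%s:x@127.0.0.1:9050" % request_id
    return {"http": proxy, "https": proxy}
```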

I don't know if such a thing could be written as a distributed
application (lots of instances of this utility spread across a
percentage of the Tor network keeping each other apprised of bits and
pieces of web pages to download and send someplace).  I'll admit that
I've never tried to write such a thing before.  The security profile
of such a thing would certainly be a concern.

Representing each page and its resources in memory would take a little
doing but is far from impossible.  Depending on the user's threat
model it may not be desirable to cache the page+resources on disk
(holding them in RAM but making them accessible to the web browser,
say, with a simple HTTP server listening on the loopback on a high
port (I'm thinking instead of http://localhost:9393/ the user would
access http://localhost:9393/pages/foo)), or the user may be
comfortable with creating a subdirectory to hold the resources of a
single page.  This is the technique that Scrapbook uses, and besides
being proven workable it seems very easy to implement:

~/.mozilla/firefox/<profile name>/Scrapbook/data/<datestamp>/<web page
and all resources required to view it stored here in a single directory>
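A sketch of that one-directory-per-page layout (the root path and
datestamp format are assumptions, mirroring Scrapbook's
data/<datestamp>/ convention):

```python
import pathlib

def page_dir(root, when):
    """Directory holding one page plus everything needed to view it,
    named by a Scrapbook-style datestamp."""
    return pathlib.Path(root) / when.strftime("%Y%m%d%H%M%S")

def save_resource(root, when, filename, data):
    """Write one downloaded resource into the page's directory,
    creating the directory on first use."""
    d = page_dir(root, when)
    d.mkdir(parents=True, exist_ok=True)
    path = d / filename
    path.write_bytes(data)
    return path
```

For the RAM-only variant, the same mapping (datestamp to filename to
bytes) could live in a dict served by the loopback HTTP server
instead of ever touching disk.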

A problem that would probably arise is Tor circuits dropping at odd
intervals due to the phase of the moon, Oglogoth grinding its teeth,
sunspots, or whatever and the connection timing out or dropping.  I'm
not sure how to handle this yet.  Another potential problem is a user
browsing a slowly downloaded page and clicking a link, which the
browser would then fetch directly, bypassing the slow-download
mechanism entirely.  Warn the user this will happen?  Rewrite or remove the
links?  I'm not sure yet what the Right Thing To Do(tm) would be.
There are undoubtedly other gotchas that I haven't thought of or run
into yet which others will notice immediately.

- -- 
The Doctor [412/724/301/703] [ZS]
Developer, Project Byzantium: http://project-byzantium.org/

PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F  DD89 3BD8 FF2B 807B 17C1
WWW: https://drwho.virtadpt.net/

"So many paths lead to tomorrow/what love has lost, can you forgive?"
- --The Cruxshadows


