[tor-talk] High-latency hidden services

Aymeric Vitte vitteaymeric at gmail.com
Wed Jul 9 08:38:14 UTC 2014

According to your description, you intend to reconstitute the page, 
possibly removing whatever could be dangerous. This is very difficult 
to do (assuming that you want the page to behave like a real one and 
not like opening something similar to offline/mypage.html from your 
disk, and assuming that you want to use the browsers as they are, i.e. 
without plugins/extensions and without hacking into the code). I have 
described how it can be done in [1].

But if, in the end, the interesting information is some resources to be 
fetched from this page, then [2] applies and is by far much easier to do.

You can look at [3] to [6], which are projects that fetch/parse a page 
on the server side (headless browser, handling js too) and extract 
things from it. The same principles apply on the browser side for what 
people want to do here; when the fetching is coupled with [7] it 
provides anonymity, whether on the browser or the server side.

[1] https://lists.torproject.org/pipermail/tor-talk/2014-July/033636.html
[2] https://lists.torproject.org/pipermail/tor-talk/2014-July/033697.html
[3] https://github.com/Ayms/node-dom
[4] https://github.com/Ayms/node-bot
[5] https://github.com/Ayms/node-gadgets
[6] https://github.com/Ayms/node-googleSearch
[7] https://github.com/Ayms/node-Tor

On 08/07/2014 22:21, The Doctor wrote:
> On 07/03/2014 03:16 PM, Seth David Schoen wrote:
>> That's great, but in the context of this thread I would want to
>> imagine a future-generation version that does a much better job of
>> hiding who is downloading which pages -- by high-latency mixing,
>> like an anonymous remailer chain.
> I realized that too late; thank you for pointing that out.
> I've been thinking a bit about this lately, and I think it might be
> doable.
> A while back I chanced across a description of how Richard Stallman
> browses the Net much of the time.  He uses a Perl script which is
> executed by Postfix via an e-mail alias.  If the sender's e-mail
> address matches one hardcoded in the config file, it parses the e-mail
> for URLs to grab and then uses LWP::UserAgent to download the URL and
> e-mail it back to the script's owner.
> The Git repo with the implementation:
> git://git.gnu.org/womb/hacks.git
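
For reference, a toy sketch of the same mail-driven idea, in Python 
rather than Perl (this is not the script from that repo; the addresses 
are placeholders, and it assumes the MTA pipes the raw message to the 
script on stdin):

    #!/usr/bin/env python3
    # Toy mail-driven fetcher: read one message from stdin and, if it
    # comes from the allowed sender, mail back the body of each URL found.
    import email
    import re
    import smtplib
    import sys
    from email.message import EmailMessage
    from urllib.request import urlopen

    ALLOWED_SENDER = "owner@example.org"   # placeholder, normally in a config file

    msg = email.message_from_bytes(sys.stdin.buffer.read())
    if ALLOWED_SENDER not in msg.get("From", ""):
        sys.exit(0)                        # silently ignore everyone else

    body = msg.get_payload(decode=True) or b""
    for url in re.findall(rb"https?://\S+", body):
        reply = EmailMessage()
        reply["From"] = "fetcher@example.org"
        reply["To"] = ALLOWED_SENDER
        reply["Subject"] = "Fetched: " + url.decode()
        reply.set_content(urlopen(url.decode()).read().decode(errors="replace"))
        smtplib.SMTP("localhost").send_message(reply)
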
> So... I've been toying with this idea but haven't had time to sit down
> and implement it yet:
> It would be possible to write a relatively simple utility that runs as
> a hidden service; perhaps on the user's (virtual) machine, perhaps on
> a known Tor hidden service node.  Perhaps it doesn't use a hidden
> service for itself but only listens on the loopback interface on a
> high port, and the user connects to http://localhost:9393/ from within
> the TBB.  Perhaps any of those options, dependent upon a command line
> switch or configuration file setting.  The user connects to the
> application and types or pastes a URL into a field.  The utility
> accepts the URL, verifies that it's a well formed URL, and records it
> internally, perhaps in a queue.  Every once in a while on a
> pseudorandom basis (computers, 'true' randomness, we've all seen the
> mailing list threads) the utility wakes up, picks the oldest URL in
> its queue out, and tries to download whatever it points to through the
> Tor network.
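
A sketch of that accept/queue/wake-up loop, assuming a local Tor client 
exposing its SOCKS port on 9050 and the requests library installed with 
SOCKS support (requests[socks]); the queue here is just in memory, and 
handle_page() is the parsing step sketched further down:

    import queue
    import random
    import time

    import requests

    # socks5h: let Tor do the DNS resolution too.
    TOR_PROXY = {"http": "socks5h://127.0.0.1:9050",
                 "https": "socks5h://127.0.0.1:9050"}

    url_queue = queue.Queue()

    def enqueue(url):
        # Very rough well-formedness check before queueing.
        if url.startswith(("http://", "https://")):
            url_queue.put(url)

    def fetch_loop():
        while True:
            # Sleep a pseudorandom interval between requests.
            time.sleep(random.uniform(30, 600))
            try:
                url = url_queue.get_nowait()   # oldest URL first (FIFO)
            except queue.Empty:
                continue
            try:
                resp = requests.get(url, proxies=TOR_PROXY, timeout=120)
                handle_page(url, resp)         # parse and queue its resources
            except requests.RequestException:
                url_queue.put(url)             # crude retry: put it back last
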
> If it successfully acquires an HTML page it could then attempt to
> parse it (using something like Beautiful Soup, maybe) to verify that
> it was a fully downloaded and validated HTML page.  It would also pick
> through the parsed tags for things like CSS or images, construct URLs
> to download them using the original URL (if no full URLs to them are
> in the HTML), and add them to the queue of things to get.  It doesn't
> seem unreasonable to rewrite the HTML to make links to those
> additional resources local instead of remote (./css/foo.css instead of
> css/foo.css) so the additional files downloaded would be referenced by
> the browser.  It also doesn't seem unreasonable that a particular
> instance of this utility could be configured to ignore certain kinds
> of resources (no .js files, no images, no CSS files) and snip tags
> that reference them from the HTML entirely.  When the resources for
> the page in question are fully downloaded (none are left in the queue)
> the user is alerted somehow (which suggests a personal application but
> there are other ways of notifying users).
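
A rough sketch of that parsing step with Beautiful Soup, reusing the 
enqueue() helper above; urljoin() turns relative references into 
absolute URLs, SKIP_EXTENSIONS stands in for the "ignore certain kinds 
of resources" setting, and store() is a placeholder for wherever the 
rewritten page ends up (one option is sketched further down):

    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    SKIP_EXTENSIONS = (".js",)    # e.g. drop scripts entirely

    def handle_page(base_url, resp):
        soup = BeautifulSoup(resp.text, "html.parser")
        # Collect stylesheet, image and script references.
        for el in soup.find_all(["link", "img", "script"]):
            attr = "href" if el.name == "link" else "src"
            ref = el.get(attr)
            if not ref:
                continue
            absolute = urljoin(base_url, ref)
            if absolute.lower().endswith(SKIP_EXTENSIONS):
                el.decompose()        # snip the tag from the HTML entirely
                continue
            enqueue(absolute)         # fetch it later, through Tor
            # Point the tag at the local copy that will be downloaded.
            el[attr] = "./" + absolute.rsplit("/", 1)[-1]
        store(base_url, soup.encode())
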
> The timeframe in which an entire page could be downloaded could be
> extremely long, from seconds between requests, to requiring a new
> circuit for each request, to even weeks or months to grab an entire page.
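
For the "new circuit for each request" end of that spectrum, the Tor 
control port can be asked for fresh circuits; a sketch with the stem 
library, assuming ControlPort 9051 is enabled and cookie authentication 
is set up (neither of which the description above requires):

    from stem import Signal
    from stem.control import Controller

    def new_circuits():
        # Ask the local tor daemon to use clean circuits for new
        # connections; tor rate-limits NEWNYM to roughly one per 10s.
        with Controller.from_port(port=9051) as controller:
            controller.authenticate()      # cookie auth, or password=...
            controller.signal(Signal.NEWNYM)
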
> I don't know if such a thing could be written as a distributed
> application (lots of instances of this utility spread across a
> percentage of the Tor network keeping each other apprised of bits and
> pieces of web pages to download and send someplace).  I'll admit that
> I've never tried to write such a thing before.  The security profile
> of such a thing would certainly be a concern.
> Representing each page and its resources in memory would take a little
> doing but is far from impossible.  Depending on the user's threat
> model it may not be desirable to cache the page+resources on disk
> (holding them in RAM but making them accessible to the web browser,
> say, with a simple HTTP server listening on the loopback on a high
> port (I'm thinking instead of http://localhost:9393/ the user would
> access http://localhost:9393/pages/foo)), or the user may be
> comfortable with creating a subdirectory to hold the resources of a
> single page.  This is the technique that Scrapbook uses, and aside
> from being workable it seems very easy to implement:
> ~/.mozilla/firefox/<profile name>/Scrapbook/data/<datestamp>/<web page
> and all resources required to view it stored here in a single directory>
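
A minimal sketch of the in-RAM variant, assuming pages are kept in a 
plain dict (filled by the store() placeholder above) and served only on 
the loopback interface:

    import http.server

    PAGES = {}    # name (or base URL) -> bytes, never written to disk

    def store(name, content):
        PAGES[name] = content

    class PageHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Serve /pages/<name> straight from memory.
            name = self.path.removeprefix("/pages/")
            if name in PAGES:
                self.send_response(200)
                self.send_header("Content-Type", "text/html; charset=utf-8")
                self.end_headers()
                self.wfile.write(PAGES[name])
            else:
                self.send_error(404)

    http.server.HTTPServer(("127.0.0.1", 9393), PageHandler).serve_forever()
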
> A problem that would probably arise is Tor circuits dropping at odd
> intervals due to the phase of the moon, Oglogoth grinding its teeth,
> sunspots, or whatever and the connection timing out or dropping.  I'm
> not sure how to handle this yet.  Another potential problem is a user
> browsing a slowly downloaded page and clicking a link, which the
> browser would then jump directly to, avoiding the slow download
> entirely.  Warn the user this will happen?  Rewrite or remove the
> links?  I'm not sure yet what the Right Thing To Do(tm) would be.
> There are undoubtedly other gotchas that I haven't thought of or run
> into yet which others will notice immediately.
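
One possible answer to the link problem above (only one of the options 
mentioned, and purely illustrative): neutralize outbound links at parse 
time so that a click cannot bypass the slow fetcher:

    from bs4 import BeautifulSoup

    def neutralize_links(html):
        # Point every <a href=...> at a fragment so clicking it stays on
        # the local copy; keep the original target in a data attribute.
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=True):
            a["data-original-href"] = a["href"]
            a["href"] = "#"
        return soup.encode()
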
> -- 
> The Doctor [412/724/301/703] [ZS]
> Developer, Project Byzantium: http://project-byzantium.org/
> PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F  DD89 3BD8 FF2B 807B 17C1
> WWW: https://drwho.virtadpt.net/
> "So many paths lead to tomorrow/what love has lost, can you forgive?"
> --The Cruxshadows

Peersm : http://www.peersm.com
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
