[tor-dev] HSDir Auth and onion descriptor scraping

Gareth Owen gareth.owen at port.ac.uk
Mon Nov 10 10:25:11 UTC 2014


Grarpamp,

The only reason I'm not publishing it is privacy concerns: ultimately, some
HS operators might not wish to have their existence publicly known.  I
would be open to supplying it to bona fide and verifiable Tor Project
members if it is for a legitimate research purpose.

I am collecting version 2 descriptors.  I have exactly 445994 hidden
service descriptors, for approximately 70,000 unique hidden services.  I
do not believe the introduction points are secret; having a list of IPs
doesn't help you connect to the hidden service.
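
For reference, the authenticated vs. non-authenticated count discussed in
the quoted message below could be sketched roughly as follows.  This is
only a sketch under stated assumptions, not the script actually run on the
dataset: it assumes each file holds one version 2 descriptor, that the
introduction point data sits between "-----BEGIN MESSAGE-----" and
"-----END MESSAGE-----" markers, and that the unauthenticated form decodes
to text beginning with "introduction-point ".  The function names
classify_descriptor and count_descriptors are hypothetical.

```python
import base64
import binascii
import re
from pathlib import Path

# Assumption: the introduction point data is wrapped in MESSAGE markers.
MESSAGE_RE = re.compile(
    r"-----BEGIN MESSAGE-----\s*(.*?)\s*-----END MESSAGE-----", re.DOTALL
)


def classify_descriptor(text):
    """Return 'unauthed', 'authed', or 'unknown' for one descriptor's text."""
    match = MESSAGE_RE.search(text)
    if match is None:
        return "unknown"  # no introduction point block found
    try:
        # Strip line breaks inside the base64 body before decoding.
        payload = base64.b64decode("".join(match.group(1).split()))
    except (binascii.Error, ValueError):
        return "unknown"  # block is not valid base64
    if not payload:
        return "unknown"  # empty block, nothing to classify
    # Assumption: plaintext introduction points start with this keyword;
    # anything else is treated as the high-entropy (authenticated) form.
    if payload.startswith(b"introduction-point "):
        return "unauthed"
    return "authed"


def count_descriptors(directory):
    """Tally the classification of every regular file in `directory`."""
    counts = {"unauthed": 0, "authed": 0, "unknown": 0}
    for path in Path(directory).iterdir():
        if path.is_file():
            text = path.read_text(errors="replace")
            counts[classify_descriptor(text)] += 1
    return counts
```

Calling count_descriptors("descriptors/") (a placeholder path) would return
the three tallies.  The check is a simple prefix test rather than a real
entropy measurement, so anything that is neither plaintext nor parseable
ends up in "authed" or "unknown" and would need a manual look.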

Best
Gareth

On 9 November 2014 23:39, grarpamp <grarpamp at gmail.com> wrote:

> On Sun, Nov 9, 2014 at 3:22 PM, Gareth Owen <gareth.owen at port.ac.uk> wrote:
> > I have several hundred thousand (or million? Haven't counted) hs descriptors
> > saved on my hard disk from a data collection experiment (from 70k HSes).
> > I'm a bit nervous about sharing these en masse as whilst not confidential
> > they're supposed to be difficult to obtain in this quantity.  However, if
> > someone wants to write a quick script that goes through all of them and
> > counts the number of authenticated vs nonauthed then I do not mind running
> > it on the dataset and publishing the results.  I have a directory where each
> > file is a hs descriptor.
> >
> > The introduction point data is base64 encoded plaintext when unauthed or has
> > high entropy otherwise.
>
> What version descriptors are you collecting?
>
> There are a few reports I could think to run against your dataset, even if
> the IntroPoints were replaced with 127.0.0.n (n set to 1, 2, 3, n for each
> IntroPoint in respective descriptors list)... or even 1:1 mapped for all
> descriptors either a) randomly into a new parallel IPv4/IPv6 space (dot-quad),
> or b) serially into a respective 32 or 128 bit number (not dot-quad).
>
> Whether on or off list I could use your collection patches, and a raw
> sample of a single recent on disk descriptor from a public service such as
> hbjw7wjeoltskhol or kpvz7ki2v5agwt35 so we know your data format.
>
> It's effectively public info anyways, I'll get to it sooner or later, others
> already have.
> _______________________________________________
> tor-dev mailing list
> tor-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
>



-- 
Dr Gareth Owen
Senior Lecturer
Forensic Computing Course Leader
School of Computing, University of Portsmouth

*Office:* BK1.25
*Tel:* +44 (0)2392 84 (6423)
*Web*: ghowen.me