HSDir Auth and onion descriptor scraping

George K: I suspect that HS authorization is very rare in the current network, and if we believe it's a useful tool, it might be worthwhile to make it more useable by people.
Is anyone making their HSDir onion descriptor scraping patches available somewhere? I'd suspect the rarity of HS authorization could also be determined with that since some fields would be obfuscated and thus not match patterns.
s/scraping/logging/
rend-spec.txt:
2. Authentication and authorization.
2.1. Service with large-scale client authorization
2.2. Authorization for limited number of clients
2.3. Hidden service configuration
2.4. Client configuration

I have several hundred thousand (or million? Haven't counted) HS descriptors saved on my hard disk from a data collection experiment (from 70k HSes). I'm a bit nervous about sharing these en masse as, whilst not confidential, they're supposed to be difficult to obtain in this quantity. However, if someone wants to write a quick script that goes through all of them and counts the number of authenticated vs. non-authenticated descriptors, then I do not mind running it on the dataset and publishing the results. I have a directory where each file is an HS descriptor.
The introduction point data is base64-encoded plaintext when unauthenticated and has high entropy otherwise.
Best
Gareth
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
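The counting script asked for above could look roughly like the following. This is a minimal sketch, not anyone's actual tooling: it assumes each file is a raw version 2 descriptor with LF or CRLF line endings, that the introduction point data sits in an `introduction-points` field armored between `-----BEGIN MESSAGE-----` and `-----END MESSAGE-----`, and that unencrypted data decodes to text starting with `introduction-point `. The function names and the category labels are made up for illustration, and anything that does not decode to the plaintext form is simply counted as authenticated, which would also catch corrupted descriptors.

```python
import base64
import re
from collections import Counter
from pathlib import Path

# Armored body of the "introduction-points" field (assumed v2 layout).
INTRO_RE = re.compile(
    r"^introduction-points[ \t]*\r?\n-----BEGIN MESSAGE-----\r?\n(.*?)-----END MESSAGE-----",
    re.DOTALL | re.MULTILINE,
)


def classify_descriptor(text):
    """Return a rough label for one descriptor (labels are illustrative)."""
    match = INTRO_RE.search(text)
    if match is None:
        return "no-intro-points"
    try:
        # Strip line breaks inside the armor before decoding.
        blob = base64.b64decode("".join(match.group(1).split()))
    except ValueError:
        return "malformed"
    if not blob:
        return "no-intro-points"
    # Unencrypted data is readable text; encrypted data looks random.
    if blob.startswith(b"introduction-point "):
        return "unauthenticated"
    return "authenticated"


def count_directory(directory):
    """Tally labels over a directory holding one descriptor per file."""
    counts = Counter()
    for path in Path(directory).iterdir():
        if path.is_file():
            counts[classify_descriptor(path.read_text(errors="replace"))] += 1
    return counts
```

Running `count_directory("descriptors/")` (the directory name is a placeholder) returns a `Counter` keyed by those labels. Since the dataset holds several descriptors per service, the totals count descriptors, not unique hidden services.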

On Sun, Nov 9, 2014 at 3:22 PM, Gareth Owen <gareth.owen@port.ac.uk> wrote:
I have several hundred thousand (or million? Haven't counted) hs descriptors saved on my hard disk from a data collection experiment (from 70k HSes). I'm a bit nervous about sharing these en masse as whilst not confidential they're supposed to be difficult to obtain in this quantity. However, if someone wants to write a quick script that goes through all of them and counts the number of authenticated vs nonauthed then I do not mind running it on the dataset and publishing the results. I have a directory where each file is a hs descriptor.
The introduction point data is base64-encoded plaintext when unauthed or has high entropy otherwise.
What version descriptors are you collecting?
There are a few reports I could think to run against your dataset, even if the IntroPoints were replaced with 127.0.0.n (n set to 1, 2, 3, ..., n for each IntroPoint in the respective descriptor's list), or even 1:1 mapped for all descriptors either a) randomly into a new parallel IPv4/IPv6 space (dot-quad), or b) serially into a respective 32- or 128-bit number (not dot-quad).
Whether on or off list, I could use your collection patches, and a raw sample of a single recent on-disk descriptor from a public service such as hbjw7wjeoltskhol or kpvz7ki2v5agwt35 so we know your data format.
It's effectively public info anyway; I'll get to it sooner or later, and others already have.
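One possible reading of option (b), the serial 1:1 mapping, is sketched below. It is an illustration only: the class name `SerialAddressMapper` and its method are hypothetical, it assumes the introduction point addresses have already been extracted as strings, and it keeps separate counters for IPv4 and IPv6 so each pseudonym stays within its own 32-bit or 128-bit range.

```python
import ipaddress


class SerialAddressMapper:
    """Assign each distinct address a serial number, stable across descriptors."""

    def __init__(self):
        # One table per IP version so IPv4 and IPv6 are numbered separately.
        self._tables = {4: {}, 6: {}}

    def pseudonym(self, address):
        key = ipaddress.ip_address(address)
        table = self._tables[key.version]
        # The first sighting gets the next free number; repeats reuse it.
        return table.setdefault(key, len(table) + 1)


mapper = SerialAddressMapper()
first = mapper.pseudonym("192.0.2.10")
second = mapper.pseudonym("192.0.2.11")
again = mapper.pseudonym("192.0.2.10")  # same number as `first`
```

Because the same address always maps to the same number, reports that compare introduction points across descriptors would still work. The numbering follows the order in which addresses are seen, so shuffling the input first avoids leaking collection order.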

Grarpamp,
I'm only not publishing it because of privacy concerns: ultimately, some HS operators might not wish to have their existence publicly known. I would be open to supplying it to bona fide and verifiable Tor Project members if it is for a legitimate research purpose.
I am collecting version 2 descriptors. I have exactly 445994 hidden service descriptors, for approximately 70,000 unique hidden services. I do not believe the introduction points are secret; having a list of IPs doesn't help you connect to the hidden service.
Best
Gareth
--
Dr Gareth Owen
Senior Lecturer
Forensic Computing Course Leader
School of Computing, University of Portsmouth
Office: BK1.25
Tel: +44 (0)2392 84 (6423)
Web: ghowen.me

Gareth Owen <gareth.owen@port.ac.uk> writes:
Grarpamp
I'm only not publishing it because of privacy concerns: ultimately, some HS operators might not wish to have their existence publicly known. I would be open to supplying it to bona fide and verifiable Tor Project members if it is for a legitimate research purpose.
I am collecting version 2 descriptors. I have exactly 445994 hidden service descriptors, for approximately 70,000 unique hidden services. I do not believe the introduction points are secret; having a list of IPs doesn't help you connect to the hidden service.
From the number of introduction points you might be able to deduce the popularity of the hidden service. Fortunately, this feature doesn't work very well: https://trac.torproject.org/projects/tor/ticket/8950
Participants (3): Gareth Owen, George Kadianakis, grarpamp