George K: I suspect that HS authorization is very rare in the current network, and if we believe it's a useful tool, it might be worthwhile to make it more usable by people.
Is anyone making their HSDir onion descriptor scraping patches available somewhere? I suspect the rarity of HS authorization could also be measured that way, since some fields would be encrypted and thus not match the usual patterns.
s/scraping/logging/
rend-spec.txt:
2. Authentication and authorization
2.1. Service with large-scale client authorization
2.2. Authorization for limited number of clients
2.3. Hidden service configuration
2.4. Client configuration
I have several hundred thousand (or a million? I haven't counted) HS descriptors saved on my hard disk from a data collection experiment (covering 70k HSes). I'm a bit nervous about sharing these en masse because, whilst not confidential, they're supposed to be difficult to obtain in this quantity. However, if someone wants to write a quick script that goes through all of them and counts authenticated vs. unauthenticated descriptors, I don't mind running it on the dataset and publishing the results. I have a directory where each file is one HS descriptor.
The introduction point data is base64-encoded plaintext when the service is unauthenticated, and has high entropy otherwise.
Best Gareth
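The check Gareth describes can be sketched as a short Python script. This is a minimal, untested sketch that assumes each file holds one raw v2 descriptor in which the "introduction-points" keyword is followed by a "-----BEGIN MESSAGE-----" ... "-----END MESSAGE-----" base64 block. Unauthenticated blocks decode to text starting with "introduction-point ". For encrypted blocks, the leading authorization-type byte (1 = basic, 2 = stealth) reflects my reading of rend-spec sections 2.1 and 2.2 and should be checked against the spec. The entropy fallback and its 7.0 bits/byte threshold are hypothetical, following the "high entropy" remark above.

```python
import base64
import math
import sys
from collections import Counter
from pathlib import Path

BEGIN = "-----BEGIN MESSAGE-----"
END = "-----END MESSAGE-----"


def intro_points_blob(text):
    """Return the base64-decoded 'introduction-points' block, or None if absent."""
    seen_keyword = in_block = False
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "introduction-points":
            seen_keyword = True
        elif seen_keyword and line == BEGIN:
            in_block = True
        elif in_block and line == END:
            break
        elif in_block:
            chunks.append(line)
    if not chunks:
        return None
    return base64.b64decode("".join(chunks))


def shannon_entropy(data):
    """Bits per byte: close to 8 for large ciphertexts, much lower for ASCII text."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def classify(text):
    blob = intro_points_blob(text)
    if blob is None:
        return "no-intro-points"
    if blob.startswith(b"introduction-point "):
        return "plaintext"  # unauthenticated: intro points readable by anyone
    # Assumption (rend-spec 2.1/2.2 as I read it): an encrypted block starts
    # with an authorization-type byte, 1 = basic, 2 = stealth.
    if blob[:1] == b"\x01":
        return "basic-auth"
    if blob[:1] == b"\x02":
        return "stealth-auth"
    # Hypothetical fallback threshold, per the "high entropy" heuristic.
    return "high-entropy" if shannon_entropy(blob) > 7.0 else "unknown"


def main(directory):
    counts = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            counts[classify(path.read_text(errors="replace"))] += 1
    for label, n in counts.most_common():
        print(f"{label}\t{n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as, e.g., python count_hs_auth.py /path/to/descriptor-dir (the file name is hypothetical). It prints one tab-separated count per category, so only aggregate numbers leave the machine holding the dataset.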
On 9 Nov 2014, at 19:06, grarpamp grarpamp@gmail.com wrote:
rend-spec.txt:
2. Authentication and authorization
2.1. Service with large-scale client authorization
2.2. Authorization for limited number of clients
2.3. Hidden service configuration
2.4. Client configuration
_______________________________________________
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
On Sun, Nov 9, 2014 at 3:22 PM, Gareth Owen gareth.owen@port.ac.uk wrote:
I have several hundred thousand (or a million? I haven't counted) HS descriptors saved on my hard disk from a data collection experiment (covering 70k HSes). I'm a bit nervous about sharing these en masse because, whilst not confidential, they're supposed to be difficult to obtain in this quantity. However, if someone wants to write a quick script that goes through all of them and counts authenticated vs. unauthenticated descriptors, I don't mind running it on the dataset and publishing the results. I have a directory where each file is one HS descriptor.
The introduction point data is base64-encoded plaintext when the service is unauthenticated, and has high entropy otherwise.
What version descriptors are you collecting?
There are a few reports I could think to run against your dataset, even if the IntroPoints were replaced with 127.0.0.n (n set to 1, 2, 3, ... n for each IntroPoint in the respective descriptor's list)... or even mapped 1:1 across all descriptors, either a) randomly into a new parallel IPv4/IPv6 space (dot-quad) or b) serially into a respective 32- or 128-bit number (not dot-quad).
Whether on or off list, I could use your collection patches and a raw sample of a single recent on-disk descriptor from a public service such as hbjw7wjeoltskhol or kpvz7ki2v5agwt35, so we know your data format.
It's effectively public info anyway; I'll get to it sooner or later, and others already have.
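The three anonymization options for IntroPoint addresses can be sketched in Python with only the standard ipaddress and random modules. The function and class names are hypothetical, and this is an illustration of the schemes as described, not a vetted pseudonymization tool.

```python
import ipaddress
import random


def per_descriptor_loopback(intro_ips):
    """Replace the n-th IntroPoint address of ONE descriptor with 127.0.0.n.
    Numbering restarts for every descriptor, so nothing links descriptors."""
    return [f"127.0.0.{n}" for n in range(1, len(intro_ips) + 1)]


class SerialMapper:
    """Option b): map each distinct address, across ALL descriptors, to a
    serial integer 1, 2, 3, ... (not dot-quad)."""

    def __init__(self):
        self._ids = {}

    def __call__(self, ip):
        addr = ipaddress.ip_address(ip)
        if addr not in self._ids:
            self._ids[addr] = len(self._ids) + 1
        return self._ids[addr]


class RandomMapper:
    """Option a): map each distinct address 1:1 onto a random address of the
    same family (IPv4 -> IPv4, IPv6 -> IPv6), consistently across descriptors."""

    def __init__(self, rng=None):
        # OS randomness by default, so the mapping cannot be regenerated from a seed.
        self._rng = rng if rng is not None else random.SystemRandom()
        self._map = {}
        self._used = set()

    def __call__(self, ip):
        addr = ipaddress.ip_address(ip)
        if addr not in self._map:
            family = type(addr)  # IPv4Address or IPv6Address
            while True:
                fake = family(self._rng.getrandbits(addr.max_prefixlen))
                if fake not in self._used:  # keep the mapping one-to-one
                    break
            self._used.add(fake)
            self._map[addr] = fake
        return str(self._map[addr])
```

The design trade-off: the per-descriptor loopback scheme hides which relays recur across descriptors, while the two mappers keep each real address consistent over the whole dataset. That keeps reports such as "how many services share an introduction point" possible, at the cost of revealing that structure.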
Grarpamp
I'm only not publishing it because of privacy concerns: ultimately, some HS operators might not wish to have their existence publicly known. I would be open to supplying it to bona fide, verifiable Tor Project members for a legitimate research purpose.
I am collecting version 2 descriptors. I have exactly 445,994 hidden service descriptors for approximately 70,000 unique hidden services. I do not believe the introduction points are secret: having a list of IPs doesn't help you connect to the hidden service.
Best Gareth
On 9 November 2014 23:39, grarpamp grarpamp@gmail.com wrote:
What version descriptors are you collecting?
Gareth Owen gareth.owen@port.ac.uk writes:
I am collecting version 2 descriptors. I have exactly 445,994 hidden service descriptors for approximately 70,000 unique hidden services. I do not believe the introduction points are secret: having a list of IPs doesn't help you connect to the hidden service.
From the number of introduction points, you might be able to deduce the popularity of the hidden service, since busier services may establish more of them. Fortunately, this feature doesn't work very well: https://trac.torproject.org/projects/tor/ticket/8950
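To check whether introduction-point counts say anything about popularity in a dataset like Gareth's, one could count the "introduction-point" entries in each unauthenticated descriptor and build a histogram. This is a minimal Python sketch that assumes the raw v2 layout discussed in this thread (an "introduction-points" keyword followed by a base64 MESSAGE block). Encrypted blocks are skipped because their list is not readable without the client key. The function names are hypothetical, and given the ticket above, any correlation should be treated with caution.

```python
import base64
import re
from collections import Counter

# Matches the base64 body of the 'introduction-points' MESSAGE block.
_BLOCK = re.compile(
    r"^introduction-points\s*\n-----BEGIN MESSAGE-----\n(.*?)\n-----END MESSAGE-----",
    re.MULTILINE | re.DOTALL,
)


def count_intro_points(descriptor_text):
    """Number of IntroPoints listed in an unauthenticated v2 descriptor,
    or None if the block is missing or encrypted (client authorization)."""
    m = _BLOCK.search(descriptor_text.replace("\r\n", "\n"))
    if m is None:
        return None
    blob = base64.b64decode("".join(m.group(1).split()))
    if not blob.startswith(b"introduction-point "):
        return None  # encrypted: the list is not readable here
    return sum(1 for line in blob.splitlines() if line.startswith(b"introduction-point "))


def histogram(descriptors):
    """How many descriptors list 1, 2, 3, ... IntroPoints."""
    return Counter(n for n in map(count_intro_points, descriptors) if n is not None)
```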