[tor-talk] [tor-relays] Is Kaspersky right to be concerned?

grarpamp grarpamp at gmail.com
Wed Mar 12 03:30:42 UTC 2014


> If I understand you correctly, you basically state that their
> data does not match the underlying set.

I erred in threading the link as part of that reply; it was meant
for the prior talk on what a casual surfer/reporter might see and
surmise about hidden services.

The paper... is good. But it may not be the whole picture that a
casual reader of it might believe it to be, either.

> they used public *and* private onions

Given current onion discovery mechanics as in the paper, public vs.
private is defined largely by the access restrictions operators put
up, not by whether they posted the address somewhere or not.
(descriptor-cookie and stealth auth may have more to say about
that.)


Revisiting categorization...

- They find 44% which some might call 'bad', at least for such broad
definitions of bad as 'adult, drugs, counterfeit, weapons'. That
means there is 56% good, for those definitions. Of that 44% bad...

a - They don't break down the 17% 'adult' further into what is commonly
divided as legal/good/commercial (18+) and illegal/bad (underage).

b - If you surf around, a large share of the other 27% appears
to actually be one-page scams and trolls of various kinds. So
other than being in their own category of bogus, they're not really
the genuine 'original bad'.

- The paper is focused on the mechanics and meta level rather than
a deeper editorial content analysis of onionspace. That's its chosen
scope and not a fault. (Opportunity exists for someone to do that
analysis project.)

- However they fail to list any of the 'good' non index/search/host/wiki
services in their top 547 popularity list. Inline they mentioned
some by name (wikileaks, strongbox, etc), by type (politics, games,
libraries), and should have seen other obvious ones that would be
likely to rank (the popular general social and discussion services
can see well over 100 posts/users/connections a day [compare that
to their lowest request counts]). A big table of only bad/neutral
doesn't present fairly. Though the paper doesn't read as having an
anti-good-HS bias, the reason for this table omission is unknown.


Using the paper to claim 'the vast majority of hidden services are
not socially laudable' would be difficult. (Especially considering
that what a lot of people complain about is just free speech that
they don't happen to like/understand and can't easily censor. Big
range between words like illegal, laudable, protected, activism and
so forth.) The last paragraph and a third of their conclusion is
spot on.

As to unseen services...

- When all you have is a TCP port serving http, unless a forward
URL list is published by the admin or users somewhere, what is
effectively your 'GET / request to an IP address' cannot possibly
discover all the virtual hosting and pages any given onion may
serve within.

- Missing is specific mention of many services/protocols we know
for a fact are online in onionspace such as sftp, nntp, xmpp, imap,
smtp, torrent, telnet, onioncat, etc. (Though perhaps generalized
by 'less than 50 each of those ... found 495 unique ports total'.)

- No real mention is made of pairs for which they could identify
the protocol but could not access further due to various user facing
authentication methods. (The '5% ssh' may be a partial start there.)

- There were 15k onions (and their would-be ports) which were offline
during the scans. So the purpose behind their existence remained
unknown. Same for the 1k previously active onion:port pairs offline
during the content phase.

- The scale may not have been large or long enough to collect a
full snapshot of the entire/intermittent onion space. They discuss
scale factors in their former paper; here there is little. Then
only 62% of onions collected were portscanned. And of those, scanning
only reached 87% coverage. Presumably ports 1-65535 were scanned
but stated is only 'full scan'.
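
As a rough back-of-envelope on that last point, the two scan figures
can be combined. This is only a sketch: it assumes the 62% and 87%
are independent fractions that simply multiply, which the paper does
not state, and the function name is made up here. It also ignores
the offline onions entirely.

```python
# Figures as quoted above; how they compose is an assumption,
# not something the paper states.
FRACTION_ONIONS_SCANNED = 0.62  # share of collected onions portscanned
SCAN_COVERAGE = 0.87            # coverage reached within those scans


def effective_coverage(scanned: float, coverage: float) -> float:
    """Share of the collected onion:port space actually probed,
    assuming the two fractions are independent and multiply."""
    if not (0.0 <= scanned <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("fractions must be within [0, 1]")
    return scanned * coverage


probed = effective_coverage(FRACTION_ONIONS_SCANNED, SCAN_COVERAGE)
print(f"probed:   {probed:.1%}")
print(f"unprobed: {1 - probed:.1%}")
```

Under that reading, only a bit over half of the collected space was
probed at all, before counting anything that was offline at the time.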


Regardless of whether you're just manually browsing around the
surface, using the best crawler indexes, or using the methods in
the paper... there's still more riding on top of anonymity networks
than you think, and more than your probe will ever reveal or be
granted access to "see".
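
To make the earlier 'GET /' point concrete, here is a toy model of
name-based virtual hosting behind one onion. Everything in it (the
onion address, the host names, the paths, the content) is made up
for illustration, and it does no networking; it only shows why a
probe that knows nothing but the address sees one page out of many.

```python
from typing import Optional, Set, Tuple

# Hypothetical onion address and content map; none of this is real.
ONION = "abcdefghijklmnop.onion"
SITE = {
    (ONION, "/"): "default landing page",
    (ONION, "/forum/"): "forum, not linked from the landing page",
    ("files.example", "/"): "file archive on another virtual host",
    ("wiki.example", "/"): "wiki on yet another virtual host",
}


def serve(host: str, path: str) -> Optional[str]:
    """Name-based virtual hosting: the answer depends on both the
    Host header and the path, not just the address connected to."""
    return SITE.get((host, path))


def probe(onion: str) -> Set[Tuple[str, str]]:
    """What a scanner can ask for when all it has is the address:
    effectively 'GET /' with the onion itself as Host."""
    found = set()
    if serve(onion, "/") is not None:
        found.add((onion, "/"))
    return found


seen = probe(ONION)
unseen = set(SITE) - seen
print(f"seen by probe: {len(seen)}, never seen: {len(unseen)}")
```

Without a URL list published by the admin or users, the other three
entries stay invisible to the scan no matter how often it runs.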

Just something to keep in mind when covering and balancing what is
out there.


ps: The likely reason they're seeing clients requesting nonexistent
descriptors is because there are other projects polling other
[old/defaced] onion lists. (one answer to pg 3 pp 2).

