On Mon, Jan 12, 2015 at 06:26:14PM +0100, Tom van der Woerdt wrote:
On 12 Jan 2015, at 16:25, Philipp Winter phw@nymity.ch wrote:

Versions | Amount total  | Amount w/o duplicate hosts
---------+---------------+---------------------------
1 and 2  | 34,648 (9%)   | 21,552 (23%)
We debugged this last week on IRC, as 1,2 is an invalid combination according to the specification. After correlating the IP addresses, we concluded that this is GFW scanning and not actual client usage.
I'm sure some of the 1+2 is GFW scanning, but probably not all of it. Mainstream tor definitely sends 1+2 when using a v2 handshake.
https://gitweb.torproject.org/tor.git/tree/src/or/connection_or.c?id=b0c3210...
/** Array of recognized link protocol versions. */
static const uint16_t or_protocol_versions[] = { 1, 2, 3, 4 };
/** Number of versions in <b>or_protocol_versions</b>. */
static const int n_or_protocol_versions =
  (int)( sizeof(or_protocol_versions)/sizeof(uint16_t) );

/** Send a VERSIONS cell on <b>conn</b>, telling the other host about the
 * link protocol versions that this Tor can support.
 *
 * If <b>v3_plus</b>, this is part of a V3 protocol handshake, so only
 * allow protocol version v3 or later.  If not <b>v3_plus</b>, this is
 * not part of a v3 protocol handshake, so don't allow protocol v3 or
 * later.
 **/
int
connection_or_send_versions(or_connection_t *conn, int v3_plus)
{
  var_cell_t *cell;
  int i;
  int n_versions = 0;
  const int min_version = v3_plus ? 3 : 0;
  const int max_version = v3_plus ? UINT16_MAX : 2;
  tor_assert(conn->handshake_state &&
             !conn->handshake_state->sent_versions_at);
  cell = var_cell_new(n_or_protocol_versions * 2);
  cell->command = CELL_VERSIONS;
  for (i = 0; i < n_or_protocol_versions; ++i) {
    uint16_t v = or_protocol_versions[i];
    if (v < min_version || v > max_version)
      continue;
    set_uint16(cell->payload+(2*n_versions), htons(v));
    ++n_versions;
  }
  cell->payload_len = n_versions * 2;

  connection_or_write_var_cell_to_buf(cell, conn);
  conn->handshake_state->sent_versions_at = time(NULL);

  var_cell_free(cell);
  return 0;
}
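To make the filtering concrete, here is a minimal standalone sketch of just the version-selection loop from the function above. It is not tor code: the names known_versions and advertised_versions are hypothetical, and it leaves out the cell construction and the htons() byte-order conversion entirely. It only shows which version numbers survive the min/max check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same table as or_protocol_versions in the quoted source. */
static const uint16_t known_versions[] = { 1, 2, 3, 4 };
static const size_t n_known_versions =
  sizeof(known_versions) / sizeof(known_versions[0]);

/* Hypothetical helper (not part of tor): copy into <out> the versions that
 * the quoted loop would place in a VERSIONS cell, and return how many were
 * copied.  <out> must have room for n_known_versions entries. */
static size_t
advertised_versions(int v3_plus, uint16_t *out)
{
  /* Same bounds as in connection_or_send_versions(). */
  const unsigned min_version = v3_plus ? 3 : 0;
  const unsigned max_version = v3_plus ? UINT16_MAX : 2;
  size_t n = 0;
  size_t i;

  assert(out != NULL);
  for (i = 0; i < n_known_versions; ++i) {
    uint16_t v = known_versions[i];
    if (v < min_version || v > max_version)
      continue;  /* outside the range allowed for this handshake type */
    out[n++] = v;
  }
  return n;
}
```

Called with v3_plus == 0 this fills the output with 1 and 2, and with v3_plus != 0 it fills it with 3 and 4, which is why a stock client doing a v2 handshake advertises exactly "1 and 2".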
Are you sure you are deduplicating correctly? That's a lot of hosts.
Even if it were only GFW probing, GFW rarely uses duplicate IPs, except for a few. Most IPs you will only see once or twice over the course of months.
David Fifield