David Fifield wrote on 12/01/15 at 18:46:
On Mon, Jan 12, 2015 at 06:26:14PM +0100, Tom van der Woerdt wrote:
On 12 Jan 2015, at 16:25, Philipp Winter phw@nymity.ch wrote:

Versions | Amount total  | Amount w/o duplicate hosts
---------+---------------+---------------------------
1 and 2  | 34,648 (9%)   | 21,552 (23%)
We debugged this last week on IRC, as 1,2 is an invalid combination according to the specification. After correlating the IP addresses, we concluded that this is GFW scanning and not actual client usage.
I'm sure some of the 1+2 is GFW scanning, but probably not all of it. Mainstream tor definitely sends 1+2 when using a v2 handshake.
https://gitweb.torproject.org/tor.git/tree/src/or/connection_or.c?id=b0c3210...
/** Array of recognized link protocol versions. */
static const uint16_t or_protocol_versions[] = { 1, 2, 3, 4 };
/** Number of versions in <b>or_protocol_versions</b>. */
static const int n_or_protocol_versions =
  (int)( sizeof(or_protocol_versions)/sizeof(uint16_t) );

/** Send a VERSIONS cell on <b>conn</b>, telling the other host about the
 * link protocol versions that this Tor can support.
 *
 * If <b>v3_plus</b>, this is part of a V3 protocol handshake, so only
 * allow protocol version v3 or later.  If not <b>v3_plus</b>, this is
 * not part of a v3 protocol handshake, so don't allow protocol v3 or
 * later.
 **/
int
connection_or_send_versions(or_connection_t *conn, int v3_plus)
{
  var_cell_t *cell;
  int i;
  int n_versions = 0;
  const int min_version = v3_plus ? 3 : 0;
  const int max_version = v3_plus ? UINT16_MAX : 2;
  tor_assert(conn->handshake_state &&
             !conn->handshake_state->sent_versions_at);
  cell = var_cell_new(n_or_protocol_versions * 2);
  cell->command = CELL_VERSIONS;
  for (i = 0; i < n_or_protocol_versions; ++i) {
    uint16_t v = or_protocol_versions[i];
    if (v < min_version || v > max_version)
      continue;
    set_uint16(cell->payload+(2*n_versions), htons(v));
    ++n_versions;
  }
  cell->payload_len = n_versions * 2;

  connection_or_write_var_cell_to_buf(cell, conn);
  conn->handshake_state->sent_versions_at = time(NULL);

  var_cell_free(cell);
  return 0;
}
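
To make the effect of the min_version/max_version filter concrete, here is a minimal standalone sketch of just the selection loop. The helper advertised_versions() is hypothetical and not part of tor; it only mirrors the loop quoted above, without building or sending a cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same version table as in the quoted connection_or.c excerpt. */
static const uint16_t or_protocol_versions[] = { 1, 2, 3, 4 };
static const size_t n_or_protocol_versions =
  sizeof(or_protocol_versions) / sizeof(or_protocol_versions[0]);

/* Hypothetical helper (not in tor): store the versions that the quoted
 * filter would advertise into out and return how many were stored.
 * out must have room for n_or_protocol_versions entries. */
size_t
advertised_versions(int v3_plus, uint16_t *out)
{
  /* Same bounds as connection_or_send_versions(). */
  const uint16_t min_version = v3_plus ? 3 : 0;
  const uint16_t max_version = v3_plus ? UINT16_MAX : 2;
  size_t n = 0;
  size_t i;

  assert(out != NULL);
  for (i = 0; i < n_or_protocol_versions; ++i) {
    uint16_t v = or_protocol_versions[i];
    if (v < min_version || v > max_version)
      continue;
    out[n++] = v;
  }
  return n;
}
```

Called with v3_plus = 0 this yields versions 1 and 2, and with v3_plus = 1 it yields 3 and 4, which is why a v2 handshake from this code advertises exactly 1+2.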
Are you sure you are deduplicating correctly? That's a lot of hosts.
Even if it were only GFW probing, GFW rarely uses duplicate IPs, except for a few. Most IPs you will only see once or twice over the course of months.
David Fifield
Wow, nice find. Then, based on the fact that 23% of the network (WTF?) is still running old clients, maybe it's best to hold off on dropping the old link versions for now. (OTOH, if it takes that long for people to update, dropping them will take years anyway, and removing them from a few relays won't hurt anything.)
23% is a lot though - so high that I really doubt it's true. The ratio between handshakes and deduplicated handshakes is also rather strange. Is there anything we can do to the dataset to find out why the number is so high?
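
A back-of-the-envelope sketch of why the ratio looks odd. It assumes that the percentages in the quoted table are shares of all observed handshakes and of all distinct hosts respectively, and that they are rounded to whole percents, so the derived totals are only estimates. Both helper functions are hypothetical names used for illustration.

```c
#include <assert.h>

/* Hypothetical helper: estimate the size of the whole from a part and
 * its (rounded) percentage share. */
double
estimate_total(double part, double percent)
{
  assert(percent > 0.0);
  return part * 100.0 / percent;
}

/* Hypothetical helper: average number of handshakes per distinct host. */
double
handshakes_per_host(double handshakes, double hosts)
{
  assert(hosts > 0.0);
  return handshakes / hosts;
}
```

With the table's figures, handshakes_per_host(34648, 21552) is about 1.6 for the 1+2 group. The estimated totals, estimate_total(34648, 9) and estimate_total(21552, 23), give roughly 385,000 handshakes and 94,000 hosts, or about 4 handshakes per host overall (somewhere between 3.8 and 4.4 once rounding of the percentages is taken into account). Under these assumptions, hosts sending 1+2 reconnect far less often than average, which is what pushes their share from 9% of handshakes to 23% of hosts.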
Tom