[onionoo/master] Use recent GeoIP database without A1 entries.

commit 95623efb0e415d1c9c9fa176a967f1a05f942b45
Author: Karsten Loesing <karsten.loesing@gmx.net>
Date:   Mon Feb 11 08:12:39 2013 +0100

    Use recent GeoIP database without A1 entries.

    The IP-to-city database to be deployed with Onionoo needs to have its
    "A1" ("Anonymous Proxy") entries fixed just like Tor's IP-to-country
    file. See Tor's src/config/README.geoip for detailed information.

    - Ship with a variant of Tor's deanonymind.py that removes A1 entries
      from IP-to-city databases. Also ship with a custom geoip-manual for
      manual replacements.

    - Use our own GeoIP file parser, because MaxMind's library doesn't work
      with .csv files. On the plus side this removes a dependency and makes
      it easier to build Onionoo. On the minus side it adds a bunch of new
      code.

    - Update index.html to say that some _name entries may be missing if
      empty.

    - Update .gitignore and INSTALL.
---
 .gitignore                                   |   29 ++-
 INSTALL                                      |   79 +++++--
 geoip/deanonymind.py                         |  175 ++++++++++++
 geoip/geoip-manual                           |  354 +++++++++++++++++++++++++
 src/org/torproject/onionoo/CurrentNodes.java |  364 +++++++++++++++++++++++---
 src/org/torproject/onionoo/Main.java         |    3 +-
 web/index.html                               |    6 +-
 7 files changed, 936 insertions(+), 74 deletions(-)

diff --git a/.gitignore b/.gitignore
index 40f5895..ac44c7d 100755
--- a/.gitignore
+++ b/.gitignore
@@ -1,16 +1,21 @@
-relay-search-data.csv
-in/
-status/
-lib/
+.classpath
+.project
 classes/
-out/
-onionoo.war
-etc/web.xml
 etc/context.xml
-GeoIP.dat
-GeoIPASNum.dat
-GeoLiteCity.dat
+etc/web.xml
+geoip/Automatic-GeoLiteCity-Blocks.csv
+geoip/GeoIPASNum2.csv
+geoip/GeoIPASNum2.zip
+geoip/GeoLiteCity-Blocks.csv
+geoip/GeoLiteCity-Location.csv
+geoip/GeoLiteCity-latest.zip
+geoip/Manual-GeoLiteCity-Blocks.csv
+geoip/iso3166.csv
+geoip/region.csv
+in/
+lib/
 log
-.classpath
-.project
+onionoo.war
+out/
+status/
diff --git a/INSTALL b/INSTALL
index 0e6269d..b3d5d0a 100644
--- a/INSTALL
+++ b/INSTALL
@@ -1,9 +1,14 @@
 Clone the Onionoo server repository
 -----------------------------------
-Clone the Onionoo server repository into /srv/onionoo/.
+Create working directory /srv/onionoo/, make it writable for the metrics
+user, and clone the Onionoo server repository into it. Commands prefixed
+with # are meant to be run by root, commands with $ by user metrics:

-$ git clone git://github.com/kloesing/Onionoo /srv/onionoo/
+# mkdir /srv/onionoo
+# chown metrics:metrics /srv/onionoo
+$ git clone https://git.torproject.org/onionoo.git /srv/onionoo/
+$ cd /srv/onionoo


 Install Java 1.5 or higher, ant 1.8 or higher, and Tomcat 6
@@ -20,13 +25,13 @@ Provide required .jar files
 ---------------------------

 Download or build the following .jar files and put them in the lib/
-directory using the given filename (or update build.xml if filenames are
-different):
+directory:

-- Apache Commons Codec 1.4, lib/commons-codec-1.4.jar
-- Servlet API, e.g., from Tomcat 6, lib/servlet-api.jar
-- Maxmind GeoIP Java API, lib/maxmindgeoip.jar
-- Tor Metrics Descriptor Library, lib/descriptor.jar
+- Apache Commons Codec 1.4
+- Apache Commons Compress 1.4.1
+- Apache Commons Lang 2.6
+- Servlet API, e.g., from Tomcat 6
+- Tor Metrics Descriptor Library, metrics-lib

 Attempt to compile the Java sources to make sure that everything works
 correctly:
@@ -37,14 +42,50 @@ $ ant compile
 Download GeoIP and ASN database files
 -------------------------------------

-Download the GeoLite City database from Maxmind and put it in
-/srv/onionoo/GeoLiteCity.dat. If no such file is found, relay IP
-addresses will not be resolved to country codes, latitudes, and
-longitudes.
+Onionoo uses an IP-to-city database and an IP-to-ASN database to provide
+additional information about a relay's location.

-Also download the GeoLite ASN database from Maxmind and put it in
-/srv/onionoo/GeoIPASNum.dat. If no such file is found, relay IP
-addresses will not be resolved to AS numbers and names.
+The IP-to-city database to be deployed with Onionoo needs to have its "A1"
+("Anonymous Proxy") entries fixed just like Tor's IP-to-country file. See
+Tor's src/config/README.geoip for detailed information.
+
+First, change to the geoip/ directory:
+
+$ cd geoip/
+
+Download the most recent MaxMind GeoLite City database and unzip it in the
+current directory, junking paths:
+
+$ wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity_CSV/GeoLiteCi...
+$ unzip -j GeoLiteCity-latest.zip
+
+Run deanonymind.py in the local directory:
+
+$ python deanonymind.py
+
+Review the output to learn about applied automatic/manual changes and
+watch out for any warnings. Possibly edit geoip-manual to make
+more/fewer/different manual changes and re-run deanonymind.py. To look at
+automatic and manual changes, run:
+
+$ diff -U1 GeoLiteCity-Blocks.csv Automatic-GeoLiteCity-Blocks.csv
+$ diff -U1 Automatic-GeoLiteCity-Blocks.csv Manual-GeoLiteCity-Blocks.csv
+
+Download MaxMind's country and region codes files to the current
+directory:
+
+$ wget http://dev.maxmind.com/static/csv/codes/iso3166.csv
+$ wget http://dev.maxmind.com/static/csv/codes/maxmind/region.csv
+
+Download the most recent MaxMind ASN database file and unzip it in the
+current directory:
+
+$ wget http://www.maxmind.com/download/geoip/database/asnum/GeoIPASNum2.zip
+$ unzip GeoIPASNum2.zip
+
+Change back to the root working directory:
+
+$ cd ../


 Test the rsync of descriptors from metrics.torproject.org
@@ -57,10 +98,10 @@ $ rsync -arz metrics.torproject.org::metrics-recent in
 The result should be around 1G of data in the in/ directory, as of January
 2012.

-(If you want to pre-populate the bandwidth data with archived data,
-download the tarballs from https://metrics.torproject.org/data.html and
-process them one after the other. There is no requirement to process data
-in any given order.)
+(If you want to pre-populate bandwidth and weights data with archived
+data, download the tarballs from https://metrics.torproject.org/data.html
+and process them one after the other. There is no requirement to process
+data in any given order.)


 Test the hourly data processing process
diff --git a/geoip/deanonymind.py b/geoip/deanonymind.py
new file mode 100755
index 0000000..9ac3568
--- /dev/null
+++ b/geoip/deanonymind.py
@@ -0,0 +1,175 @@
+#!/usr/bin/env python
+import optparse
+import os
+import sys
+import zipfile
+
+"""
+Take a MaxMind GeoLite City blocks file as input and replace A1 entries
+with the block number of the preceding entry iff the preceding
+(subsequent) entry ends (starts) directly before (after) the A1 entry and
+both preceding and subsequent entries contain the same block number.
+
+Then apply manual changes, either replacing A1 entries that could not be
+replaced automatically or overriding previously made automatic changes.
+"""
+
+def main():
+    options = parse_options()
+    assignments = read_file(options.in_maxmind)
+    assignments = apply_automatic_changes(assignments,
+            options.block_number)
+    write_file(options.out_automatic, assignments)
+    manual_assignments = read_file(options.in_manual, must_exist=False)
+    assignments = apply_manual_changes(assignments, manual_assignments)
+    write_file(options.out_manual, assignments)
+
+def parse_options():
+    parser = optparse.OptionParser()
+    parser.add_option('-i', action='store', dest='in_maxmind',
+            default='GeoLiteCity-Blocks.csv', metavar='FILE',
+            help='use the specified MaxMind GeoLite City blocks .csv '
+                 'file as input [default: %default]')
+    parser.add_option('-b', action='store', dest='block_number',
+            default=242, metavar='NUM',
+            help='replace entries with this block number [default: '
+                 '%default]')
+    parser.add_option('-g', action='store', dest='in_manual',
+            default='geoip-manual', metavar='FILE',
+            help='use the specified .csv file for manual changes or to '
+                 'override automatic changes [default: %default]')
+    parser.add_option('-a', action='store', dest='out_automatic',
+            default="Automatic-GeoLiteCity-Blocks.csv", metavar='FILE',
+            help='write full input file plus automatic changes to the '
+                 'specified .csv file [default: %default]')
+    parser.add_option('-m', action='store', dest='out_manual',
+            default='Manual-GeoLiteCity-Blocks.csv', metavar='FILE',
+            help='write full input file plus automatic and manual '
+                 'changes to the specified .csv file [default: %default]')
+    (options, args) = parser.parse_args()
+    return options
+
+def read_file(path, must_exist=True):
+    if not os.path.exists(path):
+        if must_exist:
+            print 'File %s does not exist. Exiting.' % (path, )
+            sys.exit(1)
+        else:
+            return
+    csv_file = open(path)
+    csv_content = csv_file.read()
+    csv_file.close()
+    assignments = []
+    for line in csv_content.split('\n'):
+        stripped_line = line.strip()
+        if len(stripped_line) > 0 and not stripped_line.startswith('#'):
+            assignments.append(stripped_line)
+    return assignments
+
+def apply_automatic_changes(assignments, block_number):
+    print '\nApplying automatic changes...'
+    result_lines = []
+    prev_line = None
+    a1_lines = []
+    block_number_str = '"%d"' % (block_number, )
+    for line in assignments:
+        if block_number_str in line:
+            a1_lines.append(line)
+        else:
+            if len(a1_lines) > 0:
+                new_a1_lines = process_a1_lines(prev_line, a1_lines, line)
+                for new_a1_line in new_a1_lines:
+                    result_lines.append(new_a1_line)
+                a1_lines = []
+            result_lines.append(line)
+            prev_line = line
+    if len(a1_lines) > 0:
+        new_a1_lines = process_a1_lines(prev_line, a1_lines, None)
+        for new_a1_line in new_a1_lines:
+            result_lines.append(new_a1_line)
+    return result_lines
+
+def process_a1_lines(prev_line, a1_lines, next_line):
+    if not prev_line or not next_line:
+        return a1_lines # Can't merge first or last line in file.
+    if len(a1_lines) > 1:
+        return a1_lines # Can't merge more than 1 line at once.
+    a1_line = a1_lines[0].strip()
+    prev_entry = parse_line(prev_line)
+    a1_entry = parse_line(a1_line)
+    next_entry = parse_line(next_line)
+    touches_prev_entry = int(prev_entry['end_num']) + 1 == \
+            int(a1_entry['start_num'])
+    touches_next_entry = int(a1_entry['end_num']) + 1 == \
+            int(next_entry['start_num'])
+    same_block_number = prev_entry['block_number'] == \
+            next_entry['block_number']
+    if touches_prev_entry and touches_next_entry and same_block_number:
+        new_line = format_line_with_other_country(a1_entry, prev_entry)
+        print '-%s\n+%s' % (a1_line, new_line, )
+        return [new_line]
+    else:
+        return a1_lines
+
+def parse_line(line):
+    if not line:
+        return None
+    keys = ['start_num', 'end_num', 'block_number']
+    stripped_line = line.replace('"', '').strip()
+    parts = stripped_line.split(',')
+    entry = dict((k, v) for k, v in zip(keys, parts))
+    return entry
+
+def format_line_with_other_country(original_entry, other_entry):
+    return '"%s","%s","%s"' % (original_entry['start_num'],
+            original_entry['end_num'], other_entry['block_number'], )
+
+def apply_manual_changes(assignments, manual_assignments):
+    if not manual_assignments:
+        return assignments
+    print '\nApplying manual changes...'
+    manual_dict = {}
+    for line in manual_assignments:
+        start_num = parse_line(line)['start_num']
+        if start_num in manual_dict:
+            print ('Warning: duplicate start number in manual '
+                   'assignments:\n %s\n %s\nDiscarding first entry.' %
+                   (manual_dict[start_num], line, ))
+        manual_dict[start_num] = line
+    result = []
+    for line in assignments:
+        entry = parse_line(line)
+        start_num = entry['start_num']
+        if start_num in manual_dict:
+            manual_line = manual_dict[start_num]
+            manual_entry = parse_line(manual_line)
+            if entry['end_num'] == manual_entry['end_num']:
+                if len(manual_entry['block_number']) == 0:
+                    print '-%s' % (line, ) # only remove, don't replace
+                else:
+                    new_line = format_line_with_other_country(entry,
+                            manual_entry)
+                    print '-%s\n+%s' % (line, new_line, )
+                    result.append(new_line)
+                del manual_dict[start_num]
+            else:
+                print ('Warning: only partial match between '
+                       'original/automatically replaced assignment and '
+                       'manual assignment:\n %s\n %s\nNot applying '
+                       'manual change.' % (line, manual_line, ))
+                result.append(line)
+        else:
+            result.append(line)
+    if len(manual_dict) > 0:
+        print ('Warning: could not apply all manual assignments: %s' %
+                ('\n '.join(manual_dict.values())), )
+    return result
+
+def write_file(path, assignments):
+    out_file = open(path, 'w')
+    out_file.write('\n'.join(assignments))
+    out_file.close()
+
+if __name__ == '__main__':
+    main()
+
diff --git a/geoip/geoip-manual b/geoip/geoip-manual
new file mode 100644
index 0000000..6188957
--- /dev/null
+++ b/geoip/geoip-manual
@@ -0,0 +1,354 @@
+# This file contains manual overrides of A1 entries (and possibly others)
+# in MaxMind's GeoLite City database. Use deanonymind.py in the same
+# directory to process this file when producing a new geoip file. See
+# INSTALL for details.
+
+# From geoip-manual (country):
+# Remove MaxMind entry 0.116.0.0-0.119.255.255 which MaxMind says is AT,
+# but which is part of reserved range 0.0.0.0/8. -KL 2012-06-13
+"7602176","7864319",""
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"135013632","135013887","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"520493568","520494079","77"
+
+# From geoip-manual (country):
+# NL, because previous MaxMind entry 31.171.128.0-31.171.133.255 is NL,
+# and RIR delegation files say 31.171.128.0-31.171.135.255 is NL.
+# -KL 2012-11-27
+"531334656","531335167","161"
+
+# From geoip-manual (country):
+# EU, because next MaxMind entry 37.139.64.1-37.139.64.9 is EU, because
+# RIR delegation files say 37.139.64.0-37.139.71.255 is EU, and because it
+# just makes more sense for the next entry to start at .0 and not .1.
+# -KL 2012-11-27
+"629882880","629882880","3"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"644048128","644048383","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"644121856","644122111","223"
+
+# From geoip-manual (country):
+# CH, because previous MaxMind entry 46.19.141.0-46.19.142.255 is CH, and
+# RIR delegation files say 46.19.136.0-46.19.143.255 is CH.
+# -KL 2012-11-27
+"773033728","773033983","44"
+
+# From geoip-manual (country):
+# GB, because next MaxMind entry 46.166.129.0-46.166.134.255 is GB, and
+# RIR delegation files say 46.166.128.0-46.166.191.255 is GB.
+# -KL 2012-11-27
+"782663680","782663935","77"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"786817152","786817215","195"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"846537728","846537983","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"846542848","846543103","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1077383168","1077384191","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1077840384","1077840639","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1083264384","1083264447","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1083264464","1083264511","223"
+
+# From geoip-manual (country):
+# US, though could as well be CA. Previous MaxMind entry
+# 64.237.32.52-64.237.34.127 is US, next MaxMind entry
+# 64.237.34.144-64.237.34.151 is CA, and RIR delegation files say the
+# entire block 64.237.32.0-64.237.63.255 is US. -KL 2012-11-27
+"1089282688","1089282703","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1093730816","1093731071","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1095314944","1095314944","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1109848832","1109849087","39"
+
+# From geoip-manual (country):
+# US, though could as well be UY. Previous MaxMind entry
+# 67.15.170.0-67.15.182.255 is US, next MaxMind entry
+# 67.15.183.128-67.15.183.159 is UY, and RIR delegation files say the
+# entire block 67.15.0.0-67.15.255.255 is US. -KL 2012-11-27
+"1125103360","1125103487","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 67.43.145.0-67.43.155.255 is US, and RIR
+# delegation files say 67.43.144.0-67.43.159.255 is US.
+# -KL 2012-11-27
+"1126928384","1126928639","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1126931456","1126931711","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1138622208","1138622463","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1145334528","1145335039","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1159676928","1159677183","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1160905216","1160905471","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1170375168","1170375679","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 70.159.21.51-70.232.244.255 is US,
+# because next MaxMind entry 70.232.245.58-70.232.245.59 is A2 ("Satellite
+# Provider") which is a country information about as useless as A1, and
+# because RIR delegation files say 70.224.0.0-70.239.255.255 is US.
+# -KL 2012-11-27
+"1189672192","1189672249","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 70.232.246.0-70.240.141.255 is US,
+# because previous MaxMind entry 70.232.245.58-70.232.245.59 is A2
+# ("Satellite Provider") which is a country information about as useless
+# as A1, and because RIR delegation files say 70.224.0.0-70.239.255.255 is
+# US. -KL 2012-11-27
+"1189672252","1189672447","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1249050624","1249051135","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1249051904","1249052671","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1249091584","1249092607","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286389760","1286390271","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286390528","1286390783","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286391296","1286391807","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286393856","1286394623","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286395392","1286396159","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1286398976","1286399487","223"
+
+# From geoip-manual (country):
+# GB, despite neither previous (GE) nor next (LV) MaxMind entry being GB,
+# but because RIR delegation files agree with both previous and next
+# MaxMind entry and say GB for 91.228.0.0-91.228.3.255. -KL 2012-11-27
+"1541668864","1541669887","77"
+
+# From geoip-manual (country):
+# GB, because next MaxMind entry 91.232.125.0-91.232.125.255 is GB, and
+# RIR delegation files say 91.232.124.0-91.232.125.255 is GB.
+# -KL 2012-11-27
+"1541962752","1541963007","77"
+
+# From geoip-manual (country):
+# GB, despite neither previous (RU) nor next (PL) MaxMind entry being GB,
+# but because RIR delegation files agree with both previous and next
+# MaxMind entry and say GB for 91.238.214.0-91.238.215.255.
+# -KL 2012-11-27
+"1542379008","1542379519","77"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1632587008","1632587263","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1673576896","1673576959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1795558656","1795558911","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"1933909760","1933910015","17"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"2360215808","2360216063","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 173.0.16.0-173.0.65.255 is US, and RIR
+# delegation files say 173.0.0.0-173.0.15.255 is US. -KL 2012-11-27
+"2902458368","2902462463","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"2918536448","2918536703","223"
+
+# From geoip-manual (country):
+# US, because next MaxMind entry 176.67.84.0-176.67.84.79 is US, and RIR
+# delegation files say 176.67.80.0-176.67.87.255 is US. -KL 2012-11-27
+"2957201408","2957202431","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 176.67.84.192-176.67.85.255 is US,
+# and RIR delegation files say 176.67.80.0-176.67.87.255 is US.
+# -KL 2012-11-27
+"2957202944","2957203455","223"
+
+# From geoip-manual (country):
+# EU, despite neither previous (RU) nor next (UA) MaxMind entry being EU,
+# but because RIR delegation files agree with both previous and next
+# MaxMind entry and say EU for 193.200.150.0-193.200.150.255.
+# -KL 2012-11-27
+"3251148288","3251148543","3"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3341849376","3341853471","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3341873152","3341875199","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 199.96.68.0-199.96.87.127 is US, and
+# RIR delegation files say 199.96.80.0-199.96.87.255 is US.
+# -KL 2012-11-27
+"3344979840","3344979967","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3346193920","3346194431","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3355430912","3355432959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3450078464","3450079487","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3483239424","3483239679","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3483240704","3483240959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3483247360","3483247871","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3485724672","3485728767","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3500664576","3500664831","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3500666752","3500666879","223"
+
+# From geoip-manual (country):
+# US, because previous MaxMind entry 209.58.176.144-209.59.31.255 is US,
+# and RIR delegation files say 209.59.32.0-209.59.63.255 is US.
+# -KL 2012-11-27
+"3510312960","3510321151","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3519352832","3519352959","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3519354048","3519354111","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3519355392","3519355519","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3520644608","3520644863","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3520656384","3520656639","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3632994048","3632994303","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3633782528","3633782783","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3633823488","3633823743","223"
+
+# Previous and next entry are same country, set to country number without
+# city information. -KL 2013-02-10
+"3634982400","3634982655","223"
+
+# From geoip-manual (country):
+# FR, because previous MaxMind entry 217.15.166.0-217.15.166.255 is FR,
+# and RIR delegation files contain a block 217.15.160.0-217.15.175.255
+# which, however, is EU, not FR. But merging with next MaxMind entry
+# 217.15.176.0-217.15.191.255 which is KZ and which fully matches what
+# the RIR delegation files say seems unlikely to be correct.
+# -KL 2012-11-27
+"3641681664","3641683967","75"
+
diff --git a/src/org/torproject/onionoo/CurrentNodes.java b/src/org/torproject/onionoo/CurrentNodes.java
index 9e5d0db..487cf4d 100644
--- a/src/org/torproject/onionoo/CurrentNodes.java
+++ b/src/org/torproject/onionoo/CurrentNodes.java
@@ -11,13 +11,17 @@ import java.io.IOException;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
 import java.util.Iterator;
-import java.util.Locale;
+import java.util.Map;
+import java.util.Set;
 import java.util.SortedMap;
 import java.util.SortedSet;
 import java.util.TimeZone;
 import java.util.TreeMap;
 import java.util.TreeSet;
+import java.util.regex.Pattern;

 import org.torproject.descriptor.BridgeNetworkStatus;
 import org.torproject.descriptor.Descriptor;
@@ -27,10 +31,6 @@ import org.torproject.descriptor.DescriptorSourceFactory;
 import org.torproject.descriptor.NetworkStatusEntry;
 import org.torproject.descriptor.RelayNetworkStatusConsensus;

-import com.maxmind.geoip.Location;
-import com.maxmind.geoip.LookupService;
-import com.maxmind.geoip.regionName;
-
 /* Store relays and bridges that have been running in the past seven
  * days. */
 public class CurrentNodes {
@@ -343,53 +343,341 @@ public class CurrentNodes {
     }
   }

-  public void lookUpCountries() {
-    File geoLiteCityDatFile = new File("GeoLiteCity.dat");
-    if (!geoLiteCityDatFile.exists()) {
-      System.err.println("No GeoLiteCity.dat file in /.");
+  public void lookUpCitiesAndASes() {
+
+    /* Make sure we have all required .csv files. */
+    File[] geoLiteCityBlocksCsvFiles = new File[] {
+        new File("geoip/Manual-GeoLiteCity-Blocks.csv"),
+        new File("geoip/Automatic-GeoLiteCity-Blocks.csv"),
+        new File("geoip/GeoLiteCity-Blocks.csv")
+    };
+    File geoLiteCityBlocksCsvFile = null;
+    for (File file : geoLiteCityBlocksCsvFiles) {
+      if (file.exists()) {
+        geoLiteCityBlocksCsvFile = file;
+        break;
+      }
+    }
+    if (geoLiteCityBlocksCsvFile == null) {
+      System.err.println("No *GeoLiteCity-Blocks.csv file in geoip/.");
+      return;
+    }
+    File geoLiteCityLocationCsvFile =
+        new File("geoip/GeoLiteCity-Location.csv");
+    if (!geoLiteCityLocationCsvFile.exists()) {
+      System.err.println("No GeoLiteCity-Location.csv file in geoip/.");
+      return;
+    }
+    File iso3166CsvFile = new File("geoip/iso3166.csv");
+    if (!iso3166CsvFile.exists()) {
+      System.err.println("No iso3166.csv file in geoip/.");
+      return;
+    }
+    File regionCsvFile = new File("geoip/region.csv");
+    if (!regionCsvFile.exists()) {
+      System.err.println("No region.csv file in geoip/.");
+      return;
+    }
+    File geoIPASNum2CsvFile = new File("geoip/GeoIPASNum2.csv");
+    if (!geoIPASNum2CsvFile.exists()) {
+      System.err.println("No GeoIPASNum2.csv file in geoip/.");
+      return;
+    }
+
+    /* Obtain a map from relay IP address strings to numbers. */
+    Map<String, Long> addressStringNumbers = new HashMap<String, Long>();
+    Pattern ipv4Pattern = Pattern.compile("^[0-9\\.]{7,15}$");
+    for (Node relay : this.currentRelays.values()) {
+      String addressString = relay.getAddress();
+      long addressNumber = -1L;
+      if (ipv4Pattern.matcher(addressString).matches()) {
+        String[] parts = addressString.split("\\.", 4);
+        if (parts.length == 4) {
+          addressNumber = 0L;
+          for (int i = 0; i < 4; i++) {
+            addressNumber *= 256L;
+            int octetValue = -1;
+            try {
+              octetValue = Integer.parseInt(parts[i]);
+            } catch (NumberFormatException e) {
+            }
+            if (octetValue < 0 || octetValue > 255) {
+              addressNumber = -1L;
+              break;
+            }
+            addressNumber += octetValue;
+          }
+        }
+      }
+      if (addressNumber >= 0L) {
+        addressStringNumbers.put(addressString, addressNumber);
+      }
+    }
+    if (addressStringNumbers.isEmpty()) {
+      System.err.println("No relay IP addresses to resolve to cities or "
+          + "ASN.");
       return;
     }
+
+    /* Obtain a map from IP address numbers to blocks. */
+    Map<Long, Long> addressNumberBlocks = new HashMap<Long, Long>();
     try {
-      LookupService ls = new LookupService(geoLiteCityDatFile,
-          LookupService.GEOIP_MEMORY_CACHE);
-      for (Node relay : currentRelays.values()) {
-        Location location = ls.getLocation(relay.getAddress());
-        if (location != null) {
-          relay.setLatitude(String.format(Locale.US, "%.6f",
-              location.latitude));
-          relay.setLongitude(String.format(Locale.US, "%.6f",
-              location.longitude));
-          relay.setCountryCode(location.countryCode.toLowerCase());
-          relay.setCountryName(location.countryName);
-          relay.setRegionName(regionName.regionNameByCode(
-              location.countryCode, location.region));
-          relay.setCityName(location.city);
+      SortedSet<Long> sortedAddressNumbers = new TreeSet<Long>(
+          addressStringNumbers.values());
+      long firstAddressNumber = sortedAddressNumbers.first();
+      BufferedReader br = new BufferedReader(new FileReader(
+          geoLiteCityBlocksCsvFile));
+      String line;
+      long previousStartIpNum = -1L;
+      while ((line = br.readLine()) != null) {
+        if (!line.startsWith("\"")) {
+          continue;
+        }
+        String[] parts = line.replaceAll("\"", "").split(",", 3);
+        if (parts.length != 3) {
+          System.err.println("Illegal line '" + line + "' in "
+              + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
+        }
+        try {
+          long startIpNum = Long.parseLong(parts[0]);
+          if (startIpNum <= previousStartIpNum) {
+            System.err.println("Line '" + line + "' not sorted in "
+                + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+            br.close();
+            return;
+          }
+          previousStartIpNum = startIpNum;
+          while (firstAddressNumber < startIpNum &&
+              firstAddressNumber != -1L) {
+            sortedAddressNumbers.remove(firstAddressNumber);
+            if (sortedAddressNumbers.isEmpty()) {
+              firstAddressNumber = -1L;
+            } else {
+              firstAddressNumber = sortedAddressNumbers.first();
+            }
+          }
+          long endIpNum = Long.parseLong(parts[1]);
+          while (firstAddressNumber <= endIpNum &&
+              firstAddressNumber != -1L) {
+            long blockNumber = Long.parseLong(parts[2]);
+            addressNumberBlocks.put(firstAddressNumber, blockNumber);
+            sortedAddressNumbers.remove(firstAddressNumber);
+            if (sortedAddressNumbers.isEmpty()) {
+              firstAddressNumber = -1L;
+            } else {
+              firstAddressNumber = sortedAddressNumbers.first();
+            }
+          }
+          if (firstAddressNumber == -1L) {
+            break;
+          }
+        }
+        catch (NumberFormatException e) {
+          System.err.println("Number format exception while parsing line "
+              + "'" + line + "' in "
+              + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
         }
       }
-      ls.close();
+      br.close();
     } catch (IOException e) {
-      System.err.println("Could not look up countries for relays.");
+      System.err.println("I/O exception while reading "
+          + geoLiteCityBlocksCsvFile.getAbsolutePath() + ".");
+      return;
+    }
+
+    /* Obtain a map from relevant blocks to location lines. */
+    Map<Long, String> blockLocations = new HashMap<Long, String>();
+    try {
+      Set<Long> blockNumbers = new HashSet<Long>(
+          addressNumberBlocks.values());
+      BufferedReader br = new BufferedReader(new FileReader(
+          geoLiteCityLocationCsvFile));
+      String line;
+      while ((line = br.readLine()) != null) {
+        if (line.startsWith("C") || line.startsWith("l")) {
+          continue;
+        }
+        String[] parts = line.replaceAll("\"", "").split(",", 9);
+        if (parts.length != 9) {
+          System.err.println("Illegal line '" + line + "' in "
+              + geoLiteCityLocationCsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
+        }
+        try {
+          long locId = Long.parseLong(parts[0]);
+          if (blockNumbers.contains(locId)) {
+            blockLocations.put(locId, line);
+          }
+        }
+        catch (NumberFormatException e) {
+          System.err.println("Number format exception while parsing line "
+              + "'" + line + "' in "
+              + geoLiteCityLocationCsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
+        }
+      }
+      br.close();
+    } catch (IOException e) {
+      System.err.println("I/O exception while reading "
+          + geoLiteCityLocationCsvFile.getAbsolutePath() + ".");
+      return;
     }
-  }

-  public void lookUpASes() {
-    File geoIPASNumDatFile = new File("GeoIPASNum.dat");
-    if (!geoIPASNumDatFile.exists()) {
-      System.err.println("No GeoIPASNum.dat file in /.");
+    /* Read country names to memory. */
+    Map<String, String> countryNames = new HashMap<String, String>();
+    try {
+      BufferedReader br = new BufferedReader(new FileReader(
+          iso3166CsvFile));
+      String line;
+      while ((line = br.readLine()) != null) {
+        String[] parts = line.replaceAll("\"", "").split(",", 2);
+        if (parts.length != 2) {
+          System.err.println("Illegal line '" + line + "' in "
+              + iso3166CsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
+        }
+        countryNames.put(parts[0].toLowerCase(), parts[1]);
+      }
+      br.close();
+    } catch (IOException e) {
+      System.err.println("I/O exception while reading "
+          + iso3166CsvFile.getAbsolutePath() + ".");
       return;
     }
+
+    /* Read region names to memory. */
+    Map<String, String> regionNames = new HashMap<String, String>();
+    try {
+      BufferedReader br = new BufferedReader(new FileReader(
+          regionCsvFile));
+      String line;
+      while ((line = br.readLine()) != null) {
+        String[] parts = line.replaceAll("\"", "").split(",", 3);
+        if (parts.length != 3) {
+          System.err.println("Illegal line '" + line + "' in "
+              + regionCsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
+        }
+        regionNames.put(parts[0].toLowerCase() + ","
+            + parts[1].toLowerCase(), parts[2]);
+      }
+      br.close();
+    } catch (IOException e) {
+      System.err.println("I/O exception while reading "
+          + regionCsvFile.getAbsolutePath() + ".");
+      return;
+    }
+
+    /* Obtain a map from IP address numbers to ASN. */
+    Map<Long, String> addressNumberASN = new HashMap<Long, String>();
     try {
-      LookupService ls = new LookupService(geoIPASNumDatFile);
-      for (Node relay : currentRelays.values()) {
-        String org = ls.getOrg(relay.getAddress());
-        if (org != null && org.indexOf(" ") > 0 && org.startsWith("AS")) {
-          relay.setASNumber(org.substring(0, org.indexOf(" ")));
-          relay.setASName(org.substring(org.indexOf(" ") + 1));
+      SortedSet<Long> sortedAddressNumbers = new TreeSet<Long>(
+          addressStringNumbers.values());
+      long firstAddressNumber = sortedAddressNumbers.first();
+      BufferedReader br = new BufferedReader(new FileReader(
+          geoIPASNum2CsvFile));
+      String line;
+      long previousStartIpNum = -1L;
+      while ((line = br.readLine()) != null) {
+        String[] parts = line.replaceAll("\"", "").split(",", 3);
+        if (parts.length != 3) {
+          System.err.println("Illegal line '" + line + "' in "
+              + geoIPASNum2CsvFile.getAbsolutePath() + ".");
+          br.close();
+          return;
+        }
+        try {
+          long startIpNum = Long.parseLong(parts[0]);
+          if (startIpNum <= previousStartIpNum) {
+            System.err.println("Line '" + line + "' not sorted in "
+                + geoIPASNum2CsvFile.getAbsolutePath() + ".");
+            br.close();
+            return;
+          }
+          previousStartIpNum = startIpNum;
+          while (firstAddressNumber < startIpNum &&
+
firstAddressNumber != -1L) { + sortedAddressNumbers.remove(firstAddressNumber); + if (sortedAddressNumbers.isEmpty()) { + firstAddressNumber = -1L; + } else { + firstAddressNumber = sortedAddressNumbers.first(); + } + } + long endIpNum = Long.parseLong(parts[1]); + while (firstAddressNumber <= endIpNum && + firstAddressNumber != -1L) { + if (parts[2].startsWith("AS") && + parts[2].split(" ", 2).length == 2) { + addressNumberASN.put(firstAddressNumber, parts[2]); + } + sortedAddressNumbers.remove(firstAddressNumber); + if (sortedAddressNumbers.isEmpty()) { + firstAddressNumber = -1L; + } else { + firstAddressNumber = sortedAddressNumbers.first(); + } + } + if (firstAddressNumber == -1L) { + break; + } + } + catch (NumberFormatException e) { + System.err.println("Number format exception while parsing line " + + "'" + line + "' in " + + geoIPASNum2CsvFile.getAbsolutePath() + "."); + br.close(); + return; } } - ls.close(); + br.close(); } catch (IOException e) { - System.err.println("Could not look up ASes for relays."); + System.err.println("I/O exception while reading " + + geoIPASNum2CsvFile.getAbsolutePath() + "."); + return; + } + + /* Finally, set relays' city and ASN information. */ + for (Node relay : currentRelays.values()) { + String addressString = relay.getAddress(); + if (addressStringNumbers.containsKey(addressString)) { + long addressNumber = addressStringNumbers.get(addressString); + if (addressNumberBlocks.containsKey(addressNumber)) { + long blockNumber = addressNumberBlocks.get(addressNumber); + if (blockLocations.containsKey(blockNumber)) { + String[] parts = blockLocations.get(blockNumber). 
+ replaceAll("\"", "").split(",", -1); + String countryCode = parts[1].toLowerCase(); + relay.setCountryCode(countryCode); + if (countryNames.containsKey(countryCode)) { + relay.setCountryName(countryNames.get(countryCode)); + } + String regionCode = countryCode + "," + + parts[2].toLowerCase(); + if (regionNames.containsKey(regionCode)) { + relay.setRegionName(regionNames.get(regionCode)); + } + if (parts[3].length() > 0) { + relay.setCityName(parts[3]); + } + relay.setLatitude(parts[5]); + relay.setLongitude(parts[6]); + } + } + if (addressNumberASN.containsKey(addressNumber)) { + String[] parts = addressNumberASN.get(addressNumber).split(" ", 2); + relay.setASNumber(parts[0]); + relay.setASName(parts[1]); + } + } } } diff --git a/src/org/torproject/onionoo/Main.java b/src/org/torproject/onionoo/Main.java index 41af72c..e3e7c5b 100644 --- a/src/org/torproject/onionoo/Main.java +++ b/src/org/torproject/onionoo/Main.java @@ -14,8 +14,7 @@ public class Main { cn.readRelaySearchDataFile(new File("out/summary")); cn.readRelayNetworkConsensuses(); cn.setRelayRunningBits(); - cn.lookUpCountries(); - cn.lookUpASes(); + cn.lookUpCitiesAndASes(); cn.readBridgeNetworkStatuses(); cn.setBridgeRunningBits(); diff --git a/web/index.html b/web/index.html index 5087a01..4c3491c 100755 --- a/web/index.html +++ b/web/index.html @@ -153,17 +153,17 @@ database.</li> resolving the relay's first onion-routing IP address. Optional field. Omitted if the relay IP address could not be found in the GeoIP -database.</li> +database, or if the GeoIP database did not contain a country name.</li> <li><b>"region_name":</b> Region name as found in a GeoIP database by resolving the relay's first onion-routing IP address. Optional field. 
Omitted if the relay IP address could not be found in the GeoIP -database.</li> +database, or if the GeoIP database did not contain a region name.</li> <li><b>"city_name":</b> City name as found in a GeoIP database by resolving the relay's first onion-routing IP address. Optional field. Omitted if the relay IP address could not be found in the GeoIP -database.</li> +database, or if the GeoIP database did not contain a city name.</li> <li><b>"latitude":</b> Latitude as found in a GeoIP database by resolving the relay's first onion-routing IP address. Optional field.
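
Note on the lookup technique: the new parser in CurrentNodes.java resolves all relay addresses in a single pass over each CSV file, which works because both the file's ranges and the set of address numbers are sorted. The following is a minimal standalone sketch of that idea, not the code from this commit. The class name RangeSweepSketch, the methods ipv4ToLong and resolve, and the in-memory long[][] ranges (standing in for the CSV rows) are all hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/* Illustrative sketch only; names are hypothetical and not part of
 * Onionoo. */
class RangeSweepSketch {

  /* Convert a dotted-quad IPv4 string to its numeric value, or return
   * -1 if the string is not a valid IPv4 address. */
  static long ipv4ToLong(String address) {
    String[] parts = address.split("\\.", -1);
    if (parts.length != 4) {
      return -1L;
    }
    long result = 0L;
    for (String part : parts) {
      if (!part.matches("[0-9]{1,3}")) {
        return -1L;
      }
      int octet = Integer.parseInt(part);
      if (octet > 255) {
        return -1L;
      }
      result = result * 256L + octet;
    }
    return result;
  }

  /* Map each address number to the value of the range containing it.
   * Assumes ranges are sorted by start and do not overlap; each entry
   * is {start, end, value} with inclusive bounds. */
  static Map<Long, Long> resolve(long[][] ranges,
      TreeSet<Long> addresses) {
    Map<Long, Long> result = new HashMap<Long, Long>();
    TreeSet<Long> remaining = new TreeSet<Long>(addresses);
    for (long[] range : ranges) {
      /* Drop addresses below this range; since ranges are sorted, no
       * later range can contain them either. */
      while (!remaining.isEmpty() && remaining.first() < range[0]) {
        remaining.pollFirst();
      }
      /* Assign every address that falls inside this range. */
      while (!remaining.isEmpty() && remaining.first() <= range[1]) {
        result.put(remaining.pollFirst(), range[2]);
      }
      if (remaining.isEmpty()) {
        break;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    /* Made-up ranges: 1.0.0.0-1.0.0.255 and 1.2.3.0-1.2.3.255. */
    long[][] ranges = {
      { 16777216L, 16777471L, 100L },
      { 16909056L, 16909311L, 200L }
    };
    TreeSet<Long> addresses = new TreeSet<Long>();
    addresses.add(ipv4ToLong("1.0.0.5"));
    addresses.add(ipv4ToLong("1.1.1.1"));
    addresses.add(ipv4ToLong("1.2.3.4"));
    Map<Long, Long> resolved = resolve(ranges, addresses);
    System.out.println(resolved.get(ipv4ToLong("1.0.0.5")));
    System.out.println(resolved.get(ipv4ToLong("1.1.1.1")));
    System.out.println(resolved.get(ipv4ToLong("1.2.3.4")));
  }
}
```

Addresses that fall in a gap between ranges, like 1.1.1.1 in this made-up data, simply get no entry. That matches how the patch leaves the optional fields unset for relays that cannot be resolved.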