How should we normalise DNS test results?

Hello,

I am working on normalisation for all of the DNS-based tests right now (i.e. dns_consistency and dns_injection) and was wondering whether any of you had suggestions on how we should be normalising these results.

So far, this is what I have come up with:

    {'data_format_version': None,
     'input': 'www.ignored.ch',
     'options': ['-f', 'citizenlab-urls-global.txt', '-T', 'dns-server-ch.txt'],
     'probe_asn': 'AS41715',
     'probe_cc': 'CH',
     'probe_ip': '127.0.0.1',
     'report_filename': 's3://ooni-private/reports-raw/yaml/2016-01-01/dns_consistency-2015-12-31T220031Z-AS41715-probe.yamloo',
     'report_id': 'bWEWmX6oEftSSJq9yEF5oH0VPOU5VZJooX06gQENo136sSoj9MzlTBk7EjhfH1Td',
     'software_name': 'ooniprobe',
     'software_version': '1.3.2',
     'test_helpers': {'backend': '213.138.109.232:57004'},
     'test_keys': {'annotations': None,
                   'backend_version': '1.1.4',
                   'control_resolver': '213.138.109.232:57004',
                   'errors': {'130.60.128.3': 'dns_lookup_error',
                              '130.60.128.5': 'dns_lookup_error',
                              '194.158.230.53': False,
                              '194.230.1.5': False,
                              '82.195.224.5': 'no_answer'},
                   'failed': {'130.60.128.3', '130.60.128.5', '82.195.224.5'},
                   'input_hashes': ['3f786850e387550fdab836ed7e6dc881de23001b'],
                   'queries': [{'failure': None,
                                'hostname': 'www.ignored.ch',
                                'query_type': 'A',
                                'resolver_hostname': '213.138.109.232',
                                'resolver_port': 57004},
                               {'failure': None,
                                'hostname': 'www.ignored.ch',
                                'query_type': 'A',
                                'resolver_hostname': '212.147.10.10',
                                'resolver_port': 53}],
                   'successful': {'194.158.230.53',
                                  '194.230.1.5',
                                  '195.186.1.111',
                                  '81.221.252.10'}},
     'test_name': 'dns_consistency',
     'test_runtime': 32.54842686653137,
     'test_start_time': 1451605073.0,
     'test_version': '0.6'}

After looking into the source code for the DNS consistency test and the dnst template, I was able to determine the subject of the DNS query. However, I am not sure how to handle the addr. section, which changes depending on whether the associated DNS query has a type of A/SOA/NS (see: https://github.com/TheTorProject/ooni-probe/blob/master/ooni/templates/dnst.py#L153).

If you have any suggestions on how to normalise dnst results, I've linked to the raw and normalised reports below.

Gist: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a
Normalisation routine: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a#file-normalise-py

---
Cheers,
Tyler Fisher
GPG fingerprint: 8931 45DF 609B EE2E BC32 5E71 631E 6FC3 4686 F0EB
(tyler@tylerfisher.org)
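One way to read the sample record above is that 'failed' lists the resolvers whose 'errors' entry is anything other than False. The sketch below only illustrates that reading: the helper name split_resolvers is hypothetical and is not taken from the linked normalisation routine. Note that 'successful' in the sample also contains resolvers that do not appear in 'errors' at all, so it cannot be rebuilt from 'errors' alone.

```python
def split_resolvers(errors):
    """Partition the resolvers in an 'errors' mapping.

    `errors` maps a resolver address to an error string, or to False
    when no error was recorded (as in the sample record).
    Returns (failed, clean) as two sets of resolver addresses.
    """
    failed = {resolver for resolver, err in errors.items() if err}
    clean = {resolver for resolver, err in errors.items() if not err}
    return failed, clean


# Values copied from the sample record.
sample_errors = {
    '130.60.128.3': 'dns_lookup_error',
    '130.60.128.5': 'dns_lookup_error',
    '194.158.230.53': False,
    '194.230.1.5': False,
    '82.195.224.5': 'no_answer',
}
failed, clean = split_resolvers(sample_errors)
```

Here `failed` matches the 'failed' set in the sample, while `clean` is only a subset of 'successful'.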

Hi Tyler,

Thanks for your email!

On Jan 14, 2016, at 04:47, Tyler Fisher <apt.get.apps@gmail.com> wrote:
> Hello,
>
> I am working on normalisation for all of the DNS-based tests right now (i.e. dns_consistency and dns_injection) and was wondering whether any of you had suggestions on how we should be normalising these results.
>
> So far, this is what I have come up with:
>
>     {'data_format_version': None,
>      'input': 'www.ignored.ch',
>      'options': ['-f', 'citizenlab-urls-global.txt', '-T', 'dns-server-ch.txt'],
>      'probe_asn': 'AS41715',
>      'probe_cc': 'CH',
>      'probe_ip': '127.0.0.1',
>      'report_filename': 's3://ooni-private/reports-raw/yaml/2016-01-01/dns_consistency-2015-12-31T220031Z-AS41715-probe.yamloo',
>      'report_id': 'bWEWmX6oEftSSJq9yEF5oH0VPOU5VZJooX06gQENo136sSoj9MzlTBk7EjhfH1Td',
>      'software_name': 'ooniprobe',
>      'software_version': '1.3.2',
>      'test_helpers': {'backend': '213.138.109.232:57004'},
>      'test_keys': {'annotations': None,
>                    'backend_version': '1.1.4',
>                    'control_resolver': '213.138.109.232:57004',
>                    'errors': {'130.60.128.3': 'dns_lookup_error',
>                               '130.60.128.5': 'dns_lookup_error',
>                               '194.158.230.53': False,
>                               '194.230.1.5': False,
>                               '82.195.224.5': 'no_answer'},
>                    'failed': {'130.60.128.3', '130.60.128.5', '82.195.224.5'},
>                    'input_hashes': ['3f786850e387550fdab836ed7e6dc881de23001b'],
>                    'queries': [{'failure': None,
>                                 'hostname': 'www.ignored.ch',
>                                 'query_type': 'A',
>                                 'resolver_hostname': '213.138.109.232',
>                                 'resolver_port': 57004},
>                                {'failure': None,
>                                 'hostname': 'www.ignored.ch',
>                                 'query_type': 'A',
>                                 'resolver_hostname': '212.147.10.10',
>                                 'resolver_port': 53}],
>                    'successful': {'194.158.230.53',
>                                   '194.230.1.5',
>                                   '195.186.1.111',
>                                   '81.221.252.10'}},
>      'test_name': 'dns_consistency',
>      'test_runtime': 32.54842686653137,
>      'test_start_time': 1451605073.0,
>      'test_version': '0.6'}
>
> After looking into the source code for the DNS consistency test and the dnst template, I was able to determine the subject of the DNS query. However, I am not sure how to handle the addr. section, which changes depending on whether the associated DNS query has a type of A/SOA/NS (see: https://github.com/TheTorProject/ooni-probe/blob/master/ooni/templates/dnst.py#L153).
>
> If you have any suggestions on how to normalise dnst results, I've linked to the raw and normalised reports below.
>
> Gist: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a
> Normalisation routine: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a#file-normalise-py

I think how you have normalised the dns_consistency test is much better, and I think we should eventually integrate this data format directly into the ooni-probe tests themselves so that we don’t have to do any further normalisation, which is error prone, on future reports.

I am a bit torn as to how to resolve the addrs key issue. On one hand, I like the idea of not having to dig too much into the answers array to extract the stuff I am interested in; on the other hand, it’s probably best to have things be as consistent as possible.

I think the best option is probably to just merge “addrs” and “answers” into one list and make the items of the list change depending on the type of query (there is no cleaner way around this, since the RDATA field in DNS is made this way).

I would say every item in the answers list has the “ttl” key; the rest is specific to the type of query, like so:

* A = "answers": [{"ipv4": "xxx.xxx.xxx.xxx"}, {"ipv4": "xxx.xxx.xxx.xxx"}]
* PTR, NS = "answers": [{"hostname": "xxx.yyy"}, {"hostname": "xxx.yyy"}]
* MX = "answers": [{"preference": int, "hostname": "xxx.yyy"}, {"preference": int, "hostname": "xxx.yyy"}]
* SOA = "answers": [{"serial_number": int, "refresh_interval": int, "retry_interval": int, "expiration_limit": int, "minimum_ttl": int, "hostname": "xxx.yyy", "responsible_name": "xxx.yyy.zzz"}, …]

Note: For SOA queries we currently don’t collect all of the above-mentioned data in ooni-probe, but since we are going to change the data format anyway, we may as well change it in a way that is future proof.

Do you think this makes sense?

~ Arturo

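The merged list proposed above could be sketched roughly as follows. This is only an illustration under assumptions: the names RDATA_KEYS, normalise_answer and normalise_a_query are hypothetical, the shape of the input (a plain list of addresses for A queries, a dict of collected fields otherwise) is assumed, and fields the probe did not collect are filled with None. The addresses and hostnames in the usage lines are documentation examples, not data from a report.

```python
# Type-specific keys, copied from the proposal; "ttl" is added to every item.
RDATA_KEYS = {
    "A": ("ipv4",),
    "PTR": ("hostname",),
    "NS": ("hostname",),
    "MX": ("preference", "hostname"),
    "SOA": ("serial_number", "refresh_interval", "retry_interval",
            "expiration_limit", "minimum_ttl", "hostname",
            "responsible_name"),
}


def normalise_answer(query_type, ttl, rdata):
    """Build one item of the merged "answers" list.

    `rdata` is a dict holding whatever fields were collected for this
    answer; any expected field that is missing is stored as None.
    """
    if query_type not in RDATA_KEYS:
        raise ValueError("unsupported query type: %r" % (query_type,))
    answer = {"ttl": ttl}
    for key in RDATA_KEYS[query_type]:
        answer[key] = rdata.get(key)
    return answer


def normalise_a_query(addrs, ttl=None):
    """Turn a flat list of IPv4 addresses into "answers" items."""
    return [normalise_answer("A", ttl, {"ipv4": addr}) for addr in addrs]


a_answers = normalise_a_query(["192.0.2.1", "192.0.2.2"], ttl=300)
mx_answer = normalise_answer(
    "MX", 3600, {"preference": 10, "hostname": "mail.example.org"})
```

Keeping the per-type keys in a single table means a consumer can look up which fields to expect from the query type alone, which is the consistency argument made above.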
Participants (2):
- Arturo Filastò
- Tyler Fisher