Hi Mridul,
Thanks for your interest in exitmap.
On Fri, Mar 18, 2016 at 11:26:01AM +0530, Mridul Malpotra wrote:
I will also be reading the tech report on Exitmap and would be grateful if you can recommend any other resource(s) that I should be referring to.
Don't bother reading the technical report. The PETS paper that you've already read is the most recent version.
a. How was the bifurcation between stand-alone and same-process modules decided? Are there any advantages to allowing multiple forked processes for specific modules?
Do you mean the modules that are written in pure Python versus the modules that use external tools? Generally, we prefer to have pure Python modules, but in some cases it's more convenient and practical to run an external tool, say openssl, and parse its output.
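To illustrate the external-tool approach, here is a minimal sketch of what "run openssl and parse its output" can look like. It is not taken from exitmap's codebase: the function names and the certificate-fingerprint use case are assumptions for illustration, and only the `openssl x509 -noout -fingerprint -sha256` invocation is a real command.

```python
import re
import subprocess

# Matches a line such as "SHA256 Fingerprint=AB:CD:...". IGNORECASE keeps
# the match tolerant of how the label is capitalized.
FINGERPRINT_RE = re.compile(r"^sha256 fingerprint=([0-9a-f:]+)$",
                            re.IGNORECASE | re.MULTILINE)


def parse_fingerprint(output):
    """Return the fingerprint as lowercase hex without colons, or None."""
    match = FINGERPRINT_RE.search(output)
    if match is None:
        return None
    return match.group(1).replace(":", "").lower()


def cert_fingerprint(pem_path):
    """Hypothetical helper: run openssl on a PEM file and parse its output."""
    result = subprocess.run(
        ["openssl", "x509", "-in", pem_path, "-noout",
         "-fingerprint", "-sha256"],
        capture_output=True, text=True, check=True)
    return parse_fingerprint(result.stdout)
```

Keeping the parsing in its own function means it can be tested without having openssl installed.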
b. For testing active attacks, can there be modules developed keeping other cleartext protocols like SNMP and Telnet in mind? Alternatively, is there a way to determine what protocols are being used over Tor and their popularity?
Yes, you can certainly write SNMP and Telnet modules. Depending on what particular attack you are trying to expose (e.g., credential sniffing, content injection), the complexity of this can range from straightforward to quite tricky.
At this point, we have no privacy-preserving way to measure port popularity.
c. How is Exitmap being crowdsourced currently? I'm interested to know how data is being collected from volunteers running the scanner.
A bunch of people, including myself, occasionally run exitmap. Some of these people wrote their own modules, which is great, because simply re-running the same modules wouldn't be all that useful. If somebody catches a bad exit relay, we report the result to bad-relays@torproject.org.
You can see that the process is informal, which is problematic because scans are not archived centrally. It would be neat to have a server that accepts incoming scan results, archives them, and provides an interface to analyse scans. I don't consider this high priority, but you might want to add it as an optional task if you still have time towards the end of GSoC.
1. Achieve autonomous scanning in Exitmap with periodic scans that, based on a certain algorithm, fetch relay descriptors and automate various subtasks for consistent data collection and verification. The main challenges that I expect will be intelligently recognizing which tasks to automate and when, and making the entire background process efficient in its resource consumption.
This is the most important issue. A solid implementation of this would be very helpful.
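One possible building block for this, sketched under my own assumptions rather than describing existing exitmap code: keep a record of when each exit relay was last scanned and, on every periodic run, select only the relays that are new or whose last scan is older than some threshold. The function name, the `last_scanned` mapping, and the `max_age` parameter are all hypothetical.

```python
import time


def relays_due(last_scanned, current_exits, max_age, now=None):
    """Return fingerprints that are new or were not scanned within max_age.

    last_scanned maps fingerprint -> Unix timestamp of the last scan
    (hypothetical bookkeeping, not an exitmap data structure).
    current_exits is an iterable of fingerprints from the latest consensus.
    """
    if now is None:
        now = time.time()
    due = []
    for fingerprint in current_exits:
        scanned_at = last_scanned.get(fingerprint)
        # Never-seen relays and stale relays both qualify for a scan.
        if scanned_at is None or now - scanned_at >= max_age:
            due.append(fingerprint)
    return due
```

Restricting each run to the relays that are actually due is one way to keep the background process cheap.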
2. Emulating multiple user interactions in individual modules and in Exitmap overall to make Exitmap indistinguishable from regular users. I will try to explore libraries for this purpose, like Splinter with Selenium or BeautifulSoup with Requests, that help interact dynamically with the web resource. The main challenges that I expect will be to scale this automated testing alongside the running asynchronous jobs and to make the entire scans look like genuine user interactions. Any suggestions on better ways to do this will be helpful.
Yes, that would be great to have. At this point, we are mimicking Tor Browser, which doesn't work that well because the HTTP headers are stored in an unordered dictionary, so they are not guaranteed to be sent in the order Tor Browser uses: https://github.com/NullHypothesis/exitmap/blob/master/src/util.py#L192
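One way around the ordering problem would be to keep the headers in a list of tuples and serialise the request by hand. This is only a sketch: the `build_request` helper is hypothetical, and the header names and values are placeholders, not Tor Browser's real headers or their real order.

```python
# A list of tuples keeps its order; a plain dict gives no such guarantee
# on Python 2.
# NOTE: placeholder values, not Tor Browser's real headers.
ORDERED_HEADERS = [
    ("Host", "example.com"),
    ("User-Agent", "ExampleBrowser/1.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en-US,en;q=0.5"),
    ("Connection", "close"),
]


def build_request(path, headers):
    """Serialise an HTTP/1.1 GET request, emitting headers in list order."""
    lines = ["GET %s HTTP/1.1" % path]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```

The resulting string could then be written to a socket that is routed over the exit relay under test.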
3. Making the codebase more robust by adding unit test cases. I plan on using either the plain unittest/unittest2 framework or nose/nose2/pytest tools or any other alternatives that I may find or be recommended. I plan to write unit test cases for new code as it is added and to improve upon the existing test programs.
Sounds good. For what it's worth, py.test was added in commit 63671d3f.
4. (Optional) I read from the mail threads on the tor-dev mailing list that the code needs to be converted to be Python 3 compatible. I would like your opinion on whether it is a viable option and, if it is possible, would like to include this in my list of tasks.
I haven't given it a shot myself, but I cannot think of a reason why it would be hard. (Famous last words!) I would add it to the list of optional tasks.
5. (Optional) If I can spare time in the milestone timeline and if discussion leads to some clarity, I would like to add another module for more cleartext protocols, such as SNMP or Telnet. I am also looking at possible local-to-remote attacks that are active at the application layer and could be tested in Exitmap. I'll update if I find anything.
Sounds good.
Cheers, Philipp