Hi Karsten,
On 04/05/2014 09:58 AM, Karsten Loesing wrote:
On second thought, and after sleeping over this, I'm less convinced that we should use an external library for the caching. We should rather start with a simple dict in memory and flush it based on some simple rules. That would allow us to tweak the caching specifically for our use case. And it would mean avoiding a dependency. We can think about moving to onion-py at a later point. That gives you the opportunity to unspaghettize your code, and once that is done we'll have a better idea what caching needs we have for the challenger tool to decide whether to move to onion-py or not. Would you still want to help write the simple caching code for challenger?
I cleaned up the caching code and added a simple in-memory dict caching provider that adds no further dependencies to onion-py. (It also has no provisions for eviction/flushing at all yet, but I will add that next. Right now everything is cached forever, but of course a new response from Onionoo replaces an old one.)
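In case it helps to picture it, the provider is essentially just a dict keyed by the document type and query parameters; something along these lines (a simplified sketch only, the actual class and method names in onion-py may differ):

class SimpleDictCache:
    """Minimal in-memory cache for Onionoo responses.
    No eviction yet: an entry lives until a newer response overwrites it."""

    def __init__(self):
        self._store = {}

    def _key(self, document, params):
        # Key on the document type plus the (sorted) query parameters.
        return (document, tuple(sorted(params.items())))

    def get(self, document, params):
        return self._store.get(self._key(document, params))

    def put(self, document, params, response):
        self._store[self._key(document, params)] = response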
I can write the Onionoo API code and caching code for challenger if I can use Python 3 and the requests library (see below). Of course I'd really like onion-py to actually have a user, since that would help get the necessary feedback and polish to push the library to version 1.0, but I understand if that isn't appropriate for this project.
I don't really understand what the code does. What is meant by "combining" documents? What exactly are we trying to measure? Once I know that and have thought of a sensible way to integrate it into onion-py, I'm confident I can in fact write that glue code :)
Right now, the script sums up all graphs contained in Onionoo's bandwidth, clients, uptime, and weights documents. It also limits the range of the new graphs to the interval from max(first) to max(last) of the given input graphs.
For example, assume we want to know the total bandwidth provided by the following 2 relays participating in the relay challenge:
datetime:  0,  1,  2,  3,  4,  5, ...
relay 1:      [5,  4,  5,  6]
relay 2:  [4,  3,  5,  4]
combined:     [8,  9,  9,  6]
This is not perfect for various reasons, but it's the best I came up with yesterday. Also, as we all know, perfect is the enemy of good.
(If you're curious, reason #1: the graph goes down at the end, and we can't say whether it's because relay 2 disappeared or did not report data yet; reason #2: we're weighting both relays' B/s equally, though relay 1 might have been online 24/7 and relay 2 only long enough that Onionoo doesn't put in null; there may be more reasons.)
Ah, I see! :) So for scalar attributes of relays (such as consensus_weight_fraction) it's just a sum, and for histories it's the graphs combined as you just outlined. That makes sense, thank you!
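Just to make sure I understand the history part, here's roughly what I imagine the glue code doing. This is only a sketch: the function name and the (first, values) representation are mine, it assumes all graphs share the same interval, and it ignores Onionoo's factor/normalization.

def combine_graphs(graphs, interval):
    """Sum per-relay history graphs into one combined graph.

    Each graph is a (first, values) pair: `first` is the timestamp of the
    first data point, `values` is a list of numbers with None for missing
    points, and consecutive points are `interval` seconds apart.  The
    combined graph runs from max(first) to max(last), as outlined above.
    """
    start = max(first for first, values in graphs)
    end = max(first + (len(values) - 1) * interval
              for first, values in graphs)

    combined = []
    t = start
    while t <= end:
        total = 0
        for first, values in graphs:
            i = (t - first) // interval
            if 0 <= i < len(values) and values[i] is not None:
                total += values[i]
        combined.append(total)
        t += interval
    return start, combined

# The example above, with interval = 1:
relay_1 = (1, [5, 4, 5, 6])
relay_2 = (0, [4, 3, 5, 4])
print(combine_graphs([relay_1, relay_2], 1))  # -> (1, [8, 9, 9, 6])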
I'm also not sure about Python 3. Whatever we write needs to run on Debian Wheezy with whatever libraries are present there. If they're all available for Python 3, great. If not, we can't do it.
I would strongly prefer to use Python 3. I understand wanting to use Debian stable (I use it myself), but Python 3 is six years old, and using Python 2 for new projects is no longer recommended. The only mandatory dependency for onion-py, and for me, is requests (I really dislike using urllib* directly - if you want to know why, see https://gist.github.com/kennethreitz/973705), but the python3-requests package in Wheezy is from 2012, and there is no python3-flask package at all. :-(
Is there anything that would stand in the way of using pip (from the python3-pip package) to install requests and flask from PyPI?
Thanks for your feedback!
All the best, Karsten
Cheers, Luke