[metrics-team] OnionStats - roadmap?

Karsten Loesing karsten at torproject.org
Tue Aug 23 10:39:59 UTC 2016


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 13/08/16 20:33, Anathema wrote:
> On 12/08/2016 17:21, Karsten Loesing wrote:
>> Hi Anathema,
>> 
>> On 08/08/16 17:50, Anathema wrote:
>>> On 08/08/2016 15:13, Karsten Loesing wrote:
>>>> 
>>>> Maybe an example helps here: assume you have 55 relays and
>>>> 45 bridges and ask for offset=50 and limit=10.  Your
>>>> implementation will return relays 51 to 55, the current
>>>> implementation will also return bridges 1 to 5.
>> 
>>> Oh I see it. Now a question: how is it possible that by
>>> specifying offset=50 and limit=10, you have bridges from 1 to
>>> 5? Since you're skipping 50 results and you have only 45
>>> bridges, makes sense to me to return nothing.
>> 
>>> The logic then is to treat nodes' data as a tape? When I hit
>>> the border (the limit) the system should start from the
>>> beginning?
>> 
>>> Once that's clarified, I'll change it to make it
>>> backward-compatible.
>> 
>> Here's an actual example:
>> 
> 
> Ok, now I got it completely and I'm working on a fix to make
> onionoo-ng work as expected, thanks!

Cool!
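
To make the offset=50, limit=10 example quoted above concrete, here is
a minimal sketch. It assumes that relays and bridges are treated as
one sequence, relays first, and that offset and limit are applied to
that combined sequence. The function name page() and the placeholder
node names are made up for illustration; this is not the actual
Onionoo code.

```python
def page(relays, bridges, offset=0, limit=None):
    """Apply offset and limit to relays followed by bridges as one sequence."""
    combined = list(relays) + list(bridges)
    end = None if limit is None else offset + limit
    return combined[offset:end]


# Placeholder data matching the example: 55 relays and 45 bridges.
relays = [f"relay{i}" for i in range(1, 56)]
bridges = [f"bridge{i}" for i in range(1, 46)]

result = page(relays, bridges, offset=50, limit=10)
print(result)  # relay51 to relay55, followed by bridge1 to bridge5
```

Skipping 50 entries consumes 50 of the 55 relays, so the 10 returned
entries are the last 5 relays plus the first 5 bridges.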

>> I'm not talking about multiple fields here, but about different
>> values of a single field.  And this doesn't have to be
>> complicated, we'll just need to specify how values are sorted.
>> In other words, you'd have to write this down for the protocol
>> specification, and without linking to how ElasticSearch sorts,
>> because Onionoo users shouldn't have to even notice that there's
>> ElasticSearch behind Onionoo.
>> 
> Sure, I'll put down some wording about that.

Okay, great.

>> Yes, it says "should", not "must".  The following two queries
>> return the same relay, where the second fingerprint is the SHA-1
>> value of the first fingerprint:
> 
> Oh ok. So the approach to 'lookup' is:
> - make a query for the given hex string
> - if no results, assume the given string is SHA-1 hashed
> - ? hash all the fingerprints to match the given string ?

Ideally, you'd add each document to the search index twice, once with
its "fingerprint" (relays) or "hashed_fingerprint" (bridges) and once
with the SHA-1 value of that string that you'd have to compute yourself.
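
A rough sketch of that double-indexing idea follows. A plain dict
stands in for the search index, and the helper names sha1_hex(),
build_lookup_index() and lookup() are hypothetical. It also assumes
that the SHA-1 digest is computed over the 20 decoded fingerprint
bytes rather than over the 40-character hex text; that detail should
be checked against the current behavior before relying on it.

```python
import hashlib


def sha1_hex(fingerprint_hex):
    """Return the upper-case hex SHA-1 of the decoded fingerprint bytes.

    Assumption: the digest is taken over the raw bytes, not the hex text.
    """
    raw = bytes.fromhex(fingerprint_hex)
    return hashlib.sha1(raw).hexdigest().upper()


def build_lookup_index(documents):
    """Map each document under its own identifier and under its SHA-1."""
    index = {}
    for doc in documents:
        # Relays carry "fingerprint", bridges carry "hashed_fingerprint".
        key = (doc.get("fingerprint") or doc.get("hashed_fingerprint")).upper()
        index[key] = doc
        index[sha1_hex(key)] = doc
    return index


def lookup(index, value):
    """Return the matching document, or None if nothing matches."""
    return index.get(value.upper())


# Made-up 40-character hex fingerprint, for illustration only.
relay = {"fingerprint": "0123456789ABCDEF0123456789ABCDEF01234567"}
index = build_lookup_index([relay])
```

With this layout a lookup needs no fallback step: both the original
string and its hash are direct keys, so lookup(index, fp) and
lookup(index, sha1_hex(fp)) return the same document.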

It's probably easiest to try out the current behavior by taking two
actual examples, one relay and one bridge, and see what queries return
those and what queries don't.

>> Yes, but you'll have to do that for _all_ parameters except for 
>> "fingerprint".
>> 
> 
> I thought the dataset we have already included all the running
> nodes within 7 days. So, to return those, should I filter for
> last_seen_days >= 7 days? (will try to look at the code later on)

Yes, looking at "last_seen" and only returning those that have been
seen in the last 7 days sounds reasonable.  That's pretty much what
the current code does.
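
A minimal sketch of such a filter, assuming each document has a
"last_seen" value formatted as "YYYY-MM-DD hh:mm:ss" in UTC and that
the reference time is passed in explicitly. The function name
recently_seen() and the sample documents are invented for
illustration.

```python
from datetime import datetime, timedelta

LAST_SEEN_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def recently_seen(documents, now, days=7):
    """Keep only documents whose last_seen is within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        doc for doc in documents
        if datetime.strptime(doc["last_seen"], LAST_SEEN_FORMAT) >= cutoff
    ]


# Invented sample data: one node seen yesterday, one seen weeks ago.
now = datetime(2016, 8, 23, 10, 0, 0)
documents = [
    {"nickname": "recent", "last_seen": "2016-08-22 09:00:00"},
    {"nickname": "stale", "last_seen": "2016-08-01 00:00:00"},
]
kept = recently_seen(documents, now)
```

Here only the "recent" document survives, because the other one was
last seen more than 7 days before the reference time.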

That being said, if ElasticSearch supports searches in the entire data
set in an efficient way, extending this time period to more days or
even removing it entirely would be cool.  However, we'd have to think
about making this change backward-compatible, so that current clients
don't suddenly have to handle several times as many results as they
can handle right now.  For example, I could imagine making "-7" the
new default value for "last_seen_days", so that only those clients
specifying a larger number there would receive more results.  But feel
free to postpone this for after there's a backward-compatible version
of your protocol.
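
The default-value idea could look roughly like this. The sketch
assumes that "last_seen_days" accepts the range forms "x-y", "x",
"x-" and "-y", with "-y" meaning 0 to y days ago; parse_days_range()
is a hypothetical helper, not an existing function.

```python
def parse_days_range(value="-7"):
    """Parse "x-y", "x", "x-" or "-y" into (min_days, max_days).

    A max_days of None means there is no upper bound. The default "-7"
    keeps the 7-day window for clients that omit the parameter.
    """
    if "-" not in value:
        days = int(value)
        return days, days
    low, high = value.split("-", 1)
    min_days = int(low) if low else 0
    max_days = int(high) if high else None
    if max_days is not None and min_days > max_days:
        raise ValueError("range start must not exceed range end: " + value)
    return min_days, max_days


print(parse_days_range())        # (0, 7)
print(parse_days_range("0-30"))  # (0, 30)
print(parse_days_range("14-"))   # (14, None)
```

Clients that never send the parameter keep getting the same results
as today, while a client asking for "0-30" opts in to a larger window.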

> Thanks

Sorry for being slow in responding.  I'm distracted by many things
these days, yet I'm curious what your OnionStats project and
ElasticSearch integration will provide us.

All the best,
Karsten

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJXvCf/AAoJEC3ESO/4X7XBLyQH/jrvLz/0uKtkCzA+gts41Dv6
5yiK65au66H+WoZvKY5G+wiGpNVxoyDEU8p/B312IgOdqKvAeI+gaW+lJ2hr+/Wd
nVYz4cO87WKdWHetcCIJbKpkW/z6R5GL+Y9M/zOHCl9Gk0CNkN2X4DPn51FwAYNd
YKFaOnumVBtZZ6BmHLpWaPkszxqANctg3iBBqnLamHyXC+IQ9kW1/XfwGCS9nDGd
Ih6GqALv5zDxehGKS0plbDKV3tVmlujXX5vszWlzhkFuQOTpbxe2JG25yz5snBrv
sGUFtcfNx4nciztLols1kEnMJClCq/IRe5sQo0uExb56cK9LhKX524fi3DdPFrI=
=BPh6
-----END PGP SIGNATURE-----

