[metrics-team] OnionStats - roadmap?

Karsten Loesing karsten at torproject.org
Mon Aug 8 13:13:44 UTC 2016


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/08/16 13:02, Anathema wrote:
> On 08/08/2016 10:44, Karsten Loesing wrote:
>> 
>>> - results are returned for both bridges and relays when using 
>>> 'limit' and 'offset'
>> 
>> The reason for the current behavior is that clients can easily 
>> implement paging of results by setting limit to the number of
>> results they want to display and offset to the number of results
>> on earlier pages that they want to skip.  That won't work anymore
>> with the suggested change.  I'd consider this a bad
>> backward-incompatible change, unless there's a reason for making
>> this change that I'm overlooking.
> 
> Even with my implementation, 'offset' and 'limit' work as expected,
> so I didn't get "That won't work anymore with the suggested
> change".

Maybe an example helps here: assume you have 55 relays and 45 bridges
and ask for offset=50 and limit=10.  Your implementation will return
relays 51 to 55, the current implementation will also return bridges 1
to 5.
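
A minimal sketch of that example, assuming the suggested change applies
offset and limit to relays and bridges separately (that reading and the
function names are illustrative assumptions, not taken from either code
base):

```python
def page_combined(relays, bridges, offset, limit):
    """Current behavior: one window over relays followed by bridges."""
    combined = relays + bridges
    return combined[offset:offset + limit]


def page_per_type(relays, bridges, offset, limit):
    """Assumed suggested behavior: the same window applied to each list."""
    return relays[offset:offset + limit] + bridges[offset:offset + limit]


relays = ["relay%d" % i for i in range(1, 56)]     # 55 relays
bridges = ["bridge%d" % i for i in range(1, 46)]   # 45 bridges

current = page_combined(relays, bridges, offset=50, limit=10)
suggested = page_per_type(relays, bridges, offset=50, limit=10)

print(current)    # relay51..relay55 followed by bridge1..bridge5
print(suggested)  # relay51..relay55 only
```

With the combined window, a client that pages in steps of 10 sees every
relay and bridge exactly once.  With per-type windows, bridges 1 to 5
would appear on the first page instead, and page sizes vary.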

> The reason I output both relays and bridges (instead of only
> relays) is for simplicity: let's assume I want 10 bridges and 10
> relays. With the current protocol I've to perform 2 queries. With
> the proposed implementation, it's just 1 query.
> 
> Of course performance changes since returning both node types is more 
> intensive than just one, but it's just a matter of hardware and
> software resources that we can deal with.

It's also client resources you're talking about here, and some queries
can return a lot of data.

> I'm open to suggestions and I don't have any problem in switching
> back to the backward-compatible behavior, if the list thinks that's
> the way to go.

Please change it back.  It's broken as you describe it.

>>> - 'order' parameter's value can be any field, so it's not
>>> limited to 'consensus_weight'
>> 
>> Oh yes, this one is a good change, and it should be 
>> backward-compatible.  We might have to specify how sorting works
>> for some fields.  For example, are or_addresses sorted
>> alphanumerically? Do we sort by first address in or_addresses or
>> by the (alphanumerically) smallest?  How do we handle missing
>> values?
>> 
> 
> At the moment, IPv4 is treated as a string, since I didn't map
> the field as 'ip'. You can read more about it here: 
> https://www.elastic.co/guide/en/elasticsearch/reference/current/ip.html
>
>  I need to change the mapping and reindex, it shouldn't take long.
> As you can read, IPv6 is not supported yet.
> 
>> So, now that I'm listing these possible issues, would it be
>> easier to start with the fields that are easy and add more
>> complicated fields later?
> 
> Since I'm leveraging the full ElasticSearch capabilities, the
> sorting fields are already built in, so it would be worse to
> "cripple" ElasticSearch and then "improve" it.

We'll have to specify how things are sorted anyway, because we can't
just say "however ElasticSearch sorts it".  Rather than adding all
fields at once we could start with the ones that need little to no
discussion and then move on to the more complex ones.  I mean, if you
want to go through all of them and specify sorting orders, okay.
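
To illustrate why the order needs to be specified, here is a small
sketch for IPv4 addresses only.  The placement of missing values (last)
is an assumption made for the example, not something the protocol
defines:

```python
import ipaddress


def sort_as_strings(addresses):
    """Sort alphanumerically, missing values (None) last."""
    return sorted(addresses, key=lambda a: (a is None, a or ""))


def sort_as_addresses(addresses):
    """Sort by numeric IPv4 value, missing values (None) last."""
    def key(a):
        if a is None:
            return (True, 0)
        return (False, int(ipaddress.IPv4Address(a)))
    return sorted(addresses, key=key)


addresses = ["10.0.0.1", None, "9.0.0.1", "100.0.0.1"]

print(sort_as_strings(addresses))
# ['10.0.0.1', '100.0.0.1', '9.0.0.1', None]
print(sort_as_addresses(addresses))
# ['9.0.0.1', '10.0.0.1', '100.0.0.1', None]
```

Both results are defensible, which is why the specification has to pick
one per field.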

I guess the real issue is that we're looking at this from two
different angles.  I'm looking at the existing protocol and asking
whether it can be implemented using ElasticSearch, and you're looking
at your
existing ElasticSearch implementation and thinking how to make it
implement the current protocol.  However, the user won't care how the
protocol is implemented, they'll just happily notice that there are
extensions that haven't been there before.  And on the other hand, if
we change the protocol without good reason, current Onionoo client
developers will ask WTF we're thinking.  Hope that makes sense.

>>> The (negative) differences are: - 'lookup' is not implemented:
>>> I was not able to find a difference between 'lookup' and 
>>> 'fingerprint': can you provide some real examples?
>> 
>> The difference between those two parameters is specified on the 
>> protocol page:
>> 
>> https://onionoo.torproject.org/protocol.html
>> 
>> Here's where we added the fingerprint parameter:
>> 
>> https://gitweb.torproject.org/onionoo.git/commit/?id=8f63e74709cd05cd812e33f95ffe51b05d6d537c
>> 
>> If it turns out to be difficult to implement that parameter, let's
>> talk more.  Maybe we don't need it anymore.  Removing that would
>> be a backward-incompatible change, but let's see.
> 
> The code you linked seems to relate to the 'fingerprint'
> parameter. However, it doesn't show me any real-world scenario. In
> my testing (on the current protocol), 'lookup' works exactly like
> 'fingerprint'. I can "merge" the two and maybe remove one later.

Did you read the part in the specification describing the differences?

>>> - 'search' does not implement: "any 4 hex characters of a 
>>> space-separated fingerprint" and "beginning of a
>>> base64-encoded fingerprint without trailing equal signs": I was
>>> not able to find any relevant case for those
>> 
>> Also mentioned at the meeting:
>> 
>> Sometimes people paste fingerprints from other sources into Atlas
>> or other Onionoo clients, and we should return matching relays to
>> them. We're only returning relays and bridges matching all search
>> terms, so we'll have to store all 4 hex character blocks of a
>> fingerprint and make them searchable.  Let me know if you have
>> more questions about this.
>> 
> 
> Implemented! 
> https://github.com/davinerd/onionoo-ng/commit/1fb3b1ec56d57d697587222fe70fd62133b2b6b4
>
>  I've just one doubt about the "4 hex character blocks": with my 
> implementation you can search for 1 hex block, doesn't matter if
> the first, the second, the third or the fourth. To be honest, I
> don't even enforce the block to be 4 chars, but it can be even 1
> (of course, the amount of data returned will be huge).

That's also the case with the current protocol.  It's not pretty, but
it did the job of returning relays by space-separated fingerprint.
Clients that worry about receiving too many results should always
include the limit parameter.

What we might do, though that would be a backward-incompatible change,
is to add the space-separated fingerprint to the search index as a
single string rather than ten strings.  Example: Tonga has fingerprint
4A0CCD2DDC7995083D73F5D667100C8A5831F16D, which we add to the search
index as "4A0C CD2D DC79 9508 3D73 F5D6 6710 0C8A 5831 F16D", and when
we receive a search for "Tonga 4A0C CD2D DC79", we find that both
nickname and beginning of the space-separated fingerprint match that
query.  But I'm not sure how much that improves.
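
A rough sketch of the two indexing options, using the Tonga example.
The matching rules here (case-insensitive substring match on the
nickname, any term of exactly 4 hex characters treated as a fingerprint
block) are simplifying assumptions for illustration, not the protocol's
definition:

```python
import re

HEX_BLOCK = re.compile(r"^[0-9A-Fa-f]{4}$")


def fingerprint_blocks(fingerprint):
    """Split a 40-character hex fingerprint into ten 4-character blocks."""
    return [fingerprint[i:i + 4].upper()
            for i in range(0, len(fingerprint), 4)]


def match_ten_strings(nickname, fingerprint, query):
    """Blocks indexed as ten strings: each hex term may match any block."""
    blocks = fingerprint_blocks(fingerprint)
    for term in query.split():
        if HEX_BLOCK.match(term):
            if term.upper() not in blocks:
                return False
        elif term.lower() not in nickname.lower():
            return False
    return True


def match_single_string(nickname, fingerprint, query):
    """Single string: hex terms, in order, must match its beginning."""
    spaced = " ".join(fingerprint_blocks(fingerprint))
    terms = query.split()
    hex_terms = [t.upper() for t in terms if HEX_BLOCK.match(t)]
    other_terms = [t for t in terms if not HEX_BLOCK.match(t)]
    if not spaced.startswith(" ".join(hex_terms)):
        return False
    return all(t.lower() in nickname.lower() for t in other_terms)


TONGA = "4A0CCD2DDC7995083D73F5D667100C8A5831F16D"

print(match_ten_strings("Tonga", TONGA, "Tonga 4A0C CD2D DC79"))    # True
print(match_single_string("Tonga", TONGA, "Tonga 4A0C CD2D DC79"))  # True
print(match_ten_strings("Tonga", TONGA, "Tonga DC79 4A0C"))         # True
print(match_single_string("Tonga", TONGA, "Tonga DC79 4A0C"))       # False
```

The last two lines show the backward-incompatible part: blocks given
out of order still match with ten strings, but not with a single
string.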

>> Hope this helps to get your code closer to the current Onionoo 
>> protocol.  Ideally, you'd be able to deploy an Atlas version
>> that points to your Onionoo server and offer that to users.  Let
>> me know if you need help with that.
> 
> I'll try that after we work out the last issues, thanks!
> 
> As a side note, I'd like you to look at 
> https://github.com/davinerd/onionoo-ng/issues/1 (it's the
> explanation and fix for the 'offset' issue iwakeh found during the
> meeting) and let me know your thoughts.

Can you sketch out your suggestion as far as it concerns the protocol
level?  (I can't look at code, or I'll spend half an hour there and
even more people will wonder why their emails are left unanswered.)

> Another thing: I implemented the 'summary' document in a separate 
> branch: https://github.com/davinerd/onionoo-ng/tree/summary_doc
> feel free to take a look and comment.

Same, I'd like to keep this on the conceptual level for now and not
look at code or try out the prototype.

> I've just one question about the summary document, but I'll start a
> new thread.

Okay.

All the best,
Karsten

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJXqIWIAAoJEC3ESO/4X7XBvZAH/0lt+smq+DBIW6+j48O5fGbY
5O8rhMw9bFmfXCA1L8HPUm5S+A7muYFnkmGE37udQAYc47E9ftTprGAJn5q0WQRt
aX9VQIodp3Y/y/ASkmihQE/fyF2mrdpmCaCwDZeB87NOp37FHxYTJJhn3kiiZzVj
ofIagH0O0NzrfNuJj1CbhBf02JxwlJNuvf4ZcM0rAiXKAu/eSJPX0Y+c6Q5FlAFE
kws5f3z2afvT5l4G2VpnMm/TaqRhnq8ceo4X4uTkhnxMHksk7GTXwHJrh6BsbE+z
P/S18H3WrcxnJA8SHNZyy0AdzbOaxGFlV1ZNyfCwzoClsjM+bSgHUCXAICG0yxg=
=SJIr
-----END PGP SIGNATURE-----

