[metrics-team] OnionStats

Karsten Loesing karsten at torproject.org
Sat Jun 11 17:49:37 UTC 2016


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Anathema,

On 10/06/16 15:26, Anathema wrote:
> Sorry for the late reply but I'm on a long holiday :)

That's a very good reason. :)

> On 06/06/2016 14:13, Karsten Loesing wrote:
>> First, here's a bug report (from a few days ago when I made my
>> first attempt in responding, not sure if this bug still exists):
>> When I enter "nickname:default platform:linux" in the search
>> field and select "Last 6 months" for "First Seen", I get 1 result
>> which was first seen 2016-03-06 08:00:00.  But when I switch to
>> "Last 3 months" (2016-03-02 to 2016-06-02) I get 0 results.
>> Likewise, when I pick "nickname:default" as search, last 6 months
>> gets me 290 entries and last 3 months 0.
>> 
> Will investigate further, thanks.
> 
> However, I searched for the last 6 months today (2015-12-10 -
> 2016-06-10, nickname:default platform:linux) and I get 2 nodes (the
> same one twice, not sure why this is happening), first seen
> 2016-03-06 08:00:00. If I put in a custom range (2016-03-02 -
> 2016-06-10) I still get the two nodes.
> 
> Did I miss something?

Not sure if you mean this, but did you read the last sentence of the
"last_seen_days" parameter spec?


https://onionoo.torproject.org/protocol.html

"""
 last_seen_days

Return only relays or bridges which have last been seen during the
given range of days ago. A parameter value "x-y" with x <= y returns
relays or bridges that have last been seen at least x and at most y
days ago. Accepted short forms are "x", "x-", and "-y" which are
interpreted as "x-x", "x-infinity", and "0-y". Note that relays and
bridges that haven't been running in the past week are not included in
results, so that setting x to 8 or higher will lead to an empty result
set.
"""

Obviously, the same limitation applies if you take a dump of all
details documents and implement something similar to the
"last_seen_days" parameter on top of it.
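
To make the short forms in the spec text above concrete, here is a
minimal Python sketch of how such a parameter value could be parsed.
It is not Onionoo's code; the function names parse_last_seen_days and
matches are made up for this example, and rejecting a bare "-" is my
assumption, since the spec does not mention that case.

```python
import math


def parse_last_seen_days(value):
    """Parse "x-y", "x", "x-", or "-y" into an inclusive (x, y) day range.

    Hypothetical helper for illustration only; not taken from Onionoo.
    """
    if "-" not in value:
        x = y = int(value)  # "x" is short for "x-x"
    else:
        left, right = value.split("-", 1)
        if not left and not right:
            # Assumption: a bare "-" is treated as invalid.
            raise ValueError("expected x-y, x, x-, or -y")
        x = int(left) if left else 0            # "-y" is short for "0-y"
        y = int(right) if right else math.inf   # "x-" is short for "x-infinity"
    if x < 0 or x > y:
        raise ValueError("expected x-y with 0 <= x <= y")
    return x, y


def matches(days_since_last_seen, value):
    """True if a relay last seen that many days ago falls in the range."""
    x, y = parse_last_seen_days(value)
    return x <= days_since_last_seen <= y
```

Because relays and bridges that haven't been running in the past week
are not in the data at all, any range with x of 8 or higher matches
nothing, no matter how the range itself is parsed.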

>> Second, maybe you noticed that Onionoo has become faster lately.
>> The reason was a problem with Apache's disk cache, causing all
>> ~1M/h requests to go through the Java process.  That's what you
>> can see in the last few weeks in the following graph.  We're now
>> back to a few 10k/h.
>> 
> 
> Yeah I noticed that, it's great!

Aaaaand, it has become even faster after moving to much faster
hardware and using a different web cache in front of the Onionoo web
server.

>> https://people.torproject.org/~karsten/volatile/onionoo-total-requests-2016-06-05.png
>> 
>> I wonder how your stack compares to these new statistics:
>> 
>> Request statistics (2016-06-06 11:20:55, 3600 s):
>>   Total processed requests: 15599
>>   Most frequently requested resource: details (15278),
>>     summary (118), bandwidth (99)
>>   Most frequently requested parameter combinations:
>>     [lookup, fields] (14474), [lookup] (722), [search] (245)
>>   Matching relays per request: .500<2, .900<2, .990<4, .999<16384
>>   Matching bridges per request: .500<1, .900<1, .990<1, .999<8192
>>   Written characters per response: .500<256, .900<512,
>>     .990<16384, .999<16777216
>>   Milliseconds to handle request: .500<16, .900<16, .990<64, .999<256
>>   Milliseconds to build response: .500<4, .900<16, .990<512,
>>     .999<16384
>> 
> I did some benchmarks that I reported: did you take a look at
> them?

Briefly, but I had some trouble opening the .zip file using an online
service and was then confused by the many files.  Would you mind
uploading non-zipped files to a web server, and can you include a
short explanation of what I should be looking at?

> I'd like to reproduce the numbers posted above: do you have a
> script or a methodology to do that? The only thing I was able to
> find was the grunt-api-benchmark that I used, but I'm not sure how
> those numbers compare with yours, partly because I'm not sure I
> understood those metrics:
> 
>> Matching relays per request: .500<2, .900<2, .990<4, .999<16384
> 
> means that in .500 milliseconds you returned 2 relays, then in
> .900 milliseconds you returned 2 nodes, and so on?

Ah, sorry, these stats are indeed not self-explanatory.  The above
line means that 50% (.500) of all requests had fewer than 2 relays
matching the query, which means 0 or 1.  Same goes for 90%.  99% had
fewer than 4 relays, which means that somewhere between 90 and 99%
there was more than 1 relay in the result.  Finally, 99.9% of
requests had fewer than 16Ki relays in their results.  The reason why
these are all powers of 2 is that it was cheaper to store statistics
in memory.
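
In case it helps to see the idea in code, here is a rough Python
sketch of how such a line can be produced.  This is not the Onionoo
code (which is part of the Java server); the names bucket_bound and
format_percentiles are made up, and the exact rounding is my
assumption.  Each value is counted in the bucket of the smallest power
of 2 strictly greater than it, so only one counter per bucket has to
be kept in memory.

```python
from collections import Counter


def bucket_bound(value):
    """Smallest power of two strictly greater than a non-negative integer."""
    return 1 << value.bit_length()


def format_percentiles(values, permilles=(500, 900, 990, 999)):
    """Format percentiles as upper bounds, e.g. ".500<2" for the median."""
    # Only one counter per power-of-two bucket is kept, not the raw values.
    counts = Counter(bucket_bound(v) for v in values)
    total = sum(counts.values())
    parts = []
    for permille in permilles:
        # Number of requests that must fall below the reported bound
        # (integer ceiling division, to avoid floating-point rounding).
        needed = (permille * total + 999) // 1000
        seen = 0
        for bound in sorted(counts):
            seen += counts[bound]
            if seen >= needed:
                parts.append(f".{permille:03d}<{bound}")
                break
    return ", ".join(parts)


# Ten requests: one matched 0 relays, eight matched 1, one matched 3.
print(format_percentiles([0, 1, 1, 1, 1, 1, 1, 1, 1, 3]))
# prints: .500<2, .900<2, .990<4, .999<4
```

Here ".500<2" says that at least half of the requests matched fewer
than 2 relays, which is the same reading as for the line quoted above.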

This code is implemented in the Onionoo server itself, so I'm afraid
there's no easy way to extract that and run it on your service.  But I
don't think you need to do that if you already have a working system
for measuring performance.

>> Third, when you say you want to throw the entire CollecTor
>> archive into Elasticsearch and have it indexed and searchable in
>> a useful manner, I think you're underestimating how much data is
>> available there.  Well, assuming similar computing resources,
>> that is.  But if you believe I'm wrong, here's a possible use
>> case: https://exonerator.torproject.org/.  Would you want to
>> build a prototype that throws all consensuses since 2007 into
>> Elasticsearch and lets users search by relay IP address or /24
>> and valid-after date +/- 1 day?
>> 
> I'd like to, when I've time :) It's on the TODO list however!

Cool!

>> Last, sorry for not responding to all details in this thread yet.
>> I appreciate your effort in contributing to Tor metrics, I'm just
>> not good at replying to long threads, even if they're really
>> interesting. The shorter the message and the easier to reply to,
>> the earlier you'll get a response, promised! :)
>> 
> 
> You're right, I'm more used to long emails. Let's start keeping
> those short! :)

Haha, cool.  And please also tell me when I'm starting to write
lengthy emails.

Enjoy your holidays!

All the best,
Karsten

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJXXE8xAAoJEC3ESO/4X7XBlxcIAJPqunCoTie+Oh+NDQSjN8hX
4nW4REzOuu2PVze7K0L/AUA51WbpM78zRmua7NMjv3LEXy5Nkk/XUAyoYSmUvdUX
r8orMnnSeMZmCHCGyeFF06LczjMIeUuHy0IEHmWogfOyMJyD5JffKCJbW4gIYUFd
0etB7TNgsZi4Ebm/bBPPlO05e2bUdh9HDhXz8ICunFcQT//B41nRaJhw1a5xYeQy
BioUQx3tybrzutiZpImj7cEzyg/SpBuBMH4P/qISYRu1gnvC5chOxy52byKHGddi
uAr6OhRDfsx/aGpY95A/UL4PuCw/wSlbyTLOfQl7ODTa45PDuU9lziDW2CIEFjA=
=TXFV
-----END PGP SIGNATURE-----

