[metrics-team] Hello from blackbird

Karsten Loesing karsten at torproject.org
Thu Feb 28 09:00:55 UTC 2019


Hi blackbird,

sorry for the late reply!

On 2019-02-22 23:05, Su Yu wrote:
> No worries. It works now!
> 
> I did some quick plotting in Jupyter notebook (see below; the figures
> are also attached separately). Regarding the relays vs. bridges question
> that you mentioned, it seems the bridges are better at keeping
> themselves not /too/ outdated, but they're actually not that different
> in keeping up-to-date?

Thanks for making these graphs! I find the results hard to interpret,
though.

One reason might be that these graphs cover a time frame of more than a
decade. A lot of things have changed over that time:

 - The network has grown a lot over the years, which means that recent
years have a greater weight in those graphs than distant years. This
doesn't have to be a bad thing, it's just probably not intended and
possibly surprising when interpreting the results.

 - Release cycles have changed, with a much shorter cycle in the last
year or two as compared to earlier years. This may skew results even more.
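To make the weighting point concrete, here is a minimal Python sketch. It
assumes rows of (year, relays on a recommended version, total relays); that
layout and the numbers are made up for illustration and are not the actual
columns or contents of recommended.csv. It compares the pooled fraction,
which is dominated by large recent years, with per-year fractions averaged
with equal weight.

```python
from collections import defaultdict

def pooled_and_per_year(rows):
    """rows: iterable of (year, on_recommended, total) counts.

    The row layout is hypothetical, not the real recommended.csv schema.
    """
    rec = defaultdict(int)
    tot = defaultdict(int)
    for year, on_recommended, total in rows:
        rec[year] += on_recommended
        tot[year] += total
    # Pooled fraction: every relay counts once, so big years dominate.
    pooled = sum(rec.values()) / sum(tot.values())
    # Per-year fractions, then a plain mean so each year counts equally.
    per_year = {y: rec[y] / tot[y] for y in sorted(tot)}
    equal_weight = sum(per_year.values()) / len(per_year)
    return pooled, per_year, equal_weight

# Made-up numbers, only to show the effect of network growth.
rows = [(2009, 500, 1000), (2018, 5400, 6000)]
pooled, per_year, equal_weight = pooled_and_per_year(rows)
```

With these invented numbers the pooled value sits much closer to the 2018
fraction than to the 2009 one, which is the skew described above.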

If I were to continue this analysis I'd try to look more at changes over
time. Things I'd look at:

 - How does the shorter release cycle affect update behavior? It's
probably useful to look at Tor's change log to get an idea when versions
have been updated, when versions have been sunset, and which versions
have long-term support.

 - Is there a way to distinguish relays and bridges using unattended
updates? This may require including external data when packages have
been updated. Many relays and bridges are running on Debian or other
Linuxes, so maybe there's a way to include package histories somehow.

> It would also be interesting to look at the regional data, etc.

That would be interesting, too. It requires me to write new code and run
it. Let's put it on a list of potentially interesting things to look at,
and I'll do it if we run out of interesting things to look at using the
currently available data.

Another interesting aspect might be to look at different groups of
relays and bridges: fallback directories, directory authorities,
hard-coded bridges, and so on. This requires writing new code, too.

Yet one more aspect is whether relays and bridges on dev versions behave
differently from those on non-dev versions. Dev versions are built from
source and have never been recommended. One would think that these
versions are better kept up-to-date, but it might also be the case that
folks build tor from source and then just leave it running forever.
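A first step for that comparison could be splitting version strings into
dev and non-dev groups. The Python sketch below assumes that development
builds can be recognized by a "-dev" suffix in the version string; the
helper names and the example version strings are only illustrative, not
taken from the data.

```python
from collections import Counter

def is_dev_version(version):
    """Return True if the version string ends in "-dev".

    Assumption: development builds carry a "-dev" suffix.
    """
    return version.strip().endswith("-dev")

def count_by_group(versions):
    """Count how many version strings fall into each group."""
    return Counter("dev" if is_dev_version(v) else "non-dev" for v in versions)

# Illustrative version strings, not real measurements.
sample = ["0.3.5.7", "0.4.0.1-alpha", "0.4.0.1-alpha-dev", "0.3.4.9-dev"]
groups = count_by_group(sample)
```

The same grouping could then be applied per day, so that the two groups can
be compared over time rather than in aggregate.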

So many things to look at!

> blackbird

All the best,
Karsten



> 
> 
> 
> Karsten Loesing <karsten at torproject.org> wrote on Fri, Feb 22, 2019
> at 2:08 PM:
> 
>     On 2019-02-22 19:10, Su Yu wrote:
>     > Sure - sorry, I forgot to do "reply all".
> 
>     No worries!
> 
>     > Thank you for sending me the data and code! Could you let me know
>     > what encoding the csv uses? I got an encoding error and could not
>     > read it using Python or on my Mac.
> 
>     Gah, I compressed the file using xz and somehow included the wrong link.
>     In fact, I'm surprised the link I gave you even worked, probably some
>     Apache magic at play. You can either rename the file to
>     recommended.csv.xz or download it using the correct link:
> 
>     https://people.torproject.org/~karsten/volatile/recommended.csv.xz
> 
>     Hope this works better. Sorry for the confusion!
> 
>     All the best,
>     Karsten
> 
> 
>     >
>     > Thanks,
>     > blackbird
>     >
>     > Karsten Loesing <karsten at torproject.org> wrote on Thu, Feb 21,
>     > 2019 at 4:14 PM:
>     >
>     >     Hi blackbird,
>     >
>     >     I'm adding the mailing list back, so that others in the team (or
>     >     whoever else is subscribed) can share their thoughts on this.
>     >
>     >     I just looked at my code, and I think I should clean that up a
>     >     little bit before making it available. It might not be as useful
>     >     for you in its current form.
>     >
>     >     What I can share at this point is the .csv file and the R code to
>     >     plot the graph I shared earlier:
>     >
>     >     https://people.torproject.org/~karsten/volatile/recommended.csv
>     >
>     >     https://people.torproject.org/~karsten/volatile/recommended-2019-02-02.R
>     >
>     >     Maybe you have ideas for visualizing this data in a more useful
>     >     way.
>     >
>     >     Of course, this data doesn't help with going deeper into the
>     >     questions I mentioned in my earlier reply. I could either clean
>     >     up my code or provide you with more detailed data if you tell me
>     >     what you need.
>     >
>     >     Thanks!
>     >
>     >     All the best,
>     >     Karsten
>     >
>     >
>     >     On 2019-02-21 18:38, Su Yu wrote:
>     >     > Hi Karsten,
>     >     >
>     >     > Thanks for your reply! Yes, this is definitely an interesting
>     >     > topic. I am happy to look at the data and code, and see what
>     >     > can be done from there.
>     >     >
>     >     > blackbird
>     >     >
>     >     > Karsten Loesing <karsten at torproject.org> wrote on Thu,
>     >     > Feb 21, 2019 at 10:55 AM:
>     >     >
>     >     >     On 2019-02-21 03:37, Su Yu wrote:
>     >     >     > Hello everyone,
>     >     >
>     >     >     Hello blackbird,
>     >     >
>     >     >     > My name is Elise (usually known online as “blackbird”
>     >     >     > (lower case)). I have been interested in working with
>     >     >     > Tor for some time; recently I met Alison at an event,
>     >     >     > and she kindly directed me here.
>     >     >
>     >     >     Glad to meet you here!
>     >     >
>     >     >     > A little about my background: I am a PhD student doing
>     >     >     > some data mining/machine learning-related work. My
>     >     >     > specializations are mainly in deep learning, network
>     >     >     > analysis, and data visualization. I write Python, know
>     >     >     > a little Java and R, and some misc languages. I would
>     >     >     > be most interested in doing some measurement of the Tor
>     >     >     > network’s structure, if possible.
>     >     >
>     >     >     I might have something. I started an analysis of tor
>     >     >     software versions in the Tor network three weeks ago, but
>     >     >     I can't seem to find the time to dig deeper into it.
>     >     >
>     >     >     I wonder if you'd like to pick this up, see if you can find
>     >     >     interesting insights in the data, make some fine graphs,
>     >     >     and tell us what you found?
>     >     >
>     >     >     Here's what I produced so far:
>     >     >
>     >     >     https://people.torproject.org/~karsten/volatile/recommended-2019-02-02.pdf
>     >     >
>     >     >     This graph shows how quickly relays and bridges in the Tor
>     >     >     network update their tor software versions. It shows this
>     >     >     for the entire network.
>     >     >
>     >     >     Maybe there are parts of the Tor network that update their
>     >     >     tor software versions faster? The bridges that are
>     >     >     hard-coded in Tor Browser come to mind, as do the directory
>     >     >     authorities and fallback directories shipped with the tor
>     >     >     software. Maybe relays on some operating systems update
>     >     >     their version faster than on others? Some countries earlier
>     >     >     than others? Home-run relays on dynamic IP addresses
>     >     >     differently from those run in data centers?
>     >     >
>     >     >     There might be others on this list with more questions on
>     >     >     this topic, all of which we cannot answer yet, because we
>     >     >     didn't do a thorough analysis yet.
>     >     >
>     >     >     I can provide you with data and code that I used for this
>     >     >     initial analysis.
>     >     >
>     >     >     > It is great meeting you, and I look forward to learning
>     >     >     > more about the team!
>     >     >
>     >     >     Curious whether you'll find this interesting!
>     >     >
>     >     >     > Best,
>     >     >     > blackbird
>     >     >
>     >     >     All the best,
>     >     >     Karsten
>     >     >
>     >
>     >
> 
> 

