tl at rat.io
Wed Nov 16 01:07:08 UTC 2016
Karsten asked me back in January to investigate possible solutions to this problem, and I reported back to the metrics-team mailing list. That mail contains a more detailed evaluation of technical approaches and available solutions.
The technique behind this is called "isomorphic". It has been a hot topic during the last 2 years and is now implemented in many major frameworks - see below for some background.
R/Shiny is probably an easy route to take in the short term, but besides the lack of a static version it also IMHO has too many lock-ins. R/Shiny only works with R, and while R is very common it’s not the only data manipulation language around. Any meaningful data manipulation tool can serialize to CSV or JSON, and that’s all you need to feed a visualization engine like D3.js (D3.js is the predominant visualization framework on the web, but there are others too that are less powerful and less intimidating).
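To make the CSV/JSON point concrete, here is a minimal sketch in Python, standing in for whatever analytics language is actually used. The rows and field names are made up for illustration and are not the actual Tor Metrics schema.

```python
import csv
import io
import json

# Hypothetical sample rows; the field names are illustrative assumptions,
# not the real Tor Metrics data format.
ROWS = [
    {"date": "2016-11-14", "country": "de", "users": 1200},
    {"date": "2016-11-15", "country": "de", "users": 1350},
]


def to_csv(rows):
    """Serialize a non-empty list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the same rows to a JSON array of objects."""
    return json.dumps(rows)
```

Either output can be served as a plain file, so the visualization side never needs to know which tool produced it.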
Loose coupling of the analytics engine (whichever, as long as it exports at least CSV or JSON), the web framework (in any case isomorphic, maybe AngularJS because of Apache Zeppelin) and a visualization engine (like D3.js) would provide a much more flexible and resilient solution.
Regarding Teor’s question whether client-side rendering provides anonymity advantages: the size of the data to be transferred to the client is not to be underestimated, and it grows quickly. If you need hourly data, then the volume is 24x that of daily data. If you want to look at it by country, then you might have to transfer 2 orders of magnitude more data (of course it all depends on how elaborately the application is structured and on how outlandish your information desire is). As a rule of thumb you can expect to quickly be in the hundreds of KB and low MB figures. It doesn’t seem efficient to try to hide your real interest in a noisy web app. You should probably rather download the data in big chunks and analyze it in a local application.
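The rule of thumb above can be checked with a back-of-the-envelope calculation. This is only a sketch: the 20 bytes per data point and the figure of 100 countries are assumptions picked for illustration, not measured values.

```python
def estimate_bytes(days, hourly=False, countries=1, bytes_per_point=20):
    """Rough transfer size for one time series.

    bytes_per_point is an assumed average size of one serialized value
    (timestamp plus number); real figures depend on the format used.
    """
    points_per_day = 24 if hourly else 1
    return days * points_per_day * countries * bytes_per_point


one_year_daily = estimate_bytes(365)
one_year_hourly = estimate_bytes(365, hourly=True)
one_year_daily_by_country = estimate_bytes(365, countries=100)
one_year_hourly_by_country = estimate_bytes(365, hourly=True, countries=100)
```

Under these assumptions one year of data comes to roughly 7 KB daily, 175 KB hourly, 730 KB daily by country and 17.5 MB hourly by country.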
Regarding another question of Teor’s: any solution will produce a visualization embedded in a webpage that you can link to just as you do now. But right now you have no guarantee that the graph pictured on the page you're linking to won’t change (although I don’t know about the PNGs on the metrics site themselves - maybe they are preserved). Providing a link to a visualization of precisely some chunk of data with specific parameters is in principle possible with an interactive solution. In practice, though, it depends on the framework - not all of them provide a URL for every state in interactive web applications, and the addition of D3.js makes things even more complicated. This would have to be evaluated.
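The idea of "a URL for every state" can be sketched independently of any framework: put the graph parameters into the query string and read them back when the page loads. The base URL and parameter names below are hypothetical, and whether a given framework supports this cleanly is exactly what would have to be evaluated.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def state_to_url(base_url, state):
    """Encode graph parameters into a shareable URL.

    Keys are sorted so that the same state always yields the same URL.
    """
    return base_url + "?" + urlencode(sorted(state.items()))


def url_to_state(url):
    """Recover the graph parameters from a URL built by state_to_url."""
    return dict(parse_qsl(urlsplit(url).query))


# Hypothetical graph state and host name, for illustration only.
STATE = {
    "graph": "userstats",
    "start": "2016-08-01",
    "end": "2016-11-01",
    "country": "de",
}
URL = state_to_url("https://metrics.example/graph", STATE)
```

Because the start and end dates are part of the URL, a link built this way keeps pointing at the same chunk of data instead of a moving window.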
(full disclosure: I went under 3 different handles in the past - "tomlurge", "thms" and "oma" - which is entirely my fault and not cool. But it’s all just me.)
There’s also some CMS discussion mixed in - maybe just try to ignore that.
The question on security was resolved later but might surface again with
every new JS library needed.
frameworks that did a lot of work in the browser which before was done on
the server. It was a logical evolution of AJAX/web 2.0 and made websites
ever richer in functionality and shininess. But as more and more
towards the browser initial page load became slower, and slow initial page
loads turn customers away. The solution to this problem was called the
"isomorphic" web framework: the framework supports rendering the page in the
browser or just as well, with the same logic, on the server (like in the old
and code on the browser and let it do the work or serving an HTML page. This
technique is a solution to the problem described above when only the
startpage and some initial state is rendered server-side and enables very
fast initial page load in the browser while in the background also the JS
libraries and additional data get transferred and, transparent to the user,
the page in the browser becomes interactive. Because this technique provides
the best of both worlds and is a solution to a real problem - losing users
because of long page loads - it has been implemented in many major web
frameworks during the last 2 years.
D3.js is the most prevalent of those libraries and there are examples around
on how to generate SVG graphs with D3 either on the server or make them
render on the client. The whole solution is not without complexities but
it’s not rocket science either.
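To show the server-side half of that idea without depending on D3 or Node.js, here is a minimal sketch that turns a list of numbers into a static SVG line chart. Python only stands in for whatever runs on the server; the function name and layout are made up for illustration and are not how D3 does it.

```python
from xml.sax.saxutils import escape


def render_svg_line_chart(values, width=400, height=200, title="graph"):
    """Render a list of numbers as a static SVG polyline, no script needed."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    step = width / (len(values) - 1)
    # SVG's y axis points down, so larger values get smaller y coordinates.
    points = " ".join(
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
        f"<title>{escape(title)}</title>"
        f'<polyline fill="none" stroke="black" points="{points}"/>'
        "</svg>"
    )
```

The resulting markup can be embedded in a page or saved as a file, and it displays the same whether or not scripts are allowed in the browser.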
An interesting route to examine might be Apache Zeppelin: that is a tool to
visualize data and generate interactive, shareable "notebooks" about findings
in the data. Those findings (data and visualizations) can also be sent
around and embedded in web pages.
Technically interesting is that it is not primarily a web tool but it is built
with web technologies, namely D3.js and the AngularJS web framework.
So far I haven’t had time to check for myself but maybe there’s a way to
drive both a website and that tool with the same data that provides a
- from static images of graphs
- to interactive versions on the web
- to full blown locally run analytics.
That to me seems like a path worth investigating with respect to the website
and generally better accessibility of metrics data.
 On a website that is static at first, these graphs can be rendered as SVG
additional interactive controls that allow them to tweak parameters of the
enabled, those controls don’t show up and the graph remains static.
> On 14 Nov 2016, at 12:07, Karsten Loesing <karsten at torproject.org> wrote:
> Hi teor,
> On 14/11/16 10:34, teor wrote:
>>> On 14 Nov. 2016, at 02:54, Karsten Loesing
>>> <karsten at torproject.org> wrote:
>>> Hi everyone,
>>> I'm moving this (old but recently continued) thread from an
>>> internal mailing list to this list as suggested by Roger. Any
>>> feedback would be most useful by Wednesday, November 16.
>>> All the best, Karsten
>>> On 07/12/15 14:09, Karsten Loesing wrote:
>>>> Hi everyone,
>>>> we're discussing changing the graphing engine on Tor Metrics
>>>> from generating static images on the server using R/ggplot2
>>>> on the client.
>>>> As a result, Tor Metrics will require Tor Browser users to
>>>> switch to Medium-High Security or lower.
>>>> This switch has some potential with respect to visualizations,
>>>> and the visualization people in the metrics team want to do it.
>>>> It allows producing more graphs like Lunar's bubble graph:
>>>> In fact, it would allow all kinds of visualizations like the
>>>> ones seen in the D3 gallery:
>>>> So, my question is: can you all live with this change?
>>>> The next step will be to ask users on tor-talk@, but I figured
>>>> I should ask here first. If I don't hear any strong objections
>>>> by Thursday, I'll go ask on tor-talk at .
>>>> All the best, Karsten
>>> On 07/12/15 15:22, somebody else wrote:
>>>> fallbacks, correct?
>>>> Likewise, I assume it would be too difficult (for now (?)) to
>>>> on the server and serve them as images in that case? 
>>>>  I'm hand-waving about whether or not this is possible, but
>>>> it seems like it in theory, and at least one person has tried:
> On 08/12/15 00:34, somebody wrote:
>>>> I don't understand why you would want Tor users to lower their
>>>> security in order to get prettier graphs. Is there some
>>>> obvious thing I am missing?
>>>> of web sites I access (via NoScript), and with third party
>>>> PrivacyBadger and sometimes RequestPolicy). Having web sites
>>>> run their own choice of programs in my computer seems like a
>>>> really stupid idea. I can only assume that its prevalence as a
>>>> technique reflects how little respect the average web site has
>>>> for the security of its users.
>>> On 09/12/15 08:45, another person wrote:
>>>> I'd like to know how much work it would be.
>>> On 09/12/15 13:59, Karsten Loesing wrote:
>>>> fallbacks written in a different language would be infeasible.
>>>> But I could imagine that we render D3.js graphs on the server
>>>> using Node.js and return images to the client.
>>>> I'm mostly worried about the lack of interactivity. I'd
>>>> really want to get away from requiring a full round-trip from
>>>> client to server and back just to change something in the
>>>> displayed graph. Maybe we can keep them somewhat interactive,
>>>> or at least add tooltips, by using SVG as image format.
>>>> But please understand that while I'm doing okay at processing
>>>> large amounts of data, I don't know much about web development.
>>>> If people here have ideas, please let me know!
>>>> Note that investigating these options may take a while, mostly
>>>> because people in the metrics team are busy. But we won't
>>>> Thanks for the feedback. This is very helpful!
>>>> All the best, Karsten
>>> On 12/11/16 16:37, Karsten Loesing wrote:
>>>> Hello everyone,
>>>> apologies for digging out such an old thread, but after almost
>>>> starting a new thread on the exact same topic I remembered
>>>> that we had this discussion almost a year ago. I figured it's
>>>> better to continue this thread to avoid discussing the exact
>>>> same things again and instead add new thoughts below.
>>>> So, on Thursday, Linda, iwakeh, and I met in Berlin to talk
>>>> about making the Tor Metrics website more usable. We came up
>>>> with some pretty good ideas to better address the various user
>>>> types---from journalists to data scientists---coming to the Tor
>>>> Metrics website to learn interesting things about the deployed
>>>> Tor network. We'll share results with you in a few weeks from
>>>> now when there's something to see and click on.
>>>> once more.
>>>> To be clear, our immediate plans---including reorganizing
>>>> information and displaying sparklines as quick entry points
>>>> even our planned improvements might work without
>>>> dashboard with user-selected graphs---can be made to work
>>>> The question is: do we have to keep stretching by avoiding a
>>>> web technology that would make our lives and the lives of our
>>>> users so much easier?
>>>> This isn't really something that we can work around by
>>>> want us to switch to another graphing framework such as R Shiny
>>>> (which I used for the webstats prototype) or D3.js (which we
>>>> currently use on Tor Metrics just for the bubbles graph, though
>>>> not in the most efficient way).
>>>> - From a development perspective this switch would make a lot
>>>> of sense, because we'd have to write a lot less code for new
>>>> graphs and because there'd be potential contributors out there
>>>> who'd appreciate working with a known framework. Our current
>>>> graphing engine doesn't scale much longer, and this does slow
>>>> us down.
>>>> because we can. I'm in favor of keeping all the parts of Tor
>>>> Metrics where we provide textual or static information entirely
>>>> But the parts of Tor Metrics where we're providing
>>>> visualizations and letting users explore our data would require
>>>> we shouldn't be maintaining two graphing engines.
>> Are there any privacy or security advantages to having client-side
>> For example, if we download data from the server, and then render
>> what the client is requesting and visualising.
>> improve client privacy in this way? (Or does it cost too much
>> effort or too much bandwidth to pull down larger datasets just to
>> hide what the client is looking at?)
> Fine question. I assume most Tor Metrics users don't care that much
> about leaking to the server what part of the data they're looking at.
> And I assume that those who do might download the CSV files and look
> at them locally. But my assumptions might be wrong.
> something like Shiny, graphs will still be generated on the server,
> and there wouldn't be any difference with regard to client privacy.
> If we use D3.js, the browser downloads the data it needs and produces
> a graph locally.
> I think, overall, I wouldn't add Tor Metrics client privacy towards
> the server as a hard requirement, because we already have too many of
> those. If the solution we decide on offers more client privacy than
> the current solution, great. But if it doesn't, okay.
>>>> So, how do we decide this? I believe that this should be a
>>>> Tor-wide decision. My main worry is that we're sinking weeks
>>>> and weeks of development effort into this switch without many
>>>> Tor people noticing, and then once we publish and they get
>>>> aware, we need to roll back, wasting all the effort.
>>>> But to be honest, we're wasting effort right now by keeping the
>>>> workarounds and implementing hacks with dynamically generated
>>>> HTML forms and potentially dozens of parameters just to avoid
>>>> Here's my suggestion: unless somebody raises a valid concern
>>>> by next Wednesday, November 16, we're putting this on the hold
>>>> again. (That's the day before the next metrics team meeting and
>>>> Vegas meeting, and it should be enough time to raise your
>>>> Otherwise we'll switch.
>>>> Oh, and if you're in favor of switching, please consider
>>>> saying that, too. Thanks.
>>>> All the best, Karsten
>>> On 13/11/16 05:04, Roger Dingledine wrote:
>>>> Does this mean that we'd be breaking the "download a static
>>>> version of the graph" feature too? I use that feature a lot for
>>>> grabbing snapshots to put into presentations, and we also want
>>>> to use it for pulling in metrics graphs in blog posts, e.g.
> as well as for external journalists.
>>>> Oh, I should also say this would be a great topic for the
>>>> tor-project list, since it doesn't need to be sekrit (and
>>>> since then other people could know that it's a topic we're
>>>> considering, and maybe even help us make the right decision).
>>> On 13/11/16 10:08, Karsten Loesing wrote:
>>>> It's quite easy to implement that feature using Shiny, for
>>>> example. Either Shiny would produce a .png file that you can
>>>> download using your browser's "Save Image As..." feature, or
>>>> there would be a "Download Graph" button. It would be just a
>>>> few lines of code.
>>>> And we'd be able to provide features like a "Download graph
>>>> data" button with just a few more lines of code, which would
>>>> require a lot more effort right now.
>> "Download Graph" or "Download graph data" buttons?
>> Would it be possible to give others a URL for a specific graph,
>> without having to save the graph on some other site? Would users
>> If we can't do this, it would be bad for how I and many others
>> typically use Tor Metrics on mailing lists.
> Right, very good point. I don't know Shiny enough yet to give you an
> answer with absolute certainty, but I think that we should be able to
> provide a page with a static image that can be used on mailing lists.
> That page would just have no controls for further customizing the
>> (And if we do allow this feature, I'm not sure how that's any
>> different from server-side rendering of graphs for clients that
> One major difference is that Shiny, for example, allows us to generate
> a user interface with just a few lines of code, whereas we'd otherwise
> have to write and maintain that user interface ourselves. And I'm not
> very optimistic to find a framework similar to Shiny that works
> requirement for most websites.
>> I'm also not sure what you mean by "tables" or "Download graph
>> data" - will there still be CSV data downloads available?
>> tables, and the raw data in the static CSVs?
> The following is not at all final yet, but ideally, all parts that are
> updated twice per day in the background can be obtained without
>> In general, I would prefer to be able to use the Tor Metrics
>> off, and then also provide websites where they need to turn it on.
>> But if the advantages outweigh the security and consistency
>> considerations, I'd be ok with it.
>> (Just like I use Atlas, and the Metrics Bubble graphs.)
> Right, this possible inconsistency is what has stopped me in the past.
> And I think if we really want to be consistent here, we'll have to
> rewrite Atlas and the Metrics bubble graphs to not require client-side
> browser and rewrite those parts.
> But do we really want this? If we recommend that people turn
> turn it on again, isn't that inconsistent, too? I think no, because
> we acknowledge that not everybody has the same requirements. And in
> this case I believe that the target audience of Tor Metrics is not
> exactly the same as the set of high-security Tor Browser users.
> Tor Metrics case might influence future decisions for other
> torproject.org websites. Not an easy decision.
>> have some level of basic functionality, even if it's only static
>> images and tables (which I think should be available for offline
>> use as well).
> Right. Two examples for parts of Tor Metrics that would require
> change lines, add new lines, and so on, and (2) a dashboard where
> users can configure the graphs they want to see next to each other.
> website, learn about where the data comes from, download the raw or
> pre-processed data, and contemplate the graphs posted on mailing lists.
> Thanks for your input here!
> All the best,