Hey!
https://2019.www.torproject.org/docs/running-a-mirror.html.en indicates that the website and distribution directory currently require 30GB and to expect up to 50GB.
dist is currently over 80GB. Is this normal/expected?
Just wondering if this is temporary, or if I should provision a bit more disk space?
On Fri, Jul 23, 2021 at 10:40:30AM -0600, Dave Warren wrote:
https://2019.www.torproject.org/docs/running-a-mirror.html.en indicates that the website and distribution directory currently require 30GB and to expect up to 50GB.
dist is currently over 80GB. Is this normal/expected?
Yes. It depends mainly on how many versions of Tor Browser are on dist at once, and with a stable version and an alpha version, and new releases coming out, sometimes there are quite a few versions published at once.
Just wondering if this is temporary, or if I should provision a bit more disk space?
It's worse than that -- the running-a-mirror page that you point to is on the old website, and there is no equivalent on the new website. We have no plans currently for how to make good use of third-party website mirrors. We used to send them to people with gettor (for censored users who can't reach our main websites), but for now putting content on github and archive.org seems like an easier, more scalable approach.
So I think we have the old mirror operators in limbo wondering if it's useful to continue.
Is it? Should we shut this mirror thing down more thoroughly? Or try to rescue it to be useful in a new way?
--Roger
On 7/23/21 11:29 PM, Roger Dingledine wrote:
So I think we have the old mirror operators in limbo wondering if it's useful to continue.
Is it? Should we shut this mirror thing down more thoroughly? Or try to rescue it to be useful in a new way?
As you said, github, gitlab, archive.org are probably more scalable, and maybe harder to block (it's practically domain fronting). Not only that, but they aren't run by random people. And the Tor Project controls updates... for good... or bad [1].
I've talked to you before about mirrors on IRC, and I offered to take up the mirrors project. The web/mirrors page currently says there isn't a maintainer.
What I see is a nearly gone thing. No maintainer, outdated website, better?/other ways for distribution. I personally think, and I say this hosting a mirror [2], it should be shut down for good. People will probably continue to create new mirrors... I did. Is it worth their time and effort?
I'd love to hear what thoughts others have, if there is something, or some way this can be rescued, some way for it to be useful.
[1] I see it's still 10.0.14 on github.
[2] https://tor.encryptionin.space. No downloads, partly because they take up a damn lot of space, and partly because I also mirror support and point people at my own page when they click "how to verify TB". Can they trust me?
(Responding to a couple messages in one, since it is all quoted here anyway)
On Fri, Jul 23, 2021, at 18:12, hackerncoder wrote:
On 7/23/21 11:29 PM, Roger Dingledine wrote:
It's worse than that -- the running-a-mirror page that you point to is on the old website, and there is no equivalent on the new website. We have no plans currently for how to make good use of third-party website mirrors.
I'm aware it was outdated, but it was the most modern thing that I could find, so it was a reasonable starting point for discussion (and certainly would cause someone to correct me if there was a more recent reference).
I've never really seen the point of full website mirrors, but /dist/ download mirrors seemed like they could have value. And at least at one point there was enough traffic to justify them. The website mirror took approximately zero extra resources, so it didn't make sense not to set it up too.
When I needed a bunch of disk space for another project some moons ago I ended up proxying requests to my /dist/ to the official site with a little local cache. This worked pretty well, and at the time was pulling "enough" downloads per month that it felt like it was worth keeping, although there is no way to know how many are "real" (otherwise censored) users. Once things went back to normal I returned to mirroring normally, and don't currently track anything closely enough to get any indication of utilization.
Does anyone running a mirror have useful bandwidth/hits/something analytics? I could start tracking requested URLs and number of bytes transferred again easily enough myself too, but maybe someone has done the work.
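For anyone who wants to start collecting those numbers, here is a minimal sketch of per-URL hit and byte counts. It assumes the web server writes Common or Combined Log Format access logs; the `summarize` function name and the `/dist/` prefix default are only illustrative, and it says nothing about which requests come from "real" (otherwise censored) users.

```python
import re
from collections import defaultdict

# Pulls the request path, status code, and byte count out of a
# Common/Combined Log Format line (assumed log format, adjust as needed).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def summarize(lines, prefix="/dist/"):
    """Return {path: (hits, bytes_sent)} for requests under `prefix`."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or not match.group("path").startswith(prefix):
            continue  # unparseable line, or a request outside the mirror tree
        sent = match.group("bytes")
        entry = stats[match.group("path")]
        entry[0] += 1
        entry[1] += 0 if sent == "-" else int(sent)  # "-" means no body sent
    return {path: tuple(counts) for path, counts in stats.items()}
```

Feeding it an open log file, e.g. `summarize(open("access.log"))` with a hypothetical log path, gives one `(hits, bytes)` pair per requested file.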
As you said, github, gitlab, archive.org are probably more scalable, and maybe harder to block (it's practically domain fronting). Not only that, but they aren't run by random people. And the Tor Project controls updates... for good... or bad [1].
I must admit, with the list of mirrors being public anyway, I've wondered why someone actively trying to block Tor wouldn't just pull the mirror list and automate it into their firewall. On the other hand, there was legitimate traffic.
What I see is a nearly gone thing. No maintainer, outdated website, better?/other ways for distribution. I personally think, and I say this hosting a mirror [2], it should be shut down for good. People will probably continue to create new mirrors... I did. Is it worth their time and effort?
Sadly this is probably the case.
I was working on setting up a ClamAV mirror when they pulled the plug on that one and switched to Cloudflare, which makes sense for their use-case. I still run a SpamAssassin rule mirror, and I'm surprised they haven't done the same, although I think the pool of mirror operators is reasonably stable.
Tor needs to be distributed a bit more widely than any one single provider or CDN, but I certainly wouldn't start a project getting dozens of small/independent mirrors if it didn't already exist. On the other hand, since the infrastructure and volunteers are already here, I'm not sure if it makes sense to pull the plug? But the list of mirrors should be utilized somewhere, somehow.
It also occurs to me that if I were building something new today, some mechanism for tor nodes themselves to proxy http and https requests from the public internet would be relatively straightforward to implement, creating a wide network of sources for the files without requiring individual mirror operators, without replication, without disk space consumption, etc. But again, probably more trouble than it would actually be worth at this point.
On 7/28/21 12:32 AM, Dave Warren wrote:
Tor needs to be distributed a bit more widely than any one single provider or CDN
Which is what we kinda have. For those that can reach tpo.org, certainly, it is one provider. But for those that can't (or those that for some reason decide not to use tpo.org), there are at least 3 others.
On the other hand, since the infrastructure and volunteers are already here, I'm not sure if it makes sense to pull the plug? But the list of mirrors should be utilized somewhere, somehow.
That's the hard part. I don't think mirrors are going to be used for load balancing, at least not now. And so what are they good for?
"For people who can reach our website, we have our own webservers that we run. We've been making sure to scale up our webservers to be able to handle the people who want to look at our website and can reach it. So I don't think anybody is speaking of using mirrors from random internet volunteers to replace the website for those who can reach it." [1]
It also occurs to me that if I were building something new today, some mechanism for tor nodes themselves to proxy http and https requests from the public internet would be relatively straightforward to implement, creating a wide network of sources for the files without requiring individual mirror operators, without replication, without disk space consumption, etc. But again, probably more trouble than it would actually be worth at this point.
Isn't that just... kinda Tor? Or a 1 hop through an exit node. It sounds good, but if we are talking censorship, that won't work. And if users can access tpo.org, there isn't much reason for them to use this. It also puts strain on the network. And it doesn't sound like a CDN or in any way taking load off the tpo.org servers? So I don't see what it is supposed to do.
[1] https://gitlab.torproject.org/tpo/web/mirrors/-/issues/31990
On 2021-07-28 06:02, hackerncoder wrote:
It also occurs to me that if I were building something new today, some mechanism for tor nodes themselves to proxy http and https requests from the public internet would be relatively straightforward to implement, creating a wide network of sources for the files without requiring individual mirror operators, without replication, without disk space consumption, etc. But again, probably more trouble than it would actually be worth at this point.
Isn't that just... kinda Tor?
It would be a bootstrapping mechanism, using Tor's strengths to help people who don't (yet) have access to Tor get started with plain http and/or https.
Or a 1 hop through an exit node. It sounds good, but if we are talking censorship, that won't work.
No more or less than a public list of mirrors. And admittedly whether mirrors have any ongoing value or not is a part of this topic in general.
Some particularly dumb filters (literally everything DNS-based, which is all the rage) may block torproject.org specifically but not be designed to block widely. And of course DNS filters can't block any direct-to-IP URL.
A Twitter bot or other mechanism could provide users with a URL as needed, or any other existing mechanism that provides mirrors today.
And it doesn't sound like a CDN or in any way taking load off the tpo.org servers?
If load on the torproject.org infrastructure is a factor then the tor nodes could cache, although that was more than I was initially thinking. My guess is that only a handful of files are being actively used at any particular moment. But this would require some sort of configuration knobs to allow server operators to configure how much disk space and/or memory they're willing to allocate, and if local caching was required then this would need to be disabled by default.
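To illustrate the kind of knob meant here, a rough sketch of a byte-budgeted cache that is off unless the operator opts in. This is purely hypothetical: `BoundedCache`, `max_bytes`, and the `fetch` callback are made-up names, nothing like this exists in tor today, and a real version would cache to disk rather than memory.

```python
from collections import OrderedDict

class BoundedCache:
    """Least-recently-used cache limited by total bytes; max_bytes=0 disables it."""

    def __init__(self, max_bytes=0):
        self.max_bytes = max_bytes      # operator-configured budget (hypothetical knob)
        self.used = 0                   # bytes currently held
        self._items = OrderedDict()     # path -> body, oldest first

    def get(self, path, fetch):
        """Return the body for `path`, calling fetch(path) on a cache miss."""
        if path in self._items:
            self._items.move_to_end(path)   # mark as recently used
            return self._items[path]
        body = fetch(path)
        self._store(path, body)
        return body

    def _store(self, path, body):
        if self.max_bytes <= 0 or len(body) > self.max_bytes:
            return  # caching disabled, or the file alone exceeds the budget
        self._items[path] = body
        self.used += len(body)
        while self.used > self.max_bytes:
            _, evicted = self._items.popitem(last=False)  # drop least recently used
            self.used -= len(evicted)
```

With the default `BoundedCache()` every request goes to the origin; an operator willing to spend space would pass a budget such as `BoundedCache(max_bytes=2 * 1024**3)`.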
I don't believe the main download site has capacity problems, although I genuinely have no idea. If this was a significant factor, then tor nodes might even contact other tor nodes to pull from their cache rather than pulling it from somewhere central. But we're approaching recreating IPFS at this point, and frankly, it doesn't seem like a problem that needs to be solved.
If torproject.org load is a concern, having tor nodes cache updates for torbrowser's autoupdate mechanism would probably be beneficial too.
So I don't see what it is supposed to do.
Zero-effort dynamic mirrors that are always up to date, widely distributed, consuming no resources except when utilized.
Sure, tor node IPs are not secret, but neither are mirror URLs today.
To be clear, I'm not sure that this is actually useful in today's internet. I'm more thinking about what I would build today if there wasn't already a pool of mirrors and I wanted to develop something new.
tor-mirrors@lists.torproject.org