[tor-project] minutes from the sysadmin meeting

Antoine Beaupré anarcat at torproject.org
Thu Nov 19 16:44:33 UTC 2020


# Roll call: who's there and emergencies

gaba, hiro and anarcat on mumble, weasel (briefly) checked in on IRC.

No emergencies.

# BTCPayServer hosting

https://gitlab.torproject.org/tpo/tpa/team/-/issues/33750

We weren't receiving donations so hiro set up this service on
[Lunanode][] because we were in a rush. We're still not receiving
donations, but that's because of trouble with the wallet, which hiro
will resolve out of band.

[Lunanode]: https://www.lunanode.com/

So this issue is about where we host this service: at Lunanode, or
within TPA? The Lunanode server is already a virtual machine running
Docker (and not a "pure container" thing) so we need to perform
upgrades, create users and so on in the virtual machine.

Let's host it, because we kind of already do anyways: it's just that
only hiro has access for now.

Let's host this in a VM in the new Ganeti cluster at Cymru. If the
performance is not good enough (because the spec mentions SSD, which
we do not have at Cymru: we have SAS), make some room at Hetzner by
migrating some other machines to Cymru and then create the VM at
Hetzner.

hiro is lead on the next steps.

# Tor browser build VM - review requirements

https://gitlab.torproject.org/tpo/tpa/team/-/issues/34122 

Brief discussion about the security implications of enabling user
namespaces in a Debian server. By default this is disabled in Debian
because of concerns that the possible elevated privileges ("root"
inside a namespace) can be leveraged to get root *outside* of the
namespace. In the [Debian bug report discussing this][], [anarcat
asked][] why exactly this was still disabled and [Ben Hutchings][]
responded by giving a few examples of security issues that were
mitigated by this.

[Ben Hutchings]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898446#112
[anarcat asked]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898446#107
[Debian bug report discussing this]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898446

But because, in our use case, the alternative is to give root
directly, it seems that enabling user namespaces is a good
mitigation. Worst case, our users get root access, but that's no worse
than giving them root directly. So we are going ahead with granting
user namespace access.
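
As a rough illustration of what "enabling user namespaces" refers to:
on Debian kernels of that era, unprivileged user namespaces are gated
by the Debian-specific `kernel.unprivileged_userns_clone` sysctl. The
sketch below only *reads* that setting; the function name is
hypothetical, and the minutes do not say how the setting will actually
be deployed on the build VM.

```python
from pathlib import Path
from typing import Optional

# Debian-specific knob; kernels without the Debian patch do not expose it.
USERNS_SYSCTL = Path("/proc/sys/kernel/unprivileged_userns_clone")


def unprivileged_userns_enabled(path: Path = USERNS_SYSCTL) -> Optional[bool]:
    """Return True if the sysctl reads 1, False otherwise, None if the knob is absent."""
    try:
        value = path.read_text().strip()
    except FileNotFoundError:
        return None
    return value == "1"


if __name__ == "__main__":
    state = unprivileged_userns_enabled()
    if state is None:
        print("sysctl not present on this kernel")
    else:
        print("unprivileged user namespaces enabled:", state)
```

The `path` parameter exists only so the check can be exercised against
a scratch file instead of the live `/proc` entry.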

The virtual machine will be created in the new Cymru cluster, assuming
disk performance is satisfactory.

# TPA-RFC-7: root access policy

https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-7-root

Anarcat presented the proposal draft as sent to the team on November
9th. A few questions remained in the draft:

 1. what is the process to allow/revoke access to the TPA team?
 2. are the new permissions (to grant limited `sudo` rights to some
    service admins) acceptable?

In other services, we use a vetting process: a sponsor who already
has access files the ticket on behalf of the person; the person does
not request access themselves. That is basically how it works for TPA
as well. The
revocation procedure was not directly discussed and still needs to be
drafted.

It was noted that other teams have servers outside of TPA (karsten,
phw and cohosh for example) because of the current limitations, so
other people might use those accesses as well. It will be worth
talking with other stakeholders about this proposal to make sure it is
attuned to the other teams' requirements. The current issue with
Prometheus ([issue 40089][]) is a good counter-example, where service
admins do *not* require root on the servers.

[issue 40089]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40089

Another example is the `onionperf` servers that were set up elsewhere
because they needed custom `iptables` rules. This might not require
root but just `iptables` access, or at least special `iptables` rules
configured by TPA.

In general, the spirit of the proposal is to make the TPA team more
flexible about which changes we allow on servers. We want to help
teams host their servers with us but that also comes with the
understanding that we need the capacity (in terms of staff and
hardware resources) to do so as well. This was agreed upon by the
people present in the mumble meeting, so anarcat will finish the draft
and propose it formally to the team later.

# Roadmap review

Did not have time to review [the team board][].

[the team board]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

anarcat ranted about people not updating their tickets and was
(rightly) corrected that people *are* updating their tickets. So keep
up the good work!

We noted that the [top-level TPA board][] is not used for triage
because it picks up too many tickets, outside of the core TPA team,
that we cannot do anything about (e.g. the outreachy stuff in the
GitLab lobby).

[top-level TPA board]: https://gitlab.torproject.org/groups/tpo/tpa/-/boards

# Other discussions

## Should we rotate triage responsibility bi-weekly or monthly?

Will be discussed on IRC, by email, or in a later meeting, as we ran
out of time.

# Next meeting

We should resume our normal schedule of meeting on the first
Wednesday of the month, which brings us to December 2nd 2020, at
15:00 UTC, which is equivalent to: 07:00 US/Pacific, 10:00 US/Eastern,
16:00 Europe/Paris.
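
The conversions above can be double-checked with a short sketch. It
assumes fixed standard-time offsets (UTC-8, UTC-5, UTC+1), which hold
for these zones in early December; the names `OFFSETS` and
`local_times` are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

# Meeting time as stated in the minutes.
meeting_utc = datetime(2020, 12, 2, 15, 0, tzinfo=timezone.utc)

# Assumption: standard-time offsets in hours, valid in early December only.
OFFSETS = {"US/Pacific": -8, "US/Eastern": -5, "Europe/Paris": 1}


def local_times(when: datetime) -> dict:
    """Map each zone name to the local HH:MM for the given aware datetime."""
    return {
        name: when.astimezone(timezone(timedelta(hours=hours))).strftime("%H:%M")
        for name, hours in OFFSETS.items()
    }


if __name__ == "__main__":
    for name, local in local_times(meeting_utc).items():
        print(f"{local} {name}")
```

For any other date, a real time zone database (for example the
standard library's `zoneinfo`) would be the safer choice, since fixed
offsets ignore daylight saving time.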

# Metrics of the month

 * hosts in Puppet: 78, LDAP: 81, Prometheus exporters: 132
 * number of apache servers monitored: 28, hits per second: 199
 * number of nginx servers: 2, hits per second: 2, hit ratio: 0.87
 * number of self-hosted nameservers: 6, mail servers: 12
 * pending upgrades: 36, reboots: 0
 * average load: 0.64, memory available: 1.43 TiB/2.02 TiB, running processes: 480
 * bytes sent: 243.83 MB/s, received: 138.97 MB/s
 * planned buster upgrades completion date: 2020-09-16
 * [GitLab tickets][]: 126 issues including...
   * open: 1
   * icebox: 84
   * backlog: 32
   * next: 5
   * doing: 4
   * (closed: 2119)

Note that only two "stretch" machines remain and the "buster" upgrade
is considered mostly complete: those two machines are the SVN and Trac
servers which are both scheduled for retirement.

[GitLab tickets]: https://gitlab.torproject.org/tpo/tpa/team/-/boards

Upgrade prediction graph (which is becoming a "how many machines do we
have" graph) still lives at
https://help.torproject.org/tsa/howto/upgrades/

It is now also available as the main Grafana dashboard. Head to
<https://grafana.torproject.org/>, change the time period to 30 days,
and wait a while for results to render.

-- 
Antoine Beaupré
torproject.org system administration