[tor-bugs] #30880 [Internal Services/Tor Sysadmin Team]: document backup/restore procedures

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Jun 13 21:21:13 UTC 2019


#30880: document backup/restore procedures
-------------------------------------------------+-------------------------
     Reporter:  anarcat                          |      Owner:  anarcat
         Type:  task                             |     Status:  assigned
     Priority:  Medium                           |  Milestone:
    Component:  Internal Services/Tor Sysadmin   |    Version:
  Team                                           |
     Severity:  Normal                           |   Keywords:
Actual Points:                                   |  Parent ID:
       Points:                                   |   Reviewer:
      Sponsor:                                   |
-------------------------------------------------+-------------------------
 Backup system design and restore procedures are currently not well
 documented in our wiki. Try a few restores and document the heck out of
 this. The [http://opsreportcard.com/section/11 ops report card] recommends
 services be documented with a template like this:

  1. Overview: Overview of the service: what is it, why do we have it, who
 are the primary contacts, how to report bugs, links to design docs and
 other relevant information.
  2. Build: How to build the software that makes the service. Where to
 download it from, where the source code repository is, steps for building
 and making a package or other distribution mechanisms. If it is software
 that you modify in any way (open source project you contribute to or a
 local project) include instructions for how a new developer gets started.
 Ideally the end result is a package that can be copied to other machines
 for installation.
  3. Deploy: How to deploy the software. How to build a server from
 scratch: RAM/disk requirements, OS version and configuration, what
 packages to install, and so on. If this is automated with a configuration
 management tool like cfengine/puppet/chef (and it should be), then say so.
  4. Common Tasks: Step-by-step instructions for common things like
 provisioning (add/change/delete), common problems and their solutions, and
 so on.
  5. Pager Playbook: A list of every alert your monitoring system may
 generate for this service and a step-by-step "what to do when..." for each
 of them.
  6. DR: Disaster Recovery plans and procedures. If a service machine died,
 how would you fail over to the hot/cold spare?
  7. SLA: Service Level Agreement. The (social or real) contract you make
 with your customers. Typically things like Uptime Goal (how many 9s), RPO
 (Recovery Point Objective) and RTO (Recovery Time Objective).
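
 To make the "how many 9s" uptime goal in item 7 concrete, here is a
 minimal sketch of the arithmetic that turns a number of nines into a
 downtime budget. The function name and the 365-day default period are
 assumptions chosen for illustration; they come from neither the ticket
 nor the ops report card.

```python
def allowed_downtime_minutes(nines: int, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for an uptime goal of `nines` nines.

    Hypothetical helper for illustration: 3 nines means 99.9% uptime,
    i.e. an allowed unavailability fraction of 10**-3 over the period.
    """
    unavailability = 10.0 ** -nines
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * unavailability


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common uptime goals.
    for n in (2, 3, 4):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes/year")
```

 The same budget can be computed per month by passing, for example,
 period_days=30. RPO and RTO are separate targets (how much data can be
 lost, and how long a restore may take) and are not derived from the
 uptime goal.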

 While we don't use that template anywhere yet (and it somewhat conflicts
 with the [https://www.divio.com/blog/documentation/ documentation best
 practices]), we can probably find a middle ground of some sort...

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30880>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

