[tor-bugs] #32239 [Internal Services/Tor Sysadmin Team]: setup a cache frontend for the blog

Thu Oct 31 21:26:03 UTC 2019

#32239: setup a cache frontend for the blog
-------------------------------------------------+-------------------------
 Reporter:  anarcat                              |          Owner:  anarcat
     Type:  task                                 |         Status:  accepted
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:  #32090                               |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Old description:

> design docs in https://help.torproject.org/tsa/howto/cache/
>
> launch checklist:
>
>  1. alternatives listing and comparison (done)
>  2. deploy a test virtual machine by hand, say `cache-01.tpo` (done)
>  3. benchmark the different alternatives (done, ATS and nginx comparable)
>  4. setup secondary node with Puppet, say `cache-02.tpo` (done)
>  5. validation benchmark against both nodes (partial)
>  6. lower DNS TTL to 300 seconds, wait an hour (set TTL to 10min, waiting)
>  7. flip DNS to the cache node, wait and monitor for 5 minutes
>  8. raise DNS TTL back to 1h if all goes well.
>
> Disaster recovery:
>
>  1. flip DNS back to pantheon

New description:

 design docs in https://help.torproject.org/tsa/howto/cache/

 launch checklist:

  1. alternatives listing and comparison (done)
  2. deploy a test virtual machine by hand, say `cache-01.tpo` (done)
  3. benchmark the different alternatives (done, ATS and nginx comparable)
  4. setup secondary node with Puppet, say `cache-02.tpo` (done)
  5. validation benchmark against both nodes (done)
  6. lower DNS TTL to 10 minutes, wait an hour (done)
  7. lower DNS TTL to 3 minutes
  8. *add* one node to the DNS, check if traffic flows properly after 10 minutes
  9. add the other node to DNS, again checking traffic
  10. if all is well, remove backend from DNS
  11. raise DNS TTL back to 1h if all goes well.

 Disaster recovery:

  1. flip DNS back to backend
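
 The waits in the checklist follow from how long resolvers may keep a
 record cached: after any change, the old answer can still be served for up
 to the TTL that was in effect before the change. A minimal sketch of that
 arithmetic in Python, assuming resolvers honor TTLs and that the starting
 TTL is the 1h mentioned in the checklist; `rollout_waits` and the step
 labels are hypothetical names used for illustration only:

```python
from datetime import timedelta


def rollout_waits(initial_ttl, changes):
    """Return a list of (label, minimum wait) pairs, one per DNS change.

    `changes` is a list of (label, new_ttl) tuples; new_ttl is None when a
    change edits the record data but leaves the TTL alone. The minimum wait
    after a change is the TTL that was in effect *before* it, because a
    resolver may have cached the old answer just before the change.
    """
    waits = []
    ttl = initial_ttl
    for label, new_ttl in changes:
        waits.append((label, ttl))
        if new_ttl is not None:
            ttl = new_ttl
    return waits


# Hypothetical plan mirroring the checklist steps that touch DNS.
PLAN = rollout_waits(
    timedelta(hours=1),
    [
        ("lower TTL to 10 minutes", timedelta(minutes=10)),
        ("lower TTL to 3 minutes", timedelta(minutes=3)),
        ("add first cache node", None),
        ("add second cache node", None),
        ("remove backend", None),
        ("raise TTL back to 1h", timedelta(hours=1)),
    ],
)

if __name__ == "__main__":
    for label, wait in PLAN:
        print(f"{label}: wait at least {wait}")
```

 This gives a lower bound only: the 10 minute check after adding a node is
 more conservative than the 3 minutes computed here, which leaves room to
 watch traffic.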

--

Comment (by anarcat):

 the original node is now set up with Puppet as well. ran into a problem
 when trying to figure out hit ratios: those stats are available only in
 the commercial version.

 we might need to pipe the logs through mtail to get those metrics into
 Prometheus. in the meantime, maybe we can still launch without those? :/
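
 For illustration, the counting that such a log pipeline would have to do
 can be sketched in Python. This is not an mtail program and not the
 deployed setup: it assumes a hypothetical log format where the last field
 of each line is a cache status token such as HIT or MISS (nginx, for
 instance, can log one via `$upstream_cache_status`). The function name and
 the sample lines are made up:

```python
from collections import Counter

# Cache status tokens we count; lines ending in anything else are skipped.
KNOWN_STATUSES = {
    "HIT", "MISS", "EXPIRED", "BYPASS", "STALE", "UPDATING", "REVALIDATED",
}
# Assumption: only a plain HIT counts as served from cache.
HIT_STATUSES = {"HIT"}


def hit_ratio(lines):
    """Return (ratio, counts) for an iterable of access log lines.

    ratio is hits divided by all lines with a known cache status, or None
    when no such line was seen. counts is a Counter keyed by status.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        status = fields[-1].upper()
        if status in KNOWN_STATUSES:
            counts[status] += 1
    total = sum(counts.values())
    if total == 0:
        return None, counts
    hits = sum(counts[status] for status in HIT_STATUSES)
    return hits / total, counts


if __name__ == "__main__":
    # Hypothetical sample in the assumed "METHOD PATH CODE STATUS" format.
    sample = [
        "GET / 200 HIT",
        "GET /a 200 MISS",
        "GET / 200 HIT",
        "GET /b 200 HIT",
    ]
    ratio, counts = hit_ratio(sample)
    print(ratio, dict(counts))
```

 In a Prometheus setup the hit and miss counts would more likely be
 exported as two counters, with the ratio computed at query time.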

 the TTL is still low, and i am thinking of launching tomorrow if nothing
 else comes up. i've changed the procedure slightly to *add* the caching
 servers in the pool instead of replacing the backend completely. that way
 we have a smoother transition and can fall back more easily if something
 goes wrong.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32239#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online