[tor-bugs] #32239 [Internal Services/Tor Sysadmin Team]: setup a cache frontend for the blog

Tue Nov 5 21:15:20 UTC 2019

#32239: setup a cache frontend for the blog
-------------------------------------------------+-------------------------
 Reporter:  anarcat                              |          Owner:  anarcat
     Type:  task                                 |         Status:
                                                 |  accepted
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:  #32090                               |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Old description:

> design docs in https://help.torproject.org/tsa/howto/cache/
>
> launch checklist:
>
>  1. alternatives listing and comparison (done)
>  2. deploy a test virtual machine by hand, say `cache-01.tpo` (done)
>  3. benchmark the different alternatives (done, ATS and nginx comparable)
>  4. set up secondary node with Puppet, say `cache-02.tpo` (done)
>  5. validation benchmark against both nodes (done)
>  6. lower DNS TTL to 10 minutes, wait an hour (done)
>  7. open firewall (done)
>  8. lower DNS TTL to 3 minutes (done, around 2019-11-05 16:00:00)
>  9. point DNS to caches
>  10. raise DNS TTL back to 1h if all goes well.
>
> Disaster recovery:
>
>  1. flip DNS back to backend

New description:

 design docs in https://help.torproject.org/tsa/howto/cache/

 launch checklist:

  1. alternatives listing and comparison (done)
  2. deploy a test virtual machine by hand, say `cache-01.tpo` (done)
  3. benchmark the different alternatives (done, ATS and nginx comparable)
  4. set up secondary node with Puppet, say `cache-02.tpo` (done)
  5. validation benchmark against both nodes (done)
  6. lower DNS TTL to 10 minutes, wait an hour (done)
  7. open firewall (done)
  8. lower DNS TTL to 3 minutes (done, around 2019-11-05 16:00:00)
  9. point DNS to caches (done)
  10. raise DNS TTL back to 1h if all goes well.

 Disaster recovery:

  1. flip DNS back to backend
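
 The timing in the checklist can be sketched as simple arithmetic: after a
 TTL is lowered, resolvers may keep serving the old record for up to the
 previous TTL, and a rollback takes at most the current TTL to reach
 clients. This is only an illustration, assuming resolvers honor TTLs
 (not all do); the function names are hypothetical and not part of any
 TPA tooling.

```python
from datetime import datetime, timedelta


def earliest_safe_change(ttl_lowered_at: datetime, previous_ttl: timedelta) -> datetime:
    """Earliest time at which every well-behaved resolver has seen the new TTL.

    A resolver that cached the record just before the TTL was lowered keeps
    it for up to the previous TTL.
    """
    return ttl_lowered_at + previous_ttl


def worst_case_rollback(current_ttl: timedelta) -> timedelta:
    """Upper bound on how long a DNS rollback takes to reach clients.

    Assumption: resolvers expire cached answers after exactly the TTL.
    """
    return current_ttl


# Values taken from the checklist: TTL went from 10 minutes to 3 minutes
# around 2019-11-05 16:00:00.
lowered_at = datetime(2019, 11, 5, 16, 0, 0)
safe_at = earliest_safe_change(lowered_at, timedelta(minutes=10))
rollback = worst_case_rollback(timedelta(minutes=3))
print("safe to point DNS to caches after:", safe_at)
print("worst-case rollback delay:", rollback)
```

 With the 3-minute TTL in place, flipping DNS back to the backend should
 take effect within roughly 3 minutes for resolvers that respect it.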

--

Comment (by anarcat):

 i've reverted to the originally planned procedure where we just flip the
 switch, because it's simpler. i've also set up a `cache.tpo` alias that
 points to the cluster of machines, so we can move other sites in and out
 of rotation with a single CNAME instead of maintaining possibly multiple
 address records under each site's name.

 traffic now seems to be flowing into the nodes without noticeable
 problems. load is negligible:

 {{{
 Load average: 0.03 0.02 0.00
 }}}

 we have space for 12GB of cache on cache-02:

 {{{
 anarcat@cache-02:~$ df -h /var/cache/nginx/
 Filesystem         Size  Used Avail Use% Mounted on
 /dev/mapper/croot   19G  3.0G   15G  18% /
 }}}

 and ~7GB on cache01:

 {{{
 root@cache01:~# df -h /var/cache/nginx/
 Filesystem      Size  Used Avail Use% Mounted on
 /dev/sda1       9.8G  1.9G  7.4G  21% /
 }}}

 that's probably what we should pay closest attention to, actually, since
 it's not clear nginx will do the right thing when it runs out of disk
 space.
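
 one way to keep an eye on this would be a small check that reports the
 free fraction of the filesystem holding the cache. this is a minimal
 sketch, not something deployed on the cache nodes: the function name and
 the 10% threshold are assumptions made for illustration, and only the
 `/var/cache/nginx/` path comes from the `df` output above.

```python
import shutil
from typing import Tuple


def free_space_report(path: str, min_free_fraction: float = 0.10) -> Tuple[float, bool]:
    """Return (free_fraction, is_low) for the filesystem containing `path`.

    `min_free_fraction` is an arbitrary illustrative threshold, not a value
    taken from the ticket.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction < min_free_fraction
```

 calling `free_space_report("/var/cache/nginx/")` on a cache node would
 return the free fraction and a flag that a monitoring check could turn
 into a warning before the disk actually fills up.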

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32239#comment:13>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online