[tor-bugs] #21304 [Obfuscation/Snowflake]: Sanitize snowflake.log

Fri Mar 22 00:26:10 UTC 2019

#21304: Sanitize snowflake.log
-----------------------------------+--------------------------------
 Reporter:  arlolra                |          Owner:  cohosh
     Type:  defect                 |         Status:  needs_revision
 Priority:  Medium                 |      Milestone:
Component:  Obfuscation/Snowflake  |        Version:
 Severity:  Normal                 |     Resolution:
 Keywords:  starter                |  Actual Points:
Parent ID:                         |         Points:  1
 Reviewer:                         |        Sponsor:
-----------------------------------+--------------------------------
Changes (by dcf):

 * status:  needs_review => needs_revision
 * cc: dcf (added)

Comment:

 Thanks, this is looking pretty good.

 We'll need the same functionality across multiple programs: broker,
 client, proxy-go, server. I think you've what you've done so far—putting
 the tests in server and copying the code where needed—is good for
 prototyping. Let me suggest a further simplification: ''only'' worry about
 the server log for now, and then we can perhaps factor the safe-logging
 code into a separate package for use by the other programs.

 * [https://github.com/cohosh/snowflake/compare/ticket21304#diff-
 91bbeda7eb98a7adc57b9e47e2cf5c2bL283 In server],
   logs are scrubbed whether the `--log` option is used or not.
   [https://github.com/cohosh/snowflake/compare/ticket21304#diff-
 ba4bd8a4477426567c409d66c2cf8a28L383 In proxy-go] though,
   logs are only scrubbed when `--log` is used, otherwise they are left
 unscrubbed.
   Is there a reason for the difference?
   I think it's best to default to scrubbed in all cases.
   Maybe even [https://github.com/cohosh/snowflake/compare/ticket21304
 #diff-0efd2b39fd0c8010b9dd51b4f07edf1bL94 in client],
   we should run logs through the scrubber before sending them to
 `ioutil.Discard`,
   both to remind us to use the scrubber after we can rely on #26360 being
 fixed and re-enable non-file logging,
   and to eliminate one difference in the logging code paths
   (imagine a bug in the log scrubber that only manifests when logging to a
 file).
   Maybe use logic like this:
   {{{
 var logOutput io.Writer = os.Stderr
 if logFilename != "" {
         f, err := os.OpenFile(logFilename,
 os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0600)
         if err != nil {
                 log.Fatalf("can't open log file: %s", err)
         }
         defer f.Close()
         logOutput = f
 }
 //We want to send the log output through our scrubber first
 log.SetOutput(&logScrubber{logOutput})
   }}}
   In client, the initial value for `logOutput` can be `ioutil.Discard`
 rather than `os.Stderr`.
 * Speaking of which, the default log output in the absence of a `--log`
 option should be
   `os.Stderr`, not `os.Stdout`.
   [https://golang.org/pkg/log/#pkg-overview Ref]: "That logger writes to
 standard error..."
   There's now a [https://golang.org/pkg/log/#Logger.Writer Logger.Writer]
 method that would allow
   us to ask for the default output, but as it was only
 [https://github.com/golang/go/pull/28399 added in 1.12],
   it's probably still too new to use.
 * The `logScrubber.Write` method only looks at one buffer's worth of data
 at a time,
   so it fails when an IP address is split across buffers. For example,
 this test fails:
   {{{
 type byteAtATimeWriter struct {
         io.Writer
 }
 func (w *byteAtATimeWriter) Write(p []byte) (int, error) {
         total := 0
         for i := range p {
                 n, err := w.Writer.Write(p[i : i+1])
                 total += n
                 if err != nil {
                         return total, err
                 }
         }
         return total, nil
 }
 func TestLogScrubberSplit(t *testing.T) {
         log.SetFlags(0)
         var buff bytes.Buffer
         log.SetOutput(&byteAtATimeWriter{&logScrubber{&buff}})
         log.Print("test 1.2.3.4 test")
         if !bytes.Equal(buff.Bytes(), []byte("test X.X.X.X test\n")) {
                 t.Errorf("%q", buff.Bytes())
         }
 }
   }}}
   See below re `RedactErrorWriter` for an idea on how to deal with this.
 * You can use `!bytes.Equal(a, b)` instead of `bytes.Compare(a, b) != 0`.
 Try this:
   {{{
 gofmt -l -w -r 'bytes.Compare(a, b) != 0 -> !bytes.Equal(a, b)'
 server_test.go
   }}}
   Or, use `buff.String` and use string equality.
 * Should we scrub port numbers, too? A port number could be identifying
 within a short time span.
   See below for more ideas about this.
 * Consider structuring repetitive tests like this:
   {{{
 for _, test := range []struct {
         input, expected string
 }{
         {
                 "http: TLS handshake error from 129.97.208.23:38310:",
                 "http: TLS handshake error from X.X.X.X:38310:\n",
         },
         {
                 "http2: panic serving
 [2620:101:f000:780:9097:75b1:519f:dbb8]:58344: interface conversion:
 *http2.responseWriter is not http.Hijacker: missing method Hijack",
                 "http2: panic serving [X:X:X:X:X:X:X:X]:58344: interface
 conversion: *http2.responseWriter is not http.Hijacker: missing method
 Hijack\n",
         },
         {
                 //Make sure it doesn't scrub fingerprint
                 "a=fingerprint:sha-256
 33:B6:FA:F6:94:CA:74:61:45:4A:D2:1F:2C:2F:75:8A:D9:EB:23:34:B2:30:E9:1B:2A:A6:A9:E0:44:72:CC:74",
                 "a=fingerprint:sha-256
 33:B6:FA:F6:94:CA:74:61:45:4A:D2:1F:2C:2F:75:8A:D9:EB:23:34:B2:30:E9:1B:2A:A6:A9:E0:44:72:CC:74\n",
         },
 } {
         var buff bytes.Buffer
         log.SetOutput(&logScrubber{&buff})
         log.Print(test.input)
         if buff.String() != test.expected {
                 t.Errorf("%q: got %q, expected %q", test.input,
 buff.String(), test.expected)
         }
 }
   }}}
   (Not appropriate for ''every'' test, because sometimes you want to test
 more than one line.)
 * Instead of `X.X.X.X`,
 [https://gitweb.torproject.org/tor.git/tree/src/app/config/config.c?h=tor-0.3.5.8#n1102
 tor uses] `[scrubbed]`, as do
   [https://gitweb.torproject.org/pluggable-transports/meek.git/tree/meek-
 server/meek-server.go?h=0.31#n181 meek-server]
   and
 [https://gitlab.com/yawning/obfs4/blob/obfs4proxy-0.0.9/common/log/log.go#L42
 obfs4proxy].

 ----

 I found an old half-completed branch I had for this ticket, and published
 it just now:
   https://gitweb.torproject.org/user/dcf/snowflake.git/log/?h=redact-error

 I reached the same conclusion you did: that, because of
 [[comment:10|net/http default error logging]], string-based grepping for
 things that look like IP addresses is unavoidable.

 * I started with a
 [https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=redact-
 error&id=98d3dee3426ee75cebcf265c5e31493e66dccfa5 redactError] function
   that introspects a [https://golang.org/pkg/net/#OpError net.OpError]
   and replaces the address with a scrubbed address.
   Then, you have to remember to
 [https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=redact-
 error&id=8f30ef058199c119903cf85720a77d2d2a275e27 call redactError] on all
 `error`s before logging them.
   There are similar functions in [https://gitweb.torproject.org/pluggable-
 transports/meek.git/tree/meek-server/meek-server.go?h=0.31#n188 meek-
 server]
   and
 [https://gitlab.com/yawning/obfs4/blob/obfs4proxy-0.0.9/common/log/log.go#L151
 obfs4proxy].
 * `redactError` on its own is insufficient, because of net/http default
 logging.
   Also, I worry that we'll get an `error` that contains an IP address but
 is not a `net.OpError`.
 * So I added a
 [https://gitweb.torproject.org/user/dcf/snowflake.git/commit/?h=redact-
 error&id=4252b9b8412c372098cbb925a92182fdd80f1ef5 RedactErrorWriter] type
   that is the same idea as `logScrubber`: it hijacks the `Write` method
 and uses regexp to search for addresses.
   The principal difference is that it buffers full lines to solve the
 split-write issue mentioned above.

 Is it worth doing both, structure-based object manipulation and string-
 based patterns? Perhaps not, since the correct functioning of one can mask
 errors in the other. You could think of it as defense in depth, but it
 will be hard to consistently use a `redactError` function everywhere, when
 forgetting it doesn't have any observable consequences. Surely having both
 is safer, but the marginal safety increase of `redactError` may not be
 worth the (albeit small) code complexity increase.

 One other difference is this massive, yet conservative,
 [https://gitweb.torproject.org/user/dcf/snowflake.git/tree/server/error.go?h
 =redact-error#n58 IP address–detecting regexp]. I admit its appearance is
 comical, but it will detect IPv6 addresses not surrounded by brackets, and
 avoid false-positive matches like what you mentioned in comment:11. Also,
 it scrubs port numbers. I would say, test
 [https://github.com/cohosh/snowflake/blob/4b0acda984a5ae4335d206fc534f51efb9303d5d/server/server_test.go#L53
 your] and
 [https://gitweb.torproject.org/user/dcf/snowflake.git/tree/server/error_test.go?h
 =redact-error#n148 my] test cases against the pattern you already have;
 and if there are any misdetections, a more precise regexp may be in order.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21304#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online