[tor-bugs] #29734 [Obfuscation/Snowflake]: Broker should receive country stats information from Proxy and Client

Tor Bug Tracker & Wiki blackhole at torproject.org
Mon Apr 29 21:13:14 UTC 2019


#29734: Broker should receive country stats information from Proxy and Client
-------------------------------------+--------------------------------
 Reporter:  cohosh                   |          Owner:  cohosh
     Type:  enhancement              |         Status:  needs_revision
 Priority:  Medium                   |      Milestone:
Component:  Obfuscation/Snowflake    |        Version:
 Severity:  Normal                   |     Resolution:
 Keywords:  snowflake, geoip, stats  |  Actual Points:  2
Parent ID:  #29207                   |         Points:  1
 Reviewer:  ahf                      |        Sponsor:  Sponsor19
-------------------------------------+--------------------------------

Comment (by cohosh):

 Replying to [comment:24 dcf]:
 > I have just a few further changes to recommend.
 Updated branch: https://github.com/cohosh/snowflake/compare/geoip
 >
 > *
 https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/broker.go#L221
 >   {{{
 > strings.Split(r.RemoteAddr, ":")[0]
 >   }}}
 >   This will fail for IPv6 addresses. Better to use net.SplitHostPort and
 check the error return. The host–port splitting could also happen inside
 of `Metrics.UpdateCountryStats`.
 Ack >.< thanks. Done!
 > *
 https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/geoip.go#L196
 >   {{{
 > _, err = io.WriteString(hash, scanner.Text())
 >   }}}
 >   A better way to do this may be to do `hashedFile :=
 io.TeeReader(geoipFile, hash)` and then `scanner :=
 bufio.NewScanner(hashedFile)` so that the hash is calculated as a side
 effect of reading the file. There's no need to handle errors when writing
 to a `hash.Hash` because it is documented to never error.
 Done
 > *
 https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/geoip.go#L14
 >   {{{
 > Recognized line formats for IPv4 are:
 >     INTIPLOW,INTIPHIGH,CC
 >         and
 >     "INTIPLOW","INTIPHIGH","CC","CC3","COUNTRY NAME"
 >   }}}
 >   It looks like the code doesn't recognize the 5-element syntax, so it
 should be omitted here, or, if it's a common format, documented as not
 being supported.
 Documented as unsupported so we know what our line of thought was in case
 we want to support it later
 >
 > Okay. Let's replace the panic with a `log.Fatal` so the failure gets
 logged.
 >
 Done
 > > > * The `parseEntry` functions need error checking on
 `geoipStringToIP` and `net.ParseIP`, or else they will store a `nil` that
 will result in a panic later on.
 [https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L93
 This error] will be more useful if it includes a line number. Let me
 suggest a different factoring of the geoip parsing code. Have `parseEntry`
 be a plain function, not a method on `GeoIPv4Table`, and have it return
 `(GeoIPEntry, error)` rather than inserting into the table itself.
 GeoIPLoadFile can do the entry insertion, count lines and annotate the
 error result from `parseEntry`.
 > > The difficulty in refactoring it this way is that ipv4 and ipv6 tables
 have differently formated entries that need to be parse differently.
 > > I added error checking to parseEntry and have it return `(GeoIPEntry,
 error)` as suggested, but leave it as a method on GeoIPv4Table and
 GeoIPv6Table.
 >
 > I didn't explain this well. I meant that there should be separate
 functions for the two formats e.g. `parseEntryIPv4` and `parseEntryIPv6`.
 The existing `parseEntry` methods never refer to `table`, so they don't
 need to be methods. But leaving them as methods is fine too.
 Okay, I'm going to leave them as methods to avoid having two different
 LoadFile functions for the different types of table.
 >
 > IMO annotating the error message with the problematic line should be
 done in `GeoIPLoadFile`, not in `parseEntry`. This will eliminate the
 duplication of the common `"Provided geoip file is incorrectly formatted"`
 string and ensure that all the error paths get annotated. What I mean is,
 in the `scanner.Scan()` loop, you can replace `return err` with `return
 fmt.Errorf("Provided geoip file is incorrectly formatted: %s. Line is:
 %+q")`.
 Done, is there a better way of handling the error stubs in `parseEntry`
 other than returning `fmt.Errorf("")`?

 Replying to [comment:25 dcf]:
 > Oh and it looks like country stats don't get incremented whenever
 `GetCountryByAddr` doesn't find a match. I'm afraid this will make
 analysis difficult if a large fraction of proxies aren't covered by the
 geoip database, or the database is improperly installed, or something.
 Could we count them with a country code of `"??"` or something?
 >
 >
 https://github.com/cohosh/snowflake/blob/fb01c2d1c9d402cc4d96f01df2e6fb6cb37bc9a6/broker/geoip.go#L213
 > `GetCountryByAddr` returns `(string, error)`, but failure to find an
 entry in a database is not really an "error". It makes it seem like there
 are three return stats possible (found, not found, error) when really
 there are only two (found, not found). How about changing the signature to
 > {{{
 > func GetCountryByAddr(table GeoIPTable, ip net.IP) (string, bool)
 > }}}
 > where the second return value is an `ok` flag, like when indexing a map
 or reading from a channel. The caller can then do the `"??"` substitution
 whenever `!ok`.

 Okay, so I've changed the GetCountryByAddr signature to what you've
 suggested and then just removed the error return value from
 UpdateCountryStats. I've also added setting the country to `"??"` in
 UpdateCountryStats.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29734#comment:27>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the tor-bugs mailing list