Thank you, David. As I mentioned in the other thread, I'm not 100% sure whether Snowflake is the only service generating excessive DNS queries. But I'll keep an eye on it and let you guys know if I see something unusual.

I ran some Wireshark traces after compiling the proxy code on Windows last night.  I didn't see anything unusual. I'll try Linux later.


On Tue, Dec 27, 2022 at 8:34 AM David Fifield <david@bamsoftware.com> wrote:
On Mon, Dec 26, 2022 at 01:43:32PM -0500, Cecylia Bocovich wrote:
> On 12/26/22 00:55, John Selbie wrote:
> > While the Snowflake project has good intentions, it doesn't appear to
> > take my hosting costs into consideration. I'm hoping we can have a good
> > discussion on the following:
> >
> > 1) How many snowflake clients and proxies are active and how many STUN
> > requests is each generating towards stunprotocol.org? Do we think the
> > entire worldwide usage of Snowflake could be responsible for millions
> > of STUN queries to stunprotocol.org per day?
> >
> > 2) Expected number of DNS queries (it's a 3-day TTL on these DNS
> > entries, so it blows my mind that there are so many redundant requests).
> > Does Pion or any other part of the Snowflake code tend to go directly to
> > the name server itself?
> >
> > 3) Removing stun.stunprotocol.org as the default STUN server.
> >
> > OR...
> >
> > 4) Alternatively, I'm always open to accepting donations to help cover
> > the service costs of stunprotocol.org. I'm definitely not getting rich
> > running this thing.
>
> Thank you for reaching out. This is exactly the right place to discuss this.
> It was an oversight on my part not to reach out to you as the operator of
> our configured default STUN server, and I'm very sorry for the unexpected
> increased costs. We can absolutely remove stunprotocol.org as the default.

Likewise, thanks for reaching out. Snowflake has grown faster than
expected, in part because of a few crisis situations when it was one of
the few unblocked communications channels. Load on STUN servers was not much
of a consideration in the early days when the system was small, but
clearly it's something we need to think about now.

> 1) I would definitely believe the amount of snowflake traffic to
> stunprotocol.org to be this high. We have over 100,000 proxies. According to
> recent metrics[0], there are around 8 million matches a day and therefore
> that many WebRTC ICE gathering requests coming from just the proxies. The
> clients use a randomized subset of configured STUN servers, so the number is
> slightly different but it's safe to say they are also generating a few
> million STUN queries to your server.
>
> 2) I'm not sure about the DNS queries. It also surprises me that there are
> this many; I'll open an issue to investigate why.

My guess is that the proxies that fail to cache DNS are standalone
proxies (and to a lesser extent clients). The Go standard library may or
may not cache DNS, depending on how it was compiled and possibly even on
runtime conditions.

https://pkg.go.dev/net#hdr-Name_Resolution
        On Unix systems, the resolver has two options for resolving
        names. It can use a pure Go resolver that sends DNS requests
        directly to the servers listed in /etc/resolv.conf, or it can
        use a cgo-based resolver that calls C library routines such as
        getaddrinfo and getnameinfo. ... The resolver decision can be
        overridden by setting the netdns value of the GODEBUG
        environment variable to go or cgo ... or while building the Go
        source tree by setting the netgo or netcgo build tag.

When I run `GODEBUG=netdns=1 ./proxy`, it says
        go package net: dynamic selection of DNS resolver
Perhaps we should document `go build -tags netcgo` as the way to build
the proxy, in order to use the OS resolver which is more likely to cache
response records.

> Now for immediate next steps. I've sent an email to people internally to
> start the process of looking into sending some funds your way for the costs.
> We might not get an answer until after everyone is back at work in January.
> In the meantime:
>
> - I'd like to remove stunprotocol.org as the default STUN server for the
> proxies. The reason we added it to our list in the first place was that it
> implements RFC 5780, which we use on the client side to determine NAT
> mapping and filtering types (thank you for this implementation BTW!). The
> proxies no longer use this, however, so there's no reason we can't have
> them use any number of other public STUN servers.
>
> - We can remove stunprotocol.org from the list of *default* client STUN
> servers and reserve it for a small subset of users instead. This would
> really cut down on the traffic, but keep it open as an option for clients in
> places that have blocked other industry STUN servers.

Another thing to consider is reducing the default polling frequency of
standalone proxies and/or increasing the long-polling delay at the
broker.