debugging unbound on 'torexit' failing DNS queries


nusenu

18 Jan 2018

8:41 a.m.

<tor-admin@portaltodark.world> wrote:

...

Resent under the correct alias.

I'm having high amounts of failures on this VPS (PulseServers). I run a local unbound instance, and see an incredible amount of:

Jan 17 19:27:33 torexit unbound: [559:0] notice: sendto failed: Operation not permitted
Jan 17 19:27:33 torexit unbound: [559:0] notice: remote address is 198.97.190.53 port 53
Jan 17 19:27:33 torexit unbound: [559:0] notice: sendto failed: Operation not permitted
Jan 17 19:27:33 torexit unbound: [559:0] notice: remote address is 192.42.93.30 port 53
Jan 17 19:27:33 torexit unbound: [559:0] notice: sendto failed: Operation not permitted
Jan 17 19:27:33 torexit unbound: [559:0] notice: remote address is 192.35.51.30 port 53
Jan 17 19:27:33 torexit unbound: [559:0] notice: sendto failed: Operation not permitted

To give proportion to "incredible amount":

Jan 17 19:21:32 torexit rsyslogd: imjournal: 9897 messages lost due to rate-limiting
Jan 17 19:22:02 torexit journal: Suppressed 1216 messages from /system.slice/unbound.service
Jan 17 19:22:32 torexit journal: Suppressed 1209 messages from /system.slice/unbound.service
Jan 17 19:23:02 torexit journal: Suppressed 1827 messages from /system.slice/unbound.service
Jan 17 19:23:32 torexit journal: Suppressed 2333 messages from /system.slice/unbound.service
Jan 17 19:24:02 torexit journal: Suppressed 3029 messages from /system.slice/unbound.service
Jan 17 19:24:32 torexit journal: Suppressed 2822 messages from /system.slice/unbound.service
Jan 17 19:25:02 torexit journal: Suppressed 2715 messages from /system.slice/unbound.service
Jan 17 19:25:32 torexit journal: Suppressed 3166 messages from /system.slice/unbound.service
Jan 17 19:26:02 torexit journal: Suppressed 4093 messages from /system.slice/unbound.service
Jan 17 19:26:32 torexit journal: Suppressed 45878 messages from /system.slice/unbound.service
Jan 17 19:27:02 torexit journal: Suppressed 30125 messages from /system.slice/unbound.service
Jan 17 19:27:32 torexit journal: Suppressed 31764 messages from /system.slice/unbound.service
Jan 17 19:28:02 torexit journal: Suppressed 31229 messages from /system.slice/unbound.service

Could it be limits from the VPS provider on the amount of outbound udp/53 connections?

To me this looks more like a local problem. Are you doing any packet filtering on the host (outbound)?

Does DNS work on that host if you try manual queries?

From the IPs in your logs I assume your unbound is configured to query recursively itself (no upstream forwarding), which is good. Can you confirm that and provide your unbound config + iptables -vnL?

-- https://mastodon.social/@nusenu twitter: @nusenu_


Quintin

18 Jan 2018

5:06 p.m.

No outbound filters, this is my config:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m comment --comment "SSH" -s x.x.x.x -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m comment --comment "Tor" -m state --state NEW -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m comment --comment "Tor" -m state --state NEW -m tcp --dport 443 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

If I stop tor then "dig @127.0.0.1 google.com" works 100% of the time. It seems like the pattern is that when tor traffic builds up, so do DNS failures. And then my dig @127.0.0.1 only succeeds about 0.1% of the time. At this stage large numbers of these errors start appearing:

Jan 17 19:27:33 torexit unbound: [559:0] notice: remote address is 192.42.93.30 port 53
Jan 17 19:27:33 torexit unbound: [559:0] notice: sendto failed: Operation not permitted

Quintin

-- 0101100101000001010010000101011101000101010010000010000001000010 0100110001000101010100110101001100100000010110010100111101010101

teor

5:17 p.m.

...

On 19 Jan 2018, at 06:06, Quintin <tor-admin@portaltodark.world> wrote:

If I stop tor then "dig @127.0.0.1 google.com" works 100% of the time. It seems like the pattern is that when tor traffic builds up, so do DNS failures. And then my dig @127.0.0.1 only succeeds about 0.1% of the time.

Try setting RelayBandwidthRate to 95% of your link capacity. Then wait a few hours.

If you are still having issues:
* check if you have a lot of inbound connections from a small number of IPs,
* read recent threads for firewall rules to limit inbound connection load.

T
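As a rough illustration of the suggestion above, the following Python sketch turns a link capacity into a torrc line at 95% of that capacity. The 100 Mbit/s link speed and the function name are assumptions made for the example and do not come from this thread. The value is emitted as a plain byte count, which tor accepts with the unit `bytes`.

```python
def relay_bandwidth_rate_line(link_mbit_per_s: int, percent: int = 95) -> str:
    """Return a torrc line limiting relay traffic to `percent` of the link.

    link_mbit_per_s is the link capacity in megabits per second (10**6 bits).
    Integer arithmetic is used so the result is exact.
    """
    if link_mbit_per_s <= 0:
        raise ValueError("link capacity must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    link_bits_per_s = link_mbit_per_s * 1_000_000
    rate_bytes_per_s = link_bits_per_s * percent // 100 // 8
    return f"RelayBandwidthRate {rate_bytes_per_s} bytes"


if __name__ == "__main__":
    # Assumed example: a 100 Mbit/s link (not a figure from the thread).
    print(relay_bandwidth_rate_line(100))
```

The printed line can be pasted into torrc; any other link speed can be passed in the same way.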

nusenu

5:44 p.m.

Quintin:

...

No outbound filters, this is my config:

If I stop tor then "dig @127.0.0.1 google.com" works 100% of the time. It seems like the pattern is that when tor traffic builds up, so do DNS failures. And then my dig @127.0.0.1 only succeeds about 0.1% of the time. At this stage large numbers of these errors start appearing:

Do you reach your server's conntrack limit?

https://stackoverflow.com/questions/6240951/sendto-operation-not-permitted-n...

(you didn't say anything about unbound's config)

-- https://mastodon.social/@nusenu twitter: @nusenu_

Quintin

6:15 p.m.

...

Do you reach your server's conntrack limit?

The word conntrack never appears in my logs, so I don't think it's that. The ISP also requires this from tor exits: net.netfilter.nf_conntrack_max = 10000

...

Try setting RelayBandwidthRate to 95% of your link capacity.

Why 95%? Are you thinking to give it more bandwidth?

...

From the IPs in your logs I assume your unbound is configured to query recursively itself (no upstream forwarding), which is good. Can you confirm that and provide your unbound config + iptables -vnL?

Correct, unbound is recursive. Here's the config:

server:
    verbosity: 1
    statistics-interval: 0
    statistics-cumulative: no
    extended-statistics: no
    num-threads: 2
    interface-automatic: no
    do-ip6: no
    chroot: ""
    username: "unbound"
    directory: "/etc/unbound"
    log-time-ascii: yes
    pidfile: "/var/run/unbound/unbound.pid"
    harden-glue: yes
    harden-dnssec-stripped: yes
    harden-below-nxdomain: yes
    harden-referral-path: yes
    use-caps-for-id: no
    unwanted-reply-threshold: 10000000
    prefetch: yes
    prefetch-key: yes
    rrset-roundrobin: yes
    minimal-responses: yes
    module-config: "validator iterator"
    trusted-keys-file: /etc/unbound/keys.d/*.key
    auto-trust-anchor-file: "/var/lib/unbound/root.key"
    val-clean-additional: yes
    val-permissive-mode: no
    val-log-level: 1
    include: /etc/unbound/local.d/*.conf

remote-control:
    control-enable: no
    server-key-file: "/etc/unbound/unbound_server.key"
    server-cert-file: "/etc/unbound/unbound_server.pem"
    control-key-file: "/etc/unbound/unbound_control.key"
    control-cert-file: "/etc/unbound/unbound_control.pem"

include: /etc/unbound/conf.d/*.conf

Quintin

-- 0101100101000001010010000101011101000101010010000010000001000010 0100110001000101010100110101001100100000010110010100111101010101

nusenu

6:45 p.m.

Quintin:

...

...
Do you reach your server's conntrack limit?

The word conntrack never appears in my logs, so I don't think it's that. The ISP also requires this from tor exits: net.netfilter.nf_conntrack_max = 10000

How many conntrack entries do you actually have when you get "sendto failed: Operation not permitted" log entries?

sysctl net.netfilter.nf_conntrack_count

or

cat /proc/sys/net/netfilter/nf_conntrack_count

Regardless of whether this is the root cause or not, nf_conntrack_max = 10k is probably too low for an exit relay.

If nf_conntrack_count is near nf_conntrack_max, does the problem go away when you temporarily increase nf_conntrack_max?

-- https://mastodon.social/@nusenu twitter: @nusenu_
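The comparison asked for above can be sketched in a few lines of Python. This is only an illustration: the two /proc paths are the ones named in the message, while the function names and the 90% warning threshold are assumptions chosen for the example.

```python
from pathlib import Path

# Directory holding the counters named in the thread.
PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table currently in use."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def read_conntrack(proc_dir: Path = PROC_DIR):
    """Return (count, max), or None if the counters cannot be read
    (for example when the conntrack module is not loaded)."""
    try:
        count = int((proc_dir / "nf_conntrack_count").read_text())
        maximum = int((proc_dir / "nf_conntrack_max").read_text())
    except OSError:
        return None
    return count, maximum


def describe(count: int, maximum: int, warn_at: float = 0.9) -> str:
    """Summarise usage; warn_at=0.9 is an assumed threshold, not from the thread."""
    usage = conntrack_usage(count, maximum)
    state = "near the limit" if usage >= warn_at else "ok"
    return f"{count}/{maximum} conntrack entries ({usage:.0%}): {state}"


if __name__ == "__main__":
    values = read_conntrack()
    print(describe(*values) if values else "conntrack counters not available")
```

Running it while the "sendto failed" messages are being logged shows whether the table is close to full at that moment.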

Quintin

20 Jan 2018

6 p.m.

Ah, that's it. My conntrack table is full, and temporarily increasing the limit resolves the problem.

What would be a reasonable conntrack limit for a tor exit?

-- 0101100101000001010010000101011101000101010010000010000001000010 0100110001000101010100110101001100100000010110010100111101010101

nusenu

21 Jan 2018

8:06 p.m.

New subject: debugging unbound on 'torexit' failing DNS queries (solved)

Quintin:

...

Ah, that's it. My conntrack table is full, and temporarily increasing the limit resolves the problem.

I'm glad we found the problem and the solution.

Your exit appears to be offline since 2018-01-20 20:00, expected downtime?

https://atlas.torproject.org/#details/92E3764D5485DC4AC01178271FB5A8A2D90DA9...

...

What would be a reasonable conntrack limit for a tor exit?

The number of states depends on your consensus weight (and probably exit policy). Do you require a stateful packet filter?

-- https://mastodon.social/@nusenu twitter: @nusenu_

eric gisse

10:54 p.m.


I can kinda answer that.

I run an exit node that happily does 200-250 Mbit/s according to netdata accounting, and my monitoring regularly pegs it at nearly 200k connections. Usually 100-150k.

_______________________________________________ tor-relays mailing list tor-relays@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
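To put the figures reported above next to the 10000 limit discussed earlier, here is a small Python sketch that derives a candidate nf_conntrack_max from an observed peak. The 1.5x headroom factor and the function names are assumptions for illustration only, and the sketch assumes that each monitored connection corresponds to roughly one conntrack entry, which the thread does not confirm.

```python
import math


def suggest_conntrack_max(peak_connections: int, headroom: float = 1.5) -> int:
    """Return the observed peak scaled by an assumed headroom factor."""
    if peak_connections < 0:
        raise ValueError("peak_connections must not be negative")
    if headroom < 1:
        raise ValueError("headroom must be at least 1")
    return math.ceil(peak_connections * headroom)


def fits_under_limit(peak_connections: int, limit: int) -> bool:
    """True if the observed peak stays within a given conntrack limit."""
    return peak_connections <= limit


def sysctl_line(value: int) -> str:
    """Format the value in the same style as the sysctl line quoted in the thread."""
    return f"net.netfilter.nf_conntrack_max = {value}"


if __name__ == "__main__":
    peak = 200_000  # "nearly 200k connections", as reported above
    print(sysctl_line(suggest_conntrack_max(peak)))
    print("fits under a 10k limit:", fits_under_limit(peak, 10_000))
```

With a peak of 200k the sketch suggests 300000, and even the typical 100-150k range is far above a limit of 10000.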

Quintin

24 Jan 2018

5:07 p.m.


Seems my VPS got suspended when I increased the conntrack limit above 10000. Do you think my INPUT filters which use conntrack could have caused this issue?

-- 0101100101000001010010000101011101000101010010000010000001000010 0100110001000101010100110101001100100000010110010100111101010101

nusenu

5:14 p.m.


Quintin:

...

Seems my VPS got suspended when I increased the conntrack limit above 10000. Do you think my INPUT filters which use conntrack could have caused this issue?

You did confirm that already, no?

-- https://mastodon.social/@nusenu twitter: @nusenu_

Quintin

26 Jan 2018

3:37 a.m.


Hi nusenu,

Server has now been unsuspended, and is back online.

You asked "do you require a stateful packet filter?". Do you mean to disable conntrack? I have removed all my connection tracking iptables entries. My iptables looks like this now. Will keep an eye on it.

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [6716:3141641]
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -s x.x.x.x -p tcp -m comment --comment SSH -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m comment --comment Tor -m tcp --dport 80 -j ACCEPT
-A INPUT -p tcp -m comment --comment Tor -m tcp --dport 443 -j ACCEPT
-A INPUT -j DROP
-A FORWARD -j DROP
COMMIT

Quintin

-- 0101100101000001010010000101011101000101010010000010000001000010 0100110001000101010100110101001100100000010110010100111101010101
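One way to double-check that a saved ruleset no longer contains connection tracking matches is to scan it for the `state` and `conntrack` match options. The Python sketch below does only that. The function name and the two sample rulesets (shortened from the ones posted in this thread) are illustrative, and other features that depend on conntrack, such as NAT rules, are deliberately not covered.

```python
import re
from typing import List

# Matches "-m state", "-m conntrack", "--state" and "--ctstate".
CONNTRACK_MATCH = re.compile(r"-m\s+(?:state|conntrack)\b|--(?:ct)?state\b")

# Shortened samples based on the two rulesets posted in the thread.
OLD_RULES = """\
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -p tcp -m comment --comment "Tor" -m state --state NEW -m tcp --dport 443 -j ACCEPT
"""

NEW_RULES = """\
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m comment --comment Tor -m tcp --dport 443 -j ACCEPT
-A INPUT -j DROP
"""


def rules_using_conntrack(ruleset: str) -> List[str]:
    """Return the rules whose text contains a state/conntrack match."""
    return [
        line.strip()
        for line in ruleset.splitlines()
        if CONNTRACK_MATCH.search(line)
    ]


if __name__ == "__main__":
    for name, rules in (("old", OLD_RULES), ("new", NEW_RULES)):
        hits = rules_using_conntrack(rules)
        print(f"{name}: {len(hits)} rule(s) with state/conntrack matches")
```

The check only inspects rule text. Whether the kernel is still tracking connections is a separate question, which the nf_conntrack_count counter mentioned earlier in the thread can help answer.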

nusenu

8:02 a.m.


If your hoster suspends your server when you exceed 10k concurrent connections, I'm afraid it is probably not suitable for an exit relay (regardless of your own iptables ruleset).

A non-exit (single instance) relay would fit into a 10k limit.

-- https://mastodon.social/@nusenu twitter: @nusenu_

Quintin

12:37 p.m.


Hmmm. I think it's time to change to another provider.

Quintin

-- 0101100101000001010010000101011101000101010010000010000001000010 0100110001000101010100110101001100100000010110010100111101010101
