DNS over TLS
A few months ago I finally got an optical internet connection with native
IPv6. This prompted me to slightly reconfigure my devices: I abandoned
the HE.net tunnel I had been using as my IPv6 outlet.
I revisited my DNS hardening as well. At first I flipped
DNSSEC= and DNSOverTLS= on for systemd-resolved without much
thought. The result seemed to work.
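For reference, the flip amounts to a couple of lines in
/etc/systemd/resolved.conf (a sketch; the server choice here is
illustrative, not what I necessarily ran):

# /etc/systemd/resolved.conf
[Resolve]
# ip#hostname pins the certificate name used for DoT validation
DNS=8.8.8.8#dns.google 2001:4860:4860::8888#dns.google
DNSSEC=yes
DNSOverTLS=yes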
Later I noticed that netstat took a while to resolve local IPv6
addresses back to domain names: 10+ seconds per printed line.
This seems to be a reproducer:
$ dig slyfox.uni.cx AAAA | grep -v '^;'
slyfox.uni.cx. 37 IN AAAA 2a00:23c8:a613:101:7912:b70d:1c91:ab02
$ time dig -x 2a00:23c8:a613:101:7912:b70d:1c91:ab02
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
;; communications error to 127.0.0.53#53: timed out
; <<>> DiG 9.18.14 <<>> -x 2a00:23c8:a613:101:7912:b70d:1c91:ab02
;; global options: +cmd
;; no servers could be reached
real 0m15.028s
user 0m0.002s
sys 0m0.005s
Reverse resolution was not only slow: it also did not return a reasonable
DNS response (NXDOMAIN or similar). This means the result did not get
cached at all, and subsequent runs were just as slow.
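For context, dig -x is just sugar for a PTR query against the
nibble-reversed name under ip6.arpa (which is also why the SOA in the
fixed output further below is for 0.a.2.ip6.arpa). The reproducer above
is equivalent to:

$ dig -t PTR 2.0.b.a.1.9.c.1.d.0.7.b.2.1.9.7.1.0.1.0.3.1.6.a.8.c.3.2.0.0.a.2.ip6.arpa.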
Why did the slowness start happening? The IPv4-only world with the
HE.net tunnel had no such problems.
I now have effectively external IPv6 addresses for some internet-facing
resources, and they are in the same subnet as my desktop! This makes
local reverse resolution very slow.
systemd-resolved as is does not provide detailed enough logs: I had to
strace it to see which DNS servers and protocols it talks to. Not
very convenient.
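The incantation was along these lines (reconstructed, not necessarily
the exact command I ran):

$ strace -f -e trace=network -p "$(pidof systemd-resolved)"

That prints every connect()/sendto()/sendmsg() with the destination
address and port, which is enough to tell port 53 from 853 and the
link-supplied server from the global one.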
strace revealed that systemd-resolved sends some of the reverse lookup
requests to the link-specific DNS server, which happens to be my ISP router
(over the DoT port, which the router does not support). Other requests are
sent to the globally configured DNS. That was a surprise.
Disabling the DNSOverTLS=yes option did not help either: the router still
did not want to resolve or forward the requests upstream.
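Had I stayed on systemd-resolved, one workaround would have been to stop
using the router as a per-link DNS server at all, e.g. (the interface
name here is illustrative):

$ resolvectl dns enp3s0 ''

or to set Domains=~. in resolved.conf so that the globally configured
servers win the routing decision for all lookups.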
On top of that systemd-resolved did not keep a TCP connection to the
DoT servers open. As a result each (even successful) resolve pays for a
fresh TCP 3-way handshake plus a TLS handshake: roughly two extra round
trips, which adds about 10ms to each (uncached) query.
I wanted a bit more flexibility in where my DNS requests go. Thus, I
switched over to the familiar unbound package. unbound does keep TCP
connections to DoT servers open (which keeps even uncached DNS requests
under 10ms). The debugging story is also more straightforward: specifying
the verbosity: ... parameter in the config is enough to see what is being
resolved and how the forwards happen.
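For the record, on NixOS that maps to one extra line next to the service
setup shown below (an assumed addition on my part; level 2 is a
reasonable start, higher levels get very chatty):

services.unbound.settings.server.verbosity = 2;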
On NixOS the naive switch to unbound is a few lines of service
setup:
{ ... }:
{
  # Slow at resolving reverse lookups for IPv6, like:
  #   $ dig -x 2002:...
  services.resolved.enable = false;
  services.unbound = {
    enable = true;
    settings = {
      forward-zone = [
        {
          name = ".";
          forward-tls-upstream = "yes";
          forward-addr = [
            "2001:4860:4860::8888@853#dns.google"
            "2001:4860:4860::8844@853#dns.google"
            "8.8.8.8@853#dns.google"
            "8.8.4.4@853#dns.google"
          ];
        }
      ];
    };
  };
}
Most of the “code” here is to specify DoT-capable recursive servers. The
above expands to the following unbound configuration:
$ cat /etc/unbound/unbound.conf
server:
  access-control: 127.0.0.0/8 allow
  access-control: ::1/128 allow
  auto-trust-anchor-file: /var/lib/unbound/root.key
  chroot: ""
  directory: /var/lib/unbound
  do-daemonize: no
  interface: 127.0.0.1
  interface: ::1
  ip-freebind: yes
  pidfile: ""
  tls-cert-bundle: /etc/ssl/certs/ca-certificates.crt
  username: unbound
forward-zone:
  forward-addr: 2001:4860:4860::8888@853#dns.google
  forward-addr: 2001:4860:4860::8844@853#dns.google
  forward-addr: 8.8.8.8@853#dns.google
  forward-addr: 8.8.4.4@853#dns.google
  forward-tls-upstream: yes
  name: .
remote-control:
  control-cert-file: /var/lib/unbound/unbound_control.pem
  control-enable: no
  control-interface: 127.0.0.1
  control-interface: ::1
  control-key-file: /var/lib/unbound/unbound_control.key
  server-cert-file: /var/lib/unbound/unbound_server.pem
  server-key-file: /var/lib/unbound/unbound_server.key
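If you edit the expanded file by hand (or just want to validate the
generated one), unbound ships a config checker:

$ unbound-checkconf /etc/unbound/unbound.conf
unbound-checkconf: no errors in /etc/unbound/unbound.conf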
Now all my direct and reverse queries are nice and quick:
$ time dig -x 2a00:23c8:a613:101:7912:b70d:1c91:ab02
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 6507
;; AUTHORITY SECTION:
0.a.2.ip6.arpa. 1274 IN SOA pri.authdns.ripe.net. dns.ripe.net. 1687019637 3600 600 864000 3600
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Sat Jun 17 22:15:27 BST 2023
;; MSG SIZE rcvd: 161
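As a side check that DNSSEC validation is actually active (the
auto-trust-anchor-file above), one can query a deliberately mis-signed
zone; dnssec-failed.org is a well-known test domain for this:

$ dig @127.0.0.1 dnssec-failed.org A

The expected status is SERVFAIL (a validation failure); the same query
with +cd (checking disabled) should resolve normally.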
WARNING: do not use these settings as is until you fully understand the
implications of abandoning your default resolver. A non-exhaustive list of
things to note:
- some local requests are forwarded to the public DNS: you might need to
  tweak your local zones a bit more if you want to prevent leaking
  resolution requests related to machines in your network (see the
  sketch below)
- all your non-local requests go to a single entity: you might not want
  to send all your DNS queries to a single public DNS server for privacy
  reasons
- mDNS is disabled and your local resources might stop resolving as is
I have local workarounds for all of these. Your setup will likely be different.
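As a sketch for the first point: unbound can answer local names (and
their reverse counterparts) itself instead of forwarding them upstream.
All names and addresses here are made up:

server:
  # answer this zone locally; names not listed below get NXDOMAIN
  local-zone: "home.lan." static
  local-data: "desktop.home.lan. IN AAAA 2a00:23c8:a613:101::1"
  # local-data-ptr also generates the matching ip6.arpa PTR record
  local-data-ptr: "2a00:23c8:a613:101::1 desktop.home.lan"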
When picking among DoT public servers I was choosing between:
- Cloudflare: 1.1.1.1 and similar
- Google: 8.8.8.8 and similar
- Quad9: 9.9.9.9 and similar
These are all for-profit companies. You might want to evaluate other
DoT-capable servers.
Anyway. Of the above, Cloudflare's latency for the initial DoT setup is
somehow consistently 2x-3x higher than that of the other two:
$ dig +tls @1.1.1.1 kernel.org
;; ANSWER SECTION:
kernel.org. 100 IN A 139.178.84.217
;; Query time: 50 msec
;; SERVER: 1.1.1.1#853(1.1.1.1) (TLS)
$ dig +tls @8.8.8.8 kernel.org
;; ANSWER SECTION:
kernel.org. 175 IN A 139.178.84.217
;; Query time: 14 msec
;; SERVER: 8.8.8.8#853(8.8.8.8) (TLS)
$ dig +tls @9.9.9.9 kernel.org
;; ANSWER SECTION:
kernel.org. 289 IN A 139.178.84.217
;; Query time: 19 msec
;; SERVER: 9.9.9.9#853(9.9.9.9) (TLS)
Maybe Cloudflare does not have a close enough TLS termination point near
me? mtr -4 -T -P 853 says that both 1.1.1.1 and 8.8.8.8 are 4.5ms
away from me, while 9.9.9.9 is 9ms away. Non-TLS queries perform in line
with the mtr-reported values:
$ dig @1.1.1.1 kernel.org
;; ANSWER SECTION:
kernel.org. 52 IN A 139.178.84.217
;; Query time: 5 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
$ dig +tcp @8.8.8.8 kernel.org
;; ANSWER SECTION:
kernel.org. 92 IN A 139.178.84.217
;; Query time: 5 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (TCP)
$ dig +tcp @9.9.9.9 kernel.org
;; ANSWER SECTION:
kernel.org. 36 IN A 139.178.84.217
;; Query time: 9 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (TCP)
I settled on dns.google (8.8.8.8 and backups).
Parting words
systemd-resolved needs a bit of hand-holding to make reverse resolution
work quickly. unbound is a bit easier to get up and running.
The DoT latency tax is around 2x-3x, as a fresh connection needs extra
round trips for the TCP and TLS handshakes before the query itself. If
your resolver is close enough it's not too bad.
In theory DNS-over-QUIC could shrink the latency further, down to the
levels of unencrypted DNS, once DoQ gains more popularity and its support
gets added to unbound: https://github.com/NLnetLabs/unbound/issues/743.
Have fun!