Troubleshooting www.cloudflare.com resolving to 127.0.0.1 on my local machine

Recently, whenever I visited www.cloudflare.com, the domain resolved to 127.0.0.1. My first thought was that it had been hijacked:

$ ping www.cloudflare.com
PING www.cloudflare.com (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.090 ms

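I checked /etc/hosts and found no manually added entry. I then captured DNS traffic with Wireshark and, to my surprise, saw no query for www.cloudflare.com at all, so the answer had to be coming from the local DNS cache.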
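The /etc/hosts check can be made repeatable with a small helper. This is only a sketch: the function name hosts_entries is made up for this example, and it assumes the usual hosts format (address first, then one or more names, with # starting a comment).

```shell
# hosts_entries NAME [FILE]: print the non-comment lines of a hosts file
# (default /etc/hosts) that map an address to exactly NAME.
# The function name is hypothetical, chosen for this example.
hosts_entries() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # strip comments before matching
        NF >= 2 {
            for (i = 2; i <= NF; i++)      # field 1 is the address
                if (tolower($i) == tolower(name)) { print; next }
        }
    ' "$file"
}

# Usage: no output means the hosts file does not override the name.
# hosts_entries www.cloudflare.com
```

An empty result rules out the hosts file, which is what pushes the search toward the resolver and its cache.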

To verify this, trace the system calls with strace:

$ strace -f -o /tmp/cloudflare curl www.cloudflare.com

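The important part is the -f flag, which makes strace follow child processes. Some problems occur in a child process, and without -f the key information is lost.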

Since this is a network connection, the only system call worth looking at in the trace is connect, so grep for it:

$ fgrep connect /tmp/cloudflare
932422 connect(7, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
932422 connect(7, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
932422 connect(7, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0
932422 connect(7, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7c60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
932422 connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
932422 connect(7, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7b60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
932422 connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
932422 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
932421 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
932421 connect(5, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7c60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)
932421 connect(5, {sa_family=AF_INET6, sin6_port=htons(80), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2606:4700::6810:7b60", &sin6_addr), sin6_scope_id=0}, 28) = -1 ENETUNREACH (Network is unreachable)

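After dropping the IPv6 attempts, the first two lines that failed with ENOENT, and the repeated AF_UNSPEC line, these are the key calls: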

932422 connect(7, {sa_family=AF_UNIX, sun_path="/run/dbus/system_bus_socket"}, 30) = 0
932422 connect(7, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
932422 connect(7, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
932421 connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
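The same filtering can be done mechanically instead of by eye. The following is a minimal sketch, not the exact commands used for this post: the function name relevant_connects is hypothetical, and the final awk step collapses identical lines, which is convenient here but would also hide genuine repeats.

```shell
# relevant_connects FILE: from an strace log, keep connect() calls but drop
# IPv6 attempts and connections to sockets that do not exist (ENOENT),
# then remove exact duplicate lines while preserving order.
# The function name is hypothetical, chosen for this example.
relevant_connects() {
    grep -F 'connect(' "$1" |
        grep -v -e 'AF_INET6' -e 'ENOENT' |
        awk '!seen[$0]++'
}

# Usage:
# relevant_connects /tmp/cloudflare
```

Run against the trace file from the strace command, this should leave the D-Bus socket, the AF_UNSPEC reset, and the two IPv4 connects to 127.0.0.1.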

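So the process talks to /run/dbus/system_bus_socket over a UNIX socket and then comes back with 127.0.0.1, which suggests a systemd service is involved. While at it, I also checked the DNS server configured in /etc/resolv.conf: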

nameserver 127.0.0.53

DNS points at the local address 127.0.0.53, and the comments in the file explain why:

# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.

The comments state that local DNS resolution relies on the systemd-resolved service. Checking which process holds port 53 with lsof confirms it:

$ sudo lsof -i :53
COMMAND     PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
systemd-r 10549 systemd-resolve   16u  IPv4  94859      0t0  UDP localhost:domain
systemd-r 10549 systemd-resolve   17u  IPv4  94860      0t0  TCP localhost:domain (LISTEN)

Next, use resolvectl to see what www.cloudflare.com resolves to:

$ resolvectl query www.cloudflare.com
www.cloudflare.com: 127.0.0.1                  -- link: enp0s31f6
                    ::1                        -- link: enp0s31f6

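At this point it is clear that some upstream DNS server is resolving www.cloudflare.com to 127.0.0.1, and the local cache is simply holding on to that answer. To find out whether the problem was at the office or at home, I captured packets in both places. It turned out to be the default DNS server that China Telecom assigns to my home router. Here is the capture: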

15	1.821116103	192.168.1.7	61.139.2.69	DNS	101	Standard query 0x6fdd A www.cloudflare.com OPT
19	2.216707191	61.139.2.69	192.168.1.7	DNS	94	Standard query response 0x6fdd A www.cloudflare.com A 127.0.0.1

Use dig with that DNS server specified explicitly to verify the answer:

$ dig www.cloudflare.com @61.139.2.69

; <<>> DiG 9.11.33-RedHat-9.11.33-1.fc33 <<>> www.cloudflare.com @61.139.2.69
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36311
;; flags: qr; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.cloudflare.com.		IN	A

;; ANSWER SECTION:
www.cloudflare.com.	3600	IN	A	127.0.0.1
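To compare several resolvers in one go, the dig check can be wrapped in a loop that flags loopback answers. This is a sketch under assumptions: the function names is_loopback and check_resolver are invented for this example, and 1.1.1.1 and 8.8.8.8 are just two well-known public resolvers used for comparison, not servers from the original investigation.

```shell
# is_loopback ADDR: succeed if ADDR is an IPv4 loopback address (127.0.0.0/8)
# or the IPv6 loopback ::1. Hypothetical helper for this example.
is_loopback() {
    case $1 in
        127.[0-9]*.[0-9]*.[0-9]*|::1) return 0 ;;
        *)                            return 1 ;;
    esac
}

# check_resolver NAME SERVER: ask SERVER for the A records of NAME and
# mark any answer that points back at the local machine.
check_resolver() {
    dig +short A "$1" "@$2" | while read -r addr; do
        if is_loopback "$addr"; then
            echo "$2 -> $addr (suspicious: loopback)"
        else
            echo "$2 -> $addr"
        fi
    done
}

# Usage (needs network access; the last two servers are assumptions):
# for server in 61.139.2.69 1.1.1.1 8.8.8.8; do
#     check_resolver www.cloudflare.com "$server"
# done
```

A public name that comes back as a loopback address from one resolver but not from the others points at that resolver rather than at the local machine.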

As shown above, the problem was indeed China Telecom's DNS server. Changing the DNS server on the router fixed it.
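Because systemd-resolved caches answers, the old 127.0.0.1 result may linger for a while after the router change. A minimal sketch for clearing the cache and checking again follows. The function name flush_and_recheck is hypothetical, flush-caches may require elevated privileges on some systems, and it assumes the machine has already picked up the new DNS server from the router (reconnecting the network may be needed first).

```shell
# flush_and_recheck NAME: drop systemd-resolved's cached answers, then
# resolve NAME again so a stale answer is not mistaken for a fresh one.
# The function name is hypothetical, chosen for this example.
flush_and_recheck() {
    resolvectl flush-caches &&
        resolvectl query "$1"
}

# Usage:
# flush_and_recheck www.cloudflare.com
```

If the query still returns 127.0.0.1, "resolvectl status" shows which uplink DNS servers are actually in use.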