It's always DNS

HCHTech

I've got a client with a single DC, running Server 2019. I got a complaint that their users couldn't download content from a vendor site. The site itself comes up fine, but when selecting an item to download, the browser displays the "the connection has timed out" error.

Trying this site from my computer works just fine, so I start looking for problems there. Not being blocked by the firewall or content filtering or GEO-IP filtering, anything like that. Looking in the DNS logs, I see error 5504:

"The DNS server encountered an invalid domain name in a packet from 9.9.9.9. The packet will be rejected. The event data contains the DNS packet."

Hmm. Searching for this error turns up advice about clearing the DNS cache and restarting the DNS service, which didn't change the symptom in my case. I changed the DNS forwarders so that quad9 wasn't primary, cleared the cache again and restarted both the netlogon and dns services, no change. Now I get the same error listing the new primary forwarder in the message.

I'm the only one that accesses their servers, and I didn't change anything recently.

Running dcdiag, I get all passes except one:

"Dynamic registration or deletion of one or more DNS records associated with DNS domain 'theirADdomain' failed. These records are used by other computers to locate this server as a domain controller or as an LDAP server. "

It's been a couple of years since I set this server up, but I know I had clean dcdiag runs then. Also, they aren't reporting any other problems with browsing or access, so this seems to be a narrow problem. I would say it's a problem with the site itself, but it works from my office and a laptop there works when you connect it to a cellphone hotspot. So it's definitely a problem with the infrastructure there.

How should I diagnose this further?
 
Are all the clients joined to the Domain? When you tried your laptop was it on their network? Did you flush the DNS cache on all the workstations?
 
Yes, all clients are joined to the domain. I wasn't onsite, so I used a domain laptop to confirm the problem and talked them through testing it with a hotspot. When I changed the forwarders, I tested results on another server, and flushed the dns cache on that server. I don't have a browser installed on the DC (other than IE, I suppose) so I couldn't test there.
 
First...what's the history of this active directory? Does it have a history of prior domain controllers?
Let's review the TCP/IP v4 settings....
The server should look at itself for primary DNS. With a single DC...it should only look at itself. Nothing else. So, looking at TCP/IP properties, for example..
IP address: 192.168.10.10
Subnet: 255.255.255.0
Gateway 192.168.10.1
Primary DNS 192.168.10.10 or 127.0.0.1
Secondary DNS: <blank>

Never ever have any external DNS server in those two DNS entries. The only time you'd have other DNS servers listed (secondary DNS)...is in some cases where you have multiple DCs in active directory, there are certain times it helps some initial setups when you list the others. Once everything is done with additional DCs being added to the domain, you usually can go back to just having primary DNS as itself, and AD will replicate through sites 'n services settings.

Next...on the server, fire up DNSMGMT.MSC and go to the forwarders tab and this is where you list your external DNS servers. I do quad9 here for clients not on DNS Filter. If clients are on DNS Filter, I put their DNS servers here. Or...some IT people put Google DNS or the ISPs DNS here...but...those don't do any "safe filtering"...and I want another "layer of protection" for my clients so at the very least I'll have quad9 there....as non safe DNS servers are useless to me.

Also while in DNSMGMT.MSC I like to set up scavenging rules to ripple down in zones.

DHCP from the server to clients....
Example...from IPCONFIG /ALL on a workstation
IP address: 192.168.10.103
Subnet:255.255.255.0
Gateway: 192.168.10.1
Primary DNS: 192.168.10.10

Nothing else for DNS, period...for a single DC network. If you have multiple DCs it's fine to have the second DC as the secondary DNS.
 
First...what's the history of this active directory? Does it have a history of prior domain controllers?
Nope - it was set up fresh with the installation of this server in 2020
The server should look at itself for primary DNS. With a single DC...it should only look at itself. Nothing else. So, looking at TCP/IP properties, for example..
IP address: 192.168.10.10
Subnet: 255.255.255.0
Gateway 192.168.10.1
Primary DNS 192.168.10.10 or 127.0.0.1
Secondary DNS: <blank>
Primary DNS is the static IP of the DC. Secondary DNS is 127.0.0.1
Next...on the server, fire up DNSMGMT.MSC and go to the forwarders tab and this is where you list your external DNS servers. I do quad9 here for clients not on DNS Filter.
We have the following forwarders:
9.9.9.9
45.90.28.144 (nextdns)
149.112.112.112 (secondary for quad9)
8.8.8.8

Because the event log error said invalid domain name was received from 9.9.9.9, I tried changing the order of the forwarders to make nextdns primary, and that just made the event log error say the invalid domain name was received from 45.90.28.144.

Also while in DNSMGMT.MSC I like to set up scavenging rules to ripple down in zones.
Scavenging no-refresh and refresh intervals are both set at 7 days. Our DHCP lease time is 3 days, so I should probably adjust that downward so it's not longer than the DHCP lease time, although I don't think this impacts my problem.
DHCP from the server to clients....
Looks just like your example. Primary DNS on the workstations is the static of the DC.

I have also cleared the DNS cache on the DC with the "dnscmd <DNSServerName> /clearcache" command.

Interesting, I rebooted the DC this morning and while the site still doesn't work, I'm not getting 5504 errors in the DNS log anymore when I try to load that site.

The site we're having trouble with is:

www.assaabloydooraccessories.us/en/products/sliding-folding/straight-sliding-systems/pf28200a/

This site loads fine. If you expand the DOWNLOADS section in the middle of the page, then try to download one of those files (like the installation PDF), we get the "This site took too long to respond" error page in any browser. When I do it from my office, it works just fine, and I'm also using quad9 as my DNS.
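One thing worth ruling out with a symptom like this (page loads, downloads time out) is whether the download links point at a different host than the page itself. The following is only a rough sketch: the sample markup, the `example` hostnames and the extension list are made up for illustration and are not taken from the vendor's page. It lists which hostnames actually serve the file links in a saved copy of the page source.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def download_hosts(page_url, html, extensions=(".pdf", ".zip", ".dwg")):
    """Return the set of hostnames serving the page's file links.

    The extension list is an assumption; adjust it to the files on offer.
    """
    parser = LinkCollector()
    parser.feed(html)
    hosts = set()
    for href in parser.hrefs:
        # Relative links resolve against the page URL, so they keep its host.
        parsed = urlparse(urljoin(page_url, href))
        if parsed.path.lower().endswith(extensions) and parsed.hostname:
            hosts.add(parsed.hostname)
    return hosts


# Hypothetical page source, NOT the real vendor markup.
sample = (
    '<a href="/en/about/">About</a>'
    '<a href="https://files.example.net/docs/install.pdf">Install PDF</a>'
)
print(download_hosts("https://www.example.com/en/products/", sample))
```

If the set contains a hostname other than the one in the address bar, that second host is the one to test for name resolution and connectivity from inside the client's network.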

I might have another clue, however. Looking BACK in the DNS event log, I see some 5504 entries from DNS servers that ARE NOT in our list of forwarders. There are some stating that there was an invalid domain name in a packet from 204.74.110.101, which an nslookup reveals is "edns101.ultradns.org" whoever that is. I wonder if somebody plugged in a home router somewhere or something like that...
 
My personal opinion...if I'm using safe DNS services (such as Q9)....I avoid putting any non-safe DNS services (such as 8's)...because a query that times out on Q9 has a slim chance of continuing down the list of DNS servers until it gets resolved....thus bypassing any blocking that safe DNS provides.

I try to keep DNS forwarders at 2 max. Although...usually I just have the 9's there and nothing else.
 
That makes sense. Just as a test, I removed the other forwarders, forced a scavenge and cleared the cache, then restarted the DNS server service and the netlogon service; then went to another server, did a flushdns and tried again - no change in symptom. I'm going to look into why I might be seeing old 5504 errors from that ultradns.org server, that's got me a little worried...

Edit: I could also remove ALL forwarders which should make it use the root hints list directly just to see if it had an effect...
 
I might have another clue, however. Looking BACK in the DNS event log, I see some 5504 entries from DNS servers that ARE NOT in our list of forwarders. There are some stating that there was an invalid domain name in a packet from 204.74.110.101, which an nslookup reveals is "edns101.ultradns.org" whoever that is. I wonder if somebody plugged in a home router somewhere or something like that...
Interesting because it looks like ultradns is a paid service, no free limited use options. I agree with keeping forwarders simple. I only ever use one, quad 8
 
I've heard of, but never tried, UltraDNS.
If it is only a paid service, you do have to whitelist the WAN IP of the clients you set up on it (like I have to do with DNS Filter). Else, requests will get denied.

But should not see those from quad9.
 
I have the distinct feeling I'm chasing my tail here - I have no proof that those DNS log errors are behind my problem. I now have a couple of 5501 errors, which are "The DNS server encountered a bad packet from 208.17.117.106. Packet processing leads beyond packet length. The event data contains the DNS packet." I don't get a result on an nslookup for that IP, and I only have a couple of those, not every time I try to visit my problem site.

I've done some packet capturing on the firewall and don't see any dropped packets from that site, for example. I've also disabled the local Windows firewall on the machine I was testing with, but that didn't make any difference.

I'm going to stop for a bit and maybe take a different tack. This client has 2 WAN connections, maybe I'll disable the main one to force it to failover to the backup and see if the problem continues. Just another datapoint, I guess. I'll also ask if any other sites are exhibiting this kind of behavior.
 
This might be a little outside the box, but if you manually set the dns server on a workstation at the office to quad 9 (I know you wouldn't leave it that way) and try to download the file what happens?
It would test the same infrastructure without including the DNS server.
 
Curious what the firewall is? Any IDS/IPS or other threat blocker? (I know you mentioned no above...just triple checking)
It's a Sonicwall TZ500. Latest firmware, recent reboot (yesterday). I had previously tried disabling all of the security services, to no avail. IPS, Geo-IP, Gateway AV, Gateway AS & content filtering. Not doing RBL. All of these have warning pages that display if they block something, and we're not getting those, so I don't think it's the firewall.

Does the server have multiple NICs? If any are unused...are they disabled?
The server is a Hyper-V Host and has 2 x Intel 10GbE NICs. They are teamed, so both in use. 2 Guests, a DC and an App Server. The DNS on the Host is the static of the DC.

This might be a little outside the box, but if you manually set the dns server on a workstation at the office to quad 9 (I know you wouldn't leave it that way) and try to download the file what happens?
It would test the same infrastructure without including the DNS server.
I didn't think of this, so just tried it. When I manually set a workstation DNS to quad9, it does NOT make things magically start working. Still get a timeout trying to save a download from that page. I tried 8.8.8.8 as well just for fun, but no joy there either.

Then I changed the WAN failover setting so the secondary ISP was now primary (They normally have FIOS as primary and Comcast as secondary). So I changed the setting so that Comcast was the primary network. I can confirm this with a visit to whatismyip.com from a workstation since it now reports the public Comcast IP.

Unfortunately, this didn't help either. Which means I'm no further in narrowing this down. Next week, I think I'll take a laptop onsite and connect it directly to the Comcast modem to see if things work then - I suspect they will, which will only tell me it's either the firewall or the server. Right now, the ISP is still on the suspect list, I think - haha.
 
Wrong. It does help. By changing the DNS server on the workstation to Quad 9 you have eliminated it being the server. It's either the firewall or possibly the ISP. Most likely the firewall...
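To keep "name won't resolve" and "name resolves but the connection never completes" apart, a small check like the one below can be run from a workstation on the affected network. It is a sketch only: the function name, return labels and default timeout are illustrative choices, and it tests nothing more than whether a TCP connection can be opened.

```python
import socket


def classify_connection(host, port=443, timeout=5.0):
    """Return 'dns_failure', 'connect_failure', or 'ok' for host:port."""
    try:
        # Step 1: name resolution through whatever resolver the OS is using.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"
    # Step 2: try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Covers timeouts, refusals and unreachable networks alike.
            continue
    return "connect_failure"


# Example (needs network access, so it is left as a comment):
# print(classify_connection("www.assaabloydooraccessories.us"))
```

A `connect_failure` for a name that resolves fine points away from DNS and toward the path: firewall, ISP, or the remote end. An `ok` only proves the TCP handshake works, so a download that stalls partway through can still happen after that.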
 
@HCHTech
Have you checked the MTU settings on the WAN interfaces to verify they are set properly for the connection they are using?
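MTU is worth a look here because a path MTU problem can let small pages load while larger transfers stall. The usual test is a ping with the "don't fragment" flag set, and the arithmetic for the payload size is sketched below. The two MTU values are just common examples (plain Ethernet and PPPoE), not the values from this client's WAN interfaces.

```python
IPV4_HEADER_BYTES = 20   # IPv4 header without options
ICMP_HEADER_BYTES = 8    # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


for mtu in (1500, 1492):
    print(f"MTU {mtu}: largest unfragmented ping payload is {max_ping_payload(mtu)} bytes")
```

On Windows the test would then be something like `ping -f -l 1472 <host>` for a 1500-byte MTU, stepping the size down until replies come back.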

Also not sure if you already mentioned it, but have you checked your client's public IPs to see if they are on a blocklist? I've run into this a few times, where websites use some sort of web firewall and probably subscribe to blocklists, and randomly some clients cannot load a site that they frequent. When it happens, I end up having to route the traffic to the website out through a VPN tunnel to another office and NAT it through that office's Internet connection. Same AD domain, same DNS forwarding, same sonicwall firewalls with the same config at each site.

Do their machines have some sort of security software installed that does SSL inspection or DNS filtering that might be intercepting the traffic and causing problems?

For AD DNS forwarders I always stick to 1's, 8's, and 9's, but also have root hints enabled. If you nslookup the domain name they are trying to access at the clients site and then at your site, is the resolution the same?
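The nslookup comparison can also be scripted. Python's standard library has no way to aim a lookup at one particular DNS server, so the sketch below builds a plain A-record query by hand and sends it over UDP. Treat it as an illustration under assumptions: it ignores truncated replies, TCP fallback and every record type except A, and the server addresses in the usage comment are the example DC address and quad9 from earlier in the thread.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A records of name."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def skip_name(packet, offset):
    """Return the offset just past the encoded name that starts at offset."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:   # a compression pointer ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Return the sorted IPv4 addresses found in the answer section."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4   # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return sorted(addresses)


def resolve_via(server, name, timeout=3.0):
    """Ask one specific DNS server for the A records of name over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        packet, _addr = sock.recvfrom(4096)
    return parse_a_records(packet)


# Example (needs network access, so it is left as a comment):
# host = "www.assaabloydooraccessories.us"
# print(resolve_via("192.168.10.10", host))   # the DC, as in the example above
# print(resolve_via("9.9.9.9", host))         # quad9 directly
```

Different answers from the two servers are not automatically a fault, since sites behind a CDN can hand out different addresses per resolver, but an empty list or a timeout from only one of them would be worth chasing.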

Is there a LAN inbound NAT/Forward rule in the sonicwall that is redirecting all DNS traffic to a specific DNS server? I do this on my home firewall so no matter what hardcoded external DNS servers IOT devices or applications might try and use, the firewall will redirect it to my pihole server.

Maybe load up wireshark on their network and try and capture DNS traffic.
 
Haha - I love that movie. Yeah - sorry for the delay - I have not figured it out, but I have learned a few more things.

The customer has 2 internet connections, FIOS business and Comcast business. Both are static IPs. The setup is a failover, so everyone uses the FIOS connection unless it is down, then traffic switches to the Comcast connection.

Switching to Comcast for the primary connection didn't change the problem in any way. Same as before - site works, but downloads don't.

I checked their 2 public IPs for being on a blacklist, and sure enough they are.

The FIOS IP is on Spamhaus "Policy Blocklist" and the SORBS DUHL Blacklist. DUHL = Dynamically Allocated hosts/networks, which wouldn't seem to be right since they have a static IP, not a dynamic one, as far as I can tell. The explanatory text for the Spamhaus Policy listing says:

"The inclusion of your IP address on the Policy Blocklist is standard for the vast majority of internet users and is not the result of your actions. Here are some key PBL facts for your understanding:
  • Being on this list does not mean that you won’t be able to send emails.
  • You do not need to request removal from the list.
  • This listing is controlled by your Internet Service Provider (ISP), not Spamhaus. Your ISP lists ranges of IP addresses that shouldn’t be sending email directly to the internet."

The SORBS page shows an active listing for a /19 network (8192 addresses) which does include my client's IP. So this is a case where some other IPs in that range are showing bad behavior and my client got swept up in the range.

The Comcast IP is on the UCEPROTECTL2 Blocklist. Looking at the detail, this is the same issue, a /13 range is on the list, which is 524,288 IPs in total, including the one my client has.
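For anyone checking the range arithmetic: an IPv4 /N block holds 2^(32-N) addresses, and Python's `ipaddress` module can confirm both the size and whether a given IP falls inside a listed range. The ranges and the address below are private placeholders, not the client's real IPs or the actual listed blocks.

```python
import ipaddress


def is_listed(address, listed_range):
    """True if address falls inside the listed CIDR range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(listed_range)


# Placeholder ranges with the same prefix lengths as the two listings.
for cidr in ("192.168.0.0/19", "10.8.0.0/13"):
    network = ipaddress.ip_network(cidr)
    print(cidr, "contains", network.num_addresses, "addresses")

# Placeholder address, standing in for the client's public IP.
print(is_listed("192.168.10.10", "192.168.0.0/19"))
```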

I'm inclined to believe that these blacklist entries are NOT RELATED to the problem. Remember, despite it working outside of their network, the only problem they are having is with ONE SINGLE WEBSITE. Further, the site itself comes up just fine and can be browsed (it has thousands of pages). The only problem is when you try to download something. Being on a blacklist wouldn't cause that kind of symptom.

I don't have a NAT rule for DNS in the firewall. I double-checked the MTU settings and they are correct. Again, I don't believe anything along these lines would produce the symptoms I'm seeing. The problems would be more widespread.

I'm out of ideas and out of billable time to spend on this issue. I asked the client to report this to the company who has the website, that's about all I could offer. For now, they are downloading the few things they need from the site over a cellphone hotspot.
 
Yeah I "don't think" it's related to your issue either. Comcast IPs are frequently on black lists. But I don't tend to see that block DNS requests coming from there.

I'd still chase down that darned Comcrud security edge service...did ya read the link up there...it will intercept and molest DNS queries...and we have seen it mess things up with our clients with odd DNS behavior and odd browser behavior (sometimes resulting in "insecure site" warnings in the browsers)
 