Crazy Consulting Work
Okay, so I put in a bid to troubleshoot an Active Directory problem for a small company (600 computers)... and got chosen.
Chief Issue: XP and Server 2003 computers can join the domain, but Vista, 7, Server 2008, and Server 2008 R2 computers cannot.
Error: RPC Server unavailable.
Bid Amount: $8,000
Okay, so I get there and first look at an XP machine and a Windows 7 machine to try to find the differences.
1. I run IPCONFIG /ALL on both systems... Everything is configured the same from the same DHCP server (obviously the IP addresses are different)... No problems here.
2. Next, I took a look at the Operations Masters (FSMO roles) with "netdom query fsmo". All the FSMO role holders are up, and I can ping the servers.
3. Next, I took a look in Sites and Services and compared the AD site info with the sub-nets and Domain Controllers... no problem...
4. Took a quick peek at replication with REPLMON to ensure all the Domain Controllers are properly syncing their Global Catalogs. Overall, I confirmed Active Directory is not broken.
5. Next I thought the XP and Vista+ machines might be talking to a different Domain Controller, getting a different answer from DNS, or have something third-party installed... something strange..., so I decided to poke into DNS.
6. I pinged things like servers, and domain controllers from both XP and 7... Got the same responses... Great.
7. To find an RPC server & join a domain, a client needs to query the SRV records from DNS, so I did an nslookup on the SRV records for things like LDAP and Kerberos.
... Basically queries like this: nslookup -type=SRV _ldap._tcp.dc._msdcs.addomain.com
Found the problem in DNS:
Windows XP would display the SRV records for the Domain Controllers of the Active Directory Domain.
Windows 7 would NOT display any SRV records.
i.e. I got something like:
*** dnsserver.addomain.com can't find _ldap._tcp.dc._msdcs.addomain.com: Non-existent do
I told them to take me to the DNS servers
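If I had to repeat step 7 today, I would ask every DNS server for the same SRV names instead of trusting whichever one the client happens to pick. Here is a rough Python sketch of that idea. The function names and the server IPs in the usage note are made up for illustration, the four SRV prefixes are the usual DC locator names, and the "did we get an SRV answer" check is just a crude text match on the nslookup output, so treat it as a starting point, not a finished tool.

```python
import shutil
import subprocess

# Usual DC-locator SRV prefixes; the domain is appended by srv_names().
SRV_PREFIXES = (
    "_ldap._tcp.dc._msdcs",
    "_kerberos._tcp.dc._msdcs",
    "_ldap._tcp",
    "_kerberos._tcp",
)


def srv_names(domain):
    """Return the SRV owner names to query for an AD DNS domain."""
    domain = domain.strip().rstrip(".")
    return [f"{prefix}.{domain}" for prefix in SRV_PREFIXES]


def nslookup_commands(domain, dns_servers):
    """Build one nslookup command per (server, SRV name) pair."""
    return [
        ["nslookup", "-type=SRV", name, server]
        for server in dns_servers
        for name in srv_names(domain)
    ]


def run_checks(domain, dns_servers):
    """Run every query; map (server, name) to True if the output looks like
    it contains an SRV answer (heuristic: the word 'service' appears)."""
    if shutil.which("nslookup") is None:
        raise RuntimeError("nslookup not found on PATH")
    results = {}
    for cmd in nslookup_commands(domain, dns_servers):
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
        name, server = cmd[2], cmd[3]
        results[(server, name)] = "service" in proc.stdout.lower()
    return results
```

Something like run_checks("addomain.com", ["10.0.0.53", "10.0.0.54"]) (hypothetical addresses) would then show at a glance which box answers with SRV records and which one does not.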
8. First thing I noticed was a failed hard drive in the Array that they didn't notice. Okay, that is NOT the cause of this problem, so I let them know.
9. I took a look at the DNS console on that Domain Controller & DNS server... Everything was fine!
10. I said, "You have two DNS servers, let's just try to reboot this one."
11. I rebooted it and it must have taken 20+ minutes to boot. I looked at their IT guy in disgust and said, "Does it always hang, taking forever to start networking and apply computer policies?" He said, "Yeah"
12. Before logon, I am greeted with "Windows has detected an IP Address Conflict." This is a DNS server and a Domain Controller with an IP Conflict.
13. Their tech said, "It always does that; we just click OK."
14. I asked, "What else has the same IP address?" He didn't know!
15. Shut down the DNS server, so I can track it down...
16. Log on to their Cisco Catalyst switch (there was no password!) and then do the following:
switch# ping 10.x.x.x (the IP of their DNS Server)
switch# show arp
It basically printed a long list that scrolled off the screen...
So, I did...
switch# show arp | include 10.x.x.x (the IP address of their DNS server that I just pinged)
I got a response like
10.x.x.x 1234.5678.9abc (a Cisco-formatted MAC address)
Okay, so I ran:
switch# show mac-address-table address 1234.5678.9abc
I can't remember the next command, but it was a show cdp neighbors (some argument for the interface & for detail)
It told me that the DNS server was on Interface Gigabit 0/1, which was fiber, lol... So, I followed that Fiber to an LIU... I asked their tech where the other end of it is.
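The ping / show arp / show mac-address-table dance above is easy to script if you paste the switch output into a file. Below is a small Python sketch that pulls the MAC for one IP out of "show arp" text and normalizes any MAC notation to the Cisco dotted form. The sample output, its addresses, and the function names are all invented for illustration; the column layout is what IOS normally prints, but check it against your own switch.

```python
import re

# Invented sample of "show arp" output; addresses are placeholders.
SAMPLE_ARP = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.0.0.1                -   0011.2233.4455  ARPA   Vlan1
Internet  10.0.0.53              12   1234.5678.9abc  ARPA   Vlan1
Internet  10.0.0.5                3   Incomplete      ARPA
"""


def to_cisco_mac(mac):
    """Normalize a MAC in any common notation to Cisco's xxxx.xxxx.xxxx."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def find_mac_for_ip(show_arp_output, ip):
    """Return the Cisco-format MAC for an exact IP match, or None."""
    for line in show_arp_output.splitlines():
        fields = line.split()
        # Expected columns: Protocol, Address, Age, Hardware Addr, Type, ...
        if len(fields) >= 4 and fields[0] == "Internet" and fields[1] == ip:
            if fields[3].lower() == "incomplete":
                return None  # ARP entry exists but was never resolved
            return to_cisco_mac(fields[3])
    return None
```

Matching the address column exactly avoids a gotcha with "| include": filtering on 10.0.0.5 would also match 10.0.0.53.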
17. He took me to another server-room... I logged onto the switch. ping, show arp, show mac-address-table... blah blah blah.. Interface FastEthernet 0/24
18. Followed that cable and it went to their Cisco firewall!!!
19. Logged onto the Cisco Firewall and did a "show run". That IP address was set as a management IP address.
20. The firewall was also running a caching DNS server!!! That was why those nslookups on Windows 7 would work for things like nslookup google.com.... nslookup server.addomain.com... but no SRV records (those aren't cached)...
21. I changed the IP address and removed the caching DNS server functionality from the Cisco Firewall... Then I booted the Domain Controller/DNS server, which booted in like 3 minutes!
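Nobody knew what else owned that IP until I walked the cables. If you collect (IP, MAC) pairs from more than one place (ARP tables on different switches, or the same table sampled at different times), a few lines of Python will flag any IP that shows up with more than one MAC. This is only a sketch under that assumption; the function name and the sample addresses in the usage note are hypothetical.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Given an iterable of (ip, mac) pairs, return {ip: sorted list of MACs}
    for every IP that was seen with more than one distinct MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())  # compare MACs case-insensitively
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

For example, feeding it ("10.0.0.53", "1234.5678.9abc") from one switch and ("10.0.0.53", "aabb.ccdd.eeff") from another would report 10.0.0.53 with both MACs, which is exactly the kind of thing the "we just click OK" popup was trying to say.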
Q: So, why was XP getting the SRV records from the other DNS server and Vista/7 getting it from the Firewall?
A: Windows XP seems to do a Round-Robin using any of its configured DNS servers, though it WILL pick one on its local subnet before using a remote DNS server... This is the same behavior on Vista/7. I.e. If you have DNS 1 and DNS 2 set up on all your systems, you don't want them to ALL hit DNS 1 every time... Microsoft knows this! That said, you don't want to go to DNS 2 if it is across a slow WAN with high latency... Again, MS knows this.
So why the difference?
A: Vista/7 WILL query a caching DNS server, first. Hence, they were seeing records from the Firewall NOT another, properly configured Domain Controller... Hence, there were no SRV records & Vista/7 couldn't find the RPC server.
This took about 4 hours and I got paid $8,000 for which I need to file a 1099-Misc with the IRS.