My first reaction was that this was textbook vendor finger-pointing, PLUS, aren't these things self-policing? If a hop on the route to anything is having trouble, isn't it automatically taken offline or traffic rerouted? That was my understanding, although admittedly I don't really know.
Yes, my initial reaction was that this feels like typical finger-pointing. If this were truly a Verizon backbone issue, I would expect broader impact across other Verizon customers.
Most impairments such as latency, jitter, or high utilization will not cause a BGP routing protocol session within the provider network to drop or a route to be withdrawn! BGP reacts to reachability failures, not performance degradation. The kind of automatic failover you are describing is more consistent with a hard outage (e.g. a fiber cut or a complete path failure).
In short, the routing protocol has no awareness of application performance metrics such as VoIP call quality. As long as the next hop remains reachable, the route stays installed. It would take deliberate engineering, such as BFD tied to route withdrawal or automation modifying policy (e.g. prefix-lists), to remove a path based on performance rather than reachability.
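To make the contrast concrete, here is a minimal Python sketch of the two decision rules. Everything here is illustrative (the function names, the `PathStats` structure, and the thresholds are invented for the example, not a real router API): plain BGP only asks "is the next hop reachable?", while a performance-aware monitor would also have to consider loss and latency before keeping a path.

```python
# Illustrative only: contrasts BGP's reachability-only logic with a
# hypothetical performance-aware monitor. Names/thresholds are invented.
from dataclasses import dataclass

@dataclass
class PathStats:
    next_hop_reachable: bool
    packet_loss_pct: float   # would come from an external probe
    latency_ms: float

def bgp_keeps_route(stats: PathStats) -> bool:
    # Plain BGP: only reachability matters; loss, jitter, and latency
    # are invisible to the protocol.
    return stats.next_hop_reachable

def performance_monitor_keeps_route(stats: PathStats,
                                    max_loss_pct: float = 2.0,
                                    max_latency_ms: float = 150.0) -> bool:
    # What performance-based failover would require: automation that
    # withdraws the path when quality degrades, not just when it dies.
    return (stats.next_hop_reachable
            and stats.packet_loss_pct <= max_loss_pct
            and stats.latency_ms <= max_latency_ms)

degraded = PathStats(next_hop_reachable=True,
                     packet_loss_pct=8.0, latency_ms=240.0)
print(bgp_keeps_route(degraded))                  # True  - BGP is satisfied
print(performance_monitor_keeps_route(degraded))  # False - VoIP is not
```

The degraded path above is exactly the scenario in this thread: the route stays up and installed even though call quality is terrible.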
When I do a tracert or an MTR run to their servers from the client location, the problem hop isn't even on the list! Of course, they didn't even want to hear that, they just sent me another screenshot of their test results. So now, I'm opening a ticket with Verizon with the inability to reproduce the "evidence" AND the inability to test whether any fix they might do was successful - it's maddening.
If the problem hop does not appear in your MTR, the path you are testing may not be the path the application traffic actually takes. Traceroute/MTR typically use ICMP (or UDP) probes, which can follow a different forwarding path than VoIP traffic because of ECMP hashing, policy-based routing, or upstream routing differences, but Verizon would have to confirm that.
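A rough sketch of why ECMP can split the two: routers commonly hash the flow 5-tuple to pick one of several equal-cost links, so ICMP probes (no ports) and a UDP/RTP media stream between the same two endpoints can land on different physical paths. The hash below is a stand-in; real hardware uses vendor-specific hash functions, and the IPs/ports are documentation examples.

```python
# Toy ECMP path selection: hash the 5-tuple, pick one of N equal-cost links.
# Real routers use vendor-specific hashes; this just shows the mechanism.
import hashlib

def pick_path(src_ip: str, dst_ip: str, proto: str,
              src_port: int, dst_port: int, num_paths: int) -> int:
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % num_paths

# ICMP has no ports; VoIP media is UDP on RTP ports. Same endpoints,
# different 5-tuples, so they may hash onto different links.
icmp_path = pick_path("198.51.100.10", "203.0.113.5", "icmp", 0, 0, 4)
voip_path = pick_path("198.51.100.10", "203.0.113.5", "udp", 16384, 16400, 4)
print(icmp_path, voip_path)
```

The takeaway: a clean traceroute does not prove the VoIP flow's path is clean, and vice versa.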
If there were true packet loss at a specific hop, you would normally see that loss continue at every subsequent hop. If the downstream hops are clean, the router is most likely just rate-limiting ICMP replies to its own control plane rather than dropping transit traffic.
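That rule of thumb can be written down as a simple heuristic over MTR's per-hop loss column (a sketch, not a substitute for reading the full report): loss that appears mid-path but clears before the destination usually means that hop is de-prioritizing ICMP responses, while real transit loss persists all the way to the end.

```python
# Heuristic for reading MTR per-hop loss percentages (hop 1 .. destination).
# Mid-path loss that clears downstream usually = ICMP rate limiting;
# loss that reaches the destination = real transit loss.
def classify_hop_loss(loss_by_hop: list[float]) -> str:
    final_loss = loss_by_hop[-1]
    if final_loss > 0:
        return "real loss reaching destination - investigate"
    if any(loss > 0 for loss in loss_by_hop[:-1]):
        return "mid-path loss only - likely ICMP rate limiting"
    return "clean path"

print(classify_hop_loss([0, 0, 40, 0, 0]))   # loss clears downstream
print(classify_hop_loss([0, 0, 10, 8, 9]))   # loss persists to the end
```

In the first example the 40% at hop 3 vanishes at hops 4 and 5, which is the classic rate-limiting signature; in the second, the destination itself is losing packets.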
Also, between two visible Layer-3 hops there can be significant Layer-2 infrastructure that never shows up in a traceroute at all. For example, if several switches with a trunked VLAN carry traffic between Hop C and Hop E in your A-to-Z path, something like an unmitigated loop or broadcast storm could drown out traffic on that VLAN, making it hard for the routers to reliably exchange the L2 frames containing the packets they need to route.
The larger issue is that without being able to reproduce the problem from the client network, it is difficult to pinpoint. You will probably need to bring Verizon into this directly.
When I suggested THEY talk to Verizon, their response was "It's our policy NOT to communicate with a client's ISP." Right. Just lay the blame somewhere else and back away. Close ticket.
I understand you may not communicate directly with a client's ISP; however, since the reported issue is in the transit path rather than on the customer's network, coordinated validation is necessary. If you are seeing consistent loss or latency, give Verizon actionable data: source/destination IPs, timestamps, protocol used, and evidence of downstream impact. Beyond that, this is ultimately what SLAs are for: measurable, verifiable performance issues.
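If it helps to structure that evidence before opening the carrier ticket, here is a minimal sketch of the kind of record worth assembling. The field names are invented for illustration; match whatever Verizon's ticket portal or NOC actually asks for.

```python
# Sketch: bundle the evidence a carrier NOC typically asks for into one
# JSON record. Field names are illustrative, not a real Verizon schema.
import json
from datetime import datetime, timezone

def build_carrier_evidence(src_ip: str, dst_ip: str, protocol: str,
                           samples: list[tuple[str, float, float]]) -> dict:
    """samples: list of (ISO-8601 timestamp, loss_pct, latency_ms)."""
    return {
        "source_ip": src_ip,
        "destination_ip": dst_ip,
        "protocol": protocol,
        "collected_utc": datetime.now(timezone.utc).isoformat(),
        "samples": [
            {"timestamp": ts, "loss_pct": loss, "latency_ms": lat}
            for ts, loss, lat in samples
        ],
    }

evidence = build_carrier_evidence(
    "198.51.100.10", "203.0.113.5", "udp/5060",
    [("2024-05-01T14:02:00Z", 6.0, 210.0),
     ("2024-05-01T14:07:00Z", 5.5, 190.0)],
)
print(json.dumps(evidence, indent=2))
```

Timestamped samples matter more than screenshots here: the carrier can correlate them against their own interface counters and maintenance windows.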
You only get SLA credits when it is confirmed on the carrier's side, though.
Their router is at the edge now, so I have no visibility into that. Our equipment all looks good. We've got a managed UniFi switch there and there are no problems at all that I can see. I haven't dug through the firewall logs specifically, but we do have alerting enabled and are receiving no alerts.
If their PE router is at the demarc, you really do NOT have visibility into anything upstream of it. From what you describe, your side looks clean: no switch errors, no alerts, nothing obvious pointing to local impairment. That said, UniFi does not offer much in the way of deep diagnostics or historical interface counters, so it is hard to prove your network clean.
If I were isolating this, I would temporarily bypass the UniFi entirely. Plug a single phone (or a small test device) directly into a known-good switch (even better, a basic Cisco such as a 9300X, or anything with clear interface counters) and make that the whole network for testing!
Either way... One phone.... One switch.... Direct to the VoIP carrier handoff.
If the issue still happens in that stripped-down setup, it is almost certainly upstream. If it disappears, then you know the UniFi setup deserves a closer look.
Good Luck