Automated Root-Cause Analysis: Enhancing IT Operations in the Cloud-Centric Virtualized World
May 14, 2024
Internet Service Providers (ISPs) want to deliver an excellent digital experience to their customers (e.g., residential subscribers and commercial enterprises), characterized by low delay (i.e., latency), low jitter, and high bandwidth. While trying to provide this experience, ISPs must continuously deal with issues in their infrastructures, such as:
- frequent network disruptions due to issues arising in various domains: customer premises, upstream peering networks, cloud providers' networks, etc.;
- slow issue resolution times, hence frequent truck rolls (i.e., technician and service truck dispatch to customer premises); and
- customer dissatisfaction arising from repetitive service calls.
According to Gartner, the average cost of network downtime can be as high as $9,000 per minute. For ISPs, downtime can be even more costly because their customers depend on them for uninterrupted Internet service. Every minute of downtime pushes customers closer to giving up on the ISP and switching to a competitor. Network reliability and reduced customer churn are therefore of utmost importance to ISPs.
To measure the quality of service (QoS) received by their customers and to assure Service-Level Agreements (SLAs), ISPs today use a plethora of tools, ranging from speed tests and ping tests to sophisticated in-network hardware probes. A speed test connects a customer-premises device to a nearby server and shows customers the download and upload bandwidths they are getting. The ISP can also run a ping test remotely from Customer-Premises Equipment (CPE) to determine the delay to a specific server anywhere on the Internet. The main problem with these methods is that, when there is an issue such as high delay or low bandwidth, they cannot pinpoint where the problem is located. The ISP needs that information before it can determine whether, and what, corrective action to take.
In today’s AI-dominated world, ISPs need a process that uses continuous monitoring, observability, and analytics to learn and predict whether a customer’s service performance is “heading south” before it breaks down. Such a process will diagnose customer issues faster, reduce the rate at which customers generate trouble tickets, rectify customer problems remotely where possible, and otherwise dispatch a technician with enough background information and experience to resolve the customer-premises issue. Essentially, a predictive and automated Root-Cause Analysis (RCA) process is needed to effectively analyze collected data, identify the root causes of network disruptions, pinpoint problem locations, and proactively (if possible, automatically) take remediation actions before the issues affect customers.
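As a rough illustration of predicting that performance is "heading south", the sketch below fits a straight line to recent latency samples and extrapolates when it would cross a threshold. It is a minimal example under stated assumptions, not a description of how any particular product works: the function name `minutes_until_breach`, the evenly spaced samples, and the fixed threshold are all hypothetical, and a production system would likely use more robust models.

```python
from typing import Optional, Sequence


def minutes_until_breach(
    latencies_ms: Sequence[float],
    threshold_ms: float,
    interval_min: float = 1.0,
) -> Optional[float]:
    """Extrapolate a least-squares trend line to a latency threshold.

    Assumes samples are evenly spaced, interval_min minutes apart
    (an assumption of this sketch). Returns 0.0 if the latest sample
    already breaches the threshold, None if the trend is flat or
    improving, otherwise the estimated minutes until the fitted line
    reaches the threshold.
    """
    n = len(latencies_ms)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    if latencies_ms[-1] >= threshold_ms:
        return 0.0

    # Ordinary least-squares fit of latency against sample index.
    mean_x = (n - 1) / 2
    mean_y = sum(latencies_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(latencies_ms))
    slope = sxy / sxx  # milliseconds per sample
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    fitted_now = intercept + slope * (n - 1)
    samples_left = (threshold_ms - fitted_now) / slope
    return max(samples_left, 0.0) * interval_min
```

For example, with one-minute samples of 20, 22, 24, 26, and 28 ms and a 40 ms threshold, the fitted line rises by 2 ms per minute, so the function estimates a breach in 6 minutes.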
For ISPs, reactive measurement data collection (using old toolsets such as ping) is coming up short. ISPs need to deploy proactive monitoring and maintenance, which includes tools and technologies for holistic, continuous network monitoring and predictive RCA to prevent service disruptions. The predictive RCA process should also be able to correlate end-to-end and hop-by-hop network performance metrics (continuously collected using active probing) with other metrics, such as CPE metrics (e.g., resource and wireless metrics) and domain/peering metrics. The benefits of such automated RCA for ISPs include:
- learning and predicting customer health issues to resolve them faster;
- improved network reliability, enhanced customer satisfaction, and reduced customer churn; and
- reduced downtime and operational costs.
This is exactly what the Ennetix xVisor automated RCA solution is designed to deliver. Ennetix’s solution enables ISPs to collect active measurements from customer premises. In particular, xVisor can determine the Internet path from the customer premises to the applications, identifying all the intermediate hops and domains. xVisor collects all the relevant end-to-end metrics, such as delay, jitter, bandwidth, packet-loss rate (PLR), and application-level metrics, along with hop-level metrics such as per-hop delay and packet-loss rate. Using xVisor, ISPs can pinpoint where in the IT infrastructure the problem is occurring when the customer is having a poor experience (e.g., high delay for download/upload; lack of connectivity to important applications; frozen video and/or jittery audio during important online meetings). Note that the problem could be in the customer’s premises, in the ISP’s network, or in the cloud network that hosts the application servers.
ISPs can achieve the following benefits using the Ennetix xVisor automated RCA framework:
- Significantly reduce service calls from customers.
- Significantly reduce truck rolls (i.e., technician dispatch) to customer premises.
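To make the idea of localizing a problem from hop-level metrics concrete, the sketch below attributes each hop's incremental delay to the domain that owns the hop and reports the domain contributing the most delay. This is an illustrative simplification, not xVisor's actual method: the `Hop` record, the domain labels, and the assumption that each hop reports a cumulative round-trip time (as in traceroute-style probing) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hop:
    address: str   # hop IP address (illustrative values only)
    domain: str    # hypothetical labels: "customer", "isp", "cloud"
    rtt_ms: float  # cumulative round-trip time measured to this hop


def delay_by_domain(path: List[Hop]) -> Dict[str, float]:
    """Attribute each hop's incremental delay to the domain that owns it."""
    totals: Dict[str, float] = defaultdict(float)
    previous_rtt = 0.0
    for hop in path:
        # Routers often deprioritize probe replies, so a hop can report a
        # lower RTT than its predecessor; clamp such increments to zero.
        totals[hop.domain] += max(hop.rtt_ms - previous_rtt, 0.0)
        previous_rtt = max(previous_rtt, hop.rtt_ms)
    return dict(totals)


def worst_domain(path: List[Hop]) -> Tuple[str, float]:
    """Return the domain contributing the most delay, and that delay in ms."""
    totals = delay_by_domain(path)
    domain = max(totals, key=lambda name: totals[name])
    return domain, totals[domain]
```

For a path whose cumulative RTTs are 2 ms at the customer gateway, 10 ms and 55 ms at two ISP hops, and 60 ms at a cloud hop, `worst_domain` returns `("isp", 53.0)`, pointing at the ISP segment. A real diagnosis would also weigh packet loss and repeat the measurement over time rather than rely on a single trace.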
- Share customer experience information with application providers and Over-The-Top (OTT) service providers, enabling them to better serve their clients.
- Create situational awareness, security analytics, and threat intelligence to better serve their customers.
- Improve customer goodwill and reduce customer churn.
A recent ISP case study of a successful xVisor RCA implementation exemplifies these benefits. Using Ennetix xVisor in troubled Broadband Access Network (BAN) locations, the ISP could:
- identify chronic performance issues, which are typically hard to pinpoint by technicians;
- provide a comprehensive perspective on measurements and diagnostics, not just latency issues;
- utilize the xVisor framework to develop a tailored diagnostic solution with an end-to-end process of comprehensive data collection, analytics, reporting, and visualization; and
- enable the ISP’s Ops team to create an RCA process addressing the troubled BAN locations and aid the team in providing support solutions for dispatchers.
Please contact us to enable such automated RCA in your network for better customer satisfaction.