Remediation
Get the job done with the end in mind!
Current Environment
Modern, cloud-centric IT infrastructures are complex. To ease the maintenance and management of individual components and sub-systems (SSO, Active Directory, LDAP, application servers, web servers, databases, etc.), component providers offer many tools, both on-premises and cloud-based. These tools, however, remain standalone, notification-only capabilities whose output must be triaged further before any action can be taken. Even a mid-sized IT environment can contain tens of such sub-systems, each of which must be tuned to coexist with the others, resulting in an exponential increase in the number of parameters that must be tweaked and adjusted.
In the event of an outage, each component owner naturally tends to focus on the health of their own component, and the overall health of the IT environment may suffer in the process. This “my system is working fine” attitude lengthens the troubleshooting and remediation cycle, hurting customer satisfaction and productivity.
Remediation
Diagnosing a system outage is a good first step, but the key is to fix the outage as early as possible to ensure business continuity. Once the Root Cause Analysis of an outage is complete, the failure should ideally be addressed in an automated manner, with as little human intervention as possible. Typical remediation measures include:
- Dynamic Workload Management, i.e., provisioning resources on demand to match requirements. This could mean either setting up more instances to handle the workload or increasing the bandwidth of ingress/egress connections. Where resources are constrained, it could also mean taking resources away from less critical components in the environment and making them available to business-critical processes.
- Alerting the owners of the failing component with as much information as possible, as opposed to a bare “your component is not responding.” This cuts the diagnosis time for the affected team and leads to faster consensus on how to fix the issue.
- Automating the response is the logical next step once incident detection is automated. Time is critical when protecting key resources during a security breach. Whether it means cutting off access for the “bad elements,” isolating a rogue user, or quarantining an infected entity, automated remediation is the fastest path to a response.
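The three measures above can be pictured as a small dispatch step that runs after diagnosis. The following is a minimal sketch under stated assumptions, not xVisor's implementation: the `Incident` type, the `plan_remediation` function, the incident kinds ("overload", "security"), and the action names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical result of a root cause analysis."""
    component: str          # failing component, e.g. "web"
    kind: str               # assumed categories: "overload", "security", other
    owner: str              # team responsible for the component
    evidence: dict = field(default_factory=dict)  # diagnostic details


def plan_remediation(incident: Incident) -> dict:
    """Map a diagnosed incident to one remediation action (illustrative only)."""
    if incident.kind == "overload":
        # Dynamic workload management: request one more instance.
        current = incident.evidence.get("instances", 1)
        return {
            "action": "scale_out",
            "component": incident.component,
            "target_instances": current + 1,
        }
    if incident.kind == "security":
        # Automated response: isolate the offending entity.
        return {
            "action": "quarantine",
            "component": incident.component,
            "entity": incident.evidence.get("entity", "unknown"),
        }
    # Default: alert the owner with all collected evidence attached,
    # rather than a bare "your component is not responding".
    return {
        "action": "alert_owner",
        "owner": incident.owner,
        "component": incident.component,
        "details": dict(incident.evidence),
    }
```

For example, `plan_remediation(Incident("web", "overload", "web-team", {"instances": 3}))` yields a `scale_out` plan with `target_instances` set to 4. In a real deployment the returned plan would be handed to a provisioning or ticketing tool; that hand-off is deliberately left out of the sketch.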
Benefits of Remediation
- Reduces the time taken to fix the root cause of the issue.
- Automated provisioning of system resources to meet workload and bandwidth demand can be faster than a human operator/technician doing the same.
- Eases the identification and alerting of Subject Matter Experts needed to address the issue.
- Improves end-user satisfaction and experience by ensuring a healthy IT environment.
xVisor Remediation
xVisor features out-of-the-box integrations with major network and system infrastructure vendor products, popular IT Systems Management platforms such as ServiceNow and PagerDuty, and remediation tools such as Terraform, Ansible, Puppet, and Chef. xVisor enriches the service management process with detailed troubleshooting hints, next steps, and automated remediation recommendations. The effectiveness of the xVisor engine increases as its AI/ML models improve with the collection of more data. Over time, human guidance and the ingestion of diverse event streams enable xVisor to deliver better intelligence and actionable insights.