AIOps for Deterministic Digital Operations
May 7, 2021
Since Gartner coined the buzzword “AIOps” a few years ago and predicted that the IT operations world would undergo a major shift away from traditional IT management techniques, hype has grown around the term. Increasingly, it feels as though “AIOps” is to the IT operations world today what “SDN” was to the networking world a few years back.
Traditional vendors are interpreting AIOps from the standpoint of their own strong suit, packaging their solutions as AIOps while offering advanced versions of their conventional functionality built on Artificial Intelligence and Machine Learning (AI/ML). For example, Application Performance Monitoring (APM) vendors are using AI/ML to advance application analytics and staking their claim as AIOps solutions. Similarly, legacy service management vendors apply the AIOps label liberally while providing analytical service management using AI/ML methods. This rush to stake a claim to AIOps fame is creating more confusion in the market about what AIOps actually means. To reduce ambiguity and avoid controversy, let us define AIOps broadly: “Use of AI/ML to streamline and/or automate any aspect of the IT operations process.”
IT infrastructures have been upended over the past few years by a slew of new technologies: cloud virtual machines, containers, microservices, serverless computing, and more. Enterprises are on a digital transformation journey in which old-school infrastructure (on-prem servers, routers, switches, gateways, etc.) is becoming as obsolete as pay phones.
In such an ever-changing, fast-paced infrastructure world, IT operations teams face a three-pronged challenge: a proliferation of very recently introduced infrastructure technologies; the vast amounts and varied types of data these infrastructures generate; and a shortage of experienced professionals who can manage them efficiently.
Before even trying to manage these infrastructures and take advantage of AI-powered solutions, IT operations teams need to figure out what data is available to collect for operations analyses, how to collect it, what the ground truth is in these new data sets, and whether the data sets are sufficient for making accurate management decisions. Without meaningful data, managing the new breed of infrastructures would be like throwing “darts in the dark”. Meaningful, relevant data is good, but a plethora of unrelated data makes IT operations even more daunting. AIOps solutions are only as good as their ground-truth data; without the right data set, they become just another burden on the toolset!
So, what to look for in an AIOps solution? It essentially boils down to the IT operations team’s end goal. Is it creating order in a chaotic world? Is it reducing noise by ingesting and correlating the millions of logs and metrics generated by your infrastructure? Is it consolidating operational tools and reducing the burden on the IT operations team? Is it bringing operational efficiency to your service and ticket management process? Whatever the goal is, it is essential to ensure the solution is the right fit for the operations team. The team may not be able to provide the right data set (perhaps because of organizational restrictions) to get the optimal outcome from the AIOps solution. In that case, it may be prudent to select an AIOps solution that works within the existing process and can still deliver the operational efficiency the team is looking for.
Now, what to expect from an AIOps solution?
If an AIOps solution promises the world and claims it can analyze and automate every aspect of IT operations, there are reasons to be skeptical! That most probably will not happen overnight. Rather, we should look for solutions that build trust and confidence within the operations team.
When generating and presenting the results of AI-powered analyses, AIOps tools should provide sufficient human-friendly hints and clues about how those results were reached. It is important to remember that many operations personnel are still used to threshold-based alert generation. The transition from threshold-based operation to automated analytics will not happen in a day; it will take a trust-building process of meaningful interactions with AIOps tools and platforms before operations teams can depend on these tools to make accurate and informed decisions.
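To make the contrast concrete, here is a minimal sketch in Python. It is not taken from any particular AIOps product: the function names, the sample CPU readings, and the 3-standard-deviation cutoff are all assumptions chosen for illustration. It compares a static threshold rule with a simple statistical check that also returns a plain-language hint explaining its verdict.

```python
from statistics import mean, stdev


def threshold_alert(value, limit):
    """Classic static rule: fire only when the value exceeds a fixed limit."""
    return value > limit


def explain_anomaly(history, value, z_limit=3.0):
    """Flag `value` if it deviates strongly from recent history.

    Returns (is_anomaly, hint), where `hint` is a human-readable
    explanation an operator can sanity-check. The z-score test and the
    default cutoff are illustrative assumptions, not a product's method.
    """
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation; needs >= 2 points
    if spread == 0:
        # Perfectly flat history: any change at all is unusual.
        is_anomaly = value != baseline
        z = float("inf") if is_anomaly else 0.0
    else:
        z = (value - baseline) / spread
        is_anomaly = abs(z) > z_limit
    hint = (
        f"value {value:.1f} vs. recent average {baseline:.1f} "
        f"(typical variation +/-{spread:.1f}, "
        f"{z:.1f} standard deviations away)"
    )
    return is_anomaly, hint


# Hypothetical CPU utilisation samples (percent) followed by a sudden jump.
recent_cpu = [40, 42, 41, 39, 43, 40, 41, 42]
latest = 60

print("static threshold fired:", threshold_alert(latest, limit=90))
flagged, why = explain_anomaly(recent_cpu, latest)
print("statistical check fired:", flagged)
print("because:", why)
```

With these made-up numbers, the static 90% threshold stays silent while the statistical check flags the jump and says why. The hint string is the important part: an operator used to thresholds can check the reasoning against their own intuition, which is where trust in the tool starts.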
Ultimately, someone’s job depends on these decisions. It is still unclear, and too early to call, whether AIOps platforms and tools can deliver on automating the entire IT operations process. Considering this, it may be prudent to introduce the AI revolution in IT operations as an evolution: minimize human involvement as much as possible, but do not eliminate it during the trust-building process, which may ultimately lead to fully automated IT operations.
Happy Ops!