
ChatOps: Encouraging Conversations in IT Operations

 
November 27, 2018

We all know the power of collaboration in operations – when teams interact, learn, and work together, operations improve markedly. Conversations give all stakeholders a better understanding of service operations, which helps users plan and schedule their service requests and, in turn, improves performance. In fact, collaboration is one of the fundamental principles of DevOps – the cultural shift that has enabled businesses to become agile and responsive.

What if, in addition to teamwork, we bring systems into our conversations? What if our systems – applications, databases, servers, and networks – are able to interact with our teams, respond to their questions, notify them of alerts, or, in general, chat with them? The conversations can only get more meaningful, and therefore fruitful.

ChatOps refers to the use of chat interfaces in IT operations. Through chatbots, ChatOps lets teams interact with systems, using conversations to provide information, help perform tasks, or guide us towards a solution.

Automated Information Pooling

One of the important steps in incident management is investigation and diagnosis. This often involves running a series of commands to fetch diagnostic information from the system, gathering metrics and logs, and analyzing them to better understand the underlying issue.

In the case of high-priority incidents, the time taken to do this can prove costly. With ChatOps, a single request from the chat channel can automate the fetching of diagnostic information from multiple data sources (related metrics, graphs, database logs, webserver logs, and the output of system or database commands) and present it on a common interface for collaborative troubleshooting.
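
As a rough illustration of this kind of information pooling, here is a minimal Python sketch. It assumes a hypothetical chat command, "diagnose <host>", and three placeholder collector functions that return stub text. None of these names come from a specific ChatOps tool; in practice each collector would query a metrics store, a log service, or run a system command.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical collectors: each stands in for a real data source and
# simply returns stub text for the requested host.
def fetch_metrics(host):
    return f"cpu/memory metrics for {host}"


def fetch_db_logs(host):
    return f"recent database log lines for {host}"


def fetch_webserver_logs(host):
    return f"recent webserver log lines for {host}"


COLLECTORS = {
    "metrics": fetch_metrics,
    "db_logs": fetch_db_logs,
    "webserver_logs": fetch_webserver_logs,
}


def _safe_call(collector, host):
    """Run one collector; a failing source should not hide the others."""
    try:
        return collector(host)
    except Exception as exc:
        return f"unavailable ({exc})"


def diagnose(host):
    """Run every collector in parallel and return one result per source."""
    with ThreadPoolExecutor(max_workers=len(COLLECTORS)) as pool:
        futures = {
            name: pool.submit(_safe_call, fn, host)
            for name, fn in COLLECTORS.items()
        }
        return {name: future.result() for name, future in futures.items()}


def handle_message(text):
    """Turn a chat message such as 'diagnose web01' into a reply string."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "diagnose":
        return "usage: diagnose <host>"
    report = diagnose(parts[1])
    return "\n".join(f"[{name}] {value}" for name, value in report.items())
```

A bot wired to the chat channel could pass each incoming message to handle_message and post the returned text, so everyone in the channel sees the same pooled report.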

Another key aspect of operations is event management, which involves identifying events that are significant, deriving inferences from them, and notifying the operations team to take appropriate action. When these events are brought into the chat channel, it becomes possible for multiple teams to deliberate and work together on the events in a timely manner.
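
The event flow described above could be sketched along the following lines. This is only an illustration under stated assumptions: the Event fields, the three severity levels, and the "post" callback (standing in for whatever call sends a message to the chat channel) are all hypothetical.

```python
from dataclasses import dataclass

# Assumed severity scale; real monitoring tools define their own levels.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Event:
    source: str
    severity: str
    message: str


def is_significant(event, threshold="warning"):
    """An event is significant if its severity meets the threshold."""
    return SEVERITY_RANK.get(event.severity, 0) >= SEVERITY_RANK[threshold]


def format_for_channel(event):
    """Render an event as a single chat line."""
    return f"[{event.severity.upper()}] {event.source}: {event.message}"


def route_events(events, post, threshold="warning"):
    """Post each significant event via `post`; return how many were posted."""
    posted = 0
    for event in events:
        if is_significant(event, threshold):
            post(format_for_channel(event))
            posted += 1
    return posted
```

For a quick trial, post can be print or a list's append method. Raising the threshold is one simple way to keep low-value events out of the channel.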

So, what do we gain by implementing ChatOps?

  1. Faster recovery time: In a crisis situation, or when a high-priority incident gets logged, being on chat with all stakeholders makes it possible to collaborate, brainstorm the symptoms/causes, and arrive at a solution/workaround quickly.
  2. Transparency and knowledge management: Visibility into common information on the chat interface ensures better transparency in decision-making and accountability. Entire chat conversations can also be archived or attached to the incident for future reference, or to add to the knowledge base. This would also be useful for audit purposes.
  3. Simplified interface: Execution of operational tasks can be invoked through a simple chat interface without the use of the complex syntax of actual commands.
  4. Improved system security: Operations team members do not need individual permissions and access privileges to perform tasks on the systems. Instead, there is role-based access to the chat room, which ensures that the users are authenticated and are in turn allowed to execute only the specific tasks that they are entitled to.
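
To make points 3 and 4 concrete, the following Python sketch shows one possible shape for a chat command dispatcher with role-based checks. Everything in it is an assumption made for illustration: the user names, the "viewer" and "operator" roles, and the "status" and "restart" commands are hypothetical, and the command functions are stubs rather than real system calls.

```python
# Hypothetical role table: which chat commands each role may run.
ROLE_COMMANDS = {
    "viewer": {"status"},
    "operator": {"status", "restart"},
}

# Hypothetical mapping of chat users to roles.
USER_ROLES = {"asha": "operator", "ben": "viewer"}


def status(service):
    return f"{service} is running"  # stub; a real bot would query the system


def restart(service):
    return f"restart of {service} requested"  # stub; no real restart happens


COMMANDS = {"status": status, "restart": restart}


def dispatch(user, text):
    """Check the user's role, then run the requested chat command."""
    parts = text.split()
    if len(parts) != 2 or parts[0] not in COMMANDS:
        return "unknown command; try: status <service> | restart <service>"
    command, service = parts
    role = USER_ROLES.get(user)
    if role is None or command not in ROLE_COMMANDS.get(role, set()):
        return f"{user} is not permitted to run '{command}'"
    return COMMANDS[command](service)
```

In this arrangement the bot, not the individual user, holds the system credentials. Users type short commands, and the role table decides which of those commands each person may trigger.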

However, ChatOps is not without its challenges. As adoption increases, the volume of conversations will also increase significantly, and critical exchanges may be missed amid the noise. ChatOps also involves a cultural change, something that the underlying DevOps principles demand. Getting teams to collaborate in a transparent manner requires a change in mindset, particularly for teams that have so far worked in silos.

Even so, the business benefits of improved operational efficiency and the social benefits of improved collaboration among teams far outweigh these challenges. So, how do you see your IT operations gaining from ChatOps?

 


Jayashree Arunkumar is a Technical Architect with more than 22 years of experience in the IT industry, specifically in the areas of DevOps, enterprise infrastructure management, service management, infrastructure automation, enterprise networking, and system administration and support.