
How the GitOps approach can save millions for large tech firms — Revolutionising ML operations with Amreth Chandrasehar

As demonstrated in Chandrasehar’s work through the years, one of the critical aspects of GitOps in MLOps is the integration of model observability

Published: Mon 9 Oct 2023, 4:26 PM

By Shagun Sharma


Artificial Intelligence (AI) is no longer just science fiction; it has become a part of our daily lives. From voice assistants to personalised recommendations on streaming platforms, AI is all around us. But have you ever wondered how these AI systems work behind the scenes? That's where machine learning operations, or MLOps, comes into play. Think of MLOps as the conductor of an orchestra, making sure all the AI models and algorithms work harmoniously to deliver the best possible results.

In the world of tech giants, the stakes are higher. These companies operate on a grand scale, handling vast amounts of data and serving millions, if not billions, of users globally. Ensuring the seamless operation of their AI systems is not just a matter of convenience, but one of survival. This is where the GitOps approach emerges as a game-changer, revolutionising ML operations and potentially saving these tech giants millions of dollars.

Meet the expert, Amreth Chandrasehar:

To help us navigate this landscape, we turn to Amreth Chandrasehar, a seasoned expert in AI and technology. Chandrasehar has been at the forefront of this digital revolution, working with big names like T-Mobile and Amazon. He's the wizard behind the curtain, making sure the AI machinery hums along smoothly. Currently serving as director of ML engineering, observability, and site reliability engineering at Informatica, he shares his insights and illustrates how GitOps is transforming the way large tech companies handle their AI operations.

Understanding the Complexity of MLOps for Tech Giants

Large tech firms operate in a complex environment where AI systems are the driving force behind their services. These systems continuously evolve, requiring constant attention at every stage of their lifecycle, from development and training to deployment and real-time monitoring.

“Imagine managing a high-speed train hurtling down a track at full throttle. You need to keep it on course, ensure it runs efficiently, and prevent any unexpected derailments. Now, multiply this by the millions of users relying on your services. That's the challenge tech giants face daily in the world of AI,” says Chandrasehar.

The GitOps Approach: A Revolution in MLOps

GitOps is the cavalry that arrives to simplify and streamline this daunting task. It's a set of practices and tools that automates the entire process of managing AI systems, from coding and development to deployment and monitoring.

At its core, GitOps is about leveraging the power of version control, typically associated with software development, to manage the full machine learning lifecycle. It provides a single source of truth for AI systems, ensuring that every change is documented, approved through pull request processes, and subjected to automated quality checks.

The GitOps Pipeline: Efficiency and Automation

Chandrasehar explains that GitOps employs a sophisticated pipeline comprising two vital components: Continuous Integration (CI) and Continuous Deployment (CD).

The CI phase encompasses pre-commit and post-commit pipelines. In the pre-commit phase, the system checks whether quality metrics are met. In the post-commit phase, end-to-end test cases are executed, and if quality metrics are met, the artifact is pushed to the model registry.
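To make this gate more concrete, here is a minimal Python sketch of what a post-commit quality check might look like. The metrics.json file, metric names, thresholds, and the final registry push are hypothetical placeholders for illustration, not a description of any specific pipeline Chandrasehar uses.

```python
import json
import sys

# Hypothetical quality thresholds; real gates are set per model and per team.
THRESHOLDS = {"accuracy": 0.90, "f1": 0.85}


def main() -> int:
    # Assume an earlier post-commit test job wrote its evaluation results to metrics.json.
    with open("metrics.json") as fh:
        metrics = json.load(fh)

    failures = [
        f"{name}: {metrics.get(name, 0.0):.3f} < {minimum}"
        for name, minimum in THRESHOLDS.items()
        if metrics.get(name, 0.0) < minimum
    ]

    if failures:
        # A non-zero exit code fails the CI pipeline, so the artifact never reaches the registry.
        print("Quality gate failed:", "; ".join(failures))
        return 1

    # In a real pipeline this step would push the artifact to the model registry
    # via the registry's own CLI or SDK; it is only sketched here.
    print("Quality gate passed; artifact can be pushed to the model registry.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

In CI, a script like this typically runs as the last post-commit step, so the registry only ever receives artifacts that have cleared the agreed quality bar.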

The CD portion of GitOps focuses on deploying ML models to a model serving/inferencing infrastructure. This deployment pipeline is fully automated and can be triggered manually or in response to a new artifact becoming available.
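The hypothetical Python sketch below illustrates the reconciliation idea behind such a pipeline: the desired model version is declared in a file tracked in Git, and a loop continuously converges the serving environment toward it. The file paths and helper functions are assumptions for illustration, not any particular tool's API.

```python
import json
import time

DESIRED_STATE_FILE = "deploy/model-version.json"  # hypothetical path tracked in Git


def desired_version() -> str:
    # The desired model version is declared in the Git repository;
    # the job checks out the repo before running this loop.
    with open(DESIRED_STATE_FILE) as fh:
        return json.load(fh)["model_version"]


def deployed_version() -> str:
    # Placeholder: a real reconciler would query the serving platform's API here.
    try:
        with open("/tmp/deployed-model-version") as fh:
            return fh.read().strip()
    except FileNotFoundError:
        return ""


def deploy(version: str) -> None:
    # Placeholder: roll out the new model version to the serving infrastructure.
    print(f"Deploying model version {version} to the serving cluster...")


def reconcile_forever(interval_seconds: int = 60) -> None:
    # Core GitOps idea: keep converging the running system toward the state
    # declared in Git, whether a change was merged manually or a new artifact appeared.
    while True:
        want, have = desired_version(), deployed_version()
        if want != have:
            deploy(want)
        time.sleep(interval_seconds)


if __name__ == "__main__":
    reconcile_forever()
```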

Model Observability: The Critical Component

As demonstrated in Chandrasehar’s work through the years, one of the critical aspects of GitOps in MLOps is the integration of model observability. Through frameworks, SDKs, or custom code, model metrics are continuously monitored. The expert says that this is essential to evaluate model accuracy, performance, data drift, and data quality, ensuring that the AI models continue to perform optimally.
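As a simple illustration of the drift monitoring described above, the sketch below compares a training-time baseline against recent production values for one feature using a two-sample Kolmogorov-Smirnov test from SciPy. The synthetic data and alert threshold are assumptions; a production observability stack would track many features and metrics in this way.

```python
import numpy as np
from scipy.stats import ks_2samp

# Illustrative only: in production, the baseline would come from training-data
# statistics and the live sample from inference logs.
rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # feature values at training time
live = rng.normal(loc=0.3, scale=1.0, size=5_000)      # recent feature values in production

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the live
# distribution has drifted away from the training-time baseline.
statistic, p_value = ks_2samp(baseline, live)

DRIFT_ALERT_P_VALUE = 0.01  # hypothetical alerting threshold
if p_value < DRIFT_ALERT_P_VALUE:
    print(f"Possible data drift detected (KS statistic={statistic:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected for this feature.")
```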

The Impact of GitOps: Efficiency and Cost Savings

GitOps, when applied to ML development, revolutionises key areas like data ingestion, pre-processing, training, evaluation, and deployments across various environments. Furthermore, it extends its benefits to the deployment and operation of MLOps platforms.

“Tools like Kubeflow and MLFlow can be deployed using GitOps, creating a unified approach for managing changes, enforcing pull request (PR) approval processes, and conducting automated quality checks,” Chandrasehar highlights. This streamlined approach increases collaboration, accelerates deployment speed and frequency, enhances security, and improves the reliability of applications and infrastructure. It also enables a consistent, standardised approach to deploying applications, tools, and services within an organisation.

Crucially, GitOps is known to drive efficiency across multiple dimensions. Developers can build ML models faster with continuous feedback mechanisms, promptly identify issues through continuous integration pipelines, and resolve them swiftly using automated continuous delivery. Furthermore, it simplifies day-to-day operations, offering stability and reliability to customers while optimising operational efficiency.

Chandrasehar has implemented GitOps at T-Mobile and Informatica, where the approach enabled teams to manage millions of containers and petabytes of data with ease. In his concluding remarks, he said this experience clearly underscores how GitOps improves developer productivity, customer service, and operational efficiency, ultimately saving organisations millions of dollars.

— Shagun Sharma is a business journalist.


