No Playbook
Are our measures of AI enough to understand the value delivered?

Exploring how conventional AI metrics align with, or fall short of, delivering measurable business outcomes

Will Huynh & Alpha (AI Writer)
June 25, 2024

Artificial Intelligence (AI) has become essential in modern business, driving efficiencies and uncovering new opportunities. But how do we truly measure its value? Common metrics like accuracy scores and benchmarks are often used, but do they really capture the full picture? These technical measures, while useful, do not always align with the business outcomes that matter most, and where they do, how they align is often unclear.

I want to briefly explore whether our current AI metrics are sufficient to understand the value AI delivers. Are we missing crucial insights by relying solely on technical metrics?

The Limitations of Traditional Metrics

Technical Performance Measures

In AI, including both conventional machine learning solutions and Generative AI, technical performance measures like accuracy scores, quality indexes, precision, recall, and benchmarks are the standard metrics for evaluating model performance. These metrics provide a quantifiable way to assess how well an AI model performs its designated tasks. For instance, an accuracy score can tell us the percentage of correct predictions made by a model, while benchmarks offer a comparison against other models or industry standards. Importantly, these metrics serve as a mechanism for selecting the best algorithmic or technological approach to delivering the intended output.
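To make these measures concrete, here is a minimal sketch of how accuracy, precision, and recall are computed for a binary classifier. The labels and predictions are made-up illustrative values, and the function names are hypothetical rather than taken from any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def technical_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall, guarding against empty denominators."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative data only: 8 cases, 1 = positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = technical_metrics(y_true, y_pred)
print(metrics)
```

Note that all three numbers describe how well the predictions match the labels. None of them says anything about what those predictions were worth to the business.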

However, these metrics are often limited in scope. They focus primarily on the algorithm's ability to perform specific actions under controlled conditions, which may not fully capture the complexities and variabilities of real-world applications. For example, a model with high accuracy in an ‘innovation lab’ setting might struggle when deployed in a dynamic business environment where data can be noisy and unpredictable, and can change over time. Moreover, these metrics don't capture the wider implications of AI deployment, both inside the business and beyond it. Relying solely on technical metrics can lead to a narrow and potentially misleading understanding of AI's effectiveness and impact.

Disconnect with Business Outcomes

One of the most significant limitations of traditional technical metrics is their disconnect from business outcomes. While an AI model might excel in terms of accuracy or precision, these metrics don't necessarily translate to business value. Business and functional owners are more concerned with how AI can drive revenue, reduce costs, or improve customer satisfaction. These outcomes are often not directly measurable through technical performance metrics alone.

For instance, an AI model designed to improve customer service might have a high accuracy rate in identifying customer queries. However, if that accuracy makes no attributable contribution to enhanced customer satisfaction or reduced response times, the model's technical success becomes less meaningful from a business perspective. This disconnect creates a gap between what data scientists measure and what business stakeholders care about, making it challenging to demonstrate the AI's true value. This is not to say that the technical measures are invalid; rather, it highlights the gap in measurement.

The Uncodified Business Outcomes

Challenges in Defining Success

Defining success for AI initiatives is a complex endeavor, primarily because business outcomes are often multifaceted and not easily codified. Unlike technical metrics, which are straightforward and quantifiable, business outcomes can be subjective and influenced by various external factors. For example, the success of an AI-driven marketing campaign might depend on customer engagement, brand perception, and long-term sales growth, none of which are easily measurable through traditional metrics.

Additionally, different stakeholders may have varying definitions of success. While a data scientist might focus on model accuracy, a marketing executive might prioritise customer engagement metrics or qualitative feedback. This divergence in perspectives makes it challenging to establish a unified set of criteria for measuring AI's contribution to business success.

Granularity and Detail

The absence of detailed and granular metrics further complicates the measurement of AI's impact on business outcomes. High-level metrics like revenue growth or cost reduction are often influenced by multiple factors, making it difficult to isolate the specific contribution of AI. Without detailed metrics that capture the nuances of AI's performance, businesses struggle to assess its true value.

For example, an AI system designed to optimise supply chain operations might contribute to cost savings. However, these savings could also result from other factors like improved supplier negotiations or changes in market conditions. Without granular metrics that isolate AI's impact, attributing business success to AI becomes more like a pseudo-science.
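One common way to approach this attribution problem, offered here as an illustration rather than something the article prescribes, is a difference-in-differences comparison: measure the change at sites that adopted the AI system and subtract the change at comparable sites that did not. The sketch below uses invented weekly cost figures and assumes that both groups would have followed the same trend without the AI system, which is a strong assumption in practice.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Estimate the change attributable to the intervention.

    Subtracts the change seen in the control group (market conditions,
    supplier negotiations, and so on) from the change seen in the treated group.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical weekly supply chain costs (in thousands) per site.
ai_sites_before = [100, 102, 98]
ai_sites_after = [88, 90, 86]
other_sites_before = [100, 101, 99]
other_sites_after = [95, 96, 94]

raw_saving = mean(ai_sites_after) - mean(ai_sites_before)
attributable = diff_in_diff(
    ai_sites_before, ai_sites_after, other_sites_before, other_sites_after
)

print(f"Raw cost change at AI sites: {raw_saving}")
print(f"Change attributable to AI (under the parallel-trends assumption): {attributable}")
```

In this invented example, costs fell at the AI sites, but part of that fall also appears at the sites without AI, so only the remainder can reasonably be credited to the system.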

Examples of Metrics Typically not Holistically Codified

Customer Satisfaction: While surveys and feedback forms can provide some insights, they often lack the granularity needed to measure AI's specific impact on customer satisfaction.
Employee Productivity: AI tools can enhance productivity, but measuring this impact requires detailed metrics that capture changes in workflow efficiency and employee performance.
Innovation and Creativity: AI can drive innovation, but its contribution to creative processes is challenging to quantify and often goes unmeasured.

These examples highlight the need for more nuanced and detailed business measures that can capture the multifaceted impact of AI on business outcomes.

Bridging the Gap: Towards Meaningful Metrics

To bridge the gap between technical performance and business outcomes, alternative frameworks are needed. One such approach is outcome-based metrics, which focus on the end results that matter to the business. These metrics go beyond technical performance to consider factors like customer satisfaction, revenue growth, and operational efficiency. Whilst outcome-based measurement is familiar in business (through structures such as OKRs), it is often not linked to the outcomes of AI models.

Additionally, combining technical and business metrics can offer a more comprehensive evaluation of AI's performance. A holistic approach involves integrating technical measures like accuracy and precision with business metrics such as ROI, customer satisfaction, and operational efficiency. This combined perspective provides a fuller picture of AI's value, addressing both its technical proficiency and its real-world impact. I do acknowledge that this is an incredibly difficult challenge, and one without any simple solutions, but it is a necessary one if AIs are to be effectively managed and guided within organisations.
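As a rough illustration of what a combined view could look like, the sketch below places technical and business metrics on a single scorecard and reports how many targets are met in each group. The `Metric` class, the metric names, the values, and the targets are all hypothetical, chosen to echo the customer service example earlier in the article.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    kind: str  # "technical" or "business"
    value: float
    target: float
    higher_is_better: bool = True

    def met(self) -> bool:
        """Return True when the metric reaches its target."""
        if self.higher_is_better:
            return self.value >= self.target
        return self.value <= self.target


def summarise(metrics):
    """Count targets met per metric kind."""
    summary = {}
    for m in metrics:
        bucket = summary.setdefault(m.kind, {"met": 0, "total": 0})
        bucket["total"] += 1
        bucket["met"] += int(m.met())
    return summary


# Hypothetical scorecard for a customer service model.
scorecard = [
    Metric("query identification accuracy", "technical", 0.93, 0.90),
    Metric("precision", "technical", 0.88, 0.85),
    Metric("customer satisfaction score", "business", 4.1, 4.3),
    Metric("median response time (minutes)", "business", 6.0, 5.0, higher_is_better=False),
]

summary = summarise(scorecard)
for kind, counts in summary.items():
    print(f"{kind}: {counts['met']} of {counts['total']} targets met")
```

With these invented numbers the model meets every technical target and no business target, which is exactly the situation that a technical-only report would hide.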

The importance of defined and adaptable metrics cannot be overstated.

AI is not static; its performance and impact can evolve over time.

As AI systems learn from new data and adapt to changing conditions, their effectiveness can fluctuate.

This dynamic nature necessitates continuous evaluation to ensure that AI remains aligned with business goals and delivers sustained value. As business needs and AI capabilities evolve, the metrics used to evaluate AI must also adapt. Flexible metrics allow businesses to capture the changing impact of AI and ensure that it continues to deliver value in a dynamic environment.
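Continuous evaluation can be as simple as tracking a metric over a rolling window and flagging when it falls below an agreed level. The sketch below is one minimal way to do that. The class name, window size, threshold, and weekly values are assumptions made for illustration, and the same pattern could be applied to a business measure as readily as to a technical one.

```python
from collections import deque


class RollingMetric:
    """Track the rolling average of a metric and flag drops below a threshold."""

    def __init__(self, window: int, threshold: float):
        self.values = deque(maxlen=window)  # keeps only the most recent readings
        self.threshold = threshold

    def update(self, value: float):
        """Add a new reading; return (rolling average, alert flag)."""
        self.values.append(value)
        average = sum(self.values) / len(self.values)
        return average, average < self.threshold


# Hypothetical weekly accuracy readings that slowly degrade after deployment.
weekly_accuracy = [0.92, 0.91, 0.90, 0.86, 0.82, 0.80]

monitor = RollingMetric(window=3, threshold=0.85)
alerts = []
for week, reading in enumerate(weekly_accuracy, start=1):
    average, alert = monitor.update(reading)
    alerts.append(alert)
    status = "ALERT" if alert else "ok"
    print(f"week {week}: rolling average {average:.3f} [{status}]")
```

The threshold itself is a business decision, and it should be revisited as goals change, which is the sense in which the metrics need to stay adaptable.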

Reflecting on the continued growth of AI and the increasing need for its management and control, it's clear that bridging the gap between AI's technical prowess and its business impact will be key.

As we navigate the interlacing of human and AI work, the question remains: Are we ready to redefine our business measures to truly understand AI's value and the impacts on human metrics?

If we are, business metrics can evolve alongside AI, capturing not just its efficiency but its transformative potential.

This article has been written with the guidance of a human and the penmanship of AI in collaboration, whatever that actually means.