BLOG
    Finding Value in AI Requires Better Metrics to Prove ROI
    FS Speakers at Reuters Momentum AI Conference Share Views on AI Best Practices, Metrics and Factors for Success

    As GenAI becomes old hat and Agentic AI all the rage, financial institutions are finding they must be clearer about their approach to AI and its impact on the business – with the focus squarely on finding value and proving ROI. This was a common theme discussed at the Reuters Momentum AI London 2025 conference, held earlier this week. The conference’s second day focused specifically on financial services, with representation from major insurers; banks spanning retail, corporate & investment banking (CIB) and securities servicing; and solution providers. Speakers included senior executives from BoA, BNY, Citi, European Central Bank, Intesa Sanpaolo, Lloyds, HSBC, MillTech, Nationwide, OpenAI, Swiss Re, Zurich Insurance Group, and more.

    Over the day there was consensus among speakers and audience in several areas:

    • There is a move away from “use case overload” and “vanity metrics”.
    • Transitioning from POC to production and defining value with metrics are key challenges.
    • An over-emphasis on risk can create a bigger risk of falling behind, so balance risk controls with potential impact to the business.
    • Future table stakes vary across retail and corporate/institutional banking, with the former likely incorporating more autonomous agents (“super agents”) and the latter retaining a personal touch (“super powered” bankers/advisors/traders).
    • The vast majority (up to 85%) of use cases are still internal.
    • It is early days for Agentic AI despite the marketing hype.

    A summary of discussions around use cases, metrics, best practices, table stakes, gaps between hype and reality, and how to “push past the POC” follows.

    Use Cases

    Common early AI use cases identified by speakers included:

    • Finance-related, e.g., credit risk profiling to shorten time frame to decision.
    • Document processing and insight extraction: many FS processes rely on documentation, so this is a great place to start.
    • Simple customer service use cases leveraging chatbots and summarization capabilities.

    One bank speaker described how they peaked at hundreds of use cases, but realized many were duplicative, did not drive value and, critically, did not scale. They went back to the drawing board to build their AI roadmap in a more centralized and thoughtful manner, incorporating the following ideas:

    • A “control tower” group of P&L owners who meet monthly to prioritize use cases.
    • Ensure understanding of actual benefits from each use case, with expectation around in-year realization of these benefits.
    • Verify the use case will be scalable.
    • Decide early to buy or build.

    Their target for this year was to find 50 use cases delivering £50m in “in-year” benefits, and they are on track so far. However, they discovered that most of the benefit comes from a small number of use cases, so a takeaway is to green-light a smaller number of initiatives next year, thinking harder about value.

    Another speaker mentioned a “Big Bets” approach; instead of going for 10-15 bets, they will pick 1-2 to allow the organization to really build out its AI muscles first.

    Speakers pointed to a shift in recent internal conversations, from asking what specifically to do, to how best to drive value. Criteria to consider here include speed (getting to a decision faster), efficiency of the current process and the value of hyper-personalization.

    The first two are often inward facing and, for now, most speakers agreed most use cases are internally focused (as high as 85% overall). However, this is expected to change rapidly, especially in the retail space, with internal use cases quickly becoming “table stakes”, so thinking about the value of personalization will become more important. For CIB, however, there was a view that use cases will remain skewed internally, due to the higher regulatory burden and risk of external use cases, but also due to customer demand for a “higher touch” service.

    Metrics

    The topics of finding value, ROI and metrics came up throughout the day. Speakers were clear that use cases need to be driven by the business, with metrics based on actual impact on customers and/or value to the business. One bank speaker shared their guidelines around metrics as follows:

    • Value must be realized “in-year” for a use case to be green-lit.
    • The metrics must be one of the following: cost avoidance, cost saving or incremental revenue creation.
    • For incremental revenue the business must commit to a figure exceeding existing targets.

    He shared that most metrics are around cost avoidance or savings for now, but these are tied to tangible figures. He pointed to an example of increased revenue related to the credit memo space, as a streamlined credit approval process meant more loans could be written. The business did not feel comfortable committing to a firm target this year, but he is hoping that will change as confidence around this technology grows. Going forward, a business commitment to a firm target will be required to green-light an incremental revenue use case.

    Best Practices and Table Stakes

    A speaker from a global bank shared the following best practices:

    • Embedding AI workflows into existing platforms to leverage existing controls and data already embedded in the workflow.
    • Combining different types of AI to drive value, rather than use cases focused on a single type of technology.
    • Prioritization of high-impact use cases as the key to success. The bank continually asks: what are the big bets, and where can returns be expected? It then focuses on those answers to drive value.

    Another put it more simply:

    • Reduce number of POCs.
    • Adapt the AI platform for scale-up.
    • Remember that responsible AI involves a range of areas, including compliance, reporting, risk, people, impact and more.

    He noted that by centralizing technology best practice while creating spokes into the business, they are seeing a new, stronger relationship between the technology and business teams. The result is an increased appetite for experimentation that is grounded in good process.

    Views were shared on common “good patterns” as well as “anti-patterns” when it comes to building trust in AI. The “anti-patterns” include:

    • Being incredibly restrictive.
    • Or the opposite, encouraging “POC proliferation”.

    The “good patterns” include:

    • Start with employee literacy, ensuring staff have the skills to execute more complex use cases, e.g., importing data from spreadsheets into AI models for standard analyses such as Monte Carlo simulations.
    • Lead from the front by ensuring senior executives demonstrate active use of AI.
    • Do not bolt AI on; rather, re-imagine and re-design.

    There was less consensus around table stakes, as it is still early days and AI technology has the potential to fundamentally change the banking model. However, having the right data, a modern tech stack and investment in an AI workbench came up multiple times during the day. As several speakers noted, most banks still have legacy and siloed technology stacks, so any conversation around table stakes must consider this.

    Regarding best practice around governance, several speakers raised challenges around both third- and fourth-party risks. With AI being embedded everywhere, it is becoming increasingly important to keep clear lines of communication with suppliers and clients. The need to develop effective monitoring practices and tools is key. Autonomous reasoning, where the AI does something you did not ask it to do, was raised as a real concern.

    To address this in particular, the use of centralized workbenches was discussed and considered critical to success and key to any AI roadmap.

    When it comes to metrics, speakers agreed that the age of so-called “vanity metrics” is over. Understanding the current state is important. One speaker referred to a well-known claim that 40% of AI projects fail, but another asked what the context is, noting that this could be a great result if the expected failure rate for that type of project is typically higher. Several speakers pointed to using known, existing metrics as a starting point to build AI impact metrics. For example, one bank’s securities servicing division is ranked by customers around ease of interaction, so the AI project metrics used this as the base to aim for a higher-level target.

    Gap Between Hype and Reality

    One speaker said that last year the hype gap was around PowerPoint slides, but now it is around people building agents, with claims that 1000s of agents are being built. In reality, he added, major solution providers are only recently announcing ways to manage agents, and workbenches are still being built out.

    Another key gap is around security especially as banks still have a lot of legacy technology. There is renewed energy around modernizing infrastructure especially when it comes to microservices, as without these capabilities, speakers felt security gaps will continue to be a big issue.

    Finally, as with emerging technology adoption in general, and certainly with RPA and ML, this is not just about technology but people and trust as well. Several speakers noted there are gaps here as well.

    Pushing past the POC

    Getting stuck at the POC phase was a frequently cited challenge, with enterprise scaling and integration with legacy systems often presenting significant roadblocks. Risk concerns often emerge as well, especially considering that AI technology is non-deterministic (the same inputs may not produce the same results).

    Creating rigorous tests for AI solutions, called “Evals”, was mooted as a solution to the issue of attaining determinism. The suggestion was to spend a few months building Evals, which means defining common inputs and expected outputs that can then be tested against and used to gauge the accuracy of solutions. This approach supports better scaling in future as well. The number of examples in each Eval should be tied to the risk profile of the solution – if accuracy is less critical, fewer tests are needed. For high-impact cases, effort to define edge cases is important.
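    The Eval idea described above can be sketched in a few lines of code. This is a minimal illustration only, not any bank's actual framework: `toy_model` is a hypothetical stand-in for a real LLM call, and exact-match scoring stands in for the richer scoring a production Eval would use. The pass threshold and number of cases would be tuned to the solution's risk profile.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str       # a common, representative input
    expected: str     # the output the business has signed off on

def run_eval(model_fn: Callable[[str], str], cases: list[EvalCase],
             threshold: float) -> tuple[float, bool]:
    """Run every case through the model; return (accuracy, passed)."""
    hits = sum(1 for c in cases if model_fn(c.prompt).strip() == c.expected)
    accuracy = hits / len(cases)
    return accuracy, accuracy >= threshold

# Hypothetical stub standing in for a real LLM call.
def toy_model(prompt: str) -> str:
    return "APPROVE" if "low risk" in prompt else "REVIEW"

cases = [
    EvalCase("Credit memo: low risk, strong cash flow", "APPROVE"),
    EvalCase("Credit memo: covenant breach last quarter", "REVIEW"),
]

accuracy, passed = run_eval(toy_model, cases, threshold=0.9)
```

    Re-running the same Eval suite after every model or prompt change gives a repeatable accuracy gate, which is how an Eval recovers a measure of determinism from a non-deterministic system.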

    Other more technically oriented advice included:

    • Ensure data pipelines are robust and metadata consistent.
    • LLMOps and AgentOps infrastructure need to support experimentation; with experimentation very expensive, however, aim for robust but efficient approaches.
    • Consider skipping the POC stage altogether, opting for an MVP to test integration with legacy systems at the start.
    • Leverage learnings from other emerging technology adoptions, such as the move to cloud.
    • Be sure to integrate AI into process change management to future-proof.
    • Utilize abstraction layers where relevant, e.g. for API integration.
    • Carefully consider your data layer as this is likely to change massively.
    • Empower your “humans in the loop” with enhanced tools for monitoring, e.g., visualization of flows and dashboards with analytics around trace data.
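    The abstraction-layer advice above can be sketched as a thin interface sitting between business logic and any model provider. Everything here (`TextModel`, `EchoModel`, `summarize_document`) is a hypothetical illustration, not any vendor's API; a real adapter would wrap an actual SDK client behind the same interface.

```python
from typing import Protocol

class TextModel(Protocol):
    """Abstraction layer: business code depends on this interface,
    never on a specific vendor SDK."""
    def complete(self, prompt: str) -> str: ...

class EchoModel:
    """Hypothetical stand-in; a real adapter would call a vendor API here."""
    def complete(self, prompt: str) -> str:
        return f"summary of: {prompt}"

def summarize_document(model: TextModel, text: str) -> str:
    # Business logic sees only the abstract interface, so swapping
    # providers, or inserting monitoring/trace hooks, is a local change.
    return model.complete(text[:2000])

result = summarize_document(EchoModel(), "Q3 credit committee minutes")
```

    The design choice is that when the data or model layer "changes massively", as the speakers expect, only the adapter behind `TextModel` needs rewriting, not every workflow that consumes it.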

    Wrap Up

    All in all, a very informative conference, with speakers generously sharing their real-world learnings.

    For those interested specifically in AI data governance, I also had a fascinating discussion with Katie Fowler, Director Responsible Business, Thomson Reuters Foundation. She mentioned that there is currently limited data on how companies are using AI tools and what this means for people, society and the environment. To help address this - and offer companies a way to benchmark themselves - her company is powering a free, voluntary survey by the AICDi that is grounded in UNESCO's Recommendations on the Ethics of AI. It covers the impact of AI on the workforce, legal accountability, environmental impact and data privacy/bias.

    Celent subscribers can learn more about the use of AI in capital markets and wider FS by accessing our reports here:

    Gen AI: Turbocharging AI in Capital Markets

    Shedding Light on Agentic AI in Capital Markets

    Gen AI: Lens on Use Cases in Capital Markets

    AI Testing

    Accelerating AI Adoption in AML

    Celent AI Hub page