Evolving model risk management in the age of generative AI and public large language models
03 December 2025
Re-imagining governance for a new era of intelligent automation
Generative AI (GenAI) and large language models (LLMs) are redefining the boundaries of model use within financial institutions. From credit analysis to documentation and customer engagement, these tools are enhancing speed and efficiency across banking functions. Yet, their complexity, opacity, and reliance on external data introduce new dimensions of model risk that traditional governance frameworks were never designed to manage.
This paper explores how banks worldwide can evolve their Model Risk Management (MRM) frameworks to address the distinctive challenges posed by GenAI. It outlines practical considerations around explainability, data privacy, and continuous oversight, and proposes a structured approach that balances innovation with accountability.
As GenAI redefines how models behave, model risk must evolve from static validation to dynamic assurance.
In today’s landscape, public LLMs such as ChatGPT, Gemini, and Claude are being rapidly deployed to generate insights, automate workflows, and transform decision making. Their ability to synthesise complex data and produce fluent narratives has made them attractive across risk, compliance, and credit functions.
However, these capabilities introduce governance challenges. Traditional MRM frameworks were designed for deterministic, explainable, and internally controlled models. GenAI systems, by contrast, are probabilistic, dynamic, and often externally managed. Their logic cannot be easily explained or replicated, and outputs may shift unpredictably.
This raises critical questions around transparency, accountability, and trust. Regulators across the globe are expected to expand model governance expectations to include GenAI, particularly where these systems influence decision making or customer outcomes. Banks that proactively adapt their MRM frameworks will be best positioned to demonstrate control and maintain regulatory confidence as the use of GenAI deepens.
The business problem
Unseen risks often arise not from models themselves, but from how they are used and governed.
GenAI tools are increasingly relied upon for tasks that once required human judgment: drafting credit reviews, summarising policies, automating compliance documentation, and generating analytics code. While these innovations improve efficiency, they also create exposure in areas existing MRM frameworks may not capture.
Training data behind public LLMs is unverified and may contain inaccuracies or bias. Models behave non-deterministically, meaning identical prompts can yield different outputs, undermining reproducibility. Their architecture is externally managed, limiting the bank’s control over changes affecting model behaviour. Staff may also use public GenAI tools without safeguards, inadvertently inputting confidential or customer data into systems outside institutional control.
Without proper oversight, these vulnerabilities could breach data protection regulations, introduce biased outputs, or erode confidence in automated decisions. The challenge for risk leaders is therefore strategic as well as technical: enabling innovation while protecting decision integrity, regulatory compliance, and customer trust.
Key considerations and challenges
The defining risks of GenAI lie at the edges, where accountability, explainability, and ownership blur.
The first challenge is conceptual: determining what qualifies as a “model” in a GenAI context. Conventional definitions, which frame models as quantitative tools that generate outputs for decision making, fail to capture the hybrid nature of GenAI systems. A generative chatbot summarising loan terms might appear purely operational, yet its influence on customer understanding or decision support means it should fall under model governance. Establishing clear classification criteria is therefore essential. Moreover, it is not only the complexity of models that has increased but also their sheer number, which places higher demands on the operational validation function and requires additional expert resources and support to ensure all models are effectively assessed and governed.
Another important challenge is use case verification and approval. LLMs can support hundreds of potential applications across banking functions, but not all should be operationalised. Without a structured process to evaluate, approve, and prioritise each use case, banks risk ungoverned deployment of high impact models, creating operational, regulatory, and reputational vulnerabilities. The rise of “shadow GenAI”, where employees use public LLMs informally to assist with drafting or analysis outside approved channels, adds a practical dimension to this risk, highlighting the urgency of formalised use case governance. Ensuring that only validated applications are deployed is critical to maintaining oversight and regulatory compliance.
Explainability is equally complex. GenAI models process billions of parameters and produce probabilistic results. Even advanced interpretability tools provide limited transparency into why a model generates a given output. This opacity challenges both validation and audit functions. Looking ahead, the emergence of agentic AI, where systems can initiate and coordinate actions with minimal human input, is likely to make control, oversight, and explainability even more complex, reinforcing the importance of governance frameworks that can manage increasingly autonomous model behaviour.
Data privacy is another pressing concern. Using public GenAI models can expose sensitive internal information through prompts, potentially breaching confidentiality or local data residency requirements. Bias and fairness compound the risk: because GenAI models are trained on internet scale data, they can inadvertently reflect cultural or linguistic biases that compromise neutrality and customer fairness.
Finally, model drift in LLMs is continuous. Vendors update weights and training data frequently, which can alter outputs without notification. Banks need to establish mechanisms for ongoing monitoring and vendor assurance to detect and respond to such changes quickly.
A proposed framework for evolved MRM
Governance must shift from periodic validation to continuous, ecosystem-wide oversight.
An evolved MRM approach should integrate GenAI into the existing governance ecosystem through a layered framework.
The first layer focuses on identification and classification. Banks should expand their model taxonomy to explicitly recognise GenAI systems, whether internally developed, accessed through APIs, or embedded in vendor platforms. As part of this layer, banks should implement a formal use case verification and approval process. Each potential GenAI application should be evaluated for business value, risk exposure, and regulatory implications before being approved for deployment. Only validated use cases should enter the enterprise-wide model inventory, detailing purpose, data dependencies, ownership, and vendor linkages. Such visibility forms the foundation of effective oversight.
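By way of illustration, the minimal sketch below shows the kind of information a single entry in such an inventory might capture, expressed as a Python dataclass. The field names, risk tiers, and defaults are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class RiskTier(Enum):
    """Illustrative risk tiers; actual tiering criteria are bank-specific."""
    HIGH = "high"      # customer- or capital-impacting decisions
    MEDIUM = "medium"  # decision support with human review
    LOW = "low"        # internal drafting and summarisation


@dataclass
class GenAIInventoryRecord:
    """One entry in an enterprise-wide GenAI model inventory (illustrative)."""
    model_id: str                       # unique inventory identifier
    use_case: str                       # approved business application
    owner: str                          # accountable business owner
    vendor: str | None                  # external provider, if accessed via API or platform
    model_version: str                  # vendor or internal version string
    data_dependencies: list[str] = field(default_factory=list)  # data used in prompts or fine-tuning
    risk_tier: RiskTier = RiskTier.HIGH
    approved: bool = False              # set only after use case verification and approval
    last_validated: date | None = None  # drives periodic revalidation alerts
```

A record along these lines ties each deployment back to its approved use case, owner, and vendor linkage, which is the visibility the paragraph above describes.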
The second layer concerns governance and accountability. Clear ownership structures are required across business, risk, and technology teams. Each GenAI application should undergo a governance review to evaluate its intended use, data sensitivity, and decision criticality. The depth of oversight should be proportional to the potential customer or financial impact, with periodic reviews to ensure alignment with ethical and regulatory expectations. Effective governance also depends on human capability and skill readiness; risk, validation, and operational teams need the appropriate expertise to evaluate and interpret GenAI behaviour, ensuring governance mechanisms are applied meaningfully.
The third layer involves validation and explainability. While traditional back-testing is unsuitable, new assurance techniques can test model stability and integrity. These include prompt sensitivity analysis, scenario-based output assessment, and systematic bias testing. Recording prompt-response logs provides a traceable audit trail, enabling validators to reconstruct reasoning and assess the consistency of outputs over time. In addition, several statistical and quantitative techniques, illustrated in the sketch after the list below, can further enhance assurance:
- Response consistency scoring – measures how similar model outputs are when the same prompt is run multiple times, providing a quantitative view of reproducibility and output stability.
- Semantic stability testing – assesses whether the meaning of responses remains consistent when prompts are paraphrased or reworded, helping detect semantic drift or hallucination.
- Controlled perturbation analysis – evaluates how small, intentional variations in wording or context affect model behaviour, revealing sensitivity and robustness to prompt changes.
- Embedding drift analysis – tracks how the semantic representation of model outputs changes across time or model versions, identifying hidden drift or degradation following vendor updates.
- Statistical parity testing – compares output distributions across demographic or contextual groups to detect unequal treatment or bias in GenAI-generated outcomes.
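As a worked example, the sketch below shows how two of these techniques, response consistency scoring and embedding drift analysis, might be quantified. It assumes access to an internally approved embedding model through a hypothetical `embed` function; the function names and scoring approach are illustrative, not a prescribed methodology.

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    """Hypothetical hook into the bank's approved embedding model.
    Assumed only to return a fixed-length vector for a piece of text."""
    raise NotImplementedError("wire up to an internally approved embedding service")


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def response_consistency_score(responses: list[str]) -> float:
    """Mean pairwise similarity of embeddings for repeated runs of the
    same prompt: values near 1.0 indicate reproducible, stable output."""
    vectors = [embed(r) for r in responses]
    pairs = [cosine_similarity(vectors[i], vectors[j])
             for i in range(len(vectors))
             for j in range(i + 1, len(vectors))]
    return float(np.mean(pairs))


def embedding_drift(reference_outputs: list[str], current_outputs: list[str]) -> float:
    """Distance between the mean embedding of a reference batch of outputs
    and a current batch (e.g. after a vendor update); larger values
    suggest model behaviour has shifted and warrants review."""
    ref_centroid = np.mean([embed(t) for t in reference_outputs], axis=0)
    cur_centroid = np.mean([embed(t) for t in current_outputs], axis=0)
    return 1.0 - cosine_similarity(ref_centroid, cur_centroid)
```

In practice, scores like these would be tracked against tolerance thresholds agreed at validation, so that deterioration following a vendor update triggers review rather than passing unnoticed.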
A fourth layer should address data management and privacy. Banks must set clear policies on acceptable data use in GenAI environments, ensuring no sensitive information is entered into public models. Where possible, internal deployments should rely on secure, sandboxed environments, with anonymisation or tokenisation applied to training or prompt data. Privacy impact assessments should be mandatory before any GenAI solution goes live.
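As one concrete illustration of tokenisation at the prompt boundary, the sketch below masks common identifier patterns before a prompt leaves the bank's controlled environment. The patterns and placeholder format are assumptions for illustration; a production redactor would rely on the bank's approved PII detection tooling and cover locale-specific formats.

```python
import re

# Illustrative patterns only; real coverage would be far broader.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def redact_prompt(prompt: str) -> str:
    """Replace detected identifiers with typed placeholder tokens before a
    prompt is sent to any external GenAI service."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt


print(redact_prompt("Customer j.smith@example.com, IBAN GB82WEST12345698765432"))
# -> "Customer <EMAIL>, IBAN <IBAN>"
```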
Finally, continuous monitoring and vendor oversight complete the framework. Banks should track changes in model versions, monitor for drift or degradation in output quality, and require vendors to provide transparency on updates and data governance practices. Central dashboards that flag anomalies, expired validations, or unexpected responses can support ongoing assurance and regulatory readiness.
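To show how an inventory can feed such a dashboard, the sketch below flags models whose validations have lapsed. The revalidation windows and record fields are hypothetical; actual cycles would be set by the bank's MRM policy.

```python
from datetime import date, timedelta

# Hypothetical revalidation windows per risk tier, in days.
REVALIDATION_DAYS = {"high": 90, "medium": 180, "low": 365}


def flag_expired_validations(inventory: list[dict], today: date | None = None) -> list[str]:
    """Return model IDs whose last validation is older than the window for
    their risk tier, for surfacing on a central assurance dashboard."""
    today = today or date.today()
    flagged = []
    for record in inventory:
        window = timedelta(days=REVALIDATION_DAYS[record["risk_tier"]])
        if record["last_validated"] is None or today - record["last_validated"] > window:
            flagged.append(record["model_id"])
    return flagged


inventory = [
    {"model_id": "LLM-001", "risk_tier": "high", "last_validated": date(2025, 1, 10)},
    {"model_id": "LLM-002", "risk_tier": "low", "last_validated": None},
]
print(flag_expired_validations(inventory, today=date(2025, 6, 1)))
# -> ['LLM-001', 'LLM-002']  (high-tier model past its 90-day window; never-validated model)
```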
The path forward
In the GenAI era, governance is not a constraint; it is the foundation of responsible innovation.
Banks today stand at a pivotal point. GenAI promises unprecedented gains in efficiency and insight, yet it simultaneously tests the limits of current governance. Those institutions that treat model risk management as an evolving discipline, not a static compliance exercise, will be best placed to integrate GenAI safely and effectively.
An evolved MRM framework enables banks to manage emerging risks while preserving the freedom to innovate. It builds confidence among regulators, customers, and internal stakeholders that GenAI is being deployed responsibly and transparently. Above all, it ensures that technological progress aligns with the industry's broader objectives of ethical innovation, data stewardship, and financial stability.
In this new landscape, model assurance becomes continuous, and governance becomes an accelerator of intelligent transformation rather than an obstacle to it.
Get in touch
Send us an email if you would like to learn more about how 4most can support your organisation’s model risk management strategy – info@4-most.co.uk.