Bridging the Evidence Gap: Adapting NICE Standards for AI-Driven Digital Health Innovations

London – The National Institute for Health and Care Excellence (NICE) has long been the gold standard for evaluating medical technologies in the UK. Yet a growing chorus of researchers, clinicians, and industry leaders warns that its Evidence Standards Framework (ESF) is struggling to keep pace with the rapid evolution of digital health technologies (DHTs), particularly AI diagnostics and wearable devices. A 2025 viewpoint paper published in JMIR mHealth and uHealth by Bahadori and colleagues from Imperial College London lays bare the systemic mismatch: while DHTs can update in near-real time, learn from real-world data, and continuously improve after deployment, the NICE ESF remains a static, trial-centric assessment tool.

[IMAGE: A timeline showing the rapid release of new DHT versions (e.g., software updates every 3–6 months) against the slow pace of NICE evaluations (often 18–24 months per assessment).]

The result is a paradox. On one hand, the UK government has championed digital health innovation as a cornerstone of the NHS’s future. On the other, the very framework meant to ensure safety and efficacy is creating bottlenecks that delay market access, inflate development costs, and discourage investment in next-generation medical devices. This article dissects the ESF’s shortcomings, uses AliveCor’s KardiaMobile as a case study, and explores a proposed dynamic, co-designed adaptive framework that could bridge the evidence gap.


Why the Current NICE ESF Falls Short

The NICE ESF was originally designed for static medical devices: hardware or software that remains unchanged once approved. Under that model, evidence from a single randomized controlled trial (RCT) can satisfy the requirement for clinical and cost-effectiveness. But AI-driven DHTs operate on a fundamentally different logic.

Lack of Adaptability for Machine Learning Models

Machine learning algorithms, especially those used in AI diagnostics, often update in near-real time as new training data becomes available. For example, an AI that detects atrial fibrillation from ECG signals might be retrained monthly to improve accuracy. Under the current ESF, each iteration could theoretically require a fresh evidence submission, creating an unsustainable cycle of reassessment. “The framework treats each software version as a new device, ignoring the reality that continuous learning is the core value proposition,” the Imperial College authors note.

Undervaluation of Real-World Evidence

The NICE ESF heavily prioritizes evidence from traditional RCTs, which are expensive, slow, and often poorly suited to digital health. Wearable devices and remote monitoring tools generate continuous streams of real-world evidence (RWE)—data from everyday use in diverse patient populations. Yet this evidence is frequently dismissed as “low quality” because it lacks randomization and control. Meanwhile, RCTs for DHTs may be infeasible due to rapid obsolescence: by the time a trial concludes, the algorithm being tested may already be outdated.

No Support for Continuous Learning Systems

Perhaps the most critical gap is the framework’s inability to accommodate continuous learning systems—algorithms that improve after deployment by analyzing outcomes from real-world use. These systems are the backbone of next-generation AI diagnostics. Without a regulatory path that allows for iterative evidence accumulation, innovators face a stark choice: either freeze the algorithm at the point of approval, limiting its potential, or forgo the UK market altogether.

Misalignment from Lack of Stakeholder Co-Design

The ESF was developed primarily by regulators and health economists, with limited input from DHT developers, clinicians, and patients. This top-down approach leads to evidence requirements that do not reflect how digital health products are actually used or updated. “Producers end up running studies that satisfy NICE’s checklist but generate little actionable insight for real-world deployment,” the viewpoint paper argues. The result is delayed market access and a misallocation of scarce research resources.

[IMAGE: A diagram contrasting the old evidence pyramid (with RCTs at the top) against a new cycle of continuous RWE collection and model updates, showing a circular, iterative flow.]


Case Study: AliveCor’s KardiaMobile Under NICE Scrutiny

The challenges identified in the ESF are not theoretical. A vivid illustration comes from AliveCor’s KardiaMobile, a portable, AI-powered ECG device that patients can use at home to detect atrial fibrillation. The device has been cleared by the U.S. Food and Drug Administration and is widely used in several countries. Yet its path to NHS adoption has been fraught with friction.

Traditional Trial Methodologies Hit a Wall

KardiaMobile’s core innovation lies in its machine learning algorithm, which continuously improves as it processes more ECG recordings. To align with NICE’s evidence expectations, AliveCor had to commission lengthy and costly clinical trials that measured the device against a static gold standard—12‑lead ECG interpretation by cardiologists. The trials took years to complete, during which the algorithm itself had already undergone multiple refinements. “We were essentially trying to prove the efficacy of a moving target,” a former AliveCor executive told industry media.
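For readers less familiar with how a comparison against a reference standard is typically scored, the sketch below shows one common approach: sensitivity and specificity with Wilson score confidence intervals. The function names and every count in it are hypothetical and chosen purely for illustration; they are not AliveCor's data, and this is not a method prescribed by NICE.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_accuracy(tp, fp, fn, tn):
    """Score device readings against a reference standard (e.g. 12-lead ECG).

    tp/fn: reference-positive cases the device caught / missed.
    tn/fp: reference-negative cases the device cleared / wrongly flagged.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)
    for name, value in result.items():
        print(name, value)
```

Because each estimate is tied to the algorithm version that produced the readings, a retrained model would in principle need its own table of counts, which is the "moving target" problem in miniature.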

Economic Evidence Hurdles

NICE also demands robust health economic data to assess cost-effectiveness. For a device like KardiaMobile, this means demonstrating reduced stroke rates, fewer hospital admissions, or improved quality of life—outcomes that take years to observe and require large sample sizes. The economic modeling required by NICE’s reference case is built on assumptions that do not easily accommodate the dynamic cost structure of software‑based interventions. AliveCor spent millions on studies that, in retrospect, added little to the evidence base that clinicians already had from real-world registries.

Real-World Performance Data Ignored

Throughout the process, AliveCor had amassed terabytes of real-world data from hundreds of thousands of users. These data showed that the device accurately detected atrial fibrillation in home settings, with high user satisfaction. Yet under the ESF, such RWE could not substitute for a traditional RCT. The rigid categorization meant that real-world performance data were relegated to secondary analyses, delaying the device's adoption in the NHS by several years.

Systemic Friction Between Innovation Speed and Evidence Rigidity

The KardiaMobile case is emblematic of a broader pattern. Many AI‑driven wearables and diagnostics face similar hurdles, leading to what experts call the “evidence valley of death”—a gap between promising technology and regulatory acceptance. The viewpoint paper from Imperial College identifies this friction as a systemic failure, not a problem with any single product. “The current system rewards slow, static devices and punishes adaptive, learning systems,” the authors argue.

[IMAGE: A photo or illustration of the KardiaMobile device paired with a graph showing time-to-market delays (e.g., 3 years for FDA clearance vs. 6 years for NICE recommendation).]


A Path Forward: The Dynamic Adaptive Model

Recognizing the inadequacy of the status quo, Bahadori et al. propose a radical rethinking: replacing the static ESF with a living, adaptive model that is co-designed with industry, clinicians, regulators, and patients. Their proposal, detailed in the same viewpoint paper, outlines three core principles.

Embed Real-World Evidence Strategies

First, the new framework must formally accept and prioritize RWE. Pragmatic trials, registry‑based studies, and analyses of de‑identified data from wearables should be given equal weight alongside traditional RCTs. The authors suggest that NICE establish clear guidelines for what constitutes acceptable RWE, including standards for data quality, representativeness, and bias control. This would allow devices like KardiaMobile to submit ongoing performance data rather than a single, static study.

Allow Rolling Evidence Submissions

Second, the framework should support rolling evidence submissions. Instead of requiring a single comprehensive evidence dossier at the point of market access, companies could submit evidence incrementally, with NICE providing iterative feedback. An AI diagnostic might start with a small pilot study to gain conditional approval, then submit data from post‑market surveillance every six months to confirm ongoing safety and effectiveness. This approach mirrors the concept of “living systematic reviews” already used in evidence‑based medicine.

Support Post-Market Surveillance as a Learning Loop

Third, the adaptive model must treat post-market surveillance not as a burden but as a learning loop. Continuous learning systems would be required to report real-world outcomes, algorithm updates, and any performance drift. In return, NICE would offer a streamlined re‑evaluation process—perhaps an automated dashboard that flags when a product deviates from its expected performance. This would turn regulatory oversight into a dynamic partnership rather than a periodic checkpoint.
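As a rough illustration of what such a flag might involve, the following sketch checks each reporting period's observed sensitivity against an expected baseline. The `PeriodReport` structure, the 0.03 tolerance, the 50-case minimum, and the sample figures are all assumptions made for this example; neither NICE nor the viewpoint paper specifies them.

```python
from dataclasses import dataclass


@dataclass
class PeriodReport:
    """One post-market surveillance report (hypothetical structure)."""
    period: str           # reporting window, e.g. "2025-H1"
    model_version: str    # algorithm version live during that window
    true_positives: int   # confirmed cases the device detected
    false_negatives: int  # confirmed cases the device missed


def flag_drift(reports, expected_sensitivity, tolerance=0.03, min_cases=50):
    """Return (period, version, observed) for periods that fall short.

    A period is flagged when its observed sensitivity is more than
    `tolerance` below `expected_sensitivity`. Periods with fewer than
    `min_cases` confirmed cases are skipped as too small to judge.
    """
    flagged = []
    for report in reports:
        cases = report.true_positives + report.false_negatives
        if cases < min_cases:
            continue
        observed = report.true_positives / cases
        if observed < expected_sensitivity - tolerance:
            flagged.append((report.period, report.model_version, round(observed, 3)))
    return flagged


if __name__ == "__main__":
    # Hypothetical six-monthly reports, for illustration only.
    history = [
        PeriodReport("2025-H1", "v1.0", 95, 5),
        PeriodReport("2025-H2", "v1.1", 88, 12),
        PeriodReport("2026-H1", "v1.2", 20, 5),
    ]
    print(flag_drift(history, expected_sensitivity=0.94))
```

A fixed threshold is the simplest possible rule. A production dashboard would more plausibly use interval estimates or sequential monitoring so that small samples do not trigger false alarms.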

Co-Design with Stakeholders

Underpinning all these changes is the principle of co‑design. The Imperial College team argues that a steering group of DHT developers, NHS clinicians, patient representatives, health economists, and regulators should jointly define evidence thresholds. This would ensure that requirements are both rigorous and feasible, and that they align with the real-world context of digital health innovation.

[IMAGE: An infographic showing a cyclical process: “Conditional Approval” → “Real‑World Data Collection” → “Algorithm Update” → “NICE Dashboard Review” → “Continued Approval.”]


Implications for Digital Health Innovation and Policy

Adopting an adaptive framework would have profound implications. For innovators, it could reduce the cost and time of bringing AI diagnostics and wearables to market, making the UK a more attractive destination for digital health investment. For the NHS, it would mean earlier access to technologies that could improve patient outcomes and reduce healthcare costs. For patients, it would enable faster adoption of tools that empower self‑management and remote monitoring.

Yet the transition will not be easy. NICE must overcome institutional inertia, and regulators will need new expertise in data science, machine learning, and real-world evidence. There are also risks: an overly lenient framework could allow ineffective or unsafe devices to enter the market. But the viewpoint paper argues that a well-designed adaptive model, with clear guardrails and transparent reporting, can manage these risks more effectively than the current static system.

Regulatory Policy and Economic Viability

The proposed shift also has implications for regulatory policy. The Medicines and Healthcare products Regulatory Agency (MHRA) is already exploring a “software as a medical device” pathway, but coordination with NICE’s evidence requirements remains fragmented. An aligned, adaptive framework across both agencies could create a seamless pathway from UKCA or CE marking to NHS adoption.

Economically, the current ESF creates a perverse incentive: companies spend millions on trials that prove a snapshot of efficacy, even though the real value of a digital health innovation lies in its ability to improve over time. By embracing continuous evidence collection, the adaptive model could make health technology assessments more cost‑effective for both the NHS and industry.


Conclusion: Time to Build the Bridge

The 2025 viewpoint from Imperial College makes a compelling case that the NICE ESF is no longer fit for purpose in the era of AI diagnostics, wearable technology, and continuous learning systems. The static, trial‑centric framework creates an evidence gap that stifles digital health innovation and delays patient access to potentially life‑saving tools.

AliveCor’s KardiaMobile experience illustrates the real‑world consequences of this mismatch. But it also points toward a solution: a dynamic, co‑designed adaptive framework that values real‑world evidence, supports rolling submissions, and treats post‑market surveillance as part of a learning cycle.

The bridge between innovation speed and evidence rigor is not impossible to build. It requires regulators, developers, clinicians, and patients to sit at the same table—and to agree that the evidence standards of yesterday should not hold back the health technologies of tomorrow.

[IMAGE: Abstract infographic showing a rigid, old-fashioned checklist (symbolizing the NICE ESF) on one side and a flowing digital stream of data nodes, wearables, and AI algorithms on the other, joined in the center by a bridge or transformation arrow labeled “Adaptive Co-Design.”]