The Silent Crisis of Lost Science: How FAIR² Data Management Could Unlock 90% of Frontier Research

Published: February 18, 2025

Introduction: The 90% Problem That No One Wants to Talk About

On February 18, 2025, Frontiers launched what it describes as the "world's first all-in-one, AI-powered" data management service, branded FAIR² Data Management. The accompanying claim—that "90% of Science Is Lost" (Source 1: Frontiers product literature)—is a startling assertion that demands rigorous examination. If accurate, this statistic represents not merely an academic inconvenience but a structural failure in the global research enterprise.

The proposition underlying FAIR² is deceptively simple: scientific data is being generated at unprecedented rates, yet the vast majority becomes inaccessible within a few years—trapped in proprietary formats, stored on obsolete hardware, or stripped of the contextual metadata necessary for interpretation. Frontiers positions FAIR² as the solution to this crisis.

However, this launch warrants analysis beyond the press release. The core economic logic driving FAIR² is not about data storage capacity—a problem largely solved by declining cloud costs—but about search cost and compute readiness. The bottleneck in frontier science has migrated from data generation to data interoperability. The question is whether FAIR² represents a genuine infrastructure innovation or merely a commercial entrant into an already crowded market.

The Hidden Economics: Why "Lost Data" is a Liability on the Balance Sheet of Science

The economic implications of non-FAIR data are substantial and, at least in part, quantifiable. If anything close to 90% of scientific data is effectively lost, the global research and development economy absorbs a recurring cost that arises through three primary mechanisms.

First, duplicate experimentation. Without interoperable data, researchers cannot efficiently determine whether a proposed experiment has already been conducted, so resources go to redundant work. A closely related cost is better documented: a 2015 analysis of biomedical research estimated that irreproducible preclinical research costs the United States about $28 billion annually (Source 2: PLOS Biology, reproducibility cost analysis). Set against global R&D expenditure, which exceeded $2.4 trillion in 2023 (Source 3: UNESCO Science Report), even a far smaller rate of waste would be a macroeconomic concern.

Second, irreproducible results. Data stored without sufficient metadata—detailing collection methodologies, instrument calibration, environmental conditions, and processing pipelines—cannot be validated. This undermines the foundational principle of scientific reproducibility and erodes trust in published findings across disciplines.

Third, wasted longitudinal data. For complex systems research, particularly in climate science and biodiversity, historical baselines cannot be recreated. Dr. Johan Rockström of the Potsdam Institute for Climate Impact Research has documented how missing or inaccessible climate data from the 1980s and 1990s creates gaps in planetary boundary models that cannot be retroactively filled (Source 4: Rockström, J., Nature commentary series). Similarly, Dr. Carlos Nobre of the University of São Paulo has noted that Amazonian biodiversity data collected before 2000, often stored on physical media or in legacy formats, represents an irreplaceable baseline that is steadily being lost and cannot be collected again as forest ecosystems transform.

The "Data Wastage Rate" (the proportion of research investment yielding no reusable data assets) emerges as a critical metric that FAIR² explicitly targets. The launch suggests that the market may have reached a tipping point at which the cost of data management infrastructure is finally lower than the accumulated cost of data loss. This economic calculus helps explain why a commercial entity like Frontiers would invest in this capability: the total addressable market for research data management is projected to reach $8.5 billion by 2027 (Source 5: MarketsandMarkets, Research Data Management Market report).
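To make the metric concrete, here is a minimal sketch of how a data wastage rate could be computed for a funding portfolio. The function name, the project list, and every figure in it are hypothetical illustrations, not values from Frontiers or from any source cited in this article.

```python
def data_wastage_rate(total_investment, investment_with_reusable_data):
    """Share of research investment that produced no reusable data assets."""
    if total_investment <= 0:
        raise ValueError("total_investment must be positive")
    if not 0 <= investment_with_reusable_data <= total_investment:
        raise ValueError("reusable share must lie between 0 and the total")
    return 1 - investment_with_reusable_data / total_investment


# Hypothetical portfolio: costs and reuse flags are invented for illustration.
projects = [
    {"name": "soil survey", "cost": 400_000, "reusable_data": False},
    {"name": "river sensors", "cost": 100_000, "reusable_data": True},
    {"name": "canopy imaging", "cost": 500_000, "reusable_data": False},
]

total = sum(p["cost"] for p in projects)
reusable = sum(p["cost"] for p in projects if p["reusable_data"])
rate = data_wastage_rate(total, reusable)
print(f"Data wastage rate: {rate:.0%}")
```

In practice the hard part is the classification, not the arithmetic: deciding whether a project's data counts as reusable requires criteria that the article's sources do not specify.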

FAIR²: More Than a Tool? A New Layer of Research Infrastructure

Frontiers claims FAIR² is the "world's first all-in-one, AI-powered" data management service. This assertion requires careful scrutiny against existing solutions from established competitors including Figshare, Dryad, Zenodo, and institutional repositories.

The distinguishing characteristic appears to be the integration of artificial intelligence for automated metadata extraction and annotation. Traditional data management requires manual curation: researchers must describe their datasets, specify variables, and document methodologies. This labor-intensive process is a barrier to adoption. As Frontiers describes it, FAIR²'s AI component analyzes raw data files, extracts structural and semantic features, and generates FAIR-compliant metadata automatically.
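Frontiers has not published how this extraction works, so the following is only a rough stand-in for the structural half of the task: a rule-based sketch that reads a CSV file and drafts a metadata record with column names, inferred types, and a row count. The function names, the record layout, and the sample data are all assumptions made for illustration. No AI model is involved, and semantic annotation is out of scope.

```python
import csv
import io


def infer_type(values):
    """Return 'integer', 'number', or 'string' for a column's non-empty values."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "string"
    for caster, label in ((int, "integer"), (float, "number")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "string"


def extract_metadata(csv_text, title):
    """Draft a structural metadata record from CSV text (hypothetical layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    return {
        "title": title,
        "row_count": len(rows),
        "variables": [
            {"name": name, "type": infer_type([row[name] or "" for row in rows])}
            for name in fields
        ],
    }


# Invented sample data, used only to exercise the sketch.
SAMPLE = "site,year,mean_temp_c\nManaus,1998,27.4\nBelem,1999,26.9\n"
metadata = extract_metadata(SAMPLE, "Hypothetical station temperatures")
print(metadata)
```

A heuristic like this captures structure but not meaning: it can tell that mean_temp_c is numeric, but not which instrument recorded it or how it was calibrated. That contextual layer is the part manual curation usually supplies and the part an AI-assisted service would need to get right.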

The FAIR principles—Findable, Accessible, Interoperable, Reusable—were established in 2016 by a consortium of researchers and publishers (Source 6: Wilkinson, M., Scientific Data). FAIR² extends this framework by adding an actionable layer: data organized for direct ingestion by AI models. This represents a significant architectural shift. Traditional data repositories function as digital libraries—users locate data, download it, and process it locally. FAIR² positions data as machine-readable assets that AI models can query, correlate, and analyze without human intermediation.
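One way to picture "machine-readable" is a metadata record that software can check without a human reading it. The sketch below maps each of the four FAIR headings to a single field test. That mapping is a deliberate oversimplification and an assumption of this example: the published principles are considerably more detailed, and the field names and the list of open formats here are hypothetical.

```python
# Assumed set of open media types, chosen for illustration only.
OPEN_FORMATS = {"text/csv", "application/json", "application/x-netcdf"}


def fair_gaps(record):
    """Return the FAIR headings for which this record fails a basic check."""
    checks = {
        "findable": bool(record.get("identifier")),
        "accessible": bool(record.get("access_url")),
        "interoperable": record.get("media_type") in OPEN_FORMATS,
        "reusable": bool(record.get("license")) and bool(record.get("provenance")),
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical records; the identifier and URL are placeholders.
complete = {
    "identifier": "doi:10.0000/example",
    "access_url": "https://repository.example/ds-001",
    "media_type": "text/csv",
    "license": "CC-BY-4.0",
    "provenance": "Collected by field sensors, 1998 to 1999",
}
legacy = {"identifier": "doi:10.0000/old", "media_type": "application/x-proprietary"}

print(fair_gaps(complete))
print(fair_gaps(legacy))
```

A check of this kind only confirms that fields are present. Whether the license is appropriate or the provenance description is adequate still requires judgment.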

This is analogous to the evolution of software development infrastructure. In the 1990s, companies maintained proprietary data formats and custom integration code. The standardization of Application Programming Interfaces (APIs) transformed software economics by enabling modular, composable systems. FAIR² proposes a similar transformation for scientific data: standardized, AI-readable formats that allow models trained on one dataset to seamlessly incorporate others.

Dr. Vanessa Boanada Fuchs (Leading House Latin America) and Dr. Kamila Markram (Frontiers) have both emphasized that this interoperability is critical for frontier science, where breakthroughs increasingly occur at disciplinary intersections—climate change impacts on infectious disease patterns, biodiversity loss effects on agricultural productivity, or AI-driven materials discovery for energy storage.

Market Infrastructure or Vendor Lock-in?

The critical question facing the research community is whether FAIR² will function as open infrastructure or as a proprietary platform that creates vendor dependency.

The data management market exhibits strong network effects: the value of a platform increases with the number of datasets it contains and the number of researchers who can access them. If FAIR² becomes the dominant platform, it could establish de facto standards for metadata schemas, data formats, and API protocols. This would concentrate market power in a single commercial entity—a concerning prospect for a research enterprise that depends on open access and community governance.

Frontiers' track record provides some basis for skepticism. The publisher has faced criticism for its article processing charge model and for a proliferation of special issues that some researchers say reflects lapses in editorial quality (Source 7: Science investigation into Frontiers' peer review practices). However, FAIR² operates under a different economic logic: Frontiers generates revenue from data management services rather than from publication fees for datasets, which could align its incentives with data quality and accessibility.

The counterargument is that no single institution—university, government agency, or philanthropic foundation—has demonstrated the willingness to invest the substantial capital required to build AI-powered data infrastructure at scale. Frontiers, with its existing infrastructure and user base of hundreds of thousands of researchers, may be uniquely positioned to make this investment. The strategic question is whether the resulting platform will include data portability guarantees—the ability for researchers to migrate their datasets to competing platforms without data loss or format incompatibility.
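A portability guarantee can be stated as a testable property: a dataset exported from one platform and imported into another should be bit-for-bit equivalent in content. The sketch below illustrates that idea with a JSON export that carries a SHA-256 fingerprint. The package layout and function names are assumptions of this example, not a format that FAIR² or any named repository is known to use.

```python
import hashlib
import json


def fingerprint(dataset):
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(dataset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def export_package(dataset):
    """Serialize a dataset together with its fingerprint (hypothetical layout)."""
    return json.dumps({"sha256": fingerprint(dataset), "dataset": dataset}, sort_keys=True)


def import_package(text):
    """Load a package and refuse it if the content no longer matches."""
    package = json.loads(text)
    if fingerprint(package["dataset"]) != package["sha256"]:
        raise ValueError("dataset content changed between export and import")
    return package["dataset"]


# Invented records, used only to exercise the round trip.
original = {
    "title": "Hypothetical station temperatures",
    "rows": [{"site": "Manaus", "year": 1998, "mean_temp_c": 27.4}],
}
exported = export_package(original)
restored = import_package(exported)
print(restored == original)
```

Content equivalence is only one part of portability. Identifiers, licenses, and access histories would also need to move with the data, and the sketch does not cover those.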

Sector Impacts: Climate and Biodiversity as Stress Tests

Climate science and biodiversity research represent extreme test cases for FAIR²'s capabilities, for three reasons.

First, these fields generate heterogeneous data types: satellite imagery, genomic sequences, field observations, sensor network streams, and model outputs. Interoperability across these formats is technically demanding.

Second, temporal coverage is critical. Climate models require historical data extending back decades or centuries. Biodiversity baselines, once lost to habitat destruction or species extinction, cannot be regenerated. The data wastage problem in these fields is not merely economic—it represents irrecoverable scientific knowledge.

Third, these fields are intrinsically interdisciplinary. Understanding climate impacts on Amazonian ecosystems requires integrating meteorological data, hydrological measurements, soil composition analyses, and species distribution records. Dr. Carlos Nobre's work on Amazon dieback scenarios depends on data fusion across these domains—precisely the kind of integration that FAIR²'s AI-powered metadata extraction and correlation capabilities are designed to enable.

If FAIR² can demonstrate measurable improvements in data reuse rates and cross-disciplinary discovery in climate and biodiversity research, it will provide credible evidence for broader adoption across other scientific domains.

Forward Projections: Three Scenarios for the Data Management Market

The launch of FAIR² occurs at an inflection point for the scientific data economy. Three scenarios describe plausible trajectories.

Scenario 1: Platform Consolidation. FAIR² gains sufficient traction to establish dominant market share, particularly among early-career researchers and institutions that lack robust local data management infrastructure. The platform's AI capabilities improve with scale, creating a virtuous cycle of increasing accuracy and adoption. Competitors either exit the market or integrate with FAIR²'s protocols. The risk is that Frontiers exercises market power through pricing changes or restrictive data licensing terms.

Scenario 2: Fragmented Ecosystem. The research community, wary of vendor lock-in, supports multiple interoperable platforms. Open-source alternatives emerge that replicate FAIR²'s AI metadata extraction capabilities. Funding agencies mandate data sharing standards that ensure portability across platforms. The fragmentation reduces the network effects that drive discovery, but preserves institutional autonomy.

Scenario 3: Hybrid Infrastructure. FAIR² operates as a commercial layer atop community-governed open repositories. Its value proposition is the AI-powered metadata extraction and correlation engine, while data storage remains distributed across institutional and national repositories that maintain ownership and access control. This hybrid model combines the efficiency of proprietary software with the governance of open infrastructure.
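The separation this scenario depends on can be sketched in a few lines: a central index holds only metadata and a pointer, while each repository keeps the data and decides who may read it. Every class, name, and value below is hypothetical and intended only to illustrate the division of responsibilities, not any actual FAIR² design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    dataset_id: str
    repository: str  # name of the repository that holds and governs the data
    keywords: tuple


class Repository:
    """Community-governed store: owns the data and enforces access control."""

    def __init__(self, name, allowed_users):
        self.name = name
        self._allowed = set(allowed_users)
        self._store = {}

    def deposit(self, dataset_id, payload):
        self._store[dataset_id] = payload

    def fetch(self, dataset_id, user):
        if user not in self._allowed:
            raise PermissionError(f"{user} may not read from {self.name}")
        return self._store[dataset_id]


class MetadataIndex:
    """Commercial layer in this sketch: searchable metadata, no data payloads."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def search(self, keyword):
        return [e for e in self._entries if keyword in e.keywords]


repo = Repository("national-archive", allowed_users={"alice"})
repo.deposit("ds-001", [27.4, 26.9])

index = MetadataIndex()
index.register(IndexEntry("ds-001", "national-archive", ("temperature", "amazon")))

hits = index.search("amazon")
print(hits[0].repository, repo.fetch(hits[0].dataset_id, "alice"))
```

Under this arrangement the index operator can improve discovery without ever holding the data, and a repository can change index providers without moving its holdings.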

The most likely outcome in the near term (2025-2028) is Scenario 3. Funding agencies in the European Union and United States have invested substantially in open research data infrastructure and are unlikely to cede governance to a single commercial entity. However, FAIR² may capture significant market share among researchers who prioritize ease of use and AI capabilities over governance concerns.

Conclusion: The Unseen Infrastructure of Discovery

The claim that 90% of scientific data is lost remains unverified, but the directional trend is clear: the volume of data generated exceeds the capacity of existing infrastructure to preserve it and make it usable. The economic inefficiency, measured in duplicated experiments, irreproducible results, and foregone discoveries, is substantial.

FAIR² represents a commercial response to this market failure. Its AI-powered approach to metadata extraction and its integration of data management across the research lifecycle address genuine technical bottlenecks. Whether it solves those problems without creating new ones—in the form of platform dependency or market concentration—will determine its long-term contribution to frontier science.

For researchers in climate science and biodiversity, where every lost historical data point may represent an irretrievable piece of Earth system knowledge, the stakes could not be higher. The silent crisis of lost science may finally receive the infrastructure investment it demands. The question remaining is who will control that infrastructure—and on what terms.