The Baseline Problem: When GFMs Become the Auditor, Not the Accused (When Rules Change What a Foundation Model Is For)
The previous post made the case that, tempting as it might be, GFMs are poorly suited to establishing the kind of long-run environmental baselines that climate policy (and, by implication, any baseline-dependent regulatory regime) requires, and that deploying them as baseline-setters creates serious governance risks. That argument stands. But there is a second question worth asking: are there specific, rules-bounded contexts in which GFMs could actually serve (climate) accountability rather than undermine it?
The answer, carefully qualified, is I think yes. A recently adopted piece of EU implementing regulation points, whether intentionally or more likely not, to exactly where that boundary could lie.
The instrument is being asked to testify but cannot be cross-examined; image courtesy ChatGPT / DALL·E
A Different Question Entirely
The critique of GFMs as baseline-setters rests on their temporal shallowness: training data anchored to a recent, already-degraded landscape normalises that degradation as the new normal, amnesties historical damage, and embeds a convenient starting point with scientific authority. All of that remains true until and unless deeper-temporal GFMs are rolled out. The absence of provenance metadata identifying sources and dates brings its own challenges, which may yet add to this series.
But there is a distinction between a GFM that sets the baseline and a GFM that operates against a baseline that has already been legally fixed. In the second role, the model’s temporal limitations become largely irrelevant. What matters instead is its spatial consistency, its coverage, and its ability to detect variance from a known reference state. Those are things GFMs do well.
The question shifts from “what was the state of this landscape twenty years ago?” to “does this landscape today look the way a member state’s flexibility claim says it should?” That is a different and arguably more tractable problem.
What the Regulation Creates
Recent EU Implementing Regulation 2025/2043 (https://eur-lex.europa.eu/eli/reg_impl/2025/2043/oj/eng) sets out the evidentiary framework under which member states can claim compensation for underperformance against their LULUCF targets during the 2026–2030 period. Two grounds are available: long-term climate change impacts (demonstrated through aridity index shifts over at least 20 consecutive years) and legacy effects of past management practices on organic soils in member states with high proportions of such soils.
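To make the first ground concrete, the aridity criterion can be pictured as a sustained run of drier-than-reference years. The sketch below is illustrative only: it uses the common P/PET ratio as the aridity index, and the `reference` and `threshold` values are placeholders I have invented, not figures taken from the regulation.

```python
# Illustrative sketch of a "sustained aridity shift" screen.
# The aridity index here is the conventional P/PET ratio (annual
# precipitation over potential evapotranspiration); the reference
# value and threshold are placeholder assumptions, not the
# regulation's actual method.

def aridity_index(precip_mm: list[float], pet_mm: list[float]) -> list[float]:
    """Yearly aridity index as P/PET for paired annual totals."""
    return [p / e for p, e in zip(precip_mm, pet_mm)]

def sustained_shift(series: list[float], reference: float,
                    min_years: int = 20, threshold: float = 0.05) -> bool:
    """True if some run of at least `min_years` consecutive years
    stays below `reference - threshold` (a drying shift)."""
    run = 0
    for value in series:
        run = run + 1 if value < reference - threshold else 0
        if run >= min_years:
            return True
    return False
```

The point of the sketch is structural: the criterion is a property of a consecutive run of years, not of an average, which is exactly the sort of check that can be applied mechanically and consistently across claims.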
The critical requirement is in Article 4. To claim compensation, a member state must demonstrate excess emissions or diminished removals by comparison: specifically, by comparing the affected area against an unaffected area sharing the same core characteristics (similar size, land use, climate, terrain, and soil type). Alternatively, the same area can be compared against itself in a historical period (since 1990) when it would not have qualified as affected.
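The "same core characteristics" test can be pictured as a simple profile comparison. A minimal sketch, under stated assumptions: the field names, the equal weighting, and the tolerance are all hypothetical illustrations, not drawn from the regulation.

```python
# Hypothetical sketch of the Article 4 "similar unaffected area" check:
# represent each area by a few normalised characteristics and flag
# pairs whose divergence exceeds a tolerance. Field names, weights,
# and the tolerance are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class AreaProfile:
    size_ha: float
    forest_share: float       # fraction of land under forest use
    mean_aridity: float       # long-run aridity index
    mean_slope_deg: float
    organic_soil_share: float

def dissimilarity(a: AreaProfile, b: AreaProfile) -> float:
    """Mean relative difference across characteristics (0 = identical)."""
    pairs = [
        (a.size_ha, b.size_ha),
        (a.forest_share, b.forest_share),
        (a.mean_aridity, b.mean_aridity),
        (a.mean_slope_deg, b.mean_slope_deg),
        (a.organic_soil_share, b.organic_soil_share),
    ]
    diffs = [abs(x - y) / max(abs(x), abs(y), 1e-9) for x, y in pairs]
    return sum(diffs) / len(diffs)

def flag_comparison(a: AreaProfile, b: AreaProfile, tol: float = 0.15) -> bool:
    """True if the claimed 'similar' pair should be escalated for review."""
    return dissimilarity(a, b) > tol
```

Whatever the real characteristics turn out to be, the design choice matters: once the similarity test is an explicit, shared function rather than a narrative claim, both the claimant and the reviewer are arguing over the same object.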
That comparison requirement is, on the face of it, an invitation to construct convenient reference areas and convenient historical windows, precisely the REDD+ dynamic described in the previous post. The more technical the evidentiary framework, the more vectors it creates for sophisticated actors to game it.
But it could also be seen as an invitation for independent verification.
GFMs as Audit Layer, Not Baseline Setter
If the legally anchored baseline is the 2016–2018 member state inventory data, as per the 2023 LULUCF amendment (https://eur-lex.europa.eu/eli/reg/2023/839/oj/eng), then a rules-based GFM trained on that inventory data encodes the legally mandated reference state in spatial form. It does not set the baseline. The baseline is already set in law. The GFM simply operationalises it consistently across all member states, at scale, in a form that can be applied to incoming compliance claims.
Applied to the Article 4 comparison requirement, such a model could interrogate whether the “similar unaffected area” a member state selects genuinely resembles the affected area in the relevant characteristics, or whether it has been chosen to maximise the apparent emissions differential. It could flag cases where the historical comparison window has been selected to exclude a high-removal period that would reduce the claimed deficit. It could provide a Commission-level consistent view of which areas genuinely satisfy the aridity shift criteria in Article 2, independent of member state designation.
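The historical-window concern in particular lends itself to a mechanical screen: compare the claimed comparison window's mean net removals against every other same-length window since 1990, and escalate claims sitting near the bottom of that distribution. The sketch below is a toy version; the data shape and percentile cutoff are assumptions for illustration.

```python
# Illustrative screen for a cherry-picked historical comparison window.
# Given annual net removals since 1990, a claimed window whose mean
# sits in the bottom tail of all same-length windows is a candidate
# for expert review. Percentile cutoff is a placeholder assumption.

def window_means(series: list[float], length: int) -> list[float]:
    """Mean of every consecutive window of `length` years."""
    return [sum(series[i:i + length]) / length
            for i in range(len(series) - length + 1)]

def window_is_outlier(series: list[float], start: int, length: int,
                      percentile: float = 0.1) -> bool:
    """True if the chosen window's mean sits in the bottom `percentile`
    of all possible same-length windows (suggesting selective choice)."""
    chosen = sum(series[start:start + length]) / length
    means = sorted(window_means(series, length))
    cutoff = means[max(0, int(len(means) * percentile) - 1)]
    return chosen <= cutoff
```

A flag from such a screen proves nothing by itself; a genuinely affected area may legitimately have a low-removal history. Its value is in forcing the selection to be defended rather than passed through unexamined.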
None of this requires the GFM to answer questions about the 1990s. It requires it to characterise spatial patterns consistently in the present and recent past, and to identify anomalies against a fixed reference. That seems within its capability envelope.
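That anomaly-detection role might be sketched as follows, assuming a GFM that emits an embedding per tile for both the reference epoch and the present. The model itself, the tile identifiers, and the drift threshold are all hypothetical; only the comparison logic is shown.

```python
# Minimal sketch of anomaly scoring against a fixed reference state.
# Assumes (hypothetically) that a GFM yields one embedding per tile
# for the legally fixed reference epoch and for the present; a simple
# distance between the two surfaces candidate anomalies for review.

import math

def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; 0 means identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def anomalous_tiles(reference: dict[str, list[float]],
                    current: dict[str, list[float]],
                    threshold: float = 0.2) -> list[str]:
    """Tile ids whose present embedding has drifted past `threshold`
    from the reference-epoch embedding."""
    return [tile for tile, ref_emb in reference.items()
            if tile in current
            and cosine_distance(ref_emb, current[tile]) > threshold]
```

Note that nothing in this comparison asks the model about the 1990s: the reference embeddings stand in for a state already fixed in law, and the model is only asked whether the present deviates from it.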
The Governance Argument
The monitoring gap under current LULUCF architecture is structural. Centralised information is only updated every three years, compliance assessment is necessarily retrospective, and the primary evidence base remains member state inventory submissions subject only to technical Commission review. Conventional audit it isn’t.
A consistently applied GFM layer would not close that gap entirely, but it would change the dynamic. Member states submitting flexibility claims under 2025/2043 would know that an independent spatial characterisation of their designated affected areas, and of their chosen comparison areas, is available to the Commission and potentially to the public as well. Once the comparison methodology is no longer the claimant's alone, the incentive to game the claim recedes, arguably even before any such tool is formally adopted.
This is not a novel principle. It mirrors the logic of Copernicus data being used to cross-check national inventory submissions, an independent signal that is not controlled by the reporting party and that creates at least the possibility of detected discrepancy. What a rules-based GFM adds is the ability to perform that cross-check at fine spatial resolution, consistently across all member states, against the specific claim being made rather than just at the level of aggregate totals.
The Conditions That Must Hold
For this to work, several conditions must be satisfied, and they are not trivially met.
The training data must genuinely reflect the legally mandated reference period. A GFM trained on convenient post-2020 data encodes a different reference state than one anchored to the 2016–2018 inventories. The choice of training corpus is itself a governance decision, not a technical one, and needs to be made transparently and with appropriate oversight.
The model must be rules-based and explainable in the relevant dimensions. The black box problem that makes GFM-derived baselines hard to challenge legally cuts both ways: a GFM used as an audit layer needs to be able to explain why it characterises a comparison area as dissimilar, or why an aridity designation is not supported by the spatial evidence. Opacity is a liability whether you are defending a claim or challenging one.
Governance of the model must sit with an independent body, not with member states or the private sector alone. The Commission or the EEA might be homes for a tool of this kind. If a member state can influence the model used to audit its own claims, the independence value is lost immediately.
And critically, the model must be understood as one input among several, not as the determinative answer. Article 4 requires verifiable evidence; a GFM output is a characterisation, not proof. Its proper role is to surface anomalies for human expert review, not to replace the review process.
What This Changes, and What It Doesn’t
The argument here is deliberately narrow. GFMs remain unsuitable as baseline-setters for climate policy. The temporal critique stands. The amnesty risk is real, and the polluter-pays inversion is not solved by finding a good use case elsewhere.
What this analysis suggests is that there exists a specific, bounded role where the properties of GFMs align with a genuine policy need: independent spatial auditing of member state flexibility claims against a legally fixed reference. The need is real: the flexibility mechanisms under the 2023 LULUCF amendment create documented manipulation vectors, and the evidentiary framework of 2025/2043, while well-intentioned, follows the REDD+ pattern of creating technical complexity that is easier to exploit than to challenge.
The question for regulators, lawyers, and the geospatial industry is whether the governance conditions for this narrower role can be established. That means resolving who builds and controls the model (assuming the public ones don't suffice, or are prone to hard-to-trace and irreversible changes), how its outputs can be contested, and what legal weight, if any, they can carry in Commission review processes and, at least in theory, ultimately before the ECJ.
Those are not primarily technical questions. They are institutional ones. And answering them well requires exactly the kind of cross-disciplinary thinking between Earth observation, climate law, and governance design that is not yet fully developed.
The sophistication of the tool should not be mistaken for the validity of the question it’s being asked to answer. But where the baseline is fixed in law, the question is spatially explicit, and the model sits beyond the reach of the reporting party, GFMs offer something the current compliance architecture largely lacks: an independent spatial witness. Building the governance conditions for that role is harder than building the model. It is also more important.