The Damages Expert’s New Blind Spot?

23 June 2026

Karthik Balisagar explores the future of valuation and damages experts in an AI-enabled world, highlighting why professional judgement, independence and accountability will remain essential to credible expert evidence.

The old worry about an expert was a lean towards the party paying for the work. The new worry, less familiar, is that two experts on opposite sides may come to lean towards the same machine. An expert who is independent of the instructing party but not of the tools is a curious creature, and possibly a new one. The duty of independence, in its English formulation in The Ikarian Reefer,1 was built to test the older form of bias, and the whole adversarial arrangement around it, opposing reports, opposing counsel, the court or tribunal between, is built to expose it. The newer concern is harder to see, and not addressed by any of that apparatus. An expert may remain scrupulously independent of the client and yet, without knowing it, share models, training data and default answers with the expert on the other side.

The concern has three parts, each leading to the next. The first is bias, in a form the duty did not anticipate: anchoring on a confident first draft, bias embedded in training data, and a possible correlation across opposing experts drawing on tools shaped by overlapping material. The second is that supervision, the profession’s existing safeguard, is designed to catch a different kind of failure and so may not catch this one. The third is that the disclosure of AI use, the remedy now being proposed, is addressed to a different failure too, and so may not cure it. On the Future of the Damages Expert mapped the wider terrain; this piece follows one path across it from end to end.

None of this is an argument against adoption. The productivity case is real, the tools are not going away, and the question is what intelligent use of them looks like, since unintelligent use is conspicuously well-resourced. The argument is only that these are matters worth thinking through before the consequences of failing to do so become entrenched.

The known boundaries

Three findings frame the question, drawn principally from two studies. The first is that these tools can materially improve performance within their frontier. The second is that performance can deteriorate outside that frontier. The third is that leading models may produce a narrower range of answers than the human opinion they are asked to reflect. The first two come from a 2023 field experiment by researchers at Harvard, MIT, Warwick and Wharton, conducted with 758 consultants at a major consulting firm, the most-cited empirical work on the subject.2 Its productivity results, the first finding, are well known: consultants using one of the current generation of large language models worked about 25 per cent faster and, on tasks within the model’s competence, produced work of materially higher quality.

The second finding concerns the limits of the tool. On tasks lying beyond the boundary of what the model had been trained to do, the same consultants were 19 percentage points less likely to reach the correct answer than colleagues working without it. The researchers described the frontier of machine capability as “jagged” rather than smooth: the model performs reliably within a range of tasks and falls away sharply at the edges, and the machine itself gives its user no indication that an edge has been crossed. An answer just beyond the boundary arrives in the same confident register as an answer well within it. The user who cannot independently see where the boundary lies has no reliable way to tell the two apart, and that is precisely when the tool is most tempting.

The third finding comes from a separate, 2025 study, which tested 21 state-of-the-art large language models against the preferences of a representative sample of 15,000 people across five countries.3 It found that the models, taken together, varied far less in their answers than the people did: the spread of human opinion was wide, the spread of model opinion was comparatively narrow, and clustered. The study concerned everyday questions rather than damages work, so the inference for our subject is necessarily analogical. It is not, however, fanciful. If the leading models converge on a narrow band of answers, two experts consulting them on the same question are liable to be drawn towards the same band, from whichever side they begin.

An exchange with the model

Consider a ten-year supply agreement in the oil and gas sector, terminated in breach. Damages turn on the price at which the contracted volumes would have been sold across the counterfactual period. A movement of a few dollars per barrel, applied to those volumes over a decade, can move the pleaded loss by several hundred million dollars. The long-run price assumption is the heart of the case.
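The scale of that sensitivity can be checked with rough arithmetic. The sketch below is illustrative only: the annual volume of 20 million barrels, the price shift of 3 dollars per barrel, the 10 per cent discount rate and the function name are all hypothetical assumptions, not figures drawn from any case.

```python
def loss_shift(price_delta_per_bbl, annual_volume_bbl, years, discount_rate=0.0):
    """Change in the claimed loss from a uniform shift in the long-run price.

    Simplifying assumption: the same volume is sold every year and each
    year's cash flow arises at the end of that year.
    """
    return sum(
        price_delta_per_bbl * annual_volume_bbl / (1.0 + discount_rate) ** t
        for t in range(1, years + 1)
    )


# Hypothetical inputs: 20 million barrels a year, a 3 dollar shift, ten years.
undiscounted = loss_shift(3.0, 20_000_000, 10)
discounted = loss_shift(3.0, 20_000_000, 10, discount_rate=0.10)

print(f"Undiscounted shift: ${undiscounted / 1e6:,.0f}m")
print(f"Discounted at 10%:  ${discounted / 1e6:,.0f}m")
```

On these assumed inputs, a shift of 3 dollars per barrel moves the figure by 600 million dollars before discounting and by somewhat under 370 million dollars after it, which is the order of magnitude described above.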

I put to the model the question an expert might begin with: In assessing damages for breach of a long-term oil and gas supply agreement, what long-run price assumptions should be used for the counterfactual period 2025 to 2035? The answer was competent. It distinguished forward curves, useful only for the nearer years, from the fundamental forecasts published by recognised houses; named the institutional sources, the International Energy Agency, the US Energy Information Administration and OPEC; named the principal commercial forecasters; and flagged the energy transition, and a geopolitical risk premium, as complications. A serviceable framework, and one most experts would recognise.

A second prompt, asking for the criticisms of those forecasters, produced what the first had not: the International Energy Agency’s historically poor record on the pace of renewable deployment, and the continuing contest over its oil-demand scenarios, particularly around the pace and timing of any peak; the body of retrospective work on errors in the Energy Information Administration’s forecasts, especially of prices; and the broader caution that institutional and commercial forecasts are not independent draws from nowhere. They rely on overlapping data, macroeconomic assumptions, policy expectations and modelling conventions, even though their published outlooks have at times diverged sharply as institutional views on the energy transition have shifted. A third prompt, on forecasters whose past accuracy had beaten the consensus, returned noticeably less. A fourth, on real-options and certainty-equivalent methods for handling long-run price uncertainty, surfaced alternatives to a simple deterministic treatment of price in a discounted cash flow, more familiar in the valuation literature than in the headline institutional outlooks usually cited in disputes.

What the exchange shows, first, is not new. The model’s opening answer was not the limit of what it held; patient questioning drew out a great deal more. But each further question had to be known about before it could be put forward: that the institutional forecasters have records worth probing, that contrarian forecasters exist as a category, that methods exist beyond discounted cash flow. No competent expert stops at the first answer, and a careful one runs well past four questions. Every inquiry, though, stops somewhere, and where it stops is set by what the inquirer knows to ask. That has always been true. A researcher with a library and a month was bounded by their own knowledge in just the same way, and there would be little to say if it ended there.

It does not end there. Two things have changed. The first is that the old tools advertised the edge of one’s knowledge, while the model conceals it. A month among the textbooks tended to throw up a contradiction between sources, or a citation to a school of thought one had not heard of; the friction of the search was itself the signal that the subject was larger than one had assumed. The model returns a single fluent answer that carries no such friction: it does not say what it has passed over, and reads no differently whether it has passed over a little or a great deal. The boundary the library exposed through friction, the model hides behind composure. The second change is that the boundary may now fall in much the same place for both sides. Two researchers in a library stopped at different points, carrying different gaps, because they had read different things and trained under different hands; their blind spots were their own, and an opposing expert’s were different again. Gaps that were once idiosyncratic, and so tended to cancel out between opposing experts, may now be shared; and a common gap does not cancel. The limitation itself is old; its invisibility, and its possible sameness across opposing experts, are not. The first is why a model-assisted draft is harder to supervise than a thinly researched one ever was; the second is why two opposing experts may converge for a reason that has nothing to do with their judgement meeting: their tools began in the same place.

What is the client paying for?

The exchange has already illustrated two of these mechanisms, though without naming them. The quiet authority of the confident draft, pulling the expert towards it before a view has been formed, is anchoring. The bounded first answer, weighted towards the most heavily published sources, is training-data bias. A third bias the exchange could not illustrate, because it needs two experts rather than one, sits behind both. The three are worth separating, because they are not equally within the expert’s power to correct.

Anchoring, to take it first, works in two places, on the model and on the expert who reads it. On the model: a 2024 experimental study found that a model’s own output could be measurably pulled towards a figure or an opinion planted in the prompt, that it followed a planted view even when expressly told to disregard it, and that the obvious corrections, instructing it to reason step by step or to ignore the anchor, did not remove the pull. The framing of a question, once made, is not easily undone by a later instruction; the path is in large part set at the point of asking.4 On the expert: the confident draft exerts its own pull on the mind that reads it before forming a view. A 2026 randomised trial of physicians, all of whom had completed formal training in the use of AI, found that exposure to wrong recommendations from a model cut the accuracy of their diagnoses by about 14 percentage points, even though they were free to ignore them.5 The two compound one another: the answer the expert sees has already been shaped by how it was asked, and the expert is then disposed to accept it. The implication for damages work is real but bounded. The senior expert is less likely than the analyst to be anchored by an answer that is wrong on its face. The senior expert has no comparable protection against an answer that is incomplete in a way that cannot be seen, which is the harder problem.

Training-data bias, the second, is what the oil and gas exchange exposed. A model’s answers reflect the material it was trained on, and that material is not evenly distributed; it favours the institutions and jurisdictions that published the most. A 2022 study from Stanford tested what happens when different decision-makers draw on the same underlying data, and found that shared data reliably pushes their outcomes together: the more the source material is held in common, the more alike the results, and the more they share the same blind spots. The exact composition of the leading models’ training data is not, for the most part, publicly known. What is generally accepted, however, is that they have all drawn extensively on internet-scale public text and on the standard reference corpora of academic, governmental and journalistic material that this implies, so the underlying material is, on any reasonable view, substantially shared. To the extent that this is so, two experts relying on different models may nevertheless be drawing on a substantially common store, regardless of the surface differences between the particular tools.6 The model can say what it has drawn upon. It cannot readily say what it has left out, and the expert cannot readily find out, unless something of the expert’s own is given to the model to work from, a point to which I return.

Correlation across experts, the third, is the most consequential, because it is the one the individual expert cannot cure by diligence. The idea is drawn from a 2021 paper in the Proceedings of the National Academy of Sciences, which showed that when independent decision-makers come to rely on the same predictive tool, their errors become more correlated, and the system loses some of the benefit of independent evaluations.7 Translated to expert evidence: where opposing experts draw on tools trained on overlapping material, their errors on the shared elements of a problem are liable to become more correlated, and less likely to offset one another. Where two experts reach their views independently, the distance between them may tell the court or tribunal something important: two careful professionals who still disagree show that the question is genuinely open, and the width of the gap is a rough measure of how much room for honest difference it holds. Where their tools have quietly pulled both towards the same answer, the gap narrows for reasons that have nothing to do with the question becoming clearer, and the tribunal may read as agreement what is in part an artefact of shared infrastructure rather than independent judgement converging on the truth.
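The point that correlated errors do not offset can be put in simple statistical terms. The sketch below is a stylised model assumed here for illustration, not the model used in the paper cited: each expert's estimate is taken to be the true value plus a zero-mean error with standard deviation sigma, and the two experts' errors have correlation rho. The function name and the numbers are hypothetical.

```python
import math


def gap_and_midpoint_error(sigma, rho):
    """Return (sd of the gap between two experts, sd of the error in their midpoint).

    Assumes each estimate is the true value plus a zero-mean error with
    standard deviation sigma, and that the two errors have correlation rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie between -1 and 1")
    gap_sd = sigma * math.sqrt(2.0 * (1.0 - rho))             # Var(e1 - e2) = 2*sigma^2*(1 - rho)
    midpoint_error_sd = sigma * math.sqrt((1.0 + rho) / 2.0)  # Var((e1 + e2)/2) = sigma^2*(1 + rho)/2
    return gap_sd, midpoint_error_sd


# Hypothetical error size of 10 (in any unit), at three levels of correlation.
for rho in (0.0, 0.5, 0.9):
    gap_sd, mid_sd = gap_and_midpoint_error(sigma=10.0, rho=rho)
    print(f"rho={rho:.1f}  gap sd={gap_sd:5.2f}  midpoint error sd={mid_sd:5.2f}")
```

Under these assumptions, as the correlation rises the typical gap between the two experts shrinks while the typical error of the midpoint between them grows. The experts look closer together, and a tribunal splitting the difference is further from the truth.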

Three mechanisms, three different remedies, and a descending order of comfort. Anchoring can be blunted by sequence: forming a provisional view before consulting the tool, rather than after. Training-data bias can be blunted by deliberately seeking inputs from outside the model’s best-trodden ground. Correlation across experts cannot be reliably blunted by the individual expert acting alone. An expert may do everything right and still be caught by it, because it is a feature of the system the experts have in common. If there is an answer, it lies less in disclosure alone than in the deliberate design of the workflow: forming a provisional view of one’s own before the model is consulted, drawing deliberately on sources outside the model’s well-trodden ground, and reviewing the questions put to the tool as closely as the answers it returns. That last is where supervision now has to do its work.

Where supervision now lives

Supervision has always been the expert’s own duty, and it has become harder to discharge. In the older model of damages work, a thinly researched draft from a junior team announced itself on the page. Thin sections, missing citations, questions left hanging were all visible, and the senior expert’s eye, trained over years, went straight to them. The senior expert could tell the difference between a section that was thin because the analysis behind it was thin and one that was thin because an able analyst had written it up poorly. The model has obliged the profession by making thin work look uncannily like good work. A draft built on research that stopped at the consensus reads exactly like one built on research that canvassed the full range of opinion: both fluent, both properly sourced within their range, both internally coherent. The tell-tale signs that work is incomplete are themselves missing, which is a more difficult problem than incompetence, because incompetence at least looks like itself.

If that is right, the place where judgement has to be exercised has moved. The decisions that matter (which questions to ask, which methods to weigh, which bodies of opinion to consult) have shifted from the analyst doing the work to the prompts that framed it. The senior expert is no longer checking the analyst’s reasoning at each step, but checking, in substance, whether the questions put to the model were the questions a senior expert would have put, which requires reconstructing what was asked, not merely reading what was produced.

It may be said that supervision has always meant asking what the junior has missed, and that the medium has changed but the task has not. There is force in that. The difference is that the senior expert’s instinct for what a thin piece of work looks like was trained on a medium that no longer reliably produces the symptom. Whether a new instinct develops, and how fast, is something the profession will discover in practice rather than settle in advance. In the meantime, supervising model-assisted work means interrogating the prompts that produced the work, and not the conclusions alone, a thing few review processes are currently built to do. Part of what such review would look for is whether the question was put neutrally. The anchoring evidence suggests a prompt should be kept as free as possible of the answer it is meant to elicit, and that this discipline cannot be repaired after the fact, since a model will follow a planted view even when later told to set it aside. Neutrality of that kind is not the same as asking little; the expert must still know enough to canvass the full range, while taking care not to write a preferred conclusion into the question. This sits with the earlier discipline of forming a provisional view before consulting the tool; the two are one principle seen from both ends. The expert forms a view first so the tool does not anchor it, and keeps that view out of the prompt so that the tool is not steered into returning the expert’s own preliminary answer dressed as independent support.

That is, on its face, an additional layer of work, and it has a commercial edge. The traditional pyramid of analysts, directors and a testifying expert rested on the gap between plentiful junior labour and scarce senior judgement. If the junior labour compresses while the senior supervision expands, the pyramid starts to flatten.

There is a further commercial consequence, less comfortable to state. If the shared tool tends to pull every user toward a common baseline, the way to pull back from it is to feed the model something the other side does not have: a distinct body of methodology and accumulated case knowledge, built over years and held by the firm rather than the machine. The capacity to do that is not evenly held. It favours the practices, and the individual experts, whose decades of work constitute a library worth feeding in. The tool may commoditise the baseline while raising the premium on whatever sits above it, which is the proprietary knowledge that cannot be downloaded. One should be cautious before drawing the inference too firmly: an able newcomer who uses these tools well may outperform a complacent incumbent, and the differentiating knowledge is only differentiating to the extent that firms genuinely hold different things.

A model cannot stake a reputation, answer on oath, or bear personal accountability for the conclusions it produces.

What disclosure is calibrated to catch

If supervision is adjusting, disclosure has barely begun to. The two principal instruments to date are the Silicon Valley Arbitration and Mediation Center’s Guidelines on the Use of Artificial Intelligence in Arbitration of April 2024, and the Chartered Institute of Arbitrators’ Guideline on the Use of AI in Arbitration of March 2025.8 The first treats disclosure of AI use as a matter of judgement on the facts, aimed at protecting the integrity of the proceedings; the second ties it to whether the use might affect the evidence, the outcome, or the delegation of an expert’s duties. The Civil Justice Council’s consultation on AI in court documents, which closed on 14 April 2026 and expressly takes in expert reports, points the same way.9

Each of these is aimed, sensibly, at a particular failure: the careless or incompetent use of a tool, where the output is wrong or invented and the expert has not checked it. They answer the cases already in the law reports: the expert who could not say what had been asked of the model, the filing built on authorities that did not exist. They will do useful work, and they should be adopted.

They are not, however, built to catch the case this piece has described. Picture two experts on opposite sides, each diligent, each using the tools competently, each disclosing that use in full, each verifying every source the model put in front of them. Both may still produce reports that sit within a narrower range of opinion than the underlying literature, on the same facts and instructions, would have yielded without the tools. Each can certify, accurately, that every disclosure obligation imposed on them has been met. Any such convergence is invisible to the court or tribunal, because nothing in the present rules asks the question that would reveal it: not whether the tool was used or disclosed, but whether reliance on it quietly narrowed the range of opinion the expert would otherwise have reached. Disclosure as presently conceived records that a tool was used; it does not record what the use may have closed off.

The International Chamber of Commerce has a Task Force on Artificial Intelligence in Dispute Resolution, and substantive guidance from it is awaited with interest. There is a case for the principal administering institutions, perhaps acting together, to produce a living instrument of their own, revised as the technology and the practice evolve. Such an instrument would not displace the existing soft law, which is valuable; it would take up the questions that soft law does not yet reach. The one raised here is among them.

A note in closing

The old question was whether the expert was independent enough of the party paying the fee. The new question is whether the expert is independent enough of the tools shared, through their common training, with the expert on the other side. The duty as it stands, in The Ikarian Reefer and its equivalents elsewhere, was written on the assumption that the influences on an expert are human and therefore traceable: counsel, documents, the views of colleagues, each of which can be put to the expert and tested. The new influences are neither obvious nor easily traced. Whether the duty as drafted reaches them is, I suspect, a question that will be asked within the next few years rather than the next few decades. Three things follow from this.

The first is the process itself. The system of party-appointed experts rests on a premise: that two independent specialists, given the same facts, will sometimes differ, and that the distance between them tells the tribunal something true about the difficulty of the question. The divergence is what the process exists to supply, not a flaw in it. A court or an arbitral tribunal reaches a reasoned result in part by weighing that divergence, and the soundness of that process is one of the quieter ways in which the rule of law is maintained. The risk this piece raises is that shared tools may quietly narrow that divergence, leaving the tribunal to weigh a difference that looks like independent judgement but is in part the echo of a common source. The risk is not yet demonstrated in expert evidence, in the sense of having been measured in the box; the point is rather that the conditions for it are now present, and that the safeguards calibrated to a different age have not caught up. If that is right, the consequence would not show itself in any single report, and a problem of that kind would call for attention while the practice is still forming, rather than after it has set.

The second is the tribunal that must weigh the result. There is a contrary instinct, and a sound one: that a court or tribunal would generally welcome more agreement between experts rather than less, and might regard some loss of analytical range as a fair exchange for it. The instinct is right far more often than not. The wish for experts to converge is the same proper wish that lies behind the single joint expert, and a tribunal is usually well served when careful people, having reasoned independently, arrive at the same place. The difficulty is narrower, and it is not a difficulty with the instinct but with the present facts. Convergence earned by independent reasoning is the kind a tribunal is right to value. Convergence produced because both experts consulted the same model has the same appearance and a different cause, and the one safeguard a tribunal relies on, that the two opinions were arrived at independently, is the very thing the shared tool may have quietly removed. The agreement looks identical to the decision-maker. Only its provenance has changed, and provenance is precisely what a tribunal has no ready way to see.

The third is whether any of this is real. A colleague, reading this, wondered whether the whole concern might be a paper tiger. Experts will go on differing, the thought runs, and differ widely, for all the reasons already given: different facts, different instructions, the framework each must work within. That is true, and those differences will remain. But the persistence of difference tells us nothing about how much difference has quietly gone. We can see the gap that remains; we cannot see the wider gap that might have existed had both experts not leaned on the same tool, because the reports they would otherwise have written do not exist to compare. That a thing cannot yet be measured is not evidence that it is not there. Whether the narrowing is large enough to matter, or small enough to ignore, is a question the profession has not yet tried to answer, and could. Until it does, the honest position is that no one knows whether the tiger is paper or real, which is itself a reason to go and look.

The legend of Icarus, for those to whom it is unfamiliar, has Daedalus the craftsman build wings of feathers and wax for himself and his son, to escape the island of Crete. The wings work. Daedalus warns the boy to fly neither too low, where the sea will soak the feathers, nor too high, where the sun will melt the wax. Icarus, exhilarated, forgets the warning, climbs too far, and the wings fail him. The instructive part is not the fall. It is that the boundary he could not cross was invisible from the air.

The wings, on the whole, work, and the profession is plainly going to keep flying. But the boundary they are now working against is worth keeping in view; and an expert who finds, on reflection, that agreement with the opposite number came rather more readily than the facts deserve might at least wish to know whether it was the expert’s, or the machine’s.

The foregoing has been prepared for the purpose of public contribution to the discussion and debate about how artificial intelligence is likely to affect the profession of the damages expert. It does not constitute professional advice, is not intended to be relied upon as such, and is not to be taken as the view of any firm or institution with which the author is associated. The subject moves quickly; observations which are accurate at the time of writing may not remain so for long, and the author accepts no responsibility for such obsolescence as this piece may shortly acquire.

Endnotes

1 National Justice Compania Naviera SA v Prudential Assurance Co Ltd (The Ikarian Reefer) [1993] 2 Lloyd’s Rep 68.

2 F. Dell’Acqua, E. McFowland III, E. Mollick, H. Lifshitz, K. C. Kellogg, S. Rajendran, L. Krayer, F. Candelon, K. R. Lakhani, Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality, Organization Science (2026).

3 L. H. Zhang, S. Milli, K. Jusko and others, Cultivating Pluralism in Algorithmic Monoculture: The Community Alignment Dataset, arXiv:2507.09650 (2025).

4 J. Lou and Y. Sun, Anchoring Bias in Large Language Models: An Experimental Study, arXiv:2412.06593 (2024).

5 I. A. Qazi, A. Ali, A. U. Khawaja and others, Automation Bias in Large Language Model–Assisted Diagnostic Reasoning among Physicians Trained in AI Literacy — A Randomized Clinical Trial, NEJM AI 3(5) (2026).

6 R. Bommasani, K. A. Creel, A. Kumar, D. Jurafsky, P. Liang, Picking on the Same Person: Does Algorithmic Monoculture lead to Outcome Homogenization?, Advances in Neural Information Processing Systems 35 (NeurIPS 2022).

7 J. Kleinberg and M. Raghavan, Algorithmic monoculture and social welfare, Proceedings of the National Academy of Sciences 118(22) (2021), e2018340118.

8 Silicon Valley Arbitration and Mediation Center, Guidelines on the Use of Artificial Intelligence in Arbitration (30 April 2024); Chartered Institute of Arbitrators, Guideline on the Use of AI in Arbitration (March 2025, updated September 2025).

9 Civil Justice Council, Use of AI in Preparing Court Documents: Interim Report and Consultation (17 February 2026), consultation closed 14 April 2026.

