“AI as the Expert”: How Courts Are Grappling with Algorithms as Quasi-Expert Witnesses and What Rule 707 Would Change

Why this matters

A growing slice of modern litigation features machine-generated evidence offered not merely as a demonstrative aid but as the substantive answer to a disputed question—who was where, whose face is on a grainy video, whether a sound was a gunshot, or whether mixed DNA includes the defendant. In those moments, the algorithm functions as a de facto expert: it applies specialized methods to data and outputs an opinion the factfinder is asked to credit. The proposed Federal Rule of Evidence 707 (Machine-Generated Evidence) would pull those machine opinions squarely under Rule 702’s reliability gatekeeping whenever they are offered without a human expert.

This piece surveys where courts already treat AI like an expert (sometimes explicitly, sometimes by effect), the pitfalls that have surfaced, and how Rule 707 would channel these disputes into a more disciplined framework.

What “AI as an expert” looks like in practice

1) The algorithm stands in for the expert (or eclipses them)

  • Cell-site/OSINT triangulation via “Cybercheck.” In 2022, an Ohio jury convicted a defendant of murder, despite the absence of direct physical evidence placing him at the scene, after hearing from the developer of an AI tool called Cybercheck, who told jurors his system could “triangulate” a phone’s location from public data with greater than 90% accuracy. Subsequent reporting surfaced credibility and validation concerns; in other matters, courts excluded Cybercheck evidence or prosecutors withdrew it when source code and methods could not be meaningfully scrutinized. The case history illustrates a machine’s output functioning as the expert opinion for the prosecution.

  • Facial recognition outputs. Investigative reporting has documented arrests and prosecutions built primarily on face-matching software “hits,” which are often treated operationally as dispositive despite internal policies that warn such matches are “nonscientific.” Courts are increasingly asked to decide whether those outputs can be used as substantive evidence and on what foundation. In at least one 2025 matter, a judge excluded facial-recognition evidence citing reliability and transparency concerns.

  • Gunshot detection (“ShotSpotter”). Courts have confronted whether ShotSpotter detections qualify for expert-evidence treatment. Massachusetts’s SJC recently backed a Daubert/Lanigan-style reliability inquiry into the technology, signaling that automated classifications about gunfire are not self-proving. The Seventh Circuit has also flagged district-court errors in admitting ShotSpotter evidence without adequate Daubert analysis (though it deemed the error harmless in that case).

2) The hybrid: human expert + algorithmic core

  • Probabilistic genotyping (TrueAllele/STRmix). In many DNA cases, a human forensic analyst testifies, but the core inference—the probability a person is a contributor to a mixed sample—comes from proprietary software. Courts have admitted such evidence while wrestling with black-box concerns, defense access to source code, and Daubert factors like validation and error rates. The academic and practitioner literature reflects both support and sustained criticism of admitting opaque models even with a sponsoring expert.

3) When the “expert” fails basic rigor

  • AI expert testimony rejected for hallucinated citations. Multiple 2025 reports describe experts in AI matters whose submissions were struck after they relied on fabricated sources generated by AI. These episodes underscore that even when a human is nominally “the expert,” courts are increasingly forced to police what the AI actually did and whether the human can reliably explain it.

Bottom line: Across these domains, the machine is effectively performing the Rule 702 work: applying specialized principles and methods to data to generate an opinion for the trier of fact. The reliability questions are the same (data quality, method validity, error rates, reproducibility, and proper application) regardless of whether a person or program produced the opinion.

Where Rule 707 fits: closing the “no-expert” loophole

Rule 707 (draft) would require courts to apply Rule 702’s reliability test to machine-generated outputs when they are offered without a human expert, while carving out basic scientific instruments (e.g., calibrated thermometers) from its scope. The Advisory Committee advanced the draft for public comment in May 2025 (vote 8–1), stressing an exploratory posture given fast-moving technology.

  • Why it matters: As the Cybercheck example shows, parties sometimes present machine outputs directly, inviting the jury to credit the algorithm’s conclusion without a qualified witness who can satisfy Rule 702. The draft would restore parity: a machine opinion must meet the same reliability criteria that would apply if a human expert offered it.

  • The DOJ’s dissent: The Department of Justice voted against publication, arguing that Rule 702 already covers these issues when a human expert is involved and warning about burden, cost, and trade-secret friction. Even if you share that skepticism, Rule 707 directly targets the machine-only pathway that current rules do not expressly police.

Practical implications: treating AI “like an expert” under Rule 702 (with or without Rule 707)

What proponents must be prepared to show

  • Data & provenance. How were training and case-specific inputs gathered, cleaned, and validated? Are they representative and free from known biases?

  • Method validity & error rates. Benchmarks, cross-validation, false-positive and false-negative rates, and performance under noisy, real-world conditions (a minimal illustration of these error-rate calculations appears after this list).

  • Reproducibility & auditability. Versioning, change logs, and the ability for an opponent or neutral expert to re-run the model or probe its sensitivity to assumptions (the Cybercheck litigation turned on these deficits).

  • Explainability. Even if the model is complex, what interpretable outputs (confidence scores, feature importances) help a court assess whether the method was reliably applied to case facts?

  • Fit & helpfulness. Does the output actually answer a fact in issue, or does it invite automation bias, overpersuading the jury with an impressive but non-probative visual or score? (Think Rule 403 in close cases.)
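
For readers who want a concrete sense of what “error rates” means in this context, the following is a minimal, purely hypothetical sketch. The ground-truth and tool-output labels are invented for illustration and do not describe any actual product or case; in a real validation study they would come from test conditions whose true answers are independently known (staged recordings, known-source DNA mixtures, and the like).

```python
# Hypothetical illustration only: the labels below are invented.
# 1 = the event truly occurred / the tool reported the event; 0 = it did not.
ground_truth = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # what actually happened
tool_output  = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]   # what the tool reported

# Tally the four confusion-matrix cells.
tp = sum(1 for t, p in zip(ground_truth, tool_output) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(ground_truth, tool_output) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(ground_truth, tool_output) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(ground_truth, tool_output) if t == 1 and p == 0)

# False-positive rate: how often the tool "finds" something that is not there.
false_positive_rate = fp / (fp + tn)
# False-negative rate: how often the tool misses something that is there.
false_negative_rate = fn / (fn + tp)

print(f"False-positive rate: {false_positive_rate:.0%}")
print(f"False-negative rate: {false_negative_rate:.0%}")
```

The point for counsel is not the arithmetic but the demand: a proponent should be able to produce these rates from validation conditions that resemble the case (lighting, audio quality, mixture complexity), not merely from the vendor’s cleanest benchmark.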

Discovery and protective-order realities

Expect fights over source code, weights, and training data. Some courts have allowed admission without full code access; others have excluded when reproducibility or auditability was impossible. Protective orders, neutral-expert testing, and staged disclosures are becoming standard tools to balance adversarial testing against trade-secret concerns.

Strategy for proponents

  • If you intend to offer a machine output without a sponsoring expert, build a 707-ready record now: validation studies, error rates, and a method-fit narrative mapped to Rule 702(a)–(d).

  • Even better: consider “wrapping a human around the machine.” A qualified expert who can explain data, methods, and limitations reduces 707 exposure and aids intelligibility—though courts will still probe the algorithmic core.

Strategy for opponents

  • Demand validation artifacts and audit trails; ask how often the tool produces false positives under conditions mirroring the case facts (e.g., the actual camera quality).

  • Seek neutral testing or limited code access under protective orders; highlight mismatches between the model’s training domain and the case facts.

  • Use Rule 403 where a dazzling interface masks thin probative value or high confusion risk.

Case study snapshots: how courts are treating AI today

  • Cybercheck (Ohio/Colorado): Algorithm as the expert.

    • Courts have excluded Cybercheck evidence when the defense could not obtain the underlying code; prosecutors withdrew such evidence in other cases after reliability questions surfaced; and at least one conviction heavily depended on the developer’s testimony about the AI’s accuracy. These matters epitomize “AI as expert witness.” (Business Insider)

  • Facial recognition (multiple jurisdictions): From leads to linchpin.

    • Despite policy warnings that face matches are “nonscientific,” arrests and prosecutions have proceeded primarily on facial-recognition outputs, with documented wrongful arrests and growing judicial skepticism. One 2025 decision excluded such evidence outright for reliability and transparency failures. (The Washington Post)

  • ShotSpotter (MA/7th Cir.): Treating detection like opinion.

    • Massachusetts’s high court has supported Daubert-style inquiry into ShotSpotter; the Seventh Circuit criticized admission without a proper reliability analysis (though found harmless error). Together, they endorse treating the detection as expert-grade evidence requiring Rule 702 rigor. (BostonGlobe.com)

  • Probabilistic genotyping (NY and elsewhere): Human sponsor, algorithmic core.

    • Courts have admitted TrueAllele and STRmix results with human experts while fielding persistent black-box and access objections. The jurisprudence demonstrates that even with a human sponsor, the machine’s methodology is legally consequential. (New York Courts)

Authentication vs. reliability vs. hearsay: keeping the lanes clear

  • Authentication (Rules 901/902) answers what it is (e.g., “this is the output the system produced”), not whether the underlying method is reliable. A self-authenticated digital record under Rule 902(13) can still be excluded on reliability grounds.

  • Reliability (Rule 702 / proposed 707) asks whether the method is valid, applied properly, and helpful. That is where AI in litigation lives or dies.

  • Hearsay. Fully automated outputs typically lack a human declarant, so many courts treat them as non-hearsay, but that does not bypass the reliability gate. (The hard work is still under 702/707.)

Deepfakes and “liar’s dividend”: special authentication problems

Even genuine media can be questioned in a deepfake era; conversely, fabricated media can look persuasive. Committees have floated a heightened-authentication pathway when a party makes a plausible showing of AI fabrication—placing extra weight on metadata, chain of custody, and forensic analysis before reliability is even reached. Expect more courts to formalize such procedures while Rule 707 moves through comment.

Key takeaways for practitioners and clients

  1. Assume Rule 702 rigor for any machine opinion. Whether you have a human expert or not, build a validation dossier (data, benchmarks, error rates, versioning, reproducibility). That’s the evidentiary price of admission for machine-generated evidence.

  2. Plan for discovery friction. Protective orders, neutral-expert testing, and staged disclosures are the norm where trade secrets meet due-process rights. Cybercheck’s trajectory shows what happens when auditability collapses: exclusion or withdrawal.

  3. Beware automation bias. Juries (and sometimes litigants) over-trust confident scores and slick visuals. Fit the method to the specific facts and be prepared for Rule 403 fights.

  4. Wrap a human around the machine—or be 707-ready. If you go machine-only, you’ll need to satisfy Rule 702 standards under Rule 707. If you sponsor with a human expert, ensure they can do more than repeat outputs; they must explain and defend the method.

  5. Facial recognition and gunshot detection are not “self-proving.” Expect Daubert hearings and, in some courts, skepticism without rigorous validation.

Looking ahead: what Rule 707 would likely change

  • Uniform gatekeeping for machine-only offerings. Today’s results are uneven: some courts admit algorithmic outputs on little more than a chain-of-custody showing; others demand full Daubert records. Rule 707 would reduce that variance by requiring Rule 702 findings for machine-only outputs.

  • Earlier, deeper reliability litigation. Expect pretrial mini-Dauberts focused on algorithms (validation, real-world error, and testability) plus protective orders for code/data.

  • Vendor behavior will adapt. Tools intended for court use will need audit logs, version control, built-in explainability, and “litigation disclosure modes” to permit neutral testing without IP surrender; a hypothetical sketch of such an audit record follows this list. (Vendors that cannot support that will see their outputs excluded more often.)

  • No safe harbor for black boxes. Whether 707 passes or not, courts are moving toward explainable, testable AI. This is especially true where the machine’s output replaces traditional expert analysis.
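
To make the “audit logs and version control” point concrete, here is a minimal hypothetical sketch, assuming a vendor chose to log one record per analysis. Every field name and value is invented for illustration and does not describe any actual product; the idea is simply that each output can later be tied to the exact input, software version, and settings that produced it.

```python
# Hypothetical sketch of a per-run audit record a litigation-ready tool might
# keep so that an opposing or neutral expert can later reproduce the analysis.
# All field names and values are invented for illustration.
import hashlib
import json
from datetime import datetime, timezone

def make_audit_record(input_bytes: bytes, model_version: str,
                      settings: dict, output: dict) -> dict:
    """Tie a reported output to its exact input, model build, and settings."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,                            # exact software/model build used
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),   # fingerprint of the case-specific input
        "settings": settings,                                      # thresholds and options in effect
        "output": output,                                          # the reported result and confidence
    }

record = make_audit_record(
    input_bytes=b"...case media or data file bytes...",
    model_version="2.3.1",
    settings={"match_threshold": 0.90},
    output={"result": "match", "confidence": 0.93},
)
print(json.dumps(record, indent=2))
```

Whatever form the log takes, the evidentiary function is the same: without records like these, reproducibility and auditability collapse, and with them the proponent’s ability to satisfy Rule 702 or proposed Rule 707.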

A note on state momentum

While no state has yet cloned FRE 707, state high courts and trial courts are pressing Daubert-style reliability reviews for AI in litigation and tightening administrative policies on judicial use of AI. In parallel, several states have enacted deepfake laws that, while not evidentiary rules, reflect a tightening posture toward synthetic media in legal settings. The direction of travel is clear: increased scrutiny and structure.

Conclusion: Treat the machine like the expert it “is”

The legal system’s challenge is not deciding whether AI can ever help. It already does. The challenge is ensuring that when an algorithm’s output is offered as the answer—when the machine steps into the expert role—courts and counsel demand what Rule 702 has always required: reliable methods, sufficient data, proper application, and helpfulness to the trier of fact. The proposed Federal Rule of Evidence 707 would hard-wire that expectation for machine-only proffers. Practitioners should operate as if that expectation is already here.

If you’re preparing or opposing machine-generated evidence, start with a Rule 702 checklist and build forward: validation, error rates, reproducibility, fit, and clear documentation. That is how to make AI in litigation work for truth-seeking, and how to keep deepfake evidence and other unreliable machine opinions out.
