
The Forensic Analysis of Ballistics: Why Modern Technology Is Challenging Old Certainties

For over a century, firearms examiners have testified with near-absolute certainty that bullets match specific guns. New research reveals error rates may be far higher than courts believed, raising questions about convictions built on this evidence.


For over a century, forensic ballistics, the discipline that analyzes bullets, cartridge cases, and firearms to connect physical evidence to specific weapons, has helped convict thousands of defendants. Prosecutors present expert testimony declaring that bullets or shell casings match a specific gun “to the exclusion of all other firearms.” Juries hear this as scientific certainty. But a growing body of research now questions forensic ballistics reliability at its core, revealing that the discipline rests on assumptions that have never been rigorously proven.

The Science That May Not Be Science

The premise of firearms identification seems intuitive: when a gun fires, the barrel leaves unique scratches on the bullet, and the firing mechanism leaves distinctive marks on the cartridge case. Match those marks, and you can link a bullet to a specific weapon. This logic has underpinned criminal prosecutions since the early 1900s.[s]

The problem is that forensic ballistics reliability has never been established with the statistical rigor applied to DNA analysis. In 2009, the National Academy of Sciences published a landmark report concluding that “sufficient studies have not been done to understand the reliability and reproducibility of the methods.”[s] Seven years later, the President’s Council of Advisors on Science and Technology reinforced this finding, stating that “the current evidence still falls short of the scientific criteria for foundational validity.”[s]

When Examiners Get It Wrong

Patrick Pursley spent nearly 24 years in an Illinois prison for a murder he did not commit. The case against him relied primarily on a firearms examiner who testified that bullets from the crime scene matched a gun found in Pursley’s home “to the exclusion of all other firearms.”[s]

Years later, when the evidence was finally run through the National Integrated Ballistic Information Network database, the system failed to find a match between the test-fired bullets and the crime scene evidence. Two independent experts then re-examined all the physical evidence and concluded that neither the bullets nor the cartridge cases came from Pursley’s gun.[s] In January 2019, a judge acquitted him.

Pursley’s case illustrates a central concern about forensic ballistics reliability: the conclusions depend entirely on the subjective judgment of individual examiners, with no objective threshold for what constitutes a “match.”

The Numbers Behind the Doubt

When researchers study how well firearms examiners actually perform, the results vary dramatically depending on how you count the data. A major FBI-sponsored study found false positive rates below 1%, which sounds reassuring.[s] But that number hides something important: examiners in the study called a large percentage of comparisons “inconclusive.”

When statisticians recalculated the data treating inconclusives as potential errors, the picture changed dramatically. For tests where examiners had to determine whether bullets came from different guns, the potential error rate climbed to 66%.[s] The actual error rate lies somewhere between these extremes, but without proper study design, no one can say where.

Courts Begin to Respond

Some courts have started limiting what firearms examiners can say. In 2023, Maryland’s highest court ruled that an examiner should not have been permitted to offer an “unqualified opinion” that crime scene bullets were fired from the defendant’s gun.[s] The court found that while the methodology can support conclusions that markings are “consistent or inconsistent” with a particular firearm, it cannot reliably support categorical statements of identity.

This ruling reflects a broader judicial trend. Courts increasingly require examiners to acknowledge the limitations of their methodology rather than testifying to “practical certainty” or matching “to the exclusion of all other firearms.”[s]

The Path Forward

Researchers at the National Institute of Standards and Technology are developing new approaches that could eventually put forensic ballistics reliability on firmer ground. Using high-resolution 3D scanning, they are building databases that capture the actual surface topography of bullets and cartridge cases. Unlike traditional 2D microscopy, this data is not affected by lighting conditions and can be analyzed by both humans and algorithms.[s]

The goal is to develop statistical methods similar to those used in DNA analysis, where experts can state the likelihood of a match with quantifiable certainty. But researchers estimate it will take three to five more years before these methods are routinely accepted in court.[s]

Until then, the questions about forensic ballistics reliability remain unresolved. People continue to be convicted based on testimony that may sound more certain than the science supports.

The Foundational Problem with Forensic Ballistics Reliability

Firearms identification rests on two assumptions: first, that the interior surfaces of gun barrels and firing mechanisms leave marks unique to each individual weapon; second, that trained examiners can reliably distinguish these “individual characteristics,” the random microscopic marks theorized to be unique to each firearm, from “class characteristics” shared by all guns of the same make and model. Neither assumption has been validated to the standards applied in other forensic disciplines.[s]

The methodology dates to the early 1900s. Calvin Goddard helped establish the Bureau of Forensic Ballistics in New York City in 1925, and the FBI created a dedicated firearms identification unit in 1932.[s] For decades, courts accepted this evidence with limited scientific scrutiny under the Frye standard, which required only “general acceptance” within the relevant community. The 1993 Daubert decision raised the bar by requiring scientific reliability, not just acceptance, but the fundamental questions about forensic ballistics reliability went largely unexamined until the 2009 National Academy of Sciences report.

The Statistical Debate Over Error Rates

Modern validation studies, called “black box” studies, attempt to measure how accurately examiners can link bullets and casings to specific firearms. A large FBI Laboratory study involved 173 qualified examiners performing 8,640 comparisons. The reported false positive rate was 0.656% for bullets and 0.933% for cartridge cases.[s]

These numbers appear to support forensic ballistics reliability. But statisticians have identified critical methodological problems. When examiners cannot determine whether samples match, they can declare “inconclusive” rather than making a definitive call. In an earlier FBI-Ames Laboratory study reanalyzed by independent statisticians, examiners declared inconclusive on 51% of all bullet comparisons and 42% of cartridge case comparisons; among comparisons of different-source bullets, the inconclusive rate reached 65%.[s]

The reported error rates treat all inconclusives as correct responses. Critics argue this is mathematically equivalent to letting students answer “I don’t know” on an exam and counting those as correct. When inconclusives are treated as potential errors, the false positive rate for different-source bullet comparisons rises from 0.70% to 66.1%.[s] The true error rate lies somewhere between these bounds, but existing studies cannot determine where.
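The gap between these two figures is simple arithmetic over how inconclusives are counted. A minimal sketch, using illustrative counts chosen only to reproduce the reported rates for different-source bullet comparisons (they are not the study’s actual tallies):

```python
def fpr_bounds(false_pos, inconclusive, total):
    """Bounds on the false-positive rate for different-source comparisons.

    Lower bound: inconclusives counted as correct answers (the convention
    the published studies use). Upper bound: inconclusives counted as
    potential errors (the critics' recalculation).
    """
    lower = false_pos / total
    upper = (false_pos + inconclusive) / total
    return lower, upper

# Illustrative: 1,000 different-source bullet comparisons,
# 7 false positives, 654 inconclusive calls.
low, high = fpr_bounds(false_pos=7, inconclusive=654, total=1000)
print(f"reported: {low:.2%}, worst case: {high:.2%}")
# → reported: 0.70%, worst case: 66.10%
```

The same raw data thus supports any error rate between the two bounds; only a study design that forces definitive calls, or models inconclusives explicitly, can narrow the interval.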

The Hawthorne Effect and Blinded Testing

A 2025 analysis of data from the Houston Forensic Science Center revealed another problem: examiners behave differently when they know they are being tested. The Houston lab had attempted to slip mock samples into routine casework without examiners knowing. When examiners identified a sample as a test, they were 43.5% more likely to call it inconclusive.[s]

Cognitive neuroscientist Itiel Dror, who studies bias in forensic analysis, wrote that this finding “refutes the validity of the entire enterprise of the black-box studies as they have been conducted to date.”[s] If examiners are more cautious during tests, the low error rates in validation studies may not reflect real-world performance.

Case Study: Patrick Pursley

The wrongful conviction of Patrick Pursley demonstrates how failures of forensic ballistics can destroy lives. In 1993, Pursley was convicted of murder in Rockford, Illinois. The prosecution had no eyewitness identification, no confession, no DNA, and no fingerprints linking him to the crime. The case rested primarily on a state firearms examiner who testified that bullets and cartridge casings from the crime scene matched a 9mm Taurus recovered from Pursley’s home “to the exclusion of all other firearms.”[s]

Pursley spent years writing to innocence organizations, but there was no legal mechanism to request new ballistics testing. In 2007, after advocates helped pass an amendment to Illinois’s post-conviction forensic testing statute, his attorneys filed the first motion under the new law. The motion requested comparison of the evidence against the National Integrated Ballistic Information Network database.

After years of legal wrangling, the Illinois State Police finally entered images of the evidence into NIBIN in late 2011. The system failed to find a digital match between the test-fired samples and the crime scene evidence. Two independent forensic firearms experts, John Murdock and Chris Coleman, then re-examined all the physical evidence. Both concluded, independently, that neither the bullets nor the cartridge cases came from the gun found in Pursley’s home.[s]

On January 16, 2019, a judge acquitted Pursley. He had served nearly 24 years for a murder he did not commit, convicted on testimony that was later disproved.

Systemic Failures: The Rhode Island Case

Even basic competence cannot be assumed. In a recent Rhode Island case, three trained forensic examiners declared a match between cartridge cases that had differences in class characteristics, the most fundamental and objective features of firearms identification. As forensic experts noted, this was “the forensic equivalent of declaring that two tires of entirely different sizes match the same vehicle.”[s]

The error occurred because of confirmation bias: examiners focused on similarities while overlooking obvious differences. The problem was compounded by non-blind verification, where subsequent reviewers knew the first examiner’s conclusion before conducting their own analysis. One prominent firearms examiner testified that in over 50 years of practice, he had never seen a second examiner disagree with the first.[s]

The Maryland Ruling and Its Implications

In June 2023, Maryland’s Supreme Court issued a significant ruling in Kobina Ebo Abruquah v. State of Maryland. The court held that a firearms examiner should not have been permitted to offer an “unqualified opinion” that crime scene bullets were fired from the defendant’s gun.[s]

The court reviewed the scientific literature and found that while the methodology can support reliable conclusions that markings are “consistent or inconsistent” with those on bullets fired from a particular firearm, “those reports, studies, and testimony do not, however, demonstrate that the methodology used can reliably support an unqualified conclusion that such bullets were fired from a particular firearm.”[s]

The ruling requires a new trial for Abruquah and establishes precedent limiting how firearms examiners can testify in Maryland courts. Similar limitations have been imposed in other jurisdictions following the PCAST report.[s]

Toward Objective Methods

The National Institute of Standards and Technology is working to establish forensic ballistics reliability on firmer empirical ground. Mechanical engineer Xiaoyu Alan Zheng has led the development of the NIST Ballistics Toolmark Research Database, which uses high-resolution 3D microscopy to create virtual models of the toolmarks on bullets and cartridge cases.[s]

Unlike traditional 2D images, which vary depending on lighting conditions, 3D surface topography data is consistent and repeatable. This allows for algorithmic comparison and the development of statistical similarity metrics. The goal is to eventually provide likelihood ratios similar to those used in DNA analysis, giving courts quantifiable measures of uncertainty rather than subjective examiner opinions.[s]
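In a likelihood-ratio framework, the question shifts from “is this a match?” to “how much more probable is this similarity score if the bullets came from the same gun than if they came from different guns?” A hedged sketch of the idea, with made-up normal score distributions standing in for the reference populations NIST is still building (none of these numbers come from real data):

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(score, same_mu, same_sigma, diff_mu, diff_sigma):
    """LR = P(score | same gun) / P(score | different guns)."""
    return normal_pdf(score, same_mu, same_sigma) / normal_pdf(score, diff_mu, diff_sigma)

# Hypothetical reference distributions for a 3D surface-similarity metric:
# same-source pairs cluster around 0.85, different-source pairs around 0.40.
lr = likelihood_ratio(score=0.80,
                      same_mu=0.85, same_sigma=0.05,
                      diff_mu=0.40, diff_sigma=0.10)
# lr ≈ 3,600: the observed score is thousands of times more likely
# under the same-gun hypothesis than the different-gun hypothesis.
```

This is how DNA evidence is already reported: not as a categorical “match,” but as a quantified weight of evidence the jury can put in context.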

However, building a reference population large enough to support statistical statements will take years. Researchers estimate that 3D methods with full statistical backing are three to five years from routine courtroom acceptance.[s]

The Unresolved Questions

Statistician Maria Cuellar, who examined 28 validation studies, found “methodological flaws that are so grave that they render the studies invalid.”[s] Her assessment: “It’s not saying, ‘This is bad.’ It’s saying, ‘We don’t know how bad this is.’”

Alicia Carriquiry, a statistician at Iowa State University, agrees that current methods cannot establish true error rates but emphasizes that firearms examination is not inherently invalid. “When you use appropriate methods, like high resolution microscopy and the appropriate statistical methods and so on, you actually do get good results,” she told Undark. “The subjective approach to evaluation is the problem, not the absence of marks.”[s]

Until the discipline develops objective, statistically validated methods, forensic ballistics reliability will remain an open question. Courts are beginning to acknowledge these limitations, but juries still hear expert testimony that often sounds more certain than the science supports. People continue to be convicted, and sometimes wrongfully imprisoned, on evidence whose foundations have never been proven.
