Carnegie Mellon University
xtai_phd_stat_2019.pdf (23.33 MB)

Matching Problems in Forensics

thesis
posted on 2019-10-18, 19:24 authored by Xiao Hui Tai
Forensic evidence refers to DNA, fingerprints, bullets and cartridge cases, shoeprints and digital evidence left behind when a crime is committed. The underlying assumption is that the perpetrator of the crime, or tools they might have used, leave identifiable characteristics on the evidence that can be traced back to the source. This is the basis of forensic matching, where pairs of evidence are compared, to infer if they came from the same source. For specific pairs of comparisons, such as whether a particular cartridge case comes from a suspect's gun, an inference of a match could have probative value and be used as testimony in courts.
With the exception of DNA evidence, this is done manually by trained examiners who make judgments based on their experience and training. Despite the widespread use of forensic evidence in courts, as well as the high stakes involved in criminal investigations, there has been a lack of scientific research to back up this claim of being able to reliably match evidence to source. Examiner error rates are unknown, and it is difficult to attach a quantitative value of the weight of evidence to a subjective opinion. Beginning in the 1990s, exonerations due to DNA evidence revealed problems in many forensic science disciplines, and examiners have been found to have overstated forensic results, leading at least in part to wrongful convictions.
As a result, there has been a push in recent years towards automatic methods for making comparisons. In this thesis, the goal is to develop such methods; in particular, to produce similarity scores for pairwise comparisons of evidence. Apart from addressing the issues raised, automatic methods can be used in the following other ways. They can generate 1) a ranking of similarities of pairs, which could be used to generate investigative leads, or for blind verification; 2) a match or non-match conclusion; 3) a linked or disambiguated data set; 4) random match probabilities or likelihood ratios, as measures of the weight of evidence.
Now, record linkage in statistics is the process of inferring which entries in different databases correspond to the same real-world identity, in the absence of a unique identifier. Forensic matching can be thought of as an application of record linkage; simply think of records as evidence and real-world entity as the source of the sample. Steps commonly used in forensic matching can be demonstrated to correspond to steps typically used in record linkage. By thinking about forensics problems in the context of record linkage, one immediately has well-developed frameworks and tools at one's disposal.
I describe a framework that can be used to develop automatic forensic matching methods in a systematic manner. This simplifies the record linkage process and adapts it to a forensic context. I apply this to develop automatic methods for two forensic matching problems. The first is firearms identification, where cartridge cases are compared to infer if they were fired from the same gun. I develop an open source, fully automatic method to compare 2D optical images and 3D topographies, and evaluate performance on over a dozen publicly available data sets. The second problem is matching accounts on anonymous marketplaces. I use marketplace data scraped over eight years, and generate a set of features that is costly for an adversary to mimic. Through these examples, I demonstrate how forensic matching problems can be tackled in general to achieve various objectives, in a more principled manner. I hope that this is a step in the direction of making forensic matching more scientific and rigorous.
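The record-linkage view described in the abstract can be illustrated with a small, generic sketch. It is not the thesis's method: the feature vectors, the distance-based similarity function, the threshold of 0.9, and all function names below are hypothetical and chosen only for illustration. The sketch scores every pair of evidence items, ranks the pairs, turns scores into match or non-match conclusions, and links matched items into groups that share a presumed source.

```python
from itertools import combinations

# Hypothetical toy data: each item of evidence is reduced to a feature vector.
evidence = {
    "A1": (0.90, 0.10, 0.40),
    "A2": (0.88, 0.12, 0.41),
    "B1": (0.20, 0.70, 0.90),
    "B2": (0.22, 0.68, 0.93),
    "C1": (0.50, 0.50, 0.10),
}


def similarity(x, y):
    """Toy similarity in (0, 1]: 1 / (1 + Euclidean distance). An assumption."""
    dist = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
    return 1.0 / (1.0 + dist)


def score_pairs(items):
    """Similarity score for every unordered pair of items."""
    return {
        (i, j): similarity(items[i], items[j])
        for i, j in combinations(sorted(items), 2)
    }


def rank_pairs(scores):
    """Pairs ordered from most to least similar (e.g. for investigative leads)."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def classify(scores, threshold):
    """Match / non-match conclusion per pair, using a hypothetical cutoff."""
    return {pair: score >= threshold for pair, score in scores.items()}


def link(items, matches):
    """Group items connected by matches (union-find), giving a linked data set."""
    parent = {k: k for k in items}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path compression
            k = parent[k]
        return k

    for (i, j), is_match in matches.items():
        if is_match:
            parent[find(i)] = find(j)

    clusters = {}
    for k in items:
        clusters.setdefault(find(k), set()).add(k)
    return sorted(sorted(c) for c in clusters.values())


scores = score_pairs(evidence)
ranking = rank_pairs(scores)
matches = classify(scores, threshold=0.9)
clusters = link(evidence, matches)
```

With these toy values, `ranking` lists the most similar pair first, `matches` holds one conclusion per pair, and `clusters` groups the items by presumed common source. Random match probabilities or likelihood ratios would need a statistical model of score distributions, which this sketch does not attempt.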

History

Date

2019-05-28

Degree Type

  • Dissertation

Department

  • Statistics

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

William F. Eddy
