Carnegie Mellon University

A proposed workflow for semi-automating citation searching and screening: An example from a rapid review on digital applications for child online safety

<p dir="ltr">Context</p><p dir="ltr">Citation searching (CS) is a recommended supplementary search step in most evidence synthesis reviews, particularly when the topic is difficult to search for. It usually occurs after screening has taken place and included studies have been identified. The references (backward CS) and citing articles (forward CS) of included studies are gathered and screened for additional relevant studies. Depending on the number of included studies, the added screening burden of CS can be substantial. Thus, semi-automated approaches can be useful to improve efficiency.</p><p dir="ltr">Objective(s)</p><p dir="ltr">In a rapid review of digital applications for improving children's online safety, we aimed to carry out backward and forward CS using a semi-automated workflow. We sought to optimize the transparency and reproducibility of this workflow through a thorough documentation process and the use of software based on FAIR principles (Findability, Accessibility, Interoperability, and Reuse) for data and information management. </p><p dir="ltr">Methods</p><p dir="ltr">We developed a semi-automated workflow using citationchaser, SR Accelerator's Deduplicator and Sysrev. Citationchaser is an open source R-based Shiny application which uses citation data from the free and open scholarly database Lens.org. Deduplicator is a free deduplication tool. Sysrev is a flexible web-based screening platform built on FAIR principles. We automated the CS process with citationchaser and used the Sysrev Auto Labeler to semi-automate the screening of these references, which makes use of a large language model and bespoke screening prompts to determine the potential eligibility of studies. Potential time savings were calculated based on conservative estimates in the literature of 30 seconds per article.</p><p dir="ltr">Results</p><p dir="ltr">A total of 923 unique records were identified from CS based on 51 included studies. We used random samples of relevant and irrelevant studies identified in the first phase of the project by two independent human screeners to design the Sysrev Auto Label prompt and assess its accuracy prior to running it on the CS references. We estimated time spent on prompt design plus single human screening of GPT-screened records to compare to likely time spent on double human screening of all CS references to estimate time savings. Time savings estimates for our workflow were estimated to be approximately 12 hours of human screening time saved. We developed a documentation template for the use of LLMs including such elements as model used, exact prompts, date of LLM run, records used, recall and precision measures, and rationale for changes to the prompt. </p><p dir="ltr">Conclusions</p><p dir="ltr">Using automation tools for citation searching and screening shows considerable potential for time savings. Performance was sufficient for a step that is often skipped due to the additional screening burden and thus is considered a low stakes application of this new technology. Sysrev provides a no-code platform for using and assessing large language models for screening. </p><p><br></p>

Date

2025-07-10
