From Supercomputer to Static Site: Boiling Down Big Research Data for Preservation and Usability

Lincoln, Matthew

doi:10.1184/R1/18280082.v1

mdl_pp_c4l2020.pptx (18.2 MB)

presentation

Posted on 2022-01-12, 15:16; authored by Matthew Lincoln

"Print & Probability" is an interdisciplinary and inter-institutional project to develop new techniques for visual anomaly detection in the OCR of early printed books. By detecting damaged letterforms that create consistent aberrations, the project aims to make it possible to infer, directly and at scale, which letterpress printer produced a given book.

This presentation will detail the unique data management issues posed by the resulting collection of more than 13 billion character images, and how CMU Libraries plans to publish extracts of these data that are both sustainable and usable. The team’s research software engineer will outline the design and technologies behind their management pipeline: a REST API in front of a database hosted at the Pittsburgh Supercomputing Center, which stores and filters image data and metadata from the automated extraction pipeline; and a Vue.js-based web interface for assessing results and providing new annotations for model training. Finally, this talk will present plans to distill this massive research database into two products: a data deposit of interest to computer scientists and digital humanities researchers, and a sustainable static site presenting a human-and-machine-curated collection of distinctive early type that is usable by historians and librarians of rare books.

Presented at code4lib 2020, Pittsburgh, PA.

History

Date

2020-03-09

Keywords

digital humanities; research data management; digital publishing; digital preservation

Licence

In Copyright
