Authors: Salari A, Kiar G, Lewis L, Evans AC, Glatard T
File-based localization of numerical perturbations in data analysis pipelines.
Gigascience. 2020 Dec 02; 9(12):
Abstract
BACKGROUND: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear.
METHOD: We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation.
RESULTS: By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
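The idea in METHOD, finding which processes create differences rather than merely pass them on, can be illustrated with a minimal sketch. This is not Spot's implementation or API. The `ProcessRecord` type, the `classify` function, and the toy file names and contents below are all hypothetical, and the sketch assumes each process's input and output files have already been recorded (for example from a system-call trace) and checksummed in both conditions.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> str:
    """Return a SHA-256 hex digest used to compare file contents."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProcessRecord:
    """Hypothetical record of one pipeline process in one condition."""
    name: str
    inputs: dict   # file path -> checksum of the file read
    outputs: dict  # file path -> checksum of the file written


def classify(run_a: dict, run_b: dict) -> dict:
    """Label each process present in both runs.

    - "stable": outputs are identical across conditions
    - "creates differences": identical inputs, different outputs
    - "propagates differences": different inputs and different outputs
    """
    labels = {}
    for name, rec_a in run_a.items():
        rec_b = run_b.get(name)
        if rec_b is None:
            continue  # process missing in one condition; not compared here
        if rec_a.outputs == rec_b.outputs:
            labels[name] = "stable"
        elif rec_a.inputs == rec_b.inputs:
            labels[name] = "creates differences"
        else:
            labels[name] = "propagates differences"
    return labels


def example_runs():
    """Build two toy runs (invented data, for illustration only)."""
    t1 = checksum(b"T1w image")
    brain = checksum(b"brain mask")
    reg_a = checksum(b"transform, condition A")
    reg_b = checksum(b"transform, condition B")
    seg_a = checksum(b"labels, condition A")
    seg_b = checksum(b"labels, condition B")

    def run(reg, seg):
        records = [
            ProcessRecord("brain_extraction", {"t1.nii": t1}, {"brain.nii": brain}),
            ProcessRecord("linear_registration", {"brain.nii": brain}, {"xfm.mat": reg}),
            ProcessRecord("segmentation", {"xfm.mat": reg}, {"labels.nii": seg}),
        ]
        return {r.name: r for r in records}

    return run(reg_a, seg_a), run(reg_b, seg_b)


if __name__ == "__main__":
    run_a, run_b = example_runs()
    for name, label in classify(run_a, run_b).items():
        print(f"{name}: {label}")
```

In this toy data the registration step receives identical inputs in both conditions but writes different outputs, so it is labelled as creating differences, while the downstream segmentation step only propagates them. Comparing checksums flags any bitwise difference; a real comparison might instead use a numerical tolerance or a domain-specific metric.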
PMID: 33269388 [PubMed - in process]
Keywords: Neuroimaging; Operating Systems; Pipelines; Reproducibility
PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33269388
DOI: 10.1093/gigascience/giaa106