Authors: Yang Z, Li K, Ramandi SG, Brassard P, Khellaf A, Trinh VQ, Zhang J, Chen L, Rowsell C, Varma S, Plataniotis K, Hosseini MS
Computational pathology (CPath) leverages histopathology images to enhance diagnostic precision and reproducibility in clinical pathology. However, publicly available CPath datasets annotated with extensive, granular histological tissue type (HTT) taxonomies remain scarce because annotation requires significant expertise and is costly. Existing datasets, such as the Atlas of Digital Pathology (ADP), address this by offering diverse HTT annotations generalized across multiple organs, but this generality limits in-depth studies of diseases in a specific organ. Building upon this foundation, we introduce ADPv2, a novel dataset focused on gastrointestinal histopathology. Our dataset comprises 20,004 image patches derived from healthy colon biopsy slides, annotated according to a 3-level hierarchical taxonomy of 32 distinct HTTs. Furthermore, we train a multilabel representation learning model on ADPv2 following a two-stage training procedure. Using the VMamba model architecture, we achieve a mean average precision of 0.88 in multilabel colon HTT classification. Finally, we show that our dataset supports organ-specific in-depth study for potential biomarker discovery: analyzing the model's prediction behavior on tissues affected by different colon diseases reveals statistical patterns that confirm the two pathological pathways of colon cancer development. Our dataset is publicly available in three parts (Part 1, Part 2, and Part 3).
Keywords: ADPv2 dataset; Biomarker discovery; Computational pathology; Deep learning; Multilabel representation learning
PubMed: https://pubmed.ncbi.nlm.nih.gov/41658283/
DOI: 10.1016/j.jpi.2025.100537
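
The abstract reports a mean average precision (mAP) of 0.88 for multilabel HTT classification but does not state how the metric is computed. As a minimal sketch, assuming the common macro-averaged definition (average precision per class, then the mean over classes), the calculation could look like the following. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class.

    Ranks samples by descending score and averages precision@k over the
    ranks k at which a positive sample appears. Returns NaN when the class
    has no positive samples.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(
    y_true: List[Sequence[int]], y_score: List[Sequence[float]]
) -> float:
    """Macro mAP: mean of per-class AP, skipping classes with no positives.

    y_true[i][c] is 1 if sample i carries label c (assumed layout);
    y_score[i][c] is the model's score for that label.
    """
    n_classes = len(y_true[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision(
            [row[c] for row in y_true], [row[c] for row in y_score]
        )
        if ap == ap:  # NaN is the only value not equal to itself
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy example: 3 patches, 2 hypothetical tissue-type labels.
toy_true = [[1, 0], [0, 1], [1, 1]]
toy_score = [[0.9, 0.5], [0.1, 0.8], [0.4, 0.3]]
toy_map = mean_average_precision(toy_true, toy_score)
```

In this toy case the first label is ranked perfectly (AP of 1), while the second has one negative ranked above a positive (AP of 5/6), so the macro mAP is 11/12. Other conventions exist, such as micro-averaging or interpolated precision, and may give different values.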