IARC 60th Anniversary - 19-21 May 2026
Session: 19/05/26 - Posters
Quality Assessment and Validation of Population-Based Cancer Registry Data from Sub-Saharan Africa
HLOPHE N. 1, MALULU-CHIWELE L. 1, LIU B. 1, OKO-OBOH A. 1, JEDY-AGBA E. 1, PARKIN M. 1, KANTELHARDT E. 1, DIKARLO P. 1, SANTOS P. 1, ARIYOUH AMIDOU S. 1, CODJO BRUN L. 1, TEONESA D. 1, NSOUHO R. 1, GNAHATIN F. 1, ASSEFA M. 1, ANGELA F. 1, KAMATE B. 1, IGBINOBA F. 1, EKANEM I. 1, OGUNBIYI F. 1, EZEOME E. 1, SOMDYALA N. 1, ZERD F. 1, THEOPHIL MMBAGA B. 1, AFYUSISYE F. 1, BUKIRWA P. 1, CHOKUNONGA E. 1
1 Martin-Luther University Halle-Wittenberg, Halle (Saale), Germany
Background: Population-based cancer registries (PBCRs) are essential for cancer surveillance in Sub-Saharan Africa, providing critical data for incidence estimates, survival analysis, public health planning, and epidemiological research. Because these data inform public health action and policy decisions, their quality is crucial. However, to our knowledge, PBCR data in this region have not been systematically cross-validated against original clinical records.
Objectives: To assess the quality of PBCR data collected by registrars by validating them against data abstracted from clinical files approximately 3 years later by researchers and trained registrars, using records from three population-based cancer registry studies: NORA Care, SurvCan, and SurvCan Plus.
Methods: A total of 12,415 patient records were analysed. Four key variables were compared between PBCR records and clinical files: age at diagnosis, date of diagnosis, sex, and ICD-10 code. Agreement for each variable was categorized as match, minor disagreement, major disagreement, or missing. The criteria were: date of diagnosis (match: ≤90 days; minor: 91-365 days; major: >365 days); age at diagnosis (match: <2 years; minor: 2-5 years; major: >5 years); sex (match: identical; major: any difference); and ICD-10 code (match: identical; minor: anatomically adjacent sites; major: any other difference). Each variable scored 3 points for a match, 2 for a minor disagreement, 1 for a major disagreement, and 0 if missing, giving a maximum of 12 points per record. Overall record quality was classified as high (≥10 points), medium (6-9 points), low (1-5 points), or too many missing values (≥3 of the 4 variables missing).
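The per-variable categories and the record-level score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, date and age differences are assumed to be absolute values, ICD-10 codes are assumed to be compared at the three-character category level, and the adjacent-site list is hypothetical, containing only the two examples reported in the Results (trachea/lung and colon/rectum). The studies' actual adjacency list is not given in the abstract.

```python
from datetime import date
from typing import FrozenSet, List, Optional, Set, Tuple

# Points per agreement category, as stated in the Methods.
POINTS = {"match": 3, "minor": 2, "major": 1, "missing": 0}

# HYPOTHETICAL adjacency list: pairs of three-character ICD-10 categories treated as
# anatomically adjacent. Only the abstract's two examples are included:
# C33 (trachea) / C34 (bronchus and lung) and C18 (colon) / C20 (rectum).
ADJACENT_SITES: Set[FrozenSet[str]] = {
    frozenset({"C33", "C34"}),
    frozenset({"C18", "C20"}),
}


def categorize_date(registry: Optional[date], clinical: Optional[date]) -> str:
    """Date of diagnosis: match <=90 days, minor 91-365 days, major >365 days."""
    if registry is None or clinical is None:
        return "missing"
    diff_days = abs((registry - clinical).days)  # assumed: absolute difference
    if diff_days <= 90:
        return "match"
    if diff_days <= 365:
        return "minor"
    return "major"


def categorize_age(registry: Optional[float], clinical: Optional[float]) -> str:
    """Age at diagnosis in years: match <2, minor 2-5, major >5."""
    if registry is None or clinical is None:
        return "missing"
    diff_years = abs(registry - clinical)  # assumed: absolute difference
    if diff_years < 2:
        return "match"
    if diff_years <= 5:
        return "minor"
    return "major"


def categorize_sex(registry: Optional[str], clinical: Optional[str]) -> str:
    """Sex: any difference is a major disagreement (there is no minor category)."""
    if registry is None or clinical is None:
        return "missing"
    return "match" if registry == clinical else "major"


def categorize_icd10(registry: Optional[str], clinical: Optional[str]) -> str:
    """ICD-10: identical codes match, adjacent sites are minor, anything else is major."""
    if registry is None or clinical is None:
        return "missing"
    # Assumed: compare three-character categories (e.g. "C34.1" -> "C34").
    reg, clin = registry.strip().upper()[:3], clinical.strip().upper()[:3]
    if reg == clin:
        return "match"
    if frozenset({reg, clin}) in ADJACENT_SITES:
        return "minor"
    return "major"


def classify_record(categories: List[str]) -> Tuple[int, str]:
    """Sum the points for the four variables and assign an overall quality class."""
    score = sum(POINTS[c] for c in categories)
    if categories.count("missing") >= 3:
        return score, "too many missing"
    if score >= 10:
        return score, "high"
    if score >= 6:
        return score, "medium"
    return score, "low"


# Usage: dates 45 days apart, ages 3 years apart, same sex, colon vs rectum.
cats = [
    categorize_date(date(2019, 3, 1), date(2019, 4, 15)),  # "match"
    categorize_age(54, 57),                                 # "minor"
    categorize_sex("F", "F"),                               # "match"
    categorize_icd10("C18.7", "C20"),                       # "minor"
]
print(classify_record(cats))  # (10, 'high')
```

One consequence of this scheme is visible in the arithmetic: a record with even one missing variable can score at most 9 points, so only records with all four variables available can be classed as high quality.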
Results: Missing data ranged from 17.8% (sex) to 22.2% (date of diagnosis). Among records with data available, variable-specific concordance rates (proportion of matches) were: sex 98.4%, ICD-10 95.8%, date of diagnosis 90.0%, and age at diagnosis 82.6%. For date of diagnosis, 96.6% of records agreed within ±1 year between registry and clinical sources. Age at diagnosis showed the most discrepancies, with 12.6% major disagreements and a systematic tendency for ages in clinical files to be higher than registry-recorded ages. For ICD-10 coding, minor disagreements (2.8% overall) were mainly attributable to difficulties in classifying anatomically adjacent sites, particularly lung versus trachea and colon versus rectum. Major ICD-10 disagreements (1.36%) reflected more substantial coding discrepancies; the most notable was non-Hodgkin lymphoma recorded in the registry versus Hodgkin lymphoma in the corresponding clinical file, highlighting potential challenges in lymphoma subtype classification. Among records that could be evaluated, 87.2% were of high quality, 12.4% of medium quality, and 0.4% of low quality. For 17.5% of records, quality could not be evaluated because of missing data. The proportion of high-quality records varied slightly by study: NORA Care 84.4%, SurvCan 92.8%, and SurvCan Plus 89.3%.
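The three quality shares reported (87.2%, 12.4%, 0.4%) sum to 100%, so they are shares of evaluable records, while the 17.5% of records that could not be evaluated sit outside that denominator. A back-of-envelope conversion to shares of all records is sketched below. This is an illustration only, not a figure reported by the authors: it assumes the 17.5% refers to all 12,415 records, the helper name `share_of_all` is hypothetical, and results are approximate because the reported percentages are rounded.

```python
# Reported values (%), copied from the Results.
NOT_EVALUABLE_PCT = 17.5  # share of ALL records whose quality could not be evaluated
QUALITY_PCT = {"high": 87.2, "medium": 12.4, "low": 0.4}  # shares of EVALUABLE records


def share_of_all(share_of_evaluable: float, not_evaluable_pct: float) -> float:
    """Hypothetical helper: rescale a share of evaluable records to a share of all records."""
    return share_of_evaluable * (1 - not_evaluable_pct / 100)


# The three quality classes partition the evaluable records, so they sum to ~100%.
total = sum(QUALITY_PCT.values())
print(f"sum over evaluable records: {total:.1f}%")

for label, pct in QUALITY_PCT.items():
    approx = share_of_all(pct, NOT_EVALUABLE_PCT)
    print(f"{label}: {pct:.1f}% of evaluable records, ~{approx:.1f}% of all records")
```

Under these assumptions, 87.2% of evaluable records corresponds to roughly 72% of all records analysed; the rescaled shares plus the 17.5% not evaluable again sum to 100%.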
Conclusion: Based on the present analysis, Sub-Saharan African PBCRs demonstrate good data quality, with 87.2% of evaluable records showing high concordance with clinical sources. The identified patterns of disagreement, particularly biases in age documentation and challenges in disease classification, provide specific targets for quality improvement. These findings support the reliability of PBCR data for cancer surveillance and epidemiological research in the region, while highlighting a persistent difficulty in retrieving clinical files after a period of 2-8 years, which left 17.5% of records unevaluable.